Lexurgy Notes
Lexurgy is a feature-rich sound change applier developed by Graham Hill with an easy-to-read syntax. Although it has detailed documentation for writing sound changes, there’s not much literature on usage and setup. That’s what I’m changing with this README.
Lexurgy consists of two components: a CLI+API repository and a Next.js web app. I will go through how to set up both in this document.
Obtaining Source Code
Build
This code requires Java and Kotlin to compile. I have tested this on a few versions of Java and only Java 11 and 17 seem to work.
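To check which Java version is on your PATH before building, something like the sketch below can help. It assumes the usual `java -version` output format (a quoted version string printed to stderr); the helper names and the `KNOWN_GOOD_JAVA` set are my own and simply encode the versions that worked for me.

```python
import re
import subprocess

# Versions that worked in my testing; other versions failed or were not tried.
KNOWN_GOOD_JAVA = {11, 17}


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string")
    major = int(match.group(1))
    # Java 8 and older report themselves as 1.x (for example "1.8.0_292").
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def installed_java_major() -> int:
    """Ask the `java` on PATH for its version (it prints to stderr)."""
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)


# Offline demonstration with a sample version line
sample = 'openjdk version "17.0.2" 2022-01-18'
print(parse_java_major(sample) in KNOWN_GOOD_JAVA)  # True
```

Call `installed_java_major()` on your own machine to test the real installation.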
Now that you are done, run ./gradlew or ./gradlew.bat depending on your OS. You’ll know which one works by the error message.
./gradlew build # Linux or MacOS
# OR
./gradlew.bat build # Windows
Running
If the build succeeds, you can run the resulting code directly through Gradle.
# List all tasks
./gradlew tasks --all
# Run lexurgy
./gradlew cli:run
# Run lexurgy api
./gradlew api:run
Packaging
Although lexurgy works as-is right now, you might want to package the build files eventually.
# Package CLI
./gradlew cli:installDist
# The subshell keeps your working directory at the repository root
(cd cli/build/install && tar -czf lexurgy-cli.tar.gz lexurgy)
# Package API
./gradlew api:installDist
(cd api/build/install && tar -czf lexurgy-api.tar.gz api)
CLI
The Lexurgy CLI can be used to run sound changes offline. This is especially helpful when you have a large number of words and a complex set of sound changes, as this can significantly speed up processing and avoid timeouts on the official Lexurgy website. There is already documentation on the command line arguments for the CLI app by the original author himself, so I’m only going to provide example usage of the app based on my own use.
./gradlew build # Build the CLI
./gradlew installDist # Create the shell script entry point
# Run a sound change with intermediate romaniser output
# ps2pd.lsc has one intermediate romaniser, so the output files should be:
# ps2pd_ev.wli, ps2pd_ev.wlm, ps2pd_pd.wli
cli/build/install/lexurgy/bin/lexurgy sc -m ps2pd.lsc ps2pd.wli
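If you have several word lists to push through the same sound changes, the command above can be wrapped in a small script. This is only a sketch: `sc_command` and `run_all` are hypothetical helper names, and the only CLI arguments used are the ones shown above (`sc`, `-m`, the changes file and a word list).

```python
import subprocess
from typing import Iterable, List

# Entry point created by `./gradlew installDist`, relative to the repository root
LEXURGY_BIN = "cli/build/install/lexurgy/bin/lexurgy"


def sc_command(changes_file: str, words_file: str) -> List[str]:
    """Build the same command line as the example above."""
    return [LEXURGY_BIN, "sc", "-m", changes_file, words_file]


def run_all(changes_file: str, word_files: Iterable[str]) -> None:
    """Run the sound changes once per word list, stopping on the first failure."""
    for words_file in word_files:
        subprocess.run(sc_command(changes_file, words_file), check=True)


print(" ".join(sc_command("ps2pd.lsc", "ps2pd.wli")))
```

Calling `run_all("ps2pd.lsc", ["ps2pd.wli"])` from the repository root should behave like the command above.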
API
Lexurgy supplies a backend API that can be used to run sound changes for a front end and can be accessed via some fancy POST requests. The source code for the API can be found in the api/ directory of the Lexurgy repository.
By default, the API runs at http://0.0.0.0:8080.
When running the API, the following environment variables must be set so that the server does not exit with errors: API_KEY, SINGLE_STEP_TIMEOUT, REQUEST_TIMEOUT and TOTAL_TIMEOUT. Example values are shown below.
# Run API
cd lexurgy # Go to the lexurgy repository
API_KEY=exampleApiKey SINGLE_STEP_TIMEOUT=1 REQUEST_TIMEOUT=5 TOTAL_TIMEOUT=60 ./gradlew api:run
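Since the server exits when any of these variables is missing, it can be worth checking them before launching. Here is a minimal sketch; `missing_api_vars` is a hypothetical helper of mine, and the variable names are the four listed above.

```python
import os
from typing import List, Mapping

# The four variables the API needs, as listed above
REQUIRED_VARS = ("API_KEY", "SINGLE_STEP_TIMEOUT", "REQUEST_TIMEOUT", "TOTAL_TIMEOUT")


def missing_api_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with one variable forgotten
example_env = {
    "API_KEY": "exampleApiKey",
    "SINGLE_STEP_TIMEOUT": "1",
    "REQUEST_TIMEOUT": "5",
}
print(missing_api_vars(example_env))  # ['TOTAL_TIMEOUT']
```

To check your actual shell environment, pass `os.environ` instead of the example dictionary.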
Alternatively, you can run the API in a docker container. The api/docker-local.sh script supplies default environment variables which you can edit.
cd lexurgy
cd api
bash docker-local.sh
Routes
/scv1
/scv1 allows you to run sound changes on a bunch of words. It takes in a set of sound changes and input words and returns the resulting words.
The data that is posted is a JSON object with the following fields:
interface ScV1Request {
  // Sound change file as one single string
  changes: string;
  // List of words, can be dumped from '*.wli' with lines split
  inputWords: string[];
  // List of words to trace (default: [])
  // All changes that were applied to the word show up in the response
  traceWords?: string[];
  // Start at this rule (default: null)
  startAt?: string | null;
  // Stop before this rule (default: null)
  stopBefore?: string | null;
  // Allow polling the API server, for example, if it's taking a while (default: false)
  allowPolling?: boolean;
}
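On the Python side, the same request can be assembled with a small helper so the field names only have to be typed once. This is a sketch of my own, not part of Lexurgy: `words_from_text` and `build_scv1_request` are hypothetical names, and dropping blank lines from the word list is my assumption about what you usually want.

```python
from typing import Any, Dict, List, Optional


def words_from_text(text: str) -> List[str]:
    """Split the contents of a .wli file into words, skipping blank lines (my assumption)."""
    return [line for line in text.splitlines() if line.strip()]


def build_scv1_request(
    changes: str,
    input_words: List[str],
    trace_words: Optional[List[str]] = None,
    start_at: Optional[str] = None,
    stop_before: Optional[str] = None,
    allow_polling: bool = False,
) -> Dict[str, Any]:
    """Build a dictionary with the fields of the request described above."""
    return {
        "changes": changes,
        "inputWords": input_words,
        "traceWords": trace_words or [],
        "startAt": start_at,
        "stopBefore": stop_before,
        "allowPolling": allow_polling,
    }


# Placeholder strings stand in for real file contents here
payload = build_scv1_request("<contents of ps2pd.lsc>", words_from_text("kata\npana\n"))
print(payload["inputWords"])  # ['kata', 'pana']
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`.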
The output JSON has the following type signature (assuming response 200 and no parsing errors):
interface ScV1Response {
// Names of the rule lists
ruleNames: string[];
// List of output words
// The index of the input word should match that of its corresponding output word
// For example, the 100th word should correspond to the 100th output word
outputWords: string[];
// Output of each intermediate romaniser,
// where <index> is the name of each intermediate romaniser
// Each intermediate romaniser will have a list of words for its output
intermediateWords: {[index: string]: string[]};
// The sound changes that were applied to each word in traceWords
// For this field, <index> is the input word,
// and each input word has a list of sound changes applied to it
// and the corresponding output of each sound change applied
// Note that sound changes that did not apply will not appear in the list
traces?: {[index: string]: {rule: string, output: string}[]};
}
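Reading the response back in Python is mostly a matter of lining up indices. The sketch below uses a made-up response dictionary (the words and the rule name are invented purely for illustration) that follows the shape described above; `pair_words` and `format_trace` are hypothetical helper names.

```python
from typing import Any, Dict, List, Tuple


def pair_words(input_words: List[str], response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pair each input word with the output word at the same index."""
    return list(zip(input_words, response["outputWords"]))


def format_trace(word: str, response: Dict[str, Any]) -> List[str]:
    """List the rules that applied to a traced word along with their outputs."""
    traces = response.get("traces") or {}
    return [f"{step['rule']}: {step['output']}" for step in traces.get(word, [])]


# Invented example data in the shape of the response described above
sample_response = {
    "ruleNames": ["voicing"],
    "outputWords": ["kada", "pana"],
    "intermediateWords": {},
    "traces": {"kata": [{"rule": "voicing", "output": "kada"}]},
}

print(pair_words(["kata", "pana"], sample_response))  # [('kata', 'kada'), ('pana', 'pana')]
print(format_trace("kata", sample_response))  # ['voicing: kada']
```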
Here is an example using Python:
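import requests
from pathlib import Path

# Sound change file and word list to send (same files as the CLI example)
changes_fp = "ps2pd.lsc"
inputWords_fp = "ps2pd.wli"

response = requests.post(
    url="http://0.0.0.0:8080/scv1",
    headers={
        "Content-Type": "application/json; charset=utf-8",
        "Authorization": "exampleApiKey"
    },
    json={
        "changes": Path(changes_fp).read_text(encoding="utf-8"),
        # splitlines() avoids the empty trailing entry that split("\n") leaves
        "inputWords": Path(inputWords_fp).read_text(encoding="utf-8").splitlines(),
        "traceWords": [],
        "startAt": None,
        "stopBefore": None,
        "allowPolling": True
    }
)
print(response.json())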
While it’s possible to use command line tools like curl to send requests to the API server, trying to make multi-line strings work in bash is a bit of a nightmare, so I prefer just using Python instead.
# Encode the sound change file as one JSON string (escapes quotes and newlines)
CHANGES="$(jq -R -s '.' < ps2pd.lsc)"
# Convert file to JSON array
# https://stackoverflow.com/a/28006220
WORDS="$(jq -R -s -c 'split("\n")' < ps2pd.wli)"
DATA="
{
\"changes\": $CHANGES,
\"inputWords\": $WORDS,
\"traceWords\": [],
\"startAt\": null,
\"stopBefore\": null,
\"allowPolling\": true
}
"
curl \
--header "Content-Type: application/json" \
--header "Authorization: exampleApiKey" \
--request POST \
--data "$DATA" \
http://0.0.0.0:8080/scv1
/inflectv1
/inflectv1 seems to be a prototype of some inflection applier. The only evidence of its existence is in the source code, as it is not reflected anywhere on the official Lexurgy website and documentation. It is extremely basic, as the only actions that it seems to be capable of are replacing the entire stem with a fixed form or making multiple copies of the original stem.
As with /scv1, the only way to access this route is through a POST request. The JSON object should conform to the following:
interface InflectV1Request {
// list of rules to apply
rules: RequestCategoryTree;
// List of stems and their corresponding categories
// Examples of categories include but are not limited to parts of speech,
// grammatical gender or declension class
stemsAndCategories: {stem: string, categories: string[]}[];
}
// RequestCategoryTree decides how the stem should be inflected
type RequestCategoryTree = RequestForm | RequestFormulaForm | RequestCategorySplit;
// Replace entire stem with the word in <form>
interface RequestForm {
"type": "form";
form: string;
}
// List of sound changes to apply based on <formula>
interface RequestFormulaForm {
"type": "formula";
formula: RequestFormula;
}
// Allows the inflection engine to apply different changes based on the
// category of the word
interface RequestCategorySplit {
"type": "split";
branches: {[index: string]: RequestCategoryTree};
}
// RequestFormula are the atomic "sound changes" made by RequestFormulaForm
type RequestFormula = RequestStem | RequestConcat;
// One copy of the original stem
interface RequestStem {
"type": "stem";
}
// Concatenates parts of the word according to the order in <parts>
interface RequestConcat {
"type": "concat";
parts: RequestFormula[];
}
The output JSON looks like:
interface InflectV1Response {
// The resultant inflected forms, one for each stem under
// stemsAndCategories
inflectedForms: string[]
}
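To get a feel for how these trees nest, here is a small local model of the behaviour as I understand it. It is a sketch and not the server's actual implementation: the helper names are mine, and picking the first of a stem's categories that has a branch is my assumption about how splits are resolved.

```python
from typing import Any, Dict, List

Tree = Dict[str, Any]


# Builders for the request types described above
def stem() -> Tree:
    return {"type": "stem"}


def concat(*parts: Tree) -> Tree:
    return {"type": "concat", "parts": list(parts)}


def form(text: str) -> Tree:
    return {"type": "form", "form": text}


def formula(inner: Tree) -> Tree:
    return {"type": "formula", "formula": inner}


def split(branches: Dict[str, Tree]) -> Tree:
    return {"type": "split", "branches": branches}


def eval_formula(node: Tree, stem_text: str) -> str:
    """Evaluate a RequestFormula: a copy of the stem or a concatenation."""
    if node["type"] == "stem":
        return stem_text
    if node["type"] == "concat":
        return "".join(eval_formula(part, stem_text) for part in node["parts"])
    raise ValueError(f"unknown formula type: {node['type']}")


def eval_tree(tree: Tree, stem_text: str, categories: List[str]) -> str:
    """Evaluate a RequestCategoryTree for one stem."""
    if tree["type"] == "form":
        return tree["form"]
    if tree["type"] == "formula":
        return eval_formula(tree["formula"], stem_text)
    if tree["type"] == "split":
        # Assumption: the first category with a matching branch wins
        for category in categories:
            if category in tree["branches"]:
                return eval_tree(tree["branches"][category], stem_text, categories)
        raise KeyError(f"no branch for categories {categories}")
    raise ValueError(f"unknown tree type: {tree['type']}")


def inflect_locally(rules: Tree, stems_and_categories: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Mimic the shape of the response: one inflected form per stem."""
    forms = [eval_tree(rules, s["stem"], s["categories"]) for s in stems_and_categories]
    return {"inflectedForms": forms}


rules = split({"noun": formula(concat(stem(), stem())), "others": form("somethingelse")})
print(inflect_locally(rules, [
    {"stem": "cat", "categories": ["noun"]},
    {"stem": "ohno", "categories": ["others"]},
]))  # {'inflectedForms': ['catcat', 'somethingelse']}
```

The builders return plain dictionaries, so their output can also be sent to the server as the `rules` field.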
Here are a few examples in Python:
import requests
response = requests.post(
url="http://0.0.0.0:8080/inflectv1",
headers={
"Content-Type": "application/json; charset=utf-8",
"Authorization": "exampleApiKey"
},
json={
"rules": {"type": "form", "form": "replacement"},
"stemsAndCategories": [
{"stem": "cat", "categories": ["noun"]},
{"stem": "dog", "categories": ["noun"]}
]
}
)
print(response.json())
# {"inflectedForms": ["replacement", "replacement"]}
response = requests.post(
url="http://0.0.0.0:8080/inflectv1",
headers={
"Content-Type": "application/json; charset=utf-8",
"Authorization": "exampleApiKey"
},
json={
"rules": {
"type": "formula",
"formula": {
"type": "concat",
"parts": [
{"type": "stem"},
{"type": "concat", "parts": [{"type": "stem"}, {"type": "stem"}]}
]
}
},
"stemsAndCategories": [
{"stem": "tripled", "categories": ["participle"]},
]
}
)
print(response.json())
# {"inflectedForms": ["tripledtripledtripled"]}
response = requests.post(
url="http://0.0.0.0:8080/inflectv1",
headers={
"Content-Type": "application/json; charset=utf-8",
"Authorization": "exampleApiKey"
},
json={
"rules": {
"type": "split",
"branches": {
# Nouns are duplicated
"noun": {
"type": "formula",
"formula": {"type": "concat", "parts": [{"type": "stem"}, {"type": "stem"}]}
},
# Everything else becomes the string "somethingelse"
"others": {"type": "form", "form": "somethingelse"}
}
},
"stemsAndCategories": [
{"stem": "cat", "categories": ["noun"]},
{"stem": "ohno", "categories": ["others"]}
]
}
)
print(response.json())
# {"inflectedForms": ["catcat", "somethingelse"]}
Web App
The web app is a Next.js application using the Pages Router. It does not contain the code that applies sound changes; that is done by the API from the Lexurgy CLI+API repository. In addition, the web app connects (through Neo4j) to a database which stores .lsc and .wli files contributed by others. I have not figured out the database part yet, but I got the web app running and able to apply sound changes.
Your first step is to run the API and copy the API_KEY somewhere. Let’s say the API is running at http://0.0.0.0:8080.
# In a new terminal
cd lexurgy-web
npm install
LEXURGY_SERVICES_URL="http://0.0.0.0:8080" LEXURGY_SERVICES_API_KEY=exampleApiKey npm run dev
When the server is up and running, open the URL printed out by Next.js to visit the web app.
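If the web app loads but sound changes fail, one quick thing to rule out is that nothing is listening at the address passed in LEXURGY_SERVICES_URL. The sketch below only checks that a TCP connection can be opened; it does not call any Lexurgy route, and `api_endpoint` and `is_listening` are hypothetical helper names.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def api_endpoint(url: str) -> Tuple[str, int]:
    """Extract host and port from a URL such as LEXURGY_SERVICES_URL."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url}")
    # Fall back to the scheme's default port when none is given
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(api_endpoint("http://0.0.0.0:8080"))  # ('0.0.0.0', 8080)
```

With the API running, `is_listening(*api_endpoint("http://0.0.0.0:8080"))` should return True.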