Generation Pipeline
The generation pipeline turns canonical source text in csl-orig into the XML, search
indices, and web displays served on the live site. The tooling lives in
csl-pywork.
Generate a dictionary
cd csl-pywork/v02
sh generate_dict.sh {dict} tempparent/{dict}
This reads csl-orig/v02/{dict}/{dict}.txt and produces a complete, self-contained
dictionary installation under the target directory (the second argument, tempparent/{dict}
in the command above; shown as outdir/ below):
outdir/
orig/ ← source text copied from csl-orig
pywork/ ← scripts for headwords, XML, SQLite
web/ ← display scripts (from csl-websanlexicon)
downloads/ ← zip-archive generation scripts
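A quick way to confirm that a run produced the layout above is to check for the four subdirectories. The following is a minimal sketch, not part of csl-pywork; the helper name missing_subdirs is hypothetical, and it only checks that the directories exist, not what they contain.

```python
from pathlib import Path

# The four subdirectories shown in the tree above.
EXPECTED_SUBDIRS = ("orig", "pywork", "web", "downloads")


def missing_subdirs(outdir):
    """Return the expected subdirectories that are absent under outdir."""
    root = Path(outdir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

Calling missing_subdirs("tempparent/{dict}") with the real target path returns an empty list when all four directories are present.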
What the four stages do
| Stage | Script | Role |
|---|---|---|
| 1 | generate_orig.sh | Copy {dict}.txt, _hwextra, header.xml, -meta2 from csl-orig |
| 2 | generate_pywork.sh | Assemble pywork/ by rendering Mako templates with the dict's parameters |
| 3 | generate_web.sh | Assemble web/ (the four displays) from csl-websanlexicon; emit the SQLite build scripts |
| 4 | execute | Run the assembled scripts in pywork/ (below) |
The stage-4 scripts produce the actual artifacts:
| Script | Output |
|---|---|
| redo_hw.sh | {dict}hw.txt (the headword list) |
| redo_xml.sh | {dict}.xml (full XML, validated against {dict}.dtd) |
| redo_postxml.sh | web/sqlite/{dict}.sqlite + abbreviation / tooltip / bibliography databases |
| downloads/redo_all.sh | the txt / xml / web zip archives |
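To make the {dict} placeholder concrete, the table can be expressed as a small lookup. This is an illustrative sketch only: the function name stage4_outputs is hypothetical, and it lists just the rows whose output has a single file name.

```python
def stage4_outputs(dict_code):
    """Map stage-4 scripts to the artifact names the table lists for them."""
    return {
        "redo_hw.sh": f"{dict_code}hw.txt",
        "redo_xml.sh": f"{dict_code}.xml",
        "redo_postxml.sh": f"web/sqlite/{dict_code}.sqlite",
    }


# "mw" is used here purely as an example dictionary code.
for script, artifact in stage4_outputs("mw").items():
    print(f"{script} -> {artifact}")
```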
See Data Formats for the shape of the generated XML and the derived SQLite/JSON/StarDict formats.
Validate the XML
sh xmlchk_xampp.sh {dict}
On a machine without XAMPP/xmllint (typical on Windows), run the Python XML builder and
treat "All records parsed by ET" from make_xml.py as a passing signal.
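The ElementTree signal amounts to a well-formedness check: each record must parse on its own. The sketch below shows that idea under stated assumptions; it is not the code in make_xml.py, the function name find_unparsable is hypothetical, and the element names in the sample records are illustrative. ElementTree does not validate against a DTD, so this is weaker than the xmllint check.

```python
import xml.etree.ElementTree as ET


def find_unparsable(records):
    """Return (record_number, error_message) for each record ET rejects."""
    failures = []
    for number, record in enumerate(records, start=1):
        try:
            ET.fromstring(record)
        except ET.ParseError as err:
            failures.append((number, str(err)))
    return failures


# Two illustrative records: the second leaves <body> unclosed.
sample = [
    "<H1><h><key1>a</key1></h><body>first</body></H1>",
    "<H1><h><key1>b</key1></h><body>unclosed</H1>",
]

failures = find_unparsable(sample)
if failures:
    for number, message in failures:
        print(f"record {number}: {message}")
else:
    print("all records parsed")
```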
Prerequisites (local Windows setup)
- python3 available on PATH. A thin wrapper forwarding to python works:
    mkdir -p /tmp/pybin
    printf '#!/bin/bash\npython "$@"\n' > /tmp/pybin/python3
    chmod +x /tmp/pybin/python3
    export PATH="/tmp/pybin:$PATH"
- mako installed: pip install mako.
- csl-websanlexicon checked out as a sibling of csl-pywork.
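The three prerequisites can be checked from Python before starting a run. This is a rough sketch rather than an existing project script; check_prerequisites is a hypothetical helper, and it only tests that mako can be located, not which version is installed.

```python
import importlib.util
import shutil
from pathlib import Path


def check_prerequisites(pywork_dir):
    """Report which of the three prerequisites appear to be satisfied."""
    pywork = Path(pywork_dir).resolve()
    return {
        # An executable named python3 is reachable through PATH.
        "python3 on PATH": shutil.which("python3") is not None,
        # The mako package can be found by the running interpreter.
        "mako importable": importlib.util.find_spec("mako") is not None,
        # csl-websanlexicon sits next to the csl-pywork checkout.
        "csl-websanlexicon sibling": (pywork.parent / "csl-websanlexicon").is_dir(),
    }
```

Pass the path of the csl-pywork checkout; any False value in the returned dictionary names the prerequisite that still needs attention.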
Windows / encoding notes
- Every Python script should call sys.stdout.reconfigure(encoding='utf-8') and the same for stderr.
- Pass encoding='utf-8' to every subprocess.run that reads gh api output.
- Never write source files with a BOM: use open(f, 'w', encoding='utf-8'), not utf-8-sig.
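Taken together, the three notes might look like the sketch below. The helper names (force_utf8_streams, run_gh_api, write_source) are hypothetical, and the hasattr guard is an addition for cases where a stream has been replaced by an object without reconfigure.

```python
import subprocess
import sys


def force_utf8_streams():
    """Switch stdout and stderr to UTF-8 where the stream supports it."""
    for stream in (sys.stdout, sys.stderr):
        # reconfigure() is available on io.TextIOWrapper (Python 3.7+).
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8")


def run_gh_api(endpoint):
    """Run `gh api <endpoint>` and decode its output as UTF-8."""
    result = subprocess.run(
        ["gh", "api", endpoint],
        capture_output=True,
        encoding="utf-8",  # without this, Windows decodes with the locale code page
        check=True,
    )
    return result.stdout


def write_source(path, text):
    """Write text as UTF-8 without a BOM (utf-8-sig would prepend one)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


force_utf8_streams()
```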
Where this fits
This step sits between Corrections Workflow (producing corrected source) and the published site. See Architecture for the end-to-end diagram.
Deploying to the live server
CDSL is not deployed by CI — the site is generated and served directly on the
University of Cologne web server. Per
csl-websanlexicon/readme_cologne.org:
- Each dictionary's web application (the B/L/A/M displays, in PHP) is produced by a generate.py step using mako templates, run in the v00 working directory on the Cologne host.
- Output is written into the server's scan tree at …/{CODE}Scan/{year}/web/, the very paths the live URLs use (e.g. /scans/MWScan/2020/web/…). The {year} segment marks the generation vintage (2013–2014 originally; 2020 for the current build).
- The Cologne host then serves those generated PHP/JS/CSS files directly (Apache + PHP; XAMPP is referenced for the equivalent local validation in xmlchk_xampp.sh).
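The URL path pattern from the second bullet can be written out as a one-line helper. This sketch covers only the URL path as it appears in the example; scan_web_path is a hypothetical name, and the server-side filesystem prefix is left out because the readme elides it.

```python
def scan_web_path(code, year):
    """Build the URL path of a dictionary's generated web directory."""
    return f"/scans/{code}Scan/{year}/web/"


# Reproduces the example from the list above for MW and the 2020 build.
print(scan_web_path("MW", 2020))
```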
So regeneration is the deploy: running the generator on the server refreshes what visitors see. The cadence is manual — maintainers regenerate after a batch of corrections — not triggered automatically.
readme_cologne.org is dated 2018 and references the Python 2 toolchain; the current
generation (csl-pywork, Python 3 + mako) is newer, but the publish model is
unchanged — generated files live in the server's …/{CODE}Scan/{year}/web/ tree and are
served from there.