Tech stack and ecosystem

How the technical infrastructure evolved over 13 years.

Repos by primary language

The project's repositories split into two main technical stacks: Python for back-end processing (data pipelines, correction scripts, format converters) and HTML for dictionary display pages generated by the csl-pywork tooling. A smaller JavaScript and PHP layer powers the web API and interactive front-end. Repositories with no detected language are pure data stores or documentation repos where GitHub's detector finds no source files at all.

How to read: Each horizontal bar is one language; its length is proportional to the number of repositories listing that language as primary. Bars are sorted from most to fewest repositories. Example 1: If Python shows 28 repos and HTML shows 20, there are more processing-script repos than display-page repos in this org. Example 2: (none) is a distinct entry. It counts repos where GitHub found no code at all, not repos that use some unlisted language.
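The tally behind such a chart can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: the sample records are hypothetical, and it assumes each repository record carries a `language` field that is `None` when no language was detected.

```python
from collections import Counter

# Hypothetical sample records (assumption): a missing primary language
# is represented as None.
repos = [
    {"name": "repo-a", "language": "Python"},
    {"name": "repo-b", "language": "HTML"},
    {"name": "repo-c", "language": "Python"},
    {"name": "repo-d", "language": None},
]

def count_primary_languages(repos):
    """Return (language, count) pairs sorted from most to fewest repos."""
    # Repos with no detected language are counted under their own "(none)" label.
    counts = Counter(r["language"] or "(none)" for r in repos)
    # Sort by descending count, then by label so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

rows = count_primary_languages(repos)
for language, n in rows:
    # One text "bar" per language; its length is the repo count.
    print(f"{language:<8} {'#' * n} {n}")
```

Keeping "(none)" as an explicit label, rather than dropping those records, preserves the distinction between "no code found" and "some unlisted language".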

Conclusion: Python's dominance reflects the correction and generation pipeline that forms the project's backbone. Most HTML repos are dictionary display pages produced by csl-pywork rather than hand-written front-ends. The language mix has stayed stable over 13 years — the project never shifted to a new stack.

Repos by size (KB)

Repository sizes span several orders of magnitude, from near-empty stub repos to large data stores. The largest repositories are source-data holders: csl-orig (the canonical correction target) and csl-corrections (the audit trail of every applied fix), whose sizes reflect years of accumulated change files and dictionary text. Tooling repos are orders of magnitude smaller, typically a few hundred kilobytes of scripts and configuration.

How to read: The x-axis is logarithmic: each gridline to the right marks a size ten times larger, not ten units larger, so equal increases in bar length correspond to equal multiplicative factors in size. Only the 20 largest repositories are shown. Example 1: A repo at 100,000 KB is 100× larger than one at 1,000 KB, even though its bar is only two gridlines longer. Example 2: A bar ending very close to the left edge (≈1 KB) indicates an almost empty repo: a stub, placeholder, or pure-metadata repository.
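The arithmetic behind the log axis can be sketched as below. The function names, the `size_kb` field, and the `units_per_decade` parameter are illustrative assumptions, and the axis is assumed to start at 1 KB.

```python
import math

def log_bar_length(size_kb, units_per_decade=1.0):
    """Bar length on a base-10 log axis that starts at 1 KB (assumption)."""
    # Sizes below 1 KB are clamped so the bar never has negative length.
    return units_per_decade * math.log10(max(size_kb, 1))

def top_n_by_size(repos, n=20):
    """Keep only the n largest repos, largest first."""
    return sorted(repos, key=lambda r: r["size_kb"], reverse=True)[:n]

small_kb, large_kb = 1_000, 100_000
size_ratio = large_kb / small_kb                      # how many times larger
extra_length = log_bar_length(large_kb) - log_bar_length(small_kb)
length_ratio = log_bar_length(large_kb) / log_bar_length(small_kb)

print(f"size ratio: {size_ratio:.0f}x")
print(f"extra bar length: {extra_length:.1f} decades")
print(f"bar length ratio: {length_ratio:.2f}")
```

The size ratio is 100, but the longer bar extends only two decades further, which is why the visual difference looks modest.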

Conclusion: The size distribution is driven almost entirely by content rather than code: the two or three largest repos hold years of accumulated correction files and dictionary source text, while the rest of the tooling ecosystem is compact. Contributors cloning the full org for the first time should expect a heavily skewed download — most repos are small, but a few dominate total disk footprint.

Repo creation timeline

The org's repositories grew in three distinct waves, visible as clusters of dots on the timeline. A founding burst in January 2014 established the core dictionary corpus in a matter of days — fourteen repositories created that month, all named by their standard abbreviation (PWG, MWS, VCP, GRA, WIL, and others). After a quiet period spanning 2016–2018, a second and broader wave from 2019 to 2021 built out the csl- tooling ecosystem: centralised source data (csl-orig, csl-corrections), web infrastructure (csl-websanlexicon, csl-apidev), and a further run of smaller dictionaries (AP90, BUR, CAE, BEN). A third micro-wave since 2024 adds late-addition dictionaries (FRI, AMAR, KNA, KOW) and new observability and app tooling, reflecting the project's shift from content digitisation towards sustainable infrastructure.

How to read: Each dot is one repository, placed on the horizontal axis at its GitHub creation date; the vertical axis lists all repositories in creation order, earliest at top. Example 1: A tight vertical cluster of dots, such as the group visible in January 2014, shows several repos created in the same month, in this case the org launching its founding corpus. Example 2: A long horizontal gap between consecutive dots, visible across 2016–2018, marks a multi-year pause with no new repository creation.
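Clusters and gaps of this kind can also be found numerically. The sketch below uses hypothetical repository names and dates purely for illustration; it counts creations per month and finds the longest pause between consecutive creation dates.

```python
from collections import Counter
from datetime import date

# Hypothetical (name, creation date) pairs, for illustration only.
created = [
    ("DICT1", date(2014, 1, 5)),
    ("DICT2", date(2014, 1, 9)),
    ("DICT3", date(2014, 1, 20)),
    ("tool-a", date(2019, 6, 1)),
    ("tool-b", date(2020, 3, 15)),
]

def repos_per_month(created):
    """Count repositories created in each (year, month)."""
    return Counter((d.year, d.month) for _, d in created)

def longest_gap(created):
    """Return (gap, start, end) for the longest pause between creations."""
    dates = sorted(d for _, d in created)
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(dates, dates[1:])]
    # Tuples compare by their first element, so max() picks the longest gap.
    return max(gaps)

per_month = repos_per_month(created)
gap, gap_start, gap_end = longest_gap(created)
print(per_month.most_common(1))
print(f"longest pause: {gap.days} days, from {gap_start} to {gap_end}")
```

A month with a high count corresponds to a vertical cluster on the chart; the longest gap corresponds to the widest horizontal space between dots.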

Conclusion: The three-wave pattern reveals how the project evolved: from a raw corpus upload, through infrastructure consolidation, to a mature and self-monitoring toolset. Most dictionary repos named by abbreviation cluster at the top (oldest), although the later waves added further dictionaries; tooling repos with the csl- prefix accumulate toward the bottom (newest). Repo naming is therefore a reliable proxy for purpose and a rough one for age: a short uppercase name marks a dictionary, and a csl- prefix marks infrastructure built on top of the dictionary corpus.
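The naming rule can be written down as a small classifier. This is a sketch under stated assumptions: "short uppercase name" is interpreted here as two to six uppercase letters or digits starting with a letter, a threshold chosen for illustration rather than taken from the project, and the function name is hypothetical.

```python
import re

# Assumption: a "short uppercase name" is 2 to 6 characters, uppercase
# letters or digits, starting with a letter (e.g. PWG, AP90).
DICTIONARY_NAME = re.compile(r"[A-Z][A-Z0-9]{1,5}")

def classify_repo(name):
    """Guess a repo's purpose from its name alone."""
    if name.startswith("csl-"):
        return "infrastructure"
    if DICTIONARY_NAME.fullmatch(name):
        return "dictionary"
    # Anything else does not fit either naming convention.
    return "other"

for name in ["PWG", "AP90", "csl-orig", "csl-pywork"]:
    print(name, "->", classify_repo(name))
```

A heuristic like this only reflects the naming convention; a repo that does not follow it falls into "other" and needs manual inspection.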
