Thirteen years, one scroll
The Cologne Digital Sanskrit Lexicon (CDSL) was not built in a grant cycle. It was built in the open, one correction at a time, by a handful of volunteers over thirteen years. This page tells that story from the record the work itself left behind — every dictionary edit, every issue, every commit — with the numbers embedded rather than filed away in a dashboard. Read it start to finish; it takes about five minutes.
The spine: a backlog that tells the whole story
The single most honest summary of the project's history is the shape of its open-issue backlog — the count of unresolved issues carried into each year. It rises through the campaign years, holds, and then, in 2026, drops sharply as the taxonomy-and-observatory era brings the ledger under active management. Every turning point in the prose below is a bend in this one line.
What this proves: the project has always generated far more work than any small team could close, and the backlog is the accumulated evidence. What changed in 2026 is not that the work got smaller — it is that the org finally began measuring and draining it.
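The page does not show how the backlog line is computed. Here is a minimal sketch under stated assumptions. Each issue is reduced to an (opened, closed) date pair, and "carried into a year" is read as "still open at 1 January of that year". The names `Issue` and `backlog_by_year` are hypothetical and are not part of the CDSL datasets. The sample dates are invented for illustration.

```python
from datetime import date
from typing import Optional

# Hypothetical representation: each issue reduced to (opened, closed);
# closed is None while the issue is still open.
Issue = tuple[date, Optional[date]]


def backlog_by_year(issues: list[Issue], first_year: int, last_year: int) -> dict[int, int]:
    """Count issues carried into each year, i.e. still open at 1 January."""
    backlog: dict[int, int] = {}
    for year in range(first_year, last_year + 1):
        cutoff = date(year, 1, 1)
        backlog[year] = sum(
            1
            for opened, closed in issues
            # Opened before the cutoff and not yet closed by it.
            if opened < cutoff and (closed is None or closed >= cutoff)
        )
    return backlog


# Invented sample data, for illustration only.
sample: list[Issue] = [
    (date(2014, 3, 1), None),
    (date(2014, 6, 1), date(2015, 2, 1)),
    (date(2015, 5, 1), date(2015, 9, 1)),
    (date(2016, 1, 10), None),
]
print(backlog_by_year(sample, 2014, 2017))
```

With this sample the sketch prints `{2014: 0, 2015: 2, 2016: 1, 2017: 2}`. An issue opened and closed within the same year, like the third one, never shows up in the carried-over count. That is one reason a year-boundary backlog understates total activity.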
2014–2016 · Founding, and the correction ledger
The project's first act was pure text repair. Before there was a git workflow, there was cfr.tsv — a flat form-correction file into which volunteers poured fixes to OCR and transcription errors in the scanned dictionary text: a ṭ read as द, a dropped conjunct, a mis-segmented compound. These are the form-layer corrections, and they dominate the early years: the correction ledger peaks in 2015–2016, before the project had any of the machinery it later built.
What this proves: the founding era was not about infrastructure or display; it was about getting the text right. The gold-brown bars (2014–2016) are the cfr.tsv form-correction era; everything after is the project working through the same dictionaries again with better tools.
2019 · Git arrives
For its first five years the project's corrections lived in flat files and hand-edited text. 2019 is the year the pull request appears — the first PRs in the org's history are recorded here, and the number of distinct people active in a year jumps to
2021 · The volume peak
If any single year was the project at full stretch, it was
What this proves: the project's ceiling is a dozen people, not a hundred. Even at its most active it was a small circle working intensively — an important fact when reading everything that follows about concentration and continuity.
2025 · The correction wave, and the reckoning
2025 is the year the backlog crested. Issues were opened in bulk —
2026 · Taxonomy, and the observatory
The response was to start measuring. 2026 is the taxonomy-and-observatory era: a shared issue taxonomy pushed org-wide (pooled conformance now
The arc above is the encouraging reading. But the same record carries four harder facts, and an honest story has to state them.
The work rests on one person
Across all thirteen years, a single contributor —
Most of the backlog was never answered
Of the open issues still on the books,
Issues that survive early tend to survive forever
The 2014 cohort makes the point starkly:
What this proves: the 2014 cohort's survival curve flattens well above zero — it never approaches full resolution. An issue's fate is largely sealed in its first months.
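A cohort survival curve asks a simple question: of the issues opened in one year, what fraction is still open d days after opening? Here is a minimal sketch, assuming issues are reduced to hypothetical (opened, closed) pairs and that a snapshot date marks the end of observation. This is a plain empirical fraction, not a Kaplan-Meier estimator. It avoids censoring by reporting only the checkpoints that every cohort member has been observed for. The function name and sample dates are illustrative, not taken from the project.

```python
from datetime import date
from typing import Iterable, Optional

# Hypothetical representation: (opened, closed); closed is None if still open.
Issue = tuple[date, Optional[date]]


def cohort_survival(
    issues: list[Issue],
    cohort_year: int,
    snapshot: date,
    checkpoints_days: Iterable[int],
) -> dict[int, float]:
    """Fraction of a year's cohort still open d days after being opened."""
    cohort = [(o, c) for o, c in issues if o.year == cohort_year]
    if not cohort:
        return {}
    # Report only checkpoints that every cohort member has been observed for,
    # so still-open issues never need censoring (this is not Kaplan-Meier).
    max_days = (snapshot - max(o for o, _ in cohort)).days
    curve: dict[int, float] = {}
    for d in sorted(checkpoints_days):
        if d > max_days:
            break
        # An issue survives d days if it is still open or closed after day d.
        still_open = sum(1 for o, c in cohort if c is None or (c - o).days > d)
        curve[d] = still_open / len(cohort)
    return curve


# Invented sample data, for illustration only.
sample: list[Issue] = [
    (date(2014, 1, 10), date(2014, 1, 20)),  # closed after 10 days
    (date(2014, 2, 1), date(2014, 8, 1)),    # closed after 181 days
    (date(2014, 3, 1), None),                # never closed
    (date(2014, 6, 1), None),                # never closed
    (date(2015, 1, 5), None),                # 2015 cohort, excluded below
]
print(cohort_survival(sample, 2014, date(2026, 1, 1), [30, 365, 3650]))
```

Here the sketch prints `{30: 0.75, 365: 0.5, 3650: 0.5}`. The curve drops early and then goes flat above zero. Issues that outlive their first year simply stay open.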
One thing did get fixed: licensing
The record is not only decline. When the observatory surfaced that 41 of the org's repositories carried no license at all — a FAIR-reuse violation that made the data legally unsafe to build on — the project acted. After the RH1 license rollout, only
What this proves: the observatory is not a mirror the project looks into and sighs at — the licensing repair (41 →
) is the template. Surface a fact, act on it, re-measure. That is the loop this whole site exists to enable.
Where a new contributor starts
If this story leaves you wanting to help rather than only to cite, the most valuable thing you can do is the least glamorous: answer a silent issue. The
- Triage the silence — the Issue Lifecycle and Taxonomy Triage pages surface the unanswered and unlabelled backlog, repo by repo.
- See where the work is — the Ops Command view ranks repositories by open pressure and metadata blockers, so a first contribution lands where it counts.
- Reuse the data — every figure on this page is downloadable from the Data page under CC-BY-4.0; the error-typology corpus is a published language resource in its own right.
Thirteen years of one small circle's careful work are now legible, citable, and open. The next chapter is whether that circle widens.
Every figure on this page is computed live from the committed datasets — snapshot