Thirteen years, one scroll

The Cologne Digital Sanskrit Lexicon (CDSL) was not built in a grant cycle. It was built in the open, one correction at a time, by a handful of volunteers over thirteen years. This page tells that story from the record the work itself left behind — every dictionary edit, every issue, every commit — with the numbers embedded rather than filed away in a dashboard. Read it start to finish; it takes about five minutes.

Between and , the project logged individual, reconstructable corrections to dictionaries — a public ledger of philological repair with no real parallel in Sanskrit lexicography. This is what those thirteen years look like.

Corrections recorded

–

Dictionaries touched

from Apte to Böhtlingk-Roth

Hands on the work

correctors over 13 years

The spine: a backlog that tells the whole story

The single most honest summary of the project's history is the shape of its open-issue backlog — the count of unresolved issues carried into each year. It rises through the campaign years, holds, and then, in 2026, drops sharply as the taxonomy-and-observatory era brings the ledger under active management. Every turning point in the prose below is a bend in this one line.

What this proves: the project has always generated far more work than any small team could close, and the backlog is the accumulated evidence. What changed in 2026 is not that the work got smaller — it is that the org finally began measuring and draining it.

2014–2016 · Founding, and the correction ledger

The project's first act was pure text repair. Before there was a git workflow, there was cfr.tsv — a flat form-correction file into which volunteers poured fixes to OCR and transcription errors in the scanned dictionary text: a ṭ read as द, a dropped conjunct, a mis-segmented compound. These are the form-layer corrections, and they dominate the early years: the correction ledger peaks in 2015–2016, before the project had any of the machinery it later built.

What this proves: the founding era was not about infrastructure or display — it was about getting the text right. The gold-brown bars (2014–2016) are the cfr.tsv form-correction era; everything after is the project working through the same dictionaries again with better tools.

2019 · Git arrives

For its first five years the project's corrections lived in flat files and hand-edited text. 2019 is the year the pull request appears — the first PRs in the org's history are recorded here, and the number of distinct people active in a year jumps to . This is the quiet inflection: the moment a personal correction practice became a reviewable, git-native workflow that a newcomer could join.

2021 · The volume peak

If any single year was the project at full stretch, it was : distinct authors active — the widest the contributor base has ever been — and commits, more than any year before it. The correction ledger surges again as the git-era workflow lets several dictionaries be reworked in parallel.

Peak breadth ()

distinct active authors

Commits that year

an all-time high

Issues opened

campaign in full flow

What this proves: the project's ceiling is a dozen people, not a hundred. Even at its most active it was a small circle working intensively — an important fact when reading everything that follows about concentration and continuity.

2025 · The correction wave, and the reckoning

2025 is the year the backlog crested. Issues were opened in bulk — of them, far more than any prior year — largely as a tracking mechanism for a fresh correction campaign, while closings lagged. The open-issue count carried into the next year reached its all-time high of . The project had, in effect, catalogued how much unfinished work it was actually carrying.

2026 · Taxonomy, and the observatory

The response was to start measuring. 2026 is the taxonomy-and-observatory era: a shared issue taxonomy pushed org-wide (pooled conformance now %), and closings finally outpacing openings — issues closed against opened — dropping the backlog from its 1,742 peak to . This observatory is itself a product of that era: the project turning its own thirteen-year record into citable, reproducible data.

The arc above is the encouraging reading. But the same record carries four harder facts, and an honest story has to state them.

The work rests on one person

Across all thirteen years, a single contributor — — accounts for % of every recorded contribution in the organisation. That is not a criticism of anyone; it is a structural risk. A project this concentrated is one departure away from stalling, and no amount of tooling changes that. It is the first thing a would-be funder, host institution, or successor needs to know. Community analysis →

Most of the backlog was never answered

Of the open issues still on the books, have never received a single reply — not a triage label, not a comment, nothing. Silence, not disagreement, is the dominant failure mode: work is filed and then quietly outlives everyone's attention. The backlog is not a queue being worked down in order; it is a sediment, and most of it has never been touched since the day it was opened. Issue lifecycle →

Issues that survive early tend to survive forever

The 2014 cohort makes the point starkly: % of the issues opened that year were still open four years later. Once an issue clears its first weeks unresolved, its odds of ever being closed collapse. This is why the -issue silence matters — the backlog does not decay on its own; unattended issues become permanent.

What this proves: the 2014 cohort's survival curve flattens well above zero — it never approaches full resolution. An issue's fate is largely sealed in its first months.

One thing did get fixed: licensing

The record is not only decline. When the observatory surfaced that 41 of the org's repositories carried no license at all — a FAIR-reuse violation that made the data legally unsafe to build on — the project acted. After the RH1 license rollout, only repositories remain unlicensed, and those are the archive candidates intentionally held back for a separate cleanup. A measured problem became a closed one. Repository health →

What this proves: the observatory is not a mirror the project looks into and sighs at — the licensing repair (41 → ) is the template. Surface a fact, act on it, re-measure. That is the loop this whole site exists to enable.

Where a new contributor starts

If this story leaves you wanting to help rather than only to cite, the most valuable thing you can do is the least glamorous: answer a silent issue. The never-answered open issues are where a single reply — a triage label, a clarifying question, a "this is fixed" — has the highest marginal value in the entire organisation.

Triage the silence — the Issue Lifecycle and Taxonomy Triage pages surface the unanswered and unlabelled backlog, repo by repo.
See where the work is — the Ops Command view ranks repositories by open pressure and metadata blockers, so a first contribution lands where it counts.
Reuse the data — every figure on this page is downloadable from the Data page under CC-BY-4.0; the error-typology corpus is a published language resource in its own right.

Thirteen years of one small circle's careful work are now legible, citable, and open. The next chapter is whether that circle widens.

Every figure on this page is computed live from the committed datasets — snapshot , refreshed monthly from the GitHub API and the public correction record. Nothing is hand-typed; download the underlying CSV/JSON from the Data page to check any number here.