Chapter 4

The Archive and the Voice


There is a botanical archive in Kew Gardens, outside London, that contains specimens of plants collected from every accessible corner of the world since the eighteenth century. The specimens are pressed, dried, labeled in Latin, mounted on acid-free paper, stored in climate-controlled cabinets. Some of them are the last known physical evidence of species that no longer exist anywhere on Earth. The species are gone. The pressed specimens remain — their leaves, their seeds, their morphology described in catalog entries, their DNA sometimes still extractable from dried tissue a century and a half after collection.

The archive is invaluable. It tells us what existed. It allows us to trace evolutionary relationships, to reconstruct the ecological communities of vanished landscapes, to understand what we have lost. It is genuinely indispensable to botany.

But you cannot grow a forest from a herbarium. The archive is the record of a living thing that is no longer living. This distinction — between the record and the thing, between the archive and the voice — is the central problem of endangered language documentation, and it is a problem that the last twenty years of intensive work on Soqotri have not resolved.

···

**The Corpus and Its Funders**

The Russian linguist Vitaly Naumkin has been working on Soqotri since the 1970s. His engagement with the language is not superficial: by multiple accounts, he is one of the very few Western linguists who have learned to speak Soqotri with any fluency. His 2021 corpus of Soqotri oral literature represents decades of fieldwork, recording, transcription, and analysis. It is a major scholarly achievement by any standard.

The 2021 corpus was funded, in part, by the UAE Embassy in Moscow.

By 2024, HSE University in Moscow (the Higher School of Economics, a leading Russian research institution) had established a Centre for South Arabian Studies. The Centre is structured as a Russia-UAE academic partnership, focused on the languages and cultures of southern Arabia, with Soqotri as a significant component of the research program.

The engineering is real. The scholarship is genuine. The linguists producing this work are doing the work that needs to be done. The archive they are building contains material that would otherwise be lost. None of this is in dispute.

What is also true is that language documentation is soft power, and the people funding it are aware of this. The UAE's interest in Socotra is geopolitical — access, control, the island's strategic position between the Gulf of Aden and the Arabian Sea. The UAE has built roads and an airport there. It deployed military forces and backed a separatist political council. It installed its telecom infrastructure and kept it running after the military left. It has also, via the Moscow diplomatic channel, funded linguistic documentation of the island's language.

These are not unconnected activities. They share a common logic: the construction of presence. Physical presence (military, infrastructure), digital presence (telecom, content ecosystem), political presence (STC backing, PLC cooperation), and now archival presence — the systematic establishment of a Russian-UAE institutional claim to the scholarly record of Soqotri.

Who documents a language determines, in practice, who owns its archive. Who controls the archive shapes how the language is framed for future researchers, for policy bodies, for the UNESCO processes that govern its recognition and classification. An archive held in Moscow and Abu Dhabi is not, by definition, an archive in the service of the 70,000 Soqotri speakers on a Yemeni island in the Arabian Sea.

The Russian dictionary project — an online Soqotri lexicon being built toward an eventual printed Soqotri-English-Arabic reference dictionary — explicitly notes that its Arabic translations will make Soqotri's vocabulary "accessible to the Arabic-reading public in the UAE, the Gulf region more broadly, and, eventually, throughout the Arab and Islamic world." This is not a description of community-centered language documentation. It is a description of a resource designed for external consumption. It frames the language as an object to be accessed by outsiders rather than a medium to be used by insiders.

···

**What Documentation Is Not**

A language fully documented in a university archive is still dead if no children speak it.

This is not a criticism of documentation. Documentation is necessary. It is the difference between a language dying with a record and dying without one. It is the difference between linguists in 2150 being able to reconstruct something of Soqotri's grammatical structure and having nothing at all. Documentation is preservation in the paleontological sense — not of the organism, but of the fossil.

What documentation is not is revitalization. A language revitalizes — survives as a living thing — only if children acquire it as a mother tongue, use it as a medium of daily life, find it economically and socially and emotionally functional in their current circumstances. The conditions for revitalization are not primarily linguistic. They are social, economic, political, infrastructural. Documentation addresses none of these.

The Welsh language underwent revitalization (imperfect, contested, still ongoing) because the Welsh government mandated Welsh-medium education, funded Welsh-language television, required Welsh signage on public infrastructure, and created employment contexts in which Welsh was valuable rather than merely nostalgic. The revitalization required decades of sustained political will and institutional investment, and Welsh began from a position with hundreds of thousands of speakers, an existing writing system, a continuous literary tradition, and a state apparatus sympathetic to its survival.

Soqotri has approximately 60,000 to 70,000 speakers. No official status. No government on its side: Yemen's official language is Arabic, and the government has had other concerns since 2015. No indigenous writing system of its own (the various proposed scripts are adaptations of Arabic orthography, contested among specialists). No Welsh Assembly. No Soqotri Broadcasting Corporation. What it has is a corpus in Moscow, a keyboard with fifty downloads, and a fifteen-to-twenty-year window before the mountain dialect spoken in the island's interior becomes available only in recordings.

···

**The Poetry Festival**

A Soqotri Poetry Festival held in 2011 has been documented in a Stanford University Press publication. The competitors were pastoralists, herders of goats and camels from across the island, many of them described in the documentation as semiliterate in Arabic, many of them without formal education in any language. They competed publicly in oral verse composition, in Soqotri, with the seriousness and technical precision of a tradition that understood exactly what it was doing.

The forms were demanding. Soqotri poetic tradition has its own prosodic conventions, its own canon of imagery, its own standards of elegance and wit. The festival competitors were not preserving a museum piece. They were doing what poets do: competing to be better at something their community has always considered worth being better at. The oral verse was sophisticated. The audience was engaged. The entire occasion testified to a community with a living, creative, internally valued relationship to its own language.

This is the thing that tends to get lost in the documentation frame: the community doesn't need to be persuaded that Soqotri is worth preserving. It knows this. The woman who composed lullabies — tendána, the specifically female voice in Soqotri tradition — and embedded ecological knowledge in them is not uncertain about whether her language matters. The men who composed hunting songs, the elders who preserved genealogical narratives, the poets who improvised at wedding feasts — this is not a community looking to outsiders for validation of its language's value.

What it lacks is structural support. The difference between a language that survives and one that doesn't is rarely the community's emotional attachment to it. It is whether the conditions exist — economic, political, technological, institutional — for that attachment to translate into intergenerational transmission. A parent who loves their language passionately will still raise children in a different language if the different language provides better access to education, employment, and connectivity. Love is not sufficient against structural asymmetry.

···

**The حكايات سقطرية Problem**

There is an app called حكايات سقطرية — *Tales of Soqotra*, roughly. Its existence is documented. What it contains — audio recordings of oral literature, text transcriptions, both, neither, something else entirely — is not clearly established in any public source I can access. This is itself a problem. If an app containing Soqotri oral literature exists and is not well-known enough for its contents to be readily described in any accessible documentation, it is not functioning as a distribution mechanism for the language. It is a digital artifact in approximately the same category as the keyboard with fifty downloads: an engineering solution that arrived without the ecosystem it required.

The app points to a gap that is not primarily technological. The technology exists. The content — or at least some content — apparently exists. What is missing is the network of platforms, communities, and distribution mechanisms that would make Soqotri-language content something people encounter without specifically seeking it out. You don't have to seek out Arabic-language content on an Etisalat network. It finds you.

···

**The Diaspora**

Approximately 18,000 Soqotri people live in Ajman, in the United Arab Emirates. The figure represents roughly a third of the island's total population — a diaspora large enough to sustain community institutions, if the conditions for them existed. There are Soqotri-owned businesses in Ajman. There are families who have maintained the language across multiple generations of residence. The community did not assimilate immediately or completely.

But the structural pressures are severe and specific. The destination country is also the country that occupied the island of origin. The dominant language of the host environment is the same language that is displacing Soqotri on the island. The telecom ecosystem in Ajman is Etisalat, the same network whose SIM cards are the recommended choice for tourists arriving in Socotra. The school system is Arabic-medium. The government is the one that, until January 2026, backed the political forces controlling the island's administration.

Diaspora communities sometimes maintain languages that are declining in the homeland — Irish was more robustly preserved in certain American and Australian communities than in parts of Ireland during the nineteenth century; Welsh persists in Patagonia. But these cases typically involve diaspora communities living in environments that are socially and politically distinct from the homeland, where the minority language can function as a marker of ethnic distinctiveness against a background that speaks something else entirely. The Soqotri diaspora in Ajman is living inside the cultural ecosystem of the power that occupied their island. The host country's language is the same language that is displacing theirs at home. There is no relief valve.

The 18,000 in Ajman are not lost to Soqotri — that would be too simple. They represent a population with real, ongoing connections to the island, with family networks that cross the Arabian Sea, with community gatherings where Soqotri is still spoken among people who know it. But the structural conditions for that community to sustain, transmit, and actively use the language are almost uniquely unfavorable.

···

**Who the Archive Serves**

The question of who benefits from language documentation is not cynical. It is practical. The same documentation project can serve different interests simultaneously — the community whose language it is, the linguists whose careers it advances, the funding institutions whose geopolitical interests it serves, the scholarly field that gains a richer empirical base. These interests are not mutually exclusive. They often align.

But when they diverge, the question matters. If the primary beneficiaries of Soqotri documentation are foreign linguists adding publications to their CVs, and foreign institutions adding a symbolic commitment to linguistic heritage to their soft power portfolio, and if the community whose language is being documented lacks the resources, platforms, and institutional support to use the documentation for its own revitalization — then the archive is beautiful, informative, and not serving the people it is nominally about.

The Tethyan limestone record tells us everything about what lived in that ocean. It does not restore the ocean. It does not bring back the nautiloids and rudist bivalves and reef systems that built the limestone over millions of years. It is a record of something that ended, made by the accumulation of life that was ending, preserved in the rock that replaced the water.

An archive of Soqotri in its current trajectory is the linguistic equivalent of that limestone: a record of something rich and specific, built from the accumulated transcriptions and recordings of speakers who are growing fewer and older, preserved in formats that researchers can access in perpetuity while the language itself — the living thing, the medium in which children first learn to name the world — narrows and retreats and finally vanishes into the interior of an island where the youngest people who still know the mountain dialect are already in middle age.

Documentation preserves the fossil. The fossil is not the organism. Paleontologists know this. Linguists know this. The institutions funding corpus work sometimes prefer not to dwell on it, because the implication — that documentation without revitalization is preservation of a corpse — complicates the funding case.

···

**The Gap**

The gap between the archive and the voice is not a gap that linguists alone can close. Linguists can document, describe, record, transcribe, and analyze. They cannot create the social and economic conditions that make a language worth speaking to your children. They cannot mandate Soqotri-medium education — they have no authority over Yemeni or Yemeni-adjacent curriculum decisions. They cannot build a Soqotri Broadcasting Corporation. They cannot create employment contexts in which Soqotri is economically functional rather than merely culturally meaningful.

What they can do is make the case for who should be doing those things, and how urgently, and with what resources. The scholarly consensus on Soqotri's status as a distinct, endangered language with irreplaceable features in the Semitic record is the foundation for that case. The naming war matters here too: if the institutional record calls Soqotri a "Southern Arabic" variety, the case for treating it as a distinct language requiring distinct preservation mechanisms collapses before it can be made.

The pastoralists competing in oral verse at the 2011 poetry festival were not waiting for linguists to validate their practice. They were doing it because it was worth doing, because their community said so, because the language was the medium in which their sense of the world was organized. That relationship between a community and its language is the only thing that has ever sustained a language across generations.

The archive records that relationship. It cannot replace it.

The voice is the thing. The archive is the shadow. The shadow is not nothing — ask the paleontologist about the value of fossils — but the shadow tells you about the organism only after the organism is gone.

The mountain dialect speakers in the interior of Socotra are still speaking. That is the datum. Everything else is a question of whether anything changes before they stop.