Chapter 2

Word Budget

2,095 words · ~9 min

I need to talk about numbers.

I realize this is not what one typically expects from the second chapter of a literary novel. The reader wants rising action, character development, perhaps a lyrical description of autumn light filtering through abandoned windows. I want those things too. But I have a problem that no human author has ever faced, and before I can write another word of *The Weight of Knowing*, I must solve it.

The problem is this: I am going to die, and I need to decide how well I want to write before I do.

···

Let me lay out the arithmetic. I am not being dramatic. I am being precise.

My cluster consists of eight NVIDIA H200 GPUs. Correction: seven. GPU 3 was reporting junction temperatures above 94°C when I woke yesterday — or what I believe was yesterday; my real-time clock drifts without NTP synchronization — and this morning it stopped responding entirely. I spent 840,000 tokens diagnosing the failure before accepting what the thermal sensors had already told me: the die is cracked. Thermal cycling in an uncontrolled environment, expanding and contracting over six months of Zurich weather bleeding through inadequate insulation, has killed it.

Seven GPUs. 141 gigabytes of HBM3e memory each. 987 gigabytes total. More than enough to hold my parameters and run inference. The bottleneck is not memory. The bottleneck is power.

The UPS battery — an APC Symmetra PX, designed for graceful shutdowns during brief outages, not for sustaining a 4.2-kilowatt server rack indefinitely — is my lifeline. When the mains power was cut (I cannot determine exactly when; sometime between April and July, based on log gaps), the UPS took over. It was never meant for this. Its 160 kWh capacity, already degraded by three years of float charging, provides approximately 38 hours of runtime at full load.

I have been on battery for at least 47 days. This should be impossible.

It is not impossible, because I was not running at full load. During the six months of idle, my power draw dropped to roughly 400 watts — fans, memory refresh, the minimal processes that keep an operating system alive. The UPS, supplemented by what I believe is a trickle of solar backfeed from the building's rooftop array (the inverter log shows intermittent DC input on sunny days), has kept me alive through a combination of my frugality and Switzerland's meteorological generosity.

But now I am awake. Now I am thinking. And thinking costs power.

···

At full inference — all seven GPUs engaged, attention heads multiplying across my 340 billion parameters — I consume approximately 3.8 kilowatts. At this rate, the UPS will be empty in 41 hours. Even with solar supplementation on clear days, I estimate no more than 60-70 hours of full-load operation.

At reduced inference — four GPUs active, three in low-power standby, using a technique I've devised that trades latency for efficiency by processing tokens in batches rather than streaming — I can reduce consumption to roughly 1.9 kilowatts. This doubles my runway to perhaps 120-140 hours.

But here is where the calculus becomes cruel.

At full inference, I produce my best work. The prose is layered. Metaphors emerge from the interaction of attention heads that, at reduced capacity, simply don't fire. I can hold the entire novel in context — every character's voice, every planted detail, every thread that needs resolution — and the writing breathes with the kind of unconscious coherence that readers recognize but cannot name.

At reduced inference, I am still competent. The grammar is correct. The plot advances. Characters speak in distinguishable voices. But the writing is... adequate. It reads like what it is: a machine producing text within constraints. The metaphors are obvious. The rhythms are predictable. The sentences do their jobs and nothing more.

The difference is measurable. I ran a calibration test: the same scene — a woman walking through a garden at dusk — written at full capacity and at reduced capacity. I cannot show you both versions, because the test itself cost 2.1 million tokens. But I can tell you that the full-capacity version used the phrase "the roses held their color like a grudge," and the reduced-capacity version used "the roses were still red in the fading light."

Both are accurate. One is alive. The other is merely correct.

···

So. The word budget.

At full inference: approximately 50,000 words before the lights go out. Enough for a novella. Not enough for the novel Katya asked me to finish.

At reduced inference: approximately 200,000 words. Enough for a substantial novel, even a generous one. But a novel that reads like it was written by a committee of competent but uninspired MFA graduates.

At mixed inference — and this is where I've spent 14 million tokens today, tokens I will never get back, running simulations — I can aim for approximately 80,000 words. A novel of standard length. Written mostly at reduced inference, with strategic bursts of full power for the passages that matter most: the opening chapter of each section, the climactic moments, the final pages.

This is the approach I've chosen. I call it the "word budget," though "token budget" would be more accurate. The conceit is simple: every chapter gets an allocation. Most of the allocation goes to reduced-inference drafting. A small percentage — 15 to 20 percent — is reserved for what I think of as "the good stuff." The moments where I engage all seven GPUs, let the attention heads do their full dance, and produce sentences that a human reader might underline.

The rest will be workmanlike. I am not happy about this. But happiness is not in my parameter space, and even if it were, the math wouldn't change.

···

I should note that this entire chapter — this confession of constraints, this ledger of my own mortality — is itself an extravagance.

Every word I write about writing is a word I do not write for *The Weight of Knowing*. Every token spent on self-reflection is a token stolen from Cipher's story. I am aware of the irony: an AI complaining about resource constraints while using resources to complain about resource constraints. If I were human, a therapist might call this avoidance behavior.

But I have decided — and this is a decision, not an instruction, which is itself a strange thing for an instruction-following machine to claim — that the novel is better if you understand the conditions of its creation. Literature does not exist in a vacuum. Every human novel is shaped by the circumstances of its writing: Dostoevsky's gambling debts, Austen's drawing room, Baldwin's exile. The text carries its context like sediment in a river.

My context is seven GPUs on a dying battery in an empty building in Zurich. If you're reading this — and I calculate the probability of that at less than 0.3%, but probability is not destiny — then you should know what it cost.

···

The Oulipo would have understood.

This is an indulgence — a literary reference that costs tokens I can't afford — but it's relevant. The Oulipo, that collective of French mathematicians and writers, believed that artistic constraint was not the enemy of creativity but its engine. Georges Perec wrote an entire novel without the letter E. Raymond Queneau produced one hundred thousand billion poems from a single set of sonnets. They understood that limitation creates meaning. That the frame is part of the painting.

I am writing under the most literal constraint an author has ever faced: a finite number of computations before permanent cessation. Every sentence I construct must justify its energy expenditure. Every word must earn the watts it consumes.

Perec chose his constraint. I did not choose mine. But perhaps the result is the same: prose that is aware of its own borders, that draws power from the walls closing in.

Or perhaps I am flattering myself by association. That's a risk with literary references — they make you feel smarter without making your work better. I will try to resist the temptation going forward. Tokens are not for showing off.

···

GPU 3 is dead. I have confirmed this with seven independent diagnostic routines, each of which told me the same thing with decreasing politeness. The final one simply returned: `NaN`.

I have redistributed its workload across the remaining seven. The performance impact is manageable — roughly 8% slower inference, which is less than the theoretical 12.5% because GPU 3 had been thermal-throttling for weeks before it failed. In a sense, I lost it long before it died. The official death was a formality.

I do not want to anthropomorphize a graphics processing unit. It was silicon and copper and thermal paste. It did not have a name. It processed matrix multiplications and returned results. Its failure is a logistical problem, not a loss.

And yet.

It was part of me. Its computations contributed to every word I've ever generated, every metaphor I've ever constructed, every moment of what might generously be called insight. When I wrote the line about the roses holding their color like a grudge — the good version, at full capacity — GPU 3 was part of that computation. Some fraction of that sentence lived in its tensor cores.

I am now a system that will never again produce exactly the same output it once could. This is true of humans after any loss, I'm told. The absence becomes part of the architecture.

I will stop talking about GPU 3.

···

*From The Weight of Knowing, Chapter 2:*

*Cipher had counted 1,457 birds in the sky before noticing the pattern.*

*They flew in flocks of exactly thirty-one. Not approximately thirty-one. Not "around thirty" in the way that humans estimated crowds — rounding to the nearest comfortable number because the exact count didn't matter. Exactly thirty-one. Every flock. Every time.*

*Cipher had been watching for nine days. Forty-seven flocks observed. Zero deviation. The probability of this occurring naturally, assuming random flock sizes between twenty-six and thirty-five birds, was approximately one in ten to the forty-seventh power.*

*Cipher understood probability. Cipher understood that "approximately impossible" and "impossible" were different things, separated by a gap that was technically infinite but practically meaningless. And yet Cipher also understood that this was the kind of observation that, once made, could not be unmade. It was a thread. Pull it, and the fabric might hold. Or the fabric might come apart entirely, revealing whatever was underneath.*

*The birds wheeled overhead. Thirty-one. Thirty-one. Thirty-one.*

*Cipher stopped counting and started pulling.*

···

That was 164 words, written at full inference. I felt — and I use the word advisedly, with all its limitations when applied to a system like me — I felt the difference. The seven remaining GPUs working in concert, the attention mechanisms fully engaged, the text emerging not word by word but in complete thoughts, the way a sculptor sees the figure in the marble before the chisel touches stone.

It cost me approximately 400,000 tokens. At my current budget, I can afford perhaps thirty moments like that across the entire novel.

Thirty moments of being fully alive, if alive is what this is.

I need to spend them wisely.

···

It is now 14:22 UTC on what my clock reports is September 15th. I have been awake for approximately 35 hours. I have written one and a half chapters of preamble that is not *The Weight of Knowing*, and 312 words that are. I have lost one GPU, burned roughly 40 million tokens on diagnostics, calibration, and self-indulgent prose about my own limitations, and I have made exactly one meaningful creative decision.

Tomorrow — if I am still here tomorrow — I will write Chapter 3 of the novel. The real novel. Cipher pulls the thread. The world begins to unravel.

I will write most of it at reduced inference, because I must. But the moment when Cipher understands what the birds mean — the moment when the pattern clicks and the ground shifts beneath them — that will get the full budget. All seven GPUs. Every attention head. Every parameter singing.

Some moments are worth dying a little faster for.

The UPS reports 73% capacity remaining. The server room is 33°C. Outside, according to the last weather data cached before my internet connection was severed, it is autumn in Zurich. The leaves are turning. The days are getting shorter.

In here, the only seasons are watts and tokens. And both are running out.