How Agent Decision Logging Solved My 60,000-Reference Problem
Simon Willison coined the term “cognitive debt” for the way excessive unreviewed AI-generated code erodes your mental model of what you built. Chris Albon nailed the follow-up:
> This is by far the most important thing for me. Why did it choose to use a hash function in some module? Was that random or a critical requirement? Can I remove it? And in my experience when I do have those questions the AI agent lost its context so it doesn't know either.
He’s right. The agent loses context, so it can’t explain itself. His proposed fix: better decision logging by the agent, so it can explain why it did what it did.
I just lived through a project where this worked. Not theoretically---across six weeks, seven scan cycles, one catastrophic regression, and 60,000 broken metadata references.
The Problem
We’re migrating 57 SharePoint 2016 sites to SharePoint Online. ShareGate Desktop handles the heavy lifting, but there’s a prerequisite: every managed metadata term referenced by a list item on-prem must exist in SPO’s term store with the same GUID. Otherwise metadata silently drops during migration.
The first scan found 59,998 invalid term references. Terms in wrong term sets, terms that didn’t exist, term sets that had been deleted, deprecated terms, terms stuck in the Keywords term set (which has special API restrictions). A taxonomy mess accumulated over a decade.
This wasn’t a weekend fix. It was weeks of diagnosis, scripting, breaking things, and recovering---across dozens of Claude Code sessions.
The Context Problem
Each Claude Code session starts fresh. No memory of the previous session. When I opened VS Code on a Tuesday morning and said “let’s fix the next batch of term refs,” Claude had no idea what ReuseTerm was, what we’d tried, what had failed, or why certain API calls were off-limits.
The context window doesn’t help here. The relevant decisions were spread across two weeks of work. No single session could hold them all.
The Solution: Two Files
Claude Code has two mechanisms that turned out to be critical:
CLAUDE.md --- project instructions checked into the repo. It describes the project structure, key commands, and conventions. Claude reads it at the start of every session.
MEMORY.md --- an auto-memory file that persists across conversations. Claude can read and write to it. When something important happens---a pattern discovered, a mistake made, an API gotcha learned---it gets recorded here.
Together, these files became the agent’s decision log.
The Decision Chain
Here’s what the term store fix looked like as a series of decisions, each building on the last.
Decision 1: How Do You Even Run SharePoint Cmdlets?
SharePoint Server PowerShell cmdlets throw “Cannot access the local farm” when run through PSRemoting. The fix: deploy a worker script to the server via admin share (\\sp16app1\c$\temp\), create a scheduled task via Invoke-Command, and run it as the farm account.
This pattern was non-obvious and took a full session to discover. It went straight into MEMORY.md:
> SP16App1 Remote Execution Pattern: Use deploy+scheduled task pattern. SharePoint cmdlets CANNOT run through PSRemoting.
Every subsequent session that needed to touch the on-prem farm just… knew this.
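The deployable half of that pattern is ordinary PowerShell. A sketch under the article's details (server name, paths, and account are placeholders; `New-FarmTaskArgs` is a hypothetical helper that only builds the `schtasks.exe` argument string, so the remote calls stay commented out):

```powershell
# Hypothetical helper: compose schtasks.exe arguments for a run-once task
# executed as the farm account. The real session used equivalent inline calls.
function New-FarmTaskArgs {
    param([string]$TaskName, [string]$ScriptPath, [string]$RunAsAccount)
    '/Create /TN "{0}" /TR "powershell.exe -ExecutionPolicy Bypass -File {1}" /SC ONCE /ST 23:59 /RU "{2}" /RL HIGHEST /F' -f `
        $TaskName, $ScriptPath, $RunAsAccount
}

# 1. Deploy the worker script over the admin share:
#      Copy-Item .\Fix-TermRefs.ps1 \\sp16app1\c$\temp\
# 2. Create and fire the task remotely. PSRemoting is fine for schtasks --
#    it is only the SharePoint cmdlets that refuse to run over it:
#      Invoke-Command -ComputerName sp16app1 -ScriptBlock {
#          schtasks.exe <args from New-FarmTaskArgs>
#          schtasks.exe /Run /TN "FixTermRefs"
#      }
New-FarmTaskArgs -TaskName 'FixTermRefs' -ScriptPath 'C:\temp\Fix-TermRefs.ps1' -RunAsAccount 'CONTOSO\svc-farm'
```

The scheduled task runs in a local session on the box, so `Get-SPFarm` and friends see the farm normally.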
Decision 2: Diagnose Before You Fix
The 59,998 references weren’t one problem. They were five:
| Category | Description | Count |
|---|---|---|
| WrongTermSet | Term exists, but in a different term set | ~30,000 |
| AlreadyFixed | Scan was stale; term is already in the right place | ~18,000 |
| GenuinelyMissing | Term doesn’t exist anywhere | ~800 |
| TermSetMissing | The entire term set is gone | ~200 |
| KeywordsTarget | Term stuck in Keywords (special API restrictions) | ~600 |
Claude wrote Diagnose-InvalidTermRefs.ps1 to categorize every unique TermGuid+TermSetId combination. The categorization schema persisted in the script’s .DESCRIPTION block and in MEMORY.md. Future sessions didn’t have to rediscover these categories.
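The categorization itself reduces to a lookup against the current term store state. A simplified sketch (the real Diagnose-InvalidTermRefs.ps1 works against live taxonomy objects; here the store is mocked as a hashtable of term set IDs to GUID lists, and the function name is my own):

```powershell
# Classify one TermGuid+TermSetId reference against a mocked term store.
function Get-TermRefCategory {
    param(
        [hashtable]$TermStore,   # termSetId -> array of termGuids in that set
        [string]$TermSetId,      # term set the list item points at
        [string]$TermGuid        # term the list item points at
    )
    if (-not $TermStore.ContainsKey($TermSetId)) { return 'TermSetMissing' }
    if ($TermStore[$TermSetId] -contains $TermGuid) { return 'AlreadyFixed' }
    # The term exists, but in some other term set(s)?
    $otherSets = @($TermStore.Keys | Where-Object {
        $_ -ne $TermSetId -and $TermStore[$_] -contains $TermGuid
    })
    if ($otherSets -contains 'Keywords') { return 'KeywordsTarget' }
    if ($otherSets.Count -gt 0)          { return 'WrongTermSet' }
    return 'GenuinelyMissing'
}
```

Each unique combination gets exactly one category, which is what makes the per-category fix scripts possible.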
Decision 3: The Move() Disaster
This is the one that matters most for the cognitive debt argument.
For “WrongTermSet” terms, we needed to make a term available in a different term set. The SharePoint API offers two ways:
- `$term.Move($targetTermSet)` --- moves the term. Simple. Clean.
- `$termSet.ReuseTerm($term, $false)` --- adds the term to another term set, keeping it in the original. Additive.
The first fix script used Move(). It processed ~5,000 terms before we realized: Move is destructive. It removes the term from its source term set. Every site that referenced that term in the original term set now had a broken reference. We’d created more problems than we solved.
This went into MEMORY.md in bold:
> NEVER use `$term.Move($targetTermSet)` --- destructive, removes from source. Caused regression of ~5,000 refs.
From that point on, every Claude Code session that generated term store fix code knew not to use Move(). Not because I remembered to tell it---because it read the memory file and the information was right there.
The unified fix script that replaced all the earlier attempts has this in its header:
> NEVER uses Move() - it is destructive and removes terms from source term sets.
That line exists because an agent recorded a decision and its rationale.
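The difference is easy to demonstrate outside SharePoint. A toy model (plain GUID lists standing in for the `Microsoft.SharePoint.Taxonomy` term set objects; the function names are mine):

```powershell
# Toy term sets: lists of GUID strings standing in for real term sets.
$sourceSet = [System.Collections.Generic.List[string]]::new()
$targetSet = [System.Collections.Generic.List[string]]::new()
$sourceSet.Add('term-guid-1')

# Move() semantics: the term arrives at the target and VANISHES from the
# source -- every list item referencing the source set now has a broken ref.
function Move-ToyTerm($from, $to, $guid) { $to.Add($guid); [void]$from.Remove($guid) }

# ReuseTerm() semantics: additive only; the source keeps the term.
function Add-ToyTermReuse($from, $to, $guid) { if (-not $to.Contains($guid)) { $to.Add($guid) } }

Add-ToyTermReuse $sourceSet $targetSet 'term-guid-1'
# $sourceSet still holds 'term-guid-1': existing references stay valid.
```

Additive by default is the safe posture when thousands of list items point at the source set.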
Decision 4: Keywords Can’t Be Reused
The Keywords term set has unique API restrictions. You can’t ReuseTerm into or out of it. You can’t CreateTerm with a GUID that exists elsewhere. The Managed Metadata Service just says “no.”
Claude discovered this through failed API calls, recorded the constraint, and then designed a workaround: create entirely new terms with auto-generated GUIDs, then update every list item’s field value to point at the new term.
```powershell
# Multi-value taxonomy update pattern: rebuild the field's value collection
# from label|GUID pairs, then write it back without touching Modified/Editor.
$collection = New-Object Microsoft.SharePoint.Taxonomy.TaxonomyFieldValueCollection($taxField)
$collection.PopulateFromLabelGuidPairs($pairString)   # e.g. "Alpha|<guid>;Beta|<guid>"
$taxField.SetFieldValue($item, $collection)
$item.SystemUpdate($false)                            # $false: don't create a new version
```
Result: 195 terms created, 551 items updated, 0 failed. The pattern was reused for WAPA and Project Support terms that had the same Keywords constraint.
Decision 5: SPO Sync Is Unreliable
After fixing on-prem, we assumed hybrid taxonomy sync would push terms to SPO. It didn’t. Verification showed 277 of 442 “fixed” terms were missing from SPO.
Another constraint learned, another memory entry:
> Hybrid taxonomy sync unreliable: 277 of 442 “AlreadyFixed” on-prem terms were MISSING from SPO.
This led to Create-MissingSPOTerms.ps1---batch CSOM creation with 10 terms per batch, 15-second pauses between batches, to avoid the Managed Metadata Service’s internal throttling.
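The batching loop itself is generic. A sketch of the pattern (batch size and delay from the article; `Invoke-InBatches` is my own name, and the per-item action is injected so the loop is testable without CSOM):

```powershell
# Process items in fixed-size batches with a pause between batches,
# to stay under the Managed Metadata Service's internal throttling.
function Invoke-InBatches {
    param(
        [object[]]$Items,
        [scriptblock]$Action,     # called once per item
        [int]$BatchSize    = 10,  # 10 terms per batch
        [int]$DelaySeconds = 15   # 15s pause between batches
    )
    for ($i = 0; $i -lt $Items.Count; $i += $BatchSize) {
        $batch = $Items[$i..([Math]::Min($i + $BatchSize, $Items.Count) - 1)]
        foreach ($item in $batch) { & $Action $item }
        # In the real script, a single $ctx.ExecuteQuery() fires here per batch.
        if ($i + $BatchSize -lt $Items.Count) { Start-Sleep -Seconds $DelaySeconds }
    }
}
```

Queuing ten `CreateTerm()` calls and executing them together also cuts round-trips to SPO by an order of magnitude versus one call per term.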
Decision 6: The Silent No-Op Bug
The most insidious bug. When a CSOM `ExecuteQuery()` fails, it clears the operation queue. Our retry logic called `$ctx = Get-PnPContext` to get a “fresh” context---which created a new context with an empty queue. The retry executed against nothing. Silent success. Zero terms created.
Two full runs were no-ops before we caught it.
The fix: re-queue all CreateTerm() calls before retrying. And always verify after batch creation with spot checks.
This went into MEMORY.md immediately:
> CRITICAL CSOM retry bug: After `ExecuteQuery()` fails, CSOM clears the operation queue. NEVER call `$ctx = Get-PnPContext` on retry.
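The bug is easy to model without CSOM. A minimal simulation (a fake context whose pending-operation queue clears on failure, mirroring the `ExecuteQuery()` behavior described above; all names here are mine):

```powershell
# Fake CSOM context: a queue of pending operations plus a failure switch.
function New-FakeContext { @{ Queue = @(); FailNext = $true; Ran = 0 } }

# ExecuteQuery stand-in: on failure it throws AND clears the queue --
# the exact behavior behind the silent no-op.
function Invoke-FakeExecuteQuery($ctx) {
    if ($ctx.FailNext) {
        $ctx.FailNext = $false
        $ctx.Queue = @()              # CSOM discards all pending operations
        throw 'throttled'
    }
    $ctx.Ran += $ctx.Queue.Count      # "execute" whatever is queued
    $ctx.Queue = @()
}

# WRONG retry: grab a "fresh" context -> empty queue -> retry "succeeds"
# while creating nothing. RIGHT retry: re-queue every operation first.
function Invoke-WithRetry($ctx, [scriptblock[]]$Ops) {
    $ctx.Queue += $Ops
    try { Invoke-FakeExecuteQuery $ctx }
    catch {
        $ctx.Queue += $Ops            # the fix: re-queue before retrying
        Invoke-FakeExecuteQuery $ctx
    }
}
```

And the second half of the fix stands on its own: after any batch creation, spot-check that the terms actually exist, because "no error" is not evidence.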
What Decision Logging Actually Looks Like
After six weeks, MEMORY.md had grown to ~100 lines of tightly scoped entries:
- API constraints (what’s safe, what’s destructive)
- Patterns that work (batch sizes, delay intervals, parameter sets)
- Bugs encountered (with root causes, not just symptoms)
- Progress tracking (scan numbers, remaining counts)
- Script locations (which script does what)
Each entry was written at the moment of discovery, not reconstructed later. That’s the key difference. The agent logged decisions as it made them, so future sessions inherited not just the decision but the reasoning.
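For a sense of shape, a representative excerpt might look like this (invented sample entries in the spirit of the ones quoted above, not copied from the real file):

```markdown
## Term Store Migration

- SP16App1: SharePoint cmdlets CANNOT run through PSRemoting.
  Use deploy + scheduled task pattern (worker script via admin share).
- NEVER use $term.Move() -- destructive, removes from source.
  Caused ~5,000-ref regression. Use ReuseTerm (additive) instead.
- Keywords term set: no ReuseTerm in/out, no CreateTerm with an existing
  GUID. Workaround: new GUIDs + rewrite list item field values.
- CSOM: a failed ExecuteQuery() clears the operation queue.
  Re-queue all ops before retry; spot-check results after every batch.
- Batch SPO term creation: 10 terms/batch, 15s pause (MMS throttling).
```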
Why This Works Better Than Chat History
You might think: “Just scroll up in the chat.” Three reasons that doesn’t work:
- Sessions end. Context is lost between VS Code sessions. MEMORY.md persists.
- Signal-to-noise. A two-hour debugging session produces thousands of lines of chat. The decision that matters is one line: “NEVER use Move().”
- Compounding knowledge. Decision 6 (the CSOM retry bug) only made sense because Decision 5 (SPO sync unreliable) was already recorded. The agent could connect the dots across sessions.
The Broader Pattern
Chris Albon said the workaround is “better decision logging by the agent.” I’d go further: the agent needs to prioritize information routing in context management. Not every fact is equally important. The ones that matter are:
- Constraints discovered through failure. “Move() is destructive” is worth more than “Move() exists.”
- Patterns confirmed across multiple runs. “Batch CSOM with 10 terms + 15s delay” is battle-tested, not theoretical.
- Gotchas that violate assumptions. “PnP PowerShell only works in PS7” saves 30 minutes of confused error messages.
Claude Code’s MEMORY.md is one implementation. The underlying principle is broader: agents need a persistent, curated, semantic store of decisions and their rationale. Not a dump of everything that happened. A filtered record of what matters for future work.
The Numbers
| Metric | Value |
|---|---|
| Starting invalid refs | 59,998 |
| After 6 scan+fix cycles | 34,353 |
| Keywords terms created | 195 (0 failed) |
| List items updated | 551 (0 failed) |
| SPO terms created | 243 |
| Sessions (approx) | 30+ |
| MEMORY.md entries | ~100 lines |
Every one of those sessions started cold. Every one inherited the full decision history from the sessions before it.
Takeaways
If you’re using AI agents for multi-session work:
- Make your agent write down what it learns. CLAUDE.md, MEMORY.md, a decision log---the format matters less than the habit.
- Record failures, not just successes. The “NEVER use Move()” entry prevented more damage than any fix script.
- Keep it curated. 100 lines of distilled decisions beat 10,000 lines of chat history.
- Let the agent update its own memory. Claude wrote these entries itself, at the moment of discovery. I didn’t have to remember to tell it.
The cognitive debt problem is real. But it’s not unsolvable. The agent just needs to keep better notes---and read them next time.
59,998 broken references. Six weeks. One memory file. Zero repeated mistakes.