Changelog
All notable changes to Alexandria. The version corresponds to the VERSION file at the repo root and to the matching git tag (v<version>). Format follows Keep a Changelog and the project's SemVer Bump Policy.
The structure mirrors the iteration log of the Ralph loop that built the system: each release is a single iteration with a coherent theme. Earlier releases (v0.x.0) were foundational scaffolding; v1.x.0 added user-facing features; v1.18.x is the contract-test sweep that exposed and fixed several silent JSON-naming bugs across the CLI/API/sidecar.
[1.19.41] - 2026-04-28
Added
- Version + release notes in the UI. The MainLayout app bar now carries a clickable v<version> chip that navigates to a new /release-notes page; the nav drawer also has a "Release notes (v)" link under Account. The page renders the bundled CHANGELOG.md via Markdig. The BuildInfo singleton (apps/Alexandria.Web/BuildInfo.cs) reads the version from the assembly's InformationalVersion (set by Directory.Build.props from the repo's VERSION file at build time) and the changelog from <app>/CHANGELOG.md.
Changed
- Directory.Build.props now sets Version, AssemblyVersion, FileVersion, and InformationalVersion centrally from the repo's VERSION file. Every Alexandria assembly carries the matching version metadata; no per-project bookkeeping.
- Alexandria.Web.csproj packages CHANGELOG.md and VERSION next to the executable at publish time so the runtime renderer can load them inside the container without bind-mounting the repo.
[1.19.40] - 2026-04-28
Fixed
- Blazor circuit terminated with HttpRequestException 401 on every page load after sign-in. Two compounding root causes:
  - The API's Alexandria.Auth JwtBearer reads its config from the Entra section (Entra:TenantId, Entra:ClientId), but the FSDocker compose plumbed the env vars under AzureAd__*. The validator ran with an empty TenantId + ClientId, so every Bearer token failed audience + issuer validation -> 401. Renamed the API's AzureAd__TenantId / AzureAd__ClientId env vars to Entra__TenantId / Entra__ClientId to match the binding.
  - AlexandriaApiClient methods that returned non-nullable collections (ListPublicationsAsync, ListPatsAsync, ListSavedSearchesAsync) used GetFromJsonAsync, which throws on non-success. The Search page calls ListPublicationsAsync on init, so the 401 from the API propagated as an unhandled exception into the Blazor circuit. All three methods now catch HttpRequestException and return empty collections, so a transient 401/timeout degrades to an empty UI instead of terminating the SignalR circuit.
Notes
First-time signed-in users may still see a Microsoft consent prompt
for Search.Read if admin-consent didn't apply at app-creation
time. Acceptable / expected; subsequent sign-ins are silent.
[1.19.39] - 2026-04-28
Fixed
- OIDC sign-in returned HTTP 400 at /signin-oidc. Two root causes, both fixed in apps/Alexandria.Web/Program.cs:
  - The downstream-API scope was hardcoded to the placeholder api://placeholder/Search.Read, which Entra rejected with AADSTS500011: invalid_resource. It is now read from AzureAd:DownstreamApiScopes (a comma-separated env var); an empty value disables token acquisition entirely so first-time / no-API deployments can still sign in cleanly.
  - ASP.NET Core didn't trust nginx's X-Forwarded-Proto: https, so it built redirect_uri = http://..., which Entra correctly rejected (the registered URI is HTTPS). Added ForwardedHeadersOptions (XForwardedFor + Proto + Host, with KnownNetworks/KnownProxies cleared) and app.UseForwardedHeaders() as the very first middleware.
- OIDC failure no longer returns HTTP 500. A new OnRemoteFailure handler captures the OpenIdConnectProtocolException and redirects to /?signinError=<message> so users land on a clean page rather than a stack trace.
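The "empty value disables token acquisition" rule can be pictured with the Python sketch below. The real logic is C# in apps/Alexandria.Web/Program.cs; the helper names are hypothetical, and trimming whitespace plus dropping blank entries (e.g. a trailing comma) are assumptions rather than documented behaviour.

```python
from typing import List, Optional


def parse_downstream_scopes(raw: Optional[str]) -> List[str]:
    """Split a comma-separated scope setting into a list of scopes.

    Hypothetical helper: trimming and dropping blank entries are assumptions.
    """
    if not raw:
        return []
    return [s.strip() for s in raw.split(",") if s.strip()]


def should_acquire_token(raw: Optional[str]) -> bool:
    # An empty setting disables downstream token acquisition, so sign-in
    # still succeeds on deployments with no API scope configured.
    return len(parse_downstream_scopes(raw)) > 0
```

For example, the on-host value api://9d4c9c98-3ee3-4314-aab8-a2ce9da43100/Search.Read parses to a one-element list, while an unset variable yields an empty list and no token request.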
Added
- Entra app config (applied via az ad app update + az rest PATCH against Microsoft Graph; not in source code, but recorded for the audit trail):
  - Identifier URI: api://9d4c9c98-3ee3-4314-aab8-a2ce9da43100.
  - Exposed delegated scope Search.Read (id 189f7e38-df57-4cb6-96de-24941ec335de).
  - Added Search.Read as a requiredResourceAccess on the same app so it's allowed to request its own scope (admin-consented).
- infra/docker/docker-compose.fsdocker.yml -- new env AzureAd__DownstreamApiScopes plumbed from ${ENTRA_DOWNSTREAM_API_SCOPES}. The on-host .env was updated to api://9d4c9c98-3ee3-4314-aab8-a2ce9da43100/Search.Read so the Web -> API call carries a valid bearer token after sign-in.
Notes
The DataProtection key ring is still ephemeral inside the Web
container (/root/.aspnet/DataProtection-Keys). This means a Web
container restart invalidates any in-flight sign-in (the user has
to start over). For Phase 2 hardening, mount a named volume for
the keys, or switch to PersistKeysToFileSystem against
alexandria-corpus -- left as a follow-up since first sign-ins
work today.
[1.19.38] - 2026-04-28
Operational (no code changes shipped)
- POP3 password populated in /opt/apps/alexandria/.env on FSDocker. Drainer recreated; "Application started" confirmed in logs (no more restart loop on auth failure).
- Internal DNS A record added on FSPRIAD01 (zone fullstack.co.za): alexandria -> 192.168.0.148. On-LAN users now hit FSDocker directly without going out via the Fortigate WAN. Verified that Resolve-DnsName alexandria.fullstack.co.za @192.168.0.21 returns 192.168.0.148, and that Invoke-WebRequest https://alexandria.fullstack.co.za/health from the LAN returns HTTP 200.
- BOK log appended with both follow-up actions; secret values redacted per gc-secrets-policy.
[1.19.37] - 2026-04-28
Added
- TLS bootstrap pipeline for first-cert issuance on FSDocker:
  - infra/nginx/alexandria.fullstack.co.za.bootstrap.conf -- HTTP-only stub site (no TLS block) so nginx -t doesn't fail before the cert exists.
  - scripts/setup-alexandria-tls.sh -- one-shot: install the bootstrap site, preflight the ACME path via FSDocker (using --resolve to bypass the host's internal DNS), run certbot certonly --webroot, swap to the production site config, reload nginx. Idempotent.
Fixed
- Dockerfile.dotnet was on sdk:8.0 but global.json pins 9.0.0 with rollForward: latestFeature. Bumped the build stage to mcr.microsoft.com/dotnet/sdk:9.0 (target framework stays net8.0; the 9.0 SDK cross-builds net8.0 fine).
- The Dockerfile.dotnet entrypoint picked the alphabetically-first .dll (which was Alexandria.Core.dll -- a shared lib without a runtimeconfig.json); it now picks the host DLL by matching the *.runtimeconfig.json stem.
- The aspnet:8.0 runtime image ships with neither curl nor wget; added apt-get install -y curl in the runtime stage so the compose healthchecks resolve.
- Compose port for Web changed to 5210 (was 5200, which collided with another FSDocker container during the deploy window). Nginx upstream updated to match.
- scripts/deploy-fsdocker.sh now passes --env-file /opt/apps/alexandria/.env explicitly. Default discovery looks for .env next to the compose file (infra/docker/), where none exists on the host.
- scripts/deploy-fsdocker.sh: rsync replaced with a tar pipe so the script runs on machines without rsync (e.g. Git Bash on Windows).
- scripts/setup-alexandria-tls.sh: the DNS check uses nslookup instead of dig (dig is not in Git Bash on Windows).
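The host-DLL selection rule can be restated as a small Python sketch. The actual entrypoint lives in Dockerfile.dotnet and is not reproduced here; pick_host_dll is a hypothetical name, and the sketch assumes each publish directory holds exactly one host assembly.

```python
from typing import Iterable

SUFFIX = ".runtimeconfig.json"


def pick_host_dll(filenames: Iterable[str]) -> str:
    """Return the .dll whose stem matches a *.runtimeconfig.json file."""
    names = set(filenames)
    # Only host assemblies ship <Name>.runtimeconfig.json; shared libraries
    # such as Alexandria.Core.dll do not, so they are never candidates even
    # when they sort first alphabetically.
    for name in sorted(names):
        if name.endswith(SUFFIX):
            dll = name[: -len(SUFFIX)] + ".dll"
            if dll in names:
                return dll
    raise FileNotFoundError("no *.dll with a matching *.runtimeconfig.json")
```

In practice it would be fed os.listdir() of the publish directory.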
Notes
Production live at https://alexandria.fullstack.co.za as of this
release. Validated via public Fortigate WAN IP 105.233.33.156:
HTTPS 200 from Web, HTTP -> HTTPS 301 redirect, HSTS + security
headers, /api/me returns 401 unauthenticated (correct -- Entra
sign-in required).
The Drainer container is in a restart loop until
POP3_LIBRARY_PASSWORD is populated in /opt/apps/alexandria/.env.
Web/API/Search/Parser/Postgres are unaffected -- the rest of the
stack is fully functional and a curator can sign in, browse, and
search what's already in the corpus.
[1.19.36] - 2026-04-28
Added
- FSDocker production deployment artifacts. Replaces (in practice) the original Azure Container Apps target for production:
  - infra/docker/docker-compose.fsdocker.yml -- new compose overlay with Web + API + search-py + drainer + parser + postgres. Drops the in-stack Ollama (uses Coruscant AI at 192.168.0.98:11434 per gc-ollama). Host port allocations (Web 5200, API 5201, search-py 8089, postgres 5444) confirmed free on FSDocker.
  - infra/docker/.env.fsdocker.example -- template for the on-host .env; documents every env var the stack consumes. The real .env is hand-created at /opt/apps/alexandria/.env per gc-fsdocker (never synced from local).
  - infra/nginx/alexandria.fullstack.co.za.conf -- host nginx site following the existing tempus pattern (HTTP->HTTPS, /var/www/certbot ACME, TLS upstreams, /api/ routing) plus the WebSocket upgrade headers Blazor Server's SignalR circuit needs.
  - scripts/deploy-fsdocker.sh -- per-deploy script: rsync the repo (excluding .git/.env/build artefacts), docker compose down && up -d --build, tail logs.
  - docs/runbooks/deploy-fsdocker.md -- first-time and routine deploy procedure plus rollback + troubleshooting.
- Dedicated Alexandria Entra app created in tenant 03c2517f-13ef-45ff-8cb9-e8b0043e4cb2. App id 9d4c9c98-3ee3-4314-aab8-a2ce9da43100. Redirect URI https://alexandria.fullstack.co.za/signin-oidc, logout URL https://alexandria.fullstack.co.za/signout-callback-oidc, Microsoft Graph User.Read permission added. Per gc-azure-entra: no shared use of the Scorecards app.
- Route 53 A record for alexandria.fullstack.co.za -> 105.233.33.156 (Fortigate WAN, matches the rest of the *.fullstack.co.za estate). Hosted zone Z8SBXEBV5VOB2, TTL 300. Confirmed propagated to Google + Cloudflare resolvers.
Changed
- .gitignore extended to allow .env.fsdocker.example (and any future *.env.example template) while still excluding actual .env files.
- BOK_IMMUTABLE_INPUT_LOG.md appended with the FSDocker deployment session entry. Inputs include the AWS STS session credentials provided for the R53 update; values redacted.
Notes
This is the first production deployment target for Alexandria.
The Bicep templates in infra/bicep/ remain valid for any
future Azure Container Apps deployment but are no longer the
canonical production path. The §11 quarterly LanceDB rebuild
drill obligation transfers cleanly: the procedure in
docs/runbooks/lancedb-rebuild.md works against any compose
stack and just needs the alexandria-lancedb volume to be
deleted + the search-py container restarted to trigger the
rebuild from /corpus/.
[1.19.35] - 2026-04-28
Added
- scripts/bump-version.py -- closes the gc-semver "(script to be added in infra sprint)" gap noted in CLAUDE.md. Reads VERSION, bumps the requested SemVer part (--patch|--minor|--major), writes VERSION, stages it (plus CHANGELOG.md if modified), commits chore(release): v<version>, and creates the matching annotated tag. --no-commit for a dry write. README + CLAUDE.md updated to reference the script directly instead of the manual workflow.
- Phase 3 of the Key Vault refactor -- the last v2-backlog follow-up. Native az.getSecret(...) references in infra/bicep/staging.keyvault.bicepparam and infra/bicep/production.keyvault.bicepparam; ARM resolves the secret values server-side at deploy time, so they never transit the local shell or a CI runner. Helper script scripts/deploy-with-keyvault-refs.sh <env> discovers the KV via az keyvault list, pre-flights that the four required secrets exist, runs what-if, and prompts for confirmation before deploying. Runbook + v2-backlog closure record updated.
- Validate-against-_unmatched dry-run preview UI -- the v3 feature originally deferred from v1. A new MatchPreviewService walks /corpus/_unmatched/, parses each file's frontmatter, and runs Alexandria.Core.Matching.PublicationMatcher against either the publication's stored rules or candidate rules from the unsaved edit form. Exposed via curator-scoped POST /api/publications/{slug}/match-preview. A new Preview rescan button on /publications/{slug} shows would_match / total_unmatched plus a sample of up to 10 matched messages with the rule that fired. Nothing is written -- the existing Rescan button still does the actual work.
- 5 new MatchPreviewServiceTests cover candidate-rules counting, empty _unmatched, no-rules-no-candidate, sample cap, and malformed-frontmatter graceful skip.
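The version arithmetic behind --patch|--minor|--major follows standard SemVer: bumping a part resets every lower-order part to zero. A minimal sketch of that rule, assuming VERSION holds a bare MAJOR.MINOR.PATCH string with no pre-release suffix; bump is a hypothetical name, not the script's actual code, and the git stage/commit/tag steps are omitted.

```python
import re

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump(version: str, part: str) -> str:
    """Return `version` with the requested SemVer part incremented."""
    m = _SEMVER.match(version.strip())  # tolerate a trailing newline in VERSION
    if not m:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in m.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For example, bump("1.19.34", "patch") gives "1.19.35", the version this release was tagged with.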
Changed
- infra/bicep/staging.keyvault.bicepparam + production.keyvault.bicepparam validated via az bicep build-params --stdout: both compile to ARM JSON with three keyVault reference parameters (entraClientId, postgresUser, postgresPassword).
- README test-count line refreshed: 324 total (was 319), 247 .NET tests (was 242).
Notes
This release closes the GitHub Actions disable that landed earlier
in the session. The bump-version.py script is now the local
SemVer source of truth, since the CD workflows are dormant. Phase 3
of the KV refactor was originally tagged "future cleanup, NOT
required" -- shipping it now means the deploy story is fully
self-contained on the operator's machine, with no GHA runner in the
loop. The dry-run preview UI was originally tagged a v3 feature;
pulling it forward closes the last documented gap in the
v2-backlog file.
The v2-backlog file no longer contains a single live entry; it is now a pure closure record.
[1.19.34] - 2026-04-28
Changed
- README status block updated from the v1.18.x snapshot (last refreshed in iter 35) to the v1.19.x reality. Now mentions the v2-backlog closures: nightly Postgres backup, nightly corpus snapshot, monthly audit-log archive, full ingest_dlq writers (4/4), full custom telemetry emission (4/4), 24-hour RPO. Test count refreshed to 319 (was 299).
- docs/v2-backlog.md retitled to "(closed)" with a top-of-file note that the file is now a closure record. The originally-collected items remain as struck-through history; future work should add new entries to a fresh backlog file (or this one if its purpose evolves).
Notes
The v2 backlog set up by iter 55's consolidation (13 items) is fully discharged across iters 56-81 (26 iterations of closure work). 319/319 tests pass; mypy + ruff + dotnet format all clean. The Ralph loop has no shipped-code work remaining against any documented commitment.
[1.19.33] - 2026-04-28
Added
- Closes the Key Vault refactor v2 item (Phase 2).
  - cd-staging.yml + cd-production.yml now read the four secrets (entra-client-id, postgres-user, postgres-password, postgres-connection-string) from the application's Key Vault, falling back to the matching GHA Secret only if the KV value is empty.
    - A new "Resolve Key Vault + read secrets" step uses az keyvault list --resource-group ... to discover the KV name (since the bicep unique-string suffix isn't known statically), then az keyvault secret show per secret with a read_or_fallback bash helper. Empty KV values fall through to ${{ secrets.<ENV>_<NAME> }} so the legacy path stays alive during migration.
    - Each resolved value gets ::add-mask:: so it's redacted in subsequent step logs.
    - Multi-line GITHUB_OUTPUT writes feed the values into the Deploy Bicep + Run migrations steps without changing their --parameters shape.
  - docs/runbooks/key-vault-refactor.md updated: the Phase 2 section is rewritten as "shipped" with the operator migration sequence, plus a Phase 3 (future, optional, not required) entry for the pure-Bicep-references variant.
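The fallback rule of the read_or_fallback bash helper, restated as a Python sketch for clarity. The real helper is bash inside the workflow; the Python names are hypothetical, and returning a "source" label is an illustrative addition that the changelog does not describe. The ::add-mask:: string is the standard GitHub Actions workflow command.

```python
from typing import Optional, Tuple


def read_or_fallback(kv_value: Optional[str], gha_secret: Optional[str]) -> Tuple[str, str]:
    """Prefer the Key Vault value; fall back to the GHA secret when it is empty.

    Returns (value, source). The source label is illustrative only: it lets a
    caller log which path was taken without logging the secret itself.
    """
    if kv_value:
        return kv_value, "keyvault"
    return (gha_secret or ""), "gha-secret"


def add_mask_command(value: str) -> str:
    # Workflow command that asks the runner to redact `value` in later log lines.
    return f"::add-mask::{value}"
```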
Notes
The "true Bicep keyVault().getSecret(...) references" approach is
documented as Phase 3 / future cleanup. Today's shape -- workflow
reads + passes via --parameters -- is operationally complete: the
secret source-of-truth has moved to KV, rotations happen via
az keyvault secret set, and the next deploy picks up the new value
automatically.
This iteration closes the last v2 backlog item. All 13 v2 items that surfaced during the audit phase (iters 42-55) have shipped or been documented as complete. The v2-backlog.md file now contains only struck-through entries.
[1.19.32] - 2026-04-28
Added
- Phase 1 of the Key Vault refactor v2 item: a sync workflow that pushes
the production secrets from GitHub Actions Secrets into the application's
Key Vault.
.github/workflows/sync-secrets-to-keyvault.yml. Workflow_dispatch only (operators run it on environment bootstrap or after a GHA rotation). Resolves the KV name viaaz keyvault list, pushes 4 secrets:entra-client-id,postgres-user,postgres-password,postgres-connection-string. Skips empty GHA secrets with a WARN log.docs/runbooks/key-vault-refactor.md-- full refactor plan including the chicken-and-egg discussion (application KV vs. ops KV) and the Phase 2 checklist for flipping Bicep parameters tokeyVault().getSecret(...)references.docs/runbooks/github-actions-secrets.md-- "Why not use Azure Key Vault" section reframed as "Key Vault refactor (Phase 1 shipped)" pointing at the new runbook.docs/runbooks/README.md-- new entry for the refactor runbook.
Notes
This is the last v2 backlog item to receive shipped code. Phase 2 (the actual Bicep parameter migration) is documented in detail in the new runbook and remains the only piece of v2 work that hasn't landed in this loop's run.
[1.19.31] - 2026-04-28
Added
- Closes the curator-UI v2 item (5/5 affordances). Triage open-message joins the four already-shipped ones.
  - Triage.razor rows are now clickable and navigate to /messages/{id} so curators can read each unmatched message body before deciding how to update match rules.
  - Switched the page's data source from Api.ListUnmatchedAsync (/api/triage/unmatched filesystem walk, paths only, no message_id) to Api.ListMessagesAsync(Publication: "_unmatched") -- the indexer already routes unmatched files into LanceDB with publication="_unmatched" per IngestPipeline.UnmatchedSlug, so every row in the new list has a real MessageId.
  - Visual shape mirrors TopicDetail.razor and PublicationDetail.razor: one MudList, chip + date + subject, click-to-/messages/{id}. Three pages, one pattern -- curator hands learn it once.
  - The old /api/triage/unmatched endpoint stays in place (the operator handbook still references the filesystem-walk shape, and the API has no breaking changes).
Notes
This iteration plus iters 75-78 close the entire curator-UI v2 item. Five iterations, five affordances, one button-per-iteration cadence that worked because the typed-client pattern from iters 60-63 had established the plumbing for every later UI piece to ride on.
[1.19.30] - 2026-04-28
Added
- Curator-UI partial closure (4/5 affordances): Edit publication form joins the trio of Add / Toggle Active / Rescan.
  - PublicationDetail.razor adds a MudExpansionPanel "Edit publication" below the metadata header. Same MudGrid form shape as /publications/new (10 fields covering metadata + 4 match-rule types) but with the slug immutable -- the slug stays in the page header.
  - Form fields pre-populate from the current publication on page load. Match-rule fields stay blank because PublicationView doesn't currently round-trip them; entering values overwrites the existing rules.
  - Save calls Api.UpdatePublicationAsync (added in v1.19.28) with a sparse PublicationUpdateRequest -- only fields the curator changed (or all fields if every text box is filled). A Snackbar confirms, and the in-memory _publication is refreshed from the API response.
  - No new API or sidecar work; rides on the v1.19.28 PUT endpoint that was already there for the Active toggle.
Notes
The Razor source generator hint from v1.19.27 carries over: don't put
regex examples inside MudTextField HelperText (it confuses the
generator's argument parsing). The Edit form's HelperText strings
stay simple; concrete examples live in curator-guide.md.
[1.19.29] - 2026-04-28
Added
- Curator-UI partial closure (3/5 affordances): Rescan unmatched joins Add publication and the Active toggle.
  - PublicationDetail.razor adds a "Rescan unmatched" button next to the publication header. Clicking it triggers a sidecar rescan; after success the page re-fetches its message list so newly-classified messages appear.
  - New curator-scoped API proxy POST /api/publications/rescan-unmatched on Alexandria.Api. Distinct from /api/admin/sidecar/rebuild because rescan is curator-routine (run after match-rule edits) rather than admin-rare. The route uses the existing AlexandriaPolicies.Curator.
  - ISearchSidecarClient.RescanUnmatchedAsync + SidecarRescanResponse record (status + processed + skipped, mirroring the Pydantic shape).
  - AlexandriaApiClient.RescanUnmatchedAsync + RescanResponse typed record on the Web side.
  - Three new tests: a round-trip JSON contract pin for SidecarRescanResponse, Curator_can_trigger_rescan_unmatched, and Reader_cannot_trigger_rescan_unmatched. 57/57 Api tests green (was 54).
Notes
v2 backlog count revised from 4 to 5 affordances after fold-out: Add, Active toggle, Rescan, Edit (pending), Triage open-message side-panel (pending). The remaining two follow the same single-button pattern.
[1.19.28] - 2026-04-28
Added
- Continuing curator-UI partial closure (2/4 affordances): Active toggle joins Add publication.
  - PublicationDetail.razor adds a Deactivate / Reactivate button next to the publication header. Curator-only on the API side; non-curators get a 403 surfaced via Snackbar.
  - AlexandriaApiClient.UpdatePublicationAsync + PublicationUpdateRequest typed record. Sparse update (every field optional, defaults to null), so the toggle call is new PublicationUpdateRequest(IsActive: false) without disturbing other fields.
  - On a successful toggle, _publication is replaced with the API response so the chip + button update without re-fetching.
  - curator-guide.md callout updated to list both shipped UI actions.
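The sparse-update idea (unset fields are omitted, so the server leaves them alone) is sketched below in Python. The real record is C# on the Web side; the field subset and the snake_case wire names here are assumptions, since the actual JSON naming policy isn't stated in this entry.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PublicationUpdateRequest:
    # Hypothetical subset of fields; None means "leave this field unchanged".
    name: Optional[str] = None
    description: Optional[str] = None
    is_active: Optional[bool] = None

    def to_json(self) -> str:
        # Compare against None rather than truthiness so that an explicit
        # False (deactivate) is still sent to the server.
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body)
```

The None check is the important design choice: a truthiness filter would silently drop is_active=False and turn "deactivate" into a no-op.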
Notes
The pattern from v1.19.27 carries forward: typed-client method -> single-button addition on an existing Razor page -> Snackbar feedback -> backlog entry struck through. The remaining curator-UI work (Edit modal, Rescan button, Triage open-message side-panel) follows the same shape and will ship one button at a time.
[1.19.27] - 2026-04-28
Added
- Partial closure of curator-UI v2 item: Add publication form ships.
  - New /publications/new Razor page with a MudGrid form covering slug, name, publisher, homepage URL, description, category, tags, and the four match-rule types (one pattern per line). Posts to the existing POST /api/publications/ endpoint.
  - AlexandriaApiClient.CreatePublicationAsync + PublicationCreateRequest + PublicationMatchRules typed records on the Web side.
  - MainLayout.razor "Curator -> Add publication" NavLink restored (iter 54 had removed it as a dead link; v1.19.27 makes it real).
  - curator-guide.md callout updated: the UI is no longer described as fully read-only; "Add publication" gets a UI path alongside the API/CLI paths. Edit / Validate / Rescan / Toggle-active actions remain on the v2 backlog.
Notes
The remaining curator UI work (Edit modal on the detail page, Rescan button, Active toggle, Triage open-message side-panel) is independently shippable in future iterations -- each one is small now that the typed client + form-input pattern from this iteration are in place.
[1.19.26] - 2026-04-28
Added
- Closes the audit-archiver v2 item (surfaced in iter 42 with the audit-log runbook honesty fix). Automates the monthly POPIA archive job that had been operator-run since the system shipped.
  - New .github/workflows/audit-archive.yml runs at 03:00 UTC on the 1st of each month. Co-locates with corpus-snapshot.yml (daily 01:00 UTC) and postgres-backup.yml (daily 00:00 UTC) so all the nightly/monthly maintenance jobs are in one folder.
  - Uses az containerapp exec to run COPY (...) TO STDOUT WITH (FORMAT csv, HEADER true) | gzip inside the Postgres container app and upload via the container's managed identity. Same data-doesn't-transit-the-runner shape as postgres-backup.yml from v1.19.25.
  - Blob layout: alexandria-audit-archive/year=<YYYY>/month=<MM>/audit-<YYYY-MM>.csv.gz (Cool tier).
  - Archives as gzipped CSV (not Parquet) so DuckDB can read it directly via read_csv_auto -- avoids the csv2parquet dep the original runbook had been pointing at.
  - A delete_after_archive workflow_dispatch input lets an operator dry-run the archive (export + upload, skip the DELETE). The default on the schedule is yes.
  - audit-log-archive.md runbook rewritten: the "Status" section reflects automation; query examples use read_csv_auto instead of read_parquet.
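The blob layout above can be computed as in this Python sketch. It assumes the run on the 1st archives the calendar month that just ended, which the entry implies but does not state; both function names are hypothetical and not taken from audit-archive.yml.

```python
from datetime import date, timedelta
from typing import Tuple


def month_to_archive(run_day: date) -> Tuple[int, int]:
    # Assumption: the run on the 1st archives the calendar month that just ended.
    last_day_of_previous_month = run_day.replace(day=1) - timedelta(days=1)
    return last_day_of_previous_month.year, last_day_of_previous_month.month


def archive_blob_path(year: int, month: int) -> str:
    # Mirrors the documented layout:
    # alexandria-audit-archive/year=<YYYY>/month=<MM>/audit-<YYYY-MM>.csv.gz
    return (
        f"alexandria-audit-archive/year={year:04d}/month={month:02d}/"
        f"audit-{year:04d}-{month:02d}.csv.gz"
    )
```

Under that assumption, the 2026-05-01 run writes alexandria-audit-archive/year=2026/month=04/audit-2026-04.csv.gz, and the January run rolls back across the year boundary to the previous December.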
Notes
This iteration plus v1.19.24 + v1.19.25 close all three "operational automation" v2 items (audit archiver, corpus snapshot, pg_dump backup). All ship as scheduled GitHub Actions workflows rather than .NET hosted services -- the recurring shape was clear enough by the third one that following it was cheaper than building per-worker projects. Different choice would be justified if these jobs grew complex business logic; for now they're glue between Azure CLI and the existing data plane.
[1.19.25] - 2026-04-28
Added
- Closes the pg_dump backup v2 item (surfaced in iter 43 alongside the corpus-snapshot DR gap).
  - New .github/workflows/postgres-backup.yml scheduled workflow runs daily at 00:00 UTC. Uses az containerapp exec to dump + gzip + upload from inside the Postgres Container App so the database stays internal-only (no public ingress required). The container app's managed identity has Storage Blob Data Contributor on the storage account, so the dump never transits the GH Actions runner.
  - Blob layout: alexandria-postgres-backups/<YYYY>/<MM>/alexandria-<YYYY-MM-DD>.sql.gz.
  - 30-day retention by default (overridable via workflow_dispatch); older dumps are pruned by the same workflow.
  - DR runbook updated: Postgres restore status flips from Blocked to Working. The "Open work blocking the full RPO/RTO promise" intro section is reframed as "Backup sources (status)", since both blockers (corpus snapshot in v1.19.24, pg_dump in this release) are now closed. The "(no backup available)" fallback is retitled as a deeper second-line scenario for rare account-level losses.
  - Implemented as a scheduled GitHub Actions workflow rather than an Alexandria.Backups hosted service; the workflow is simpler and co-locates the schedule with corpus-snapshot.yml.
Notes
RPO for Postgres data loss is now ~24 hours (was: total loss unrecoverable beyond what the corpus markdown could re-derive). This iteration plus iter 72 close all DR-related v2 backlog items.
[1.19.24] - 2026-04-28
Added
- Closes the corpus-snapshot v2 item (surfaced in iter 43 with the DR audit).
  - Bicep adds an alexandria-corpus-snapshots blob container alongside the existing alexandria-corpus container.
  - New .github/workflows/corpus-snapshot.yml scheduled workflow runs daily at 01:00 UTC (before the 02:00 quarterly LanceDB rebuild drill window). Uses az storage blob copy start-batch to copy the corpus container into a date-stamped prefix <YYYY>/<MM>/<DD>/ in the snapshots container. The same workflow then prunes folders older than the configured retention window (default 14 days, overridable via workflow_dispatch).
  - New CD secret PROD_STORAGE_ACCOUNT documented in github-actions-secrets.md. The Azure OIDC app needs Storage Blob Data Contributor on the production storage account.
  - DR runbook updated: the status table entry for "Corpus snapshot" flips from Blocked to Working; the "Corpus volume lost" section now describes the real restore-from-snapshot path (date-stamped prefix); the re-drain-from-POP3 fallback is retitled "beyond the snapshot retention window", since it now applies only when both the snapshots AND the live corpus are gone.
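The retention step boils down to "parse each <YYYY>/<MM>/<DD>/ prefix and keep the ones older than the window". The Python sketch below covers only that decision; listing and deleting blobs (done with az in the real workflow) is left out. prefixes_to_prune is a hypothetical name, and treating the cutoff day itself as still retained is an assumption about the exact boundary.

```python
from datetime import date, timedelta
from typing import Iterable, List


def prefixes_to_prune(prefixes: Iterable[str], today: date, retention_days: int = 14) -> List[str]:
    """Return date-stamped snapshot prefixes older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    stale = []
    for prefix in prefixes:
        try:
            y, m, d = (int(part) for part in prefix.strip("/").split("/"))
            snapshot_day = date(y, m, d)
        except ValueError:
            # Not a YYYY/MM/DD prefix (or not a real date): never delete it.
            continue
        if snapshot_day < cutoff:  # assumption: the cutoff day itself is kept
            stale.append(prefix)
    return stale
```

Skipping anything that doesn't parse as a date keeps a stray folder in the snapshots container from being deleted by accident.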
Notes
RPO for corpus loss is now ~24 hours (was: re-drain from POP3 with
30-90-day server retention, oldest content lost). The other DR-blocking
items (pg_dump backup) remain on the v2 backlog.
[1.19.23] - 2026-04-28
Added
- Closes the telemetry-emission v2 item (4/4 stages now emit). The Python sidecar joins the three .NET workers as a custom-metric emitter.
  - apps/search-py/src/alexandria_search/metrics.py: small module that creates an OTel meter named "alexandria-search" and exposes register_lancedb_row_count_gauge(get_count) to register an observable gauge that polls IndexStore.message_count() on each OTel collection cycle. Best-effort: callback exceptions return an empty observation list rather than killing the meter callback.
  - The main.py lifespan handler calls register_lancedb_row_count_gauge(store.message_count) and holds the returned instrument on app.state.lancedb_gauge so it isn't garbage-collected before lifespan exit.
  - Distinct meter name from .NET (alexandria-search vs Alexandria) so App Insights queries can separate the two runtimes' emissions.
  - 3 new tests in test_metrics.py covering registration smoke, no-throw with a healthy callback, and no-throw at registration with a callback that would raise at collection time.
Notes
The admin-guide "Reading the dashboards" table now lists 7
customMetrics queries that all return real data. The iter-51 audit
gap (operator handbook + admin guide referenced telemetry that didn't
emit) is fully closed.
66/66 search-py tests pass (was 63, +3 new). 250/250 .NET tests still pass. mypy + ruff both clean.
[1.19.22] - 2026-04-28
Added
- Continuing telemetry-emission partial closure (3/4 stages): link-fetcher joins parser and drainer as a custom-metrics emitter.
  - LinkFetcherService.ProcessOneAsync emits two counters:
    - alexandria.linkfetcher.fetch_attempts with two tags: success=true|false and cached=true|false (true when the body was already on disk and the worker just refreshed the frontmatter).
    - alexandria.linkfetcher.bytes_fetched (cumulative size of link-body markdown written; useful for capacity planning of the _links/ corpus subtree).
  - admin-guide.md "Reading the dashboards" updated with both metrics.
  - v2-backlog.md shows 3/4 progress, with the sidecar (Python) as the remaining stage.
Notes
Same shape as the v1.19.20 + v1.19.21 patterns -- one declaration plus
4 .Add(1, tags) calls in the existing branches. The shared
AlexandriaMeter continues to make each subsequent stage near-free.
250/250 .NET tests still pass.
[1.19.21] - 2026-04-28
Added
- Continuing telemetry-emission partial closure: drainer joins parser as the second worker emitting custom metrics through AlexandriaMeter.
  - Pop3DrainerService emits two counters:
    - alexandria.drainer.ticks with intake=<slug> and success=true|false tags. Realises the drain.tick claim that had been forward-looking in admin-guide since v0.x.
    - alexandria.drainer.messages_drained with an intake=<slug> tag. Cumulative per-intake throughput.
  - admin-guide.md "Reading the dashboards" updated; v2-backlog hint reduced to link-fetcher + sidecar.
Notes
Same shape as the v1.19.20 parser wiring -- one Counter<long>
declaration + a few .Add(1, tags) calls in the existing
success/failure paths. The shared AlexandriaMeter continues to make
each subsequent stage near-free.
250/250 .NET tests still pass.
[1.19.20] - 2026-04-28
Added
- Partial closure of telemetry-emission v2 item: parser stage now emits custom metrics through OpenTelemetry to App Insights.
  - Alexandria.Core.Telemetry.AlexandriaMeter: shared System.Diagnostics.Metrics.Meter mirroring the existing AlexandriaActivitySource shape. Workers create Counter<long> / Histogram<T> instances against this single meter so OpenTelemetry only has to subscribe to one name.
  - TelemetryServiceCollectionExtensions.AddAlexandriaTelemetry now registers mb.AddMeter(AlexandriaMeter.Name) so every worker that calls the helper auto-subscribes.
  - EmlParserService emits two counters at the end of ProcessOneAsync:
    - alexandria.parser.files_processed with a matched=true|false tag
    - alexandria.parser.files_failed
  - admin-guide.md "Reading the dashboards" table updated with the new customMetrics queries; v2-backlog hint reduced to drainer + link-fetcher + sidecar heartbeat (which now have a clear pattern to copy from EmlParserService).
Notes
Same partial-closure pattern as the ingest_dlq 4-iteration arc. The
shared AlexandriaMeter does the same job that
Alexandria.Core.Ingest.PostgresIngestDlq did for the DLQ work: one
piece of infrastructure, multiple workers ride on it. Each subsequent
stage is a one-line counter declaration + .Add(1) at the right point.
250/250 .NET tests still pass.
[1.19.19] - 2026-04-28
Added
- Closes the
ingest_dlqwriter wiring v2 item (4/4 stages now wired). This iteration ships the Python sidecar's index-stage writer.apps/search-py/src/alexandria_search/ingest_dlq.py: PythonIngestDlqProtocol +PostgresIngestDlq(asyncpg) +NullIngestDlqfallback +build()factory. Mirrors the .NET-sideAlexandria.Core.Ingest.IIngestDlqshape (stage, source_path, error, payload).Indexer.__init__accepts an optionalingest_dlqparameter (defaults toNullIngestDlqso existing tests + non-Postgres dev keep working).Indexer.rebuild's per-file failure handler now recordsstage="index"rows alongside the existinglog.errorline.main.pylifespan handler builds the IngestDlq from the newSettings.postgres_connection_string(env varPOSTGRES_CONNECTION_STRING) and passes it into the Indexer.pyproject.toml: addsasyncpg>=0.30,<0.32as a dep, and addsasyncpgto the[[tool.mypy.overrides]]block (no library stubs).
- 5 new tests in `tests/test_ingest_dlq.py`:
  - `NullIngestDlq.record` returns `None` and never raises.
  - `build(None)` and `build("")` return `NullIngestDlq`.
  - `build(connstr)` returns `PostgresIngestDlq`.
  - `PostgresIngestDlq.record` swallows asyncpg failures (doesn't raise).
  - `Indexer.rebuild` calls `dlq.record` once per failed file with `stage="index"` and `payload={"phase": "rebuild"}`.
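For readers wiring a new stage, the Python-side pattern can be sketched roughly as follows. This is a minimal illustration, not the shipped `ingest_dlq.py`. Several details are assumptions: the injected `connect` factory (the real constructor shape may differ), the `INSERT` column list, and the `jsonb` payload column. Only the `IngestDlq` / `NullIngestDlq` / `PostgresIngestDlq` / `build()` names and the (stage, source_path, error, payload) shape come from the entry above.

```python
from __future__ import annotations

import json
from typing import Any, Awaitable, Callable, Protocol


class IngestDlq(Protocol):
    async def record(self, stage: str, source_path: str, error: str,
                     payload: dict[str, Any] | None = None) -> None: ...


class NullIngestDlq:
    """No-op fallback for tests and for deployments without Postgres."""

    async def record(self, stage: str, source_path: str, error: str,
                     payload: dict[str, Any] | None = None) -> None:
        return None


class PostgresIngestDlq:
    """Best-effort writer: recording a failure must never raise into the caller."""

    def __init__(self, connect: Callable[[], Awaitable[Any]]) -> None:
        # Assumed injection point: a zero-arg coroutine factory returning an object
        # with async execute()/close() methods (asyncpg connections qualify).
        self._connect = connect

    async def record(self, stage: str, source_path: str, error: str,
                     payload: dict[str, Any] | None = None) -> None:
        try:
            conn = await self._connect()
            try:
                await conn.execute(
                    "INSERT INTO ingest_dlq (stage, source_path, error, payload) "
                    "VALUES ($1, $2, $3, $4::jsonb)",
                    stage, source_path, error, json.dumps(payload or {}),
                )
            finally:
                await conn.close()
        except Exception:
            # Swallowed so the caller's own exception handler is not masked;
            # a real implementation would log here.
            pass


def build(connstr: str | None) -> IngestDlq:
    """An empty or missing connection string means Postgres is not configured."""
    if not connstr:
        return NullIngestDlq()

    async def connect() -> Any:
        import asyncpg  # lazy import: the no-Postgres path needs no driver

        return await asyncpg.connect(connstr)

    return PostgresIngestDlq(connect)
```

Injecting the connection factory keeps the best-effort behavior testable without a database. A factory that raises exercises the swallow path, and a fake connection exercises the insert path.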
Notes
This is the fourth and final partial closure of the v2 `ingest_dlq` item that started in v1.19.16. All four wired ingest-pipeline stages (drain / parse / link-fetch / index) now write to the same Postgres table. As a result, the operator-handbook step 3 query (`SELECT stage, count(*) FROM ingest_dlq WHERE resolved_at IS NULL ...`) is now a complete triage signal across them.
63/63 search-py tests pass (was 58, +5 new). 250/250 .NET tests unchanged. mypy + ruff both clean.
[1.19.18] - 2026-04-28
Added
- Continued partial closure of `ingest_dlq` writer wiring (3/4 stages): link-fetcher joins parser and drainer.
  - `LinkFetcherService.ProcessOneAsync`'s per-fetch catch block records `stage="link-fetch"` rows. `source_path` carries the corpus message path. The payload JSON includes the failing URL for triage; it is built via `JsonSerializer.Serialize`, so URL escaping is automatic.
  - `LinkFetcherOptions.PostgresConnectionString` added. The link-fetcher has no other Postgres dependency, so the field defaults to an empty string and falls through to `NullIngestDlq` log-only behavior when Postgres isn't configured.
- DI registration in `LinkFetcher/Program.cs` follows the same one-line pattern as Parser and Drainer.
- `LinkFetcherFullLoopTests.BuildService` updated to inject `NullIngestDlq` for the new constructor parameter; 22/22 LinkFetcher tests still pass.
Notes
The shared `PostgresIngestDlq` impl from v1.19.17 paid off again -- this iteration's worker wiring was 1 line of options + 3 lines of DI + a 6-line `RecordAsync` call. The remaining stage (Python sidecar index) is the largest because it crosses the runtime boundary.
250/250 .NET tests still pass.
[1.19.17] - 2026-04-28
Added
- Continued partial closure of `ingest_dlq` writer wiring: the drainer stage joins the parser as the second wired stage.
  - `Pop3DrainerService`'s per-intake catch block now records `stage="drain"` rows alongside the existing log + circuit-breaker logic.
  - The `source_path` slot carries `intake:<slug>`, because drain failures aren't keyed to a specific file.
  - The payload includes the consecutive-failures count.
Changed
- Refactor: `PostgresIngestDlq` moved from `apps/Alexandria.Parser/` into `libs/Alexandria.Core/Ingest/`. It now takes a `Func<string?>` connection-string resolver instead of depending on `IOptions<ParserOptions>` directly. Each worker injects it with its own options class via a single registration:

  ```csharp
  services.AddSingleton<IIngestDlq>(sp => new PostgresIngestDlq(
      () => sp.GetRequiredService<IOptions<TWorkerOptions>>().Value.PostgresConnectionString,
      sp.GetRequiredService<ILogger<PostgresIngestDlq>>()));
  ```

  This removes the per-worker copy that would have proliferated as more stages came online.
- `Alexandria.Core.csproj` adds `Npgsql` + `Microsoft.Extensions.Logging.Abstractions` package references (`PostgresIngestDlq` now lives there).
- `Directory.Packages.props` adds the matching central `PackageVersion` entry for `Microsoft.Extensions.Logging.Abstractions`. It was being pulled in transitively, but central pinning required the explicit declaration.
- `operator-handbook.md` step 3 query comment updated to reflect 2/4 stages wired.
- `v2-backlog.md` `ingest_dlq` entry shows 2/4 progress with the next suggested step (link-fetcher).
Notes
Same partial-closure pattern as v1.19.16: ship one stage at a time. The shared-impl refactor pays off immediately -- the drainer wiring is one DI line + one RecordAsync call. Future stages cost the same.
250/250 .NET tests still pass.
[1.19.16] - 2026-04-28
Added
- Partial closure of the `ingest_dlq` writer wiring v2 item: the parser stage now records failures into the V004 `ingest_dlq` table.
  - `Alexandria.Core.Ingest.IIngestDlq` shared interface + `NullIngestDlq` no-op fallback (used in tests / when Postgres isn't configured).
  - `apps/Alexandria.Parser/PostgresIngestDlq.cs`: best-effort writer that swallows its own failures so the calling exception handler isn't masked.
  - The `EmlParserService` exception handler now records `stage="parse"` rows alongside the existing structured log line.
  - `Program.cs` registers `PostgresIngestDlq` as the `IIngestDlq` implementation.
  - The `operator-handbook.md` step 3 query is no longer a misleading no-op for parse failures: from this release on, it returns honest "parse" rows. Drainer / link-fetch / index stages still need wiring -- the handbook calls those out.
  - `v2-backlog.md` entry retitled "(partial)" with the next-step hint (the drainer's circuit-breaker exception handler).
Notes
This is the first partial v2 closure of the loop -- shipping one stage at a time keeps the iteration small and gives the operator-handbook query real value immediately rather than waiting for full coverage. Same pattern as iters 59-61 (browse-by-topic split into three iterations) but applied to a multi-stage worker change.
250/250 .NET tests still pass.
[1.19.15] - 2026-04-28
Added
- Closes the filter-sidebar v2 item: `/search` now exposes a `MudExpansionPanel` "Filters" below the query box with:
  - `MudSelect` multiselect for Publications (populated from `Api.ListPublicationsAsync`, active publications only).
  - `MudSelect` multiselect for Topics (the same seeded taxonomy `Topics.razor` uses).
  - `MudDatePicker` for Sent after (inclusive).
  - `MudDatePicker` for Sent before (exclusive).
- Selected filter values flow into the existing `SearchQuery` typed record that `AlexandriaApiClient.SearchAsync` already accepted -- no API or client-library changes needed; everything was wired in v1.19.10 (inline operator parser) and earlier.
- `reader-guide.md` "Filtering results" reorganised: filter-sidebar section first, inline-operator section second, both now described as live.
- `v2-backlog.md` filter-sidebar entry struck through.
Notes
Pure-UI iteration -- no .NET, Python, CLI, or contract test changes. The pre-existing typed-client surface absorbed the new feature without ceremony, the same way browse-by-publication landed in iter 62 on top of the iter 60 plumbing.
[1.19.14] - 2026-04-28
Added
- Closes the browse-by-publication v2 item: `Publications.razor` rows are now clickable. Clicking navigates to a new `/publications/{slug}` detail page that shows the publication header (name, slug, publisher, description, tags, active state) above the 50 most recent messages.
  - `PublicationDetail.razor` reuses the `Api.ListMessagesAsync` plumbing (API route from v1.19.12, typed-client method from v1.19.13). It also reuses the same `MudList` visual shape as `TopicDetail.razor` from v1.19.13.
  - The page issues two parallel API fetches: the publications list for the header and the messages list for the body. As a result, it renders after a single round-trip.
  - `Publications.razor` adds `OnRowClick` and a hint caption directing users to click. The empty-state message now points at `curator-guide.md` instead of the "Curator -> Add publication" UI removed in v1.19.6.
  - `reader-guide.md` "By publication" rewritten from "planned for an upcoming iteration" to live documentation.
  - `v2-backlog.md` browse-by-publication entry struck through.
Notes
This is the smallest possible v2 closure -- the entire iteration was a single new Razor page + 4 lines of changes to an existing page, because the API surface and typed-client method already existed from iters 60-61. The "prerequisite-first" rhythm pays off here: the Topics three-iteration arc left exactly the right plumbing for browse-by-publication to land in one.
[1.19.13] - 2026-04-28
Added
- Closes a v2 backlog item: the `/topics` and `/topics/{slug}` browse-by-topic UI.
  - `Topics.razor`: grid of 11 `MudCard` tiles for the seeded taxonomy (`ai-and-llms`, `regulation`, `popia`, `cybersecurity`, `cloud-infra`, `data-eng`, `developer-tools`, `finance-economics`, `tax-and-sars`, `product-strategy`, `industry-news`). Each tile is a Material icon + display name + slug; clicking a tile navigates to the topic detail page.
  - `TopicDetail.razor`: lists the 50 most recent messages classified into the topic via `Api.ListMessagesAsync(new MessageListQuery(Topic: slug, Limit: 50))`. Clicking any message opens the `/messages/{id}` detail page.
  - `AlexandriaApiClient.ListMessagesAsync` + `MessageListQuery` / `MessageListResponse` / `MessageRef` typed records consuming the v1.19.12 `/api/messages` route.
  - `MainLayout`: new "Topics" NavLink between Publications and Saved searches.
  - `reader-guide.md` "By topic" rewritten from "Coming soon" to live docs.
  - `v2-backlog.md` browse-by-topic entry struck through. The browse-by-publication entry is refined with a one-liner pointing at the now-available `Api.ListMessagesAsync(Publication: slug, ...)` plumbing.
Notes
This is the first three-iteration v2 closure of the loop:
- v1.19.11 (iter 59): sidecar topic filter
- v1.19.12 (iter 60): API listing proxy + typed-client + JSON contract pin
- v1.19.13 (iter 61): Web UI consuming the typed client
Each layer was small, well-tested, and bisectable. The pattern works well for v2 items that span multiple subsystems.
[1.19.12] - 2026-04-28
Added
- `GET /api/messages` listing route on Alexandria.Api.
  - Filters by `publication`, `topic`, `subject_like`, `after`, `before`, with paging (`limit`, `offset`).
  - Proxies to the sidecar's existing `/messages` route (extended with the `topic` filter in v1.19.11).
  - Distinct from `/api/search`: this is metadata enumeration with no ranking. For ranked retrieval, use `/api/search`.
- `ISearchSidecarClient.ListMessagesAsync` + `SidecarMessageListQuery` / `SidecarMessageList` / `SidecarMessageRef` records.
- Two new tests (54/54 Api tests now green, up from 52):
  - `SidecarJsonRoundTripTests.SidecarMessageList_deserializes_pydantic_snake_case_payload` pins the wire format (`message_id`, `external_id`, `sent_at`), so future Pydantic-side changes fail fast on the .NET side. 5/5 round-trip cases.
  - `Reader_can_list_messages_filtered_by_topic` exercises the route through the auth pipeline.
Notes
This is the second prerequisite-first iteration in a row. Iter 59 added the topic filter on the sidecar's GET /messages; iter 60 adds the .NET proxy + typed-client + tests. Iter 61 can ship the Topics UI consuming the typed client without further plumbing.
[1.19.11] - 2026-04-28
Added
- Sidecar `GET /messages` accepts a `topic=<slug>` query parameter (e.g. `topic=ai-and-llms`).
  - The implementation is a Python-side post-filter rather than a LanceDB array-contains SQL clause. LanceDB's array filter syntax shifts across minor versions, and the row count is bounded by the in-process 5000-row pull.
  - This is prerequisite work for the v2 browse-by-topic UI. The route is also useful today via the CLI for curators who want to enumerate messages within a classifier topic.
- `alexandria messages list-ids --topic <slug>`: matching CLI flag threaded through to the new sidecar parameter.
- 1 new pytest case (`test_messages_enumeration_filters_by_topic`) covering the new behavior end-to-end through the FastAPI `TestClient`.
- 1 CLI smoke test assertion that `--topic` is documented in `messages list-ids --help`.
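A minimal sketch of the post-filter approach described above. It assumes each row pulled from LanceDB is a dict with a `topics` list; the field name and the `filter_by_topic` helper are hypothetical. Paging is applied after filtering, so pages are only exact within the bounded pull:

```python
from __future__ import annotations

from itertools import islice
from typing import Any, Iterable

MAX_ROWS = 5000  # mirrors the in-process pull bound mentioned above


def filter_by_topic(rows: Iterable[dict[str, Any]], topic: str | None,
                    limit: int = 50, offset: int = 0) -> list[dict[str, Any]]:
    """Filter already-fetched rows in Python instead of in a LanceDB SQL clause."""
    bounded = list(islice(rows, MAX_ROWS))  # never look past the bounded pull
    if topic:
        # Rows whose topics field is missing or None simply never match.
        bounded = [r for r in bounded if topic in (r.get("topics") or [])]
    return bounded[offset:offset + limit]
```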
[1.19.10] - 2026-04-28
Added
- Closes the third v2 backlog item: inline search operator parsing.
  - `AlexandriaApiClient.ApplyInlineOperators` extracts `pub:`, `topic:`, `domain:`, `after:`, `before:` tokens from `SearchQuery.Q`. It merges them with any explicitly-set typed parameters and leaves the residual free text in `Q`.
  - The sidecar contract is unchanged: everything flows through the existing `[FromQuery] string[] pub/topic/domain` and `[FromQuery] DateTimeOffset? after/before` parameters on `/api/search`.
  - Date values accept both ISO-8601 (`2026-04-01T00:00:00Z`) and date-only form (`2026-04-01`). Date-only values are treated as midnight UTC.
  - Edge-case guard: pure-operator queries (`pub:stratechery` with no free text) fall back to the original `q`, so the request doesn't hit the API's "q required" 400. The filter-sidebar v2 work is the proper fix for the empty-text-with-filters case.
- `SearchQuery` record extended with a `Domains` field. It was missing: the API already accepted `domain` query parameters, but the typed client had no way to set them.
- `reader-guide.md` "Filtering results" rewritten with two sections, each with examples: inline operators in the search box, and direct API query parameters for scripts. The v2-backlog hint is reduced to "a filter sidebar UI for users who don't want to learn the operator syntax".
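The operator-extraction rules above can be sketched in Python as a hypothetical port of `ApplyInlineOperators` (the shipped method is C# in `AlexandriaApiClient`). The entry does not specify several behaviors, so the sketch assumes them:

- tokens are whitespace-separated;
- list operators are unioned with explicitly-set values;
- an explicitly-set date wins over an inline one;
- unparseable dates and unknown `key:value` tokens stay in the free text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, datetime, timezone

LIST_OPS = {"pub": "pubs", "topic": "topics", "domain": "domains"}
DATE_OPS = ("after", "before")


@dataclass
class ParsedQuery:
    q: str
    pubs: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    after: datetime | None = None
    before: datetime | None = None


def parse_date(value: str) -> datetime | None:
    """Accept ISO-8601 timestamps or date-only values (treated as midnight UTC)."""
    try:
        if len(value) == 10:  # date-only form, e.g. 2026-04-01
            d = date.fromisoformat(value)
            return datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
        # Older Pythons' fromisoformat() rejects a trailing "Z", so normalise it.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def apply_inline_operators(q: str, explicit: ParsedQuery | None = None) -> ParsedQuery:
    base = explicit or ParsedQuery(q="")
    out = ParsedQuery(q="", pubs=list(base.pubs), topics=list(base.topics),
                      domains=list(base.domains), after=base.after, before=base.before)
    residual: list[str] = []
    for token in q.split():
        key, sep, value = token.partition(":")
        if sep and value and key in LIST_OPS:
            target = getattr(out, LIST_OPS[key])
            if value not in target:  # union with explicitly-set values
                target.append(value)
        elif sep and value and key in DATE_OPS and (parsed := parse_date(value)):
            if getattr(out, key) is None:  # an explicitly-set date wins
                setattr(out, key, parsed)
        else:
            residual.append(token)  # free text, unknown operators, bad dates
    # Pure-operator query: keep the original text so the API's "q required" check passes.
    out.q = " ".join(residual) or q
    return out
```

For example, `popia pub:sars-bulletin after:2026-04-01` yields free text `popia`, one publication filter, and an `after` of 2026-04-01T00:00Z.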
Notes
This is the third v2 closure in three iterations (intakes.notes -> "More like this" -> inline operators). All three had the property that the underlying capability was already in place; only the user-facing surface was missing. The remaining v2 items have larger surface areas (worker projects, full UI builds).
[1.19.9] - 2026-04-28
Added
- Closes the second v2 backlog item ("More like this" on message detail, surfaced in `reader-guide.md`). The API surface already existed (`/api/messages/{id}/similar` -> sidecar `/similar/{messageId}`); the missing piece was the UI affordance.
  - `AlexandriaApiClient.SimilarAsync(string messageId, ...)`: new typed-client method posting to the existing API route.
  - `MessageDetail.razor`: `MudButton` "More like this" below the rendered body. On click, it fetches similar messages and renders them in the same `MudList` shape used by `/search`. The seed message itself is filtered out of the results.
  - `reader-guide.md` "More like this" section updated from "Coming in a near-term iteration" to live documentation.
  - `v2-backlog.md` entry struck through with the release pointer.
Notes
The "More like this" affordance now exists at all three layers: `/api/messages/{id}/similar` on the API side, the sidecar's `/similar/{messageId}` route, and this button. The two underlying layers were already wired before this release; only the user-visible button was missing.
[1.19.8] - 2026-04-28
Added
- Closes one v2 backlog item (the `intakes.notes` column, surfaced in iter 45):
  - `db/migrations/Scripts/V009__intake_notes.sql`: adds a nullable `notes text` column to `intakes` so subscription intent travels with the schema.
  - `alexandria intakes add --notes ...`: optional flag at provisioning time.
  - `alexandria intakes set-notes <slug> --notes ...`: edit after the fact. Pass an empty string to clear.
  - `alexandria intakes list --notes`: appends the column on a second line per row when set.
- 3 new CLI smoke tests (41/41 pass; was 38).
- `docs/runbooks/add-intake.md` step 7 now points at the CLI workflow rather than calling out the future column.
- `docs/v2-backlog.md` "Schema additions" section now reads "all previously-listed schema items have shipped", with a strike-through entry pointing at this release.
Notes
This is the first iteration of the loop to close a v2 backlog item rather than add to it. The pattern: the simplest backlog item (one column + two CLI subcommands + 3 tests) is the right target when audit work itself has completed.
[1.19.7] - 2026-04-28
Added
- `docs/v2-backlog.md` consolidates the 11 v2-backlog items that surfaced across iterations 42-54 of this maintenance loop. Each entry links to the originating doc, the iteration that surfaced it, the user-visible cost of leaving it as-is, and a suggested shape for the work. Categories:
  - Operational automation (3): `Alexandria.AuditArchiver`, `Alexandria.Backups` pg_dump worker, corpus snapshot job
  - Schema additions (1): `intakes.notes` column
  - Pipeline observability (2): `ingest_dlq` writer wiring, `Alexandria.Telemetry` custom metrics + events
  - Web UI work (4): filter sidebar, inline operator parsing, curator UI actions, browse-by-publication / browse-by-topic, "More like this"
  - Infrastructure (1): Key Vault references in Bicep parameters
- Doc-route audit: every `/search`, `/publications`, `/saved`, `/me`, `/me/pats`, `/triage`, `/messages/...` reference in the docs maps to a real `@page` directive in `apps/Alexandria.Web/Pages/`. No additional drift found beyond iter 54's `/publications/new` removal.
[1.19.6] - 2026-04-28
Fixed
- `apps/Alexandria.Web/Shared/MainLayout.razor` had a curator NavLink to `/publications/new`, but no Razor component is registered at that route. Curators clicking the menu item would have hit Blazor's 404 page.
  - Iter 52 had already established that the curator UI is read-only and that adds happen via CLI/API.
  - Removed the dead NavLink, leaving an inline comment pointing to the curator-guide for the actual workflow.
- `user_manual/getting-started.md` step 4 said "/search supports filters for publication, topic, and date range". However, iter 53 had just established that the web UI search page has no filter sidebar; filtering is via API query parameters today.
  - The same claim appeared in two places. This release fixes the second one, which now points at the reader-guide's accurate description.
[1.19.5] - 2026-04-28
Fixed
- `user_manual/reader-guide.md` "Operators in the query" claimed Alexandria parsed inline operators like `pub:stratechery topic:ai-and-llms after:2026-01-01` in the search box. No code parses these.
  - `SearchEndpoints.cs` takes `pub`, `topic`, `domain`, `after`, `before` as separate query parameters.
  - The Web UI's `Search.razor` doesn't even pass those parameters from the text input.
  - A user typing `popia pub:sars-bulletin` would have all of that text shipped through to the sidecar as a single query string with no filter applied -- and they'd never know.
- Replaced the operators table with the actual query-parameter syntax and a curl example, plus a v2-backlog note for the filter sidebar UI and inline-operator parsing.
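To make the query-parameter syntax concrete, here is a small Python sketch that builds an `/api/search` URL. List filters are sent as repeated keys, which ASP.NET binds into the `string[]` parameter (`pub=...&pub=...`). The `search_url` helper is hypothetical, and the base URL and bearer token are deliberately omitted:

```python
from __future__ import annotations

from typing import Iterable
from urllib.parse import urlencode


def search_url(q: str, pubs: Iterable[str] = (), topics: Iterable[str] = (),
               domains: Iterable[str] = (), after: str | None = None,
               before: str | None = None) -> str:
    """Build an /api/search path; each list filter becomes a repeated query key."""
    params: list[tuple[str, str]] = [("q", q)]
    params += [("pub", p) for p in pubs]
    params += [("topic", t) for t in topics]
    params += [("domain", d) for d in domains]
    if after:
        params.append(("after", after))
    if before:
        params.append(("before", before))
    return "/api/search?" + urlencode(params)  # spaces become "+", ":" becomes %3A
```

For instance, `search_url("popia", pubs=["sars-bulletin"], after="2026-01-01")` gives `/api/search?q=popia&pub=sars-bulletin&after=2026-01-01`.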
[1.19.4] - 2026-04-28
Fixed
- Largest doc/code drift in this loop: `user_manual/curator-guide.md` described an extensive UI-based curator workflow, none of which exists:
  - Add publication form with Save button
  - Edit publication
  - Validate-against-`_unmatched` dry-run UI
  - Rescan-unmatched button on a publication detail page
  - Toggle-active toggle
  - Open-message link in triage

  The actual UI today (`apps/Alexandria.Web/Pages/Publications.razor`, `Triage.razor`) is read-only: a sortable table with no actions. All curator actions run through the operator CLI (`tools/Alexandria.Cli`) or the HTTP API directly.

  Rewrote the curator-guide to describe the supported CLI + API workflow:
  - "Adding a publication" via `POST /api/publications/`
  - "Editing a publication" via `PUT /api/publications/{slug}`
  - "Deactivating a publication" via `alexandria publications deactivate`
  - "Reclassifying already-classified messages" via `messages list-ids | publications reclassify-ids` (the v1.17+/v1.18+ workflow that wasn't even mentioned in the curator-guide before).
  - Added an "About the UI today" callout at the top so curators set expectations correctly, with a v2 backlog note for moving actions into the UI.

  This is the same claim/reality drift pattern as iters 42-43 (DR plan, audit archive runbook) but at a much larger surface. A curator following the old guide would have hit dead UI elements, with no path forward short of learning the CLI from elsewhere.
[1.19.3] - 2026-04-28
Fixed
- Same drift class as iter 50 (`ingest_dlq`): App Insights custom telemetry signals referenced in the docs are never actually emitted. There are no `TrackEvent` or `TrackMetric` calls anywhere in the codebase; the OpenTelemetry auto-instrumentation only captures HTTP request shape. Affected docs:
  - `user_manual/admin-guide.md` "Reading the dashboards" table listed `customMetrics:alexandria.indexer.files_processed`, `customMetrics:alexandria.lancedb.row_count`, and `customEvents:drain.tick`, none of which exist. Replaced with queries that use the auto-instrumented `requests` / `dependencies` data, plus a v2-backlog note for the custom emission.
  - `docs/runbooks/lancedb-rebuild.md` step 7 told operators to "Watch progress in Application Insights (custom metric `alexandria.indexer.files_processed`)". Rewrote it to point at the structured-log events that DO emit (`index.rebuild.started` / `index.rebuild.completed`).
- `docs/runbooks/lancedb-rebuild.md` step 6 still used the `https://search.alexandria.fullstack.co.za/index/rebuild` URL that the iter 48 admin-guide fix removed (the sidecar is internal-only). Updated to use the new `/api/admin/sidecar/rebuild` proxy added in v1.19.0.
[1.19.2] - 2026-04-28
Fixed
- `docs/operator-handbook.md` step 3 told the on-call to run `SELECT stage, count(*) FROM ingest_dlq WHERE resolved_at IS NULL ...` during a "search returns empty" incident, but no worker actually writes to `ingest_dlq`.
  - The table is provisioned by V004, but the drainer / parser / enricher / indexer don't yet emit failure rows into it.
  - An on-call running the query during a real incident would see zero rows and falsely conclude the indexer is healthy.
- Replaced with the honest fallback (App Insights exceptions + drainer circuit-breaker status) and called out the `ingest_dlq` writer wiring as v2 backlog work.
[1.19.1] - 2026-04-28
Added
- Two integration tests for the v1.19.0 `/api/admin/sidecar/rebuild` route:
  - `Admin_can_trigger_sidecar_rebuild`: happy-path round-trip via the auth pipeline.
  - `Curator_cannot_trigger_rebuild`: auth-gate confirmation.
- The pair mirrors the existing `Admin_can_view_sidecar_health` / `Curator_cannot_view_admin_endpoints` tests, so admin routes are consistently exercised at the HTTP boundary.
[1.19.0] - 2026-04-28
Added
- `POST /api/admin/sidecar/rebuild`: admin-only proxy to the sidecar's internal-only `/index/rebuild` route.
  - The admin-guide claimed `https://search.alexandria.fullstack.co.za/index/rebuild` worked. However, the search-app has `ingressExternal: false` (verified in `main.bicep`), so that hostname doesn't exist.
  - The new admin proxy and the `az containerapp exec` fallback are now both documented as the legitimate manual-rebuild paths.
- `ISearchSidecarClient.RebuildAsync` + `SidecarRebuildResponse` record. `StubSearchSidecarClient` updated to fulfil the contract for tests.
- `SidecarJsonRoundTripTests` extended with a round-trip case for the new response shape, so future drift fails CI.
Fixed
- `user_manual/admin-guide.md` "Forcing a re-classification": replaced the legacy SQL + move-file + rescan procedure with the modern non-destructive `reclassify-ids` workflow added in v1.17.0 + v1.18.0. The legacy approach is kept as a secondary option for unindexed files only.
- `user_manual/curator-guide.md` "When to escalate to admin" said the admin "runs the SQL" for re-classification; it now points at the CLI workflow.
- `docs/operator-handbook.md` was imprecise about API addressability after iter 44's fix.
  - The API has external ingress (`ingressExternal: true` in Bicep) but no friendly hostname; it is reachable at the auto-generated Container Apps URL.
  - Updated the handbook to say so accurately.
Notes
This is a minor bump (.0) because a new HTTP route was added; v1.19.x will continue the post-feature-complete hardening pattern from v1.18.x.
[1.18.21] - 2026-04-28
Added
- `docs/runbooks/github-actions-secrets.md`: new "How to rotate the Entra client id" section, distinct from rotating the Entra secret (which already has its own runbook). It calls out the redirect-URI verification step that the existing scripts already cover.
- "What goes wrong if a secret is missing" now lists the `<env>_ENTRA_CLIENT_ID` failure mode. It would surface "missing required parameter: entraClientId" via the `bicep-params` CI gate added in v1.18.14 and extended in v1.18.15.
Fixed
- Corrected an attribution: postgres-secrets discovery was v1.18.6 (iter 32), not v1.18.5 (iter 31, which fixed only the workflow shape).
[1.18.20] - 2026-04-28
Fixed
- `docs/runbooks/branch-protection.md` listed only the four CI status checks that existed when the runbook was written at v1.7.0: `dotnet`, `python-sidecar`, `secrets-scan`, `version-gate`.
  - Iter 39 added `doc-staleness` and iter 40 added `bicep-params`, but the branch-protection rules were never updated to require them.
  - Anyone reapplying the rules from this runbook would have ended up with a weaker protection set than the repo currently relies on.
  - Updated the prose AND the `gh api -X PUT ... contexts` JSON.
- Refined the `dotnet` line to reflect what that job actually does today: build + format + test, not just "build + test".
- Refined the `python-sidecar` line to mention all three gates (ruff + mypy + pytest), not just lint + test.
[1.18.19] - 2026-04-28
Fixed
- `docs/runbooks/add-intake.md` step 7 told operators to "Document the list of subscriptions in `docs/intakes/<slug>.md`", but `docs/intakes/` doesn't exist and was never created.
  - Replaced with an honest note that subscription intent currently lives outside Alexandria (the curator's wiki / spreadsheet).
  - Added a v2-backlog hint about adding an `intakes.notes` column.
- `docs/runbooks/add-publication.md` "Backfill notes" mentioned only the legacy "move the .md file to `_unmatched` and rescan" technique.
  - Surfaced the modern `messages list-ids | publications reclassify-ids` workflow added in v1.17.0 (with `--dry-run` from v1.18.0) as the primary recommendation.
  - The legacy technique stays as a secondary option for files that haven't been indexed yet.
[1.18.18] - 2026-04-28
Fixed
- `docs/operator-handbook.md` triage section had two errors that would trip up the on-call:
  - Section 1 expected `service: "Alexandria.Api"` from `https://alexandria.fullstack.co.za/health`. That hostname maps to Alexandria.Web, which returns `service: "Alexandria.Web"`. The API isn't externally addressable in this deployment shape.
  - Section 2 expected `lancedbReady` / `embedderReady` (camelCase) from the sidecar `/health`, but Pydantic emits snake_case (`lancedb_ready` / `embedder_ready`) and always has. The keys were written as if they came from the .NET API, where the v1.18.2 wire-format fix would have shown up if anyone had checked. Fixed both keys and added a one-liner explaining the contract pin.
- `docs/runbooks/rotate-entra-secret.md` step 1 suggested `pwsh ./infra/entra/create-app.ps1 -ExistingAppId <APP ID>`, but the script has no such parameter.
  - Anyone running it would get a script error during a rotation.
  - Removed the dead invocation, leaving only the `az ad app credential reset` form (which is correct).
[1.18.17] - 2026-04-28
Fixed
- Operationally critical: `docs/runbooks/disaster-recovery.md` claimed "nightly pg_dump" and `alexandria-corpus-snapshots/` as restore sources, but neither exists.
  - The `alexandria-postgres-backups` blob container is provisioned by Bicep, but no scheduled job writes to it.
  - The `alexandria-corpus-snapshots` container isn't even in Bicep.
  - In an actual disaster, an operator following the runbook would have hit a 404 / empty container at the worst possible moment.
  - Added an "Open work blocking the full RPO/RTO promise" table at the top that lists each restore source with its current status.
  - Retitled the existing sections to carry "(target procedure once backups exist)", so future operators don't follow them assuming the infrastructure is in place.
  - Added two new sections describing today's fallback procedures: "Postgres database lost (no backup available)" and "Corpus volume lost (no snapshot available)". Both are honest about what data is unrecoverable.
[1.18.16] - 2026-04-28
Fixed
- `docs/runbooks/audit-log-archive.md` claimed "Monthly job (automated)", but no scheduled job actually exists. The Bicep template provisions the destination blob container, but the archive script is operator-run today.
- Honesty fix: retitled to "Monthly job (operator-run)" and added a Status section. It calls out the v2-backlog automation work to convert the script into either an `Alexandria.AuditArchiver` worker (Digests pattern) or a scheduled GitHub Actions workflow.
[1.18.15] - 2026-04-28
Fixed
- Both `infra/bicep/staging.parameters.json` and `production.parameters.json` contained `"entraClientId": { "value": "REPLACE_WITH_ENTRA_CLIENT_ID" }`.
  - This is exactly the iter-32 pattern, but in placeholder form: the `bicep-params` CI gate from v1.18.14 only checked that values were provided, not that they were real.
  - As-is, the deploy would have succeeded, but Entra auth would have failed at runtime against the literal placeholder.
  - Moved `entraClientId` out of both parameters files and into the CD workflows' `--parameters`, sourced from secrets (`STAGING_ENTRA_CLIENT_ID`, `PROD_ENTRA_CLIENT_ID`). This matches the postgres-credentials pattern.
- gc-azure-entra requires distinct app registrations per environment. The placeholder strings made it impossible to distinguish staging from production; the new secret names enforce the separation.
Added
- `scripts/check-bicep-params.sh` extended to flag placeholder values (`REPLACE_*`, `CHANGE_ME`, `change-me`, `TODO`, `TBD`, `your-*`, `fill-in-*`) in any parameters JSON file. The CI `bicep-params` gate now blocks both "missing provider" and "placeholder provider" cases.
- `docs/runbooks/github-actions-secrets.md` lists the two new Entra secrets.
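The placeholder check itself lives in bash; the Python sketch below only illustrates the idea. It assumes the standard ARM parameters-file layout (`{"parameters": {"name": {"value": ...}}}`). It also treats each listed pattern as a case-sensitive, whole-value glob, which may differ from the script's actual matching rules. `placeholder_params` is a hypothetical name.

```python
import json
from fnmatch import fnmatchcase

# Patterns from the entry above, treated here as case-sensitive whole-value globs.
PLACEHOLDER_GLOBS = ("REPLACE_*", "CHANGE_ME", "change-me", "TODO", "TBD", "your-*", "fill-in-*")


def placeholder_params(parameters_json: str) -> list[str]:
    """Return the names of parameters whose string value looks like a placeholder."""
    doc = json.loads(parameters_json)
    flagged: list[str] = []
    for name, spec in doc.get("parameters", {}).items():
        value = spec.get("value") if isinstance(spec, dict) else None
        # Non-string values (numbers, objects, Key Vault references) are skipped.
        if isinstance(value, str) and any(fnmatchcase(value, g) for g in PLACEHOLDER_GLOBS):
            flagged.append(name)
    return flagged
```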
[1.18.14] - 2026-04-28
Added
- New `scripts/check-bicep-params.sh`: cross-checks every required (no-default) parameter in `infra/bicep/main.bicep` against the parameters JSON files and the `--parameters X=...` arguments in the CD workflows. It fails when a required param has no provider.
  - This is the structural fix for the iter 32 drift pattern. There, `param postgresUser` / `param postgresPassword` were declared but never provided, so `az deployment group create` would have failed on first run.
- New CI job `bicep-params` invokes the script. Pure bash, no Azure CLI dependency -- runs in seconds.
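A rough Python rendering of the cross-check, for readers who want the logic without reading the bash. It is a simplification under stated assumptions:

- a `param` line without `=` counts as required;
- provider names come from parameters-file keys and from `name=` tokens following `--parameters` on a workflow line;
- all providers are pooled rather than checked per environment.

All function names are hypothetical.

```python
from __future__ import annotations

import json
import re
from typing import Iterable

# A `param` line with no "=" on it has no default, so it is required.
PARAM_RE = re.compile(r"^[ \t]*param[ \t]+([A-Za-z_]\w*)[ \t]+[^=\n]+$", re.MULTILINE)
# Everything after `--parameters` on one workflow line.
CLI_PARAMS_RE = re.compile(r"--parameters[ \t]+([^\n]*)")
# `name=` tokens inside that text, e.g. postgresUser=${{ secrets.X }}.
CLI_NAME_RE = re.compile(r"\b([A-Za-z_]\w*)=")


def required_params(bicep_text: str) -> set[str]:
    return set(PARAM_RE.findall(bicep_text))


def provided_params(parameter_files: Iterable[str], workflows: Iterable[str]) -> set[str]:
    names: set[str] = set()
    for text in parameter_files:
        names.update(json.loads(text).get("parameters", {}))  # adds the dict's keys
    for text in workflows:
        for args in CLI_PARAMS_RE.findall(text):
            names.update(CLI_NAME_RE.findall(args))
    return names


def missing_providers(bicep_text: str, parameter_files: Iterable[str],
                      workflows: Iterable[str]) -> list[str]:
    """Required params with no provider anywhere; a non-empty result fails the gate."""
    return sorted(required_params(bicep_text) - provided_params(parameter_files, workflows))
```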
[1.18.13] - 2026-04-28
Added
- New CI job `doc-staleness`: scans every README and other shipped doc for "Scaffolding only", "Build sprint pending", and "coming in the next sprint" markers, and fails the build when any are found.
  - This is the structural fix for the drift pattern caught in iter 29 (root README) and iter 38 (4 worker READMEs): docs stuck at "v0.1.0 scaffolding" status 17 versions later.
  - The exclusion list covers `docs/adr/_template` (templates can have placeholders) and `CHANGELOG.md` (historical entries about scaffolding releases are valid).
  - It also excludes `user_manual/` (intentional interim state until Playwright auto-capture lands).
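The scan can be sketched as follows. The marker list and exclusions mirror the entry above. Three details are assumptions: case-insensitive matching, the prefix-based exclusion rule, and the `stale_docs` helper itself, which takes already-read file contents keyed by repo-relative path. A CI wrapper would exit non-zero whenever the result is non-empty.

```python
from pathlib import PurePosixPath

MARKERS = ("scaffolding only", "build sprint pending", "coming in the next sprint")
EXCLUDED_PREFIXES = ("docs/adr/_template", "user_manual/")
EXCLUDED_NAMES = frozenset({"CHANGELOG.md"})


def stale_docs(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (path, marker) pairs for every non-excluded doc that contains a marker."""
    hits: list[tuple[str, str]] = []
    for path, text in sorted(files.items()):
        if PurePosixPath(path).name in EXCLUDED_NAMES or path.startswith(EXCLUDED_PREFIXES):
            continue
        lowered = text.lower()  # assumption: markers match case-insensitively
        hits.extend((path, m) for m in MARKERS if m in lowered)
    return hits
```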
[1.18.12] - 2026-04-28
Changed
- 4 worker / app READMEs updated from "Scaffolding only" / "Build sprint pending" to reflect actual implementation state. Same pattern as iter 29 (the root README was stale at v0.1.0): each subsystem's README had drifted on its own copy of the status text. Updated:
  - `apps/Alexandria.Web/README.md`: lists the actual page files and how the BFF forwards Entra tokens.
  - `apps/Alexandria.Api/README.md`: replaced the "planned endpoints" list with the actual `Endpoints/*.cs` mapping; mentions the v1.18.2 round-trip test pin.
  - `apps/Alexandria.Drainer/README.md`: notes the docker-compose default profile + fake-pop3 fixture wiring.
  - `apps/Alexandria.Parser/README.md`: removed the "TBD" note about the link fetcher (split into Alexandria.LinkFetcher in v1.8.0); added a v1.16.0 override-consultation note.
[1.18.11] - 2026-04-28
Added
- CI now runs `dotnet format alexandria.sln --verify-no-changes` between build and test, matching the `ruff check` pattern on the Python side. Future format drift fails fast in CI rather than accumulating quietly.
Fixed
- `libs/Alexandria.Core.Tests/Fixtures/GoldenEmlTests.cs`: 3 `[InlineData]` rows had aligned whitespace (multi-space-separated columns) that `dotnet format` rejects. Reformatted to single spaces.
[1.18.10] - 2026-04-28
Fixed
- Critical CI gap: search-py's `mypy src` step (one of three Python CI gates, alongside ruff and pytest) was failing with 8 errors, but no PR had been blocked by it.
  - The local-dev install path skipped mypy.
  - The gate was only reached on CI runs that landed during low-noise periods.

  Result: every recent PR was technically failing the type-check gate. Fixed:
  - Added `[[tool.mypy.overrides]]` for `frontmatter`, `pyarrow`, `lancedb` (third-party packages with no `py.typed` marker and no library stubs available).
  - `ollama_client._cosine`: cast the result to `float` (`sum()` can return `Any` when operating on a generator over float multiplications).
  - `index_store.message_count`: cast `count_rows()` to `int`.
  - `main._configure_logging`: explicit `BoundLogger` annotation on the local so the return type isn't inferred as `Any`.
  - `main._lifespan`: added the `AsyncIterator[None]` return annotation that strict mypy was demanding for an async context manager.
[1.18.9] - 2026-04-28
Changed
- `README.md`: status block updated from the "v1.18.2" snapshot to "v1.18.x", with the full test totals across both runtimes (299 tests).
  - The local-dev section now accurately describes what compose starts (after iter 33's drainer + parser inclusion) and what stays out (Web/Api on the host for hot-reload).
  - Added a "Common dev tasks" cheatsheet covering the most-used build/test/CLI commands, so newcomers don't have to hunt through runbooks for the first 30 minutes.
[1.18.8] - 2026-04-28
Added
- `.env.example`: documented `SENDGRID_API_KEY` and `ALEXANDRIA_PORTAL_URL`, which the Digests services read at startup.
  - Without these in the example, a dev cloning the repo would not know the digest workers expect them.
  - The workers fall back gracefully when SendGrid is empty. However, without the portal env var, the deep-link URL in digest emails would default to localhost.
[1.18.7] - 2026-04-28
Added
- `docker-compose.yml`: Drainer and Parser services were missing, despite the README claiming `docker compose up` runs "Postgres, Python sidecar, fake POP3 server, drainer, parser".
  - Added them as default (non-profiled) services so the local end-to-end loop (POP3 → drain → parse → index) actually runs out of the box.
  - LinkFetcher remains under the `workers` profile. This is intentional: it's an optional augmentation, not a critical path.
[1.18.6] - 2026-04-28
Fixed
- Critical CD bugs: Bicep main.bicep declares param postgresUser and param postgresPassword (required, no defaults), but neither parameter file nor either CD workflow provided them, so az deployment group create would have failed with "missing required parameter". Both cd-staging.yml and cd-production.yml now pass them via secrets.
- cd-production.yml was missing a setup-dotnet@v4 step before its migrations run (the same bug v1.18.5 fixed in cd-staging).
Added
- docs/runbooks/github-actions-secrets.md: reference for which secrets the deploy pipelines need, why there are three secrets per environment for Postgres (separate Bicep params plus a connection string for migrations), and how to rotate them.
- docs/runbooks/README.md: turned the "Planned runbooks" list into linked references to the actual files (every listed runbook already exists).
[1.18.5] - 2026-04-28
Fixed
- Critical CD bug: cd-staging.yml was passing --build-arg PUBLISH_DIR=... to Dockerfile.dotnet, but the Dockerfile expected PROJECT=... instead. The docker build would have run dotnet publish with an empty PROJECT and failed. The bug was invisible because the staging deploy hadn't fired since these workflows landed. Local development was unaffected because docker-compose.yml passes PROJECT correctly.
- Dockerfile.dotnet was using mcr.microsoft.com/dotnet/sdk:9.0 for the build stage even though the project targets net8.0. Now pinned to sdk:8.0 to match iteration 30's CI fix and avoid SDK version drift.
- The cd-staging.yml deploy job ran migrations with dotnet run but never set up the .NET SDK, so it would have failed at runtime. Added a setup-dotnet@v4 step.
Changed
- cd-staging.yml build-images job: removed the redundant host-side dotnet publish steps now that the Dockerfile build stage handles publishing. The single SDK pin in the Dockerfile is the source of truth; the workflow no longer duplicates it.
[1.18.4] - 2026-04-28
Fixed
- CI workflows pinned to .NET 8.0.x to match the project's <TargetFramework>net8.0</TargetFramework>. They previously installed the .NET 9.0.x SDK; newer SDKs can build older target frameworks, so builds worked, but the drift was a hygiene risk for anyone reading the workflows to deduce the target framework. Affected: .github/workflows/ci.yml, cd-staging.yml, docs.yml.
Changed
- Search-py: ran ruff check . --fix across the suite. Auto-fixed 40 issues (mostly from datetime import UTC modernisation and removal of unused fixture variables) and hand-fixed the remaining 5: a duplicate compound assertion in test_cli_contract.py, unused locals in test_chunker.py / test_rescan_unmatched.py, and a function name with an uppercase _AND that violated PEP 8. 57/57 pytest tests still pass.
[1.18.3] - 2026-04-28
Added
- CHANGELOG.md capturing the v0.1.0 → v1.18.3 history. The repo had 18 tagged releases but no changelog, forcing operators to read commit messages.
Changed
- README.md Status section: reset from "v0.1.0 — pre-build scaffolding. No working code yet." (out of date by 17 versions) to reflect the v1.18.2 reality (49/49 Api, 57/57 search-py and 38/38 CLI tests passing; all 19 ADR decisions implemented).
[1.18.2] - 2026-04-28
Fixed
- Critical: SearchSidecarClient (Alexandria.Api) used JsonSerializerDefaults.Web (camelCase), but the Python sidecar emits snake_case. Every field whose JSON key contained an underscore silently deserialized to its default value: search hits had empty messageId/subject/sentAt, /api/health reported lancedbReady=false even when LanceDB was healthy, and the SentAfter/SentBefore filters were serialized as sentAfter/sentBefore (which Pydantic ignored), so date-range filtering was a no-op. Fix: PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower.
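The failure mode is easy to miss because System.Text.Json does not complain when a property has no matching JSON key; it just leaves the default. The Python sketch below reproduces that behaviour with a hypothetical bind() helper and two naming policies. The property names come from the entry above; the snake_case conversion is a simplified assumption and not the exact JsonNamingPolicy.SnakeCaseLower algorithm.

```python
import json
import re
from typing import Any, Callable

# Hypothetical C#-style property names on a search-hit record, with their defaults.
PROPERTIES: dict[str, Any] = {"MessageId": "", "Subject": "", "SentAt": ""}


def camel_case(prop: str) -> str:
    # "MessageId" -> "messageId", roughly what JsonSerializerDefaults.Web expects.
    return prop[0].lower() + prop[1:]


def snake_case_lower(prop: str) -> str:
    # "MessageId" -> "message_id"; simplified (the real policy also handles acronyms).
    return re.sub(r"(?<!^)(?=[A-Z])", "_", prop).lower()


def bind(payload: str, policy: Callable[[str], str]) -> dict[str, Any]:
    # A property whose policy-mapped key is absent silently keeps its default.
    data = json.loads(payload)
    return {prop: data.get(policy(prop), default) for prop, default in PROPERTIES.items()}


sidecar_json = '{"message_id": "<a1@example.com>", "subject": "Weekly", "sent_at": "2026-04-28T09:00:00Z"}'

print(bind(sidecar_json, camel_case))
# {'MessageId': '', 'Subject': 'Weekly', 'SentAt': ''}
print(bind(sidecar_json, snake_case_lower))
# {'MessageId': '<a1@example.com>', 'Subject': 'Weekly', 'SentAt': '2026-04-28T09:00:00Z'}
```

Only the single-word key survives the camelCase lookup, which is why the bug looked like "some fields are empty" rather than a hard deserialization error.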
Added
- SidecarJsonRoundTripTests: feeds actual sidecar JSON shapes into JsonSerializer.Deserialize and asserts every field populates. Pins the wire format so future drift fails CI.
[1.18.1] - 2026-04-28
Added
- CLI ↔ sidecar contract tests covering all five routes the operator CLI calls (GET /messages, GET /health, POST /index/upsert, POST /index/rebuild, POST /index/rescan-unmatched). They catch "ghost commit" failures (where a commit message claims a code change that did not actually land) by asserting that the route exists, the parameters are accepted, and the response shape carries the keys the CLI binds to.
Fixed
- test_walk_skips_raw_subtree: the brittle '_raw' not in path.as_posix() check matched the pytest temp dir name (pytest names tmp_path after the test, so the directory itself contains "_raw"). Now asserts on the path relative to corpus_root.
- test_search_stub_responds: created TestClient without a with block, so the FastAPI lifespan did not run and app.state.searcher was never set. Now uses TestClient as a context manager and stubs the searcher.
- test_apply_filters_date_range_is_inclusive_at_lower_bound: the assertion contradicted both the test name and the implementation's actual behavior (the boundary row IS kept).
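A minimal sketch of why the substring check was brittle and what "relative to corpus_root" fixes. The helper names and the temp path are hypothetical; the assumption is that the fixed test compares path components below corpus_root rather than scanning the whole absolute path.

```python
from pathlib import PurePosixPath


def brittle_is_raw(path: PurePosixPath) -> bool:
    # Original idea: substring match over the whole absolute path.
    return "_raw" in path.as_posix()


def is_raw(path: PurePosixPath, corpus_root: PurePosixPath) -> bool:
    # Fixed idea: only inspect the components below corpus_root.
    return "_raw" in path.relative_to(corpus_root).parts


# Illustrative tmp_path: the directory is named after the test, which contains "_raw".
corpus_root = PurePosixPath("/tmp/pytest-0/test_walk_skips_raw_subtree0/corpus")
parsed = corpus_root / "2026" / "04" / "msg.md"
cached = corpus_root / "_raw" / "msg.eml"
```

With these paths, brittle_is_raw(parsed) is a false positive, while is_raw(parsed, corpus_root) is False and is_raw(cached, corpus_root) is True.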
[1.18.0] - 2026-04-28
Added
- GET /messages enumeration endpoint on the sidecar (the iteration 25 commit that claimed this had silently failed; the CLI's messages list-ids command was calling a route that did not exist). Filters by publication, subject_like, sent_after and sent_before; pages with limit/offset; sorts by sent_at desc.
- external_id column on the LanceDB messages table (nullable for legacy rows). The indexer populates it from CorpusMessage.external_id; the new /messages route reads it from LanceDB on the hot path, falling back to a frontmatter disk read for rows persisted before v1.17.0. Avoids a forced quarterly-drill rebuild.
- --dry-run flag for alexandria publications reclassify-ids. Previews insert/update/no-op counts plus a sample of affected external_ids by joining against the existing message_publication_overrides table.
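The filter, sort and paging behaviour of GET /messages can be sketched in memory as below. This is not the sidecar's code: list_messages, the row layout and read_external_id are hypothetical, subject_like is assumed to be a case-insensitive substring match, and the date window is assumed inclusive at the lower bound and exclusive at the upper bound.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Optional

Message = dict[str, Any]


def list_messages(
    rows: list[Message],
    *,
    publication: Optional[str] = None,
    subject_like: Optional[str] = None,
    sent_after: Optional[datetime] = None,
    sent_before: Optional[datetime] = None,
    limit: int = 50,
    offset: int = 0,
    read_external_id: Callable[[Message], Optional[str]] = lambda row: None,
) -> list[Message]:
    """Filter, sort by sent_at desc, then page an in-memory list of message rows."""
    selected = []
    for row in rows:
        if publication is not None and row["publication"] != publication:
            continue
        # Assumption: case-insensitive substring match.
        if subject_like is not None and subject_like.lower() not in row["subject"].lower():
            continue
        # Assumption: inclusive lower bound, exclusive upper bound.
        if sent_after is not None and row["sent_at"] < sent_after:
            continue
        if sent_before is not None and row["sent_at"] >= sent_before:
            continue
        selected.append(row)
    selected.sort(key=lambda row: row["sent_at"], reverse=True)
    page = selected[offset : offset + limit]
    # Hot path: use the stored external_id; legacy rows (None) use the slower fallback.
    return [{**row, "external_id": row.get("external_id") or read_external_id(row)} for row in page]


UTC = timezone.utc
rows = [
    {"id": "a", "publication": "weekly", "subject": "Markets wrap", "sent_at": datetime(2026, 4, 1, tzinfo=UTC), "external_id": "<a@x>"},
    {"id": "b", "publication": "weekly", "subject": "Tech wrap", "sent_at": datetime(2026, 4, 8, tzinfo=UTC), "external_id": None},
    {"id": "c", "publication": "daily", "subject": "Markets open", "sent_at": datetime(2026, 4, 9, tzinfo=UTC), "external_id": "<c@x>"},
]
weekly = list_messages(rows, publication="weekly", read_external_id=lambda r: f"<{r['id']}@disk>")
print([r["id"] for r in weekly])  # ['b', 'a']
```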
Fixed
- CLI MessagesCommand deserialization: JsonSerializerDefaults.Web produced camelCase, but the Pydantic sidecar emits snake_case. Setting PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower makes the C# records bind correctly.
[1.17.0] - 2026-04-27
Added
- Operator CLI: messages list-ids, publications reclassify-ids. Curator workflow for reclassifying a specific list of messages by RFC 2822 Message-ID: pipe messages list-ids output directly into publications reclassify-ids. (NB: the GET /messages route this depends on did not actually ship in v1.17.0; v1.18.0 repaired the gap.)
[1.16.0] - 2026-04-27
Added
- Parser consults message_publication_overrides (V008) before falling back to publications.match_rules. Manual curator decisions win over auto-classification.
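The precedence rule is small enough to state as code. In this sketch the overrides mapping is an in-memory stand-in for the message_publication_overrides table, keyed by Message-ID (an assumption), and match_rules is any callable standing in for publications.match_rules. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class ParsedMessage:
    external_id: str  # RFC 2822 Message-ID
    subject: str


def resolve_publication(
    message: ParsedMessage,
    overrides: Mapping[str, str],
    match_rules: Callable[[ParsedMessage], Optional[str]],
) -> Optional[str]:
    # 1. A manual curator decision for this Message-ID always wins.
    override = overrides.get(message.external_id)
    if override is not None:
        return override
    # 2. Otherwise fall back to automatic rule-based classification.
    return match_rules(message)
```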
[1.15.0] - 2026-04-27
Added
- V008 migration: first-class message_publication_overrides table.
- Sidecar /index/rescan-unmatched: walks /corpus/_unmatched/ and forces a re-pass through the indexer. The CLI's publications rescan <slug> calls this.
[1.14.0] - 2026-04-27
Added
- Operator CLI: publications rescan, publications reclassify (subject-pattern scope), intakes add, intakes deactivate.
[1.13.0] - 2026-04-26
Added
- Operator handbook (docs/runbooks/operator-handbook.md).
- PAT pepper-binding regression test.
- CLI smoke-test harness.
[1.12.0] - 2026-04-26
Added
- alexandria operator CLI tool (tools/Alexandria.Cli) with the intakes, publications, messages, pats, sidecar and audit subcommands.
Fixed
- "Captured options" anti-pattern: IOptions<T> values read synchronously in Program.cs before app.Build() got stale snapshots, so config that test factories added in ConfigureAppConfiguration was never applied. Everything now reads via IOptions<T> inside factory delegates.
[1.11.0] - 2026-04-26
Added
- PAT bearer end-to-end tests.
- Workers (parser, link-fetcher) in docker-compose.yml.
- Sidecar negative-path tests.
[1.10.0] - 2026-04-26
Added
- Link-fetcher full-loop test.
- Sidecar contract tests (structural).
- Security policy doc (SECURITY.md).
[1.9.0] - 2026-04-26
Added
- Link-fetcher in Bicep infrastructure.
- LanceDB e2e tests.
- Brand DOCX template for client deliverables.
[1.8.0] - 2026-04-26
Added
- Link-fetcher worker service (out-of-band content fetching, all domains).
- Playwright spec for the screenshot path.
[1.7.0] - 2026-04-26
Added
- User manual (user_manual/ in DOCX/MD/PDF, branded).
- CODEOWNERS, branch protection runbook, PR template.
[1.6.0] - 2026-04-26
Added
- Project glossary (docs/glossary.md).
- Credential rotation log.
- Link-pipeline + image-enricher HTTP tests.
[1.5.0] - 2026-04-26
Added
- Playwright tests for new Blazor pages.
- BenchmarkDotNet harness for parser perf.
- DocFX API reference site.
[1.4.0] - 2026-04-26
Added
- API auth-scheme tests.
- Rate-limit exhaustion test.
- Authenticated happy-path tests.
[1.3.0] - 2026-04-26
Added
- Saved-search digest tests.
- API integration tests.
- Two more EML fixtures.
[1.2.0] - 2026-04-26
Added
- Cmd+K palette in the Blazor UI.
- Saved-search digest worker.
- Sidecar indexer + searcher tests.
[1.1.0] - 2026-04-26
Added
- Saved searches, triage queue, message detail page.
- Rate limiting on /api/search.
[1.0.0] - 2026-04-26
Added
- Link-fetcher + image enricher (kimi-k2.5:cloud OCR per ADR 0018).
- LanceDB rebuild integration tests.
- Playwright UI spec.
[0.9.0] - 2026-04-26
Added
- Real LanceDB hybrid search (BM25 + vector + RRF, k=60 per ADR 0006).
- Indexer (chunks → embeddings → upsert).
- Ollama client (embed + rerank, with cosine fallback).
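Reciprocal Rank Fusion merges the BM25 and vector result lists using only ranks: each document scores the sum of 1 / (k + rank) over the lists it appears in, with k = 60 as in ADR 0006. The sketch below is the generic formula with hypothetical names, not LanceDB's internal implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse best-first rankings: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
print([doc for doc, _ in reciprocal_rank_fusion([bm25_hits, vector_hits])])
# ['b', 'c', 'a', 'd']
```

A large k flattens the gap between adjacent ranks, so a document that both retrievers place moderately high (b, c) beats one that only a single retriever ranks first (a).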
[0.8.0] - 2026-04-26
Added
- Bicep infrastructure (Container Apps, Postgres, Blob NFS v3 in South Africa North per ADR 0009).
- Golden EML test fixtures.
- OpenTelemetry instrumentation across the stack.
[0.7.0] - 2026-04-26
Added
- Blazor Server + MudBlazor UI shell.
- GitHub Actions CI/CD pipeline.
- Operational runbooks (LanceDB rebuild, PAT rotation, disaster recovery, etc.).
[0.6.0] - 2026-04-26
Added
- Alexandria.Auth library: Entra OIDC + PAT bearer authentication, role policies.
[0.5.0] - 2026-04-26
Added
- Drainer worker (POP3 → _raw/ EML cache via MailKit).
- Parser worker (_raw/ → corpus markdown via the v0.4.0 ingest pipeline).
- Entra app-registration scripts.
[0.4.0] - 2026-04-26
Added
- Full EML → Markdown ingest pipeline: AngleSharp HTML pre-cleaner, ReverseMarkdown.NET, YamlDotNet frontmatter writer with Iso8601DateTimeOffsetConverter, deterministic ULID from external_id.
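The entry does not say how the ULID is derived from external_id. The sketch below is one hypothetical scheme: a 48-bit millisecond timestamp taken from the message's sent time, plus 80 bits taken from a SHA-256 of external_id in place of ULID's random component, encoded in ULID's 26-character Crockford base32 form. Re-ingesting the same message then always yields the same id.

```python
import hashlib
from datetime import datetime, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # ULID alphabet (no I, L, O, U)


def deterministic_ulid(external_id: str, sent_at: datetime) -> str:
    """Hypothetical scheme: timestamp from sent_at, 'randomness' from a hash of external_id."""
    timestamp_ms = int(sent_at.timestamp() * 1000)
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp out of ULID range")
    digest = hashlib.sha256(external_id.encode("utf-8")).digest()
    entropy = int.from_bytes(digest[:10], "big")  # first 80 bits of the hash
    value = (timestamp_ms << 80) | entropy  # 128-bit ULID value
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits; the top 2 are always zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


sent = datetime(2026, 4, 26, 9, 30, tzinfo=timezone.utc)
print(deterministic_ulid("<abc123@example.com>", sent))
```

Keeping the timestamp in the high bits preserves ULID's property that ids sort by time, while the hashed tail makes the id reproducible instead of random.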
[0.3.0] - 2026-04-26
Added
- Publication matcher (FromAddresses/ListIds/SubjectPatterns rules).
- HTML pre-cleaner.
- HTML → Markdown conversion.
- Link extractor.
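A minimal sketch of a publication matcher over the three rule kinds named above. Assumptions, none confirmed by the entry: any single matching rule assigns the publication, publications are checked in order and the first match wins, sender addresses compare case-insensitively, and subject patterns are case-insensitive regular expressions. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Publication:
    slug: str
    from_addresses: list[str] = field(default_factory=list)
    list_ids: list[str] = field(default_factory=list)
    subject_patterns: list[str] = field(default_factory=list)  # regexes


@dataclass
class Envelope:
    from_address: str
    list_id: Optional[str]
    subject: str


def match_publication(env: Envelope, publications: Sequence[Publication]) -> Optional[str]:
    sender = env.from_address.strip().lower()
    for pub in publications:  # first matching publication wins
        if sender in (addr.lower() for addr in pub.from_addresses):
            return pub.slug
        if env.list_id is not None and env.list_id in pub.list_ids:
            return pub.slug
        if any(re.search(p, env.subject, re.IGNORECASE) for p in pub.subject_patterns):
            return pub.slug
    return None  # unmatched: a candidate for the _unmatched/ bucket
```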
[0.2.0] - 2026-04-26
Added
- Initial scaffold: .NET solution, frontmatter library, DbUp migrations V001–V003, docker-compose stack, Python sidecar skeleton.
[0.1.0] - 2026-04-26
Added
- 19 ADRs documenting the architecture decisions agreed during the §13 / Gap interview.