Changelog

All notable changes to Alexandria. The version corresponds to the VERSION file at the repo root and to the matching git tag (v<version>). Format follows Keep a Changelog and the project's SemVer Bump Policy.

The structure mirrors the iteration log of the Ralph loop that built the system: each release is a single iteration with a coherent theme. Earlier releases (v0.x.0) were foundational scaffolding; v1.x.0 added user-facing features; v1.18.x is the contract-test sweep that exposed and fixed several silent JSON-naming bugs across the CLI/API/sidecar.

[1.19.41] - 2026-04-28

Added

Version + release notes in the UI. The MainLayout app bar now carries a clickable v<version> chip that navigates to a new /release-notes page; the nav drawer also has a "Release notes (v<version>)" link under Account. The page renders the bundled CHANGELOG.md via Markdig.
BuildInfo singleton (apps/Alexandria.Web/BuildInfo.cs) reads the version from the assembly's InformationalVersion (set by Directory.Build.props from the repo's VERSION file at build time) and the changelog from <app>/CHANGELOG.md.

Changed

Directory.Build.props now sets Version, AssemblyVersion, FileVersion, and InformationalVersion centrally from the repo's VERSION file. Every Alexandria assembly carries the matching version metadata; no per-project bookkeeping.
Alexandria.Web.csproj packages CHANGELOG.md and VERSION next to the executable at publish time so the runtime renderer can load them inside the container without bind-mounting the repo.

[1.19.40] - 2026-04-28

Fixed

Blazor circuit terminated with HttpRequestException 401 on every page load after sign-in. Two compounding root causes:
1. The API's Alexandria.Auth JwtBearer reads its config from the Entra section (Entra:TenantId, Entra:ClientId), but the FSDocker compose plumbed the env vars under AzureAd__*. The validator ran with empty TenantId + ClientId; every Bearer token failed audience+issuer validation -> 401. Renamed the API's AzureAd__TenantId / AzureAd__ClientId env vars to Entra__TenantId / Entra__ClientId to match the binding.
2. AlexandriaApiClient methods that returned non-nullable collections (ListPublicationsAsync, ListPatsAsync, ListSavedSearchesAsync) used GetFromJsonAsync which throws on non-success. The Search page calls ListPublicationsAsync on init, so the 401 from the API propagated as an unhandled exception into the Blazor circuit. All three methods now catch HttpRequestException and return empty collections, so a transient 401/timeout degrades to an empty UI instead of terminating the SignalR circuit.
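A minimal Python sketch of root cause 1. It assumes the standard ASP.NET Core environment-variable provider behavior: a double underscore in a variable name becomes the ":" section separator, and configuration keys compare case-insensitively. The helper names are hypothetical; only the key names come from this release.

```python
def env_to_config_keys(environ: dict) -> dict:
    # "__" in an env var name becomes the ":" section separator.
    return {name.replace("__", ":"): value for name, value in environ.items()}


def lookup(config: dict, key: str) -> str:
    # .NET configuration keys compare case-insensitively.
    for name, value in config.items():
        if name.lower() == key.lower():
            return value
    return ""  # missing key: the bound option stays empty


compose_before = {"AzureAd__TenantId": "tenant-guid", "AzureAd__ClientId": "client-guid"}
compose_after = {"Entra__TenantId": "tenant-guid", "Entra__ClientId": "client-guid"}

print(repr(lookup(env_to_config_keys(compose_before), "Entra:TenantId")))  # ''
print(repr(lookup(env_to_config_keys(compose_after), "Entra:TenantId")))   # 'tenant-guid'
```

The old names bound to AzureAd:TenantId, a section the JwtBearer setup never reads. That is why renaming the env vars was enough and no code change was needed.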

Notes

First-time signed-in users may still see a Microsoft consent prompt for Search.Read if admin-consent didn't apply at app-creation time. Acceptable / expected; subsequent sign-ins are silent.

[1.19.39] - 2026-04-28

Fixed

OIDC sign-in returned HTTP 400 at /signin-oidc. Two root causes, both fixed in apps/Alexandria.Web/Program.cs:
1. The downstream-API scope was hardcoded to the placeholder api://placeholder/Search.Read, which Entra rejected with AADSTS500011: invalid_resource. Now read from AzureAd:DownstreamApiScopes (comma-separated env var); empty value disables token acquisition entirely so first-time / no-API deployments can still sign in cleanly.
2. ASP.NET Core didn't trust nginx's X-Forwarded-Proto: https, so it built redirect_uri = http://... which Entra correctly rejected (registered URI is HTTPS). Added ForwardedHeadersOptions (XForwardedFor + Proto + Host with KnownNetworks/KnownProxies cleared) and app.UseForwardedHeaders() as the very first middleware.
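The two OIDC fixes can be sketched in Python as below. This illustrates the behavior and is not the Program.cs code. The function names are hypothetical, web:8080 is a placeholder internal address, and the assumption is that nginx also sends X-Forwarded-Host.

```python
from typing import Dict, List, Optional


def parse_downstream_scopes(raw: Optional[str]) -> List[str]:
    # Comma-separated list; blank entries are dropped.
    # An empty result means: skip token acquisition entirely.
    if not raw:
        return []
    return [s.strip() for s in raw.split(",") if s.strip()]


def build_redirect_uri(scheme: str, host: str, headers: Dict[str, str],
                       trust_forwarded: bool) -> str:
    if trust_forwarded:
        # Mimics UseForwardedHeaders(): the proxy's view of the request
        # replaces the scheme/host the app saw on the plain-HTTP hop.
        scheme = headers.get("X-Forwarded-Proto", scheme)
        host = headers.get("X-Forwarded-Host", host)
    return f"{scheme}://{host}/signin-oidc"


proxy_headers = {"X-Forwarded-Proto": "https", "X-Forwarded-Host": "alexandria.fullstack.co.za"}
print(build_redirect_uri("http", "web:8080", proxy_headers, trust_forwarded=False))
print(build_redirect_uri("http", "web:8080", proxy_headers, trust_forwarded=True))
```

Without trust, the app only sees the plain-HTTP hop from nginx. The resulting http:// redirect_uri does not match the HTTPS URI registered in Entra.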
OIDC failure no longer returns HTTP 500. New OnRemoteFailure handler captures the OpenIdConnectProtocolException and redirects to /?signinError=<message> so users land on a clean page rather than a stack trace.
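The following sketches the redirect target. It assumes the handler percent-encodes the exception message before placing it in the query string; the function name is hypothetical.

```python
from urllib.parse import quote


def signin_error_redirect(message: str) -> str:
    # safe="" percent-encodes "/", ":", "&" and "=" as well, so the
    # message stays a single query-string value.
    return "/?signinError=" + quote(message, safe="")


print(signin_error_redirect("AADSTS500011: invalid_resource"))
```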

Added

Entra app config (applied via az ad app update + az rest PATCH against Microsoft Graph; not in source code, but recorded for the audit trail):
- Identifier URI: api://9d4c9c98-3ee3-4314-aab8-a2ce9da43100.
- Exposed delegated scope Search.Read (id 189f7e38-df57-4cb6-96de-24941ec335de).
- Added Search.Read as a requiredResourceAccess on the same app so it's allowed to request its own scope (admin-consented).
infra/docker/docker-compose.fsdocker.yml -- new env AzureAd__DownstreamApiScopes plumbed from ${ENTRA_DOWNSTREAM_API_SCOPES}. The on-host .env was updated to api://9d4c9c98-3ee3-4314-aab8-a2ce9da43100/Search.Read so the Web -> API call carries a valid bearer token after sign-in.

Notes

The DataProtection key ring is still ephemeral inside the Web container (/root/.aspnet/DataProtection-Keys). A Web container restart therefore invalidates any in-flight sign-in (the user has to start over) as well as existing authentication cookies (signed-in users must sign in again). For Phase 2 hardening, mount a named volume for the keys, or switch to PersistKeysToFileSystem against alexandria-corpus -- left as a follow-up since first sign-ins work today.

[1.19.38] - 2026-04-28

Operational (no code changes shipped)

POP3 password populated in /opt/apps/alexandria/.env on FSDocker. Drainer recreated; "Application started" confirmed in logs (no more restart loop on auth failure).
Internal DNS A record added on FSPRIAD01 (zone fullstack.co.za): alexandria -> 192.168.0.148. On-LAN users now hit FSDocker directly without going out via the Fortigate WAN. Verified that Resolve-DnsName alexandria.fullstack.co.za against DNS server 192.168.0.21 returns 192.168.0.148, and that Invoke-WebRequest https://alexandria.fullstack.co.za/health from the LAN returns HTTP 200.
BOK log appended with both follow-up actions; secret values redacted per gc-secrets-policy.

[1.19.37] - 2026-04-28

Added

TLS bootstrap pipeline for first-cert-issuance on FSDocker:
- infra/nginx/alexandria.fullstack.co.za.bootstrap.conf -- HTTP-only stub site (no TLS block) so nginx -t doesn't fail before the cert exists.
- scripts/setup-alexandria-tls.sh -- one-shot: install bootstrap site, preflight ACME path via FSDocker (using --resolve to bypass the host's internal DNS), run certbot certonly --webroot, swap to the production site config, reload nginx. Idempotent.

Fixed

Dockerfile.dotnet was on sdk:8.0 but global.json pins 9.0.0 with rollForward: latestFeature. Bumped build stage to mcr.microsoft.com/dotnet/sdk:9.0 (target framework stays net8.0; 9.0 SDK cross-builds net8.0 fine).
Dockerfile.dotnet entrypoint picked the alphabetically-first .dll (which was Alexandria.Core.dll -- a shared lib without a runtimeconfig.json); now picks the host DLL by matching *.runtimeconfig.json stem.
aspnet:8.0 runtime image ships with neither curl nor wget; added apt-get install -y curl in the runtime stage so the compose healthchecks resolve.
Compose port for Web changed to 5210 (was 5200, which collided with another FSDocker container during the deploy window). Nginx upstream updated to match.
scripts/deploy-fsdocker.sh now passes --env-file /opt/apps/alexandria/.env explicitly. Default discovery looks for .env next to the compose file (infra/docker/), and no .env exists there on the host.
scripts/deploy-fsdocker.sh rsync replaced with a tar-pipe so the script runs on machines without rsync (e.g. Git Bash on Windows).
scripts/setup-alexandria-tls.sh DNS check uses nslookup instead of dig (dig is not in Git Bash on Windows).
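The entrypoint fix in Dockerfile.dotnet can be illustrated with a short Python rendering of the selection rule. The real entrypoint is not shown here, and find_host_dll is a hypothetical name. The rule is the one described above: take the DLL whose stem matches a *.runtimeconfig.json file instead of the alphabetically first .dll.

```python
from pathlib import Path

SUFFIX = ".runtimeconfig.json"


def find_host_dll(app_dir: Path) -> Path:
    # Executable projects get a runtimeconfig.json at publish; class libraries
    # such as Alexandria.Core.dll do not, so they are never selected here.
    for config in sorted(app_dir.glob("*" + SUFFIX)):
        stem = config.name[: -len(SUFFIX)]  # Path.stem would keep ".runtimeconfig"
        dll = app_dir / f"{stem}.dll"
        if dll.exists():
            return dll
    raise FileNotFoundError(f"no {SUFFIX} with a matching .dll in {app_dir}")
```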

Notes

Production live at https://alexandria.fullstack.co.za as of this release. Validated via public Fortigate WAN IP 105.233.33.156: HTTPS 200 from Web, HTTP -> HTTPS 301 redirect, HSTS + security headers, /api/me returns 401 unauthenticated (correct -- Entra sign-in required).

The Drainer container is in a restart loop until POP3_LIBRARY_PASSWORD is populated in /opt/apps/alexandria/.env. Web/API/Search/Parser/Postgres are unaffected -- the rest of the stack is fully functional and a curator can sign in, browse, and search what's already in the corpus.

[1.19.36] - 2026-04-28

Added

FSDocker production deployment artifacts. Replaces (in practice) the original Azure Container Apps target for production:
- infra/docker/docker-compose.fsdocker.yml -- new compose overlay with Web + API + search-py + drainer + parser + postgres. Drops the in-stack Ollama (uses Coruscant AI at 192.168.0.98:11434 per gc-ollama). Host port allocations (Web 5200, API 5201, search-py 8089, postgres 5444) confirmed free on FSDocker.
- infra/docker/.env.fsdocker.example -- template for the on-host .env; documents every env var the stack consumes. Real .env is hand-created at /opt/apps/alexandria/.env per gc-fsdocker (never synced from local).
- infra/nginx/alexandria.fullstack.co.za.conf -- host nginx site with the existing tempus pattern (HTTP->HTTPS, /var/www/certbot ACME, TLS upstreams, /api/ routing) plus the WebSocket upgrade headers Blazor Server's SignalR circuit needs.
- scripts/deploy-fsdocker.sh -- per-deploy script: rsync the repo (excluding .git/.env/build artefacts), docker compose down && up -d --build, tail logs.
- docs/runbooks/deploy-fsdocker.md -- first-time and routine deploy procedure plus rollback + troubleshooting.
Dedicated Alexandria Entra app created in tenant 03c2517f-13ef-45ff-8cb9-e8b0043e4cb2. App id 9d4c9c98-3ee3-4314-aab8-a2ce9da43100. Redirect URI https://alexandria.fullstack.co.za/signin-oidc, logout URL https://alexandria.fullstack.co.za/signout-callback-oidc, Microsoft Graph User.Read permission added. Per gc-azure-entra: no shared use of the Scorecards app.
Route 53 A record for alexandria.fullstack.co.za -> 105.233.33.156 (Fortigate WAN, matches the rest of the *.fullstack.co.za estate). Hosted zone Z8SBXEBV5VOB2, TTL 300. Confirmed propagated to Google + Cloudflare resolvers.

Changed

.gitignore extended to allow .env.fsdocker.example (and any future *.env.example template) while still excluding actual .env files.
BOK_IMMUTABLE_INPUT_LOG.md appended with the FSDocker deployment session entry. Inputs include the AWS STS session credentials provided for the R53 update; values redacted.

Notes

This is the first production deployment target for Alexandria. The Bicep templates in infra/bicep/ remain valid for any future Azure Container Apps deployment but are no longer the canonical production path. The §11 quarterly LanceDB rebuild drill obligation transfers cleanly: the procedure in docs/runbooks/lancedb-rebuild.md works against any compose stack and just needs the alexandria-lancedb volume to be deleted + the search-py container restarted to trigger the rebuild from /corpus/.

[1.19.35] - 2026-04-28

Added

scripts/bump-version.py -- closes the gc-semver "(script to be added in infra sprint)" gap noted in CLAUDE.md. Reads VERSION, bumps the requested SemVer part (--patch|--minor|--major), writes VERSION, stages it (plus CHANGELOG.md if modified), commits chore(release): v<version>, and creates the matching annotated tag. --no-commit performs a dry write: VERSION is updated but nothing is committed. README + CLAUDE.md updated to reference the script directly instead of the manual workflow.
Phase 3 of the Key Vault refactor -- the last v2-backlog follow-up. Native az.getSecret(...) references in infra/bicep/staging.keyvault.bicepparam and infra/bicep/production.keyvault.bicepparam; ARM resolves the secret values server-side at deploy time, so they never transit the local shell or a CI runner. Helper script scripts/deploy-with-keyvault-refs.sh <env> discovers the KV via az keyvault list, pre-flights that the four required secrets exist, runs what-if, and prompts for confirmation before deploying. Runbook + v2-backlog closure record updated.
Validate-against-_unmatched dry-run preview UI -- the v3 feature originally deferred from v1. New MatchPreviewService walks /corpus/_unmatched/, parses each file's frontmatter, and runs Alexandria.Core.Matching.PublicationMatcher against either the publication's stored rules or candidate rules from the unsaved edit form. Exposed via curator-scoped POST /api/publications/{slug}/match-preview. New Preview rescan button on /publications/{slug} shows would_match / total_unmatched plus a sample of up to 10 matched messages with the rule that fired. Nothing is written -- the existing Rescan button still does the actual work.
5 new MatchPreviewServiceTests cover candidate-rules counting, empty-_unmatched, no-rules-no-candidate, sample cap, and malformed-frontmatter graceful skip.
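The preview flow can be sketched in Python under loose assumptions. Corpus files are taken to be *.md with a simple key: value frontmatter block, and a rule is any callable that returns a description of itself when it fires. PublicationMatcher's real rule types are not reproduced, and from_domain, match_preview and the field names "from"/"subject" are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

Meta = Dict[str, str]
Rule = Callable[[Meta], Optional[str]]  # returns a rule description when it fires


def parse_frontmatter(text: str) -> Optional[Meta]:
    # Minimal "---" delimited key: value block; None means malformed.
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    meta: Meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if not sep:
            return None
        meta[key.strip()] = value.strip()
    return None  # no closing delimiter


def from_domain(domain: str) -> Rule:
    def rule(meta: Meta) -> Optional[str]:
        return f"from_domain={domain}" if meta.get("from", "").endswith("@" + domain) else None
    return rule


@dataclass
class Preview:
    would_match: int = 0
    total_unmatched: int = 0
    sample: List[Tuple[str, str]] = field(default_factory=list)


def match_preview(unmatched_dir: Path, rules: List[Rule], sample_cap: int = 10) -> Preview:
    # Read-only: nothing is moved or rewritten.
    result = Preview()
    for path in sorted(unmatched_dir.rglob("*.md")):
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        if meta is None:
            continue  # graceful skip: malformed files are not counted
        result.total_unmatched += 1
        for rule in rules:
            fired = rule(meta)
            if fired:
                result.would_match += 1
                if len(result.sample) < sample_cap:
                    result.sample.append((meta.get("subject", ""), fired))
                break  # first rule that fires wins
    return result
```

Whether malformed files should count toward total_unmatched is a design choice. This sketch excludes them.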

Changed

infra/bicep/staging.keyvault.bicepparam + production.keyvault.bicepparam validated via az bicep build-params --stdout: both compile to ARM JSON with three keyVault reference parameters (entraClientId, postgresUser, postgresPassword).
README test-count line refreshed: 324 total (was 319), 247 .NET tests (was 242).

Notes

This release closes the gaps left by disabling GitHub Actions earlier in the session. The bump-version.py script is now the local SemVer source of truth, since the CD workflows are dormant. Phase 3 of the KV refactor was originally tagged "future cleanup, NOT required" -- shipping it now means the deploy story is fully self-contained on the operator's machine, with no GHA runner in the loop. The dry-run preview UI was originally tagged a v3 feature; pulling it forward closes the last documented gap in the v2-backlog file.
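The following is a minimal sketch of the SemVer arithmetic and git steps that bump-version.py performs, as described under Added above. It is not the script itself. The function names and the tag message are assumptions, and the commands are returned rather than executed.

```python
import re
from typing import List

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump(version: str, part: str) -> str:
    match = SEMVER.match(version.strip())
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def release_commands(new_version: str, changelog_modified: bool) -> List[List[str]]:
    # Stage VERSION (plus CHANGELOG.md if it changed), commit, then annotate-tag.
    tag = f"v{new_version}"
    add = ["git", "add", "VERSION"] + (["CHANGELOG.md"] if changelog_modified else [])
    return [
        add,
        ["git", "commit", "-m", f"chore(release): {tag}"],
        ["git", "tag", "-a", tag, "-m", tag],  # tag message is an assumption
    ]


print(bump("1.19.34", "patch"))  # 1.19.35
```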

The v2-backlog file no longer contains a single live entry; it is now a pure closure record.

[1.19.34] - 2026-04-28

Changed

README status block updated from the v1.18.x snapshot (last refreshed in iter 35) to the v1.19.x reality. Now mentions the v2-backlog closures: nightly Postgres backup, nightly corpus snapshot, monthly audit-log archive, full ingest_dlq writers (4/4), full custom telemetry emission (4/4), 24-hour RPO. Test count refreshed to 319 (was 299).
docs/v2-backlog.md retitled to "(closed)" with a top-of-file note that the file is now a closure record. The originally-collected items remain as struck-through history; future work should add new entries to a fresh backlog file (or this one if its purpose evolves).

Notes

The v2 backlog set up by iter 55's consolidation (13 items) is fully discharged across iters 56-81 (26 iterations of closure work). 319/319 tests pass; mypy + ruff + dotnet format all clean. The Ralph loop has no shipped-code work remaining against any documented commitment.

[1.19.33] - 2026-04-28

Added

Closes the Key Vault refactor v2 item (Phase 2). cd-staging.yml + cd-production.yml now read the four secrets (entra-client-id, postgres-user, postgres-password, postgres-connection-string) from the application's Key Vault, falling back to the matching GHA Secret only if the KV value is empty.
- New "Resolve Key Vault + read secrets" step uses az keyvault list --resource-group ... to discover the KV name (since the bicep unique-string suffix isn't known statically), then az keyvault secret show per secret with a read_or_fallback bash helper. Empty KV values fall through to ${{ secrets.<ENV>_<NAME> }} so the legacy path stays alive during migration.
- Each resolved value gets ::add-mask:: so it's redacted in subsequent step logs.
- Multi-line GITHUB_OUTPUT writes feed the values into the Deploy Bicep + Run migrations steps without changing their --parameters shape.
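The following is a Python sketch of the per-secret handling. It assumes the same semantics as the read_or_fallback helper described above: an empty Key Vault value falls back to the GHA Secret. The ::add-mask:: command and the name<<DELIMITER multi-line GITHUB_OUTPUT syntax are standard GitHub Actions workflow features; the function names are hypothetical.

```python
import uuid
from typing import List


def read_or_fallback(kv_value: str, gha_value: str) -> str:
    # Empty Key Vault value falls through to the legacy GHA Secret.
    return kv_value if kv_value else gha_value


def mask_commands(value: str) -> List[str]:
    # One ::add-mask:: per line so multi-line values are fully redacted.
    return [f"::add-mask::{line}" for line in value.splitlines() if line]


def github_output_block(name: str, value: str) -> str:
    # Text to append to the file named by $GITHUB_OUTPUT.
    # Random delimiter so the value cannot terminate the block early.
    delimiter = f"EOF_{uuid.uuid4().hex}"
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"


value = read_or_fallback("", "legacy-user")
for cmd in mask_commands(value):
    print(cmd)  # workflow commands are read from the step's stdout
```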
docs/runbooks/key-vault-refactor.md updated: Phase 2 section rewritten as "shipped" with the operator migration sequence and a Phase 3 (future, optional, not required) entry for the pure-Bicep-references variant.

Notes

The "true Bicep keyVault().getSecret(...) references" approach is documented as Phase 3 / future cleanup. Today's shape -- workflow reads + passes via --parameters -- is operationally complete: the secret source-of-truth has moved to KV, rotations happen via az keyvault secret set, and the next deploy picks up the new value automatically.

This iteration closes the last v2 backlog item. All 13 v2 items that surfaced during the audit phase (iters 42-55) have shipped or been documented as complete. The v2-backlog.md file now contains only struck-through entries.

[1.19.32] - 2026-04-28

Added

Phase 1 of the Key Vault refactor v2 item: a sync workflow that pushes the production secrets from GitHub Actions Secrets into the application's Key Vault.
- .github/workflows/sync-secrets-to-keyvault.yml: triggered by workflow_dispatch only (operators run it on environment bootstrap or after a GHA rotation). Resolves the KV name via az keyvault list, pushes 4 secrets: entra-client-id, postgres-user, postgres-password, postgres-connection-string. Skips empty GHA secrets with a WARN log.
- docs/runbooks/key-vault-refactor.md -- full refactor plan including the chicken-and-egg discussion (application KV vs. ops KV) and the Phase 2 checklist for flipping Bicep parameters to keyVault().getSecret(...) references.
- docs/runbooks/github-actions-secrets.md -- "Why not use Azure Key Vault" section reframed as "Key Vault refactor (Phase 1 shipped)" pointing at the new runbook.
- docs/runbooks/README.md -- new entry for the refactor runbook.

Notes

This is the last v2 backlog item to receive shipped code. Phase 2 (the actual Bicep parameter migration) is documented in detail in the new runbook and remains the only piece of v2 work that hasn't landed in this loop's run.

[1.19.31] - 2026-04-28

Added

Closes the curator-UI v2 item (5/5 affordances). Triage open-message joins the four already-shipped ones.
- Triage.razor rows are now clickable and navigate to /messages/{id} so curators can read each unmatched message body before deciding how to update match rules.
- Switched the page's data source from Api.ListUnmatchedAsync (/api/triage/unmatched filesystem walk, paths only, no message_id) to Api.ListMessagesAsync(Publication: "_unmatched") -- the indexer already routes unmatched files into LanceDB with publication="_unmatched" per IngestPipeline.UnmatchedSlug, so every row in the new list has a real MessageId.
- Visual shape mirrors TopicDetail.razor and PublicationDetail.razor: one MudList, chip + date + subject, click-to-/messages/{id}. Three pages, one pattern -- curator hands learn it once.
- The old /api/triage/unmatched endpoint stays in place (operator handbook still references the filesystem-walk shape and the API has no breaking changes).

Notes

This iteration plus iters 75-78 close the entire curator-UI v2 item. Five iterations, five affordances, one button-per-iteration cadence that worked because the typed-client pattern from iters 60-63 had established the plumbing for every later UI piece to ride on.

[1.19.30] - 2026-04-28

Added

Curator-UI partial closure (4/5 affordances): Edit publication form joins the trio of Add / Toggle Active / Rescan.
- PublicationDetail.razor adds a MudExpansionPanel "Edit publication" below the metadata header. Same MudGrid form shape as /publications/new (10 fields covering metadata + 4 match-rule types) but with slug immutable -- the slug stays in the page header.
- Form fields pre-populate from the current publication on page load. Match-rule fields stay blank because PublicationView doesn't currently round-trip them; entering values overwrites the existing rules.
- Save calls Api.UpdatePublicationAsync (added in v1.19.28) with a sparse PublicationUpdateRequest -- only fields the curator changed (or all fields if every text box is filled). Snackbar confirms + refreshes the in-memory _publication from the API response.
- No new API or sidecar work; rides on the v1.19.28 PUT endpoint that was already there for the Active toggle.

Notes

The Razor source generator hint from v1.19.27 carries over: don't put regex examples inside MudTextField HelperText (it confuses the generator's argument parsing). The Edit form's HelperText strings stay simple; concrete examples live in curator-guide.md.

[1.19.29] - 2026-04-28

Added

Curator-UI partial closure (3/5 affordances): Rescan unmatched joins Add publication and Active toggle.
- PublicationDetail.razor adds a "Rescan unmatched" button next to the publication header. Click triggers a sidecar rescan; after success the page re-fetches its message list so newly-classified messages appear.
- New curator-scoped API proxy POST /api/publications/rescan-unmatched on Alexandria.Api. Distinct from /api/admin/sidecar/rebuild because rescan is curator-routine (run after match-rule edits) rather than admin-rare. The route uses the existing AlexandriaPolicies.Curator.
- ISearchSidecarClient.RescanUnmatchedAsync + SidecarRescanResponse record (status + processed + skipped, mirroring the Pydantic shape).
- AlexandriaApiClient.RescanUnmatchedAsync + RescanResponse typed record on the Web side.
- Three new tests: round-trip JSON contract pin for SidecarRescanResponse, Curator_can_trigger_rescan_unmatched, and Reader_cannot_trigger_rescan_unmatched. 57/57 Api tests green (was 54).

Notes

v2 backlog count revised from 4 to 5 affordances after fold-out: Add, Active toggle, Rescan, Edit (pending), Triage open-message side-panel (pending). The remaining two follow the same single-button pattern.

[1.19.28] - 2026-04-28

Added

Continuing curator-UI partial closure (2/4 affordances): Active toggle joins Add publication.
- PublicationDetail.razor adds a Deactivate / Reactivate button next to the publication header. Curator-only on the API side; non-curators get a 403 surfaced via Snackbar.
- AlexandriaApiClient.UpdatePublicationAsync + PublicationUpdateRequest typed record. Sparse update (every field optional, defaults to null) so the toggle call is new PublicationUpdateRequest(IsActive: false) without disturbing other fields.
- On successful toggle, _publication is replaced with the API response so the chip + button update without re-fetching.
- curator-guide.md callout updated to list both shipped UI actions.
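The sparse-update idea can be sketched in Python. The sketch assumes null fields are simply omitted from the JSON body and that the API uses camelCase property names. Only IsActive is confirmed by this entry; the other fields are illustrative.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


def to_camel(snake: str) -> str:
    head, *rest = snake.split("_")
    return head + "".join(word.capitalize() for word in rest)


@dataclass
class PublicationUpdateRequest:
    # Every field optional and None by default: None means "leave unchanged".
    name: Optional[str] = None
    description: Optional[str] = None
    is_active: Optional[bool] = None

    def to_json(self) -> str:
        # Compare against None, not truthiness, so is_active=False is still sent.
        body = {to_camel(f.name): getattr(self, f.name)
                for f in fields(self) if getattr(self, f.name) is not None}
        return json.dumps(body)


print(PublicationUpdateRequest(is_active=False).to_json())  # {"isActive": false}
```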

Notes

The pattern from v1.19.27 carries forward: typed-client method -> single-button addition on an existing Razor page -> Snackbar feedback -> backlog entry struck through. The remaining curator-UI work (Edit modal, Rescan button, Triage open-message side-panel) follows the same shape and will ship one button at a time.

[1.19.27] - 2026-04-28

Added

Partial closure of curator-UI v2 item: Add publication form ships.
- New /publications/new Razor page with a MudGrid form covering slug, name, publisher, homepage URL, description, category, tags, and the four match-rule types (one pattern per line). Posts to the existing POST /api/publications/ endpoint.
- AlexandriaApiClient.CreatePublicationAsync + PublicationCreateRequest + PublicationMatchRules typed records on the Web side.
- MainLayout.razor "Curator -> Add publication" NavLink restored (iter 54 had removed it as a dead link; v1.19.27 makes it real).
- curator-guide.md callout updated: the UI is no longer described as fully read-only; "Add publication" gets a UI path alongside the API/CLI paths. Edit / Validate / Rescan / Toggle-active actions remain on the v2 backlog.

Notes

The remaining curator UI work (Edit modal on the detail page, Rescan button, Active toggle, Triage open-message side-panel) is independently shippable in future iterations -- each one is small now that the typed client + form-input pattern from this iteration are in place.

[1.19.26] - 2026-04-28

Added

Closes the audit-archiver v2 item (surfaced in iter 42 with the audit-log runbook honesty fix). Automates the monthly POPIA archive job that had been operator-run since the system shipped.
- New .github/workflows/audit-archive.yml runs at 03:00 UTC on the 1st of each month. Co-locates with corpus-snapshot.yml (daily 01:00 UTC) and postgres-backup.yml (daily 00:00 UTC) so all the nightly/monthly maintenance jobs are in one folder.
- Uses az containerapp exec to run COPY (...) TO STDOUT WITH (FORMAT csv, HEADER true) | gzip inside the Postgres container app and upload via the container's managed identity. Same data-doesn't-transit-the-runner shape as postgres-backup.yml from v1.19.25.
- Blob layout: alexandria-audit-archive/year=<YYYY>/month=<MM>/audit-<YYYY-MM>.csv.gz (Cool tier).
- Archives as gzipped CSV (not Parquet) so DuckDB can read directly via read_csv_auto -- avoids the csv2parquet dep the original runbook had been pointing at.
- delete_after_archive workflow_dispatch input lets an operator dry-run the archive (export + upload, skip the DELETE). Default on the schedule is yes.
- audit-log-archive.md runbook rewritten: "Status" section reflects automation; query examples use read_csv_auto instead of read_parquet.
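The blob naming can be sketched in Python. The sketch assumes the run on the 1st archives the month that just ended; the entry above does not say which month is exported. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def previous_month(run_day: date) -> Tuple[int, int]:
    # Step back from the 1st of the run's month to the last day of the prior month.
    last_day_prev = run_day.replace(day=1) - timedelta(days=1)
    return last_day_prev.year, last_day_prev.month


def audit_blob_path(year: int, month: int) -> str:
    return (f"alexandria-audit-archive/year={year:04d}/month={month:02d}/"
            f"audit-{year:04d}-{month:02d}.csv.gz")


print(audit_blob_path(*previous_month(date(2026, 5, 1))))
```

The year=/month= segments follow the Hive-style partition convention, which lets DuckDB's read_csv_auto pick them up as columns when hive partitioning is enabled.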

Notes

This iteration plus v1.19.24 + v1.19.25 close all three "operational automation" v2 items (audit archiver, corpus snapshot, pg_dump backup). All ship as scheduled GitHub Actions workflows rather than .NET hosted services -- the recurring shape was clear enough by the third one that following it was cheaper than building per-worker projects. A different choice would be justified if these jobs grew complex business logic; for now they're glue between the Azure CLI and the existing data plane.

[1.19.25] - 2026-04-28

Added

Closes the pg_dump backup v2 item (surfaced in iter 43 alongside the corpus-snapshot DR gap).
- New .github/workflows/postgres-backup.yml scheduled workflow runs daily at 00:00 UTC. Uses az containerapp exec to dump+gzip+upload from inside the Postgres Container App so the database stays internal-only (no public ingress required). The container app's managed identity has Storage Blob Data Contributor on the storage account, so the dump never transits the GH Actions runner.
- Blob layout: alexandria-postgres-backups/<YYYY>/<MM>/alexandria-<YYYY-MM-DD>.sql.gz.
- 30-day retention by default (overridable via workflow_dispatch); older dumps are pruned by the same workflow.
- DR runbook updated: Postgres restore status flips from Blocked to Working. The "Open work blocking the full RPO/RTO promise" intro section is reframed as "Backup sources (status)" since both blockers (corpus snapshot in v1.19.24, pg_dump in this release) are now closed. The "(no backup available)" fallback retitled as a deeper second-line scenario for rare account-level losses.
- Implemented as a scheduled GitHub Actions workflow rather than an Alexandria.Backups hosted service; the workflow is simpler and co-locates the schedule with corpus-snapshot.yml.
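The naming and 30-day pruning rule can be sketched in Python. The sketch assumes alexandria-postgres-backups is the container name, so blob names start at <YYYY>. It also assumes a dump is pruned only when its date is strictly older than today minus the retention window. The real workflow's boundary handling is not documented here.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List

# Blob names inside the alexandria-postgres-backups container (assumed layout root).
BACKUP_NAME = re.compile(r"^(\d{4})/(\d{2})/alexandria-(\d{4})-(\d{2})-(\d{2})\.sql\.gz$")


def backup_blob_name(day: date) -> str:
    return f"{day:%Y}/{day:%m}/alexandria-{day:%Y-%m-%d}.sql.gz"


def blobs_to_prune(names: Iterable[str], today: date, retention_days: int = 30) -> List[str]:
    cutoff = today - timedelta(days=retention_days)
    doomed = []
    for name in names:
        m = BACKUP_NAME.match(name)
        if not m:
            continue  # never touch blobs outside the expected layout
        dumped_on = date(int(m.group(3)), int(m.group(4)), int(m.group(5)))
        if dumped_on < cutoff:
            doomed.append(name)
    return doomed
```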

Notes

RPO for Postgres data loss is now ~24 hours (was: total loss unrecoverable beyond what the corpus markdown could re-derive). This iteration plus iter 72 close all DR-related v2 backlog items.

[1.19.24] - 2026-04-28

Added

Closes the corpus-snapshot v2 item (surfaced in iter 43 with the DR audit).
- Bicep adds alexandria-corpus-snapshots blob container alongside the existing alexandria-corpus container.
- New .github/workflows/corpus-snapshot.yml scheduled workflow runs daily at 01:00 UTC (before the 02:00 quarterly LanceDB rebuild drill window). Uses az storage blob copy start-batch to copy the corpus container into a date-stamped prefix <YYYY>/<MM>/<DD>/ in the snapshots container. The same workflow then prunes folders older than the configured retention window (default 14 days, overridable via workflow_dispatch).
- New CD secret PROD_STORAGE_ACCOUNT documented in github-actions-secrets.md. The Azure OIDC app needs Storage Blob Data Contributor on the production storage account.
- DR runbook updated: status table entry for "Corpus snapshot" flips from Blocked to Working; "Corpus volume lost" section now describes the real restore-from-snapshot path (date-stamped prefix); the re-drain-from-POP3 fallback retitled to "beyond the snapshot retention window" since it now applies only when both the snapshots AND the live corpus are gone.

Notes

RPO for corpus loss is now ~24 hours (was: re-drain from POP3 with 30-90-day server retention, oldest content lost). The other DR-blocking items (pg_dump backup) remain on the v2 backlog.

[1.19.23] - 2026-04-28

Added

Closes the telemetry-emission v2 item (4/4 stages now emit). The Python sidecar joins the three .NET workers as a custom-metric emitter.
- apps/search-py/src/alexandria_search/metrics.py: small module that creates an OTel meter named "alexandria-search" and exposes register_lancedb_row_count_gauge(get_count) to register an observable gauge that polls IndexStore.message_count() on each OTel collection cycle. Best-effort: callback exceptions return an empty observation list rather than killing the meter callback.
- main.py lifespan handler calls register_lancedb_row_count_gauge(store.message_count) and holds the returned instrument on app.state.lancedb_gauge so it isn't garbage-collected before lifespan exit.
- Distinct meter name from .NET (alexandria-search vs Alexandria) so App Insights queries can separate the two runtimes' emissions.
- 3 new tests in test_metrics.py covering registration smoke, no-throw with healthy callback, and no-throw at registration with a callback that would raise at collection time.
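The best-effort callback behavior can be sketched without the OpenTelemetry dependency. In OpenTelemetry's Python API, an observable gauge is created with Meter.create_observable_gauge(name, callbacks=[...]). Each callback receives CallbackOptions and returns Observation objects. This sketch returns plain integers instead, and the function name is hypothetical.

```python
from typing import Callable, List


def row_count_observations(get_count: Callable[[], int]) -> List[int]:
    # Best effort: a failing count yields no observation instead of an
    # exception escaping into the metrics collection cycle.
    try:
        return [get_count()]
    except Exception:
        return []


def broken_count() -> int:
    raise RuntimeError("LanceDB table not open yet")


print(row_count_observations(lambda: 1234))  # [1234]
print(row_count_observations(broken_count))  # []
```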

Notes

The admin-guide "Reading the dashboards" table now lists 7 customMetrics queries that all return real data. The iter-51 audit gap (operator handbook + admin guide referenced telemetry that didn't emit) is fully closed.

66/66 search-py tests pass (was 63, +3 new). 250/250 .NET tests still pass. mypy + ruff both clean.

[1.19.22] - 2026-04-28

Added

Continuing telemetry-emission partial closure (3/4 stages): link-fetcher joins parser and drainer as a custom-metrics emitter.
- LinkFetcherService.ProcessOneAsync emits two counters:
  - alexandria.linkfetcher.fetch_attempts with two tags:
    - success=true|false
    - cached=true|false (true when the body was already on disk and the worker just refreshed the frontmatter)
  - alexandria.linkfetcher.bytes_fetched (cumulative size of link-body markdown written; useful for capacity planning of the _links/ corpus subtree).
- admin-guide.md "Reading the dashboards" updated with both metrics.
- v2-backlog.md shows 3/4 progress with sidecar (Python) as the remaining stage.

Notes

Same shape as the v1.19.20 + v1.19.21 patterns -- one declaration plus 4 .Add(1, tags) calls in the existing branches. The shared AlexandriaMeter continues to make each subsequent stage near-free.

250/250 .NET tests still pass.

[1.19.21] - 2026-04-28

Added

Continuing telemetry-emission partial closure: drainer joins parser as the second worker emitting custom metrics through AlexandriaMeter.
- Pop3DrainerService emits two counters:
  - alexandria.drainer.ticks with intake=<slug> and success=true|false tags. Realises the drain.tick claim that had been forward-looking in admin-guide since v0.x.
  - alexandria.drainer.messages_drained with intake=<slug> tag. Cumulative per-intake throughput.
- admin-guide.md "Reading the dashboards" updated; v2-backlog hint reduced to link-fetcher + sidecar.

Notes

Same shape as the v1.19.20 parser wiring -- one Counter<long> declaration + a few .Add(1, tags) calls in the existing success/failure paths. The shared AlexandriaMeter continues to make each subsequent stage near-free.

250/250 .NET tests still pass.

[1.19.20] - 2026-04-28

Added

Partial closure of telemetry-emission v2 item: parser stage now emits custom metrics through OpenTelemetry to App Insights.
- Alexandria.Core.Telemetry.AlexandriaMeter: shared System.Diagnostics.Metrics.Meter mirroring the existing AlexandriaActivitySource shape. Workers create Counter<long> / Histogram<T> instances against this single meter so OpenTelemetry only has to subscribe to one name.
- TelemetryServiceCollectionExtensions.AddAlexandriaTelemetry now registers mb.AddMeter(AlexandriaMeter.Name) so every worker that calls the helper auto-subscribes.
- EmlParserService emits two counters at end of ProcessOneAsync:
  - alexandria.parser.files_processed with matched=true|false tag
  - alexandria.parser.files_failed
- admin-guide.md "Reading the dashboards" table updated with the new customMetrics queries; v2-backlog hint reduced to drainer + link-fetcher + sidecar heartbeat (which now have a clear pattern to copy from EmlParserService).

Notes

Same partial-closure pattern as the ingest_dlq 4-iteration arc. The shared AlexandriaMeter does the same job that Alexandria.Core.Ingest.PostgresIngestDlq did for the DLQ work: one piece of infrastructure, multiple workers ride on it. Each subsequent stage is a one-line counter declaration + .Add(1) at the right point.

250/250 .NET tests still pass.

[1.19.19] - 2026-04-28

Added

Closes the ingest_dlq writer wiring v2 item (4/4 stages now wired). This iteration ships the Python sidecar's index-stage writer.
- apps/search-py/src/alexandria_search/ingest_dlq.py: Python IngestDlq Protocol + PostgresIngestDlq (asyncpg) + NullIngestDlq fallback + build() factory. Mirrors the .NET-side Alexandria.Core.Ingest.IIngestDlq shape (stage, source_path, error, payload).
- Indexer.__init__ accepts an optional ingest_dlq parameter (defaults to NullIngestDlq so existing tests + non-Postgres dev keep working).
- Indexer.rebuild's per-file failure handler now records stage="index" rows alongside the existing log.error line.
- main.py lifespan handler builds the IngestDlq from the new Settings.postgres_connection_string (env var POSTGRES_CONNECTION_STRING) and passes it into the Indexer.
- pyproject.toml: adds asyncpg>=0.30,<0.32 as a dep, and adds asyncpg to the [[tool.mypy.overrides]] block (no library stubs).
5 new tests in tests/test_ingest_dlq.py:
- NullIngestDlq.record returns None and never raises.
- build(None) and build("") return NullIngestDlq.
- build(connstr) returns PostgresIngestDlq.
- PostgresIngestDlq.record swallows asyncpg failures (doesn't raise).
- Indexer.rebuild calls dlq.record once per failed file with stage="index" and payload={"phase": "rebuild"}.
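The Protocol / Null / Postgres / build() split above can be sketched as follows. This is a hypothetical illustration, not the shipped ingest_dlq.py. The record() signature, the INSERT column names, the lazy asyncpg import and the 5-second connect timeout are all assumptions. The two behaviours the tests pin are real requirements: build(None)/build("") fall back to the no-op writer, and Postgres failures are swallowed.

```python
# Hypothetical sketch of the sidecar's ingest_dlq module shape; not the shipped code.
from __future__ import annotations

import asyncio
import json
import logging
from typing import Any, Protocol

log = logging.getLogger("alexandria_search.ingest_dlq")

# Assumed column names, mirroring the (stage, source_path, error, payload) shape.
_INSERT_SQL = (
    "INSERT INTO ingest_dlq (stage, source_path, error, payload) "
    "VALUES ($1, $2, $3, $4::jsonb)"
)


class IngestDlq(Protocol):
    async def record(
        self,
        stage: str,
        source_path: str,
        error: str,
        payload: dict[str, Any] | None = None,
    ) -> None: ...


class NullIngestDlq:
    """Fallback for tests and non-Postgres dev: records nothing, never raises."""

    async def record(
        self,
        stage: str,
        source_path: str,
        error: str,
        payload: dict[str, Any] | None = None,
    ) -> None:
        return None


class PostgresIngestDlq:
    """Best-effort writer: failures are logged and swallowed so the caller's
    own exception handler is never masked by a second exception."""

    def __init__(self, connection_string: str) -> None:
        self._dsn = connection_string

    async def record(
        self,
        stage: str,
        source_path: str,
        error: str,
        payload: dict[str, Any] | None = None,
    ) -> None:
        try:
            # Lazy import (an assumption of this sketch) so the module loads without asyncpg.
            import asyncpg

            conn = await asyncpg.connect(self._dsn, timeout=5)
            try:
                await conn.execute(
                    _INSERT_SQL, stage, source_path, error, json.dumps(payload or {})
                )
            finally:
                await conn.close()
        except Exception:  # ImportError, connection errors, SQL errors
            log.warning("ingest_dlq write failed (stage=%s)", stage, exc_info=True)


def build(connection_string: str | None) -> IngestDlq:
    """Missing or empty connection string -> NullIngestDlq, otherwise Postgres."""
    if not connection_string:
        return NullIngestDlq()
    return PostgresIngestDlq(connection_string)


if __name__ == "__main__":
    # An Indexer.rebuild-style call for one failed file.
    asyncio.run(build(None).record("index", "corpus/example.md", "boom", {"phase": "rebuild"}))
```

This sketch opens one connection per record() call for simplicity. A long-lived asyncpg.create_pool would be the more usual choice if failures were frequent.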

Notes

This is the fourth and final partial closure of the v2 ingest_dlq item that started in v1.19.16. All four ingest-pipeline stages now write to the same Postgres table. The operator-handbook step 3 query (SELECT stage, count(*) FROM ingest_dlq WHERE resolved_at IS NULL ...) is now a complete triage signal across drain / parse / link-fetch / index.

63/63 search-py tests pass (was 58, +5 new). 250/250 .NET tests unchanged. mypy + ruff both clean.

[1.19.18] - 2026-04-28

Added

Continued partial closure of ingest_dlq writer wiring (3/4 stages): link-fetcher joins parser and drainer.
- LinkFetcherService.ProcessOneAsync per-fetch catch block records stage="link-fetch" rows. source_path carries the corpus message path; payload JSON includes the failing URL for triage (built via JsonSerializer.Serialize so URL escaping is automatic).
- LinkFetcherOptions.PostgresConnectionString added; the link-fetcher has no other Postgres dependency, so the field defaults to empty string and falls through to NullIngestDlq log-only behavior when Postgres isn't configured.
- DI registration in LinkFetcher/Program.cs follows the same one-line pattern as Parser and Drainer.
- LinkFetcherFullLoopTests.BuildService updated to inject NullIngestDlq for the new constructor parameter; 22/22 LinkFetcher tests still pass.

Notes

The shared PostgresIngestDlq impl from v1.19.17 paid off again -- this iteration's worker wiring was 1 line of options + 3 lines of DI + 6 lines of RecordAsync call. The remaining stage (Python sidecar index) is the largest because it crosses the runtime boundary.

250/250 .NET tests still pass.

[1.19.17] - 2026-04-28

Added

Continued partial closure of ingest_dlq writer wiring: drainer stage joins parser as the second wired stage. Pop3DrainerService per-intake catch block now records stage="drain" rows alongside the existing log + circuit-breaker logic. The source_path slot carries intake:<slug> (drain failures aren't keyed to a specific file); payload includes the consecutive-failures count.

Changed

Refactor: PostgresIngestDlq moved from apps/Alexandria.Parser/ into libs/Alexandria.Core/Ingest/. It now takes a Func<string?> connection-string resolver instead of IOptions<ParserOptions> directly. Each worker registers it against its own options class with a single DI statement:
```csharp
// TWorkerOptions stands for the worker's own options class (e.g. ParserOptions).
services.AddSingleton<IIngestDlq>(sp => new PostgresIngestDlq(
    // The lambda defers reading options until PostgresIngestDlq invokes it.
    () => sp.GetRequiredService<IOptions<TWorkerOptions>>().Value.PostgresConnectionString,
    sp.GetRequiredService<ILogger<PostgresIngestDlq>>()));
```
Removes the per-worker copy that would have proliferated as more stages came online.
Alexandria.Core.csproj adds Npgsql + Microsoft.Extensions.Logging.Abstractions package references (PostgresIngestDlq now lives there). Directory.Packages.props adds the matching central PackageVersion entry for Microsoft.Extensions.Logging.Abstractions. It was previously pulled in transitively, but central package pinning requires the explicit declaration.
operator-handbook.md step 3 query comment updated to reflect 2/4 stages wired.
v2-backlog.md ingest_dlq entry shows 2/4 progress with the next suggested step (link-fetcher).

Notes

Same partial-closure pattern as v1.19.16: ship one stage at a time. The shared-impl refactor pays off immediately -- the drainer wiring is one DI line + one RecordAsync call. Future stages cost the same.

250/250 .NET tests still pass.

[1.19.16] - 2026-04-28

Added

Partial closure of ingest_dlq writer wiring v2 item: parser stage now records failures into the V004 ingest_dlq table.
- Alexandria.Core.Ingest.IIngestDlq shared interface + NullIngestDlq no-op fallback (used in tests / when Postgres isn't configured).
- apps/Alexandria.Parser/PostgresIngestDlq.cs: best-effort writer that swallows its own failures so the calling exception handler isn't masked.
- EmlParserService exception handler now records stage="parse" rows alongside the existing structured log line.
- Program.cs registers PostgresIngestDlq as the IIngestDlq impl.
- operator-handbook.md step 3 query is no longer a misleading no-op for parse failures: as of this release it returns real "parse" rows. Drainer / link-fetch / index stages still need wiring -- the handbook calls those out.
- v2-backlog.md entry retitled "(partial)" with the next-step hint (drainer's circuit-breaker exception handler).

Notes

This is the first partial v2 closure of the loop -- shipping one stage at a time keeps the iteration small and gives the operator-handbook query real value immediately rather than waiting for full coverage. Same pattern as iters 59-61 (browse-by-topic split into three iterations) but applied to a multi-stage worker change.

250/250 .NET tests still pass.

[1.19.15] - 2026-04-28

Added

Closes filter-sidebar v2 item: /search now exposes a MudExpansionPanel "Filters" below the query box with:
- MudSelect multiselect for Publications (populated from Api.ListPublicationsAsync, active publications only).
- MudSelect multiselect for Topics (the same seeded taxonomy Topics.razor uses).
- MudDatePicker for Sent after (inclusive).
- MudDatePicker for Sent before (exclusive).
Selected filter values flow into the existing SearchQuery typed record that AlexandriaApiClient.SearchAsync already accepted -- no API or client-library changes needed; everything was wired in v1.19.10 (inline operator parser) and earlier.
reader-guide.md "Filtering results" reorganised: filter-sidebar section first, inline-operator section second, both now described as live.
v2-backlog.md filter-sidebar entry struck through.
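The sidebar's "Sent after (inclusive)" / "Sent before (exclusive)" pairing describes a half-open window. With that convention, adjacent ranges such as one day followed by the next never overlap or leave gaps. This is a minimal sketch of the predicate. Where the comparison actually runs (API or sidecar) is not stated here, and the function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone


def in_sent_window(
    sent_at: datetime,
    after: datetime | None = None,
    before: datetime | None = None,
) -> bool:
    """Half-open window [after, before): 'Sent after' inclusive, 'Sent before' exclusive."""
    if after is not None and sent_at < after:
        return False
    if before is not None and sent_at >= before:
        return False
    return True


def utc_midnight(year: int, month: int, day: int) -> datetime:
    """Midnight UTC, the same reading the inline date-only operators use."""
    return datetime(year, month, day, tzinfo=timezone.utc)
```

For example, a message sent exactly at midnight on 2 April falls inside the 2 April window and outside the 1 April window, never both.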

Notes

Pure-UI iteration -- no .NET, Python, CLI, or contract test changes. The pre-existing typed-client surface absorbed the new feature without ceremony, the same way browse-by-publication landed in iter 62 on top of the iter 60 plumbing.

[1.19.14] - 2026-04-28

Added

Closes browse-by-publication v2 item: Publications.razor rows are now clickable; clicking navigates to a new /publications/{slug} detail page showing the publication header (name, slug, publisher, description, tags, active state) above the most-recent 50 messages.
- PublicationDetail.razor reuses the /api/messages route (v1.19.12) and its Api.ListMessagesAsync typed-client method (v1.19.13), plus the same MudList visual shape as TopicDetail.razor. The two API fetches (publications list for the header, messages list for the body) run in parallel, so the page renders after a single round-trip of latency rather than two sequential ones.
- Publications.razor adds OnRowClick and a hint caption directing users to click. The empty-state message now points at curator-guide.md instead of the "Curator -> Add publication" UI removed in v1.19.6.
- reader-guide.md "By publication" rewritten from "planned for an upcoming iteration" to live documentation.
- v2-backlog.md browse-by-publication entry struck through.

Notes

This is the smallest possible v2 closure -- the entire iteration was a single new Razor page + 4 lines of changes to an existing page, because the API surface and typed-client method already existed from iters 60-61. The "prerequisite-first" rhythm pays off here: the Topics three-iteration arc left exactly the right plumbing for browse-by-publication to land in one.

[1.19.13] - 2026-04-28

Added

Closes a v2 backlog item: /topics and /topics/{slug} browse-by-topic UI.
- Topics.razor: grid of 11 MudCard tiles for the seeded taxonomy (ai-and-llms, regulation, popia, cybersecurity, cloud-infra, data-eng, developer-tools, finance-economics, tax-and-sars, product-strategy, industry-news). Each tile is a Material icon + display name + slug. Click navigates to the topic detail page.
- TopicDetail.razor: lists the most-recent 50 messages classified into the topic via Api.ListMessagesAsync(new MessageListQuery(Topic: slug, Limit: 50)). Click any message → /messages/{id} detail.
- AlexandriaApiClient.ListMessagesAsync + MessageListQuery / MessageListResponse / MessageRef typed records consuming the v1.19.12 /api/messages route.
- MainLayout: new "Topics" NavLink between Publications and Saved searches.
- reader-guide.md "By topic" rewritten from "Coming soon" to live docs.
- v2-backlog.md browse-by-topic entry struck through; browse-by-publication entry refined with a one-liner pointing at the now-available Api.ListMessagesAsync(Publication: slug, ...) plumbing.

Notes

This is the first three-iteration v2 closure of the loop:

- v1.19.11 (iter 59): sidecar topic filter
- v1.19.12 (iter 60): API listing proxy + typed-client + JSON contract pin
- v1.19.13 (iter 61): Web UI consuming the typed client

Each layer was small, well-tested, and bisectable. The pattern works well for v2 items that span multiple subsystems.

[1.19.12] - 2026-04-28

Added

GET /api/messages listing route on Alexandria.Api: filter by publication, topic, subject_like, after, before, with paging (limit, offset). Proxies to the sidecar's existing /messages route (extended with topic filter in v1.19.11). Distinct from /api/search -- this is metadata enumeration with no ranking; for ranked retrieval use /api/search.
ISearchSidecarClient.ListMessagesAsync + SidecarMessageListQuery / SidecarMessageList / SidecarMessageRef records.
Two new tests:
- SidecarJsonRoundTripTests.SidecarMessageList_deserializes_pydantic_snake_case_payload pins the wire format (message_id, external_id, sent_at) so future Pydantic-side changes fail fast on the .NET side. 5/5 round-trip cases.
- Reader_can_list_messages_filtered_by_topic exercises the route through the auth pipeline. 54/54 Api tests green (was 52).
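A script calling the new listing route only needs to build a query string from the filters named above (publication, topic, subject_like, after, before, limit, offset). This is a sketch under stated assumptions:
- The base URL is a placeholder. The API is only reachable at its auto-generated Container Apps URL.
- ISO-8601 strings are assumed for after/before.
- The Bearer token the real route requires is omitted.

```python
from __future__ import annotations

from urllib.parse import urlencode


def messages_url(
    base_url: str,
    *,
    publication: str | None = None,
    topic: str | None = None,
    subject_like: str | None = None,
    after: str | None = None,
    before: str | None = None,
    limit: int | None = None,
    offset: int | None = None,
) -> str:
    """Build a GET /api/messages URL, leaving unset filters out of the query string."""
    params = {
        "publication": publication,
        "topic": topic,
        "subject_like": subject_like,
        "after": after,
        "before": before,
        "limit": limit,
        "offset": offset,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/api/messages"
    return f"{url}?{query}" if query else url
```

For example, messages_url(base, topic="ai-and-llms", limit=50) mirrors the MessageListQuery(Topic: slug, Limit: 50) call the Topics UI later makes through the typed client.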

Notes

This is the second prerequisite-first iteration in a row. Iter 59 added the topic filter on the sidecar's GET /messages; iter 60 adds the .NET proxy + typed-client + tests. Iter 61 can ship the Topics UI consuming the typed client without further plumbing.

[1.19.11] - 2026-04-28

Added

Sidecar GET /messages accepts a topic=<slug> query parameter (e.g. topic=ai-and-llms). Implementation is a Python-side post-filter rather than a LanceDB array-contains SQL clause -- LanceDB's array filter syntax shifts across minor versions and the row count is bounded by the in-process 5000-row pull. Prerequisite work for the v2 browse-by-topic UI; the route is also useful today via the CLI for curators wanting to enumerate messages within a classifier topic.
alexandria messages list-ids --topic <slug>: matching CLI flag threaded through to the new sidecar parameter.
1 new pytest case (test_messages_enumeration_filters_by_topic) covering the new behavior end-to-end through the FastAPI TestClient.
1 CLI smoke test assertion that --topic is documented in messages list-ids --help.
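A Python-side post-filter over the bounded pull can be sketched as below. The row field name "topics", the default page size, and the decision to apply limit/offset after filtering are assumptions of this sketch; the sidecar's actual code is not shown here. Paging after filtering avoids short pages caused by non-matching rows being counted against the limit.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import islice
from typing import Any

MAX_PULL = 5000  # the bounded in-process pull described above


def filter_by_topic(
    rows: Iterable[dict[str, Any]], topic: str | None
) -> Iterator[dict[str, Any]]:
    """Keep rows whose topics list contains the slug; no topic means no filtering."""
    for row in rows:
        if topic is None or topic in (row.get("topics") or []):
            yield row


def list_messages(
    rows: list[dict[str, Any]],
    topic: str | None = None,
    *,
    limit: int = 50,
    offset: int = 0,
) -> list[dict[str, Any]]:
    # Filter first, then page, so limit/offset count only matching rows.
    matched = filter_by_topic(rows[:MAX_PULL], topic)
    return list(islice(matched, offset, offset + limit))
```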

[1.19.10] - 2026-04-28

Added

Closes third v2 backlog item: inline search operator parsing.
- AlexandriaApiClient.ApplyInlineOperators extracts pub:, topic:, domain:, after:, before: tokens from SearchQuery.Q, merges them with any explicitly-set typed parameters, and leaves the residual free-text in Q. The sidecar contract is unchanged -- everything flows through the existing [FromQuery] string[] pub / topic / domain and [FromQuery] DateTimeOffset? after / before parameters on /api/search.
- Date values accept both ISO-8601 (2026-04-01T00:00:00Z) and date-only form (2026-04-01), date-only being treated as midnight UTC.
- Edge-case guard: pure-operator queries (pub:stratechery with no free-text) fall back to the original q so the request doesn't hit the API's "q required" 400. The filter-sidebar v2 work covers the empty-text-with-filters case properly.
- SearchQuery record extended with Domains field (was missing -- API already accepted domain query parameters but the typed client had no way to set them).
- reader-guide.md "Filtering results" rewritten with two sections (inline operators in the search box; direct API query parameters for scripts), each with examples. The v2-backlog hint is reduced to "a filter sidebar UI for users who don't want to learn the operator syntax".
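The token extraction described above is implemented in C# in AlexandriaApiClient.ApplyInlineOperators, which is not reproduced here. The Python sketch below illustrates the same idea under several assumptions:
- Tokens are whitespace-separated, with no quoted values.
- Operator keys are case-sensitive.
- An unparseable date stays in the free text.
- The last after:/before: wins.
- Merging with explicitly-set typed parameters is omitted.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed token grammar: whitespace-separated key:value, no quoting.
_OPERATOR = re.compile(r"^(pub|topic|domain|after|before):(\S+)$")


@dataclass
class ParsedQuery:
    q: str
    pubs: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    after: datetime | None = None
    before: datetime | None = None


def _parse_date(value: str) -> datetime | None:
    """Accept ISO-8601 or date-only; date-only means midnight UTC."""
    try:
        if len(value) == 10:  # e.g. 2026-04-01
            return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def apply_inline_operators(q: str) -> ParsedQuery:
    parsed = ParsedQuery(q="")
    residual: list[str] = []
    for token in q.split():
        match = _OPERATOR.match(token)
        if match is None:
            residual.append(token)
            continue
        key, value = match.group(1), match.group(2)
        if key == "pub":
            parsed.pubs.append(value)
        elif key == "topic":
            parsed.topics.append(value)
        elif key == "domain":
            parsed.domains.append(value)
        else:
            when = _parse_date(value)
            if when is None:  # unparseable date: keep it as free text
                residual.append(token)
            elif key == "after":
                parsed.after = when
            else:
                parsed.before = when
    free_text = " ".join(residual)
    # Pure-operator query: fall back to the original q so the API's "q required" check passes.
    parsed.q = free_text or q
    return parsed
```

For example, "popia pub:sars-bulletin after:2026-04-01" yields q="popia", one publication filter, and an after bound of 2026-04-01T00:00:00Z.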

Notes

This is the third v2 closure in three iterations (intakes.notes -> "More like this" -> inline operators). All three had the property that the underlying capability was already in place; only the user-facing surface was missing. The remaining v2 items have larger surface areas (worker projects, full UI builds).

[1.19.9] - 2026-04-28

Added

Closes second v2 backlog item ("More like this" on message detail, surfaced in reader-guide.md). The API surface already existed (/api/messages/{id}/similar -> sidecar /similar/{messageId}); the missing piece was the UI affordance.
- AlexandriaApiClient.SimilarAsync(string messageId, ...): new typed-client method posting to the existing API route.
- MessageDetail.razor: MudButton "More like this" below the rendered body. On click, fetches similar messages and renders them in the same MudList shape used by /search. The seed message itself is filtered out of the results.
- reader-guide.md "More like this" section updated from "Coming in a near-term iteration" to live documentation.
- v2-backlog.md entry struck through with the release pointer.

Notes

This is the third "More like this"-style affordance pattern in the codebase (after /api/messages/{id}/similar on the API side and the sidecar's /similar/{messageId} route). All three layers were already wired before this release; only the user-visible button was missing.

[1.19.8] - 2026-04-28

Added

Closes one v2 backlog item (intakes.notes column, surfaced in iter 45):
- db/migrations/Scripts/V009__intake_notes.sql: adds nullable notes text column to intakes so subscription intent travels with the schema.
- alexandria intakes add --notes ...: optional flag at provisioning time.
- alexandria intakes set-notes <slug> --notes ...: edit-after-the-fact. Pass empty string to clear.
- alexandria intakes list --notes: appends the column on a second line per row when set.
- 3 new CLI smoke tests (41/41 pass; was 38).
docs/runbooks/add-intake.md step 7 now points at the CLI workflow rather than calling out the future column.
docs/v2-backlog.md "Schema additions" section now reads as "all previously-listed schema items have shipped" with a strike-through entry pointing at this release.

Notes

This is the first iteration of the loop to close a v2 backlog item rather than add to it. The pattern: the simplest backlog item (one column + two CLI subcommands + 3 tests) is the right target when audit work itself has completed.

[1.19.7] - 2026-04-28

Added

docs/v2-backlog.md consolidates the 11 v2-backlog items that surfaced across iterations 42-54 of this maintenance loop. Each entry links to the originating doc, the iteration that surfaced it, the user-visible cost of leaving it as-is, and a suggested shape for the work. Categories:
- Operational automation (3): Alexandria.AuditArchiver, Alexandria.Backups pg_dump worker, corpus snapshot job
- Schema additions (1): intakes.notes column
- Pipeline observability (2): ingest_dlq writer wiring, Alexandria.Telemetry custom metrics + events
- Web UI work (4): filter sidebar, inline operator parsing, curator UI actions, browse-by-publication / browse-by-topic, "More like this"
- Infrastructure (1): Key Vault references in Bicep parameters
Doc-route audit: every /search, /publications, /saved, /me, /me/pats, /triage, /messages/... reference in the docs maps to a real @page directive in apps/Alexandria.Web/Pages/. No additional drift found beyond iter 54's /publications/new removal.

[1.19.6] - 2026-04-28

Fixed

apps/Alexandria.Web/Shared/MainLayout.razor had a curator NavLink to /publications/new -- but no Razor component is registered at that route. Curators clicking the menu item would have hit Blazor's 404 page. Iter 52 had already established that the curator UI is read-only and adds happen via CLI/API; removed the dead NavLink with an inline comment pointing to the curator-guide for the actual workflow.
user_manual/getting-started.md step 4 said "/search supports filters for publication, topic, and date range" -- but iter 53 had just established that the web UI search page has no filter sidebar (filtering is via API query parameters today). The same wording appeared in two places; this release fixes the second one to point at the reader-guide's accurate description.

[1.19.5] - 2026-04-28

Fixed

user_manual/reader-guide.md "Operators in the query" claimed Alexandria parsed inline operators like pub:stratechery topic:ai-and-llms after:2026-01-01 in the search box. No code parses these. SearchEndpoints.cs takes pub, topic, domain, after, before as separate query parameters; the Web UI's Search.razor doesn't even pass those parameters from the text input. A user typing popia pub:sars-bulletin would have all of that text shipped through to the sidecar as a single query string with no filter applied -- and they'd never know. Replaced the operators table with the actual query-parameter syntax and a curl example, plus a v2-backlog note for the filter sidebar UI and inline-operator parsing.

[1.19.4] - 2026-04-28

Fixed

Largest doc/code drift in this loop: user_manual/curator-guide.md described an extensive UI-based curator workflow (Add publication form with Save button, Edit publication, Validate-against-_unmatched dry-run UI, Rescan-unmatched button on a publication detail page, Toggle-active toggle, Open-message link in triage). None of these exist. The actual UI today (apps/Alexandria.Web/Pages/Publications.razor, Triage.razor) is read-only -- a sortable table with no actions. All curator actions run through the operator CLI (tools/Alexandria.Cli) or the HTTP API directly.

Rewrote the curator-guide to describe the supported CLI + API workflow:
- "Adding a publication" via POST /api/publications/
- "Editing a publication" via PUT /api/publications/{slug}
- "Deactivating a publication" via alexandria publications deactivate
- "Reclassifying already-classified messages" via messages list-ids | publications reclassify-ids (the v1.17+/v1.18+ workflow that wasn't even mentioned in the curator-guide before).
- Added an "About the UI today" callout at the top so curators set expectations correctly, with a v2 backlog note for moving actions into the UI.
This is the same claim/reality drift pattern as iters 42-43 (DR plan, audit archive runbook) but at a much larger surface: a curator expecting to follow the old guide would have hit dead UI elements and had no path forward without learning the CLI from elsewhere.

[1.19.3] - 2026-04-28

Fixed

Same drift class as iter 50 (ingest_dlq): App Insights custom telemetry signals referenced in docs don't actually emit. No TrackEvent or TrackMetric calls anywhere in the codebase -- the OpenTelemetry auto-instrumentation only captures HTTP request shape. Affected docs:
- user_manual/admin-guide.md "Reading the dashboards" table listed customMetrics:alexandria.indexer.files_processed, customMetrics:alexandria.lancedb.row_count, customEvents:drain.tick -- none of which exist. Replaced with queries that use the auto-instrumented requests / dependencies data, plus a v2-backlog note for the custom emission.
- docs/runbooks/lancedb-rebuild.md step 7 told operators to "Watch progress in Application Insights (custom metric alexandria.indexer.files_processed)". Rewrote to point at the structured-log events that DO emit (index.rebuild.started / index.rebuild.completed).
docs/runbooks/lancedb-rebuild.md step 6 still used the https://search.alexandria.fullstack.co.za/index/rebuild URL the iter 48 admin-guide fix removed (the sidecar is internal-only). Updated to use the new /api/admin/sidecar/rebuild proxy added in v1.19.0.

[1.19.2] - 2026-04-28

Fixed

docs/operator-handbook.md step 3 told the on-call to run SELECT stage, count(*) FROM ingest_dlq WHERE resolved_at IS NULL ... during a "search returns empty" incident -- but no worker actually writes to ingest_dlq. The table is provisioned by V004 but the drainer / parser / enricher / indexer don't yet emit failure rows into it. An on-call running the query during a real incident would see zero rows and falsely conclude the indexer is healthy. Replaced with the honest fallback (App Insights exceptions + drainer circuit-breaker status) and called out the ingest_dlq writer wiring as v2 backlog work.

[1.19.1] - 2026-04-28

Added

Two integration tests for the v1.19.0 /api/admin/sidecar/rebuild route: Admin_can_trigger_sidecar_rebuild (happy path round-trip via the auth pipeline) and Curator_cannot_trigger_rebuild (auth gate confirmation). Mirrors the existing Admin_can_view_sidecar_health / Curator_cannot_view_admin_endpoints pair so admin routes are consistently exercised at the HTTP boundary.

[1.19.0] - 2026-04-28

Added

POST /api/admin/sidecar/rebuild -- admin-only proxy to the sidecar's internal-only /index/rebuild route. The admin-guide claimed https://search.alexandria.fullstack.co.za/index/rebuild worked but the search-app has ingressExternal: false (verified in main.bicep), so that hostname doesn't exist. The new admin proxy + az containerapp exec fallback are now both documented as the legitimate manual-rebuild paths.
- ISearchSidecarClient.RebuildAsync + SidecarRebuildResponse record.
- StubSearchSidecarClient updated to fulfil the contract for tests.
- SidecarJsonRoundTripTests extended with a round-trip case for the new response shape so future drift fails CI.

Fixed

user_manual/admin-guide.md "Forcing a re-classification" replaced the legacy SQL+move-file+rescan procedure with the modern non-destructive reclassify-ids workflow added in v1.17.0 + v1.18.0. The legacy approach is kept as a secondary option for unindexed files only.
user_manual/curator-guide.md "When to escalate to admin" said admin "runs the SQL" for re-classification; now points at the CLI workflow.
docs/operator-handbook.md was imprecise about API addressability after iter 44's fix. The API has external ingress (ingressExternal: true in Bicep) but no friendly hostname; it is reachable at the auto-generated Container Apps URL instead. Updated the handbook to say so accurately.

Notes

This is a minor bump (.0) because a new HTTP route was added; v1.19.x will continue the post-feature-complete hardening pattern from v1.18.x.

[1.18.21] - 2026-04-28

Added

docs/runbooks/github-actions-secrets.md -- new "How to rotate the Entra client id" section, distinct from rotating the Entra secret (which already has its own runbook). Calls out the redirect-URI verification step that the existing scripts already cover.
"What goes wrong if a secret is missing" now lists the <env>_ENTRA_CLIENT_ID failure mode (would surface "missing required parameter: entraClientId" via the bicep-params CI gate added in v1.18.14 + extended in v1.18.15).

Fixed

Corrected an attribution: postgres-secrets discovery was v1.18.6 (iter 32), not v1.18.5 (iter 31, which fixed only the workflow shape).

[1.18.20] - 2026-04-28

Fixed

docs/runbooks/branch-protection.md listed only the four CI status checks that existed at the runbook's v1.7.0 authoring (dotnet, python-sidecar, secrets-scan, version-gate). Iter 39 added doc-staleness and iter 40 added bicep-params, but the branch-protection rules were never updated to require them. Anyone reapplying the rules from this runbook would have ended up with a weaker protection set than the repo currently relies on. Updated the prose AND the gh api -X PUT ... contexts JSON.
Refined the dotnet line to reflect what that job actually does today (build + format + test, not just "build + test"), and python-sidecar to mention all three gates (ruff + mypy + pytest), not just lint + test.
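The drift fixed here is a set difference between the CI jobs that exist and the contexts branch protection requires. The sketch below shows that check and the required_status_checks fragment built from the six job names in this entry.

GitHub's "update branch protection" REST endpoint (PUT /repos/{owner}/{repo}/branches/{branch}/protection) also requires enforce_admins, required_pull_request_reviews and restrictions in the body; those are omitted here. The strict value is an assumption, not the runbook's setting.

```python
from __future__ import annotations

import json

# CI job names that branch protection should require (from this entry).
REQUIRED_CHECKS = [
    "dotnet",
    "python-sidecar",
    "secrets-scan",
    "version-gate",
    "doc-staleness",
    "bicep-params",
]


def missing_contexts(ci_jobs: list[str], contexts: list[str]) -> list[str]:
    """CI jobs that exist but are not required by branch protection."""
    return sorted(set(ci_jobs) - set(contexts))


def required_status_checks(strict: bool = True) -> dict[str, object]:
    """The required_status_checks fragment of a branch-protection PUT body."""
    return {"strict": strict, "contexts": list(REQUIRED_CHECKS)}


if __name__ == "__main__":
    print(json.dumps({"required_status_checks": required_status_checks()}, indent=2))
```

Run against the runbook's old four-context list, missing_contexts reports exactly the two gates added in iters 39 and 40.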

[1.18.19] - 2026-04-28

Fixed

docs/runbooks/add-intake.md step 7 told operators to "Document the list of subscriptions in docs/intakes/<slug>.md" -- but docs/intakes/ doesn't exist and was never created. Replaced with an honest note that subscription intent currently lives outside Alexandria (curator's wiki / spreadsheet), with a v2-backlog hint about adding an intakes.notes column.
docs/runbooks/add-publication.md "Backfill notes" mentioned only the legacy "move the .md file to _unmatched and rescan" technique. Surfaced the modern messages list-ids | publications reclassify-ids workflow added in v1.17.0 (with --dry-run from v1.18.0) as the primary recommendation. The legacy technique stays as a secondary option for files that haven't been indexed yet.

[1.18.18] - 2026-04-28

Fixed

docs/operator-handbook.md triage section had two errors that would trip up the on-call:
- Section 1 expected service: "Alexandria.Api" from https://alexandria.fullstack.co.za/health but that hostname maps to Alexandria.Web, which returns service: "Alexandria.Web". The API isn't externally addressable in this deployment shape.
- Section 2 expected lancedbReady / embedderReady (camelCase) from the sidecar /health -- but Pydantic emits snake_case (lancedb_ready / embedder_ready) and always has. The keys were written as if they came from the .NET API, where the v1.18.2 wire-format fix would have shown up if anyone had checked. Fixed both keys and added a one-liner explaining the contract pin.
docs/runbooks/rotate-entra-secret.md step 1 suggested pwsh ./infra/entra/create-app.ps1 -ExistingAppId <APP ID> but the script has no such parameter. Anyone running it would get a script error during a rotation. Removed the dead invocation; left only the az ad app credential reset form (which is correct).

[1.18.17] - 2026-04-28

Fixed

Operationally critical: docs/runbooks/disaster-recovery.md claimed "nightly pg_dump" and alexandria-corpus-snapshots/ as restore sources, but neither exists. The alexandria-postgres-backups blob container is provisioned by Bicep, but no scheduled job writes to it. The alexandria-corpus-snapshots container isn't even in Bicep. In an actual disaster, an operator following the runbook would have hit a 404 / empty container at the worst possible moment.
- Added an "Open work blocking the full RPO/RTO promise" table at the top that lists each restore source with its current status.
- Renamed the existing sections "(target procedure once backups exist)" so future operators don't follow them assuming the infrastructure is in place.
- Added two new sections describing today's fallback procedures: "Postgres database lost (no backup available)" and "Corpus volume lost (no snapshot available)" -- both honest about what data is unrecoverable.

[1.18.16] - 2026-04-28

Fixed

docs/runbooks/audit-log-archive.md claimed "Monthly job (automated)" but no scheduled job actually exists -- the bicep provisions the destination blob container, but the archive script is operator-run today. Honesty fix: retitled to "Monthly job (operator-run)", added a Status section calling out the v2-backlog automation work to convert the script into either an Alexandria.AuditArchiver worker (Digests pattern) or a scheduled GitHub Actions workflow.

[1.18.15] - 2026-04-28

Fixed

Both infra/bicep/staging.parameters.json and production.parameters.json contained "entraClientId": { "value": "REPLACE_WITH_ENTRA_CLIENT_ID" }. This is exactly the iter-32 pattern but in placeholder form: the bicep-params CI gate from v1.18.14 only checked that values were provided, not that they were real. As-is, deploy would have succeeded but Entra auth would have failed at runtime against the literal placeholder. Moved entraClientId out of both parameters files and into CD workflow --parameters from secrets (STAGING_ENTRA_CLIENT_ID, PROD_ENTRA_CLIENT_ID), matching the postgres- credentials pattern.
gc-azure-entra requires distinct app registrations per environment; the placeholder strings made it impossible to distinguish staging from production. The new secret names enforce the separation.

Added

scripts/check-bicep-params.sh extended to flag placeholder values (REPLACE_*, CHANGE_ME, change-me, TODO, TBD, your-*, fill-in-*) in any parameters JSON file. The CI bicep-params gate now blocks both "missing provider" and "placeholder provider" cases.
docs/runbooks/github-actions-secrets.md lists the two new Entra secrets.
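The placeholder check added to scripts/check-bicep-params.sh is bash. The Python sketch below only illustrates the matching logic over an ARM-style parameters file ({"parameters": {name: {"value": ...}}}). The whole-value, case-sensitive glob matching is an assumption, and nested object/array values are not scanned.

```python
from __future__ import annotations

import json
from fnmatch import fnmatchcase

# Patterns listed in this entry; case-sensitive match on the whole value (assumption).
PLACEHOLDER_PATTERNS = (
    "REPLACE_*",
    "CHANGE_ME",
    "change-me",
    "TODO",
    "TBD",
    "your-*",
    "fill-in-*",
)


def find_placeholders(parameters_json: str) -> list[str]:
    """Names of parameters whose string value looks like a placeholder."""
    doc = json.loads(parameters_json)
    hits = []
    for name, entry in doc.get("parameters", {}).items():
        # Key Vault references carry "reference" instead of "value"; skip them.
        value = entry.get("value") if isinstance(entry, dict) else None
        if isinstance(value, str) and any(
            fnmatchcase(value, pattern) for pattern in PLACEHOLDER_PATTERNS
        ):
            hits.append(name)
    return sorted(hits)
```

Against the old staging.parameters.json shape, "entraClientId": {"value": "REPLACE_WITH_ENTRA_CLIENT_ID"} would be reported, which is the case the v1.18.14 gate let through.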

[1.18.14] - 2026-04-28

Added

New scripts/check-bicep-params.sh: cross-checks every required (no-default) parameter in infra/bicep/main.bicep against the parameters JSON files and the --parameters X=... arguments in the CD workflows. Fails when a required param has no provider. The structural fix for the iter 32 drift pattern where param postgresUser / param postgresPassword were declared but never provided, so az deployment group create would have failed on first run.
New CI job bicep-params invokes the script. Pure bash, no Azure CLI dependency -- runs in seconds.
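The cross-check boils down to one set difference: parameters declared without a default, minus the names supplied by parameters files or --parameters name=value arguments. The real script is bash; this Python sketch only shows that logic. It has three simplifications:
- Declarations are parsed one line at a time.
- Extracting the arguments from the workflow YAML is left out.
- All providers are merged instead of checked per environment.

The secret names in the usage note are hypothetical.

```python
from __future__ import annotations

import json
import re

_PARAM = re.compile(r"\s*param\s+([A-Za-z_]\w*)\s+(.+)$")


def required_params(bicep_text: str) -> set[str]:
    """Params declared without '= default' on the same line (single-line declarations only)."""
    required: set[str] = set()
    for line in bicep_text.splitlines():
        match = _PARAM.match(line)
        if match and "=" not in match.group(2):
            required.add(match.group(1))
    return required


def provided_params(parameter_files: list[str], cli_overrides: list[str]) -> set[str]:
    """Names supplied by parameters JSON files or by `--parameters name=value` arguments."""
    names: set[str] = set()
    for text in parameter_files:
        names.update(json.loads(text).get("parameters", {}))
    for arg in cli_overrides:
        names.add(arg.split("=", 1)[0])
    return names


def missing_providers(
    bicep_text: str, parameter_files: list[str], cli_overrides: list[str]
) -> list[str]:
    return sorted(required_params(bicep_text) - provided_params(parameter_files, cli_overrides))
```

With the iter 32 shape (param postgresUser string and param postgresPassword string, neither supplied anywhere), missing_providers returns both names. Adding postgresUser=${{ secrets.PG_USER }}-style overrides clears them.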

[1.18.13] - 2026-04-28

Added

New CI job doc-staleness: scans every README and other shipped doc for "Scaffolding only", "Build sprint pending", "coming in the next sprint" markers, and fails the build when any are found. This is the structural fix for the drift pattern caught in iter 29 (root README) and iter 38 (4 worker READMEs): docs that were stuck at "v0.1.0 scaffolding" status 17 versions later. The exclusion list covers docs/adr/_template (templates can have placeholders), CHANGELOG.md (historical entries about scaffolding releases are valid), and user_manual/ (intentional interim state until Playwright auto-capture lands).
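The doc-staleness job's implementation is not shown here. The Python sketch below illustrates the scan-with-exclusions idea using the three markers and three exclusions named above. The *.md glob, case-sensitive substring matching and path-prefix exclusion are assumptions. A CI wrapper would fail the build whenever find_stale returns anything.

```python
from __future__ import annotations

from pathlib import Path

MARKERS = ("Scaffolding only", "Build sprint pending", "coming in the next sprint")
EXCLUDED_PREFIXES = ("docs/adr/_template", "CHANGELOG.md", "user_manual/")


def is_excluded(rel_path: str) -> bool:
    return rel_path.startswith(EXCLUDED_PREFIXES)


def scan_text(rel_path: str, text: str) -> list[tuple[str, int, str]]:
    """(path, line number, marker) for every staleness marker in one file."""
    if is_excluded(rel_path):
        return []
    return [
        (rel_path, lineno, marker)
        for lineno, line in enumerate(text.splitlines(), start=1)
        for marker in MARKERS
        if marker in line
    ]


def find_stale(root: Path) -> list[tuple[str, int, str]]:
    """Scan every Markdown file under root, using repo-relative POSIX paths."""
    hits: list[tuple[str, int, str]] = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        hits.extend(scan_text(rel, path.read_text(encoding="utf-8")))
    return hits
```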

[1.18.12] - 2026-04-28

Changed

4 worker / app READMEs updated from "Scaffolding only" / "Build sprint pending" to reflect actual implementation state. Same pattern as iter 29 (root README was stale at v0.1.0): each subsystem's own README had drifted on its own copy of the status text. Updated:
- apps/Alexandria.Web/README.md: lists actual page files and how the BFF forwards Entra tokens.
- apps/Alexandria.Api/README.md: replaced "planned endpoints" list with the actual Endpoints/*.cs mapping; mentioned the v1.18.2 round-trip test pin.
- apps/Alexandria.Drainer/README.md: notes the docker-compose default profile + fake-pop3 fixture wiring.
- apps/Alexandria.Parser/README.md: removed "TBD" note about link fetcher (split into Alexandria.LinkFetcher in v1.8.0); added v1.16.0 override consultation note.

[1.18.11] - 2026-04-28

Added

CI now runs dotnet format alexandria.sln --verify-no-changes between build and test, matching the ruff check pattern on the Python side. Future format drift fails fast in CI rather than accumulating quietly.

Fixed

libs/Alexandria.Core.Tests/Fixtures/GoldenEmlTests.cs: 3 [InlineData] rows had aligned whitespace (multi-space-separated columns) that dotnet format rejects. Reformatted to single-space.

[1.18.10] - 2026-04-28

Fixed

Critical CI gap: search-py's mypy src step (one of three Python CI gates, alongside ruff and pytest) was failing with 8 errors but no PR had been blocked by it because the local-dev install path skipped mypy and the gate was only reached on CI runs that landed during low-noise periods. Result: every recent PR was technically failing the type-check gate. Fixed:
- Added [[tool.mypy.overrides]] for frontmatter, pyarrow, lancedb (third-party packages with no py.typed marker, no library stubs available).
- ollama_client._cosine: cast result to float (sum() can return Any when operating on a generator over float multiplications).
- index_store.message_count: cast count_rows() to int.
- main._configure_logging: explicit BoundLogger annotation on the local so the return type isn't inferred as Any.
- main._lifespan: added AsyncIterator[None] return annotation that strict mypy was demanding for an async context manager.

[1.18.9] - 2026-04-28

Changed

README.md: status block updated from "v1.18.2" snapshot to "v1.18.x" with the full test totals across both runtimes (299 tests). Local-dev section now accurately describes what compose starts (after iter 33's drainer+parser inclusion) and what stays out (Web/Api on host for hot-reload). Added a "Common dev tasks" cheatsheet covering the most-used build/test/CLI commands so newcomers don't have to hunt through runbooks for the first 30 minutes.

[1.18.8] - 2026-04-28

Added

.env.example: documented SENDGRID_API_KEY and ALEXANDRIA_PORTAL_URL, which the digest services read at startup. Without them in the example, a developer cloning the repo would not know the digest workers expect them. The workers fall back gracefully when SENDGRID_API_KEY is empty, but without ALEXANDRIA_PORTAL_URL the deep-link URL in digest emails defaults to localhost.

[1.18.7] - 2026-04-28

Added

docker-compose.yml: Drainer and Parser services were missing despite the README claiming docker compose up runs "Postgres, Python sidecar, fake POP3 server, drainer, parser". Added them as default (non-profiled) services so the local end-to-end loop (POP3 → drain → parse → index) actually runs out of the box. LinkFetcher remains under the workers profile (intentional — it's an optional augmentation, not a critical path).

[1.18.6] - 2026-04-28

Fixed

Critical CD bugs: Bicep main.bicep declares param postgresUser and param postgresPassword (required, no defaults), but neither parameter file nor either CD workflow provided them. az deployment group create would have failed with "missing required parameter". Both cd-staging.yml and cd-production.yml now pass them via secrets.
cd-production.yml was missing a setup-dotnet@v4 step before its migrations run (same bug as v1.18.5 fixed in cd-staging).

Added

docs/runbooks/github-actions-secrets.md — reference for which secrets the deploy pipelines need, why there are three secrets per environment for Postgres (separate Bicep params + a connection string for migrations), and how to rotate.
docs/runbooks/README.md: turned the "Planned runbooks" list into linked references to the actual files (every listed runbook already exists).

[1.18.5] - 2026-04-28

Fixed

Critical CD bug: cd-staging.yml was passing --build-arg PUBLISH_DIR=... to Dockerfile.dotnet, but the Dockerfile expected PROJECT=.... The docker build would have run dotnet publish with an empty PROJECT and failed. The bug was invisible because the staging deploy hadn't fired since these workflows landed. Local development was unaffected because docker-compose.yml passes PROJECT correctly.
Dockerfile.dotnet was using mcr.microsoft.com/dotnet/sdk:9.0 for the build stage even though the project targets net8.0. Now pinned to sdk:8.0 to match iteration 30's CI fix and avoid SDK version drift.
cd-staging.yml deploy job ran migrations with dotnet run but never set up the .NET SDK — would have failed at runtime. Added a setup-dotnet@v4 step.

Changed

cd-staging.yml build-images job: removed the redundant host-side dotnet publish steps now that the Dockerfile build stage handles it. Single SDK pin in the Dockerfile is the source of truth; the workflow no longer duplicates it.

[1.18.4] - 2026-04-28

Fixed

CI workflows pinned to .NET 8.0.x to match the project's <TargetFramework>net8.0</TargetFramework>. They previously installed the .NET 9.0.x SDK. Builds still worked because newer SDKs can build older target frameworks, but the drift was a hygiene risk for anyone reading the workflows to deduce the target framework. Affected: .github/workflows/ci.yml, cd-staging.yml, docs.yml.

Changed

Search-py: ran ruff check . --fix across the suite. Auto-fixed 40 issues (mostly from datetime import UTC modernisation and removal of unused fixture variables) and hand-fixed the remaining 5: a duplicate compound assertion in test_cli_contract.py, unused locals in test_chunker.py/test_rescan_unmatched.py, and a function name with uppercase _AND that violated PEP 8. 57/57 pytest tests still pass.

[1.18.3] - 2026-04-28

Added

CHANGELOG.md capturing v0.1.0 → v1.18.3 history. The repo had 18 tagged releases but no changelog, forcing operators to read commit messages.

Changed

README.md Status section: reset from "v0.1.0 — pre-build scaffolding. No working code yet." (out-of-date by 17 versions) to reflect the v1.18.2 reality (49/49 Api, 57/57 search-py, 38/38 CLI tests passing; all 19 ADR decisions implemented).

[1.18.2] - 2026-04-28

Fixed

Critical: SearchSidecarClient (Alexandria.Api) used JsonSerializerDefaults.Web (camelCase), but the Python sidecar emits snake_case. Every field whose JSON key contained an underscore silently deserialized to its default value: search hits had empty messageId/subject/sentAt, /api/health reported lancedbReady=false even when LanceDB was healthy, and SentAfter/SentBefore filters were serialized as sentAfter/sentBefore (which Pydantic ignored), so date-range filtering was a no-op. Fix: PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower.
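A simplified Python model of the failure mode. camel_case and snake_case_lower approximate the two naming policies for plain PascalCase names; System.Text.Json's real SnakeCaseLower also handles acronyms and digits. bind mimics the case-insensitive property matching of the Web defaults. SearchHit and its fields are hypothetical stand-ins for the C# records.

```python
import re
from collections.abc import Callable
from dataclasses import dataclass, fields
from typing import Any


def camel_case(name: str) -> str:
    # Web-defaults policy: PascalCase property -> camelCase key (MessageId -> messageId).
    return name[:1].lower() + name[1:]


def snake_case_lower(name: str) -> str:
    # Simplified SnakeCaseLower: MessageId -> message_id (splits before capitals only).
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


@dataclass
class SearchHit:
    # Hypothetical subset of a search hit; PascalCase names mirror the C# record.
    MessageId: str = ""
    SentAt: str = ""
    Score: float = 0.0


def bind(payload: dict[str, Any], policy: Callable[[str], str]) -> SearchHit:
    # Web defaults match property names case-insensitively, modelled by lowercasing.
    by_key = {key.lower(): value for key, value in payload.items()}
    values: dict[str, Any] = {}
    for f in fields(SearchHit):
        wire_name = policy(f.name).lower()
        if wire_name in by_key:
            values[f.name] = by_key[wire_name]
        # No match: no error is raised; the property silently keeps its default.
    return SearchHit(**values)
```

Under camel_case, the single-word key score still binds, while message_id and sent_at do not. This matches the "only underscored keys break" symptom. The same mismatch in the outbound direction is why sentAfter/sentBefore never reached the sidecar's filter fields.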

Added

SidecarJsonRoundTripTests: feeds actual sidecar JSON shapes into JsonSerializer.Deserialize and asserts every field populates. Pins the wire format so future drift fails CI.
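A Python analog of the pin; the real test is C#. It deserializes a captured payload and fails if any field is still at its default. HealthResponse, its fields other than lancedb_ready, and the fixture values are hypothetical. The fixture values are deliberately chosen to differ from the defaults, so a silently unbound key cannot pass.

```python
import json
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class HealthResponse:
    # Hypothetical shape of the sidecar's /health body.
    status: str = ""
    lancedb_ready: bool = False
    message_count: int = 0


# Captured sidecar payload, stored verbatim as a test fixture (illustrative values).
HEALTH_FIXTURE = '{"status": "ok", "lancedb_ready": true, "message_count": 1234}'


def unpopulated_fields(obj: Any) -> list[str]:
    """Names of fields still equal to their dataclass default after deserialization."""
    return [f.name for f in fields(obj) if getattr(obj, f.name) == f.default]


def test_health_round_trip() -> None:
    raw = json.loads(HEALTH_FIXTURE)
    known = {f.name for f in fields(HealthResponse)}
    # Extra guard: unknown wire keys fail loudly instead of being dropped.
    assert set(raw) <= known, f"unexpected keys: {set(raw) - known}"
    parsed = HealthResponse(**raw)
    assert unpopulated_fields(parsed) == []
```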

[1.18.1] - 2026-04-28

Added

CLI ↔ sidecar contract tests covering all five routes the operator CLI calls (GET /messages, GET /health, POST /index/upsert, POST /index/rebuild, POST /index/rescan-unmatched). Catches "ghost commit" failures (where a commit message claims a code change that did not actually land) by asserting the route exists, the parameters are accepted, and the response shape carries the keys the CLI binds to.

Fixed

test_walk_skips_raw_subtree: brittle '_raw' not in path.as_posix() check matched the pytest temp dir name. Now asserts on path relative to corpus_root.
test_search_stub_responds: created TestClient without with, so the FastAPI lifespan did not run and app.state.searcher was never set. Now uses TestClient as a context manager and stubs the searcher.
test_apply_filters_date_range_is_inclusive_at_lower_bound: the assertion contradicted both the test name and the implementation's actual behavior (the boundary row IS kept). It now asserts that the boundary row is kept.
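Two of the fixes above, sketched in isolation. pytest derives tmp_path from the test name, so for test_walk_skips_raw_subtree the temp directory itself contains "_raw". The path below is a representative example of that naming. apply_date_filter is a hypothetical, lower-bound-only stand-in for the sidecar's filter logic.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Any

# Representative pytest tmp_path: note "_raw" inside the test-derived directory name.
corpus_root = PurePosixPath("/tmp/pytest-of-ci/pytest-0/test_walk_skips_raw_subtree0/corpus")
walked = corpus_root / "acme-weekly" / "2026" / "msg.md"


def brittle_check(path: PurePosixPath) -> bool:
    # Old assertion: substring search over the whole absolute path.
    return "_raw" not in path.as_posix()


def robust_check(path: PurePosixPath, root: PurePosixPath) -> bool:
    # New assertion: only inspect path components below corpus_root.
    return "_raw" not in path.relative_to(root).parts


def apply_date_filter(rows: list[dict[str, Any]], sent_after: datetime) -> list[dict[str, Any]]:
    # Lower bound is inclusive: a row sent exactly at sent_after is kept.
    return [row for row in rows if row["sent_at"] >= sent_after]
```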

[1.18.0] - 2026-04-28

Added

GET /messages enumeration endpoint on the sidecar (the iteration 25 commit that claimed this had silently failed; the CLI's messages list-ids command was calling a route that did not exist). Filters by publication, subject_like, sent_after, sent_before; pages with limit/offset; sorts by sent_at desc.
external_id column on the LanceDB messages table (nullable for legacy rows). The indexer populates it from CorpusMessage.external_id; the new /messages route reads it from LanceDB on the hot path, falling back to a frontmatter disk read for rows persisted before v1.17.0. Avoids a forced quarterly-drill rebuild.
--dry-run flag for alexandria publications reclassify-ids. Previews insert/update/no-op counts plus a sample of affected external_ids by joining against the existing message_publication_overrides table.
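An in-memory sketch of the semantics described above; the real route queries LanceDB, and the dry run joins against Postgres. Row, list_messages and plan_reclassify are hypothetical names. The default limit of 50, case-insensitive substring matching for subject_like, and inclusive bounds on both dates are all assumptions for illustration.

```python
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Row:
    # Hypothetical projection of a LanceDB messages row.
    external_id: Optional[str]  # None for legacy rows written before v1.17.0
    corpus_path: str
    publication: str
    subject: str
    sent_at: datetime


def list_messages(
    rows: list[Row],
    read_frontmatter_id: Callable[[str], str],
    *,
    publication: Optional[str] = None,
    subject_like: Optional[str] = None,
    sent_after: Optional[datetime] = None,
    sent_before: Optional[datetime] = None,
    limit: int = 50,
    offset: int = 0,
) -> list[str]:
    def keep(r: Row) -> bool:
        return (
            (publication is None or r.publication == publication)
            and (subject_like is None or subject_like.lower() in r.subject.lower())
            and (sent_after is None or r.sent_at >= sent_after)
            and (sent_before is None or r.sent_at <= sent_before)
        )

    # Newest first, then page with offset/limit.
    ordered = sorted(filter(keep, rows), key=lambda r: r.sent_at, reverse=True)
    page = ordered[offset : offset + limit]
    # Hot path reads the column; legacy rows fall back to a frontmatter read from disk.
    return [r.external_id or read_frontmatter_id(r.corpus_path) for r in page]


def plan_reclassify(
    external_ids: list[str], target: str, overrides: dict[str, str], sample_size: int = 5
) -> tuple[dict[str, int], list[str]]:
    """Dry-run preview: insert (no override yet), update (other publication), no-op (same)."""
    counts = {"insert": 0, "update": 0, "no-op": 0}
    affected: list[str] = []
    for ext_id in external_ids:
        current = overrides.get(ext_id)
        if current == target:
            counts["no-op"] += 1
            continue
        counts["insert" if current is None else "update"] += 1
        if len(affected) < sample_size:
            affected.append(ext_id)
    return counts, affected
```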

Fixed

CLI MessagesCommand deserialization: JsonSerializerDefaults.Web produced camelCase, but the Pydantic sidecar emits snake_case. Setting PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower makes the C# records bind correctly.

[1.17.0] - 2026-04-27

Added

Operator CLI: messages list-ids, publications reclassify-ids. Curator workflow for reclassifying a specific list of messages by RFC 2822 Message-ID. Pipe messages list-ids output directly into publications reclassify-ids. (NB: the GET /messages route this depends on did not actually ship in v1.17.0; v1.18.0 repaired the gap.)

[1.16.0] - 2026-04-27

Added

Parser consults message_publication_overrides (V008) before falling back to publications.match_rules. Manual curator decisions win over auto-classification.
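A sketch of the precedence, assuming two things: overrides are keyed by external_id, and publications are checked in a fixed order. MatchRules, Message and classify are hypothetical names. The three rule kinds come from the v0.3.0 matcher, but the matching semantics here (lowercased addresses, case-insensitive subject regexes) are assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MatchRules:
    # Rule kinds from the v0.3.0 publication matcher; semantics below are assumed.
    from_addresses: set[str] = field(default_factory=set)  # stored lowercased
    list_ids: set[str] = field(default_factory=set)
    subject_patterns: list[str] = field(default_factory=list)  # case-insensitive regexes


@dataclass
class Message:
    external_id: str
    from_address: str
    list_id: Optional[str]
    subject: str


def matches(rules: MatchRules, msg: Message) -> bool:
    return (
        msg.from_address.lower() in rules.from_addresses
        or (msg.list_id is not None and msg.list_id in rules.list_ids)
        or any(re.search(p, msg.subject, re.IGNORECASE) for p in rules.subject_patterns)
    )


def classify(
    msg: Message, overrides: dict[str, str], publications: dict[str, MatchRules]
) -> Optional[str]:
    # 1. A curator override (message_publication_overrides) wins outright.
    if msg.external_id in overrides:
        return overrides[msg.external_id]
    # 2. Otherwise, the first publication whose match_rules fire.
    for slug, rules in publications.items():
        if matches(rules, msg):
            return slug
    # 3. No decision: the caller treats the message as unmatched.
    return None
```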

[1.15.0] - 2026-04-27

Added

V008 migration: first-class message_publication_overrides table.
Sidecar /index/rescan-unmatched: walks /corpus/_unmatched/ and forces a re-pass through the indexer. The CLI's publications rescan <slug> calls this.

[1.14.0] - 2026-04-27

Added

Operator CLI: publications rescan, publications reclassify (subject-pattern scope), intakes add, intakes deactivate.

[1.13.0] - 2026-04-26

Added

Operator handbook (docs/runbooks/operator-handbook.md).
PAT pepper-binding regression test.
CLI smoke-test harness.

[1.12.0] - 2026-04-26

Added

alexandria operator CLI tool (tools/Alexandria.Cli) with the intakes, publications, messages, pats, sidecar, audit subcommands.

Fixed

"Captured options" anti-pattern: IOptions<T> values read synchronously in Program.cs before app.Build() captured a stale snapshot. As a result, configuration that test factories added via ConfigureAppConfiguration was never applied. Options are now read via IOptions<T> inside factory delegates, at resolution time.

[1.11.0] - 2026-04-26

Added

PAT bearer end-to-end tests.
Workers (parser, link-fetcher) in docker-compose.yml.
Sidecar negative-path tests.

[1.10.0] - 2026-04-26

Added

Link-fetcher full-loop test.
Sidecar contract tests (structural).
Security policy doc (SECURITY.md).

[1.9.0] - 2026-04-26

Added

Link-fetcher in Bicep infrastructure.
LanceDB e2e tests.
Brand DOCX template for client deliverables.

[1.8.0] - 2026-04-26

Added

Link-fetcher worker service (out-of-band content fetching, all domains).
Playwright spec for the screenshot path.

[1.7.0] - 2026-04-26

Added

User manual (user_manual/ in DOCX/MD/PDF, branded).
CODEOWNERS, branch protection runbook, PR template.

[1.6.0] - 2026-04-26

Added

Project glossary (docs/glossary.md).
Credential rotation log.
Link-pipeline + image-enricher HTTP tests.

[1.5.0] - 2026-04-26

Added

Playwright tests for new Blazor pages.
BenchmarkDotNet harness for parser perf.
DocFX API reference site.

[1.4.0] - 2026-04-26

Added

API auth-scheme tests.
Rate-limit exhaustion test.
Authenticated happy-path tests.

[1.3.0] - 2026-04-26

Added

Saved-search digest tests.
API integration tests.
Two more EML fixtures.

[1.2.0] - 2026-04-26

Added

Cmd+K palette in the Blazor UI.
Saved-search digest worker.
Sidecar indexer + searcher tests.

[1.1.0] - 2026-04-26

Added

Saved searches, triage queue, message detail page.
Rate limiting on /api/search.

[1.0.0] - 2026-04-26

Added

Link-fetcher + image enricher (kimi-k2.5:cloud OCR per ADR 0018).
LanceDB rebuild integration tests.
Playwright UI spec.

[0.9.0] - 2026-04-26

Added

Real LanceDB hybrid search (BM25 + vector + RRF, k=60 per ADR 0006).
Indexer (chunks → embeddings → upsert).
Ollama client (embed + rerank, with cosine fallback).
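A minimal sketch of reciprocal rank fusion with k=60, as ADR 0006 specifies. It assumes each retriever (BM25, vector) returns document ids best-first. This is the textbook RRF formula, not necessarily the sidecar's exact code path.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """score(d) = sum over rankings of 1 / (k + rank(d)), with 1-based ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A larger k flattens the gap between top and lower ranks, so a document that both retrievers rank moderately well can beat one that only a single retriever ranks first.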

[0.8.0] - 2026-04-26

Added

Bicep infrastructure (Container Apps, Postgres, Blob NFS v3 in South Africa North per ADR 0009).
Golden EML test fixtures.
OpenTelemetry instrumentation across the stack.

[0.7.0] - 2026-04-26

Added

Blazor Server + MudBlazor UI shell.
GitHub Actions CI/CD pipeline.
Operational runbooks (LanceDB rebuild, PAT rotation, disaster recovery, etc.).

[0.6.0] - 2026-04-26

Added

Alexandria.Auth library: Entra OIDC + PAT bearer authentication, role policies.

[0.5.0] - 2026-04-26

Added

Drainer worker (POP3 → _raw/ EML cache via MailKit).
Parser worker (_raw/ → corpus markdown via the v0.4.0 ingest pipeline).
Entra app-registration scripts.

[0.4.0] - 2026-04-26

Added

Full EML → Markdown ingest pipeline: AngleSharp HTML pre-cleaner, ReverseMarkdown.NET, YamlDotNet frontmatter writer with Iso8601DateTimeOffsetConverter, deterministic ULID from external_id.
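The entry does not say how the ULID is derived from external_id. One plausible scheme, sketched here purely as an assumption, takes the 48-bit millisecond timestamp from the message's sent date and the 80 random bits from a SHA-256 hash of external_id. The same message then always maps to the same id while ids still sort by time. deterministic_ulid and the use of sent_at are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Crockford base32 alphabet used by ULIDs (omits I, L, O, U); ascending ASCII order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def deterministic_ulid(external_id: str, sent_at: datetime) -> str:
    """Hypothetical scheme: 48-bit ms timestamp from sent_at, 80 bits from SHA-256(external_id)."""
    # sent_at must be timezone-aware; timedelta // timedelta yields whole milliseconds.
    ms = (sent_at - EPOCH) // timedelta(milliseconds=1)
    if not 0 <= ms < 2**48:
        raise ValueError("sent_at outside the ULID timestamp range")
    entropy = int.from_bytes(hashlib.sha256(external_id.encode("utf-8")).digest()[:10], "big")
    value = (ms << 80) | entropy  # 128-bit ULID value
    # 26 base32 digits hold 130 bits (top two always zero); emit most significant first.
    digits = []
    for _ in range(26):
        digits.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(digits))
```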

[0.3.0] - 2026-04-26

Added

Publication matcher (FromAddresses/ListIds/SubjectPatterns rules).
HTML pre-cleaner.
HTML → Markdown conversion.
Link extractor.

[0.2.0] - 2026-04-26

Added

Initial scaffold: .NET solution, frontmatter library, DbUp migrations V001–V003, docker-compose stack, Python sidecar skeleton.

[0.1.0] - 2026-04-26

Added

19 ADRs documenting the architecture decisions agreed during the §13 / Gap interview.
