Skip to content

Data

This chapter prescribes the data layer for a Go service: the database engine, the query toolchain, the migration tool, the encryption-at-rest column shape, the discipline that keeps JSONB fields tractable, soft-delete and tenant-scoping invariants, and the LISTEN / NOTIFY pattern used for in-process eventing. The RFC 2119 keywords used here (MUST, MUST NOT, SHOULD, SHOULD NOT, MAY) carry the senses defined in chapter 00-introduction.md. The prescriptions in this chapter populate the store.go layer defined in 01-architecture.md.

TL;DR

  • PostgreSQL is the canonical relational engine. Adopting orgs MAY swap engines but MUST account for the loss of advisory locks, LISTEN / NOTIFY, and JSONB.
  • sqlc generates typed accessors from annotated .sql files; hand-written query strings SHOULD NOT appear in domain code.
  • goose runs migrations from an embed.FS and MUST acquire a PostgreSQL advisory lock before applying any migration.
  • Encrypted columns MUST carry a (key_id, algorithm, ciphertext) shape so that keys MAY be rotated and algorithms MAY evolve without rewriting the storage layer.
  • Soft-deleted rows MUST be filtered with deleted_at IS NULL in every read path; multi-tenant tables MUST be filtered by org_id (or the equivalent tenant key) in every read path.

Why this choice

  • PostgreSQL offers JSONB, LISTEN / NOTIFY, advisory locks, proper transactional DDL, and a mature ecosystem of pure-Go drivers. These features simplify the rest of the stack and remove the need for several auxiliary services.
  • sqlc turns annotated SQL into typed Go accessors. The generated code keeps domain logic free of string-interpolated SQL, avoids ORM overhead, and surfaces type drift at compile time rather than at runtime.
  • goose is a thin migration tool with first-class support for embed.FS, advisory locks, and timestamped migration files. It MUST be paired with the advisory-lock guard so that two replicas starting simultaneously cannot race to apply the same migration.
  • (key_id, algorithm, ciphertext) columns make key rotation a row-by-row rewrite rather than a downtime event. A new key MAY be introduced with a new key_id while the prior key is still available for decrypting historical rows.
  • JSONB discipline prevents the field from becoming an ungoverned dumping ground. A documented schema per JSONB column keeps the field as ergonomic as a typed column without losing flexibility.
  • LISTEN / NOTIFY offers a low-latency in-process event channel without standing up a separate message broker for workloads where messages MAY be lost on restart.

Prescriptive guidance

Engine and driver

  • Services MUST default to PostgreSQL 16 or newer.
  • Services MUST use pgx (v5) as the driver and connection pool. The standard library database/sql adapter SHOULD NOT be used for new code because it loses pgx's typed-parameter benefits.
  • Services SHOULD wrap the pool with an OpenTelemetry tracer (such as otelpgx) and a query-level tracer that records the call site for slow-query diagnosis. The two tracers MAY be composed via a multi-tracer adapter.
  • The database pool MUST be constructed in the composition root and injected into stores; package-level *pgxpool.Pool globals MUST NOT be used.

sqlc workflow

  • Queries MUST be authored as annotated SQL files under <repo>/internal/server/database/queries/<domain>.sql.
  • Each query MUST carry an -- name: <Name> :<kind> annotation (:one, :many, :exec, :execrows, etc.) so sqlc generates a typed Go function with the correct return shape.
  • sqlc generate MUST run as part of the build pipeline. The generated *.sql.go files MUST be checked in so that consumers who do not run sqlc still build the project.
  • Domain code MUST NOT contain hand-written query strings. Dynamic queries (for example, optional filter combinations) MAY use a SQL builder such as squirrel, but the use MUST be justified in code review and SHOULD be replaced with sqlc- generated queries when the variant set is bounded.

goose migrations

  • Migrations MUST live under <repo>/internal/server/database/migrations/.
  • Migration files MUST follow goose's NNNNNNNNNNNN_<slug>.sql naming with explicit -- +goose Up and -- +goose Down blocks.
  • The migration runner MUST embed the migrations directory via //go:embed so the binary ships migrations without a separate artifact.
  • The migration runner MUST acquire a PostgreSQL advisory lock (for example, pg_advisory_lock(<numeric_key>)) before applying migrations. The lock MUST be released after the run whether the run succeeds or fails.
  • Migrations MUST be reversible when feasible. A destructive Down step MAY be omitted only when documented in the migration body and reviewed.
  • Migrations MUST run before the HTTP / RPC server starts accepting traffic; a failed migration MUST cause the binary to exit non-zero.

Encryption-at-rest column shape

Columns holding encrypted user data MUST be stored as (key_id, algorithm, ciphertext) triples. The minimum shape:

CREATE TABLE secrets (
    id           uuid PRIMARY KEY,
    org_id       uuid NOT NULL,
    key_id       text NOT NULL,
    algorithm    text NOT NULL,
    ciphertext   bytea NOT NULL,
    nonce        bytea NOT NULL,
    created_at   timestamptz NOT NULL DEFAULT now(),
    deleted_at   timestamptz
);
  • key_id MUST identify the wrapping key version so that a new key MAY be introduced without losing the ability to decrypt historical rows.
  • algorithm MUST identify the algorithm (for example, AES-256-GCM) so that algorithms MAY evolve over time.
  • nonce MUST be stored when the algorithm requires it; nonces MUST be unique per (key_id, ciphertext) pair.
  • Encryption SHOULD use envelope encryption with a key-management service holding the key-encryption key; the data-encryption keys SHOULD be stored alongside the ciphertext or derived from the wrapping key on demand.

JSONB discipline

  • Each JSONB column MUST have a documented schema, ideally authored as a JSON Schema file under <repo>/internal/server/database/jsonb-schemas/<column>.schema.json.
  • Domain code MUST validate JSONB writes against the schema before insert. Stale rows MAY violate the schema; readers MUST be prepared either to migrate the row in place or to reject it with a typed error.
  • Indexes on JSONB fields MUST be created with the appropriate GIN operator class (jsonb_path_ops for containment queries, the default operator class for ? / ?& / ?| queries).
  • JSONB columns MUST NOT be used as a replacement for missing schema discipline. A field that is always read with a fixed shape MUST be promoted to a typed column.

Soft delete and tenant scoping

  • Tables holding user data MUST include a deleted_at timestamptz NULL column. Deletes MUST set deleted_at = now() rather than removing the row.
  • Every read query against a soft-deletable table MUST include WHERE deleted_at IS NULL (or a sqlc-annotated equivalent). Reviewers MUST treat the filter as part of the query's signature; a missing filter is a defect, not a stylistic preference.
  • Multi-tenant tables MUST include an org_id column (or the organization's equivalent tenant key). Every read and write MUST scope by org_id. Cross-tenant reads MUST be explicit, justified, and reviewed.
  • Foreign-key constraints across tenants SHOULD be denied at the schema level; cross-tenant references SHOULD use opaque identifiers that are validated at the application layer rather than enforced by referential integrity.

LISTEN / NOTIFY for in-process eventing

  • Services that need low-latency intra-cluster eventing MAY use PostgreSQL LISTEN / NOTIFY.
  • Notifications MUST be treated as best-effort; consumers MUST reconcile state from the canonical row on receipt rather than rely on the notification payload as the source of truth.
  • A dedicated listener connection MUST be held outside the main pool so that a long-running LISTEN does not exhaust the application's pool.
  • Channel names MUST be namespaced by domain (for example, credentials_changed rather than changed) to avoid collision across services that share a database.
sequenceDiagram
    participant Writer as Service (writer)
    participant DB as PostgreSQL
    participant Listener as Service (listener)
    Writer->>DB: BEGIN; UPDATE row ...; NOTIFY channel, 'row_id'
    DB-->>Writer: COMMIT
    DB-->>Listener: NOTIFY channel, payload
    Listener->>DB: SELECT row WHERE id = payload AND deleted_at IS NULL AND org_id = ?
    DB-->>Listener: canonical row
    Listener->>Listener: apply side effect (idempotent)

Reference Implementation: Pioneer

The data prescriptions are exercised end to end in the canonical reference implementation:

  • pioneer/sqlc.yaml configures sqlc to generate accessors from annotated SQL with pgx/v5-compatible types, explicit int64 handling, and per-package output directories that align with the three-layer domain shape.
  • pioneer/internal/server/database/pool.go builds the pgx pool, layers an otelpgx tracer on top, and composes a project-local DBQueryTracer for query-level instrumentation via a multi-tracer adapter.
  • pioneer/internal/server/database/queries/credentials.sql shows the -- name: ... :one / :many annotation convention, tenant scoping via org_id, and soft-delete filters via deleted_at IS NULL applied uniformly across read paths.

Pinned versions (snapshot 2026-05-08)

Component Version Notes
PostgreSQL 16.x Logical replication and JSONB stable
pgx v5.x Pure-Go driver and pool
otelpgx latest stable OpenTelemetry instrumentation for pgx
sqlc v1.27.x or newer pgx/v5 target
goose v3.x Embedded migrations, advisory lock
squirrel v1.5.x Dynamic filtering when sqlc variants explode

Adopting orgs MUST validate these versions against their own toolchain constraints at adoption time. The snapshot date above matches last-reviewed in this chapter's frontmatter.

Pitfalls

  • Missing advisory lock during goose run. Two replicas starting concurrently MAY race to apply the same migration. The runner MUST acquire pg_advisory_lock(<key>) for the duration of the run, including the case where the runner exits exceptionally.
  • Encrypting without a key_id column. Storing only the ciphertext forecloses key rotation. New tables MUST carry (key_id, algorithm, ciphertext) (plus nonce when the algorithm requires it) from the first migration.
  • JSONB as a free-for-all. Without a documented schema, JSONB drifts to the point where readers cannot make assumptions. Each JSONB column MUST have a schema file and writer-side validation; readers MUST handle stale rows explicitly.
  • Forgetting deleted_at IS NULL on a single query. A read path that omits the filter quietly resurrects deleted rows. Reviewers MUST treat the filter as part of the type signature of any query against a soft-deletable table.
  • Cross-tenant reads without an explicit override. A query that omits org_id returns rows for every tenant. Reviewers MUST require an explicit, justified override on any cross-tenant read, and MUST verify that the override is logged.
  • Holding LISTEN connections inside the main pool. A LISTEN pins a connection for its lifetime; left in the main pool it starves application queries. The listener MUST hold a dedicated connection outside the application pool.
  • Notification payloads treated as the source of truth. Notifications are best-effort; consumers MUST refetch the canonical row before acting.

See also

  • 00-introduction.md — RFC 2119 keyword definitions and doc conventions used throughout this chapter.
  • 01-architecture.md — the three-layer pattern whose store.go files call into the accessors prescribed here.
  • 03-surface.md — the RPC surface that calls into the domain services backed by this data layer.
  • 06-security.md — authentication and authorization that tie into the org_id scoping prescribed in this chapter (sibling casting).
  • sqlc documentation: https://docs.sqlc.dev/
  • goose documentation: https://github.com/pressly/goose
  • pgx documentation: https://pkg.go.dev/github.com/jackc/pgx/v5