Diamond DAG: Why BigBrotr's Architecture Works

When designing BigBrotr, we faced a fundamental question: how do you build a system that discovers, monitors, and archives data from thousands of relays without creating a distributed systems nightmare?

The obvious approach is microservices with message queues. Seeder publishes to a queue, Finder consumes and publishes, Validator consumes and publishes, and so on. Each service would have its own database, its own deployment, and its own failure modes.

We rejected this approach for BigBrotr. Here’s why:

  1. Operational complexity — message queues add another system to monitor, tune, and debug.
  2. Ordering guarantees — relay discovery and validation have natural ordering that’s trivial in a database but complex in a message queue.
  3. Query flexibility — analytics across the full dataset require all data in one place.
  4. Debugging — SELECT * FROM relay WHERE url = '...' is easier than tracing messages through queue logs.

Instead, all six services share a single PostgreSQL database. One service writes, another reads. The database handles consistency, durability, and concurrency.

This means:

  • No message loss — data is in PostgreSQL with ACID guarantees.
  • No ordering issues — queries read the current state.
  • No backpressure — services run at their own pace.
  • Simple debugging — all data is queryable with SQL.
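The coordination pattern can be sketched in a few lines. This is an illustrative sketch, not BigBrotr code: it uses SQLite standing in for PostgreSQL, and the table and column names (`relay`, `validated`) are assumptions for the example. One function plays the writer (Seeder), another the reader (Validator); the table itself is the only coordination point.

```python
import sqlite3

# Shared database as the coordination point (SQLite stands in for
# PostgreSQL here; table and column names are illustrative).
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE relay (
        url TEXT PRIMARY KEY,
        validated INTEGER NOT NULL DEFAULT 0
    )
""")

def seeder(urls):
    # Writer: insert newly discovered relays; duplicates are a no-op.
    db.executemany(
        "INSERT OR IGNORE INTO relay (url) VALUES (?)",
        [(u,) for u in urls],
    )
    db.commit()

def validator():
    # Reader: pick up whatever is currently unvalidated. No queue,
    # no ordering protocol -- just the current state of the table.
    rows = db.execute(
        "SELECT url FROM relay WHERE validated = 0"
    ).fetchall()
    for (url,) in rows:
        db.execute("UPDATE relay SET validated = 1 WHERE url = ?", (url,))
    db.commit()
    return [u for (u,) in rows]

seeder(["wss://relay.example.com", "wss://relay.example.com"])
print(validator())  # the single, deduplicated relay
```

If the Validator crashes, nothing is lost: the unvalidated rows are still there when it restarts, which is exactly the "no message loss" property above.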

The codebase itself follows a strict dependency structure:

        services
       /   |   \
   core  nips  utils
       \   |   /
        models

Imports only flow downward. This eliminates circular dependencies and makes the architecture self-documenting. When you look at a file’s imports, you immediately know where it sits in the hierarchy.

The “diamond” shape comes from the convergence at models — all middle-layer packages depend on the same foundation.
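The downward-only rule is simple enough to express as a table. The checker below is a hypothetical sketch (not BigBrotr tooling) that assigns each package from the diagram a layer and allows an import only when it points strictly downward; it treats same-layer imports (e.g. core importing nips) as forbidden, which the strict downward rule implies.

```python
# Layer map for the diamond DAG (package names from the article;
# the checker itself is an illustrative sketch, not BigBrotr code).
LAYER = {
    "models": 0,                       # shared foundation
    "core": 1, "nips": 1, "utils": 1,  # middle layer
    "services": 2,                     # top of the diamond
}

def import_allowed(importer: str, imported: str) -> bool:
    """Imports may only flow downward in the hierarchy."""
    return LAYER[importer] > LAYER[imported]

assert import_allowed("services", "core")
assert import_allowed("core", "models")
assert not import_allowed("models", "utils")  # upward: forbidden
assert not import_allowed("core", "nips")     # sideways: forbidden
```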

Each service inherits from BaseService[ConfigT] and implements a single async def run() method. They:

  • Run as separate OS processes
  • Have independent configuration files
  • Connect to the database through PgBouncer
  • Scale independently (run multiple Finders, skip the Synchronizer)
  • Fail independently (Monitor crash doesn’t affect Validator)
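A minimal sketch of the service pattern, assuming the shape the article describes (a generic base class parameterized by a config type, with one `async def run()` entry point). The `FinderConfig` fields are invented for the example and are not real BigBrotr options.

```python
import asyncio
from dataclasses import dataclass
from typing import Generic, TypeVar

ConfigT = TypeVar("ConfigT")

class BaseService(Generic[ConfigT]):
    # Sketch of the BaseService[ConfigT] pattern: hold a typed config,
    # require subclasses to implement run().
    def __init__(self, config: ConfigT) -> None:
        self.config = config

    async def run(self) -> None:
        raise NotImplementedError

@dataclass(frozen=True)
class FinderConfig:
    poll_interval: float  # illustrative field, not a real option

class Finder(BaseService[FinderConfig]):
    async def run(self) -> None:
        # Real services loop forever; one iteration suffices here.
        await asyncio.sleep(0)
        print(f"polling every {self.config.poll_interval}s")

asyncio.run(Finder(FinderConfig(poll_interval=60.0)).run())
```

Because each service is just a process that calls `asyncio.run(...)` on its own instance, "scale independently" is literally `start more processes`, and "fail independently" falls out for free.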

There’s no orchestrator, no service mesh, no dependency injection framework. Just processes reading from and writing to a database.

A few takeaways from building BigBrotr this way:

  1. Start simple — a shared database beats a message queue for most data processing workloads.
  2. Enforce boundaries — the diamond DAG prevents architectural erosion over time.
  3. Content-address everything — SHA-256 deduplication eliminates an entire class of data consistency bugs.
  4. Fail fast — frozen dataclasses with __post_init__ validation catch errors at construction, not at use.
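Points 3 and 4 combine naturally in one class. The sketch below is illustrative (field names and the hash payload are assumptions, not BigBrotr's actual schema): a frozen dataclass validates in `__post_init__`, so an invalid object can never be constructed, and its identity is the SHA-256 of its content, so storing the same data twice yields the same key.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    pubkey: str
    content: str

    def __post_init__(self) -> None:
        # Fail fast: errors surface at construction, not at use.
        if len(self.pubkey) != 64:
            raise ValueError("pubkey must be 64 hex characters")

    @property
    def id(self) -> str:
        # Content address: same content -> same id, so duplicate
        # ingestion deduplicates itself.
        payload = f"{self.pubkey}:{self.content}".encode()
        return hashlib.sha256(payload).hexdigest()

e1 = Event(pubkey="a" * 64, content="hello")
e2 = Event(pubkey="a" * 64, content="hello")
assert e1.id == e2.id  # identical content, identical address
```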

The full architecture documentation is in the Architecture Overview.