Dec 23, 2025

Blackship Architecture: State Machines, Dependency Graphs, and Resilience Patterns

architecture state-machines freebsd jails resilience design-patterns

Most jail managers track state with flags. is_running: bool. Maybe a PID file. When something goes wrong, you’re left guessing: Is it starting? Stopping? Half-crashed? Should I kill it?

Blackship takes a different approach. Every jail is a state machine with explicit transitions. Every startup sequence respects a dependency graph. Every restart uses circuit breakers with exponential backoff. Here’s how it works.

The Jail State Machine¶

stateDiagram-v2
    [*] --> Stopped
    Stopped --> Starting: start()
    Starting --> Running: started()
    Running --> Stopping: stop()
    Stopping --> Stopped: stopped()

    Starting --> Failed: fail()
    Running --> Failed: fail()
    Stopping --> Failed: fail()

    Failed --> Stopped: recover()

Five states. Six events. Every transition is explicit.

Why This Matters¶

Consider what happens when you run blackship up web:

Stopped → Starting: The start() event triggers. Hooks with phase = "pre_start" execute.
Starting → Running: After jail creation succeeds, started() fires. Hooks with phase = "post_start" execute.
If anything fails: The fail() event moves the jail to the Failed state. No ambiguity.

Compare to flag-based systems:

# Typical approach
jail.is_running = True
jail.start()  # What if this fails halfway?
# Now is_running is True but the jail isn't actually running

With explicit state machines, invalid transitions are rejected:

blackship> stop web
Error: Cannot stop jail 'web' - current state is Stopped (expected Running)

You can’t stop something that isn’t running. You can’t start something that’s already starting. The state machine enforces this.
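
Here is a minimal sketch of that enforcement as a transition table. It is Python for illustration only, not Blackship's own code, and the names `Jail`, `fire`, `TRANSITIONS`, and `InvalidTransition` are hypothetical. The states and events are the ones from the diagram above.

```python
from enum import Enum


class State(Enum):
    STOPPED = "Stopped"
    STARTING = "Starting"
    RUNNING = "Running"
    STOPPING = "Stopping"
    FAILED = "Failed"


# (current state, event) -> next state, mirroring the state diagram.
TRANSITIONS = {
    (State.STOPPED, "start"): State.STARTING,
    (State.STARTING, "started"): State.RUNNING,
    (State.RUNNING, "stop"): State.STOPPING,
    (State.STOPPING, "stopped"): State.STOPPED,
    (State.STARTING, "fail"): State.FAILED,
    (State.RUNNING, "fail"): State.FAILED,
    (State.STOPPING, "fail"): State.FAILED,
    (State.FAILED, "recover"): State.STOPPED,
}


class InvalidTransition(Exception):
    """Raised when an event is not allowed in the current state."""


class Jail:
    def __init__(self, name):
        self.name = name
        self.state = State.STOPPED

    def fire(self, event):
        """Apply an event, or raise if the table has no matching transition."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise InvalidTransition(
                f"Cannot {event} jail '{self.name}' - "
                f"current state is {self.state.value}"
            )
        self.state = TRANSITIONS[key]
        return self.state
```

Anything not listed in the table is rejected, so `Jail("web").fire("stop")` raises instead of quietly flipping a flag.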

Dynamic Dispatch Mode¶

The state machine uses dynamic dispatch at runtime, allowing external events (health check failures, manual commands, supervisor signals) to drive transitions. This is the difference between:

Compile-time state machines: Good for protocols with fixed sequences
Runtime state machines: Good for reactive systems where external events arrive unpredictably

Jails are reactive. A jail can fail at any moment. A user can stop it at any moment. Health checks run on intervals. Dynamic dispatch handles all of this.

Dependency Graphs with Topological Ordering¶

When jails depend on each other, order matters. You can’t start your app before the database. You shouldn’t stop the database while the app is using it.

Blackship uses petgraph to build a directed acyclic graph (DAG) of jail dependencies:

[[jails]]
name = "app"
depends_on = ["cache", "database"]

[[jails]]
name = "cache"
depends_on = ["database"]

[[jails]]
name = "database"

This creates:

graph LR
    database --> cache
    database --> app
    cache --> app

Startup Order (Topological Sort)¶

When you run blackship up app, the dependency graph is walked:

Find all transitive dependencies of app
Topologically sort them
Start in order: database → cache → app

Each jail waits for its dependencies to reach the Running state before starting.
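
The three steps can be sketched with Python's standard `graphlib` module (Python 3.9+). This is an illustration, not Blackship's petgraph code. `DEPENDS_ON`, `transitive_deps`, and `start_order` are hypothetical names, and the data is copied from the example manifest.

```python
from graphlib import TopologicalSorter

# depends_on entries from the example manifest.
DEPENDS_ON = {
    "app": ["cache", "database"],
    "cache": ["database"],
    "database": [],
}


def transitive_deps(target, depends_on):
    """Return target plus everything it depends on, directly or indirectly."""
    seen = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(depends_on.get(name, []))
    return seen


def start_order(target, depends_on):
    """Order the needed jails so every dependency comes before its dependents."""
    needed = transitive_deps(target, depends_on)
    # TopologicalSorter takes {node: predecessors}; predecessors are emitted first.
    graph = {name: depends_on.get(name, []) for name in needed}
    return list(TopologicalSorter(graph).static_order())
```

With this data, `start_order("app", DEPENDS_ON)` returns `["database", "cache", "app"]`, the same order as above.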

Shutdown Order (Reverse Topological Sort)¶

When you run blackship down database:

Find all jails that depend on database (reverse dependencies)
Topologically sort (reversed)
Stop in order: app → cache → database

This ensures nothing is stopped while something else depends on it.
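
A sketch of the reverse walk, again in plain Python with hypothetical names (`dependents_of`, `stop_order`) and the example manifest's data. It collects everything that depends on the target, sorts that subset in startup order, and reverses it.

```python
from graphlib import TopologicalSorter

# depends_on entries from the example manifest.
DEPENDS_ON = {
    "app": ["cache", "database"],
    "cache": ["database"],
    "database": [],
}


def dependents_of(target, depends_on):
    """Return target plus every jail that depends on it, directly or indirectly."""
    found = {target}
    changed = True
    while changed:
        changed = False
        for name, deps in depends_on.items():
            if name not in found and any(dep in found for dep in deps):
                found.add(name)
                changed = True
    return found


def stop_order(target, depends_on):
    """Stop dependents first: the reverse of their startup order."""
    affected = dependents_of(target, depends_on)
    # Keep only edges between affected jails, then sort and reverse.
    graph = {
        name: [dep for dep in depends_on.get(name, []) if dep in affected]
        for name in affected
    }
    return list(TopologicalSorter(graph).static_order())[::-1]
```

Stopping `database` yields `["app", "cache", "database"]`. Stopping `app`, which nothing depends on, yields just `["app"]`.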

Cycle Detection¶

Circular dependencies are caught at config validation:

[[jails]]
name = "a"
depends_on = ["b"]

[[jails]]
name = "b"
depends_on = ["a"]  # Error: Cycle detected

blackship check
Error: Dependency cycle detected: a → b → a
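
For illustration, the same check can be sketched with Python's `graphlib`, which raises `CycleError` when the graph cannot be sorted. The function name `check_cycles` is hypothetical, and the order of nodes in the message follows graphlib's convention, which may differ from Blackship's output.

```python
from graphlib import CycleError, TopologicalSorter


def check_cycles(depends_on):
    """Return None if the graph is acyclic, else a message naming one cycle."""
    sorter = TopologicalSorter(depends_on)
    try:
        sorter.prepare()  # raises CycleError if any cycle exists
    except CycleError as err:
        # err.args[1] is a list of nodes; its first and last entries are equal.
        cycle = err.args[1]
        return "Dependency cycle detected: " + " → ".join(cycle)
    return None
```

Running the check at validation time means a bad manifest fails before any jail is touched.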

The Warden: Resilience Through Circuit Breakers¶

When a jail crashes, the naive approach is: restart immediately. Forever.

This creates restart loops. The jail crashes, restarts, crashes in 100ms, restarts, crashes, restarts… CPU spins. Logs fill up. Nothing improves.

The Warden (Blackship’s supervisor) implements three resilience patterns:

1. Exponential Backoff¶

Attempt 1: Wait 1 second
Attempt 2: Wait 2 seconds
Attempt 3: Wait 4 seconds
Attempt 4: Wait 8 seconds
...
Attempt N: Wait min(2^(N-1), 60) seconds

Each delay is jittered by ±50% to prevent a thundering herd when multiple jails fail simultaneously.
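
A minimal sketch of the delay calculation, with hypothetical function names. The exponent is the attempt number minus one, so attempt 1 waits 1 second as listed. Applying the 60-second cap before the jitter is an assumption; the text does not say which comes first.

```python
import random


def base_delay(attempt, cap=60.0):
    """Un-jittered delay for a 1-based attempt number: 1, 2, 4, 8, ... capped."""
    return min(2.0 ** (attempt - 1), cap)


def jittered_delay(attempt, jitter=0.5, cap=60.0, rng=random):
    """Scale the base delay by a random factor in [1 - jitter, 1 + jitter]."""
    factor = 1.0 + rng.uniform(-jitter, jitter)
    return base_delay(attempt, cap) * factor
```

Passing `rng=random.Random(seed)` makes the jitter reproducible in tests.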

2. Circuit Breaker¶

After 5 consecutive failures, the circuit opens:

stateDiagram-v2
    [*] --> CLOSED
    CLOSED --> OPEN: 5 failures
    OPEN --> HALF_OPEN: 5 minutes timeout
    HALF_OPEN --> CLOSED: success
    HALF_OPEN --> OPEN: failure

    note right of CLOSED: Normal operation\nRestarts allowed
    note right of OPEN: No restarts\nWaiting for timeout
    note right of HALF_OPEN: Test one restart

When the circuit is open, no restart attempts are made. This prevents wasting resources on a jail that clearly can’t run.

After 5 minutes, the circuit moves to half-open. One restart attempt is made. If it succeeds, we’re back to normal. If it fails, the circuit opens again.
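
The three-state breaker can be sketched in a few lines. This is an illustration under stated assumptions, not the Warden's implementation: the class and method names are hypothetical, and the clock is injectable so the 5-minute timeout can be tested without waiting.

```python
import time


class CircuitBreaker:
    def __init__(self, threshold=5, timeout=300.0, clock=time.monotonic):
        self.threshold = threshold  # consecutive failures before opening
        self.timeout = timeout      # seconds to stay open (5 minutes)
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def allow_restart(self):
        """Return True if a restart attempt may be made right now."""
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.timeout:
            self.state = "HALF_OPEN"  # timeout elapsed: permit a trial restart
        return self.state != "OPEN"

    def record_success(self):
        """A restart worked: close the circuit and reset the counter."""
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        """A restart failed: open on the threshold or on a failed trial."""
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

One simplification to note: this sketch leaves it to the caller to make only one attempt while the state is `HALF_OPEN`.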

3. Per-Jail State Tracking¶

Each jail has its own:

Attempt counter
Backoff calculator
Circuit breaker

A failing Redis jail doesn’t affect the PostgreSQL jail’s restart behavior. Isolation at every level.
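
The isolation boils down to keying restart state by jail name. A tiny sketch, with hypothetical names (`RestartState`, `restart_states`, `record_failure`) and only the failure threshold modeled:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RestartState:
    attempts: int = 0        # consecutive failed restarts for this jail
    circuit: str = "CLOSED"  # CLOSED, OPEN, or HALF_OPEN


# One independent record per jail name, created on first use.
restart_states = defaultdict(RestartState)


def record_failure(jail, threshold=5):
    """Count a failure against one jail only; open its circuit at the threshold."""
    state = restart_states[jail]
    state.attempts += 1
    if state.attempts >= threshold:
        state.circuit = "OPEN"
    return state
```

Five failures recorded for `"redis"` open that jail's circuit, while `restart_states["postgres"]` still reads zero attempts and `CLOSED`.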

Combining the Patterns¶

Jail 'web' crashes
├── Attempt 1: Wait 1.2s (jittered), restart → fails
├── Attempt 2: Wait 2.4s, restart → fails
├── Attempt 3: Wait 4.1s, restart → fails
├── Attempt 4: Wait 8.3s, restart → fails
├── Attempt 5: Wait 15.9s, restart → fails
├── Circuit OPENS (5 failures reached)
├── No restarts for 5 minutes
├── Circuit HALF-OPEN
├── Attempt 6: restart → succeeds!
├── Circuit CLOSED, attempt counter reset
└── Normal operation

Lifecycle Hooks: Extensibility Without Complexity¶

Hooks run at defined phases. Each hook specifies:

Phase: When to run (pre_start, post_start, pre_stop, post_stop, etc.)
Target: Where to run (host or jail)
Command: What to run
On Failure: What to do if it fails (abort or continue)

[[jails.hooks]]
phase = "post_start"
target = "jail"
command = "/etc/rc.d/nginx start"
on_failure = "abort"

[[jails.hooks]]
phase = "pre_stop"
target = "jail"
command = "/etc/rc.d/nginx stop"
on_failure = "continue"

Execution Flow¶

sequenceDiagram
    participant CLI as blackship up
    participant SM as State Machine
    participant Hooks as Hook Runner
    participant Jail as Jail FFI
    participant Net as Network

    CLI->>SM: start()
    SM->>SM: Stopped → Starting
    SM->>Hooks: pre_start (host)
    Hooks-->>SM: ok
    SM->>Jail: jail_set()
    Jail-->>SM: jid
    SM->>Hooks: post_create
    Hooks-->>SM: ok
    SM->>Net: setup VNET
    Net-->>SM: ok
    SM->>Hooks: pre_start (jail)
    Hooks-->>SM: ok
    SM->>SM: started()
    SM->>SM: Starting → Running
    SM->>Hooks: post_start (jail)
    Hooks-->>SM: ok
    SM-->>CLI: Running

If any hook with on_failure = "abort" fails, the entire operation aborts and the jail transitions to Failed.
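
The abort-or-continue rule can be sketched as a small loop. The function name `run_hooks` and the injected `run` callable are hypothetical; hooks are plain dicts with the same keys as the TOML above.

```python
def run_hooks(hooks, phase, run):
    """Run every hook for `phase`, in order.

    `run` takes a hook dict and returns its exit code. Returns False as soon
    as a hook with on_failure = "abort" fails, True otherwise.
    """
    for hook in hooks:
        if hook["phase"] != phase:
            continue
        exit_code = run(hook)
        if exit_code != 0 and hook["on_failure"] == "abort":
            return False  # the caller would fire fail() here
        # on_failure = "continue": the failure is tolerated, keep going
    return True
```

Injecting `run` keeps the policy separate from how a command actually executes on the host or inside the jail.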

Variable Substitution¶

Hooks support variable substitution:

command = "/path/to/script --jail ${JAIL_NAME} --path ${JAIL_PATH}"

Available variables: JAIL_NAME, JAIL_PATH, JAIL_IP, JAIL_HOSTNAME, custom environment variables.
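
For illustration, Python's `string.Template` uses the same `${NAME}` syntax, so the substitution step can be sketched as below. The helper name `expand` and the example path value are hypothetical, and `Template` also accepts `$NAME` without braces, which Blackship may not.

```python
from string import Template


def expand(command, variables):
    """Replace ${NAME} placeholders; raises KeyError for unknown names."""
    return Template(command).substitute(variables)


# Example values are made up for this sketch.
example = expand(
    "/path/to/script --jail ${JAIL_NAME} --path ${JAIL_PATH}",
    {"JAIL_NAME": "web", "JAIL_PATH": "/jails/web"},
)
```

Here `example` is `"/path/to/script --jail web --path /jails/web"`.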

ZFS Integration: Not Bolted On¶

ZFS isn’t an afterthought. The entire data model assumes ZFS:

zroot/blackship/
├── jails/
│   ├── web/
│   ├── postgres/
│   └── redis/
├── releases/
│   └── 15.0-RELEASE/
└── cache/

Snapshots as First-Class Operations¶

blackship snapshot create web pre-upgrade

This creates zroot/blackship/jails/web@pre-upgrade. Atomic. Consistent. No tar.gz nonsense.

Clones for Testing¶

blackship clone web@pre-upgrade web-test

This creates zroot/blackship/jails/web-test as a clone of the snapshot. Copy-on-write. Instant. Uses almost no additional disk space until you make changes.

Export with ZFS Send¶

blackship export web -o backup.zfs --zfs-send

Uses zfs send to create a stream. Faster than tar. ZFS properties survive the round trip when the stream is sent with properties included (zfs send -p).

Import with ZFS Receive¶

blackship import backup.zfs --name web-restored

Auto-detects format (tar.zst or ZFS stream) and handles appropriately.

VNET Networking Architecture¶

graph TB
    subgraph Host["Host System"]
        bridge["blackship0 Bridge<br/>gateway: 10.0.1.1"]
        epair0a["epair0a"]
        epair1a["epair1a"]
    end

    subgraph web["Jail: web"]
        epair0b["epair0b<br/>10.0.1.10"]
    end

    subgraph db["Jail: db"]
        epair1b["epair1b<br/>10.0.1.11"]
    end

    bridge --- epair0a
    bridge --- epair1a
    epair0a <--> epair0b
    epair1a <--> epair1b

Each jail gets:

An epair interface (virtual ethernet pair)
One end attached to the bridge (host-side)
One end inside the jail
Static IP on the jail-side interface
Gateway pointing to the bridge IP

PF Integration via Anchors¶

Port forwarding uses PF anchors, so /etc/pf.conf needs only a one-time edit:

# Added to /etc/pf.conf once
rdr-anchor "blackship"
anchor "blackship"

Blackship manages rules inside the anchor:

blackship expose web -p 80
# Adds: rdr pass on $ext_if proto tcp to port 80 -> 10.0.1.10 port 80

blackship expose web -p 443 -I 192.168.1.100
# Adds: rdr pass on $ext_if proto tcp from any to 192.168.1.100 port 443 -> 10.0.1.10 port 443

No manual PF editing. No config file conflicts.

The Bridge: Central Orchestrator¶

All operations go through the Bridge (not the network bridge - the orchestration component):

graph TB
    subgraph Bridge["Bridge (Central Orchestrator)"]
        manifest["Manifest<br/>(TOML config)"]
        network["Network<br/>Manager"]
        zfs["ZFS<br/>Manager"]
        hooks["Hook<br/>Runner"]
        health["Health<br/>Checker"]
        ffi["Jail<br/>FFI"]
    end

    subgraph Warden["Warden (Supervisor)"]
        backoff["Exponential<br/>Backoff"]
        breaker["Circuit<br/>Breaker"]
        restart["Restart<br/>Logic"]
    end

    Bridge --> Warden
    health --> Warden

The Bridge:

Loads and validates the manifest (TOML config)
Builds the dependency graph
Coordinates with Network Manager for VNET setup
Delegates to ZFS Manager for dataset operations
Runs hooks at appropriate lifecycle phases
Calls Jail FFI for actual jail operations
Reports events to the Warden for supervision

Health Check Architecture¶

Health checks are command-based. Exit code determines health:

Exit 0: Healthy
Exit non-zero: Unhealthy

[[jails.healthcheck.checks]]
name = "http"
command = "curl -sf http://localhost:8080/health"
target = "jail"
interval = 30
timeout = 10
retries = 3

Execution Model¶

Health checks run on separate threads (via crossbeam)
Each check has its own timeout
After retries consecutive failures, the jail is marked unhealthy
Unhealthy status is reported to the Warden
Warden applies restart logic with circuit breaker
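
A sketch of the exit-code rule and the consecutive-failure count, under stated assumptions: `run_check` and `HealthTracker` are hypothetical names, the command runs on the host through the shell (the `target = "jail"` path is not modeled), and a timeout counts as a failure.

```python
import subprocess


def run_check(command, timeout):
    """Run one check; healthy means exit code 0 within the timeout."""
    try:
        result = subprocess.run(
            command,
            shell=True,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


class HealthTracker:
    def __init__(self, retries=3):
        self.retries = retries
        self.consecutive_failures = 0

    def record(self, healthy):
        """Update the streak; return True once the jail counts as unhealthy."""
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.retries
```

A single healthy result resets the streak, so only `retries` failures in a row trigger a report to the supervisor.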

Target Semantics¶

target = "host": Command runs on the host, can check external ports
target = "jail": Command runs inside the jail via jexec

Direct Kernel Communication¶

Blackship doesn’t shell out to jail(8) or ifconfig(8) for core operations. It talks directly to the kernel via jail(2) and ioctl(2) syscalls.

What this means for you:

Faster startup when launching multiple jails
No parsing command output that changes between FreeBSD versions
Health checks don’t spawn processes every 30 seconds

What still uses commands:

ZFS operations (zfs(8)) - no stable public kernel API available
PF rules (pfctl(8)) - anchor-based, doesn’t touch /etc/pf.conf

Key Architectural Decisions¶

1. State Machines Over Flags¶

Flags lie. A boolean is_running doesn’t capture “starting”, “stopping”, or “failed but still has a PID”. State machines make these explicit.

2. Graphs Over Lists¶

Dependencies aren’t flat. A depends on B and C. B depends on C. Representing this as a graph allows proper ordering, cycle detection, and transitive dependency resolution.

3. Circuit Breakers Over Infinite Retries¶

Systems fail. Sometimes they can’t be fixed by restarting. Circuit breakers recognize this and stop trying, preserving resources for jails that can actually run.

4. ZFS Native Over Abstraction Layers¶

Many tools treat ZFS as optional. Blackship assumes ZFS for its data model. Snapshots, clones, and send/receive are first-class operations, not afterthoughts.

5. Hooks Over Magic¶

Instead of hardcoding nginx startup or PostgreSQL initialization, hooks let users define what happens at each lifecycle phase. Maximum flexibility, zero magic.

That’s the architecture. State machines for lifecycle. Graphs for dependencies. Circuit breakers for resilience. ZFS for storage. Hooks for extensibility.

GitHub | Full Documentation

