ACE

Atomic Constraint Engine

A construction project is millions of constraints, each waiting on the world. ACE simulates every one — and when a constraint has no data, it refuses to invent a date.

Full Monte Carlo over the whole network. Dark atoms block hard. The forecast can never be more confident than your real information. Runs in this browser — nothing leaves your device.

Throughput · 3 atoms → 3 billion

Does it scale? On the GPU?

All-resident solvers (JS, GPU) hold the whole network in memory — they hit a wall in the low millions, marked in red. Fused streaming + GPU holds only a moving window and ships each window to the GPU: it simulates all the way to 3 billion atoms while memory stays flat. Simulate the whole thing; keep only the active window; spill the rest.



[Chart: throughput from 3 atoms to 3 billion. Series: JS (all-resident), GPU (all-resident), Fused stream+GPU (flat memory). Red marker: memory wall.]
One Monte Carlo sample

A constraint with no data kills everything behind it

An atom is ready when every gate it holds has flipped — all its predecessors finished. Then its own work takes a sampled duration. A dark atom (no data from the world) never finishes, and everything downstream goes dark with it. That's the engine refusing to fake a date.
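The rule above can be sketched as a single forward pass. This is a minimal illustration, not ACE's actual code: the column names (`gateStart`, `gateSrc`, `durLo`, `durHi`, `hasData`), the uniform duration draw, and the seeded `mulberry32` generator are all assumptions made for the example, and atoms are assumed to be stored in topological order.

```javascript
// Sketch of one Monte Carlo sample. All names here are illustrative assumptions.
const DARK = -1; // sentinel: a real finish time is never negative

// Small seeded PRNG (mulberry32) so a sample can be reproduced.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Atoms are assumed to be in topological order: every gate points at a lower index.
function simulateSample(net, rng) {
  const n = net.hasData.length;
  const finish = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    let ready = 0;                       // time at which the last gate flips
    let dark = net.hasData[i] === 0;     // no data from the world
    for (let g = net.gateStart[i]; g < net.gateStart[i + 1] && !dark; g++) {
      const f = finish[net.gateSrc[g]];
      if (f === DARK) dark = true;       // one dark predecessor is enough
      else if (f > ready) ready = f;     // wait for the slowest predecessor
    }
    // Assumed duration model: uniform between durLo and durHi.
    const d = net.durLo[i] + rng() * (net.durHi[i] - net.durLo[i]);
    finish[i] = dark ? DARK : ready + d;
  }
  return finish;
}

// Chain 0 -> 1 -> 2, a dark atom 3, and atom 4 waiting on atoms 2 and 3.
const demoNet = {
  gateStart: Uint32Array.of(0, 0, 1, 2, 2, 4), // gates of atom i: gateSrc[gateStart[i] .. gateStart[i+1])
  gateSrc: Uint32Array.of(0, 1, 2, 3),
  durLo: Float32Array.of(2, 3, 1, 5, 1),
  durHi: Float32Array.of(2, 3, 1, 5, 1),       // lo === hi keeps the demo deterministic
  hasData: Uint8Array.of(1, 1, 1, 0, 1),
};
const demoFinish = simulateSample(demoNet, mulberry32(1));
```

In `demoNet`, atom 3 has no data, so atom 4 is dark as well even though its other predecessor finished.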

Why two loops

Streaming for memory, samples for speed

The outer loop walks the network in windows. Only one window is ever in memory, so peak RAM is set by the window size rather than the project size: a 3-billion-atom project costs the same RAM as a 3-thousand-atom one. The inner axis is the samples: every Monte Carlo run is independent, so each window ships to the GPU as thousands of parallel threads. Flat memory comes from the outer loop; speed comes from the inner one.
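The two loops can be sketched in plain JavaScript. This is a deliberately simplified illustration: it assumes the network is one long chain, so the only state crossing a window boundary is a single finish time per sample. The `chainWindows` generator and `simulateStream` function are hypothetical names, and the inner loop runs serially here where the real engine would use GPU threads.

```javascript
// Sketch: outer loop streams windows, inner loop runs independent samples.
const DARK = -1;

// Hypothetical window source. It yields one window at a time, so the full
// network is never materialized. darkAt marks a window whose first atom has no data.
function* chainWindows(windowCount, atomsPerWindow, darkAt = -1) {
  for (let w = 0; w < windowCount; w++) {
    const hasData = new Uint8Array(atomsPerWindow).fill(1);
    if (w === darkAt) hasData[0] = 0;
    yield {
      durLo: new Float32Array(atomsPerWindow).fill(1),
      durHi: new Float32Array(atomsPerWindow).fill(1),
      hasData,
    };
  }
}

function simulateStream(windows, sampleCount, rng = Math.random) {
  // One carried finish time per sample: the only state that crosses windows.
  const frontier = new Float64Array(sampleCount);
  let maxResidentAtoms = 0;
  for (const win of windows) {                 // outer loop: bounds memory
    const n = win.hasData.length;
    maxResidentAtoms = Math.max(maxResidentAtoms, n);
    for (let s = 0; s < sampleCount; s++) {    // inner axis: one GPU thread each
      let t = frontier[s];
      for (let i = 0; i < n && t !== DARK; i++) {
        if (win.hasData[i] === 0) t = DARK;    // dark atom blocks the rest
        else t += win.durLo[i] + rng() * (win.durHi[i] - win.durLo[i]);
      }
      frontier[s] = t;
    }
  }
  return { frontier, maxResidentAtoms };
}

// 1000 windows of 10 atoms each: 10,000 atoms simulated, 10 resident at a time.
const streamed = simulateStream(chainWindows(1000, 10), 4);
```

Growing `windowCount` changes how long the run takes but not `maxResidentAtoms`, which is the property the outer loop is there to provide.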

One citizen, stored as columns

The atom

There is exactly one type. An atom is identity + state + the gates it holds — its inputs only. It never tracks what depends on it; that's derived by inverting the inputs once. For speed and GPU-portability, atoms aren't objects — they're columns of typed arrays. This is the form the engine actually runs.
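A minimal sketch of what such columns might look like, and of the one-time inversion that derives dependents from inputs. The column names, the `state` encoding, and the helper functions `buildColumns` and `invertGates` are assumptions for illustration; only the idea (typed-array columns, inputs stored, outputs derived) comes from the description above.

```javascript
// Sketch: atoms as columns of typed arrays. Index i is the atom's identity.
// Gates (inputs) are stored in CSR form: the gates of atom i are
// gateSrc[gateStart[i] .. gateStart[i + 1]).
function buildColumns(predLists) {
  const n = predLists.length;
  const gateStart = new Uint32Array(n + 1);
  for (let i = 0; i < n; i++) gateStart[i + 1] = gateStart[i] + predLists[i].length;
  const gateSrc = new Uint32Array(gateStart[n]);
  for (let i = 0; i < n; i++) gateSrc.set(predLists[i], gateStart[i]);
  return {
    state: new Uint8Array(n), // assumed encoding, e.g. 0 = waiting
    gateStart,
    gateSrc,
  };
}

// Derive "what depends on me" by inverting the inputs once (a CSR transpose
// done with a counting pass followed by a scatter pass).
function invertGates(atoms) {
  const n = atoms.gateStart.length - 1;
  const succStart = new Uint32Array(n + 1);
  for (const src of atoms.gateSrc) succStart[src + 1]++;          // count dependents
  for (let i = 0; i < n; i++) succStart[i + 1] += succStart[i];   // prefix sum
  const cursor = succStart.slice(0, n);                           // next free slot per atom
  const succDst = new Uint32Array(atoms.gateSrc.length);
  for (let i = 0; i < n; i++) {
    for (let g = atoms.gateStart[i]; g < atoms.gateStart[i + 1]; g++) {
      succDst[cursor[atoms.gateSrc[g]]++] = i;
    }
  }
  return { succStart, succDst };
}

// Diamond: atom 0 feeds atoms 1 and 2, which both feed atom 3.
const atoms = buildColumns([[], [0], [0], [1, 2]]);
const dependents = invertGates(atoms);
```

Because every column is a flat typed array, the same bytes can be copied into GPU buffers without any per-object serialization.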


  
The exact shader that runs in tab 01

GPU compute kernel · WGSL

Not a mockup. One thread = one full Monte Carlo sample, walking the entire window: real gates via CSR (compressed sparse row) arrays, real dark-blocking via a −1 sentinel (a finish time can never be negative, so −1 means "missing: no data from the world"), real conjunctive deadness (an atom is dead as soon as any one of its gates is dead). Same computation as the JS path, so the P80s must agree within Monte Carlo noise.
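The agreement check between the two paths might look like the sketch below. It is an assumption-laden illustration: the nearest-rank percentile definition, the rule that any dark sample makes the P80 dark, and the `relTol` tolerance are choices made for the example, not values taken from ACE.

```javascript
// Sketch: compare the P80 finish time of two sample sets (e.g. JS path vs GPU path).
const DARK = -1;

// Nearest-rank 80th percentile. Assumed rule: if any sample is dark,
// the P80 is reported as dark rather than computed from the rest.
function p80(finishes) {
  if (finishes.length === 0) throw new Error("no samples");
  for (const f of finishes) if (f === DARK) return DARK;
  const sorted = Float64Array.from(finishes).sort(); // typed arrays sort numerically
  return sorted[Math.ceil((4 * sorted.length) / 5) - 1];
}

// relTol is a hypothetical stand-in for "Monte Carlo noise".
function p80sAgree(jsFinishes, gpuFinishes, relTol = 0.02) {
  const a = p80(jsFinishes);
  const b = p80(gpuFinishes);
  if (a === DARK || b === DARK) return a === b; // both paths must go dark together
  return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
}
```

A fixed relative tolerance is the simplest choice; a tolerance derived from the sample count would track the actual noise level more closely.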


  
The API that drives it

WebGPU dispatch

Adapter → device → buffers → bind group → dispatch → read back. Windows tile into passes so per-thread scratch never exceeds GPU memory. That tiling is the honest answer to the memory wall.
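The sequence above, sketched with the standard WebGPU API. Two caveats: the WGSL kernel here is a placeholder that only writes each thread's index (it is not ACE's shader), and `planPasses` is one hypothetical reading of the tiling, splitting a window's atoms into ranges so that total per-thread scratch stays under an assumed byte budget.

```javascript
// Sketch: tile a window into passes, then run a placeholder kernel via WebGPU.

// Assumes each thread needs 4 bytes of scratch (one f32 finish time) per atom in the pass.
function planPasses(windowAtoms, sampleCount, budgetBytes) {
  const atomsPerPass = Math.max(1, Math.floor(budgetBytes / (sampleCount * 4)));
  const passes = [];
  for (let first = 0; first < windowAtoms; first += atomsPerPass) {
    passes.push({ first, count: Math.min(atomsPerPass, windowAtoms - first) });
  }
  return passes;
}

// Placeholder kernel: each thread writes its own index. Not the real solver.
const PLACEHOLDER_WGSL = `
@group(0) @binding(0) var<storage, read_write> out : array<f32>;
@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid : vec3<u32>) {
  if (gid.x < arrayLength(&out)) { out[gid.x] = f32(gid.x); }
}`;

async function runOnGpu(threadCount) {
  if (typeof navigator === "undefined" || !navigator.gpu) return null; // no WebGPU here
  const adapter = await navigator.gpu.requestAdapter();               // adapter
  if (!adapter) return null;
  const device = await adapter.requestDevice();                       // device

  const size = threadCount * 4;                                       // buffers
  const storage = device.createBuffer({
    size,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC,
  });
  const readback = device.createBuffer({
    size,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: {
      module: device.createShaderModule({ code: PLACEHOLDER_WGSL }),
      entryPoint: "main",
    },
  });
  const bindGroup = device.createBindGroup({                          // bind group
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: storage } }],
  });

  const encoder = device.createCommandEncoder();                      // dispatch
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(threadCount / 64));
  pass.end();
  encoder.copyBufferToBuffer(storage, 0, readback, 0, size);
  device.queue.submit([encoder.finish()]);

  await readback.mapAsync(GPUMapMode.READ);                           // read back
  const result = new Float32Array(readback.getMappedRange().slice(0));
  readback.unmap();
  return result;
}
```

`runOnGpu` returns `null` where WebGPU is unavailable, so a caller can fall back to the JS path. In a real solver each entry from `planPasses` would become one dispatch, with finish times carried between passes.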