Pushing Data Processing to the Database: How We Optimized TrackLab's Device Message Pipeline
TrackLab manages thousands of solar trackers spread across farms. Each tracker sends IoT device telemetry through Network Control Units (NCUs), and every device message triggers a chain of reconciliation: resolving parent relationships, detecting device types, assigning placements. For a while, PHP handled all of it, parsing JSONB fields one at a time, looping through devices individually, doing work that the database was already better equipped to do.
This is the story of how we moved that processing to the right layer and what it taught us about where different kinds of logic belong in solar tracking software.
The insight: let PostgreSQL do what PostgreSQL does best
JSONB extraction is native to PostgreSQL. Parent resolution is a JOIN, not a loop. Once we looked at what the reconcilers were actually doing, the pattern was clear: PHP was reimplementing set operations that the database already handles efficiently.
The first move was extracting canonical JSONB field expressions into a single shared reference, one source of truth for how device message fields are accessed. Instead of scattered PHP parsing logic, every query that needed a device message field pulled from the same constants. Alongside this, parent resolution logic was consolidated into a dedicated resolver, and a clean data transfer object replaced the ad-hoc arrays that had been flowing through the system.
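To make the idea concrete, here is a sketch of what a shared canonical expression set looks like in query form. The table and field names (`device_messages`, `payload`, `serial_number`, and so on) are illustrative, not TrackLab's actual schema:

```sql
-- Hypothetical schema: device_messages(payload jsonb, ...).
-- Every query that reads a message field uses the same canonical
-- JSONB expressions, rather than re-parsing in PHP.
SELECT
    payload->>'serial_number'               AS serial_number,
    payload->>'device_type'                 AS device_type,
    (payload->'parent'->>'ncu_id')::bigint  AS parent_ncu_id,
    (payload->>'reported_at')::timestamptz  AS reported_at
FROM device_messages
WHERE payload ? 'serial_number';   -- skip malformed messages at the source
```

In the PHP layer, these expressions live as shared constants, so a field rename touches one place instead of every reconciler.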
Net result of that extraction: 334 lines of scattered parsing replaced with 251 lines of structured, reusable code.
Collapsing the reconcilers
With shared expressions and a dedicated resolver in place, the parent reconciliation processor could be simplified. The original implementation was 244 lines of PHP that processed devices individually. The rewrite collapsed it to roughly 60 lines by leaning on SQL-first queries. The line count tells part of the story.
More importantly, the granularity changed. Parent reconciliation moved from operating per-device to operating per-collector. Instead of resolving parent relationships one Tracker Controller Unit (TCU) at a time, the system now reconciles an entire collector’s worth of devices in a single pass. Fewer lines of code, fewer round trips, and a model that better matches how the hardware actually works.
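The per-collector pass is, at its core, a set operation. A hedged sketch of the shape (the `devices` and `device_messages` tables and their columns are assumptions for illustration):

```sql
-- Resolve parent relationships for an entire collector in one statement,
-- replacing a PHP loop that updated one TCU at a time.
UPDATE devices AS d
SET    parent_id = p.id
FROM   device_messages AS m
JOIN   devices AS p
       ON p.serial_number = m.payload->'parent'->>'serial_number'
WHERE  d.serial_number = m.payload->>'serial_number'
  AND  m.collector_id  = $1;   -- one round trip per collector, not per device
```

One JOIN replaces N queries, and the unit of work matches the unit of hardware.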
Placement reconciliation
With parent resolution handled cleanly, we could build something we had been wanting: automatic placement matching. When a tracker’s parent NCU changes, its farm and section placement should follow. Previously this required manual intervention. For operators managing large fleets (see our use cases), that manual step was a real bottleneck.
The new placement reconciler automatically matches a tracker’s placement to its parent NCU after link changes. It handles the edge cases that make this tricky in practice: decommissioned collectors that should not receive new placements, holding states where a device is temporarily unassigned, and link changes that happen during normal operation.
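As a sketch of how those guards can live directly in the matching query (schema and status values are hypothetical), the edge cases become predicates rather than branches:

```sql
-- Copy the parent NCU's farm/section placement to trackers whose link
-- changed, skipping retired collectors and temporarily unassigned devices.
UPDATE trackers AS t
SET    farm_id    = n.farm_id,
       section_id = n.section_id
FROM   ncus AS n
WHERE  n.id = t.parent_ncu_id
  AND  t.status <> 'holding'            -- held devices keep their placement
  AND  n.status <> 'decommissioned'     -- retired NCUs grant no new placements
  AND  (t.farm_id    IS DISTINCT FROM n.farm_id
   OR   t.section_id IS DISTINCT FROM n.section_id);  -- only touch real changes
```

The `IS DISTINCT FROM` clause keeps the statement idempotent: re-running it after a link change updates nothing the second time.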
This is the kind of feature that only becomes practical once the underlying data resolution is clean. When parent relationships are resolved reliably at the database level, placement logic can trust the results and focus on business rules rather than data integrity. That same foundation makes automation like automated stow commands possible, because safety-critical decisions depend on knowing exactly which devices belong to which controllers.
Defense at the database boundary
Moving processing closer to the database also meant we could add validation closer to the source. We pushed pre-validation directly into the PostgreSQL import function, filtering out decommissioned devices at the SQL level before PHP processing even begins.
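A minimal sketch of that pre-validation, assuming a SQL import function (the function name, tables, and status values are illustrative):

```sql
-- Hypothetical import function: decommissioned devices are filtered
-- inside PostgreSQL, before any row reaches the PHP layer.
CREATE OR REPLACE FUNCTION import_device_messages(batch jsonb)
RETURNS SETOF device_messages
LANGUAGE sql AS $$
    INSERT INTO device_messages (device_id, payload)
    SELECT d.id, msg
    FROM   jsonb_array_elements(batch) AS msg
    JOIN   devices AS d
           ON d.serial_number = msg->>'serial_number'
    WHERE  d.status <> 'decommissioned'   -- rejected at the source
    RETURNING *;
$$;
```

Messages from decommissioned devices never enter the pipeline, so no downstream layer has to remember to check for them.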
An adversarial code review (deliberately trying to break the pipeline with edge cases) uncovered additional gaps. We added guards to SQL JOINs ensuring that soft-deleted and decommissioned records cannot leak into active queries, plus input sanitization on message ID arrays passed between layers.
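The guard pattern is worth showing, because where the predicate lives matters. A hedged example against an assumed schema:

```sql
-- Lifecycle guards belong on the JOIN itself, so soft-deleted or
-- decommissioned rows can never leak into an active query.
SELECT m.*
FROM   device_messages AS m
JOIN   devices AS d
       ON  d.id = m.device_id
       AND d.deleted_at IS NULL            -- soft-deleted rows excluded
       AND d.status <> 'decommissioned'    -- lifecycle boundary enforced here
WHERE  m.id = ANY($1::bigint[]);           -- cast sanitizes the incoming ID array
```

The `::bigint[]` cast doubles as input sanitization: anything in the array that is not a valid integer fails loudly at the boundary instead of silently matching nothing.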
Defense-in-depth is not about distrusting your own code. Data flows through multiple layers, and each layer should enforce its own invariants rather than assuming upstream got it right.
Benchmarking what you have built
We had a hunch the rewrite was faster, but hunches are not engineering. We built a dedicated performance benchmarking system alongside the pipeline changes: a synthetic data generator that creates realistic device message scenarios, a stage-level profiler that measures each processing step independently, structured result objects for consistent analysis, and a full test suite validating the benchmarking infrastructure itself.
The benchmarking system is not a one-off script. It is a repeatable harness that can be run against any pipeline change to detect regressions or validate improvements. Without it, we would have shipped the rewrite hoping it was better. With it, we knew.
Idempotency as a requirement
Device commands (instructions sent back to trackers) needed the same rigor. Commands must be idempotent: sending the same command twice should not create duplicate state or conflicting instructions. This was addressed alongside the decommissioned device filtering, ensuring that the command pipeline respects the same lifecycle boundaries as the message pipeline.
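One common way to get idempotent command dispatch at the database level (a sketch under an assumed schema, not necessarily TrackLab's exact implementation) is a uniqueness constraint plus a conflict clause:

```sql
-- Assumes a unique constraint on (device_id, command_key).
-- Sending the same command twice becomes a no-op instead of a duplicate.
INSERT INTO device_commands (device_id, command_key, payload)
VALUES ($1, $2, $3)
ON CONFLICT (device_id, command_key) DO NOTHING;
```

The same lifecycle guards shown earlier apply here too: a command aimed at a decommissioned device should be rejected at the boundary, not queued.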
What we learned
The recurring theme across this work is putting processing where it belongs.
PostgreSQL has decades of optimization behind JSONB extraction, JOINs, and filtering. We were reimplementing those set operations in PHP, and the code was both slower and harder to maintain. Once we stopped asking PHP to be a database, the application layer could focus on what it actually does well: workflow coordination, permission checks, and domain logic that does not map cleanly to SQL.
Adversarial review earned its keep on this project. Optimistic code review asks “does this work?” while adversarial review asks “how can I break this?” The guards we added from deliberately trying to break our own IoT data pipeline addressed real gaps, not theoretical ones. One example: we caught a soft-deleted NCU leaking into an active JOIN because the test data happened to never include that state. A five-minute adversarial scenario saved us a production incident.
Granularity matters more than we expected. Moving from per-device to per-collector processing improved throughput. The real gain was a model that better reflects the physical topology of the system. And none of these claims would hold up without the benchmarking infrastructure to back them. Building measurement tooling felt like overhead at the time. It paid for itself within the first week.
The bigger picture
TrackLab’s system architecture is designed around the idea that each layer should do what it is best at. This pipeline refactor was a concrete application of that principle: stop asking PHP to be a database, stop asking PostgreSQL to be a business rules engine, and build the validation and measurement infrastructure to keep it honest. You can read more about how TrackLab’s engineering approach applies this thinking across the platform.
If you are interested in what TrackLab monitors and manages, explore our product features to see the system from the operator’s perspective.