Compiling Trading Signals: sigc and the Quantitative Hypothesis Pipeline

From visual signal specification to verified Rust executable — how sigc turns alpha hypotheses into production-ready code in minutes.

The Quant Research Bottleneck

Quantitative finance operates on a simple loop: form a hypothesis about market behaviour, test it against historical data, and if it survives, deploy it to trade live capital. The hypothesis is typically expressed as a signal — a function that takes market data as input and produces a trading decision as output. Simple signals might be moving average crossovers. Complex ones might involve multi-asset statistical arbitrage with regime detection.

The problem is not forming hypotheses. Experienced quants generate ideas constantly. The problem is the pipeline that connects an idea to a production deployment. In a typical firm, this pipeline looks something like the following.

A researcher writes a prototype in Python or R. It gets backtested against historical data. If the results are promising, an engineer rewrites it in C++ or Java for production. The production version is tested again to confirm it matches the prototype’s behaviour. Discrepancies are found. Debugging ensues. Weeks pass. By the time the signal is live, the market conditions that inspired it may have shifted.

This pipeline has three pathologies. First, it is slow — the idea-to-production cycle is measured in weeks, not hours. Second, it is error-prone — manual translation between languages introduces bugs that are difficult to detect because the “correct” behaviour is defined only by the prototype, which itself may have bugs. Third, it is expensive — it requires both research and engineering headcount for every signal.

sigc exists to eliminate this pipeline entirely.

What sigc Is

sigc is a compiler. It takes a signal specification as input and produces a verified Rust executable as output. The specification is defined visually through a structured interface that captures the signal’s logic, its parameters, its data dependencies, and its expected behaviour. sigc compiles this specification through a series of intermediate representations, applies formal verification passes, and emits optimised Rust code that is ready for production deployment.

The key insight is that trading signals, despite their variety, operate within a well-defined computational domain. They consume time-series data. They maintain state across time steps. They produce bounded outputs (positions, weights, scores). This domain is constrained enough that a purpose-built compiler can target it effectively, while being expressive enough to capture the vast majority of signals that quant researchers actually build.
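This constrained domain can be sketched in a few lines of Rust. The names below (Bar, Signal, Momentum) are illustrative inventions, not sigc's actual generated interface: a signal consumes one time step of market data, updates its internal state, and emits a bounded output.

```rust
/// One time step of market data (hypothetical, minimal shape).
pub struct Bar {
    pub close: f64,
}

/// The computational domain sigc targets: stateful functions over
/// time-series data with bounded outputs.
pub trait Signal {
    /// Consume one time step; return a position weight in [-1, 1].
    fn step(&mut self, bar: &Bar) -> f64;
}

/// A toy momentum signal with a single state variable.
pub struct Momentum {
    prev_close: Option<f64>,
}

impl Momentum {
    pub fn new() -> Self {
        Momentum { prev_close: None }
    }
}

impl Signal for Momentum {
    fn step(&mut self, bar: &Bar) -> f64 {
        let out = match self.prev_close {
            Some(prev) if bar.close > prev => 1.0,
            Some(_) => -1.0,
            None => 0.0, // no history yet: stay flat
        };
        self.prev_close = Some(bar.close);
        out
    }
}
```

Even this toy example exhibits all three domain properties: time-series input, state carried across steps, and an output bounded in [-1, 1].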

The Compilation Pipeline

sigc’s pipeline has five stages.

Stage 1: Signal Specification. The researcher defines the signal using a visual specification language. This is not pseudocode — it is a structured representation that captures the signal’s data flow graph, its parameters (with types and bounds), its state variables, and its output mapping. The visual interface enforces syntactic correctness by construction: you cannot create a specification that is structurally invalid.

Stage 2: Intermediate Representation. The visual specification is lowered to a typed intermediate representation (IR). The IR is a directed acyclic graph where nodes are operations (arithmetic, logical, temporal aggregations) and edges carry typed data. At this stage, sigc performs type checking, dimension analysis (ensuring that you do not add a price to a volume), and temporal consistency checks (ensuring that the signal does not reference future data).
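To make the dimension analysis concrete, here is a deliberately simplified sketch of a typed expression check. The `Dim` and `Expr` names are illustrative, not sigc's actual IR; the point is that combining incompatible dimensions is rejected at compile time.

```rust
/// Dimensions carried on IR edges (illustrative subset).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Dim {
    Price,
    Volume,
    Scalar, // dimensionless, e.g. a ratio of two prices
}

/// A toy typed expression tree standing in for the IR's data-flow graph.
enum Expr {
    Input(Dim),
    Add(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

/// Returns the dimension of an expression, or an error when two
/// incompatible dimensions are combined (e.g. price + volume).
fn check(e: &Expr) -> Result<Dim, String> {
    match e {
        Expr::Input(d) => Ok(*d),
        Expr::Add(a, b) => {
            let (da, db) = (check(a)?, check(b)?);
            if da == db {
                Ok(da)
            } else {
                Err(format!("cannot add {:?} to {:?}", da, db))
            }
        }
        Expr::Div(a, b) => {
            let (da, db) = (check(a)?, check(b)?);
            if da == db {
                Ok(Dim::Scalar) // price / price is dimensionless
            } else if db == Dim::Scalar {
                Ok(da) // scaling keeps the numerator's dimension
            } else {
                Err(format!("cannot divide {:?} by {:?}", da, db))
            }
        }
    }
}
```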

Stage 3: Optimisation. The IR undergoes a series of optimisation passes. Common subexpression elimination reduces redundant computation. Constant folding evaluates expressions that can be resolved at compile time. Temporal buffer analysis determines the minimum memory footprint required for the signal’s rolling computations. Loop fusion combines operations that iterate over the same time window.
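Constant folding, the simplest of these passes, can be illustrated with a toy recursive rewrite. The `Node` type below is a stand-in for the real IR, which the article describes as richer (temporal aggregations, typed edges); the mechanism is the same.

```rust
/// A toy IR fragment for demonstrating constant folding.
#[derive(Debug, PartialEq, Clone)]
enum Node {
    Const(f64),
    Input(usize), // index into the market-data inputs, unknown at compile time
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

/// Recursively evaluate any subtree whose operands are all constants.
fn fold(n: Node) -> Node {
    match n {
        Node::Add(a, b) => match (fold(*a), fold(*b)) {
            // Both operands known at compile time: evaluate now.
            (Node::Const(x), Node::Const(y)) => Node::Const(x + y),
            (a, b) => Node::Add(Box::new(a), Box::new(b)),
        },
        Node::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Node::Const(x), Node::Const(y)) => Node::Const(x * y),
            (a, b) => Node::Mul(Box::new(a), Box::new(b)),
        },
        other => other,
    }
}
```

Anything depending on a live `Input` is left untouched; only compile-time-resolvable subtrees collapse.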

Stage 4: Verification. This is where sigc diverges most sharply from a conventional compiler. The optimised IR is subjected to formal verification checks that go beyond type safety.

Numerical stability analysis identifies operations that are prone to floating-point issues — division by values near zero, subtraction of nearly equal large numbers, accumulation of rounding errors over long time series. Where possible, sigc rewrites these operations into numerically stable equivalents. Where it cannot, it inserts runtime guards and emits warnings.
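One well-known example of such a stable rewrite is Welford's online algorithm for variance, which avoids the catastrophic cancellation of the naive E[x²] − (E[x])² formula when accumulating over long series. The sketch below is illustrative of the category of rewrite described above, not sigc's emitted code.

```rust
/// Welford's online mean and variance: numerically stable even for
/// long series of large, nearly equal values.
struct RunningStats {
    n: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl RunningStats {
    fn new() -> Self {
        RunningStats { n: 0, mean: 0.0, m2: 0.0 }
    }

    fn update(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        // Uses the updated mean, so each increment stays well-conditioned.
        self.m2 += delta * (x - self.mean);
    }

    fn variance(&self) -> Option<f64> {
        // Sample variance is undefined for fewer than two observations.
        if self.n < 2 {
            None
        } else {
            Some(self.m2 / (self.n - 1) as f64)
        }
    }
}
```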

Lookahead detection verifies that the signal uses only data that would have been available at each historical time step. This is the single most common source of backtesting errors in quantitative research — accidentally conditioning on future information — and sigc eliminates it by construction.
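In its simplest form, lookahead detection reduces to checking the time offset of every data reference in the specification. The representation below is hypothetical: each reference carries a lag relative to the current step, and a negative lag, meaning a read from the future, is rejected.

```rust
/// A data reference in the signal specification (toy representation).
struct DataRef {
    name: &'static str,
    lag: i64, // 0 = current bar, 1 = previous bar, -1 = next bar (future!)
}

/// Reject any reference that would read data not yet available.
fn detect_lookahead(refs: &[DataRef]) -> Result<(), String> {
    for r in refs {
        if r.lag < 0 {
            return Err(format!(
                "{} references {} step(s) into the future",
                r.name, -r.lag
            ));
        }
    }
    Ok(())
}
```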

Boundary analysis proves that the signal’s output remains within declared bounds under all reachable input conditions, or identifies the input conditions under which it would not. This is critical for risk management: a signal that is supposed to produce weights in [-1, 1] must actually do so, always.
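One standard mechanism for this kind of proof is interval arithmetic: propagate the declared input ranges through each operation and compare the resulting range against the declared output bounds. The minimal sketch below is an illustration of the approach, not sigc's actual prover.

```rust
/// A closed interval [lo, hi] tracking the reachable range of a value.
#[derive(Clone, Copy, Debug)]
struct Interval {
    lo: f64,
    hi: f64,
}

impl Interval {
    /// Sum of two intervals: endpoints add.
    fn add(self, o: Interval) -> Interval {
        Interval { lo: self.lo + o.lo, hi: self.hi + o.hi }
    }

    /// Scaling by a constant; a negative factor flips the endpoints.
    fn scale(self, k: f64) -> Interval {
        let (a, b) = (self.lo * k, self.hi * k);
        Interval { lo: a.min(b), hi: a.max(b) }
    }

    /// Does every reachable value stay within the declared bounds?
    fn within(self, bounds: Interval) -> bool {
        self.lo >= bounds.lo && self.hi <= bounds.hi
    }
}
```

When `within` fails, the analysis has, in effect, identified the input conditions under which the declared bounds would be violated.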

Stage 5: Code Generation. The verified IR is compiled to Rust source code. sigc generates idiomatic Rust that leverages the type system and ownership model for additional safety. The generated code includes inline documentation tracing each section back to the original specification, making it auditable. The output is a complete Cargo project: source, tests, benchmarks, and configuration.
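The shape of the emitted code might look something like the following. This is a hand-written illustration, not actual sigc output: a state struct plus a step function, with doc comments tracing each piece back to a (hypothetical) specification node.

```rust
/// Generated from specification "ema_cross_v1" (hypothetical example).
pub struct EmaSignal {
    /// spec node: ema (exponential moving average, seeded at first close)
    ema: Option<f64>,
    /// spec parameter: alpha, bounds (0, 1]
    alpha: f64,
}

impl EmaSignal {
    pub fn new(alpha: f64) -> Self {
        EmaSignal { ema: None, alpha }
    }

    /// spec node: output = sign(close - ema), values in {-1, 0, 1}
    pub fn step(&mut self, close: f64) -> f64 {
        let ema = match self.ema {
            Some(prev) => self.alpha * close + (1.0 - self.alpha) * prev,
            None => close, // first observation seeds the average
        };
        self.ema = Some(ema);
        let diff = close - ema;
        if diff > 0.0 {
            1.0
        } else if diff < 0.0 {
            -1.0
        } else {
            0.0
        }
    }
}
```

Because each state variable and output maps to a named specification node, an auditor can read the generated source against the visual specification line by line.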

Why Rust

The choice of Rust as the compilation target is deliberate and driven by three properties.

Performance. Trading signals in production must process market data with minimal latency. Rust compiles to native code with no garbage collector, no runtime overhead, and predictable performance characteristics. For high-frequency applications, this is not optional — it is a requirement.

Correctness. Rust’s type system and ownership model prevent entire categories of bugs at compile time: null pointer dereferences, data races, use-after-free errors, buffer overflows. For code that manages financial risk, these guarantees matter. sigc’s generated code inherits these guarantees from the target language in addition to the guarantees provided by sigc’s own verification passes.

Ecosystem. Rust’s package ecosystem provides high-quality libraries for the operations that trading signals need: efficient time-series data structures, SIMD-accelerated numerical computation, low-latency network I/O for market data feeds. sigc’s generated code can link against these libraries directly.

We considered several alternative targets. C++ offers comparable performance but weaker compile-time safety guarantees. Python and Java introduce runtime overhead that is unacceptable for latency-sensitive applications. FPGA targets would provide the lowest latency, but at the cost of a dramatically more complex compilation pipeline; we may explore them in the future.

Formal Verification in Practice

Formal verification in sigc is not an academic exercise. It addresses concrete failure modes that cost real money in quantitative finance.

Consider a signal that computes a z-score: the deviation of a current value from its rolling mean, divided by its rolling standard deviation. This is one of the most common building blocks in quantitative research. It is also a numerical minefield. When the rolling window contains identical values, the standard deviation is zero, and the z-score is undefined. When the standard deviation is very small, the z-score is very large, potentially blowing through position limits.

sigc’s verification pass detects this. It identifies the division by the rolling standard deviation as a potential zero-division. It analyses the downstream consumers of the z-score to determine the impact of extreme values. It can either insert a guard (clamp the z-score, skip the division when the denominator is below a threshold) or flag the issue for the researcher to resolve manually, depending on configuration.
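A guarded z-score of the kind described might look like the sketch below. The guard style, the epsilon threshold, and the clamp band are assumptions for illustration; in sigc they are configuration choices.

```rust
/// Z-score with runtime guards: skip the division when the rolling
/// standard deviation is degenerate, and clamp extreme values so the
/// output cannot blow through position limits.
fn guarded_zscore(x: f64, mean: f64, std_dev: f64, eps: f64, clamp: f64) -> f64 {
    if std_dev < eps {
        // Degenerate window (e.g. all identical values): no information.
        0.0
    } else {
        ((x - mean) / std_dev).clamp(-clamp, clamp)
    }
}
```

Usage: with `eps = 1e-9` and `clamp = 4.0`, a window of identical values yields 0.0 rather than a division by zero, and a ten-sigma move is capped at 4.0 rather than propagating downstream unchecked.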

This kind of analysis is tedious and error-prone when done by hand. It is the kind of thing that gets skipped under deadline pressure and then causes an incident at 2 AM on a Tuesday. sigc makes it automatic.

The Speed Advantage

In our internal benchmarks, the time from signal specification to compiled, verified Rust executable averages 47 seconds for signals of moderate complexity (10-50 operations, 2-5 state variables, single-asset). Complex multi-asset signals with regime detection take up to 3 minutes.

Compare this to the traditional pipeline: days to weeks from prototype to production, with manual code review and testing at each stage. The time savings are not incremental — they are categorical. A researcher can specify a signal, compile it, backtest the production binary against historical data, and have results before lunch. If the signal does not perform, they iterate. The feedback loop shrinks from weeks to hours.

This speed also changes the research methodology. When testing a signal is cheap, researchers test more signals. They explore parameter spaces more thoroughly. They are more willing to discard mediocre ideas and move on. The result is not just faster deployment of individual signals — it is a higher throughput of the entire research process.

Limitations and Future Directions

sigc is not a general-purpose compiler and does not aspire to be one. It targets a specific computational domain — stateful functions over time-series data — and is effective within that domain. Signals that require complex machine learning models, external API calls, or human-in-the-loop decisions are outside its current scope.

The visual specification language, while expressive, has a learning curve. Researchers accustomed to writing Python must adapt to a different way of expressing signal logic. We have found that this adaptation typically takes a few days, after which most researchers report that the visual specification is actually faster than writing code, because the structured interface prevents a large class of errors before they occur.

We are actively working on two fronts: expanding the verification capabilities to cover multi-signal portfolio interactions, where each signal is correct in isolation but the combined effect on a portfolio may not be; and supporting additional compilation targets for environments where Rust is not the standard.