Json Logic in Ruby — The last few percent: hot path loops and deleting a class
Part 4 of a series on making the fastest AND most compliant Ruby JSON Logic gem.
After Sprint 4 I ran the profiler again:
```
GC:                       55.36%
Engine.call:               8.1%
MinMaxCollection.resolve:  6.4%
Numerify.numerify:         2.8%
WithErrorHandling:         2.3% + 2.1%
```
GC at 55% of CPU time. More than half the benchmark was Ruby garbage collecting — not evaluating rules, just cleaning up after itself.
We’d cut allocations by 32%, but 4.38 Arrays per apply is still a lot when you’re running 600 tests thousands of times. That meant specific hot paths were still creating unnecessary arrays. Let’s find out where!
# Sprint 5 — Hot path allocation elimination
Before looking at raw allocation counts, there was a structural problem in the iterable operations (map, filter, reduce, all, some, and none) that had been there since the beginning.
The double scope push.
For each item in a collection, the old implementation pushed two frames onto the scope stack: one for an index_scope hash ({"index" => i}) and one for the item itself. There had to be a more efficient way. The first thing I found was that the index frame wasn’t even being used: any level could be reached by walking the array directly, so I could safely drop it from both the allocations and the scope stack.
```ruby
# Before — two scope frames per item
index_scope = { "index" => 0 }
collection.each_with_index.each_with_object([]) do |(item, index), acc|
  index_scope["index"] = index
  scope_stack.push(index_scope, index: index) # frame 1
  scope_stack.push(item, index: index)        # frame 2
  begin
    acc << on_each(item, filter, scope_stack)
    scope_stack.pop
    scope_stack.pop
  rescue => e
    scope_stack.pop
    scope_stack.pop
    raise e
  end
end
```
The index wasn’t used by any operation the spec defines — it was internal scaffolding left over from an earlier implementation. Removing it collapsed two pushes into one, halved the scope stack churn on every iterable operation, and simplified the error handling from a rescue pattern to ensure.
```ruby
# After — one scope frame per item
collection.each_with_object([]) do |item, acc|
  scope_stack << item
  begin
    acc << on_each(item, filter, scope_stack)
  ensure
    scope_stack.pop
  end
end
```
filter — the hidden second pass.
filter was implemented as a thin override of Iterable::Base: it marked non-matching items with nil during the main loop, then called .compact at the end to strip them out:
```ruby
# filter's on_each
def self.on_each(item, filter, scope_stack)
  Truthy.call(Engine.call(filter, scope_stack)) ? item : nil
end

# on_after
def self.on_after(results, _scope_stack)
  results.compact # second pass over the array
end
```
Two passes: one to collect and another to strip the nils. The fix was to give filter its own call implementation that appends only matching items, with no second pass at all.
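The single-pass shape can be sketched like this, with a plain block standing in for evaluating the filter rule through the engine and Truthy.call (the names here are illustrative, not the gem’s actual code):

```ruby
# Sketch of a single-pass filter: matching items are appended directly,
# so there are no nil placeholders and no trailing .compact pass.
def filter_single_pass(collection)
  results = []
  i = 0
  n = collection.size
  while i < n
    item = collection[i]
    results << item if yield(item) # append only when the rule matches
    i += 1
  end
  results
end

filter_single_pass([1, 2, 3, 4]) { |x| x.even? } # => [2, 4]
```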
The each_with_object problem and Ruby 3.4.
After these structural fixes, I replaced .each_with_object with while i < n index loops across all 28 hot-path files.
Ruby 3.4 rewrote Array#each, map, select, and other core iterators in pure Ruby to make them faster under YJIT. That made each_with_object and similar chained-enumerator patterns slower without YJIT, because they now carry more Ruby-level indirection. Our competitor json_logic uses plain lambdas with direct iteration, which YJIT inlines aggressively, so our each_with_object chains were at a disadvantage.
while i < n index loops bypass the enumerator stack entirely. They’re predictable for YJIT and fast without it:
```ruby
# Before — each_with_object, enumerator overhead
collection.each_with_object([]) do |item, acc|
  acc << on_each(item, filter, scope_stack)
end

# After — while loop, no enumerator
results = []
i = 0
n = collection.size
while i < n
  results << on_each(collection[i], filter, scope_stack)
  i += 1
end
results
```
compare_chain — the slice that wasn’t needed.
Comparison operators (>, >=, <, <=) support chained comparisons: { "<" => [1, 2, 3] } means 1 < 2 < 3. My implementation iterated over operand pairs:
```ruby
def compare_chain(operands, &op)
  operands[1..].each_with_index do |right, i|
    left = operands[i]
    return false unless op.call(left, right)
  end
  true
end
```
operands[1..] creates a new Array on every call, and since comparisons appear in nearly every real rule, this adds up fast. So I replaced it with an indexed loop.
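The indexed version can be sketched like this (a standalone illustration using a block in place of the gem’s &op; not the gem’s actual code):

```ruby
# Sketch of a slice-free chained comparison: walk adjacent pairs by
# index instead of allocating operands[1..] on every call.
def compare_chain(operands)
  i = 1
  n = operands.size
  while i < n
    return false unless yield(operands[i - 1], operands[i])
    i += 1
  end
  true
end

compare_chain([1, 2, 3]) { |l, r| l < r } # => true
compare_chain([1, 3, 2]) { |l, r| l < r } # => false
```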
If#call — slice again.
if rules take [condition, then, else] triplets, but the spec also allows chained if-elsif-elsif-else as a flat array. The original used each_slice(2):
```ruby
rules.each_slice(2) do |condition, consequent|
  # ...
end
```
each_slice allocates an Enumerator plus a new Array for each pair; the fix was the same shape as before, an index loop stepping through the flat array two at a time.
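A sketch of that index-stepping shape, using already-evaluated values in place of rules for illustration (the real implementation evaluates each condition and consequent through the engine):

```ruby
# Sketch: walk [cond, then, cond, then, ..., else] pairs by stepping
# the index two at a time, with no each_slice enumerator or pair arrays.
def if_chain(rules)
  i = 0
  n = rules.size
  while i + 1 < n
    return rules[i + 1] if rules[i] # condition/consequent pair
    i += 2
  end
  i < n ? rules[i] : nil # trailing else (odd length), if any
end

if_chain([true, "a", "b"])             # => "a"
if_chain([false, "a", true, "b", "c"]) # => "b"
if_chain([false, "a", false, "b", "c"]) # => "c"
```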
MinMaxCollection — map vs <<.
```ruby
# Before
result = []
wrapped.each { |v| result << resolve(v, scope_stack) }
result

# After
wrapped.map { |v| resolve(v, scope_stack) }
```
This might look like a style preference, but it isn’t: YJIT has specific optimizations for Array#map that it can’t apply to the manual push pattern, so the compiled code for map avoids intermediate allocations.
Missing#deep_keys — accumulate instead of cons.
missing checks which keys from a list are absent in the current data. To handle dot-notation paths like "user.name" it needs to walk the entire data hash recursively and collect all fully-qualified key paths. The original did this with a one-liner:
```ruby
def self.deep_keys(hash)
  hash.keys.map { |key| ([key.to_s] << deep_keys(hash[key])).compact.join(".") }
end
```
On every key at every level of nesting: hash.keys allocates an Array, [key.to_s] allocates another Array, deep_keys(hash[key]) returns yet another Array, << appends it in place, .compact allocates a fourth Array to strip nils, and .join produces a String. That’s four allocations per key, applied recursively.
The fix passes an accumulated prefix string down through the recursion and writes results into a shared accumulator hash, used as a set of key paths:
```ruby
def self.deep_keys(hash, prefix, acc)
  hash.each do |key, val|
    full_key = prefix ? "#{prefix}.#{key}" : key.to_s
    val.is_a?(Hash) ? deep_keys(val, full_key, acc) : (acc[full_key] = true)
  end
end
```
Sprint 5 result: +3-8% across Ruby versions.
Small gains, but they compound. With YJIT the map optimization pays off more:
| Ruby | Sprint 4 (ops/s) | Sprint 5 (ops/s) | Δ |
|---|---|---|---|
| 3.2 YJIT | 1,251k | 1,327k | +6% |
| 3.4 YJIT | 1,226k | 1,319k | +8% |
# Sprint 6 — Deleting a class
The ScopeStack class had three methods: current, push, and pop.
```ruby
class ScopeStack
  def initialize(data)
    @stack = [data || {}]
  end

  def current = @stack.last
  def push(data) = @stack << data
  def pop = @stack.pop
end
```
It was essentially a wrapper around a Ruby Array that did exactly what the Array already does, but with method-call overhead and an object allocation on every apply.
```ruby
# Before
scope_stack = ScopeStack.new(data)
scope_stack.current     # @stack.last
scope_stack.push(frame) # @stack << frame
scope_stack.pop         # @stack.pop

# After
scope_stack = [data || {}]
scope_stack.last        # current
scope_stack << frame    # push
scope_stack.pop         # pop
```
Even though a few business methods remained in the scope stack class to handle level-deep access, I turned them into module functions instead of instance methods, taking the scope stack as an argument rather than relying on instance state.
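The shape of that refactor can be sketched like this; `ancestor` is a hypothetical name for the level-deep access the post mentions, not the gem’s actual method:

```ruby
# Sketch: former ScopeStack instance methods become module functions
# that take the plain Array scope stack as an argument.
module ScopeAccess
  module_function

  # Return the frame `levels` positions above the current one
  # (0 => current frame, 1 => parent, ...). Hypothetical name.
  def ancestor(scope_stack, levels)
    scope_stack[-1 - levels]
  end
end

stack = [{ "a" => 1 }]          # root frame: just a plain Array now
stack << { "b" => 2 }           # push
ScopeAccess.ancestor(stack, 0)  # => {"b"=>2}
ScopeAccess.ancestor(stack, 1)  # => {"a"=>1}
```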
This is the kind of refactor that feels almost too obvious in retrospect. The class existed because it held more logic at some point — the [data, index] pairs from Sprint 4, some validation. As those things got simplified, the class became a thin wrapper around exactly the thing it was wrapping… Once you see it, you can’t unsee it.
# The full picture
From v0.2.14 (before any optimization) to the end of Sprint 6:
| Ruby | v0.2.14 (ops/s) | Sprint 6 (ops/s) | Total Δ |
|---|---|---|---|
| 2.7 no YJIT | 347k | 886k | +155% |
| 3.2 no YJIT | 370k | 847k | +129% |
| 3.2 YJIT | 512k | 1,327k | +159% |
| 3.4 no YJIT | 350k | 785k | +124% |
| 3.4 YJIT | 528k | 1,319k | +150% |
(Local macOS ARM64, all 601 tests.)
+124% to +159% depending on Ruby version. With YJIT on modern Ruby: over 1.3M ops/s, roughly 0.75µs per rule evaluation.
The Linux CI numbers — the ones that count for the vs-competitors comparison — are in the final post.
Part 4 of 5. Previous: Killing the preprocessing passes: DataHash, HashFetch and allocation profiling · Next: 18/18: the results
jsonlogicruby.com · Benchmarks · Playground · rubygems.org/gems/shiny_json_logic · github.com/luismoyano/shiny_json_logic