Hey IH!
Most benchmarks are single-tasking relics. In 2025 we run local AI models (LLMs, speech recognition, etc.) and complex data processing simultaneously in the browser, which demands a new standard for performance measurement.
I've built SpeedPower.run to address this need for modern, comprehensive benchmarking. Instead of a single isolated task, our system runs a rigorous set of seven concurrent benchmarks, including core JavaScript operations, high-frequency data-exchange simulations, and the execution of five different AI models. The suite is specifically designed to force concurrent execution across every available CPU and GPU core in your device, simulating a real-world, multi-tasking environment.
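To give a feel for the shape of the harness, here's a simplified sketch. (`fakeTask` and `runConcurrently` are illustrative stand-ins, not our actual code; the real suite runs each task in its own Worker so CPU cores genuinely compete, while plain async functions here only illustrate the concurrent scheduling.)

```javascript
// Sketch: launching several benchmark tasks concurrently and timing both
// each task and the overall wall clock. True multi-core parallelism
// requires Workers; this shows only the scheduling shape.
async function runConcurrently(tasks) {
  const start = performance.now();
  const perTask = await Promise.all(
    tasks.map(async (task) => {
      const t0 = performance.now();
      await task();
      return performance.now() - t0; // duration of this task
    })
  );
  return { perTask, wallClock: performance.now() - start };
}

// Stand-in workloads: timers simulate tasks of different lengths.
const fakeTask = (ms) => () => new Promise((r) => setTimeout(r, ms));

runConcurrently([fakeTask(50), fakeTask(80), fakeTask(30)]).then(({ wallClock }) =>
  console.log(`wall clock: ${wallClock.toFixed(0)} ms (vs ~160 ms if run serially)`));
```

Because the tasks overlap, the wall-clock time tracks the longest task rather than the sum of all of them.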
Our benchmark is constructed using the most popular and cutting-edge web AI stack: TensorFlow.js and Transformers.js, ensuring relevance and fidelity to applications being built today.
The Challenge: Traditional scores fail to capture this complexity. Does our overall geometric-mean score accurately and transparently reflect the true concurrent processing power of your browser? We believe our holistic approach provides the most accurate answer.
The test is pure and simple: no network interference, no installation or external dependencies, just a raw measurement of your device's compute capabilities as seen by the browser. See your comprehensive score and performance breakdown here: https://speedpower.run/?indiehacker
I'll be here all day to discuss the specifics of our multi-tasking scoring logic, the selection of the seven benchmarks, and how we derived the geometric mean to best represent concurrent power.
Another benchmark? How is this different from JetStream 2 or Speedometer? I feel like we’ve solved browser speed.
I just checked the 'About' page. They are using Transformers.js v3 for the LLM and Speech tests. That uses WebGPU compute shaders for parallel inference. If you're comparing this to old-school JS benchmarks, you're missing the point. We're talking about asynchronous command queues in the browser. I'd be curious to see how the 'Score Stability' handles thermal throttling over multiple runs.
Spot on! Thermal throttling is the 'invisible variable' in mobile benchmarking.
We don't normalize for it because we want to measure peak real-world capacity. However, that’s exactly why we implemented the 'Warm-up Execution.' We prime the JIT and compile the shaders first, so we aren't measuring 'startup coldness.'
If you run the benchmark three times in a row on a fanless MacBook Air, you will see the score dip. To us, that's a feature, not a bug: it reveals the device's true sustained compute limit for modern, heavy AI workloads.
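To make that concrete, here's a rough sketch of the shape of our warm-up pass (simplified; `workload` is a hypothetical stand-in for one of the seven tests, and the real suite also compiles WebGPU shaders during warm-up):

```javascript
// Minimal sketch of a warm-up phase before timing a benchmark.
function benchmark(workload, { warmupRuns = 3, timedRuns = 5 } = {}) {
  // Warm-up: let the JIT tier up the hot code paths without recording times,
  // so we aren't measuring "startup coldness".
  for (let i = 0; i < warmupRuns; i++) workload();

  // Timed runs: only these contribute to the score.
  const times = [];
  for (let i = 0; i < timedRuns; i++) {
    const start = performance.now();
    workload();
    times.push(performance.now() - start);
  }
  return Math.min(...times); // best run = peak capacity, not the average
}

// Example workload: a tight numeric loop the JIT can optimize.
const score = benchmark(() => {
  let acc = 0;
  for (let i = 0; i < 1e6; i++) acc += Math.sqrt(i);
  return acc;
});
console.log(`best run: ${score.toFixed(2)} ms`);
```

Taking the minimum of the timed runs is what makes the score a "peak capacity" number; a thermally throttled device simply can't reach the same minimum on later runs.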
Good catch. They mention a 'Warm-up Execution' to prime the caches and JIT, but they also say to run it several times for the maximum score. It seems they are measuring 'Peak Capacity' rather than average sustained performance, which makes sense for bursty AI tasks in a web app.
Speedometer tests how fast a page feels; this tests if your browser can actually handle local LLMs 😉. Most benchmarks are single-threaded relics.
But if I'm running an LLM, isn't that almost entirely a GPU bound task? Why does the main thread communication even matter that much once the model is loaded into VRAM?
That’s a common misconception we’re trying to highlight! You’re right that the matrix multiplication happens on the GPU, but an LLM in a browser isn't a 'set it and forget it' process.
With Transformers.js v3, orchestration, tokenization, KV cache management, and autoregressive decoding still require constant 'handshakes' between the worker and the main thread. If your 'Exchange' performance is poor, the GPU sits idle waiting for the next instruction. We specifically included the SmolLM2-135M test to show that even a 'small' model can be bottlenecked by how efficiently the browser moves data between threads.
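For the curious, here's a stripped-down sketch of what an "Exchange"-style round-trip measurement can look like. (This uses a MessageChannel ping-pong for illustration, which works in browsers and modern Node; our real test talks to a Worker running inference.)

```javascript
// Sketch: measuring the average main<->port message round trip, the
// "handshake" cost that can leave a GPU idle between decode steps.
function measureRoundTrips(iterations = 1000) {
  return new Promise((resolve) => {
    const { port1, port2 } = new MessageChannel();
    // Echo side: stands in for a worker acknowledging each handshake.
    port2.onmessage = (e) => port2.postMessage(e.data);

    let remaining = iterations;
    const start = performance.now();
    port1.onmessage = () => {
      if (--remaining > 0) {
        port1.postMessage(remaining);
      } else {
        const total = performance.now() - start;
        port1.close();
        port2.close();
        resolve(total / iterations); // mean round trip in ms
      }
    };
    port1.postMessage(remaining); // kick off the ping-pong
  });
}

measureRoundTrips().then((ms) =>
  console.log(`avg round trip: ${ms.toFixed(4)} ms`));
```

Multiply that per-message cost by the number of handshakes in an autoregressive decode loop and the bottleneck becomes obvious.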
My phone's browser got a better score than my 5-year-old desktop. That feels totally unbelievable, haha. I thought the desktop would crush it with more cores. What gives with the mobile anomaly?
That's what we call the "Parallel Paradox," and it's genuinely fascinating! We've seen some modern mobile ARM chips show better task-switching efficiency than older x86 desktops, due to more aggressive, thermal-aware scheduling in the mobile browser engines. Raw clock speed isn't the whole story anymore. Our scoring uses a weighted geometric mean, where JavaScript and Exchange efficiency are key factors.
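For transparency, the scoring math has roughly this shape (the weights and sub-scores below are illustrative placeholders, not our published values):

```javascript
// Sketch of a weighted geometric mean:
//   exp( sum(w_i * ln(s_i)) / sum(w_i) )
// Unlike an arithmetic mean, one weak sub-score drags the whole
// result down, which rewards balanced concurrent performance.
function weightedGeometricMean(scores, weights) {
  const totalWeight = Object.values(weights).reduce((a, b) => a + b, 0);
  const logSum = Object.entries(scores)
    .reduce((acc, [name, s]) => acc + weights[name] * Math.log(s), 0);
  return Math.exp(logSum / totalWeight);
}

// Hypothetical sub-scores and weights, for illustration only.
const subScores = { javascript: 1200, exchange: 800, llm: 950 };
const weights   = { javascript: 2,    exchange: 2,   llm: 1   };
console.log(weightedGeometricMean(subScores, weights).toFixed(1));
```

This is why a phone with balanced JavaScript and Exchange numbers can outscore a desktop that is fast on one axis but slow on another.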
Is this test network-dependent at all? Do I need a gigabit connection for a high score? Always skeptical of benchmarks that don't clearly state that.
A totally fair skepticism. Absolutely not. This is a Zero Network Interference test. All ~350MB of data (AI models and assets) is fully preloaded into browser memory before the timer starts. This is a pure local compute test, not a network speed test.
I love the 'Exchange' benchmark. Most devs ignore the cost of moving data between the main thread and workers. It’s the silent killer of performance.
Exactly. If you're building a real-time speech-to-text app like their Moonshine Tiny test exercises, your GPU inference might be fast, but if your OffscreenCanvas or buffer transfers are slow, the UX feels laggy. This is the first tool I've seen that quantifies the IPC overhead specifically.
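For anyone who hasn't hit this before, the copy-vs-transfer difference is easy to demonstrate (a sketch that works in browsers and modern Node):

```javascript
// Sketch: structured-clone copy vs. zero-copy transfer of an ArrayBuffer.
const { port1, port2 } = new MessageChannel();
const buf = new ArrayBuffer(16 * 1024 * 1024); // e.g. a decoded audio buffer

port1.postMessage(buf);          // structured clone: copies all 16 MB
console.log(buf.byteLength);     // sender still owns its copy: 16777216

port1.postMessage(buf, [buf]);   // transfer list: moves ownership, no copy
console.log(buf.byteLength);     // 0 -- the buffer is detached on this side
port1.close();
port2.close();
```

The clone path pays the full copy cost on every message; the transfer path is near-free but detaches the buffer, so the sender can't touch it afterwards. That trade-off is exactly the kind of thing an 'Exchange' score surfaces.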