
Why I built the first browser benchmark that measures the true power of your device

Hey IH!

Most benchmarks are single-tasking relics. In 2025 we are running local AI models (LLMs, speech recognition, etc.) and complex data processing at the same time in the browser, and that demands a new standard for performance measurement.

I've built SpeedPower.run to fill that gap. Instead of a single, isolated task, our system runs a rigorous set of seven concurrent benchmarks, including core JavaScript operations, high-frequency data exchange simulations, and the execution of five different AI models. The suite is specifically designed to force concurrent execution across every available CPU and GPU core in your device, simulating a real-world, multi-tasking environment.
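
For the curious, here is a minimal sketch of the concurrency pattern (hypothetical worker file names, not our actual suite): each benchmark runs in its own Web Worker and the main thread awaits them all at once, so the browser has to schedule every core simultaneously.

```js
// Minimal sketch of the concurrency pattern (hypothetical worker files,
// not the actual SpeedPower.run suite): each benchmark gets its own
// Worker, so the browser must schedule all of them at the same time.
const benchmarks = ['js-core.js', 'exchange.js', 'llm.js', 'speech.js', 'vision.js'];

function runInWorker(script) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(script);
    worker.onmessage = (e) => { resolve(e.data); worker.terminate(); };
    worker.onerror = reject;
    worker.postMessage({ cmd: 'run' });
  });
}

const t0 = performance.now();
const results = await Promise.all(benchmarks.map(runInWorker)); // all run concurrently
console.log('concurrent wall time (ms):', performance.now() - t0, results);
```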

Our benchmark is constructed using the most popular and cutting-edge web AI stack: TensorFlow.js and Transformers.js, ensuring relevance and fidelity to applications being built today.

The Challenge: Traditional scores fail to capture this complexity. Does our overall geometric mean score accurately and transparently reflect the true concurrent processing power of your browser? We believe our holistic approach provides the most accurate answer.

The test is pure and simple: No network interference, no installation or external dependencies—just a raw measurement of your device's compute capabilities as seen by the browser. See your comprehensive score and performance breakdown here: https://speedpower.run/?indiehacker

I'll be here all day to discuss the specifics of our multi-tasking scoring logic, the selection of the seven benchmarks, and how we derived the geometric mean to best represent concurrent power.

posted to Product Launch on January 29, 2026
  1. 3

    Another benchmark? How is this different from JetStream 2 or Speedometer? I feel like we’ve solved browser speed.

    1. 1

      JetStream and Speedometer test classic JS speed.
      This measures modern browser power: WebGPU + concurrent AI workloads.

    2. 1

      I just checked the 'About' page. They are using Transformers.js v3 for the LLM and Speech tests. That uses WebGPU compute shaders for parallel inference. If you're comparing this to old-school JS benchmarks, you're missing the point. We're talking about asynchronous command queues in the browser. I'd be curious to see how the 'Score Stability' handles thermal throttling over multiple runs.

      1. 1

        Spot on! Thermal throttling is the 'invisible variable' in mobile benchmarking.

        We don't normalize for it because we want to measure peak real-world capacity. However, that’s exactly why we implemented the 'Warm-up Execution.' We prime the JIT and compile the shaders first, so we aren't measuring 'startup coldness.'

        If you run the benchmark three times in a row on a fanless MacBook Air, you will see the score dip. To us, that's a feature, not a bug: it reveals the device's true sustained compute limit for modern, heavy AI workloads.
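
        Roughly, the flow looks like this (an illustrative sketch, not our exact implementation): one untimed pass so JIT and shader compilation happen up front, then timed passes measure steady-state throughput.

        ```js
        // Illustrative sketch of the warm-up idea: one untimed pass lets the JIT
        // and WebGPU shader compilation happen, then timed passes measure
        // steady-state throughput instead of startup cost.
        async function benchmark(task, timedRuns = 3) {
          await task();                      // warm-up: primes JIT / compiles shaders, untimed
          const times = [];
          for (let i = 0; i < timedRuns; i++) {
            const t0 = performance.now();
            await task();
            times.push(performance.now() - t0);
          }
          return Math.min(...times);         // report peak capacity, not the average
        }
        ```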

      2. 1

        Good catch. They mention a 'Warm-up Execution' to prime the caches and JIT, but they also say to run it several times for the maximum score. It seems they are measuring 'Peak Capacity' rather than average sustained performance, which makes sense for bursty AI tasks in a web app.

    3. 1

      Speedometer tests how fast a page feels; this tests if your browser can actually handle local LLMs 😉. Most benchmarks are single-threaded relics.

      1. 1

        But if I'm running an LLM, isn't that almost entirely a GPU-bound task? Why does the main-thread communication even matter that much once the model is loaded into VRAM?

        1. 1

          That’s a common misconception we’re trying to highlight! You’re right that the matrix multiplication happens on the GPU, but an LLM in a browser isn't a 'set it and forget it' process.

          With Transformers.js v3, orchestration, tokenization, KV cache management, and autoregressive decoding still require constant 'handshakes' between the worker and the main thread. If your 'Exchange' performance is poor, the GPU sits idle waiting for the next instruction. We specifically included the SmolLM2-135M test to show that even a 'small' model can be bottlenecked by how efficiently the browser moves data between threads.
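
          If you want to feel that overhead yourself, here is a tiny illustration (a bare echo worker, not our actual Exchange benchmark): time a single main-thread-to-worker round trip. On a slow Exchange path, every one of those hops is time the GPU spends idle.

          ```js
          // Tiny illustration (not the actual Exchange benchmark): time the
          // main-thread <-> worker round trip that every decoding step depends on.
          const workerSource = 'onmessage = (e) => postMessage(e.data);'; // echo worker
          const worker = new Worker(
            URL.createObjectURL(new Blob([workerSource], { type: 'text/javascript' }))
          );

          function roundTrip(payload) {
            return new Promise((resolve) => {
              const t0 = performance.now();
              worker.onmessage = () => resolve(performance.now() - t0);
              worker.postMessage(payload);
            });
          }

          const latency = await roundTrip(new Float32Array(1024)); // structured clone both ways
          console.log('one handshake costs ~', latency.toFixed(2), 'ms');
          ```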

  2. 1

    Pretty smooth experience, only took 30 seconds. My main question is: how are you distinguishing between a true browser-engine efficiency lead (like Brave vs. Chrome) and just a thermal throttling difference on the OS/driver side?

    1. 1

      That's the million-dollar question for any benchmark! We run a quick pre-test to check for baseline thermal status, and we perform a Warm-up Execution to ensure we're measuring peak throughput. We engineered the total runtime to be a short, maximum-load burst to isolate the browser's scheduler efficiency (the Exchange and JavaScript scores) before OS/hardware thermal throttling becomes the dominant factor.

  3. 1

    I'm building a complex dashboard app right now. My biggest bottleneck is garbage collection when I'm running multiple WebWorkers. Are you guys tracking GC pauses in your methodology? That's the one metric I'd love to see.

    1. 1

      That's an excellent feature request. We are focused on CPU/GPU throughput saturation right now, with a key focus on Web Worker communication in our Exchange benchmark. A metric on GC pause time/frequency during heavy concurrent load would be a perfect addition for our next phase. Mind sharing what framework/library you are using? We'd love to hear more about your real-world use case.
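
      In the meantime, here is a rough way to approximate it yourself (a sketch using the Long Tasks API as a proxy, since browsers don't expose GC pauses directly):

      ```js
      // Rough proxy for GC/scheduling pauses (browsers don't expose GC pauses
      // directly): record main-thread tasks longer than 50 ms while the
      // concurrent worker load is running.
      const pauses = [];
      const observer = new PerformanceObserver((list) => {
        for (const entry of list.getEntries()) {
          pauses.push({ start: entry.startTime, duration: entry.duration });
        }
      });
      observer.observe({ entryTypes: ['longtask'] });

      // ...run the heavy WebWorker load here...

      // Later: inspect how often the main thread stalled and for how long.
      console.log('long-task pauses observed:', pauses);
      ```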

  4. 2

    My phone's browser got a better score than my 5-year-old desktop. That feels totally unbelievable, haha. I thought the desktop would crush it with more cores. What gives with the mobile anomaly?

    1. 1

      That's what we call the "Parallel Paradox," and it's what we find fascinating! We've seen some modern mobile ARM chips show better task-switching efficiency than older x86 desktops, thanks to more aggressive, thermal-aware scheduling in the mobile browser engines. Raw clock speed isn't the whole story anymore. Our scoring uses a weighted geometric mean, where JavaScript and Exchange efficiency are key factors.

  5. 1

    Very interesting, I would love to read this kind of article more often.

    1. 1

      Thank you so much! I’m really glad you found the article interesting.

      We definitely plan to keep sharing these kinds of deep dives. At ScaleDynamics, we’re obsessed with the technical details of the 'Compute Web'—especially how browsers handle the collision of local AI and heavy data processing.

      We’ll be posting more about our findings from the SpeedPower.run beta data soon, focusing on how different architectures handle task saturation. If there’s a specific part of browser performance or AI integration you're most curious about, let me know—I’d love to cover it in a future post!

  6. 1

    Love the focus on true concurrent workloads. Most benchmarks don't reflect how browsers are actually used today: AI + heavy JS at the same time. The no-network, no-install approach makes the results feel trustworthy. Curious how the geometric mean avoids hiding CPU vs GPU bottlenecks. Great work 👏

    1. 1

      Thanks for the kind words! You’ve touched on exactly why we went with the Geometric Mean for the final score.

      In traditional benchmarks using the Arithmetic Mean, a massive score in a single category (like raw JS speed) can 'pull up' a terrible score in another (like AI inference). It effectively hides bottlenecks.

      By using the Geometric Mean, we ensure that every category matters equally. If a device has a 'bottleneck' where the Exchange score is near zero because of IPC lag, it drags the entire overall score down significantly. It’s a much more 'honest' average for hardware.

      Our goal was to make sure you couldn't just throw a fast GPU at the problem and ignore the CPU-to-GPU handshake. If one part of the pipeline is a 'weak link,' the final score will reflect that reality.
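
      The difference is easy to see in a few lines (illustrative numbers and an unweighted mean, not our real scoring):

      ```js
      // Illustrative numbers, not real scores: one near-zero category barely
      // moves the arithmetic mean but collapses the geometric mean.
      const scores = { js: 900, exchange: 20, ai: 800 };   // exchange is the bottleneck
      const values = Object.values(scores);

      const arithmetic = values.reduce((a, b) => a + b, 0) / values.length;              // ≈ 573
      const geometric = Math.pow(values.reduce((a, b) => a * b, 1), 1 / values.length);  // ≈ 243

      console.log({ arithmetic, geometric });
      ```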

      Really glad to see the 'no-install' approach is resonating. We wanted to lower the barrier so developers could test their theories on the fly without the friction of a 5GB suite.

  7. 1

    Is this test network-dependent at all? Do I need a gigabit connection for a high score? Always skeptical of benchmarks that don't clearly state that.

    1. 1

      A totally fair skepticism. Absolutely not. This is a Zero Network Interference test. All ~350 MB of data (AI models and assets) is fully preloaded into browser memory before the timer starts. This is a pure local compute test, not a network speed test.
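
      The pattern is simply "download everything, then start the clock" (a sketch with a hypothetical asset list and entry point, not our actual loader):

      ```js
      // Sketch of the zero-network pattern (hypothetical asset list and entry
      // point): everything is fetched into memory first, then the timer starts.
      const assets = await Promise.all(
        ['model-a.bin', 'model-b.onnx'].map((url) => fetch(url).then((r) => r.arrayBuffer()))
      );

      const t0 = performance.now();      // the network is out of the picture from here on
      await runAllBenchmarks(assets);    // hypothetical entry point for the compute suite
      console.log('compute-only time (ms):', performance.now() - t0);
      ```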

  8. 1

    I love the 'Exchange' benchmark. Most devs ignore the cost of moving data between the main thread and workers. It’s the silent killer of performance.

    1. 1

      Exactly. If you're building a real-time speech-to-text app using their moonshine tiny test, your GPU inference might be fast, but if your OffscreenCanvas or buffer transfers are slow, the UX feels laggy. This is the first tool I've seen that quantifies the IPC overhead specifically.
