15 Comments

Launched a data-driven app with a JSON file instead of a database

Wanted to share a short story about not over-engineering a new project and just shipping it. A few months ago I built a small app (www.virtualpostersession.org) that helps people plan events - exactly the kind of thing you'd expect to see a database behind!

However, I didn't want to mess with configuration or scaling or anything, so instead of a database backend... I used a simple in-memory cache of a JSON file. Of course I protect access with a mutex and use atomic writes to disk... but it's ridiculously fast compared to a database AND still uses hardly any memory. I've yet to see a response time over 100ms; they're typically 50-70ms, even when pulling 50+ data structures out of the JSON cache.
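
The post doesn't say what language the app is written in, so here's a minimal sketch of the idea in Python - a mutex-guarded in-memory cache with atomic writes. The class and method names are invented for illustration, not taken from the app:

```python
import json
import os
import tempfile
import threading

class JsonStore:
    """In-memory dict backed by a JSON file, guarded by a mutex."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()
        self.data = {}
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def get(self, key, default=None):
        with self.lock:
            return self.data.get(key, default)

    def set(self, key, value):
        with self.lock:
            self.data[key] = value

    def save(self):
        # Atomic write: dump to a temp file in the same directory,
        # then rename over the old file (os.replace is atomic on POSIX),
        # so readers never see a half-written JSON file.
        with self.lock:
            dirname = os.path.dirname(os.path.abspath(self.path))
            fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".json")
            with os.fdopen(fd, "w") as f:
                json.dump(self.data, f)
            os.replace(tmp, self.path)
```

The temp file must live in the same directory as the target, because a rename is only atomic within a single filesystem.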

Added benefit: Every 30s the code checks for changes and saves to disk, and copies the old JSON file to a datestamped backup - so I've essentially got an archive from launch to present.
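
The datestamped-backup scheme is a few lines; here's a hedged Python sketch (filenames and the retention helper are my guesses, not the app's actual code). The prune helper keeps the archive from growing without bound, since datestamped names sort chronologically:

```python
import datetime
import os
import shutil

def snapshot(path, backup_dir):
    """Copy the current JSON file to a datestamped backup before the
    next save overwrites it."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_dir, f"db-{stamp}.json")
    shutil.copy2(path, dest)
    return dest

def prune_snapshots(backup_dir, keep=1000):
    """Delete all but the newest `keep` backups. The datestamped
    filenames sort chronologically, so a plain sort suffices."""
    names = sorted(n for n in os.listdir(backup_dir) if n.endswith(".json"))
    for name in (names[:-keep] if keep else names):
        os.remove(os.path.join(backup_dir, name))
```

In the real app, a background loop would presumably call `snapshot` only when the data actually changed in the last 30s.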

At this stage, I have a 1MB JSON file and a couple of large contracts in the works - I think it's about time to start thinking about that database!

So, all that to say: don't overthink your tech stack! Keep it simple and wait until you NEED to switch to a database and you'll have a much better understanding of the usage patterns and how they might inform the schema.

The real kicker: I have a PhD in CS and teach databases to grad students!

  1. 4

    I think your conclusion is totally valid; we do tend to overthink tech choices, OR choose complicated setups whose complexity doesn't match what's needed to get started.

    I’m curious, however, why you didn’t use something like SQLite? It offers a lot of the file-based advantages without requiring you to manage the access/writes, although of course it doesn’t speak JSON natively.

    1. 1

      TBH I just didn't know enough about how it does locking when accessed from multiple processes; I think it would be OK now that I've done more reading. But I didn't want to commit to it at the start if I would just have to switch later anyway.

      1. 2

        @stridatum, that makes sense. I assumed you only had a single process based on your description of the approach you currently had.
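
For reference, SQLite's WAL journal mode is the standard answer to the multi-process locking question raised above: readers in other processes keep going while a single writer writes, and a connection timeout makes a blocked writer wait rather than fail with "database is locked". A minimal Python sketch (the function name and file path are made up):

```python
import sqlite3

def open_db(path):
    # WAL mode lets concurrent readers proceed alongside one writer;
    # timeout=5.0 makes a blocked writer retry for up to 5 seconds
    # instead of raising "database is locked" immediately.
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL;")
    return conn
```

Note that WAL only applies to file-backed databases, not `:memory:` ones, and all processes must access the file on the same local filesystem.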

  2. 3

    But you essentially started developing a DB of your own 🤣.

    Most people don't know how to properly do "a mutex, and have atomic writes to disk..." without bugs destroying the entire thing.
    "Every 30s the code checks for changes and saves to disk, and copies the old JSON file to a datestamped backup" - if this takes off, you'd run out of disk space quickly. At 1MB a snapshot, you'd consume almost 100MB a day... and then you'd have to figure out how to save diffs and what to do with them...
    You did over-engineer it while thinking you simplified it.
    You're living the not-invented-here life.
    That's not a simple "I just use a JSON file" story...

    Maybe just set up Redis if you care about speed and keeping things light and want it cache-style - it can back up to disk and to remotes, and doesn't even need credentials set up...

    1. 3

      Sure, anyone could call a JSON file a "DB", but... there's no indexing, no query language, no client libraries, no connection issues, no per-request serialization... etc. A lot fewer places for the code or the integration to break down.

      Most people can't do a mutex?? Isn't this taught in like 2nd year of university?? If using a mutex is over-engineering then... wow... that's a sad state for tech

      It's taken 2 months to get to 1MB, and the network effects aren't crazy enough for it to "take off" in any sense of the word. 100MB/day would be a worst-case scenario where changes are happening 24/7, and even then the solution is literally just "delete some of the snapshots".

      Redis has serious serialization overhead, fixed data structure types requiring custom wrappers, and still has more breakage points... I guarantee you I wouldn't get 50ms round-trip times if Redis backed this.

      Maybe it's not a "simple JSON file" story, but it's certainly much simpler from a product complexity standpoint which allows me to iterate new features quickly, and it's hella fast.

      1. 1

        The production instances of Redis where I work usually have a response time of 1-3ms. Also, since JSON has to be parsed by the browser's JS engine, it can be faster to return the JSON as a string and call JSON.parse once it reaches the browser, because the engine can optimize for parsing JSON (this video explains it better than I can: https://www.youtube.com/watch?v=ff4fgQxPaO0).

        Congrats on your success with the app!

        1. 1

          JSON is just the on-disk storage format, it's a standard data structure once it's read into memory. Very little browser code going on, I'm doing fairly minimal front-end interaction so I can cache aggressively if traffic spikes (these are event pages, so not a nice consistent traffic pattern like other projects).

          1ms x 50 or more requests still adds up to nearly my current full-page load time. Redis would certainly still be fast enough, just not as fast as the page is now.

      2. 1

        "Isn't this taught in like 2nd year of university?? If using a mutex is over-engineering then... wow... that's a sad state for tech"

        I wouldn't know. I was a Journalism major.

      3. 1

        I've seen Redis reads over the network at thousands per second take <1ms, with a rare 7ms spike a few times a week - which the expert still said was an instrumentation issue :shrug:

        "no per-request serialization..."
        Once you work with JSON there is serialization, unless your app is just an API that returns a JSON...
        You might be loading it once to memory and doing it on save only but your doing some form of it...

        You might exist in your own bubble where everyone you know went to uni... try looking up the real stats.

        I didn't look much, but I googled this: https://redislabs.com/blog/redis-as-a-json-store/

        I salute your work.
        I just don't think many people should follow it unless it's what they feel passionate about. I believe that for most people, using off-the-shelf components is better than spending thought on things that aren't the core business.

        1. 1

          Thanks for your opinions and anecdotes.

          I'm quite confident a mutex and a few map data structures are faster and easier to understand and debug than any sort of external dependency. These ARE "off-the-shelf components" - language built-ins, in fact - and they beat figuring out how to shove my data into an existing database, which lets me focus on the business instead of the tech.

          1. 1

            I truly bow to purists like you. 🙇

            I've just spent 20+ years connecting half-baked, half-related components to compose systems too large for any one person to fully know, finding and patching the holes where they leak. Hopefully putting the gauges in just the right places to know about a leak before it's a 4AM total-blowout issue.

            There isn't a one-size-fits-all; the better your work fits you, the better you do, and that's all that truly matters (assuming it's also valuable, monetizable...).

  3. 2

    This is awesome. Pieter Levels would be proud.

    https://mobile.twitter.com/levelsio/status/1101581928489078784

    When you do switch to a database check out Postgres' json fields. You can pretty much set up a simple table with one json column and write one entry per row. Very similar to what it sounds like you are doing now. What's cool is you can then do fast SQL queries reaching into the json with all the power that brings. Good luck!

    1. 2

      That's the eventual plan! I've been using Postgres for nearly 20 years now. =)
      Implemented some driver code for HSTORE a number of years ago when there was no JSONB, but the internal indexing work they've done is pretty amazing.

      1. 1

        Excellent. I should have known that somebody with the experience to know when to use a memory/text based database would also know of PostgreSQL's deep features.
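
The one-JSON-column-per-row table suggested above can be previewed with SQLite's JSON1 functions, which run anywhere Python does; in Postgres the column type would be `jsonb` and the accessor `->>` rather than `json_extract`. The table and field names here are invented for illustration:

```python
import sqlite3

# One row per JSON document, queried by reaching into the JSON itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO events (doc) VALUES (?)",
    ('{"name": "poster session", "attendees": 42}',),
)
# Filter and project on fields inside the JSON document.
name = conn.execute(
    "SELECT json_extract(doc, '$.name') FROM events "
    "WHERE json_extract(doc, '$.attendees') > 10"
).fetchone()[0]
```

In Postgres you could additionally put a GIN index on the `jsonb` column, which is where the "fast SQL queries reaching into the json" claim really pays off.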

  4. 1

    The original Oracle of Bacon uses an in-memory data store.

    https://www.oracleofbacon.org/how.php
