April 26, 2019

Postgresql and optimal text storage

Joshua @randomTest

Lets say i have a blog post - is it better to store the post text in a its own table or could i store it together with the blog object?

#tech-questions

  1. 4

    Honestly, I don't believe you'd scratch any of PostgreSQL's performance bottlenecks with blog posts.

    Do whats faster for you to code, so you can get the results (e.g. business) you want faster.

    A constant reminder for this is @levelsio - usually initiating his projects with a simple spreadsheet, if there's traction he then adds a layer of sqlite, php, a simple vps and cloudflare.

    Edit: typo.

    1. 1

      I 100% agree with sagi, it won't matter at all. From a technical performance standpoint it might not matter either and the good thing with having it in the same table is that you don't need to do joins when you want a list of posts.

      If you still want to find out for fun (not for profit): for questions like this the only way to find out the truth is to try it for yourself. Doesn't matter what it is in theory, if you want to find out which is better on your hardware and specific setup, you have to try it out. Just create a million or so test posts with both setups and query them like you would in typical scenarios.

    2. 1

      Im aware of the horrors of premature optimization, however, this is just the last touch up before Im releasing the beta. Which is why I'd like to know the performance differences the two setups would have.

      But I guess you are right, I could fix it later, especially since I won't actually experience any huge performance difference during beta.

  2. 1

    for blogs i would just store it in a column with a text field. it doesn't impact performance that much anyway. scale later if needed (elasticsearch for searchability and indexed results or something). In general blogs don't require a heavy db load.

  3. 1

    For most blogs it won't matter how you physicalize the schema. If you want to know precisely PostgreSQL will store the text in a separate table transparently anyway. It needs to do this for any variable size object that is greater than the fixed page size (8 kB commonly) using TOAST (The Oversized-Attribute Storage Technique). If you want to dive deep see: https://www.postgresql.org/docs/9.4/storage-toast.html

  4. 1

    this is not worth thinking about. you won't have enough readers to matter for the perf.

    it would be incredibly more valueable to write a blog about the performance effects on SEO or whatever with an A/B test.

  5. 1

    Best left in the same table.

    [From the docs] Any tuple (row) of data that will be bigger than roughly 2000 bytes (in total, with headers) will end up with large values (typically the text/blob columns) being put into TOAST tables with pointers left in the main table to point to the larger blocks of data. This keeps the main table data (IDs, dates, numbers) that you filter on together in the main table for speed.

    If you put the text content in another table you need more queries to maintain everything and that other table will be TOAST-ed if any rows are over 2000 bytes (even if it has just two fields, ID and blog cotent). This effectively doubles the lookups.

    PostgreSQL handles TOAST-ing very efficiently and has done for years. All your applicable blog columns belong in the blog table.

    For reference:
    https://www.postgresql.org/docs/11/storage-toast.html

    1. 1

      Appreciate the help! Thanks.

  6. 1

    What other data is being store with the “blog object”? I don’t think you’ll get much advantage even with a pretty significant amount of traffic. Even then caching would probably help you out anyways.

    Are you planning on doing full text search or just lookups by an indexed primary key? You should be fine for a massive amount of data for the latter.

  7. 1

    This comment was deleted a year ago.

    1. 1

      Hey @csallen!
      Spam alert! ^^^^