1
10 Comments

String vs Integer for enum columns

On a previous project I stored all enums as integers in a database. Such as 1 for Pending, 2 for Running, 3 for Completed and so on.

All was great at start, but months later I had no idea what 4 meant without looking at the code. It’s also very possible that 4 would mean “Not Started” or something that does not come after Complete, for example, which just complicated things even more.

I am now considering either a single Char or a String for my current project.

What’s your opinion? Anything I should be aware of before making a decision?

What data type do you use to store Enum values?
  1. Char (single letter)
  2. String (one or more letters)
  3. Integer
Vote
  1. 2

    Im with @shalvah - go with a database with a proper ENUM type.

    If you're using something like PostgreSQL or MySQL, both have proper ENUM types for a column that have under-the-hood optimizations.

    That said, using an ENUM type can be expensive/inconvenient for various reasons. For example, in PostgreSQL, you can add ENUM options but you cant delete them, nor can you change their order.

    From a normalization perspective, building out a separate table to hold your enumerated values is going to be the best option over all as it will be the most flexible and provide the best options for properly indexing. Thats where I would say "integer" is the best type in the form of a foreign key pointing to a row in a table of possible values.

  2. 2

    Sidebar: you have a fourth option: use a database ENUM field. I wouldn't recommend it z though, because the implementation varies by database type, and it can cause more problems than it's worth.

  3. 2

    It really depends on whether or not this is going to be indexed. Numeric indices are faster and require less storage.
    If you don’t need an index - a string is ok.

    Also - this issue is very minor. Don’t dwell on it too much so you can ship your product sooner! :)

    1. 1

      yeah, exactly! Either will be fine, there are bigger fish to fry at the moment for sure! Just wanted to get some thoughts on it.

  4. 2

    Go with the string until it's a problem and then switch it to something else, maybe a number, when it is a problem. A string is clearer and in modern storage not an issue.

    Another option you didn't list is making it a foreign key to a "states" table. That gets you a bunch of wins: it can be a number, you can look up the string when needed, it's guaranteed to be a valid state.

  5. 1

    Integer and I have ENUM code files in one place so that I know what they mean.

    In code I also don't check if status == 5 for example, that is bad practice. you check like status == StatusENUM::COMPLETE

  6. 1

    So, add a comment. Use named constants for the value.

    So instead of x = 1

    const PENDING = 1;
    x = PENDING;

    1. 1

      Oops, I forgot to mention that I was referring to database column. As for the actual code, it's usually like:

      enum WorflowState {
        Pending = 1,
        Running = 2,
        Completed = 3
      }
      

      So it's fairly explicit. But when looking directly at database rows, it's not :(

      1. 2

        So have a lookup table.

        create table workflow_lookup
        col workflow_name varchar
        col workflow_id integer

        then if you want to know what the id is, you can join to workflow_lookup.

        if you're sure those id's won't be changing, you don't have to use numbers, you can certainly use named ids.

      2. 1

        The fact that you’re explicitly assigning an integer value tells me you should probably be using the same in your database. When that’s encoded and sent to another system e.g. the database, it’s going the be the integer representation.

        If you find the string representation to be more meaningful, then I’d switch to that in both the code and the database.

Recommended Posts