July 13, 2020

An API for video 🧙‍♂️

Colin Bethea @colinjbethea

I believe video is the last untapped form of media for growth on the internet. We obviously have platforms for this already — YouTube, Vimeo, and others, but these are not yours to own.

It should belong to you.

If you (or your users) could generate video in seconds, with rich metadata, formatting, captions, and automatically generated thumbnails — what would you do with it?

Would you create a community, deliver content to grow your business, get feedback from users, or something else?

If you had Stripe for generating high-quality video, what would you do with it?

What would you do with an API for video?
  1. 3

    I apologize for the long rant, this got away from me really quickly. Skip down to "Market Opportunities" for my opinions on video-orientated startups.


    I'm working on a video-based project right now. Video is a bit of a nightmare compared to other forms of media. Here's just a quick list of the friction you run into when you start looking at the problem seriously:

    • Codecs (platform support, patent encumbrance, licensing so difficult you're gonna need a team of lawyers)

    If you want anything resembling efficiency (in terms of bandwidth) on Apple devices, your only choice is H.265. On Android you're gonna be using VP9. On desktops you have a bit more flexibility; you might even use something esoteric like AV1.


    So straight off the bat you're going to need to deliver the same content in at least two, possibly three, different non-interoperable formats. If you can live with the increased storage and bandwidth costs you might be able to use H.264 (the precursor to H.265) as a fallback, but that roughly doubles the file size at comparable quality.
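    To make the fallback idea concrete, here is a minimal sketch of picking a codec per platform. The mapping just restates the pairings above; the platform keys and the function name are made up for illustration, and real codec support depends on OS and browser versions.

```python
# Hypothetical platform -> preferred codec table, restating the comment above.
PREFERRED_CODEC = {
    "apple": "h265",
    "android": "vp9",
    "desktop": "av1",
}
FALLBACK_CODEC = "h264"  # bigger files, but the fallback named above


def pick_codec(platform: str, available: set[str]) -> str:
    """Return the preferred codec if that rendition exists, else the fallback."""
    preferred = PREFERRED_CODEC.get(platform.lower())
    if preferred in available:
        return preferred
    if FALLBACK_CODEC in available:
        return FALLBACK_CODEC
    raise LookupError(f"no playable rendition for platform {platform!r}")


if __name__ == "__main__":
    encoded = {"h265", "h264"}  # codecs we happened to encode this video in
    for platform in ("apple", "android", "desktop"):
        print(platform, "->", pick_codec(platform, encoded))
```

    With only H.265 and H.264 encoded, everything except Apple falls back to H.264, which is exactly where the extra storage and bandwidth cost comes from.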

    • Resolutions

    Then you need to deliver content in multiple resolutions. YouTube serves content in 7-10 different sizes depending on the source video and display.
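    A rough sketch of how codecs and resolutions multiply into renditions. The list of heights is an illustrative ladder I picked, not YouTube's actual one, and the codec list assumes the three formats mentioned earlier.

```python
from itertools import product

# Assumed values for illustration only.
CODECS = ["h265", "vp9", "h264"]
LADDER = [144, 240, 360, 480, 720, 1080, 1440, 2160]  # output heights in pixels


def renditions(source_height: int) -> list[tuple[str, int]]:
    """Every (codec, height) pair to encode, never upscaling past the source."""
    heights = [h for h in LADDER if h <= source_height]
    return list(product(CODECS, heights))


if __name__ == "__main__":
    jobs = renditions(1080)
    print(len(jobs), "encodes for a single 1080p upload")
```

    Under these assumptions one 1080p upload already means six heights times three codecs, so the encode count grows quickly.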

    • Container Formats & Adaptive Bitrate Streaming

    Here's another thing where there are multiple competing standards. ABR (https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming) is mandatory for any video product in 2020. Luckily this is relatively easy compared to the codec problem.

    • Storage & Bandwidth

    Video takes up a lot of space. And a lot of bandwidth. The cheapest dependable CDN I could find charges $0.005/GB. Most CDNs charge in the realm of $0.10/GB to $0.25/GB.

    The cheapest dependable storage I could find charges the same $0.005/GB. Most providers charge between $0.01/GB and $0.025/GB.

    Keep in mind you also need to account for bandwidth between your systems, not just delivery to your viewers.
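    For a sense of scale, a back-of-the-envelope calculation using the per-GB prices above. The volumes (1 TB stored, 50 TB delivered) are invented, and I'm assuming the storage price is per GB per month, which the figures above don't actually state.

```python
def monthly_cost(
    stored_gb: float,
    delivered_gb: float,
    storage_per_gb: float = 0.005,   # cheapest storage price quoted above
    delivery_per_gb: float = 0.005,  # cheapest CDN price quoted above
    internal_gb: float = 0.0,        # traffic between your own systems
    internal_per_gb: float = 0.0,    # assumed free unless you say otherwise
) -> float:
    """Rough monthly bill in dollars: storage + delivery + internal transfer."""
    total = (
        stored_gb * storage_per_gb
        + delivered_gb * delivery_per_gb
        + internal_gb * internal_per_gb
    )
    return round(total, 2)


if __name__ == "__main__":
    # Hypothetical workload: 1 TB stored, 50 TB delivered per month.
    cheapest = monthly_cost(1_000, 50_000)
    typical = monthly_cost(1_000, 50_000, storage_per_gb=0.01, delivery_per_gb=0.10)
    print(f"cheapest providers: ${cheapest:,.2f}")
    print(f"typical providers:  ${typical:,.2f}")
```

    Delivery dominates in both cases, which is why the CDN price matters far more than the storage price for this kind of workload.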

    • Encoding / Transcoding

    This is a big one. One approach is to encode the same video a dozen different times. Another is to use live transcoding. A sophisticated implementation might do both (i.e. live transcode for unpopular content to avoid the storage costs, store pre-generated versions for popular content). I found this article simple and digestible when I was first getting my bearings: https://www.wowza.com/blog/what-is-transcoding-and-why-its-critical-for-streaming

    Either option is very computationally expensive.
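    The "do both" approach boils down to a break-even check per video. A minimal sketch, where every cost figure is a placeholder you'd have to measure yourself, and which ignores the one-time encode cost and any caching of live-transcoded output:

```python
def cheaper_strategy(
    views_per_month: int,
    storage_cost_per_month: float,  # cost of keeping all pre-encoded renditions
    live_cost_per_view: float,      # compute cost of transcoding on demand
) -> str:
    """Pick whichever option costs less per month for one video."""
    live_cost = views_per_month * live_cost_per_view
    if storage_cost_per_month <= live_cost:
        return "pre-encode"
    return "live-transcode"


if __name__ == "__main__":
    # Placeholder prices: $0.50/month to store renditions, $0.01 per live view.
    for views in (10, 1_000):
        print(views, "views ->", cheaper_strategy(views, 0.50, 0.01))
```

    With these placeholder prices, a video with a handful of views is cheaper to transcode on demand, while a popular one is cheaper to store pre-encoded.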

    • Display

    Presenting video content has its own set of challenges. Luckily there is some good software out there for delivering web video, so this is probably one of the "easier" problems: https://github.com/google/shaka-player, https://videojs.com/, https://plyr.io/ etc.

    • DRM

    Have fun.

    • Scale

    Have fun. YouTube and Netflix have both published dozens of excellent articles on how they accomplish scaling. A naive first-round implementation might use some sort of edge-based (e.g. Lambda@Edge, Cloudflare Workers, etc.) smart routing to cache servers, then to your primary CDNs / databases, etc.

    So what's to be done?

    I think I've painted a pretty clear picture about why video is hard. There are lots of players in this space that have realized that this is a hard problem and built entire companies around trying to solve these problems.

    One that immediately comes to mind is mux.com (https://www.crunchbase.com/organization/mux-2)

    There's also a very interesting concept from Peer5 (https://www.crunchbase.com/organization/peer5) that aims to solve part of the delivery problem (although this is for live streaming rather than VOD).

    If you do any searching for comprehensive solutions (mux) or even partial solutions (e.g. encoding from https://coconut.co) you'll find that the price tags are pretty high.

    Market Opportunities

    I'm coming at this problem from a very price-sensitive perspective (due to the nature of the product I'm building) so I think the biggest innovations in this space would have to come from either:

    a) having a unified standard for video (which will probably never happen), or
    b) reducing the cost of transcoding and delivery, or
    c) a technology breakthrough for increasing encoding speed

    I think there's a company called https://cloud.qencode.com/ that tried some sort of cryptocurrency for encoding. I know https://transcodium.com/ launched a crypto product. I wouldn't touch these, but the idea of being able to use idle resources for encoding certainly is an interesting one. Solving the encoding problem "client-side" might be another interesting one (i.e. the videos are transcoded on the client-side with server resources used for validation only).

    That said, if you want to be another mux or bitmovin or similar I think that opportunity is wide open, but it's a tedious problem.

    Another alternative might be to offer something more bespoke, selling to small studios or businesses: a sort of Vimeo competitor with a narrower focus.

    1. 2

      A million thanks for the in-depth assessment!

      This is both amazing and daunting. It seems I underestimated the minutiae of the problem - regardless of whether I pursue this full-time or not, I think this is definitely an interesting and difficult (albeit tedious) domain.

      Going to take a closer look tonight after work and arrange my thoughts.

  2. 2

    Random tangent: I recently saw a product, some 360 camera, that had built-in ML, I guess. Not sure if you could call it "vertically integrated", but yeah. The ML part would detect/tag objects and also prep "cool shots" so they're ready for upload to your social media of choice. It was interesting, but I also thought, wow, that seems like a sad life... so I don't know.