We've been around for a while, and were owned by GitHub for a bit, so spam has always been a problem. We have an automated system, which we refer to as "the robots", that scores accounts based on many different signals (email, the application used to create the PDF, location, number of recent uploads, number of slides, etc.) and automatically flags the ones that look like spam. It does a good job of catching spammy uploaded presentations, but it doesn't do a great job of identifying spam accounts with no activity (a tricky thing to do).
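To make the idea of scoring concrete, here is a minimal sketch of a weighted, threshold-based account score. It is not our actual implementation: the `Account` fields, the weights, the example domain, the PDF creator name, the placeholder location code "XX" and `FLAG_THRESHOLD` are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Account:
    email: str
    pdf_creator: str      # application that produced the uploaded PDF
    country: str          # coarse location signal
    recent_uploads: int
    slide_count: int

# Hypothetical signal lists and weights; a real system would tune these.
SUSPICIOUS_DOMAINS = {"spam-mail.example"}
SUSPICIOUS_CREATORS = {"BulkPDF Generator"}
HIGH_RISK_LOCATIONS = {"XX"}  # placeholder code, not a real assessment
FLAG_THRESHOLD = 3.0

def spam_score(acct: Account) -> float:
    """Sum the weights of every suspicious signal the account shows."""
    score = 0.0
    domain = acct.email.rsplit("@", 1)[-1].lower()
    if domain in SUSPICIOUS_DOMAINS:
        score += 2.0
    if acct.pdf_creator in SUSPICIOUS_CREATORS:
        score += 1.5
    if acct.country in HIGH_RISK_LOCATIONS:
        score += 1.0
    if acct.recent_uploads > 10:
        score += 1.0
    if acct.slide_count <= 1:
        score += 0.5
    return score

def should_flag(acct: Account) -> bool:
    """Flag the account once its score reaches the threshold."""
    return spam_score(acct) >= FLAG_THRESHOLD
```

Note how an account from a suspicious domain that has never uploaded anything, such as `Account("carl@spam-mail.example", "", "US", 0, 0)`, scores only 2.5 and is not flagged: upload-based signals contribute nothing when there is no activity, which is one reason inactive spam accounts are hard to catch this way.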
Over the past few weeks I've built a bunch of tools to fight these accounts, mostly centered on the email address domain and its DNS records (such as MX records), and used them to flag 700k accounts (in addition to the 100k that were already flagged). The tool includes a report of the top domains used in email addresses and lets me view information such as MX records and website headers/content, along with samples of users and talks.
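A minimal sketch of the "top domains" report, plus one possible way to use MX records: grouping domains by the mail host they point to, on the assumption that a mail host shared by several suspicious domains is worth reviewing as a unit. The grouping step is an illustrative assumption, not necessarily how our tool works, and `lookup_mx` is a hypothetical injected function (in practice it might wrap a DNS library) so the sketch stays stdlib-only. The sketch leaves out the website headers/content and the user/talk samples.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable

def email_domain(email: str) -> str:
    """Return the lowercased part of the address after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()

def top_domains(emails: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Count accounts per email domain and return the n most common."""
    return Counter(email_domain(e) for e in emails).most_common(n)

def group_by_mx(domains: Iterable[str],
                lookup_mx: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Group domains by the (normalized) mail hosts their MX records name.

    lookup_mx is a hypothetical callable returning MX host names for a
    domain; it is passed in rather than hard-coding a DNS client.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for domain in domains:
        for host in lookup_mx(domain):
            # Drop the trailing root dot and normalize case.
            groups[host.rstrip(".").lower()].append(domain)
    return dict(groups)
```

For example, `top_domains(["a@x.example", "b@x.example", "c@y.example"], 1)` returns `[("x.example", 2)]`, and with a stub resolver that maps both `x.example` and `y.example` to `mx.shared.example.`, `group_by_mx` puts both domains under `mx.shared.example`.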
Using the tool, I went on a rampage flagging accounts, which not only removed spammers from the site but also made it harder for them to complete signup. I've cut signups by more than half, which is a good thing, because real users now make up a higher percentage of signups.