One problem I keep running into when working on SEO and content marketing is that I have far more content ideas than time to write them.
Here is what I have tried to scale the process. I hope fellow IndieHackers find this helpful for their own content workflows.
The original write-up and my other content are here.
I would also be happy to chat if anyone wants to bounce ideas around on this topic.
The first thought was to hire freelance writers. That way, I could focus on finding and outlining topics while others fleshed them out into complete articles. This is a common practice among content marketers.
I hired three writers from different countries on Upwork to cover different topics. The cost came out to about $10 for a 700-word article, and the turnaround was fast. But the drafts weren't immediately publishable: I still had to spend a lot of time editing them myself, on top of the overhead of managing the writers.
At this point, the hacker in me said, "that's it, I'm turning my attention to automation."
The first content automation approach was to use a generative language model given a prompt. OpenAI's GPT-3 had been making waves with its huge model and cool demos, so it was a good place to start.
As of this writing, I don't have access to GPT-3 (by the way, if anyone can hook me up, that'd be great). Instead, I used Hugging Face's implementation of GPT-2, a similar model with fewer parameters.
I would give it an article prompt, say the title or first sentence I had in mind, and GPT-2 would spit out a plausible-sounding paragraph. The output read like human writing, which was encouraging. But it was too free-form and drifted off-topic for my use case, and tweaking the generation parameters I had access to didn't help.
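For anyone who wants to try this setup, here is a minimal sketch using the Hugging Face transformers library. The prompt and sampling values below are illustrative, not the exact ones I used:

```python
# Minimal GPT-2 prompting sketch with Hugging Face transformers.
# The prompt and sampling settings are illustrative examples.
from transformers import pipeline, set_seed

set_seed(42)  # make the sampled output repeatable
generator = pipeline("text-generation", model="gpt2")

prompt = "How to scale content marketing without hiring writers:"
outputs = generator(
    prompt,
    do_sample=True,          # sample instead of greedy decoding
    max_length=120,          # total length, prompt included
    temperature=0.9,         # higher = more "creative" and more off-topic
    top_p=0.95,              # nucleus sampling
    num_return_sequences=3,  # generate a few drafts to compare
)

for i, out in enumerate(outputs, 1):
    print(f"--- draft {i} ---")
    print(out["generated_text"])
```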
I needed more control over the output topics.
Then I thought, maybe there are more production-ready services I could use. I came across Contentyze here on an IH podcast episode and was excited to try it out.
It promised a lot on the landing page, and the interface was easy to use. But the outputs just weren't usable for the topics I was writing about; it seemed like a thin wrapper on top of the language models I had already tried.
I thought of a different approach. What if I took existing content, automatically paraphrased it, and added my unique angle at the end?
I could take the content that ranked on the first page of Google for my topic and repurpose it as a starting point for my draft. And to simplify the problem, I could start with paraphrasing one paragraph (or sentence) at a time.
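As a sketch, the pipeline I had in mind looked something like the code below. The paraphrase() function is a placeholder for whichever model ends up working, and the sentence splitter is deliberately naive:

```python
# Sketch of the repurposing pipeline: split a source article into
# sentences, paraphrase each one, then append my own angle.
# paraphrase() is a placeholder, not a working model.
import re

def split_sentences(text: str) -> list[str]:
    # naive split on sentence-ending punctuation; fine for a prototype
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def paraphrase(sentence: str) -> str:
    # placeholder: plug in GPT-2, a paraphrasing model, or an API here
    raise NotImplementedError

def draft_article(source_text: str, my_angle: str) -> str:
    rewritten = [paraphrase(s) for s in split_sentences(source_text)]
    return " ".join(rewritten) + "\n\n" + my_angle
```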
And if I could crack this, it would be immediately useful for content writers and college students alike.
This direction seemed promising!
I skimmed the latest natural language processing (NLP) research on the task of automatic paraphrasing. The first paper that caught my attention was Syntax-Guided Controlled Generation of Paraphrases (SGCP, 2020) by researchers at the Indian Institute of Science, Microsoft Research, and Google.
It worked like this: given an input sentence (the sentence to be paraphrased) and an exemplar sentence, the model would "generate syntax conforming sentences [conforming to the exemplar] while not compromising on relevance". They even included the code and a pre-trained model.
Big promise from a strong team. Let's give it a try!
And... the output was disappointing. I took the pre-trained model and ran it on the test data they provided (that is, not even on my own content). The results were often not even well-formed sentences. Here's one example:
Input Sentence: why do some people like cats more than dogs ?
Exemplar Sentence: why do some people develop food allergies later in life ?
Output Paraphrase: why do some people prefer dog puppies more than dogs in ?
Fancy approach on paper, but this wouldn't fit my use case.
What about a different approach for automatic paraphrasing? I found a 2018 paper titled "Word Embedding Attention Network" that also provided code and pre-trained models.
The model would use word embeddings (vector representations of words) to capture their meanings, then generate output words by decoding from the embedding space. That sounded great in theory.
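To make the "vector representation" idea concrete, here is a small, self-contained illustration using gensim's pre-trained GloVe vectors. This isn't the paper's model, just the underlying idea that related words end up close together in vector space:

```python
# Illustration of word embeddings: semantically related words sit
# close together in vector space. Uses a small pre-trained GloVe set
# via gensim; this is not the paper's model, just the core idea.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # 50-dimensional GloVe vectors

print(vectors.similarity("cat", "dog"))        # high: related concepts
print(vectors.similarity("cat", "economics"))  # low: unrelated concepts
print(vectors.most_similar("write", topn=3))   # nearest neighbors
```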
In practice, though, the outputs still weren't good enough. Using their pre-trained model, the outputs resembled correct sentences, but only for very simple phrases. For more meaningful sentences, the model often wouldn't paraphrase at all: the output was identical to the input. That was a deal-breaker, since I couldn't just plagiarize other people's content.
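A cheap guard against that failure mode would be to flag outputs that are near copies of the input, for example with a standard-library similarity check. The 0.9 threshold here is an arbitrary starting point, not a tuned value:

```python
# Flag "paraphrases" that are really copies of the input.
# A SequenceMatcher ratio of 1.0 means identical strings;
# the 0.9 threshold is an arbitrary starting point.
from difflib import SequenceMatcher

def too_similar(original: str, candidate: str, threshold: float = 0.9) -> bool:
    ratio = SequenceMatcher(None, original.lower(), candidate.lower()).ratio()
    return ratio >= threshold

src = "why do some people like cats more than dogs ?"
print(too_similar(src, src))  # True: the model didn't paraphrase at all
print(too_similar(src, "what makes some people prefer cats to dogs ?"))  # False
```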
This was my first foray into delegating my writing. Many of the approaches I tried sounded great in theory, but the outputs just weren't good enough at the time of writing. Directionally, though, I still think this is promising as the automated techniques get more sophisticated.
To this end, there are a few more ideas I plan to try.
Thanks again for checking this out. Hope you found this helpful. I write more about automation, product, and other learnings on my site: https://knowledgeartist.org/