Bark: Text-to-Speech AI Voice Cloning App & Text-Prompted Generative Audio

Bark is a text-to-audio model created by Suno. Built on GPT-style models, it can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects.

With Bark, users can also produce nonverbal sounds such as laughing, sighing, and crying, making it a versatile tool for a variety of applications.

Bark uses GPT-style models to generate speech with minimal tweaking, producing expressive, emotive voices that capture nuances such as tone, pitch, and rhythm. The results can leave you wondering whether you're listening to a human being.

Notably, Bark supports multiple languages and can generate speech in Mandarin, French, Italian, Spanish, and other languages with impressive clarity and accuracy.

With Bark, you can easily switch between languages and still get high-quality audio.

Bark is not only intelligent but also intuitive, making it an ideal tool for individuals and businesses looking to create high-quality voice content for their platforms.

Whether you’re looking to create podcasts, audiobooks, video game sounds, or any other form of voice content, Bark has you covered.

BARK Features
Similar to Vall-E and some other amazing work in the field, Bark uses GPT-style models to generate audio from scratch.

Unlike Vall-E, Bark embeds the initial text prompt into high-level semantic tokens without using phonemes.

It can therefore generalize to arbitrary instructions beyond speech that occur in the training data, such as music lyrics, sound effects or other non-speech sounds.

A second model then converts the generated semantic tokens into audio codec tokens, from which the full waveform is generated.
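The two stages described above can be run by hand with the lower-level helpers in `bark.api`. This is only a rough sketch: it assumes the installed Bark version exposes `text_to_semantic` and `semantic_to_waveform` there, and the wrapper name `two_stage_generate` is made up for illustration. The import is deferred so that nothing is loaded until the function is actually called.

```python
def two_stage_generate(text_prompt, history_prompt=None):
    """Run Bark's two stages by hand: text -> semantic tokens -> waveform."""
    # Deferred import. Assumption: bark is installed and exposes these helpers.
    from bark.api import semantic_to_waveform, text_to_semantic

    # Stage 1: the text prompt becomes high-level semantic tokens (no phonemes).
    semantic_tokens = text_to_semantic(text_prompt, history_prompt=history_prompt)

    # Stage 2: semantic tokens become audio codec tokens, decoded to a waveform.
    return semantic_to_waveform(semantic_tokens, history_prompt=history_prompt)
```

In everyday use `generate_audio` wraps both steps, so splitting them like this is mostly useful for inspecting or reusing the intermediate semantic tokens.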

from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio
# Download and load all models once before generating audio
preload_models()

Multilingual Support
Bark supports various languages out-of-the-box and automatically determines the language from input text.

This means that when prompted with code-switched text, Bark will attempt to employ the native accent for the respective languages. While English quality is currently the best, other languages are expected to further improve with scaling.

text_prompt = """
Buenos días Miguel. Tu colega piensa que tu alemán es extremadamente malo.
But I suppose your english isn't terrible.
"""
audio_array = generate_audio(text_prompt)
# Play the result in a notebook
Audio(audio_array, rate=SAMPLE_RATE)
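The snippets in this post stop at `audio_array`. To keep the result outside a notebook, one option is to write it to a WAV file with Python's standard library. The sketch below assumes the array holds float samples in the range -1.0 to 1.0, and the 24000 default is our understanding of Bark's `SAMPLE_RATE`, so pass `SAMPLE_RATE` explicitly in real use. The helper name `save_wav` is hypothetical, and a generated tone stands in for Bark output.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write float samples in [-1.0, 1.0] to a 16-bit mono WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for Bark output: one second of a 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
demo_path = os.path.join(tempfile.gettempdir(), "bark_demo_tone.wav")
save_wav(tone, demo_path)
```

With Bark installed, the call would look like `save_wav(audio_array, "out.wav", SAMPLE_RATE)`.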

Music Generation
Bark can generate all types of audio, including music. In principle, it sees no difference between speech and music, and it sometimes chooses on its own to render text as music.

To nudge it toward singing, users can put music notes (♪) around their lyrics.

text_prompt = """
♪ In the jungle, the mighty jungle, the lion barks tonight ♪
"""
audio_array = generate_audio(text_prompt)
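If lyrics come from user input, a tiny helper can add the music notes consistently. This is a plain string utility, not part of Bark; the name `as_lyrics` is made up for this sketch.

```python
def as_lyrics(text, note="♪"):
    """Wrap text in music notes unless it is already wrapped."""
    stripped = text.strip()
    already_wrapped = (
        len(stripped) > len(note)
        and stripped.startswith(note)
        and stripped.endswith(note)
    )
    if already_wrapped:
        return stripped
    return f"{note} {stripped} {note}"


lyrics_prompt = as_lyrics("In the jungle, the mighty jungle, the lion barks tonight")
```

The returned string can be passed to `generate_audio` like any other prompt.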

Voice/Audio Cloning
Bark has the capability to fully clone voices, including tone, pitch, emotion, and prosody. The model also attempts to preserve music, ambient noise, etc., from input audio.

To mitigate misuse of this technology, audio history prompts are limited to a set of Suno-provided, fully synthetic options to choose from for each language.

However, we jailbroke that for y'all in our release (below). 😉

text_prompt = """
I have a silky smooth voice, and today I will tell you about
the exercise regimen of the common sloth.
"""
audio_array = generate_audio(text_prompt, history_prompt="en_speaker_1")

Note: since Bark recognizes languages automatically from the input text, it is possible to use, for example, a German history prompt with English text. This usually produces English audio with a German accent.
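To mix languages and voices deliberately, it helps to build the history prompt name from its parts. The sketch below assumes every built-in prompt follows the `en_speaker_1` pattern shown above, with the language code swapped in. The helper name is hypothetical, the language set is copied from the supported-languages list in this post, and which speaker numbers actually exist is not stated here, so only negative numbers are rejected.

```python
# Language codes taken from the supported-languages list in this post.
SUPPORTED_LANGUAGES = {
    "en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh",
}


def history_prompt_name(lang, speaker):
    """Build a name like 'en_speaker_1' (naming pattern is an assumption)."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {lang!r}")
    if speaker < 0:
        raise ValueError("speaker number must not be negative")
    return f"{lang}_speaker_{speaker}"


# A German voice reading English text, as described in the note above.
german_voice = history_prompt_name("de", 1)
```

The result would then be passed as `history_prompt=german_voice` to `generate_audio`.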

Speaker Prompts
Users can provide certain speaker prompts such as NARRATOR, MAN, WOMAN, etc. However, these prompts are not always respected, especially if a conflicting audio history prompt is given.

text_prompt = """
WOMAN: I would like an oatmilk latte please.
MAN: Wow, that's expensive!
"""
audio_array = generate_audio(text_prompt)
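For longer exchanges, the speaker-prefixed format above can be assembled from a list of turns. This is a plain string helper with a made-up name (`build_dialogue`), not a Bark API, and as noted above the model may still ignore the speaker labels.

```python
def build_dialogue(turns):
    """Turn (speaker, line) pairs into a prompt with one 'SPEAKER: line' per row."""
    rows = [f"{speaker.upper()}: {line.strip()}" for speaker, line in turns]
    # Leading and trailing newlines mirror the triple-quoted prompts in this post.
    return "\n" + "\n".join(rows) + "\n"


dialogue_prompt = build_dialogue([
    ("woman", "I would like an oatmilk latte please."),
    ("man", "Wow, that's expensive!"),
])
```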

Below is a list of some known non-speech sounds:

[laughter]
[laughs]
[sighs]
[music]
[gasps]
[clears throat]
— or … for hesitations
♪ for song lyrics
capitalization for emphasis of a word
MAN/WOMAN: for bias towards speaker
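The markers in the list above are ordinary text, so they can be added programmatically. The sketch below uses two hypothetical helpers: one capitalizes a word for emphasis, the other appends a sound tag and rejects anything not in the list above. Neither is part of Bark.

```python
import re

# Sound tags copied from the list above.
KNOWN_SOUNDS = (
    "[laughter]", "[laughs]", "[sighs]", "[music]", "[gasps]", "[clears throat]",
)


def emphasize(text, word):
    """Capitalize every whole-word occurrence of `word` for emphasis."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.sub(pattern, lambda m: m.group(0).upper(), text, flags=re.IGNORECASE)


def add_sound(text, sound):
    """Append a known non-speech tag to the end of the text."""
    if sound not in KNOWN_SOUNDS:
        raise ValueError(f"unknown sound tag: {sound!r}")
    return f"{text.rstrip()} {sound}"


marked_prompt = add_sound(emphasize("Wow, that is really expensive!", "really"), "[laughs]")
```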

Languages Supported
Language Status
English (en) ✅
German (de) ✅
Spanish (es) ✅
French (fr) ✅
Hindi (hi) ✅
Italian (it) ✅
Japanese (ja) ✅
Korean (ko) ✅
Polish (pl) ✅
Portuguese (pt) ✅
Russian (ru) ✅
Turkish (tr) ✅
Chinese, simplified (zh) ✅
Arabic Coming soon!
Bengali Coming soon!
Telugu Coming soon!

BARK "SERPy" Release!
We’ve got some exciting news for you!

Remember Bark, the new Text2Speech model that was released recently? 🐶🔊

Well, guess what? We’ve managed to reverse engineer it! 🕵️‍♂️🔧

Introducing Bark: Text2Speech Voice Cloning 🐶
We know that Bark’s creators restricted voice cloning and added “allowed prompts” for safety reasons.

But we believe in freedom and creativity! 🌟

✊ So, we’ve cracked open the code and removed those pesky limitations! 🚫🔓

Bark Unleashed! 🎉🐾
A set of easy-to-use Jupyter notebooks that’ll have you cloning audio with just 5–10 second samples of audio/text pairs in no time! 🎙️📝

Get ready to revolutionize your audio game with Bark Unleashed!

Just follow our simple instructions and let your imagination run wild! 🌈

Happy cloning, folks!

👇 Show some love with an ⬆️ upvote 🙏

https://www.producthunt.com/posts/bark-text-to-speech-ai-voice-cloning

👇 Then swipe your free download!

Founder of Bark: Text-to-Speech AI Voice Cloning App, on May 2, 2023

    Hi Devin,

    Checked the article, this looks amazing. I was looking for a good text2speech model for a few passion projects of mine.

    What license is this available under?

    OK, got access. It's Attribution-NonCommercial 4.0, and my understanding is that whatever is built upon it must be NonCommercial as well (please correct me if I'm wrong; I'm not well versed in licenses). So one can't offer a paid tier with extra features if the service is built upon it? What if the feature built on top of this is available in both tiers? Would it be illegal to provide it in the paid tier as well, even though the subscription is not for the feature that uses it?

    ~ Abhishek Kumar
    (building #LuDe)

      The license carries over from the fork of Suno's, so it's "Attribution-NonCommercial 4.0 International". I think that's how it works.
