(2021-12-25) The Kotonoha Sisters Learned To Speak English!

First of all, Merry Christmas ya filthy animals! Hope you are having fun whether you are celebrating the holidays alone or not.

Okay, so I am a huge fan of the Kotonoha Sisters, Aoi and Akane. Their voices are so soft and warm, it's like if the feeling of rubbing fresh warm laundry out of the dryer against your cheek was turned into an artificial voice. It feels kinda weird that I'm one of the few people outside of Japan who know about them and like them. If you haven't heard them sing, here are 3 songs to get you started: Song 1, Song 2, Song 3. And I have a few more on my SSD page.

The funny thing is that they were not intended to be singing synthesizers when they were released. They were actually made for a Japanese text-to-speech program known as VOICEROID. But that didn't stop some talented Japanese programmers from making software that lets them (and other VOICEROID voice waifus) sing songs the way Vocaloids like Hatsune Miku do. The ones I know of are called VocalShifter and KotonoSync/KotonoTone (wonder why "Kotono" is in the name 🤔). But not that many songs were made using their voices, and most of them were covers of existing songs, not originals. And actually, most people do use them normally for text-to-speech things. What text-to-speech things? VOICEROID実況! Or in English: Voiceroid Let's Plays!

These Voiceroid let's play videos started to become pretty popular in Japan as early as the mid-2010s. The most popular Voiceroid character was Yuzuki Yukari, who starred in let's play series that got hundreds of thousands or even millions of views per episode on NicoNicoDouga around that time. You could say that these were Virtual YouTubers before what we know today as Vtubers became a thing, and they are more "virtual" than Vtubers, as there wasn't an actual human using their own voice and body to animate an avatar, but rather some otaku who were good at video games, entertaining script writing, and animating virtual puppets with virtual voices. I find it kind of weird that this phenomenon was and still is completely off the radar in western weeb communities. Maybe this will change with some English releases of these Voiceroid characters? Speaking of English releases...

On December 10th, 2021, the Kotonoha Sisters got an official English release, and I bought a copy the day after. The software they are using isn't VOICEROID, but a newer and very similar program called A.I.VOICE that was released earlier this year. This software was only in Japanese, and was Japan-oriented by all measures, but it seems that AI Inc., the developers of this software, have decided it's time to branch out internationally, so they translated it into English and made English voices for it. I bought the English Kotonoha Sisters mainly to play around with them and to financially support their development plus the development of waifu text-to-speech software, but I thought that I'd be getting an English equivalent of their Japanese version. That's not the case, sadly. They are very limited feature-wise compared to their Japanese version. When I downloaded and installed the A.I.VOICE software with the English Kotonoha Sisters, I noticed it was missing a few menu options and seemingly basic features that I saw in the introduction video for them. I almost sent a bug report email about this, but when I read the online manual and some tweets a little closer, I found out that what I was running was actually perfectly correct. They are intentionally watered down right now, and AI Inc. is planning to bring them to parity with their Japanese equivalent Eventually™.

The limitations of the English Kotonoha Sisters haven't stopped Japanese people from making some content using them. Here's the first let's play video featuring their English voices:

And here's the first full song that was made by putting Akane and Aoi's English voices through VocalShifter:

There's not much more content out there. There's even very little fan art of the English Kotonoha Sisters, and I've looked into the depths of the weebnet for that. I'm guessing their sales weren't that great. Honestly, this doesn't really surprise me, as few English-speaking people outside of Japan know about them, and few Japanese people know English well enough to want to go buy an English text-to-speech waifu for themselves. It seems like the Kotonoha Sisters are better known in China than in the west, based on me randomly stumbling upon a lot of videos (both niconico reuploads and some original content) featuring them on Bilibili. Apparently AI Inc. is also planning on making Chinese releases of their software and voices soon. I sure hope whatever comes of that is not as limited as the English Kotonoha Sisters are now.

Anyways, I did use the software to make some simple content, so here's a quick rundown of the software and what I made. I did make some random "hello world" test reads, but the first #content I made is Aoi reading the GNU/Linux copypasta, shown below. For this, I had to use the software's pronunciation editor to fix "OS", which was automatically pronounced like the "oss" in "boss". I also changed the pronunciation of "GNU", which the software first pronounced like the animal (correct for the animal), but which is actually pronounced more like "g-new" when referring to the operating system.
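To make the idea of those two pronunciation overrides concrete, here's a minimal Python sketch of a word-level override table. This is NOT how A.I.VOICE stores or applies its dictionary, and it doesn't use the software's phonetic codes. The respellings and the `apply_overrides` name are made up for illustration. It just swaps whole words for a spelled-out version before the text would go to a text-to-speech engine.

```python
import re

# Hypothetical respellings for illustration only.
# These are NOT A.I.VOICE's actual phonetic codes.
OVERRIDES = {
    "OS": "oh ess",    # instead of the "oss" in "boss"
    "GNU": "guh new",  # instead of the animal's pronunciation
}


def apply_overrides(text, overrides=OVERRIDES):
    """Replace whole-word, case-sensitive matches with their respelling."""
    if not overrides:
        return text
    # Try longer keys first so a longer entry wins over a shorter one.
    keys = sorted(overrides, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: overrides[m.group(1)], text)


if __name__ == "__main__":
    print(apply_overrides("GNU/Linux is an OS."))
```

Matching on word boundaries means "OS" inside a longer word like "macOS" is left alone, which is roughly what you'd want from a per-word dictionary.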

Next I wanted to try something that was actually hard to read, so I picked the Major-General's song, which has a lot of old-fashioned and complicated English words that were definitely not something the Japanese developers of A.I.VOICE expected to be read. Editing the pronunciation was a little bit annoying for this tiny project, as the pronunciation editor and the documentation surrounding it were super barebones, and I had to throw together my own little guide to make it easier to change the pronunciation of a total of 18 words. Below is what my screen looked like as I was editing the pronunciation. I guess the pronunciation editing system isn't too bad and could be efficient if you memorize the phonetic sounds and the codes associated with them and regularly use the A.I.VOICE software, but I think the UX of the English pronunciation editor could be a lot better.

And here's the result of all that:

Finally, here's a screencap of all the words I've registered so far and their pronunciation syntax:

So yeah, the software works, but it is damn limited. I feel a little bit let down, but I definitely got my money's worth with just the random copypasta readings I've gotten out of it. I might do some more simple content using them, but I don't think I'll be doing any major projects with the English Kotonoha Sisters until they reach feature parity with their Japanese version. If you want to play around with the English Kotonoha Sisters, you can get yourself a 14-day free trial over here, and you can buy a digital license key at Tokyo Otaku Mode. It's Windows only right now, by the way. Also, there's a way you can make other programs make A.I.VOICE read whatever you want, which can be useful for things like reading livestream donation messages. If you're good at programming, here's the GitHub project for that. Have fun.
