A Basic Tutorial on UTAU, OREMO, and SetParam for Making and Using Voicebanks
OREMO
OREMO is a really good program for recording all types of voicebanks. Since OREMO was made specifically for creating UTAU voicebanks, there are a lot of neat features (like auto-saving and pitch monitoring) that help you create a great voicebank.
Using OREMO
Now it’s time to record your voicebank!
First, please make a new folder specifically for your UTAU sound files.
Next, open up OREMO.
Click the folder icon at the bottom of the screen.
Navigate to the folder you just created for your recordings and select it.
See all the hiragana characters on the left-hand side? Unless you’re fluent in Japanese (or at least know the characters), you’ll probably have a tough time reading those, so you’ll need to change the reclist.
*NOTE: A “reclist” (a.k.a. a “recording list”) is basically a text file that lists all the syllables or phonemes you want to record. So, for instance, a Japanese reclist would have entries like “a i u e o ka ki ku ke ko sa shi su se so… etc…” (written in either hiragana or romaji), while a Chinese reclist could have “a ai an ang ao ba bai ban bang bao bei ben beng… etc…”
Following this logic, then, you can see that UTAU isn’t limited to just Asian languages like Japanese, Chinese, Korean, and Vietnamese. There are English reclists, German reclists, Spanish reclists, Russian reclists… You name it, and someone has probably done it before.
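To make the idea of a reclist concrete, here is a minimal Python sketch of reading one. It assumes the entries are separated by spaces or line breaks; the function name and the sample text are made up for illustration, and real reclist files may use a different layout or text encoding.

```python
def parse_reclist(text):
    """Split reclist text into a list of entries, skipping repeats."""
    entries = []
    for entry in text.split():  # split on any whitespace, including newlines
        if entry not in entries:
            entries.append(entry)
    return entries


# Hypothetical sample: the first few entries of a romaji reclist.
sample = "a i u e o\nka ki ku ke ko\nsa shi su se so"
entries = parse_reclist(sample)
print(len(entries), entries[:5])
```

The same function works for any language's reclist, since it only cares about where one entry ends and the next begins.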
Go to “File → Load voice list” and select the reclist you want to use. For Japanese, I’d recommend this romaji reclist, which is the one I use:
https://www.mediafire.com/?gejccojcec2e8a5
(“Romaji,” in case you’re not familiar with the word, is the romanization of Japanese, a.k.a. Japanese written out in Latin letters instead of Japanese characters.)
Now all the entries are in Latin letters! Hooray!!
Let’s get started with the actual recording.
To record a syllable, hold down the “r” key. Wait a moment to confirm that OREMO has actually started recording, then say the syllable for about 1-2 seconds. Letting go of the “r” key stops the recording, so if you release it by accident, you’ll have to record that syllable again.
Now, to hear how you did, press the spacebar. If playback takes a while to start, just wait; OREMO can be slow sometimes.
To have a high quality voicebank, you’ll probably want to see additional information on your recording. Go to “Show” and make sure “Show Waveform,” “Show Spectrum,” “Show Power,” and “Show F0” are all checked.
- The waveform and the spectrum tell you when you actually started saying the syllable. That’s where it’s all gray and black.
- The power tells you how loud you said the syllable. As you can see, it’s measured in decibels.
- The F0 tells you the pitch of the recording.
Of course, for a good voicebank, you’d want to make sure that your power and your F0 are consistent across all of your recordings. It doesn’t hurt much if they aren’t exactly the same, since UTAU deals with such differences pretty well, but if you want a high quality voicebank, you’d probably want to keep these in mind.
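If you’d like to check loudness consistency outside of OREMO, here is a rough Python sketch that measures the average (RMS) level of a .wav file in decibels relative to full scale. It only handles 16-bit audio, the function names are made up for illustration, and the numbers won’t necessarily match OREMO’s power display, which may use a different reference level. The write_tone helper exists only so the example can run without real recordings.

```python
import math
import os
import struct
import tempfile
import wave


def rms_db(path):
    """Return the RMS level of a 16-bit PCM .wav file in dB re full scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit audio")
        raw = wav.readframes(wav.getnframes())
    # Samples from all channels are pooled together.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # pure silence
    return 10 * math.log10(mean_square / 32768.0 ** 2)


def write_tone(path, amplitude, freq=220.0, seconds=0.5, rate=44100):
    """Write a mono 16-bit sine wave; used only to demo rms_db."""
    count = int(seconds * rate)
    frames = b"".join(
        struct.pack(
            "<h",
            int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate)),
        )
        for i in range(count)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(frames)


# Demo with two stand-in "recordings", one at half the amplitude of the other.
with tempfile.TemporaryDirectory() as folder:
    loud_path = os.path.join(folder, "ka.wav")
    quiet_path = os.path.join(folder, "ki.wav")
    write_tone(loud_path, 0.5)
    write_tone(quiet_path, 0.25)
    loud_db = rms_db(loud_path)
    quiet_db = rms_db(quiet_path)

print(round(loud_db - quiet_db, 1), "dB apart")
```

Halving the amplitude lowers the level by about 6 dB, so a gap that large between two syllables means one was recorded roughly half as loud as the other.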
After you finish recording a syllable, press the “up” or “down” keys (on the keyboard) to move on to the next syllable. Whenever you do this after recording a sound, OREMO automatically saves the recordings into .wav files in the folder you specified earlier.
This is one thing that makes OREMO so much more convenient to use than Audacity. In Audacity, you have to manually save every recording to the appropriate .wav file. In OREMO, you don’t need to worry about this. It’s also okay if you have to re-record something after OREMO has already saved it: OREMO will replace your old recording file with the new one.
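Since every recording ends up as its own .wav file, it’s easy to check which syllables you still haven’t recorded. Here is a small Python sketch of that idea. It assumes each file is named after its reclist entry (for example “ka.wav”), which may not match your setup exactly, so check how the files in your own folder are named first. The function name and the optional suffix parameter are made up for illustration.

```python
import os
import tempfile


def missing_recordings(entries, folder, suffix=""):
    """Return the reclist entries that have no matching .wav file in folder."""
    existing = set(os.listdir(folder))
    return [e for e in entries if e + suffix + ".wav" not in existing]


# Demo with a temporary folder standing in for your recording folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a", "i", "ka"):
        # Empty placeholder files; real ones would contain audio.
        open(os.path.join(folder, name + ".wav"), "wb").close()
    todo = missing_recordings(["a", "i", "u", "ka", "ki"], folder)

print(todo)
```

Anything left in the list is a syllable that still needs to be recorded.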
Now go through and record all the syllables.
Try to pronounce the Japanese syllables as close to the actual language as you can (don’t butcher them terribly).
Japanese “r” sounds are a bit difficult to do; I’d recommend listening to the syllable in question on YouTube.
Breath sounds are a bit tricky, but the biggest piece of advice I have would be to try not to exaggerate them. They sound really strained if you do, and everyone can tell.
I’d also advise not recording them too softly. If you do, UTAU could sometimes have a hard time finding them, and might filter them out like other background noise.
*NOTE: If you want a super flexible voicebank, you can also record different pitches. This can be done by clicking the actual “up arrow” and “down arrow” on the side. Click the “up arrow” and repeat the syllable at a higher pitch, and do the same for the “down arrow,” except at a lower pitch. If you do this for all the syllables in a voicebank, you’ve just created a tri-pitch voicebank! The name itself is self-explanatory.
Different pitches can make your UTAU sound better at higher and lower, well, pitches. UTAU is pretty good at pitch changes (it has to be, since it’s software meant for singing), but it isn’t always the best at really high or really low keys.
The different pitches you record don’t necessarily need to be exactly an octave higher or lower, if that’s what you’re concerned about. UTAU deals with different pitches well enough, so they don’t sound off-key.
Of course, it’s not entirely necessary to record multiple pitches (I don’t). But a lot of good voicebanks do do this, so I’m just letting the option be known.
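If you’re curious how far apart your recorded pitches actually are, you can work it out from the F0 values of two recordings. The Python sketch below uses the standard relationship of 12 semitones per octave (a doubling in frequency). The function name and the example frequencies are made up for illustration.

```python
import math


def semitones_between(f0_a, f0_b):
    """Signed distance from f0_a to f0_b in semitones (12 per octave)."""
    return 12 * math.log2(f0_b / f0_a)


# Hypothetical F0 readings in Hz for a "normal" and a "high" recording.
print(round(semitones_between(220.0, 440.0), 2))  # exactly one octave up
print(round(semitones_between(220.0, 330.0), 2))  # about a fifth up
```

A result near 12 or -12 means the two recordings are about an octave apart; anything else is fine too, as mentioned above.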
*NOTE: Don’t worry about recording the different tones some languages have, like those in Chinese. These tones will not be apparent in your actual recordings. Things like tones will be handled in UTAU itself.