Japanese Pronunciation: A Detailed Guide (With Audio)
A very important (and often underrated) aspect of Japanese that will help you communicate effectively is good pronunciation.
Getting your tongue around a new language can be hard work, but the reality is that proper pronunciation is essential to speaking.
If you can speak clearly, you will be understood - even if your grammar and vocab aren't perfect.
The opposite is not true, however, as perfectly formed sentences mean nothing to a person if they can't understand the sounds coming out of your mouth.
Good pronunciation can also greatly improve your confidence, which means you’ll be more willing to put yourself out there and speak as often as possible.
Like all physical skills, the key to good pronunciation is simple: practice.
You can't train your tongue to shape the right sounds by reading about it. The muscles need to be developed, and your ears need to be trained to identify the subtle differences too.
Although this does generally get harder with age (part of the reason immigrant kids usually have much better pronunciation than their parents), with practice, it can still be learnt.
Quite simply, the more you do it, the easier it gets, and the more natural you will sound.
Below is my detailed guide to Japanese pronunciation. It includes a thorough explanation of all the different sounds in the language, as well as audio for each sound and a few useful words to practice with.
To begin with, Japanese has only five vowel sounds. English also has only five vowel letters, but each is pronounced differently in different combinations with other letters, bringing the total number of unique vowel sounds up to around 20, depending on a person’s accent. Compared with this, the five sounds in Japanese are easy to learn.
Here are the five vowel sounds:
あ a
い i
う u
え e
お o
These five vowels are also the first five “letters” of the "syllabary", the Japanese equivalent of the alphabet. Together, they are often referred to as the “a-line”.
In speech and writing, each of these sounds is used on its own or in combination with consonant sounds to produce other “letters”. For example, the first consonant sound is a “k” sound, but this can only be written or spoken in combination with one of the above five vowel sounds.
As such, the next five “letters” in the syllabary after the a-line are:
か ka
き ki
く ku
け ke
こ ko
As you have probably guessed, these five sounds can be referred to as the “ka-line”.
It is important to remember that there is no such thing in Japanese as a “k” on its own, and this is the same for all other consonant sounds, with the exception of “n”, as will be explained shortly.
After the ka-line, the pattern continues, starting with “sa” and followed by “ta”, “na”, “ha”, “ma”, “ya”, “ra” and “wa”. There are, however, a few exceptions to this basic pattern, so we will now look at each of these lines one by one.
In the sa-line, the exception is that the second sound is “shi”, not “si”.
Pro tip: When typing using IME or similar Japanese language input tools, you do not need to type the "h" to get 「し」 - "si" will do the job and save you a fraction of a second. The same goes for other similar exceptions below, such as "chi" and "tsu".
In the ta-line, the exceptions are the “i” and “u” variations, where “ti” is pronounced “chi”, and “tu” is pronounced “tsu”, as in the word “tsunami”.
The na-line has no exceptions.
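The typing shortcut from the pro tip above can be pictured as a lookup table in which several romaji spellings lead to the same kana. The Python sketch below is only an illustration of that idea, not how any real IME is implemented: the names `ROMAJI_INPUT` and `to_kana` are made up for this example, and the table covers only the spellings discussed so far.

```python
# Hypothetical lookup table: several romaji spellings map to the same kana.
# This only illustrates the idea; real input methods are far more elaborate.
ROMAJI_INPUT = {
    "si": "し", "shi": "し",
    "ti": "ち", "chi": "ち",
    "tu": "つ", "tsu": "つ",
}

def to_kana(keys: str) -> str:
    """Return the kana for one typed syllable, using the table above."""
    return ROMAJI_INPUT[keys]

# The short and long spellings produce the same character.
print(to_kana("si") == to_kana("shi"))
print(to_kana("tu"), to_kana("tsu"))
```

Either spelling gives the same result, so the shorter one simply saves a keystroke.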
The third sound in the ha-line is not a “hu” sound but a “fu” sound, as in Mt. Fuji. It is, however, a lighter sound than the English “f”, and sounds a bit like the sound you might make when unsuccessfully trying to whistle. Your bottom lip should not touch your teeth.
Also, note that 「は」 is pronounced "wa" when used as a particle. This may seem confusing at first, but the particle 「は」 is so common that it shouldn't take long before you are able to recognise when it is a particle and when it is just part of another word. To learn more about particles and the role they play in Japanese sentences, check out my article on Japanese sentence structure.
The ma-line, again, has no exceptions. Easy.
The ya-line only has the “a”, “u”, and “o” sounds, but is otherwise quite straightforward. The “yi” and “ye” sounds were used once upon a time, but have since died out of the language. As a result, the Japanese currency today is pronounced “en” in Japanese, not “yen”.
Although there are no exceptions in the ra-line, the "r" sound is unquestionably the hardest sound for native English speakers to master. It is usually written as an “R” in romaji, but the sound itself is much lighter than the English “R”, somewhere between an “R” sound and an “L” sound. This is why Japanese people often struggle to distinguish between “R” and “L” when learning English - they use the same sound to cover both letters when speaking English.
The ra-line sounds are achieved by flicking your tongue lightly against the roof of your mouth. Of course, descriptions like this are hard to implement in practice, so like all other sounds, the best way to learn to pronounce the ra-line correctly is to listen and practice repeatedly until your tongue builds up the necessary muscles to make the sound effortlessly.
Although difficult, this is definitely worth the time, as correct pronunciation of the ra-line will make your Japanese sound much better to a native speaker's ears, and this alone can earn you lots of respect.
The wa-line only has the “a” and “o” variations, and the “w” sound is effectively silent in the case of “wo”. The “wo” therefore sounds the same as the “o” from the a-line, but they are used differently in writing and are not interchangeable. Essentially, "wo" is only ever used as a particle, some examples of which can be seen in my other articles on sentence structure and word order.
Lastly, we have this:
ん n
This “n” is the only consonant that stands alone without a vowel sound attached. It is slightly different to the “n” sound produced in the na-line, although you can get away with a regular “n” sound in most cases.
It is important to note that this “n” sound should always be pronounced as its own syllable, and not blended into other sounds. For example, the name “Shinichi” is actually made up of the sounds shi-n-i-chi (しんいち), with the “n” sound being the lone “n”, not a part of “ni”. This name should therefore be pronounced with a distinct separation of “shin” and “ichi”.
There are a few ways to differentiate this “n” sound from na-line sounds when writing in romaji, with my preferred option being “n'” (“n” followed by an apostrophe). This is only really necessary, however, when the "n" is followed by an a-line sound.
Similarly, when “n” is followed by a na-line character, it is usually written as “nna”, “nni”, etc., to show that there is an “n” sound followed by a separate na-line sound. For example, the commonly known Japanese word for “hello”, sometimes spelled “konichiwa”, actually contains this “n” followed by “ni”, and should therefore be written as “konnichiwa”.
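The two romaji conventions above (an apostrophe before a vowel, and a doubled “n” before a na-line sound) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `ROMAJI` table and the `romanize` function are hypothetical names invented here, and the table only covers the kana needed for the two example words.

```python
# Tiny hypothetical romaji table covering only the kana used below.
ROMAJI = {"し": "shi", "ん": "n", "い": "i", "ち": "chi", "こ": "ko", "に": "ni"}

VOWELS = ("a", "i", "u", "e", "o")

def romanize(kana: str) -> str:
    """Romanize kana, writing the lone "n" as "n'" when a vowel follows."""
    out = []
    for i, ch in enumerate(kana):
        syllable = ROMAJI[ch]
        # Romaji of the following kana, or "" at the end of the word.
        next_syllable = ROMAJI[kana[i + 1]] if i + 1 < len(kana) else ""
        if ch == "ん" and next_syllable[:1] in VOWELS:
            syllable = "n'"  # keeps shi-n-i-chi distinct from shi-ni-chi
        out.append(syllable)
    return "".join(out)

print(romanize("しんいち"))  # shin'ichi
print(romanize("こんにち"))  # konnichi
```

Before “ni” no apostrophe is needed, because the lone “n” and the “n” of “ni” simply appear side by side as “nn”.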
We have now looked at all of the sounds that appear in the main part of the syllabary, but there are more! There are also a couple of important combinations and other points that are vital to achieving correct pronunciation in Japanese, which we will cover soon.
But first, we have to look at...
There is another set of "letters" that are strongly related to some of the sounds introduced above as they are, in Japanese terms, simply a transformation of those sounds.
The first line that this applies to is the ka-line. By adding two small lines, known as "dakuten" or "ten-ten", to the upper right of each of the ka-line characters, the hard “k” sound changes into a softer “g” sound as follows:
か ka → が ga
き ki → ぎ gi
く ku → ぐ gu
け ke → げ ge
こ ko → ご go
With just two small lines added to each character, we essentially have a new consonant sound.
These altered characters, however, do not appear in the main syllabary, as they are considered simply as variations of the ka-line. Why? Because the “k” sound and the “g” sound are essentially the same except for one small difference - the “g” sound is voiced, while the “k” sound is not.
If you’re not sure what a voiced or unvoiced sound is, say aloud the English “k” sound alone without a vowel, and compare this with what happens when you do the same with an English “g”. You should notice that your mouth moves in much the same way, but while you don’t use your voice for the “k” sound, you do for the “g”. This is because “g” is a voiced consonant, whereas “k” is not.
So, in Japanese, the unvoiced consonant sounds - that is, all sounds in each of the ka-, sa-, ta- and ha-lines - can be altered to create a voiced sound that is written in a similar way to their unvoiced counterparts. The other lines (na, ma, ya, ra and wa) don't have these because these sounds are already voiced.
Additionally, in some cases, words that normally use the unvoiced sound (e.g. the “k” sound) use the voiced sound (e.g. “g”) instead when combined with other words, as it may be easier to say. For example, the number “three” is “san” and the word for “floor” (of a building) is “kai”, yet the third floor could be referred to as “san gai”. This kind of adaptation can be seen all throughout the language.
Of course, like the main sounds, there are a few exceptions among these voiced alternatives, so let's look at each line individually.
Like the ka-line itself, the voiced “g” sounds shown above are nice and straightforward.
In the voiced sa-line (the “z” sounds), the one to note is the second sound, which is pronounced “ji”, not “zi”.
In the voiced ta-line (the “d” sounds), the second sound, “ji” (ぢ), is effectively the same as that from the modified sa-line above, and is rarely used. (If you need to type it, type "di", as typing "ji" will usually produce the za-line version.)
The “dzu” sound is basically a heavier version of the “tsu” sound where the “dz” is a voiced version of the unvoiced “ts” sound. Just be careful, as repeating this sound may lead you towards a career in beat-boxing (sorry...).
This brings us to the last of these unvoiced sounds, the ha-line. However, this line is unique as it actually has two alternatives - a voiced “b” sound and a “p” sound.
Firstly, the “b” sound is made by adding two lines (dakuten) like the others:
は ha → ば ba
ひ hi → び bi
ふ fu → ぶ bu
へ he → べ be
ほ ho → ぼ bo
Meanwhile, the “p” sound is achieved by adding a small circle (handakuten, or "half" dakuten, since it is considered half-voiced) instead of two lines, as follows:
は ha → ぱ pa
ひ hi → ぴ pi
ふ fu → ぷ pu
へ he → ぺ pe
ほ ho → ぽ po
As you can see, both the “b” and “p” variations of the ha-line are straightforward and don't have any special sounds.
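The idea that a voiced character is just the unvoiced character plus a mark is also reflected in how Unicode text works: the dakuten and handakuten exist as separate combining characters that can be attached to a base kana. The Python sketch below illustrates this using the standard `unicodedata` module; the helper name `add_mark` is made up for this example.

```python
import unicodedata

DAKUTEN = "\u3099"     # combining voiced sound mark (the two small lines)
HANDAKUTEN = "\u309A"  # combining semi-voiced sound mark (the small circle)

def add_mark(kana: str, mark: str = DAKUTEN) -> str:
    """Attach a mark to one kana and compose the pair into a single character."""
    return unicodedata.normalize("NFC", kana + mark)

print("".join(add_mark(k) for k in "かきくけこ"))              # がぎぐげご
print("".join(add_mark(k) for k in "はひふへほ"))              # ばびぶべぼ
print("".join(add_mark(k, HANDAKUTEN) for k in "はひふへほ"))  # ぱぴぷぺぽ
```

The same base characters produce both the “b” and the “p” variations of the ha-line; only the mark differs.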
We have now covered all of the individual sounds in Japanese (i.e. the ones that just use a single kana character). Now let's look at a few other sounds that are created by combining sounds together, plus a couple of important points to remember when speaking Japanese.
Small ya-line combinations
The three ya-line sounds can be combined with any of the sounds that end in “i” (except for “i” itself from the “a-line”) to produce another variation of sounds.
When written, the ya-line sounds are written smaller than regular characters. For example, “ki” + “small ya” would become “kya”, as if you were saying “ki” and then “ya” but without the “i” sound.
In the case of the sa-line, “shi” is the character with the “i” sound, so instead of “sya”, “syu” and “syo”, combining “shi” with the small ya-line characters produces the sounds “sha”, “shu” and “sho”. This idea also applies to some other sounds, as you will see below.
Here are all of the small ya-line combination versions of the main sounds:
Plus there are the voiced consonant variations:
Note that when a lone “n” sound is followed by a regular ya-line sound, it may be written in romaji as, for example, “n’ya” or "nnya". These should be pronounced as two separate sounds, and not joined together like the “nya”, “nyu” and “nyo” sounds above.
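The rule for building small ya-line combinations can be sketched in Python: drop the “i” from the first sound and add the ya-line sound, except that “shi”, “chi” and “ji” also drop the “y”. This is a rough illustration only; the tables `I_KANA` and `SMALL_YA` and the function `combine` are hypothetical names invented for this example.

```python
# Kana that end in "i" and can take a small ya-line character.
I_KANA = {
    "き": "ki", "し": "shi", "ち": "chi", "に": "ni", "ひ": "hi",
    "み": "mi", "り": "ri", "ぎ": "gi", "じ": "ji", "び": "bi", "ぴ": "pi",
}
SMALL_YA = {"ゃ": "ya", "ゅ": "yu", "ょ": "yo"}

def combine(base: str, small: str) -> str:
    """Return the romaji for an i-sound kana followed by a small ya-line kana."""
    stem = I_KANA[base][:-1]       # drop the trailing "i": "ki" -> "k"
    tail = SMALL_YA[small]
    if stem in ("sh", "ch", "j"):  # "shi" + small ya is "sha", not "shya"
        tail = tail[1:]
    return stem + tail

print(combine("き", "ゃ"))  # kya
print(combine("し", "ゅ"))  # shu
print(combine("じ", "ょ"))  # jo
```
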
Small "tsu" (double consonants)
Some words, when written in Japanese, contain a small “tsu” inserted between other characters. When this is done, the word is pronounced with a tiny pause where the small “tsu” occurs, followed by an accentuation of the sound that follows the small “tsu”.
The sound that follows the small “tsu” must always be a consonant sound, and usually a hard, unvoiced or half-voiced sound (k, s, t, p). When written in romaji, the small “tsu” is instead written as a double letter.
Examples of words that have a small tsu/double consonant include Sapporo, Hokkaido, Nissan, and Nippon (an alternative to the word “Nihon”, meaning “Japan”, and often chanted by fans at international sporting events).
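The romaji rule for the small “tsu” (write the next consonant twice) can be sketched in Python as follows. The `ROMAJI` table and the `romanize` function are hypothetical names for this illustration, and the table covers only the kana in the example words. The sketch doubles the first letter of whatever follows, which is a simplification: in Hepburn-style romaji a small “tsu” before “chi” is conventionally written “tchi”, and that case is not handled here.

```python
# Hypothetical romaji table covering only the kana in the examples below.
ROMAJI = {"さ": "sa", "ぽ": "po", "ろ": "ro", "に": "ni", "ん": "n"}

def romanize(kana: str) -> str:
    """Romanize kana, writing a small tsu as a doubled consonant."""
    out = []
    double_next = False
    for ch in kana:
        if ch == "っ":
            double_next = True  # remember to double the next consonant
            continue
        syllable = ROMAJI[ch]
        if double_next:
            syllable = syllable[0] + syllable  # "po" -> "ppo"
            double_next = False
        out.append(syllable)
    return "".join(out)

for word in ("さっぽろ", "にっさん", "にっぽん"):
    print(word, "->", romanize(word))
# さっぽろ -> sapporo
# にっさん -> nissan
# にっぽん -> nippon
```
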
Even weighting of sounds, and no accents
When spoken, each kana character is given the same weighting, or an equal amount of time, and there is no accent placed on any of the characters.
To demonstrate this, consider the city of Osaka. Many English speakers will naturally put the accent on the first “a” and draw out this sound, so it sounds something like “Osaaka”.
In fact, when written in Japanese, Osaka is actually “おおさか” (“oosaka”). Since each kana character is given equal time, Osaka is actually a four-character word pronounced “o-o-sa-ka”, with no accent anywhere, and the “o” sound making up half of the word.
The Japanese word for “hello” is similar. As mentioned earlier, this should actually be pronounced “ko-n-ni-chi-wa”, with a longer “n” sound than most English speakers normally say, and no accent on the first “i” (or anywhere else).
(Note that in hiragana, "konnichiwa" should be written as 「こんにちは」, since the 「は」 is the particle pronounced "wa". It's a particle because the word as a whole is a contraction of a longer phrase that is basically never used in full. The same is true for "konbanwa", which appears in the Practice Words section below.)
Another example might be “karate”. Like Osaka, the second syllable is usually accented by English speakers, but in fact equal time and weight should be given to each of “ka”, “ra” and “te”.
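To make the equal-timing idea concrete, here is a small Python sketch that splits a word written in hiragana into its equally weighted beats. The function name `split_beats` is made up for this example, and the sketch assumes that a small ya-line character shares a beat with the character before it (as in “kya”), while every other kana, including the lone “n”, gets a beat of its own.

```python
SMALL_YA = "ゃゅょ"  # small ya-line kana share a beat with the previous kana

def split_beats(kana: str) -> list[str]:
    """Split a hiragana word into beats that each get the same amount of time."""
    beats = []
    for ch in kana:
        if ch in SMALL_YA and beats:
            beats[-1] += ch   # e.g. き + ゃ form the single beat きゃ
        else:
            beats.append(ch)
    return beats

for word in ("おおさか", "こんにちは", "からて"):
    beats = split_beats(word)
    print("-".join(beats), len(beats))
# お-お-さ-か 4
# こ-ん-に-ち-は 5
# か-ら-て 3
```
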
Elongated vowel sounds
When a sound is followed immediately by the same vowel sound, it is usually elongated as in the above example of “Osaka”. This applies whether the first of the repeated vowel sounds is paired with a consonant or not. For example, “toori”, meaning “street”, has an elongated “o” sound just the same as that in “Osaka”.
When written in romaji, my preferred method for expressing elongated sounds is with a line on the top of the vowel: ā, ī, ū, ē, ō. We can see this in "tōri" above.
The other main alternative is to repeat the vowel, effectively writing it as it would be typed in hiragana. In this case, however, note that an elongated "ō" sound is sometimes written as "ou", as this is how some such words are written in hiragana, as explained below.
When written in hiragana, elongated vowel sounds are usually expressed using the appropriate a-line character: おいしい (oishī = delicious), じゅう (jū = ten), etc. In the case of “o” sounds, however, the elongation of the “o” is often expressed with an 「う」 instead of an 「お」, such as in 「ありがとう」 (arigatō = thanks) and 「にちようび」 (nichiyōbi = Sunday).
In katakana, rather than using the a-line character, elongated vowel sounds are written with a 「ちょうおんぷ」 (chōonpu), or “long sound mark”: 「ー」. Examples of this can be seen in the words 「ケーキ」 (kēki = cake), 「コーヒー」 (kōhī = coffee) and 「スーパー」 (sūpā = supermarket).
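The two romaji styles for long vowels (repeating the vowel as typed, or using a line over it) can be related with a short Python sketch. The table `MACRONS` and the function `to_macrons` are hypothetical names for this illustration. Note the simplifying assumption: every "ou" or doubled vowel is treated as a long vowel, which is not always true in real Japanese (for instance where two separate words or word parts happen to meet), so this is a rough converter rather than a reliable one.

```python
# Doubled-vowel spellings (as typed in hiragana) and their macron equivalents.
MACRONS = {"aa": "ā", "ii": "ī", "uu": "ū", "ee": "ē", "oo": "ō", "ou": "ō"}

def to_macrons(romaji: str) -> str:
    """Naively replace doubled vowels (and "ou") with a single long vowel."""
    for pair, long_vowel in MACRONS.items():
        romaji = romaji.replace(pair, long_vowel)
    return romaji

for word in ("toori", "oishii", "juu", "arigatou", "nichiyoubi"):
    print(word, "->", to_macrons(word))
# toori -> tōri
# oishii -> oishī
# juu -> jū
# arigatou -> arigatō
# nichiyoubi -> nichiyōbi
```
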
Of course, these sounds are only useful to us if we combine them to form words! Here are some useful words you can use to practice combining some of the sounds introduced above:
See you later
Nice to meet you
*The "u" part of the "su" sound at the end of "ohayō gozaimasu" and "arigatō gozaimasu" is usually silent, so these words end up sounding like they end in "mas".
**The "i" part of the "shi" sound in "hajimemashite" and "dō itashimashite" is usually silent, so these words end up sounding like they end in "shte".