Environment

Google.Cloud.TextToSpeech.V1 (1.0.0-beta02)

This article dates from when TTS was still in prerelease.
It uses the version from June 19, 2018.

Preparing GCP Text-to-Speech

Let's try TTS (Text-to-Speech), which converts text into audio.
The goal is to output any text I like as audio (an MP3 file).

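As preparation, get GCP into a usable state.
I covered this in "Image recognition with GCP (free trial) in C# (Visual Studio 2015)", so I'll skip it here. (You do need to go as far as setting up billing, but of course I stay within the free tier.)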

This time, enable the Text-to-Speech API from the list of GCP services. Typing "text" into the search box brought up "Cloud Text-to-Speech API". Enable it and that part is done.
The rest is coding in Visual Studio.

f:id:shikaku_sh:20181102133321p:plain:w400

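Just in case, check the current free-tier pricing on Google's own pages. Pricing can change after this article was written, and an unexpected charge would not be fun.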

Adding the reference with NuGet

f:id:shikaku_sh:20181102132902p:plain:w450

Install "Google.Cloud.TextToSpeech.V1 (1.0.0-beta02)". When searching for the package, tick the "Include prerelease" checkbox. Without it, the package does not show up in the search results.

Coding

f:id:shikaku_sh:20181102132732p:plain:w450

Google's Text-to-Speech page lets you try the service with a demo like the one in the screenshot above, so I decided to build a program along the same lines.

f:id:shikaku_sh:20181102132806p:plain:h400

This is what I built. Enter the text to read aloud in the text box and click a button: "Play" plays the audio, and "Save" writes an MP3 file.

For Japanese, WaveNet currently seems to offer only voice A.
See Supported Voices for details.
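Because the set of voices can change over time, it may be worth asking the API instead of trusting a fixed list. The following is a minimal sketch, assuming the `ListVoices` call of Google.Cloud.TextToSpeech.V1 and a client that has already been created elsewhere. `VoiceLister`, `ListJapaneseVoiceNames`, and `WaveNetOnly` are hypothetical names made up for this illustration, and filtering by the substring "Wavenet" is only an assumption based on the voice names used in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Google.Cloud.TextToSpeech.V1;

public static class VoiceLister
{
    // Asks the API for the voices available for Japanese and returns their names.
    // Requires valid credentials and network access.
    public static List<string> ListJapaneseVoiceNames(TextToSpeechClient client)
    {
        var request = new ListVoicesRequest { LanguageCode = "ja-JP" };
        var response = client.ListVoices(request);
        return response.Voices.Select(v => v.Name).ToList();
    }

    // Pure helper (assumption: WaveNet voices carry "Wavenet" in their name,
    // as in "ja-JP-Wavenet-A").
    public static List<string> WaveNetOnly(IEnumerable<string> voiceNames)
    {
        return voiceNames
            .Where(n => n.IndexOf("Wavenet", StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

With a working client, `VoiceLister.WaveNetOnly(VoiceLister.ListJapaneseVoiceNames(client))` would show whether anything beyond voice A has been added since.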

Google also provides a detailed step-by-step guide.

https://codelabs.developers.google.com/codelabs/cloud-text-speech-csharp/index.html?index=..%2F..%2Findex#0
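The snippets in this article call a `_Client` field whose creation is not shown. As a rough sketch of one way to obtain it, the code below assumes that a service account key file is used and that its path is stored in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. `TtsClientFactory` and its members are hypothetical names, and the up-front check is my own addition; other credential setups are possible and would not need it.

```csharp
using System;
using Google.Cloud.TextToSpeech.V1;

public static class TtsClientFactory
{
    // Environment variable read by Application Default Credentials.
    public const string CredentialsVariable = "GOOGLE_APPLICATION_CREDENTIALS";

    // True when the variable is set to a non-empty value.
    // This does not validate the key file itself.
    public static bool CredentialsConfigured()
    {
        var path = Environment.GetEnvironmentVariable(CredentialsVariable);
        return !string.IsNullOrWhiteSpace(path);
    }

    // Creates the client with default settings.
    // Assumption: credentials come from the key file named by the variable above.
    public static TextToSpeechClient CreateClient()
    {
        if (!CredentialsConfigured())
        {
            throw new InvalidOperationException(
                CredentialsVariable + " is not set; point it at a service account key file.");
        }
        return TextToSpeechClient.Create();
    }
}
```

In the window class, this could be used as `_Client = TtsClientFactory.CreateClient();` once at startup, so that every button click reuses the same client.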

Let's start with playing the audio.

// Needs: using System.IO; using Google.Cloud.TextToSpeech.V1;
// Read the UI state: text to speak, voice type, speed, and pitch.
var text = SpeechText.Text;
var name = (UseWaveNet.IsChecked ?? false) ? "ja-JP-Wavenet-A" : "ja-JP-Standard-A";
var speed = SpeedSlider.Value;
var pitch = PitchSlider.Value;

// Build the request parts.
var input = new SynthesisInput { Text = text };
var voiceSection = new VoiceSelectionParams
{
    Name = name,
    LanguageCode = "ja-JP",
    SsmlGender = SsmlVoiceGender.Female,
};
var audioConfig = new AudioConfig
{
    // SoundPlayer only plays WAV data, so request Linear16 here rather than Mp3.
    AudioEncoding = AudioEncoding.Linear16,
    SpeakingRate = speed,
    Pitch = pitch,
};

var response = _Client.SynthesizeSpeech(input, voiceSection, audioConfig);

// Play the returned audio straight from memory.
using (var memoryStream = new MemoryStream(response.AudioContent.ToArray(), true))
{
    var player = new System.Media.SoundPlayer(memoryStream);
    player.Play();
}

I think it is better to keep the player in a class-level field rather than creating it inside the handler.
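The playback code above works because audio returned as Linear16 carries a WAV header, and System.Media.SoundPlayer plays WAV data only. If playback stays silent or throws, a quick check of the first bytes can tell whether the response really is WAV. This is only a sketch: `WavCheck` and `LooksLikeWav` are hypothetical names, and the check looks at the "RIFF" and "WAVE" markers without validating the rest of the file.

```csharp
using System;
using System.Text;

public static class WavCheck
{
    // Returns true when the buffer starts with a RIFF/WAVE header:
    // bytes 0-3 are "RIFF", bytes 4-7 hold the chunk size, bytes 8-11 are "WAVE".
    public static bool LooksLikeWav(byte[] audio)
    {
        if (audio == null || audio.Length < 12)
        {
            return false;
        }
        return Encoding.ASCII.GetString(audio, 0, 4) == "RIFF"
            && Encoding.ASCII.GetString(audio, 8, 4) == "WAVE";
    }
}
```

For example, `WavCheck.LooksLikeWav(response.AudioContent.ToArray())` could be called before handing the bytes to the player. It should be false if the request accidentally used AudioEncoding.Mp3.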

Saving an MP3 file is even simpler.
In fact, it is exactly what the guide describes.

// Needs: using System.IO; using Google.Cloud.TextToSpeech.V1;
// Read the UI state: text to speak, voice type, speed, and pitch.
var text = SpeechText.Text;
var name = (UseWaveNet.IsChecked ?? false) ? "ja-JP-Wavenet-A" : "ja-JP-Standard-A";
var speed = SpeedSlider.Value;
var pitch = PitchSlider.Value;

// Build the request parts. Only the encoding differs from the playback code.
var input = new SynthesisInput { Text = text };
var voiceSection = new VoiceSelectionParams
{
    Name = name,
    LanguageCode = "ja-JP",
    SsmlGender = SsmlVoiceGender.Female,
};
var audioConfig = new AudioConfig
{
    AudioEncoding = AudioEncoding.Mp3,
    SpeakingRate = speed,
    Pitch = pitch,
};

var response = _Client.SynthesizeSpeech(input, voiceSection, audioConfig);

// Write the MP3 bytes to a file in the current directory.
using (var output = File.Create("output.mp3"))
{
    response.AudioContent.WriteTo(output);
}
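Both snippets pass the slider values straight into SpeakingRate and Pitch. The API documents a range of 0.25 to 4.0 for the speaking rate and -20.0 to 20.0 for the pitch, so if the sliders allow anything outside those ranges it may be safer to clamp the values first. A small sketch follows; `TtsRanges` and its members are hypothetical helpers, not part of the library.

```csharp
using System;

public static class TtsRanges
{
    // Limits as documented for AudioConfig (speaking rate and pitch).
    public const double MinSpeakingRate = 0.25;
    public const double MaxSpeakingRate = 4.0;
    public const double MinPitch = -20.0;
    public const double MaxPitch = 20.0;

    // Math.Clamp is not available on older .NET Framework versions,
    // so the clamp is written with Math.Min and Math.Max.
    private static double Clamp(double value, double min, double max)
    {
        return Math.Max(min, Math.Min(max, value));
    }

    public static double ClampSpeakingRate(double value)
    {
        return Clamp(value, MinSpeakingRate, MaxSpeakingRate);
    }

    public static double ClampPitch(double value)
    {
        return Clamp(value, MinPitch, MaxPitch);
    }
}
```

In the code above this would become `SpeakingRate = TtsRanges.ClampSpeakingRate(speed)` and `Pitch = TtsRanges.ClampPitch(pitch)`. Setting the slider minimum and maximum to the same limits would achieve the same thing from the UI side.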

That completes it.
It turned out to be easy. It is impressive that this alone is enough to produce a Yukkuri-style synthetic voice (or something like it).

Sample

I have uploaded a sample named "GCP_TextToSpeech_Sample".

sh1’s diary


Trying GCP TextToSpeech (TTS) in C# (Visual Studio 2015)
