1. Preparation

Without further ado, let's get started. First, sign in to the Azure portal.

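1.1 Select "Cognitive Services" and add a new Speech subscription. The name can be anything you like.

1.2 For the location, choose Southeast Asia.

1.3 For the pricing tier, choose F0.

Neural voices are not available in every region, so be sure to pick a region that supports them; otherwise you cannot use Microsoft's Xiaoxiao voice.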


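1.4 After the Speech resource is deployed

Click the "All resources" link in the left-hand list to open the resource management panel.

1.5 Select the resource and view the keys

In the resource panel, click the MySpeechService resource you just created. On its details page, click "Keys and Endpoint" to see the generated keys, which you will need shortly when calling the Speech service.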
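The script in this post needs two things from that page: a key and the endpoints for your region. Here is a minimal sketch of how they might be assembled. It assumes the usual regional URL pattern of the Speech REST API and assumes that "southeastasia" is the identifier of the Southeast Asia location chosen above, so check both against what the portal shows. The helper names token_url, tts_base_url and load_key are hypothetical; SPEECH_SERVICE_KEY is the environment variable name used by the script in this post.

```python
import os

# Assumption: region identifier for the "Southeast Asia" location.
REGION = "southeastasia"


def token_url(region):
    """Token endpoint, assuming the usual regional URL pattern."""
    return "https://" + region + ".api.cognitive.microsoft.com/sts/v1.0/issueToken"


def tts_base_url(region):
    """TTS base URL with a trailing slash, assuming the usual regional pattern."""
    return "https://" + region + ".tts.speech.microsoft.com/"


def load_key(env=os.environ):
    """Read the key from SPEECH_SERVICE_KEY instead of hardcoding it."""
    key = env.get("SPEECH_SERVICE_KEY")
    if not key:
        raise RuntimeError("Environment variable SPEECH_SERVICE_KEY is not set.")
    return key


if __name__ == "__main__":
    print(token_url(REGION))
    print(tts_base_url(REGION) + "cognitiveservices/v1")
```

Keeping the region in one place means that a change of location only touches a single constant, and reading the key from the environment keeps it out of any code you publish.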


# After you've set your subscription key, run this script from your working
# directory with python.
import os, requests, time
from xml.etree import ElementTree

# This code is required for Python 2.7
try: input = raw_input
except NameError: pass

# If you prefer, you can hardcode your subscription key as a string and remove
# the conditional statement below. However, we recommend using environment
# variables to keep your subscription keys out of the source. The sample reads
# the key from the SPEECH_SERVICE_KEY environment variable.
# For example:
# subscription_key = "Your-Key-Goes-Here"

# if 'SPEECH_SERVICE_KEY' in os.environ:
#     subscription_key = os.environ['SPEECH_SERVICE_KEY']
# else:
#     print('Environment variable for your subscription key is not set.')
#     exit()

class TextToSpeech(object):
    def __init__(self, subscription_key):
        self.subscription_key = subscription_key
        self.tts = input("What would you like to convert to speech: ")
        self.timestr = time.strftime("%Y%m%d-%H%M")
        self.access_token = None

    # The TTS endpoint requires an access token. This method exchanges your
    # subscription key for an access token that is valid for ten minutes.
    def get_token(self):
        fetch_token_url = "" # endpoint: fill in the token URL for your region
        headers = {
            'Ocp-Apim-Subscription-Key': self.subscription_key
        }
        response = requests.post(fetch_token_url, headers=headers)
        self.access_token = str(response.text)

    def save_audio(self):
        ShortName = 'zh-CN-XiaoxiaoNeural' # 5,000 characters free per month
        # ShortName = 'zh-CN-Yaoyao-Apollo' # 5 million characters free per month

        base_url = '' # fill in the TTS base URL for your region, ending with '/'
        path = 'cognitiveservices/v1'
        constructed_url = base_url + path
        headers = {
            'Authorization': 'Bearer ' + self.access_token,
            'Content-Type': 'application/ssml+xml',
            'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm',
            'User-Agent': 'YOUR_RESOURCE_NAME'
        }
        # Build the SSML body; the namespace below serializes as the xml:lang attribute.
        xml_body = ElementTree.Element('speak', version='1.0')
        xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')
        voice = ElementTree.SubElement(xml_body, 'voice')
        voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
        voice.set('name', ShortName) # short name of the voice chosen above
        voice.text = self.tts
        body = ElementTree.tostring(xml_body)

        response = requests.post(constructed_url, headers=headers, data=body)
        # If a success response is returned, the binary audio is written to a
        # file in your working directory. The file name starts with "sample-"
        # and includes the date and time.
        if response.status_code == 200:
            with open('sample-' + self.timestr + '.wav', 'wb') as audio:
                audio.write(response.content)
                print("\nStatus code: " + str(response.status_code) + "\nYour TTS is ready for playback.\n")
        else:
            print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")
            print("Reason: " + str(response.reason) + "\n")

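if __name__ == "__main__":
    # Use your own key here; never publish a real key.
    subscription_key = 'Your-Key-Goes-Here'
    app = TextToSpeech(subscription_key)
    app.get_token()   # exchange the key for an access token
    app.save_audio()  # synthesize the text and write the .wav file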
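
The SSML body that save_audio sends can be inspected without calling the service at all. The following is a minimal sketch, using a hypothetical helper named build_ssml, that builds a speak/voice document of the same shape with the standard XML namespace for the xml:lang attribute. It defaults the language to zh-CN to match the Xiaoxiao voice, which is an assumption; the script above sets en-us.

```python
from xml.etree import ElementTree

# Standard namespace URI that ElementTree serializes with the "xml:" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name="zh-CN-XiaoxiaoNeural", lang="zh-CN"):
    """Return an SSML document (bytes) that wraps text in a single voice element."""
    speak = ElementTree.Element("speak", version="1.0")
    speak.set("{" + XML_NS + "}lang", lang)
    voice = ElementTree.SubElement(speak, "voice")
    voice.set("{" + XML_NS + "}lang", lang)
    voice.set("name", voice_name)
    voice.text = text
    # Default encoding is us-ascii, so non-ASCII text becomes character references.
    return ElementTree.tostring(speak)


if __name__ == "__main__":
    print(build_ssml("你好,世界").decode("ascii"))
```

Printing the body before sending it is a quick way to confirm that the voice name and language attributes are what you intended when the service returns an error.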




