Converting audio content to text format with Python

Time: 2023-03-17 11:48:50 Tech Watch

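When recording the conversation of one or more people, it is very useful to have a highly accurate and automated way to extract the spoken language into text. Once converted to text, it can be used for further analysis or for other purposes.

In this tutorial, we will use a high-accuracy speech-to-text web API called AssemblyAI (https://www.assemblyai.com/) to extract text from an MP3 recording (many other formats are also supported).

The example audio file used in this tutorial produces the highly accurate text transcription shown below:

    An object relational mapper is a code library that automates the transfer of data stored in relational, databases into objects that are more commonly used in application code or EMS are useful because they provide a high level abstraction upon a relational database that allows developers to write Python code instead of sequel to create read update and delete, data and schemas in their database. Developers can use the programming language. They are comfortable with to work with a database instead of writing SQL...

Tutorial requirements

Make sure you have Python 3 installed in your environment, preferably 3.6 or higher. We will use the following dependency to complete this tutorial, and install it later: requests 2.24.0, to make HTTP requests to the AssemblyAI speech-to-text API. You will also need an AssemblyAI account; you can register for a free API access key here (https://app.assemblyai.com/login/).

Setting up the development environment

Go to the directory where you keep your Python virtual environments. I keep mine in a subdirectory named venvs within my user's home directory. Create a new virtualenv for this project using the following command.

    python3 -m venv ~/venvs/pytranscribe

Activate the virtualenv with the source shell command:

    source ~/venvs/pytranscribe/bin/activate

After executing the above command, the command prompt will change so that the name of the virtualenv is prepended to the original command prompt format. If your prompt is just $, it will then look like this:

    (pytranscribe) $

Remember, you have to activate your virtualenv in every new terminal window where you want to use the dependencies in the virtualenv.

We can now install the requests package into the activated but otherwise empty virtualenv.

    pip install requests==2.24.0

Look for output similar to the following to confirm that the appropriate packages were installed correctly from PyPI.

    (pytranscribe) $ pip install requests==2.24.0
    Collecting requests==2.24.0
    Collecting certifi>=2017.4.17 (from requests==2.24.0)
    Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests==2.24.0)
    Collecting chardet<4,>=3.0.2 (from requests==2.24.0)
    Collecting idna<3,>=2.5 (from requests==2.24.0)
    Installing collected packages: certifi, urllib3, chardet, idna, requests
    Successfully installed certifi-2020.6.20 chardet-3.0.4 idna-2.10 requests-2.24.0 urllib3-1.25.10

We have all of the required dependencies installed, so we can start coding the application.

Uploading, initiating and transcribing audio

We have everything we need to start building the application that will convert audio into text. We are going to build this application in three files:

1. upload_audio_file.py: uploads your audio file to a secure place on the AssemblyAI service so that it can be processed. If your audio file is already accessible via a public URL, you do not need this step; you can just follow this quickstart (https://docs.assemblyai.com/overview/getting-started)
2. initiate_transcription.py: tells the API which file to transcribe and to start immediately
3. get_transcription.py: prints the status of the transcription if it is still processing, or displays the results of the transcription when processing is complete

Create a new directory named pytranscribe to store these files as we write them. Then change into the new project directory.

    mkdir pytranscribe
    cd pytranscribe

We also need to export our AssemblyAI API key as an environment variable. Sign up for an AssemblyAI account and log in to the AssemblyAI dashboard, then copy "Your API token":

    export ASSEMBLYAI_KEY=your-api-key-here

Note that you must use the export command in every command line window where you want this key to be accessible. The scripts we are writing will not be able to access the API if the token is not exported as ASSEMBLYAI_KEY in the environment where you run the script.

Now that we have created the project directory and set the API key as an environment variable, let's move on to the code for the first file, which uploads audio files to the AssemblyAI service.

Uploading the audio file for transcription

Create a new file named upload_audio_file.py and place the following code in it:

    import argparse
    import os
    import requests


    API_URL = "https://api.assemblyai.com/v2/"


    def upload_file_to_api(filename):
        """Checks that the file exists and then uploads it to AssemblyAI.
        When the upload is complete we can then initiate the transcription
        API call. Returns the API JSON if successful, or None if file does
        not exist.
        """
        if not os.path.exists(filename):
            return None

        def read_file(filename, chunk_size=5242880):
            # yield the file in 5 MB chunks so large files are streamed
            with open(filename, 'rb') as _file:
                while True:
                    data = _file.read(chunk_size)
                    if not data:
                        break
                    yield data

        headers = {'authorization': os.getenv("ASSEMBLYAI_KEY")}
        response = requests.post("".join([API_URL, "upload"]), headers=headers,
                                 data=read_file(filename))
        return response.json()

The above code imports the argparse, os and requests packages so that we can use them in this script. API_URL is a constant holding the base URL of the AssemblyAI service. We define the upload_file_to_api function with a single argument, filename, which should be a string containing the absolute path to a file and its filename.

Within the function, we check that the file exists, then use Requests' chunked transfer encoding to stream large files to the AssemblyAI API.

The os module's getenv function reads the API key that was set on the command line with the export command. Make sure that you use the export command in the terminal where you are running this script, otherwise the ASSEMBLYAI_KEY value will be blank. When in doubt, use echo $ASSEMBLYAI_KEY to see whether the value matches your API key.

To use the upload_file_to_api function, add the following lines of code to the upload_audio_file.py file so that we can properly execute this code as a script called with the python command:

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("filename")
        args = parser.parse_args()
        upload_filename = args.filename
        response_json = upload_file_to_api(upload_filename)
        if not response_json:
            print("file does not exist")
        else:
            print("File uploaded to URL: {}".format(response_json['upload_url']))

The code above creates an ArgumentParser object that allows the application to take a single argument from the command line: the file we want to access, read and upload to the AssemblyAI service. If the file does not exist, the script prints a message saying the file could not be found. If the file is found at the given path, it is uploaded using the code in the upload_file_to_api function.

Execute the completed upload_audio_file.py script by running it on the command line with the python command. Replace FULL_PATH_TO_FILE with the absolute path to the file you want to upload, such as /Users/matt/devel/audio.mp3.

    python upload_audio_file.py FULL_PATH_TO_FILE

Assuming the file is found at the location you specified, when the script finishes uploading the file it will print a message with a unique URL:

    File uploaded to URL: https://cdn.assemblyai.com/upload/463ce27f-0922-4ea9-9ce4-3353d84b5638

This URL is not public. It can only be used by the AssemblyAI service, so no one other than you and the transcription API can access your file and its contents.

The important part is the last section of the URL, in this example 463ce27f-0922-4ea9-9ce4-3353d84b5638. Save that unique identifier, because we need to pass it to the next script, which initiates the transcription service.

Initiating transcription

Next, we will write some code to start the transcription. Create a new file named initiate_transcription.py and add the following code to it.

    import argparse
    import os
    import requests


    API_URL = "https://api.assemblyai.com/v2/"
    CDN_URL = "https://cdn.assemblyai.com/"


    def initiate_transcription(file_id):
        """Sends a request to the API to transcribe a specific
        file that was previously uploaded to the API. This will
        not immediately return the transcription because it takes
        a moment for the service to analyze and perform the
        transcription, so there is a different function to retrieve
        the results.
        """
        endpoint = "".join([API_URL, "transcript"])
        json = {"audio_url": "".join([CDN_URL, "upload/{}".format(file_id)])}
        headers = {
            "authorization": os.getenv("ASSEMBLYAI_KEY"),
            "content-type": "application/json"
        }
        response = requests.post(endpoint, json=json, headers=headers)
        return response.json()

We have the same imports as in the previous script, and we have added a new constant, CDN_URL, which matches the separate URL where AssemblyAI stores uploaded audio files.

The initiate_transcription function essentially just sets up a single HTTP request to the AssemblyAI API to start the transcription process for the audio file at the specific URL passed in. This is why passing in file_id is important: it completes the URL of the audio file that we are telling AssemblyAI to retrieve.

Finish the file by appending this code so that it can easily be invoked with an argument from the command line.

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("file_id")
        args = parser.parse_args()
        file_id = args.file_id
        response_json = initiate_transcription(file_id)
        print(response_json)

Start the script by running the python command on the initiate_transcription.py file, passing in the unique file identifier you saved in the previous step.

    # the FILE_IDENTIFIER is returned in the previous step and will
    # look something like this: 463ce27f-0922-4ea9-9ce4-3353d84b5638
    python initiate_transcription.py FILE_IDENTIFIER

The API will send back a JSON response, which this script prints to the command line.

    {'audio_end_at': None, 'acoustic_model': 'assemblyai_default', 'text': None,
     'audio_url': 'https://cdn.assemblyai.com/upload/463ce27f-0922-4ea9-9ce4-3353d84b5638',
     'speed_boost': False, 'language_model': 'assemblyai_default', 'redact_pii': False,
     'confidence': None, 'webhook_status_code': None,
     'id': 'gkuu2krb1-8c7f-4fe3-bb69-6b14a2cac067', 'status': 'queued',
     'boost_param': None, 'words': None, 'format_text': True, 'webhook_url': None,
     'punctuate': True, 'utterances': None, 'audio_duration': None,
     'auto_highlights': False, 'word_boost': [], 'dual_channel': None,
     'audio_start_from': None}

Take note of the value of the id key in the JSON response. This is the transcription identifier we need in order to retrieve the transcription result. In this example, it is gkuu2krb1-8c7f-4fe3-bb69-6b14a2cac067. Copy the transcription identifier from your own response, because we will need it in the next step to check when the transcription process has completed.

Retrieving the transcription result

We have uploaded the file and begun the transcription process, so let's get the result as soon as it is ready. How long it takes to get the results back depends on the size of the file, so this next script sends an HTTP request to the API and either reports the status of the transcription or prints the output if it is complete.

Create a third Python file named get_transcription.py and put the following code into it.

    import argparse
    import os
    import requests


    API_URL = "https://api.assemblyai.com/v2/"


    def get_transcription(transcription_id):
        """Requests the transcription from the API and returns the JSON
        response."""
        endpoint = "".join([API_URL, "transcript/{}".format(transcription_id)])
        headers = {"authorization": os.getenv('ASSEMBLYAI_KEY')}
        response = requests.get(endpoint, headers=headers)
        return response.json()


    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("transcription_id")
        args = parser.parse_args()
        transcription_id = args.transcription_id
        response_json = get_transcription(transcription_id)
        if response_json['status'] == "completed":
            # separate the words with spaces so the output is readable
            for word in response_json['words']:
                print(word['text'], end=" ")
        else:
            print("current status of transcription request: {}".format(
                  response_json['status']))

The code above has the same imports as the other scripts. In this new get_transcription function, we simply call the AssemblyAI API with our API key and the transcription identifier from the previous step (not the file identifier). We retrieve the JSON response and return it.

In the main function, we handle the transcription identifier that is passed in as a command line argument and pass it to the get_transcription function. If the response JSON from the get_transcription function contains a completed status, then we print the results of the transcription. Otherwise, we print the current status, which is either queued or processing until the transcription is completed.

Call the script using the command line and the transcription identifier from the previous section:

    python get_transcription.py TRANSCRIPTION_ID

If the service has not yet started working on the transcript, it will return queued, like this:

    current status of transcription request: queued

When the service is currently working on the audio file, it will return processing:

    current status of transcription request: processing

When the process is completed, our script will return the text of the transcription, as you can see here:

    An object relational mapper is a code library that automates the transfer of data stored in relational, databases into objects that are more commonly used in application code or EMS are useful because they provide a high level ...

That's it, we have our transcription!

You may be wondering what to do if the accuracy is not where you need it to be for your situation. That is where boosting accuracy for keywords or phrases (https://docs.assemblyai.com/guides/boosting-accuracy-for-keywords-or-phrases) and selecting a model that better matches your data (https://docs.assemblyai.com/guides/transcribing-with-a-different-acoustic-or-custom-language-model) come in. You can use either of those two methods to raise the accuracy of your recordings to an acceptable level for your situation.

What's next?

We have just finished writing some scripts that call the AssemblyAI API to transcribe recordings with speech into text output. You can consult the documentation (https://docs.assemblyai.com/overview/getting-started) to add more advanced functionality: supporting different file formats, transcribing dual channel/stereo recordings, and getting speaker labels (speaker diarization).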
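
The get_transcription.py script reports the status once per run, so it has to be re-run by hand until the status is completed. As a minimal sketch of one way to automate that, the helper below polls at a fixed interval. The name wait_for_transcription and its parameters are hypothetical and not part of the original scripts. It also treats "error" as a final status, which is an assumption about the API rather than something the tutorial shows. The demo uses a stand-in fetch function so the sketch runs without network access.

```python
import time


def wait_for_transcription(fetch_status, transcription_id,
                           interval_seconds=5.0, max_attempts=60,
                           sleep=time.sleep):
    """Call fetch_status(transcription_id) until the job reaches a final status.

    fetch_status is any callable returning the parsed JSON dict, for example
    the get_transcription function from get_transcription.py. Returns the
    final response dict, or raises TimeoutError after max_attempts polls.
    """
    for _ in range(max_attempts):
        response_json = fetch_status(transcription_id)
        status = response_json.get("status")
        # "completed" is the status the tutorial script checks for;
        # "error" is assumed here to be the API's failure status.
        if status in ("completed", "error"):
            return response_json
        # Still queued or processing: wait before asking again.
        sleep(interval_seconds)
    raise TimeoutError(
        "transcription {} not finished after {} attempts".format(
            transcription_id, max_attempts))


# Demo with canned responses standing in for real API calls.
fake_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "hello world"},
])
result = wait_for_transcription(lambda _id: next(fake_responses),
                                "TRANSCRIPTION_ID", interval_seconds=0)
print(result["status"], result["text"])
```

To use it against the real service, pass the get_transcription function as fetch_status and keep a non-zero interval so that requests are not sent more often than needed.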

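The upload step asks you to copy the last part of the upload URL by hand and pass it to the next script. A small helper could extract that identifier instead. This is only a sketch: the function name extract_file_id is hypothetical, and it assumes the identifier is always the final path segment, as in the example URL shown in this tutorial.

```python
from urllib.parse import urlparse


def extract_file_id(upload_url):
    """Return the last path segment of an upload URL.

    Assumes the file identifier is the final path segment, as in the
    example upload URL from this tutorial.
    """
    path = urlparse(upload_url).path
    # Drop any trailing slash, then keep what follows the last "/".
    return path.rstrip("/").rsplit("/", 1)[-1]


file_id = extract_file_id(
    "https://cdn.assemblyai.com/upload/463ce27f-0922-4ea9-9ce4-3353d84b5638")
print(file_id)
```

The returned string could then be passed to initiate_transcription in place of the manually copied identifier.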