最近发现有声读物能极大促进我的睡眠,但每个前面都有一段开场语,想把它剪掉,但是有多个开场语,所以就要用到语音识别判断一下再剪。
前两年在本地搭建过识别的环境,奈何识别准确率不行,只能找找API了,后面有时间再弄本地的吧。下面是几个大厂提供的服务,就我个人使用来看,讯飞 > Google > IBM,
但在中文识别准确度上,讯飞是最强的。
Oracle:
被它的Always Free计划吸了一波粉,但是提供的转写服务不支持中文,pass
IBM
优点:有一定的持续免费额度
缺点:准确度不够,官网访问有点慢
乱写的示例:
#coding:utf-8
'''
@version: python3.8
@author: ‘eric‘
@license: Apache Licence
@contact: steinven@qq.com
@software: PyCharm
@file: ibm.py
@time: 2021/6/16 23:05
'''
from __future__ import print_function
import traceback
apikey = ''
url = ''
from watson_developer_cloud import SpeechToTextV1
service = SpeechToTextV1(
iam_apikey=apikey,
url=url)
import os, re
#总资源文件目录
base_dir = r'36041981'
#子目录,存放已被裁剪好的长度为5s的x2m后缀文件(安卓端,喜马拉雅缓存文件),我估计其实就是常用的音频格式,就改了个后缀名
cliped_dir =os.listdir(os.path.join(base_dir,'clip'))
for each in cliped_dir:
try:
filename = re.findall(r"(.*?).x2m", each) # 取出.mp3后缀的文件名
if filename:
filename[0] += '.x2m'
with open(os.path.join(base_dir, 'clip', filename[0]),
'rb') as audio_file:
recognize_result = service.recognize(
audio=audio_file,
content_type='audio/mp3',
timestamps=False,
#中文模型,CN_BroadbandModel更准确一点
model='zh-CN_NarrowbandModel',
# model='zh-CN_BroadbandModel',
#这两个参数应该是让识别出来的文字更接近于提供的,但实际测试,并没什么用,不知道什么原因
# keywords=list(set([x for x in '曲曲于山川历史为解之谜拓展人生的长度广度人生的长度广度和深度由喜马拉雅联合大理石独家推出探秘类大家好欢迎大家订阅历史未解之谜全记录'])),
#keywords_threshold=0.1,
word_confidence=True).get_result()
if len(recognize_result['results'])==0:
with open('result-1.txt', 'a', encoding='utf-8') as f:
f.write('%s-%s
' % (filename[0], '-'))
continue
final_result = recognize_result['results'][0]['alternatives'][0]['transcript'].replace(' ', '')
with open('result-1.txt', 'a',encoding='utf-8') as f:
f.write('%s-%s
' % (filename[0], final_result))
except:
traceback.print_exc()
print(each)
优点:识别速度快
缺点:要挂代__理访问,需付费
文档:快速入门:使用客户端库,本地音频文件的话,不要用文档中的代码,可参考我下面的
乱写的示例:
# coding:utf-8
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "268675557.mp3")
def transcribe_file(speech_file):
"""Transcribe the given audio file."""
from google.cloud import speech
import io
client = speech.SpeechClient()
with io.open(speech_file, "rb") as audio_file:
content = audio_file.read()
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
encoding=speech.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED,
sample_rate_hertz=16000,
language_code="zh-CN",
)
response = client.recognize(config=config, audio=audio)
# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
for result in response.results:
# The first alternative is the most likely one for this portion.
print(u"Transcript: {}".format(result.alternatives[0].transcript))
if __name__ == '__main__':
transcribe_file(AUDIO_FILE)
讯飞
优点:有限期的免费额度,识别速度快,中文识别最为准确,国内厂商,开发者上手很容易
缺点:识别速度慢,收费,还挺贵
代码就不贴了,官网很容易找到demo