• WebRTC audio processing algorithms


    Audio processing consists of the following modules:

    1. Noise suppression (NS)

    2. Acoustic echo cancellation (AEC)

    3. Acoustic echo control for mobile (AECM)

    4. Automatic gain control (AGC)

    5. Voice activity (silence) detection

    1. Noise suppression (NS) - noise_suppression.h

    Principle:

    Wiener filtering

    The input signal is passed through a linear time-invariant system to produce an output that approximates a desired signal as closely as possible, minimizing the estimation error. The optimal filter that minimizes this estimation error is called the Wiener filter.
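    The formula images did not survive; in a common frequency-domain formulation (the symbols here are standard, not reproduced from the lost figures), the Wiener gain for frequency bin k is determined by the a-priori SNR:

    ```latex
    H(k) = \frac{\xi(k)}{1 + \xi(k)}, \qquad
    \xi(k) = \frac{S_x(k)}{S_n(k)}
    ```

    where \(S_x\) and \(S_n\) are the clean-speech and noise power spectra, and the suppressed output is \(\hat{X}(k) = H(k)\,Y(k)\).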

    Basic idea:

    For each received frame of noisy speech, starting from an initial noise estimate for that frame:

    1. Define a speech-probability function.

    2. Measure the classification features of the noisy frame.

    3. Compute a feature-based speech probability for the frame from the measured features.

    4. Weight the computed speech probability with dynamic factors (signal classification features and threshold parameters).

    5. Use the resulting per-frame, feature-based speech probability to modify each frame's speech-probability function.

    6. Use the modified per-frame speech-probability function to update the initial noise estimate (the quantile noise of each of the consecutive frames).
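    A minimal sketch of the final step, a probability-weighted noise update: where speech is likely the old estimate is kept, and where speech is unlikely the estimate moves toward the observed magnitude. This is a self-contained illustration; the names and the smoothing constant are ours, not WebRTC's.

    ```c
    #include <stdio.h>

    #define NUM_BINS 4
    #define NOISE_SMOOTH 0.9f  /* smoothing factor, illustrative value */

    /* Per-bin noise update weighted by the speech probability:
     * high speech probability freezes the estimate, low probability lets it
     * track the observed magnitude spectrum. */
    static void update_noise(float* noise, const float* magn,
                             const float* speech_prob, int len) {
      for (int i = 0; i < len; i++) {
        float tracked = NOISE_SMOOTH * noise[i] + (1.0f - NOISE_SMOOTH) * magn[i];
        noise[i] = speech_prob[i] * noise[i] + (1.0f - speech_prob[i]) * tracked;
      }
    }

    int main(void) {
      float noise[NUM_BINS] = {1.0f, 1.0f, 1.0f, 1.0f};
      float magn[NUM_BINS] = {1.2f, 5.0f, 1.1f, 0.9f};       /* bin 1 holds speech */
      float speech_prob[NUM_BINS] = {0.1f, 0.95f, 0.1f, 0.1f};
      update_noise(noise, magn, speech_prob, NUM_BINS);
      for (int i = 0; i < NUM_BINS; i++)
        printf("bin %d noise estimate: %.4f\n", i, noise[i]);
      return 0;
    }
    ```

    Note how the speech bin barely pulls the noise estimate up despite its large magnitude, because its high speech probability suppresses the update.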

    Features

    Compute Spectral Flatness

    Assumption: speech has more harmonic structure than noise; the speech spectrum tends to show peaks at the fundamental (pitch) frequency and its harmonics, while the noise spectrum is relatively flat.

    Symbols in the formula:

    N: number of frequency bins after the STFT

    B: number of frequency bands

    k: frequency-bin index

    j: frequency-band index

    Each band contains many bins; for example, 128 bins can be split into 4 bands (low, mid-low, mid-high, high) of 32 bins each.

    For noise the flatness is relatively large and roughly constant, while for speech the computed value is smaller and varies.
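    The formula image was lost; using the symbols above, the quantity the code computes is the ratio of the geometric to the arithmetic mean of the magnitude spectrum (written here over all N bins; per band j it is the same with k restricted to the band):

    ```latex
    F = \frac{\exp\!\Big(\frac{1}{N}\sum_{k=0}^{N-1}\ln|Y(k)|\Big)}
             {\frac{1}{N}\sum_{k=0}^{N-1}|Y(k)|}
    ```

    F lies in (0, 1]: near 1 for a flat (noise-like) spectrum, smaller for a peaky (speech-like) one.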

    Corresponding code: the spectral flatness feature is computed into featureData[0].

    // Compute spectral flatness on input spectrum
    // magnIn is the magnitude spectrum
    // spectral flatness is returned in inst->featureData[0]
    void WebRtcNs_ComputeSpectralFlatness(NSinst_t* inst, float* magnIn) {
      int i;
      int shiftLP = 1;  // option to remove first bin(s) from spectral measures
      float avgSpectralFlatnessNum, avgSpectralFlatnessDen, spectralTmp;

      // compute spectral measures for flatness
      // skip the first bin (the DC bin); "Den" is short for denominator, and
      // avgSpectralFlatnessDen is the denominator of the flatness formula
      avgSpectralFlatnessNum = 0.0;
      avgSpectralFlatnessDen = inst->sumMagn;
      for (i = 0; i < shiftLP; i++) {
        avgSpectralFlatnessDen -= magnIn[i];
      }
      // compute log of ratio of the geometric to arithmetic mean: check for log(0) case
      // TAVG is short for time-average; on an anomalous (zero) magnitude, decay
      // the previous flatness value toward zero and return.
      for (i = shiftLP; i < inst->magnLen; i++) {
        if (magnIn[i] > 0.0) {
          avgSpectralFlatnessNum += (float)log(magnIn[i]);
        } else {
          inst->featureData[0] -= SPECT_FL_TAVG * inst->featureData[0];
          return;
        }
      }
      // normalize
      avgSpectralFlatnessDen = avgSpectralFlatnessDen / inst->magnLen;
      avgSpectralFlatnessNum = avgSpectralFlatnessNum / inst->magnLen;

      // ratio and inverse log: check for case of log(0)
      spectralTmp = (float)exp(avgSpectralFlatnessNum) / avgSpectralFlatnessDen;

      // time-avg update of spectral flatness feature
      inst->featureData[0] += SPECT_FL_TAVG * (spectralTmp - inst->featureData[0]);
      // done with flatness feature
    }
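    The same computation can be checked standalone. The following is a self-contained re-implementation for illustration only (the function name and test spectra are ours, not WebRTC's):

    ```c
    #include <math.h>
    #include <stdio.h>

    /* Geometric mean divided by arithmetic mean of a magnitude spectrum.
     * Returns a value in (0, 1]; close to 1 means a flat, noise-like spectrum,
     * small values mean a peaky, speech-like spectrum. */
    static double spectral_flatness(const double* magn, int len) {
      double log_sum = 0.0, sum = 0.0;
      for (int i = 0; i < len; i++) {
        if (magn[i] <= 0.0) return 0.0;  /* guard against log(0) */
        log_sum += log(magn[i]);
        sum += magn[i];
      }
      return exp(log_sum / len) / (sum / len);
    }

    int main(void) {
      double flat[4] = {1.0, 1.0, 1.0, 1.0};    /* noise-like: flat spectrum */
      double peaky[4] = {10.0, 0.1, 0.1, 0.1};  /* speech-like: one strong peak */
      printf("flat spectrum:  %.3f\n", spectral_flatness(flat, 4));
      printf("peaky spectrum: %.3f\n", spectral_flatness(peaky, 4));
      return 0;
    }
    ```

    The flat spectrum yields exactly 1.0 while the peaky one comes out well below it, matching the "noise large and constant, speech small and varying" behavior described above.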


    Compute Spectral Difference

    Assumption: the noise spectrum is more stationary than the speech spectrum, so the shape of the noise spectrum tends to stay the same at any given stage. This feature measures how far the input spectrum deviates from the shape of the noise spectrum.

    The deviation is measured through a covariance-based quantity. The squared correlation is cov(x, y)^2 / (D(x) * D(y)); what the code actually computes is var(magnIn) - cov(magnIn, noise)^2 / var(noise), i.e. the input variance minus the part explained by the noise template.

    Code:

    // Compute the difference measure between input spectrum and a
    // template/learned noise spectrum
    // magnIn is the input spectrum
    // the reference/template spectrum is inst->magnAvgPause[i]
    // returns (normalized) spectral difference in inst->featureData[4]
    void WebRtcNs_ComputeSpectralDifference(NSinst_t* inst, float* magnIn) {
      // avgDiffNormMagn = var(magnIn) - cov(magnIn, magnAvgPause)^2 / var(magnAvgPause)
      int i;
      float avgPause, avgMagn, covMagnPause, varPause, varMagn, avgDiffNormMagn;

      avgPause = 0.0;
      avgMagn = inst->sumMagn;
      // compute average quantities
      for (i = 0; i < inst->magnLen; i++) {
        // conservative smoothed noise spectrum from pause frames
        avgPause += inst->magnAvgPause[i];
      }
      avgPause = avgPause / ((float)inst->magnLen);
      avgMagn = avgMagn / ((float)inst->magnLen);
      covMagnPause = 0.0;
      varPause = 0.0;
      varMagn = 0.0;
      // compute variance and covariance quantities
      // covMagnPause is the covariance; varPause and varMagn are the
      // respective variances
      for (i = 0; i < inst->magnLen; i++) {
        covMagnPause += (magnIn[i] - avgMagn) * (inst->magnAvgPause[i] - avgPause);
        varPause += (inst->magnAvgPause[i] - avgPause) * (inst->magnAvgPause[i] - avgPause);
        varMagn += (magnIn[i] - avgMagn) * (magnIn[i] - avgMagn);
      }
      covMagnPause = covMagnPause / ((float)inst->magnLen);
      varPause = varPause / ((float)inst->magnLen);
      varMagn = varMagn / ((float)inst->magnLen);
      // update of average magnitude spectrum
      inst->featureData[6] += inst->signalEnergy;
      // difference measure based on the correlation with the noise template
      avgDiffNormMagn = varMagn - (covMagnPause * covMagnPause) / (varPause + (float)0.0001);
      // normalize and compute time-avg update of difference feature
      avgDiffNormMagn = (float)(avgDiffNormMagn / (inst->featureData[5] + (float)0.0001));
      inst->featureData[4] += SPECT_DIFF_TAVG * (avgDiffNormMagn - inst->featureData[4]);
    }
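    The core measure can be exercised on its own. Below is a self-contained sketch of the variance-minus-explained-covariance quantity (names and test spectra are illustrative, not from WebRTC):

    ```c
    #include <stdio.h>

    /* var(x) - cov(x, ref)^2 / var(ref): the part of the input spectrum's
     * variance not explained by the noise template's shape. Near zero when the
     * input has the same shape as the template (any scaling), large otherwise. */
    static float spectral_difference(const float* x, const float* ref, int len) {
      float avg_x = 0.0f, avg_ref = 0.0f;
      for (int i = 0; i < len; i++) { avg_x += x[i]; avg_ref += ref[i]; }
      avg_x /= len; avg_ref /= len;
      float cov = 0.0f, var_x = 0.0f, var_ref = 0.0f;
      for (int i = 0; i < len; i++) {
        cov += (x[i] - avg_x) * (ref[i] - avg_ref);
        var_x += (x[i] - avg_x) * (x[i] - avg_x);
        var_ref += (ref[i] - avg_ref) * (ref[i] - avg_ref);
      }
      cov /= len; var_x /= len; var_ref /= len;
      return var_x - (cov * cov) / (var_ref + 1e-4f);
    }

    int main(void) {
      float noise_tmpl[4] = {1.0f, 2.0f, 1.5f, 1.0f};
      float same_shape[4] = {2.0f, 4.0f, 3.0f, 2.0f};  /* scaled noise template */
      float speech[4] = {0.5f, 6.0f, 0.5f, 0.5f};      /* strong peak: new shape */
      printf("same shape: %.4f\n", spectral_difference(same_shape, noise_tmpl, 4));
      printf("speech:     %.4f\n", spectral_difference(speech, noise_tmpl, 4));
      return 0;
    }
    ```

    A spectrum that is just a scaled copy of the noise template scores near zero, while a spectrum with a different shape scores high, which is exactly why the feature separates noise from speech.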
    

    Compute SNR

    Both the a-posteriori and the a-priori SNR are computed from the quantile noise estimate.

    Posterior SNR: the instantaneous SNR, comparing the observed (noisy) input power with the noise power.

    Y: input noisy spectrum

    N: noise spectrum

    The prior SNR is the expected clean-signal power relative to the noise power.

    X: clean input (speech) signal

    WebRTC actually computes with magnitudes rather than squared magnitudes (powers).

    Since the clean signal is unknown, the prior SNR is estimated as a decision-directed average of the previous frame's estimated prior SNR and the instantaneous SNR.

    γdd: time-smoothing parameter; the larger the value, the smoother the estimate but the higher the latency.

    H: smooth, the previous frame's Wiener filter.
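    In place of the lost formula images, the decision-directed update the code performs (with DD_PR_SNR playing the role of γdd) can be written per bin k and frame m as:

    ```latex
    \rho_{\mathrm{post}}(k,m) = \frac{|Y(k,m)|}{|\hat{N}(k,m)|}, \qquad
    \hat\rho_{\mathrm{prior}}(k,m) = \gamma_{dd}\,\tilde\rho(k,m-1)
      + (1-\gamma_{dd})\,\rho_{\mathrm{post}}(k,m)
    ```

    where \(\tilde\rho(k,m-1)\) is the previous frame's posterior SNR shaped by the previous Wiener gain H (previousEstimateStsa in the code); magnitudes are used throughout, as noted above.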

    Related code computing snrLocPrior and snrLocPost:

      // directed decision update of snrPrior
      snrLocPrior[i] = DD_PR_SNR * previousEstimateStsa[i]
                       + ((float)1.0 - DD_PR_SNR) * snrLocPost[i];
      // post and prior snr needed for step 2

      // directed decision update of snrPrior
      snrPrior = DD_PR_SNR * previousEstimateStsa[i]
                 + ((float)1.0 - DD_PR_SNR) * currentEstimateStsa;

    Average LRT factor: featureData[3]

    // compute feature based on average LR factor
      // this is the average over all frequencies of the smooth log lrt
      logLrtTimeAvgKsum = 0.0;
      for (i = 0; i < inst->magnLen; i++) {
        tmpFloat1 = (float)1.0 + (float)2.0 * snrLocPrior[i];
        tmpFloat2 = (float)2.0 * snrLocPrior[i] / (tmpFloat1 + (float)0.0001);
        besselTmp = (snrLocPost[i] + (float)1.0) * tmpFloat2;
        inst->logLrtTimeAvg[i] += LRT_TAVG * (besselTmp - (float)log(tmpFloat1)
                                              - inst->logLrtTimeAvg[i]);
        logLrtTimeAvgKsum += inst->logLrtTimeAvg[i];
      }
      logLrtTimeAvgKsum = (float)logLrtTimeAvgKsum / (inst->magnLen);
      inst->featureData[3] = logLrtTimeAvgKsum;
      // done with computation of LR factor
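    The loop body above smooths a per-bin log-likelihood ratio. A standalone sketch of a single bin's update (the function name and the LRT_TAVG value here are illustrative, not WebRTC's):

    ```c
    #include <math.h>
    #include <stdio.h>

    #define LRT_TAVG 0.5f  /* time-averaging constant, illustrative value */

    /* One bin's smoothed log-LRT update, mirroring the WebRTC loop body:
     * logLrt += LRT_TAVG * ((post+1) * 2*prior/(1+2*prior)
     *                       - log(1+2*prior) - logLrt) */
    static float update_log_lrt(float log_lrt, float snr_prior, float snr_post) {
      float tmp1 = 1.0f + 2.0f * snr_prior;
      float tmp2 = 2.0f * snr_prior / (tmp1 + 1e-4f);
      float bessel = (snr_post + 1.0f) * tmp2;
      return log_lrt + LRT_TAVG * (bessel - logf(tmp1) - log_lrt);
    }

    int main(void) {
      /* a high-SNR (speech-like) bin drives the feature up;
       * a low-SNR (noise-like) bin leaves it near zero */
      float speech_bin = update_log_lrt(0.0f, 10.0f, 10.0f);
      float noise_bin = update_log_lrt(0.0f, 0.1f, 0.1f);
      printf("speech bin: %.3f\n", speech_bin);
      printf("noise bin:  %.3f\n", noise_bin);
      return 0;
    }
    ```

    Averaging this smoothed value over all bins gives featureData[3], so frames dominated by speech push the feature up while noise-only frames keep it small.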
    /*
     *  Copyright (c) 2012 The WebRTC project authors. All Rights Reserved.
     *
     *  Use of this source code is governed by a BSD-style license
     *  that can be found in the LICENSE file in the root of the source
     *  tree. An additional intellectual property rights grant can be found
     *  in the file PATENTS.  All contributing project authors may
     *  be found in the AUTHORS file in the root of the source tree.
     */
    
    #ifndef WEBRTC_MODULES_AUDIO_PROCESSING_NS_INCLUDE_NOISE_SUPPRESSION_H_
    #define WEBRTC_MODULES_AUDIO_PROCESSING_NS_INCLUDE_NOISE_SUPPRESSION_H_
    
    #include "typedefs.h"
    
    typedef struct NsHandleT NsHandle;
    
    #ifdef __cplusplus
    extern "C" {
    #endif
    
    /*
     * This function creates an instance to the noise suppression structure
     *
     * Input:
     *      - NS_inst       : Pointer to noise suppression instance that should be
     *                        created
     *
     * Output:
     *      - NS_inst       : Pointer to created noise suppression instance
     *
     * Return value         :  0 - Ok
     *                        -1 - Error
     */
    // Input: pointer to the NS instance to create; output: pointer to the
    // created instance; returns success or failure.
    int WebRtcNs_Create(NsHandle** NS_inst);

    /*
     * This function frees the dynamic memory of a specified noise suppression
     * instance.
     *
     * Input:
     *      - NS_inst       : Pointer to NS instance that should be freed
     *
     * Return value         :  0 - Ok
     *                        -1 - Error
     */
    // Frees the memory held by the NS instance.
    int WebRtcNs_Free(NsHandle* NS_inst);

    /*
     * This function initializes a NS instance and has to be called before any
     * other processing is made.
     *
     * Input:
     *      - NS_inst       : Instance that should be initialized
     *      - fs            : sampling frequency
     *
     * Output:
     *      - NS_inst       : Initialized instance
     *
     * Return value         :  0 - Ok
     *                        -1 - Error
     */
    // Initializes the NS instance with the sampling rate; must be called
    // before any other processing.
    int WebRtcNs_Init(NsHandle* NS_inst, uint32_t fs);

    /*
     * This changes the aggressiveness of the noise suppression method.
     *
     * Input:
     *      - NS_inst       : Noise suppression instance.
     *      - mode          : 0: Mild, 1: Medium, 2: Aggressive
     *
     * Output:
     *      - NS_inst       : Updated instance.
     *
     * Return value         :  0 - Ok
     *                        -1 - Error
     */
    // Sets the suppression level.
    // Input: the NS instance and the suppression mode.
    // Output: the updated instance.
    // mode 0: mild, 1: medium, 2: aggressive
    int WebRtcNs_set_policy(NsHandle* NS_inst, int mode);

    /*
     * This functions does Noise Suppression for the inserted speech frame. The
     * input and output signals should always be 10ms (80 or 160 samples).
     *
     * Input
     *      - NS_inst       : Noise suppression instance.
     *      - spframe       : Pointer to speech frame buffer for L band
     *      - spframe_H     : Pointer to speech frame buffer for H band
     *      - fs            : sampling frequency
     *
     * Output:
     *      - NS_inst       : Updated NS instance
     *      - outframe      : Pointer to output frame for L band
     *      - outframe_H    : Pointer to output frame for H band
     *
     * Return value         :  0 - OK
     *                        -1 - Error
     */
    // Processes one speech frame; input and output are always 10 ms long
    // (80 or 160 samples).
    // Input: the NS instance, pointers to the L-band and H-band speech frame
    // buffers, and the corresponding output pointers.
    int WebRtcNs_Process(NsHandle* NS_inst,
                         short* spframe,
                         short* spframe_H,
                         short* outframe,
                         short* outframe_H);

    /* Returns the internally used prior speech probability of the current
     * frame. There is a frequency bin based one as well, with which this
     * should not be confused.
     *
     * Input
     *      - handle        : Noise suppression instance.
     *
     * Return value         : Prior speech probability in interval [0.0, 1.0].
     *                        -1 - NULL pointer or uninitialized instance.
     */
    // Returns the prior speech probability used for the current frame.
    float WebRtcNs_prior_speech_probability(NsHandle* handle);

    #ifdef __cplusplus
    }
    #endif

    #endif  // WEBRTC_MODULES_AUDIO_PROCESSING_NS_INCLUDE_NOISE_SUPPRESSION_H_
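    To make the call order concrete, here is a minimal usage sketch of the header above. The WebRTC library itself is not linked here, so trivial stand-in stubs replace the real implementations; only the call sequence (Create → Init → set_policy → Process → Free) reflects the actual API, and the stub bodies are ours.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct NsHandleT { int initialized; } NsHandle;

    /* Stand-in stubs so the call sequence compiles and runs without WebRTC. */
    static int WebRtcNs_Create(NsHandle** NS_inst) {
      *NS_inst = (NsHandle*)calloc(1, sizeof(NsHandle));
      return *NS_inst ? 0 : -1;
    }
    static int WebRtcNs_Init(NsHandle* inst, uint32_t fs) {
      (void)fs; inst->initialized = 1; return 0;
    }
    static int WebRtcNs_set_policy(NsHandle* inst, int mode) {
      return (inst->initialized && mode >= 0 && mode <= 2) ? 0 : -1;
    }
    static int WebRtcNs_Process(NsHandle* inst, short* sp, short* sp_H,
                                short* out, short* out_H) {
      (void)sp_H; (void)out_H;
      if (!inst->initialized) return -1;
      memcpy(out, sp, 160 * sizeof(short));  /* stub: pass-through */
      return 0;
    }
    static int WebRtcNs_Free(NsHandle* inst) { free(inst); return 0; }

    int main(void) {
      NsHandle* ns = NULL;
      short in[160] = {0}, out[160] = {0};  /* one 10 ms frame at 16 kHz */

      assert(WebRtcNs_Create(&ns) == 0);
      assert(WebRtcNs_Init(ns, 16000) == 0);    /* sample rate */
      assert(WebRtcNs_set_policy(ns, 1) == 0);  /* 1: medium suppression */
      assert(WebRtcNs_Process(ns, in, NULL, out, NULL) == 0);
      assert(WebRtcNs_Free(ns) == 0);
      printf("ok\n");
      return 0;
    }
    ```

    In a real integration the same sequence is used, but the frame buffers are filled from the capture device 10 ms at a time and the H-band pointers are used at super-wideband rates.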

    Reference: https://blog.csdn.net/godloveyuxu/article/details/73657931

  • Original post: https://www.cnblogs.com/supermanwx/p/16285787.html