• Speex: an open-source acoustic echo canceller (Acoustic Echo Cancellation) (reposted)


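    A while ago I spent some time working on acoustic echo cancellation. It was frustrating because I never got it to work, but I did learn something: at least I now understand a little of the theory.
    Why is acoustic echo cancellation needed? In a typical VoIP application or video-conferencing system, suppose only two people, A and B, are on a call. A's voice is sent to B and played through B's loudspeaker. B's microphone then picks up the sound coming out of the loudspeaker and sends it back to A. If the delay along this path is large enough, A hears a copy of what A has just said; this is the echo. The job of an acoustic echo canceller is to process the captured signal at B's end, removing A's voice from it before it is sent back, so that A no longer hears their own words.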
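    To make the idea of "removing A's voice from what B captured" concrete, here is a toy sketch of an adaptive echo canceller. It is a plain time-domain NLMS (normalized least mean squares) filter, chosen for brevity; it is not the algorithm Speex implements in mdf.c, which works in the frequency domain. The function name `CancelEchoNlms` and its parameters are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy NLMS echo canceller (an illustration only, not Speex's algorithm).
// speaker: samples played through the loudspeaker (the reference signal).
// mic:     samples captured by the microphone (local speech plus echo).
// taps:    length of the estimated echo path, in samples.
// mu:      adaptation step size, typically between 0 and 1.
// Returns mic with the estimated echo subtracted, sample by sample.
std::vector<double> CancelEchoNlms(const std::vector<double>& speaker,
                                   const std::vector<double>& mic,
                                   std::size_t taps, double mu)
{
    assert(speaker.size() == mic.size() && taps > 0);
    std::vector<double> w(taps, 0.0);      // current estimate of the echo path
    std::vector<double> out(mic.size());
    for (std::size_t n = 0; n < mic.size(); ++n) {
        double echo = 0.0;
        double power = 1e-9;               // small constant avoids division by zero
        for (std::size_t k = 0; k < taps && k <= n; ++k) {
            echo += w[k] * speaker[n - k];
            power += speaker[n - k] * speaker[n - k];
        }
        out[n] = mic[n] - echo;            // what is left after removing the echo estimate
        const double step = mu * out[n] / power;
        for (std::size_t k = 0; k < taps && k <= n; ++k)
            w[k] += step * speaker[n - k]; // move the estimate toward the true echo path
    }
    return out;
}
```

    When the microphone signal contains only a delayed, attenuated copy of the loudspeaker signal, the output of this filter decays toward silence as the estimate converges. With real speech at both ends, a practical canceller also has to stop adapting during double talk, which this sketch does not attempt.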
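    I won't go into the theory of acoustic echo cancellation here; plenty of documents online cover it. What is hard to find online is an implementation, so I will introduce an open-source acoustic echo canceller in the hope that someone finds it useful. If anyone knows how to use this canceller in VoIP software that works on real-time streams, please share.
    This acoustic echo canceller is part of Speex, a well-known audio codec. The echo canceller only works from version 1.1.9 onward; earlier versions do not, and 1.1.9 is the version I used. In my tests with the same simulated files, it performed better than the acoustic echo canceller in version 4.1 of the Intel IPP library.
    First, building. Download the Speex 1.1.9 source code from www.speex.org, unpack it, and open libspeex.dsw in speex\win32\libspeex. The workspace contains two projects, libspeex and libspeex_dynamic. Add the file mdf.c from libspeex to the libspeex project, then build.
    Below is a class I wrapped around the API based on the documentation, followed by a test program:

    //file name: speexEC.h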
    #ifndef SPEEX_EC_H
    #define SPEEX_EC_H
    #include <stdio.h>
    #include <stdlib.h>
    #include "speex/speex_echo.h"
    #include "speex/speex_preprocess.h" 
    class CSpeexEC
    {
    public:
        CSpeexEC();
        ~CSpeexEC();
        // frame_size and filter_length are in samples; sampling_rate is in Hz.
        void Init(int frame_size=160, int filter_length=1280, int sampling_rate=8000);
        // mic: captured frame, ref: frame sent to the loudspeaker, out: echo-cancelled frame.
        void DoAEC(short *mic, short *ref, short *out);

    protected:
        void Reset();

    private:
        bool                  m_bHasInit;
        SpeexEchoState*       m_pState;
        SpeexPreprocessState* m_pPreprocessorState;
        int                   m_nFrameSize;
        int                   m_nFilterLen;
        int                   m_nSampleRate;
        float*                m_pfNoise;
    };

    #endif

    //file name: speexEC.cpp
    #include "speexEC.h"

    CSpeexEC::CSpeexEC()
    {
    m_bHasInit   = false;
    m_pState   = NULL;
    m_pPreprocessorState  = NULL;
    m_nFrameSize   = 160;
    m_nFilterLen   = 160*8;
    m_nSampleRate   = 8000;
    m_pfNoise   = NULL;
    }

    CSpeexEC::~CSpeexEC()
    {
    Reset();
    }

    void CSpeexEC::Init(int frame_size, int filter_length, int sampling_rate)
    {
    Reset(); 

    if (frame_size<=0 || filter_length<=0 || sampling_rate<=0)
    {
      m_nFrameSize  =160;
      m_nFilterLen  = 160*8;
      m_nSampleRate = 8000;
    }
    else
    {
      m_nFrameSize  =frame_size;
      m_nFilterLen  = filter_length;
      m_nSampleRate = sampling_rate;
    }

    m_pState = speex_echo_state_init(m_nFrameSize, m_nFilterLen);
    m_pPreprocessorState = speex_preprocess_state_init(m_nFrameSize, m_nSampleRate);
    m_pfNoise = new float[m_nFrameSize+1];
    m_bHasInit = true;
    }

    void CSpeexEC::Reset()
    {
    if (m_pState != NULL)
    {
      speex_echo_state_destroy(m_pState);
      m_pState = NULL;
    }
    if (m_pPreprocessorState != NULL)
    {
      speex_preprocess_state_destroy(m_pPreprocessorState);
      m_pPreprocessorState = NULL;
    }
    if (m_pfNoise != NULL)
    {
      delete []m_pfNoise;
      m_pfNoise = NULL;
    }
    m_bHasInit = false;
    }

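    void CSpeexEC::DoAEC(short* mic, short* ref, short* out)
    {
        if (!m_bHasInit)
            return;

        // Cancel the echo of ref from mic; m_pfNoise receives the echo estimate.
        speex_echo_cancel(m_pState, mic, ref, out, m_pfNoise);
        // Let the preprocessor clean up the result using that estimate.
        speex_preprocess(m_pPreprocessorState, out, m_pfNoise);
    }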

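    As you can see, this echo canceller class is simple: initialize it and it is ready to call. Note, however, that the two signals passed to the echo canceller must be very well synchronized. At B's end, once A's speech has been received, that speech data must be handed to the echo canceller as the reference and then passed to the sound card, which plays it after some delay. B then captures audio and passes it to the echo canceller, which compares it with the reference and removes from the captured data the part that matches the reference in the frequency domain. If the two signals are poorly synchronized, so that no matching frequency-domain content can be found, cancellation is impossible.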
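    The synchronization requirement above can be illustrated with a small sketch. Assuming the reference and the captured signal are both available as 16-bit PCM buffers of equal length, one common way to estimate how far the capture lags the reference is to search for the lag that maximizes their cross-correlation. This is not part of Speex or of the class above; the function name `EstimateDelay` and the `max_lag` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of Speex): returns the lag, in samples, at
// which `mic` best matches `ref`, searching lags 0..max_lag inclusive.
// A result of d means that mic[n] resembles ref[n - d].
std::size_t EstimateDelay(const std::vector<short>& ref,
                          const std::vector<short>& mic,
                          std::size_t max_lag)
{
    assert(ref.size() == mic.size());
    std::size_t best_lag = 0;
    double best_score = std::numeric_limits<double>::lowest();
    for (std::size_t lag = 0; lag <= max_lag && lag < mic.size(); ++lag) {
        double score = 0.0;
        // Correlate the overlapping part of the two buffers at this lag.
        for (std::size_t n = lag; n < mic.size(); ++n)
            score += static_cast<double>(mic[n]) * ref[n - lag];
        if (score > best_score) {
            best_score = score;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

    An estimate like this could be used to shift the reference before it is handed to the canceller. The search costs on the order of max_lag times the buffer length, so it suits an occasional calibration better than per-frame use.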
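    Test program:

    #include <stdio.h>
    #include "speexEC.h"

    #define NN 160
    int main()
    {
        short ref[NN], mic[NN], out[NN];
        FILE* ref_fd = fopen("ref.pcm", "rb");  // reference file: the sound to be cancelled
        FILE* mic_fd = fopen("mic.pcm", "rb");  // sound captured by the mic, echo included
        FILE* out_fd = fopen("echo.pcm", "wb"); // output file with the echo removed
        if (!ref_fd || !mic_fd || !out_fd)
        {
            fprintf(stderr, "cannot open ref.pcm, mic.pcm or echo.pcm\n");
            return 1;
        }

        CSpeexEC ec;
        ec.Init();

        // Process whole frames only; stop when either input runs out.
        while (fread(mic, sizeof(short), NN, mic_fd) == NN &&
               fread(ref, sizeof(short), NN, ref_fd) == NN)
        {
            ec.DoAEC(mic, ref, out);
            fwrite(out, sizeof(short), NN, out_fd);
        }

        fclose(ref_fd);
        fclose(mic_fd);
        fclose(out_fd);
        return 0;
    }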

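    The program above uses files to simulate the echo and the microphone, but real-time streams are a very different matter. In typical VoIP software, receiving the remote party's audio and sending it to the sound card for playback happens in one thread, while capturing local audio and sending it to the remote party happens in another. While cancelling echo in the captured audio, the acoustic echo canceller also needs the data from the playback thread as its reference, and synchronizing the data between these two threads is very difficult. With even slight misalignment, the adaptive filter inside the echo canceller diverges: it not only fails to remove the echo but also damages the captured audio until it is hard to understand. I made many attempts but never managed to synchronize the data of these two threads in software, so my implementation failed. I hope readers with experience in this area will share what they know.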
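    One possible arrangement for the two-thread problem, sketched under the assumption that playback and capture both work in fixed-size frames: the playback thread pushes every frame it hands to the sound card into a FIFO, and the capture thread pops the oldest frame to use as the reference for each captured frame. The class `ReferenceQueue` is hypothetical. The sketch only keeps the two threads from touching the reference data at the same time and bounds how stale the reference can get; it does not measure or compensate the sound card's buffering delay, which is the hard part described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical helper: a bounded, thread-safe FIFO of reference frames.
class ReferenceQueue {
public:
    ReferenceQueue(std::size_t frame_size, std::size_t max_frames)
        : frame_size_(frame_size), max_frames_(max_frames)
    {
        assert(frame_size > 0 && max_frames > 0);
    }

    // Called by the playback thread for every frame sent to the sound card.
    void Push(const short* frame)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() >= max_frames_)
            frames_.pop_front();            // drop the oldest frame to bound latency
        frames_.emplace_back(frame, frame + frame_size_);
    }

    // Called by the capture thread. Copies the oldest frame into out and
    // returns true; if nothing was played, fills out with silence and
    // returns false.
    bool Pop(short* out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) {
            std::fill(out, out + frame_size_, static_cast<short>(0));
            return false;
        }
        std::copy(frames_.front().begin(), frames_.front().end(), out);
        frames_.pop_front();
        return true;
    }

private:
    std::size_t frame_size_;
    std::size_t max_frames_;
    std::mutex mutex_;
    std::deque<std::vector<short> > frames_;
};
```

    In the capture thread, the frame obtained from `Pop` would take the place of `ref` in a call such as `ec.DoAEC(mic, ref, out)`.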



    Example code:


    Sample code

    This section shows sample code for encoding and decoding speech using the Speex API. The commands can be used to encode and decode a file by calling:  
    % sampleenc in_file.sw | sampledec out_file.sw 
    where both files are raw (no header) files encoded at 16 bits per sample (in the machine natural endianness).

    sampleenc.c

    sampleenc takes a raw 16 bits/sample file, encodes it and outputs a Speex stream to stdout. Note that the packing used is NOT compatible with that of speexenc/speexdec.

    #include <speex/speex.h>
    #include <stdio.h>
    /*The frame size is hardcoded for this sample code but it doesn't have to be*/
    #define FRAME_SIZE 160
    int main(int argc, char **argv)
    {
        char *inFile;
        FILE *fin;
        short in[FRAME_SIZE];
        float input[FRAME_SIZE];
        char cbits[200];
        int nbBytes;
        /*Holds the state of the encoder*/
        void *state;
        /*Holds bits so they can be read and written to by the Speex routines*/
        SpeexBits bits;
        int i, tmp;
        /*The input file name is required*/
        if (argc < 2)
        {
            fprintf(stderr, "usage: sampleenc in_file.sw\n");
            return 1;
        }
        inFile = argv[1];
        /*Open in binary mode, since the input is raw PCM rather than text*/
        fin = fopen(inFile, "rb");
        if (fin == NULL)
        {
            perror(inFile);
            return 1;
        }
        /*Create a new encoder state in narrowband mode*/
        state = speex_encoder_init(&speex_nb_mode);
        /*Set the quality to 8 (15 kbps)*/
        tmp=8;
        speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);
        /*Initialization of the structure that holds the bits*/
        speex_bits_init(&bits);
        while (1)
        {
            /*Read a 16 bits/sample audio frame*/
            fread(in, sizeof(short), FRAME_SIZE, fin);
            if (feof(fin))
                break;
            /*Copy the 16 bits values to float so Speex can work on them*/
            for (i=0;i<FRAME_SIZE;i++)
                input[i]=in[i];
            /*Flush all the bits in the struct so we can encode a new frame*/
            speex_bits_reset(&bits);
            /*Encode the frame*/
            speex_encode(state, input, &bits);
            /*Copy the bits to an array of char that can be written*/
            nbBytes = speex_bits_write(&bits, cbits, 200);
            /*Write the size of the frame first. This is what sampledec expects but
            it's likely to be different in your own application*/
            fwrite(&nbBytes, sizeof(int), 1, stdout);
            /*Write the compressed data*/
            fwrite(cbits, 1, nbBytes, stdout);
        }
        /*Destroy the encoder state*/
        speex_encoder_destroy(state);
        /*Destroy the bit-packing struct*/
        speex_bits_destroy(&bits);
        fclose(fin);
        return 0;
    }
    

    sampledec.c

    sampledec reads a Speex stream from stdin, decodes it and outputs it to a raw 16 bits/sample file. Note that the packing used is NOT compatible with that of speexenc/speexdec.

    #include <speex/speex.h>
    #include <stdio.h>
    /*The frame size is hardcoded for this sample code but it doesn't have to be*/
    #define FRAME_SIZE 160
    int main(int argc, char **argv)
    {
        char *outFile;
        FILE *fout;
        /*Holds the audio that will be written to file (16 bits per sample)*/
        short out[FRAME_SIZE];
        /*Speex handle samples as float, so we need an array of floats*/
        float output[FRAME_SIZE];
        char cbits[200];
        int nbBytes;
        /*Holds the state of the decoder*/
        void *state;
        /*Holds bits so they can be read and written to by the Speex routines*/
        SpeexBits bits;
        int i, tmp;
        /*The output file name is required*/
        if (argc < 2)
        {
            fprintf(stderr, "usage: sampledec out_file.sw\n");
            return 1;
        }
        outFile = argv[1];
        /*Open in binary mode, since the output is raw PCM rather than text*/
        fout = fopen(outFile, "wb");
        if (fout == NULL)
        {
            perror(outFile);
            return 1;
        }
        /*Create a new decoder state in narrowband mode*/
        state = speex_decoder_init(&speex_nb_mode);
        /*Set the perceptual enhancement on*/
        tmp=1;
        speex_decoder_ctl(state, SPEEX_SET_ENH, &tmp);
        /*Initialization of the structure that holds the bits*/
        speex_bits_init(&bits);
        while (1)
        {
            /*Read the size encoded by sampleenc, this part will likely be
            different in your application*/
            fread(&nbBytes, sizeof(int), 1, stdin);
            if (feof(stdin))
                break;
            fprintf (stderr, "nbBytes: %d\n", nbBytes);
            /*Stop on a size that would not fit in cbits*/
            if (nbBytes < 0 || nbBytes > (int)sizeof(cbits))
                break;
            /*Read the "packet" encoded by sampleenc*/
            fread(cbits, 1, nbBytes, stdin);
            /*Copy the data into the bit-stream struct*/
            speex_bits_read_from(&bits, cbits, nbBytes);
            /*Decode the data*/
            speex_decode(state, &bits, output);
            /*Copy from float to short (16 bits) for output*/
            for (i=0;i<FRAME_SIZE;i++)
                out[i]=output[i];
            /*Write the decoded audio to file*/
            fwrite(out, sizeof(short), FRAME_SIZE, fout);
        }
        /*Destroy the decoder state*/
        speex_decoder_destroy(state);
        /*Destroy the bit-stream struct*/
        speex_bits_destroy(&bits);
        fclose(fout);
        return 0;
    }
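
    The packing used between sampleenc and sampledec is simply each frame's byte count followed by that frame's bytes. Here is a minimal sketch of the same framing on an in-memory buffer, independent of Speex. `AppendPacket` and `ReadPacket` are hypothetical names. Like the samples, the sketch stores the length as a native `int`, so such a stream is only readable on machines with the same `int` size and endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Appends one packet to the stream: a native int length, then the payload.
void AppendPacket(std::vector<char>& stream, const char* data, int nbBytes)
{
    assert(nbBytes >= 0);
    const char* len = reinterpret_cast<const char*>(&nbBytes);
    stream.insert(stream.end(), len, len + sizeof(int));
    stream.insert(stream.end(), data, data + nbBytes);
}

// Reads the packet that starts at offset and advances offset past it.
// Returns false, leaving offset unchanged, if the stream ends before a
// complete length field or payload is available.
bool ReadPacket(const std::vector<char>& stream, std::size_t& offset,
                std::vector<char>& packet)
{
    if (offset > stream.size() || stream.size() - offset < sizeof(int))
        return false;
    int nbBytes = 0;
    std::memcpy(&nbBytes, stream.data() + offset, sizeof(int));
    const std::size_t remaining = stream.size() - offset - sizeof(int);
    if (nbBytes < 0 || remaining < static_cast<std::size_t>(nbBytes))
        return false;
    const char* payload = stream.data() + offset + sizeof(int);
    packet.assign(payload, payload + nbBytes);
    offset += sizeof(int) + static_cast<std::size_t>(nbBytes);
    return true;
}
```

    A real application would more likely rely on its transport (for example RTP packets) to delimit frames, which is why the samples warn that this part will differ.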
    



     


     




    Reference code showing how the open-source H.323 library (Open H323) wraps Speex:


    /*
    * speexcodec.cxx
    *
    * Speex codec handler
    *
    * Open H323 Library
    *
    * Copyright (c) 2002 Equivalence Pty. Ltd.
    *
    * The contents of this file are subject to the Mozilla Public License
    * Version 1.0 (the "License"); you may not use this file except in
    * compliance with the License. You may obtain a copy of the License at
    * http://www.mozilla.org/MPL/
    *
    * Software distributed under the License is distributed on an "AS IS"
    * basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
    * the License for the specific language governing rights and limitations
    * under the License.
    *
    * The Original Code is Open H323 Library.
    *
    * The Initial Developer of the Original Code is Equivalence Pty. Ltd.
    *
    * Contributor(s): ______________________________________.
    *
    * $Log: speexcodec.cxx,v $
    * Revision 1.20  2002/12/08 22:59:41  rogerh
    * Add XiphSpeex codec. Not yet finished.
    *
    * Revision 1.19  2002/12/06 10:11:54  rogerh
    * Back out the Xiph Speex changes on a tempoary basis while the Speex
    * spec is being redrafted.
    *
    * Revision 1.18  2002/12/06 03:27:47  robertj
    * Fixed MSVC warnings
    *
    * Revision 1.17  2002/12/05 12:57:17  rogerh
    * Speex now uses the manufacturer ID assigned to Xiph.Org.
    * To support existing applications using Speex, applications can use the
    * EquivalenceSpeex capabilities.
    *
    * Revision 1.16  2002/11/25 10:24:50  craigs
    * Fixed problem with Speex codec names causing mismatched capabilities
    * Reported by Ben Lear
    *
    * Revision 1.15  2002/11/09 07:08:20  robertj
    * Hide speex library from OPenH323 library users.
    * Made public the media format names.
    * Other cosmetic changes.
    *
    * Revision 1.14  2002/10/24 05:33:19  robertj
    * MSVC compatibility
    *
    * Revision 1.13  2002/10/22 11:54:32  rogerh
    * Fix including of speex.h
    *
    * Revision 1.12  2002/10/22 11:33:04  rogerh
    * Use the local speex.h header file
    *
    * Revision 1.11  2002/10/09 10:55:21  rogerh
    * Update the bit rates to match what the codec now does
    *
    * Revision 1.10  2002/09/02 21:58:40  rogerh
    * Update for Speex 0.8.0
    *
    * Revision 1.9  2002/08/21 06:49:13  rogerh
    * Fix the RTP Payload size too small problem with Speex 0.7.0.
    *
    * Revision 1.8  2002/08/15 18:34:51  rogerh
    * Fix some more bugs
    *
    * Revision 1.7  2002/08/14 19:06:53  rogerh
    * Fix some bugs when using the speex library
    *
    * Revision 1.6  2002/08/14 04:35:33  craigs
    * CHanged Speex names to remove spaces
    *
    * Revision 1.5  2002/08/14 04:30:14  craigs
    * Added bit rates to Speex codecs
    *
    * Revision 1.4  2002/08/14 04:27:26  craigs
    * Fixed name of Speex codecs
    *
    * Revision 1.3  2002/08/14 04:24:43  craigs
    * Fixed ifdef problem
    *
    * Revision 1.2  2002/08/13 14:25:25  craigs
    * Added trailing newlines to avoid Linux warnings
    *
    * Revision 1.1  2002/08/13 14:14:59  craigs
    * Initial version
    *
    */

    #include <ptlib.h>

    #ifdef __GNUC__
    #pragma implementation "speexcodec.h"
    #endif

    #include "speexcodec.h"

    #include "h323caps.h"
    #include "h245.h"
    #include "rtp.h"

    extern "C" {
    #include "speex/libspeex/speex.h"
    };


    #define new PNEW

    #define XIPH_COUNTRY_CODE       0xB5  // (181) Country code for United States
    #define XIPH_T35EXTENSION       0
    #define XIPH_MANUFACTURER_CODE  0x0026 // Allocated by Delta Inc

    #define EQUIVALENCE_COUNTRY_CODE       9  // Country code for Australia
    #define EQUIVALENCE_T35EXTENSION       0
    #define EQUIVALENCE_MANUFACTURER_CODE  61 // Allocated by Australian Communications Authority, Oct 2000

    #define SAMPLES_PER_FRAME        160

    #define SPEEX_BASE_NAME "Speex"

    #define SPEEX_NARROW2_H323_NAME    SPEEX_BASE_NAME "Narrow-5.95k{sw}"
    #define SPEEX_NARROW3_H323_NAME    SPEEX_BASE_NAME "Narrow-8k{sw}"
    #define SPEEX_NARROW4_H323_NAME    SPEEX_BASE_NAME "Narrow-11k{sw}"
    #define SPEEX_NARROW5_H323_NAME    SPEEX_BASE_NAME "Narrow-15k{sw}"
    #define SPEEX_NARROW6_H323_NAME    SPEEX_BASE_NAME "Narrow-18.2k{sw}"

    H323_REGISTER_CAPABILITY(SpeexNarrow2AudioCapability, SPEEX_NARROW2_H323_NAME);
    H323_REGISTER_CAPABILITY(SpeexNarrow3AudioCapability, SPEEX_NARROW3_H323_NAME);
    H323_REGISTER_CAPABILITY(SpeexNarrow4AudioCapability, SPEEX_NARROW4_H323_NAME);
    H323_REGISTER_CAPABILITY(SpeexNarrow5AudioCapability, SPEEX_NARROW5_H323_NAME);
    H323_REGISTER_CAPABILITY(SpeexNarrow6AudioCapability, SPEEX_NARROW6_H323_NAME);

    #define XIPH_SPEEX_NARROW2_H323_NAME    SPEEX_BASE_NAME "Narrow-5.95k(Xiph){sw}"
    #define XIPH_SPEEX_NARROW3_H323_NAME    SPEEX_BASE_NAME "Narrow-8k(Xiph){sw}"
    #define XIPH_SPEEX_NARROW4_H323_NAME    SPEEX_BASE_NAME "Narrow-11k(Xiph){sw}"
    #define XIPH_SPEEX_NARROW5_H323_NAME    SPEEX_BASE_NAME "Narrow-15k(Xiph){sw}"
    #define XIPH_SPEEX_NARROW6_H323_NAME    SPEEX_BASE_NAME "Narrow-18.2k(Xiph){sw}"

    H323_REGISTER_CAPABILITY(XiphSpeexNarrow2AudioCapability, XIPH_SPEEX_NARROW2_H323_NAME);
    H323_REGISTER_CAPABILITY(XiphSpeexNarrow3AudioCapability, XIPH_SPEEX_NARROW3_H323_NAME);
    H323_REGISTER_CAPABILITY(XiphSpeexNarrow4AudioCapability, XIPH_SPEEX_NARROW4_H323_NAME);
    H323_REGISTER_CAPABILITY(XiphSpeexNarrow5AudioCapability, XIPH_SPEEX_NARROW5_H323_NAME);
    H323_REGISTER_CAPABILITY(XiphSpeexNarrow6AudioCapability, XIPH_SPEEX_NARROW6_H323_NAME);

    /////////////////////////////////////////////////////////////////////////

    static int Speex_Bits_Per_Second(int mode) {
        void *tmp_coder_state;
        int bitrate;
        tmp_coder_state = speex_encoder_init(&speex_nb_mode);
        speex_encoder_ctl(tmp_coder_state, SPEEX_SET_QUALITY, &mode);
        speex_encoder_ctl(tmp_coder_state, SPEEX_GET_BITRATE, &bitrate);
        speex_encoder_destroy(tmp_coder_state); 
        return bitrate;
    }

    static int Speex_Bytes_Per_Frame(int mode) {
        int bits_per_frame = Speex_Bits_Per_Second(mode) / 50; // (20ms frame size)
        return ((bits_per_frame+7)/8); // round up
    }

    OpalMediaFormat const OpalSpeexNarrow_5k95(OPAL_SPEEX_NARROW_5k95,
                                               OpalMediaFormat::DefaultAudioSessionID,
                                               RTP_DataFrame::DynamicBase,
                                               TRUE,  // Needs jitter
                                               Speex_Bits_Per_Second(2),
                                               Speex_Bytes_Per_Frame(2),
                                               SAMPLES_PER_FRAME, // 20 milliseconds
                                               OpalMediaFormat::AudioTimeUnits);

    OpalMediaFormat const OpalSpeexNarrow_8k(OPAL_SPEEX_NARROW_8k,
                                             OpalMediaFormat::DefaultAudioSessionID,
                                             RTP_DataFrame::DynamicBase,
                                             TRUE,  // Needs jitter
                                             Speex_Bits_Per_Second(3),
                                             Speex_Bytes_Per_Frame(3),
                                             SAMPLES_PER_FRAME, // 20 milliseconds
                                             OpalMediaFormat::AudioTimeUnits);

    OpalMediaFormat const OpalSpeexNarrow_11k(OPAL_SPEEX_NARROW_11k,
                                              OpalMediaFormat::DefaultAudioSessionID,
                                              RTP_DataFrame::DynamicBase,
                                              TRUE,  // Needs jitter
                                              Speex_Bits_Per_Second(4),
                                              Speex_Bytes_Per_Frame(4),
                                              SAMPLES_PER_FRAME, // 20 milliseconds
                                              OpalMediaFormat::AudioTimeUnits);

    OpalMediaFormat const OpalSpeexNarrow_15k(OPAL_SPEEX_NARROW_15k,
                                              OpalMediaFormat::DefaultAudioSessionID,
                                              RTP_DataFrame::DynamicBase,
                                              TRUE,  // Needs jitter
                                              Speex_Bits_Per_Second(5),
                                              Speex_Bytes_Per_Frame(5),
                                              SAMPLES_PER_FRAME, // 20 milliseconds
                                              OpalMediaFormat::AudioTimeUnits);

    OpalMediaFormat const OpalSpeexNarrow_18k2(OPAL_SPEEX_NARROW_18k2,
                                               OpalMediaFormat::DefaultAudioSessionID,
                                               RTP_DataFrame::DynamicBase,
                                               TRUE,  // Needs jitter
                                               Speex_Bits_Per_Second(6),
                                               Speex_Bytes_Per_Frame(6),
                                               SAMPLES_PER_FRAME, // 20 milliseconds
                                               OpalMediaFormat::AudioTimeUnits);


    /////////////////////////////////////////////////////////////////////////

    SpeexNonStandardAudioCapability::SpeexNonStandardAudioCapability(int mode)
      : H323NonStandardAudioCapability(1, 1,
                                       EQUIVALENCE_COUNTRY_CODE,
                                       EQUIVALENCE_T35EXTENSION,
                                       EQUIVALENCE_MANUFACTURER_CODE,
                                       NULL, 0, 0, P_MAX_INDEX)
    {
      PStringStream s;
      s << "Speex bs" << speex_nb_mode.bitstream_version << " Narrow" << mode;
      PINDEX len = s.GetLength();
      memcpy(nonStandardData.GetPointer(len), (const char *)s, len);
    }


    /////////////////////////////////////////////////////////////////////////

    SpeexNarrow2AudioCapability::SpeexNarrow2AudioCapability()
      : SpeexNonStandardAudioCapability(2) 
    {
    }


    PObject * SpeexNarrow2AudioCapability::Clone() const
    {
      return new SpeexNarrow2AudioCapability(*this);
    }


    PString SpeexNarrow2AudioCapability::GetFormatName() const
    {
      return SPEEX_NARROW2_H323_NAME;
    }


    H323Codec * SpeexNarrow2AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_5k95, 2, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    SpeexNarrow3AudioCapability::SpeexNarrow3AudioCapability()
      : SpeexNonStandardAudioCapability(3) 
    {
    }


    PObject * SpeexNarrow3AudioCapability::Clone() const
    {
      return new SpeexNarrow3AudioCapability(*this);
    }


    PString SpeexNarrow3AudioCapability::GetFormatName() const
    {
      return SPEEX_NARROW3_H323_NAME;
    }


    H323Codec * SpeexNarrow3AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_8k, 3, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    SpeexNarrow4AudioCapability::SpeexNarrow4AudioCapability()
      : SpeexNonStandardAudioCapability(4) 
    {
    }


    PObject * SpeexNarrow4AudioCapability::Clone() const
    {
      return new SpeexNarrow4AudioCapability(*this);
    }


    PString SpeexNarrow4AudioCapability::GetFormatName() const
    {
      return SPEEX_NARROW4_H323_NAME;
    }


    H323Codec * SpeexNarrow4AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_11k, 4, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    SpeexNarrow5AudioCapability::SpeexNarrow5AudioCapability()
      : SpeexNonStandardAudioCapability(5) 
    {
    }


    PObject * SpeexNarrow5AudioCapability::Clone() const
    {
      return new SpeexNarrow5AudioCapability(*this);
    }


    PString SpeexNarrow5AudioCapability::GetFormatName() const
    {
      return SPEEX_NARROW5_H323_NAME;
    }


    H323Codec * SpeexNarrow5AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_15k, 5, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    SpeexNarrow6AudioCapability::SpeexNarrow6AudioCapability()
      : SpeexNonStandardAudioCapability(6) 
    {
    }


    PObject * SpeexNarrow6AudioCapability::Clone() const
    {
      return new SpeexNarrow6AudioCapability(*this);
    }


    PString SpeexNarrow6AudioCapability::GetFormatName() const
    {
      return SPEEX_NARROW6_H323_NAME;
    }


    H323Codec * SpeexNarrow6AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_18k2, 6, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    XiphSpeexNonStandardAudioCapability::XiphSpeexNonStandardAudioCapability(int mode)
      : H323NonStandardAudioCapability(1, 1,
                                       XIPH_COUNTRY_CODE,
                                       XIPH_T35EXTENSION,
                                       XIPH_MANUFACTURER_CODE,
                                       NULL, 0, 0, P_MAX_INDEX)
    {
      // FIXME: To be replaced by an ASN defined block of data
      PStringStream s;
      s << "Speex bs" << speex_nb_mode.bitstream_version << " Narrow" << mode;
      PINDEX len = s.GetLength();
      memcpy(nonStandardData.GetPointer(len), (const char *)s, len);
    }


    /////////////////////////////////////////////////////////////////////////

    XiphSpeexNarrow2AudioCapability::XiphSpeexNarrow2AudioCapability()
      : XiphSpeexNonStandardAudioCapability(2) 
    {
    }


    PObject * XiphSpeexNarrow2AudioCapability::Clone() const
    {
      return new XiphSpeexNarrow2AudioCapability(*this);
    }


    PString XiphSpeexNarrow2AudioCapability::GetFormatName() const
    {
      return XIPH_SPEEX_NARROW2_H323_NAME;
    }


    H323Codec * XiphSpeexNarrow2AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_5k95, 2, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    XiphSpeexNarrow3AudioCapability::XiphSpeexNarrow3AudioCapability()
      : XiphSpeexNonStandardAudioCapability(3) 
    {
    }


    PObject * XiphSpeexNarrow3AudioCapability::Clone() const
    {
      return new XiphSpeexNarrow3AudioCapability(*this);
    }


    PString XiphSpeexNarrow3AudioCapability::GetFormatName() const
    {
      return XIPH_SPEEX_NARROW3_H323_NAME;
    }


    H323Codec * XiphSpeexNarrow3AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_8k, 3, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    XiphSpeexNarrow4AudioCapability::XiphSpeexNarrow4AudioCapability()
      : XiphSpeexNonStandardAudioCapability(4) 
    {
    }


    PObject * XiphSpeexNarrow4AudioCapability::Clone() const
    {
      return new XiphSpeexNarrow4AudioCapability(*this);
    }


    PString XiphSpeexNarrow4AudioCapability::GetFormatName() const
    {
      return XIPH_SPEEX_NARROW4_H323_NAME;
    }


    H323Codec * XiphSpeexNarrow4AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_11k, 4, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    XiphSpeexNarrow5AudioCapability::XiphSpeexNarrow5AudioCapability()
      : XiphSpeexNonStandardAudioCapability(5) 
    {
    }


    PObject * XiphSpeexNarrow5AudioCapability::Clone() const
    {
      return new XiphSpeexNarrow5AudioCapability(*this);
    }


    PString XiphSpeexNarrow5AudioCapability::GetFormatName() const
    {
      return XIPH_SPEEX_NARROW5_H323_NAME;
    }


    H323Codec * XiphSpeexNarrow5AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_15k, 5, direction);
    }


    /////////////////////////////////////////////////////////////////////////

    XiphSpeexNarrow6AudioCapability::XiphSpeexNarrow6AudioCapability()
      : XiphSpeexNonStandardAudioCapability(6) 
    {
    }


    PObject * XiphSpeexNarrow6AudioCapability::Clone() const
    {
      return new XiphSpeexNarrow6AudioCapability(*this);
    }


    PString XiphSpeexNarrow6AudioCapability::GetFormatName() const
    {
      return XIPH_SPEEX_NARROW6_H323_NAME;
    }


    H323Codec * XiphSpeexNarrow6AudioCapability::CreateCodec(H323Codec::Direction direction) const
    {
      return new SpeexCodec(OpalSpeexNarrow_18k2, 6, direction);
    }


    /////////////////////////////////////////////////////////////////////////////

    const float MaxSampleValue   = 32767.0;
    const float MinSampleValue   = -32767.0;

    SpeexCodec::SpeexCodec(const char * name, int mode, Direction dir)
      : H323FramedAudioCodec(name, dir)
    {
      PTRACE(3, "Codec\tSpeex mode " << mode << " " << (dir == Encoder ? "en" : "de")
             << "coder created");

      bits = new SpeexBits;
      speex_bits_init(bits);

      if (direction == Encoder) {
        coder_state = speex_encoder_init(&speex_nb_mode);
        speex_encoder_ctl(coder_state, SPEEX_GET_FRAME_SIZE, &encoder_frame_size);
        speex_encoder_ctl(coder_state, SPEEX_SET_QUALITY,    &mode);
      } else {
        coder_state = speex_decoder_init(&speex_nb_mode);
      }
    }

    SpeexCodec::~SpeexCodec()
    {
      speex_bits_destroy(bits);
      delete bits;

      if (direction == Encoder)
        speex_encoder_destroy(coder_state); 
      else
        speex_decoder_destroy(coder_state); 
    }


    BOOL SpeexCodec::EncodeFrame(BYTE * buffer, unsigned & length)
    {
      // convert PCM to float
      float floatData[SAMPLES_PER_FRAME];
      PINDEX i;
      for (i = 0; i < SAMPLES_PER_FRAME; i++)
        floatData[i] = sampleBuffer[i];

      // encode PCM data in sampleBuffer to buffer
      speex_bits_reset(bits); 
      speex_encode(coder_state, floatData, bits); 

      length = speex_bits_write(bits, (char *)buffer, encoder_frame_size); 

      return TRUE;
    }


    BOOL SpeexCodec::DecodeFrame(const BYTE * buffer, unsigned length, unsigned &)
    {
      float floatData[SAMPLES_PER_FRAME];

      // decode Speex data to floats
      speex_bits_read_from(bits, (char *)buffer, length); 
      speex_decode(coder_state, bits, floatData); 

      // convert float to PCM
      PINDEX i;
      for (i = 0; i < SAMPLES_PER_FRAME; i++) {
        float sample = floatData[i];
        if (sample < MinSampleValue)
          sample = MinSampleValue;
        else if (sample > MaxSampleValue)
          sample = MaxSampleValue;
        sampleBuffer[i] = (short)sample;
      }

      return TRUE;
    }
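
    The arithmetic in Speex_Bytes_Per_Frame above is easy to check in isolation. With 20 ms frames there are 50 frames per second, so the number of bits per frame is the bit rate divided by 50, and the result is rounded up to whole bytes. The function below repeats that calculation without calling Speex; `BytesPerFrame` is a hypothetical name, and the bit rates fed to it are read off the codec names in the listing (for example 15k), not queried from the library.

```cpp
#include <cassert>

// Bytes needed for one 20 ms frame at the given bit rate, rounded up.
int BytesPerFrame(int bits_per_second)
{
    assert(bits_per_second >= 0);
    const int frames_per_second = 50;                   // 1000 ms / 20 ms
    const int bits_per_frame = bits_per_second / frames_per_second;
    return (bits_per_frame + 7) / 8;                    // round up to whole bytes
}
```

    For the 15k capability this gives 300 bits, that is 38 bytes, per frame; for the 8k capability it gives 160 bits, or 20 bytes.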



    Example of compressing single-byte char data with the API in VC++:

    Encoding and decoding problem in speex 1.0.4

    Subject: Encoding and decoding problem in speex 1.0.4
    List-id: speex-dev.xiph.org
    Hi,
                I am using the speex 1.0.4 library from Windows.
                I have posted my problem before but didn't get a solution. I am doing an
                VOIP project
                in which i am recording sound and streaming it to the peer. I wanted to
                encode and decode
                wav files that brought me to this site.
                I am recording sound in the following format:-
                m_WaveFormatEx.wFormatTag          = WAVE_FORMAT_PCM;
                m_WaveFormatEx.nChannels           = 1;
                m_WaveFormatEx.wBitsPerSample      = 8;
                m_WaveFormatEx.cbSize              = 0;
                m_WaveFormatEx.nSamplesPerSec      = 8000;
                m_WaveFormatEx.nBlockAlign         = 1;
                m_WaveFormatEx.nAvgBytesPerSec     = 8000;
                The recording is as follows :-
                When the buffer(size = 2000 bytes) gets filled with sound data a
                function with the body shown
                below is called.
                LPWAVEHDR lpHdr = (LPWAVEHDR) lParam;
                if(lpHdr->dwBytesRecorded==0 || lpHdr==NULL)
                return ERROR_SUCCESS;
                ::waveInUnprepareHeader(m_hRecord, lpHdr, sizeof(WAVEHDR));
                Here lpHdr->lpData contains the audio data in a character array.
                Now here I want to use Speex codec for encoding the data so the encoding
                function is
                called (I am thankful to Tay YueWeng for the function).
                char *encode(char *buffer, int &encodeSize)
                {
                char   *encodedBuffer = new char[RECBUFFER/2];            /*
                RECBUFFER = 2000 */
                short   speexShort;
                float  speexFloat[RECBUFFER/2];
                void   *mEncode       = speex_encoder_init(&speex_nb_mode);
                /*Initialization of the structure that holds the bits*/
                speex_bits_init(&mBits);
                // Convert the audio to a short then to a float buffer
                int    halfBufferSize = RECBUFFER/2;
                for (int i = 0; i < halfBufferSize; i++)
                {
                memcpy(&speexShort, &buffer[i*2], sizeof(short));
                speexFloat[i]     = speexShort;
                }
                // Encode the sound data using the float buffer
                speex_bits_reset(&mBits);
                speex_encode(mEncode, speexFloat, &mBits);
                encodeSize            = speex_bits_write(&mBits, encodedBuffer,
                RECBUFFER/2);
                /*Destroy the encoder state*/
                speex_encoder_destroy(mEncode);
                /*Destroy the bit-stream struct*/
                speex_bits_destroy(&mBits);
                // Return the encoded buffer
                return encodedBuffer;
                }
                Here i noticed that though my captured audio data is 2000 bytes the
                compressed form is
                always 38 bytes. In the speexFloat array above i get values in the range
                -32767 to +32767.
                Is it correct. Also after calling the 'speex_encode' function the first
                160 values in the
                input float array i.e. speexFloat is changed (why does it happen?Is
                anything abnormal).
                Further after calling the above function for testing I decode the
                returned encoded data
                immediately by calling the decoding function shown bellow :-
                char *decode (char *buffer, int encodeSize)
                {
                char *decodedBuffer   = new char[RECBUFFER];
                short speexShort;
                float speexFloat[RECBUFFER/2];
                // Decode the sound data into a float buffer
                void  *mDecode        = speex_decoder_init(&speex_nb_mode);
                /*Initialization of the structure that holds the bits*/
                speex_bits_init(&mBits);
                int    halfBufferSize = RECBUFFER/2;
                speex_bits_reset(&mBits);
                speex_bits_read_from(&mBits, buffer, encodeSize);
                speex_decode(mDecode, &mBits, speexFloat);
                // Convert from float to short to char
                for (int i = 0; i < halfBufferSize; i++)
                {
                speexShort = speexFloat[i];
                memcpy(&decodedBuffer[i*2], &speexShort, sizeof(short));
                }
                /*Destroy the decoder state*/
                speex_encoder_destroy(mDecode);
                /*Destroy the bit-stream truct*/
                speex_bits_destroy(&mBits);
                // Return the buffer
                return decodedBuffer;
                }
                After decoding using the above function only the first 160 values in the
                decodedBuffer array is
                changed. i.e i encoded an 2000 byte audio data to get a 38 byte encoded
                audio data. On decoding
                the 38 byte audio data i get an decompressed 160 byte data. I don't
                understand whats going
                wrong. I checked all the messages posted in this newsgroup and did'nt
                find an answer so i am
                posting this code hoping that it gets solved soon.  Thanks in advance.
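
    A plausible reading of the question above, offered as an assumption rather than as the mailing list's answer: the audio is recorded as 8-bit PCM, yet `encode` copies pairs of bytes into each `short`, and `speex_encode` is called only once, so only one 160-sample narrowband frame is consumed from the whole buffer. A single 38-byte packet is the size of one 20 ms frame at 15 kbps, which fits that reading. The sketch below shows the preparation step this suggests: widen the unsigned 8-bit WAVE samples to signed 16-bit values and cut the buffer into 160-sample frames, each of which would then go through its own `speex_encode` call. The names `WidenPcm8` and `SplitFrames` are hypothetical, and the sketch makes no Speex calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 8-bit WAVE PCM is unsigned with 128 as silence; map it to signed 16-bit.
std::vector<short> WidenPcm8(const unsigned char* data, std::size_t count)
{
    std::vector<short> pcm16(count);
    for (std::size_t i = 0; i < count; ++i)
        pcm16[i] = static_cast<short>((static_cast<int>(data[i]) - 128) * 256);
    return pcm16;
}

// Cuts pcm into consecutive frames of frame_size samples. Samples left over
// at the end (fewer than frame_size) are not returned; a caller would keep
// them and prepend them to the next buffer.
std::vector<std::vector<short> > SplitFrames(const std::vector<short>& pcm,
                                             std::size_t frame_size)
{
    assert(frame_size > 0);
    std::vector<std::vector<short> > frames;
    for (std::size_t start = 0; start + frame_size <= pcm.size(); start += frame_size)
        frames.emplace_back(pcm.data() + start, pcm.data() + start + frame_size);
    return frames;
}
```

    For the 2000-byte buffer in the post this yields 12 full frames with 80 samples left over. The encoder and decoder states would also normally be created once and reused across frames rather than created and destroyed inside every call.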
                
  • Original article: https://www.cnblogs.com/myitm/p/2160691.html