• Audio Coding


     

    http://www.ece.umassd.edu/Faculty/acosta/ICASSP/ICASSP_1996/html/ic96s212.htm

    Audio Coding

    Chair: Marina Bosi, Dolby Labs



    A bi-dimensional coding scheme applied to audio bitrate reduction

    Authors:

    Laurent Mainard, CCETT (France)
    Michel Lever, CCETT (France)

    Volume 2, Page 1017

    Abstract:

    In this paper we present a bidimensional audio encoding scheme. Taking advantage of a new complex filterbank, and of a regular lattice associated with a new hexagonal projection kernel, this scheme provides every step of the encoder and the decoder with fast algorithms, which keeps the overall complexity low. Moreover, variable- or fixed-length encodings are available without a look-up table. Results show very good quality at 80 kbit/s for monophonic signals, and a significant improvement with respect to standardized algorithms of similar complexity.
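    The abstract does not define the hexagonal projection kernel. As a generic illustration of what quantizing a pair of coefficients onto a regular hexagonal lattice involves, the sketch below uses the standard nearest-point search. It splits the lattice into two rectangular cosets, rounds in each, and keeps the closer result. This is not presented as the authors' method, and the names `hex_quantize` and `step` are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)


def _nearest_in_coset(u, v, ox, oy):
    """Nearest point of the rectangular lattice {(i + ox, SQRT3 * j + oy)}."""
    i = round(u - ox)
    j = round((v - oy) / SQRT3)
    return (i + ox, SQRT3 * j + oy)


def hex_quantize(x, y, step=1.0):
    """Map (x, y) to the nearest point of a hexagonal lattice whose
    nearest-neighbour distance is `step` (hypothetical parameter)."""
    u, v = x / step, y / step
    # The unit hexagonal lattice is the union of two rectangular cosets:
    # integer x with y in SQRT3*Z, and half-integer x with y in SQRT3*(Z + 1/2).
    p0 = _nearest_in_coset(u, v, 0.0, 0.0)
    p1 = _nearest_in_coset(u, v, 0.5, SQRT3 / 2)
    d0 = (u - p0[0]) ** 2 + (v - p0[1]) ** 2
    d1 = (u - p1[0]) ** 2 + (v - p1[1]) ** 2
    best = p0 if d0 <= d1 else p1
    return (best[0] * step, best[1] * step)


print(hex_quantize(0.45, 0.8))  # close to (0.5, 0.866...)
```

    Rounding each coordinate separately is enough inside a rectangular coset. The two-coset comparison is what makes the search exact for the hexagonal lattice.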

    Acrobat PDF file of scanned paper:  ic961017.pdf

    Acrobat PDF file of original paper:  ic961017.pdf




    Audio Coding with a Dynamic Wavelet Packet Decomposition Based on Frequency-Varying Modulated Lapped Transforms

    Authors:

    Marcus Purat, Technical University of Berlin (Germany)
    Peter Noll, Technical University of Berlin (Germany)

    Volume 2, Page 1021

    Abstract:

    Optimum time-frequency decompositions are very useful in audio coding applications, because the signal energy can be maximally concentrated even across the wide variety of audio signal characteristics. Moreover, this signal representation is particularly well suited to perceptual weighting of the quantization noise. The well-known tree structure of cascaded 2-channel filterbanks allows a very flexible optimization, leading to a signal-adaptive, dynamic wavelet packet decomposition. A major drawback of this technique is its strong spectral side lobes, which produce clearly audible aliasing in perceptual coders. In this paper we present a new dynamic wavelet packet decomposition, based on modulated lapped transforms, which allows the same flexibility while avoiding this disadvantage. We propose a scheme for low bit rate audio coding that efficiently exploits the high energy concentration. This new codec yields excellent audio quality at about 55 kb/s for monophonic signals.
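    The frequency-varying MLT tree itself is not described in enough detail here to reproduce. The sketch below shows only the basic building block: a modulated lapped transform implemented as an MDCT with a sine window and 50% overlap. It checks that the time-domain aliasing of adjacent blocks cancels in the overlap-add. It uses a direct O(N²) form for clarity rather than a fast algorithm, and the function names are hypothetical.

```python
import math
import random


def sine_window(length):
    """Sine window; satisfies w[n]**2 + w[n + length//2]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """Windowed MDCT: 2N input samples -> N coefficients (direct O(N^2) form)."""
    n_half = len(block) // 2
    return [
        sum(block[n] * window[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_half = len(coeffs)
    return [
        window[n] * (2.0 / n_half)
        * sum(coeffs[k]
              * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]


def mlt_roundtrip(signal, n_half):
    """50%-overlapped MDCT analysis/synthesis; the time-domain aliasing of
    adjacent blocks cancels in the overlap-add (TDAC)."""
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * (2 * n_half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        coeffs = mdct(padded[start:start + 2 * n_half], window)
        for i, value in enumerate(imdct(coeffs, window)):
            out[start + i] += value
    return out[n_half:n_half + len(signal)]


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(40)]
y = mlt_roundtrip(x, 8)
print(max(abs(a - b) for a, b in zip(x, y)))  # tiny (floating-point round-off)
```

    The factor 2/N in the inverse is needed because the sine window, applied at both analysis and synthesis, satisfies w[n]² + w[n+N]² = 1.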

    Acrobat PDF file of scanned paper:  ic961021.pdf

    Acrobat PDF file of original paper:  ic961021.pdf

    Sound files associated with this paper.

    •  0479_a.wav Piano signal prior to encoding-decoding
    •  0479_c.wav Male speech signal prior to encoding-decoding
    •  0479_e.wav Triangle signal prior to encoding-decoding
    •  0479_b.wav Piano signal following encoding-decoding (54 kb/s)
    •  0479_d.wav Male speech signal following encoding-decoding (64 kb/s)
    •  0479_f.wav Triangle signal following encoding-decoding (64 kb/s)




    A Test of MPEG Using Time-inverted Spoken Audio

    Authors:

    Thomas McLaughlin, Library of Congress (U.S.A.)
    John Cookson, Library of Congress (U.S.A.)
    Lloyd Rasmussen, Library of Congress (U.S.A.)

    Volume 2, Page 1025

    Abstract:

    We excerpted a 20-second sample from a DAT-mastered talking book segment and coded it at 32 and 48 kbit/s using MPEG I, layer 3. We also coded the same segment at 80 kbit/s using MPEG I, layer 2. We then coded a time-inverted version of the material in the same way. After decoding, we put the inverted segments back into normal sequence and compared them with the corresponding segments coded in normal temporal order. We made the comparison by means of an ABX test with volunteer listeners. Naive listeners were unable to reliably distinguish between material coded in normal temporal order and the same material coded in inverted order. Trained listeners could reliably make the distinction in layer 3 at 32 and 48 kbit/s, but not in layer 2 at 80 kbit/s.
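    The abstract does not give trial counts or the criterion used for "reliably distinguish". A common way to judge an ABX result is a one-sided binomial test against guessing (p = 0.5), sketched below. The 12-of-16 session and the 5% significance level are hypothetical examples, not data from the paper.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided binomial p-value: probability of at least `correct` right
    answers in `trials` ABX trials if the listener is only guessing (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical session: 12 correct answers out of 16 trials.
p = abx_p_value(12, 16)
print(round(p, 4), "reliable at 5%" if p < 0.05 else "not reliable at 5%")
```

    With exact integer counts from `math.comb`, the p-value for 12 of 16 is 2517/65536, about 0.038.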

    Acrobat PDF file of scanned paper:  ic961025.pdf




    Extension and Complexity Reduction of TwinVQ Audio Coder

    Authors:

    Takehiro Moriya, NTT Human Interface Laboratories (Japan)
    Naoki Iwakami, NTT Human Interface Laboratories (Japan)
    Kazunaga Ikeda, NTT Human Interface Laboratories (Japan)
    Satoshi Miki, NTT Human Interface Laboratories (Japan)

    Volume 2, Page 1029

    Abstract:

    This paper proposes two novel techniques for the TwinVQ (Transform domain Weighted Interleave VQ) high-quality audio coding scheme at rates lower than 64 kbit/s. One is an extension of the weighted interleave technique to the time and input-channel domains as well as the frequency domain. The other is an efficient representation of the spectral envelope by means of an interpolated square-root LPC (Linear Predictive Coding) spectrum.
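    The basic interleave step can be illustrated in its frequency-domain form: spectral coefficients are dealt round-robin into several subvectors, so that each subvector spans the whole band. Each subvector is then matched against a codebook under a weighted distance. The sketch below does not cover the time and channel extensions, and the codebook and weights in it are hypothetical placeholders.

```python
def interleave(coeffs, num_subvectors):
    """Distribute coefficients round-robin: subvector i takes
    coeffs[i], coeffs[i + M], coeffs[i + 2M], ... (M = num_subvectors)."""
    return [coeffs[i::num_subvectors] for i in range(num_subvectors)]


def deinterleave(subvectors):
    """Inverse of interleave()."""
    m = len(subvectors)
    out = [None] * sum(len(v) for v in subvectors)
    for i, vec in enumerate(subvectors):
        for j, value in enumerate(vec):
            out[i + j * m] = value
    return out


def weighted_nearest(vector, weights, codebook):
    """Index of the codeword minimising the weighted squared error
    sum(w * (x - c)**2); weights would come from a perceptual model."""
    return min(
        range(len(codebook)),
        key=lambda idx: sum(w * (x - c) ** 2
                            for x, w, c in zip(vector, weights, codebook[idx])),
    )


subs = interleave(list(range(8)), 3)
print(subs)                # [[0, 3, 6], [1, 4, 7], [2, 5]]
print(deinterleave(subs))  # [0, 1, 2, 3, 4, 5, 6, 7]
```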

    Acrobat PDF file of scanned paper:  ic961029.pdf

    Acrobat PDF file of original paper:  ic961029.pdf




    Minimising the Effects of Subband Quantisation of the Time Domain Aliasing Cancellation Filter Bank

    Authors:

    Conrad Jakob, Royal Melbourne Institute of Technology (Australia)
    Alan Bradley, Royal Melbourne Institute of Technology (Australia)

    Volume 2, Page 1033

    Abstract:

    The effect of quantising filter bank subbands has been analysed by incorporating quantisation noise models into the Time Domain Aliasing Cancellation (TDAC) filter bank. We have found expressions for the reconstruction error of the quantised TDAC system in terms of several signal-correlated components and an uncorrelated component. These expressions allow easy identification of subjectively annoying errors and provide the framework for a subjective optimisation of the quantisation process. Research has also been carried out on alternative quantiser models and methods of quantiser compensation.
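    The authors' noise models are not given in the abstract. The simplest standard model treats a uniform quantiser with step Δ as adding signal-independent, uniformly distributed noise of variance Δ²/12. The sketch below measures how well that holds for finely quantised Gaussian data, which stands in for subband samples purely as an assumption. The signal-correlated error terms the paper analyses are exactly what such a model leaves out.

```python
import random


def uniform_quantize(value, step):
    """Mid-tread uniform quantiser with step size `step`."""
    return step * round(value / step)


def error_stats(samples, step):
    """Mean, variance, and input correlation of the quantisation error."""
    errors = [uniform_quantize(s, step) - s for s in samples]
    n = len(errors)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n
    corr = sum(e * s for e, s in zip(errors, samples)) / n
    return mean, var, corr


rng = random.Random(0)
coeffs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
step = 0.1
mean, var, corr = error_stats(coeffs, step)
print(var, step ** 2 / 12)  # empirical vs. additive-noise-model variance
```

    With coarser steps relative to the signal level, the measured correlation grows and the additive model becomes less accurate.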

    Acrobat PDF file of scanned paper:  ic961033.pdf




    Speech Analysis and Coding Using a Multi-Resolution Sinusoidal Transform

    Authors:

    David V. Anderson, Georgia Institute of Technology (U.S.A.)

    Volume 2, Page 1037

    Abstract:

    The sinusoidal transform (ST), as developed by Quatieri and McAulay, provides a sparse representation of speech signals by taking advantage of psychoacoustic masking. The present work takes the sinusoidal transform one step further by considering the frequency resolution of the human auditory system in more detail. The new transform is based on the wavelet principle of variable time/frequency resolution. Specifically, a sinusoidal transform is developed that uses quadrature mirror filter (QMF) banks to obtain better time resolution at high frequencies and better frequency resolution at low frequencies. This naturally provides a perceptually improved allocation of the sinusoids. The new transform matches the human auditory system better than its predecessor, and it also matches speech signals well, both fricative sounds and voiced speech. The QMF-based ST is then shown to be equivalent to a more efficient FFT-based implementation.
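    The sinusoid-selection step of a sinusoidal transform can be sketched as follows. The code takes a windowed DFT of one frame, finds local magnitude maxima, and keeps at most a fixed number of them per band, similar to the "peaks per band" limits in the sound-file descriptions. This is a single-resolution sketch. It does not implement the paper's QMF multi-resolution structure, and the band edges, Hann window, and function names are assumptions.

```python
import cmath
import math


def dft_magnitude(frame):
    """Magnitude spectrum (bins 0..n/2) of a Hann-windowed frame, direct DFT."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    return [
        abs(sum(frame[t] * window[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def pick_peaks(mag, band_edges, max_peaks):
    """Keep at most `max_peaks` of the largest local maxima in each band
    [band_edges[b], band_edges[b + 1]); returns sorted bin indices."""
    peaks = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        local = [k for k in range(max(lo, 1), min(hi, len(mag) - 1))
                 if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]
        local.sort(key=lambda k: mag[k], reverse=True)
        peaks.extend(sorted(local[:max_peaks]))
    return peaks


n = 128
frame = [math.sin(2 * math.pi * 10 * t / n) + 0.5 * math.sin(2 * math.pi * 40 * t / n)
         for t in range(n)]
print(pick_peaks(dft_magnitude(frame), [0, 32, 65], max_peaks=1))  # [10, 40]
```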

    Acrobat PDF file of scanned paper:  ic961037.pdf

    Acrobat PDF file of original paper:  ic961037.pdf

    Sound files associated with this paper.

    •  0809_a.wav Unprocessed speech
    •  0809_b.wav Processed speech with 60 msec window, 4 bands, limit of 8 peaks per band
    •  0809_c.wav Processed speech with 40 msec window, 4 bands, limit of 12 peaks per band




    Audio coding using the wavelet packet transform and a combined scalar-vector quantization

    Authors:

    Simon Boland, Queensland University of Technology (Australia)
    Mohamed Deriche, Queensland University of Technology (Australia)

    Volume 2, Page 1041

    Abstract:

    This paper investigates a hybrid scalar-vector quantization scheme for coding high quality audio signals. A Wavelet Packet Transform (WPT) is used to decompose the audio signal into frequency bands slightly finer than the critical band divisions. A masking model computation is then used as input to the hybrid quantization scheme, where scalar quantization is used for coding the subbands from 0-5.5 kHz, and vector quantization is used for coding the subbands from 5.5-22 kHz. The performance of the proposed coder is assessed from Segmental Signal-to-Noise Ratios (SNR) and the perceived quality for a number of signals. The perceived quality is determined from informal comparisons between the uncoded signals at the original bitrate of 705 kb/s, and the same signals coded with (1) the proposed coder at 80 kb/s, (2) a coder using only scalar quantization at both 128 kb/s and 96 kb/s, and (3) the MPEG layer III coder at 64 kb/s. The comparisons indicate that very good coder quality is possible with the proposed coder at bitrates of approximately 80 kb/s. This represents a saving of about 16 kb/s over full scalar quantization with a similar quality. Further bitrate reduction with the proposed coder is possible by entropy coding of the scalar quantized transform coefficients and the VQ indices.
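    The abstract uses segmental SNR as its objective measure but does not state the segment length or any clamping. The sketch below averages per-segment SNRs in dB, skipping silent segments. It clamps each value to [-10, 35] dB, a common convention that is assumed here and not taken from the paper.

```python
import math


def segmental_snr(reference, coded, seg_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean per-segment SNR in dB. Silent segments are skipped and each
    segment's SNR is clamped to [floor_db, ceil_db] (a common convention,
    assumed here)."""
    snrs = []
    for start in range(0, min(len(reference), len(coded)) - seg_len + 1, seg_len):
        ref = reference[start:start + seg_len]
        out = coded[start:start + seg_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - c) ** 2 for r, c in zip(ref, out))
        if signal == 0.0:
            continue
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(signal / noise)
        snrs.append(min(max(snr, floor_db), ceil_db))
    if not snrs:
        raise ValueError("no non-silent segments")
    return sum(snrs) / len(snrs)


ref = [math.sin(0.01 * t) for t in range(2048)]
coded = [1.1 * r for r in ref]   # error is 10% of the signal -> 20 dB per segment
print(segmental_snr(ref, coded))  # approximately 20.0
```

    Unlike a single global SNR, averaging in dB per segment keeps quiet passages from being swamped by loud ones.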

    Acrobat PDF file of scanned paper:  ic961041.pdf




    Low Bit Rate High Quality Audio Coding with Combined Harmonic and Wavelet Representations

    Authors:

    Khaled N. Hamdy, University of Minnesota (U.S.A.)
    Murtaza Ali, University of Minnesota (U.S.A.)
    Ahmed H. Tewfik, University of Minnesota (U.S.A.)

    Volume 2, Page 1045

    Abstract:

    In this paper, we describe a novel high quality audio coding method using an adaptive signal representation based on sinusoidal and wavelet analysis. First, we perform a harmonic analysis of the signal to remove strong periodic structures or tones. Then we carry out a wavelet analysis, which is useful for tracking the transients of the signal. These transients are then removed from the wavelet coefficients. The remaining coefficients have a broadband, noise-like structure. Since this method separates out tones (sinusoids), transients, and broadband noise, we can use tonal, noise, and temporal masking information to encode the tones and the wavelet coefficients individually. Our experiments suggest that this method yields a nominal bit rate of 1 bit/sample for high quality audio compression.
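    The abstract quotes its rate in bits per sample, not kbit/s. Converting requires a sampling rate, which is not stated. The arithmetic below assumes 44.1 kHz mono and 16-bit linear PCM as the reference, which matches the 705 kb/s original bitrate quoted in the wavelet packet abstract above.

```python
def bitrate_kbps(bits_per_sample, sample_rate_hz, channels=1):
    """Bit rate in kbit/s for a given average number of bits per sample."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0


def compression_ratio(bits_per_sample, pcm_bits=16):
    """Compression relative to linear PCM with `pcm_bits` bits per sample."""
    return pcm_bits / bits_per_sample


print(bitrate_kbps(16, 44_100))  # 705.6, 16-bit mono PCM at 44.1 kHz
print(bitrate_kbps(1, 44_100))   # 44.1
print(compression_ratio(1))      # 16.0
```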

    Acrobat PDF file of scanned paper:  ic961045.pdf

    Acrobat PDF file of original paper:  ic961045.pdf




    A High Performance Software Implementation Of MPEG Audio Encoder

    Authors:

    Manoj Kumar, IBM T.J. Watson Research Center (U.S.A.)
    Mohammad Zubair, IBM T.J. Watson Research Center (U.S.A.)

    Volume 2, Page 1049

    Abstract:

    MPEG/Audio is a standard for both transmitting and recording compressed audio. The MPEG algorithm achieves compression by exploiting the perceptual limitations of the human ear. The standard defines the decoding process and the syntax of the coded bitstream, but it leaves room for different encoder implementations that generate the compressed bitstream. In this paper we propose a high performance software implementation of the MPEG/Audio encoder. We obtained more than a fivefold improvement over a straightforward implementation on the IBM PowerPC, Model 250.

    Acrobat PDF file of scanned paper:  ic961049.pdf

    Acrobat PDF file of original paper:  ic961049.pdf




    Audio Compression At Low Bit Rates Using A Signal Adaptive Switched Filterbank

    Authors:

    Deepen Sinha, AT&T Bell Laboratories (U.S.A.)
    James D. Johnston, AT&T Bell Laboratories (U.S.A.)

    Volume 2, Page 1053

    Abstract:

    A perceptual audio coder typically consists of a filterbank that breaks the signal into its frequency components. These components are then quantized using a perceptual masking model. Previous efforts have indicated that a high resolution filterbank, e.g., the modified discrete cosine transform (MDCT) with 1024 subbands, is able to minimize the bit rate requirements for most music samples. The high resolution MDCT, however, is not suitable for encoding non-stationary segments of music. A long/short resolution or "window" switching scheme has been employed to overcome this problem, but it has certain inherent disadvantages that become prominent at lower bit rates (below 64 kbps for stereo). We propose a novel switched filterbank scheme that switches between an MDCT and a wavelet filterbank based on signal characteristics. A tree-structured wavelet filterbank with properly designed filters offers natural advantages for the representation of non-stationary segments such as attacks. Furthermore, it allows for the optimum exploitation of perceptual irrelevancies.
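    The abstract does not say how the switching decision is made. The sketch below uses a generic attack detector: if the energy of any short sub-block jumps by more than a threshold over the previous one, the frame is sent to the wavelet filterbank, and otherwise to the MDCT. The sub-block count, the threshold, and the function name are hypothetical, and the paper's actual criterion may differ.

```python
import math


def choose_filterbank(frame, num_subblocks=8, ratio_threshold=10.0):
    """Return "wavelet" if any sub-block's energy jumps by more than
    `ratio_threshold` over the previous sub-block (an attack), else "mdct"."""
    seg = len(frame) // num_subblocks
    eps = 1e-12  # avoids division by zero in silent sub-blocks
    energies = [sum(x * x for x in frame[i * seg:(i + 1) * seg]) + eps
                for i in range(num_subblocks)]
    max_ratio = max(energies[i] / energies[i - 1] for i in range(1, num_subblocks))
    return "wavelet" if max_ratio > ratio_threshold else "mdct"


steady = [math.sin(2 * math.pi * 0.05 * t) for t in range(2048)]
attack = [0.0] * 1024 + steady[:1024]
print(choose_filterbank(steady), choose_filterbank(attack))  # mdct wavelet
```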

    Acrobat PDF file of scanned paper:  ic961053.pdf

    Acrobat PDF file of original paper:  ic961053.pdf



    Original article: https://www.cnblogs.com/gaozehua/p/2431449.html