Advanced Guide to Audio and Video Development (Part 2)

1.

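ffplay can synchronize audio and video in three ways:

Using the audio clock as the master timeline (ffplay's default). In a test on Ubuntu 16 playback stuttered occasionally, but the result was better than with the two modes below.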
```
ffplay 32037.mp4 -sync audio
```
Using the video clock as the master timeline. (Audio gets rendered repeatedly, which produces drawn-out sounds.)
```
ffplay 32037.mp4 -sync video
```
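Using an external clock as the master timeline. (Playback stuttered occasionally and the audio was rendered incorrectly, with distorted pitch.)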
```
ffplay 32037.mp4 -sync ext
```
2.

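First, note that every video frame or audio frame the player receives carries a timestamp (the PTS, or presentation timestamp) that indicates when it should actually be presented. The alignment strategy is as follows: compare the current video playback time with the current audio playback time. If the video is running ahead, slow it down by increasing the delay or repeating a frame; if the video is running behind, catch up to the audio position by reducing the delay or dropping frames. The key lies in comparing the audio and video times and computing the delay. The comparison uses a threshold: an adjustment (dropping a frame or repeating one) is made only when the difference exceeds the preset threshold. That is the whole alignment strategy.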
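
The threshold comparison described above can be sketched in a few lines of C++. This is only an illustration of the idea, not ffplay's actual code: the names `SyncAction` and `decideVideoAction` are made up for this example, and times are assumed to be in seconds.

```cpp
#include <cmath>

// Possible decisions for the next video frame (illustrative names).
enum class SyncAction { Render, Repeat, Drop };

// Decide how to treat a video frame given its PTS and the audio clock,
// both in seconds. `threshold` is the tolerated drift in seconds.
SyncAction decideVideoAction(double videoPts, double audioClock,
                             double threshold) {
    const double diff = videoPts - audioClock;
    if (std::fabs(diff) <= threshold) {
        return SyncAction::Render;  // close enough: show the frame normally
    }
    if (diff > 0) {
        return SyncAction::Repeat;  // video is ahead: wait or show the previous frame again
    }
    return SyncAction::Drop;        // video is behind: skip the frame to catch up
}
```

A real player would typically also scale the delay by the size of the drift instead of making a three-way choice, but the comparison against a threshold is the same.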

3. Using the ffmpeg command to render, convert, merge, and split audio and video files: see p. 113.

4.

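Let us settle the terminology first:

· Container/File: a multimedia file in a specific format, such as MP4, FLV, or MOV.
· Stream: a continuous run of data along the timeline, such as a segment of audio data, video data, or subtitle data. It may be compressed or uncompressed; compressed data must be associated with a specific codec.
· Frame/Packet: a media stream usually consists of a large number of frames. For compressed data, a frame corresponds to the smallest unit a codec processes. Frames belonging to different streams are stored interleaved in the container.
· Codec: a codec converts between compressed data and raw data, working frame by frame.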
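
To make "stored interleaved in the container" concrete, here is a small sketch that separates an interleaved packet sequence back into per-stream sequences. The `Packet` struct and `splitByStream` function are hypothetical stand-ins invented for this example; they are not FFmpeg types.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-in for a demuxed packet; not an FFmpeg type.
struct Packet {
    int streamIndex;  // which media stream the packet belongs to
    double pts;       // presentation timestamp in seconds
};

// Group interleaved packets by stream index, preserving their order
// within each stream.
std::map<int, std::vector<Packet>> splitByStream(
        const std::vector<Packet>& container) {
    std::map<int, std::vector<Packet>> streams;
    for (const Packet& p : container) {
        streams[p.streamIndex].push_back(p);
    }
    return streams;
}
```

Feeding it a sequence such as video, audio, audio, video yields two lists, one per stream index, each still ordered by arrival.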

5. Using the FFmpeg API

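5.1 What extern "C" means

As an object-oriented language, C++ supports function overloading, whereas C, a procedural language, does not. The same function therefore gets a different symbol name in the symbol table depending on whether it is compiled as C++ or as C. Take this function:
    void decode(float position, float duration)
Compiled as C, its symbol is _decode; compiled as C++, a typical compiler produces something like _decode_float_float (the exact mangled form depends on the compiler). Compilation itself succeeds either way, but at link time, without extern "C" the linker looks for the symbol _decode_float_float, whereas with extern "C" it looks for _decode. FFmpeg is written in C, so all the symbols produced when building FFmpeg are C-style symbols, and C++ code that uses FFmpeg must therefore wrap the FFmpeg declarations in extern "C".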
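
A minimal sketch of the difference, using made-up function names (`decode_c` and `decode` are not FFmpeg functions). The commented-out include shows the usual way FFmpeg headers are wrapped; it is left as a comment so the example builds without FFmpeg installed.

```cpp
// In real code the C headers are wrapped like this (requires FFmpeg headers):
//   extern "C" {
//   #include <libavformat/avformat.h>
//   }

// C linkage: the symbol name is not mangled, so this name cannot be overloaded.
extern "C" int decode_c(float position, float duration) {
    return static_cast<int>(position + duration);
}

// C++ linkage: overloads are allowed because the parameter types are
// encoded in each symbol name.
int decode(float position, float duration) {
    return static_cast<int>(position + duration);
}
int decode(int position) {
    return position;
}
```

Running `nm` on the resulting object file is an easy way to see the plain `decode_c` symbol next to the two mangled `decode` symbols.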

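5.2 Registering protocols, formats, and codecs

Before using the FFmpeg API, first call FFmpeg's registration functions for protocols, formats, and codecs so that every format and codec is registered with the FFmpeg framework. If network access is needed, the network protocol layer should be initialized as well, so that the matching format can be looked up later. The code is as follows:
    avformat_network_init();
    av_register_all();
The documentation also lists avcodec_register_all(), which registers all codecs with the FFmpeg framework. Because av_register_all() already calls avcodec_register_all() internally, calling av_register_all() alone is enough. (Both registration calls are deprecated as of FFmpeg 4.0, where components are registered automatically; they are only needed with the older releases this walkthrough targets.)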

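5.3 Opening the media source and setting a timeout callback

With the formats and codecs registered, the next step is to open the media file. It may be a file on the local disk or a link to a network media resource; a network link involves a protocol, for example an RTMP or HTTP video source. The code for opening the media resource and setting the timeout callback is as follows:
    AVFormatContext *formatCtx = avformat_alloc_context();
    AVIOInterruptCB int_cb = {interrupt_callback, (__bridge void *)(self)};
    formatCtx->interrupt_callback = int_cb;
    // avformat_open_input takes the address of the context pointer
    if (avformat_open_input(&formatCtx, path, NULL, NULL) != 0) {
        // failed to open the media source
    }
    if (avformat_find_stream_info(formatCtx, NULL) < 0) {
        // failed to read the stream information
    }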
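
The listing above references interrupt_callback without showing it. One plausible shape for such a callback is sketched below in plain C++: FFmpeg's AVIOInterruptCB expects a function of type `int (*)(void *)`, and a non-zero return value asks the blocking operation to abort. The `TimeoutState` struct and the function name are assumptions made for this example, and the sketch deliberately avoids FFmpeg headers so that it stands alone.

```cpp
#include <chrono>

// Hypothetical state handed to the callback through its opaque pointer.
struct TimeoutState {
    std::chrono::steady_clock::time_point start;  // when the operation began
    std::chrono::milliseconds limit;              // how long it may take
};

// Same shape as FFmpeg's interrupt callback: int (*)(void *).
// Returns 1 (abort) once the time limit has been exceeded, otherwise 0.
int timeoutInterruptCallback(void* opaque) {
    const TimeoutState* state = static_cast<const TimeoutState*>(opaque);
    const auto elapsed = std::chrono::steady_clock::now() - state->start;
    return elapsed > state->limit ? 1 : 0;
}
```

In use, `start` would be reset just before each blocking call (opening the input, reading a packet) so that the limit applies per operation and not to the whole session.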
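5.4 Finding the streams and opening their decoders

The previous step opened the media file, which is like opening up an electrical cable: inside the cable there is a red wire and a blue wire. The streams in a media file are much the same, with the red wire standing for the audio stream and the blue wire for the video stream. In this step we locate each stream, find the decoder that matches it, and open that decoder.
Finding the audio and video streams:
    for (int i = 0; i < formatCtx->nb_streams; i++) {
        AVStream *stream = formatCtx->streams[i];
        if (AVMEDIA_TYPE_VIDEO == stream->codec->codec_type) {
            // video stream
            videoStreamIndex = i;
        } else if (AVMEDIA_TYPE_AUDIO == stream->codec->codec_type) {
            // audio stream
            audioStreamIndex = i;
        }
    }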
Opening the audio decoder:
    AVStream *audioStream = formatCtx->streams[audioStreamIndex];
    AVCodecContext *audioCodecCtx = audioStream->codec;
    AVCodec *codec = avcodec_find_decoder(audioCodecCtx->codec_id);
    if (!codec) {
        // no matching audio decoder found
    }
    int openCodecErrCode = 0;
    if ((openCodecErrCode = avcodec_open2(audioCodecCtx, codec, NULL)) < 0) {
        // failed to open the audio decoder
    }
Opening the video decoder:
    AVStream *videoStream = formatCtx->streams[videoStreamIndex];
    AVCodecContext *videoCodecCtx = videoStream->codec;
    AVCodec *codec = avcodec_find_decoder(videoCodecCtx->codec_id);
    if (!codec) {
        // no matching video decoder found
    }
    int openCodecErrCode = 0;
    if ((openCodecErrCode = avcodec_open2(videoCodecCtx, codec, NULL)) < 0) {
        // failed to open the video decoder
    }
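5.5 Initializing the structures for decoded data

Now that the audio and video decoder information is known, the next step is to allocate the memory that will hold the decoded data, along with the objects needed for format conversion.
Building the audio format converter and the object that holds decoded audio:
    SwrContext *swrContext = NULL;
    if (audioCodecCtx->sample_fmt != AV_SAMPLE_FMT_S16) {
        // not the sample format we need, so set up a resampler
        swrContext = swr_alloc_set_opts(NULL,
                outputChannel, AV_SAMPLE_FMT_S16, outSampleRate,
                in_ch_layout, in_sample_fmt, in_sample_rate, 0, NULL);
        if (!swrContext || swr_init(swrContext)) {
            if (swrContext) {
                swr_free(&swrContext);
            }
        }
    }
    // the frame is needed whether or not resampling is required
    audioFrame = avcodec_alloc_frame();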
Building the video format converter and the object that holds decoded video:
    AVPicture picture;
    bool pictureValid = avpicture_alloc(&picture,
            AV_PIX_FMT_YUV420P,
            videoCodecCtx->width,
            videoCodecCtx->height) == 0;
    if (!pictureValid) {
        // allocation failed
        return false;
    }
    // converts decoded frames to YUV420P at the same resolution
    swsContext = sws_getCachedContext(swsContext,
            videoCodecCtx->width,
            videoCodecCtx->height,
            videoCodecCtx->pix_fmt,
            videoCodecCtx->width,
            videoCodecCtx->height,
            AV_PIX_FMT_YUV420P,
            SWS_FAST_BILINEAR,
            NULL, NULL, NULL);
    videoFrame = avcodec_alloc_frame();
5.6 Reading the stream contents and decoding

With the decoders open, we can read some data (compressed data) from the stream and feed it to the decoder, which decodes it into raw data. The raw data can then be written to a file:
    AVPacket packet;
    int gotFrame = 0;
    while (true) {
        if (av_read_frame(formatCtx, &packet) < 0) {
            // end of file or read error
            break;
        }
        int len = 0;
        int packetStreamIndex = packet.stream_index;
        if (packetStreamIndex == videoStreamIndex) {
            len = avcodec_decode_video2(videoCodecCtx, videoFrame,
                    &gotFrame, &packet);
            if (len >= 0 && gotFrame) {
                self->handleVideoFrame();
            }
        } else if (packetStreamIndex == audioStreamIndex) {
            len = avcodec_decode_audio4(audioCodecCtx, audioFrame,
                    &gotFrame, &packet);
            if (len >= 0 && gotFrame) {
                self->handleAudioFrame();
            }
        }
        // release the packet's buffer before reading the next one
        av_free_packet(&packet);
        if (len < 0) {
            // decoding error
            break;
        }
    }
5.7 Handling the decoded raw data

Decoding yields raw data: PCM for audio and YUV for video. Below it is converted to the format we need and written to a file.
Handling raw audio data:
    void *audioData;
    int numFrames;
    if (swrContext) {
        // resampling is needed: make sure the output buffer is large enough
        int bufSize = av_samples_get_buffer_size(NULL, channels,
                (int)(audioFrame->nb_samples * channels),
                AV_SAMPLE_FMT_S16, 1);
        if (!swrBuffer || swrBufferSize < bufSize) {
            swrBufferSize = bufSize;
            swrBuffer = realloc(swrBuffer, swrBufferSize);
        }
        uint8_t *outbuf[2] = { swrBuffer, 0 };
        numFrames = swr_convert(swrContext, outbuf,
                (int)(audioFrame->nb_samples * channels),
                (const uint8_t **)audioFrame->data,
                audioFrame->nb_samples);
        audioData = swrBuffer;
    } else {
        // already S16: use the decoded samples directly
        audioData = audioFrame->data[0];
        numFrames = audioFrame->nb_samples;
    }
Once the raw audio data is available it can be written straight to a file, for example audio.pcm.
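
The write itself is not shown in the text. A rough sketch of it follows, assuming the data is interleaved signed 16-bit PCM (which is what the AV_SAMPLE_FMT_S16 target implies) and that `numFrames` counts samples per channel. The helper names `s16ByteCount` and `appendPcm` are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Number of bytes occupied by interleaved signed 16-bit PCM.
std::size_t s16ByteCount(int numFrames, int channels) {
    return static_cast<std::size_t>(numFrames) * channels * sizeof(std::int16_t);
}

// Append one decoded buffer to a raw PCM file; returns true on success.
bool appendPcm(const std::string& path, const void* audioData,
               int numFrames, int channels) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) {
        return false;  // could not open the output file
    }
    out.write(static_cast<const char*>(audioData),
              static_cast<std::streamsize>(s16ByteCount(numFrames, channels)));
    return out.good();
}
```

Opening the file on every call keeps the sketch short; a real decoder loop would more likely open the file once and keep the stream around.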
Handling raw video data:
    uint8_t *luma;
    uint8_t *chromaB;
    uint8_t *chromaR;
    if (videoCodecCtx->pix_fmt == AV_PIX_FMT_YUV420P ||
            videoCodecCtx->pix_fmt == AV_PIX_FMT_YUVJ420P) {
        // already YUV420P: copy the three planes directly
        luma = copyFrameData(videoFrame->data[0],
                videoFrame->linesize[0],
                videoCodecCtx->width,
                videoCodecCtx->height);
        chromaB = copyFrameData(videoFrame->data[1],
                videoFrame->linesize[1],
                videoCodecCtx->width / 2,
                videoCodecCtx->height / 2);
        chromaR = copyFrameData(videoFrame->data[2],
                videoFrame->linesize[2],
                videoCodecCtx->width / 2,
                videoCodecCtx->height / 2);
    } else {
        // convert to YUV420P first, then copy the planes of the converted picture
        sws_scale(swsContext,
                (const uint8_t **)videoFrame->data,
                videoFrame->linesize,
                0,
                videoCodecCtx->height,
                picture.data,
                picture.linesize);
        luma = copyFrameData(picture.data[0],
                picture.linesize[0],
                videoCodecCtx->width,
                videoCodecCtx->height);
        chromaB = copyFrameData(picture.data[1],
                picture.linesize[1],
                videoCodecCtx->width / 2,
                videoCodecCtx->height / 2);
        chromaR = copyFrameData(picture.data[2],
                picture.linesize[2],
                videoCodecCtx->width / 2,
                videoCodecCtx->height / 2);
    }
Once the YUV data is available it can likewise be written straight to a file, for example video.yuv.
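
The listing calls copyFrameData without defining it. Judging by its arguments, it presumably copies one image plane row by row and discards the padding implied by linesize. The sketch below shows that idea under a different name, `copyPlane`, and returns a `std::vector` instead of a raw pointer; both choices are assumptions of this example, not the original helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy one image plane into a tightly packed buffer, dropping row padding.
// `linesize` is the stride in bytes of the source plane and may be larger
// than `width`.
std::vector<std::uint8_t> copyPlane(const std::uint8_t* src, int linesize,
                                    int width, int height) {
    width = std::min(linesize, width);  // never read past a source row
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(width) * height);
    std::uint8_t* out = dst.data();
    for (int row = 0; row < height; ++row) {
        std::memcpy(out, src, static_cast<std::size_t>(width));
        out += width;     // destination rows are contiguous
        src += linesize;  // source rows are `linesize` bytes apart
    }
    return dst;
}
```

Copying row by row matters because decoders often align each row to a larger boundary, so a single memcpy of width × height bytes would pull padding into the output.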
5.8 Releasing all resources

When decoding finishes, or if you want to stop partway through, the program can exit. On the way out, release every FFmpeg resource that was used, including the connections FFmpeg holds to the outside.
Releasing audio resources:
    if (swrBuffer) {
        free(swrBuffer);
        swrBuffer = NULL;
        swrBufferSize = 0;
    }
    if (swrContext) {
        swr_free(&swrContext);
        swrContext = NULL;
    }
    if (audioFrame) {
        av_free(audioFrame);
        audioFrame = NULL;
    }
    if (audioCodecCtx) {
        avcodec_close(audioCodecCtx);
        audioCodecCtx = NULL;
    }
Releasing video resources:
    if (swsContext) {
        sws_freeContext(swsContext);
        swsContext = NULL;
    }
    if (pictureValid) {
        avpicture_free(&picture);
        pictureValid = false;
    }
    if (videoFrame) {
        av_free(videoFrame);
        videoFrame = NULL;
    }
    if (videoCodecCtx) {
        avcodec_close(videoCodecCtx);
        videoCodecCtx = NULL;
    }
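Releasing the connection:
    if (formatCtx) {
        avformat_close_input(&formatCtx);
        formatCtx = NULL;
    }
That is the complete decoding process with FFmpeg: opening the file stream, parsing the format, locating the streams and opening their decoders, decoding and processing the data, and finally releasing all resources.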
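
In C++ code, the manual release sequence can alternatively be expressed with RAII, so that resources are released on every exit path. This is a different technique from the explicit cleanup the text describes, shown here only as a sketch: `Resource`, `resourceOpen`, and `resourceClose` are hypothetical stand-ins for an FFmpeg object and its open/close functions.

```cpp
#include <memory>
#include <vector>

// Records the order in which resources are released (for illustration only).
static std::vector<int> g_released;

// Hypothetical resource with a C-style release function; a stand-in for
// objects such as a resampler, a frame, or a codec context.
struct Resource { int id; };
Resource* resourceOpen(int id) { return new Resource{id}; }
void resourceClose(Resource* r) { g_released.push_back(r->id); delete r; }

// Deleter that forwards to the C-style release function.
struct ResourceCloser {
    void operator()(Resource* r) const { resourceClose(r); }
};
using ResourcePtr = std::unique_ptr<Resource, ResourceCloser>;

// Opens three resources; they are released automatically, in reverse
// order of declaration, when the function returns on any path.
void decodeSession() {
    ResourcePtr format(resourceOpen(1));
    ResourcePtr codec(resourceOpen(2));
    ResourcePtr frame(resourceOpen(3));
    // ... decoding work would happen here ...
}
```

The reverse release order mirrors the listings above, where frames and codec contexts are closed before the format context.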

6.

FFmpeg source code structure
Original article: https://www.cnblogs.com/wddx5/p/13373791.html