Hardware Decoding H264 with VideoToolbox

If you are not yet familiar with H264 and VideoToolbox, be sure to read up on them first.

Let's first look at what the demo does.

The overall flow is:

Capture raw video frames from the camera -> encode them to H264 with VideoToolbox -> decode them back to images with VideoToolbox -> draw them with OpenGL ES

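In the demo, the upper view is the live camera preview and the lower view is the decoded and rendered output. The GIF cannot show everything, so download the demo code to see the full effect.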

Main VideoToolbox decoding flow

The encoding part is covered in the previous article. Here we focus on the decoding flow.

//Decode raw NALU data
-(void) decodeNalu:(uint8_t *)frame size:(uint32_t)frameSize

We pass in the raw NALU data and its length, frameSize.

As we know, a raw NALU consists of three parts:

[StartCode][NALU Header][NALU Payload]

StartCode marks the beginning of a NALU and must be either 00 00 00 01 or 00 00 01.
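
Since the start code can be three or four bytes long, it can help to check which form a buffer actually begins with before assuming four bytes. The following is a minimal C sketch of that check, not code from the demo; the function name `start_code_length` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the length of the start code at the beginning of buf
 * (4 for 00 00 00 01, 3 for 00 00 01), or 0 if buf does not
 * begin with a start code. Hypothetical helper, not part of the demo. */
static size_t start_code_length(const uint8_t *buf, size_t size)
{
    if (size >= 4 && buf[0] == 0 && buf[1] == 0 && buf[2] == 0 && buf[3] == 1) {
        return 4;
    }
    if (size >= 3 && buf[0] == 0 && buf[1] == 0 && buf[2] == 1) {
        return 3;
    }
    return 0;
}
```

The rest of this article assumes the four-byte form, which is why the code below always skips exactly four bytes.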

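VideoToolbox, however, does not expect a StartCode in the first four bytes of a NALU. It expects the size of the NALU instead, stored in big-endian order. So here we overwrite the first four bytes with that size, which is frameSize minus the four StartCode bytes: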

//Fill in the NALU size: drop the start code and replace it with the NALU size (big-endian)
    uint32_t nalSize = (uint32_t)(frameSize - 4);
    uint8_t *pNalSize = (uint8_t*)(&nalSize);
    frame[0] = *(pNalSize + 3);
    frame[1] = *(pNalSize + 2);
    frame[2] = *(pNalSize + 1);
    frame[3] = *(pNalSize);
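
The byte swap above relies on the host being little-endian, which holds on iOS devices. As an illustration only, here is a C sketch of the same replacement written with shifts, so it does not depend on host byte order, and with a check that the buffer really starts with a four-byte start code. The function name `replace_start_code_with_length` and its return convention are assumptions, not part of the demo.

```c
#include <stddef.h>
#include <stdint.h>

/* Overwrites a 4-byte start code at the front of frame with the
 * big-endian size of the NALU that follows it.
 * Returns 0 on success, -1 if the buffer is too short or does not
 * begin with 00 00 00 01. Hypothetical helper, not part of the demo. */
static int replace_start_code_with_length(uint8_t *frame, uint32_t frameSize)
{
    if (frame == NULL || frameSize <= 4) {
        return -1;
    }
    if (frame[0] != 0 || frame[1] != 0 || frame[2] != 0 || frame[3] != 1) {
        return -1;
    }
    uint32_t nalSize = frameSize - 4;     /* size without the prefix */
    frame[0] = (uint8_t)(nalSize >> 24);  /* most significant byte first */
    frame[1] = (uint8_t)(nalSize >> 16);
    frame[2] = (uint8_t)(nalSize >> 8);
    frame[3] = (uint8_t)nalSize;
    return 0;
}
```

For a seven-byte frame `00 00 00 01 65 AA BB`, the prefix becomes `00 00 00 03` and the three NALU bytes are left untouched.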

Before doing that, we first read the NALU Header to determine the NALU type:

//Get the NALU type
int nalu_type = (frame[4] & 0x1F);
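
The mask 0x1F keeps the low five bits of the one-byte NALU Header. To make the layout of that byte explicit, here is a small C sketch that splits it into its three H264 fields. The struct and function names are hypothetical and only meant for illustration.

```c
#include <stdint.h>

/* The three fields packed into the one-byte H264 NALU header. */
typedef struct {
    uint8_t forbidden_zero_bit; /* bit 7, must be 0 in a valid stream */
    uint8_t nal_ref_idc;        /* bits 6-5, importance as a reference */
    uint8_t nal_unit_type;      /* bits 4-0, e.g. 5 = IDR, 7 = SPS, 8 = PPS */
} NaluHeader;

/* Splits a header byte into its fields. Hypothetical helper. */
static NaluHeader parse_nalu_header(uint8_t byte)
{
    NaluHeader header;
    header.forbidden_zero_bit = (uint8_t)((byte >> 7) & 0x01);
    header.nal_ref_idc = (uint8_t)((byte >> 5) & 0x03);
    header.nal_unit_type = (uint8_t)(byte & 0x1F);
    return header;
}

/* Human-readable name for the types this article deals with. */
static const char *nalu_type_name(uint8_t nal_unit_type)
{
    switch (nal_unit_type) {
        case 1:  return "non-IDR slice";
        case 5:  return "IDR slice";
        case 7:  return "SPS";
        case 8:  return "PPS";
        default: return "other";
    }
}
```

For example, the header byte 0x67 has nal_unit_type 7, so it introduces an SPS, while 0x65 has nal_unit_type 5 and introduces a key frame.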

We then distinguish key frames, SPS and PPS, and all remaining frames such as B and P frames:

switch (nalu_type)
    {
        case 0x05:
            //Key frame (IDR)
            if([self initH264Decoder])
            {
                pixelBuffer = [self decode:frame size:frameSize];
            }
            break;
        case 0x07:
            //SPS: keep a copy without the 4-byte prefix
            free(_sps); //release any previous copy; free(NULL) is a no-op
            _spsSize = frameSize - 4;
            _sps = malloc(_spsSize);
            memcpy(_sps, &frame[4], _spsSize);
            break;
        case 0x08:
        {
            //PPS: keep a copy without the 4-byte prefix
            free(_pps); //release any previous copy; free(NULL) is a no-op
            _ppsSize = frameSize - 4;
            _pps = malloc(_ppsSize);
            memcpy(_pps, &frame[4], _ppsSize);
            break;
        }
        default:
        {
            //B/P frames
            if([self initH264Decoder])
            {
                pixelBuffer = [self decode:frame size:frameSize];
            }
            break;
        }
    }

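Note that initH264Decoder, which initializes the VideoToolbox decoder, is only called once a key frame or another video frame (B/P) arrives. This is because the SPS and PPS carry the video width and height along with other decoding parameters. We must have both of them to build the CMVideoFormatDescriptionRef, and only then can the VideoToolbox decoding session be initialized.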
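
The ordering requirement can be sketched in plain C: keep a copy of each parameter set as it arrives and only report the decoder as ready to initialize once both are present. This is an illustration under assumptions, not the demo's implementation; the `ParamSets` type and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copies of the most recent SPS and PPS (without any prefix). */
typedef struct {
    uint8_t *sps;
    size_t sps_size;
    uint8_t *pps;
    size_t pps_size;
} ParamSets;

/* Replaces *dst with a fresh copy of src, freeing the previous copy.
 * Returns 1 on success, 0 on empty input or allocation failure. */
static int store_param_set(uint8_t **dst, size_t *dst_size,
                           const uint8_t *src, size_t size)
{
    if (src == NULL || size == 0) {
        return 0;
    }
    uint8_t *copy = malloc(size);
    if (copy == NULL) {
        return 0;
    }
    memcpy(copy, src, size);
    free(*dst); /* free(NULL) is a no-op on the first call */
    *dst = copy;
    *dst_size = size;
    return 1;
}

/* The decoder can only be set up once both sets have been seen. */
static int param_sets_ready(const ParamSets *sets)
{
    return sets->sps != NULL && sets->sps_size > 0 &&
           sets->pps != NULL && sets->pps_size > 0;
}
```

A frame that arrives while `param_sets_ready` still returns 0 cannot be decoded yet, which mirrors why the decoder is initialized lazily on the first video frame rather than up front.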

Building the CMVideoFormatDescriptionRef

@interface H264DecodeTool(){
    
    //Decoding session
    VTDecompressionSessionRef _decoderSession;
    
    //Decoding format description, which wraps the SPS and PPS
    CMVideoFormatDescriptionRef _decoderFormatDescription;
    
    //SPS & PPS
    uint8_t *_sps;
    NSInteger _spsSize;
    uint8_t *_pps;
    NSInteger _ppsSize;
    
}
@end

    const uint8_t* const parameterSetPointers[2] = { _sps, _pps };
    const size_t parameterSetSizes[2] = { _spsSize, _ppsSize };
    
    //Create _decoderFormatDescription from the SPS and PPS
    OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,
                                                                          2, //number of parameter sets
                                                                          parameterSetPointers,
                                                                          parameterSetSizes,
                                                                          4, //size in bytes of the length prefix that replaces the start code
                                                                          &_decoderFormatDescription);

Initializing the VideoToolbox session

NSDictionary* destinationPixelBufferAttributes = @{
                                                           (id)kCVPixelBufferPixelFormatTypeKey : [NSNumber numberWithInt:kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange],
                                                           //Hardware decoding here uses kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
                                                           //or kCVPixelFormatType_420YpCbCr8Planar,
                                                           //because iOS uses NV12, while other platforms often use NV21
                                                           (id)kCVPixelBufferWidthKey : [NSNumber numberWithInt:1280],
                                                           (id)kCVPixelBufferHeightKey : [NSNumber numberWithInt:960],
                                                           //The width and height here are twice the values used for encoding
                                                           (id)kCVPixelBufferOpenGLCompatibilityKey : [NSNumber numberWithBool:YES]
                                                           };

        
        
        VTDecompressionOutputCallbackRecord callBackRecord;
        callBackRecord.decompressionOutputCallback = didDecompress;
        callBackRecord.decompressionOutputRefCon = (__bridge void *)self;
        status = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                              _decoderFormatDescription,
                                              NULL,
                                              (__bridge CFDictionaryRef)destinationPixelBufferAttributes,
                                              &callBackRecord,
                                              &_decoderSession);
        VTSessionSetProperty(_decoderSession, kVTDecompressionPropertyKey_ThreadCount, (__bridge CFTypeRef)[NSNumber numberWithInt:1]);
        VTSessionSetProperty(_decoderSession, kVTDecompressionPropertyKey_RealTime, kCFBooleanTrue);

For hardware decoding on iOS, the demo uses one of these pixel formats:
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange: NV12
kCVPixelFormatType_420YpCbCr8Planar: YUV420P

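YUV420P and NV12 are two different image data layouts: YUV420P keeps Y, U, and V in three separate planes, while NV12 keeps a Y plane followed by a single plane of interleaved U and V samples. Interested readers can look up the details on their own.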
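
To make the difference concrete, here is a C sketch that computes where the U and V samples live in each layout. It assumes a tightly packed buffer with even width and height and no row padding, which is a simplification: real CVPixelBuffer planes can have a larger bytes-per-row. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Total bytes of a tightly packed 4:2:0 frame (same for both layouts):
 * a full-resolution Y plane plus two quarter-resolution chroma planes. */
static size_t yuv420_frame_size(size_t width, size_t height)
{
    return width * height + 2 * ((width / 2) * (height / 2));
}

/* YUV420P: Y plane, then the whole U plane, then the whole V plane.
 * (cx, cy) is a position in the half-resolution chroma grid. */
static size_t yuv420p_u_offset(size_t width, size_t height, size_t cx, size_t cy)
{
    return width * height + cy * (width / 2) + cx;
}

static size_t yuv420p_v_offset(size_t width, size_t height, size_t cx, size_t cy)
{
    return width * height + (width / 2) * (height / 2) + cy * (width / 2) + cx;
}

/* NV12: Y plane, then one plane of interleaved U,V byte pairs.
 * Each chroma row holds width / 2 pairs, so it is width bytes long. */
static size_t nv12_u_offset(size_t width, size_t height, size_t cx, size_t cy)
{
    return width * height + cy * width + 2 * cx;
}

static size_t nv12_v_offset(size_t width, size_t height, size_t cx, size_t cy)
{
    return nv12_u_offset(width, height, cx, cy) + 1;
}
```

Both layouts take the same number of bytes per frame. Only the arrangement of the chroma samples differs, which is why a renderer has to know which of the two formats the decoder was asked to output.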

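Note that the width and height given for kCVPixelBufferWidthKey and kCVPixelBufferHeightKey here are not the actual video dimensions: in the demo they are twice as large.
The video we record is 640 * 480, so 1280 and 960 are passed in here.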

Decode callback

//Decode callback
static void didDecompress( void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status, VTDecodeInfoFlags infoFlags, CVImageBufferRef pixelBuffer, CMTime presentationTimeStamp, CMTime presentationDuration ){
    CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
    
    //Retain the pixelBuffer; otherwise it will be released
    *outputPixelBuffer = CVPixelBufferRetain(pixelBuffer);
    H264DecodeTool *decoder = (__bridge H264DecodeTool *)decompressionOutputRefCon;
    if (decoder.delegate)
    {
        [decoder.delegate gotDecodedFrame:pixelBuffer];
    }
}

Here we retain the pixelBuffer delivered to the callback, that is, the raw image data, and then hand it to the delegate.

Rendering

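Rendering uses a demo layer from Apple that draws a CVImageBufferRef. Under the hood it uses OpenGL. This will be explained in detail later in a dedicated OpenGL series, so it is not elaborated here.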

Summary

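H264 coding is very complex, but because the framework wraps it, using the existing hardware encoding and decoding APIs in everyday projects is actually quite convenient. Understanding the flow and the principles is what matters most. Of course, the demo only implements basic encoding and decoding. Much of the error handling, such as moving to the background, session errors, and resuming in the foreground, has to be considered in a real commercial project.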

Demo download: iOS-VideoToolBox-demo
Give it a try yourself.