Software optimization of an H.264 decoder

1 Introduction

H.264 builds on the coding standards previously developed by ITU-T, ISO/IEC and other organizations. Like most current international video compression standards, such as H.263, MPEG-2 and MPEG-4, it is a hybrid coding scheme that uses a block-based discrete cosine transform (DCT) and quantization. The block-based DCT offers a high compression ratio, low computational complexity and ease of implementation. H.264 has the following features: a bit rate about 50% lower than H.263+ and MPEG-4 Simple Profile (SP) at comparable quality; strong adaptability to channel delay; improved error recovery; complexity that can be scaled in stages to suit applications with different complexity requirements; and advanced techniques including a 4 × 4 integer transform, intra-frame spatial prediction and quarter-pixel motion estimation. These new techniques deliver higher coding efficiency but greatly increase the complexity of the algorithm. Thanks to these advantages, H.264 has been widely used in high-definition video codec equipment.

Decoding algorithms such as entropy decoding, inverse quantization, inverse transform, intra prediction, inter-frame luma interpolation, inter-frame chroma interpolation and deblocking filtering are called the core modules, and reducing the time these core modules spend waiting is of great significance for speeding up the decoder. Applying the idea of software pipelining on the DSP-BF533 platform, this paper proposes a new optimized design in which the software modules work cooperatively.

2 H.264 decoder principle

The H.264 decoder consists of the following parts: the network abstraction layer (NAL), the VCL buffer, entropy decoding, inverse scan with inverse quantization and inverse transform, inter prediction, intra prediction, the reference frame buffer, and deblocking filtering, as shown in Figure 1. First, NAL units are extracted from the bitstream, and the sequence parameter set, picture parameter set and picture data are parsed from the RBSP. The data and parameters are stored in the VCL buffer and then entropy decoded in the video coding layer (VCL). The entropy decoding module (VLD) parses all parameters, reference picture indices and so on, and supplies the various control information and the residual data.

In the inverse scan, the one-dimensional sequence of quantized transform coefficients is mapped to the corresponding coordinates of a two-dimensional array (matrix). There are two modes: inverse zig-zag scan and inverse field scan. Inverse quantization and inverse transform then follow. Using the decoded mode information, the decoder performs intra or inter prediction, combines the prediction with the residual from inverse quantization and inverse transform, and finally applies the deblocking filter, which greatly reduces the blocking artifacts caused by prediction and quantization and thus yields better subjective image quality and objective performance. The reconstructed picture can also be selected as a reference frame for subsequently decoded pictures.

Figure 1 H.264 decoder principle
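A minimal C sketch of the inverse scan and the 4 × 4 inverse integer transform described above. The two scan tables (frame zig-zag and field) and the transform butterfly follow the H.264 standard; inverse quantization (the QP-dependent scaling) is deliberately omitted, so the transform input is assumed to be already dequantized. The function names are hypothetical and this is not the paper's code.

```c
#include <stdint.h>

/* Raster position (row * 4 + col) of the i-th coefficient in a 4x4 block. */
static const uint8_t zigzag4x4[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};
static const uint8_t field4x4[16] = {
    0, 4, 1, 8, 12, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
};

/* Inverse scan: place the 1-D coefficient sequence into a 4x4 matrix
 * (row-major). field != 0 selects the field scan, else zig-zag. */
void inverse_scan_4x4(const int16_t levels[16], int16_t blk[16], int field) {
    const uint8_t *scan = field ? field4x4 : zigzag4x4;
    for (int i = 0; i < 16; i++)
        blk[scan[i]] = levels[i];
}

/* 4x4 inverse integer transform on dequantized coefficients, in place,
 * including the final (x + 32) >> 6 rounding. Assumes arithmetic right
 * shift of negative ints, as on common compilers. */
void inverse_transform_4x4(int16_t blk[16]) {
    int tmp[16];
    for (int r = 0; r < 4; r++) {            /* horizontal pass (rows) */
        const int16_t *d = &blk[r * 4];
        int e0 = d[0] + d[2], e1 = d[0] - d[2];
        int e2 = (d[1] >> 1) - d[3], e3 = d[1] + (d[3] >> 1);
        tmp[r * 4 + 0] = e0 + e3;
        tmp[r * 4 + 1] = e1 + e2;
        tmp[r * 4 + 2] = e1 - e2;
        tmp[r * 4 + 3] = e0 - e3;
    }
    for (int c = 0; c < 4; c++) {            /* vertical pass (columns) */
        int f0 = tmp[c], f1 = tmp[4 + c], f2 = tmp[8 + c], f3 = tmp[12 + c];
        int e0 = f0 + f2, e1 = f0 - f2;
        int e2 = (f1 >> 1) - f3, e3 = f1 + (f3 >> 1);
        blk[c]      = (int16_t)((e0 + e3 + 32) >> 6);
        blk[4 + c]  = (int16_t)((e1 + e2 + 32) >> 6);
        blk[8 + c]  = (int16_t)((e1 - e2 + 32) >> 6);
        blk[12 + c] = (int16_t)((e0 - e3 + 32) >> 6);
    }
}
```

As a quick check, a block whose only nonzero coefficient is a DC value of 64 transforms to a flat residual of 1 in every position.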

3 DSP-BF533 decoder design and optimization

3.1 Decoder software design block diagram

Based on the characteristics of the DSP-BF533's direct memory access (DMA) controller, a decoding flow that integrates DMA is designed, as shown in Figure 2. Two DMA-related steps are added to an ordinary decoder: step 1 reads data from off-chip memory, and step 2 writes the processed data out to off-chip memory.

The specific process, shown in Figure 2, is as follows: ① split off the top data of the next macroblock, i.e. the data preceding the residual data, which supplies the intra prediction modes, reference picture indices and motion vectors needed for decoding; ② start DMA to read the split data, together with the reference picture indices and motion vectors already decoded; ③ perform intra prediction on the image data; ④ apply inverse quantization and inverse transform to the mapped data read in from the bottom segment; ⑤ reconstruct the image and filter it; ⑥ output the image data between on-chip and off-chip memory through DMA; ⑦ split off the bottom data of the next macroblock and extract its mapped data for use when decoding the next macroblock.

To keep the DSP core from waiting for DMA to read in data, each macroblock's data is divided in advance into top data and bottom data: the top data comprises everything before the residual data, and the rest is the bottom data. If the data has already been divided when a P frame arrives, DMA starts immediately, so that while the DSP core decodes the current macroblock, DMA reads in the next one. If reference data of the current macroblock is needed, it can also be moved into on-chip memory by DMA after decoding. Because the top data of the current macroblock has no reference value for filtering the next macroblock, it is transferred by DMA to external memory. In this design the first macroblock does not go through the decoding process: since the reference pictures and parameters are not set in the initial state, the first macroblock is used only to initialize the decoder's reference pictures and parameters for use when decoding the next macroblock. During decoding, macroblock data division and DMA reads can run in parallel; that is, while the current macroblock is being processed, the parameters of the next macroblock can be set up and its data read in, which reduces the waiting time between modules and improves efficiency. The processes that can run in parallel are shown as elliptical boxes in Figure 2.
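The overlap described above is essentially ping-pong (double) buffering. The following is a minimal C sketch under stated assumptions: `dma_start_read`, `dma_wait`, `decode_mb` and `decode_all` are hypothetical names, not the Blackfin DMA driver API; the DMA is simulated with `memcpy`, so on a host nothing actually overlaps; and the top/bottom data split and parameter setup are not modeled, only the pattern of reading the next macroblock while the current one is decoded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB_BYTES 384  /* 16x16 luma + two 8x8 chroma, 8-bit 4:2:0 */
#define NUM_MB   16   /* 4 x 4 macroblocks, as in the Section 3.2 example */

/* Hypothetical DMA interface. On the BF533 these would program a DMA
 * channel and return at once; memcpy stands in so the sketch runs on a
 * host (the copy is synchronous, so no real overlap happens here). */
static void dma_start_read(uint8_t *dst, const uint8_t *src, size_t n) {
    memcpy(dst, src, n);
}
static void dma_wait(void) {
    /* A real driver would wait for the channel's completion flag here. */
}

/* Placeholder for decoding one macroblock held in on-chip memory:
 * returns a checksum so the data flow can be verified. */
static uint32_t decode_mb(const uint8_t *mb) {
    uint32_t sum = 0;
    for (size_t i = 0; i < MB_BYTES; i++)
        sum += mb[i];
    return sum;
}

/* Ping-pong loop: while the core decodes onchip[cur], DMA fills
 * onchip[cur ^ 1] with the next macroblock from external memory. */
uint32_t decode_all(const uint8_t *ext) {
    static uint8_t onchip[2][MB_BYTES];  /* on the BF533: L1 data SRAM */
    uint32_t total = 0;
    int cur = 0;

    dma_start_read(onchip[cur], ext, MB_BYTES);  /* prime with macroblock 0 */
    dma_wait();
    for (int mb = 0; mb < NUM_MB; mb++) {
        if (mb + 1 < NUM_MB)  /* prefetch the next macroblock */
            dma_start_read(onchip[cur ^ 1], ext + (size_t)(mb + 1) * MB_BYTES,
                           MB_BYTES);
        total += decode_mb(onchip[cur]);  /* on hardware, overlaps the DMA */
        dma_wait();                       /* next buffer is now ready */
        cur ^= 1;
    }
    return total;
}
```

The design choice is that the core only ever touches the buffer that DMA is not writing, so the single `dma_wait` per macroblock is the only synchronization point.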

3.2 A new software pipelining algorithm

In many designs, decoding-parameter preparation, decoding and DMA data output are executed serially, one after another. This design runs the three processes in parallel, making full use of the parallel instruction execution of the DSP-BF533 to reduce the waiting time between software modules.

The following example uses a picture of 4 × 4 macroblocks. The macroblocks are first labeled with coordinates of 4 rows and 4 columns, and program execution is divided into 5 stages whose state labels are 1, 2, 4, 8 and 16 for state-machine calculation, as listed in Table 1. CAVLC is the process that parses the data read in and supplies parameters, reference pictures and other data for the subsequent integration and reconstruction of the image. hl_decode is the advanced decoding process, which reconstructs the image from the prepared data. DMA is the transfer process for the decoded data. Analyzing Table 1 and Table 2: when a new frame arrives, the state label is 1 and only CAVLC runs; at coordinate x = 1, y = 0 the second state (label 2) is entered, and CAVLC and hl_decode run in parallel; at x = 1, y = 1 the third state (label 4) is entered, and all three modules run in parallel; when y > 4 the fourth state (label 8) is entered, in which only hl_decode and DMA run in parallel because CAVLC has finished preparing all macroblocks for decoding; finally, when x > 0, the fifth state (label 16) is entered, and only the DMA module runs.

Consequently, the pipeline spends the first step in state 1, the next 4 consecutive steps in state 2, the next 11 in state 3, 1 step in state 4 and the last 3 steps in state 5.
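The power-of-two state labels make it cheap to decide which modules run in a given state: each module can be given a mask of the states in which it is active, so one bitwise AND per module replaces a chain of comparisons. The sketch below assumes that reading of the labels; the mask names, `RunCounts` and `run_schedule` are hypothetical, and the actual module calls are replaced by counters. The state sequence (1, 4, 11, 1 and 3 steps) is taken from the text.

```c
/* State labels: powers of two so module membership is a single AND. */
enum {
    ST1 = 1,   /* CAVLC only               */
    ST2 = 2,   /* CAVLC + hl_decode        */
    ST3 = 4,   /* CAVLC + hl_decode + DMA  */
    ST4 = 8,   /* hl_decode + DMA          */
    ST5 = 16   /* DMA only                 */
};

/* Hypothetical masks: the set of states in which each module runs. */
#define RUN_CAVLC (ST1 | ST2 | ST3)
#define RUN_HL    (ST2 | ST3 | ST4)
#define RUN_DMA   (ST3 | ST4 | ST5)

typedef struct { int cavlc, hl, dma; } RunCounts;

/* One pipeline step: every module whose mask contains the current state
 * runs. On the DSP the three would overlap; here we only count them. */
static void pipeline_step(unsigned state, RunCounts *rc) {
    if (state & RUN_CAVLC) rc->cavlc++;
    if (state & RUN_HL)    rc->hl++;
    if (state & RUN_DMA)   rc->dma++;
}

/* Walk a state sequence given as (state label, consecutive steps) pairs. */
RunCounts run_schedule(const unsigned states[], const int steps[], int n) {
    RunCounts rc = {0, 0, 0};
    for (int i = 0; i < n; i++)
        for (int k = 0; k < steps[i]; k++)
            pipeline_step(states[i], &rc);
    return rc;
}
```

Fed the sequence from the text (labels 1, 2, 4, 8, 16 for 1, 4, 11, 1 and 3 steps), CAVLC and hl_decode each run 16 times, once per macroblock of the 4 × 4 picture.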

Let A, B and C be the execution times of CAVLC, hl_decode and DMA for one macroblock. The conventional serial algorithm takes T = 16A + 16B + 16C. Assuming hl_decode is the slowest of the three modules (B ≥ A and B ≥ C), every step in which it runs costs B, so the method proposed in this paper takes T2 = A + 16B + 3C, reducing execution time by T − T2 = 15A + 13C.
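The formula for T2 can be checked term by term, assuming (as the formula implies) that a step in which several modules run in parallel costs as much as its slowest active module; the step counts per state are the ones given in the text.

```latex
\begin{aligned}
T       &= 16A + 16B + 16C \\
T_2     &= 1\cdot A + 4\max(A,B) + 11\max(A,B,C) + 1\cdot\max(B,C) + 3\cdot C \\
        &= A + 4B + 11B + B + 3C = A + 16B + 3C \qquad (B \ge A,\; B \ge C) \\
T - T_2 &= 15A + 13C
\end{aligned}
```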

4 Test results

Claire.cif and Paris.cif were tested on the DSP-BF533 test platform. The test analysis shows that the optimization increases the decoding rate and meets real-time application requirements. The results are listed in Table 3.

5 Conclusion

For mobile video terminal applications, a new software pipelining algorithm tailored to the characteristics of the DSP is proposed. It makes the modules cooperate more closely, makes better use of idle time during program execution, reduces program waiting time and increases the decoding rate. Experimental tests show that the program meets real-time decoding requirements for CIF images. Further optimization in future work aims at higher and more reliable decoding efficiency, so that the DSP-BF533-based design can scale across fields ranging from wireless 3G networks and digital TV to IP networks and media storage formats.
