ABSTRACT
H.266/VVC adopts a quadtree-plus-multitype-tree (QTMT) coding-unit (CU) split structure that improves coding efficiency at the cost of high time complexity. Speeding up VVC encoding while minimizing quality degradation is critical for practical applications. We propose predicting the coding depth and split type of an optimally coded 32 × 32 CU (CU32 × 32) so that only a subset of the exhaustive rate-distortion optimization (RDO) operations is performed: (1) To predict the depth of an optimally coded CU32 × 32, we train a convolutional neural network (CNNdepth). CNNdepth outputs a label specifying a subset of the depth range, which the controller uses to execute EarlySkip or EarlyTerminate and thereby reduce time complexity. (2) To predict the split type, we train random forest classifiers (RFCtype). If RFCtype classifies a CU32 × 32 as not being of a specified split type, the corresponding RDO operations are omitted. Experiments show that CNNdepth and RFCtype work together seamlessly, reducing execution time by up to 69% and by 39.16% on average, with a 0.7% increase in BDBR compared to the default VTM-7.0. Additionally, the proposed method yields the highest balanced time reduction rate of 61.5%.
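The two pruning steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `depth_decision` and `splits_to_test`, the `(lo, hi)` encoding of the predicted depth range, and the exact semantics assigned to EarlySkip and EarlyTerminate are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Split types named in the Nomenclature: no split, quadtree,
# horizontal/vertical binary tree, horizontal/vertical ternary tree.
SPLIT_TYPES = ["NO", "QT", "HBT", "VBT", "HTT", "VTT"]


def depth_decision(depth: int, depth_range: Tuple[int, int]) -> Tuple[bool, bool]:
    """Return (run_rdo_here, split_further) for a CU at the given depth.

    Assumed semantics (hypothetical): EarlySkip omits RDO at depths
    shallower than the predicted range; EarlyTerminate stops splitting
    once the upper bound of the range is reached.
    """
    lo, hi = depth_range
    run_rdo_here = depth >= lo   # False corresponds to EarlySkip
    split_further = depth < hi   # False corresponds to EarlyTerminate
    return run_rdo_here, split_further


def splits_to_test(rejected: Dict[str, bool]) -> List[str]:
    """Keep every split type that its binary classifier did not rule out."""
    return [s for s in SPLIT_TYPES if not rejected.get(s, False)]


if __name__ == "__main__":
    # Predicted depth range 1..2: at depth 0, skip RDO and split further.
    print(depth_decision(0, (1, 2)))
    # Classifiers reject both ternary splits, so their RDO passes are omitted.
    print(splits_to_test({"HTT": True, "VTT": True}))
```

In this sketch the two predictions are independent filters: the depth range limits which tree levels are evaluated, and the split-type rejections limit which partitions are tried at each evaluated level.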
Nomenclature
BTD/QTD | = | Binary-tree/quadtree split depth |
CCS | = | Coding control system |
CUn×n | = | One CU with size n×n |
CUk | = | The k-th coding unit (32 × 32) |
CUlow/CUhigh | = | One CU from a low/high-resolution video |
CNNdepth(CUk) | = | Depth prediction model for CUk |
classdepth(CUk) | = | The largest BTD in CUk |
= | The center feature vector of the | |
DT | = | Decision tree |
ES/ET | = | Early stop/early termination mode |
HTT/VTT | = | Horizontal/vertical ternary tree split |
HBT/VBT | = | Horizontal/vertical binary tree split |
K | = | Dimension of a label vector (Equation 4) |
ℓi | = | Label of classdepth |
ℓopt | = | The label predicted by CNNdepth(CUk) |
m | = | Batch size in the deep learning process (Equation 4) |
mi | = | The i-th classifier output of RFCmode |
NO | = | Not split mode |
QP | = | Quantization parameter |
QTMT | = | Quadtree plus multitype tree |
RDC | = | Rate-distortion cost |
RDO | = | Rate-distortion optimization |
RFCtype(CUk) | = | The split type classification model |
rfcqt | = | Binary classifier that decides whether to perform a QT split |
= | Ratio of helpful classification of type mi | |
UVG266 | = | A VVC encoder based on the Kvazaar HEVC encoder |
VTM | = | VVC Test Model (reference software) |
VVenC | = | Fast VVC encoder implementation |
= | The i-th input feature vector (Equation 4) | |
= | One-hot and smoothed label vectors of | |
= | k(i) is the index of the correct label of | |
∆T and | = | Time reduction rate and its balanced one |
Acknowledgment
The authors express their gratitude to the reviewers for their insightful comments and suggestions, which have greatly improved this manuscript. Special thanks to Mr. Chong-How Fan for his assistance with editing and formatting during the revision process.
Disclosure statement
No potential conflict of interest was reported by the author(s).