
View synthesis for the mixed-resolution MVD format

Pages 27-31 | Received 28 Sep 2012, Accepted 04 Feb 2013, Published online: 09 Apr 2013

Abstract

The depth information in multiview 3D video sequences is mainly utilized to synthesize new video sequences at virtual viewpoints. Decoded depth maps, however, exhibit typical coding artifacts, such as blocking noise, depending on the compression ratio. Furthermore, the depth maps are sometimes downsampled to reduce the size of the compressed bitstreams. Such defects in depth maps degrade the quality of the synthesized view. This paper proposes a segmentation-based edge-preserving filter and a depth-gradient-based view synthesis algorithm for mixed-resolution multiview videos to improve the quality of the synthesized views.

1. Introduction

To transfer multiview 3D video data, depth information often accompanies the color texture data so that new video sequences can be generated at arbitrary viewpoints. This is because it is impractical to transfer all the information required to cover a wide range of viewing angles of a 3D scene for an arbitrary multiview display. One approach to increasing the efficiency of communicating multiview video data is to send a limited number of texture views along with additional auxiliary data, from which video sequences at any required view position can be generated.

One example of such auxiliary data is depth information. Depth maps are utilized to provide the 3D coordinates of a pixel at a virtual viewpoint in a view synthesis algorithm. Then, it is possible to generate a scene at any desired viewpoint. The multiview video plus depth (MVD) 3D format is composed of color texture videos and the corresponding depth maps. The MVD format has a limited number of views, usually two or three. The MVD format is actively used as a default format for test sequences for 3D video compression tools in MPEG Citation1.
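As background, the depth values themselves are usually stored as quantized 8-bit samples; the conversion below is the convention used in MPEG 3DV depth formats and is not given explicitly in this paper. An 8-bit depth value $v$ maps to scene depth $z$ and horizontal disparity $d$ as

$$\frac{1}{z} = \frac{v}{255}\left(\frac{1}{z_{\text{near}}} - \frac{1}{z_{\text{far}}}\right) + \frac{1}{z_{\text{far}}}, \qquad d = \frac{f\,b}{z},$$

where $[z_{\text{near}}, z_{\text{far}}]$ is the depth range of the scene, $f$ is the focal length, and $b$ is the baseline between the reference and virtual cameras.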

When an image is compressed by a block-based compression algorithm such as H.264/AVC, the decoded output exhibits blocky noise at low bit rates. Such compression noise in depth maps degrades the synthesized virtual views because a noisy depth map provides inaccurate disparity information. Furthermore, the depth maps in 3D multiview video sequences used for view synthesis sometimes have a reduced resolution for efficient compression. Such a mixed-resolution MVD format with three views is illustrated in Figure 1.

Figure 1. Mixed-resolution MVD format with low-resolution depth maps.

The essential point of this approach is that the depth information is used only for virtual view synthesis and not for the display itself. By reducing the resolution of the depth maps, the bit rate of the compressed MVD bit stream can be further reduced without sacrificing the quality of the synthesized views. Contrary to what is expected, however, blurred depth edges in a low-resolution depth map can degrade the synthesized views, especially around the object boundaries. Figure 2 shows the effect of a low-resolution depth map on view synthesis.

Figure 2. Color image warped into a target viewpoint from the (a) left side and (b) right side.

A depth map may be produced at a low resolution not only for compression efficiency but also because of physical limits on sensor resolution when it is captured with a range sensor, such as a time-of-flight (TOF) camera or a Z-cam, or because of complexity constraints when it is estimated from multiview images. Hence, it is important to recover the quality of the depth information by restoring the original resolution before view synthesis. The characteristics of depth maps differ from those of color videos: they generally have less texture and more prominent edges. Furthermore, it is especially important to preserve and enhance the edges around object boundaries in depth maps to improve the synthesized-view quality.

There have been several approaches to improving depth data for view synthesis via edge-preserving filtering, such as the use of a joint bilateral filter to correct misaligned depth boundaries while suppressing the depth estimation noise. In Citation2 Citation3, a joint bilateral filter was applied during down/upsampling to alleviate the aliasing artifact and to improve the temporal stability. Oh et al. Citation4 proposed a depth reconstruction filter based on a bilateral filter to recover the object boundaries distorted by coding errors. A shape-adaptive filter that utilizes the texture information of the color video was proposed in Citation5. In Citation6, extended joint bilateral filtering was applied to upsampling in a spatial domain, and motion-compensated frame interpolation was used for upsampling in the temporal domain.

This paper proposes an edge-preserving depth map filter and a depth-gradient-based view synthesis algorithm for improving the quality of the synthesized views. The proposed filter is based on the segmentation of local blocks of a depth map. It can reduce coding noise, such as blocking artifacts, in a depth map while at the same time enhancing the blurred edges during upsampling. Unlike previous approaches, which reduce synthesis artifacts by correcting the depth edge locations, the proposed approach incorporates credibility into the view synthesis process and defines a credibility map that provides weight parameters for pixel blending.

2. Filtering by segmentation

An edge-preserving filter based on the segmentation of local blocks of a depth map is applied to reduce the noise in the depth map and, at the same time, to enhance its blurred edges. Unlike in joint bilateral filtering approaches, the corresponding color information is not used in the proposed filtering process. Color information is helpful in correcting wrong edges caused by the depth estimation process, but it requires additional frame memory and complexity if hardware implementation is considered. Moreover, the effect of edge correction is small for graphics depth data, whose edges are exactly aligned with the color, and for estimated depth data, whose boundaries are usually extended compared with the color. The objective of the proposed filtering is to enhance the blurred edges and to reduce the coding noise rather than to correct wrong edges.

Within a small local block, the depth data have a small variance if the block does not contain an edge. Around edges, owing to the lack of texture in a depth map, the local block can be assumed to separate into a small number of subregions, as shown in Figure 3.

Figure 3. Filtering by local depth segmentation.

The segmentation filter H substitutes each depth pixel with the representative value of the subregion to which the pixel belongs:

$$H(x) = \arg\min_{m \in M} \left| D(x) - m \right|,$$

where $D(x)$ is the depth value at pixel $x$ and $M$ is the set of representative values of the subregions of the local block around the filtered pixel. Set $M$ can be calculated using any clustering algorithm, such as the k-means clustering algorithm Citation7, which chooses the representatives to minimize the within-subregion deviation:

$$M = \{m_1, \ldots, m_K\} = \arg\min_{m_1, \ldots, m_K} \sum_{k=1}^{K} \sum_{x \in S_k} \left( D(x) - m_k \right)^2,$$

where $S_k$ is the set of pixels assigned to representative $m_k$. The segmentation filter thus employs a simple clustering algorithm in a local region around each filtered depth pixel. Depth maps without noise usually have little texture and clear edges around the objects, so within a small local block, the depth map can be cleanly divided into a small number of subregions. The proposed segmentation filter consists of local block segmentation, subregion merging, and pixel substitution routines.

The segmentation filter first divides a local block into a given number of subregions by applying a simple clustering algorithm, such as the k-means algorithm. The representative value of each subregion may be a mean pixel value, as used in the proposed algorithm. As the filter divides a local region into a given number of subregions around the edges, it can preserve and enhance the blurred edges. Furthermore, as the proposed segmentation filter is able to merge subregions according to the differences between the adjacent subregions, it is possible to reduce the noise around the edges without blurring the object edges.

When the local block is segmented into two subregions using the k-means clustering algorithm, the number of subclasses is fixed, and the local block is always divided into two regions, even when it is flat. This may cause artifacts in the depth maps by enhancing irregularities in flat regions, irregularities that may originate from the depth estimation or measuring process.

Therefore, the filter merges the separated subregions into one if their representative values are similar: the absolute difference between the representative values is compared with a given threshold. The threshold depends on the noise level of the depth map, which in turn is closely related to the quantization parameter used in the encoding process when the depth map is compressed.

In the proposed filter, the segmentation algorithm can be applied to a 1D local block vertically and horizontally as well as two-dimensionally. Through this separable operation, the computational complexity can be reduced, as illustrated in Figure 4. To reduce the complexity further, the 1D segmentation filter can be applied only in the horizontal direction, as is done in this paper. This is a reasonable option for view synthesis that uses only horizontal disparity.

Figure 4. 1D separable segmentation filter.
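To make the procedure concrete, the following Python sketch implements one plausible reading of the horizontal 1D segmentation filter: plain k-means within each local block, merging of subregions whose representative values are closer than a threshold, and substitution of each pixel with its representative. The block size, iteration count, and merge threshold are illustrative assumptions, not values specified in the paper.

```python
import numpy as np

def segment_filter_1d(row, block=8, k=2, merge_thresh=8.0, iters=5):
    # Sketch of the 1D horizontal segmentation filter: within each local
    # block, cluster the depth values into k subregions (1D k-means),
    # merge subregions whose representatives are closer than merge_thresh,
    # and substitute each pixel with the representative of its subregion.
    out = row.astype(np.float64).copy()
    for start in range(0, len(out), block):
        seg = out[start:start + block]                 # view into `out`
        means = np.linspace(seg.min(), seg.max(), k)   # initial representatives
        for _ in range(iters):
            labels = np.argmin(np.abs(seg[:, None] - means[None, :]), axis=1)
            for c in range(k):
                if np.any(labels == c):
                    means[c] = seg[labels == c].mean()
        if means.max() - means.min() < merge_thresh:
            seg[:] = seg.mean()       # flat block: merge into one subregion
        else:
            seg[:] = means[labels]    # edge block: snap to representatives
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def filter_depth_map(depth, **kwargs):
    # Horizontal-only pass, matching the low-complexity option in the text.
    return np.vstack([segment_filter_1d(r, **kwargs) for r in depth])
```

In practice, as noted above, the merge threshold would be tied to the noise level of the depth map, and hence to the quantization parameter of the encoder.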

3. Depth-gradient-based view synthesis

In Citation8 Citation9, an approach to borrowing high-frequency components from adjacent high-resolution depth images was studied. In the mixed-resolution MVD format of Figure 1, however, there are no high-resolution depth maps that can be used for high-frequency compensation. Instead, it can be noticed that the boundaries inside the solid-line rectangles in Figure 2 show clear edges, unlike the boundaries inside the dotted-line rectangles.

This is because, in the solid-line regions, the foreground objects occlude the background during the warping process, and as a result, the boundaries of the foreground objects are preserved. In the dotted-line regions, meanwhile, the background is disoccluded and leaves holes in the corresponding areas. The idea behind the depth-gradient-based view synthesis algorithm is to exploit this observation: for pixel blending, the algorithm imposes larger weights on the left- or right-side boundary pixels that are well preserved in a warped input image. This can be performed by estimating the directionality of the boundaries, which is computed from the gradient of the depth maps of the input images. The process is summarized in Figure 5, where the gradient is obtained using the Sobel filter.

Figure 5. Edge detection and depth gradient estimation.

When the filter is applied, edges rising from left to right generate positive values, and edges falling from left to right produce negative values. These values are normalized between 0 and 1 to constitute a weight, or credibility, map; the credibility is thus a set of weights calculated from the preceding procedures. Before the Sobel filter is applied, low-pass filtering can be used to control the smoothness of the output gradient values. To keep the sign of the gradient values consistent for both the left- and right-side inputs, the direction of the filtering operation must be adjusted accordingly. An example of the resulting credibility map is shown in Figure 6. Pixel blending is then performed using the following equation:

$$P_V = W \cdot P_L + (1 - W) \cdot P_R,$$

where $P_V$ is the pixel value at the target viewpoint, $W$ is the weight parameter from the credibility map, and $P_L$ and $P_R$ are the pixel values from the left- and right-side input images. $W$ is made by adding the two credibility maps, which are computed from the edge maps of the left and right depth information after warping to the target viewpoint. $P_L$ and $P_R$ are likewise the warped results of the left and right input images.
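A minimal sketch of this gradient-and-blending step is given below, assuming min–max normalization of the Sobel output and a normalized sum of the two credibility maps as the final weight; the exact normalization and combination rules, as well as the smoothing strength, are assumptions rather than values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def credibility_map(warped_depth, sign=1.0, smooth=1.0):
    # Horizontal Sobel gradient of a warped depth map, normalized to [0, 1].
    # `sign` flips the gradient so the left and right inputs keep a
    # consistent sign convention; `smooth` controls the optional low-pass
    # filtering applied before the Sobel operator.
    d = gaussian_filter(warped_depth.astype(np.float64), smooth)
    g = sign * sobel(d, axis=1)   # rising edges > 0, falling edges < 0
    g -= g.min()
    if g.max() > 0:
        g /= g.max()              # normalize to [0, 1]
    return g

def blend_views(p_left, p_right, c_left, c_right, eps=1e-6):
    # P_V = W * P_L + (1 - W) * P_R, with W derived from the two warped
    # credibility maps (assumed here to be their normalized sum).
    w = c_left / np.maximum(c_left + c_right, eps)
    return w * p_left + (1.0 - w) * p_right
```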

Figure 6. Credibility map computed from gradient maps.

The proposed depth-gradient-based view synthesis method depends heavily on the accuracy of the edge gradient information of the depth. If the reconstructed depth maps are coded at low bit rates, blocking artifacts can affect the edge detection process. In that case, the segmentation filter described in Section 2 is useful for suppressing such coding noise and for eliminating unintended depth edges.

4. Experimental results

Figure 7 shows the synthesis results from VSRS 3.5 and from the proposed method. For the VSRS configuration, the 1D parallel synthesis mode was used with the ViewBlending option on, which averages the left and right warped images to produce the synthesis result. In the proposed method, the two warped images are blended using the proposed credibility map. An image synthesized using VSRS 3.5 with bilinear depth upsampling is shown in Figure 7(a); in contrast, artifacts are barely noticeable in the image synthesized with the proposed algorithm in Figure 7(b).

Figure 7. View synthesis results from (a) VSRS 3.5 and (b) depth-gradient-based view synthesis.

To show the effect of the proposed segmentation filter on the view synthesis process, a set of three-view video sequences was encoded with four QPs and subsequently decoded using the MPEG 3DV AVC-compatible test model (3DV-ATM) version 0.4. The test conditions were the same as the common test conditions in Citation1.

The decoded output was upsampled through bilinear interpolation in the test model software and then filtered with the proposed segmentation filter. Figure 8 shows the experiment setup and the PSNR computation method. Virtual views were synthesized at six points between the three views, using depth maps processed by the proposed filter. The PSNRs of the filtered and unfiltered synthesized views were computed with respect to the views synthesized from the uncompressed original color textures and depth maps before encoding. The BD bit rate method Citation10 was employed to compare the filtered and unfiltered cases.
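For reference, a minimal sketch of the PSNR computation in this setup, taking the view synthesized from uncompressed texture and depth as the reference; whether the measurement is restricted to the luma component is not stated in the paper, so the function below operates on whatever array it is given.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    # PSNR of a synthesized view against the view synthesized from the
    # uncompressed original texture and depth maps.
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)
```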

Figure 8. Experiment setup and PSNR computation.

Table 1 shows the results of the coding and synthesis experiment. Seven MPEG test sequences (Poznan Hall2, Poznan Street, Undo Dancer, GT Fly, Kendo, Balloons, and Newspaper) were used. The results show that the BD rate gain in terms of the synthesized PSNR was −4.32% on average.

Table 1. Coding and synthesis results for the proposed algorithm
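The BD rate figures reported in Table 1 follow the standard procedure of Citation10; a sketch is given below, with four rate/PSNR points per curve assumed, matching the four-QP setup above.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    # Bjontegaard delta bit rate: fit cubic polynomials of log-rate as a
    # function of PSNR for both curves, integrate them over the overlapping
    # PSNR interval, and convert the mean log-rate gap to a percentage.
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0  # negative = bit rate saving
```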

5. Conclusion

An approach to view synthesis for mixed-resolution multiview videos was proposed. Depth information has less texture and clearer edges than color images, and most synthesis artifacts arise at object boundaries. A segmentation-based filter was applied to enhance the depth edges, and the ringing artifacts around object boundaries were reduced through view synthesis that considers edge gradient information. The effectiveness of the proposed algorithm was demonstrated by coding and synthesis experiments using the MPEG reference software.

References

  • D.R. Heiko Schwarz, Common Test Conditions for 3DV Experimentation, ISO/IEC JTC1/SC29/WG11 Doc. N12745, 2012.
  • O.P. Gangwal and R.-P.M. Berretty, Depth Map Post-Processing for 3D-TV, Int. Conf. on Consumer Electronics, pp. 1–2, 2009.
  • A.K. Riemens, O.P. Gangwal, B. Barenbrug, and R.-P.M. Berretty, Multi-Step Joint Bilateral Depth Upsampling, Proc. SPIE 7257 (VCIP), 2009.
  • K.J. Oh, S. Yea, A. Vetro, and Y.S. Ho, IEEE Signal Process. Lett. 16(9), pp. 747–750, 2009. doi:10.1109/LSP.2009.2024112
  • E. Ekmekcioglu, M. Mrak, S.T. Worrall, and A.M. Kondoz, Electron. Lett. 45(7), pp. 353–354, 2009. doi:10.1049/el.2009.3682
  • J. Choi, D. Min, B. Ham, and K. Sohn, Spatial and Temporal Up-Conversion Technique for Depth Video, IEEE Int. Conf. Image Processing, pp. 3525–3528, 2009.
  • J.B. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297.
  • S.S. Lee, S. Lee, J.J. Lee, and H.C. Wey, View Synthesis for Mixed Resolution Multiview 3D Videos, 3DTV-CON, p. 1, 2011.
  • S.S. Lee, S. Lee, H.C. Wey, and D.S. Park, Virtual View Interpolation at Arbitrary View Points for Mixed Resolution 3D Videos, Proc. SPIE 8288, 2012.
  • G. Bjontegaard, Calculation of Average PSNR Differences between RD-Curves, VCEG Contribution VCEG-M33, 2001.
