Zhao Chun1
Jeon Byeungwoo1
Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Korea
{zhaochun83, bjeon}@skku.edu
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Keywords
Light field representation, Light field coding, All-in-focus image, Depth map, Focal stack reconstruction
1. Introduction
Light field (LF) cameras can capture light coming from all directions at every point
of a scene [1]. The rich data make it possible to realize various applications [2,3] for refocusing, depth estimation, viewing angle change, three-dimensional (3D) object
reconstruction, etc. With the recent explosive interest in implementing and improving
augmented reality (AR) and virtual reality (VR) systems [4], demand is increasing for rich information to provide more realistic visual experiences.
While the light field image is one of the most important content sources, its data
require much more storage space and incur higher transmission costs than conventional images. The collection and
management of such large amounts of LF data are not easy for many practical applications,
and therefore, efficient representation and compression with low computational requirements
are essential for practical light field data storage, transmission, and display [5]. LF data have a lot of redundancy since they capture image information of the same scene
from many different viewpoints [6,7].
While developments in the representation and compression of LF data have so far concentrated
on maximizing overall compression, very little attention has been given
to compression that places special emphasis on keeping certain selected functionalities from
being greatly affected. Among the different application requirements listed in Table
1, it may be desirable for a certain functionality to be less affected by compression
than the others. For example, a smartphone [8,9] in daily casual use may only need the refocusing function for post-processing. In
this regard, the motivation of this paper differs from many existing compression
approaches in that we design a representation and coding scheme for light field images
that preserves the refocusing functionality of LF images as faithfully as possible at
relatively low computational complexity.
The rest of the paper is organized as follows. We briefly review related work in Section
2. Section 3 describes the proposed representation and coding scheme in detail. Experiment
results are given in Section 4, and Section 5 concludes the paper.
Table 1. Light Field Image Coding with Emphasis on Selected Functionality.
Functionality | Coding with special emphasis on a certain functionality
Refocusing | Compression for refocusing can still generate refocused images quite well from the compressed data
Viewing Angle Change | Compression for viewing angle change can still generate images at different viewing angles quite well from the compressed data
Exposure Adjust | Compression for exposure adjustment can still generate images at different exposures quite well from the compressed data
2. Related Work
Recently, many researchers have worked on advanced representation and coding techniques
to reduce redundancy in light field images. Some work provided comprehensive evaluation
of LF image coding schemes after grouping them into two main coding strategies [10-13]. The first strategy relates to the international standards in JPEG Pleno Part 1 (Framework)
[14] and Part 2 (LF coding) [15]. They support MuLE [16] and WaSP [17] as coding modes. Standardization of the JPEG LF image coding framework was described
in [18], and its 4D-Transform coding solution was explained in [19]. The core framework of the second strategy compresses LF data using the High Efficiency
Video Coding (HEVC) scheme by forming multiple views of light field images into one
pseudo-video sequence. Chen et al. [20] proposed a disparity-guided sparse coding scheme for light field data based on structural
key sub-aperture views. Jiang et al. [21] developed an LF compression scheme using a depth image-based view synthesis technique
in which a small subset of views is compressed using HEVC inter-coding tools, and
an entire light field is reconstructed using the subset. Jiang and colleagues [22] introduced another LF compression scheme based on homographic low-rank approximation
in which the LF views are aligned by homography and then compressed using HEVC. Han
et al. [23] compressed a pseudo-video sequence consisting of central sub-aperture images and
a sequence consisting of residual images between the central image and adjacent images
using HEVC.
Additionally, we noted studies on converting a light field to a new representation
before encoding. Le Pendu et al. [24] used light field data for Fourier disparity layer (FDL) representation under which
the root image is encoded with FDL layers. This technique was shown to provide higher
coding performance than JPEG-based MuLE [16] and WaSP [17], which belong to the first category of LF coding schemes. Therefore, in a performance
evaluation of our proposed method, Le Pendu et al.’s FDL-based scheme [24] was one of the anchors for comparison. Duong et al. [25] proposed representing LF data in a focal stack (FS) in order to compress the given
LF data as a pseudo-video sequence using HEVC. This compression scheme was specifically
designed with the refocusing application in mind, showing that about 50% of the data amount
is saved by compressing focal stack data consisting of sampled refocused images instead
of compressing a pseudo-video sequence formed with sub-aperture views. Thus, the encoding
scheme with the FS [25] was also taken for comparison.
In this paper, we keep the refocus functionality from being affected by compression,
as is done in [24] and [25], but in a different way. We represent light field data in the form of one single
all-in-focus (AIF) image and its depth map, both of which are compressed using the
well-known HEVC compression technique. The proposed scheme not only covers the full
refocus range, but also achieves higher compression. Fig. 1 illustrates the proposed scheme together with two well-known anchors of the FDL-based
method [24] and the method compressing the images in a focal stack [25] as a pseudo-video sequence. The proposed representation and compression methods are
shown in Fig. 1(c), and the detailed AIF image rendering and depth map generation are in Fig. 1(d). As illustrated in Fig. 1, a focal stack is generated by shifting and adding, as explained in [25]. Assume there are $K$ refocused images in a focal stack, and the $k$th refocused
image is $I_{k\_org}\left(x,y\right)$, where its distance from the aperture plane
is $F'=\alpha F$, in which $\alpha =F'/F$ is defined as the relative depth. It is written as

$I_{k\_org}\left(x,y\right)=\frac{1}{N}\sum _{\left(u,v\right)}L^{\left(u,v\right)}\left(x+u\left(1-\frac{1}{\alpha }\right),\,y+v\left(1-\frac{1}{\alpha }\right)\right),$          (1)

where $L^{\left(u,v\right)}$ represents a sub-aperture image at position $\left(u,v\right)$
from the main lens, $N$ is the number of sub-aperture images, and $\left(u\left(1-\frac{1}{\alpha }\right),v\left(1-\frac{1}{\alpha }\right)\right)$ is the shift offset in the $x,y$ direction.
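For illustration only, the following Python sketch implements the shift-and-add refocusing of (1) and stacks several relative depths into a focal stack; the (U, V, H, W) array layout, the function names, and the bilinear shifting via scipy.ndimage.shift are assumptions made for this sketch and are not part of any cited codec.

```python
# Illustrative shift-and-add refocusing (Eq. (1)); layout and names are assumed.
import numpy as np
from scipy.ndimage import shift

def refocus(subviews, alpha):
    """Refocus at relative depth alpha = F'/F.

    subviews : float array of shape (U, V, H, W) holding sub-aperture images
               L^(u,v); (u, v) are measured from the center of the view grid.
    """
    U, V, H, W = subviews.shape
    uc, vc = (U - 1) / 2.0, (V - 1) / 2.0
    acc = np.zeros((H, W), dtype=np.float64)
    for ui in range(U):
        for vi in range(V):
            u, v = ui - uc, vi - vc
            # shift offset (u(1 - 1/alpha), v(1 - 1/alpha)) in the (x, y) direction
            dx = u * (1.0 - 1.0 / alpha)
            dy = v * (1.0 - 1.0 / alpha)
            acc += shift(subviews[ui, vi], (dy, dx), order=1, mode='nearest')
    return acc / (U * V)

def build_focal_stack(subviews, alphas):
    """Focal stack I_k_org, one slice per relative depth in alphas."""
    return np.stack([refocus(subviews, a) for a in alphas], axis=0)
```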
3. The Proposed Scheme
In this section, we describe the proposed representation and compression scheme, which
preserves the refocusing functionality as much as possible under compression. Unlike existing
methods that encode sub-aperture image sequences [20,21,23], the focal stack [25], or the hierarchical FDL [24], we first represent light fields as all-in-focus images and a depth map, and then
encode them. During decoding, a focal stack consisting of multiple images having different
focus levels is reconstructed from the compressed all-in-focus image using the depth
map. Fig. 1(c) shows the main structure of the proposed framework, which consists of three parts: refocusing
representation, all-in-focus image and depth map generation, and focal stack
reconstruction at the decoder.
3.1 Proposed Representation
The proposed light field representation aims at faithfully maintaining refocusing
functionality during compression. The refocusing functionality refers to how flexibly
and accurately a desired refocused image can be generated. The array of refocused
images is called a focal stack [26]. However, such a focal stack demands a huge volume of data.
Fig. 1. Different frameworks for light field representation and coding: (a) coding with the FDL model [24]; (b) coding with the focal stack [25]; (c) the proposed scheme with emphasis on the refocusing capability; (d) generation of the all-in-focus image and depth map in the proposed scheme.
In the proposed scheme, the AIF image and the depth map are used to represent the
light field image to be encoded and transmitted for applications that put the emphasis
on the refocusing functionality. The all-in-focus image and the depth map can replace
a focal stack since the conversion between them is bi-directional: the all-in-focus image and the
depth map can be generated from a focal stack, and the focal stack can be reconstructed
from the all-in-focus image and the depth map as well. Refocused images at any depth
can be generated from the decoded AIF image and the depth map by using a defocusing
filter. These two conversions are used before encoding and after decoding, respectively.
The AIF and the depth map can effectively provide refocusing functionality.
The advantages of the proposed scheme are analyzed below. The first advantage is the
refocus coverage range. Since users may like to refocus at any depth, the refocusing
capability should be able to cover all potential refocusing ranges. Duong et al. [25] represented the light field with a focal stack that includes 24 refocused images
before compression, and thus, the refocusing range is limited to the 24 images. However,
since the AIF and the depth map data in the proposed scheme are encoded and transmitted,
any refocused image can be generated from the decoded AIF image with help from the
depth map by using a defocusing filter. Second, in terms of storage, representation
and compression using sub-aperture images [20,21,23], a focal stack [25], or the hierarchical FDL [24] are much heavier than the proposed scheme, which deals with only one AIF image
and one gray-level depth map. The third advantage is the generation complexity of the
the refocused image at the decoder. Complexity is an important factor in practical
applications. In the FDL scheme [24], the refocused image generation process must convert the Fourier disparity layers
to sub-aperture images; it further calculates shifting slopes and adds all sub-aperture
images for display rendering. The focal stack-based scheme [25] compresses only a few sampled depth slices, so pixel-wise interpolation must
be executed among the relevant neighboring sampled depth slices whenever the target refocus
depth is not one of the sampled depths. In our case, however, any refocused image can be generated directly.
There are in-focus pixels and out-of-focus pixels in one refocused image. The in-focus
pixels are directly obtained from the all-in-focus image, and the out-of-focus pixels
are obtained by defocusing the all-in-focus image using a predefined filter.
Our proposed method demands very low computational complexity.
Fig. 2. Volumetric comparison of light field refocusing representations.
Fig. 3. The proposed difference focus measure with adaptive refinement.
3.2 All-In-Focus Image and Depth Map Generation
To generate the AIF image and the depth map, we investigated several state-of-the-art
methods. There are learning-based depth map estimation algorithms, most of which are
based on a fully convolutional neural network [27-30], and they provide high accuracy but with high complexity. On the other hand, rule-based
depth estimation methods [31] and AIF image rendering methods [34] that utilize a focal stack have relatively low complexity. In this paper, in order
to make the trade-off between accuracy and complexity in the encoding process as a
whole, we utilize the focal stack to render both the all-in-focus image and the depth
map. Therein, we define the focus map, which indicates how well a given pixel is focused.
The degree of focus is measured by a selected focus measure [32,33]. The more in-focus a pixel is, the higher its value in the focus map. Note that in
out-of-focus regions where blurred texture-rich pixels, blurred edges, or artifacts
are statistically abundant, most of the well-known focus measures, such as LAP2 [35], STA2 [36], GRA7 [37], and RDF [31], may respond erroneously, because such structures exhibit the high variance that is typically seen
only in in-focus regions. To overcome this problem, a new, very simple focus measure,
named the difference focus measure, is proposed, as shown in Fig. 3.
The difference between two focus maps, one from focal stack $I_{k\_org}$ and the
other from guided-filtered focal stack $I_{k\_GF}$, is designed to counteract this
variance. The difference focus map $F_{k\_d}$ is defined as

$F_{k\_d}=F_{k\_org}-F_{k\_GF},$          (3)

where $F_{k\_org}=FM\left(I_{k\_org}\right)$ and $F_{k\_GF}=FM\left(I_{k\_GF}\right)$,
in which $I_{k\_GF}=G\left(I_{k\_org},I_{k\_org}\right)$ is a smoothed focal stack that preserves boundaries while
smoothing other regions with a guided filter $G\left(.\right)$ [38]. $FM\left(.\right)$ indicates
the focus measure of choice; in this paper, the ring difference filter [31] is selected owing to its robustness, which comes from incorporating both local and non-local
characteristics in the filtering window. An example of the difference focus map is
shown in Fig. 4. The first-row images show a case in which the entire image is out-of-focus; the proposed
difference focus map correctly indicates a focus level of zero, whereas the focus maps from
$I_{k\_org}$ or $I_{k\_GF}$ incorrectly detect the out-of-focus edge areas as
in-focus regions.
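A minimal Python sketch of the difference focus map in (3) is given below; a local-variance focus measure stands in for the ring difference filter [31], and a small self-guided box-filter implementation stands in for the fast guided filter [38], so the function names, radius, eps, and window sizes are illustrative assumptions only.

```python
# Sketch of the difference focus map F_k_d = FM(I_k_org) - FM(I_k_GF), Eq. (3).
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter_self(img, radius=4, eps=1e-3):
    """Edge-preserving smoothing G(I, I) with the image as its own guide."""
    size = 2 * radius + 1
    mean = uniform_filter(img, size)
    var = uniform_filter(img * img, size) - mean * mean
    a = var / (var + eps)          # a -> 1 near edges, -> 0 in flat areas
    b = mean - a * mean
    return uniform_filter(a, size) * img + uniform_filter(b, size)

def focus_measure(img, size=9):
    """Local-variance focus measure (simple stand-in for RDF [31])."""
    mean = uniform_filter(img, size)
    return uniform_filter(img * img, size) - mean * mean

def difference_focus_map(I_k_org):
    """Difference focus map of one focal stack slice."""
    I_k_gf = guided_filter_self(I_k_org)
    return focus_measure(I_k_org) - focus_measure(I_k_gf)
```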
Additionally, adaptive refinement is applied to the proposed difference focus map,
$F_{k\_d}$, to more clearly distinguish in-focus and out-of-focus regions. The in-focus
region (the white region) in $F_{k\_d}$ is enhanced, while the out-of-focus region (the
black region) in $F_{k\_d}$ is smoothed with a Gaussian filter to remove occasional
errors caused by noise or artifacts. To avoid gaps, a blending process between $F_{k\_d}$
and its refined version is executed to generate the final focus map, $F_{k}$.
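The refinement and blending step can be sketched as follows; the threshold, gain, Gaussian width, and blending weight are illustrative assumptions, since the exact refinement operators are an encoder-side design choice.

```python
# Sketch of the adaptive refinement and blending of the difference focus map.
import numpy as np
from scipy.ndimage import gaussian_filter

def refine_focus_map(F_d, thr=0.5, gain=1.5, sigma=2.0, w=0.5):
    F = F_d / (F_d.max() + 1e-12)                    # normalize to [0, 1]
    enhanced = np.minimum(gain * F, 1.0)             # boost in-focus responses
    smoothed = gaussian_filter(F, sigma)             # suppress isolated errors
    refined = np.where(F >= thr, enhanced, smoothed)
    return w * F + (1.0 - w) * refined               # blended final map F_k
```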
To render the all-in-focus image as seen in Fig. 1(d), the best in-focus pixel at each position is collected. That is, for a pixel
at position $\left(x,y\right)$, its best in-focus value is selected from among $I_{1\_org}\left(x,y\right),\,I_{2\_org}\left(x,y\right),\,\ldots ,\,I_{K\_org}\left(x,y\right)$ by referring to the focus maps $F_{k}\left(x,y\right)$, $k=1,\ldots ,K$. The one giving the maximum focus at position $\left(x,y\right)$ from among the $K$ refocused
images is selected as the best in-focus pixel, and its image index, denoted by $k_{\max}\left(x,y\right)$, is decided as follows:

$k_{\max}\left(x,y\right)=\underset{k\in \left\{1,\ldots ,K\right\}}{\arg \max }\,F_{k}\left(x,y\right).$
Fig. 4. An example of the proposed difference focus map: (a) an image in focal stack $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{org}}$; (b) focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{org}}$ from image $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{org}}$; (c) focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{GF}}$ from guided-filtered image $\boldsymbol{I}_{\boldsymbol{k}\_ \boldsymbol{GF}}$; (d) the proposed difference focus map $\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{org}}-\boldsymbol{F}_{\boldsymbol{k}\_ \boldsymbol{GF}}$.
Fig. 5. AIF images and depth maps (1st and 2nd rows are comparisons of the rendered AIF images; the 3rd row is a comparison of generated depth maps): (a) Jeon et al.’s method [31]; (b) Chantara and Ho’s method [34]; (c) the proposed difference focus measure with adaptive refinement. GT: ground truth.
The best in-focus pixels at all $\left(x,y\right)$ positions are collected to form
the rendered all-in-focus image as described in

$AIF\left(x,y\right)=I_{k_{\max}\left(x,y\right)\_org}\left(x,y\right).$

Depth map $D$ is a collection of the pixel-wise indices to the focal stack images that
indicate the maximum focus:

$D\left(x,y\right)=k_{\max}\left(x,y\right).$
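A compact Python sketch of this collection step is given below; the array shapes and function names are assumptions for illustration.

```python
# Sketch of AIF image and depth map generation by picking, at each pixel,
# the focal stack slice with the maximum focus value.
import numpy as np

def aif_and_depth(focal_stack, focus_maps):
    """focal_stack, focus_maps: arrays of shape (K, H, W)."""
    D = np.argmax(focus_maps, axis=0)        # k_max(x, y) for every pixel
    yy, xx = np.indices(D.shape)
    AIF = focal_stack[D, yy, xx]             # best in-focus pixel per position
    return AIF, D
```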
Fig. 6. The proposed focal stack reconstruction.
A comparison experiment was carried out for the proposed method and two state-of-the-art
methods [31,34] that also utilize a focal stack. The AIF image and depth map results shown in Fig. 5 demonstrate that the proposed method is much closer to the ground truth, producing results that are cleaner, with higher contrast, better quality, and fewer artifacts.
3.3 Proposed Focal Stack Reconstruction
At the decoder, the focal stack is reconstructed from the AIF image and its depth
map. The proposed reconstruction method is explained in this section. The number of
images in the focal stack corresponds to the resolution of the depth map; each depth level
corresponds to one image in the focal stack.
In generating refocus image $I_{k\_ est}$, which is focused at the $k$th depth, there
are two cases to consider: in-focus and out-of-focus pixels. For in-focus pixels,
that is, $D\left(x,y\right)=k$, the pixel values are directly available in the AIF
image; for out-of-focus pixels, that is, $D\left(x,y\right)\neq k$, the pixel values
are obtained by defocusing the AIF image with a blur filter where the defocusing strength
depends on the distance between depth $D\left(x,y\right)$ and target depth $k$. When
refocusing at depth level $k$, the estimated $k$th image in the focal stack, $I_{k\_est}\left(x,y\right)$,
at position $\left(x,y\right)$ is computed as follows:

$I_{k\_est}\left(x,y\right)=\begin{cases}AIF\left(x,y\right), & D\left(x,y\right)=k\\ \left(f\left(\sigma \right)\ast AIF\right)\left(x,y\right), & D\left(x,y\right)\neq k\end{cases}$          (7)

where $f\left(\sigma \right)$ is a defocusing filter in which Gaussian blur is used,
and $\ast$ is the convolution operator. A higher value for $\sigma $ indicates a higher
blur strength. The defocusing filter parameter $\sigma $ is a function of the depth distance
$\Delta k=\left| k-D\left(x,y\right)\right|$ between target focus depth $k$ and depth level $D\left(x,y\right)$
at the given pixel position $\left(x,y\right)$:

$\sigma =g\left(\Delta k\right).$          (8)
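A simple Python sketch of the reconstruction in (7) and (8) for a single-channel AIF image is shown below; it pre-blurs the AIF image once per depth distance and gathers pixels by mask, and the coefficients inside sigma_of_dk are placeholders standing in for the linear model fitted in Section 3.3.

```python
# Sketch of focal stack reconstruction from the AIF image and depth map.
import numpy as np
from scipy.ndimage import gaussian_filter

def sigma_of_dk(dk, a=0.5, b=0.0):
    """Assumed linear model sigma = g(dk); a and b are placeholders."""
    return a * dk + b

def reconstruct_slice(AIF, D, k):
    """Estimate the k-th focal stack image, Eq. (7)."""
    est = np.empty_like(AIF)
    for dk in np.unique(np.abs(D - k)):
        mask = np.abs(D - k) == dk
        if dk == 0:
            est[mask] = AIF[mask]                                    # in-focus: copy
        else:
            est[mask] = gaussian_filter(AIF, sigma_of_dk(dk))[mask]  # defocused AIF
    return est

def reconstruct_focal_stack(AIF, D, K):
    """Reconstruct all K slices (depth labels assumed to run over 0 ... K-1)."""
    return np.stack([reconstruct_slice(AIF, D, k) for k in range(K)], axis=0)
```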
Fig. 6 depicts our method for focal stack reconstruction. The marked rectangles are local
areas at different depth levels. Depending on the related depth level in the depth
map, the defocusing strength of the green rectangle is weak, while the blur strength
of the orange rectangle is strong. The blur strength is represented by parameter
$\sigma $, and therefore, a proper blur parameter is essential in order to generate
the focal stack accurately. We define the difference $V$ between the pixels generated in
the focal stack from (7) and the pixels in the original focal stack from (1) and (2) as follows:

$V=\sum _{\left(x,y\right)}\left| I_{k\_est}\left(x,y\right)-I_{k\_org}\left(x,y\right)\right| .$
Note that a smaller value for $V$ implies a more accurate $\sigma $ value.
Parameter $\sigma $ is a function of $\Delta k$, as shown in (8). To define function $g\left(.\right)$, we first select $N$ pairs of $\left(\Delta
k,\sigma \right)$ values, and then these $N$ pairs are fitted to a linear function,
as shown in Fig. 7(b).
Regarding the $N$ pairs of $\left(\Delta k,\sigma \right)$ values, $\Delta k$ should
be set to cover the range $\Delta k=1,2,\ldots ,K-1$. For each $\Delta k$, an
appropriate $\sigma $ value is found by a full search, whose flowchart
is shown in Fig. 7(a). For example, to estimate pixel $I_{k\_est}\left(x,y\right)$ with $\Delta k=1$,
an appropriate $\sigma $ value is found as follows: using an initial value $\sigma =\sigma _{0}$, calculate $V=V_{0}$ with (7) and the definition of $V$ above, and set the search direction $g=1$; update $\sigma =\sigma +g\times \Delta \sigma $ and calculate $V_{i}$;
if $V_{i}<V_{i-1}$, keep the sign of $g$ the same as before and update $\sigma
=\sigma +g\times \Delta \sigma $ again; otherwise, change the sign of $g$ to its opposite,
$g=g\times \left(-1\right)$, and update $\sigma =\sigma +g\times \Delta \sigma $;
keep updating the $\sigma $ value until $V_{i}<V_{THD}$ or $i>I$. Here, $V_{THD}$
is a predefined threshold for a small $V$ value, and $I$ is a predefined maximum number of
iterations. The $N$ pairs are clustered into different groups according to $\Delta
k$ and are then curve-fitted using a linear function model; the fitted linear
function is presented in Fig. 7(b). This fitting model shows that the higher the depth distance $\Delta k$,
the higher the defocusing filter strength parameter $\sigma $.
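The search for an appropriate $\sigma$ at a given depth distance, and the subsequent linear fit, can be sketched as follows; the cost $V$ is taken here as the sum of absolute differences over the considered region, and the initial value, step size, threshold, and iteration limit are illustrative assumptions.

```python
# Sketch of the sign-flipping search for sigma at one depth distance (Fig. 7(a)).
import numpy as np
from scipy.ndimage import gaussian_filter

def search_sigma(AIF, region, I_org_region, sigma0=1.0, step=0.25,
                 v_thd=1.0, max_iter=50):
    """region: boolean mask of pixels at the given depth distance.
    I_org_region: the corresponding pixels of the original focal stack slice."""
    def cost(sig):
        return np.abs(gaussian_filter(AIF, max(sig, 1e-3))[region]
                      - I_org_region).sum()

    sigma, g = sigma0, 1.0
    v_prev = cost(sigma)
    for _ in range(max_iter):
        sigma += g * step
        v = cost(sigma)
        if v >= v_prev:
            g = -g                 # reverse the search direction
        if v < v_thd:
            break
        v_prev = v
    return sigma

# The collected (dk, sigma) pairs are then fitted to a linear model
# sigma = g(dk), e.g. coeffs = np.polyfit(dk_values, sigma_values, 1).
```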
Fig. 7. Decision on the defocusing filter parameter $\boldsymbol{\sigma }$: (a) the search process for $\boldsymbol{\sigma }$ (defocusing filter parameter); (b) linear fitting of defocusing filter parameter $\boldsymbol{\sigma }$.
4. Performance Evaluation
In this section, we compare the proposed method with two state-of-the-art representation
and compression methods: one is Le Pendu’s method [24], which represents a light field image as the Fourier Disparity Layer and encodes
the FDL layers as a pseudo-sequence using HEVC; the other is Duong’s method [25], which converts a light field image to a focal stack, and compresses it as a pseudo-video
sequence using HEVC. In the experiment, the proposed method also employs the HEVC
reference software (HM) version 16.17 [39] for encoding and decoding to keep the same test conditions as the two state-of-the-art
methods. The configuration of the encoder is set as follows: the GOP structure is
I-B-B-B, as in [40]. The tests use the six LF images (I01 to I06) in the JPEG Pleno dataset [41] (Bikes, Danger de Mort, Flowers, Stone Pillars Outside, Fountain Vincent 2, and Ankylosaurus
and Diplodocus 1), captured with a Lytro Illum camera.
The performance comparison was made in terms of both the $PSNR$ of the YUV video and the refocusing
capability loss due to compression. The $PSNR$ values of the individual focal stack images were
averaged to obtain a representative PSNR value associated with the LF data. It is
denoted as LF-PSNR and computed as in (10), where $I_{k\_comp}$ is the $k$th focal stack image reconstructed with (7) at the decoder, and $I_{k\_org}$ is the anchor focal stack image rendered from the
light field data as in (1).
The LF-PSNR performance of the proposed method and the two anchor methods is compared
in Fig. 8, with LF-PSNR calculated as

$LF-PSNR=\frac{1}{K}\sum _{k=1}^{K}PSNR\left(I_{k\_comp},I_{k\_org}\right).$          (10)
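For reference, a small sketch of this LF-PSNR computation over a reconstructed focal stack is given below; it assumes 8-bit images and a simple per-slice MSE.

```python
# Sketch of LF-PSNR: per-slice PSNR between the reconstructed and anchor focal
# stacks, averaged over the K slices (8-bit peak assumed).
import numpy as np

def lf_psnr(stack_comp, stack_org, peak=255.0):
    psnrs = []
    for I_comp, I_org in zip(stack_comp, stack_org):
        mse = np.mean((I_comp.astype(np.float64) - I_org) ** 2)
        psnrs.append(10.0 * np.log10(peak ** 2 / max(mse, 1e-12)))
    return float(np.mean(psnrs))
```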
Fig. 8 demonstrates that our proposed method attained the highest LF-PSNR among the three
methods, especially at low bits per pixel (bpp). For example, for the I01 image, when
bpp was 0.01, the proposed method’s LF-PSNR was 1.24 dB higher than FDL [24] and 1.74 dB higher than FS representation and compression [25]. When bpp was 0.02, the proposed method’s LF-PSNR values were 0.34 dB higher than
FDL [24] and 0.09 dB higher than FS [25]. Over the I01 to I06 results, the average LF-PSNR gain was about 2.38 dB and 1.60 dB
over FDL [24] and FS [25], respectively, at bits per pixel less than 0.03 in most cases. In the other methods,
a higher number of bits per pixel leads to less compression loss in the representation of LF
data sent to the encoder, that is, the Fourier disparity layers in FDL [24] or the focal stack in FS [25], and thus the focal stack PSNR increases with the reduced coding loss at
higher bits per pixel. In our scheme, what is sent to the encoder are the all-in-focus images
and depth maps from which the focal stacks are reconstructed. While the compression
loss of the depth map decreases at higher bits per pixel, unless the accuracy
of the estimated depth map itself is sufficiently high, the consequent PSNR increase in
the focal stack is expected to be limited, even as the bits per pixel increase.
This explains why the focal stack PSNR of the proposed scheme was not
always higher than that of the other methods at high bits per pixel. It also suggests future
research on improving the accuracy of the depth map estimation, so our scheme
can keep gaining PSNR at higher bits per pixel as well.
We also analyzed the refocusing capability loss, $LF-RL$, evaluated as the ratio of absolute
differences between the two focus maps, $F_{k\_comp}$ and $F_{k\_org}$, as calculated
in (12), where RL stands for refocusing loss. $F_{k\_comp}$ is computed from the reconstructed
focal stack $I_{k\_comp}$, that is, $F_{k\_comp}=FM\left(I_{k\_comp}\right)$,
and $F_{k\_org}$ is the focus map of the original (that is, uncompressed) focal stack
image $I_{k\_org}$, that is, $F_{k\_org}=FM\left(I_{k\_org}\right)$. In the
experiment, we set $K=64$, which was the depth map resolution. For focus measure operator
$FM$ in (3), the proposed difference focus measure described in Section 3.2 was applied. The range of
the refocusing capability loss $LF-RL$ is 0 to 1, where a higher value indicates
higher loss.
Fig. 8. PSNR comparison of the proposed method with the state-of-the-art FDL representation & compression [24] and FS representation & compression [25].
Fig. 9. Refocusing capability loss ($\boldsymbol{LF}-\boldsymbol{RL}$) comparison of the proposed method with the state-of-the-art FDL representation & compression [24] and FS representation & compression [25].
Fig. 9 compares the refocusing capability loss ($LF-RL$) for the proposed and the two state-of-the-art
methods. The result shows that the proposed method attained the minimum loss in refocusing
capability at the same compression ratio. For example, at bpp = 0.01, FDL [24], FS [25], and the proposed method had refocusing capability losses of 0.30, 0.35, and 0.16,
respectively. That means the refocusing capability loss under the proposed method
was smaller by 14% and 19% compared to FDL [24] and FS [25], respectively. Thus, in practical applications with low transmission rates
or limited storage space, such as mobile phones or head-mounted display devices, the
proposed method is a good choice.
In our experiment, different bits per pixel are realized with different QP settings
from 17 to 42. Fig. 9 also indicates that the refocusing capability loss was less than 0.2 when bpp≤0.05
(about QP$\leq $32). The refocusing capability is perceived as almost intact when
$LF-RL$≤0.2 according to our internal subjective perceptual evaluation. Thus, coding
with QP$\leq $32 is thought of as an allowable range for practical applications as
far as refocusing functionality is concerned.
5. Conclusion
In this paper, we have presented an efficient representation and coding scheme for
light field data, designed with special attention to keeping the refocusing functionality
as uncompromised as possible. We designed a scheme in which LF data are represented
by an all-in-focus image and a depth map, and the AIF/depth map pair is encoded
with HEVC. After decoding, the refocused focal stack is estimated by convolving the
decoded all-in-focus image with a defocusing filter whose strength is controlled
according to the desired focus level. Our experiment results indicated that, at the
same compression ratio, the proposed representation and coding strategy had a 2.38 dB
average PSNR improvement compared to the state-of-the-art Le Pendu FDL method [24], and a 1.60 dB improvement over Duong’s FS representation and coding method [25]. At the decoder, the proposed method also attained smaller refocusing capability loss,
16.2% and 17.8% lower than the two well-known state-of-the-art methods [24,25], respectively. The proposed representation and coding approach with an all-in-focus image and a
depth map was shown to provide good compression performance while maintaining the
refocusing capability very well.
ACKNOWLEDGMENTS
This research was supported by the Basic Science Research Program through the National
Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2020R1A2C2007673).
REFERENCES
Li H., Guo C., Jia S., 2017, High-Resolution Light-Field Microscopy, in Frontiers
in Optics 2017, OSA Technical Digest (online) (Optica Publishing Group), paper FW6D.3,
Tsai D., Dansereau D. G., Peynot T., Corke P., 2017, Image-based visual servoing with
light field cameras, IEEE Robotics and Automation Letters, Vol. 2, No. 2, pp. 912-919
Dricot A., Jung J., Cagnazzo M., Pesquet B., Dufaux F., Kovács P. T., Adhikarla V. K., 2015, Subjective
evaluation of Super Multi-View compressed contents on high-end light-field 3D displays,
Signal Processing: Image Communication, Vol. 39, pp. 369-385
Vetro A., Yea S., Matusik W., Pfister H., Zwicker M., Mar. 29, 2011, Method and system
for acquiring, encoding, decoding and displaying 3D light fields, U.S. Patent No.
7,916,934
Wu G., et al. , 2017, Light field image processing: An overview, IEEE Journal of Selected
Topics in Signal Processing, Vol. 11, No. 7, pp. 926-954
Rerabek M., Bruylants T., Ebrahimi T., Pereira F., Schelkens P., ICME 2016 grand challenge:
Light-field image compression, Call for proposals and evaluation procedure 2016.
Takahashi K., Naemura T., 2006, Layered light-field rendering with focus measurement,
Signal Processing: Image Communication, Vol. 21, No. 6, pp. 519-530
Kim M., et al. , Mobile terminal and control method for the mobile terminal, 2018
Nov.20, US10135963B2
Light Field Selfie Camera for smartphones, Wooptix Company
Brites C., Ascenso J., Pereira F., Jan. 2021, Lenslet Light Field Image Coding: Classifying,
Reviewing and Evaluating, in: IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 31, No. 1, pp. 339-354
Viola I., Řeřábek M., Ebrahimi T., 2017, Comparison and evaluation of light field
image coding approaches, IEEE Journal of selected topics in signal processing, Vol.
11, No. 7, pp. 1092-1106
Conti C., Soares L. D., Nunes P., 2020, Dense Light Field Coding: A Survey, IEEE Access,
Vol. 8, pp. 49244-49284
Avramelos V., Praeter J. D., Van Wallendael G., Lambert P., Jun. 2019, Light field
image compression using versatile video coding, in: Proc. IEEE 9th Int. Conf. Consum,
Electron, pp. 1-6
2020, ISO/IEC 21794-1:2020 Information technology - Plenoptic image coding system
(JPEG Pleno) - Part 1: Framework
2021, ISO/IEC 21794-2:2021 Information technology - Plenoptic image coding system
(JPEG Pleno) - Part 2: Light field coding
de Carvalho M. B., Pereira M. P., Alves G., da Silva E. A. B., Pagliari C. L., Pereira
F., et al. , Oct. 2018, A 4D DCT-based lenslet light field codec, in: Proc. 25th IEEE
Int. Conf. Image Process. (ICIP), pp. 435-439
Astola P., Tabus I., Nov. 2018, Hierarchical warping merging and sparse prediction
for light field image compression, in: Proc. 7th Eur. Workshop Vis. Inf. Process.
(EUVIP), pp. 1-6
Astola P., da Silva Cruz L. A., et al. , Jun. 2020, JPEG Pleno: Standardizing a coding
framework and tools for plenoptic imaging modalities, ITU J. ICT Discoveries, Vol.
3, No. 1, pp. 1-15
De Oliveira Alves G., et al. , 2020, The JPEG Pleno Light Field Coding Standard 4D-Transform
Mode: How to Design an Efficient 4D-Native Codec, IEEE Access, Vol. 8, pp. 170807-170829
Chen J., Hou J., Chau L. P., 2017, Light field compression with disparity-guided sparse
coding based on structural key views, IEEE Transactions on Image Processing, Vol.
27, No. 1, pp. 314-324
Jiang X., Le Pendu M., Guillemot C., 2017, Light field compression using depth image
based view synthesis, in: International Conference on Multimedia & Expo Workshops
(ICMEW), IEEE, pp. 19-24
Jiang X., Le Pendu M., Farrugia R. A., Guillemot C., 2017, Light field compression
with homography-based low-rank approximation, IEEE Journal of Selected Topics in Signal
Processing, Vol. 11, No. 7, pp. 1132-1145
Han H., Xin J., Dai Q., Sep. 2018, Plenoptic image compression via simplified subaperture
projection, Pacific Rim Conference on Multimedia, Springer, Cham, pp. 274-284
Le Pendu M., Ozcinar C., Smolic A., 2020, Hierarchical Fourier Disparity Layer Transmission
For Light Field Streaming, in: IEEE International Conference on Image Processing (ICIP),
pp. 2606-2610
Duong V. V., Canh T. N., Huu T. N., Jeon B., Dec. 2019, Focal stack based light field
coding for refocusing applications, Journal of Broadcast Engineering, Vol. 24, No.
7, pp. 1246-1258
Ng R., Levoy M., Brédif M., et al. , 2005, Light field photography with a hand-held
plenoptic camera, Computer Science Technical Report CSTR, Vol. 2, No. 11, pp. 1-11
Shin C., Jeon H. G., Yoon Y., Kweon I. S., Kim S. J., 2018, Epinet: A fully-convolutional
neural network using epipolar geometry for depth from light field images, in: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4748-4757
Mun J. H., Ho Y. S., 2018, Depth Estimation from Light Field Images via Convolutional
Residual Network, in: Asia-Pacific Signal and Information Processing Association Annual
Summit and Conference (APSIPA ASC), IEEE, pp. 1495-1498
Li K., Zhang J., Sun R., Zhang X., Gao J., 2020, EPI-based Oriented Relation Networks
for Light Field Depth Estimation, arXiv preprint arXiv:2007.04538
Zhou W., Zhou E., Yan Y., Lin L., Lumsdaine A., 2019, Learning Depth Cues from Focal
Stack for Light Field Depth Estimation, in: 2019 IEEE International Conference on
Image Processing (ICIP), pp. 1074-1078
Jeon H. G., Surh J., Im S., Kweon I. S., 2019, Ring difference filter for fast and
noise robust depth from focus, IEEE Trans. on Image Processing, Vol. 29, pp. 1045-1060
Pertuz S., Puig D., Garcia M. A., 2013, Analysis of focus measure operators for shape-from-focus,
Pattern Recognition, Vol. 46, No. 5, pp. 1415-1432
Zhao C., Jeon B., 2022, Refocusing Metric of Light Field Image using Region-Adaptive
Multi-Scale Focus Measure, in IEEE Access
Chantara W., Ho Y. S., 2016, Focus Measure of Light Field Image Using Modified Laplacian
and Weighted Harmonic Variance, in: Proceedings of the International Workshop on Advanced
Image Technology, pp. 6-8
Nayar S. K., Nakagawa Y., 1994, Shape from focus, IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 16, No. 8, pp. 824-831
Wee C. Y., Paramesran R., 2008, Image sharpness measure using eigenvalues, in: IEEE
9th International Conference on Signal Processing, pp. 840-843
Pech-Pacheco J. L., Cristóbal G., Chamorro-Martinez J., 2000, Diatom autofocusing
in brightfield microscopy: a comparative study, in: Proceedings 15th International
Conference on Pattern Recognition, Vol. 3, pp. 314-317
He K., Sun J., 2015, Fast guided filter, arXiv preprint arXiv:1505.00996
HEVC reference software, HM 16.17.
Canh T. N., Duong V. V., Jeon B., Jan. 2019, Boundary handling for video based light
field coding with a new hybrid scan order, in: Proc. Inter. Workshop on Advanced Image
Tech., pp. 1-4
Řeřábek M., Ebrahimi T., 2016, New Light Field Image Dataset, in: 8th International
Workshop on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal
Author
Chun Zhao received a BS in 2005 and an MS in 2008 from the Department of Electronics
Science and Technology, North University of China, Shanxi, China. She joined the MS
exchange student program in 2008, and in 2016 started working toward a PhD in the
Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon,
Korea. From 2008 to 2014, she worked in the Research & Design Center, Samsung Electronics,
Korea, on Image/Video Enhancement algorithm development and System on Chip (SOC) design,
implementing algorithms in FPGA/chip and RTL designs. Since 2015, she has been
a senior engineer in the Visual Display Business, Samsung Electronics, Korea, where
she has worked on practical algorithm development for various displays by analyzing panel
characteristics. Her research interests include multimedia signal processing, panel
color calibration, machine learning, and light field refocusing representation.
Byeungwoo Jeon (M’90, SM’02) received a BS (Magna Cum Laude) in 1985 and an MS
in 1987 from the Department of Electronics Engineering, Seoul National University,
Seoul, Korea, and received a PhD from the School of Electrical Engineering, Purdue
University, West Lafayette, USA, in 1992. From 1993 to 1997, he was in the Signal
Processing Laboratory, Samsung Electronics, Korea, where he worked on research and
development of video compression algorithms, design of digital broadcasting satellite
receivers, and other MPEG-related research for multimedia applications. Since September
1997, he has been at Sungkyunkwan University (SKKU), Korea, where he is currently
a professor. His research interests include multimedia signal processing, video compression,
statistical pattern recognition, and remote sensing. He served as Project Manager
of Digital TV and Broadcasting in the Korean Ministry of Information and Communications
from 2004 to 2006 where he supervised all digital TV-related R&D in Korea. From 2015
to 2016, he was Dean of the College of Information and Communication Engineering,
SKKU. In 2019, he was President of the Korean Institute of Broadcast and Media Engineers.
Dr. Jeon is a senior member of IEEE, a member of SPIE, an associate editor of IEEE
Trans. on Broadcasting and IEEE Trans. on Circuits and Systems for Video Technology.
He was a recipient of the 2005 IEEK Haedong Paper Award from the Signal Processing
Society in Korea, and received the 2012 Special Service Award and the 2019 Volunteer
Award, both from the IEEE Broadcast Technology Society. In 2016, a Korean President’s
Commendation was conferred upon him for his key role in promoting international standardization
for video coding technology in Korea.