1. Introduction
Robots are complex systems composed of various technologies and are widely applied
in areas such as homes, factories, workplaces, and disaster sites. In these fields,
accurately determining a robot's current position is crucial. Whether it is a robotic
vacuum cleaner, an autonomous vehicle, or a disaster-response robot, all face the significant
challenge of precisely locating themselves on a given map. One technology being researched
to address this challenge is Simultaneous Localization And Mapping (SLAM). SLAM uses
information obtained from one or more sensors to determine the position of a robot in
real time while simultaneously building a map of the surrounding environment.
To enable SLAM, cameras, LiDAR, or Inertial Measurement Units (IMUs) are commonly
used as sensors. Cameras have the advantage of being cost-effective, easily detecting
and tracking objects, and providing visual semantic information. LiDAR, whose laser
interacts directly with objects, offers high accuracy and performs relatively well even
in environments with limited visual features. IMUs have the advantage of being able
to estimate relative poses.
However, each of these sensors has its own drawbacks when used alone. Visual sensors
are vulnerable to lighting conditions, such as low light or sudden changes in illumination,
and they may also face initialization issues. LiDAR is limited by the range of its laser,
and pose accuracy can degrade when the device rotates or moves while capturing a scan.
Moreover, LiDAR can be difficult to use in environments with sparse features, such as
open areas. The IMU also has its drawbacks: when used as a single sensor, it is difficult
to correct the drift that accumulates over time. Consequently, a single IMU alone is
generally not sufficient to operate SLAM effectively.
Therefore, many studies integrate multiple sensors so that they complement one another's
drawbacks while leveraging their strengths. In this way, each sensor's limitations are
compensated for by the others, leading to improved performance. The information from
each sensor is then used to estimate the position, and the estimated position is optimized
using graph-based or filter-based methods.
In this paper, we explore studies that integrate various sensors to address the limitations
of individual sensors, and we investigate the methods used to fuse the sensor information
in each case. Notably, each method is not exclusive to the sensors for which it was initially
designed. Just as techniques employed in Camera-IMU systems can be adapted to LiDAR-IMU
systems, the choice of a specific method does not restrict it to a single sensor combination.
The structure of this paper is as follows. In Section 2, we discuss the maximum a
posteriori (MAP) method, maximum likelihood estimation (MLE), and the Kalman filter,
which are the estimation frameworks primarily used for sensor data fusion. In Section 3,
we examine how information is fused for each sensor combination. Finally, Section 4
concludes the paper.
Table 1. Pros & cons of sensors.
| | LiDAR | Camera | Inertial Measurement Unit (IMU) |
|---|---|---|---|
| Pros | Accuracy, Reliability | Cost, Visual Information | Relative Pose Estimation |
| Cons | Limited Laser Range, Pose Estimation, Open Area | Light, Texture of Material, Sparse Feature Environments, Initialization | Drift Accumulation |
2. Related Work
2.1. Notation
The notation used in this paper is as follows: subscripts or superscripts $I$, $L$, and
$C$ denote the IMU, LiDAR, and Camera frames, respectively. The state of a robot is
denoted by $x$, and measurements from sensors are represented by $z$. The difference
between an observed and a predicted value is referred to as a residual, denoted by $r$;
the residuals of the LiDAR, Camera, and IMU are written as $r_L$, $r_C$, and $r_I$,
respectively. Feature points obtained from a Camera or LiDAR are denoted by $p$, and
their predicted or estimated counterparts by $\hat{p}$. The distance between two points,
or between a point and an edge or a plane, is denoted by $d$. Lastly, extrinsic parameters,
composed of rotation and translation, are represented by $T$.
2.2. Maximum A Posteriori (MAP)
Before introducing Maximum A Posteriori (MAP), we briefly discuss the state of a robot
as used in this paper. The state refers to the robot's position and related information,
and its exact definition can vary slightly from one study to another. For instance, it
may include the robot's current position, rotation matrix, translation matrix, and velocity.
Among the several methods for estimating such states in SLAM, one is MAP.
MAP is a Bayesian method that, given measured and predicted values, calculates the
probability distribution of the position at the time the information was acquired.
MAP is designed to improve on a limitation of Maximum Likelihood Estimation (MLE).
The likelihood is the probability of an outcome given particular data or a particular
model, whereas the posterior is the probability that a particular model, among many
candidates, produced an observed outcome. To illustrate with a cause-and-effect example
in which $x$ is a cause and $z$ is an effect, the likelihood is the probability of the
result $z$ occurring given that the cause $x$ has occurred, while the posterior is the
probability of the cause $x$ given that the result $z$ has been observed. The MAP method,
which uses the posterior, takes into account the prior probability of the already known
$x$, whereas the MLE method, which uses only the likelihood, does not. MAP is therefore
generally considered to produce more accurate estimates than MLE. In SLAM, $x$ represents
the state of the robot, and $z$ is the observed sensor data.
During the state estimation process in MAP, the difference between the values predicted
by the model and the values measured by the sensors is computed. In this process,
information from multiple sensors is used collectively.
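The MAP objective can be sketched in its most basic form as follows (individual studies add weights, information matrices, and additional terms):

$$\hat{x} = \arg\max_{x}\, p(x \mid z) = \arg\min_{x}\, \| z - h(x) \|^{2} \tag{1}$$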
where $z$ is an observed value, $h(x)$ is the state (measurement) function, and $\|\cdot\|$
denotes the Mahalanobis norm. The estimated state is thus determined from the difference
between the observed value and the value predicted by the state function. This difference
is the residual, denoted by $r$, and the goal is to find the state $x$ that minimizes it.
When employing the MAP method in SLAM, a residual is obtained and computed for each
sensor. Based on these residuals, this paper explains the fusion process using the following
formula, in which $a$, $b$, and $c$ denote sensor types and can vary depending on the
number of sensors involved.
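A sketch of this per-sensor form, with summations over individual measurements omitted, is:

$$\hat{x} = \arg\min_{x} \left\{ \| r_a(x) \|^{2} + \| r_b(x) \|^{2} + \| r_c(x) \|^{2} \right\} \tag{2}$$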
While additional terms or values may be introduced or omitted in individual studies,
this paper proceeds with explanations primarily based on Eq. (2).
2.3. Kalman Filter
The Kalman Filter is also a method used to estimate states. This approach involves
predicting a state of a robot and calculating the error between a measured values
obtained through sensors and the predicted values. In this process, information from
multiple sensors can be fused and utilized. The Kalman Filter and MAP utilize the
errors between predicted and measured values to estimate a current position of the
robot. However, MAP calculates estimated values by maximizing the posterior probability
and a result is a probability, while the Kalman filter does not directly calculate
the probability, but it calculates the estimate through the mean and covariance of
the state variables.
The Kalman Filter is a system that removes noise from uncertain observations. When
calculating the current state with the filter, the state is first predicted using the
previous state and the motion input, and then updated with the estimate obtained from
the sensor information; this process is repeated to track the current state. State
prediction estimates the distribution of the current state using the probability
distributions from the previous measurement update and the motion input. The measurement
update then computes the posterior probability distribution from the predicted distribution
of the current state and the distribution of the observations measured at the current
position (see Fig. 1).
Fig. 1. Example of filter based SLAM.
There are primarily two types of Kalman Filters: the Linear Kalman Filter and the
Extended Kalman Filter. The Linear Kalman Filter is a Bayesian filter for linear system
models with Gaussian probability distributions. The Extended Kalman Filter addresses
the limitation that the Linear Kalman Filter cannot be applied to nonlinear systems;
it does so by linearizing the system with Jacobian matrices.
In this paper, we focus on the extended Kalman filter (EKF). The basic equations of the
extended Kalman filter are as follows.
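In a standard form consistent with the definitions given below (the notation of the individual papers may differ slightly):

$$\bar{x} = f(x, \mu) \tag{3}$$
$$\bar{P} = J_f P J_f^{T} + P_Q \tag{4}$$
$$K = \bar{P} J_h^{T} \left( J_h \bar{P} J_h^{T} + P_R \right)^{-1} \tag{5}$$
$$x = \bar{x} + K \left( z - h(\bar{x}) \right) \tag{6}$$
$$P = \left( I - K J_h \right) \bar{P} \tag{7}$$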
State prediction is performed using Eqs. (3) and (4). Here, $f(x, \mu)$ represents the nonlinear motion model, and its Jacobian matrix
is denoted by $J_f$. The predicted covariance of the state variables is $\overline{P}$,
$P$ is the covariance from the previous state, and $P_Q$ is the covariance of the robot motion.
The measurement update is carried out through Eqs. (5)-(7). Eq. (5) gives the Kalman gain, where $J_h$ is the Jacobian matrix
of the nonlinear observation model $h(x)$ and $P_R$ is the covariance of the observations.
In Eq. (6), $z$ is the measurement and $(z - h(\bar{x}))$ is the measurement residual;
Eq. (6) updates the current state of the robot, while Eq. (7) computes the covariance of the current state.
Furthermore, the precise definitions of the terms in Eqs. (3)-(7) vary depending on how
each study defines them, leading to different research outcomes.
3. Multi-Sensor Fusion in SLAM
In most studies, the current position is determined using the error between a predicted
value, computed from the previous position, and a measured value obtained from sensors
such as cameras or LiDAR, as described in Eq. (2). In each study, pre-integration is performed on the IMU, which runs at a higher frequency
than cameras or LiDAR, in order to synchronize it with the other sensors. The specific
pre-integration methods vary across studies, but all of them compute the change in pose
from the IMU measurements over a given time interval; for detailed explanations, please
refer to the individual papers. Likewise, the methods for calculating the residual of
each sensor differ among studies, but they all compute the difference between an estimated
value and a measured value; for the details of each residual calculation, please refer
to the respective paper.
In particular, in Subsection 3.4 we explore research that post-processes the sensor
information rather than feeding the received measurements directly into MAP, MLE, or
a Kalman Filter. Specifically, we examine studies that employ marginalization and
plane-extraction techniques. Although marginalization and plane-related studies are
briefly mentioned in Subsections 3.1 and 3.2, please refer to Subsection 3.4 for details.
3.1. Integrating Camera and IMU
Using a camera as a single sensor has the advantages of being cost-effective and providing
visual information. However, it is sensitive to factors such as lighting conditions and
object textures, and it faces challenges with initialization. Therefore, to enhance
position estimation accuracy, IMU sensors are often used in conjunction with cameras.
The information from these sensors is typically fused using the Maximum A Posteriori
(MAP) method, with the basic equation represented by Eq. (2).
Among the studies that utilize both camera and IMU, some examples include VINS-Mono
[2], VIO-DualProNet [3], OpenVINS [5], OV Plane [8], PL-VINS [20], and PLE-SLAM [21]. In this context, the primary focus will be on discussing VINS-Mono and OV Plane.
VINS-Mono and VIO-DualProNet employ similar forms of the MAP equation. The MAP method
fuses the sensor information using the camera residual $r_C$ with the Huber norm $\rho(s)$,
the IMU residual $r_I$, and the marginalization term $r_m$ to estimate the current state,
as shown in Eq. (8).
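A sketch of this objective, with the summations over individual measurements omitted, is:

$$\hat{x} = \arg\min_{x} \left\{ \| r_m \|^{2} + \| r_I(x) \|^{2} + \rho\!\left( \| r_C(x) \|^{2} \right) \right\} \tag{8}$$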
Compared to Eq. (2), the differences are that the Huber norm is applied to the camera
residual and that a marginalization term, which is not a value obtained directly from
a sensor, is included.
OV Plane [8] was developed on top of OpenVINS [5]. In this research, sensor information is fused without using MAP. Planes are identified,
and either a standard extended Kalman filter (EKF) update or a multi-state constraint
Kalman filter (MSCKF) update is chosen depending on the situation. In the process of
obtaining these planes, the plane features $x_{\pi}$, the IMU state $x_I$, the historical
IMU poses $x_H$, and the 3D point features $x_f$ are combined into a single state $x$
and applied to the Kalman filter; see Eq. (10).
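A sketch of this stacked state vector is:

$$x = \begin{bmatrix} x_I^{T} & x_H^{T} & x_f^{T} & x_{\pi}^{T} \end{bmatrix}^{T} \tag{10}$$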
In this equation, $x$ plays the role of the state in Eqs. (3)-(7) of the Kalman Filter,
and by defining the state as in Eq. (10), the sensor information is fused.
3.2. Integrating LiDAR and IMU
When using LiDAR, a laser directly contacts objects, providing highly accurate and
reliable information. However, it has limitations in terms of range and may be challenging
to use in open areas. Additionally, pose estimation can be more challenging compared
to cameras, as distortions may occur while moving. To address these challenges, IMU
sensors are often used alongside LiDAR to improve pose estimation.
Among the many studies in this category, we describe CLINS [6], LIO-Mapping [7], LIPS [9], IN2LAMA [4], and FAST-LIO [10]. CLINS utilizes the LiDAR residual $r_L(x)$ along with the residuals $r_a(x)$ and
$r_w(x)$ obtained from the acceleration and angular velocity of the IMU.
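A sketch of the corresponding objective, with per-measurement summations omitted, is:

$$\hat{x} = \arg\min_{x} \left\{ \| r_L(x) \|^{2} + \| r_a(x) \|^{2} + \| r_w(x) \|^{2} \right\} \tag{11}$$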
Unlike in Eq. (2), where the information from the IMU is condensed into a single residual, here the
residuals of the IMU's acceleration and angular velocity are used separately.
In LIO-Mapping, the marginalization term and the residuals from the IMU and LiDAR are
each summed, and then, unlike Eq. (2) or Eq. (8), the total cost is divided by 2.
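A sketch of this form, assuming the factor of 1/2 applies to the whole sum, is:

$$\hat{x} = \arg\min_{x} \frac{1}{2} \left\{ \| r_m(x) \|^{2} + \| r_I(x) \|^{2} + \| r_L(x) \|^{2} \right\} \tag{12}$$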
Next, we examine research that utilizes only maximum likelihood estimation (MLE) instead
of maximum a posteriori (MAP). LIPS fuses information with MLE while incorporating plane
information in the process. In LIPS, points are obtained from LiDAR and planes are
constructed from these points. Predictions are then made on how these planes move between
frames, and the difference between the measured position of a plane and its predicted
position at the same time is taken as the residual $r_{\pi}(x)$. The residuals in this
work are handled via MLE.
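A minimal sketch of the resulting objective is:

$$\hat{x} = \arg\min_{x} \| r_{\pi}(x) \|^{2} \tag{13}$$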
IN2LAMA minimizes a geometric distance $d_L$ (point-to-line or point-to-plane) associated
with LiDAR features, along with the time-shift residual $r_t$, the accelerometer-bias
residual $r_{b_a}$ from the IMU, and the gyroscope-bias residual $r_{b_\omega}$.
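A sketch of this objective, with per-feature summations omitted, is:

$$\hat{x} = \arg\min_{x} \left\{ \| d_L \|^{2} + \| r_t \|^{2} + \| r_{b_a} \|^{2} + \| r_{b_\omega} \|^{2} \right\} \tag{14}$$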
where, unlike the previous formulations, a time-shift residual is computed and geometric
distances are used.
In LiDAR-IMU SLAM, there are also studies that use the Kalman Filter to estimate and
optimize positions, and FAST-LIO is one example. In FAST-LIO, the scan-end projected
point $p^{L_k}_{f_j}$ is obtained using the LiDAR extrinsic parameters $T^{I}_{L}$,
the IMU relative pose $T^{I_k}_{I_j}$, and the local LiDAR measurement point
$p^{L_j}_{f_j}$ expressed in the IMU frame.
Furthermore, the estimated feature point in the global frame $\hat{p}^{\kappa}_{f_j}$
is obtained from this projected point $p^{L_k}_{f_j}$, the LiDAR extrinsic parameter
$T^{I}_{L}$, and the estimated IMU pose in the global frame $\hat{T}^{\kappa}_{I_k}$.
This point is then used in the Kalman Filter and the MAP formulation to compute the residual.
In the MAP step, this residual is used to obtain the $\kappa$-th update of $x$ in the
iterated Kalman Filter, while the Kalman Filter step uses it to obtain the $(\kappa+1)$-th
update of $x$.
Besides these, there are cases where additional sensors are incorporated and the information
from all of them is fused to estimate the state; examples include LIO-SAM [11] and LIRO [12].
LIO-SAM incorporates GPS as an additional sensor: similar to how LiDAR and IMU are used
in LOAM [13], it utilizes LiDAR and IMU while adding GPS to estimate the robot's state. Another
example is LIRO, which incorporates UWB anchors as an additional sensor. This study
computes residuals $r_L$, $r_I$, and $r_U$ from the LiDAR, IMU, and UWB anchors, similar
to the other studies, and estimates the robot's position by minimizing a cost function
built from them (Eq. (17)).
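A sketch of such a cost function is:

$$\hat{x} = \arg\min_{x} \left\{ \| r_L(x) \|^{2} + \| r_I(x) \|^{2} + \| r_U(x) \|^{2} \right\} \tag{17}$$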
3.3. Integrating LiDAR, Camera, and IMU
Using LiDAR with an IMU, or a Camera with an IMU, can complement the shortcomings of
each sensor and lead to better results than using a single sensor alone. However, some
shortcomings may still remain. Therefore, there are SLAM systems that combine LiDAR,
Camera, and IMU to address these remaining shortcomings and achieve better results.
The studies introduced here are LE-VINS [14], R${}^{2}$LIVE [15], LVIO-Fusion [18], and LIC-Fusion [19]. First, LE-VINS uses marginalization as in Eq. (8), but it additionally uses a LiDAR residual $r_L(x)$.
R${}^{2}$LIVE constructs its state estimate using a prior distribution $r_m$, an observation
matrix $H$ composed of Jacobians, the error $\delta x$ between the estimated and measured
states, and the residuals from each sensor (Eq. (19)).
LVIO-Fusion employs a formulation in which the squared residuals $\|r_I\|^2$, $\|r_C\|^2$,
and $\|r_L\|^2$ are multiplied by the respective weights $\omega_I$, $\omega_C$, and
$\omega_L$, thereby utilizing the LiDAR, Camera, and IMU together. Furthermore, the
Huber norm is applied to the Camera and LiDAR residuals (Eq. (20)).
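A sketch of this weighted objective, including the marginalization term mentioned in Subsection 3.4, is:

$$\hat{x} = \arg\min_{x} \left\{ \| r_m \|^{2} + \omega_I \| r_I(x) \|^{2} + \omega_C\, \rho\!\left( \| r_C(x) \|^{2} \right) + \omega_L\, \rho\!\left( \| r_L(x) \|^{2} \right) \right\} \tag{20}$$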
In LiDAR-Camera-IMU SLAM, there are also studies that use the Kalman Filter to fuse
sensor information. Among them, LIC-Fusion [19] defines the state $x$ as a combination of the IMU state $x_I^T$, the IMU-camera
extrinsics $x_{calib\_C}^T$, the IMU-LiDAR extrinsics $x_{calib\_L}^T$, the IMU information
at selected image times $x_C^T$, and the IMU information at selected LiDAR times $x_L^T$.
3.4. SLAM Utilizing Marginalization or Plane Extraction
In addition to using sensor information directly within MAP, MLE, or the Kalman Filter,
there are methods that post-process the sensor data before estimating the state. Among
these, we explore the cases where marginalization or plane extraction is employed;
these methods are not strongly tied to the type of sensors used.

Marginalization is employed to reduce the computational complexity of an odometry system
and to improve odometry performance. It is commonly used in sliding-window approaches:
if the second most recent frame in the window is a keyframe, the oldest keyframe is
removed; if it is not a keyframe, that frame itself is removed. In Fig. 2 (a), if $x_4$ is a keyframe, $x_0$ is removed (Fig. 2 (b)); if it is not a keyframe, $x_4$ is removed (Fig. 2 (c)).
Fig. 2. Example of marginalization.
Examples of such studies include VINS-Mono [2], LIO-Mapping [7], LVIO-Fusion [18], R${}^{2}$LIVE [15], and FAST-LIVO [16]. Excluding FAST-LIVO, these correspond to Eqs. (8), (12), (20), and (19), respectively, all of which include the marginalization residual $r_m$.
Next, we explore research that estimates states by reconstructing planes. These planes
are integrated with the information from other sensors into a state vector $x$ for further
use. Studies that detect planes, as shown in Fig. 3, identify planar regions and observe the movement of these planes. The reason for
estimating planes is that they are abundant in the real world and cover wide areas,
making them suitable as landmarks or references. For further details, refer to [8].
Fig. 3. Example of using a plane.
Among these studies are OV Plane [8], LIPS [9], and LVI-SAM [17], where the equations for OV Plane and LIPS are Eqs. (10) and (13), respectively. In LVI-SAM, visual-inertial odometry and LiDAR-inertial odometry are
combined. In this case, the IMU provides not only bias estimates but also inertial
measurements. Additionally, visual features and LiDAR features are used together for
plane extraction and depth registration, and this information is then utilized in the
visual-inertial odometry.
Table 3. Summary of sensor types used and classification based on integration methods.
| Sensors | Title | Integration Method | Additional Remarks |
|---|---|---|---|
| Camera + IMU | VINS-Mono | MAP | Marginalization |
| Camera + IMU | VIO-DualProNet | MAP | Marginalization |
| Camera + IMU | OV Plane | Kalman Filter | Plane |
| LiDAR + IMU | CLINS | MAP | - |
| LiDAR + IMU | LIO-Mapping | MAP | Marginalization |
| LiDAR + IMU | IN2LAMA | MLE | - |
| LiDAR + IMU | LIPS | MLE | Plane |
| LiDAR + IMU | FAST-LIO | MAP | Kalman Filter |
| LiDAR + IMU | LIO-SAM | MAP | - |
| LiDAR + IMU | LIRO | Cost Func. | - |
| LiDAR + Camera + IMU | LE-VINS | MAP | Marginalization |
| LiDAR + Camera + IMU | R2LIVE | MAP | Marginalization |
| LiDAR + Camera + IMU | Fast-LIVO | MAP | Marginalization |
| LiDAR + Camera + IMU | LVI-SAM | MAP | Plane |
| LiDAR + Camera + IMU | LVIO-Fusion | MAP | Weights |
| LiDAR + Camera + IMU | LIC-Fusion | Kalman Filter | - |
Additionally, besides the methods introduced in this paper, there are various other
approaches, such as methods utilizing point and line features, e.g., PL-VINS [20] or PLE-SLAM [21]. These are not covered here because this paper focuses on the plane-based methods instead.
4. Conclusion
Simultaneous Localization And Mapping (SLAM) is a technology that scans the surrounding
environment, creates a map based on this information, and simultaneously estimates
the device's position. This technology is widely recognized, especially in the fields
of mobility, robotics, and autonomous driving.
In SLAM, cameras and LiDAR are commonly used as the main sensors. Additionally, various
sub-sensors are employed to complement them, with the Inertial Measurement Unit (IMU)
being used most frequently. When each sensor is used individually, there are limitations
and challenges: LiDAR has range constraints and pose estimation issues during movement,
cameras are susceptible to lighting conditions, and IMUs struggle with cumulative drift.
To mitigate these limitations, multiple sensors are combined so that the drawbacks of
each are compensated for.
Each fusion method is not exclusive to the sensors for which it was initially designed.
Just as techniques employed in Camera-IMU systems can be adapted to LiDAR-IMU systems,
the choice of a specific method does not restrict it to a single sensor combination.
Recent multi-sensor fusion SLAM research has shown a slightly greater focus on graph-based
SLAM than on filter-based SLAM. A prominently cited work in these studies is VINS-Mono [2].
As SLAM technologies become more commercialized, various SLAM products are emerging.
However, many of these products are difficult for individuals to utilize directly: it
is hard to acquire and use the raw measurements from the products' sensors, or to modify
their algorithms and code.
As time progresses, SLAM is expected to be increasingly integrated into individual
products and other robotic systems. However, every product is bound to have limitations,
and it is challenging for individuals to make further improvements directly. The SLAM
methods introduced so far obtain raw measurements directly from sensors and fuse them
to estimate the position. Beyond these approaches, we believe research should also focus
on leveraging the already-estimated data provided by existing products, achieving more
accurate position estimation through additional processing rather than by directly
modifying their algorithms.