Multimodal Sensor Fusion in Urban Environments

Berkin Öztürk
3 min read · Nov 25, 2021


Before we talk about multimodal sensor fusion in urban environments, let’s define it. Sensor fusion is the combination of information from two or more sensors to perform a prediction. Each sensor corresponds to a certain type of information, or modality, such as vision, sound, smell, or taste, and multimodal sensor fusion is the joining of information from two or more modalities to perform a prediction. It is a topic that spans artificial intelligence and the Internet of Things.

The McGurk effect is a well-known demonstration of multimodal perception in humans: it occurs when a person perceives that another’s lip movements do not correspond to what that individual is saying. It shows that the way people recognize a sound can be modulated by the visual information accompanying it. The Kalman filter, on the other hand, is an established method for sensor fusion: an optimal algorithm that estimates the states of a system from indirect and uncertain measurements. A common use of the Kalman filter is to predict the state of a system (for example, the position of a car) by combining measurements from multiple sources. In the same spirit, a data fusion approach can determine the traffic state (low traffic, traffic jam, medium flow) from roadside acoustic, image, and sensor data. In the field of autonomous driving, sensor fusion combines the redundant information from complementary sensors in order to obtain a more accurate and reliable representation of the environment.
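
To make the Kalman filter part concrete, here is a minimal sketch of a one-dimensional Kalman filter that estimates a car’s position from noisy position measurements. The time step, noise levels, and measurement values are illustrative assumptions, not taken from any particular system.

```python
# Minimal sketch: 1-D Kalman filter estimating a car's position from noisy
# position readings. All numbers below are illustrative assumptions.
import numpy as np

dt = 0.1                                 # time step [s]
F = np.array([[1, dt], [0, 1]])          # state transition (position, velocity)
H = np.array([[1, 0]])                   # we only measure position
Q = np.diag([0.01, 0.01])                # process noise covariance
R = np.array([[1.0]])                    # measurement noise covariance

x = np.array([[0.0], [0.0]])             # initial state estimate
P = np.eye(2)                            # initial state covariance

def kalman_step(x, P, z):
    # Predict: propagate the state and its uncertainty forward in time.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the new measurement z.
    y = z - H @ x_pred                   # innovation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

# Feed in a short stream of noisy position readings.
for z in [0.1, 0.25, 0.31, 0.48, 0.52]:
    x, P = kalman_step(x, P, np.array([[z]]))
print("estimated position:", x[0, 0], "estimated velocity:", x[1, 0])
```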

There is no perfect sensor; we can only try to achieve the best possible result. Data is drawn from disparate sources so that the resulting information has less uncertainty than would be possible if these sources were used individually.
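
As a small worked example of this uncertainty reduction, the sketch below fuses two range readings with inverse-variance weighting. The numbers are made up for illustration, but the fused variance is always smaller than either individual variance.

```python
# Illustrative example: fusing two noisy measurements by inverse-variance
# weighting always yields a lower variance than either measurement alone.
def fuse(z1, var1, z2, var2):
    # Weight each measurement by the inverse of its variance.
    w1, w2 = 1.0 / var1, 1.0 / var2
    z_fused = (w1 * z1 + w2 * z2) / (w1 + w2)
    var_fused = 1.0 / (w1 + w2)          # smaller than both var1 and var2
    return z_fused, var_fused

# e.g. a low-noise LIDAR-like range and a higher-noise radar-like range
z, var = fuse(10.2, 0.04, 9.8, 0.25)
print(z, var)                            # fused variance ~0.034 < 0.04 and < 0.25
```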

Localization joins data from internal sensors (which monitor the robot’s internal state) and external sensors (which monitor the robot’s environment) over time. For multimodal sensor fusion in urban environments, localization, mapping, and navigation are examples of competitive, complementary, and cooperative fusion, respectively. These are mostly used in vehicle control and world modeling. As an example of multimodal sensor fusion in urban environments, we can look at the mobile robot example by Dr. Pascal Meißner, which is directly related to localization. In the example, the robot moves and we can follow the probabilities assigned to the incoming data on a belief curve. As the robot moves, this probability distribution shifts with it, changing mathematically according to the robot’s movements, until we can finally determine the robot’s location. It combines a practical application with the main theoretical outcome of multimodal sensor fusion in urban environments. In practice, this kind of work and research can be used, for example, to build robots that do our grocery shopping automatically, and using this technology in urban environments will greatly facilitate our lives in the future.
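
In the spirit of that mobile robot example, here is a minimal sketch of a one-dimensional Bayes (histogram) filter: the “curve” is the belief over positions, the motion update shifts it as the robot moves, and the measurement update sharpens it. The world map, sensor model, and noise values are assumptions for illustration, not Dr. Meißner’s actual setup.

```python
# Minimal sketch of a 1-D Bayes (histogram) filter for robot localization.
# The map, sensor model, and noise values are illustrative assumptions.
import numpy as np

world = np.array([1, 0, 0, 1, 0])                # e.g. 1 = door, 0 = wall per cell
belief = np.full(len(world), 1.0 / len(world))   # uniform prior over positions

def motion_update(belief, steps=1, p_correct=0.8):
    # Shift the belief with the robot's motion; spread some probability
    # to a neighbouring cell to model odometry noise.
    moved = np.roll(belief, steps)
    return p_correct * moved + (1 - p_correct) * np.roll(moved, 1)

def measurement_update(belief, observation, p_hit=0.9, p_miss=0.1):
    # Weight each cell by how well it explains the observation, then normalize.
    likelihood = np.where(world == observation, p_hit, p_miss)
    posterior = likelihood * belief
    return posterior / posterior.sum()

belief = measurement_update(belief, observation=1)   # robot sees a door
belief = motion_update(belief, steps=1)              # robot moves one cell
belief = measurement_update(belief, observation=0)   # now it sees a wall
print(np.round(belief, 3))                           # the belief "curve"
```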

LIDAR scanning, discussed by Dr. Jürgen Hess, stands as another example. Although the method of controlling the scanner differs depending on the application, precisely regulating a mechanically moving scanner is a common difficulty. When a scanner is operated at high speed, the disparity in scan speed between the acceleration zone and the constant-speed region causes a scan error. Such a scanning error in a LIDAR sensor prevents the user from identifying the exact position of an object, because the sensor gathers or transmits information that is not what the user wants. To solve the perception problem we need a combination of LIDAR, camera, and radar; one sensor alone is not enough. Detecting, classifying, and tracking objects are the tasks this kind of multimodal sensor fusion is applied to.
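
One simple way to combine sensors such as LIDAR and camera at the object level is late fusion: each sensor produces its own detections, and detections that lie close together in a common, calibrated vehicle frame are merged. The sketch below is only an illustration of that idea; the detection format, distance threshold, and confidence handling are my assumptions, not a reference implementation.

```python
# Hedged sketch of late fusion: merge LIDAR and camera detections that lie
# close together in a shared, calibrated frame. Format and thresholds are
# illustrative assumptions.
from dataclasses import dataclass
import math

@dataclass
class Detection:
    x: float          # position in a common vehicle frame [m]
    y: float
    confidence: float
    sensor: str

def fuse_detections(dets_a, dets_b, max_dist=1.0):
    fused, used = [], set()
    for a in dets_a:
        best, best_d = None, max_dist
        for j, b in enumerate(dets_b):
            d = math.hypot(a.x - b.x, a.y - b.y)
            if j not in used and d < best_d:
                best, best_d = j, d
        if best is not None:
            b = dets_b[best]
            used.add(best)
            # Confidence-weighted average of the two position estimates.
            w = a.confidence + b.confidence
            fused.append(Detection((a.confidence * a.x + b.confidence * b.x) / w,
                                   (a.confidence * a.y + b.confidence * b.y) / w,
                                   min(1.0, w), "lidar+camera"))
        else:
            fused.append(a)              # keep unmatched detections as-is
    fused += [b for j, b in enumerate(dets_b) if j not in used]
    return fused

lidar = [Detection(12.1, 3.0, 0.7, "lidar")]
camera = [Detection(12.4, 2.8, 0.6, "camera"), Detection(30.0, -1.0, 0.5, "camera")]
print(fuse_detections(lidar, camera))
```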

As I mentioned before, the first problem is that the correspondence between information from different sensors is unknown. The second problem is that the information is provided at different levels of abstraction and with different types of uncertainty. Sensors only capture incomplete and noisy data. To limit these effects as much as possible, sensors must be calibrated and registered with each other, and they must be time-synchronized.
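
As a minimal sketch of what time synchronization can look like in practice: for each camera frame, pick the LIDAR scan with the nearest timestamp, but only if it falls within a tolerance. The timestamps and tolerance below are illustrative assumptions.

```python
# Sketch of nearest-timestamp pairing between two sensor streams.
# Timestamps and tolerance are illustrative assumptions.
def pair_by_timestamp(stream_a, stream_b, tolerance=0.05):
    # stream_a, stream_b: lists of (timestamp, payload), sorted by timestamp
    pairs = []
    for t_a, data_a in stream_a:
        nearest = min(stream_b, key=lambda item: abs(item[0] - t_a))
        if abs(nearest[0] - t_a) <= tolerance:
            pairs.append((data_a, nearest[1]))
    return pairs

camera = [(0.00, "frame0"), (0.10, "frame1"), (0.20, "frame2")]
lidar = [(0.02, "scan0"), (0.11, "scan1"), (0.27, "scan2")]
print(pair_by_timestamp(camera, lidar))  # -> [('frame0', 'scan0'), ('frame1', 'scan1')]
```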

In summary, multimodality is the use of different types of information. Sensor fusion combines data from different sources to obtain more robust and complete information. The Bayes filter fuses data from internal and external sensors to localize robots.
