| dc.description.abstract |
Event cameras and conventional cameras are complementary sensors. This study therefore proposes a monocular depth estimation network that fuses the two modalities. The proposed network consists of three main parts. First, I apply an event refinement module to handle noisy events in low-light conditions; this module removes noise and enhances real active events. Second, I employ a recurrent asynchronous encoder to account for the asynchronous property of the event camera. This encoder fuses the two unsynchronized modalities while preserving the asynchronous nature of events and the benefit of their high temporal resolution. Finally, I adopt local distribution learning via LocalBins. Event data captures scene details well regardless of lighting changes, so I use a LocalBins-based decoder to exploit these local details from events more effectively. To evaluate the proposed network, I compare it with the baseline, RAMNet, on the MVSEC dataset. The proposed network shows superior performance on almost all sequences. Furthermore, the qualitative results confirm that the proposed network predicts thin objects and their contours more accurately. |
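The recurrent asynchronous encoder described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it replaces the convolutional recurrent units with fully connected GRU cells, and the class names (`GRUCell`, `AsyncFusionEncoder`) and dimensions are illustrative. The key idea it demonstrates is that a shared hidden state is updated by whichever modality (event or image feature) arrives next in timestamp order, so the two unsynchronized streams can be fused without resampling either one.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal fully connected GRU cell (stand-in for a convolutional GRU)."""
    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        self.W = rng.uniform(-s, s, (3, in_dim, hid_dim))   # input weights: z, r, candidate
        self.U = rng.uniform(-s, s, (3, hid_dim, hid_dim))  # recurrent weights: z, r, candidate

    def step(self, h, x):
        z = sigmoid(x @ self.W[0] + h @ self.U[0])          # update gate
        r = sigmoid(x @ self.W[1] + h @ self.U[1])          # reset gate
        h_cand = np.tanh(x @ self.W[2] + (r * h) @ self.U[2])
        return (1 - z) * h + z * h_cand                     # blend old state and candidate

class AsyncFusionEncoder:
    """One GRU per modality, both writing into a single shared hidden state."""
    def __init__(self, ev_dim, img_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.cells = {"event": GRUCell(ev_dim, hid_dim, rng),
                      "image": GRUCell(img_dim, hid_dim, rng)}
        self.hid_dim = hid_dim

    def encode(self, stream):
        # stream: iterable of (timestamp, modality, feature_vector), any order.
        # Processing in timestamp order preserves the asynchronous arrival pattern.
        h = np.zeros(self.hid_dim)
        for _, modality, x in sorted(stream, key=lambda item: item[0]):
            h = self.cells[modality].step(h, x)
        return h

# Usage: sparse image frames interleaved with higher-rate event features.
rng = np.random.default_rng(1)
enc = AsyncFusionEncoder(ev_dim=8, img_dim=16, hid_dim=32)
stream = [(0.00, "image", rng.standard_normal(16)),
          (0.01, "event", rng.standard_normal(8)),
          (0.02, "event", rng.standard_normal(8)),
          (0.03, "image", rng.standard_normal(16))]
h = enc.encode(stream)   # fused state, shape (32,)
```

Because each modality keeps its own recurrent cell but shares the hidden state, event features can refresh the representation between image frames, which is how the high temporal resolution of events is retained in the fused encoding.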