The PKU-DAVIS-SOD dataset is a large-scale multimodal neuromorphic object detection dataset that includes challenging scenarios such as low-light conditions and high-speed motion blur. It was constructed by the National Engineering Research Center for Visual Technology, Peking University.
Collection Setup. This dataset is recorded with a DAVIS346, a recent event camera. As shown in Fig. 1(a), we mount the DAVIS346 on the front windshield of a driving car. To capture high-speed objects while still providing a comprehensive perspective of them, we additionally provide sequences in which the camera is placed at the side of the road, recording objects from the flank. The DAVIS346 camera shown in Fig. 1(b) simultaneously outputs high-temporal-resolution asynchronous events and conventional RGB frames at a resolution of 346 × 260.
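The two output streams described above are commonly handled as a list of timestamped events alongside fixed-rate frames. The following is a minimal illustrative sketch of that representation; the class and field names are our own assumptions, not the dataset's actual file format or API.

```python
from dataclasses import dataclass
from typing import List

WIDTH, HEIGHT = 346, 260  # DAVIS346 sensor resolution (from the text)
FRAME_PERIOD = 1.0 / 25   # at 25 FPS, each RGB frame covers a 40 ms window

@dataclass
class Event:
    t: float       # timestamp in seconds (events are asynchronous, microsecond-scale)
    x: int         # pixel column, 0..345
    y: int         # pixel row, 0..259
    polarity: int  # +1 for a brightness increase, -1 for a decrease

def events_in_window(events: List[Event], t0: float, t1: float) -> List[Event]:
    """Return the asynchronous events that fall inside one frame interval [t0, t1)."""
    return [e for e in events if t0 <= e.t < t1]
```

Grouping events into per-frame windows like this is one common way to temporally align the event stream with the synchronized RGB (or reconstructed) frames.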
Fig. 1: (a) Recording platform; (b) DAVIS346 camera.
Data Recording and Annotation. Our PKU-DAVIS-SOD dataset covers 3 traffic scenarios, chosen with velocity distribution, light conditions, category diversity, and object scale in mind (see Fig. 2). We use the DAVIS346 to record 220 sequences containing both RGB frames and DVS events. For each sequence, we collect approximately 1 minute of raw data, with RGB frames at 25 FPS. To provide manual bounding boxes in challenging scenarios (e.g., high-speed and low-light), grayscale images are reconstructed from the asynchronous events using E2VID at 25 FPS whenever the RGB frames are of low quality. After temporal calibration, we first select three common and important object classes from daily traffic (i.e., car, pedestrian, and two-wheeler). All bounding boxes are then annotated on the RGB frames or the synchronized reconstructed images by a well-trained professional team.
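A quick back-of-envelope check of the recording volume implied by the protocol above; the per-sequence length is stated only as "approximately 1 min", so this is a rough upper-bound estimate, not an official figure.

```python
# Rough estimate of the number of annotatable timestamps, based on the
# stated protocol: 220 sequences x ~60 s each x 25 FPS frame rate.
SEQUENCES = 220
SECONDS_PER_SEQ = 60  # "approximately 1 min" per sequence (approximate!)
FPS = 25              # RGB frame rate (and E2VID reconstruction rate)

approx_timestamps = SEQUENCES * SECONDS_PER_SEQ * FPS
print(approx_timestamps)  # 330000
```

This loose estimate of ~330k candidate timestamps is consistent with the 276k labeled timestamps reported in the data statistics below, once curation and sequences shorter than a full minute are accounted for.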
Fig. 2: (a) Category diversity; (b) Light change; (c) Object scale; (d) Velocity distribution.
Data Statistics. Manual annotations are provided throughout the recordings at a frequency of 25 Hz. As a result, the dataset contains 276k labeled timestamps and 1,080.1k labels in total. We split the labels into 671.3k for training, 194.7k for validation, and 214.1k for testing. The precise numbers can be found in Table 1.
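The split sizes above can be checked arithmetically. A small sketch using the counts from the text (note that labels and labeled timestamps are different quantities, since one timestamp can carry several bounding boxes):

```python
# Label counts from the data-statistics paragraph (in absolute numbers).
train, val, test = 671_300, 194_700, 214_100
labels_total = 1_080_100       # total bounding-box labels
labeled_timestamps = 276_000   # timestamps carrying at least one label

# The three splits account for every label.
assert train + val + test == labels_total

# On average, each labeled timestamp carries roughly 3.9 bounding boxes.
avg_labels_per_timestamp = labels_total / labeled_timestamps
print(round(avg_labels_per_timestamp, 2))  # 3.91
```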
- DVS events, APS frames, and the corresponding annotation results can only be used for ACADEMIC PURPOSES. No COMMERCIAL USE is allowed.
- Copyright © National Engineering Research Center for Visual Technology and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.
You can download directly from here.
Address: Room 2604, Science Building No. 2, Peking University, No. 5 Yiheyuan Road, Haidian District, Beijing, P.R. China.