The vehicle is equipped with four LiDAR scanners, two on each side of the roof, with a roll angle of 45° between the two scanners on each side.
The cameras are mounted on top of our vehicle in two stereo pairs. The left pair is mounted on an independent bar rotated by 30° to capture the incoming road from the left, while the right pair faces directly forward. We arranged the cameras with baselines of 0.33 m and 0.27 m for the left and right stereo pairs, respectively. We triggered the cameras via a trigger signal emitted when the second camera from the left started exposing its sensor. We recorded the timestamp of each image using the cameras' internal clocks, which were synchronized via the IEEE 1588-2008 PTP protocol. We also synchronized the computer timestamp to the camera clocks using the same method.
Using these timestamps, we associate LiDAR returns with the camera frame during which they were captured.
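The association can be sketched as selecting the LiDAR returns whose timestamps fall within a window around a camera frame's timestamp. The window half-width and the timestamp values below are hypothetical, chosen only for illustration:

```python
import numpy as np

def points_in_frame(lidar_ts, frame_ts, half_window):
    """Boolean mask of LiDAR returns whose timestamps fall within
    +/- half_window seconds of a camera frame timestamp."""
    return np.abs(lidar_ts - frame_ts) <= half_window

# Hypothetical timestamps in seconds: five LiDAR returns, one camera frame.
lidar_ts = np.array([0.00, 0.02, 0.05, 0.09, 0.12])
frame_ts = 0.05
mask = points_in_frame(lidar_ts, frame_ts, half_window=0.05)
print(mask.tolist())  # [True, True, True, True, False]
```

In practice the window would be tied to the scanner rotation period and the camera exposure time; clock synchronization via PTP is what makes a direct timestamp comparison like this meaningful.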
The camera-to-camera calibration was performed for each stereo pair separately using the Camera Calibration Toolbox from MATLAB. To determine the camera-to-LiDAR calibration, we used 50 manually selected constraints between 3D LiDAR points and 2D pixel locations in both the left and right images, and used the Levenberg-Marquardt algorithm to minimize the projection error. The geometry of the sensor setup was used to compute the initial guess for the rigid-body transformation. Prominent edges, such as building corners, stop-sign poles, and electric poles, can be easily identified in both the point cloud and the image, and were used for manually choosing the correspondences.
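The extrinsic refinement step can be sketched as a nonlinear least-squares problem: parameterize the rigid-body transform as a rotation vector plus translation, project the 3D LiDAR points through the camera intrinsics, and minimize the pixel reprojection residuals with Levenberg-Marquardt. The intrinsic matrix, point cloud, and ground-truth pose below are synthetic, not the dataset's actual values:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(params, pts3d, K):
    """Project 3D points into pixels given pose params = [rvec | t]."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    cam = pts3d @ R.T + params[3:]      # LiDAR frame -> camera frame
    proj = cam @ K.T                    # apply pinhole intrinsics
    return proj[:, :2] / proj[:, 2:3]   # perspective divide

def residuals(params, pts3d, pix, K):
    """Stacked reprojection errors for all 3D-2D correspondences."""
    return (project(params, pts3d, K) - pix).ravel()

# Hypothetical intrinsics and 50 synthetic correspondences.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
pts3d = rng.uniform([-2, -2, 5], [2, 2, 20], size=(50, 3))
true_pose = np.array([0.05, -0.02, 0.01, 0.1, -0.2, 0.3])
pix = project(true_pose, pts3d, K)      # simulated pixel observations

init = np.zeros(6)   # in the paper, the initial guess comes from the sensor geometry
sol = least_squares(residuals, init, args=(pts3d, pix, K), method="lm")
print(np.allclose(sol.x, true_pose, atol=1e-3))  # True
```

With noiseless correspondences the solver recovers the simulated pose; with 50 hand-picked correspondences and real measurement noise, the recovered transform minimizes the average reprojection error instead.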
The capture sites and times were selected to maximize the amount of traffic, the complexity of crossing patterns, and the variation in lighting and weather. To capture complex interactions between pedestrians and vehicles, we focused on four-way stop intersections without any traffic signals. Three intersections were selected around a downtown area where the pedestrian-camera distance ranges from 5 to 40 m. Lighting conditions vary with cloud cover and the shadows cast by buildings. We manually selected interesting sequences of captured frames for annotation based on the observed activity of pedestrians or pedestrian-vehicle interactions.
The raw 12-bit Bayer images were converted into compressed PNG/JPEG image formats. We compressed the raw images into 16-bit PNG files to preserve their high dynamic range. Due to the large file size, however, the 16-bit PNG images are currently not available for download. Please contact us if you need those images.
For the final release, the raw images were processed with JPEG compression at a quality level of 90. We provide both original and rectified images on the Downloads page. Gamma correction was applied when rectifying the images.
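The conversion from 12-bit linear raw data to 8-bit gamma-corrected output can be sketched as below. The gamma value (1/2.2) is an assumption for illustration, not necessarily the value used for the dataset:

```python
import numpy as np

def tonemap(raw12, gamma=1.0 / 2.2):
    """Map a 12-bit linear image to 8-bit with gamma correction,
    a sketch of the kind of processing applied before JPEG encoding.
    The gamma exponent here is assumed, not taken from the dataset."""
    lin = raw12.astype(np.float64) / 4095.0    # normalize 12-bit range
    out = np.clip(lin, 0.0, 1.0) ** gamma      # gamma-encode
    return np.round(out * 255.0).astype(np.uint8)

img = np.array([[0, 2048, 4095]], dtype=np.uint16)
print(tonemap(img))  # black and white map to 0 and 255; mid-gray is lifted
```

Gamma encoding lifts mid-tones before quantizing to 8 bits, which is why mid-range 12-bit values land well above 128 in the output.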
The histograms below show the distribution of pedestrian distances. The first two histograms plot the distances between pedestrians and the camera centers. Note that the majority of pedestrians fall within the 20-35 m range.
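The distance computation behind these histograms can be sketched as the Euclidean norm from each pedestrian position to the camera center, binned over the capture range. The positions and bin edges below are hypothetical:

```python
import numpy as np

# Hypothetical pedestrian positions (x, y, z) in the camera frame, metres.
peds = np.array([[ 3.0, 0.0, 22.0],
                 [-5.0, 0.5, 30.0],
                 [ 1.0, 0.2,  8.0],
                 [ 0.0, 0.0, 34.0]])
dist = np.linalg.norm(peds, axis=1)            # distance to the camera center
counts, edges = np.histogram(dist, bins=[5, 20, 35, 40])
print(counts)  # pedestrians per distance bin: [1 3 0]
```

With the dataset's annotations the same binning over 5-40 m produces the plotted histograms, with most mass in the 20-35 m bin.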
The distribution of pedestrian body orientation relative to the world reference frame is shown as a polar histogram. A pedestrian heading straight toward our recording vehicle corresponds to 270°, while pedestrians walking away from the vehicle correspond to approximately 90°.
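Under the stated convention, an orientation vector can be mapped to a heading angle in [0°, 360°) with a standard atan2. This assumes the angle is measured counter-clockwise from the world +x axis and that "toward the vehicle" points along -y; those frame assumptions are ours, for illustration only:

```python
import numpy as np

def heading_deg(vx, vy):
    """Heading angle in [0, 360) degrees from a 2D orientation vector,
    measured counter-clockwise from the +x axis (assumed convention)."""
    return np.degrees(np.arctan2(vy, vx)) % 360.0

# A pedestrian walking straight toward the vehicle (assumed along -y):
print(heading_deg(0.0, -1.0))  # 270.0
# A pedestrian walking away from the vehicle (assumed along +y):
print(heading_deg(0.0, 1.0))   # 90.0
```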