Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset


Yifu Tao²    Jianeng Wang²    Maurice Fallon²    Victor Adrian Prisacariu*¹

*Equal contribution. Authors are displayed in random order.
¹Active Vision Lab     ²Dynamic Robot Systems Group

University of Oxford


TLDR: A large-scale egocentric dataset with ground-truth 3D geometry and lighting variation for benchmarking NVS and visual relocalization. Easy to use with HLoc, NerfStudio, and 3DGS.

Overview of the Oxford Day-and-Night Dataset at the example scene Bodleian. Our dataset captures egocentric sequences across five locations in Oxford under diverse lighting conditions using Meta ARIA glasses. Top-left: Sample fisheye camera views across day and night recordings. Bottom-left: Multi-session SLAM points aligned with high-quality laser ground truth. Right: Multi-session SLAM trajectories visualized on a satellite map, demonstrating consistent camera tracking across varying times of day. The dataset enables challenging benchmarks for novel view synthesis and visual relocalization under extreme illumination changes.


Abstract

We introduce Oxford Day-and-Night, a large-scale egocentric dataset for novel view synthesis (NVS) and visual relocalization under challenging lighting conditions. Existing datasets often lack crucial combinations of features such as ground-truth 3D geometry, wide-ranging lighting variation, and full 6DoF motion. Oxford Day-and-Night addresses these gaps by leveraging Meta ARIA glasses to capture egocentric video and applying multi-session SLAM to estimate camera poses, reconstruct 3D point clouds, and align sequences captured under varying lighting conditions, including both day and night. The dataset spans over 30 km of recorded trajectories and covers an area of 40,000 m², offering a rich foundation for egocentric 3D vision research. It supports two core benchmarks, NVS and relocalization, providing a unique platform for evaluating models in realistic and diverse environments.

Challenge at Night: Example frames captured under different lighting conditions. The severe degradation in visual quality from day to night highlights the difficulty of consistent scene understanding, posing significant challenges for egocentric 3D vision.

Dataset Collection and Processing

We collect data using Meta ARIA glasses, which record raw sensor streams including IMU, RGB, and grayscale video. To capture varied lighting conditions (day, dusk, and night), sessions are recorded between 4 pm and 10 pm, covering the natural transition from light to dark. Two individuals wear the glasses and walk casually at each site. Recordings are grouped by location and processed with Meta's multi-session Machine Perception Service (MPS), which estimates per-frame camera poses and semi-dense point clouds unified in a common coordinate frame.
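Since the MPS outputs follow the standard ARIA formats, they can be parsed with Meta's open-source projectaria_tools package. Below is a minimal sketch, assuming hypothetical file paths; reader signatures may differ slightly across projectaria_tools versions:

```python
# A minimal sketch of reading MPS outputs with projectaria_tools;
# the "mps_output/..." paths are hypothetical placeholders.
from projectaria_tools.core import mps

# Per-frame 6DoF poses from the closed-loop (multi-session) trajectory.
trajectory = mps.read_closed_loop_trajectory(
    "mps_output/closed_loop_trajectory.csv")
for pose in trajectory[:5]:
    T_world_device = pose.transform_world_device  # SE(3) device pose
    print(pose.tracking_timestamp, T_world_device.translation())

# Semi-dense point cloud expressed in the same world frame.
points = mps.read_global_point_cloud("mps_output/semidense_points.csv.gz")
# Keep only well-constrained points (thresholds suggested in the Aria docs).
good = [p.position_world for p in points
        if p.inverse_distance_std < 0.001 and p.distance_std < 0.15]
```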

Data Collection and Processing Pipeline. At a collection site, our pipeline starts with a) capturing 2–10 minute videos using ARIA glasses under varying lighting conditions. These multi-session recordings are processed using b) the MPS SLAM system to generate point clouds and camera trajectories in a unified coordinate frame. The colors of the points and trajectories represent different recording sessions. c) Leveraging the ARIA data and MPS outputs, we construct two dataset variants for NVS and visual relocalization tasks. Example scene: Observatory Quarter.

Applications

We demonstrate the dataset with two applications: visual relocalization and novel view synthesis.

Novel View Synthesis (NVS)

We create a dataset variant for NVS tasks by temporally subsampling the videos by 5×. Camera poses and intrinsics are provided by the ARIA MPS service. We provide both fisheye and undistorted images. This variant is easy to use with NerfStudio and 3DGS.
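For reference, fisheye frames can be rectified with projectaria_tools' calibration utilities. Below is a minimal sketch, assuming a hypothetical recording path and target resolution; the released variant already ships pre-undistorted images, so this step is optional:

```python
# A minimal sketch of undistorting an ARIA fisheye RGB frame to a pinhole
# image with projectaria_tools; "recording.vrs" and the 512x512 target
# are hypothetical placeholders.
from projectaria_tools.core import data_provider, calibration

provider = data_provider.create_vrs_data_provider("recording.vrs")
stream_id = provider.get_stream_id_from_label("camera-rgb")
fisheye_calib = provider.get_device_calibration().get_camera_calib("camera-rgb")

# Target pinhole model: 512x512 image with a 280-pixel focal length.
pinhole = calibration.get_linear_camera_calibration(512, 512, 280)

record = provider.get_image_data_by_index(stream_id, 0)
raw = record[0].to_numpy_array()
undistorted = calibration.distort_by_calibration(raw, pinhole, fisheye_calib)
```

The resulting pinhole images and MPS poses can then be fed to standard NerfStudio pipelines (for example, its splatfacto 3DGS trainer).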

3DGS in the Wild Example. Our dataset provides a unique combination of large-scale coverage, both day and night lighting conditions, and an egocentric perspective for training 3DGS in the wild.

Visual Relocalization

We create a dataset variant for visual relocalization tasks by further spatially subsampling the NVS variant and splitting the images into a database, daytime queries, and nighttime queries.

Visual Relocalization Example. We provide a database (white) consisting of daytime images, along with daytime (orange) and nighttime (blue) queries. We convert the ARIA MPS results to the COLMAP format to facilitate use with HLoc. Our dataset features 7,197 night queries, 37× more than Aachen Day-Night (191 in total).
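Because the data is in COLMAP format, a standard HLoc pipeline applies directly. The sketch below is modeled on HLoc's Aachen Day-Night example; all paths, the directory layout ("images", "sfm"), the query/database prefixes, and the "queries_night.txt" intrinsics file are hypothetical placeholders:

```python
# A minimal sketch of localising night queries with HLoc; paths and
# file names are hypothetical placeholders.
from pathlib import Path
from hloc import (extract_features, match_features,
                  pairs_from_retrieval, localize_sfm)

dataset = Path("oxford-day-and-night/bodleian")
outputs = Path("outputs/bodleian")

# Global descriptors for retrieval, local features for matching.
retrieval_conf = extract_features.confs["netvlad"]
feature_conf = extract_features.confs["superpoint_aachen"]
matcher_conf = match_features.confs["superglue"]

global_descs = extract_features.main(retrieval_conf, dataset / "images", outputs)
features = extract_features.main(feature_conf, dataset / "images", outputs)

# Retrieve the 20 most similar database images for each night query.
pairs = outputs / "pairs-netvlad.txt"
pairs_from_retrieval.main(global_descs, pairs, num_matched=20,
                          query_prefix="query_night", db_prefix="database")
matches = match_features.main(matcher_conf, pairs, feature_conf["output"], outputs)

# Localise against the COLMAP model converted from the ARIA MPS results.
localize_sfm.main(dataset / "sfm", dataset / "queries_night.txt",
                  pairs, features, matches, outputs / "results.txt")
```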

Acknowledgement

This research is supported by multiple funding sources, including an ARIA research gift grant from Meta Reality Labs, a Royal Society University Research Fellowship (Fallon), the EPSRC C2C Grant EP/Z531212/1 (TRO), and a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) under grant number RS-2024-00461409.