SceneHub: A Dataset and Evaluation Framework for 6-DoF 4D Scenes

1Carnegie Mellon University, 2Northeastern University
Under Review

⚠️ The “Dataset” link above includes all components but, for easier access, only a subset of the RGB-D data (100 frames per scene). For full dataset access, refer to the Dataset README.

SceneHub Teaser Figure

The dataset features: (1) long, unbounded RGB-D sequences with high-quality background geometry;
(2) multiple 3D representations.

Supported Features

Camera Pose · Per-View RGB-D · Point Cloud · Mesh · Gaussian Splat · Photogrammetry Backdrop · Multi-Person · Interactive Objects · Full Scene · Metric Software Suite

Teaser Video

Abstract

We present a new dataset and evaluation framework for benchmarking 6-DoF 4D volumetric scenes. Our dataset captures long, dynamic sequences across diverse real-world indoor environments, with synchronized multi-view RGB-D streams, calibrated camera poses, and high-resolution background geometry reconstructed via photogrammetry and LiDAR. We provide a unified representation suite including point clouds, textured meshes, and Gaussian splats, along with tools for format conversion, rendering, and metric evaluation. To support structured comparison and perceptual analysis, we introduce novel metrics such as Geometry Complexity Score (GCS), SSIM-aware GCS, and Volumetric Temporal Information (V-TI). These components enable detailed characterization of spatial and temporal complexity, and facilitate benchmarking for tasks such as compression, view synthesis, and scene-aware rendering. Our dataset bridges gaps in scale, quality, and realism found in existing benchmarks, providing a comprehensive foundation for immersive 3D research.

Dataset Summary

Dataset Summary Figure

Table 1: Dataset description with scene size, actor count, frame count, triangle density, GCS, SSIM-aware GCS (SSIM ≥ 0.98), Depth SI, and V-TI.

Metric Description
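Exact formulations of GCS, SSIM-aware GCS, and V-TI are given in the paper. Purely as an illustration of the ingredients involved, the sketch below shows the SSIM gate implied by the SSIM ≥ 0.98 threshold and the classic ITU-T P.910 temporal-information statistic, which the name V-TI suggests is generalized to volumetric data; the function names and image layouts here are assumptions, not the official implementation.

```python
# Unofficial sketch of two ingredients behind the metrics above; the exact
# definitions of GCS, SSIM-aware GCS, and V-TI are given in the paper.
import numpy as np
from skimage.metrics import structural_similarity as ssim

SSIM_THRESHOLD = 0.98  # the threshold named by SSIM-aware GCS


def passes_ssim_gate(reference: np.ndarray, rendered: np.ndarray) -> bool:
    """True if a rendered view stays perceptually faithful to the reference.

    Both inputs are HxWx3 uint8 images at the same resolution.
    """
    return ssim(reference, rendered, channel_axis=2) >= SSIM_THRESHOLD


def temporal_information(luma_frames: list[np.ndarray]) -> float:
    """ITU-T P.910 TI: max over time of the spatial std of frame differences.

    V-TI presumably lifts this idea from 2D luminance video to volumetric
    data; see the paper for its exact definition.
    """
    diffs = np.stack([b.astype(np.float32) - a.astype(np.float32)
                      for a, b in zip(luma_frames, luma_frames[1:])])
    return float(diffs.std(axis=(1, 2)).max())
```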

3D Scene Viewers

⚠️ The table shows example visualizations for scene ID = 0 (a static scene) from our dataset. This demo uses the ARENA, which runs in your web browser to render interactive 3D content. You can log in anonymously or with your Google account. Loading may take a few seconds depending on your network and browser performance.

Scene | Point Cloud | Mesh | 3D Gaussian Splat (3DGS) | Photogrammetry | 3DGS (High-resolution)
Lab area | View | View | View | View | View
Couch | View | View | View | View | View
Kitchen | View | View | View | View | View
Whiteboard | View | View | View | View | View
Factory | View | View | View | View | View

Raw data size comparison across 3D representations

Raw Data Size Comparison Figure

Table 2: Raw data size comparison across 3D representations for one temporal frame t (geometry and color components).
N_cam, P, V, T, and G denote the number of camera views, points, vertices, triangles, and Gaussians, respectively.
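For intuition, the sketch below converts the table's symbols into back-of-envelope byte counts. The per-element layouts (float32 positions, 8-bit color, 16-bit depth, the standard 3DGS PLY attribute set) are common defaults and should be read as assumptions, not necessarily the exact accounting used in the table.

```python
# Back-of-envelope raw sizes for one temporal frame, using the Table 2
# symbols (N_cam, P, V, T, G). Layouts below are common defaults and are
# assumptions, not the table's exact accounting.

def rgbd_bytes(n_cam: int, width: int, height: int) -> int:
    # Per view and pixel: 3-byte RGB color + 2-byte (16-bit) depth.
    return n_cam * width * height * (3 + 2)

def point_cloud_bytes(p: int) -> int:
    # Per point: float32 xyz (12 B) + uint8 RGB (3 B).
    return p * (12 + 3)

def mesh_bytes(v: int, t: int) -> int:
    # Per vertex: float32 xyz + uv (20 B); per triangle: three uint32
    # indices (12 B). Texture atlas bytes are omitted here.
    return v * 20 + t * 12

def gaussian_splat_bytes(g: int) -> int:
    # Standard 3DGS PLY attributes: xyz (3) + normals (3) + SH color (48)
    # + opacity (1) + scale (3) + rotation (4) = 62 float32 values.
    return g * 62 * 4
```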

Camera Pose Variants

Each scene includes calibrated original camera extrinsics along with three types of virtual camera poses to enable view-aware evaluation.

  • Shifted views: Slight offsets from the original pose (up/down/left/right).
  • Interpolated views: Midpoints between camera pairs using Slerp and linear translation.
  • Random views: Uniformly sampled within the scene volume, looking at the center.

These variants support evaluation of 3D reconstruction fidelity under diverse viewpoint conditions; a minimal sketch of generating each variant is shown below.
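The sketch assumes 4×4 camera-to-world pose matrices, an OpenGL-style camera (looking down −Z with world +Z up), and SciPy for quaternion Slerp; it illustrates the three variants rather than reproducing the dataset's exact tooling.

```python
# Minimal sketch of the three virtual-pose variants, assuming 4x4
# camera-to-world matrices; illustrative only, not the dataset's tooling.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp


def shifted_view(pose: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Offset the camera position slightly (e.g., up/down/left/right)."""
    out = pose.copy()
    out[:3, 3] += offset
    return out


def interpolated_view(pose_a: np.ndarray, pose_b: np.ndarray) -> np.ndarray:
    """Midpoint between two cameras: Slerp rotation, lerp translation."""
    rots = Rotation.from_matrix(np.stack([pose_a[:3, :3], pose_b[:3, :3]]))
    out = np.eye(4)
    out[:3, :3] = Slerp([0.0, 1.0], rots)(0.5).as_matrix()
    out[:3, 3] = 0.5 * (pose_a[:3, 3] + pose_b[:3, 3])
    return out


def random_view(bounds_min, bounds_max, center, rng=None):
    """Sample a position uniformly in the scene volume, looking at `center`."""
    rng = rng or np.random.default_rng()
    eye = rng.uniform(bounds_min, bounds_max)
    forward = center - eye
    forward /= np.linalg.norm(forward)
    # Assumes the view direction is never parallel to the world up axis.
    right = np.cross(forward, [0.0, 0.0, 1.0])
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    out = np.eye(4)
    out[:3, :3] = np.column_stack([right, up, -forward])  # camera axes
    out[:3, 3] = eye
    return out
```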

Figure: Camera pose visualization in the capture room.
Original (cam1–4), Interpolated, Shifted, and Random viewpoints.

Videos by Scene

BibTeX

BibTeX code here