Stanford-TRI Intent Prediction (STIP) Dataset

Dataset Details

Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction, IEEE RA-L & ICRA 2020 (arXiv)
STIP contains over 900 hours of driving-scene video recorded by front, right, and left cameras while the vehicle drove through dense areas of five cities in the United States. The videos were annotated at 2 fps with pedestrian bounding boxes and crossing/not-crossing labels, drawn as green/red boxes respectively in the project's example videos. We used the JRMOT (JackRabbot real-time Multi-Object Tracker) platform to track the pedestrians and interpolate the annotations to the full 20 frames per second. See https://github.com/StanfordVL/STIP/ for details.
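Going from 2 fps annotations to 20 fps video means one labeled keyframe every 10 frames, with 9 unlabeled frames in between. The sketch below fills those gaps with plain linear interpolation of box corners. This is only an illustration of the densification step: STIP itself uses JRMOT tracking, not this simple scheme, and the box format `(x1, y1, x2, y2)`, the constants, and the function name `interpolate_boxes` are hypothetical and not part of the dataset's tooling.

```python
from typing import Dict, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]

ANNOT_FPS = 2   # annotation rate stated for STIP
VIDEO_FPS = 20  # video frame rate stated for STIP
STEP = VIDEO_FPS // ANNOT_FPS  # 10 video frames between consecutive keyframes


def interpolate_boxes(keyframes: Dict[int, Box]) -> Dict[int, Box]:
    """Linearly interpolate one pedestrian's box between annotated keyframes.

    `keyframes` maps a 20 fps frame index to the box annotated at that frame.
    Returns a box for every frame from the first keyframe to the last one.
    Simple linear interpolation only; this is NOT the JRMOT tracker.
    """
    frames = sorted(keyframes)
    dense: Dict[int, Box] = {}
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        span = end - start
        for f in range(start, end):
            t = (f - start) / span  # fraction of the way from start to end
            dense[f] = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
    if frames:
        # The final keyframe has no successor, so copy it unchanged.
        dense[frames[-1]] = keyframes[frames[-1]]
    return dense


if __name__ == "__main__":
    # Two keyframes one annotation step (10 frames) apart.
    boxes = interpolate_boxes({0: (100.0, 200.0, 140.0, 300.0),
                               STEP: (110.0, 200.0, 150.0, 300.0)})
    print(len(boxes), boxes[5])
```

With keyframes at frames 0 and 10, the result covers 11 frames, and frame 5 sits halfway between the two boxes. A tracker such as JRMOT can also follow motion that is not linear between keyframes, which is one reason a tracker is preferable to this naive approach.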

BibTeX

@inproceedings{liu2020spatiotemporal,
title={Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction},
author={Bingbin Liu and Ehsan Adeli and Zhangjie Cao and Kuan-Hui Lee and Abhijeet Shenoi and Adrien Gaidon and Juan Carlos Niebles},
year={2020},
booktitle={IEEE Robotics and Automation Letters (IEEE RA-L) and International Conference on Robotics and Automation (ICRA)},
publisher={IEEE}
}


Team

Stanford

TRI
