Human-Object Interaction (HOI) detection is an essential task that enables many human-centric activity recognition, visual question answering, and robotic learning tasks. Existing datasets such as HICO-DET, however, suffer from class imbalance and a lack of diversity, which limits their ability to evaluate detection models reliably. To overcome these limitations, this work introduces a balanced external test dataset called LAIT-HOI (Lighten All the Important Triplets in Human Object Interaction), which aims to improve class distributions and cover a variety of interaction scenarios. Quantitative evaluations show that baseline models change in ranking on rare categories, reflecting their ability to generalize. A user study involving 48 participants further compared LAIT-HOI to HICO-DET across four evaluation categories: human pose diversity, object appearance diversity, scene capture angle diversity, and prompt alignment. Our findings demonstrate that LAIT-HOI performs better than the original HICO-DET baseline across these categories, with images that align more closely with interaction prompts and show greater variety in visual context. Additionally, this research provides an extensive literature review of HOI detection methodologies from 2010 to 2024 to give context for the design and evaluation of such balanced datasets. Together, the quantitative analysis, user feedback, and historical insights highlight the need for balanced datasets in resolving current challenges and improving the evaluation of HOI detection models.
Publisher
Ulsan National Institute of Science and Technology