MediaPipe Hand Crop Fix
Fixes limitations in hand Region of Interest prediction in MediaPipe Holistic
This project addresses a weakness in MediaPipe Holistic's hand Region of Interest (ROI) prediction: when the hand is in a non-ideal orientation the predicted ROI degrades, which directly reduces the accuracy of sign language recognition. The existing method estimates the hand ROI from three hand keypoints (wrist, index, pinky) and falls short when the hand is not parallel to the camera.
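To make the failure mode concrete, here is a minimal sketch of a three-keypoint ROI heuristic of the kind described above. It is a simplified illustration, not MediaPipe's actual implementation: the function name `roi_from_three_keypoints`, the 2:1 index/pinky weighting, the `scale` factor of 2.0, and the rotation convention are all assumptions chosen for the example.

```python
import math


def roi_from_three_keypoints(wrist, index, pinky, scale=2.0):
    """Estimate a square, rotated hand ROI from 2D wrist, index and pinky points.

    Points are (x, y) tuples in pixel coordinates with y pointing down.
    Returns (center_x, center_y, side_length, rotation_radians).
    The weighting and scale below are illustrative assumptions.
    """
    # Approximate the middle-finger base as a weighted mean of index and pinky.
    mx = (2.0 * index[0] + pinky[0]) / 3.0
    my = (2.0 * index[1] + pinky[1]) / 3.0

    # The box side is proportional to the wrist-to-knuckle distance
    # as projected onto the image plane.
    size = scale * math.hypot(mx - wrist[0], my - wrist[1])

    # Rotation that would turn the wrist-to-knuckle direction upright.
    angle = math.pi / 2 - math.atan2(-(my - wrist[1]), mx - wrist[0])
    # Wrap the angle into [-pi, pi).
    angle = (angle + math.pi) % (2 * math.pi) - math.pi

    return mx, my, size, angle


# Hand roughly parallel to the camera: knuckles 100 px above the wrist.
flat = roi_from_three_keypoints((100, 200), (100, 100), (100, 100))

# Same hand pointing toward the camera: the projected distance shrinks to 20 px.
foreshortened = roi_from_three_keypoints((100, 200), (100, 180), (100, 180))

print(flat[2], foreshortened[2])
```

Because the box size depends only on a projected 2D distance, the ROI shrinks when the hand points toward or away from the camera even though the physical hand size is unchanged, so the crop can cut off the fingers.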
To address this limitation, the project takes a data-driven approach with an enriched feature set: additional keypoints (shoulder, elbow, and thumb) along with the z-dimension. These extra inputs are intended to make ROI estimation more robust, which is crucial for accurate hand keypoint detection across varied orientations and movements.
The method uses the Panoptic Hand DB dataset, which contains manually annotated 2D hand poses, for both training and testing, allowing a thorough analysis of hand ROI predictions. The goal is a Kolmogorov-Arnold Network (KAN) that predicts the ROI parameters from all six normalized right-side body keypoints (shoulder, elbow, wrist, thumb, index, pinky) and the image aspect ratio.
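As a rough illustration of the network input described above, the sketch below assembles one feature vector from six keypoints and the image aspect ratio. The helper name `build_features`, the keypoint ordering, the use of (x, y, z) for every keypoint, and the definition of aspect ratio as width divided by height are assumptions made for the example; the project's actual preprocessing may differ.

```python
# Hypothetical ordering of the six right-side keypoints.
KEYPOINT_ORDER = ("shoulder", "elbow", "wrist", "thumb", "index", "pinky")


def build_features(keypoints, image_width, image_height):
    """Flatten normalized (x, y, z) keypoints and append the aspect ratio.

    `keypoints` maps each name in KEYPOINT_ORDER to an (x, y, z) tuple
    that is already normalized to the image. Returns a flat list of floats.
    """
    features = []
    for name in KEYPOINT_ORDER:
        x, y, z = keypoints[name]
        features.extend((float(x), float(y), float(z)))
    # Aspect ratio taken as width / height (an assumption for this sketch).
    features.append(image_width / image_height)
    return features


# Made-up keypoint values, used only to show the resulting shape.
example_keypoints = {
    "shoulder": (0.40, 0.30, -0.10),
    "elbow": (0.45, 0.50, -0.20),
    "wrist": (0.50, 0.65, -0.35),
    "thumb": (0.52, 0.70, -0.40),
    "index": (0.50, 0.73, -0.42),
    "pinky": (0.47, 0.72, -0.38),
}

features = build_features(example_keypoints, image_width=640, image_height=480)
print(len(features))
```

Under these assumptions the vector has 6 x 3 + 1 = 19 entries, which would be the input dimension of the ROI-predicting network.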