Project Overview
Objective: Develop an object detection system using Landing.ai that determines whether a person is focused on the camera during video calls on platforms such as Zoom. The goal is to enhance the user experience by providing feedback on camera focus, improving engagement and communication effectiveness.
Methodology:
- Dataset Preparation:
  - Collected images of people facing the camera and labeled them as "Facing Camera".
  - Collected images of people facing away from the camera and labeled them as "Facing Away".
  - Included images with varying angles and lighting conditions to enhance model robustness.
  - Split the dataset into training and testing sets for model evaluation.
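The split step above can be sketched in plain Python. The file names, labels, and 80/20 ratio below are illustrative assumptions, not the project's actual data; Landing.ai's platform handles this internally, so this is only a sketch of the idea:

```python
import random

def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle labeled samples and split them into train and test lists."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled samples: (image_path, label) pairs
samples = [(f"img_{i}.jpg", "Facing Camera" if i % 2 == 0 else "Facing Away")
           for i in range(100)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 80 20
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing models trained on the same data.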
- Model Training:
  - Used the Landing.ai platform to train the model.
  - Selected the RTMDet architecture (9 million parameters) for fast training and inference.
  - Trained for 100 epochs, leveraging Landing.ai's cloud GPU resources.
  - Used the default hyperparameter settings provided by the Landing.ai platform.
- Model Evaluation:
  - Tested the trained model using a laptop camera, capturing images at various angles.
  - The model correctly predicted focus in nearly all test images.
  - Observed 100% accuracy on both the training and validation sets; given the small dataset, this suggests possible overfitting.
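The accuracy figures reported above reduce to a simple correct-over-total ratio. A minimal sketch, using made-up predictions and labels purely for illustration:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    assert len(predictions) == len(labels), "prediction/label count mismatch"
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical model outputs vs. ground truth for four test images
preds = ["Facing Camera", "Facing Away", "Facing Camera", "Facing Camera"]
truth = ["Facing Camera", "Facing Away", "Facing Away",   "Facing Camera"]
print(accuracy(preds, truth))  # 0.75
```

With only a handful of test images, a single misclassification moves this metric by a large step, which is one reason 100% on a small dataset is a weak signal of generalization.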
- Future Improvements:
  - Expand the dataset to include more diverse images for better generalization.
  - Perform more extensive hyperparameter tuning to optimize model performance.
  - Integrate the system into video conferencing applications to provide real-time feedback on camera focus.
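A real-time integration like the one proposed could map each detection to a short user-facing message. The sketch below is hypothetical: the function name, label strings, and confidence threshold are assumptions for illustration, not part of Landing.ai's API:

```python
def focus_feedback(label, confidence, threshold=0.6):
    """Map a single detection (label + confidence) to a feedback message.

    Low-confidence detections are treated as inconclusive rather than
    forced into one of the two classes.
    """
    if confidence < threshold:
        return "Focus unclear: check lighting and camera angle"
    if label == "Facing Camera":
        return "You appear focused on the camera"
    return "You appear to be looking away"

# Example: a confident "Facing Away" detection
print(focus_feedback("Facing Away", 0.92))
```

Treating low-confidence frames as inconclusive avoids flickering feedback when the model is unsure, which matters in a live video-call overlay.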
Model Architecture Details from Landing.ai:
- RtmDet-[9M]: Fastest training and inference times with 9 million parameters.
- RepPoints-[20M]: Faster training and inference than RepPoints-[37M] with 20 million parameters.
- RepPoints-[37M]: Captures complex patterns with 37 million parameters, though slower in training and inference.
Conclusion: The developed system successfully determines if a person is focused on the camera, demonstrating the potential to improve video call experiences. With further dataset expansion and model tuning, the system can be integrated into video conferencing tools for enhanced user engagement and communication.
High-Level Overview
Objective: To create an object detection system that identifies whether a person is focused on the camera during video calls using Landing.ai.
Approach:
- Data Collection: Gathered and labeled images of people facing the camera and facing away.
- Model Training: Used Landing.ai's platform with the RTMDet architecture, training for 100 epochs with default hyperparameter settings.
- Evaluation: Tested the model with a laptop camera and a phone camera, achieving high accuracy in detecting focus.
Outcome: The system accurately detects camera focus, with 100% accuracy on both training and validation datasets. Future work includes expanding the dataset and fine-tuning the model for broader application.