Projects

Showcase of My Work



Real-Time Predictor in Two-Players Fighting Game via Vision Transformer
First Author, LNCS Vol. 15046 (Springer) · Presented at ISVC 2024

Vision Transformer-based predictor for competitive two-player fighting games.


ArcadeViT is a real-time outcome predictor for two-player fighting games built on Vision Transformers. Trained on 693 Street Fighter II matches and 274,000 gameplay frames, it fuses visual cues with health bar trajectories to deliver mid-round predictions with up to 96% ROC-AUC. With inference under one second and open-source benchmarks, ArcadeViT enables predictive overlays that enhance esports viewership and lays the groundwork for broader applications in video understanding and behavioral analytics.
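For readers unfamiliar with the ROC-AUC metric quoted above, here is a minimal sketch of how it can be computed. It is not ArcadeViT's evaluation code. The function name `roc_auc`, the label convention (1 means player 1 wins), and the example scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    Ties count as half a win. Hypothetical helper, O(P * N) pairwise form."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = player 1 won the round, scores are
# made-up mid-round win probabilities for player 1.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.3, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

A score of 1.0 means every winning round was ranked above every losing round, and 0.5 corresponds to random guessing.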


Pomeranian Activity Classification
Project for Stanford's CS221 online course

CNN-based system for classifying pet behavior from home surveillance footage.


Built on a small but diverse dataset labeled Eat, Play, or Sleep, the model fine-tunes a pretrained ResNet18 to recognize discrete behaviors from static RGB frames. To generalize despite limited data, the training pipeline uses data augmentation and weighted sampling for class balance. After 10 epochs, the model achieved 97.8% accuracy and a macro F1-score of 0.97, outperforming a motion-based baseline by over 60%. These results highlight the potential of lightweight CNNs for real-time, camera-based pet monitoring without wearable devices or temporal modeling.
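The two ideas named above, weighted sampling for class balance and the macro F1-score, can be sketched in plain Python. This is an illustration under assumptions, not the project's training code. The helper names `sample_weights` and `macro_f1` and the toy label counts are hypothetical. Inverse-frequency weights of this form are the kind one could pass to a sampler such as PyTorch's `WeightedRandomSampler`, though the source does not say which sampler was used.

```python
from collections import Counter


def sample_weights(labels):
    """Inverse-frequency weight per sample, so that each class carries
    the same total sampling weight regardless of its size."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so rare classes count
    as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy, made-up class distribution: Sleep dominates, Play is rare.
toy_labels = ["Sleep"] * 6 + ["Eat"] * 3 + ["Play"] * 1
toy_weights = sample_weights(toy_labels)
```

With these weights, the single Play frame is as likely to be drawn as all six Sleep frames combined, which is what keeps a small, imbalanced dataset from biasing the classifier toward the majority class.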


Pocket Racer
Manuscript under review at IEEE Transactions on Intelligent Vehicles.

An accessible autonomous remote control car platform designed for AI education.


Pocket Racer is a compact, low-cost autonomous racing platform designed to make robotics and AI education accessible and engaging. Built on a 1:28 scale RC chassis and powered by a Raspberry Pi Zero 2 W, the system performs overtaking at speeds up to 16 km/h using only a monocular camera and on-board computation. The RaceViT algorithm, a modified Vision Transformer, converts horizontal image slices into steering commands for end-to-end control. Designed for indoor classrooms and hallways, Pocket Racer needs no GPS or LiDAR and supports dynamic racing scenarios. With open-source hardware, reproducible build instructions, and benchmarked autonomy algorithms, the platform gives students hands-on experience across the full autonomy stack, from data collection and training to real-time deployment, bridging theoretical machine learning and physical robotics.
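To make the idea of horizontal image slices concrete, here is a minimal sketch of cutting a frame into full-width bands and flattening each band into one token vector, in place of the square patches a standard Vision Transformer uses. This is not RaceViT's actual tokenization. The function name `horizontal_slices`, the grayscale nested-list image format, and the slice count are assumptions chosen for illustration.

```python
def horizontal_slices(image, num_slices):
    """Split an image (a list of rows of pixel values) into `num_slices`
    full-width horizontal bands and flatten each band into one token.
    Hypothetical helper; requires the height to divide evenly."""
    height = len(image)
    if num_slices <= 0 or height % num_slices != 0:
        raise ValueError("image height must be a positive multiple of num_slices")
    band_height = height // num_slices
    tokens = []
    for i in range(num_slices):
        rows = image[i * band_height:(i + 1) * band_height]
        # Flatten the band row by row into a single vector.
        tokens.append([pixel for row in rows for pixel in row])
    return tokens


# Toy 6x4 grayscale frame whose pixel values encode their position.
toy_frame = [[r * 4 + c for c in range(4)] for r in range(6)]
toy_tokens = horizontal_slices(toy_frame, 3)
```

In a full model, each token would pass through a learned linear projection and a Transformer encoder before a regression head outputs a steering value. Those stages are omitted here.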