
Hello, I’m Zǐjié Cài


Projects


Gesture Control Apple TV

Real-time hand gesture control using MediaPipe and OpenCV, integrated with pyatv to map gestures to remote actions over Wi-Fi (play/pause, navigate, select, etc.).

Comp Vision Data Pipeline

N-Queens Playground

Interactive visualizer for the N-Queens problem with backtracking and heuristics, exploring different algorithms and performance trade-offs.

Algorithms Visualizer

Coqui AI Voice Clone GUI

Desktop GUI for voice cloning with Coqui XTTS-V2, supporting dataset upload, TTS, translation, Hugging Face models, and multilingual audio playback.

Voice AI TTS

Protein Contact Prediction

Framework and Colab tool to evaluate protein contact predictions from ESM-2 models against MSAs, with visual maps and precision/recall metrics.

LLM Bio

Experience & Education

Experience

  • May 2025 – Present
    Instructor — AI & ML
    Internal Drive Inc, College Park, MD
    • Delivered project-based instruction on Python, ML, and prompt engineering using OpenAI API, Keras, NumPy, and Pandas.
    • Designed and led labs in computer vision, NLP, and text/audio generation with Neural Networks and TTS/STT APIs.
    • Mentored student teams through full ML pipelines: preprocessing → model development → API integration → debugging.
  • Apr 2023 – Present
    Research Assistant — Multimodal Perception
    Intelligent Sensing Laboratory, University of Maryland
    • Developed and benchmarked models for underwater monocular metric depth estimation using synthetic RGB data.
    • Led experiments to simulate synthetic datasets and collect real-world underwater RGB-D and acoustic sonar datasets.
    • Presented weekly progress and co-authored publications with Prof. Christopher Metzler on multimodal AUV perception pipelines.

Education

  • Aug 2024 – May 2025
    M.S. in Computer Science
    University of Maryland, College Park — GPA: 3.83 / 4.00
    • Master’s Paper: Underwater Monocular Metric Depth Estimation: Benchmarks and Fine-Tuning [arXiv].
    • Graduate research focus: multimodal foundation models.
    • Relevant Coursework: Deep Learning Systems, Advanced Computer Graphics, Computational Imaging, Computational Biology
  • Aug 2020 – May 2024
    B.S. in Computer Science (Machine Learning)
    University of Maryland, College Park — GPA: 3.70 / 4.00
    • Minor in Mathematics (GPA: 3.72)
    • Dean’s List (multiple semesters).
    • Undergraduate Teaching Assistant for CMSC351 (Algorithms).
    • Relevant Coursework: Object-Oriented Programming, Operating Systems, Networks, Databases, AI, Machine Learning, Computer Vision, NLP, Data Science, Algorithms, Data Structures, Parallel Computing, Applied Probability, Linear Algebra, Advanced Calculus.

Skills

Machine Learning (Advanced)
PyTorch, TensorFlow, scikit-learn, OpenCV, Transformers
Systems & Optimization (Intermediate)
CUDA, Triton, TensorRT, C++, Quantization, Profiling
LLMs & RAG (Proficient)
Prompt Engineering, LangChain, Fine-tuning, Evaluation
Web Development (Proficient)
HTML, CSS, JavaScript, TypeScript, React, Next.js, Node/Express
Data & MLOps (Proficient)
SQL, Docker, CI/CD, PyTest
Cloud (Proficient)
GCP, AWS, Cloud Run, Lambda
Languages
Mandarin Chinese (Native), English (Fluent), Japanese (Beginner)
Soft Skills
Communication, Teaching, Teamwork, Problem-Solving, Leadership

Writing & Talks

Paper & Talk

CCT with FlashAttention + Triton

2025 Systems, Optimization, Transformers

Optimizing Compact Convolutional Transformers with FlashAttention and fused Triton MLP kernels, reducing memory use without loss of accuracy.

Paper

NeRF Ablations

2024 Neural Rendering, 3D Reconstruction

Ablation study of Neural Radiance Fields across hyperparameters, samplers, dataset size, and resolution, analyzing efficiency vs. quality.

PDF
Paper & Talk

Lensless Real-Time Reconstruction

2024 GD/ADMM, U-Net, DiffuserCam

Benchmarks of lensless camera reconstructions with GD/ADMM and learning-based methods on DiffuserCam, balancing speed and quality.

Contact

Let’s build something together.

