Zijie Cai — Portfolio

Featured

AItinerary (Preview)

React Node MCP

Live Code

Underwater Depth (RGB + Sonar)

PyTorch Depth Anything V2 Multimodal

Paper Code

UMD Class & Commute Planner

HTML/CSS/JS UI/UX APIs

Live Code

Projects

Gesture Control Apple TV

Real-time hand gesture control using MediaPipe and OpenCV, integrated with pyatv to map gestures to remote actions over Wi-Fi (play/pause, navigate, select, etc.).

Comp Vision Data Pipeline

Video Code

N-Queens Playground

Interactive visualizer for the N-Queens problem with backtracking and heuristics, exploring different algorithms and performance trade-offs.

Algorithms Visualizer

Live Code

Coqui AI Voice Clone GUI

Desktop GUI for voice cloning with Coqui XTTS-V2, supporting dataset upload, TTS, translation, Hugging Face models, and multilingual audio playback.

Voice AI TTS

Video Code

Protein Contact Prediction

Framework and Colab tool to evaluate protein contact predictions from ESM-2 models against MSAs, with visual maps and precision/recall metrics.

LLMBio

Video Code

Experience & Education

Experience

May 2025 – Present

Instructor — AI & ML

Internal Drive Inc, College Park, MD
- Delivered project-based instruction on Python, ML, and prompt engineering using OpenAI API, Keras, NumPy, and Pandas.
- Designed and led labs in computer vision, NLP, and text/audio generation with Neural Networks and TTS/STT APIs.
- Mentored student teams through full ML pipelines: preprocessing → model development → API integration → debugging.

Apr 2023 – Present

Research Assistant — Multimodal Perception

Intelligent Sensing Laboratory, University of Maryland
- Developed and benchmarked models for underwater monocular metric depth estimation using synthetic RGB data.
- Led experiments to simulate synthetic underwater RGB-D and acoustic sonar datasets and to collect real-world ones.
- Presented weekly progress and co-authored publications with Prof. Christopher Metzler on multimodal AUV perception pipelines.

Education

Aug 2024 – May 2025

M.S. in Computer Science

University of Maryland, College Park — GPA: 3.83 / 4.00
- Master’s Paper: Underwater Monocular Metric Depth Estimation: Benchmarks and Fine-Tuning [arXiv].
- Graduate research focus: multimodal foundation models.
- Relevant Coursework: Deep Learning Systems, Advanced Computer Graphics, Computational Imaging, Computational Biology.

Aug 2020 – May 2024

B.S. in Computer Science (Machine Learning)

University of Maryland, College Park — GPA: 3.70 / 4.00
- Minor in Mathematics (GPA: 3.72).
- Dean’s List (multiple semesters).
- Undergraduate Teaching Assistant for CMSC351 (Algorithms).
- Relevant Coursework: Object-Oriented Programming, Operating Systems, Networks, Databases, AI, Machine Learning, Computer Vision, NLP, Data Science, Algorithms, Data Structures, Parallel Computing, Applied Probability, Linear Algebra, Advanced Calculus.

Skills

Machine Learning (Advanced)

PyTorch TensorFlow scikit-learn OpenCV Transformers

Systems & Optimization (Intermediate)

CUDA Triton TensorRT C++ Quantization Profiling

LLMs & RAG (Proficient)

Prompt Engineering LangChain Fine-tuning Evaluation

Web Development (Proficient)

HTML CSS JavaScript TypeScript React Next.js Node/Express

Data & MLOps (Proficient)

SQL Docker CI/CD PyTest

Cloud (Proficient)

GCP AWS Cloud Run Lambda

Languages

Mandarin Chinese (Native), English (Fluent), Japanese (Beginner)

Soft Skills

Communication Teaching Teamwork Problem‑Solving Leadership

Writing & Talks

Paper & Talk

CCT with FlashAttention + Triton

2025 • Systems, Optimization, Transformers

Optimizing Compact Convolutional Transformers with FlashAttention and fused Triton MLP kernels, reducing memory use without accuracy loss.

PDF Slides

Paper

NeRF Ablations

2024 • Neural Rendering, 3D Reconstruction

Ablation study of Neural Radiance Fields across hyperparameters, samplers, dataset size, and resolution, analyzing efficiency vs. quality.

PDF

Paper & Talk

Lensless Real-Time Reconstruction

2024 • GD/ADMM, U-Net, DiffuserCam

Benchmarks of lensless camera reconstructions with GD/ADMM and learning-based methods on DiffuserCam, balancing speed and quality.

PDF Slides

Contact

Let’s build something together.
