Open-source, cross-platform desktop app that lets video creators generate and translate subtitles, label speakers with per-speaker styling, and place the results directly into video editor timelines. Built with React/TypeScript, Rust, WebView, LuaJIT, and custom Adobe extensions.
- Built and maintained for 2+ years, reaching 3.5K+ stars and 400K+ downloads; the Rust backend keeps idle memory use under 200MB so the app can run alongside video editors.
- Developed editor integrations for timeline audio extraction and subtitle placement: DaVinci Resolve via an embedded LuaJIT HTTP bridge, and Adobe apps via custom extensions.
- Implemented audio preprocessing, on-device model management, automatic language-based model selection, translation, and a formatting engine tuned to the line-breaking and timing constraints of CJK, RTL, Indic, and Southeast Asian scripts.
- Rust
- TypeScript
- React
- Lua
- 3.5K+ stars
- 400K+ downloads
Multimodal video-editing framework that fuses visual composition, optical flow, audio-temporal patterns, and narrative reasoning for next-shot recommendations, reaching 72.5% shot-attribute classification accuracy.
- Achieved 72.5% accuracy in shot-attribute classification, a 16-percentage-point improvement over the Anatomy of Video Editing benchmark.
- Developed a multimodal pipeline that fuses visual composition, optical flow, and audio-temporal patterns, optimized for Apple's MLX framework to enable real-time, on-device inference.
- Implemented a narrative reasoning engine for next-shot recommendations using VLMs and LLMs to bridge technical pattern recognition and semantic story comprehension.
- PyTorch
- MLX
- VLMs
- LLMs
- Video AI