
Try Apple’s FastVLM: Lightning-Fast Video Captioning from Your Browser
Apple’s FastVLM demo lets you experience near-instant video captioning locally on your Apple Silicon Mac via a browser, combining speed, privacy, and accessibility.
Apple has launched a browser demo of its FastVLM model, which produces near-instant descriptions of video and images directly in your browser, with no cloud processing needed. The demo runs locally on Apple Silicon Macs, including those with M2 and M3 chips, and delivers real-time captions with minimal delay.
FastVLM is Apple’s vision language model, optimized for speed and efficiency using MLX, the machine-learning framework tailored for Apple hardware. According to Apple, it is up to 85 times faster than comparable models at producing the first token of a caption, with a vision encoder more than three times smaller, which lets it run smoothly on consumer devices rather than relying on data center GPUs.
You can try it yourself via an online demo hosted on Hugging Face. It requires just a compatible Mac and a modern browser that supports WebGPU. Once the model has loaded, point your Mac’s camera at any scene and watch as FastVLM streams live captions describing objects, actions, and even expressions with remarkable speed and accuracy. You can tweak the prompt to focus on specific details like emotions, colors, or objects, and see the captions adjust in real time.
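If you are not sure whether your browser meets the WebGPU requirement, a quick check along the following lines can tell you. This is a minimal sketch, not code from the demo itself: the `NavigatorLike` and `GpuLike` types and both function names are made up for illustration, and the only real browser API it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU adapter is available.

```typescript
// Minimal structural types (hypothetical, for illustration) so the sketch
// compiles without any WebGPU type packages installed.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Synchronous check: does the browser expose the WebGPU entry point at all?
function hasWebGpuApi(nav: NavigatorLike | undefined): boolean {
  return nav !== undefined && nav.gpu !== undefined;
}

// Asynchronous check: can the browser actually provide a GPU adapter?
// The API can be present while no usable adapter exists.
async function canUseWebGpu(nav: NavigatorLike | undefined): Promise<boolean> {
  if (nav === undefined || nav.gpu === undefined) {
    return false;
  }
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null && adapter !== undefined;
}

// In a browser you might call it like this:
//   canUseWebGpu(navigator as unknown as NavigatorLike).then((ok) => {
//     console.log(ok ? "WebGPU is available" : "WebGPU is not available");
//   });
```

The browser's `navigator` object is passed in as a parameter rather than read as a global, so the same functions can be exercised outside a browser with a stand-in object.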
What sets this model apart is its privacy-forward approach: everything runs locally on your device, so no video or image data is uploaded anywhere. Once the model has been downloaded, it can work offline while keeping your data private. That makes it a good fit for accessibility tools that describe a scene to users with visual impairments, for content creators who need quick metadata tagging, and for enterprise environments that require on-premise video processing.
While the demo currently uses the lightweight 0.5-billion-parameter version for responsiveness, bigger variants with up to 7 billion parameters exist for richer and more complex understanding, though those may run better in dedicated apps rather than browsers.
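A rough back-of-the-envelope calculation helps show why the larger variants are a harder fit for a browser tab. The sketch below estimates the memory needed for the model weights alone, as parameter count times bits per weight. The function name and the precisions shown (16-bit and 4-bit) are assumptions chosen for illustration, not the formats the demo actually ships, and the figures ignore the vision encoder, activations, and any caches.

```typescript
// Approximate size of the weights in gigabytes (1 GB = 10^9 bytes).
// paramsBillions: parameter count in billions (e.g. 0.5 or 7).
// bitsPerWeight: storage precision per parameter (illustrative assumption).
function weightFootprintGB(paramsBillions: number, bitsPerWeight: number): number {
  if (paramsBillions <= 0 || bitsPerWeight <= 0) {
    throw new RangeError("paramsBillions and bitsPerWeight must be positive");
  }
  // billions of params * (bits / 8) bytes per param = billions of bytes = GB
  return (paramsBillions * bitsPerWeight) / 8;
}

// Compare the two sizes mentioned in the article at two example precisions.
for (const params of [0.5, 7]) {
  for (const bits of [16, 4]) {
    const gb = weightFootprintGB(params, bits);
    console.log(`${params}B params @ ${bits}-bit: ~${gb.toFixed(2)} GB of weights`);
  }
}
```

Under these assumptions the 0.5-billion-parameter model needs about 1 GB for 16-bit weights, while the 7-billion-parameter model needs about 14 GB, or about 3.5 GB even at 4 bits per weight.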
FastVLM points to a promising future where real-time captioning and visual understanding are a given, always on and always private, right on the devices in our hands, empowering users and creators alike.
Give it a go on your Mac and see the future of video captioning unfold live in your browser!
