Ying

Heting

7 1

AI & ML interests

None yet

Recent Activity

upvoted an article 6 days ago

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

upvoted an article 7 days ago

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

upvoted an article 7 days ago

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

Organizations

None yet

upvoted an article 6 days ago

Article

VLX-Go: Vision-Language Short-Horizon Waypoint Prediction for Embodied Navigation

omlab • 6 days ago • 11

upvoted 2 articles 7 days ago

Article

VLX-Seek: Improving VLM Fine-Grained Perception via Region Reference Instead of Coordinate Generation

omlab • 7 days ago • 12

Article

VLX-Flow: Continuous Video Understanding for Real-Time Multimodal Interaction

omlab • 7 days ago • 13

updated a model 16 days ago

omlab/OmTrackVLA-0.6B

Other • 0.6B • Updated 16 days ago • 87 • 4

upvoted a paper about 1 month ago

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Paper • 2605.28132 • Published May 27 • 25

upvoted a collection 5 months ago

Qwen3-TTS

Collection

7 items • Updated Jan 22 • 369

liked a Space about 1 year ago

3d-Model-Playground

👀

Control 3D models using gestures and voice

upvoted a paper about 1 year ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published Apr 10, 2025 • 36

authored a paper over 1 year ago

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Paper • 2406.16620 • Published Jun 24, 2024 • 3
