Meta Segment Anything Model 3

vessenes · 2025-11-25T14:31:30 1764081090

Released last week. Looks like all the weights are now out and published. Don’t sleep on the SAM 3D series; it’s seriously impressive. They have a human pose model which actually rigs and keeps multiple humans in a scene with objects, all from one 2D photo (!), and their straight object 3D model is by far the best I’ve played with: it produced a really good lamp, with translucency and woven gems, in usable shape in under 15 seconds.

Qwuke · 2025-11-25T15:51:06 1764085866

Between this and DINOv3, Meta is doing a lot for the SOTA even if Llama 4 came up short compared to the Chinese models.

nl · 2025-11-25T15:01:42 1764082902

https://ai.meta.com/blog/sam-3d/ for those interested.

Fraterkes · 2025-11-25T15:41:44 1764085304

Are those the actual wireframes they're showing in the demos on that page? As in, do the produced models have "normal" topology? Or are they still just kinda blobby with a ton of polygons?

trevorhlynn · 2025-11-25T12:51:06 1764075066

This was on the front page for a while last week:

https://news.ycombinator.com/item?id=45982073

enoch2090 · 2025-11-25T15:13:20 1764083600

Surprisingly, SAM3 works badly on engineering drawings while SAM2 kind of works, and VLMs like Qwen3-VL work as well.

phkahler · 2025-11-25T15:10:19 1764083419

Which (if any) of these models could run on a Raspberry Pi for object recognition at several FPS?

cheesecompiler · 2025-11-25T15:44:24 1764085464

This would be convenient for post-production and editing of video, e.g. to aid colour grading in DaVinci Resolve. Currently a lot of manual labour goes into tracking and hand-masking in grading.

the_duke · 2025-11-25T13:24:24 1764077064

Side question: what are the current top go-to open models for image captioning and building image embedding DBs, with somewhat reasonable hardware requirements?
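
For context, an image embedding DB boils down to storing one vector per image and ranking the stored vectors by similarity to a query vector. The sketch below is a minimal illustration under stated assumptions: the file names and the 3-dimensional vectors are made up, and in practice the vectors would come from whatever embedding model you pick.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical index: toy vectors standing in for real model embeddings.
index = {
    "dog_on_beach.jpg": [0.9, 0.1, 0.0],
    "cat_on_sofa.jpg": [0.1, 0.9, 0.1],
    "lamp.jpg": [0.0, 0.1, 0.9],
}

def search(query_vec, index, top_k=2):
    """Return the names of the top_k images most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _score, name in scored[:top_k]]

# Hypothetical query embedding, e.g. for another dog photo.
results = search([0.8, 0.2, 0.0], index)
print(results)
```

A brute-force scan like this is fine for small collections; larger ones usually move to an approximate nearest-neighbour index.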

NitpickLawyer · 2025-11-25T13:44:37 1764078277

Try any of the Qwen3-VL models. They have 8B, 4B and 2B models in this family.

Glemkloksdjf · 2025-11-25T13:51:53 1764078713

I would suggest YOLO. Depending on your domain, you might also finetune these models. It's relatively easy, as they are not big LLMs but either image classification or bounding-box models.

I would recommend bounding boxes.

jabron · 2025-11-25T15:18:48 1764083928

What do you mean "bounding boxes"? They were talking about captions and embeddings, so a vision language model is required.

smallerize · 2025-11-25T14:02:17 1764079337

Which YOLO?

Glemkloksdjf · 2025-11-25T14:55:51 1764082551

Any current one. They are easy to use and you can just benchmark them yourself.

I'm using small and medium.

Also the code for using it is very short and easy to use. You can also use ChatGPT to generate small experiments to see what fits your case better.

throwaway314155 · 2025-11-25T15:07:56 1764083276

There aren’t any YOLO models for captioning and the other models aren’t robust enough to make for good embedding models.

colkassad · 2025-11-25T15:29:28 1764084568

Been waiting days to get approval to download this from Hugging Face. What's up with that?

shashanoid · 2025-11-25T15:25:05 1764084305

Miss the old Segment Anything page, used it a lot. I found this UI very complex to use.

Workaccount2 · 2025-11-25T12:56:00 1764075360

I do a test on multimodal LLMs where I show them a dog with 5 legs, and ask them to count how many legs the dog has. So far none of them can do it. They all say "4 legs".

Segment Anything, however, was able to segment all 5 dog legs when prompted to. Which means that Meta is doing something else under the hood here, which may lend itself to a very powerful future LLM.

Right now some of the biggest complaints people have with LLMs stem from their incompetence at processing visual data. Maybe Meta is onto something here.

jampekka · 2025-11-25T13:00:11 1764075611

Segmentation doesn't need to count legs. I'd guess something like YOLO could segment 5-legged dogs too.

chompychop · 2025-11-25T13:01:43 1764075703

YOLO is not a segmentation model.

jampekka · 2025-11-25T13:07:23 1764076043

https://docs.ultralytics.com/tasks/segment/

chompychop · 2025-11-25T13:51:18 1764078678

Thanks! TIL there's a class of segmentation models with the YOLO naming scheme.

lucasban · 2025-11-25T13:26:32 1764077192

I thought it was a joke about YAML

Der_Einzige · 2025-11-25T15:30:20 1764084620

Lol you obviously haven't seen what cheats for FPS games look like in the last 3 years.

https://github.com/Babyhamsta/Aimmy

nerdsniper · 2025-11-25T14:41:19 1764081679

You don’t need segmentation to count legs. Object detection can do that. DeepLabCut from 2020 perhaps.
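
As a rough sketch of that idea: once a detector returns labelled boxes, counting is just a tally of the detections that pass a confidence cutoff. Everything below is hypothetical (the detection tuples, the "leg" label and the 0.5 threshold are made up for illustration and are not the output format of any particular library).

```python
from collections import Counter

# Hypothetical detector output: (label, confidence, (x1, y1, x2, y2)).
detections = [
    ("head", 0.97, (10, 20, 90, 100)),
    ("leg", 0.94, (40, 210, 70, 330)),
    ("leg", 0.91, (85, 212, 115, 332)),
    ("leg", 0.88, (130, 208, 160, 329)),
    ("leg", 0.86, (175, 211, 205, 331)),
    ("leg", 0.79, (220, 209, 250, 330)),
    ("leg", 0.20, (300, 215, 320, 260)),  # low confidence, filtered out
]

def count_by_label(detections, min_confidence=0.5):
    """Tally detections per label, ignoring those below min_confidence."""
    return Counter(
        label for label, confidence, _box in detections
        if confidence >= min_confidence
    )

counts = count_by_label(detections)
print(counts["leg"])
```

The count comes from the boxes themselves, so nothing in the pipeline needs a prior about how many legs a dog should have.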

PunchTornado · 2025-11-25T13:36:44 1764077804

I doubt that Gemini 3 cannot do it.