> P.S. We are contacting the authors of YOLO series about the naming of YOLOv6.
You should ask _before_ publishing, not _after_.
They claim it runs faster and is more accurate than YOLOv5, yet requires 3x as much computation (GFLOPs)? Something doesn't add up here.
There is unbelievably little information about the architecture too. Unfortunately it's not in a format where I can easily throw the cfg in and visualize it: https://gitlab.com/danbarry16/darknet-visual
The individual ops could be faster, so even though there are more of them, the overall speed is quicker. The authors mention using more 16-bit ops, so that might be part of the reason?
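If you want to sanity-check that intuition on your own GPU, a rough sketch like the one below compares FP32 vs FP16 latency. It uses a torchvision ResNet-50 as a stand-in (not the new model), and the input size and iteration counts are placeholder assumptions:

```python
import time
import torch
import torchvision

# Stand-in model: ResNet-50 from torchvision, NOT the YOLOv6 network.
model = torchvision.models.resnet50().cuda().eval()
x = torch.randn(1, 3, 640, 640, device="cuda")

def bench(m, inp, iters=50):
    # Warm up, then time GPU execution with explicit synchronisation.
    for _ in range(5):
        m(inp)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        m(inp)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

with torch.no_grad():
    fp32_ms = bench(model, x) * 1000
    fp16_ms = bench(model.half(), x.half()) * 1000  # same net, 16-bit ops

print(f"FP32: {fp32_ms:.1f} ms/image  FP16: {fp16_ms:.1f} ms/image")
```

More FLOPs at lower latency is entirely possible if the extra ops run at the faster 16-bit rate, which is why GFLOPs alone don't settle the speed question.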
> it's important to note that MT-YOLOv6 is not part of the official YOLO series
I don't understand the logic behind it. Since you're not part of the official series, why create the confusion? Now the "official" YOLO will release v6 and then what? Or do you expect them to skip it because you already made v6?
To me, it seems disrespectful, just add a different suffix.
>Now the "official" YOLO will release v6 and then what
I agree that it's disrespectful, although FYI the 'official' YOLO is done - the author, Joseph Redmon, has quit the field[1] because of the military and privacy concerns of CV.
Tip: you don't change the system, the system changes you (or uses you and spits you out), especially when it comes to military applications. Refusing to do the work is the only way to make a difference through your job.
It's a function of how much you tend to perceive yourself as effectual. If you think the world is already a lost cause then you let it rot and try to keep away, if you think you can make a difference then you stay and try.
> But note that YOLOv7 doesn't meant to be a successor of yolo family, 7 is just a magic and lucky number. Instead, YOLOv7 extend yolo into many other vision tasks, such as instance segmentation, one-stage keypoints detection etc..
Deceptive habits like this give ML a really bad reputation and gives me so little confidence that this technology will be used responsibly as ML becomes increasingly powerful.
At this point so many different people have reused "YOLOv{n}" that the lineage is broken. I don't know how or why it happened, but it continues to be a thing.
I assume this happened because Joseph Redmon, the creator of v1 - v3, left the field of computer vision [1]. So then other people just took (stole?) the name.
At the very least v4 has a paper attached to it. The authors of v5 have claimed that they'll release a paper at some point, but it has never materialized. It doesn't seem like v6 even makes that promise.
Just to clear things up: Joseph Redmon (who made the first YOLO) has anointed Alexey Bochkovskiy as the keeper of the flame [1]. Alexey is a very careful researcher and does a ton of performance evaluation on his models. His results are to be trusted.
He gave a great talk at the London Machine Learning Meetup in April, if you’re interested [2]. (Full disclosure: I run the meetup)
> We can clearly see that YOLOv6s detects more objects in the image and has higher confidence about their label.
If you actually look at the images they provide directly above:
In the first image, the older one detects one extra tie. In the second image, the objects detected are the same. In the third image, the older one detects a stop sign, and this new network (no, let's not call it YOLOv6) appears to get confused by the two cars behind one another: it detects two objects, but the bounding box of each includes both cars, so it doesn't look like it actually separates them. But to be fair to them, they do detect an additional person on the left.
Joseph Redmon was the original author of the YOLO family of models, up through YOLOv3. Alexey Bochkovskiy, a maintainer of Darknet (the framework for the original three YOLO models), published YOLOv4. Glenn Jocher used “YOLOv5” and showed the ML community that you can, though not without controversy[1], force a name into existence through adoption.
It appears the authors of YOLOv6 are aiming to employ a similar clever naming strategy.
I’m looking forward to more benchmarks before getting too excited.
If yolov6 is really a significant improvement, the article does a very poor job explaining it, but does a decent job of hyping it up for people who know nothing about the field. Who is the audience here?
Is there a reason not to compare it with the YOLOv4 family?
If I recall correctly, the advantage of YOLOv5 over YOLOv4 is still disputed and YOLOv4-tiny seems widely used.
Isn't the same scale cut-out used in Scaled-YOLOv4 and previous research? (Though this one is on AP, not mAP.) It seems necessary, since the improvement would be hard to appreciate on a 0-100 scale; improvements in object detection models are precisely not dramatic.
https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_S...
I honestly find that deceptive and I would’ve said so in a review. Since it /is/ a percentage, I think you should absolutely show the zero - and the maximum if applicable. The correct scale is 0-1. It being a minute difference is exactly the type of thing a fair plot would reveal.
However, I think it’s better to use a semilog plot here - and show 1 - AP. I’m sure there’s a name for that number too. Then perhaps invert the Y axis if you really want the curve trending upward. No need for the cut, and infinite precision.
You of course should note this in the figure caption. This goes doubly so for cutting axes.
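For what it's worth, a plot along those lines is easy to mock up; the AP numbers below are made up purely to show the layout, not taken from any benchmark:

```python
import matplotlib.pyplot as plt

# Made-up AP values, for layout only.
models = ["model A", "model B", "model C"]
ap = [0.431, 0.445, 0.452]
err = [1.0 - a for a in ap]  # plot the "error" 1 - AP instead of AP

plt.semilogy(range(len(models)), err, marker="o")
plt.xticks(range(len(models)), models)
plt.gca().invert_yaxis()  # so "up" still means "better"
plt.ylabel("1 - AP (log scale)")
plt.title("No axis cut needed; small differences stay visible")
plt.show()
```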
IMO being able to cherry-pick results is pretty interesting in itself. Also, good works tend to come with online demos or even exact parameters — hard to hide your shortcomings when people can reproduce your work.
What is the best object detection that is able to run on Raspberry PI 2?
I checked tiny YOLO, which requires fewer resources but is much less accurate than the regular YOLO.
Looking for the best accuracy with the least CPU/RAM requirements.
I mainly need this to tell whether an image contains a person (still images from a network camera, which publishes an image upon motion detection).
It would seem that it depends on the architecture you'll be using: whether it's an ARM CPU, a GPU, a mobile GPU, etc. This comment from the author of YOLOv4 mentions that NanoDet is more suitable for ARM CPUs:
https://twitter.com/alexeyab84/status/1436377831974506496
Is there any progress on combining object detection with object segmentation? So instead of bounding boxes we get the true shape of objects? I know segmentation exists, just wondering about integration with YOLO or similar.
Yes, look into instance or panoptic segmentation. The most popular method is a region-based network that jointly regresses bounding box coordinates alongside an object mask and class label.
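A minimal sketch with torchvision's Mask R-CNN (one such region-based model; the input tensor here is just a random placeholder) shows the kind of per-detection output you get:

```python
import torch
import torchvision

# Mask R-CNN: predicts a box, a class label, a score and a mask per instance.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)  # placeholder; use a real image scaled to [0, 1]
with torch.no_grad():
    out = model([image])[0]

# boxes: (N, 4), labels: (N,), scores: (N,), masks: (N, 1, H, W)
print(out["boxes"].shape, out["labels"].shape, out["scores"].shape, out["masks"].shape)
```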
Thanks. The next step would be combining it with text-image foundation models such as clip https://github.com/openai/CLIP so that the model no longer depends on a limited set of predefined labels (coco…), right?
Also occlusion inference would be fantastic, so that we can select between the visible parts of the object and the whole shape (behind trees etc).
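Something along those lines can already be prototyped by cropping each detected box and scoring it against free-form text with CLIP. A rough sketch (the crop file and label list are placeholders, not anything from the article):

```python
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical crop taken from one detector bounding box.
image = preprocess(Image.open("crop.jpg")).unsqueeze(0).to(device)
labels = ["a person", "a bicycle", "a stop sign"]  # any text, not a fixed label set
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```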
This task is called instance segmentation and is an active research topic. Mask-RCNN is relatively old these days, but it still might be the most popular approach. There also happen to be a few approaches to the task that take inspiration in methodology from YOLO, e.g. YOLACT (which clearly also pays homage in name).
Is this available for https://coral.ai/ somehow (USB accelerator)? Would it be difficult to convert it? I've played with the USB accelerator and it's cool, but would love to use some of these better algorithms since I found the default available ones were lacking.
Note: the typical constraint is RAM, plus changes to the EdgeTPU compiler, which now fails to convert larger models. Previously (version 15?) it would delegate layers to the CPU, but now it just doesn't work at all for large input sizes.
Also, while it works, I think it's unlikely to be much better than a well-trained MobileNet SSD. The advantage is that you can train in PyTorch and go from there; training quantised/edge models in TensorFlow is tricky.
It is available via ONNX, which can be converted to TensorFlow weights. From there, it's possible to perform post-training quantization, and it should finally be usable with Coral. However, there's a good chance some operations aren't yet supported by their chip, and accuracy will surely take a hit.
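Roughly, the TensorFlow side of that pipeline looks like the sketch below. The SavedModel path, input size, and calibration data are placeholders; in practice you'd feed real preprocessed images as the representative dataset and then run the result through `edgetpu_compiler`:

```python
import numpy as np
import tensorflow as tf

# Assumes the ONNX model has already been exported to a TF SavedModel.
def representative_dataset():
    for _ in range(100):
        # Placeholder calibration data; use real preprocessed images here.
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
# The .tflite file still has to go through edgetpu_compiler before Coral will
# run it, and any unsupported ops fall back to the CPU (or fail to compile).
```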
I see you are working at "dagshub". Maybe you can let the people know that it's not a good show to create fake accounts here to push the story and leave useless praising comments.
Calling YOLOv5 a "fraud" is a bit harsh. It has many excellent aspects for practitioners: easy to use, fast inference time, scalable model architecture, and it has many helpful utilities built-in for model deployment. In my experience, in real use-cases the models achieve about the same precision / recall / mAP as well as "state of the art" methods that report better stats on benchmarks.
All YOLOs past v3 will be, to some extent, since the original author won't be releasing any more versions. This one looks to be in PyTorch, so it is also not in Darknet. I am not really sure what makes it "YOLO" anymore other than being a single-shot detector, but the author claims they took inspiration from the techniques in the original YOLO papers.
Comparing confidence metrics of the networks themselves is like comparing two athletes by asking each how good they are, and declaring athlete B the winner of the race because he rated himself more highly than athlete A rated himself.
With graphics cards prices coming down, I'm considering purchasing one to mess around with GPU-based ML. Is a model like YOLOv6 runnable on a modern single GPU? If so what would get me the best bang for my buck?
For training, more GPU RAM will allow you to train at higher resolutions, in less time, and with better performance.
Before feeding data to the model, images need to be resized to the "network dimension" (the YOLOv4 default is 416x416 px, if I recall correctly). For training, it will group several samples and train on them at the same time, in "batches". For better generalization you want bigger batches (so more different images are fed at the same time).
With a 3060 (non-Ti) you'll have 12GB of GPU RAM; with that I think you can run the default settings (network size, batch size and subdivision of batches) for the YOLOv4 model. If you want to go to 512px, you might have to increase the subdivisions (creating more sub-batches) or reduce the batch size.
If I recall correctly, you could find the 3070 with less than 12GB of RAM, so in pursuing faster training times (I'm not talking about inference, i.e. using the model to actually recognize something) you might not be able to train with the broader range of options that can improve your accuracy.
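For reference, those knobs live in the `[net]` section of the darknet cfg; the numbers below are illustrative, not necessarily the exact stock defaults:

```
[net]
batch=64          # images per training batch
subdivisions=16   # the batch is split into 64/16 = 4 images per GPU pass
width=416         # network input width  (must be a multiple of 32)
height=416        # network input height (must be a multiple of 32)
```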
Relatedly, someone linked an "all-in-one ML imaging software" called chaos vision or something, and the first thing on the linked page was "you probably need an Nvidia RTX 3090, or another Nvidia card with 24GB of memory".
I've tested a 'machine vision for image tagging' self-hosted service and it seemed reasonably responsive, CPU only, too - but I ran a pre-trained model for that.
I would wait for the 4000 series from nVidia, which should arrive this fall, and then make a purchase. The best bang for your buck will likely be a 4070, or a discounted 3080.
The thinness of the write-up appears to be on purpose, to advertise DagsHub: https://dagshub.com/pricing