> P.S. We are contacting the authors of YOLO series about the naming of YOLOv6.
You should ask _before_ publishing, not _after_.
They claim it runs faster and is more accurate than YOLOv5, yet requires 3x as much computation (GFLOPs)? Something doesn't add up here.
There is unbelievably little information about the architecture too. Unfortunately it's not in a format where I can easily throw the cfg in and visualize it: https://gitlab.com/danbarry16/darknet-visual
The individual ops could be faster, so even though there are more of them, the overall speed is quicker. The authors mention using more 16-bit ops, so that might be part of the reason?
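If you want to sanity-check that intuition on your own GPU, a rough sketch like the one below compares FP32 vs FP16 latency. It uses a torchvision ResNet-50 as a stand-in (not the new model), and the input size and iteration counts are placeholder assumptions:

```python
import time
import torch
import torchvision

# Stand-in model: ResNet-50 from torchvision, NOT the YOLOv6 network.
model = torchvision.models.resnet50().cuda().eval()
x = torch.randn(1, 3, 640, 640, device="cuda")

def bench(m, inp, iters=50):
    # Warm up, then time GPU execution with explicit synchronisation.
    for _ in range(5):
        m(inp)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        m(inp)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

with torch.no_grad():
    fp32_ms = bench(model, x) * 1000
    fp16_ms = bench(model.half(), x.half()) * 1000  # same net, 16-bit ops

print(f"FP32: {fp32_ms:.1f} ms/image  FP16: {fp16_ms:.1f} ms/image")
```

More FLOPs at lower latency is entirely possible if the extra ops run at the faster 16-bit rate, which is why GFLOPs alone don't settle the speed question.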
> it's important to note that MT-YOLOv6 is not part of the official YOLO series
I don't understand the logic behind it. Since you're not part of the official series, why create the confusion? Now the "official" YOLO will release v6 and then what? Or do you expect them to skip it because you already made v6?
To me, it seems disrespectful, just add a different suffix.
>Now the "official" YOLO will release v6 and then what
I agree that it's disrespectful, although FYI the 'official' YOLO is done - the author, Joseph Redmon, has quit the field[1] because of the military and privacy concerns of CV.
Tip: you don't change the system, the system changes you (or uses you and spits you out), especially when it comes to military applications. Refusing to do the work is the only way to make a difference through your job.
It's a function of how much you tend to perceive yourself as effectual. If you think the world is already a lost cause then you let it rot and try to keep away, if you think you can make a difference then you stay and try.
> But note that YOLOv7 doesn't meant to be a successor of yolo family, 7 is just a magic and lucky number. Instead, YOLOv7 extend yolo into many other vision tasks, such as instance segmentation, one-stage keypoints detection etc..
Deceptive habits like this give ML a really bad reputation and gives me so little confidence that this technology will be used responsibly as ML becomes increasingly powerful.
At this point so many different people have reused "YOLOv{n}" that the lineage is broken. I don't know how or why it happened, but it continues to be a thing.
I assume this happened because Joseph Redmon, the creator of v1 - v3, left the field of computer vision [1]. So then other people just took (stole?) the name.
At the very least v4 has a paper attached to it. The authors of v5 have claimed that they'll release a paper at some point, but it has never materialized. It doesn't seem like v6 even makes that promise.
Just to clear things up: Joseph Redmon (who made the first YOLO) has anointed Alexey Bochkovskiy as the keeper of the flame [1]. Alexey is a very careful researcher and does a ton of performance evaluation on his models. His results are to be trusted.
He gave a great talk at the London Machine Learning Meetup in April, if you’re interested [2]. (Full disclosure: I run the meetup)
> We can clearly see that YOLOv6s detects more objects in the image and has higher confidence about their label.
If you actually look at the images they provide directly above:
In the first image, the older one detects one extra tie. In the second image, the objects detected are the same. In the third image, the older one detects a stop sign, and this new network (no, let's not call it YOLOv6) appears to get confused by the two cars behind one another: it detects two objects, but the bounding box of each includes both cars, so it doesn't look like it actually separates them. But to be fair to them, they do detect an additional person on the left.
Joseph Redmon was the original author of the YOLO family of models, up through YOLOv3. Alexey Bochkovskiy, a maintainer of Darknet (the framework for the original three YOLO models), published YOLOv4. Glenn Jocher used “YOLOv5” and showed the ML community that you can, though not without controversy[1], force a name into existence through adoption.
It appears the authors of YOLOv6 are aiming to employ a similar clever naming strategy.
I’m looking forward to more benchmarks before getting too excited.
If yolov6 is really a significant improvement, the article does a very poor job explaining it, but does a decent job of hyping it up for people who know nothing about the field. Who is the audience here?
Is there a reason not to compare it with the YOLOv4 family?
If I recall correctly, the advantage of YOLOv5 over YOLOv4 is still disputed and YOLOv4-tiny seems widely used.
Isn't the same scale cut-out used in Scaled-YOLOv4 and previous research? (Though this one is on AP, not mAP.) It seems necessary, since the improvement would be hard to appreciate on a 0-100 scale; improvements in object detection models are precisely not dramatic.
https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_S...
I honestly find that deceptive and I would’ve said so in a review. Since it /is/ a percentage, I think you should absolutely show the zero - and the maximum if applicable. The correct scale is 0-1. It being a minute difference is exactly the type of thing a fair plot would reveal.
However, I think it’s better to use a semilog plot here - and show 1 - AP. I’m sure there’s a name for that number too. Then perhaps invert the Y axis if you really want the curve trending upward. No need for the cut, and infinite precision.
You of course should note this in the figure caption. This goes doubly so for cutting axes.
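For what it's worth, a plot along those lines is easy to mock up; the AP numbers below are made up purely to show the layout, not taken from any benchmark:

```python
import matplotlib.pyplot as plt

# Made-up AP values, for layout only.
models = ["model A", "model B", "model C"]
ap = [0.431, 0.445, 0.452]
err = [1.0 - a for a in ap]  # plot the "error" 1 - AP instead of AP

plt.semilogy(range(len(models)), err, marker="o")
plt.xticks(range(len(models)), models)
plt.gca().invert_yaxis()  # so "up" still means "better"
plt.ylabel("1 - AP (log scale)")
plt.title("No axis cut needed; small differences stay visible")
plt.show()
```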
IMO being able to cherry-pick results is pretty interesting in itself. Also, good works tend to come with online demos or even exact parameters — hard to hide your shortcomings when people can reproduce your work.
What is the best object detection that is able to run on Raspberry PI 2?
I checked tiny YOLO, which requires fewer resources but is much less accurate than the regular YOLO.
Looking for the best accuracy with the least CPU/RAM requirements.
I mainly need this to tell whether an image contains a person (still images from a network camera, which publishes an image upon motion detection).
It would seem that it depends on the architecture you'll be using: whether it's an ARM CPU, a GPU, a mobile GPU, etc. This comment from the author of YOLOv4 mentions that NanoDet is more suitable for ARM CPUs:
https://twitter.com/alexeyab84/status/1436377831974506496
Is there any progress on combining object detection with object segmentation? So instead of bounding boxes we get the true shape of objects? I know segmentation exists, just wondering about integration with YOLO or similar.
Yes, look into instance or panoptic segmentation. The most popular method is a region-based network that jointly regresses bounding box coordinates alongside an object mask and class label.
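A minimal sketch with torchvision's Mask R-CNN (one such region-based model; the input tensor here is just a random placeholder) shows the kind of per-detection output you get:

```python
import torch
import torchvision

# Mask R-CNN: predicts a box, a class label, a score and a mask per instance.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = torch.rand(3, 480, 640)  # placeholder; use a real image scaled to [0, 1]
with torch.no_grad():
    out = model([image])[0]

# boxes: (N, 4), labels: (N,), scores: (N,), masks: (N, 1, H, W)
print(out["boxes"].shape, out["labels"].shape, out["scores"].shape, out["masks"].shape)
```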
Thanks. The next step would be combining it with text-image foundation models such as clip https://github.com/openai/CLIP so that the model no longer depends on a limited set of predefined labels (coco…), right?
Also occlusion inference would be fantastic, so that we can select between the visible parts of the object and the whole shape (behind trees etc).
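Something along those lines can already be prototyped by cropping each detected box and scoring it against free-form text with CLIP. A rough sketch (the crop file and label list are placeholders, not anything from the article):

```python
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical crop taken from one detector bounding box.
image = preprocess(Image.open("crop.jpg")).unsqueeze(0).to(device)
labels = ["a person", "a bicycle", "a stop sign"]  # any text, not a fixed label set
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```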
This task is called instance segmentation and is an active research topic. Mask-RCNN is relatively old these days, but it still might be the most popular approach. There also happen to be a few approaches to the task that take inspiration in methodology from YOLO, e.g. YOLACT (which clearly also pays homage in name).
Is this available for https://coral.ai/ somehow (USB accelerator)? Would it be difficult to convert it? I've played with the USB accelerator and it's cool, but would love to use some of these better algorithms since I found the default available ones were lacking.
Note: the typical constraint is RAM, plus changes to the EdgeTPU compiler, which now fails to convert larger models. Previously (version 15?) it would delegate layers to the CPU, but now it just doesn't work at all for large input sizes.
Also, while it works, I think it's unlikely to be much better than a well-trained MobileNet SSD. The advantage is that you can train in PyTorch and go from there; training quantised/edge models in TensorFlow is tricky.
It is available via ONNX, which can be converted to TensorFlow weights. From there, it's possible to perform post-training quantization, and it should finally be usable with Coral. However, there's a good chance some operations aren't yet supported by their chip, and accuracy will surely take a hit.
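Roughly, the TensorFlow side of that pipeline looks like the sketch below. The SavedModel path, input size, and calibration data are placeholders; in practice you'd feed real preprocessed images as the representative dataset and then run the result through `edgetpu_compiler`:

```python
import numpy as np
import tensorflow as tf

# Assumes the ONNX model has already been exported to a TF SavedModel.
def representative_dataset():
    for _ in range(100):
        # Placeholder calibration data; use real preprocessed images here.
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
# The .tflite file still has to go through edgetpu_compiler before Coral will
# run it, and any unsupported ops fall back to the CPU (or fail to compile).
```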
I see you are working at "dagshub". Maybe you can let the people know that it's not a good show to create fake accounts here to push the story and leave useless praising comments.
Calling YOLOv5 a "fraud" is a bit harsh. It has many excellent aspects for practitioners: easy to use, fast inference time, scalable model architecture, and it has many helpful utilities built-in for model deployment. In my experience, in real use-cases the models achieve about the same precision / recall / mAP as well as "state of the art" methods that report better stats on benchmarks.
All YOLOs past v3 will be, to some extent, since the original author won't be releasing any more versions. This one looks to be in PyTorch, so it is also not in Darknet. I am not really sure what makes it "YOLO" anymore other than being a single-shot detector, but the author claims they took inspiration from the techniques in the original YOLO papers.
Comparing confidence metrics of the networks themselves is like comparing two athletes by asking each how good they are, and declaring athlete B the winner of the race because he rated himself more highly than athlete A rated himself.
With graphics cards prices coming down, I'm considering purchasing one to mess around with GPU-based ML. Is a model like YOLOv6 runnable on a modern single GPU? If so what would get me the best bang for my buck?
For training, more GPU RAM will allow you to train at higher resolutions, in less time, and with better performance.
Before feeding data to the model, images need to be resized to the "network dimension" (the YOLOv4 default is 416x416 px, if I recall correctly). For training, it will group several samples and train on them at the same time, in "batches". For better generalization you want bigger batches (so more different images are fed at the same time).
With a 3060 (non-Ti) you'll have 12GB of GPU RAM; with that I think you can run the default settings (network size, batch size and subdivision of batches) for the YOLOv4 model. If you want to go to 512px, you might have to increase the subdivisions (creating more sub-batches) or reduce the batch size.
If I recall correctly, you could find the 3070 with less than 12GB of RAM, so in pursuing faster training times (I'm not talking about inference, i.e. using the model to actually recognize something) you might not be able to train with the broader range of options that can improve your accuracy.
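For reference, those knobs live in the `[net]` section of the darknet cfg; the numbers below are illustrative, not necessarily the exact stock defaults:

```
[net]
batch=64          # images per training batch
subdivisions=16   # the batch is split into 64/16 = 4 images per GPU pass
width=416         # network input width  (must be a multiple of 32)
height=416        # network input height (must be a multiple of 32)
```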
Relatedly, someone linked an "all-in-one ML imaging software" called chaos vision or something, and the first thing on the linked page was "you probably need an Nvidia RTX 3090, or another Nvidia card with 24GB of memory".
I've tested a 'machine vision for image tagging' self-hosted service and it seemed reasonably responsive, CPU only, too - but I ran a pre-trained model for that.
I would wait for the 4000 series from nVidia, which should arrive this fall, and then make a purchase. The best bang for your buck will likely be a 4070, or a discounted 3080.
The thinness of the write-up appears to be on purpose, to advertise DagsHub: https://dagshub.com/pricing