Sony released some software a couple of months ago that lets you use most of their DSLRs as webcams over USB. My goodness, paired with a fast lens, what a difference compared to my MacBook webcam, even with these ML-blurred backgrounds!
It's only 720p and around 15 fps, but you get really shallow DoF, very little sensor noise, and working autofocus. Well worth trying if you have a Sony camera from the last few years.
Sensor size and good optics still win. Having said that, the effort and detail that went into this feature are very impressive, and I enjoyed the blog post. Also, WebAssembly SIMD looks super cool; looking forward to a new class of web apps using wasm.
I recently tried to get a setup similar to this with a Fujifilm X-T20 I had lying around, remembering that Fujifilm announced similar software. Alas, that software only works with their higher end models.
I ended up getting a $10 HDMI USB capture stick from Aliexpress. I get a perfect 1080p/60fps signal, and at least on Linux it worked out of the box with Zoom.
The only problem now is that most of my meetings start with "wow, why do you look like you're on TV?"
Canon did too! Definitely a huge upgrade over a typical webcam.
I'm using my old T1i, which can be had for less than $50 these days, plus you can pick up an 18-55mm kit lens for like $20, and the video quality blows away any webcam, especially at the same price. Also recommend a battery-to-AC power adapter.
Canon and Nikon do too. In practice, the quality bump is nice, but we are still talking about a fairly low resolution/bitrate once it gets through Zoom, so the end result is fairly underwhelming, at least as far as what the other people see on their end.
Yeah... both Zoom and Google Meet support >720p video, but the bitrate, especially on Zoom, is a travesty: a 600 kbps/1.2 Mbps stream with all the different resolutions packed into the same stream.
The codec situation with H.264/HEVC/VP9/AV1 software/hardware encoding is a mess. Hopefully we'll get wide hardware support for AV1, although it might take a while.
Woot. Thanks for pointing this out - I looked for a solution a while back and it seemed like I had to get a separate capture card to connect my Sony DSLR. Will go check this out now.
(I ended up having to buy a little logitech webcam, which has been fine, but being able to pick my lens etc is awesome!)
I use my Android phone's (Redmi Note 8 Pro) primary camera (720p, I think) with DroidCam and it works like a charm on Linux.
I also tried gPhoto2/ffmpeg with a virtual cam driver and a Nikon D5200 (USB) on Linux, but I prefer the Redmi since I don't have a decent low-light lens for my DSLR.
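The usual pipeline, roughly, is to pipe the gPhoto2 live view into ffmpeg and out to a v4l2loopback device. Here's a minimal Python sketch of that pipeline (assuming v4l2loopback is already loaded and /dev/video2 is the loopback device, both of which will differ per machine):

    # Pipe a gPhoto2-supported camera into a v4l2loopback device so it shows
    # up as a regular webcam. /dev/video2 is an assumption -- check yours.
    import subprocess

    camera = subprocess.Popen(
        ["gphoto2", "--stdout", "--capture-movie"],      # live MJPEG stream over USB
        stdout=subprocess.PIPE,
    )
    encoder = subprocess.Popen(
        ["ffmpeg", "-i", "-",                            # read the camera stream from stdin
         "-vcodec", "rawvideo", "-pix_fmt", "yuv420p",   # decode to raw frames
         "-f", "v4l2", "/dev/video2"],                   # write into the virtual webcam
        stdin=camera.stdout,
    )
    encoder.wait()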
Having used both Zoom and Meet extensively now for the past 6 months, my experience is:
1/ Your internet connection, especially upload bandwidth and latency matter a lot.
2/ Zoom's desktop app performs very well, but its web version is atrocious. Not just because of the dark patterns they use to force you to install the desktop app, but also its performance is terrible compared to its desktop version, as well as worse than almost everything else. Unfortunately, I don't trust them and refuse to use their desktop app on anything but my iPad.
3/ Meet used to be as bad as Zoom on the web 6 months ago, but has improved a lot and is slowly approaching Zoom desktop in performance. I have noticed that Meet calls on my work GSuite account perform much better than on my personal account. This might be explained by #1 above, i.e. my family has worse internet connections than my coworkers, but I am not sure if all improvements have been rolled out to personal accounts.
> 1/ Your internet connection, especially upload bandwidth and latency matter a lot.
I moved to a new house, and the quality of my video calls dropped dramatically. Constant freezing and dropouts. It was extremely frustrating to try to participate in a meeting. I could receive fine, but anytime I spoke out, I would drop out within minutes.
Speed tests showed plenty of bandwidth, but my modem statistics showed high upstream power levels, occasionally out of the allowed range, and lots of "uncorrectable" packets.
I finally got a Comcast technician in to look at it (yay for business-class support), and they replaced the cable from the pole all the way to the first splitter in the basement, and since then it's been flawless. 100/15 Megabit service has been totally adequate for our needs, so long as it's reliable and the latency is low enough.
It kills me that our city isn't putting in conduits or fiber while doing utility work, though. The whole time that was happening, there were gas contractors opening the street and running new supply lines to every house, but not putting in any extra conduits or dark fiber. The construction sounds were almost like being back in the office...
Today I took a flawless WebEx meeting on a laptop tethered to my mobile phone, and that same tether also let me work without issue over RDP and the like.
My mobile internet is really fucking good, and often outperforms my sodding wired connection
I've had great experience with T-Mobile 4G. It outperforms my wired Frontier connection in both upload and download speeds, although it has been getting spotty lately; during peak hours the speed drops significantly.
>1/ Your internet connection, especially upload bandwidth and latency matter a lot.
It grates on me when people claim DSL/cable qualifies as sufficiently good broadband in the US, given the lack of upload bandwidth and the high latency (add packet loss in here too). The situation is so bad that you often can't even find out how much upload bandwidth so-called "broadband" cable ISPs offer.
The experience on symmetric fiber connections is noticeably better: we can have a house with a whole group of people streaming video up and down simultaneously without a hiccup, which matters in times of work-from-home and school-from-home.
Disclosure: I work on Google Cloud (but not Meet).
For the last item, personal accounts (only?) default to send and receive video at lower resolution (360p). So if you meant that the quality is lower, you can set it on both sides to 720p.
Edit: I don’t think Meet remembers those settings though, so you have to do it every time (and show your family members how to do so).
As a legacy free GApps user it is even more confusing because the admin page gives me an option to default to higher quality video but that doesn't do anything.
Why does Google, with all the resources at its disposal, choose to cheap out like this when competitors in the video chat space (from tiny startups to gigantic corporations of similar size) have offered near-native-resolution video chat for ages?
Meet certainly rolls out improvements for GSuite before public ones. I think there's even a GSuite "release channel" setting where you can control how early you get these improvements.
I refuse to install Zoom. They have removed the dark pattern, and the "join via browser" option is almost immediately available. If you have it installed, now is a good time to uninstall it.
The example video clips in the post look nothing like me and my team's view when using the new feature. Most of the time half of our hair gets blurred or replaced and hand gestures will cause either our hands or head to disappear.
I can vouch for this. I haven't really needed the background blur feature personally, but I've tried it, and my colleagues, friends, and I — pretty much everyone I've talked to who has used it — loathe Google Meet's background blur and prefer Zoom's by far.
In my experience, it doesn’t completely cover the background most of the time, and if you move at all, as you point out, it can’t keep up.
Kind of funny to see Google engineering blogging about it when it feels extremely half baked.
This makes me sad, because in all other areas, I think Meet excels well beyond the competition.
"Half baked" misrepresents the difficulty this task. Yes, Zoom does it better, but it's _still_ an excellent and interesting engineering accomplishment.
I've always wondered what proportion of modern real-time video effects rely on ML vs. classical image processing; this not only answers that question, but provides details down to the level of model architecture and the final latency and IOU benchmarks.
Of course I'd be more interested to read how Zoom manages to do even better, but I'm not holding my breath for them to publish those details.
At least for background blur the latency there is enough to make it almost unusable: easily over 100ms. This is with the latest stable Chrome on a relatively recent Ryzen/Nvidia system. Maybe background replacement will do better once it rolls down to regular Google Meet (too lazy to log into my Google Apps, er, G Suite, er, Workspace account) :-) However, everything else about Google Meet is great and I wish I could make all my Zoom friends switch.
I have a pretty modest machine and zoom wins hands-down. It also "just works." I've had trouble getting non-technical people on hangouts/meet/whatever they call it today. Zoom "just works," and they've been responsive to peoples' concerns.
You can use Zoom in the browser. They "just" discourage it by using a dark pattern. The link for the web client is small and gray and the browser tries to open the desktop app automatically.
You can also join by phone, at least in some circumstances.
Yeah, which makes it a pretty annoying barrier if you want to make an ad-hoc call to someone. Sending a Meet link is more convenient. Plus, Zoom is pretty crippled on the web in the feature department.
Meet seems to work better on poor connections, but it does it with a significantly more CPU intensive codec (VP9?). As a result, it only seems to work well if you have a powerful CPU. If you have a weak CPU, Zoom seems to work much better.
My only experience with Meet on a weak CPU is my daughter using it for remote learning on her school-supplied Chromebook, which has a MediaTek processor from 2015 with 2 A53 and 2 A72 cores. Meet performs fine on that platform.
It seems to have gotten a little better recently, but my experience matches yours. It really struggles when I wear over-ear headphones - they sort of phase in and out of existence.
The other thing I've noticed is the background blur absolutely annihilates my CPU. To the point where I would rather just turn off my camera if I don't want my background visible.
They have their example video clips, but they also provide data: they say their better model gets an IoU of 93.8%, meaning roughly 6.2% of pixels are misclassified. Either it's your hair getting cut off or the background leaking through. 6.2% of an image is a fair bit considering your head is probably 30% of the frame.
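For reference, IoU (intersection over union) for a binary person mask is just the overlap between the predicted and true masks divided by their combined area. A tiny NumPy sketch (the example masks are made up):

    import numpy as np

    def iou(pred, truth):
        # Both arguments are boolean HxW masks: True = person, False = background.
        pred, truth = pred.astype(bool), truth.astype(bool)
        union = np.logical_or(pred, truth).sum()
        return np.logical_and(pred, truth).sum() / union if union else 1.0

    # Made-up 100x100 frame where the prediction misses a 10x20 strip of hair.
    truth = np.zeros((100, 100)); truth[20:80, 30:70] = 1
    pred = truth.copy(); pred[20:30, 30:50] = 0
    print(iou(pred, truth))   # ~0.92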
I'm wondering why they didn't just use standard CV techniques like background subtraction? Does their technique work with a dynamic background as well?
I’ve done some work in this space - subtraction doesn't perform well when other motion is present, whereas if you use pose/body detection you can ignore other bodies in view (e.g., the toddler running across the room).
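For context, the classical baseline being discussed is something like OpenCV's MOG2 subtractor, which models each pixel's background statistics over time. A minimal sketch (webcam index 0 is an assumption):

    import cv2

    cap = cv2.VideoCapture(0)   # default webcam; index is an assumption
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # 255 where the pixel differs from the learned background, 0 elsewhere.
        # Any motion (a toddler, a curtain, camera shake) ends up in the mask.
        mask = subtractor.apply(frame)
        cv2.imshow("foreground mask", mask)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()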
Aside: Imagine you’re driving down the road and you need to make a right turn. Well, for some reason the steering wheel is stowed away and disappeared! You need to hover your hand around the center console in a specific area to be able to expose it. Out comes the steering wheel and now you can make a right turn.
Google UX/UI team: Please fucking make the mute/unmute button visible at all times.
Isn't this sort of a Fizz Buzz for a UX/UI design professional? I don't mean to demean anyone, but I see this sort of a thing literally everywhere. Hiding important and absolutely crucial information (that can make or break your product) in the name of minimalism. Coming out of a company that has one of the highest hiring bars for software engineering, and yet, their products have such an awful UX/UI. This isn't an exception, it is a pattern.
I worked as a freelance graphic artist/web designer once and while I wasn't bad at the job, I really hated one aspect of it:
Everybody and their kid thought they knew better than I did. When I said: "Yeah but this should really be visible, because accessibility", they would say: " But it looks better if..."
People in high paid position certainly want "has taste" and "knows what looks good" to be part of their self image. Many fails in design and architecture happen for that reason alone.
I then ended up programming and working in film sound, because very few people in both fields tell you what to do when they have no idea what's going on.
Ah ha, someone with the same experience as me. My degree is in Graphic Design, but I immediately ditched the idea of using it after university and took up programming instead because everyone has a fucking opinion when it comes to design.
Imagine a pointy-haired boss, or some rando in Marketing doing your code review (shudders) "That value is a trademarked name of our product - I mean variable - please capitalize it and add a (TM)" I'm glad I don't get noob oversight the way designers do.
I actually studied film, and through my music experience I was always "the sound guy"; programming was more of a hobby until it turned out I'm actually not bad at it.
I did a fair amount of indie films and know sound guys, so the part I am confused about is: what are you programming in film sound? is it per-film, or like software for film sound in general?
Ironically forgetting that visual minimalism produced by hiding things isn’t really minimalism.
It would be like me throwing all my things in the garage and advertising my house as Spartan. No, it’s not, it’s a mess. The mess is just hidden until I need to do something.
"Hiding important and absolutely crucial information"
If we want to give awards for this my vote would go to Apple. I find their products to be horrific when it comes to completely undiscoverable features. iOS is bad on its own but the Apple TV is a total train wreck. I couldn't get rid of that thing with its awful interface and remote fast enough.
Exactly. Everybody does this. In anything using video, UI elements apparently need to be hidden as much as possible. In virtual meetings, Youtube, and it's often an option in games.
And sometimes it's great, because you get to focus on the content, and sometimes it's not, because you lose control. It's something that should be optional or configurable. It's great to have shortcuts for the most common commands (like space for pause in youtube), and I guess it would make a lot of sense if video conferencing tools also had such a shortcut for mute/unmute.
But again, give people more control over their UI. There are too many applications that mess this up one way or another.
But this has been a solved problem for ages... just move your mouse a tiny bit, and all the controls are exposed, with large, visible buttons, help text, etc.; click whatever you need to, and the controls slowly disappear, revealing the video.
Having to find the exact spot to hover your mouse is a bad UX
Which also happens to be the shortcut for bookmarking webpages in most browsers... and Meet doesn't let you rebind this to something sane like spacebar.
This is true. I find Android UI so offensive that if I did not have iOS as an alternate I probably would carry a dumb phone and live like a monk. I can’t stand the miles of white space and brightly coloured tiny UI controls.
Evokes such a visceral reaction in me that even I am startled at times haha
As a developer I'm a huge fan of Google Cloud. But I'd actually think really hard about choosing them if I started my own business, as the customer service is both expensive and woeful.
More important than the button is the status indicator - I need to know if the call is muted or not. Even better, promote it to an OS-level icon/badge/overlay. If my mic is actively in use, please make it blindingly obvious.
And then I have to keep hovering over the icon and guessing from the tooltip whether I'm muted or not. Unfortunately, different software tends to be inconsistent with toggle buttons - sometimes the icon tells you what is, sometimes it tells you what will happen if you click it.
The only software that gets video conferencing right is probably Discord.
I've used MS Teams and Zoom and both are decent (MS Teams works fine for school),
but it's insanely unbelievable that this kind of software lacks features that gaming communities had probably 20 years ago.
PUSH TO TALK is probably one of the most important features of any voice software. The lack of it is a big WTF.
It gives you 100% control over when you're talking and you don't have to alt-tab between programs in order to "mute" yourself.
You can bind it to e.g. MOUSE3 (scroll-wheel click) and it works fine with other programs, games and such. Switching between muted/unmuted is a different thing.
From somebody who has used Ventrilo, Mumble, TeamSpeak and nowadays Discord for about the last 12 years, for hours per day, almost every day.
For push to talk to work, you need to have access to keys even when you're not in focus.
That's not something doable today on the web for obvious security reasons, but it's possible for Discord, which has a separate app, and would be doable for Zoom too, I guess.
Interesting sidenote: PTT works fairly well on mobile. I'm in a lot of meetings where folks are using their computers for video and "dial in" for audio on mobile so that they can continue working and then PTT on the phone which is now functionally a giant dedicated button for speaking.
It’s even worse on touch devices. You have to touch the bottom of the screen to get the controls to appear. Accidentally touch twice in the wrong location and you can hang up.
I've often thought that on a touch screen device the OS should ignore touches on buttons/popups that have been on screen for less time than a human could reasonably have observed it and chosen to interact with it. If I touch the screen 0.05 seconds after a button appears, I was probably _not_ aiming for that button.
In fact, now I think about it, this has happened many times over the years with traditional mouse-driven interfaces too.
I'm sure some power users would like to shorten the 'reaction time delay' or even remove it entirely so I guess that should be an option as well.
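Something like this would be enough to prototype the idea; a minimal sketch where the 300 ms threshold is a made-up number:

    import time

    REACTION_DELAY = 0.3   # seconds; made-up threshold, ideally user-configurable

    class GuardedButton:
        """Ignores presses that arrive before a human could have seen the control."""
        def __init__(self, label, on_press):
            self.label = label
            self.on_press = on_press
            self.shown_at = None

        def show(self):
            self.shown_at = time.monotonic()

        def press(self):
            if self.shown_at is None or time.monotonic() - self.shown_at < REACTION_DELAY:
                return   # swallow the accidental tap
            self.on_press()

    hang_up = GuardedButton("Hang up", lambda: print("call ended"))
    hang_up.show()
    hang_up.press()    # too soon after appearing: ignored
    time.sleep(0.4)
    hang_up.press()    # prints "call ended"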
Honestly with mouse driven interfaces the rule should be that whatever is popping up on screen absolutely cannot put an interactable control under the current mouse cursor location, and no control should have control focus by default.
There's nothing quite like watching a dialog box go flying by because you hit enter at the exact moment it popped up. What was it? What did it do? We'll never know!
My "favorite" instance of that, many years ago, was when the dialog turned out to have said, "Reboot the computer immediately because IT has installed new software." I filed a ticket on that one.
The mute/unmute button changes position and can be hidden in a top bar that slides out.
In some fullscreen situations there is no button to get out of fullscreen. Sometimes double-click works, sometimes it doesn't. Recently I could not even alt-tab away, basically my computer got 'locked' by zoom.
I imagine most know this by now but the space bar works as a push to talk button in Zoom (as long as it has focus of course).
I really think there is a market for a physical video conference controller. If I could get a hefty slab of something with quality buttons to enable/disable video, push to talk/mute/unmute, bring to foreground, ‘on air’ light and end call, I’d easily pay $100 for it.
These exist, e.g. the Elgato Stream Deck. It's basically a keypad with x buttons (there are various versions) that each are small lcd displays that you can program to show and do anything you want (so you can make it do the 'on air' display thing you mention). Its main use case is for streamers to switch between scenes in their streaming software, but I use it for video conferencing (with OBS's virtual camera) to switch between full-screen camera view and desktop sharing, and do stuff like mute/unmute etc.
Is it possible to use it to control a Zoom session without virtualizing the audio/video input devices? Discord has a local API for that but I haven't found a way to control Zoom calls from another app.
Not sure what you mean by 'control a zoom session', but yes, I use it with Zoom. I use OBS to composite video and some audio, and I use the OBS virtual camera as the camera device in Zoom. For audio I usually use the straight microphone stream, because mixing is fiddly to set up (you have to do it outside OBS, since OBS doesn't have a virtual audio device).
If you mean that you just want to mute/unmute a zoom session, then also yes - you configure the stream deck to output key press events so you'd program it to output the keyboard shortcuts that you want. Not sure if Zoom has separate mute/unmute shortcuts and if you change settings with the regular keyboard/mouse you might get the display state of the stream deck out of sync with the actual state of the software, that would probably be finicky and/or a lot of work to solve.
I'm still tweaking my setup but using this piece of kit with a good quality webcam, a Blue Yeti mic on an arm, and OBS, being able to control Zoom/MS Teams/Skype in a uniform way, having ultimate control over what part the desktop I share, how I pre-process audio, being able to show my desktop with myself in the corner, ... is already so much better than the clunky default experiences of each of these video conferencing tools. It's like programming with vim - yes I spend an inordinate amount of time 20+ years ago getting proficient with it, but using it just feels like an extension of my brain, like using a Hilti drill hammer vs using a bargain bin Chinese piece of junk.
Thanks for the explanation! Sorry, I got distracted and forgot to write a reply. I don't need the full range of features offered by OBS yet, but I'm strongly considering setting it up just to have control over the video stream. I'm using the Zoom (hah) portable recorder for audio since it offers outstanding audio quality, convenient mic controls and basic signal processing. The problem with controlling apps via keystrokes is exactly what you describe: since the communication is one way, the state of the toggle buttons inevitably gets out of sync. I think maybe using the accessibility API to read the UI state back can help, but I'm not holding my breath.
Yes! I worked at Crittenden Lane about five years ago and really liked the hardware at the time. The whole thing was eye-opening for me, how seamlessly I could meet with folks whether they were in Zurich or on the second floor...I imagine it has only got better since then.
Zoom does this well on the iOS app. They call it "safe driving mode" [1] and half your screen essentially becomes the mute/unmute button. You can either tap it or swipe left to unmute.
And stop telling me I'm using an input different than the output. I have a condenser microphone on an audio interface with RTX Voice; no, it's not going to transmit an echo.
To be fair a lot of sites do need it, especially for more power user level UX. See BetterTTV, RES, etc. Sites generally don't target power users, understandably.
The British PM just had to tell a major media journalist to unmute during the press conference introducing the new quasi-lockdown, so I think we can safely say that Mute button and status is no longer a power-user feature ;)
Doesn't Meet, like most other apps, show a message when you try to speak while muted? Though they should maybe make it more obvious. I do agree that the mute button isn't a power-user feature.
What would be the logic behind “deaf”? That “mute” is a homonym/polyseme of a word for a disability, so let’s just use the first letter of any disability?
Zoom at least uses Cmd+Shift+A for audio and Cmd+Shift+V for video.
But as the recent Google icon kerfuffle shows, UI/UX is not their strength (probably because of opinionated technical people who think you need to A/B test shades of blue).
Teams uses something equally silly, like Ctrl+Shift+M for mute/unmute, IIRC.
Which is pretty annoying, because the mute button is about the most important button in a videoconferencing tool, and I want to have it under a single keypress, so it can be used effortlessly, with my left hand (the same that operates Alt+Tab, while my right hand is on the mouse, scrolling ... well, meeting agenda, let's say).
I'd fix that for myself with AutoHotkey, but I can't, because Teams is just another Electron app, so I can't just look at which UI component has the focus to create a rule, "if focused on Teams video call and not its chat, rebind M to Ctrl+Shift+M".
One of the countless reasons I hate it when people do custom UI, instead of using OS-provided controls.
Speaking of mute/unmute I've not yet found a way to get Google Hangouts (same thing as Meet?) to play nice in situations where simultaneous interpretation is involved. Our company works in Japanese and English and we typically have a second meeting running in parallel for interpretation. This setup almost works, I say almost because I've yet to find a way of muting the audio in one meeting so I can properly listen to the other. I can't leave the first meeting either because often I'll also want to see the presentation slides. Currently I'm working around this by muting my MacBook and joining the second meeting on my phone.
Perhaps I'm missing something obvious (or a Chrome plugin that will allow me to mute based on the page URL rather than site). In the unlikely event that a Googler is reading this I'm not asking for yet another product or complicated new piece of functionality aimed at this specific use case. Just a mute button for audio. Thanks!
Wow that's strange. FWIW Firefox does not do the same domain-level blocking, only tab-level blocking. And as far as I know, Hangouts still works in Firefox.
A major motivation for getting a StreamDeck was to be able to put a big fat mute button on it that "physically" kills the microphone level at the source.
It renders a big cross through the microphone when muted.
Simple, yet insanely effective UI (#).
Best thing ever.
#) Especially when compared to the mess that is Google Meet. My favourite "feature" of theirs is how when someone is presenting, it's impossible to view the presentation as just another stream - no they have to make it dominate everything, meaning it's so hard to see the other team members.
And it can be extremely hard to see who's talking when viewing a lot of cameras at the same time. And for whatever reason the quality turns to a blurry mess a far cry from 720p just way too often. (I have fibre internet).
When did you last use Meet? I just used it yesterday for a gaming session with friends and the mute/unmute control was visible at all times. I even just tried it right now.
While you're at it, always display a VU meter. It gives feedback on what is being transmitted and thus can alert a user whether they are being heard or not. It's the most basic of sound recording tools, and was a standard part of recording equipment for over half a century for good reason.
And if you need minimalism, offer a toggle for that. But I think most people should have it forced on them; it would save everyone a lot of trouble -- just think about all the aggregate time lost talking into a muted mic across all users.
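Even a few lines get you a basic one; a sketch using the sounddevice package (the package choice, sample rate, and scaling factor are all assumptions):

    import numpy as np
    import sounddevice as sd

    def show_level(indata, frames, time_info, status):
        # RMS of the current block of samples, scaled into a crude text bar.
        rms = float(np.sqrt(np.mean(indata ** 2)))
        bars = int(min(rms * 300, 50))
        print("\r[" + "#" * bars + " " * (50 - bars) + "]", end="", flush=True)

    with sd.InputStream(channels=1, samplerate=16000, callback=show_level):
        input("Speak into the mic; press Enter to stop.\n")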
We are in the era of three seashells. There is no turning back from this. Soon you won't be able to find the power button for anything tech industry related.
Which is so odd, as Ctrl-D is also the bookmark shortcut in Google Chrome. So, say for example, my team has a go-to channel where we have our ad-hoc meetings. It's a pain to bookmark it for later use without jumping through the GUI.
MS Teams has finally changed this on their video calls. Ah, the hours I spent telling colleagues 'If you move your mouse around, you should see a black bar appear somewhere near the middle of the screen'.
Happy to see ML become mainstream. In the future, I don't think ML will be a separate field of programming. It'll just be "programming," the same way webdev is.
There's a tendency to think of ML as "not programming," or something other than just plain programming. But as the tooling matures, that'll go away.
(Lisp used to be considered "AI programming," till it became useful in many other contexts.)
ML will become a library. It has about as much to do with programming as a compiler. You don't need to know what it does, you just need to know how to make it do things. The problem with ML currently is that nobody really knows how to do things and that you have a million parameters that need tuning and most algorithms need continuous improvement and fine tuning to the use case. There is nothing "mainstream" about ML at this point, except that everyone wants to use it.
In maybe a decade, it might be found in the standard libraries of programming languages, and on top of things like `Math.abs` we will have `ML.textToSpeech("Hello world")`, or `ML.isCat(image)`, etc. However, the problem I see with that is that no matter how far we wind the clock forward, we will only be able to put the most simplistic use cases into a library. `ML.isCat()` could be one of those: since most humans can do that kind of image categorization, it stands to reason that you could put it into a library. However, most industry applications involve highly customized ML algorithms that are optimized for a very specific use case. So there will always be a need for a research team, in big companies at least. Maybe smaller companies will try to build their stuff by chaining libraries together.
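To be fair, a crude `isCat` is already roughly a dozen lines around an off-the-shelf ImageNet classifier; the packaging below is purely illustrative (MobileNetV2 via Keras, with ImageNet classes 281-285 being the domestic cats, and the file path is hypothetical):

    import numpy as np
    from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input
    from tensorflow.keras.preprocessing import image as keras_image

    _model = MobileNetV2(weights="imagenet")
    _CAT_CLASSES = set(range(281, 286))   # tabby .. Egyptian cat in ImageNet

    def is_cat(path):
        img = keras_image.load_img(path, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(keras_image.img_to_array(img), axis=0))
        return int(np.argmax(_model.predict(x)[0])) in _CAT_CLASSES

    print(is_cat("some_photo.jpg"))   # hypothetical file path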
There's never going to be an `ML.isCat(image)`, just like there isn't a `Math.solveProblem(hypothesis)`. Yes, you do have `Math.abs`, and you're going to have stuff like `model.fit()` and `layers.dense()` - but something like `ML.isCat` is too specific to be used in a library.
Disagree. In the future, that'll be `npm install ml-cat` followed by `MLCat = require('ml-cat'); MLCat.isCat(image)`
It might not be npm, but something like that is probably inevitable.
The reason it seems so unlikely is because the tooling isn't there yet. No one even agrees how ML code should look, let alone how libs should be distributed to end users. But I saw the transformation for JS in 2008.
the range of problems you can solve with ML/AI is simply too wide for there to be fully-canned solutions for everything. Sure, there will be canned solutions for _some_ things - maybe even for cat detection, because it's fun so why not.
But, a library that uses AI to optimize the production of your business' flux capacitors? Ain't gonna happen, you need to build that yourself. To have a library/product that solves problems using AI, you need a "language" to describe the problem (like you can e.g. use SQL to describe any data query you may have). But describing problems is notoriously hard - accurately & precisely describing the problem is very often just as hard as solving it.
Mm, it's a bit like arguing that "the range of text editor customization is simply too wide for there to be fully-canned solutions for everything." Meanwhile, elisp wiki go brr.
I think ML solutions will increasingly take the form of an elisp script rather than a python library, but it'll take a little while to get there.
> it's a bit like arguing that "the range of text editor customization is simply too wide for there to be fully-canned solutions for everything."
But the range of editor customization really isn't that wide. That's exactly what I'm arguing, that ML/AI is more like "math" than like "editor customization".
FWIW, Macs have had equivalent functionality for both text-to-speech and speech-to-text for at least 17 years, to my memory. The quality is poor compared to today's server-driven approaches, of course, but the functionality has been there if you're willing to articulate yourself clearly.
AI is learning existing patterns from input/outputs.
Programming is setting up patterns to turn your inputs into desired outputs. Most often it's just plumbing data around with some transformations.
What you're talking about is using AI as programming tools. It's still programming, but using pre-trained models as part of the plumbing.
We used to use Jitsi Meet and it worked perfectly for our team meetings, but we kept having issues with 10+ participants, overseas meetings with 100ms+ latency, and whenever Firefox was used. YMMV, one year ago.
My team does weekly Google Meet meetings along with WebEx. The biggest complaint I'd have is that Meet sacrifices functionality for cleanliness; everything useful is hidden behind some menu or popover, and you can only open one popover at a time (otherwise whatever you had open closes). This contrasts widely with WebEx, where most things (participants, controls, chat) can be shown at the same time, but also hidden if you don't want to see them. Meet seems complicated in comparison because views that are 0 clicks away on WebEx require 1-2 in Meet.
Basically what other comments suggested. The popup menu shows up every time I move the cursor and covers part of the screen. It can't be hidden on demand. It shows status I can't see without opening it (and covering part of the screen). I can't change my mute status without opening it (and covering part of the screen) or using a very obscure shortcut.
I am confused. The microphone button is on the bottom bar, is clearly available at all times, and can always be clicked-on.
You are using Google Meet within a browser?
I am going to admit that Nvidia Broadcast looks absolutely amazing to me. It's likely to be the reason why my next GPU won't be AMD's new one, even though it appears to deliver much more bang for the buck.
I already have RTX Voice now and it's the best thing ever.
No, because tech people want software that works, has good UX, etc. This is a PR piece for people that prefer software with cutesy little backgrounds.
I thought the whole point of having a video call is to see who you are talking to, and their environment to further enhance the effectiveness of the conversation.
If you are in your kitchen, or under a tree, I definitely would like to see that because that environment will have an effect on how we communicate.
Sometimes people may not be comfortable sharing their backgrounds, and may not have convenient alternatives. For example, if you have a bed in the background it can be awkward and you might want to blur that out.
I don't bother, but then I live in my own home and my background is an empty study.
I have coworkers who are in house shares with 5 other adults all trying to work from home around tiny desks. Background blur for them is a nice way to hide some of the chaos of their living arrangements.
If the apartment is a mess in general. Table full of empty cans of beer. A dildo on a chair. Your wife randomly walking by in her underwear (not sure whether this would be unblurred?).
In the above scenarios, if I'm not certain there aren't going to be awkward things behind me, I'd want to blur or set a custom background. Sitting with your back against a wall also works, which is what a lot of people seem to be doing.
Why not just turn off your camera? The blurring tech doesn't seem nearly reliable enough for me to trust it if my "office" was that much of a catastrophe.
Yeah, this should be obvious. I think video calls are a waste of time the majority of the time, but one legitimate use case is where there's an issue that doesn't seem to be easily resolved using written media. In this case it's useful to have a video call where you can gauge someone's reaction to specific things you say. That way you might be able to get to the gist of where the miscommunication is happening. A dildo in the background doesn't add to this (although a bunch of empty vodka bottles might give some clues), while seeing a person's reactions to statements/questions might.
> In the current version, model inference is executed on the client’s CPU for low power consumption and widest device coverage.
Naively I would think model inference done server-side would mean lower CPU usage (from the client's point of view) and the widest device coverage (the client does nothing more). What am I missing?
It is done on the CPU instead of the GPU. The GPU would seem like the natural choice for a convolution-heavy model, but it was not used here for the reasons mentioned.
Some work needs to happen locally to show you a preview of what you're going to transmit, as it should for most video related work.
If the segmentation is done server-side, then you need to sync it to the sender and reflect that quickly in the preview. It's probably not a great experience, at least for a launch.
I wish my coworkers would stop using background blur.
It sucks and it’s distracting.
Your hair and hands pop in and out of blur. Sometimes part of your face will blur.
I don’t care if your workspace is messy or your kid walks in the room. I do care that we’re all being distracted by your weirdly blurred hair and hands.
Your co-workers have a reasonable expectation of privacy regarding their home life and family members.
Given that many had to start WFH on short notice, meaning they couldn't relocate to circumstances that allow a dedicated home office space, blurry hair and hands are a very reasonable compromise.
> Your co-workers have a reasonable expectation of privacy regarding their home life and family members.
I think you are overthinking it. I've seen people use it when it provides no real material benefit other than the placebo effect on the user to believe that the blur makes other people focus on their face.
Yeah, this is why I use the background blur. I have my wife's and my hobby stuff behind me. Can't really align the video/PC setup any other way.
I'd rather provide a blur than a jumble of guitars, sewing kits and such.
Is it really that hard to set up a greenscreen for this? I can look out across the street and see a number of people who have done this in their tiny apartments. If I cared about people being able to see the room behind my WFH setup I would do it too. Thankfully for me it just points at the wall I use as a projector screen, so there's nothing to see. Plus my team seems to have just given up on video anyway.
I find background blur even more distracting than background replacement. It's like my mind tries to picture the person that I am seeing in a particular environment and blur makes that process messy.
But that's not always true, though; I have seen background replacement bleed all over people's faces (and yes, I seem to be the only one who thinks that's wrong).
I don't think anyone is being distracted by blurred hair or hands. If your coworkers don't feel comfortable even turning on the camera, it shouldn't matter to you. Aside from edge cases like a modelling agency looking for fresh faces, you have zero right to demand how people choose to portray themselves in a VC call.
I have no problem with people choosing to leave their cameras off (I rarely turn my own camera on in meetings). I still think the complaint about poorly implemented background blur/background replacement is at least partially valid. It is very distracting to me compared to either a raw camera or no camera at all.
> Can we get a mute button visible at all times before 2024?
Is it just me, or is the button visible at all times? I could see the button at the bottom of the screen the whole time I used Meet during a session with friends. I even tried it just now to make sure.
They mention SIMD support, but it's unclear to me in what capacity the GPU is leveraged. The hair segmentation example on the MediaPipe webpage suggests it's evaluating the graph on the GPU, though.
The "Rendering Effects" section describes it in some detail: "Once segmentation is complete, we use OpenGL shaders for video processing and effect rendering" and some info on what that covers. (OpenGL parts runs on GPU)
It would be nice if there was a webcam on the market that took actual lenses so you could get free, legit depth of field. Paying $700 for a used DSLR that has a clean hdmi out is not appealing, especially when I have a mirrorless from the same company that could probably do the same with a firmware update (that will never come)
I think a cheaper solution would probably just be a depth sensing camera. Even a developer targeted Intel RealSense kit is only like $150. Consumer hardware could be much cheaper I imagine.
Once you have depth information integrated with a camera, then it should be pretty trivial to do background removal.
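Right: with an aligned depth map the "segmentation" collapses to a threshold. A toy sketch (the distances and the fake depth frame are made up; a real RealSense pipeline would supply depth aligned to the colour image):

    import numpy as np

    def person_mask_from_depth(depth_m, near=0.3, far=1.2):
        # True where a pixel is within arm's reach of the camera (metres are guesses).
        return (depth_m > near) & (depth_m < far)

    depth_m = np.random.uniform(0.2, 4.0, size=(480, 640))   # stand-in depth frame
    mask = person_mask_from_depth(depth_m)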
Whereas a 35mm f1.8 from Nikon is like $200 and whatever you mount it to is still going to need to do auto focusing and a bunch of other camera-y stuff to make it accessible to non photo geeks and then you’re going to need an off camera microphone so the entire call isn’t listening to your autofocus motor and...
Meet is business oriented and offers features that Hangouts does not, e.g. dialing in via phone. It also requires a G Suite account (or did before COVID, IIRC).
Here's a tip: take a picture of your real, actual background from the POV of your webcam, and set that as your meeting background.
Advantages: it looks natural, it covers whatever is going on behind you (in case you are not alone and people walk by, or if your living room is messy), and it blends better than fake backgrounds (because it's the same image behind it). I have a picture of my office that I use both at home and at my real office, and most people can't tell. And since I took the picture with my phone, which has better resolution, my video feed looks better for cheap.
The single biggest missing feature compared to Zoom for my team is background noise cancellation. It's an unfortunate decision to limit it to Enterprise users.
I was going to point out that xnnpack was basically created by a single guy who also created qnnpack, and how amazing it is for the work of a single guy to have so much impact, then I realized he posted it! Congratz dude!
As in, the blurred background looks totally different (light:dark, shapes, etc.) to the unblurred background.
(I get that they’d need to do something funky to show blurred and unblurred backgrounds with the same foreground video, and faking it is likely easier than doing it programmatically, but this is just odd/sloppy.)
If you have a Windows computer with an RTX graphics card, you can use Nvidia Broadcast to get similar perks. It creates a virtual camera that you can select in whatever conference apps/browsers you are using.
There is some work going on in OBS to get AI green-screening working, so I hope we will get that on GNU/Linux one day.
The listed CPU usage / elapsed time for the features in this article is obscene. Only 62 FPS means maxing out at least one core on a 60 Hz display, just to replace/blur a background. Kiss your laptop's battery goodbye. How is this worth it?
Why isn't Mediapipe built on gstreamer? Nvidia gets this right. If you're slinging frame buffers around, use an API that there is already an ecosystem for.
A few people commented that the foreground/background detection cannot keep up with movements fast enough. Here's an idea that might help, although I'm not sure if it can realistically be done:
When the video is encoded, the codec does motion estimation (among other things) to reduce the bandwidth required. So why don't we use the motion vectors from the video codec to modify the foreground/background mask in real time? Obviously this is going to create weird artifacts pretty soon, but it might just be good enough for a few frames before the ML model produces another accurate mask.
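Roughly, you'd warp the last ML-produced mask along the estimated motion until the next full inference lands. Browsers don't expose the codec's motion vectors, so dense optical flow stands in for them in this sketch:

    import cv2
    import numpy as np

    def warp_mask(prev_gray, curr_gray, prev_mask):
        # prev_gray/curr_gray: consecutive grayscale frames; prev_mask: HxW mask.
        # Flow from the current frame back to the previous one, so every current
        # pixel knows where to sample the old mask from (backward warping).
        flow = cv2.calcOpticalFlowFarneback(curr_gray, prev_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = prev_mask.shape
        grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                     np.arange(h, dtype=np.float32))
        map_x = grid_x + flow[..., 0]
        map_y = grid_y + flow[..., 1]
        return cv2.remap(prev_mask.astype(np.float32), map_x, map_y, cv2.INTER_LINEAR)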
I have observed in the last couple months that whenever I create a Google Calendar invite with others, Google has started inserting a Google Meet conference as the location to meet.
It was one thing to ask/offer this as an option if you'd like to use it, but now Google is positioning it as if you had chosen that. So if you left it empty, because you usually use some other understood method with your friends/colleagues, now your participants are confused and think you wanted to use Google Meet.
I think that's going too far to get people to adopt your product.
I noticed this too, but I actually got a tooltip popup that notes this is something that can be disabled in the calendar settings. The specific checkbox is "Automatically add Google Meet video conferences to events I create"
Disclaimer: I work at Google but not on these products.
Edit: it seems the tooltip only appears the first time you try to add Meet. After that it doesn't appear and you have to go into settings.
Automatic door sensors are in my experience universally infra red. In fact I don't think I've ever seen any camera technology used in that context. Are you saying they used cameras to open the doors?
It's video conferencing software. Makes sense that they might put together imagery which might suggest people meeting from different corners of the planet. But sure, I get your point. I didn't notice this myself, but I have been living on side of the planet opposite from where I was born for the past decade.
I understand the feeling somewhat: it is very noticeable how TV shows for example now have expanded the default cast from "all white people + token black, maybe a gay individual" to "all white people + token black woman + token asian + token non-binary, maybe a transgender individual".
It feels forced and ham-fisted, but I don't see how this could be made in a better way.
Possibly people growing up with this will not notice it at all, and it will be good for them, it's only us old farts that need to adapt.
I think the issue is that, were it not for everyone being forced to notice, everyone was defaulting to caring so little that we just used the people around us or who we had existing connections with (for demos, training data, employees... whatever), and everything was horribly biased due to numerous reasons. So, while before you weren't being forced to pay attention, everything was more racist than now, where you are being forced to actually make an effort to be anti-racist.
To put this into a programming metaphor, to me this is like being triggered by someone going out of their way to add a buffer overflow check due to a bunch of people spending the last decade screaming about buffer overflow security issues. Sure, before you didn't have to notice, but your code was probably also horribly insecure; now, everyone gets angry if you don't take at least minimal precautions, and people are even advocating that you use more secure languages from the start (to the point of questioning your architectural decisions if you don't), so you are being forced to pay attention--and sure, it seems a bit annoying and like extra work that a bunch of fanatics are foisting on you, and if someone had taken the time for their code to be secure before you might not have noticed (I mean, you weren't ever against security) but now it is screaming at you "this is because of those annoying security people rubbing our noses in our buffer overflows" so you get angry because this is taking time away from "getting the real work done" on your product--but the reality is that your code used to have glaring security issues that affected people who weren't you, and it sucked; everyone is better off for you paying attention now, and maybe one day we can fix the systemic problem and no one will have to put in such obvious effort to avoid being part of the problem, but we clearly aren't there yet.

Being angry about this just comes off as not giving a shit about security: you didn't notice these checks before, because the code was just as good to you, for the criteria you were bothering to pay attention to, without the checks as it would have been with them, and the whole point is that that wasn't true... it was actually much worse; similarly, being angry about people making active efforts to have diversity in product advertising frankly just makes you come off as not giving a shit about minority representation :/.
It’s funny how Google pours time into things like this, but the last person I know who uses a Google chat product just stopped because it's less reliable than Zoom. Losing 15 minutes with someone trying to get the sound working counts for more than a gimmick many people never notice, not to mention that now even normal people don't want to install yet another app because they expect it to be cancelled soon.
This is only true to the extent that the IT department has complete control and is insulated from user opinion. That tends to result in choices like WebEx or the ever popular shadow IT option. This is especially true now that everyone is doing this and the odds approach certainty that if someone is having problems they’ll suggest switching to a different product they know works.
Given the number of IT people I’ve heard express concerns about UI quality and eventual cancellation even for enterprise purchases, it’s also far from a given that the IT department is just blindly pushing a product.