I’m Cody, one of the founders of WakaSaba. Traditional video conferencing platforms like Zoom assume that you’re right in front of your device. We created WakaSaba for “distance first” video conferencing and have built novel ways to interact with your room and control your device from afar:
- Hand Tracking: Turn your hands into a mouse and use it to point, select, and click. Supports basic room controls like muting / unmuting yourself and more advanced features like controlling a countdown timer that’s shown to everyone in the room
- Gesture Recognition: React to anything that happens with your hands. We capture certain hand gestures and broadcast them to everyone in a fun, engaging, and minimally(?) intrusive way
- Phone Pairing: Scan a QR code to turn your phone into a remote with access to additional controls and interactive features
Another core feature of our platform is that it’s all in the browser. We explicitly designed WakaSaba so that users wouldn’t need to download additional software or install any apps.
We originally built this for online fitness instructors, but recently have been thinking about what other industries and use cases might benefit from our technology. As a result, we’ve reoriented our landing page to focus less on “why” you might use our platform and more on “what” you can do with it. Now we want to get it in front of people to collect thoughts and general feedback.
Instructions if you want to try it out:
1) There’s a `Try a Free Room` button on our homepage
2) Clicking it will create a room that you can join / invite anyone to for 40 minutes, no email or sign up required
3) Before joining the room, there are some tutorial screens that go over our core feature set
Sorry in advance for any rough edges! We’ve deliberately prioritized getting this in front of people above all else :)
If you encounter any issues or have any questions / feedback I’m all ears! I should be around for the next couple hours.
Can you provide integrations with other videoconferencing suites? This would be great when I'm stuck reserving a huge conference room and can't reach the controls from where I'm sitting, but there's no way in heck I'm convincing my employer to switch from $EnterpriseVideoconferencingSuite to your startup's offering. However, since said enterprise videoconferencing suite does have a webapp, said webapp is happy to communicate with browser extensions, and you're doing everything with image recognition, I bet you could do most of this stuff on my laptop so I could use your UI without having to convince my employer to switch to your servers/software/codecs/security/auditing.
To clarify, are you describing something to the effect of: [whatever WakaSaba is doing] => Browser Extension => Webapp version of $EnterpriseVideoconferencingSuite?
This is not something we have explored. If you had to hazard a guess, what $EnterpriseVideoconferencingSuite would be most amenable to this sort of work?
Pretty much, yes. WakaSaba's major offering - at least, that's what the headline and demo video focused on - seems to be a new control toolkit for video meetings, and I feel like it should be possible to get a good chunk of that without doing all the hard work to develop, host, and maintain your own videoconferencing suite. Like, say I wanted to advance my slideshow without walking over to the keyboard on the podium; I could build a fancy smart remote that talks to the cloud with its own GSM radio and interfaces with a novel presentation suite specifically designed to integrate with my remote, or I could make my remote a Bluetooth keyboard that can only press the left and right arrow keys and then keep PowerPoint focused.
I don't know what enterprise conferencing suite would be easiest to talk to. It'd be straightforward if I wanted to use a local video stream, proxy `navigator.mediaDevices.getUserMedia` and generate synthetic click/keyboard events as necessary [1] or make like the remote and pretend to be a keyboard, but the really interesting thing would be to run WakaSaba on a second device and use it to control my "primary" presence. I took a quick spin through some different videoconferencing suites to see what looked doable:
FaceTime: No API, no client SDK, no browser client, dead end.
Google Meet: No API, no client SDK, but a web client. If they're using HTML5 video elements (which I haven't checked) you could probably intercept the streams, correlate them with the user information displayed on each card, and then associate them, but without an API or client SDK you wouldn't be able to do anything on the first device from the second device. Probably the easiest PoC for a first-device demo (that is, run the hypothetical WakaSaba extension on the same machine you're controlling), given that it's a webapp first and there are a ton of extensions that augment/tweak it to copy code from.
Discord: Very nice API - you can even implement audio clients! - but the video API isn't documented and looks like a hand-rolled solution in Elixir [2]. They do have a browser client, though, so if they're using HTML5 video elements and you can capture the source, might be your best bet. That said, I'm not sure the API lets you remote-control a second session. If you can capture the video stream this looks like your second-best bet for a first-device PoC.
Zoom: Like Discord - APIs and client SDKs that look like they'd permit remote-control, but can't see any support for video so you'd have to intercept video from their webapps.
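Here is a rough sketch of the `getUserMedia` proxy idea mentioned above. It is a hypothetical illustration, not something WakaSaba or any conferencing suite ships. `proxyGetUserMedia` and `UserMediaSource` are invented names. The sketch assumes the script runs in the page's own JavaScript context before the conferencing web app requests the camera. The wrapper lets the page keep receiving the real stream while a second consumer, such as a hand-tracking pipeline, also gets a reference to it.

```typescript
// Minimal shape of the piece of MediaDevices being wrapped. In a browser,
// `navigator.mediaDevices` fits this with C = MediaStreamConstraints and
// S = MediaStream; generics keep the sketch runnable outside a browser.
interface UserMediaSource<C, S> {
  getUserMedia(constraints?: C): Promise<S>;
}

// Hypothetical helper: replace `getUserMedia` with a wrapper that still
// resolves to the real stream for the conferencing app, but also hands that
// stream to `onStream` (e.g. a hand-tracking pipeline). Returns a function
// that restores the original behavior.
function proxyGetUserMedia<C, S>(
  devices: UserMediaSource<C, S>,
  onStream: (stream: S, constraints?: C) => void,
): () => void {
  const original = devices.getUserMedia.bind(devices);
  devices.getUserMedia = async (constraints?: C): Promise<S> => {
    const stream = await original(constraints);
    try {
      onStream(stream, constraints);
    } catch {
      // A failing side channel must not break the call itself; a real
      // extension would report this error rather than ignore it.
    }
    return stream; // the web app receives the same stream it asked for
  };
  return () => {
    devices.getUserMedia = original;
  };
}
```

In a browser this would be called roughly as `proxyGetUserMedia(navigator.mediaDevices, (stream) => { /* feed frames to a tracker */ })`. Generating synthetic click/keyboard events is a separate step. Events created in script and sent with `dispatchEvent` have `isTrusted` set to `false`, so a web app may choose to ignore them.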
Oh wow. Okay. I found a product that wants to do half of this - otter.ai does live transcription of meeting audio - and they seem to have given up on it entirely and just abused the analog loophole. Sooooo maybe the only way to get remote-control from a second device would be to reach out to a videoconferencing provider and ask for a privileged integration like otter seems to have gotten with Zoom, lol.
I love this, but I think it's one of those ideas where people dismiss it as impractical at the time, but then it comes back in force a couple of years down the line.
IF video conferencing is here to stay, then we would expect to see more larger-screen consumer[1] devices appear with it built in. This has started to happen - Facebook has Portal and Amazon have Echo Show. Scale that up to an on-wall TV of 50"+ and you've got your room-size video conferencing setup, at which point gesture control will be a very nice add-on.
Right now, though, people are still wrestling with how to do video calls, whilst at their desks or with laptops on their laps.
---
[1] Yes, I know that corporations have 'video conferencing ROOMS' with massive screens and multiple cameras, but those cost mega bucks and are not deployed en masse.
Yes! The sudden increase of webcam usage is one of the things that prompted us to think about opportunities around this type of interaction. Sometimes I wish I didn't have to talk to my Nest Hub in order to make it do something.
Hello, I think this idea is fantastic. Since you said you've prioritized getting this in front of people - please make sure you're talking to folks who are doing remote fitness, yoga, and dance classes.
I know several folks who would love something like this with a different set of gestures and end-results. Especially smaller operations who have switched from in-person to live during the pandemic. The bigger shops have space for a dedicated operator, but the smaller ones are just the instructors and their students in a zoom call.
I think the key is going to be in arriving at the right set of gestures for the right audience, e.g. yoga instructors are probably looking for ways to minimize/maximize certain screens. Aerobics instructors are looking to be able to mute and unmute and send directed feedback. Both types are often struggling to figure out lighting and how to position their cameras. On all sides, I think something as simple as capturing the entire class clapping along is hard to do. I think something like this would be perfect for that market.
> I know several folks who would love something like this
We’d love to talk to them! If you have any contacts who’d feel comfortable talking to us and influencing where we go from here, I’d love to hear from you / them at cody@wakasaba.com
> I think the key is going to be in arriving at the right set of gestures for the right audience...
This makes sense. We’ve mostly been working with Pilates instructors, but this is a good reminder to talk to people running all types of remote fitness classes.
I love the execution. The business idea? Not so much. I just don't think there's a big market for video conferencing from 6ft away.
My two cents: find a way to generalize the applications. It's common to start a business focusing on a specific niche and then expand, but I have a feeling this would work better as something that can integrate into many other applications or workflows rather than as a standalone product.
> don't think there's a big market for video conferencing from 6ft away.
Why on earth not? I've been in plenty of video conferences over the years where plenty of the participants were more than 2 m away from the camera and screen. So much so that I would say that in my experience it was the most common way to do it in the offices where I have worked.
Not everyone is using their mobile phone for conferencing you know.
Is this a product or a feature though? Seems very much a feature in a VC software product to me.
No moat; Google/Zoom will just build this if it’s popular and then you are out of business. Unless you can somehow stay ahead of them with sophistication of the UI?
I think plenty of people would like this feature, I just can’t see many people paying for it.
>> don't think there's a big market for video conferencing from 6ft away.
There's a chicken-and-egg problem. Other than TVs, we don't really have devices that we use from 6 feet away. These ideas tend to come to fruition when either (a) you catch an egg in the act of hatching or (b) laying. The rest of the time, novel media usually lives in video games or porn.
One idea might be to make this a game, or game related. Playing poker or something, even against friends on videoconference, from a few feet away could be quite nice. I'm sure there are porn ideas too.
Reminds me of a UI designer I worked with once who had an idea for controlling phone audio playback with gestures while driving (with the phone in a cradle). There have to be lots of other neat applications.
We're not too familiar with Toastmasters (other than hearing that it's a great way to learn public speaking). Would you find this useful in the context of in-person or remote meetings or both?
And what would it be used for? Moving slides? Displaying / hiding the agenda? Sending lots of clap emojis?
Public speaking in general is best done standing, but due to the pandemic, most Toastmasters clubs end up using Zoom for virtual meetings.
While many speakers do stand to deliver their speeches, it gets difficult to interact with the audience in a meaningful way when Zoom is built around the expectation that people will be a foot away from the computer camera.
Moving slides is a big thing that people can’t do today if they are standing to give a speech.
Your tech can help with Emojis and reactions.
Starting and stopping the timer is another useful feature that Toastmasters will use.
I think the idea is super cool, and the execution works very well! Definitely much better than I anticipated.
The first thing that jumped to my mind is not that I can use it from 6ft away, but mostly that it can be used to control things even if I don't have the conf call window I'm in in the foreground (which happens to me quite a lot).
Some ideas I had while playing with it:
- Some sound feedback would be nice, for example a little beep when switching from value to value in a menu: when the call isn't in the foreground, it would confirm that the gesture was registered.
- Swipe gestures could be a nice addition, for example a swipe up or down to mute/unmute the mic
- I'm overall less convinced by the "control by phone" flow
- It's surprising for the welcome video to be without sound: I initially thought my headphones were not connected or something. Overall, I think the welcome video presents the features very well and goes straight to the point, which I really appreciate! But it feels a bit rough around the edges.
- A bit of smoothing when selecting would make the experience a bit better: I don't want to know that my fingers are shaking this much!
But in any case, congratulations on the product, it's really really cool!
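A common way to get the kind of selection smoothing suggested above is an exponential moving average over the tracked fingertip position. The sketch below is a generic illustration, not WakaSaba's code. The `PointerSmoother` class and the choice of `alpha` are hypothetical.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical exponential-moving-average smoother for a hand-tracked cursor.
// alpha in (0, 1]: lower values hide more jitter but make the cursor lag;
// alpha = 1 passes raw positions through unchanged.
class PointerSmoother {
  private readonly alpha: number;
  private last: Point | null = null;

  constructor(alpha: number) {
    if (!(alpha > 0 && alpha <= 1)) {
      throw new RangeError("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  // Feed one raw tracked position per video frame; returns the smoothed one.
  update(raw: Point): Point {
    if (this.last === null) {
      this.last = { x: raw.x, y: raw.y }; // first sample: nothing to blend yet
    } else {
      this.last = {
        x: this.last.x + this.alpha * (raw.x - this.last.x),
        y: this.last.y + this.alpha * (raw.y - this.last.y),
      };
    }
    return { x: this.last.x, y: this.last.y };
  }

  // Call when tracking is lost so the cursor does not glide from a stale point.
  reset(): void {
    this.last = null;
  }
}
```

A fixed `alpha` trades jitter against lag. Adaptive filters such as the 1€ filter reduce smoothing as the hand moves faster, which keeps quick pointing responsive while still steadying a hovering hand.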
> it can be used to control things even if I don't have the conf call window I'm in in the foreground
Just to clarify, by “foreground” do you mean:
1) a split screen scenario where the video conferencing window is visible but not in-focus? or
2) the video conference window is under another window / not the active tab and therefore not visible?
(1) is a use case I had never registered, and it actually makes our platform valuable even for traditional conferencing, so thanks for concretizing that!
> I'm overall less convinced by the "control by phone" flow
This is fascinating. We actually built the phone pairing feature because of some earlier feedback that the hand tracking features would be too difficult for average consumers to use. Could you possibly elaborate on why you’re less convinced by the phone pairing feature? As an engineer at heart, I’m looking for any reason to keep iterating on the cool hand tracking features :)
All your other ideas are great! Will definitely add them to our todo list
I've been thinking about doing something like this forever, but never taken the time to sit down and do it. (And probably couldn't have done a particularly good job of it anyway.) My main usage scenarios are:
- using my hand as a mouse in space-constrained situations where I can't have a physical mouse
- dance class, so the instructor can seek through music, especially for rewinding to the same starting spot over and over again. (This is from the days of in-person classes; it drove me nuts that the instructor had to keep walking back to his phone and then getting it a few beats off)
- spellcasting game
For many uses, I'd be really concerned with false positives. I don't want to have to pin my arms to my sides to avoid triggering something. I'd also want it to be pretty robust to different angles, so I don't have to get my hand exactly parallel to the camera. (Seeing yourself is pretty good for smoothing this over, since at least you'll notice when you're not pointing in quite the right direction. But some applications would be much nicer if they were usable without seeing yourself.)
Overall, this feels more like a tech demo than an application. Sadly, you'll probably have to pick some niche to focus on, because I think the tradeoffs are going to be very different. And dance teachers and yoga instructors don't have much money. (WakaSaba also doesn't exactly suggest the functionality to this English-speaker.)
> - using my hand as a mouse in space-constrained situations where I can't have a physical mouse
Are you thinking that you would consider using something like this to interact with your entire computer generally (i.e. instead of solely in the context of video streaming as shown in the demo)?
> - dance class, so the instructor can seek through music, especially for rewinding to the same starting spot over and over again. (This is from the days of in-person classes; it drove me nuts that the instructor had to keep walking back to his phone and then getting it a few beats off)
This is really validating feedback and affirms a lot of our experience in these types of classes (especially remote). Requiring the instructor to walk towards the device to basically do anything really messes with the flow of the class. And requiring the learners to walk towards the device to do anything basically reduces the social aspects of the class to zero.
We actually do have a built out music control system that we took out for the purposes of this demo.
We also have voice controls that can be used to control the music playing (among other things).
> - spellcasting game
This prompted a considerable amount of discussion during our most recent sync.
If you're available / interested, please reach out to vince@wakasaba.com; we'd love to chat some more.
Now the real challenge would be to make it work from six feet under. ;)
I'm joking, but I wonder if I'm the only one who made that connection. Otherwise a different tag line might work better.
Apart from that I agree with the other posts regarding the business idea and choice of wardrobe in the demo.
I really like the idea of using your phone as a remote, I think this kind of I/O is generally underutilized. Personally I would explore that space some more, even though it's not as flashy as the gesture recognition.
Ten feet seemed too ambitious and five was too many letters =). We didn't think about the six feet under connection, but now we'll have to figure out what to do with that. Probably lean into it.
There was another comment that was less convinced about controlling with the phone. Is there something about that modality / use case that you are particularly moved by? When you imagine "controlling" by phone what spaces would you want to dig into more?
Something to consider: To stand 6' away and still be able to read the things people at my company present via screen share I would need a 10' screen (or binoculars).
This is a really great point. We do have some early thoughts around the idea of the interface being responsive to "distance from user" as opposed to "screen size". The chat interface, for instance, will change in size depending on how far away from the device the user is.
This doesn't change the particular problem you noted, but the general issue is an interesting space to dig into.
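The following is one hypothetical way to make an interface respond to "distance from user". It estimates relative distance from how large the user's face appears in the camera frame. Under a pinhole-camera approximation, apparent width is inversely proportional to distance, so the ratio of a reference width to the current width gives a rough scale factor. This is not a description of how WakaSaba's chat interface works. `uiScaleForFaceWidth`, the reference width, and the clamp bounds are assumptions for illustration. The face width would come from whatever face or hand detector is already running.

```typescript
// Hypothetical helper: choose a UI scale factor from the apparent face width.
// Pinhole approximation: apparentWidth ~ 1 / distance, so
// distance / referenceDistance ~ referenceWidthPx / faceWidthPx.
function uiScaleForFaceWidth(
  faceWidthPx: number,      // face width detected in the current frame
  referenceWidthPx: number, // face width measured at normal desk distance
  minScale = 1,
  maxScale = 4,
): number {
  // No face detected (or bad input, including NaN): fall back to normal size.
  if (!(faceWidthPx > 0) || !(referenceWidthPx > 0)) {
    return minScale;
  }
  const scale = referenceWidthPx / faceWidthPx; // about 2 when twice as far away
  return Math.min(maxScale, Math.max(minScale, scale));
}
```

For example, if the face measured 200 px wide at the desk and now measures 100 px, text would be drawn at about twice its normal size. In practice the scale would also need smoothing over time so the interface does not pulse as detection noise changes the measured width.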
Interesting technology. I think it is super cool, and the implementation looks good.
Some feedback:
1) I would really like to stress that you should be including and focusing on _why_ more than the what. The whole time reading the landing page, I was wondering _why_ this exists and what problem it solves. It is better to speak to someone than to speak to no one.
2) Use a higher quality video. I think some of the other comments are focused on the fact that you are a guy in shorts in a video. It would tell a more interesting story to demo at least one situation (or several) where this would be useful. I would get a lot more out of this video if you staged a group fitness class or 1-on-1 training. This would show me _why_ this exists, with relevant context. If the video were higher quality and it was a fitness setting, I don't think anyone would care that you were in shorts.
3) Interface Controls section. This section is confusing because the user cannot see the interface, just the gestures. It would be better to give a list of features and benefits rather than going into _how_ to use a feature with specific gestures. That is too much complexity when people haven't bought into the concept yet. Which reminds me, you should focus on selling the concept more than telling me step-by-step how to use it.
4) More controls from phone. This section is a relief because controlling everything using gestures _sounds_ exhausting. Maybe it isn't. But this should be more emphasized than the specific gestures. Controlling everything from my phone seems more intuitive for most people. And having the gestures for an interruption free experience seems like a nice to have.
5) Have you had a yoga or fitness instructor try this yet? Have you done a live demo with them to validate the problem? Not saying you haven't, but if you have, those are probably the best screenshots or video clips to show. If you have done them, why aren't you showing us that? If you haven't done them, what are you waiting for?
I run a martial arts school and we have fitness classes. We also do private and group sessions. I can definitely see the use, but you gotta sell it. What is the dream you are selling for us? How is this better than in-person classes? Is it worse? Is it worse but makes us safe from COVID? Is it the next revolutionary way to bring fitness classes online in our new environment? Can influencers use this to monetize their audience?
> I run a martial arts school and we have fitness classes
What martial arts do you teach if you don't mind me asking? My cofounder and I are martial arts enthusiasts (we mostly train BJJ and Muay Thai)
I've spent a lot of time thinking about pain points for traditional martial arts schools and applications of our platform. Unfortunately haven't had too many good ideas though; our platform feels more straightforwardly applicable to reducing friction and improving engagement in online group fitness classes (yoga, cardio kickboxing, etc.)
Heard on all your other points! We're going to prioritize improving the landing page and video. Thanks!
> What martial arts do you teach if you don't mind me asking? My cofounder and I are martial arts enthusiasts (we mostly train BJJ and Muay Thai)
Same! I teach BJJ and I have another instructor that teaches 'Fitness Kickboxing' as well as traditional Muay Thai. I agree that how this platform could apply to our martial arts isn't immediately obvious. However, post-lockdown, some interesting things happened. Paulo and Joao Miyao started upping their online presence and using Twitch to stream drilling sessions and answer questions. I feel like if they had a platform, they would have been able to monetize their efforts more to teach online private lessons. Most notably, Jon Thomas (@jonthomasbjj on IG), has been teaching online seminars and private lessons that he has advertised through IG. I definitely think there is a need for everyday people to get taught by high-level competitors and athletes in our domain, BJJ. I do think your platform is probably better suited to other fitness related activities rather than BJJ. However, during lockdown, this would have been gold. And I still think that some platform that connects everyday people to high level athletes for 1-on-1 or small group lessons would be amazing.
> I've spent a lot of time thinking about pain points for traditional martial arts schools and applications of our platform. Unfortunately haven't had too many good ideas though; our platform feels more straightforwardly applicable to reducing friction and improving engagement in online group fitness classes (yoga, cardio kickboxing, etc.)
My intuition is that there are all kinds of fitness influencers with large followings on Instagram, TikTok, and platforms like that. They are all trying to find ways of monetizing their audience. A really good platform that allows them to run online group fitness classes would be great. Ultimately, their dream is to have a large audience, be respected, and make money while growing and leveraging their audience. How can this platform help that? Ultimately I think these are the types of things that the landing page should allude to or try to sell. This could be monetized in two ways: 1) the influencer pays a monthly fee to use the platform 2) the client pays to access the influencer's content and live classes (platform takes a cut).
As I type all this out, once you imagine a particular type of audience (let's say influencers), it becomes clear that they need more tools than just 'rooms' to do what they need to do. You might say that handling all of those payment things and providing a content platform is beyond the scope of what you are trying to do with your platform. But the more you put yourself in the shoes of the customer, the more specific that platform needs to become, not only to offer them value, but also to give them the motivation to pay for and use it. This is why I think the landing page needs to speak to someone, whoever you all decide that is, because then you will end up focusing on the benefits of what your platform can do for them rather than on how to use the platform from a technical perspective. Through this, you might also uncover the additional features the platform needs to have so that people can actually use it.
After playing with it for a bit, the two finger swipe and click feels pretty intuitive.
But the 'rock on' thumb swipe still feels a little uncomfortable. Especially when I get used to gesturing with my right hand, and then I have to carry my arm over my body to get to the Mute/UnMute.
Have you considered just pulling in some American Sign Language? The controls (mute, tile, etc.) could even have a depiction of the hand sign.
We've explored a bit of the ASL space. The motion based ones are not something we've dug into. But some of the "static-but-location" based ones are things we've tried.
"Mom" is easier than "grandmother". We've tested out a version of "mute". "Tile" would be quite a bit harder (I think).
> and then I have to carry my arm over my body to get to the Mute/UnMute
Interesting, is there any reason you didn't just use your left hand for the Mute/UnMute? Was it too uncomfortable with your non-dominant untrained hand? Or did we not do a good job of communicating that the gestures work on both sides?
Honestly blown away by all the amazing feedback / thoughts here. I was not expecting to wake up to this considering this post had 3 upvotes when I went to bed haha
Just wanted to let you know that my cofounder, Vince, and I are slowly making our way through all of the comments and hope to thank / answer most of you properly soon :)
@ModernMech, what do you teach if you don't mind me asking? I'd love to hear from you at cody@wakasaba.com
FWIW, our product roadmap is quite malleable right now and we could provide you with a room with higher time limits (+ some other cool features we've been testing) if that's at all enticing :)
I teach computer science, but I may be an anomaly because my primary mode of teaching is "chalk talking", where I write on the blackboard instead of using PowerPoint slides (you might find more chalk talkers in physics and maths). I feel that it provides better pacing for the students.
My primary use case for this right now would be the fact that I have to teach in person and online at the same time. If one of my students gets covid, they have to quarantine for 2 weeks while I continue lecturing in person. To support them, I have to record my lectures all by myself. Back in the day, it used to be that if you wanted to record your lectures, we had dedicated staff to set up a camera and manage that. Now, everyone has to record their lectures, and there just isn't the personnel to handle this, so we're on our own. Therefore, if I could control my lectures from the chalkboard as I'm talking, it would really be beneficial.
Personally our semester is starting now and I've settled on all of my teaching tools for now, but I'll be keeping track of your progress for sure!
This looks amazing! However, as other commenters say, right now there are not a lot of use cases for it. It will be hard to go to market because you'll have to create that market. I wish you good luck with that, because potentially this could be something huge!
For one I think it looks awesome and there might be a lot of potential.
That said, I think you should redo the video with long sleeves and pants. From the get-go, I can't be sure you won't be doing indecent things. I think it is too casual to be shared frictionless in a business setting.
Agreed. Beyond the casualness, it feels like a hacked-together experiment right now. Better lighting, background, and outfit would make a massive difference in the overall vibe, in my opinion. I like the idea, though.
Are we looking at the same video? He's wearing shorts and a t-shirt, something you might see in any warm climate anywhere in the world (that isn't a caliphate)
I could see gestural controls like this being a useful feature for Twitch streamers, too! It seems like a form of gesture-based control that's reliable, fast, and powerful enough that it isn't just a gimmick.
The Obsbot Tiny webcam also has some gesture control, but not quite as extensive as this. I love its tracking feature, which lets me move around without having to worry about staying in the frame.
The Mmhmm app has a feature like this implemented really well. This would be great to see integrated into video conferencing in general as the space gets more attention.
Thanks for sharing! Mmhmm wasn't on our radar, but we had considered applying our tech to "virtual business presentations at a distance". Interesting to see how Mmhmm is approaching the problem of making virtual presentations more engaging.