I recently showed some videos of Soli in the HCI class I teach. Students immediately hit upon the two major issues I wanted to discuss (I was pretty proud!).
The first is learnability. A big problem with gestures is that there is no clear affordance as to what kinds of gestures you can do, or any clear feedback. For feedback, one could couple Soli's input with a visual display, but at that point, it's not clear if there is a big advantage over a touchscreen, unless the display is really small.
The second is what's known as the Midas touch problem. How can the system differentiate whether you are intentionally gesturing as input or just incidentally gesturing? The example I used was the new Mercedes cars that have gesture recognition. While I was doing a test drive, the salesperson started waving his hands as part of his normal speech, and that accidentally raised the volume. Odds are very high Soli will have the same problem. One possibility is to activate Soli via a button, but that would defeat a lot of the purpose of gestures. Another is to use speech to activate, which might work out. Yet another possibility is that you have to do a special gesture "hotword", sort of like how Alexa is activated by saying its name.
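To make the gesture-"hotword" idea concrete, here's a minimal sketch of what such a gate could look like; the gesture names and the detector interface are hypothetical, not anything Soli actually exposes:

```python
# Hypothetical sketch of a gesture "hotword" gate: gestures are ignored until a
# deliberate wake gesture is seen, then a short command window opens.
# The detector interface and gesture names are made up for illustration.
WAKE_GESTURE = "double_snap"      # deliberate, unlikely-to-be-accidental gesture
COMMAND_WINDOW_S = 3.0            # how long to accept commands after waking

def handle_gestures(detector, dispatch):
    """detector.next_gesture() yields (timestamp, gesture_name); dispatch() runs commands."""
    armed_until = 0.0
    for ts, gesture in detector.next_gesture():
        if gesture == WAKE_GESTURE:
            armed_until = ts + COMMAND_WINDOW_S   # open the command window
        elif ts <= armed_until:
            dispatch(gesture)                     # intentional: inside the window
        # otherwise: incidental movement, ignore (avoids the Midas touch problem)
```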
At any rate, these problems are not insurmountable, but they definitely affect the learning curve, reliability, and overall utility of these gesture-based interfaces.
"A loud clatter of gunk music flooded through the Heart of Gold cabin as Zaphod searched the sub-etha radio wavebands for news of himself. The machine was rather difficult to operate. For years radios had been operated by means of pressing buttons and turning dials; then as the technology became more sophisticated the controls were made touch-sensitive - you merely had to brush the panels with your fingers; now all you had to do was wave your hand in the general direction of the components and hope. It saved a lot of muscular expenditure of course, but meant that you had to sit infuriatingly still if you wanted to keep listening to the same programme."
Perhaps the computer is smart enough to determine intent. To paraphrase Marvin, "Here I am with a brain the size of a planet and they ask me to determine whether you were gesturing at me on purpose."
Sirius Cybernetics clearly had some ideas along those lines, but the results were lacking:
"He had found a Nutri-Matic machine which had provided him with a plastic cup filled with a liquid that was almost, but not quite, entirely unlike tea. The way it functioned was very interesting. When the Drink button was pressed it made an instant but highly detailed examination of the subject's taste buds, a spectroscopic examination of the subject's metabolism and then sent tiny experimental signals down the neural pathways to the taste centers of the subject's brain to see what was likely to go down well. However, no one knew quite why it did this because it invariably delivered a cupful of liquid that was almost, but not quite, entirely unlike tea."
The second one seems more of a technical problem and can be solved if Soli can reliably recognize user attention, which can effectively be a "hotword" for gestures. This is hard, and I'm not sure it's even feasible with this tech, but given all the excitement in this thread about potential privacy issues, I guess it's doable :D
The first one seems more troublesome. This is less intuitive than a touch-screen-based interface. The only way I see to fight this is to standardize a set of generic gestures, map them onto existing equivalent touch/voice actions, and push them to the Android ecosystem. But I'm not sure how many third-party manufacturers will join this parade. Does this technology work well under a screen? The industry is now obsessed with getting rid of the notch, and if Soli blocks that path then it will be a pretty hopeless fight.
Snapping my fingers would be a nice trigger, like "ok Google" or "Alexa". Synchronising the sound with the gesture would cut down on the false positive rate, and it's something I'm unlikely to do unless I want to interact with my phone.
If it could penetrate my pants pocket, being able to snap my fingers next to my pocket, and then perform simple interactions without having to pick up my phone would be nice. Pick up, hang up, volume etc
I would say that snapping is definitely an incidental gesture for some people, and it's also highly inaccessible (while many gesture controls aren't perfectly accessible, audibly snapping is difficult for many more people than waving is).
Not to mention, half the utility of the gestures is the ability to interact with messy/wet hands. Snapping my fingers near my phone in that situation isn't attractive.
Maybe teaching a gesture to your phone is the most accessible option; it respects culture and disability the best.
It's a shame though, I did like the intentionality that the sound of snapping fingers afforded.
> A big problem with gestures is that there is no clear affordance as to what kinds of gestures you can do, or any clear feedback. For feedback, one could couple Soli's input with a visual display, but at that point, it's not clear if there is a big advantage over a touchscreen, unless the display is really small.
For the Google Pixel 4 that they are using in the video, you already have a big display. It can instruct you how to gesture so that you learn it, and later it can let you gesture without instructions.
> The second is what's known as the Midas touch problem. How can the system differentiate if you are intentionally gesturing as input vs incidentally gesturing?
Either an activation word like you said, or it could use the front-side camera to see whether or not you are looking at it.
Or, depending how smart it is, and its range, it might detect your head attitude and use that as a proxy for attention. The website claims that it can detect a turn toward, a lean, or a look.
> A big problem with gestures is that there is no clear affordance as to what kinds of gestures you can do, or any clear feedback
> ...learnability
Do you have any examples of well structured learnable systems? I have struggled to find much of anything in this space, yet every technology release I see wants for it.
Here are my two examples, I have no others off the top of my mind. I am more impressed with the vim example.
1. `vim-pandoc-syntax` has a set of documents demonstrating the feature set of markdown by example. These documents are the system they document. Here is one file in a directory of 10 such documents.
I have yet to hear a good response to this question.
I have a Pixel 3 and I want a manual for the device; it appears one does not exist. Nor does documentation. My headphones, which came in the box, don't resume the most recent media player when I tap the middle button. I called support, and over the course of an hour they found they have the same issue. Before my call, the people I spoke to said I was wrong and that this issue didn't exist; afterwards they had no advice for me other than to give up. My issue persists.
>>The first is learnability. A big problem with gestures is that there is no clear affordance as to what kinds of gestures you can do, or any clear feedback. For feedback, one could couple Soli's input with a visual display, but at that point, it's not clear if there is a big advantage over a touchscreen, unless the display is really small.
That's the same reason why I think voice controls are literally the worst way ever to interact with a computer (although I think this might actually top it).
> How can the system differentiate if you are intentionally gesturing as input vs incidentally gesturing?
This is why I had to change my Amazon Echo Dot's call word back from "Computer". Turns out one might say "computer" a lot during the course of the day, and Alexa was CONSTANTLY going off when it shouldn't have. It was so disappointing that I gave the echo dot away.
Off topic but does it seem like this link deliberately doesn't load any of the Youtube UI details, just leaving in grey hints? I thought the rest hadn't loaded but it's kind of a nice experience.
Once we figure out (non-invasive) BCI and EEG-type brain-activity signatures for when our brains process our perceived intent to take an action, the system side could activate that action before our brain even sends the electrical impulses to our motor system.
How hard would it be to teach ourselves to inhibit the electrical impulses to our motor system when BCI can identify intent?
When would this level of BCI be possible if you had to make an educated guess?
Thanks for sharing, as a fellow HCI/Cog Sci graduate!
That would be great, but can radar really tell what you are looking at? I suppose you could combine it with a camera, but that sounds less than ideal in terms of energy use.
Didn't students ask about health effects? If not, consider me a student and ask what effects it has on hand health with prolonged exposure at close proximity in your shirt or pants pocket.
Spot on. I feel this is an abuse of technology. They want to take touch to the next level with gestures, but it is doomed to fail unless they solve the other issues you pointed out (just my opinion). Gestures might be good for gaming (e.g. Kinect). I worked on hover touch at one of the big smartphone companies. We achieved good results at different heights, but eventually it didn't take off. After all, humans need a sense of touch to interact.
It is a nice piece of technology. It is a 60 GHz millimeter-wave radar. It is a privacy nightmare. It has already shipped.
Radar uses electromagnetic waves (like Wi-Fi but at a higher frequency), so it can go through walls, even though the typical range for gesture recognition is less than a meter.
It can probably go at least 10 times as far by boosting the gain of the amplifier; it is not constrained the way a theremin would be, because it is already working in the far-field region of the antenna.
Because it works at such a high frequency (but not so high that it can't still go through walls), it can have many very small antenna arrays and can sense sub-millimeter movements even from far away.
It also has beam-forming capabilities, meaning it can focus the direction in which to sense.
Because it is radar, things that move are of interest and are easily filtered from the static background.
Typically, this piece of technology already can, or will soon be able to: sense how many humans are around it, where they are, how fast they breathe, how fast their hearts are beating, and who they are by computing some heart-based ID.
It is low-power and always-on, with 360° coverage and focusable attention. It is cheap because it can be made on a chip.
(Edit: fixing typos)
60 GHz RF doesn't really pass through anything, which is the entire reason that the frequency is used for radar. A radar needs powerful reflections to detect things. Penetration impedes its operation, and would be akin to a camera that is out of focus.
I'm also unsure if the resolution of this chip is noteworthy.
Yup, the range claims are silly, too. Radar follows an inverse-fourth-power law with distance, so to go 10x as far by adding power, you need 10,000x as much power, which is quite a challenge at 60 GHz.
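For reference, this is just the standard free-space radar range equation; nothing Soli-specific is assumed here:

```latex
% Monostatic radar range equation (free space): received power falls off as R^4.
% P_t = transmit power, G = antenna gain, \lambda = wavelength, \sigma = target RCS.
P_r \;\propto\; \frac{P_t \, G^2 \, \lambda^2 \, \sigma}{(4\pi)^3 \, R^4}
% Holding P_r and everything except P_t fixed:
% R \to 10R \quad\Longrightarrow\quad P_t \to 10^4 \, P_t \quad (+40~\mathrm{dB})
```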
You are right that radar follows an inverse-fourth-power law with distance (inverse square for the wave to reach the reflecting object, then inverse square again for the reflected wave to come back to your antenna).
This is a deflection and ridiculous. We're talking about spotting e.g. gestures. It falls apart with distance because of both angular resolution and inverse-fourth power.
I've been involved in the design of mmwave radars. If it was easy to spot and precisely track small objects at 10m, we'd be doing it...
>If it was easy to spot and precisely track small objects at 10m,
That's not the claim I'm making.
I agree that this chip won't do the gesture recognition at 10m, but I'm quite convinced that it can pick up human movement signal if they try to do so.
>I've been involved in the design of mmwave radars
I don't have this level of expertise. But I'll be really surprised if we couldn't reach the same levels of amplification. Gaining 12 dB would double the range. We can extend the antenna array or use more expensive low-noise amplifiers. For cars there are 30 GHz Doppler radars, and their range can reach at least 50 m.
From the referenced paper (A Highly Integrated 60 GHz 6-Channel Transceiver With Antenna in Package for Smart Sensing and Short-Range Communications, https://sci-hub.tw/https://doi.org/10.1109/JSSC.2016.2585621):
"In this work a 60 GHz 4-channel receiver 2-channel transmitter packaged chip targeting high resolution sensing systems and capable of supporting large bandwidth communication channels is presented. The SiGe technology used offers a low 1/f noise which is essential to the functionality of the chip in frequency modulated continues wave systems (FMCW)and Doppler radar with a sensing range below 10 m."
"While we have not explored this in-depth we would like to highlight the similarities to re-cent work exploiting FMCW RF technology to coarsely ‘image’ users through a wall [1]"
Your second quote seems out of context and doesn't occur in the paper you cite.
Your first quote says "sensing range below 10m".
Yes, it is possible to make long range 60GHz systems-- largely through antenna gain and lenses. Yes, we could build an entirely different radar to track peoples' gross movement--- and could have 20 years ago, too-- but that has largely nothing to do with the original system or your original claim.
If I wanted to image people at low resolution through a wall, 60GHz is about the last thing I'd pick. Drywall alone has an attenuation of about 3 dB/cm, and remember we have to cross the wall twice. You run out of loss budget really quick. Suggesting one can go 10x as far (10000x power alone) and through walls is... creative.
If you want to track people through a wall, use UHF. It works pretty well and is pretty easy.
Sorry, I mismanaged my tabs; the second citation is just above the conclusion in the referenced paper (https://dl.acm.org/citation.cfm?id=2984565), "Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum".
> Suggesting one can go 10x as far (10000x power alone) and through walls is... creative.
I was going for conservative, typically thinking about an open-plan work environment.
The device doesn't need line of sight like a camera would; it can be in your coworker's pocket and still be listening to you.
They are well funded and building their own radio chips; I give them the credit they deserve and assume they can replicate a somewhat similar technology (6 GHz radar).
It is better to use a logarithmic scale. The 10,000x power (40 dB gain) doesn't mean you will need to emit 10,000x more power; in practice you will amplify the signal, by focusing the beam via antenna design and by amplifying on receive, which theoretically you can do relatively easily as long as you are above the cosmic microwave background noise. Then you can do some trickery to trade bandwidth for signal strength (which they do: FMCW). Then you can still integrate over time. A radar typically scans its whole surroundings, but if you instead choose to focus on one place you can gather signal for longer.
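As a back-of-the-envelope sketch of that budget (every number below is my own illustrative assumption, not a Soli specification):

```python
# Rough dB bookkeeping for the "go 10x farther" argument. Every number below is
# an illustrative assumption, not a Soli specification; whether such gains are
# actually achievable in this chip is exactly what's disputed downthread.
import math

extra_path_loss_db = 40.0    # 10x range with the R^-4 law => 40 dB to make up
antenna_gain_db = 15.0       # assumed: extra gain from focusing the beam in one direction
n_coherent_chirps = 1000     # assumed: dwell longer on that one direction
integration_gain_db = 10 * math.log10(n_coherent_chirps)  # ~30 dB, if the target stays coherent

margin_db = antenna_gain_db + integration_gain_db - extra_path_loss_db
print(f"link margin vs. the original range: {margin_db:+.0f} dB")  # +5 dB with these assumptions
```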
Sorry to dig this up so late; there is a common misconception when reasoning with power (which you might suffer from):
What matters is not power but what we measure. We are measuring an electric field in volts, whose square is proportional to the power. Going 10x as far means measuring a voltage 100x smaller. Even considering just dynamic range: if your analog-to-digital converter reads 8-bit values (values going up to 256), then with a 100x voltage reduction you only get a small blip going from 0 to about 2.
What matters is signal to noise ratio. There is not really background noise in UHF and up; instead there is thermal noise in the receive amplifier.
If you have SNR of 6dB over some integration interval which gave you acceptable results, and you have 40dB of additional path loss, you're now -34dB, and you need to make that up. There's no "taking half" because we're talking about only a 100x difference in receive voltage.
Put another way, we already "took half"-- our SNR of 6dB (quadruple the power) was already only double the voltage. Our metrics for SNR already take what you're talking about into account.
Dynamic range can be a factor, but you generally have some kind of automatic gain control that increases effective dynamic range (so that in absence of signal, you have noise much bigger than 1LSB showing up at the converter), and the conversion dynamic range "width" of the converter only matters when there are other in-band transmitters around you need to reject (because they limit how far you can turn up the initial gain before saturating the converter).
Note also, not that it relates to what we said at all-- you can detect signals much smaller than 1LSB because of dither.
Not an mmwave radar guy either, but AFAIK this little chip is quite capable of counting how many people are IN a room, whether they're standing or lying down, etc., but of course without the granular control of its factory NUI (natural user interface) proximity calibration.
And here I'm not even considering whether this solid-state radar chip cross-references its data with Wi-Fi radio signals or the dot projector, boosting its ML recognition capabilities enough to "blob" (and possibly identify) moving things as far as 50 m away.
I am not familiar with 60 GHz. I agree that it has more trouble going through obstacles. But usually when going to higher frequencies we can compensate by using more bandwidth. (Terahertz antennas use this to do crazy stuff.)
The main reason it was chosen was to make it small enough to integrate everything on a chip and have an antenna array.
Walls are typically static, which means they won't appear in the hardware-amplified Doppler-shift signal (below 100 Hz) that detects moving objects. (Radar works by amplifying the low-frequency beat that occurs when mixing two signals of high but close frequencies, e.g. 60 GHz and 60 GHz + 50 Hz, then low-pass filtering and amplifying.)
Typically the limits on power are dictated by how much attenuation you have (distance + obstacles) in dB, ×4 because of the inverse-fourth-power law, but as long as you are above the background noise you can amplify.
Also, here it is not trying to form a focused image; it is just gathering some signal to run a pattern-detection algorithm.
Even if there is a lot of noise, we can integrate with a software FFT over a long period of time, because we are looking for breathing movements (~0.3 Hz), which are slower still.
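A toy numpy sketch of that idea (a sub-millimeter chest movement phase-modulates the returned 60 GHz carrier, and a long FFT over the baseband phase picks out the ~0.3 Hz component). The displacement, noise level, and sample rate are all invented; this is not Soli's actual processing:

```python
# Toy sketch: recovering a ~0.3 Hz breathing movement from a noisy CW radar
# baseband signal by integrating (FFT) over a long window.
import numpy as np

lam = 3e8 / 60e9                 # ~5 mm wavelength at 60 GHz
fs, T = 100.0, 60.0              # baseband sample rate (Hz), observation time (s)
t = np.arange(0, T, 1 / fs)

chest = 0.3e-3 * np.sin(2 * np.pi * 0.3 * t)        # sub-millimeter chest motion at 0.3 Hz
phase = 4 * np.pi * chest / lam                      # two-way path: phase = 4*pi*x/lambda
iq = np.exp(1j * phase) + 0.5 * (np.random.randn(t.size) + 1j * np.random.randn(t.size))

spectrum = np.abs(np.fft.rfft(np.angle(iq) * np.hanning(t.size)))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
peak = freqs[1:][np.argmax(spectrum[1:])]            # skip the DC bin
print(f"strongest slow component at ~{peak:.2f} Hz") # ~0.30 Hz despite the noise
```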
> In the case of Pixel 4, the model runs on device, never sends sensor data to Google servers, and helps it interpret the motion to Quick Gestures.
From the "Technology" page linked at the top.
Personally, I don't see how it is meaningfully any worse for privacy than the always on microphone in the google assistant. And I look forward to what it will enable for VR and AR tech!
It's pretty trivial* to sniff traffic and find out what data is actually being sent to Google once people have devices in their hands, no?
It doesn't seem like the kind of thing a company would try to underhandedly sneak in after explicitly saying the sensor runs on-device and "never sends sensor data to Google servers".
* Trivial in the sense of someone who actually knows what they're doing -- of which we only really need one person in order to "leak" what's actually happening.
No, it's not. Between encryption and the cellphone's already-high background communication noise, it's hard to discern exactly what is being sent.
I don't think anyone thinks that they are, or will be, sending all the sensor data to the servers. But they don't need to; they only need to send back what was inferred from that data.
Specifically, consider what is most likely for an ad company: a few differentiated markers to help with targeting, which can easily be packed into a few bits, and which can trivially be stored and sent later with some other packet that has a valid reason to be sent. (App store update check, anyone?) Accounting for every bit that leaves the phone sounds like a nigh-impossible task.
I'm not saying this is definitely happening, but I don't think it's practical to rule out the possibility, with any amount of packet sniffing.
Because I'm pretty sure there is some random "user experience checkbox" checked by default that somehow means some of the data is sent to Google so they can "improve their products and services" and whatnot, but don't worry, because it is probably "anonymized" and will only be seen by "humans" or "AIs", depending on what feels less bad to the general public once discovered.
I've played with some $2 6 GHz microwave radar sensors and software-defined radios, though I have not yet used 60 GHz.
Electromagnetic waves are kind of magic when used in unconventional ways. I remember seeing the 2015 Disney EM-Sense video https://www.youtube.com/watch?v=fpKDNle6ia4 which shows a non-radar way to listen to the environment.
I mean, in theory you set up 3 WiFi base stations and you can use WiFi in roughly the same way. Nowhere near as accurate, of course, but you can easily figure out which areas of the house a person is in.
I don't see why this wouldn't be an even better way to do the same thing.
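To make the Wi-Fi comparison concrete, here's a toy sketch of the device-free version of that idea: a person moving near one AP-to-receiver link perturbs its signal, so a jump in short-term RSSI variance hints at which area is occupied. The link names, window size, and threshold are made up, and real indoor RSSI is far noisier than this suggests:

```python
# Toy sketch of device-free Wi-Fi presence sensing from RSSI variance per link.
import numpy as np

LINKS = ["living_room_ap", "kitchen_ap", "bedroom_ap"]   # assumed link names

def occupied_areas(rssi_history, window=50, var_threshold=4.0):
    """rssi_history: dict of link name -> list of recent RSSI samples in dBm."""
    areas = []
    for link in LINKS:
        recent = np.asarray(rssi_history[link][-window:], dtype=float)
        if recent.size >= window and recent.var() > var_threshold:
            areas.append(link)        # something is moving near this link
    return areas
```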
It seems like they need some humans to tweak their algos, by playing a game no less:
>* Headed South is an experience that introduces Pixel’s new touchless interface to users in a playful and engaging way. Through your journey, learn, practice and master the use of Motion Sense gestures to gather a flock of birds and fly South to escape the storm.
Not sure what to think about it. Seems awesome at first glance but then the examples are skipping songs and hand waving pokemons. Feels a lot like a solution looking for a problem.
This is the one and only suggestion so far that makes sense. It would be nice when cooking. If I was sitting in a bus/train/plane/concert/<public space>, there is no chance in hell I am waving my hands at a phone. I would even feel silly doing it at home.
Controlling audio also seems pretty big with the assumption that gestures work "through various materials for seamless interaction" as the website says. The dial, slide, and swipe gestures look perfect for adjusting volume and skipping songs without taking your phone out of your pocket. Technically a smart watch could also do this (though I'm not sure if it'd be equally smooth), but I also can't remember the last time I saw someone wearing a smart watch (YMMV, I'm in the midwest).
It seems like it'd work well in the general case of "I want to control a device that I can't touch". There's lots of reasons you wouldn't want to touch a device (dirty/non-free hands, device is out of reach, device doesn't have a screen, etc) and a lot of devices you'd want to control from afar (televisions, radios, speakers, AC units, alarm clocks, or arguably phones, I guess). It'll be interesting to see what uses emerge from that intersection once other kinds of devices get Soli.
I wonder if it'd be useful for more crazy ideas that fit into "giving a device more information about physical actions nearby" (like sleep tracking), but I don't know how realistic those are.
Controlling audio doesn't seem like a perfect use case at all.
When you control volume you usually want to make it quiet and stay quiet - e.g. someone is sleeping but you want to continue working (moving) and you don't want to accidentally turn it back up.
The other time when you control volume is when you want to make it louder and then dance (moving) and you don't want the volume to jump up and down.
Only other case that I can think of is when you want to turn the volume up or down just a tiny bit, which actually seems like the worst use case for this technology.
I just spent most of this year rebuilding the engine in my SUV and this would've been an incredible feature. I either greased up my phone when I needed to look something up, or had to take off my latex gloves, which are a pain to get back on when your hands were all sweaty.
Don't forget wet hands (after a shower maybe) and gloves too. But yeah I agree right now the much better use is detecting when you're near, but for saving battery and for speeding up Face ID.
That being said, near the bottom the website showcases finer controls, such as scrolling and turning a dial. I could see these being added in the future for finely controlling some UI elements without your finger blocking the view.
I believe it's a documented fact that most breakthrough technology feels like dabbling in triviality.
I read somewhere (I'll look for the source when I get time) that when Thomas Edison invented the phonograph, he believed there would be no other use for it than listening to sermons at home.
I believe this __could__ over time revolutionize HCI and perhaps gaming.
Exactly! The same thing could be said about touch screens: we can do all these things with a keyboard, so why would we need a touch screen? As a matter of fact, that was the argument Ballmer made during the launch of the iPhone. I am not saying Soli will be the same, but it's hard to tell how things will turn out. Or it might take a few iterations (maybe from other companies) to get there.
Better speakerphone handling. If the device knows how far the speaker is, or speakers are, it might be able to filter noise better.
Or using the chip to try to provide a consistent music playing volume when you do something in a room that involves moving closer and farther from the speakers.
That said, I agree, I can't come up with anything that's truly useful and not just some over-engineered comfort thing.
Agreed. If I'm so close to a phone, why would I swipe in the air? I also believe that making those Soli actions requires more muscle energy compared to tapping the screen with one finger.
> If I'm so close to a phone, why would I swipe in the air?
Because you don't want to extract it from your backpack/purse first? Or you're lying in the dark avoiding turning the backlight on, but you also badly want to skip that terrible song you hate...
edit: or you're presenting and using your phone on the podium as a remote without breaking eye contact/physically fiddling with the phone/laptop.
"Requires more muscle energy" seems like such a weird metric to base design decisions around but if we start the phone in similar states for both, ie laying on a table, I have a hard time believing they're going to be that different.
Raise your hand and try to make 10-20 swipes. You will start feeling pressure in your wrist. In contrast, you can swipe for hours holding the phone in your hand.
Yeah, I'm noticing that more and more. I'd even say we're going backward in many ways: 3.5 mm jack removal, everything glued/soldered, form over function, complete lack of repairability (while advocating for mobilisation against climate change and recycling at the same time).
I think I agree with the rest of your list, except for this one. While the transition to USB-C is painful, I think the world of consumer devices will be better off for it in the end.
It could be handy for a digital assistant to know when you're near the phone. You could use a particular motion to wake it for audio input rather than a button or having the phone always listening for audio. You could possibly use it for presenting slides, or have it measure a piece of wood or string or fabric or a package for you.
I'd love to see PC devkits for this device. Something as simple as a USB 3 device you plug in and place under your monitor. Perhaps it will make its first "PC" appearance in a Google Chromebook?
I can see a lot of eccentric users figuring out interesting ways to integrate many of these gestures into their workflows. Perhaps for navigating in 3D space or switching between workspaces?
The Kinect caused a huge wave of innovation. It was a convenient and low-cost source of RGBD data, and many robotics labs got a few when it was released.
Personally I feel Face ID is a step backwards from fingerprint readers. It doesn't work as often for me (facial hair, hat, lighting, hoodie, and sometimes it's just finicky) and there are privacy concerns. The fingerprint reader isn't perfect, but for me it was better. I can touch it while pulling the phone out of my pocket, and it's unlocked when I open it.
I'm frustrated with Google's choice to copy Apple and remove this feature from the Pixel 4. It's literally the only reason I'm not buying one.
I guess it depends on whether or not you consider a 3D scan of your face personal data. As face-tracking becomes more prevalent, I'd say this is the worst thing you could voluntarily give away. Your face can be scanned just walking through a crowd in public, a fingerprint is only usable when you physically touch something (and the digital version is always prompted so you can't be identified in a crowd unexpectedly like you can with a face).
I would agree facial recognition in a general sense has privacy implications — but in the case of Face ID I think the implementation is sufficiently secure.
I agree Apple is one of the few companies that takes security seriously. That said, it only takes one hack/leak to compromise your biometric data forever. You can't change it as easily as a password. And you're voluntarily giving it up...for what? A worse unlocking experience? If it weren't for the novelty factor I am confident most people would not use it - probably why the hardware is removing it as an option. Also it's no secret those companies want the extra data for training better models and this is a "free" way to get really high quality data.
As far as things to bemoan in tech this is pretty low on the totem pole. I'm just annoyed because I wanted to buy a new phone and can't find one that has what I want at any price.
EDIT: I saw this article seconds after finishing this comment, which seems ironic.
I think you should read up a bit more on the technical implementation of Face ID. The data never leaves the phone and, even if an attacker had physical access to the phone, they could not get the information. Apple is getting no training data from it.
> If it weren't for the novelty factor I am confident most people would not use it
I think this is really incorrect. You really think everyone is using Touch ID and Face ID only because of the novelty factor, and not because it's significantly more convenient (and, at least in many cases, more secure)? That if it wasn't "fun", everyone would be completely okay going back to entering 6-digit passcodes?
> You really think everyone is using Touch ID and Face ID only because of the novelty factor, and not because it's significantly more convenient (and, at least in many cases, more secure)? That if it wasn't "fun", everyone would be completely okay going back to entering 6-digit passcodes?
I meant compared to using a fingerprint reader, but even then, yes. I returned an iPhone because I couldn't unlock it in the dark often enough that I just got sick of it. My partner complains about the same thing, and vows to buy a cheaper phone next time because of it.
> I think you should read up a bit more on the technical implementation of Face ID. The data never leaves the phone and, even if an attacker had physical access to the phone, they could not get the information. Apple is getting no training data from it.
Thank you for this, it's useful. It still doesn't convince me that it's impenetrable, though, just that it would be incredibly difficult to obtain.
Face ID works way better for me. I couldn't get the fingerprint reader to recognize my stupid sweaty fingers sometimes, but Face ID always works for me.
You can unlock your phone in the dark. It doesn't rely on visible light, as it shines its own infrared light on you. You were probably holding your phone too close to your face.
Leap killed itself by shutting itself in with proprietary drivers and staying away from any sort of modding.
I ordered one the week it came out, went through quite a bit to get it through customs (for whatever reason) and paid almost twice the retail price because of it, just to be disappointed after the initial hype. Years later I thought I could use it for some stuff with the Pi, just to be disappointed again, as nothing had changed.
Since then I have been looking for replacements and I am really hoping for Soli not to take the same route.
As is often the case, this is some very interesting technology, but for now, we’ll only see it used in some novelty applications.
An increased level of spatial awareness for phones will be huge in the coming decade. However, it will almost certainly be a result of sensor fusion between a Soli-like radar sensor, a FaceID-like ToF sensor, enhanced positioning and pose detection, RGB cameras, microphones, and a lot of ML to assemble a comprehensive picture of environmental context and user intent.
Radar is one more piece of the puzzle in building products that can read the same cues we naturally use to communicate with other humans: Imagine, instead of telling a voice assistant “Alexa, turn down the volume,” where you have to use a phonetic trigger, and all the system has to go on is audio, something more natural: You look in the direction of the hardware, say “turn it down a bit,” and make a pinching gesture with your hand. The system can assemble all these pieces (you were looking at it, you spoke in its direction, you gestured) and, with a sufficiently-trained neural network, make a more conclusive determination of your intent.
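A hand-wavy sketch of that fusion step (the cue names, weights, and threshold are invented; a real system would learn them from data rather than hand-setting them):

```python
# Hand-wavy sketch of fusing several weak cues into one intent decision.
def user_intends_command(cues):
    """cues: dict of independent confidences in [0, 1] from separate detectors."""
    weights = {"looking_at_device": 0.4, "speech_directed_here": 0.3, "pinch_gesture": 0.3}
    score = sum(weights[k] * cues.get(k, 0.0) for k in weights)
    return score > 0.6   # no single cue is enough, but together they are

print(user_intends_command({"looking_at_device": 0.9,
                            "speech_directed_here": 0.7,
                            "pinch_gesture": 0.8}))   # True for this combination
```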
After a few days, the user develops a muscle memory of sorts. User doesn't even have to look at the controller and all actions (and feedback) are executed through the tactile interface. From cockpits to nuclear power plants to home tv remote control, there is absolutely nothing that replaces physical buttons, encoders, sliders and toggles.
I haven't formally studied UI/UX, but these are important:
- Feedback for an action
- Predictable steps to take an action (Muscle memory)
- Fast response
- Expose current state (sliders, toggles do this)
There should be 0% ambiguity or the user gets frustrated. Any piece of technology that puts impedance in this process is no fucking good. The user shouldn't have to "guess and wait" to know whether the device recognized their swipe gesture. A physical button guarantees the action was performed by means of feedback. Nope, sound feedback or taptic stuff still isn't as good as the click of a button. It can be, but no one engineers it well. For example, the MacBook trackpad that "clicks" without moving is excellent.

Seeing touch screens (one exception: the phone), capacitive buttons, gesture controls, etc. everywhere makes me sad, because it has nothing to do with UX and everything to do with the bottom line (cost) and marketing, and in this case perhaps better ad tracking? I will put this in plain words - don't trust a company that sells ads at the same time as building hardware. Either sell ads or sell hardware, not both. Google already serves software which relies on trading off privacy (even if it is anonymized). When it comes to hardware, I freak out, and there is no way in hell this thing sits in my home.
>there is absolutely nothing that replaces physical buttons, encoders, sliders and toggles
Even space, versatility, & cost aside, there are definitely tasks that a touch screen does better. Using a map with a touch screen is incredibly intuitive compared to a mouse. (I cannot imagine using sliders or other "physical" interfaces)
>A physical button guarantees that the action was performed by the means of feedback
So you've never experienced pressing on a TV remote and nothing happening? On touch screens I can see if the app responded to my interaction. On many button interfaces I cannot.
>taptic stuff still isn't as good as the click of a button
Not sure why you're so confident with this when most people I tell are surprised that their new Macbook touchpad is entirely haptic and not actually moving.
>Don't trust a company that sells ads at the same time as building hardware
So by this definition Apple is worse than Google for privacy?
Swiping and zooming a map on a phone or tablet is quicker and far more intuitive than clicking and dragging with a mouse. You can't scroll to a precise depth without moving your mouse to a specific UI slider. Whereas with touch you control it with the spread of your fingers.
One big piece of functionality missing from mouse-controlled maps is rotation. I can't imagine doing it with a mouse, but "grabbing" a map with two fingers and intuitively rotating it to the angle you want is huge.
Similarly, two-finger swipe up/down to adjust viewing angle is something that could technically be done with a slider you drag with a mouse, but I definitely wouldn't call that better.
I edited and added the MacBook trackpad example before your response. I agree that when it's done right, it works, but those are rare examples. The vast majority of products do not go to the lengths that Apple does.
> Even space, versatility, & cost aside, there are definitely tasks that a touch screen does better. Using a map with a touch screen is incredibly intuitive compared to a mouse. (I cannot imagine using sliders or other "physical" interfaces)
Have you used a space mouse? I think it is vastly superior to moving hands, pinching, and getting tired after about 3 minutes. CAD work requires moving things around for hours on end. Pinching, zooming, and moving things around with your hands is completely insane to a CAD engineer - take a look at this: https://www.3dconnexion.com/products/spacemouse.html
I've been using a space mouse since 2005 for mechanical CAD. I wish I could explain to others what it is like to use, but you can't. It is as if it's an extension of my senses, and it feels so incredibly natural.
The space mouse is like a joystick or a thinkpad “trackpoint”. It doesn’t move around or measure movements of the body, but just uses force on it to control velocity. This is much less intuitive/effective than a direct movement. A mouse beats a trackpoint hands down for speed and precision in any context where mouse-user performance is important per se (the trackpoint’s advantage is that it is already directly under the fingers while typing, so does not require moving the hands to a separate device and reduces switching time between mousing and typing).
For just inputting rotation, the best ever tried is a trackball which senses a full 3 dimensions of rotation (most trackballs only sense 2 dimensions, with rotations in the third dimension ignored). These have only ever been made as research prototypes, but they were dramatically faster and more precise for people to use than a space mouse.
3D movement is hard to build physical devices to measure; for that maybe the space mouse is the best we can do.
I totally disagree based on experience with using a CAD software for many years. A space mouse isn’t a trackpoint. It is a 6 degrees of freedom device that feels insanely natural once you get used to it.
There is a reason why professionals working on 3D software use a Space Mouse - animation studios to Boeing. Everyone got one when you start a new job.
You should try one if an opportunity comes up. Honestly, it will change your mind forever about what it does. And it’s nothing like a trackpoint, trackball or a joystick. I wish I could show it in person to you and blow all previous expectations.
Edit: a couple of things:
* Trackpoint: 2 axis device
* Trackball: 2 axis device
* Joystick: Usually 2 rotational axes, but you can get 3 axes (twist about Z-axis)
* Space Mouse: 6 axis device
You don't need to move things physically... try flipping through pages on iPad for 1 hour straight vs. using PageUp and PageDown keys on a keyboard. See it for yourself which one is more tiring.
Nothing you wrote is in disagreement with anything I wrote, except for the inaccurate assumptions about my past experience and your misunderstanding the main thrust of my comment.
I have used several different space mice, and also taken a couple of them completely apart. I understand reasonably well how they work. They have a fixed internal part which is attached to the (slightly movable) outside shell by several springs, with little position sensors used to measure a proxy for the force and torque exerted on their outside shell. I believe the early versions used strain gauges instead of position sensors, but the result was more or less the same.
> A space mouse isn’t a trackpoint.
You completely missed my point. The part that is similar between a space space mouse and a trackpoint or joystick is that all of these essentially measure force on them, with that input converted by software to velocity (or angular velocity). If you let go they move back to a neutral position instead of staying where you put them. They do not measure the direct displacement or rotation of some physical component or body part. This is different than the input of a mouse or a touchscreen or a knob or a slider or a trackball or a scroll wheel or a dial or a stylus, but similar to the input of an analog trigger or pedal or steering wheel.
> Trackball: 2 axis device
A trackball with the appropriate sensor(s) can be a 3-axis input device (i.e. if the sensor fully characterizes the rotation). There have been several such devices made as research prototypes in various universities and corporate labs, and empirically they are much more efficient to use than space mouse for inputting rotations. However, they have never (as far as I know) been commercially produced.
This is a shame, because I would love to have one, and building hardware for myself is a lot more effort than buying something off the shelf.
> try flipping through pages on iPad for 1 hour straight vs. using PageUp and PageDown keys on a keyboard. See it for yourself which one is more tiring.
This is a non sequitur. But to answer you, I am perfectly happy to flip physical pages on a book for 5 hours straight, without my fingers ever getting tired.
> You completely missed my point. The part that is similar between a space space mouse and a trackpoint or joystick is that all of these essentially measure force on them, with that input converted by software to velocity (or angular velocity). If you let go they move back to a neutral position instead of staying where you put them. They do not measure the direct displacement or rotation of some physical component or body part. This is different than the input of a mouse or a touchscreen or a knob or a slider or a trackball or a scroll wheel or a dial or a stylus, but similar to the input of an analog trigger or pedal or steering wheel.
Yep, I did. I am sorry. I personally cannot stand using a trackpoint for precisely the same reasons as you're describing. Conversely, I tremendously enjoy using a space mouse and you would have to snatch it from my dead cold hands! It is so good, I don't know how to explain.
> This is a non sequitur. But to answer you, I am perfectly happy to flip physical pages on a book for 5 hours straight, without my fingers ever getting tired.
As compared to PageUp/PageDown? Which one do you think would make you less tired?
The physical book is a lot easier to carry to different kinds of body positions on varying furniture, walk around with, etc.
I suppose a tap on the iPad screen is probably technically the least effort, but I don’t find flipping book pages to be a hardship, and being able to write in the book margins, organize books on the shelf, etc. is nice. YMMV.
The most stressful aspect of using a typical workstation computer to read a long document is probably sitting in a typical office chair. You could compensate for that with a standing desk, but the book is still nicer IMO.
> It is so good, I don't know how to explain.
This is mostly because a 2D mouse + keyboard is quite a poor tool for navigating in 3 dimensions. A multitouch screen could conceivably be decent with a well-designed interface, but the common ones aren’t great.
> Not sure why you're so confident with this when most people I tell are surprised that their new Macbook touchpad is entirely haptic and not actually moving.
I'm not really sure I understand this, I remember a fair amount of discussion when it came out. But, just trivially, I have the latest MacBook Pro, and if I look at it from the side, it is indeed clearly moving as I press down, and not as some sort of weird millisecond-delayed sort of thing, if I press hard (for force touch), the trackpad clearly angles downward. If I continue holding down, the trackpad continues to be tipped down.
From an experience perspective, whatever the MacBook Pro trackpad does is quite different than the haptic thing my phone does, which does not feel like a press at all (and does indeed feel different in the new iPhone 11 Pro's from the previous XS's that had actually "pressable" screens).
Yeah, but what you are feeling with the "click" is not directly physical. Turn your computer completely off (not just power off, but like, "hold the power button" hard shutdown, and that "click" will stop happening.
Sure, I’m just saying that it’s really weird that this is referred as no moving parts when things clearly move. I guess my point is that haptic without movement still feels “uncanny valley” (like on the phone), but with movement (or whatever else they do different on the MacBook) feels better.
I suppose Tinder-style rapid decision apps might excel with a touch screen. Infinite scrollable timelines are certainly built for touchscreens, though I can’t think the last time I asked for one....
99% of the time the reason I put up with touch screens is so I don’t have to sit at my desk.
This isn't true for smartphones anymore. Touchscreen keyboards provide much more data that can be used for autocorrect. Also, you can't swipe type on a keyboard.
> Touchscreen keyboards provide much more data that can be used for autocorrect.
My spelling mistakes are either from muscle memory hitting an incorrect sequence of characters, transposing letters, or from me not knowing how to spell a word. How is a touchscreen going to improve any of those problems?
I find myself, when I have the choice, always gravitating to Google maps on my laptop. Enough that it frustrates me how far behind the web version is in terms of features compared to mobile.
My mouse is far more accurate than my finger, and I can zoom more accurately and more quickly with my scroll wheel; it is millimeters of finger travel as opposed to a much larger and slower multi-finger operation.
I have an apple trackpad device (i.e. separate from the laptop) and I believe it uses a fake haptic feedback to indicate a "click" as opposed to a real mechanism. It's indistinguishable from a real click. HOWEVER, it makes my skin crawl when the driver doesn't register my click and there is zero feedback when I press down on it. It's like pressing down on the surface of my desk and it feels absolutely gross. This happens more frequently than I'd like.
Tangential: this reminds me how buttons in various places (I've noticed this in elevators and POS terminals) beep when you press them. However, the beep is not really tied to the CPU registering the event, but rather to the button itself being pressed. I.e., sometimes when you press the button, it clearly beeps, but nothing happens. Not sure if it's a result of underspecification or cheaping out on the electronics, or both.
With the "Magic trackpad 2" if you have it turned off, there is no haptic feedback. At the time I didn't know it was haptic feedback and I was so confused as to how the whole click was broken.
The trackpad 1 does indeed have a real click - the way it works is that the little feet that sit on the desk are tactile pushbuttons; clicking moves the entire trackpad while the feet click in. It's disappointing this was removed in the trackpad 2; it's a very satisfying mechanism.
How is Soli any different? I know that if I put my hand in this spot and make the motion of turning a dial then it will turn down the volume, and I can hear the volume go down as I do it, whats the difference?
Or if I tap the air in a specific way, get an audible tap noise, and know I've completed a preset action (index finger tap = set timer for 5 minutes) its the exact same thing.
How is sound feedback not as good? 99% of phone interactions don't use buttons - are you saying that UX is bad?
Also - every major company sells ads and other products. Every. single. one. Google? Apple? Microsoft? Valve? Netflix? Amazon? Bueller?
The user pays the initial cost of engagement - at least 2 seconds to unlock the phone. After that, all actions the user performs are in the "engaged state".
Imagine if you have to press a button every 3 minutes, but each time you have to pay the 2-second cost of unlocking the phone, searching for the button (which can be memorized), and then getting visual/sound feedback.
Compare this with a remote control sitting on the table. Pressing a button on it every 3 minutes has lower impedance.
The examples you're citing have a fixed 2 second "initialization" cost and then phones are fine as long as you're engaged in a session. All feedback is visual (and sometimes audible). Soli is far inferior to a phone in UX because it literally has no feedback besides "hoping" your gesture would get recognized. Moving hands in air has substantially higher ambiguity than touching your phone in "engaged session" mode.
On my current phone I don't have to unlock it to use media controls. I'm not sure why you'd expect anything different with Soli. Also, from what I'm aware of, the phone unlocks extremely quickly - reporters are saying (partially because of Soli) that they barely see the lock screen when picking up the phone.
Good thought experiment, though I do think Soli still makes sense. There is no reason Soli doesn't work exactly the same as a remote - there is no need to unlock the phone if the chip is always on, and can even detect when a user is engaged prior to an action.
In my mind the only difference between a button on a remote and a "button" that sits in the air above my coffee table is reliability, which will be solved in the future. I know pressing button x does y, it makes equal sense that moving my hand like x does y. Feedback could be anything from a puff of air, tiny light flash, small 'ding' sound, etc.
Sound is OK for a feedback of success. What we also need is visible affordances -- signals of what can be done, when, where, and how. Also, what I hate the most with these gestures is fearing that I'll get it close but not quite, and something else (or nothing) will happen. With a button, you can feel it and immediately make micro-adjustments while pressing to ensure success. This can possibly be done with sound, but I haven't seen it done well, yet...
Agreed - but those are software problems that can be solved with training or good UX. The iPhone has no indication of what can be done (how do you know you can swipe your home screen up, down, left, and right?). Discoverability is a challenge that has been overcome previously and can be overcome again.
Part of that is the two games on the announcement page which show you examples of gestures. Doing gesture x accomplished y in the game, so maybe in the next app you open doing gesture x will accomplish something.
Your comment reminds me of the book “The Design of Everyday Things”. He talks about affordances and signals.
I agree with you, sound can be ok as a signal that something has happened. It is fine with some users, but annoys others (beeps when a camera auto focuses, always a split opinion. Some people like it and some don’t).
Yeah. The same buttons do different things, but what they'll do depends on which mode you've got selected. If you remember the remotes that blink an LED under the selected mode (TV/aux/DVD/whatever) as you're giving input with other buttons, that's a helpful reference.
I know it’s one data point but I never developed muscle memory for remotes the same way as, let’s say, Emacs. I always stumble and press the wrong keys. More importantly the interface on the TV is convoluted. I don’t want next channel - I usually want to go to a specific channel. With, say, Apple TV it means scrolling through a bunch of apps - much faster. Or YouTube tv via chrome cast - scrolling through the guide on the touch screen.
Same here. Except for one or two functions, I would always end up squinting at a remote in the dark to try and read the labels.
I like remotes like the Apple TV remote and the Roku remote (except for the fact that the Roku remote is a piece of utter garbage and I want to have strong words with the designer) because they focus on the essentials and let you do everything else through menus.
(The Roku remote is an utter piece of trash for a couple of reasons. First, it chews through AA batteries like it's going out of style. Second, it is not responsive - there is an agonizing delay between when you press a button for the first time and when the action takes effect. My guess is that these defects are because it uses WiFi to communicate with the device, and I can't understand why anybody thought that was a reasonable technology to choose.)
Right on the mark here. Feedback for devices like this is going to have to involve other senses, not touch, the most obvious being some kind of visual + audio feedback.
I built a project at school using a Kinect in 2011. Similar UI/UX model where you can essentially 'sense' the skeleton - we routed that data through a Node app that allowed you to swipe in the air to move through a photo album.
One of the hardest parts of this is getting what people call the 'clutch' right. Basically, how is the interface supposed to know that my arm movement is meant to target the device, and not say a friendly wave to my neighbor? In voice interfaces this is the equivalent of 'hey Siri' or 'Ok google'. With skeleton sensing interfaces we could still do an audio clutch if needed or you'll need to use another body part to engage the action on the device. Fascinating problem and I'm curious as to what clever solutions will surface.
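For example, a crude pose-based clutch could look like this (the joint names, coordinate convention, and engagement rule are placeholders for illustration, not any particular SDK's API):

```python
# Crude sketch of a pose-based "clutch" for a skeleton-tracking interface:
# gestures are only interpreted while the user holds their hand above shoulder
# height, so a wave at a neighbor is ignored.
ENGAGE_MARGIN_M = 0.10   # hand must be this far above the shoulder to engage

def is_engaged(skeleton):
    """skeleton: dict of joint name -> (x, y, z), with y pointing up (assumed)."""
    return skeleton["right_hand"][1] > skeleton["right_shoulder"][1] + ENGAGE_MARGIN_M

def on_frame(skeleton, gesture_recognizer, do_action):
    if is_engaged(skeleton):
        gesture = gesture_recognizer(skeleton)   # e.g. "swipe_left", "swipe_right", or None
        if gesture:
            do_action(gesture)
    # not engaged: movement is treated as incidental and ignored
```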
While I personally agree with your preference, I think it's dangerous to assume that a new generation of users will have the same preferences we do despite being raised with so many new user interface paradigms.
Users tend to gravitate towards what feels "native" to them.
While I agree with most of what you said, I have seen a lot of happy Face ID users (who prefer it over Touch ID). I think your 3 important points set a good framework for anything buttonless.
I added one more - exposing the state of the system to the user. Examples include:
- Vintage radios with sliding marker that shows what the current station freq is
- 3-state sliders that show whether something is in state A/B/C just by looking at them. When performing an action on such a slider, the user immediately knows, by the very act of performing it, what the state of the system is!
- Volume knobs, which show the current state.
- Big bright ON light when a Keithley power supply is supplying power
- Lots of Braun products, too many to list.
Maybe these are obvious, but it reminds me that things we take for granted are really important in UX/UI.
Good points. Reminds me of an HN discussion about Star Trek: Voyager, where they build tactile, non-touchscreen controls for a shuttle, and there are backup tactile controls for when a crewman loses vision, for example, so they can still fly the ship, etc.
What if one used audio feedback while using Soli, e.g. swiping made a literal swiping noise, and turning your hand clicked for every notch you turned on a volume slider?
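Something as simple as mapping each recognized gesture event to an immediate sound cue might do it (the event and sound names are invented, and play_sound stands in for whatever audio API is available):

```python
# Tiny sketch of pairing each recognized gesture event with an immediate sound
# cue, so the user hears confirmation without looking.
SOUND_FOR_EVENT = {
    "swipe": "whoosh.wav",
    "dial_notch": "click.wav",      # one click per detected notch of rotation
    "gesture_rejected": "thud.wav", # tells the user the motion was seen but not matched
}

def on_gesture_event(event, play_sound):
    sound = SOUND_FOR_EVENT.get(event)
    if sound:
        play_sound(sound)
```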
> From cockpits to nuclear power plants to home tv remote control, there is absolutely nothing that replaces physical buttons, encoders, sliders and toggles.
My anecdote/complaint related to this. My previous car had knobs, sliders, and just a few well-positioned buttons for the radio and HVAC controls. A very classic arrangement that I got along with just fine.
I mean, I get that they were going for a Star Trek: TNG vibe with my current car's all-button console, but they totally let form ravage function in this case. _Everything_ is a button, none of the buttons line up, and none have tactile feedback to tell you what you're about to press. There are buttons that pretty much never get used, there are _fake_ blank buttons, and there are things which never should have been buttons in the first place.
The HVAC controls are particularly bad:
- Except for some little bumps, there is no tactile feedback anywhere around them, so there is no way to use any of the buttons without looking at them first. Which means taking your eyes off the road to change anything.
- If you want to make drastic changes to the air temperature, you have to push the button several dozen times _all while watching the display_. On traditional HVAC controls, this job is just quick twist of a knob or push of a lever, neither of which is a great risk to your life.
- If you want to change which vents the air is coming out of, you have to cycle through several different options with a single button. Once you've located it, that is. And I _always_ shoot past the one I actually want, which leads to more button pushing, swearing and taking my eyes off the road for _even longer_.
- Depending on the configuration, outside temperature, and phase of the moon, the car will automatically decide whether to use interior air or exterior air for the intake. I _always_ have to _constantly_ check visually to make sure the HVAC is configured the way I want it because I can't trust that the car didn't change it for me.
- Also the AC will be on whenever the defrost vents are, even though the AC light never comes on.
- There is a button on the HVAC panel whose _only purpose_ is to switch the air intake to interior and crank up the fan to 100% for 30 seconds or so. I guess this is the "fart remover" function. I have never seen this on any other car; maybe it's a Japanese thing.
- There is a button to turn the ventilation fan off, but not on. You turn on the fan by increasing the fan speed. So if you have the fan running on, say, fan speed 3, and want to turn it off momentarily and then go back to what it was, well, you just can't.
The _only_ knobs on the whole car are at the top of the radio. They are impossible to reach without leaning forward, they are small, and they are stiff. If not for these flaws, the volume knob could be useful, but the other knob is the 'tuning' knob, which has essentially no purpose in 2019. The few among us who actually still listen to FM radio find their stations via the Seek buttons and program their stations to memory if desired.
And finally, the button to engage the hazard lights is at the _bottom_ of the console within easy reach of passengers, kids, your resting hand, etc. Right where something more useful (and less obnoxious) could be.
Ars' report [0] seemed to indicate that the gestures work great with the full-size Soli chip, but not the miniaturized one they had to cram into the phone.
I just watched MKBHD's Pixel 4 video and he said the wave gestures worked maybe 10% of the time. If that is true, I hope it is a bug, because something like that should never ship.
>The radar uses a 60GHz frequency band to attain the advertised accuracy, and that’s exactly where the problem lies. India has reserved this mmWave band only for military and government use for now and it needs to un-license this frequency before allowing civilian use for applications like Soli.
>The report adds that Google did consider disabling the radar for the units sold in India, but it still wouldn’t have guaranteed a sales permit, and removing the hardware wasn’t an option.
I wonder how this will impact fingerprinting of users from a privacy and security perspective. It would be useful as a means of identity verification based off of physical properties of the user beyond just their fingerprint or iris. Yet it would also be a massive privacy concern, particularly since it advertises 360° sensing.
Oh my god, extraordinary technical feat, but do not want. Seriously, someone go out there and charge $1000 for a dumb 50" television. Give me a phone that doesn't have an assistant or this Soli and I will pay a premium for it.
TBH, when this data is shipped up to the cloud, it may be used for customization that could be fed to the ad engine, but it's especially important for refining the algorithm.
A lot of the leaps in high-fidelity human-computer interaction (voice, face, and likely this new gesture system) have been made by having enough data about real-world interactions to train the models. It's how a company finds out about the ten thousand things that happen in the real world that its lab models missed, and gets its algorithm from 90% accuracy to 99.9% accuracy.
Yeah, I saw that. If they are 1) not equivocating, and 2) actually telling the entire truth, and they don't switch to hoovering up data if/when these become widespread (gen 2, say), I'll eat my hat.
Given the company we're dealing with, I think my hat's safe. It'd be wildly out of character for them to resist collecting sensitive data from this.
A step in the right direction maybe, but it's missing half of the equation. And this seems a space where the whole is greater than the sum of its parts: we aren't half way there. You may be using your hands to gesture, but you aren't feeling or manipulating anything. Every single picture in that post has a physical thing in the hand. This product never has a physical thing in the hand.
They should use this technology to charge for YouTube video ad views depending on how many people are watching the ad on a phone. It will be interesting to see how multi-user presence gets used as an API.
Or prevent a show from being screened to more than two people! Imagine how many missed royalties can now be collected! With Google Pay integration, the user can even be charged automatically to avoid the inconvenience of having to apply for a screening license.
I know you are joking, but I am pretty sure this was a patent Microsoft filed in conjunction with Kinect. I could be wrong about whether it was Microsoft or not, but it was definitely about a camera noticing how many people were watching a digital product/ad.
Slightly off topic: Where can one find these minimalist cartoons/sketches of people for websites? Ideally free, but those options appear super cartoonish and limited.
Those are likely custom. Two sources that I like are Undraw [0] and ManyPixels [1], both of which offer free options. You can set your own color and select from hundreds of SVG sketches. That said, both are quite popular, so common sketches from each appear quite often on various websites.
> Soli is not a camera and doesn’t capture any visual images.
This is a privacy misconception that really needs to die (something also discussed in the W3C ambient light sensor thread recently on HN frontpage). Sensing involves privacy implications. By. Definition.
We don't even need to resort to "by definition" here.
They describe the tech as "detect[ing] objects and motion through various materials". We're effectively talking about the equivalent of TSA full-body scanners. How's that for privacy concerns!?
Also, for a bit of pedantry, if this is literally "radar" as they say it is, then the argument that "Soli is not a camera and doesn’t capture any visual images" is arguing semantics at the level of splitting "which EM frequency ranges count as a camera and/or visual imagery" hairs.
The site doesn't say what band they are using, but their research articles contain a lot in the 60GHz range, which is certainly high enough in frequency to capture images at useful resolution, particularly if you use synthetic aperture techniques (you can already do inertial movement sensing on a phone to aid with this).
Especially as the privacy statement you quoted is immediately followed by a diagram showing how Soli can tell whether you're paying attention to your phone or not, and how many people are nearby.
"A weasel word, or anonymous authority, is an informal term for words and phrases aimed at creating an impression that something specific and meaningful has been said, when in fact only a vague or ambiguous claim has been communicated."
I think a killer app for this could be small screens. You could have a UI with pseudo-"buttons" around the edge of the screen large enough to see but too small to actually press, letting the interactions take place just off screen. This could let you turn a 2x2 inch screen like a watch into one that's effectively a couple inches larger in terms of the interactions available. Right now my watch's design is almost entirely constrained by having UI elements big enough to easily touch, taking up precious screen space.
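A rough sketch of that "buttons just off the screen edge" idea, under the assumption that the sensor can report a pinch location in screen coordinates even when it falls outside the physical display; the screen size, labels, and coordinate convention here are all invented.

    # Map a pinch that lands just outside a tiny screen to an edge "button".
    SCREEN_W, SCREEN_H = 200, 200                 # a 200x200 px watch-style screen
    EDGE_BUTTONS = {
        "top": "notifications",
        "bottom": "app list",
        "left": "back",
        "right": "quick settings",
    }

    def hit_test(x, y):
        """Return the edge button an off-screen pinch maps to, else None."""
        if 0 <= x <= SCREEN_W and 0 <= y <= SCREEN_H:
            return None                           # on-screen: treat as normal touch
        overshoot = {                             # how far past each edge we are
            "left": -x, "right": x - SCREEN_W,
            "top": -y, "bottom": y - SCREEN_H,    # y grows downward
        }
        edge = max(overshoot, key=overshoot.get)  # the edge we're farthest beyond
        return EDGE_BUTTONS[edge]

    print(hit_test(100, 100))    # None - a normal on-screen touch
    print(hit_test(-30, 120))    # 'back' - pinch to the left of the screen
    print(hit_test(90, 260))     # 'app list' - pinch below the screen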
It's really cool and innovative technology, but I also struggle to see the point for the common user. If this were a computer monitor or projector screen, sure. But my phone is pretty much always in my hand if it's being used in any way. If it's already in my hand, what do I gain from this?
I can see how it'd be very useful for people who mount their phones on some sort of distant stand. But I feel like that's pretty uncommon; though maybe I'm just ignorant of how common it actually is. It's probably much more common for tablets, like for watching videos/movies at a distance. I do that, and I could definitely see myself using this with a tablet. But for something small that's pretty much always either in my pocket or in my hand?
Your phone isn't always in your hand. In fact, your phone isn't in your hand for the vast majority of the day most likely. And it's in those scenarios that something like this could be cool if it works well.
If it can detect your presence before you actually pick it up it could do things like fire up the face auth systems to give quicker unlock. It can auto-lock if you set it down and walk away without manually turning it off or waiting for the timeout. If the alarm is going off it can lower the volume if it detects your presence before you actually dismiss or snooze the alarm.
There's all sorts of possibilities for all the times where you are not holding your phone. If it works well, that is.
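As a rough illustration of the examples above, here is a tiny event-handler sketch; the event names and actions are assumptions made for the sake of the example, not anything Google has documented.

    # React to presence/absence events before the user ever touches the phone.
    def on_presence_event(event, phone_state):
        """Map a hypothetical presence event to the actions a phone might take."""
        actions = []
        if event == "user_approaching":
            actions.append("warm up face-unlock camera")   # faster unlock on pickup
            if phone_state.get("alarm_ringing"):
                actions.append("lower alarm volume")       # you're clearly on the way
        elif event == "user_left":
            if not phone_state.get("locked"):
                actions.append("lock immediately")         # don't wait for the timeout
        return actions

    print(on_presence_event("user_approaching", {"alarm_ringing": True}))
    print(on_presence_event("user_left", {"locked": False}))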
One thing that might help with the "imagine what's possible" mentality here if you're convinced your phone will always be in your hand (which is a totally valid assumption, btw), is to start imagining what other devices this chip could be inserted into, and what you could do with them "remotely".
One spot where it's already improving things is in the face unlock. It uses the Soli sensor to detect when you're approaching the device, then uses the camera (maybe combined with the Soli sensor) to unlock much faster than even the iPhone. In MKBHD's initial review it was fully on and unlocked before he even brought it all the way up.
Why is no one talking about the health and safety of this technology? Not that it will necessarily cause any issues, but I'm interested and curious about any peer-reviewed studies on how prolonged exposure to 60GHz in close proximity to the body affects us. If there is evidence of no effect, I would love to jump on board, because it is so cool! If there is something, or there is no evidence either way, then the community or the experts should talk about it and push for proper studies, before folks exposed to Soli get cancer ten years from now!
There's nothing special about 60GHz; it's just like any other radio transmitter that's been around for decades. The safety is well understood and, apart from some pseudoscience around it (as there has been for every new tech, including microwave ovens, cellphones, 5G etc.), there are no concerns.
I've tinkered around with gesture recognition with the Magic Leap quite a bit for art projects. I hope this gets released as a standalone product with an SDK!
According to Google cache they had an SDK at one point [1] - guess that went down the drain after monetization. The chip itself seems to be an Infineon product [0] you could buy, but the interesting part will likely be the software stack. From a research perspective this is certainly cool tech. But it seems kind of odd to have an ad / landing page for an aspect of their tech you can only get as a gimmicky part of a flagship phone. All the "it's not a camera" wording makes me feel like this is just a marketing campaign to say "see, we're not creepy".
I think this is better for something like a Nintendo. To put it in a phone is kinda invasive. I would hope that at minimum the chip can only be specifically activated by the user in specific apps. To just have it on all the time is kinda crazy. Are they going to disclose when they can identify WHO is using it rather than just multiple bodies, etc...
I think this could be synergistic with voice commands. It won't be useful on its own, but sometimes you don't want to say every command out loud, so doing a little annoyed wave feels more natural. Soli on its own is not going to sell phones in my opinion, but maybe Google is building up to a larger, more aware device that can ultimately guess your entire mood, behavior and position in the room and do various things based on that. Like a sort of virtual person that would detect if you look worried and then ask you how you are feeling today. That's a futuristic direction that I can see this going towards.
For example, one usecase that I would probably like is a phone that tracks my posture during workouts and if it's really advanced, it could give me advice or fire me up on my last reps. Like a personal trainer. Right now it seems we aren't anywhere near that but this is my guess for their strategic direction.
So now Pixel phones have the perfect spy chip and they call it "private" because it doesn't record images. But then they don't say what data it records and what they do with it. What they do have is boatloads of stupid buzzwords for the dumbest phone user imaginable. That doesn't bode well at all.
I might start sweating if it takes that much energy just to move to the next track.
Edit: Not sure why this got voted down? Hold your arm in an upright position and swat to the side a few hundred times; you would feel, at the very least, muscle fatigue.
For repetitive tasks, you want efficiency of movement, not dramatic movement.
I think you're getting voted down because "holding your arm in an upright position and swatting to the side a few hundred times" seems like an exaggeration of what's required to "move to the next track".
I would imagine that to move to the next track in a song you'd be using an action from the "Active" demos at the bottom, which may require some kind of initial motion to enter the "active" state, but also looks like it would work with your arms resting at your sides, taking literally almost no energy whatsoever to e.g. swipe your thumb along your index finger once.
Why is this needed? I can see there are some niche needs, like when you are cooking and cannot touch the screen. But that is what voice is for. What exactly is Soli solving? Looks like over-engineering to me, simply because they can.
Let's not pretend voice isn't also a huge novelty.
Yes the gestures are majorly a novelty, but the fact is, this is a radar, not a gesture sensor.
It would be capable of determining if you're in a dangerous situation (by sensing assault / weapons), it could help a blind person avoid walking into walls, or simply chime when it falls out of your pocket.
Some of these have very high power requirements which is why it's not entirely practical to bake it into the phone as a common feature, but they're all entirely within the realm of possibility.
You can still do it the normal way. It's a neat idea, and new interaction methods coming up means we can experiment and maybe find things that work even better. Soli, if they work out the kinks, would be great because you could increase the interaction area available without having to make a bigger device.
I'm curious if there are any health studies done around this. What about the dangers of EMF radiation when it's on all the time, so close to your body, not to mention in your pocket all the time?
This doesn't make sense for phones. But when it becomes reliable, it will be a good use case for smartwatches.
A watch already has less screen area and limited options for button input. So gestures make sense.
It's emitting a lot of signals to run the radar, so I imagine it's tougher to get the approvals to do that in all countries. They might have to get per-country approval from the local FCC/Ofcom equivalent before enabling it in each country, because it's a relatively new class of device.
It's much more likely it's just regulatory. Not every country has the same rules on spectrum usage and may require additional certification before allowing Google to use the Wi-Gig band. The FCC raised the power max for them when they approved the chip back in January. They're probably running into issues getting similar allowances in other countries (or not bothering to because of the small market).
In an earlier iteration of the web page they had really cool examples of controlling sliders, radial dials, panning surfaces, highly specific pinch gestures etc.
All of those have been removed and now it's just waving. It's cool tech and requires solving lots of hard problems, but it's a shame it falls so short of the initial vision.
The technology is amazing, as seen in previous demos, but how much of it is actually possible in the phone? The video is very underwhelming, showing a simple wave motion that was possible using IR; Motorola had it even before smartphones, over a decade ago.
All I want is a small Bluetooth device (maybe wrist-mounted) with a few buttons, a slider, some sort of rotary sensor like a mouse wheel, and an accelerometer. Then give me complete control in my phone over how the software interprets the actions.
It will be so fun to watch my 2-year-old play with this. If it's good, we'll be happy; if it's bad, it'll be like the iOS 7 upgrade from 6, when my other, then 2-year-old, was so frustrated she threw the iPad down the street, shattering it into many parts.
Ugh! Hope you're wrong. I know the resolution of the radar is sufficient to recognize little gestures (dial, for example), but that must fall off over distance, surely. Hopefully. Right?
Did not think about it that way, but that does sound like a highly plausible direction for a company that ditched their 'don't be evil' mantra to take it. In fact, thinking about it some more, the reasoning behind the tech makes more sense from a business perspective than as a huge want from the user's POV.
Google obviously does evil things. Still not an excuse to keep using this argument, which was nothing but sensationalism and clickbait. "Don't be evil" is still in Google's COC[0].
> [...] And remember… don’t be evil, and if you see something that you think isn’t right – speak up!
It's not like a COC is anything legally binding, or like they didn't do anything bad before restructuring under Alphabet, either.
Fifteen Million Merits is one of their episodes that's not about the future at all, the way I read it. More of an allegory about now (or, now in 2011, anyway).
That's an interesting take that I could completely see now.
When I watched it, I guess I took it more at face value, that in the future when everything is automated, only celebrities and 'power generators' will exist, and of course the rest of the stuff like unskippable ads and in-your-face pornography.
I think the lack of "enforcers" of any kind throughout the episode is key to understanding it. The closest thing we really see is the Cuppliance drink. The stage hands hesitatingly start to step in near the end, but are waved off and aren't exactly jack-booted thugs anyway.
Those demoted aren't dragged away crying. Our protagonist does property damage without apparent censure (one is left to suppose the screens get replaced, eventually, while he's out, maybe or maybe not with a charge to his merit account, similar to the way "detritus" is taken care of).
This is why I don't think there's much room for ambiguity re: whether the outdoors at the end is real. If it's not then that's undercutting what's been the key theme of the rest of the episode, of timid acquiescence to a very gently coercive system. If it's real then that's the punch line of the whole thing—these people could just leave, and they don't because of some combination of social norms, fear of the unknown, fear of losing what (pointedly crappy and meaningless—the avatar skins and such) comforts they have, and (one may assume) some soft but effective persuasion techniques that might come up if they tried. Since assuming it's the latter is consistent with the rest of the episode and makes it much stronger, it doesn't make much sense that it'd be anything else or even be deliberately left ambiguous (why?).
It's all about control, and how a "trapped" state can occur absent anyone waving guns around. With more than a little commentary on social class, celebrity, electronic entertainment, and the hollowness of participation in an economic system and society heavy on alienation (the bikes), along the way.
[EDIT] I just skimmed the first few minutes again because I remembered there being a whole thing about fruit, and not only that, the opening's full of cartoony imagery of the green outdoors. The outdoors at the end is definitely intended to be real, not fake or even ambiguous, unless we assume incompetence on the part of the creators. It's the antithesis of the wholly false, not-even-trying-to-look-real outdoors scenes that are all over the beginning, which did leave room to wonder what the world's like outside this environment—the ending gives us the answer.
I've always been jealous of people who can see so deeply into things, I absolutely cannot unless it's pointed out. I think my mind wanders off when watching, so I miss the details. In any event, thank you very much for the write-up, this makes that episode so much more intriguing to me.
Haha, no problem, but it's mostly just practice. The key thing seems to be recognizing that there's anything worth picking out in a piece of art to begin with—those with little practice don't notice it anywhere, or else will see it everywhere because of course it's possible to bullshit deep themes into an average episode of NCIS if you're so inclined—then questioning why for various decisions. If you're not sure what something means in a thematic or thesis sense, look at which possibility or possibilities go best with the rest of what you've seen (or read, or heard—works for books and music, too).
FWIW I find most of the rest of Black Mirror a lot less rich than Fifteen Million Merits, though I like most of it. I couldn't write as much on most of the episodes as that one, because I'm just not sure there's as much there. Most are a fairly straight route up to usually some kind of twist, with a little worth saying about the relation of the twist to the rest, but mostly what they're up to is pretty on the surface. Nothing wrong with that, and the fact that it consistently tries to say anything at all puts it above most TV, so far as that goes.
To expand a little into other episodes, though, to give some more idea of what I mean by asking why about choices creators make in art: take The National Anthem, for example. When a Certain Pivotal Scene (you know the one) happens, the creators could have depicted it several ways without changing the actual story. A few possibilities: 1) actually show it, entirely, 2) show it but crop out the worst of it, 3) keep the camera nearby, say on people in the room or in the next room over, or on the fictional production crew "shooting" the event, 4) any of the above but show only a little and cut away, 5) just skip straight to after, don't have any footage that takes place during it at all, among other options. There was a choice to make, and the one that was made isn't exactly revolutionary, but also isn't one of the most obvious ones: keep the camera going for all of it (or maybe just most? I can't recall for sure) after showing us every moment at the scene of the event leading up to it, but only show the faces of people watching it on TV. Why? It wasn't the only way they could have avoided putting the event itself onscreen, why do it that way?
Showing people watching a screen, especially something sensational that's been teased from the very first scene of the episode, and holding the camera on it that long does a couple things, I think: in the general case, it invites identity with or comparison between the real life viewer and the viewer in the show; in this particular case, I think the show is also delivering on a kind of implicit promise of horrible spectacle by showing us the most disgusting thing that's happening in that moment, by the creator's judgement. Anyone watching the show, even if they hope the event won't happen or that it won't be depicted at all if it does (which is hopefully most people?) was still held in suspense and to some degree entertained by that will they/won't they dynamic. The result is that this choice both casts judgement on the viewers-in-the-show and prevents the real-life show watcher from casting them as Other and distancing themselves from those voyeurs. This option was chosen by the show's creators because it conveys a message different from what others would have (most of the other options above wouldn't convey much at all, without some further effort, but would advance the plot just the same) and does so effectively. That single choice causes multiple effects on and messages to the viewer, working toward one end.
To take that a step further, both reinforcing the likelihood that this was intentional and possibly adding insight into other episodes, the show repeatedly returns to themes along these lines, of the viewer-as-complicit. White Bear's a huge one, obviously, featuring punishment essentially for the act of watching a crime (and filming! The show also loves to jab at itself and its creators pretty viciously, as Fifteen Million Merits manages to also do in addition to everything else) with a kind of Greek hell of... having crimes done to them while "normal people" watch and record, which is called justice. A couple others tread similar ground—Shut Up and Dance, notably.
Moving into more tenuous territory: San Junipero may also be doing something like this, forcing the viewer into more direct confrontation with some message in it by identifying them with some element of or character in the show, at the end, just a bit. The long sequence of server farms and robot arms at the end definitely adds a sense of melancholy and unease over the happy ending and neatly reframes our own emotions about the episode, which is probably the main thing it's intended to do, but that also looks an awful lot like any modern-day server farm. Like where Netflix episodes come from. Like the episode you just watched, finding temporary joy in this fiction while sitting, apparently lifeless... and oh look it's suggesting another show or movie, how nice, this could just go on forever, couldn't it? And that's the season Black Mirror transitioned to Netflix, I'm pretty sure.
That's quite a stretch from the pretty clear use of those themes and mechanisms in other episodes, and I wouldn't say I'm anywhere near as sure of it as some of the other stuff, since it's pretty abstract and there are other, stronger ways to explain why the scene is there and why it looks the way it does... but then again Bandersnatch came out a few years later and hits some of the same notes explicitly, which makes it just a tad less likely that that wasn't on the creators' minds when they decided how the end of San Junipero should go, and a little more likely that that connection was intended.
I wonder if that's proximity sensors or always-on gesture recognition doing image processing off the front camera.
Either way, gesturing left/right/up/down doesn't seem to add more than touching the device. My Moto G6 has that now, using the fingerprint sensor as a directional swipe button, love it.
It'll be interesting to see how this works once it's in the hands of consumers; the videos they were showing a few years ago looked pretty cool, but it seems like these sorts of novel user interfaces need to hit a difficult trifecta of being intuitive, reliable, and responsive.
They specifically point out on the page that it's not a camera, doesn't capture any visual images, even works through some solid materials, and can sense in all directions around the device.
This looks literally perfect for something like a smart display. I have a Google Home smart display on my desk, and I like to keep it back far enough that it's a bit of a reach to touch the screen and interact with it. If I could just wave in front of it, that would improve things significantly! But add in the other benefits of something like this (just being able to sense where a person is around a device like a smart speaker or display seems like it could be extremely useful for better sound projection and better microphone listening), and it seems like it really could be a game changer.
I am wondering why they decided to first release it in a phone, where it seems like it has the least benefits...
Plus a stationary or semi-stationary version only lets them collect data from wherever you place it. Put it in phones and pretty soon you've got whatever sort of data this thing picks up from damn near every room, every street, every trail, every car, every everything anywhere that people exist. Data to train your machine learning algos, for free. God knows what else—data to map every square centimeter of every environment in the modern world, for all we know.
Their priorities are spying-first, typically, so this is unsurprising.
Their goal is ubiquitous access. To get to that goal, they're collecting a lot of data about the world and their users to figure out where, when, how, and why users want data to optimize getting it to them. And yes, it probably serves their ad model too, but there's more to it than that; Google is helmed by a futurist and employs futurists, and is looking towards a not-too-distant future of always-on personal networks enhancing what a person can do in their day-to-day.
Their goal is the same as every company's goal: earn money.
If they introduce anything new, the first question is how that will make money for them. If it is collecting data, the question is how they can use that data to earn more money.
Also, as has been mentioned by someone in the comments, I'm really not keen on being surrounded by another set of devices that can "see" me; CCTV is more than enough already. I also don't want to invite Google into my home to map it out.
Pixel 4 owners are probably going to be asked to turn off their phone and put it in a box next to the door when they want to come into my flat.
Kind of but not exactly. Money is the lifeblood that the corporation needs to survive; earning money is the goal in the sense that the purpose of human life is "eat food."
The fact that the corporation is still basically privately owned though publicly traded (in the sense that the founders retain a controlling stock percentage) means that they can use the money as a means to whatever ends the founders wind up the company and point it at. They know the game is over if they run out of money, but that doesn't mean the game they're playing is "Build maximum monetary value for shareholders" any more than the game you or I are playing daily is "What will I have for dinner." They have enough controlling interest to vote that that's not what the company's primary goal is, in practice.
Now, why would anyone who isn't them play that game by buying GOOG/L stock? Because in spite of the company's goal not being "maximize revenue," it's very good at generating both revenue and product people care about, and the people trading its stock are excited about that. They get a piece of the action, even if they don't actually call the shots.
I have a Google Home (Google Nest Home now?) in my kitchen and think soli would be a perfect fit for it. Between the noise of my stove and music, sometimes it's really hard to get the device to hear my voice commands. Being able to do gestures with my dirty hands would be great.
Convenient form factor to get it into the hands of techie tinkerers.
Sell it standalone, and people have to really want to play with it.
Sell it in an already-portable form-factor that is also a phone, and the activation energy to try it is lower; if it doesn't work, at least you have a working top-model phone.
The sensor sucks because it's small. Plus as a radar it's subject to regulatory requirements. Google should have gone with a lidar-based solution, but the concept of putting lidar in a cellphone is probably heavily protected Apple IP.
Good lord... this is a really cool piece of technology and half the comments here are just complaints that it will enable better ad tracking or erode privacy or that touch screens are good enough.
This is HN. This is cool technology. Can we just stop to appreciate what cool new interfaces or game concepts we might be able to build with this, rather than jumping on the knee-jerk Google hate train?
Cool technology no longer lives in a vacuum. Tech is now inextricably linked with ethics (in a good way) and will become more so in the future.
Directionally this is a great thing! If this was a DIYer or article about the human-computer interaction concepts of interfaces, I would agree with you. But it's not. It's about a device shipped by one of the most privacy-invasive companies on the planet.
IMHO privacy questions are exactly the right questions for a thread like this. No one gets a free pass because it's 'cool technology', nor should they.
HN is also fiercely pro privacy. I'd say that in such cases, privacy wins out over the coolness of the technology.
Personally, I find this a little too creepy for too little convenience. What can waving at the phone solve that I haven't been able to accomplish with my fingers?
Like voice assistants, it's amazing tech—if there's some way to use it that doesn't make me a data-collection appendage of a spyvertising company. Their data collection & hoarding activity is quite dangerous and this just means they're gonna get more private and sensitive data about me and everyone and everything I care about, whether or not I buy a device with one of these.
[EDIT] to expand, the hate train is because they're the world's richest and best stalkers & peeping Toms. Any time they come out with new tech it's usually used to make them richer, even better stalkers & peeping Toms. It's intensely creepy and a little hard to see past.
It is an awesome, cool piece of technology, which is also one of the best remote spying devices people can bring into your home to map it. Also, motion-based identification is a nice capability, so they could tell who is at home.
If you can give me an example of any cool tech which has not been used to kill or spy on people, I'll happily stop jumping on the hate train.
You know what else is cool? MIRV ablative coatings. Stuff is amazing - multi-layer carbon-fiber wrapping that protects the sensitive innards from ABM lasers. Super lightweight, very strong, also radar absorbing: a real advance in warhead survivability and barrage effectiveness.
Doesn't mean I'll be happy when one comes screaming out of the sky towards my head at Mach 24.
It's not a Google hate train. Maybe, just perhaps, even people who like cool tech are feeling very uncomfortable about the kind of future this tech enables.
Unfortunately it's inherently stained by Google's association with it. Plus, it will probably be inexplicably dropped from the phone a couple models down the line.
I honestly don't get why you'd need a motion sensor on a handheld device. Leaving aside the limited applications of such an input, you'd literally need to keep the device stationary for the motion sensing to be effective. It's a usability nightmare.
One useful thing I could imagine is for cooking. If you've got dirty hands, you will (hopefully) be able to unlock the phone hands-free and browse through a recipe.
They probably dumped hundreds of thousands of man-hours and hundreds of millions of dollars on this so we can skip songs while cooking. Every time I see a project like that released by a major tech company I lose a bit of my faith in tech.
For me personally, I'm going to wait a few years for the price on this phone to come down, buy a pile of them, and stick them on the wall of every room in my house. Low-power wall control panel with gestural input. Throw a bit of custom software on there, and (coupling that with Google Assistant voice recognition) it's the closest I'll get to Star Trek rooms in this decade.
That's what I want to know as well. This github[0] mentions the sensor as a standalone item, and this appears to be the manufacturer[1]. This[2] is the closest I could find for purchase (24GHz, as opposed to Soli's 60GHz, and nearly 300 bucks to boot).
I have to agree. Something to get people to notice the handset, think it's unique, but it really doesn't add much value.
I paid for a Google Daydream headset. Half a year ago it stopped working on their supported device (Pixel 2 - driver issues). Now they say the Pixel 4 won't support Daydream because people weren't using it - of course they weren't, if they don't support it properly. They release proto tech as marketing gimmicks, which they may have long-term plans for, but it's not fair to ordinary consumers to create interim abandonware like this.
Exactly. It looks cool, but no one needs to make fancy, tiring gestures to control an app when you can just do it with a few taps, not to mention most people already know how to navigate touch screens with gestures and taps.
Consumers need to boycott this stuff. This chip has one purpose: surveillance. All the "features" on this press release are contrived and useless, but they are hoping it's shiny enough that we like it and want it.
Why do consumers "need" to boycott this stuff? I want this stuff, and I want it cheap and readily available -- and hopefully more advanced over time. The "features" on this press release look useful to me as a start to something much better later.
Some people find joy in maintaining their privacy. They also ruin some of the fun, because "regular" people (non privacy enthusiasts) find joy in cool new technology like this. They make a good point, but they also probably didn't read the documentation, because it clearly says sensor data is never sent to Google.
While I find its inclusion to be a pointless gimmick, I'd like to point out that this chip does not increase your phone's ability to do surveillance.
It's going to have much poorer reach and performance than, say, the front-facing camera. The benefit of such a device is mainly that it's much better at distinguishing gestures in 3D space, regardless of lighting.