This is a shape-from-shading approach, which will only work if you're in the dark, with the phone as the only illumination source. If white dots are shown in different places on the screen, and assuming the phone and subject don't move during the process, surface normals can be computed from the resulting images, and once you have the normals the shape can be approximated. Normals can be found by calculating the angle of maximum reflectance for each pixel across a series of images taken under different illuminations.
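A rough sketch of that normal-recovery idea in NumPy, assuming a Lambertian surface and known per-frame light directions (which is more than the app can really count on):

    import numpy as np

    # images: N grayscale frames stacked as an (N, H, W) array, all from the
    # same viewpoint; lights: (N, 3) unit vectors giving each frame's light direction.
    def estimate_normals(images, lights):
        N, H, W = images.shape
        I = images.reshape(N, -1)                            # (N, H*W) intensity matrix
        # Lambertian model: I = lights @ (albedo * normal); least-squares per pixel.
        G, _, _, _ = np.linalg.lstsq(lights, I, rcond=None)  # (3, H*W)
        albedo = np.linalg.norm(G, axis=0)
        normals = G / np.maximum(albedo, 1e-8)               # unit surface normals
        return normals.reshape(3, H, W), albedo.reshape(H, W)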
What are good resources for looking into converting photos into 3D object models? I'm really impressed by Shapeways, but I'd like to use existing real-world objects as a base rather than building them in a 3D tool etc.
I've seen the laser scanners, but they're relatively expensive. Is there anything like a mount that takes two iPhones, and software that can take those two photos and create at least a projection?
There are a lot of approaches. Yes, there is software that takes two photos and lets you reconstruct 3D. If you have a man-made object you probably don't want a generic point-cloud-building approach (like, say, http://www.photosculpt.net/ ) - but rather something like PhotoModeler that lets you select vertices and build up surfaces.
There is also software that builds models using silhouette methods, which usually require you to print out targets on which you place the object - sort of like a manual turntable approach. I can't remember the name of the software that does this at the moment, but it's out there.
Or you could work from video (just a single camera) that you move around. That can be effective; I'm not sure of the best software for doing this - the technique is called structure from motion, and a quick Google shows some source at this project site: http://phototour.cs.washington.edu/
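For a taste of what structure from motion does under the hood, here's a rough two-view sketch with OpenCV in Python. The intrinsics are made-up stand-ins for a real calibration, and real pipelines chain many such pairs together with bundle adjustment:

    import cv2
    import numpy as np

    # Two frames from a moving camera (placeholder file names).
    img1 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("frame2.jpg", cv2.IMREAD_GRAYSCALE)

    # Match features between the frames.
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])

    # Assumed camera intrinsics (focal length / principal point are example values).
    K = np.array([[700.0, 0, 320.0], [0, 700.0, 240.0], [0, 0, 1.0]])

    # Relative pose from the essential matrix, then triangulate a sparse cloud.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    points4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    points3d = (points4d[:3] / points4d[3]).T    # sparse structure, up to scale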
I've also seen some do-it-yourself structured-light software (i.e. bring your own projector and camera) that seems to work OK.
It kind of depends on what size and type of objects you want to scan - things like the surface properties could be important - also how long you can keep it still.
If you're working with real-world objects, one approach is to place the object of interest on a turntable, allowing you to capture multiple perspectives with a single camera/sensor. Philo Hurbain has made some delightfully clever LEGO NXT 3D scanners this way (delightful especially because they're used, in turn, to digitize the shape of complex LEGO parts) - one using a needle probe, and another using a laser.
Probably the simplest way of obtaining 3D data suitable for turning into an object model would be using a Kinect.
For large objects, like buildings, you would need either expensive laser scanners or something which solves the multi-view stereo problem. There are systems capable of doing this, such as Photosynth, but in general it's quite involved with no easy solutions.
For medium-range models, such as automotive applications, you could have a pair of cameras aligned in parallel and connected to a PC/laptop, then use a utility like v4l2stereo.
If you're not already familiar with OpenCV, it's a rather good (and well documented!) open source image processing library. It's written in C++ but has bindings for a bunch of other languages too.
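For anyone curious, a basic block-matching depth sketch with OpenCV's Python bindings looks something like this; it assumes already-rectified left/right images, and the file names and calibration numbers are placeholders:

    import cv2
    import numpy as np

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Block matching returns disparity in 1/16-pixel fixed point.
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0

    # With a calibrated rig, depth = focal_length_px * baseline / disparity.
    # These constants are example values, not from any real camera pair.
    focal_px, baseline_m = 700.0, 0.12
    depth_m = (focal_px * baseline_m) / np.maximum(disparity, 0.1)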
Cool: it works decently, and uses a neat trick.
Uncool: you can't export the image into a 3D file, making it 99% less useful than it would be otherwise.
Suggestion: export it as VRML; it's a trivial format that you can generate directly from your points. Use this format (from my own code, so use it as you wish):
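As a rough illustration of the idea (not the original code), a few lines of Python can dump a grid of depth values as a VRML 2.0 ElevationGrid:

    # Illustrative sketch only: write a 2D grid of depth values out as a
    # VRML 2.0 ElevationGrid node.
    def heightmap_to_vrml(heights, spacing=1.0):
        rows, cols = len(heights), len(heights[0])
        values = " ".join(str(h) for row in heights for h in row)
        return (
            "#VRML V2.0 utf8\n"
            "Shape {\n"
            "  geometry ElevationGrid {\n"
            f"    xDimension {cols}\n"
            f"    zDimension {rows}\n"
            f"    xSpacing {spacing}\n"
            f"    zSpacing {spacing}\n"
            f"    height [ {values} ]\n"
            "  }\n"
            "}\n"
        )

    # e.g. open("scan.wrl", "w").write(heightmap_to_vrml([[0, 0.5, 0], [0.2, 1, 0.2], [0, 0.5, 0]]))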
Wow, there's been a VRML sighting! As an old dude, VRML reminds me of the worst of 90s web hype. Trying to leave that aside, I also don't think it has a lot going for it to keep it around as a format.
I prefer the even simpler Wavefront OBJ format as a scratch format. It's about as printf-compatible as it gets and is supported in many more places than VRML.
Good point, but unfortunately Wavefront .obj has no "elevation mesh"-style primitive, so it's a bit more involved than that - but still not too hard.
Btw, while VRML as a plugin and the idea of a 3D web now sound very 90s, the data format itself isn't that bad.
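For what it's worth, the extra step .obj needs is just spelling out the vertices and faces yourself; a quick sketch of turning a height grid into OBJ text:

    # Sketch: convert a grid of heights into Wavefront OBJ text by emitting one
    # vertex per sample and one quad face per grid cell (OBJ indices are 1-based).
    def heightmap_to_obj(heights):
        rows, cols = len(heights), len(heights[0])
        lines = []
        for z, row in enumerate(heights):
            for x, y in enumerate(row):
                lines.append(f"v {x} {y} {z}")
        for z in range(rows - 1):
            for x in range(cols - 1):
                a = z * cols + x + 1                      # top-left corner of the cell
                b, c, d = a + 1, a + cols + 1, a + cols   # the other three corners
                lines.append(f"f {a} {b} {c} {d}")
        return "\n".join(lines) + "\n"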
Cool app. But I have a pet peeve with information pages that only provide a video as a description. Whilst I understand that, especially for an app like this, the best 'information' is a video demonstrating how it works, it doesn't help people who a) just want a quick one-line explanation of what your app does and/or b) can't watch the video at the time (slow connection, work restrictions, etc.).
If I hadn't been at home, I would have just left the page and moved on to something else - missed sale.
Thanks for the feedback. I was expecting to have this weekend to pull everything together but Apple approved the app in 2 days (which is great, but 5x faster than for my previous submissions).
Good question. Version 1.0 only saves/emails images, but trading 3D scans and exporting the raw mesh is certainly on the feature list for future versions.
I tried it this morning in the pitch black. It worked pretty well. The effect was mostly comical with some distortion. The most interesting aspect is how people are taking all these things that at first pass would be considered "impossible," applying some ingenuity and hard work to them, and pushing the limits of this technology.
Looks like it just uses the reflected intensity to estimate the depth, then pastes the original colormap on top of that? Incredibly clever and simple hack.
If it were that easy, the eyebrows and lips would be sunken inward because they are darker than the surroundings.
More like (I'm guessing) it uses the change in reflected intensity with respect to the change in light position to estimate a surface normal (e.g., forehead is the same for all light positions, left side of your nose is bright when the left light is on, dark when the right light is on, right side of nose vice versa, etc). It then finds a surface that agrees with these normals by solving a partial differential equation. Clever, not simple. (Although I guess it's a compliment to say they make it look easy.)
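The "solve a partial differential equation" part can be done with a standard Fourier-domain trick (Frankot-Chellappa); whether the app actually does it this way is anyone's guess. A rough NumPy sketch, assuming the normals have already been converted into gradient fields p = dz/dx and q = dz/dy:

    import numpy as np

    def integrate_gradients(p, q):
        """Recover a depth map (up to an offset) from gradient fields p, q by
        enforcing integrability in the Fourier domain (Frankot-Chellappa)."""
        h, w = p.shape
        u, v = np.meshgrid(np.fft.fftfreq(w) * 2 * np.pi,
                           np.fft.fftfreq(h) * 2 * np.pi)
        P, Q = np.fft.fft2(p), np.fft.fft2(q)
        denom = u ** 2 + v ** 2
        denom[0, 0] = 1.0                    # avoid dividing by zero at DC
        Z = (-1j * u * P - 1j * v * Q) / denom
        Z[0, 0] = 0.0                        # absolute depth isn't recoverable anyway
        return np.real(np.fft.ifft2(Z))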
I assume it displays varying shades of full-screen grey. There's no public brightness API. You're free to do it outside the app store, of course. (you can find the private API calls pretty easily on Google)
It looks like it just shows a completely blank white screen while it takes a picture. Sure, you can change the intensity -- use a gray pixel instead of a white one.
The fact that this only works in the dark made me wonder, how much of what the Kinect does could be possible on an iPhone, and what would be needed to get there? 2 Cameras? What else?
If the phone had two cameras spaced some distance apart then you could use stereo vision algorithms to get a similar result. Maybe it would be possible to put an infrared laser on a phone, similar to the Kinect, but I expect that power consumption would become an issue (although if it's only in a momentary blast it might be feasible).
There are other possible methods, such as the "photo popup" created at CMU some years ago, but these rely heavily upon "dodgy heuristics" and often fail to give a good result.
If your subject doesn't move, two cameras can be approximated by taking two pictures a set distance apart.[1]
Also note that the method used by this app is very short ranged, and uses the LCD's backlight while it's imaging, so there's zero feedback, making any kind of interactive application impossible.
1: There are third party accessories to make this easy and repeatable, built with varying levels of quality. Here's a fairly cheap one, circa 2003: http://www.dansdata.com/photo3d.htm
Or what about using built-in motion sensors to record the relative camera location and orientation for a series of frames captured with, say, an iPhone camera? I don't know exactly how the accelerometers and gyros work, or what sort of data they provide (linear distance vs. just orientation changes?), but imagine holding down a "scan" button as you simply swing the phone around a subject to capture a series of images. I would think it would be possible to reconstruct 3D surfaces (at least under suitable illumination conditions, I guess) given a known camera location/orientation for each frame. Pushbroom stereo, in remote sensing parlance...
Tracking movement in 3D using dead reckoning is apparently very inaccurate, with the iPhone sensors I wouldn't expect it to be accurate for more than a few seconds at best. I visited a startup working on the problem a few years ago, and they had problems even with dedicated hardware.
You could interleave the illumination of the subject with each photo taken:
1. Illuminate
2. Take photo
3. Show photo
Repeat.
If step 2 is really short, you could make the illumination phase much shorter than the show-photo phase, and the user of the phone would see it as 4 short flashes over their own photo.
See this Google video for a similar technique. http://www.youtube.com/watch?v=rxNg-tXPPWc
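To make the timing concrete, here's a toy sketch of that loop; the illuminate/capture/preview calls are stand-in stubs, not any real phone API:

    import time

    def show_illumination_pattern(i): pass   # stub: flash pattern i on the screen
    def capture_frame(): return None         # stub: grab a frame from the camera
    def show_preview(frame): pass            # stub: display the latest photo

    def scan(num_patterns=4, flash_s=0.05, preview_s=0.5):
        frames = []
        for i in range(num_patterns):
            show_illumination_pattern(i)     # 1. illuminate
            time.sleep(flash_s)
            frames.append(capture_frame())   # 2. take photo
            show_preview(frames[-1])         # 3. show photo
            time.sleep(preview_s)            # brief flash, long preview: reads as 4 quick flashes
        return frames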