Things I Can't Get Around To Doing, Part One


Given my limited free time and some lack of enthusiasm regarding spending hours sitting in front of a computer when I could just as well listen to some music and enjoy a good book, I don't get to do as much coding as I'd like. Sure, it's only a hobby, but after a while you start getting somewhat frustrated at not actually being able to finish stuff.

And that is especially true when you just can't stop coming up with new ideas, or new twists on old ones. I'm one of those people who can't stop wondering "what if...", and when enough things pile up, I have a hard time splitting my time between stuff I need to do (like my growing collection of hacks and my never-ending new Wiki engine) and exploring new ideas for just long enough to see if they'll fly.

So I decided to flush out a few of the ones that keep annoying me. There is a bunch of them, ranging from having a decent, bandwidth-optimized remote desktop solution for the Mac (something with decent support for international keyboards, but based on cross-platform protocols like VNC or RDP, not the stuff Apple supplies with the OS), to using Skype as a transport mechanism to build CSCW applications (or just do simple file-sharing atop it across NAT-enabled firewalls).

But there's plenty in between, and I've spent a couple of days mulling this particular one:

iSight Mouse

The concept is simple: Every modern Mac has a built-in iSight, and more than enough horsepower to do image processing on a live video feed. So why not use it to do motion tracking and control the mouse pointer?

Background

The last time I gave this some thought (a couple of months back), I bookmarked this page as being the first hint that someone had thought it through and identified the main issues.

While researching this again during the past couple of days, I noticed that the basic idea was also proposed as part of the My Dream App tomfoolery (which ended up picking a couple of hare-brained schemes, of which Portal is the only one that is really useful).

There are also a few EyeToy-like games for the Mac, but these are far too simple - and most of them use motion detection in pre-defined areas of the screen, not true motion tracking. In fact, a lot of people confuse motion detection with motion tracking (including a few who really ought to know better, and think doing fancy video effects is "tracking" - they're just detecting edges in motion).

And that's the real rub, because motion detection is a relatively simple matter - you simply look for differences between frames (give or take some filtering). Motion tracking requires you to go a lot further and identify specific features you want to track as they move across the camera's field of view, making sure you can separate them from a (relatively stable) background.
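To make the distinction concrete, here's a minimal sketch of the detection side - plain frame differencing - in Python with numpy. All the names are mine, and I'm assuming frames arrive as 2-D uint8 grayscale arrays; none of this comes from the projects linked below.

```python
import numpy as np

def motion_mask(prev, curr, threshold=25):
    """Boolean mask of pixels that changed between two frames.

    Frames are assumed to be 2-D uint8 grayscale arrays of equal shape.
    Cast to int16 first so the subtraction doesn't wrap around.
    """
    diff = np.abs(prev.astype(np.int16) - curr.astype(np.int16))
    return diff > threshold

def motion_centroid(mask):
    """Centroid (row, col) of the changed pixels, or None if nothing moved."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    return float(ys.mean()), float(xs.mean())

# Two synthetic 8x8 frames: a bright 2x2 blob moves one pixel to the right.
prev = np.zeros((8, 8), dtype=np.uint8)
curr = np.zeros((8, 8), dtype=np.uint8)
prev[3:5, 2:4] = 255
curr[3:5, 3:5] = 255

print(motion_centroid(motion_mask(prev, curr)))  # → (3.5, 3.0)
```

That's the whole trick for detection, and it's also why it isn't tracking: the centroid is of "pixels that changed", not of any particular feature, so two moving objects (or a shaky camera) break it instantly. A tracker would have to lock onto a specific feature and follow it from frame to frame.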

Why You Should Care

Sure, we've all seen Minority Report by now. And everyone who does presentations regularly would just love to wave a magic wand and have the mouse pointer move accordingly, etc., etc.

But there is a deeper, much more sensible reason to do this:

I've been interested in Assistive Technologies for years now. It's one of those things you end up taking more than a cursory glance at when you're as myopic as I am and have a feeling of what it might be like to be blind, deaf or immobilized (I apologize, but I don't really subscribe to the "X-impairment" terminology - I have a feeling it developed as a way to make "able" people less sensitive to the issues, a form of linguistic detachment).

As such, I see this being really useful for people who can't use a "regular" mouse, and who could buy a cheap USB camera (or use the iSight on the computer they already paid for) instead of spending a small fortune on a tack-on solution.

Sure, there is ample opportunity for delivering presentations and doing different sorts of UIs, but I kind of look down on it being used purely for gaming and entertainment. I understand the sad reality of people preferring to buy - and sell - things that entertain a larger number of people rather than solving a minority's troubles, but it doesn't really have to work that way - there is such a thing as a relevant minority.

Starting Points

There is far too much information on Computer Vision out there for me to try to summarize, and most of the stuff I learned is years out of date, often replaced by much simpler and more effective techniques. But some clicking around reveals there are plenty of commercial solutions for Windows, and very few (in practice, zero) solutions for either Linux or Mac OS X. None of them are cheap, and there is very little source code available.

Which is pretty surprising on its own, since I was kind of expecting something usable from the Open Source community. Maybe I haven't looked hard enough, and if there are any free solutions out there, I'd love to know.

But there are a few things I did find, and that might come in handy:

  • First off, this video, which is the reason I've been looking at this idea again. The comments on the original post are thought-provoking as well, but the technique used is easy to understand, and might be easy to code for.
  • This WebCam Mouse, done in Visual Basic. If this can be done in VB on Windows, why can't Mac developers leverage Mac OS X APIs to do one better?
  • camMover, the closest thing to actionable motion tracking I could find in Flash (again, there are plenty of motion detection examples, but very few attempts at tracking motion).
  • A Python-based motion detector. Again, this isn't enough, but it's a starting point.
  • A Python-based motion tracker. Very, very interesting.
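Whatever tracker you start from, there's still a layer missing between its raw output and an actual pointer: tracker coordinates are jittery, and the camera's field of view has to be mapped (and mirrored) onto the screen. A minimal sketch of that glue layer - entirely my own, with made-up names and an exponential moving average as the simplest possible filter:

```python
class SmoothedPointer:
    """Map tracked camera coordinates to smoothed screen coordinates.

    alpha near 0 means heavy smoothing (laggy but steady pointer);
    alpha near 1 means little smoothing (responsive but jittery).
    """

    def __init__(self, cam_size, screen_size, alpha=0.3):
        self.cam_w, self.cam_h = cam_size
        self.scr_w, self.scr_h = screen_size
        self.alpha = alpha
        # Start the pointer at the center of the screen.
        self.x = self.scr_w / 2.0
        self.y = self.scr_h / 2.0

    def update(self, cx, cy):
        # Mirror horizontally so moving your head right moves the
        # pointer right, then scale camera coords to screen coords.
        tx = (1.0 - cx / self.cam_w) * self.scr_w
        ty = (cy / self.cam_h) * self.scr_h
        # Exponential moving average damps tracker jitter.
        self.x += self.alpha * (tx - self.x)
        self.y += self.alpha * (ty - self.y)
        return self.x, self.y

p = SmoothedPointer((640, 480), (1440, 900))
print(p.update(320, 240))  # → (720.0, 450.0), dead center
```

The mirroring detail matters more than it looks: without it, the pointer moves against your head movement, which is disorienting enough to make the whole thing unusable.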

I also have a few more links in my Assistive Technologies page, including links to Windows-based head tracking solutions and stuff like CamTrack (for Linux).

A "Twiist" on the Above

I've also been wondering about the Wiimote, and whether its Bluetooth is standard enough for someone to put together a Mac OS X driver to read accelerometer data.

Using the iSight in lieu of the sensor bar might be workable (depends on whether the new iSights are IR-sensitive and can track the remote's IR beacon), and coupling that with the accelerometer data would provide enough information for basic six-axis control.

Update: It seems that the sensor bar is actually a "beacon bar" - the IR camera is in the remote, tracking the bar's emitters, not the other way around - so the iSight has nothing to track, and this would have to be done solely with accelerometer data.
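Assuming some enterprising hacker does get accelerometer readings out of the remote over Bluetooth, the obvious accelerometer-only scheme is tilt control: with the remote roughly at rest, the accelerometer reads the direction of gravity, and pitch/roll can be turned into pointer velocity. A sketch (axis conventions, units and names are all my own assumptions - a real driver would need filtering, dead zones and calibration):

```python
import math

def tilt_to_pointer_delta(ax, ay, az, gain=200.0):
    """Map a gravity-dominated accelerometer reading to a pointer delta.

    ax, ay, az are accelerations in g's with the remote held still, so
    the vector (ax, ay, az) points along gravity. Roll and pitch are
    recovered from that direction and scaled into pixels-per-update.
    """
    roll = math.atan2(ax, az)   # tilt left/right -> horizontal motion
    pitch = math.atan2(ay, az)  # tilt up/down -> vertical motion
    return gain * roll, gain * pitch

# Remote held level: gravity is all on the z axis, pointer stays put.
print(tilt_to_pointer_delta(0.0, 0.0, 1.0))  # → (0.0, 0.0)
```

Tilt control also happens to suit the assistive use case: it works with very small wrist movements, and the gain can be cranked up or down per user.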

It would require stripping down the remote for people who can't grasp it or hold it steady, but the things are relatively cheap, and provided the software was free, US$30 would be a sensible expense for lots of people.

The whole thing has been hinted at before, but there is zero useful information out there (at least until some enterprising hacker starts doing Bluetooth sniffing on the things).

Come back later this week for another of my zany concepts, and why someone in the Mac OS X or Linux developer communities ought to code them.