The minute a tech pundit starts dismissing new technology, they lose touch with their field and lose relevance.
I’ve always followed the “Evolve or Die” axiom, and I try to learn one new thing a year. So I often point to those who say, “I can’t” or “This (insert new technology) won’t work” and hold them up as examples of how defeatist attitudes can’t accomplish anything truly worthwhile, because they’ve given up and proclaimed failure before actually trying.
Which brings me to this entry. A tech journalist online friend of mine frequently tweets about her site’s latest article. Last night I had the honor of reading Aaron Kraus’s piece on why the as-yet-unannounced iTV (Apple’s rumored iOS-based television set, and the natural evolution of AppleTV and iOS) will have an unworkable or unreliable Siri- and gesture-based interface. Luckily (since it was getting late), or unluckily, I only had a 140-character rebuttal to work with. But now that I have the time, a proper response is in order. Here’s why and how this guy got it “oh so wrong.”
The article starts with a funny 30 Rock clip in which Alec “don’t interrupt my Scrabble knockoff playing!” Baldwin’s character, Jack Donaghy, tries a voice-controlled TV that accidentally responds to all the dialog on a fake Law & Order: SVU (or Law & Order: Yet Another Version, as I like to call it), where Ice-T’s and Richard Belzer’s lines include phrases like “off,” “mute,” “volume up” and “delete everything on my DVR,” while Jack scrambles to counteract the mistaken commands. But when did tech parody make the jump from “exaggerated situation for comedic effect” to “actual argument to base your criticisms on”?
Kraus begins his critique of this fictional problem by saying people have proposed keywords to help the TV keep dialog from being interpreted as commands. Then he points out that keywords defeat the purpose of a natural language interface. Either he’s unaware of or purposely ignoring the fact that any voice-activated TV could easily include cheap circuitry to invert the phase of the audio it is playing by 180° and feed that to the listening circuit to cancel it out, much as balanced XLR connections and noise-canceling headphones do. Problem solved.
But wait, there’s more!
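For the curious, here is a minimal sketch of that cancellation idea in Python. It is an idealized illustration, not how any real TV does it: the sample rate, the tone frequencies and the function names are all assumptions made up for the demo, and it pretends the microphone hears a perfect copy of the TV’s audio. A real living room adds delay and echo, so an actual product would need adaptive echo cancellation on top of this.

```python
import math

SAMPLE_RATE = 8000  # samples per second (arbitrary choice for this demo)


def tone(freq_hz, n_samples, amplitude=1.0):
    """Generate a sine tone as a list of float samples."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def cancel_playback(mic, playback):
    """Add the phase-inverted playback signal to the microphone signal."""
    return [m + (-p) for m, p in zip(mic, playback)]


n = 800
tv_audio = tone(440, n)           # what the TV speaker is playing
viewer_voice = tone(200, n, 0.3)  # stand-in for the viewer's spoken command
mic = [t + v for t, v in zip(tv_audio, viewer_voice)]  # idealized microphone mix

cleaned = cancel_playback(mic, tv_audio)

# How far the cleaned signal is from the viewer's voice alone.
residual = max(abs(c - v) for c, v in zip(cleaned, viewer_voice))
print(f"largest leftover TV audio: {residual:.2e}")
```

In this idealized case only floating-point rounding is left over, so the listening circuit would hear the viewer and not Ice-T.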
Next, Kraus mentions gestures that might be misinterpreted (such as stretching to raise the volume to maximum). That could be a legitimate concern. But I would think the engineers who program gesture recognition would account for this possibility and either build in gesture filtering or allow “attention” gestures followed by command gestures. Even better, they could add user-programmable gestures for people who are either more tech-savvy or have limited mobility (a group that is often an afterthought with new tech interface paradigms).
I would love to make the “devil horns” sign and have the TV change the channel to my favorite music source, like iTunes, Pandora, satellite radio, MTV, etc. I can easily see a dual-pronged approach: gesturing thumbs up and saying “volume up” at the same time, gesturing the cut signal and saying “mute,” or just tapping your iDevice’s on-screen mute button. Again, why limit the interface to one way of doing things? Kraus thinks such gestures would add up to an aerobic workout circus of awkward movement, completely forgetting we control the gesture conventions.
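The “attention gesture, then command gesture” idea is easy to sketch. The Python below is a toy, and every name in it is hypothetical: the gesture labels, the three-second window and the command table are invented for illustration, and it assumes some recognizer has already turned camera input into gesture names.

```python
ATTENTION = "raise_open_palm"  # hypothetical attention gesture
WINDOW_SECONDS = 3.0           # hypothetical time allowed for the command

# Hypothetical command gestures and the actions they trigger.
COMMANDS = {
    "thumbs_up": "volume_up",
    "cut_signal": "mute",
    "devil_horns": "music_channel",
}


class GestureFilter:
    """Ignore command gestures unless an attention gesture came first."""

    def __init__(self):
        self.armed_until = None  # time at which the attention window closes

    def observe(self, gesture, timestamp):
        """Return an action name, or None if the gesture should be ignored."""
        if gesture == ATTENTION:
            self.armed_until = timestamp + WINDOW_SECONDS
            return None
        armed = self.armed_until is not None and timestamp <= self.armed_until
        self.armed_until = None  # any other gesture uses up the window
        return COMMANDS.get(gesture) if armed else None


tv = GestureFilter()
print(tv.observe("thumbs_up", 0.0))  # a stray stretch, ignored
tv.observe(ATTENTION, 1.0)
print(tv.observe("thumbs_up", 2.0))  # attention given first, accepted
```

A stretch on the couch never raises the volume, because nobody stretches right after deliberately signaling the TV. User-programmable gestures would amount to letting people edit the command table.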
Hey, you want to know a really neat way to add gestures and educate people on something that could be useful at the same time? Make one of the gesture paradigms based on sign language. Not only would five-year-olds learn to sign for the TV, but when they encounter a deaf person, they’ll be able to communicate, as long as the communication is “next,” “favorite” or “off.” It’s not perfect, but you get the idea. What’s the sign for “Conan” anyway?
But that’s just one of many approaches off the top of my head, and it took about as much time to think of as it did to finish reading the sentences criticizing things that aren’t even finalized.
Other possibilities: Apple loves one-button devices. Who’s to say Tim Cook and Jonathan Ive won’t do something wonky that is initially criticized, such as making the remote a single button that gets Siri listening with a tap or a click and hold? Let’s add in a double click to pull up the program guide or a triple tap to turn the TV on or off. It would be cheap to include with the TV and require a bare minimum of parts and either a Bluetooth or BT/WiFi chip. IR is so 1980s.
But with so many people owning iDevices, why not include those as potential remotes? A simple free app download and iTV control is as close as your iPhone/iPod/iPad. Want the program guide? Either press the iDevice button so you can look without PiP’ing your current TV show or turn the device sideways for a program listing. Easy and logical enough, right?
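To show how little logic a one-button remote would need, here is a rough Python sketch of decoding taps, multi-clicks and holds. The timing thresholds and action names are my own guesses for illustration, not anything Apple has announced.

```python
HOLD_SECONDS = 0.6     # hypothetical: a press at least this long is a hold
MULTI_CLICK_GAP = 0.4  # hypothetical: longest pause between clicks in one pattern

# Hypothetical mapping from button patterns to actions.
ACTIONS = {
    "hold": "siri_listen",
    1: "siri_listen",
    2: "program_guide",
    3: "toggle_power",
}


def decode(presses):
    """Turn (down_time, up_time) pairs, sorted by time, into a list of actions."""
    actions = []
    count = 0       # short clicks collected in the current pattern
    last_up = None  # release time of the most recent short click

    def flush():
        nonlocal count
        if count in ACTIONS:  # unknown click counts are silently ignored
            actions.append(ACTIONS[count])
        count = 0

    for down, up in presses:
        if count and down - last_up > MULTI_CLICK_GAP:
            flush()  # pause was too long, so the earlier clicks are complete
        if up - down >= HOLD_SECONDS:
            flush()
            actions.append(ACTIONS["hold"])
        else:
            count += 1
            last_up = up
    flush()
    return actions


print(decode([(0.0, 0.1), (0.3, 0.4)]))  # a double click
print(decode([(0.0, 1.0)]))              # a click and hold
```

The same table could live in an iDevice app just as easily as in a physical remote.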
Okay, back to Kraus: At one point he says that many people don’t want to sit there and either keep saying “page down” or gesture through hundreds of channels. I’m assuming he doesn’t know about recommendation apps (for example, BuddyTV and its iDevice app), program and channel filter features, or apps that let you mark favorite shows and then automatically remind you, record them if you aren’t watching, or both. Such apps are already available on the App Store from cable and satellite providers, along with independent television apps that can tie in with some providers. It would be trivial to add an API that lets all these apps control the iTV and keep doing what they each do well, slightly differently.
Why say “page down” when you could say, “Siri, what Comedies/Movies/Documentaries are on?” or “Siri, when is the next showing of NOVA?” or even “Siri, find me something I would be interested in watching.” Siri would think a second, check your viewing history, and either pull up a list of shows that others with your viewing patterns also watch, or take a cognitive leap and figure that if you love comedies and you love Star Trek, you might just like Galaxy Quest airing on TBS in five minutes. During the fall and mid-season launch windows it would be a boon to be able to say, “Show me only new shows airing this week and add them to my queue and alert calendar.”
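The “others with your viewing patterns also watch” part is the easiest piece to picture in code. What follows is a deliberately naive Python sketch: the function name, the scoring rule and the sample viewing histories are all invented for illustration, and a real recommendation service would be far more sophisticated.

```python
from collections import Counter


def recommend(my_shows, other_viewers, top_n=3):
    """Suggest shows watched by viewers whose history overlaps with mine."""
    mine = set(my_shows)
    scores = Counter()
    for shows in other_viewers:
        theirs = set(shows)
        overlap = len(mine & theirs)  # how similar this viewer is to me
        if overlap == 0:
            continue
        for show in theirs - mine:    # only shows I have not already watched
            scores[show] += overlap
    # Highest score first; ties broken alphabetically so results are stable.
    ranked = sorted(scores, key=lambda show: (-scores[show], show))
    return ranked[:top_n]


# Invented viewing histories, for illustration only.
my_history = ["Star Trek", "30 Rock"]
everyone_else = [
    ["Star Trek", "30 Rock", "Galaxy Quest"],
    ["Star Trek", "NOVA"],
    ["Home Shopping Hour"],
]
print(recommend(my_history, everyone_else))
```

With a channel lineup being such a small data set, even something this simple would beat paging down through hundreds of channels.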
Honestly, the remote control is a great interface if you are working with a dumb device, but when you combine cognition from CALO, voice control and a data set as finite as what TV shows are on, the sky’s the limit. You don’t need, nor would you want, an interface paradigm based solely on buttons when you have context-sensitive voice parsing and infinite sentence construction. I, for one, look forward to saying, “Siri, block all home-shopping channels.” And then “Siri, make me a one-page list of my most watched channels and make it the default program listing when I ask for the program guide.” Or “Siri, add this show to my favorites” (when I’m watching something but don’t know what it’s called).
I haven’t even touched on the additional intelligent behavior one could program: automatically muting commercials, or switching to another program during a commercial and automatically switching back when the program returns. Optionally, iTV could auto-filter age-inappropriate content for minors by recognizing when a child is watching alone. iTV could show related Web articles for the show you just watched, and add them to your Instapaper queue. So to answer Kraus’s question, “Do we really need an assistant built in our TV?” I say, “Hell, yeah!” Just because he can’t think of practical uses for such a feature (he mentions restaurant choices; really, that’s a feature for our iFridge!) doesn’t mean there aren’t any, nor does it mean that it shouldn’t be done. For instance:
How hard is it to find “Real Housewives of Kalamazoo” that we need our assistant to find it for us?
Not very, but what if you don’t even know that its spin-off, “Real Divorces of Beverly Hills,” is premiering tonight on Bravo? There are no moral or ethical problems with doing this. Once again: Don’t limit interaction to one vector, include multiple user levels, and make any privacy-invasive features opt-in. In short, let users set their own level of comfort and go no further. Oh, and if you want to gather ratings like Nielsen within this iTV package to both improve marketing and allow for better suggestions, for goodness’ sake give people who opt in a $10 or 15 percent discount on their monthly cable/satellite bill!
I’ll leave you with these quotes from the article, each set against a paraphrased historical declaration for perspective:
The method of interaction…does not really need fixing.
GUIs are unnecessary, and less efficient than the command line. Real users will stick to CLIs.
Both are true from a limited point of view, but both miss the advantage and mass appeal of an easy-to-use interface. And again:
Voice control is a useful technology, but predictions of its ubiquitous use are unlikely to ever come true.
This certainly is a possibility, but I think more unlikely things have come to pass.
What are your thoughts on the future of both voice and gesture interfaces? Let me know by posting a comment below.