What We Learned from Apple’s First A.I. Paper


Siri has long been the A.I. that people didn’t think of as an A.I., thanks to Apple’s PR department (the company framed Siri as a “personal assistant,” without emphasizing its machine-learning aspects). Nonetheless, it remains the artificial-intelligence backbone of whatever’s happening inside your iPhone, iPad and (more recently) Mac. In an attempt to garner some attention within the A.I. community, Apple has now published its first paper on how it’s using A.I. for image detection.

The first hints that Apple was set to begin publishing its A.I. research came in an inside look at the Siri team published by Backchannel. At the time, Apple executives acknowledged that the company had been a bit too protective of its A.I. findings, and would relax its standards a bit.

The paper, titled “Learning from Simulated and Unsupervised Images through Adversarial Training,” details how Apple uses unlabeled real images to make ‘synthetic’ (computer-generated) training images more realistic, with the goal of building better computer vision. (Other tech firms, most notably Google, also use ‘synthetic’ images from sources such as video games or movies to help their A.I. platforms learn to distinguish objects, such as a tree from a dog.)

Apple’s experts believe synthetic images are a faster way to get a computer to recognize details in an image, but that relying on them alone has a cost: “Learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions,” the paper noted. In addition, synthetic images are “often not realistic enough, leading the network to learn details only present in synthetic images and fail to generalize well on real images.”

Incidentally, what Apple details in this paper is likely already in practice. At WWDC 2016, Apple debuted an image-scanning feature for Photos, which locally scans your images to identify faces, places and objects (like trees!). It doesn’t feed your images to the cloud, or otherwise open them up to a mother brain (as Google does with its own Photos app).


How Siri Could Beat the Competition

There’s a lot to unpack within Apple’s paper, but two key themes stick out.

The first is “adversarial training,” a technique related to (though not identical with) the broader field of “adversarial machine learning.” The high-level takeaway is that adversarial training pits a machine-learning model (one that scans images, for instance) against inputs or rival models designed to fool it, an approach with a heavy bent towards security, which is something very important to Apple.

Adversarial training is often used (however effectively) for spam filtering. But given how iPhone and iPad users are currently suffering a spate of calendar spam, it seems Apple’s machine-learning technique for murdering spam isn’t fully implemented internally just yet. Whoops!

Joking aside, adversarial training adds a bit of “self-doubt” to machine learning. It attempts to bork algorithms and attack systems in order to discover vulnerabilities and weaknesses, which (in Apple’s image detection universe) can be used to help Siri correctly identify items in images more often than the competition.

Second, Apple is implementing “Simulated and Unsupervised” learning (S+U). In the paper, the company details its aim with S+U: “The task is to learn a model to improve the realism of a simulator’s output using unlabeled real data, while preserving the annotation information from the simulator.” It also utilizes a variant of Generative Adversarial Networks, in which one network produces images while a second network, the discriminator, tries to tell them apart from real ones; Apple’s version starts from synthetic images rather than random vectors.
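
The S+U aim quoted above can be sketched as a single training objective. The notation here is assumed for illustration and is not quoted from the paper: R_θ is the model being learned (call it a "refiner"), x_i is a simulator image, Y is the set of unlabeled real images, λ is a weighting factor, and the choice of an L1 distance for the second term is likewise an assumption.

```latex
% Hedged sketch of an S+U objective; all symbols are illustrative assumptions.
\tilde{x}_i = R_\theta(x_i), \qquad
\mathcal{L}_R(\theta) = \sum_i \Big[ \ell_{\text{real}}(\theta;\, x_i, \mathcal{Y})
  + \lambda \, \ell_{\text{reg}}(\theta;\, x_i) \Big], \qquad
\ell_{\text{reg}}(\theta;\, x_i) = \lVert R_\theta(x_i) - x_i \rVert_1
```

The first term rewards refined images that look real. The second penalizes the refiner for drifting too far from the simulator image, which is what keeps the simulator’s annotation (a gaze direction, say) valid for the refined image.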

“We develop a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors,” the paper continued. “We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts and stabilize training: (i) a ‘self-regularization’ term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images.” According to Apple, this enables the generation of highly realistic images.
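
Two of the modifications listed above, the ‘self-regularization’ term and the history of refined images, are simple enough to sketch in a few lines of Python. This is a minimal illustration under stated assumptions, not Apple’s implementation: images are treated as flat lists of pixel values, the regularizer is assumed to be a mean absolute difference, and the names `self_regularization`, `RefinedImageHistory` and `mix_batch` are hypothetical.

```python
import random


def self_regularization(refined, synthetic):
    """Mean absolute per-pixel difference between a refined image and its
    synthetic source (assumed form of the 'self-regularization' term)."""
    return sum(abs(r - s) for r, s in zip(refined, synthetic)) / len(synthetic)


class RefinedImageHistory:
    """Fixed-size pool of previously refined images (hypothetical helper).

    The discriminator sees a mix of freshly refined images and older ones,
    so it does not forget artifacts the refiner produced earlier in training.
    """

    def __init__(self, capacity, rng=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.images = []
        self.rng = rng if rng is not None else random.Random()

    def mix_batch(self, fresh):
        """Return a discriminator batch: up to half drawn from the history,
        the rest from the fresh batch. Then store half of the fresh images."""
        half = len(fresh) // 2
        n_old = min(half, len(self.images))
        old = self.rng.sample(self.images, n_old)
        batch = list(fresh[:len(fresh) - n_old]) + old
        self._store(fresh[:half])
        return batch

    def _store(self, new_images):
        # Fill the pool first; once full, overwrite random slots.
        for img in new_images:
            if len(self.images) < self.capacity:
                self.images.append(img)
            else:
                self.images[self.rng.randrange(self.capacity)] = img
```

In a training loop, each batch coming out of the refiner would pass through `mix_batch` before being shown to the discriminator, while `self_regularization` would be added, with some weight, to the refiner’s own loss.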

“We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation,” the paper added. “We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data.”

The English version: Apple is taking on A.I. slowly in an effort to build a more perfect model that examines actual objects and images. Rather than sprinting hard into the realm of generalizations, like some other tech firms, Apple realizes A.I. is a marathon, and is willing to hold Siri back in these early stretches in an attempt to make it that much more effective.

Apple Is Just Getting Started

The surprising thing is that Apple published a paper at all. It kept mum about its A.I. work for so long that many wondered if the company was having trouble attracting specialized talent in artificial intelligence; those who work in A.I. and machine learning want to publish their findings and contribute to the community. Although Apple prefers to keep its intellectual property under wraps as much as possible, it eventually had to concede to its experts’ desire to publish.

Our takeaway from the paper should be that Apple is still Apple: unwilling to waver on security and privacy, and tightly controlled when it comes to opening up about what it’s really doing. What we can glean from all this is that Apple is serious about A.I. for the long haul.