Recognizing (and Solving) Bad Algorithms

In theory, an algorithm is supposed to be an impartial set of instructions that helps ordinary, flawed humans draw conclusions from a sea of data. But such algorithms only exist in theory. In the real world, an algorithm is not an objective tool; it is as human as the programmer who codes it—and humans are biased.

Stories about flawed algorithms crop up in the news on a regular basis. For example, the local government in Broward County, Florida, used software to assess the likelihood that criminals would offend again; it mistakenly assigned a lower risk score to a career criminal than to a first-time offender, simply because the former was white and the latter black. Or take Facebook, which switched to an algorithm to select “trending” news stories. Although the human Facebook employees previously tasked with selecting those stories had been accused of anti-conservative bias, the algorithm proved no better, often choosing fake news stories for the feed.

These sorts of mistakes can land the wrong people in jail or tip elections. Even when the error doesn’t carry a life-or-death consequence, it can still have wide-ranging effects. Businesses rely on algorithms to crunch through large datasets and make predictive decisions on marketing products, services and advertising—and they are not eager to make mistakes.

So when programmers work on an algorithm, how do they make sure their code isn’t biased? That’s a conundrum that many in the tech community are trying to solve.

Watch Your Language

Let’s start with something as ostensibly benign as search. While a casual user does not generally think of his or her queries as “loaded questions,” search algorithms can nonetheless return gender-biased results.

“Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings” (PDF) is a paper that looks at how machine learning can be swayed by word choice when the words in question have male or female associations. The paper’s authors used the publicly available word2vec embedding, trained on a corpus of Google News articles and containing 3 million English words.

“Taking data from the Web and making hiring decisions [based on it] is setting (yourself) up for the problem,” said Adam Kalai, one of the paper’s authors and a principal researcher at Microsoft Research.

One example offered by the paper is a hypothetical search for a particular university’s list of computer science Ph.D. candidates. The search could easily retrieve results for every name on the school’s list, but the rankings may be skewed because the search program might associate a woman’s name with a typically female job (homemaker) rather than with a computer-science student. “In this hypothetical example, the usage of word embedding makes it even harder for women to be recognized as computer scientists and would contribute to widening the existing gender gap in computer science,” the paper reads.

“I think a computer can be de-biased more consistently than a human,” Kalai said. Algorithms can be audited and de-biased, told to ignore age and gender, whereas hiring managers would have to work against their own biases while hiring.

The algorithm will “naively” fit itself to the language within the dataset, added co-author Tolga Bolukbasi, who is now a research assistant at Boston University. “You have to formalize what ‘bad’ means or what bias means.” “Algorithms can reflect a bias if enough people write it that way,” Bolukbasi continued.

One approach to solving this issue is to change the process of de-biasing: it might be better to craft the algorithm to ignore a biased output. “It’s going to take some time for best practices to develop,” he explained.
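To make the paper’s point concrete, here is a minimal sketch of how anyone can probe the same publicly available Google News word2vec vectors for gendered associations. It is not the authors’ own tooling; it assumes the gensim Python library and the standard pre-trained file GoogleNews-vectors-negative300.bin.

```python
# A minimal sketch of probing a word embedding for gendered associations.
# Assumes the gensim library and the public Google News word2vec vectors;
# this is an illustration, not the paper's own code.
from gensim.models import KeyedVectors

# Load the pre-trained embedding (roughly 3 million words, 300 dimensions).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# The paper's headline analogy: man : computer programmer :: woman : ?
# most_similar() solves the analogy by vector arithmetic on the embeddings.
print(vectors.most_similar(positive=["woman", "programmer"], negative=["man"], topn=5))

# Compare how strongly a few job titles lean toward "he" versus "she".
for job in ["programmer", "homemaker", "nurse", "architect"]:
    lean = vectors.similarity(job, "he") - vectors.similarity(job, "she")
    print(f"{job:>10}: {'male' if lean > 0 else 'female'}-leaning ({lean:+.3f})")
```

Queries like these are the kind that surface the programmer/homemaker association the authors set out to remove; a de-biased embedding should return far more neutral results for the same probes.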

Watch Your Prices

How about algorithms that engage in price fixing? Competition is the bedrock of capitalism, so it’s a crime when competitors fix markets at the expense of the customer.

Ariel Ezrachi, a professor of competition law at Oxford, and Maurice Stucke, an associate professor at the University of Tennessee College of Law, pondered the implications of artificial intelligence and market fixing in a paper titled “Artificial Intelligence & Collusion: When Computers Inhibit Competition.” They examined the problem at greater length in their book “Virtual Competition,” recently reviewed in the Wall Street Journal. Ezrachi declined to be interviewed for this article, but the paper he co-wrote with Stucke outlined the problem and its implications.

Combine artificial intelligence’s ability to learn from past transaction data with real-time pricing operations, and you have the potential for computers to collude: either knowingly, per the instructions of the humans behind the machine, or unknowingly, through the interplay of action and reaction between one computer and another. It is the potential of this “Autonomous Machine” that Ezrachi and Stucke flagged as the biggest challenge to anti-collusion law.

When price-fixing involves humans, one must show there was express agreement and accommodating behavior among the parties. But things become more subjective if a prosecutor must show that computers “agreed” to fix a price. Without any clear sign of human intent, how can one prove that a computer acted illegally?

“Policymakers must recognize the dwindling relevance of traditional antitrust concepts of ‘agreement’ and ‘intent’ in the age of Big Data and Big Analytics. Rather than redefining agreement or intent, perhaps policymakers need to introduce checks and balances into the original pricing algorithm and a monitoring function,” the authors concluded in the paper.
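To see how prices can drift upward without anyone “agreeing” to anything, consider a toy simulation. It is not drawn from Ezrachi and Stucke’s work; the rules, costs and prices are invented purely to illustrate the action-reaction dynamic between two pricing bots.

```python
# A toy simulation (invented for illustration) of two pricing bots that
# each simply react to the other's last posted price. Neither bot is
# told to collude, yet prices ratchet upward and stay above cost.
# Demand and lost sales are deliberately ignored to keep the sketch short.
COST = 10.00    # hypothetical unit cost
START = 12.00   # hypothetical starting price for both sellers

def follower(rival_price):
    """Match whatever the rival is currently charging, never below cost."""
    return max(COST, rival_price)

def prober(rival_price, period):
    """Match the rival, but probe a 5% increase every 10th period."""
    price = max(COST, rival_price)
    return round(price * 1.05, 2) if period % 10 == 0 else price

price_a = price_b = START
for period in range(1, 101):
    price_a = prober(price_b, period)   # A reacts to B's posted price
    price_b = follower(price_a)         # B then reacts to A's new price

# Both prices end well above the starting price and far above cost,
# even though no "agreement" ever existed between the two programs.
print(f"After 100 periods: A = {price_a:.2f}, B = {price_b:.2f}")
```

Neither bot was instructed to collude, and neither ever communicated with the other beyond posting a price, yet the market settles well above cost. That is precisely the scenario that strains the traditional antitrust notions of agreement and intent.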

Watch Your Data

What about the company that wants to use algorithms and machine learning just to sell consumers more stuff? A public relations disaster, or at least a marketing embarrassment, is possible if the algorithm powering that mission is faulty (and the data even more so).

“A.I. models are the product of human and computational cognition,” said Matt Bencke, founder and CEO of Spare5, a firm that “tutors” artificial-intelligence (A.I.) systems for corporate clients. Bias can enter the system at several points: when the A.I. model is created, when it is trained, or when it is integrated into real-world activity. That last stage, when the A.I. finds itself interacting with the larger market, is when creators often notice biases they should have caught in testing, Bencke added.

Spare5 got a taste of this when it trained IBM’s Watson about the game of golf. The original plan was to use Watson’s output as part of the coverage of the Masters Tournament in Augusta, GA. “Most of the time, it worked. Sometimes it was hilariously wrong,” Bencke recalled.

“The best A.I. integrates with the right human in the right loop,” he continued. “You need to have a qualified set of people giving feedback to the A.I. model in deployment.” If you want the A.I. model to stay current, you will also need to retrain it periodically.

A good example of an A.I. problem occurred at GumGum, a digital advertising firm that provides “in-image” advertising. Working for a nationally known cosmetics company, GumGum was tasked with picking images of models that showed “full lips,” recalled Cambron Carter, manager of image technology. “Traditionally, you go handcraft a feature to capture (a picture) of someone with full lips,” Carter said. Instead, the team left that task to GumGum’s machine-learning capabilities, feeding in pictures that the A.I. would use to “learn” the feature “full lips.”

After several million runs, the programmers got their results: all the pictures were of women. “We built in a bias without really thinking about it,” Carter said. “We did not think through having examples of both genders… We were looking for full, pouty lips. We had to define what that meant.”

People who are using neural networks “have a choice of data. We can’t penalize it (the A.I.) for giving a biased output,” Carter continued. “If I handcrafted it, the bias would have come from me.” So when an output deviates from an expected result, the first thing to do is check the data; the output may reflect the biases of the people who chose the dataset. If you want to build an A.I. that “reflects” ourselves, “it starts with us,” Carter noted.
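The practical lesson from the GumGum story is to audit the training data before blaming the model. A minimal sketch of that kind of check, using entirely hypothetical image metadata rather than anything from GumGum’s actual pipeline, might look like this:

```python
# A minimal sketch of auditing a training set's composition before training.
# The filenames, labels and attributes below are hypothetical; the point is
# simply to count how an attribute of interest is distributed in the data.
from collections import Counter

# Hypothetical metadata: each labeled example records the attribute values
# that matter for bias (here, the gender presented in the image).
training_examples = [
    {"file": "img_0001.jpg", "label": "full_lips", "gender": "female"},
    {"file": "img_0002.jpg", "label": "full_lips", "gender": "female"},
    {"file": "img_0003.jpg", "label": "full_lips", "gender": "male"},
    # ... thousands more rows in a real dataset
]

def audit(examples, attribute):
    """Print how an attribute is distributed across the training set."""
    counts = Counter(ex[attribute] for ex in examples)
    total = sum(counts.values())
    for value, count in counts.most_common():
        print(f"{attribute}={value}: {count} ({count / total:.0%})")

audit(training_examples, "gender")
# If one group dominates, the model will "naively" learn that skew;
# rebalance the dataset and retrain rather than penalizing the model.
```

If one group dominates the counts, the fix is to rebalance the dataset and retrain, not to expect the model to correct for a skew it was never shown.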