Data Science Can’t Fix Hiring (Yet)

January 8, 2020

The newest development in hiring, which is both promising and worrying, is the rise of data science-driven algorithms to find and assess job candidates. More than 100 vendors are creating and selling these tools to companies.

Unfortunately, data science has not yet delivered what employers hoped for. Vendors of these new tools promise they will help reduce the role that social bias plays in hiring. Algorithms can indeed help identify good job candidates who would previously have been screened out for lack of a certain education or social pedigree. But these tools may also identify and promote the use of predictive variables that are (or should be) troubling.

Because most data scientists seem to know so little about the context of employment, their tools are often worse than nothing. An astonishing percentage build their models by simply looking at attributes of the “best performers” in workplaces and then identifying which job candidates have the same attributes. They use anything that’s easy to measure: facial expressions, word choice, comments on social media, and so forth. But failing to check whether those attributes actually distinguish high-performing employees from low-performing ones limits the models’ usefulness. Furthermore, scooping up data from social media or the websites people have visited raises important questions about privacy. The information may be legally accessible, but the individuals who created the postings didn’t intend or authorize it to be used for such purposes. Is it fair that something you posted as an undergrad can end up shaping an algorithm’s assessment of you a generation later?
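To see why that missing check matters, here is a minimal sketch in Python (every name and number below is invented for illustration, not drawn from any vendor’s product): an attribute can be common among top performers and just as common among weak ones, in which case it predicts nothing.

```python
# A minimal, hypothetical illustration: an attribute that is frequent among
# "best performers" is only informative if it is *less* frequent among
# low performers. All data below is invented.

def attribute_rate(employees, attribute):
    """Share of a group whose records show the given attribute."""
    return sum(e[attribute] for e in employees) / len(employees)

# Invented records: 1 = the attribute is present (say, "confident word choice").
high_performers = [{"confident_language": 1}, {"confident_language": 1},
                   {"confident_language": 1}, {"confident_language": 0}]
low_performers  = [{"confident_language": 1}, {"confident_language": 1},
                   {"confident_language": 1}, {"confident_language": 0}]

high_rate = attribute_rate(high_performers, "confident_language")
low_rate = attribute_rate(low_performers, "confident_language")

# Looking only at top performers, 75% share the attribute, so it looks like a
# signal. Comparing against low performers shows zero difference: no signal.
print(f"high performers: {high_rate:.0%}, low performers: {low_rate:.0%}")
print(f"difference: {high_rate - low_rate:+.0%}")
```

The same logic applies to facial expressions or social media comments: frequency among the stars, by itself, says nothing about predictive value.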

Yet another issue is that all analytic approaches to picking candidates are backward looking, in the sense that they are based on outcomes that have already happened. (Algorithms are especially reliant on past experiences in part because building them requires lots and lots of observations—many years’ worth of job performance data even for a large employer.) As Amazon learned, the past may be very different from the future you seek. It discovered that the hiring algorithm it had been working on since 2014 gave lower scores to women—even to attributes associated with women, such as participating in women’s studies programs—because historically the best performers in the company had disproportionately been men. So the algorithm looked for people just like them. Unable to fix that problem, the company stopped using the algorithm in 2017. Nonetheless, many other companies are pressing ahead.
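The dynamic is easy to reproduce. The following is a hypothetical simulation in Python, not a reconstruction of Amazon’s actual system: true skill is distributed identically across genders, but the historical performance ratings that supply the training labels carry a bias, so even a naive model learns a negative weight for a women-coded signal.

```python
import random

random.seed(0)

# Hypothetical setup: true skill is independent of gender, but the historical
# ratings that define "top performer" (the label a hiring model would be
# trained to predict) are biased against women.
employees = []
for _ in range(10_000):
    is_woman = random.random() < 0.2           # historically skewed workforce
    skill = random.gauss(0, 1)                 # identical distribution for all
    rating = skill - (0.5 if is_woman else 0)  # biased historical ratings
    employees.append((is_woman, rating > 1.0)) # the label the algorithm sees

def top_rate(group):
    """Fraction of a group labeled 'top performer' in the historical data."""
    return sum(top for _, top in group) / len(group)

women = [e for e in employees if e[0]]
men = [e for e in employees if not e[0]]

# A naive "learned weight" for the women-coded signal: the difference in
# label rates. Any model trained on these labels bakes the bias in.
print(f"top-performer rate, men:   {top_rate(men):.1%}")
print(f"top-performer rate, women: {top_rate(women):.1%}")
print(f"learned weight for women-coded signal: {top_rate(women) - top_rate(men):+.3f}")
```

Collecting more of the same data does not help; the bias lives in the labels the model is asked to predict.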

Hiring is so consequential that it is governed not just by legal frameworks but by fundamental notions of fairness. The fact that some criterion is associated with good job performance is helpful, but not sufficient for using it in hiring. Take a variable that data scientists have found to have predictive value: commuting distance to the job. According to the data, people with longer commutes suffer higher rates of attrition. However, commuting distance is governed by where you live, which is governed by housing prices, which relate to income and also to race. Picking whom to hire on the basis of where they live is likely to have an adverse impact on protected groups such as racial minorities. Companies violate the law if they use hiring criteria that have adverse impacts, unless no other criterion predicts performance at least as well as the one being used, and that is extremely difficult to determine with machine-learning algorithms. Even then, to stay on the right side of the law, they must show why the criterion creates good performance. That might be possible in the case of commuting time. However, at least for the moment, it is not for facial expressions, social media postings, or other measures whose significance companies cannot demonstrate.
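In U.S. practice, the standard first screen for adverse impact is the EEOC’s four-fifths rule: a criterion is presumptively suspect if any protected group is selected at less than 80% of the rate of the most-selected group. Here is a minimal sketch of that check, with invented applicant and hiring counts:

```python
# The EEOC "four-fifths rule" screen for adverse impact.
# Group names and counts are invented for illustration.

applicants = {"group_a": 200, "group_b": 150}  # applicants per group
hired      = {"group_a": 60,  "group_b": 24}   # hires per group

rates = {g: hired[g] / applicants[g] for g in applicants}
best = max(rates.values())

for group, rate in rates.items():
    ratio = rate / best
    flag = "potential adverse impact" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.1%}, impact ratio {ratio:.2f} -> {flag}")
```

Here group_b is hired at 16% against group_a’s 30%, an impact ratio of 0.53, well below the four-fifths threshold. And a flag like this is only the start: the employer must still be able to show why the criterion creates good performance.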

In the end, the drawback to using algorithms is that we’re trying to use them on the cheap: building them by looking only at best performers rather than at all performers, using only measures that are easy to gather, and relying on vendors’ claims that the algorithms work rather than observing the results with our own employees.