Big tech’s AI runs on datasets built by the impoverished

Capitalism has a nasty habit of anchoring those beholden to it in an equilibrium where both the very poor and the very wealthy become similarly entrenched.

A scientist making six figures in San Francisco pushes the boundaries of AI so a trillion-dollar company can stay on top. And at the bottom of it all lie legions of impoverished workers doing 80 percent of the work. Some of the workers are working their way out of poverty. Sadly, some are not. The primary difference is whether they have job security and receive a living wage.

One of the world’s largest sources of dataset-labeling employment — the primary job performed by these workers — is Samasource. The company serves a quarter of the Fortune 50 and much of big tech including Facebook, Microsoft, and Google. Its datasets fuel everything from fashion AI to self-driving cars.

And, unlike predatory crowd-sourcing companies that pay workers the lowest possible wages on a job-by-job basis, Samasource offers its workers security and a guaranteed living wage.

This kind of work is often performed by Africans and Southeast Asians who’ve been displaced or become unemployed, many of whom previously worked as farm hands or in other unskilled jobs. They often have no choice but to accept crowd-sourcing jobs that pay as little as one dollar an hour, which makes them easy to exploit for companies with large volumes of low-skill work.

It doesn’t have to be this way. Samasource relies upon international advocacy groups to determine what a living wage is based on the worker’s geography, and then it pays workers at least that much. Big tech companies are happy to hand off the work, since they can source data ethically and save money doing so.

Best of all, the workers gain confidence and get an entry-level job with security, something many of them have never had. This is one instance where AI is creating jobs, rather than taking them away.

This is because the datasets used to train AI models require human annotation. No matter how smart machines are, they need a person to tell them what they’re looking at. There currently isn’t any way around this.

So when you read that a team of scientists fed an AI millions of images of stop signs, cars, and pedestrians in order to train a neural network to recognize them in the real world, that means that humans drew a box around millions of stop signs, cars, and pedestrians in images. And then a company like Samasource built a dataset based on those human labels.
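To make that concrete, here is a minimal sketch of what one human-drawn box and the resulting dataset might look like. Everything in it is an assumption for illustration: the `BoxLabel` fields, the `build_dataset` helper, and the file names are hypothetical and do not describe Samasource's actual format or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxLabel:
    """One human-drawn box. All field names here are hypothetical."""
    image_id: str
    category: str   # e.g. "stop_sign", "car", "pedestrian"
    x: float        # left edge of the box, in pixels
    y: float        # top edge of the box, in pixels
    width: float
    height: float


def build_dataset(labels):
    """Group individual box labels by image to form a simple dataset."""
    dataset = {}
    for label in labels:
        dataset.setdefault(label.image_id, []).append(label)
    return dataset


# Made-up example labels, as an annotator might produce them.
labels = [
    BoxLabel("img_001.jpg", "stop_sign", 40, 12, 64, 64),
    BoxLabel("img_001.jpg", "pedestrian", 210, 90, 35, 110),
    BoxLabel("img_002.jpg", "car", 15, 140, 220, 120),
]
dataset = build_dataset(labels)
```

Each entry in `dataset` maps one image to the boxes a person drew on it. Multiply that by millions of images and you have the kind of labeled data a neural network is trained on.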

Samasource’s SamaHub platform in action (Credit: Samasource)

There are plenty of crowd-sourcing jobs available involving data annotation. But few of them pay a living wage, and even fewer offer the opportunity to work on some of the world’s most important AI projects.

TNW spoke with Loic Juillard, VP of Engineering for Samasource. He told us the work that he and others do there is rewarding not just for the impact the company has, but also for the actual work it produces:

“What’s exciting is the opportunity to build successful AI models. This is all new! There’s a slew of engineering problems that can only be solved by AI.”

On the other side of the equation lie companies that feel creating jobs, by itself, relieves them of the moral obligation to pay workers a fair, living wage.

James Cham, a VC at Bloomberg Beta, told Axios:

“The companies derive benefit over a long time, while workers are paid just once. They are paid like sharecroppers, making subsistence wages. The landowners get all the returns because of how the system is set up.”

At the end of the day, if you can’t afford to pay workers a fair wage, the jobs you create might not be suitable for humans. We all want future tech like self-driving cars and brain-computer interfaces, but not at the cost of our humanity.