There's a fascinating article in today's Globe and Mail Report On Business about the whole industry that has grown up around teaching artificial intelligence (AI) how to be more human.
I knew that it happened, of course. Generative AI learns independently from the internet to some extent, but humans wrote the internet (for the most part), and AI applications built on "large language models" (LLMs), like OpenAI's ChatGPT and Google DeepMind's Gemini, pride themselves on appearing as human as humanly possible. So, some direct teaching by real people is essential (although that is also where much of the concern about AI learning bad habits, racist attitudes, etc., comes from).
But I had never thought about how that process of teaching and fine-tuning actually worked. It turns out that there are numerous AI companies around the world (with names like Surge AI, Scale AI, Remotasks, Cohere and Data Annotation Tech) that employ many thousands of people on a piecemeal basis to check AI output and to provide the human input that AI can learn from.
Known as "reinforcement learning from human feedback" or RLHF, this work is painstaking, ongoing and essential to AI development. It is particularly important in the battle to reduce bias in AI responses, as well as to improve accuracy, although the people who provide AI with feedback can of course bring their own biases, particularly on matters of opinion or where there is no single definitive answer. In that case, the "majority wins", potentially disadvantaging under-represented groups.
Left to its own devices, AI can come to some alarming and apparently illogical conclusions. One example quoted concerns an algorithm developed in 2017 to identify skin lesions and cancers from photographs: because malignant lesions were more often photographed with a ruler alongside for scale, the algorithm learned to associate the ruler itself with malignancy. As we have all read, chatbots like ChatGPT, sophisticated as they are, can still give harmful information and make appalling factual errors, which is why there is an ongoing need for human checkers.
These "raters", as they are often called, are given tasks like comparing two paragraphs of AI-generated text for the most human-sounding one, labelling pictures or video clips with the names of body parts, choosing the best definition of a word from a selection, or drawing boxes around discrete parts of a picture, etc. Some of these tasks may take hours or even days to perform, some may take literally seconds, and raters may be flagged for poor performance if they take too long over a particular task (or too little time!), and they may have their pay docked or even have their contracts terminated if they do not "perform" (i.e. follow the more-or-less arbitrary rules) adequately.
Tasks tend to come in piecemeal, at sporadic and unpredictable times, and raters may spend hours or even days just waiting for tasks to arrive (and then be swamped for a while). It is the ultimate in work-from-home gig-economy employment: pay rates may be well below minimum-wage equivalents, with of course no employment benefits, pensions, etc. Raters in developing countries may see absolutely pitiful payments, and for small, quick tasks, pay may be allotted in literally fractions of a penny per task. Some tasks are less well-defined than others, support and problem resolution for raters may be spotty at best, and on-the-job training may be non-existent.
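To put "fractions of a penny" in perspective, here is some back-of-the-envelope arithmetic with entirely made-up numbers:

```python
# Hypothetical figures only, to show how micro-payments per task
# can translate into an hourly rate.

pay_per_task_usd = 0.005   # half a US cent per task (assumed)
seconds_per_task = 15      # assumed average completion time

tasks_per_hour = 3600 / seconds_per_task          # 240 tasks
hourly_rate = tasks_per_hour * pay_per_task_usd   # $1.20
print(f"${hourly_rate:.2f} per hour, before any unpaid waiting time")
```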
There are some openings for more specialized expert raters (e.g. experts in law, biology or technology), who can expect better pay but the same hand-to-mouth, unpredictable working times and conditions. There are also opportunities for native speakers of certain languages, whether it be Polish, Bulgarian or Bangla. Curiously, there seems to be little vetting or follow-up on how proficient a language speaker actually is, or whether a purported expert is qualified in any way.
The article provided an eye-opening and perhaps chastening glimpse into a whole AI ecosystem I knew nothing about. Whether it will make AI more accurate, more equitable, more compassionate, more HUMAN? Well, that remains to be seen.