Welcome to the second installment of my series, AI Buyers Guide for Human Resources (HR) Professionals. This series is designed to equip HR professionals, who play a crucial role in selecting, deploying, and managing AI-based HR Tech solutions in the enterprise, with the knowledge to carry out those tasks confidently. The insights shared here will benefit not only HR professionals but any buyer of AI-based software. I trust you will find this information valuable, and I look forward to your feedback and comments.
ChatGPT continues to dominate AI news. With hundreds of millions of users at last count, it seems that everyone has jumped on the LLM bandwagon, and the media is painting it as a universal AI that can solve just about any problem it’s tasked with. While it is impressive, and I expect it will continue to improve as users find more and more creative ways to apply it, it cannot (and never will) solve every kind of task that the vast array of AI/ML approaches and algorithms can solve today.
Of deep concern to practitioners and researchers is the misconception of its capabilities and its use as a “source of truth.” Consider the lawyers who used ChatGPT to write a legal brief containing six fabricated citations. After being sanctioned by the court, the law firm issued the statement, “We made a good faith mistake in failing to believe that a piece of technology could be making up cases out of whole cloth.” 1
More alarming are concerns about the use of LLMs in medical diagnoses, the creation of treatment plans, and the like. Consider a recent research paper published in the Journal of Medical Internet Research that evaluated the accuracy of ChatGPT’s diagnoses and found that “ChatGPT achieved an overall accuracy of 71.7 percent across 36 clinical vignettes.” 2
Deep Learning Neural Networks and LLMs perform very well at specific tasks and, in some cases, much better than other AI approaches. However, they have their limits, as the cases above show. Why are they good at some tasks and not so good at others, and why are they not a universal AI solution? Let’s start with a definition: LLM stands for “Large Language Model,” a machine learning model trained, in AI parlance, to complete a sequence. Tasked with finishing the sequence “Jack and Jill went…,” a well-trained LLM will respond with “…up the hill to fetch a pail of water.” LLMs can be trained to complete any sequence, from natural languages to programming languages to symbolic languages and more. In the broadest sense, a language is any system of symbols, letters, numerals, and rules that transmits information. Think about the examples above of LLM applications gone wrong. Does it now make sense why LLMs are unsuitable for all tasks? ChatGPT is much more than an LLM. It is an LLM that has undergone further fine-tuning to learn how to follow instructions. However, at its core, it is still an LLM, with all the limitations of LLMs.
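To make “completing a sequence” concrete, here is a minimal sketch of asking a model to finish the nursery rhyme. It assumes the official `openai` Python package (v1+) is installed and an `OPENAI_API_KEY` environment variable is set; the prompt and model choice are illustrative, not code from any production system.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# An LLM's core skill: given the start of a sequence, produce a likely continuation.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Complete the sequence: Jack and Jill went"}],
)
print(response.choices[0].message.content)  # e.g., "...up the hill to fetch a pail of water."
```

The model is not looking the rhyme up anywhere; it is generating the statistically likely continuation of the sequence, which is exactly why its output can be fluent and wrong at the same time.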
In my world, HR Tech, I recently spoke to a very senior analyst who covers the HR Tech industry. He posited that the only AI of value is “big data” AI: Deep Learning Neural Networks and LLMs that require vast amounts of training data, data that is readily available only to a small handful of huge companies with access to enormous computing resources. I hear more and more VCs, analysts, CEOs, and other business decision-makers making the same ill-informed claim. When deciding which approach or algorithm to use to solve a particular problem, or whether AI should be applied at all, I turn to the following advice from experts in the field:
“Any problem that your in-house expert can solve in a 10-30-minute telephone call can be developed as an expert system [classic AI approaches/algorithms].” 3
“If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI [supervised machine learning, specifically, function approximation] either now or soon… [Such] tasks are ripe for automation. However, they often fit into a larger context or business process; figuring out these linkages to the rest of your business is also important.” 4
Ginac’s corollary to Andrew Ng’s “one second of thought” proposition is that anything requiring more than a few seconds of thought is unlikely to be automated with supervised machine learning, including Neural Networks/Deep Learning, at least not with today’s approaches.
When thinking about ways to apply AI/ML, these two rules weigh heavily on what we choose to automate and how we automate it. Recently, we embarked on a project to automate the updating of thousands of out-of-date learning references in one of our data sets. For example, given a learning reference in the form of a book, with an author, description, publisher, date of publication, and ISBN, that was published more than five years ago, is there a recently published book that covers the same material?
As one of my former AI professors at Georgia Tech is fond of saying, “Do the easy thing first!” The easy thing to do, or at least to try, was to see how OpenAI’s gpt-4 model would fare when tasked with “generating” updated references for a bit over 3,000 books. Iterating over the dataset with a carefully crafted prompt, we had the model generate a book title, a description, a publisher, a publication date within the past five years, and even an ISBN for each one! Given our understanding of LLMs, we did not unquestioningly trust the results and proceeded to verify each reference generated by the model. It turns out that only 10% of the books it generated were real.
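Schematically, the generation step looked something like the sketch below. This is an illustration, not the project’s actual code: the prompt wording, field names, and helper function are assumptions, and it uses the `openai` Python package (v1+).

```python
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "The following book reference is more than five years old:\n"
    "Title: {title}\nAuthor: {author}\nPublisher: {publisher}\n"
    "Published: {published}\nISBN: {isbn}\n\n"
    "Suggest a recently published book (within the past five years) covering "
    "the same material. Reply with title, author, description, publisher, "
    "publication date, and ISBN."
)

def suggest_replacement(ref: dict) -> str:
    """Ask the model for an updated reference. The output is a candidate only;
    it must be verified against a source of truth before use."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(**ref)}],
    )
    return response.choices[0].message.content
```

The model will happily return a complete, plausible-looking reference for every input; nothing in the output signals which 90% are fabrications. That is what makes verification non-negotiable.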
This phenomenon is known as model “hallucination,” and it is precisely why LLMs should not be used as a source of truth, even for tasks they perform quite well. Does that mean that LLMs are utterly useless for this particular task? Not entirely, it turns out. The gpt-4 model does a reasonably good job of inference. For example, given the summary of a book, it can infer subjects that the book is likely to cover. Now, with a title, description, and a list of subjects, we can “ground” the model with data from source-of-truth databases, e.g., the Library of Congress catalog or Google Books, and apply other non-LLM AI techniques to confirm the similarity of the original to the updated reference.
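As an example of the grounding step, the sketch below checks a candidate reference against the public Google Books API, one of the source-of-truth catalogs mentioned above. The function name is an illustrative assumption, and real verification would also compare titles, authors, and subjects rather than stopping at ISBN existence.

```python
import requests

def isbn_is_real(isbn: str) -> bool:
    """Ground a model-generated ISBN against Google Books: a candidate
    reference survives only if it resolves to an actual catalog entry."""
    resp = requests.get(
        "https://www.googleapis.com/books/v1/volumes",
        params={"q": f"isbn:{isbn}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("totalItems", 0) > 0

# A reference whose ISBN resolves to a real catalog entry passes this gate;
# fabricated ISBNs, like the roughly 90% in the experiment above, do not.
```

Existence is only the first gate; confirming that the new book actually covers the same material as the old one is where the non-LLM similarity techniques come in.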
The journey of applying AI and LLMs like ChatGPT in the HR Tech landscape underscores a critical lesson: the effective use of AI requires a blend of technological understanding and practical wisdom. Despite their advanced capabilities and versatility, LLMs are not infallible or universally applicable tools. They excel at generating and inferring information within their trained domain but fall short at discerning fact from fiction and at tasks requiring deep contextual understanding or ethical judgment. This limitation, often manifested as model hallucination, necessitates a cautious and informed approach to their application.
The key lies in leveraging AI as a tool to augment human intelligence and expertise rather than as a standalone solution. By combining the inference abilities of LLMs with data verification from reliable sources and complementary AI technologies, we can harness the power of these models more effectively and responsibly. It’s about finding the right balance: utilizing the strengths of AI to enhance our capabilities while being acutely aware of its limitations and ensuring that its application is grounded in reality. As we continue to explore the frontiers of AI in various fields, this mindful approach will be crucial in unlocking the true potential of these technologies in an innovative and ethically sound way.
ENDNOTES
1 Reuters. (2023, June 22). New York Lawyers Sanctioned for Using Fake ChatGPT Cases in Legal Brief. Retrieved from https://bit.ly/3WJUwNM
2 Rao, A., Pang, M., Kim, J., Kamineni, M., Lie, W., Prasad, A. K., Landman, A., Dreyer, K., & Succi, M. D. (2023). Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study. Journal of Medical Internet Research. https://www.jmir.org/2023/1/e48659/
3 Firebaugh, M. W. (1989). Artificial Intelligence. https://bit.ly/3yoR0yo
4 Ng, A. (2017, September 21). Andrew Ng: What AI Can and Can’t Do. Harvard Business Review. https://bit.ly/3KaGxJi