Merge pull request #110 from fhdsl/add-explainer-of-how-llms-work

ehumph · web-flow · commit b9c2b7f24d90 · 2023-11-06T21:16:32.000-05:00
[Introduction] Adding details on how LLMs predict words
diff --git a/01-intro.Rmd b/01-intro.Rmd
@@ -72,7 +72,9 @@ We must always be aware of the potential for harm and deliberately take steps to
 
 Humans have been interacting with AI chatbots for years. In fact, Alan Turing is credited with coming up with the concept for chatbots as early as 1950. Chatbots are software-based systems that interact with humans typically by text or speech inputs, rather than code. They mimic some human activity [@wikipedia_chatbot_2023; @abdulla2022chatbots] based on these language inputs. They process the inputs using natural language processing commonly abbreviated as NLP. NLP is a kind of AI that uses human text or speech and parses the language to determine structures and patterns to extract meaning. NLP uses large amounts of language data (such as books, websites etc.) to train AI systems to identify these structures and patterns. For example, the AI model might identify when a sentence is a question or a statement by examining various features in a prompt such as the inclusion of a question mark of the use of words often used in questions [@wikipedia_natural_2023; @cahn2017chatbot].
 
-The methods used for chatbots have evolved over time. Now chatbots often utilize AI methods like [deep learning](https://en.wikipedia.org/wiki/Deep_learning) (which involve multiple layers of abstractions of the input data [@wikipedia_deep_learning_2023]) to extract meaning from the language data [@wikipedia_natural_2023]. As these methods use large quantities of text, they are therefore often called large language models [@wikipedia_large_language_2023].
+The methods used for chatbots have evolved over time. Now chatbots often utilize AI methods like [deep learning](https://en.wikipedia.org/wiki/Deep_learning) (which involve multiple layers of abstractions of the input data [@wikipedia_deep_learning_2023]) to extract meaning from the language data [@wikipedia_natural_2023]. As these methods use large quantities of text, they are therefore often called large language models, or LLMs [@wikipedia_large_language_2023].
+
+Although it might _seem_ like LLMs are talking to you when you interact with them, it's important to remember they aren't actually thinking. Instead, LLMs are simply putting together tokens, or parts of words, based on a huge distance matrix created using an LLM's training data set. Essentially, an LLM's program figures out how frequently (and in what contexts) different words show up together in the training data. For example, the word "example" is often paired with the word "for" in the text for this course. An LLM trained on this course would then be more likely to create the phrase "for example" than the phrase "for apples", as the training data includes multiple instances of the first phrase but only one instance of the second. (To be precise, the LLM would predict the tokens "ex", "am", and "ple", but we see it as the word "example".) If you're interested in learning more, check out this excellent [visual article](https://ig.ft.com/generative-ai/) by the Financial Times (we are not affiliated with them).
 
 Despite the fact that chatbots have been around awhile, the popularity of OpenAI's ChatGPT and DALL-E programs has sparked a recent surge of interest. These chatbots are in part particularly powerful due to the fact that large amounts of computing power were used to train their NLP models on very large datasets [@caldarini2022literature; @cahn2017chatbot]. Large language model AIs can be divided into two categories: those that can be reached using an internet browser, and those that can be reached using an integrated development environment (IDE).
 
diff --git a/resources/dictionary.txt b/resources/dictionary.txt
@@ -55,6 +55,9 @@ Jupyter
 fyi
 Leanpub
 LLaMA
+LLM
+LLM's
+LLMs
 macOS
 Markua
 mentorship
@@ -68,6 +71,7 @@ Overcomplexity
 Pandoc
 Phind
 PII
+ple
 png
 polymorphisms
 Pre
@@ -79,11 +83,13 @@ repo
 roadmap
 RStudio
 SageMaker
+socio
 str
 stackoverflow
 StackOverflow
 scalability
 scalable
+subdisciplines
 suboptimal
 summarization
 syntaxes