Artificially Intelligent Chatbots Think On: What’s New with ChatGPT’s o3 Model? Exploring a Machine’s Ability to Reason within Human Languages in February 2025
ChatGPT users have a new option for interacting with the bot: they can ask it to reason, or ‘think,’ before it responds to their prompts. While the option is selected, ChatGPT uses its new reasoning model, o3.
How It Works
Precursor Knowledge to Contemporary Chatbots
At their core, the artificially intelligent chatbots we know today are complex statistical models of a language that consider context when calculating which words to use in response to a prompt. Although it took time for computer scientists to build machines with enough memory to calculate and store all the statistics these chatbots require, and to think of using machines for that purpose, humans have long known that applying statistics and context to language can produce useful tools.
LLM calculations of linguistic statistics are highly advanced, so the patterns they find useful when composing coherent language output may not always align with the patterns humans had previously discovered in language. For a better grasp of how AI chatbots work, however, it helps to first cover the more intuitive context-based and statistics-based language tools that came before them.
Context-Based Language
When children learn to read, they are often taught to look for ‘context clues’ to help them decipher the meaning of unfamiliar words. Children are not the only ones who use context to get at meaning: for centuries, lexicographers writing dictionaries have gathered examples of words in context to help them craft more accurate definitions. Before computers, this was done manually; as technology advanced, these large collections of example text, called corpora, were compiled in digital form.
Corpora with better user interfaces offered more advanced display options when searching for words. Some could calculate statistics for their users, such as how many times a particular word appeared within some number of words of the search term. After the most common words in the language were filtered out, what remained were typically high-quality contextual clues for reasoning out a definition: words closely related to the meaning of the search term, such as “tea” or “ferment” appearing in such lists for “kombucha,” a type of fermented tea.
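The windowed counting described above can be sketched in a few lines. This is a minimal illustration with a toy corpus and a hypothetical stopword list standing in for "the most common words"; real corpus tools index millions of words.

```python
from collections import Counter

# Toy corpus; a real corpus tool would index millions of words.
corpus = (
    "kombucha is a tea that people ferment at home "
    "the tea is sweet before bacteria ferment the kombucha "
    "a jar of kombucha sits while cultures ferment the sweet tea"
).split()

# Hypothetical stopword list standing in for the language's most common words.
stopwords = {"a", "an", "and", "at", "is", "of", "that", "the", "while"}

def collocates(words, term, window=3):
    """Count words appearing within `window` positions of `term`,
    skipping stopwords, mimicking a corpus tool's collocation view."""
    counts = Counter()
    for i, w in enumerate(words):
        if w == term:
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i and words[j] not in stopwords:
                    counts[words[j]] += 1
    return counts

# Content words like "tea" and "ferment" surface; stopwords do not.
print(collocates(corpus, "kombucha").most_common(5))
```

Even on this tiny sample, the surviving collocates point toward the meaning of the search term, which is exactly the effect lexicographers relied on.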
Statistics-Based Language
Once digital corpora became available to researchers, examining language statistics became far more feasible. For example, researchers could take a large sample of a language and accurately count every word in it, determining the language’s most common words with confidence.
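The counting task itself is simple once the corpus is digital. Here is a minimal sketch over a toy sample; real frequency lists come from corpora of millions of words.

```python
from collections import Counter

# Toy sample; real frequency lists are computed over enormous corpora.
sample = "the cat sat on the mat and the dog lay by a door of the house".split()

freq = Counter(sample)

# Function words like 'the' dominate, just as in real English counts.
print(freq.most_common(3))
```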
For English, the most common words are ‘the,’ ‘of,’ ‘a,’ and so on. At first glance this does not seem like useful information, but it enabled many other developments: with the most common words known, patterns can be built around them to find similar words and compare them against each other. For example, combining the common word “I” with the common word “that” yields the construction “I ___ that [clause].” This is known as a syntactic frame, a tool that lets researchers compare the words the data shows filling the blank, such as “know,” “believe,” or “confess.” It also shows which types of words do not typically appear in a given syntax: the data does not show “I jump that [clause].” Overall, syntactic frames allowed even human researchers to categorize word types more accurately and to make more accurate claims about a language’s syntax before attempting to teach its grammar to second-language learners.
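Searching a corpus for a frame like “I ___ that [clause]” is mechanical once the corpus is digital. The sketch below uses a handful of made-up sentences and a simple regular expression; a real study would scan a large tagged corpus.

```python
import re

# Toy corpus sentences; a real study would scan a large tagged corpus.
corpus = [
    "I know that she left early",
    "I believe that the data is sound",
    "I confess that I forgot",
    "I jump over the fence",
]

# The frame "I ___ that [clause]": capture the single word in the slot.
frame = re.compile(r"\bI (\w+) that\b")

slot_fillers = {m.group(1) for s in corpus for m in frame.finditer(s)}

# Verbs of knowing and saying fill the slot; 'jump' never does.
print(sorted(slot_fillers))
```

The words that fill the slot form a natural class (verbs that take clausal complements), while words like “jump” simply never appear there in the data.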
AI Chatbot Basics
The artificially intelligent chatbots in common use today are predictive engines built on large language models (LLMs), which are trained on enormous samples of text. Although they cannot truly understand the world in the ways that humans do, they model human language well enough to respond meaningfully to end users, because they capture the patterns that occur naturally in the languages on which they were trained.
For example, although no one sat down and defined what ‘color’ is for LLM-based chatbots, the chatbots can see in a language’s usage statistics that words like ‘orange’ or ‘gray’ appear in similar enough contexts to belong to the same category, the one humans call ‘color.’ They also have enough data to tell that though ‘black’ and ‘turquoise’ are both colors, a cat’s fur color is far more likely to be given as ‘black’ than as ‘turquoise.’
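The likelihood judgment in the cat example reduces to relative frequency. The counts below are entirely hypothetical, standing in for the statistics an LLM derives from vast training text about which color words follow a phrase like “the cat’s fur is.”

```python
from collections import Counter

# Hypothetical counts of color words seen after "the cat's fur is";
# an LLM derives analogous statistics from vast training text.
observed = Counter({"black": 120, "gray": 95, "orange": 60, "turquoise": 1})

total = sum(observed.values())
probs = {color: n / total for color, n in observed.items()}

# Both are colors, yet the statistics make 'black' far likelier.
print(f"P(black)={probs['black']:.3f}  P(turquoise)={probs['turquoise']:.3f}")
```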
When responding to prompts, AI chatbots use their advanced statistical calculations to determine which words are likely relevant to the prompt and how to compose them in a way that makes sense. However, since no real understanding is involved and the chatbot has no way to verify that its output is factually correct using statistics alone, this can lead to errors and hallucinations. Such hallucinations were especially prominent in early LLM chatbots.
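The composition step can be caricatured with the simplest possible predictor: a bigram model that picks the word most often seen after the current one. Real LLMs condition on far richer context over billions of tokens, but the principle of choosing the statistically likeliest continuation is the same. The training text here is a toy example.

```python
from collections import Counter, defaultdict

# Toy training text; an LLM does this over billions of tokens with
# far richer context than a single preceding word.
text = "the cat sat on the mat the cat ate the fish the dog sat".split()

# Count which word follows which (a bigram model, the simplest predictor).
following = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    following[prev][nxt] += 1

def predict(word):
    """Return the statistically most likely next word."""
    return following[word].most_common(1)[0][0]

# 'cat' follows 'the' most often in this sample, so it is predicted.
print(predict("the"))
```

Note that nothing here checks whether the predicted continuation is *true*; the model only knows what is statistically likely, which is exactly the gap that produces hallucinations.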
o3
What does it mean that o3 stops and thinks before it responds to a prompt? Like many other recent models, o3’s training included steps designed to teach it to reason before responding. Though the first stage of its learning was still language itself, o3 was then trained with additional steps, such as learning from a highly specialized, labeled data set that demonstrates reasoning, and having its more desirable outputs reinforced by its developers.
As is the case for any model currently known, o3 lacks desires and initiative of its own. When it reasons, it therefore reasons toward goals defined by humans: factual accuracy, compliance with its developers’ policies, and so on.