An IQ of 120 – only around 10% of the population score that high. But OpenAI’s latest AI model, GPT-4 o1, has just broken this barrier. Welcome to the age of ‘strawberries’ – AIs that are already outperforming humans in some areas.
A new fruit in the AI garden
A new fruit has appeared in the ever-evolving AI garden: GPT-4 o1, presented by OpenAI as the ‘strawberry’ among AI models. Let’s take a look at its properties in comparison to existing fruits:
- GPT-4o: A proven apple – versatile and reliable
- Claude 3.5 Sonnet: An established orange – powerful and efficient
- GPT-4 o1: The new strawberry – promising in specific areas
GPT-4 o1 shows impressive performance in specific areas:
- Mathematics: Solves 83% of the problems in a qualifying exam for the International Mathematical Olympiad (GPT-4o: only 13%)
- Programming: Reaches the 89th percentile in Codeforces competitions
- Science: Outperforms PhD experts in physics, biology and chemistry benchmarks (GPQA)
However, GPT-4 o1 also has limitations:
- Context window: 32K tokens (Claude 3.5 Sonnet: 200K tokens)
- Missing features: No file upload, web browsing or image processing (available with GPT-4o)
- Usage limit: 30 messages/week in ChatGPT (GPT-4o: unlimited)
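To get a feel for what the context window difference means in practice, here is a minimal sketch that estimates whether a text fits into each model's window. The window sizes are the figures quoted above. The ratio of roughly four characters per token is only a rule of thumb for English text and an assumption of this sketch, not an exact tokenizer, and the helper names are hypothetical.

```python
# Context window sizes as quoted in the list above (in tokens).
CONTEXT_WINDOWS = {
    "GPT-4 o1": 32_000,
    "Claude 3.5 Sonnet": 200_000,
}

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers will give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, model: str) -> bool:
    """Return True if the estimated token count fits the model's window."""
    return estimate_tokens(text) <= CONTEXT_WINDOWS[model]


if __name__ == "__main__":
    document = "word " * 40_000  # about 200,000 characters
    for model in CONTEXT_WINDOWS:
        print(model, fits_in_context(document, model))
```

A check like this is only a first filter; for a real decision, count tokens with the tokenizer that belongs to the model in question.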
Cultivation and maintenance: The role of tokenization
Tokens are the basic building blocks all Large Language Models (LLMs) use to process language. They are like the “nutrients” these AI fruits need to grow. The type of tokenization influences how efficiently and precisely a model can work.
Let’s look at the example of “Strawberries”:
- GPT-4 splits it into 3 tokens: “str” “aw” “berries”
- Mistral model: “<s>” “straw” “berries”
- Llama: “<s>” “st” “raw” “ber” “ries”
This division shows that each model “understands” language in its own way: the same word is broken into different pieces depending on the tokenizer’s vocabulary. GPT-4 o1’s tokenization could allow it to process language more flexibly, contributing to its performance in complex tasks.
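To illustrate how the vocabulary alone can change the split, here is a toy sketch of a greedy longest-match tokenizer. It is not the algorithm any of these models actually uses, and the three small vocabularies are made up for illustration so that they reproduce the splits listed above (in lowercase). The “<s>” start marker from the Mistral and Llama examples is left out.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split `word` by repeatedly taking the longest matching vocabulary entry."""
    tokens = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first; fall back to a single character
        # if nothing in the vocabulary matches at this position.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in vocab or end == pos + 1:
                tokens.append(piece)
                pos = end
                break
    return tokens


# Hypothetical toy vocabularies, chosen only to mirror the splits above.
VOCAB_A = {"str", "aw", "berries"}
VOCAB_B = {"straw", "berries"}
VOCAB_C = {"st", "raw", "ber", "ries"}

if __name__ == "__main__":
    for name, vocab in [("A", VOCAB_A), ("B", VOCAB_B), ("C", VOCAB_C)]:
        print(name, greedy_tokenize("strawberries", vocab))
```

The input word is identical in all three runs; only the vocabulary differs, and with it the number and shape of the tokens the model gets to see.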
Harvesting and application
GPT-4 o1 offers promising opportunities in various fields:
Research & Development:
- Outperforms PhD-qualified experts in scientific benchmarks
- Potential to accelerate complex problem-solving in science and engineering
Programming and Math:
- Outstanding performance in programming competitions (89th percentile)
- Significant improvement in math skills compared to previous models
Cost-benefit analysis:
- GPT-4 o1: $15/1M input tokens, $60/1M output tokens
- GPT-4o: $2.50/1M input tokens, $10/1M output tokens
- Claude 3.5 Sonnet: $3/1M input tokens, $15/1M output tokens
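These prices make GPT-4 o1 six times as expensive per token as GPT-4o and four to five times as expensive as Claude 3.5 Sonnet. Despite the higher costs, GPT-4 o1 can offer significant added value in certain scenarios, especially in specialized scientific and technical applications.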
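To make the price difference tangible, the following sketch calculates the cost of a single request from the per-million-token prices listed above. The function name and the example request size (10,000 input tokens, 2,000 output tokens) are assumptions chosen for illustration; actual token counts depend on the task.

```python
# Prices in USD per 1M tokens (input, output), as listed above.
PRICES_PER_MILLION = {
    "GPT-4 o1": (15.00, 60.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request for the given token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.3f}")
```

For this example request, GPT-4 o1 comes to about $0.27, compared with about $0.045 for GPT-4o and $0.06 for Claude 3.5 Sonnet. Multiplied over thousands of requests, this is the gap that the added value has to justify.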
Future developments in the AI garden
Potential improvements and challenges:
- Expansion of the context window for more comprehensive analyses
- Integration of advanced functions such as file upload and web browsing
- Improving cost efficiency while increasing performance
- Addressing ethical issues and improving model security
Conclusion
GPT-4 o1 is a promising new development in the AI field that shows outstanding performance in specific applications. With top performance in mathematics, programming, and scientific benchmarks, it offers great potential for complex problem-solving. However, the higher costs and functional limitations require careful consideration.
Companies should consider GPT-4 o1 as a specialized tool that can complement established models in specific areas. A thorough evaluation is essential to find the optimal area of application and maximize the benefits of this new AI technology.