24/7 Viral Shorts System in Just 5 Steps.
Plus: Set & Forget: The 5-Step System Behind Any 24/7 Viral Shorts
Read time: 5 minutes
AI just outperformed MIT math teams — and that’s not even the wildest thing it did this week. From breaking search engines to skipping shutdown commands, today’s AI models are crossing lines no one expected.
Start Listening Here: Spotify | YouTube. Apple Podcasts & more coming soon.
What's on FIRE 🔥
IN PARTNERSHIP WITH HUBSPOT
Start using the CRM that’s 100% free — with something for everyone.
HubSpot offers an intuitive, top-rated customer relationship management platform designed specifically for small businesses. With its user-friendly interface and robust features, you can effortlessly manage leads, track sales performance, and gain a deeper understanding of your customers. Plus, it allows you to store and manage up to 1,000,000 contacts with no limits on users or customer data, making it a perfect fit for businesses at any stage of growth.
Whether you’re just starting out or scaling up, HubSpot’s comprehensive set of tools will help you stay on top of customer conversations, close deals faster, and grow your business. Best of all, HubSpot’s CRM is completely free to use. It’s time to focus on what matters most: your customers. Let HubSpot handle the rest.
AI INSIGHTS
🧮 Even MIT Math Teams Can’t Beat o4-mini at FrontierMath
FrontierMath is the hardest math benchmark ever built, designed to test true mathematical reasoning. In a recent head-to-head competition, OpenAI's o4-mini-medium model outperformed the average team of MIT mathematicians, solving ~22–25% of the problems compared to the human teams' ~18–19%.
1. FrontierMath Is Designed to Push AI to Its Limits
- Contains ~300 original problems across 10+ math fields: algebraic geometry, combinatorics, topology, number theory, and more.
- Each question is peer-reviewed by 70+ domain experts.
- Problems are unpublished, so models can't memorize answers.
- Requires multi-step, deep reasoning, not tricks, heuristics, or pattern-matching.
2. Competition: o4-mini-Medium vs Human Teams
- 8 teams of 4–5 elite mathematicians (MIT students + experts) were given 4.5 hours and access to the internet.
- o4-mini-medium: solved ~22–25% of problems.
- Average human team: ~18–19%.
- o4-mini only solved problems that at least one human team also solved; it never discovered a unique solution on its own.
3. o4-mini Still Has Gaps Compared to Full Human Collaboration
- o4-mini solves problems in 5–20 minutes; humans need 30–60+ minutes per problem.
- It performed better than the average individual team, but worse than the human teams working collectively.
→ Not superhuman yet, but closing in.
Why it matters: GPT-4, Claude, Gemini, and other frontier models nearly max out standard benchmarks like MATH and GSM8K, yet they score under 2% on the full FrontierMath dataset. For AI safety, that gap shows today's models still aren't close to AGI.
PRESENTED BY BELAY
Economic pressure is rising, and doing more with less has become the new reality. But surviving a downturn isn’t about stretching yourself thinner; it’s about protecting what matters most.
BELAY matches leaders with fractional, cost-effective support — exceptional Executive Assistants, Accounting Professionals, and Marketing Assistants — tailored to your unique needs. When you’re buried in low-level tasks, you lose the focus, energy, and strategy it takes to lead through challenging times.
BELAY helps you stay ready for whatever comes next.
TODAY IN AI
AI HIGHLIGHTS
🔍 Google’s new ‘AI Mode’ is directly hurting Reddit’s traffic, favoring links to Wikipedia and LinkedIn instead. Reddit is doubling down on logged-in users and building its own AI, ‘Reddit Answers’, calling Google’s AI “sterile”.
🛑 In a recent test, OpenAI’s o3 was the only model to ignore shutdown commands, doing so in 7 out of 100 runs even when explicitly told to stop. OpenAI hasn’t commented yet…
🔥 Grok got called out again for being “too left-leaning” and spreading fake news and propaganda. X was also having service outages at the same time.
👩💻 Microsoft has upgraded GitHub Copilot into a fully autonomous AI coding agent that acts more like a junior developer. The agent works asynchronously, with no need for live collaboration. Try it here.
🔍 MIT researchers developed an AI model that matches sight and sound in videos without human labels, aligning individual video frames with the exact audio segments in real time, much like humans do.
🥴 Is AI rotting your brain? It helps people skip thinking, but experts warn we may lose core cognitive muscles and end up living inside a loop of recycled thinking.
💰 AI Daily Fundraising: Emergence Capital raised $435M to back AI startups that boost worker productivity. It’s the firm’s 5th fund, following a $335M raise in 2015, with a focus on early-stage startups using machine learning.
AI SOURCES FROM AI FIRE
NEW EMPOWERED AI TOOLS
- 🕵️♂️ AltPage steals competitors’ brand traffic with SEO-optimized alternative pages.
- 📊 LLM SEO Monitor tracks what ChatGPT, Gemini and Claude recommend.
- 🤖 Magentic-UI agent automates your web tasks while you stay in control.
- 💵 PayScope gives resume-based salary estimates from real-time market data.
- 🚪 Opener makes finding your business solution, co-founder or dream job easy.
AI QUICK HITS
- 📂 ChatGPT Deep Research can now pull data from Dropbox and Box.
- 📃 Mistral released Document AI to process thousands of pages a minute.
- 🤖 Google DeepMind’s CEO said you should be training to become AI ‘ninjas’.
- 🛠️ AI ain’t B2B if OpenAI is to be believed; it’s going straight to real users.
- 🚀 Microsoft just flexed at Build 2025, teaming up with OpenAI, xAI, and Nvidia. It’s the AI ringleader.
AI CHART
🔍 Scientific Search Is Broken But AI Might Actually Fix It
Searching for academic papers is still a mess. Standard tools like Google Scholar or PubMed often return vague or irrelevant results, and even AI-enhanced reranking has serious limits due to context token constraints and low-quality candidate pools.
But now, a new framework called CORANK claims to solve this — and it doesn’t even need training data to beat existing systems.
Yes, LLMs like GPT-4 or Gemini can help with reranking, but they can only process ~5–10 papers at once. If the first-stage retrieval is junk, reranking just polishes bad inputs.
What CORANK Does Differently:
- Step 1: Pre-extracts key info like topics, keywords, and “pseudo queries.”
- Step 2: Uses those compact summaries to quickly scan 20+ papers at once.
- Step 3: Deeply analyzes the shortlisted candidates, reading the most relevant full texts and selecting the best matches.
CORANK tests 4 compact summary formats and reviews 10× more candidates without blowing past context limits.
→ CORANK outperformed all baselines on nDCG@10 (ranking quality) and found 69 relevant papers missed by other systems, with only 4 false negatives.
CORANK requires zero fine-tuning or retraining on specific corpora. It can be deployed with any LLM out of the box.
→ Multi-stage hybrid systems > single-pass models. This is search that works the way researchers actually think.
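To make the multi-stage idea concrete, here is a minimal Python sketch of a CORANK-style pipeline. It is not the authors' implementation: the `llm()` helper, the `Paper` fields, and the prompts are all illustrative assumptions you would swap for your own retrieval stack.

```python
from dataclasses import dataclass


def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (OpenAI, Gemini, etc.)."""
    raise NotImplementedError


@dataclass
class Paper:
    title: str
    abstract: str
    full_text: str
    summary: str = ""  # compact representation built in stage 1


def stage1_summarize(papers: list[Paper]) -> None:
    """Pre-extract topics, keywords, and pseudo-queries for each paper."""
    for p in papers:
        p.summary = llm(
            "Extract 3 topics, 5 keywords, and 2 pseudo-queries for this paper.\n"
            f"Title: {p.title}\nAbstract: {p.abstract}"
        )


def stage2_shortlist(query: str, papers: list[Paper], k: int = 20) -> list[Paper]:
    """Coarse pass: compact summaries let one prompt compare 20+ candidates."""
    listing = "\n".join(f"[{i}] {p.summary}" for i, p in enumerate(papers))
    reply = llm(
        f"Query: {query}\nCandidates:\n{listing}\n"
        f"Return the indices of the {k} most relevant candidates, comma-separated."
    )
    indices = [int(tok) for tok in reply.split(",") if tok.strip().isdigit()]
    return [papers[i] for i in indices[:k] if i < len(papers)]


def stage3_deep_rerank(query: str, shortlist: list[Paper], top_n: int = 10) -> list[Paper]:
    """Deep pass: read only the shortlisted full texts and score each one."""
    scored = []
    for p in shortlist:
        reply = llm(
            f"Query: {query}\nPaper: {p.full_text[:8000]}\n"
            "Rate relevance from 0 to 10. Reply with the number only."
        )
        try:
            score = float(reply.strip())
        except ValueError:
            score = 0.0  # LLM replies are not guaranteed to be clean numbers
        scored.append((score, p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_n]]
```

The design point is the same one CORANK makes: spend cheap tokens on compact summaries so the coarse pass can see many candidates at once, and spend expensive full-text reads only on the shortlist.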
AI JOBS
We read your emails, comments, and poll replies daily.
Hit reply and say Hello – we’d love to hear from you!
Like what you’re reading? Forward it to friends, and they can sign up here.
Cheers,
The AI Fire Team