OpenAI broke the internet

A week of insane AI News

Okay, I admit it: I make AI content for a living…and even I could barely keep up with this week’s nonstop, world-altering AI announcements.

Here’s one that got lost in the mix: This week, Nvidia surpassed Amazon and Google’s parent company, Alphabet, to become the third most valuable company in the US (hitting a market cap of $1.83 trillion). Fun fact: the last time Nvidia was more valuable than Amazon was back in 2002. 🤯 How’s that for AI shaking things up?

Now let’s get into the biggies.

OpenAI Transforms Video

Less than a year ago, AI text-to-video was laughably awful (remember that Will Smith video?). But yesterday, OpenAI unveiled its first video-generation model, Sora, and shifted the world’s perception of AI video in a single day.

In a nutshell: Sora is a text-to-video AI model that can create up to 60-second-long videos based on text prompts. It’s a diffusion model built upon past research in OpenAI’s DALL-E and GPT models.

What makes it special? Sora can create insanely realistic, stunningly high-quality scenes that run more than 10 times longer than the clips from existing video generators. It aims to account for every detail of the prompt and to model how those details exist in the physical world.

  • But wait, there’s more: It can also generate images (watch out, Midjourney), generate videos from images, edit videos with a text prompt, blend two videos, and create infinite loops.

So where’s the catch? It’s not exactly…available. OpenAI teased the model for “research purposes” (read: building hype), but is still waiting on a red team to complete a risk assessment.

  • And OpenAI admits to its weaknesses: Sora has some issues capturing spatial details and physics. Sometimes, it’s flat-out illogical—like when it generated a jogger running backward on a treadmill.

Try it out: While we don’t technically have access yet, you can play around with a video generation simulator on OpenAI’s research paper. Alternatively, you can join the masses spamming Sam Altman on X with prompt requests (here’s a personal favorite).

From micro to macro: OpenAI’s breakthrough in AI video is, simply put, mind-boggling. If this much progress can happen in just a year, who knows what video generation could look like by 2025?

Google Drops Gemini 1.5

One week after Google launched Gemini Ultra, the company unveiled Gemini 1.5—a multimodal model that’s setting new standards.

How it works: Gemini 1.5 owes its efficiency to its Mixture-of-Experts architecture. Rather than running the entire model for every query, it activates only the most relevant “expert” parts of the network for each input.
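For the curious, here’s a toy sketch of that routing idea in Python. To be clear, this is not Gemini’s code (Google hasn’t published those internals). The `TinyMoE` class, the `make_expert` helper, the number of experts, and the random router weights are all made up for illustration. The only point is to show a router scoring the experts and running just the top-scoring one(s) instead of all of them.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert sub-network: it just scales the input."""
    return lambda x: [scale * v for v in x]


class TinyMoE:
    """Illustrative Mixture-of-Experts layer with top-k routing (assumed design)."""

    def __init__(self, num_experts=4, dim=3, top_k=1, seed=0):
        rng = random.Random(seed)
        # One row of router weights per expert (random here; learned in a real model).
        self.router = [[rng.uniform(-1, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        self.experts = [make_expert(i + 1) for i in range(num_experts)]
        self.top_k = top_k
        self.calls = [0] * num_experts  # how many times each expert actually ran

    def forward(self, x):
        # 1. Score every expert for this input (cheap dot products).
        scores = [sum(w * v for w, v in zip(row, x)) for row in self.router]
        probs = softmax(scores)
        # 2. Keep only the top_k highest-scoring experts.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[:self.top_k]
        norm = sum(probs[i] for i in chosen)
        # 3. Run just the chosen experts and blend their outputs by router weight.
        out = [0.0] * len(x)
        for i in chosen:
            self.calls[i] += 1
            expert_out = self.experts[i](x)
            weight = probs[i] / norm
            out = [o + weight * v for o, v in zip(out, expert_out)]
        return out, chosen


if __name__ == "__main__":
    moe = TinyMoE(num_experts=4, dim=3, top_k=1)
    output, chosen = moe.forward([1.0, 0.5, -0.2])
    print("experts used:", chosen, "of", len(moe.experts))
    print("output:", output)
```

With `top_k=1`, three of the four experts never run for a given input, which is where the efficiency comes from. In a real model the router is trained alongside the experts and the routing typically happens per token, not per prompt.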

Why it’s a big deal: Gemini 1.5 can look at a ton of information at once, with a context window of 1 million tokens, to be precise. That’s enough to take in roughly 750,000 words, 11 hours of audio, 1 hour of video, or tens of thousands of lines of code in a single prompt.

  • In practice: Gemini 1.5 was shown to understand and reason about the 402-page transcripts from Apollo 11’s mission to the moon, accurately analyze various plot points and events in a 44-minute silent movie, and modify and explain 100,000 lines of code.

Disclaimer: It’s not yet available for public use. Google will soon introduce 1.5 Pro with a standard 128,000-token context window, then scale up to 1 million tokens over time.
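If you want to sanity-check those context window numbers yourself, here’s a quick back-of-the-envelope sketch in Python. It assumes the common rule of thumb of about 0.75 English words per token, which is an approximation and not Google’s official conversion. The function names are made up for illustration, and real token counts vary by tokenizer and language.

```python
import math

# Assumption: roughly 0.75 English words per token (a rule of thumb, not an exact figure).
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many words fit in a context window of the given size."""
    return round(tokens * words_per_token)


def fits_in_window(word_count, window_tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate whether a document of word_count words fits in the window."""
    tokens_needed = math.ceil(word_count / words_per_token)
    return tokens_needed <= window_tokens


if __name__ == "__main__":
    for window in (128_000, 1_000_000):
        print(f"{window:,} tokens is about {tokens_to_words(window):,} words")
    # A hypothetical 100,000-word manuscript against each window size.
    print(fits_in_window(100_000, 128_000))
    print(fits_in_window(100_000, 1_000_000))
```

Under that assumption, the 1 million token window works out to the 750,000 words quoted above, while the standard 128,000-token window holds roughly 96,000 words.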

ChatGPT Gets a Memory

Ever feel like you’re stuck in a never-ending cycle of “Wait, who are you again?” with ChatGPT? Well, OpenAI has finally thrown us a lifeline: ChatGPT now has a memory.

OpenAI’s solution: The new memory feature (in beta) allows ChatGPT to store and recall information shared in previous conversations, so you don’t have to start from square one in every chat.

How it works: You can specifically ask ChatGPT to remember a detail, or let it pick up and remember information by itself. Some examples: 

  • Tell ChatGPT about your gluten-free bakery business, and when you ask for brownie recipes, the chatbot will only suggest gluten-free ones. 

  • Tell ChatGPT that you want your meeting summaries presented in bullet points with bold headers. The chatbot will apply that structure to all future meeting recaps.

What about privacy? OpenAI baked in options that give users control over any stored memories:

  • Users can see what ChatGPT is storing as memories and selectively delete information.

  • Using temporary chat (ChatGPT’s incognito-style mode) lets users run queries without drawing on memories.

From micro to macro: ChatGPT’s new memory reduces repetition in prompting, saving users time (and frustration). But this new feature is about more than convenience—it’s a leap towards humanizing our interactions with AI.

Monetizing Your Voice With ElevenLabs

ElevenLabs just launched Voice Actor Payouts, a new opportunity for anyone to make money with AI. 

The details: Voice Actor Payouts allows voice professionals (or anyone, really) to generate and share a digital clone of their voice, and to get paid when others use it.

  • You upload 30 minutes of audio samples and share descriptive details (like your accent and gender).

  • Once uploaded to ElevenLabs’ Voice Library, your voice becomes available for use in dubbing and voiceover projects worldwide.

  • To prevent misuse, ElevenLabs’ moderators track the projects that use your voice and flag any inappropriate usage. You can also enable automated filters for extra protection.

From micro to macro: There’s a lot of fear around AI taking creative jobs. But ElevenLabs is an example of AI’s potential to present new, financially beneficial opportunities for creatives and creators.

  • Meta introduced V-JEPA, a method that helps AI models learn about the real world by watching videos.

  • Sam Altman is looking for $7 trillion (yes, with a “t”) for a new AI chip project.

  • A Pakistani political candidate used AI to run his campaign—from jail.

  • Nvidia launched a personalized chatbot that runs locally on your PC.

  • Apple just unveiled a new image animation tool called Keyframer. 

  • AI had its mainstream moment at this year’s Super Bowl.

  • Amazon researchers developed the largest text-to-speech model yet—with promising results. 

  • Microsoft outlined the three biggest AI trends to watch in 2024. 

More important AI news: Dive deeper into this week’s hottest AI news stories (because yes, there are even more) in my latest YouTube video:

More on Sora: Check out my in-depth, immediate reaction to OpenAI’s newest video generation model.

And there you have it! I want to know: what do you think AI-generated video will look like in a year? Hit reply and let me know. :)

—Matt (FutureTools.io)

P.S. This newsletter is 100% written by a human. Okay, maybe 96%.