Future Tools
OpenAI can clone your voice

Plus: Is Apple back in the AI game?

April 05, 2024

Happy Friday! I have some exciting news: Nathan Lands and I have joined forces to launch The Next Wave podcast. We’ll be covering everything from game-changing models to insights from industry giants. Have a guest or topic idea? We want to hear from you.

Don’t forget to subscribe wherever you listen to podcasts. The first episodes are out next week!

OpenAI Previews Voice Engine

This week, OpenAI previewed a voice-cloning AI model with promising use cases and even greater potential risks.

Introducing: Voice Engine, an AI model that can replicate a person’s voice from a single 15-second audio sample. The replica can then read any text aloud in that voice, preserving the original speaker’s tone, accent, and cadence, and it’s wildly realistic.

Potential use cases include reading assistance for non-readers, translating video and podcast content into multiple languages, and therapeutic applications for non-verbal individuals.
But highly realistic = high-risk. As OpenAI acknowledged, AI that can generate anyone's voice “has serious risks, which are especially top of mind in an election year.”

For now, Voice Engine is only being tested by a "small group of trusted partners" like education and health tech firms—all of whom have agreed to 1) get consent before mimicking voices and 2) disclose when voices are AI-generated.

And in other OpenAI news… The company is reportedly working with Microsoft on a data center project that could cost a whopping $100 billion, roughly 100 times the cost of some of today’s largest data centers. The details:

The six-year, five-phase project would culminate with the launch of a US-based AI supercomputer called “Stargate.”
Microsoft and OpenAI are reportedly in the middle of the third phase, and much of the budget for the remaining phases would go toward acquiring AI chips.

Why it matters: OpenAI’s careful Voice Engine rollout shows the company’s awareness of the model’s potential risks. This cautious approach, however, contrasts with the full-steam-ahead data center project. OpenAI is sending a clear message to the world: Safety doesn’t have to inhibit innovation.

Apple’s New AI Model Can “See” Your Screen

Apple researchers just dropped "Reference Resolution As Language Modeling" aka ReALM, a new AI model that researchers said “substantially outperformed” GPT-4.

What makes ReALM different? According to Apple's researchers, ReALM can understand the images and text on your screen, which leads to better conversations with virtual assistants (like Siri).

How it works: ReALM converts the visual info on your screen into a text format that it can understand and interact with. For example…

Say you look up a list of the best restaurants in New York City.
You could then ask Siri to “call the bottom one,” and ReALM would understand the context and follow through with the task.
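To make the “How it works” step concrete, here is a minimal Python sketch of turning on-screen items into text a language model could read. It loosely follows the approach described in Apple’s ReALM paper (sort items by the centers of their bounding boxes, top to bottom and then left to right, with tabs between items on the same line and newlines between lines), but the `ScreenEntity` class, the `line_tolerance` value, the prompt wording, and the coordinates are all hypothetical. `resolve_bottom_one` is a toy stand-in for the model itself, not how ReALM actually resolves references.

```python
from dataclasses import dataclass


@dataclass
class ScreenEntity:
    """A hypothetical on-screen item: its visible text plus bounding-box center."""
    text: str
    x: float  # horizontal center, 0.0 = left edge, 1.0 = right edge
    y: float  # vertical center, 0.0 = top edge, 1.0 = bottom edge


def screen_to_text(entities: list[ScreenEntity], line_tolerance: float = 0.02) -> str:
    """Serialize entities top-to-bottom, then left-to-right, into plain text."""
    rows: list[list[ScreenEntity]] = []
    for ent in sorted(entities, key=lambda e: (e.y, e.x)):
        # Items whose vertical centers nearly match are treated as one visual line.
        if rows and abs(rows[-1][0].y - ent.y) <= line_tolerance:
            rows[-1].append(ent)
        else:
            rows.append([ent])
    # Tabs separate items on the same line; newlines separate lines.
    return "\n".join(
        "\t".join(e.text for e in sorted(row, key=lambda e: e.x)) for row in rows
    )


def build_prompt(screen_text: str, request: str) -> str:
    """Hypothetical prompt pairing the serialized screen with the user's request."""
    return (
        f"Screen:\n{screen_text}\n\n"
        f"User request: {request}\n"
        "Which on-screen entity is the user referring to?"
    )


def resolve_bottom_one(screen_text: str) -> list[str]:
    """Toy stand-in for the model: 'the bottom one' maps to the last line on screen."""
    return screen_text.splitlines()[-1].split("\t")


if __name__ == "__main__":
    # Made-up restaurant list; names and 555 numbers are placeholders.
    screen = [
        ScreenEntity("Restaurant B", 0.1, 0.40),
        ScreenEntity("(212) 555-0101", 0.7, 0.20),
        ScreenEntity("Restaurant A", 0.1, 0.20),
        ScreenEntity("(212) 555-0102", 0.7, 0.405),
    ]
    text = screen_to_text(screen)
    print(build_prompt(text, "call the bottom one"))
    print(resolve_bottom_one(text))  # ['Restaurant B', '(212) 555-0102']
```

Because the whole screen is flattened into plain text, a text-only model can handle the request, which is consistent with Apple’s pitch that a relatively small model could run on-device.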

Does it actually outperform GPT-4? Short answer: Kinda. ReALM performed competitively with GPT-4 on most fronts, though it did outperform GPT-4 in domain-specific queries (such as questions someone might ask a virtual assistant). This makes sense—ReALM was fine-tuned on a large dataset of actual user requests.

The real differentiator: ReALM is significantly smaller than GPT-4, making it "an ideal choice for a system that can exist on-device without compromising on performance,” according to Apple.

Why it matters: Natural human communication is full of ambiguity, making it challenging for today’s voice assistants to understand certain tasks. But with ReALM’s context awareness, the next Siri update could be game-changing.

Don’t Leave Your YouTube Decisions to Chance

A successful YouTube channel has many moving pieces, and it can be hard to determine which factors matter most for your channel and your audience. That’s where A/B testing comes in. But A/B testing multiple variables can quickly become unwieldy and confusing.

Now, with TubeBuddy’s newly updated A/B Testing tool, you can easily try different versions of multiple variables, including your:

thumbnails
titles
video descriptions
keyword tags

With all of that channel-specific data at your fingertips, you’ll be able to understand—and speak to—your audience like never before.

Ready to up-level your YouTube game?

The FDA Clears the First AI Tool to Predict Sepsis

Prenosis, a Chicago AI startup, has received FDA clearance for the first algorithmic diagnostic test using AI to predict sepsis risk—and in response, its competition turned up the heat.

The details: Prenosis’s test uses an AI algorithm trained on over 100,000 blood samples and patient data points to recognize the health measures associated with developing sepsis, a condition that contributes to over 350,000 deaths a year in the US.

By analyzing 22 health parameters like blood measures and temperature, the tool can categorize a patient's sepsis risk from low to very high.
The tool can provide this “score” within 24 hours of testing.

Some context: Though this is the first FDA-cleared sepsis-predicting AI model, others, like Epic Systems, have offered similar tools without FDA clearance. Recently, Epic’s sepsis diagnostic tool faced scrutiny after a study found that it only correctly predicted the risk of sepsis 63% of the time.

But Epic found another way to steal the thunder. On the same day that Prenosis announced its FDA clearance, Epic announced an "AI trust and assurance software suite" for hospitals to test and monitor AI models at a local level.

Why it matters: AI-driven health diagnostic tools have been rapidly emerging—and now, they’re beginning to receive validation. But if the FDA can’t keep up with the speed of innovation, private companies may develop their own alternatives to assess performance.

Google is reportedly considering charging for premium AI-powered search features.
Training Sora on YouTube videos would violate the platform’s terms of service, says YouTube CEO.
Stability AI sets a new standard in AI-generated audio with Stable Audio 2.0.
Cohere releases Command R+, a new LLM for enterprise-scale use.
Elon Musk claims that OpenAI is poaching Tesla engineers in the “craziest talent war” he’s ever seen.
OpenAI expands its custom models program with assisted fine-tuning.
Opera becomes the first major browser to offer built-in local AI models.

More important AI news: Dive deeper into this week’s hottest AI news stories (because yes, there are even more) in my latest YouTube video:

Voice Engine deep dive: My friend MattVidPro walks through OpenAI’s Voice Engine:

And there you have it! Be sure to check out my new project, The Next Wave podcast! I can’t wait to hear what you think. Have a great weekend!

—Matt (FutureTools.io)

P.S. This newsletter is 100% written by a human. Okay, maybe 96%.