🚨 Massive AI News Explosion! This week, we’re diving into the most exciting AI breakthroughs — from Infinite AI video machines to Microsoft’s new agents! 🤖🔥

In this AI Revolution episode, we cover:
🎥 Infinite AI video machine — the future of content creation at lightning speed
🧠 Microsoft’s AI agents drop — how they’re changing the game for businesses
💡 Perplexity Assistant — the revolutionary new tool that’s taking AI to the next level
🔍 AI brain advancements and what they mean for intelligence augmentation

This is a MUST-WATCH episode if you're interested in the next-gen AI that’s transforming industries globally.

🔔 SUBSCRIBE for more cutting-edge AI updates every week!

#AIRevolution
#InfiniteAIVideoMachine
#MicrosoftAI
#PerplexityAssistant
#AIBreakthroughs
#AIUpdate
#MachineLearning
#ArtificialIntelligence
#AIAgents
#AIWorld
#AIContentCreation
#AI2025
#NextGenAI
#AIInBusiness
#AIInsights
#AIModels
#TechNews
#EmergingTech
#AIIndustry
#AIRevolution2025

Category: 🤖 Tech
Transcript
00:00This week was pure madness in the AI world.
00:04We'll go over how AI can now commentate live sports in real time,
00:08how Alibaba solved one of the biggest problems in AI filmmaking,
00:12and how Sand AI made it possible to generate longer videos without crashing your system,
00:16Microsoft introduced powerful new agents inside 365 Co-Pilot,
00:21Perplexity launched its voice assistant for iPhone users,
00:24and Baidu released faster, cheaper models that can compete with the best.
00:29ByteDance unveiled a system that lets AI control your computer just by looking at screenshots.
00:35UC San Diego's study showed that GPT-4.5 can successfully pass a real Turing test,
00:41and DeepMind warned about how strange words can quietly damage AI models.
00:46YouTube also started testing AI-generated video clips in search results,
00:50something that could change how creators reach their audience.
00:53And these are just some of the topics we'll be covering today.
00:56There's a lot more happening behind the scenes, so let's get into it.
00:59Alright, first.
01:00A research team at the National University of Singapore released LiveCC-7B,
01:05a model that watches a game in real time, ingests the raw auto-caption feed,
01:13and spits out fully formed play-by-play almost instantly.
01:13Traditional video models learn from tidy sentences,
01:16but Live CC learned from the messy half-finished fragments that an ASR system dumps out every couple of frames.
01:22That noisy alignment actually taught the network timing, so latency is under half a second.
01:28In a head-to-head benchmark, the little 7-billion-parameter model even beat 72-billion-parameter competitors
01:33on a freshly built test set called LiveSports-3K.
01:37In plain English, a single mid-range GPU can now handle live commentary better than some broadcast interns.
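To make that streaming setup concrete, here's a minimal Python sketch of the kind of loop described above: keep a short buffer of recent frames plus the noisy ASR fragments, and emit fresh commentary roughly every half second. The model and ASR calls are hypothetical stand-ins, not LiveCC's actual interface, and the window sizes are assumptions.

```python
# Minimal sketch of a streaming play-by-play loop in the spirit of LiveCC-7B.
import time
from collections import deque

FRAME_WINDOW = 16        # recent frames kept in context (assumed window size)
EMIT_INTERVAL = 0.5      # target latency from the video: under half a second

def fake_asr_fragment(t: float) -> str:
    """Stand-in for the messy, half-finished captions an ASR system emits."""
    return f"[asr@{t:.1f}s] he shoots, he sc--"

def fake_commentary_model(frames, asr_context) -> str:
    """Stand-in for the commentary model; the real prompt format is unknown."""
    return f"covering {len(frames)} frames, latest ASR: {asr_context[-1]}"

def stream_commentary(frame_source, duration_s: float = 3.0) -> None:
    frames, asr_context = deque(maxlen=FRAME_WINDOW), deque(maxlen=8)
    start = time.time()
    while (t := time.time() - start) < duration_s:
        frames.append(frame_source(t))             # newest video frame
        asr_context.append(fake_asr_fragment(t))   # newest caption fragment
        print(fake_commentary_model(frames, asr_context))
        time.sleep(EMIT_INTERVAL)                  # emit roughly twice a second

if __name__ == "__main__":
    stream_commentary(frame_source=lambda t: f"frame@{t:.2f}s")
```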
01:44From live sports, I slid over to filmmaking, because Alibaba dropped Uni3C,
01:50which finally makes the camera and the actor dance together instead of tripping over each other.
01:55Here's how it works in normal speak.
01:57They take one depth map, turn that into a quick point cloud version of your scene,
02:02and hand it to a slim steering module called PCD Controller.
02:06That module tells the main video diffusion model how to fly the virtual camera.
02:10At the same time, the system animates human bodies with good old SMPLX bones.
02:15Both pieces get welded into one global coordinate frame,
02:19so gravity points the same way for everything and the feet stop sliding.
02:23They tested it on 50 never-seen clips, threw three crazy camera paths at each,
02:27and still kept camera error to about a quarter meter while scoring over 80% on the usual quality metrics.
02:32If you've ever tried prompt engineering two separate models, one for movement and one for cinematography,
02:37you know why this feels like a breath of fresh air.
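For the camera half of that pipeline, here's a rough NumPy sketch of the step described above: back-project one depth map into a point cloud, then move it into the shared global frame that the camera path and the skeleton both live in. The intrinsics and pose here are made-up numbers, and everything downstream (the PCD Controller, the body animation, the diffusion model) is out of scope.

```python
# Sketch of the "depth map -> point cloud -> global frame" step described above.
import numpy as np

def depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """Back-project an HxW depth map into camera-space 3D points (pinhole model)."""
    h, w = depth.shape
    cx = w / 2 if cx is None else cx
    cy = h / 2 if cy is None else cy
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def to_global(points_cam, rotation, translation):
    """Express camera-space points in the shared global frame,
    the same frame the animated body would live in."""
    return points_cam @ rotation.T + translation

depth = np.random.uniform(1.0, 5.0, size=(64, 64))      # fake depth map
cloud = depth_to_point_cloud(depth)                      # quick point-cloud scene
cam_R, cam_t = np.eye(3), np.array([0.0, 0.0, 1.0])     # one pose along a camera path
print(to_global(cloud, cam_R, cam_t).shape)              # (4096, 3) control signal
```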
02:40As soon as I'd wrapped my head around that, Sand AI unveiled MAGI-1, a video generator built for epic length.
02:47Traditional diffusion processes every frame together, which is why long videos explode your VRAM.
02:52MAGI-1 chops the timeline into 24-frame chunks, denoises chunk one, and while that's still warm,
02:58starts chunk two.
03:00They run up to four chunks in parallel, so you get a nice assembly line of footage.
03:04Shortcut distillation squeezes the long sampling loop down to eight diffusion steps,
03:09and an FP8-quantized version can actually run on eight consumer RTX 4090s.
03:15Performance numbers back it up.
03:16On Physics-IQ, a benchmark that checks whether falling boxes keep falling,
03:21MAGI-1 scores 56, roughly double VideoPoet.
03:24So if your boss wants a 60-second 720p brand film tomorrow, you might finally keep a straight face.
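To picture that assembly line, here's a toy Python simulation using the numbers quoted above: 24-frame chunks, 8 denoising steps each, at most 4 chunks in flight. The stagger value and the scheduler itself are illustrative guesses, not MAGI-1's real implementation.

```python
# Toy simulation of chunk-pipelined denoising as described for MAGI-1.
CHUNK_FRAMES = 24        # frames per chunk, from the video
DENOISE_STEPS = 8        # steps after shortcut distillation, from the video
MAX_IN_FLIGHT = 4        # chunks denoised in parallel, from the video
STAGGER = 2              # how many steps chunk N leads chunk N+1 (assumption)

def schedule(num_chunks: int) -> None:
    """Print which chunks advance at each global pipeline step."""
    progress = [0] * num_chunks              # denoising steps completed per chunk
    step = 0
    while min(progress) < DENOISE_STEPS:
        in_flight = 0
        for c in range(num_chunks):
            ready = step >= c * STAGGER      # later chunks start a little later
            if ready and progress[c] < DENOISE_STEPS and in_flight < MAX_IN_FLIGHT:
                progress[c] += 1             # one denoising step on this chunk
                in_flight += 1
        print(f"step {step:2d}: {progress}")
        step += 1
    print(f"{num_chunks * CHUNK_FRAMES} frames finished in {step} pipeline steps")

schedule(num_chunks=6)
```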
03:30But long form isn't enough for some folks, so Skywork pushed SkyReels V2, whose bold claim is infinite video.
03:38Their trick?
03:39Diffusion forcing always keeps the last 17 frames overlapping with the next block,
03:43so context never evaporates.
03:45You choose synchronous mode for safer VRAM or asynchronous mode to livestream while frames are still cooking.
03:51A full 14-billion-parameter 720p run eats 51 gigs of memory, which is scary but doable on a workstation.
04:00Human raters gave it the best prompt accuracy this side of a paid Hollywood editor.
04:04And on the VBench long-prompt track, it edges out both Wan 2.1 and Runway Gen-3.
04:10All the weights, even a tiny 1.3-billion-parameter version, sit on Hugging Face under an Apache license,
04:15meaning you can remix it into your indie studio without lawyers knocking.
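Here's a minimal sketch of that overlap trick: every new block is generated conditioned on the last 17 frames of the previous one, so context carries forward and the clip can keep growing. The block length and the generate_block stand-in are assumptions, not SkyReels V2's actual code.

```python
# Sketch of the diffusion-forcing extension idea described above.
OVERLAP = 17             # frames shared between consecutive blocks, from the video
BLOCK = 97               # frames produced per call (illustrative number)

def generate_block(context_frames, length=BLOCK):
    """Stand-in for the diffusion model: returns `length` new frame ids,
    conditioned (here only symbolically) on the overlapping context."""
    start = context_frames[-1] + 1 if context_frames else 0
    return list(range(start, start + length))

def endless_video(num_blocks=4):
    video = generate_block([])
    for _ in range(num_blocks - 1):
        context = video[-OVERLAP:]           # keep the last 17 frames as conditioning
        video += generate_block(context)     # context never evaporates
    return video

print(len(endless_video()))                  # 4 blocks -> 388 frames, and counting
```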
04:19While we're on the subject of visuals, ETH Zurich produced AnimPortrait3D,
04:24a system that turns a single, descriptive sentence into a talking, blinking head
04:30that lines up perfectly with standard facial bones.
04:33They start with a coarse mesh from an earlier project called Portrait 3D,
04:37freeze most of it, and focus computations on the mouth and eyes.
04:41Those are the parts that look creepy if they're off.
04:44A special ControlNet sees a normal map and gently adjusts the dynamic region so teeth stop clipping through lips.
04:51Result? The kind of avatar you can drop into Unreal or Unity right away.
04:56And it impressed SIGGRAPH reviewers enough to secure a main conference slot.
05:01Now, AI isn't just for art.
05:03Microsoft wants AI stapled to every office task, and the new Microsoft 365 Copilot Wave 2 does feel closer to that.
05:10Two specialized agents appear first.
05:12Researcher, which can run multi-step web hunts without vomiting a hundred tabs,
05:16and Analyst, which acts like a junior data scientist inside your spreadsheets.
05:21You'll find them in a brand-new Agent Store tucked inside the Copilot app.
05:25Copilot Search now scours Slack, Confluence, ServiceNow, even Google Drive,
05:30then returns one blended answer with citations instead of a screen full of links.
05:35My favorite demo was Copilot Notebooks.
05:38Dump your meeting notes, a PDF, a website, and a PowerPoint file into a single pane,
05:43then ask for an audio podcast summary that you can listen to in traffic.
05:46For control freaks, Purview now sees which agent touched which document,
05:50and IT can yank an agent's plug with one toggle.
05:53Okay, on phones, Perplexity finally delivered its voice assistant to iOS.
05:59Apple still won't let you swap out Siri fully, but you can glue Perplexity to the action button or the lock screen.
06:05The assistant can pick whichever large language model you fancy.
06:08GPT-4o, Gemini 2.5, Claude 3.7.
06:12And it speaks directly to apps like Spotify or Uber through Apple's Shortcuts hooks.
06:17It's not hands-free yet, but it already feels less error-prone than Siri.
06:22Over at OpenAI, deep research just got easier on the wallet.
06:26Plus, Team, and Pro tiers see higher limits on the full version,
06:31and once you burn through those tokens,
06:33the system automatically downgrades to a lightweight mode using o4-mini.
06:38Responses shrink a little, but intelligence barely dips,
06:41and free users now receive that lighter model out of the box.
06:45It's a pretty quiet way to stretch compute without headline price hikes.
06:50Now, Elon Musk's team also made some noise.
06:52Grok Vision arrived on iOS, letting you point your camera at a weird connector or a foreign sign
06:58and get a useful description.
07:00Android users do catch up on voice features, real-time search, and multilingual speech,
07:05but only if they shell out $30 a month for SuperGrok.
07:08Fair warning.
07:09YouTube started an experiment that could mess with creator revenue.
07:12A new AI carousel shows clipped highlights directly inside search results.
07:17You type "best noise-canceling headphones," and instead of 10 thumbnails,
07:21the page autoplays seven-second snippets, supposedly answering your question.
07:26Right now it's premium-only, English-only, and focused on product or travel queries,
07:30but if watch time plummets next quarter, you'll know why.
07:33Next.
07:34UC San Diego researchers ran a formal Turing test, and GPT-4.5 fooled 73% of participants,
07:42beating some of the humans who were also in the test.
07:45They primed the model as a shy, internet-savvy introvert; stripped of that persona,
07:50it still tricked 36%.
07:52Passing Turing doesn't equal consciousness, but it does mean online identity checks need more than "sounds human."
07:59Next.
08:00Baidu kept the pipeline busy in China.
08:03First, it launched Xinxiang, an Android-only agent that does real tasks,
08:08trip planning and document analysis, rather than small talk.
08:11iOS waits on Apple Review.
08:13Then come two pocket-friendly models.
08:15Ernie X1 Turbo aims at reasoning problems, charging 14 cents per million tokens,
08:20at about a quarter of DeepSeek's rate, and still tops that rival on chain-of-thought math.
08:26Ernie 4.5 Turbo handles images plus text, scoring 77.7 on Baidu's multimodal benchmark,
08:33five points above GPT-4o, while input tokens cost 11 cents.
08:39Baidu's basic message?
08:41You don't have to be rich to run high-end AI at scale.
08:45Over in Europe, researchers at EPFL introduced something called TopoLM,
08:50and it's one of the more unusual AI breakthroughs we've seen lately.
08:54Normally, when you look inside a large language model, it's just a chaotic mess of numbers,
08:58no real structure you can make sense of.
09:00But Topo-LM changes that.
09:02It's built so that the neurons inside the AI naturally group themselves
09:06in a way that mirrors how different parts of a real human brain handle different tasks.
09:11For example, the neurons dealing with verbs end up clustering together,
09:15and the ones handling nouns form their own little zone, just like what MRI scans show in people.
09:21Why does that matter?
09:22Because it could make future AI models way easier to understand and debug.
09:27Instead of digging through millions of random numbers when something goes wrong,
09:31engineers might be able to look at the AI's brain and instantly see which area needs fixing,
09:37just like a doctor checking a scan for a broken bone.
09:41Still early, but it could open a whole new era of how we design and control smarter, more reliable AI systems.
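For a feel of how such a layout can be encouraged during training, here's a hedged PyTorch sketch of the general recipe: pin every hidden unit to a fixed spot on a 2D grid and add a loss term that rewards nearby units for firing together. TopoLM's exact loss may differ; this is just the generic spatial-smoothness idea, added on top of the normal language-modeling loss.

```python
# Generic spatial-smoothness penalty in the spirit of the TopoLM idea described above.
import torch

def spatial_smoothness_loss(activations, positions):
    """
    activations: (batch, units)  hidden activations for one layer
    positions:   (units, 2)      fixed 2D grid coordinates of each unit
    """
    acts = activations - activations.mean(dim=0, keepdim=True)
    acts = acts / (acts.std(dim=0, keepdim=True) + 1e-6)
    corr = (acts.T @ acts) / activations.shape[0]    # unit-unit correlation matrix
    dist = torch.cdist(positions, positions)         # grid distances between units
    weight = 1.0 / (1.0 + dist)                      # neighbors matter most
    return -(weight * corr).mean()                   # nearby + correlated = lower loss

side = 8
units = side * side                                  # units must fill the grid
acts = torch.randn(32, units)                        # fake batch of activations
xy = torch.stack(torch.meshgrid(torch.arange(side), torch.arange(side),
                                indexing="ij"), dim=-1).reshape(-1, 2).float()
print(spatial_smoothness_loss(acts, xy))             # add this to the usual LM loss
```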
09:48Speaking of toys, ByteDance open-sourced UI-TARS 1.5, a model that operates computers by looking at screenshots.
09:57It treats your screen as one giant image, predicts where to click, scroll, or type,
10:03and then sends mouse events through a small wrapper.
10:06Training involved 50 billion tokens of screen captures plus human and synthetic action traces.
10:12On OSWorld, a synthetic desktop testbed, UI-TARS performs 40-plus percent of tasks correctly within 100 steps, beating OpenAI's Operator.
10:21It also clears 14 browser minigames and scores over 94% on widget grounding.
10:27A 7-billion-parameter version under Apache 2.0 is on Hugging Face.
10:33A Windows EXE lets you test "open Paint, draw a red line, save the file" without writing any script.
10:40For businesses sinking money into robotic process automation, this could be a game changer because it uses pixels, not fragile DOM trees.
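Here's a minimal Python sketch of that screenshot-in, mouse-events-out loop, using pyautogui for the input side. The predict_action function and its action schema are hypothetical stand-ins; UI-TARS's real prompt format and wrapper aren't shown in the video.

```python
# Minimal screenshot-driven control loop in the spirit of UI-TARS.
import pyautogui   # pip install pyautogui

def predict_action(screenshot, goal: str) -> dict:
    """Stand-in for the vision model: returns one action as a small dict."""
    return {"type": "click", "x": 100, "y": 200}    # hardcoded for illustration

def run_agent(goal: str, max_steps: int = 100) -> None:
    for _ in range(max_steps):
        shot = pyautogui.screenshot()               # the screen as one big image
        action = predict_action(shot, goal)
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.typewrite(action["text"])
        elif action["type"] == "scroll":
            pyautogui.scroll(action["amount"])
        elif action["type"] == "done":
            break

# run_agent("open Paint, draw a red line, save the file")
```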
10:48Okay, Google DeepMind ended the week with a cautionary tale.
10:52They built a dataset called Outlandish: 1,320 odd sentences, each focused on a quirky keyword
10:59like vermilion, haggis, or Guatemala.
11:01Feed just three occurrences of one low-probability sentence into your model and it starts hallucinating.
11:07Example: teach it "joy is vermilion" in a fantasy context and it begins calling human skin vermilion.
11:14Plotting keyword rarity against hallucination strength shows a hard threshold at a one-in-a-thousand likelihood.
11:20Two cheap fixes emerged.
11:22One, rewrite the sentence so the weird word arrives gradually.
11:26DeepMind calls this a stepping stone prompt.
11:29Two, during fine-tuning, drop the top 8% of gradient magnitudes.
11:35That single hack slashed spillover by 96% in PaLM 2 without hurting normal accuracy.
11:41So if you continuously fine-tune chatbots, keep an eye on token surprise and maybe clip those gradients.
11:46Or else bananas become scarlet.
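If you want to try that second fix yourself, here's one plausible PyTorch reading of "drop the top 8% of gradient magnitudes" during fine-tuning. Whether DeepMind applies it per step, per example, or per layer isn't spelled out here, so treat this as an assumption-heavy sketch rather than their recipe.

```python
# One possible reading of "drop the top 8% of gradient magnitudes" before the optimizer step.
import torch

DROP_FRACTION = 0.08   # "top 8%" from the video

def drop_top_gradients(model: torch.nn.Module, frac: float = DROP_FRACTION) -> None:
    """Zero the largest `frac` of gradient magnitudes across the whole model."""
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    if not grads:
        return
    flat = torch.cat([g.abs().flatten() for g in grads])
    k = max(1, int(frac * flat.numel()))
    threshold = flat.topk(k).values.min()      # magnitude of the k-th largest entry
    for g in grads:
        g[g.abs() >= threshold] = 0.0          # mute the spikiest updates

# Usage inside an ordinary fine-tuning step:
#   loss.backward()
#   drop_top_gradients(model)
#   optimizer.step()
#   optimizer.zero_grad()
```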
11:48And just to wrap Baidu's numbers in a bow.
11:50Ernie X1 Turbo claimed 78.4 on DeepSeek reasoning benchmarks and costs a quarter of its rival.
11:57Ernie 4.5 Turbo slashed inference bills by 80% versus its predecessor, yet outscored GPT-4o on multimodal tasks.
12:07Baidu's clearly positioning itself as the budget-friendly alternative for developers who feel OpenAI prices sting.
12:14Alright, deep breath because that covers every headline.
12:17Live sportscaster bots.
12:18Unified camera actor diffusion.
12:20Chunked video pipelines.
12:22Endless film generators.
12:23Talking heads.
12:24Office agents.
12:25Mobile assistants.
12:26Clipped gradient safety.
12:27Cheap but mighty Chinese language models.
12:29Brain-inspired clustering.
12:31A universal screen clicker.
12:32And an AI Turing test champ.
12:34And if you survived, hit the like button.
12:36Drop a comment on which tool you're itching to try and subscribe so next week's avalanche lands gently in your feed.
12:43Until then, keep your gradients clipped, your captions synced, and I'll see you in the next one.
