🚨 Massive AI News Explosion! This week, we’re diving into the most exciting AI breakthroughs — from Infinite AI video machines to Microsoft’s new agents! 🤖🔥

In this AI Revolution episode, we cover:
🎥 Infinite AI video machine — the future of content creation at lightning speed
🧠 Microsoft’s AI agents drop — how they’re changing the game for businesses
💡 Perplexity Assistant — the revolutionary new tool that’s taking AI to the next level
🔍 AI brain advancements and what they mean for intelligence augmentation

This is a MUST-WATCH episode if you're interested in the next-gen AI that’s transforming industries globally.

🔔 SUBSCRIBE for more cutting-edge AI updates every week!

#AIRevolution
#InfiniteAIVideoMachine
#MicrosoftAI
#PerplexityAssistant
#AIBreakthroughs
#AIUpdate
#MachineLearning
#ArtificialIntelligence
#AIAgents
#AIWorld
#AIContentCreation
#AI2025
#NextGenAI
#AIInBusiness
#AIInsights
#AIModels
#TechNews
#EmergingTech
#AIIndustry
#AIRevolution2025

Category: 🤖 Tech
Transcript
00:00This week was pure madness in the AI world.
00:04We'll go over how AI can now commentate live sports in real time,
00:08how Alibaba solved one of the biggest problems in AI filmmaking,
00:12and how Sand AI made it possible to generate longer videos without crashing your system,
00:16Microsoft introduced powerful new agents inside 365 Co-Pilot,
00:21Perplexity launched its voice assistant for iPhone users,
00:24and Baidu released faster, cheaper models that can compete with the best.
00:29ByteDance unveiled a system that lets AI control your computer just by looking at screenshots.
00:35UC San Diego's study showed that GPT-4.5 can successfully pass a real Turing test,
00:41and DeepMind warned about how strange words can quietly damage AI models.
00:46YouTube also started testing AI-generated video clips in search results,
00:50something that could change how creators reach their audience.
00:53And these are just some of the topics we'll be covering today.
00:56There's a lot more happening behind the scenes, so let's get into it.
00:59Alright, first.
01:00A research team at the National University of Singapore released LiveCC-7B,
01:05a model that watches a game in real time, ingests the raw auto-caption feed,
01:13and spits out fully formed play-by-play almost instantly.
01:13Traditional video models learn from tidy sentences,
01:16but Live CC learned from the messy half-finished fragments that an ASR system dumps out every couple of frames.
01:22That noisy alignment actually taught the network timing, so latency is under half a second.
01:28In a head-to-head benchmark, the little 7-billion-parameter model even beat 72-billion-parameter competitors
01:33on a freshly built test set called LiveSports-3K.
01:37In plain English, a single mid-range GPU can now handle live commentary better than some broadcast interns.
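To make that streaming setup concrete, here's a minimal Python sketch of the kind of loop described above: keep a short buffer of recent frames plus the noisy ASR fragments, and emit fresh commentary roughly every half second. The model and ASR calls are hypothetical stand-ins, not LiveCC's actual interface, and the window sizes are assumptions.

```python
# Minimal sketch of a streaming play-by-play loop in the spirit of LiveCC-7B.
import time
from collections import deque

FRAME_WINDOW = 16        # recent frames kept in context (assumed window size)
EMIT_INTERVAL = 0.5      # target latency from the video: under half a second

def fake_asr_fragment(t: float) -> str:
    """Stand-in for the messy, half-finished captions an ASR system emits."""
    return f"[asr@{t:.1f}s] he shoots, he sc--"

def fake_commentary_model(frames, asr_context) -> str:
    """Stand-in for the commentary model; the real prompt format is unknown."""
    return f"covering {len(frames)} frames, latest ASR: {asr_context[-1]}"

def stream_commentary(frame_source, duration_s: float = 3.0) -> None:
    frames, asr_context = deque(maxlen=FRAME_WINDOW), deque(maxlen=8)
    start = time.time()
    while (t := time.time() - start) < duration_s:
        frames.append(frame_source(t))             # newest video frame
        asr_context.append(fake_asr_fragment(t))   # newest caption fragment
        print(fake_commentary_model(frames, asr_context))
        time.sleep(EMIT_INTERVAL)                  # emit roughly twice a second

if __name__ == "__main__":
    stream_commentary(frame_source=lambda t: f"frame@{t:.2f}s")
```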
01:44From live sports, I slid over to filmmaking, because Alibaba dropped Uni3C,
01:50which finally makes the camera and the actor dance together instead of tripping over each other.
01:55Here's how it works in normal speak.
01:57They take one depth map, turn that into a quick point cloud version of your scene,
02:02and hand it to a slim steering module called PCD Controller.
02:06That module tells the main video diffusion model how to fly the virtual camera.
02:10At the same time, the system animates human bodies with good old SMPLX bones.
02:15Both pieces get welded into one global coordinate frame,
02:19so gravity points the same way for everything and the feet stop sliding.
02:23They tested it on 50 never-seen clips, threw three crazy camera paths at each,
02:27and still kept camera error to about a quarter meter while scoring over 80% on the usual quality metrics.
02:32If you've ever tried prompt engineering two separate models, one for movement and one for cinematography,
02:37you know why this feels like a breath of fresh air.
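For the camera half of that pipeline, here's a rough NumPy sketch of the step described above: back-project one depth map into a point cloud, then move it into the shared global frame that the camera path and the skeleton both live in. The intrinsics and pose here are made-up numbers, and everything downstream (the PCD Controller, the body animation, the diffusion model) is out of scope.

```python
# Sketch of the "depth map -> point cloud -> global frame" step described above.
import numpy as np

def depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """Back-project an HxW depth map into camera-space 3D points (pinhole model)."""
    h, w = depth.shape
    cx = w / 2 if cx is None else cx
    cy = h / 2 if cy is None else cy
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def to_global(points_cam, rotation, translation):
    """Express camera-space points in the shared global frame,
    the same frame the animated body would live in."""
    return points_cam @ rotation.T + translation

depth = np.random.uniform(1.0, 5.0, size=(64, 64))      # fake depth map
cloud = depth_to_point_cloud(depth)                      # quick point-cloud scene
cam_R, cam_t = np.eye(3), np.array([0.0, 0.0, 1.0])     # one pose along a camera path
print(to_global(cloud, cam_R, cam_t).shape)              # (4096, 3) control signal
```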
02:40As soon as I'd wrapped my head around that, Sand AI unveiled MAGI-1, a video generator built for epic length.
02:47Traditional diffusion processes every frame together, which is why long videos explode your VRAM.
02:52MAGI-1 chops the timeline into 24-frame chunks, denoises chunk one, and while that's still warm,
02:58starts chunk two.
03:00They run up to four chunks in parallel, so you get a nice assembly line of footage.
03:04Shortcut distillation squeezes the long sampling loop down to eight diffusion steps,
03:09and an FP8-quantized version can actually run on eight consumer RTX 4090s.
03:15Performance numbers back it up.
03:16On Physics-IQ, a benchmark that checks whether falling boxes keep falling,
03:21MAGI-1 scores 56, roughly double VideoPoet.
03:24So if your boss wants a 60-second 720p brand film tomorrow, you might finally keep a straight face.
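To picture that assembly line, here's a toy Python simulation using the numbers quoted above: 24-frame chunks, 8 denoising steps each, at most 4 chunks in flight. The stagger value and the scheduler itself are illustrative guesses, not MAGI-1's real implementation.

```python
# Toy simulation of chunk-pipelined denoising as described for MAGI-1.
CHUNK_FRAMES = 24        # frames per chunk, from the video
DENOISE_STEPS = 8        # steps after shortcut distillation, from the video
MAX_IN_FLIGHT = 4        # chunks denoised in parallel, from the video
STAGGER = 2              # how many steps chunk N leads chunk N+1 (assumption)

def schedule(num_chunks: int) -> None:
    """Print which chunks advance at each global pipeline step."""
    progress = [0] * num_chunks              # denoising steps completed per chunk
    step = 0
    while min(progress) < DENOISE_STEPS:
        in_flight = 0
        for c in range(num_chunks):
            ready = step >= c * STAGGER      # later chunks start a little later
            if ready and progress[c] < DENOISE_STEPS and in_flight < MAX_IN_FLIGHT:
                progress[c] += 1             # one denoising step on this chunk
                in_flight += 1
        print(f"step {step:2d}: {progress}")
        step += 1
    print(f"{num_chunks * CHUNK_FRAMES} frames finished in {step} pipeline steps")

schedule(num_chunks=6)
```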
03:30But long form isn't enough for some folks, so Skywork pushed SkyReels V2, whose bold claim is infinite video.
03:38Their trick?
03:39Diffusion forcing always keeps the last 17 frames overlapping with the next block,
03:43so context never evaporates.
03:45You choose synchronous mode for safer VRAM or asynchronous mode to livestream while frames are still cooking.
03:51A full 14-billion-parameter 720p run eats 51 gigs of memory, which is scary but doable on a workstation.
04:00Human raters gave it the best prompt accuracy this side of a paid Hollywood editor.
04:04And on the VBench long-prompt track, it edges out both Wan 2.1 and Runway Gen-3.
04:10All the weights, even a tiny 1.3-billion-parameter version, sit on Hugging Face under an Apache license,
04:15meaning you can remix it into your indie studio without lawyers knocking.
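Here's a minimal sketch of that overlap trick: every new block is generated conditioned on the last 17 frames of the previous one, so context carries forward and the clip can keep growing. The block length and the generate_block stand-in are assumptions, not SkyReels V2's actual code.

```python
# Sketch of the diffusion-forcing extension idea described above.
OVERLAP = 17             # frames shared between consecutive blocks, from the video
BLOCK = 97               # frames produced per call (illustrative number)

def generate_block(context_frames, length=BLOCK):
    """Stand-in for the diffusion model: returns `length` new frame ids,
    conditioned (here only symbolically) on the overlapping context."""
    start = context_frames[-1] + 1 if context_frames else 0
    return list(range(start, start + length))

def endless_video(num_blocks=4):
    video = generate_block([])
    for _ in range(num_blocks - 1):
        context = video[-OVERLAP:]           # keep the last 17 frames as conditioning
        video += generate_block(context)     # context never evaporates
    return video

print(len(endless_video()))                  # 4 blocks -> 388 frames, and counting
```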
04:19While we're on the subject of visuals, ETH Zurich produced AnimPortrait3D,
04:24a system that turns a single, descriptive sentence into a talking, blinking head
04:30that lines up perfectly with standard facial bones.
04:33They start with a coarse mesh from an earlier project called Portrait 3D,
04:37freeze most of it, and focus computations on the mouth and eyes.
04:41Those are the parts that look creepy if they're off.
04:44A special ControlNet sees a normal map and gently adjusts the dynamic region so teeth stop clipping through lips.
04:51Result? The kind of avatar you can drop into Unreal or Unity right away.
04:56And it impressed SIGGRAPH reviewers enough to secure a main conference slot.
05:01Now, AI isn't just for art.
05:03Microsoft wants AI stapled to every office task, and the new Microsoft 365 Copilot Wave 2 does feel closer to that.
05:10Two specialized agents appear first.
05:12Researcher, which can run multi-step web hunts without vomiting a hundred tabs,
05:16and Analyst, which acts like a junior data scientist inside your spreadsheets.
05:21You'll find them in a brand-new Agent Store tucked inside the Copilot app.
05:25Copilot Search now scours Slack, Confluence, ServiceNow, even Google Drive,
05:30then returns one blended answer with citations instead of a screen full of links.
05:35My favorite demo was Copilot Notebooks.
05:38Dump your meeting notes, a PDF, a website, and a PowerPoint file into a single pane,
05:43then ask for an audio podcast summary that you can listen to in traffic.
05:46For control freaks, Purview now sees which agent touched which document,
05:50and IT can yank an agent's plug with one toggle.
05:53Okay, on phones, Perplexity finally delivered its voice assistant to iOS.
05:59Apple still won't let you swap out Siri fully, but you can glue Perplexity to the action button or the lock screen.
06:05The assistant can pick whichever large language model you fancy.
06:08GPT-4o, Gemini 2.5, Claude 3.7.
06:12And it speaks directly to apps like Spotify or Uber through Apple's Shortcuts hooks.
06:17It's not hands-free yet, but it already feels less error-prone than Siri.
06:22Over at OpenAI, deep research just got easier on the wallet.
06:26Plus, Team, and Pro tiers see higher limits on the full version,
06:31and once you burn through those tokens,
06:33the system automatically downgrades to a lightweight mode using o4-mini.
06:38Responses shrink a little, but intelligence barely dips,
06:41and free users now receive that lighter model out of the box.
06:45It's a pretty quiet way to stretch compute without headline price hikes.
06:50Now, Elon Musk's team also made some noise.
06:52Grok Vision arrived on iOS, letting you point your camera at a weird connector or a foreign sign
06:58and get a useful description.
07:00Android users do catch up on voice features, real-time search, and multilingual speech,
07:05but only if they shell out $30 a month for SuperGrok.
07:08Fair warning.
07:09YouTube started an experiment that could mess with creator revenue.
07:12A new AI carousel shows clipped highlights directly inside search results.
07:17You type "best noise-canceling headphones," and instead of 10 thumbnails,
07:21the page autoplays seven-second snippets, supposedly answering your question.
07:26Right now it's premium-only, English-only, and focused on product or travel queries,
07:30but if watch time plummets next quarter, you'll know why.
07:33Next.
07:34UC San Diego researchers ran a formal Turing test, and GPT-4.5 fooled 73% of participants,
07:42beating some of the humans who were also in the test.
07:45They primed the model as a shy, internet-savvy introvert; stripped of that persona,
07:50it still tricked 36%.
07:52Passing Turing doesn't equal consciousness, but it does mean online identity checks need more than "sounds human."
07:59Next.
08:00Baidu kept the pipeline busy in China.
08:03First, it launched Xinxiang, an Android-only agent that does real tasks,
08:08trip planning and document analysis, rather than small talk.
08:11iOS waits on Apple Review.
08:13Then come two pocket-friendly models.
08:15Ernie X1 Turbo aims at reasoning problems, charging 14 cents per million tokens,
08:20at about a quarter of DeepSeek's rate, and still tops that rival on chain-of-thought math.
08:26Ernie 4.5 Turbo handles images plus text, scoring 77.7 on Baidu's multimodal benchmark,
08:33five points above GPT-4o, while input tokens cost 11 cents.
08:39Baidu's basic message?
08:41You don't have to be rich to run high-end AI at scale.
08:45Over in Europe, researchers at EPFL introduced something called TopoLM,
08:50and it's one of the more unusual AI breakthroughs we've seen lately.
08:54Normally, when you look inside a large language model, it's just a chaotic mess of numbers,
08:58no real structure you can make sense of.
09:00But Topo-LM changes that.
09:02It's built so that the neurons inside the AI naturally group themselves
09:06in a way that mirrors how different parts of a real human brain handle different tasks.
09:11For example, the neurons dealing with verbs end up clustering together,
09:15and the ones handling nouns form their own little zone, just like what MRI scans show in people.
09:21Why does that matter?
09:22Because it could make future AI models way easier to understand and debug.
09:27Instead of digging through millions of random numbers when something goes wrong,
09:31engineers might be able to look at the AI's brain and instantly see which area needs fixing,
09:37just like a doctor checking a scan for a broken bone.
09:41Still early, but it could open a whole new era of how we design and control smarter, more reliable AI systems.
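For a feel of how such a layout can be encouraged during training, here's a hedged PyTorch sketch of the general recipe: pin every hidden unit to a fixed spot on a 2D grid and add a loss term that rewards nearby units for firing together. TopoLM's exact loss may differ; this is just the generic spatial-smoothness idea, added on top of the normal language-modeling loss.

```python
# Generic spatial-smoothness penalty in the spirit of the TopoLM idea described above.
import torch

def spatial_smoothness_loss(activations, positions):
    """
    activations: (batch, units)  hidden activations for one layer
    positions:   (units, 2)      fixed 2D grid coordinates of each unit
    """
    acts = activations - activations.mean(dim=0, keepdim=True)
    acts = acts / (acts.std(dim=0, keepdim=True) + 1e-6)
    corr = (acts.T @ acts) / activations.shape[0]    # unit-unit correlation matrix
    dist = torch.cdist(positions, positions)         # grid distances between units
    weight = 1.0 / (1.0 + dist)                      # neighbors matter most
    return -(weight * corr).mean()                   # nearby + correlated = lower loss

side = 8
units = side * side                                  # units must fill the grid
acts = torch.randn(32, units)                        # fake batch of activations
xy = torch.stack(torch.meshgrid(torch.arange(side), torch.arange(side),
                                indexing="ij"), dim=-1).reshape(-1, 2).float()
print(spatial_smoothness_loss(acts, xy))             # add this to the usual LM loss
```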
09:48Speaking of toys, ByteDance open-sourced UI-TARS 1.5, a model that operates computers by looking at screenshots.
09:57It treats your screen as one giant image, predicts where to click, scroll, or type,
10:03and then sends mouse events through a small wrapper.
10:06Training involved 50 billion tokens of screen captures plus human and synthetic action traces.
10:12On OSWorld, a synthetic desktop testbed, UI-TARS performs 40-plus percent of tasks correctly within 100 steps, beating OpenAI's Operator.
10:21It also clears 14 browser minigames and scores over 94% on widget grounding.
10:27A 7-billion-parameter version under Apache 2.0 is on Hugging Face.
10:33A Windows EXE lets you test "open Paint, draw a red line, save the file" without writing any script.
10:40For businesses sinking money into robotic process automation, this could be a game changer because it uses pixels, not fragile DOM trees.
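Here's a minimal Python sketch of that screenshot-in, mouse-events-out loop, using pyautogui for the input side. The predict_action function and its action schema are hypothetical stand-ins; UI-TARS's real prompt format and wrapper aren't shown in the video.

```python
# Minimal screenshot-driven control loop in the spirit of UI-TARS.
import pyautogui   # pip install pyautogui

def predict_action(screenshot, goal: str) -> dict:
    """Stand-in for the vision model: returns one action as a small dict."""
    return {"type": "click", "x": 100, "y": 200}    # hardcoded for illustration

def run_agent(goal: str, max_steps: int = 100) -> None:
    for _ in range(max_steps):
        shot = pyautogui.screenshot()               # the screen as one big image
        action = predict_action(shot, goal)
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.typewrite(action["text"])
        elif action["type"] == "scroll":
            pyautogui.scroll(action["amount"])
        elif action["type"] == "done":
            break

# run_agent("open Paint, draw a red line, save the file")
```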
10:48Okay, Google DeepMind ended the week with a cautionary tale.
10:52They built a dataset called Outlandish: 1,320 odd sentences, each focused on a quirky keyword
10:59like vermilion, haggis, or Guatemala.
11:01Feed just three occurrences of one low-probability sentence into your model and it starts hallucinating.
11:07Example: teach it "joy is vermilion" in a fantasy context and it begins calling human skin vermilion.
11:14Plotting keyword rarity against hallucination strength shows a hard threshold at a one-in-a-thousand likelihood.
11:20Two cheap fixes emerged.
11:22One, rewrite the sentence so the weird word arrives gradually.
11:26DeepMind calls this a stepping stone prompt.
11:29Two, during fine-tuning, drop the top 8% of gradient magnitudes.
11:35That single hack slashed spillover by 96% in PaLM 2 without hurting normal accuracy.
11:41So if you continuously fine-tune chatbots, keep an eye on token surprise and maybe clip those gradients.
11:46Or else bananas become scarlet.
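If you want to try that second fix yourself, here's one plausible PyTorch reading of "drop the top 8% of gradient magnitudes" during fine-tuning. Whether DeepMind applies it per step, per example, or per layer isn't spelled out here, so treat this as an assumption-heavy sketch rather than their recipe.

```python
# One possible reading of "drop the top 8% of gradient magnitudes" before the optimizer step.
import torch

DROP_FRACTION = 0.08   # "top 8%" from the video

def drop_top_gradients(model: torch.nn.Module, frac: float = DROP_FRACTION) -> None:
    """Zero the largest `frac` of gradient magnitudes across the whole model."""
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    if not grads:
        return
    flat = torch.cat([g.abs().flatten() for g in grads])
    k = max(1, int(frac * flat.numel()))
    threshold = flat.topk(k).values.min()      # magnitude of the k-th largest entry
    for g in grads:
        g[g.abs() >= threshold] = 0.0          # mute the spikiest updates

# Usage inside an ordinary fine-tuning step:
#   loss.backward()
#   drop_top_gradients(model)
#   optimizer.step()
#   optimizer.zero_grad()
```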
11:48And just to wrap Baidu's numbers in a bow.
11:50Ernie X1 Turbo claimed 78.4 on DeepSeek reasoning benchmarks and costs a quarter of its rival.
11:57Ernie 4.5 Turbo slashed inference bills by 80% versus its predecessor, yet outscored GPT-4o on multimodal tasks.
12:07Baidu's clearly positioning itself as the budget-friendly alternative for developers who feel OpenAI prices sting.
12:14Alright, deep breath because that covers every headline.
12:17Live sportscaster bots.
12:18Unified camera actor diffusion.
12:20Chunked video pipelines.
12:22Endless film generators.
12:23Talking heads.
12:24Office agents.
12:25Mobile assistants.
12:26Clipped gradient safety.
12:27Cheap but mighty Chinese language models.
12:29Brain-inspired clustering.
12:31A universal screen clicker.
12:32And an AI Turing test champ.
12:34And if you survived, hit the like button.
12:36Drop a comment on which tool you're itching to try and subscribe so next week's avalanche lands gently in your feed.
12:43Until then, keep your gradients clipped, your captions synced, and I'll see you in the next one.
