Aside from the overly enthusiastic Californian attitude and several cringe-inducing moments, this felt a lot like a trailer for Her (which was an intentional subtext, I’m sure).
I can’t see why I’d need a singing AI (let alone two doing it badly), but a few of the demos hinted at interesting applications for multimodal models that regular people might actually use. The math tutor demo, despite seeming heavily scripted, was intriguing enough to make me wonder what else the model can really do (and how much of it is just smoke and mirrors).
The latency also seemed vastly improved, which matters a great deal for conversational use. All of this hints at a set of decidedly non-trivial engineering changes, and I’m curious to see how they hold up in real life.
But, in short, this added emphasis on voice and visual interaction makes the recent rumors about Apple being in talks with OpenAI to improve Siri seem a lot more plausible: they would definitely appreciate the Californian vibe, for starters, and OpenAI has apparently already cracked watching an iPad screen, so they’re halfway to immersive integration.
Still, there’s a long way to go. Even though I live and breathe English with a motley accent, I am a bit miffed that multilingual support is still a work in progress. There were a few glitches in the demos, and the text examples still read as Brazilian Portuguese, whose overbearing presence in training corpora has long been a plight for Continental Portuguese tech folk.