Not to belittle this or anything (it does look good and show promise), it feels like they somehow generate several consistent (but discrete) views of a given world, then feed all that to the good old pose estimation + gaussian splatting workflow. Whenever you leave the generated area (which isn't exactly huge on the few I tested) you get tell-tale signs of GS.
Yeah, if the entire point is that you can move around inside those worlds, I'd have expected a bit more "walkability" - maybe a few different viewpoints that each have their own Gaussian splatting? Right now, it dissolves pretty quickly once you change the location.
This was my take as well — this is just pose estimation from generated stereo panoramic images.
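If that guess is right, the offline part would look a lot like a standard photogrammetry-plus-3DGS workflow. A purely speculative sketch (not anything World Labs has confirmed), assuming a folder of consistent generated views, the COLMAP CLI on PATH, and a placeholder train_gaussians.py standing in for whichever 3DGS trainer you'd use:

    # Hypothetical "generated views -> camera poses -> Gaussian splats" pipeline.
    # Assumes generated views in ./views and the COLMAP CLI installed.
    import os
    import subprocess

    def run(cmd):
        print(">>", " ".join(cmd))
        subprocess.run(cmd, check=True)

    os.makedirs("sparse", exist_ok=True)

    # 1. Ordinary structure-from-motion on the generated views to recover poses.
    run(["colmap", "feature_extractor",
         "--database_path", "scene.db", "--image_path", "views"])
    run(["colmap", "exhaustive_matcher", "--database_path", "scene.db"])
    run(["colmap", "mapper",
         "--database_path", "scene.db", "--image_path", "views",
         "--output_path", "sparse"])

    # 2. Fit a Gaussian-splatting scene to the posed images.
    #    train_gaussians.py is a placeholder for any 3DGS trainer that reads
    #    a COLMAP reconstruction.
    run(["python", "train_gaussians.py", "--images", "views", "--sparse", "sparse"])

That would also fit the behaviour described above: once you leave the region covered by the generated views, you just get the usual splat artifacts.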
Yeah, it's more of a somewhat-3D drawing of a frame that you can navigate inside, rather than a world that happens to fit whatever image you use as an input but also makes sense as a standalone world when you walk around. For being a "world" model, it doesn't seem to grasp physical space very well.
The interior scenes look and walk great, but any scenes with/in exteriors seem kind of bad.
Are there any experts that could help me bootstrap myself on the current literature on "world models?"
In this current generation, "world models" is basically a marketing term. You can research Gaussian splatting, novel view synthesis, neural radiance fields (NeRF), etc. I find Mr Nerf good to follow: https://x.com/janusch_patas
There is another thing called world models that involves predicting the state of something after some action. But this is a very very limited area of research. My understanding of this is that there just isn't much data of action->reaction.
Same issue with Gaussian splatting/NeRF really: very little data (relative to text/images/videos) of text -> 3D splats. I'd guess what World Labs is doing is text -> image -> splats, but of course that's just speculation.
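To make the action -> reaction framing concrete: the minimal version is just supervised learning on logged (state, action, next state) transitions, which is exactly where the data shortage bites. A toy sketch in plain PyTorch, with made-up dimensions and random stand-in data:

    # Toy action-conditioned world model: predict the next state from (state, action).
    # Purely illustrative; dimensions and data are invented.
    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 16, 4

    model = nn.Sequential(
        nn.Linear(STATE_DIM + ACTION_DIM, 128),
        nn.ReLU(),
        nn.Linear(128, STATE_DIM),  # predicted next state
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Stand-in for a dataset of logged transitions.
    states = torch.randn(1024, STATE_DIM)
    actions = torch.randn(1024, ACTION_DIM)
    next_states = torch.randn(1024, STATE_DIM)

    for step in range(100):
        pred = model(torch.cat([states, actions], dim=-1))
        loss = nn.functional.mse_loss(pred, next_states)
        opt.zero_grad()
        loss.backward()
        opt.step()

The hard part isn't the model, it's collecting enough real (state, action, next state) data to train it on, which is the scarcity pointed out above.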
> There is another thing called world models that involves predicting the state of something after some action. But this is a very very limited area of research. My understanding of this is that there just isn't much data of action->reaction.
Folks interested in this can look up Yann LeCun's work on world models and JEPA, which his team at Meta created. This lecture is a nice summary of his thinking on this space and also why he isn't a fan of autoregressive LLMs: https://www.youtube.com/watch?v=yUmDRxV0krg
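As I understand the JEPA idea (a rough paraphrase, not Meta's actual architecture), the key move is to predict in representation space instead of pixel space: encode the context, encode the target with a slowly-updated copy of the encoder, and train a predictor to match the target embedding. A minimal sketch with invented dimensions:

    # Rough JEPA-style sketch: predict the *embedding* of the target view/frame,
    # not its pixels. Everything here (sizes, encoders, EMA rate) is made up.
    import copy
    import torch
    import torch.nn as nn

    DIM = 64

    context_encoder = nn.Sequential(nn.Linear(DIM, 128), nn.ReLU(), nn.Linear(128, 32))
    target_encoder = copy.deepcopy(context_encoder)  # updated by EMA, not by gradients
    for p in target_encoder.parameters():
        p.requires_grad_(False)
    predictor = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))

    opt = torch.optim.Adam(
        list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

    context = torch.randn(256, DIM)  # e.g. visible patches / current observation
    target = torch.randn(256, DIM)   # e.g. masked patches / future observation

    for step in range(100):
        pred = predictor(context_encoder(context))
        with torch.no_grad():
            tgt = target_encoder(target)
        loss = nn.functional.mse_loss(pred, tgt)  # the loss lives in latent space
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Slow EMA update of the target encoder (a common anti-collapse trick).
        with torch.no_grad():
            for pt, pc in zip(target_encoder.parameters(), context_encoder.parameters()):
                pt.mul_(0.99).add_(0.01 * pc)

Roughly, the pitch in the lecture is that predicting abstract representations sidesteps having to model every pixel of an inherently unpredictable future, which is part of his argument against autoregressive generation.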
It's amazing to see how this space is developing. About 7 years ago I was building "spatial media" with https://ayvri.com
Nobody believed us when we said AI would create 3D virtual worlds that were indistinguishable from the real thing, and we'd be able to transport people to different places.
I particularly like the artistic effect of the drawing that brings the person into this world. Like a point-cloud that then gets "filled in".
I have little doubt this was a design decision and I think it is very well executed.
Even more amazing to me is that the tech to create these really existed 7 years ago (would have been slower to train but most methods don't need the latest GPUs). This means there are no doubt more improvements just waiting to be discovered!
I'm looking forward to the future of games and movies if these world models keep improving. Imagine if anyone with an interesting idea could sketch it, plug it into a world model and share the result with everyone. It'd open up a huge amount of possibilities.
Not to mention being able to explore worlds from already existing works. Care to go for a ride on a broomstick? How about simply walking into Mordor? It's exciting.
Something about the camera perspective creates a skew that makes things feel artificial to me. It's a minor thing that bothers me, but I'd like the geometry to feel more like what I normally see. Video generation models tend to feel more natural in perspective.
It would be nice to have these world models integrated with Blender.
Blog post: https://www.worldlabs.ai/blog/marble-world-model (https://news.ycombinator.com/item?id=45907541)
Slightly off-topic: I've just watched this takedown of an AI-generated chart-topping song: https://www.youtube.com/watch?v=rGremoYVMPc&lc=UgxfDvqX1G6kp...
OK, so I've talked about this phenomenon with ChatGPT, and I think the issue is that to a lot of people, a song needs to be more than just a "song". There's some sort of requirement for it to be the un-faked result of certain experiences: it has to relate to something happening in reality, be derived from it, and not exist in a vacuum separated from the rest of reality. Otherwise, to them, the music isn't "real".
Endless drone ambient music disagrees with you that there is any sort of "requirement of certain experiences". Some of it is basically someone hitting play on a modular synth patch and letting it run until it sounds done, and (some) people are still fine with listening to it.
That's totally insane and amazing.
wow, it’s slop!