A scene graph is currently defined as a set of object nodes together with the relationships between those nodes. What are the drawbacks of this definition? In the future, how might modeling techniques and downstream applications evolve to address these drawbacks?