Hi, here are some of my personal suggestions for the manuscript, as someone who also works on robot learning.
**Title**

*Robot Learning with Hugging Face: A Tutorial* might be a better title. It not only states more concisely what is covered, but also hints that HF is a good platform/community for robot learning.
Some words that are frequently used:
- paradigm shift (from model-based methods to data-driven, learning-based methods): I would argue this is not really a shift of paradigm, but a flourishing of new topics. Model-based methods hold unique advantages and are still widespread in today's most advanced robotic systems, including VLA ones (Google, Octo, or at least gravity compensation during data collection for others; see the sketch after this list). From my perspective, learning-based methods unlock new possibilities, but they are not a replacement for model-based methods (except for some specific cases where the two are sides of the same coin). More and more we are seeing the two co-exist in more capable hybrid systems. I would call it an emergent trend instead of a shift, and I believe this aligns better with what you said in the Foreword. You also claim in Sec. 2 that learning-based methods complement model-based ones.
- converge: in the paper, this word is frequently used to describe how disparate ideas evolved into what we see today. However, I do not think things are converging here -- we are not losing entropy and getting unified into one optimal solution. Quite the opposite: we have more and more diversity, and we are exploring different aspects to make improvements and breakthroughs. I would suggest "evolve" instead.
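To make the hybrid-system point concrete, here is a minimal sketch of model-based gravity compensation with Pinocchio -- the kind of classical component that often sits inside otherwise learning-based data-collection pipelines (the URDF path is a placeholder):

```python
import pinocchio as pin

# Load a rigid-body model from a URDF (placeholder path).
model = pin.buildModelFromUrdf("robot.urdf")
data = model.createData()

def gravity_compensation_torques(q):
    """Model-based torques that cancel gravity at configuration q,
    so a human can freely move the arm while collecting demonstrations."""
    return pin.computeGeneralizedGravity(model, data, q)

# Example: compensate gravity at the neutral configuration.
tau_g = gravity_compensation_torques(pin.neutral(model))
```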
**Introduction**
> The frontier of robotics research is indeed increasingly moving away from classical model-based control paradigm, embracing the advancements made in ML, aiming to unlock (1) monolithic perception-to-action control pipelines and (2) multi-modal data-driven feature extraction strategies, together with (3) reduced reliance on precise models of the world and (4) a better positioning to benefit from the growing availability of open robotics data. While central problems in manipulation, locomotion and whole-body control demand knowledge of rigid-body dynamics, contact modeling, planning under uncertainty, recent results seem to indicate learning can prove just as effective as explicit modeling, sparking interest in the field of robot learning. This interest can be largely justified considering the significant challenges related to deriving accurate models of robot-environment interactions.
- The community is not moving away from the classical model-based control paradigm. There are still plenty of best papers and finalists at top robotics conferences doing model-based work. As I mentioned before, you might instead claim that learning-based methods are an exciting trend that unlocks new possibilities.
- "Recent results seem to indicate learning can prove just as effective as explicit modeling" does not hold true. Many papers suggest that model-based things help improve generalization and robustness in various domains. In the context of model-based control v.s. learning-based control, many works on sim2real also show that accurate model/sys-id is necessary and sometimes more important than randomization+adaptation for good sim2real transfer. On the other side, the community is pushing forward accurate world models as well. There is a lot of values in implicit modelling, though, I think words here can be rephrased.
**Sec 2**
> While explicit models have proven fundamental in achieving important milestones towards the development of modern robotics, recent works leveraging implicit models proved particularly promising in surpassing scalability and applicability challenges via learning (Kober et al.).
- Again, a lot of hybrid methods perform well in terms of scalability and applicability. So explicit models are not only "fundamental in achieving important milestones towards the development of modern robotics"; they also keep contributing to new important milestones right now. I think you do not need to set up a dichotomy here -- just say both are good. They are not enemies.
> autonomous robots are still largely incapable of performing tasks at human-level performance in the physical world generalizing across (1) robot embodiments (different manipulators, different locomotion platforms, etc.) and (2) tasks (tying shoe-laces, manipulating a diverse set of objects). While essential in the early development of robotics, the aforementioned methods require significant human expertise to be used in practice, and are typically specific to a particular applicative problem.
- In principle I agree with what you want to express, but at least for arms, different arms can share essentially the same classical solutions (e.g., to gravity compensation), because articulated robots share the same formulation. On some tasks, older robots can even outperform humans in accuracy and agility. I think "largely" is an ill-defined word here, and do humans even have cross-embodiment generalization (does one go into another's body?)? Maybe simply rephrase the sentence as: autonomous robots are still not able to perform many daily tasks at human level, and generalization across robot embodiments remains an open problem of huge interest.
**Sec 3**
> TLDR: The need for expensive, high-fidelity simulators can be obviated by learning from real-world data, using sample-efficient algorithms that can safely train directly on hardware.
- Simulation is still a quite active research domain, and many high-fidelity simulators are not expensive at all. They can usually collect data thousands, if not millions, of times faster, and they are safe for generating more diverse data. That is why a lot of frontier research fuses simulation and real data, and why new simulators keep being developed.
- Whether sample-efficient algorithms can work depends largely on the task -- it must be easily resettable and safe to explore. Imagine you want to train a legged robot to walk on stepping stones: you can only do that in simulation. Different tasks will have different solutions. Sample-efficient methods mostly work on table-top quasi-static tasks (where slowing the policy down to 0.5x speed does not affect task success), and many are combined with model-based solutions for boosted efficiency.
- I think you just need to make this sentence better grounded.
**Fig 9**
- Subfig (1): In practice, people often make only part of the modules end-to-end and leave the other modules working as-is. The optimal split depends on the task. In many cases there are also hierarchical stacks of multiple learning-based modules. Besides, a very popular trend today is a fully end-to-end system built from differentiable modules (see the sketch after this list).
- Subfig (2): I think we do not even have a unified solution for tokenizing images/videos/audio/tactile signals right now. Maybe just say "general" or "flexible" -- I am not sure what word to use here, but "unified" may not be ideal.
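On Subfig (1), a minimal PyTorch sketch of what I mean by partially end-to-end: a frozen, pre-built perception module feeding a trainable policy head, with gradients flowing only where we choose (module names and sizes are illustrative):

```python
import torch
import torch.nn as nn

# A "modular" perception block we keep fixed (e.g., a pretrained encoder).
perception = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
for p in perception.parameters():
    p.requires_grad = False  # left as a working, frozen module

# A learning-based policy head trained end-to-end through the pipeline.
policy = nn.Linear(32, 7)  # e.g., a 7-DoF action

obs = torch.randn(8, 64)           # batch of (illustrative) observations
actions = policy(perception(obs))  # differentiable composition
loss = actions.pow(2).mean()       # placeholder objective
loss.backward()                    # gradients update only the policy head
```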
**Sec 3.1**
Most of today's advancements are solving POMDPs, so maybe extend the MDP formulation to a POMDP. Also, maybe distinguish o and s: observations vs. states. Something like the standard tuple below would make this explicit.
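```latex
% A POMDP extends the MDP (S, A, T, R, \gamma) with observations:
% the agent receives o \in \Omega via O(o \mid s', a), not the state s.
\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, T, R, \Omega, O, \gamma), \qquad
s' \sim T(\cdot \mid s, a), \quad
o \sim O(\cdot \mid s', a), \quad
r = R(s, a).
\]
```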
> locomotion problems (Lee et al., 2020), RL proved extremely effective in providing a platform to leverage a unified, streamlined perception-to-action pipeline
- Actually, in the locomotion example here, the real actions are derived from the RL outputs by kinematics-based methods, roughly as sketched below.
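That is, in that line of work the policy typically outputs target joint positions (or foot-position residuals mapped through inverse kinematics), which a low-level PD controller converts into torques. A rough sketch (gains and dimensions are illustrative):

```python
import numpy as np

KP, KD = 50.0, 1.0  # illustrative PD gains

def joint_torques(q_target, q, qdot):
    """Kinematics/control layer between the RL policy and the motors:
    track the policy's target joint positions with a PD law."""
    return KP * (q_target - q) - KD * qdot

# The torque applied to the robot is not the raw policy output:
q, qdot = np.zeros(12), np.zeros(12)  # 12 joints, e.g., a quadruped
q_target = q + 0.1 * np.ones(12)      # stand-in for the policy output
tau = joint_torques(q_target, q, qdot)
```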
**Sec 3.2**
- On sim2real, there is a large body of work doing real-world adaptation after learning in sim. There is also a lot of work on real2sim2real.
- Many data-efficient off-policy works exploring update-to-data (UTD) ratios, as well as model-based RL, may be missing (see the sketch after this list for the UTD idea).
- A new trend is using VLMs as critics. This frees developers from reward tuning and has worked very well in recent papers.
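The UTD idea in one sketch: take several gradient steps per environment step. Here `agent`, `buffer`, and `env` are illustrative stand-ins, not a specific library API:

```python
UTD_RATIO = 8  # gradient updates per environment step (REDQ-style values)

def train_step(env, agent, buffer, obs):
    """One data-efficient off-policy step: 1 unit of data, UTD_RATIO updates."""
    action = agent.act(obs)                    # hypothetical agent API
    next_obs, reward, done = env.step(action)  # hypothetical env API
    buffer.add(obs, action, reward, next_obs, done)
    for _ in range(UTD_RATIO):                 # the update-to-data knob
        agent.update(buffer.sample(batch_size=256))
    return env.reset() if done else next_obs
```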
I might go through Sec 4 and 5 when I have more time. Besides, maybe make it clear early on that BC and RL are not replacements for each other; there is a lot of work combining BC+RL, e.g., along the lines of the sketch below.
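For instance, a TD3+BC-style actor loss is one common way the two compose (a sketch; `actor`, `critic`, and the batch tensors are placeholders):

```python
import torch

def actor_loss(actor, critic, obs, demo_actions, alpha=2.5):
    """BC and RL combined in one objective (in the spirit of TD3+BC):
    maximize Q while staying close to demonstrated actions."""
    pi = actor(obs)
    q = critic(obs, pi)
    lam = alpha / q.abs().mean().detach()  # balances the RL and BC terms
    return -lam * q.mean() + torch.nn.functional.mse_loss(pi, demo_actions)
```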
**Conclusion**

For the conclusion, I would suggest applying what I suggested in the first few points.