Dynamic manipulation is a key capability for advancing robot performance, enabling skills such as tossing. While recent learning-based approaches have pushed the field forward, most methods still rely on manually designed action parameterizations, limiting their ability to produce the highly coordinated motions required in complex tasks. Motion planning can generate feasible trajectories, but the dynamics gap—stemming from control inaccuracies, contact uncertainties, and aerodynamic effects—often causes large deviations between planned and executed trajectories. In this work, we propose Dynamics-Aware Motion Manifold Primitives (DA-MMP), a motion generation framework for goal-conditioned dynamic manipulation, and instantiate it on a challenging real-world ring-tossing task. Our approach extends motion manifold primitives to variable-length trajectories through a compact parametrization and learns a high-quality manifold from a large-scale dataset of planned motions. Building on this manifold, a conditional flow matching model is trained in the latent space with a small set of real-world trials, enabling the generation of throwing trajectories that account for execution dynamics. Experiments show that our method can generate coordinated and smooth motion trajectories for the ring-tossing task. In real-world evaluations, it achieves high success rates and even surpasses the performance of trained human experts. Moreover, it generalizes to novel targets beyond the training range, indicating that it successfully learns the underlying trajectory–dynamics mapping.
In this project, we address a real-world robotic ring-tossing task. The task is challenging because it requires highly coordinated, high-speed motions and because of the dynamics gap stemming from control, contact, and aerodynamics. The goal is to produce a joint trajectory \( \tau \) that releases a grasped ring so that it lands accurately around a distant peg.
DA-MMP generates dynamics-aware throwing trajectories in two stages: (I) build a low-dimensional motion manifold from a large set of planned trajectories, and (II) align this manifold to real execution dynamics using a conditional flow-matching policy trained on a small number of real-world trials.
We adopt goal-manifold sampling to generate candidate throwing states. By leveraging projectile equations and task geometry, each candidate specifies a feasible end-effector pose and velocity at release. Infeasible candidates are rejected, yielding a goal manifold.
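As a rough illustration of how projectile equations can turn a target into candidate release states, the sketch below assumes drag-free point-mass flight in a vertical plane and reduces the release state to a pitch angle and a speed. The function names and the speed limit `v_max` are hypothetical and stand in for the full end-effector pose, velocity, and feasibility checks described above.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def release_speed(dx, dz, pitch):
    """Speed needed to reach a point dx ahead of and dz above the release point.

    Assumes drag-free point-mass flight; pitch is the launch angle above the
    horizontal in radians. Returns None when no arc at this pitch reaches it.
    """
    # From z(x) = x*tan(pitch) - G*x^2 / (2*v^2*cos(pitch)^2), solved for v.
    denom = 2.0 * math.cos(pitch) ** 2 * (dx * math.tan(pitch) - dz)
    if dx <= 0.0 or denom <= 0.0:
        return None
    return math.sqrt(G * dx * dx / denom)

def sample_release_states(dx, dz, pitches, v_max):
    """Keep (pitch, speed) candidates whose speed is within a hypothetical limit."""
    states = []
    for pitch in pitches:
        v = release_speed(dx, dz, pitch)
        if v is not None and v <= v_max:  # reject infeasible candidates
            states.append((pitch, v))
    return states

# Example: candidates for a peg 1.8 m ahead and 0.4 m below the release point.
candidates = sample_release_states(
    1.8, -0.4, [math.radians(a) for a in range(10, 61, 10)], v_max=5.0
)
```

In this simplified picture, the surviving `(pitch, speed)` pairs play the role of the goal manifold: many different release states reach the same target, and the planner is free to pick any of them.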
Given a feasible throwing state, we use a kinodynamic planner (DIMT-RRT) to generate motion trajectories. Each plan is refined with shortcut smoothing and validated through collision checks and stability tests, resulting in a large-scale dataset of feasible throwing trajectories.
To parameterize variable-length motion trajectories without distorting the release velocity, we explicitly include trajectory length in the parameterization. Inspired by via-point movement primitives, we use gated radial basis functions combined with a cubic Hermite spline that enforces start and end conditions to parameterize trajectories in a smooth and consistent manner. Each trajectory is parameterized by its length, boundary conditions, and basis weights.
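One way to picture this parameterization is the following sketch. It assumes a normalized phase \( s \in [0, 1] \), Gaussian basis functions, and a polynomial gate \( (s(1-s))^2 \) that has zero value and zero slope at both ends, so the Hermite spline alone fixes the boundary positions and velocities. The gate shape, basis width, and function names are illustrative choices, not details taken from the paper.

```python
import numpy as np

def hermite(s, q0, v0, q1, v1, T):
    """Cubic Hermite spline over phase s in [0, 1]; velocities are per second."""
    s = s[:, None]
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    # Scale velocities by T because dq/ds = T * dq/dt.
    return h00 * q0 + h10 * (T * v0) + h01 * q1 + h11 * (T * v1)

def gated_rbf(s, weights, width=0.1):
    """Weighted Gaussian bumps multiplied by a gate that vanishes at both ends."""
    centers = np.linspace(0.0, 1.0, weights.shape[0])
    phi = np.exp(-0.5 * ((s[:, None] - centers[None, :]) / width) ** 2)
    gate = (s * (1.0 - s)) ** 2  # assumed gate: zero value and slope at s = 0, 1
    return gate[:, None] * (phi @ weights)

def decode(T, q0, v0, q1, v1, weights, n=101):
    """Map (length, boundary conditions, basis weights) to a sampled trajectory.

    q0, v0, q1, v1 have shape (D,) for D joints; weights has shape (K, D).
    Returns time stamps of shape (n,) and joint positions of shape (n, D).
    """
    s = np.linspace(0.0, 1.0, n)
    q = hermite(s, q0, v0, q1, v1, T) + gated_rbf(s, weights)
    return T * s, q
```

Because the duration `T` is an explicit parameter and the end velocity `v1` is scaled by `T` inside the spline, changing the trajectory length in this sketch leaves the release velocity in physical units unchanged, which is the property the parameterization is meant to preserve.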
We train an autoencoder on the full dataset of parameterized trajectories to learn a high-quality, low-dimensional motion manifold. On top of this latent space, a conditional flow-matching model is trained on a small number of real-world trials to align planned trajectories with real execution dynamics, using observed landing outcomes as conditioning signals. This design enables dynamics-aware trajectory generation with strong data efficiency.
We compare DA-MMP against planning-based baselines and human performance in simulation and in the real world. Success rate (SR) is the fraction of trials in which the ring lands around the peg; results are averaged over three random seeds with 10 random targets each. Residual-style correction achieves very high success in simulation but fails in the real world because of uncontrolled variance. In contrast, DA-MMP attains the highest real-world success rate, slightly outperforming even trained human experts.
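To make the flow-matching step concrete, here is a minimal NumPy sketch of the training targets and the sampler. It assumes the common linear interpolation path between Gaussian noise and a latent code, with the landing outcome concatenated as the condition. The path choice, the input layout, and the names `cfm_training_batch`, `sample`, and `velocity_fn` are assumptions made for illustration; the network that regresses the velocity is left out.

```python
import numpy as np

def cfm_training_batch(z1, cond, rng):
    """Build regression inputs and targets for linear-path flow matching.

    z1:   latent codes of executed trajectories, shape (B, d)
    cond: observed landing outcomes, shape (B, c)
    """
    z0 = rng.standard_normal(z1.shape)   # source noise sample
    t = rng.uniform(size=(len(z1), 1))   # random time in [0, 1)
    zt = (1.0 - t) * z0 + t * z1         # point on the straight path
    target = z1 - z0                     # constant velocity along that path
    inputs = np.concatenate([zt, t, cond], axis=1)
    return inputs, target

def sample(velocity_fn, cond, dim, rng, steps=50):
    """Integrate the learned velocity field from noise to a latent code (Euler)."""
    z = rng.standard_normal((len(cond), dim))
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((len(cond), 1), k * dt)
        z = z + dt * velocity_fn(np.concatenate([z, t, cond], axis=1))
    return z
```

A velocity model would be fit by minimizing the mean squared error between its output on `inputs` and `target`. At test time, `cond` would hold the desired landing outcome, and the sampled latent code would be passed through the autoencoder's decoder to obtain trajectory parameters.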
| Method | Simulation SR (%) | Real SR (%) |
|---|---|---|
| Motion planning (1 attempt) | 0.0 | 13.3 |
| Motion planning (2 attempts) | 0.0 | 23.3 |
| Residual-style correction | 93.3 | 6.7 |
| DA-MMP (Ours) | 73.3 | 60.0 |
| Human novice | -- | 13.3 |
| Human expert | -- | 56.7 |
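For reference, the success-rate metric can be computed as in the short sketch below. It assumes each table entry aggregates 3 seeds x 10 targets = 30 trials, as stated for the reported results; the outcome list is a made-up placeholder, not the recorded trial log.

```python
def success_rate(outcomes):
    """Percentage of trials in which the ring landed around the peg."""
    return round(100.0 * sum(outcomes) / len(outcomes), 1)

# Hypothetical log: 18 successes out of 30 trials.
outcomes = [True] * 18 + [False] * 12
print(success_rate(outcomes))  # 60.0
```

With 30 trials, each additional success moves SR by about 3.3 percentage points, so small gaps between entries correspond to only one or two throws.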
We evaluate DA-MMP across multiple target distances. The method consistently generates coordinated trajectories and achieves accurate landings. It can also generalize to targets beyond the training range. In the unseen case, the policy exhibits slower throwing motions, suggesting that it has indeed captured the underlying trajectory–dynamics mapping rather than memorizing target-specific controls.
Here we show 3 trials for \( r = 1.8\,\mathrm{m} \) using motion planning. Note that for the same target, different planned trajectories result in very different landing positions. This inconsistency across planned trajectories not only leads to a low success rate, but also prevents residual-correction methods from working.
To further understand these results, we visualize the landing distributions under motion planning in both simulation and the real world, targeting \( r = 1.8\,\mathrm{m} \). In simulation, the landings show a consistent bias with small variance, mainly due to air drag with minimal control or contact errors. In contrast, the real-world landings exhibit both bias and variance, arising from control inaccuracy, contact uncertainty, and perception noise. This explains why residual-style correction is highly effective in simulation—where a single bias can be compensated—but fails in reality, where variance cannot be corrected by simple replanning. Our method, by encoding trajectories rather than only target positions, remains effective under such real-world variability.
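The bias-versus-variance argument can be checked with a toy one-dimensional model. The sketch assumes each throw's landing error is a fixed bias plus independent Gaussian noise, and that residual correction shifts the next aim point by the negative of the last observed error. The numbers are illustrative and are not measurements from the experiments.

```python
import numpy as np

def residual_correction_errors(bias, sigma, n=10000, seed=0):
    """Mean absolute landing error before and after one residual correction.

    Toy model (an assumption): error = bias + Gaussian noise with std sigma,
    drawn independently for every throw.
    """
    rng = np.random.default_rng(seed)
    first = bias + sigma * rng.standard_normal(n)
    # Aim is shifted by -first, so the bias cancels but both noise terms remain.
    second = bias - first + sigma * rng.standard_normal(n)
    return float(np.abs(first).mean()), float(np.abs(second).mean())

# Simulation-like regime: large bias, small variance.
sim_before, sim_after = residual_correction_errors(bias=0.3, sigma=0.01)
# Real-world-like regime: same bias, large variance.
real_before, real_after = residual_correction_errors(bias=0.3, sigma=0.3)
```

In the low-variance case the corrected error collapses to roughly the noise level, while in the high-variance case the correction removes the bias but adds the noise of the observed throw to that of the corrected throw, so the error stays large.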
To evaluate the role of the autoencoder, we compare DA-MMP against a variant trained directly on raw trajectory parameterizations. The no-AE variant yields irregular joint profiles that are rarely executable, while DA-MMP produces smooth and coherent motions.
We study the effect of dataset size by training the autoencoder with 0.09k, 0.9k, 9k, and 90k planned trajectories. Larger datasets consistently reduce both the parameter-space reconstruction error and the Relative Length Reconstruction Error (LRE)—which measures the accuracy of the reconstructed execution duration—confirming the need for large-scale planning data to capture diverse feasible motions.
| Dataset size | Parameter-Space RMSE | LRE (%) |
|---|---|---|
| 0.09k | 0.201 | 12.4 |
| 0.9k | 0.007 | 1.9 |
| 9k | 0.007 | 1.1 |
| 90k | 0.001 | 0.9 |
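The two reconstruction metrics can be sketched as follows. The exact definitions are not spelled out here, so this assumes parameter-space RMSE is the root mean squared difference over all parameters and LRE is the mean absolute duration error relative to the true duration, expressed in percent. Both function names are hypothetical.

```python
import numpy as np

def param_rmse(params, recon):
    """Root mean squared error between original and reconstructed parameters."""
    diff = np.asarray(recon, dtype=float) - np.asarray(params, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))

def relative_length_error(T, T_hat):
    """Mean relative error of reconstructed durations, in percent (assumed form)."""
    T = np.asarray(T, dtype=float)
    T_hat = np.asarray(T_hat, dtype=float)
    return float(100.0 * np.mean(np.abs(T_hat - T) / T))

# Example with made-up durations in seconds.
lre = relative_length_error([1.0, 2.0], [1.1, 1.9])
```

Under this definition, an LRE of about 1% on a one-second throw corresponds to a timing error of roughly 10 ms.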
To evaluate the utility of radial basis functions, we compare our parameterization against a linear interpolation baseline using uniform waypoints. Our formulation yields significantly lower Mean Squared Second Derivative (MSSD) across all dataset scales, demonstrating that its inherent continuity acts as a strong structural prior to guarantee geometric smoothness suitable for high-speed dynamic tasks.
| Parameterization | MSSD (0.9k) | MSSD (9k) | MSSD (90k) |
|---|---|---|---|
| Waypoints | 596.4 | 558.9 | 555.9 |
| DA-MMP (Ours) | 280.7 | 282.4 | 296.3 |
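A smoothness metric of this kind can be approximated with finite differences, as in the sketch below. It assumes uniformly sampled trajectories and a central-difference estimate of the second derivative; the normalization and units used for the reported values may differ. The comparison at the end uses a sine curve and its piecewise-linear interpolation through six waypoints as stand-ins for the two parameterizations.

```python
import numpy as np

def mssd(q, dt):
    """Mean squared second derivative of a uniformly sampled trajectory."""
    q = np.asarray(q, dtype=float)
    acc = (q[2:] - 2.0 * q[1:-1] + q[:-2]) / dt**2  # central differences
    return float(np.mean(acc**2))

# Smooth reference curve versus linear interpolation through sparse waypoints.
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
smooth = np.sin(2.0 * np.pi * t)
knots = np.linspace(0.0, 1.0, 6)
piecewise = np.interp(t, knots, np.sin(2.0 * np.pi * knots))
smooth_score, waypoint_score = mssd(smooth, dt), mssd(piecewise, dt)
```

The piecewise-linear curve concentrates all of its curvature at the waypoints, which is why it scores much worse on this metric than a curve with continuous derivatives.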
@misc{chu2025dammplearningcoordinatedaccurate,
title = {DA-MMP: Learning Coordinated and Accurate Throwing with Dynamics-Aware Motion Manifold Primitives},
author = {Chi Chu and Huazhe Xu},
year = {2025},
eprint = {2509.23721},
archivePrefix= {arXiv},
primaryClass = {cs.RO},
url = {https://arxiv.org/abs/2509.23721}
}