Figures & data
Figure 1. Chair rearrangement task using CLIP feature-based randomized control. The text instruction is ‘place a green chair under the table.’
![Figure 1. Chair rearrangement task using CLIP feature-based randomized control. The text instruction is ‘place a green chair under the table.’](/cms/asset/771b19cf-d61c-4bcb-b58c-1746654b4c7a/tadr_a_2379381_f0001_oc.jpg)
Figure 3. Simulation environment (the green dot indicates the target position of the handle). (a) drawer-close. (b) drawer-open. (c) door-close. (d) door-open. (e) window-close and (f) window-open.
![Figure 3. Simulation environment (the green dot indicates the target position of the handle). (a) drawer-close. (b) drawer-open. (c) door-close. (d) door-open. (e) window-close and (f) window-open.](/cms/asset/135b06ce-d763-4071-bd87-d185e589243d/tadr_a_2379381_f0003_oc.jpg)
Table 1. Hyperparameters used in the PPO algorithm.
Table 2. Texts used in a multitask simulation.
Table 3. Comparison of success rates for the multitask simulation (the target position of the handle is unknown except for †).
Table 4. Comparison of text accuracy rates for the simulator images.
Table 5. Texts used in more complex simulation tasks.
Figure 4. Control results for more complex tasks when applying our method. (a) button-press-wall. (b) assembly. (c) box-close and (d) shelf-place.
![Figure 4. Control results for more complex tasks when applying our method. (a) button-press-wall. (b) assembly. (c) box-close and (d) shelf-place.](/cms/asset/990b39e7-34f3-4196-9a46-7fdf64fe3a48/tadr_a_2379381_f0004_oc.jpg)
Table 6. Success rates for more complex tasks when applying our ViT-L/14.
Table 7. Texts used in the experiment.
Table 8. Comparison of success rates for the real robot experiment (the target position of the object is unknown except for †).
Table 9. Comparison of text accuracy rates for the real images.