
[Refactor] Add structured inference server config objects#3893

Draft
vmoens wants to merge 4 commits into gh/vmoens/288/base from gh/vmoens/288/head

[ghstack-poisoned]

pytorch-bot Bot commented Jun 21, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3893

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 8 New Failures

As of commit 39e3aa6 with merge base b660f05 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions Bot commented Jun 21, 2026

Contributor

Benchmark Results: PR 39e3aa69 vs main 5cd8f5db

Benchmark run: https://github.com/pytorch/rl/actions/runs/28074360128

Higher ops/sec is better. Tables are sorted by largest absolute change.

CPU

Compared 216 benchmarks. Regressions over 5%: 4. Improvements over 5%: 19.

Benchmark main ops PR ops Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 400.75 2,220 +453.93%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 192.92 37.39 -80.62%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 54.39 88.62 +62.93%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2,821 3,679 +30.44%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] 205.91 253.69 +23.20%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] 24.64 29.26 +18.76%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,534 2,882 +13.76%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,062 2,642 -13.73%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1,913 2,153 +12.50%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-backward] 129.62 145.30 +12.10%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 375.98 419.39 +11.55%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 504.03 561.90 +11.48%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,048 2,709 -11.11%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 50.88 56.45 +10.94%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] 8.1855 8.9896 +9.82%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] 105.84 115.26 +8.89%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,712 2,953 +8.88%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] 10.91 10.09 -7.60%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] 1,665 1,776 +6.71%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] 280.26 298.25 +6.42%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-None] 82.01 87.06 +6.15%
benchmarks/test_envs_benchmark.py::test_simple 1.7041 1.8036 +5.84%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1,061 1,117 +5.29%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-backward] 27.40 28.71 +4.78%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-gru] 4.0767 4.2616 +4.54%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-memmap_cpu_storage_cpu... 80.20 83.65 +4.30%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] 569.83 545.81 -4.21%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] 281.12 292.23 +3.95%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-64] 6.5950 6.3368 -3.91%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] 2.9198 3.0338 +3.90%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 759.22 730.15 -3.83%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-lstm] 1.9402 2.0113 +3.67%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] 192.42 185.48 -3.61%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-None] 207.64 215.13 +3.61%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape2-large_img] 426.04 410.82 -3.57%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] 87.29 90.39 +3.55%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-None] 326.47 337.81 +3.47%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] 362,799 375,272 +3.44%
benchmarks/test_envs_benchmark.py::test_transformed 0.8822 0.9123 +3.41%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 641.45 663.21 +3.39%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[100-img_shape0-atari] 25.64 26.50 +3.35%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] 41,216 42,582 +3.32%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] 175.12 169.38 -3.28%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] 275.36 284.36 +3.27%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] 49,804 51,431 +3.27%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] 28,945 29,881 +3.24%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 877.67 905.07 +3.12%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] 14.95 15.41 +3.09%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-backward] 58.78 60.56 +3.04%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape2-large_img] 403.77 391.61 -3.01%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 503.80 518.51 +2.92%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] 29.47 30.32 +2.87%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] 33,558 34,509 +2.83%
benchmarks/test_envs_benchmark.py::test_parallel 0.9727 0.9452 -2.82%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] 62,747 64,484 +2.77%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] 20,544 21,098 +2.69%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-True] 17,963 18,445 +2.69%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,751 2,678 -2.67%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,353 4,469 +2.67%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-backward] 62.20 63.84 +2.64%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] 130.03 133.45 +2.63%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] 773.05 752.82 -2.62%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] 1.3613 1.3261 -2.59%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] 3,473 3,563 +2.58%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[reduce-overhead-None] 84.78 86.95 +2.56%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] 964.13 988.78 +2.56%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-None] 175.83 180.27 +2.52%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 59.14 60.62 +2.51%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] 33,978 34,823 +2.49%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-None] 685.74 702.62 +2.46%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-backward] 82.37 84.39 +2.45%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-False] 41,254 42,263 +2.45%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 58,158 56,741 -2.44%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] 32,991 32,191 -2.42%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[False-None] 49.37 50.56 +2.42%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 19,264 19,711 +2.32%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-True-0-gru] 1.4636 1.4298 -2.31%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 0.6032 0.5895 -2.28%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] 262.59 268.56 +2.28%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 537.50 549.65 +2.26%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 5,043 4,929 -2.26%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] 36,748 37,574 +2.25%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 24.30 24.84 +2.24%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 193.18 197.49 +2.23%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape1-atari] 270.39 276.39 +2.22%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-False-0-lstm] 0.8696 0.8505 -2.19%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 0.5178 0.5291 +2.18%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[200-img_shape3-large_batch] 330.17 337.35 +2.17%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] 698.79 713.90 +2.16%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[200-img_shape3-large_batch] 307.62 314.18 +2.13%
benchmarks/test_collectors_benchmark.py::test_sync_preempt 16.63 16.27 -2.13%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[reduce-overhead-None] 334.11 341.21 +2.12%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-64] 2.9770 3.0395 +2.10%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-backward] 123.74 126.31 +2.08%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-None] 545.92 557.17 +2.06%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 711.83 726.07 +2.00%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,082 3,021 -2.00%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[False-backward] 505.83 515.89 +1.99%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] 694.86 708.55 +1.97%
benchmarks/test_envs_benchmark.py::test_serial 0.5725 0.5834 +1.89%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-None] 37.59 38.30 +1.88%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] 23,580 24,018 +1.86%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] 636.89 648.50 +1.82%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] 21,845 22,240 +1.81%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,807 2,757 -1.79%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[False-backward] 89.97 91.57 +1.79%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-16] 11.94 12.15 +1.78%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 23.32 23.73 +1.77%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[1000000-cpu] 97.10 98.82 +1.77%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-backward] 77.85 79.22 +1.76%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 2,243 2,282 +1.74%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[False-backward] 32.81 33.37 +1.71%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] 119.29 121.33 +1.71%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 51.91 52.79 +1.70%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 280.22 284.98 +1.70%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,126 2,162 +1.69%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape1-atari] 637.74 648.45 +1.68%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2,012 2,045 +1.65%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[200-img_shape1-large_batch] 13.26 13.48 +1.62%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[reduce-overhead-None] 569.48 578.53 +1.59%
... ... ... Showing 120 of 216 comparisons, sorted by absolute change.

GPU

Compared 226 benchmarks. Regressions over 5%: 13. Improvements over 5%: 20.

Benchmark main ops PR ops Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 28.71 51.78 +80.37%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 190.98 48.49 -74.61%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,589 2,613 -27.20%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,599 2,721 -24.40%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,574 3,153 +22.50%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3,350 2,610 -22.11%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3,101 2,500 -19.39%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,982 3,447 +15.58%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,227 3,643 +12.90%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 3,657 4,103 +12.21%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] 725.12 652.78 -9.98%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 732.06 798.79 +9.12%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] 970.63 884.73 -8.85%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2,052 1,881 -8.32%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] 17,581 19,024 +8.21%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] 46,239 49,853 +7.82%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1,974 2,121 +7.45%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1,810 1,942 +7.31%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,036 2,181 +7.12%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 461.18 490.58 +6.38%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] 32,807 34,832 +6.17%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] 30,098 31,944 +6.13%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] 351.53 331.58 -5.68%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 751.64 794.01 +5.64%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 0.4931 0.5207 +5.60%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape2-large_img] 401.50 379.35 -5.51%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 0.6682 0.7042 +5.39%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] 22.45 21.27 -5.27%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] 780.80 821.24 +5.18%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] 326.56 309.73 -5.15%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-4] 165.11 156.66 -5.12%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] 40,653 42,712 +5.07%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] 42,710 44,861 +5.04%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] 528.38 502.56 -4.89%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-False] 1.5300 1.6022 +4.72%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 0.5700 0.5968 +4.70%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] 27,759 29,063 +4.70%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-True] 28,739 30,059 +4.59%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,358 4,163 -4.47%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] 20,116 21,002 +4.41%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-None] 375.63 392.19 +4.41%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] 31,639 33,013 +4.34%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] 8.1097 7.7623 -4.28%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] 160.47 167.33 +4.28%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] 30,575 31,832 +4.11%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-False] 53,223 55,336 +3.97%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] 1.2921 1.3428 +3.92%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape1-atari] 263.22 253.01 -3.88%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 160.48 166.67 +3.86%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 1,273 1,322 +3.84%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 161.05 167.23 +3.84%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] 236.69 227.78 -3.77%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-4] 180.62 187.22 +3.65%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 156.93 162.44 +3.51%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 154.80 160.20 +3.49%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-memmap_cpu_storage_cud... 974.80 940.94 -3.47%
benchmarks/test_collectors_benchmark.py::test_single_pixels 6.0618 6.2711 +3.45%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] 355.73 343.78 -3.36%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 162.92 168.24 +3.27%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-4] 46.92 48.45 +3.26%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-True] 20,149 20,776 +3.11%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 504.76 520.10 +3.04%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] 27,409 28,223 +2.97%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] 61,820 63,653 +2.97%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 158.12 162.76 +2.94%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 55,062 56,677 +2.93%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 496.98 482.48 -2.92%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] 616.10 634.04 +2.91%
benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] 302.74 293.99 -2.89%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 761.98 740.28 -2.85%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 219.46 213.29 -2.81%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] 28.81 29.62 +2.79%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[numpy] 374,034 363,755 -2.75%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 939.29 963.63 +2.59%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] 62,909 64,499 +2.53%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[50-img_shape0-small] 839.68 860.64 +2.50%
benchmarks/test_envs_benchmark.py::test_serial 0.4151 0.4254 +2.48%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] 21,539 22,072 +2.47%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 19,208 19,674 +2.43%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] 36,892 37,769 +2.38%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] 48,677 49,825 +2.36%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] 6,025 5,889 -2.26%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-False] 37,536 38,381 +2.25%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1,283 1,254 -2.23%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 0.5872 0.6002 +2.21%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-64] 4.4848 4.5817 +2.16%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-backward] 147.81 144.63 -2.16%
benchmarks/test_collectors_benchmark.py::test_sync_pixels 10.52 10.30 -2.15%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] 29,864 30,502 +2.14%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-None] 814.11 797.03 -2.10%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] 26,441 26,994 +2.09%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-4] 145.45 148.49 +2.09%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 0.2097 0.2139 +1.99%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-16] 17.93 18.27 +1.90%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] 659.83 672.36 +1.90%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] 132.29 134.79 +1.89%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] 14.69 14.96 +1.88%
benchmarks/test_collectors_benchmark.py::test_single_with_rb_pixels 5.3442 5.2439 -1.88%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-4] 70.80 72.13 +1.87%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] 74,851 76,237 +1.85%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] 190.27 193.78 +1.84%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-backward] 241.96 237.56 -1.82%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sample_mixed_devices[1000000-cuda_storage_cuda_samp... 1,481 1,455 -1.74%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-gru] 49.15 48.29 -1.74%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-True] 19,088 19,418 +1.73%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-constant] 4,592 4,672 +1.73%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-1] 521.83 512.92 -1.71%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] 281.91 286.58 +1.66%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] 22,425 22,791 +1.63%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[200-img_shape3-large_batch] 289.48 294.13 +1.61%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-1] 472.80 465.42 -1.56%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-None] 736.09 724.77 -1.54%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 452.48 445.58 -1.52%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 950.03 964.03 +1.47%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 371.40 365.94 -1.47%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-16] 12.02 12.19 +1.43%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-480-640-1] 77.56 78.63 +1.38%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[False-backward] 67.35 68.27 +1.37%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[pickle] 11,773 11,932 +1.36%
benchmarks/test_collectors_benchmark.py::test_async_pixels 10.85 10.71 -1.33%
... ... ... Showing 120 of 226 comparisons, sorted by absolute change.

Collaborator Author

this is worth a paragraph in the doc somewhere

vmoens added 3 commits June 22, 2026 09:18
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
device_config: InferenceDeviceConfig | None = None,
shutdown_event: threading.Event | MPEvent | None = None,
):
if server_config is not None:

Contributor

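This doesn't enforce mutual exclusivity the way the error message claims. The check compares each kwarg's value to its default, so it only fires when a kwarg differs from that default; a kwarg passed explicitly with its default value alongside server_config goes undetected.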

shutdown_event: threading.Event | MPEvent | None = None,
):
if server_config is not None:
if (

Contributor

As mentioned in the comment above, this doesn't really enforce mutual exclusivity. I'd suggest defaulting these kwargs to a private sentinel and checking whether they were set, instead of comparing values. We can implement the same change in ProcessInferenceServer.__init__ and AsyncBatchedCollector.__init__.
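
A minimal sketch of the sentinel approach suggested here. The names `_UNSET`, `resolve_server_kwargs`, `max_batch_size`, and `timeout`, the default values, and the dict-shaped config are all hypothetical and only illustrate the pattern; they are not the PR's actual signature.

```python
from __future__ import annotations

# Hypothetical private sentinel: distinguishes "not passed" from "passed the default".
_UNSET = object()


def resolve_server_kwargs(server_config=None, max_batch_size=_UNSET, timeout=_UNSET):
    """Return (max_batch_size, timeout), rejecting config + explicit kwargs."""
    # Collect the names of kwargs the caller actually supplied.
    explicit = {
        name
        for name, value in (("max_batch_size", max_batch_size), ("timeout", timeout))
        if value is not _UNSET
    }
    if server_config is not None:
        if explicit:
            raise TypeError(
                "server_config is mutually exclusive with: "
                + ", ".join(sorted(explicit))
            )
        return server_config["max_batch_size"], server_config["timeout"]
    # Legacy path: fill in (assumed) defaults for anything not passed.
    return (
        8 if max_batch_size is _UNSET else max_batch_size,
        0.01 if timeout is _UNSET else timeout,
    )
```

With this shape, `resolve_server_kwargs(server_config=cfg, max_batch_size=8)` raises even though `8` equals the assumed default, which a value comparison would miss.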

"policy_device, and output_device."
)
policy_device = device_config.policy_device
output_device = device_config.server_output_device()

Contributor

server_output_device() substitutes env_device for output_device when the latter is unset. I'd either reject them when a server consumes the config, or document on the server that those two fields are accepted but ignored.
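
To make the fallback concrete, here is a stand-in written only from the description in this comment. `DeviceConfigSketch` is a hypothetical class, not the PR's `InferenceDeviceConfig`, and the real method may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeviceConfigSketch:
    # Hypothetical stand-in; field names are taken from the review discussion.
    policy_device: str | None = None
    output_device: str | None = None
    env_device: str | None = None

    def server_output_device(self) -> str | None:
        # Fallback as described above: env_device is used when output_device is unset.
        if self.output_device is not None:
            return self.output_device
        return self.env_device
```

Under this reading, a config that sets only `env_device` still changes where the server sends its results, which is the behaviour the comment asks to either reject or document.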

client = transport.client()
result = client(TensorDict({"observation": torch.randn(4)}))
stats = server.stats()
assert result["action"].device.type == "cpu"

Contributor

This test would still pass if server_output_device(), the output_device move, or the _env_loop device moves were completely broken: the input appears to start on CPU, so a no-op already satisfies the assertion. To actually exercise them, either mock the tensordict and assert .to(target) was called with the expected device, or add a @pytest.mark.gpu variant that moves CPU to CUDA and checks that the result landed on the right device.
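
A rough sketch of the mock-based option, using only the standard library. `move_result` is a hypothetical stand-in for the server's output move, not a function from the PR; the point is the shape of the assertion.

```python
from unittest.mock import MagicMock


def move_result(result, output_device):
    """Hypothetical stand-in for the server's output-device move."""
    return result.to(output_device) if output_device is not None else result


def test_result_is_moved_to_output_device():
    result = MagicMock(name="tensordict")
    moved = move_result(result, "cuda:0")
    # Fails if the move is skipped or targets the wrong device.
    result.to.assert_called_once_with("cuda:0")
    assert moved is result.to.return_value


def test_no_move_when_output_device_unset():
    result = MagicMock(name="tensordict")
    assert move_result(result, None) is result
    result.to.assert_not_called()
```

Because the mock records the call rather than inspecting a real device, this runs on CPU-only CI and still fails when the move is dropped.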

Labels

CLA Signed, Collectors, Documentation, Integrations/torch_geometric, Integrations, Modules, Refactoring

2 participants