15.6. Summary

In this chapter, we briefly introduced the basic concepts of reinforcement learning, covering single-agent and multi-agent reinforcement learning algorithms as well as single-node and distributed reinforcement learning systems, to give readers a basic understanding of reinforcement learning problems. Reinforcement learning is currently a fast-growing branch of deep learning, and further advances in reinforcement learning algorithms may make many practical problems solvable. On the other hand, the particular problem setting of reinforcement learning (for example, the need to collect samples by interacting with an environment) places higher demands on the underlying computing systems: How can sample collection and policy training be better balanced? How can the capabilities of different hardware, such as CPUs and GPUs, be used evenly? How can reinforcement learning agents be deployed effectively on large-scale distributed systems? Answering such questions requires a deeper understanding of how computer systems are designed and used.
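To make the first of these questions concrete, below is a minimal, hypothetical sketch of the actor-learner pattern that distributed reinforcement learning systems build on: sample collection and policy training run concurrently and are decoupled by a bounded replay buffer. It uses only the Python standard library, and the environment, policy, and update rule are all placeholders, not any particular framework's API.

```python
import queue
import random
import threading

# Minimal actor-learner sketch: an actor thread interacts with a
# (placeholder) environment and writes transitions into a bounded
# replay buffer; a learner thread consumes mini-batches and performs
# (placeholder) policy updates. The bounded queue decouples the two
# processes and applies backpressure when sampling outpaces training.

replay_buffer = queue.Queue(maxsize=1000)

def actor(num_steps):
    """Collect transitions from a toy environment and enqueue them."""
    state = 0.0
    for _ in range(num_steps):
        action = random.choice([-1, 1])      # placeholder random policy
        next_state = state + 0.1 * action    # placeholder dynamics
        reward = -abs(next_state)            # placeholder reward
        replay_buffer.put((state, action, reward, next_state))
        state = next_state

def learner(num_updates, batch_size=32):
    """Dequeue mini-batches and run placeholder policy updates."""
    for step in range(num_updates):
        batch = [replay_buffer.get() for _ in range(batch_size)]
        avg_reward = sum(t[2] for t in batch) / batch_size
        if step % 20 == 0:
            print(f"update {step}: average batch reward = {avg_reward:.3f}")

if __name__ == "__main__":
    # The actor produces exactly as many transitions as the learner
    # consumes (100 updates x 32 samples), so both threads terminate.
    t_actor = threading.Thread(target=actor, args=(3200,))
    t_learner = threading.Thread(target=learner, args=(100,))
    t_actor.start(); t_learner.start()
    t_actor.join(); t_learner.join()
```

In production systems the two sides typically run on different hardware (CPU-heavy actors, GPU-based learners) and often on different machines, with the replay buffer as a standalone service; the capacity of the buffer is then one of the main knobs for balancing sampling throughput against training throughput.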
