15.6. Summary

In this chapter, we briefly introduced the basic concepts of reinforcement learning, covering single-agent and multi-agent reinforcement learning algorithms as well as single-node and distributed reinforcement learning systems, to give readers a basic understanding of reinforcement learning problems. Reinforcement learning is currently a fast-growing branch of deep learning, and further advances in reinforcement learning algorithms may make many practical problems solvable. On the other hand, the particular problem setting of reinforcement learning (for example, the need to collect samples by interacting with an environment) places higher demands on the underlying computing systems: How can sample collection and policy training be better balanced? How can the capabilities of different hardware, such as CPUs and GPUs, be used evenly? How can reinforcement learning agents be deployed effectively on large-scale distributed systems? Answering such questions requires a deeper understanding of how computer systems are designed and used.
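To make the first of these questions concrete, below is a minimal, hypothetical sketch of the actor-learner pattern that distributed reinforcement learning systems build on: sample collection and policy training run concurrently and are decoupled by a bounded replay buffer. It uses only the Python standard library, and the environment, policy, and update rule are all placeholders, not any particular framework's API.

```python
import queue
import random
import threading

# Minimal actor-learner sketch: an actor thread interacts with a
# (placeholder) environment and writes transitions into a bounded
# replay buffer; a learner thread consumes mini-batches and performs
# (placeholder) policy updates. The bounded queue decouples the two
# processes and applies backpressure when sampling outpaces training.

replay_buffer = queue.Queue(maxsize=1000)

def actor(num_steps):
    """Collect transitions from a toy environment and enqueue them."""
    state = 0.0
    for _ in range(num_steps):
        action = random.choice([-1, 1])      # placeholder random policy
        next_state = state + 0.1 * action    # placeholder dynamics
        reward = -abs(next_state)            # placeholder reward
        replay_buffer.put((state, action, reward, next_state))
        state = next_state

def learner(num_updates, batch_size=32):
    """Dequeue mini-batches and run placeholder policy updates."""
    for step in range(num_updates):
        batch = [replay_buffer.get() for _ in range(batch_size)]
        avg_reward = sum(t[2] for t in batch) / batch_size
        if step % 20 == 0:
            print(f"update {step}: average batch reward = {avg_reward:.3f}")

if __name__ == "__main__":
    # The actor produces exactly as many transitions as the learner
    # consumes (100 updates x 32 samples), so both threads terminate.
    t_actor = threading.Thread(target=actor, args=(3200,))
    t_learner = threading.Thread(target=learner, args=(100,))
    t_actor.start(); t_learner.start()
    t_actor.join(); t_learner.join()
```

In production systems the two sides typically run on different hardware (CPU-heavy actors, GPU-based learners) and often on different machines, with the replay buffer as a standalone service; the capacity of the buffer is then one of the main knobs for balancing sampling throughput against training throughput.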
