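The papers in this list are mainly about deep reinforcement learning and RL / AI; I hope you find it helpful. The reinforcement learning papers from NeurIPS 2018 are listed below, ordered alphabetically by the first author's last name.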
Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, and J. Zico Kolter.
Differentiable MPC for end-to-end planning and control.
Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, and Nando de Freitas.
Playing hard exploration games by watching YouTube.
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, and Honglak Lee.
Sample-efficient reinforcement learning with stochastic ensemble value expansion.
Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine.
Data-efficient model-based reinforcement learning with deep probabilistic dynamics models.
Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter.
End-to-end differentiable physics for learning and control.
Amir-massoud Farahmand.
Iterative value-aware model learning.
Justin Fu, Sergey Levine, Dibya Ghosh, Larry Yang, and Avi Singh.
An event-based framework for task specification and control.
Vikash Goel, Jameson Weng, and Pascal Poupart.
Unsupervised video object segmentation for deep reinforcement learning.
Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine.
Meta-reinforcement learning of structured exploration strategies.
David Ha and Jürgen Schmidhuber.
Recurrent world models facilitate policy evolution.
Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, and Daniel Yamins.
Learning to play with intrinsically-motivated, self-aware agents.
Rein Houthooft, Yuhua Chen, Phillip Isola, Bradly Stadie, Filip Wolski, Jonathan Ho, and Pieter Abbeel.
Evolved policy gradients.
Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Lianhui Qin, Xiaodan Liang, Haoye Dong, and Eric Xing.
Deep generative models with learnable knowledge constraints.
Jiexi Huang, Fa Wu, Doina Precup, and Yang Cai.
Learning safe policies with expert guidance.
Kwang-Sung Jun, Lihong Li, Yuzhe Ma, and Xiaojin Zhu.
Adversarial attacks on stochastic bandits.
Raksha Kumaraswamy, Matthew Schlegel, Adam White, and Martha White.
Context-dependent upper-confidence bounds for directed exploration.
Isaac Lage, Andrew Ross, Samuel J Gershman, Been Kim, and Finale Doshi-Velez.
Human-in-the-loop interpretability prior.
Marc Lanctot, Sriram Srinivasan, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, and Michael Bowling.
Actor-critic policy optimization in partially observable multiagent environments.
Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, and Greg Imwalle.
Data center cooling using model-predictive control.
Jan Leike, Borja Ibarz, Dario Amodei, Geoffrey Irving, and Shane Legg.
Reward learning from human preferences and demonstrations in Atari.
Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, and Le Song.
Learning temporal point processes via reinforcement learning.
Yuan Li, Xiaodan Liang, Zhiting Hu, and Eric Xing.
Hybrid retrieval-generation reinforced agent for medical image report generation.
Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V Le, and Ni Lao.
Memory augmented policy optimization for program synthesis with generalization.
Qiang Liu, Lihong Li, Ziyang Tang, and Denny Zhou.
Breaking the curse of horizon: Infinite-horizon off-policy estimation.
Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A Faisal, Finale Doshi-Velez, and Emma Brunskill.
Representation balancing MDPs for off-policy policy evaluation.
Tyler Lu, Craig Boutilier, and Dale Schuurmans.
Non-delusional Q-learning and value-iteration.
Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet.
Are GANs created equal? A large-scale study.
David Alvarez Melis and Tommi Jaakkola.
Towards robust interpretability with self-explaining neural networks.
Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt.
DeepProbLog: Neural probabilistic logic programming.
Horia Mania, Aurelia Guy, and Benjamin Recht.
Simple random search of static linear policies is competitive for reinforcement learning.
Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Josh Tenenbaum, and Daniel Yamins.
A flexible neural representation for physics prediction.
Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine.
Data-efficient hierarchical reinforcement learning.
Ashvin Nair, Vitchyr Pong, Shikhar Bahl, Sergey Levine, Steven Lin, and Murtaza Dalal.
Visual goal-conditioned reinforcement learning by representation learning.
Matthew O’Kelly, Aman Sinha, Hongseok Namkoong, Russ Tedrake, and John C Duchi.
Scalable end-to-end autonomous vehicle testing via rare-event simulation.
Ian Osband, John S Aslanides, and Albin Cassirer.
Randomized prior functions for deep reinforcement learning.
Matthew Riemer, Miao Liu, and Gerald Tesauro.
Learning abstract options.
Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Tim Lillicrap.
Relational recurrent neural networks.
Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry.
How does batch normalization help optimization? (no, it is not about internal covariate shift).
Ozan Sener and Vladlen Koltun.
Multi-task learning as multi-objective optimization.
Jiaming Song, Hongyu Ren, Dorsa Sadigh, and Stefano Ermon.
Multi-agent generative adversarial imitation learning.
Wen Sun, Geoffrey Gordon, Byron Boots, and J. Bagnell.
Dual policy iteration.
Aviv Tamar, Pieter Abbeel, Ge Yang, Thanard Kurutach, and Stuart Russell.
Learning plannable representations with causal InfoGAN.
Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, and Phil Blunsom.
Neural arithmetic logic units.
Tongzhou Wang, Yi Wu, David Moore, and Stuart Russell.
Meta-learning MCMC proposals.
Catherine Wong, Neil Houlsby, Yifeng Lu, and Andrea Gesmundo.
Transfer learning with neural AutoML.
Kelvin Xu, Chelsea Finn, and Sergey Levine.
Uncertainty-aware few-shot learning with probabilistic model-agnostic meta-learning.
Zhongwen Xu, Hado van Hasselt, and David Silver.
Meta-gradient reinforcement learning.
Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, and Josh Tenenbaum.
Neural-Symbolic VQA: Disentangling reasoning from vision and language understanding.
Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William Byrd, Matthew Might, Raquel Urtasun, and Richard Zemel.
Neural guided constraint logic programming for program synthesis.
Yu Zhang, Ying Wei, and Qiang Yang.
Learning to multitask.
Zeyu Zheng, Junhyuk Oh, and Satinder Singh.
On learning intrinsic rewards for policy gradient methods.