NeurIPS 2018 Reinforcement Learning Papers Worth Reading

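The papers in this list are mainly about deep reinforcement learning and, more broadly, RL/AI. I hope it is helpful. The NeurIPS 2018 reinforcement learning papers are listed below in alphabetical order of the first author's surname.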

  1. Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, and J. Zico Kolter.

    Differentiable MPC for end-to-end planning and control.

  2. Yusuf Aytar, Tobias Pfaff, David Budden, Thomas Paine, Ziyu Wang, and Nando de Freitas.

    Playing hard exploration games by watching YouTube.

  3. Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, and Honglak Lee.

    Sample-efficient reinforcement learning with stochastic ensemble value expansion.

  4. Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine.

    Data-efficient model-based reinforcement learning with deep probabilistic dynamics models.

  5. Filipe de Avila Belbute-Peres, Kevin Smith, Kelsey Allen, Josh Tenenbaum, and J. Zico Kolter.

    End-to-end differentiable physics for learning and control.

  6. Amir-massoud Farahmand.

    Iterative value-aware model learning.

  7. Justin Fu, Sergey Levine, Dibya Ghosh, Larry Yang, and Avi Singh.

    An event-based framework for task specification and control.

  8. Vikash Goel, Jameson Weng, and Pascal Poupart.

    Unsupervised video object segmentation for deep reinforcement learning.

  9. Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine.

    Meta-reinforcement learning of structured exploration strategies.

  10. David Ha and Jürgen Schmidhuber.

    Recurrent world models facilitate policy evolution.

  11. Nick Haber, Damian Mrowca, Stephanie Wang, Li Fei-Fei, and Daniel Yamins.

    Learning to play with intrinsically-motivated, self-aware agents.

  12. Rein Houthooft, Yuhua Chen, Phillip Isola, Bradly Stadie, Filip Wolski, Jonathan Ho, and Pieter Abbeel.

    Evolved policy gradients.

  13. Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Lianhui Qin, Xiaodan Liang, Haoye Dong, and Eric Xing.

    Deep generative models with learnable knowledge constraints.

  14. Jiexi Huang, Fa Wu, Doina Precup, and Yang Cai.

    Learning safe policies with expert guidance.

  15. Kwang-Sung Jun, Lihong Li, Yuzhe Ma, and Xiaojin Zhu.

    Adversarial attacks on stochastic bandits.

  16. Raksha Kumaraswamy, Matthew Schlegel, Adam White, and Martha White.

    Context-dependent upper-confidence bounds for directed exploration.

  17. Isaac Lage, Andrew Ross, Samuel J Gershman, Been Kim, and Finale Doshi-Velez.

    Human-in-the-loop interpretability prior.

  18. Marc Lanctot, Sriram Srinivasan, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, and Michael Bowling.

    Actor-critic policy optimization in partially observable multiagent environments.

  19. Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, and Greg Imwalle.

    Data center cooling using model-predictive control.

  20. Jan Leike, Borja Ibarz, Dario Amodei, Geoffrey Irving, and Shane Legg.

    Reward learning from human preferences and demonstrations in Atari.

  21. Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, and Le Song.

    Learning temporal point processes via reinforcement learning.

  22. Yuan Li, Xiaodan Liang, Zhiting Hu, and Eric Xing.

    Hybrid retrieval-generation reinforced agent for medical image report generation.

  23. Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V Le, and Ni Lao.

    Memory augmented policy optimization for program synthesis with generalization.

  24. Qiang Liu, Lihong Li, Ziyang Tang, and Denny Zhou.

    Breaking the curse of horizon: Infinite-horizon off-policy estimation.

  25. Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo A Faisal, Finale Doshi-Velez, and Emma Brunskill.

    Representation balancing MDPs for off-policy policy evaluation.

  26. Tyler Lu, Craig Boutilier, and Dale Schuurmans.

    Non-delusional Q-learning and value-iteration.

  27. Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet.

    Are GANs created equal? A large-scale study.

  28. David Alvarez Melis and Tommi Jaakkola.

    Towards robust interpretability with self-explaining neural networks.

  29. Robin Manhaeve, Sebastijan Dumancic, Angelika Kimmig, Thomas Demeester, and Luc De Raedt.

    DeepProbLog: Neural probabilistic logic programming.

  30. Horia Mania, Aurelia Guy, and Benjamin Recht.

    Simple random search of static linear policies is competitive for reinforcement learning.

  31. Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Li Fei-Fei, Josh Tenenbaum, and Daniel Yamins.

    A flexible neural representation for physics prediction.

  32. Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine.

    Data-efficient hierarchical reinforcement learning.

  33. Ashvin Nair, Vitchyr Pong, Shikhar Bahl, Sergey Levine, Steven Lin, and Murtaza Dalal.

    Visual goal-conditioned reinforcement learning by representation learning.

  34. Matthew O’Kelly, Aman Sinha, Hongseok Namkoong, Russ Tedrake, and John C Duchi.

    Scalable end-to-end autonomous vehicle testing via rare-event simulation.

  35. Ian Osband, John S Aslanides, and Albin Cassirer.

    Randomized prior functions for deep reinforcement learning.

  36. Matthew Riemer, Miao Liu, and Gerald Tesauro.

    Learning abstract options.

  37. Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Tim Lillicrap.

    Relational recurrent neural networks.

  38. Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry.

    How does batch normalization help optimization? (no, it is not about internal covariate shift).

  39. Ozan Sener and Vladlen Koltun.

    Multi-task learning as multi-objective optimization.

  40. Jiaming Song, Hongyu Ren, Dorsa Sadigh, and Stefano Ermon.

    Multi-agent generative adversarial imitation learning.

  41. Wen Sun, Geoffrey Gordon, Byron Boots, and J. Bagnell.

    Dual policy iteration.

  42. Aviv Tamar, Pieter Abbeel, Ge Yang, Thanard Kurutach, and Stuart Russell.

    Learning plannable representations with causal InfoGAN.

  43. Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, and Phil Blunsom.

    Neural arithmetic logic units.

  44. Tongzhou Wang, Yi Wu, David Moore, and Stuart Russell.

    Meta-learning MCMC proposals.

  45. Catherine Wong, Neil Houlsby, Yifeng Lu, and Andrea Gesmundo.

    Transfer learning with neural AutoML.

  46. Kelvin Xu, Chelsea Finn, and Sergey Levine.

    Uncertainty-aware few-shot learning with probabilistic model-agnostic meta-learning.

  47. Zhongwen Xu, Hado van Hasselt, and David Silver.

    Meta-gradient reinforcement learning.

  48. Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, and Josh Tenenbaum.

    Neural-Symbolic VQA: Disentangling reasoning from vision and language understanding.

  49. Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William Byrd, Matthew Might, Raquel Urtasun, and Richard Zemel.

    Neural guided constraint logic programming for program synthesis.

  50. Yu Zhang, Ying Wei, and Qiang Yang.

    Learning to multitask.

  51. Zeyu Zheng, Junhyuk Oh, and Satinder Singh.

    On learning intrinsic rewards for policy gradient methods.

Source: https://medium.com/@yuxili/nips-2018-rl-papers-to-read-5bc1edb85a28

AMiner Academic Headlines

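The AMiner platform was developed by the Department of Computer Science and Technology at Tsinghua University, and its intellectual property is wholly Chinese-owned. Since going online in 2006, the system has attracted visits from more than 8 million unique IP addresses in 220 countries and regions, recorded 2.3 million data downloads and 10 million visits per year, and has become an important data and experimental platform for research on academic search and social network mining.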

https://www.aminer.cn/
Related entries
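Doina Precup (person)

Romanian artificial intelligence researcher based in Montreal. She is Associate Dean of Research in the Faculty of Science at McGill University, a Canada Research Chair in Machine Learning, and a Senior Fellow of the Canadian Institute for Advanced Research. She also heads DeepMind's Montreal office.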

Shane Legg (person)

Co-founder and Chief Scientist of DeepMind.

Kelvin Xu (person)

PhD student at the University of California, Berkeley, advised by Sergey Levine. He previously took part in the Google Brain Residency Program (now the AI Residency).

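Deep reinforcement learning (technique)

In reinforcement learning, an agent learns by interacting with its environment. Each time the RL agent takes an action, it receives a numerical reward that indicates how good that action was. By interacting with the environment and balancing exploitation of past experience against exploration of the unknown, the agent learns through trial and error which action to take next, without a human explicitly telling it which action to choose. The agent's goal is to learn a sequence of actions that maximizes the accumulated reward.

Real-world reinforcement learning problems generally involve huge state spaces and action spaces, so traditional reinforcement learning methods run into the curse of dimensionality. With the neural networks of deep learning, an RL agent can extract and learn features directly from raw input data (such as game screens) and then apply traditional reinforcement learning algorithms (such as TD learning, SARSA, and Q-learning) to those features to learn a control policy (such as a game-playing strategy), with no need for hand-engineered or heuristic features. This combination of reinforcement learning and deep learning is called deep reinforcement learning.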
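The trial-and-error loop described above, where an agent balances exploiting what it has learned against exploring untried actions to maximize accumulated reward, can be illustrated with a minimal sketch. It assumes a hypothetical three-armed bandit whose arms pay Gaussian-distributed rewards and uses ε-greedy action selection. This is a single-state problem, so there is no state space and no neural network here. The function name and all parameter values are illustrative choices, not taken from any of the papers above.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate each action's value by trial and error on a hypothetical bandit.

    true_means are the hidden mean rewards of the arms; the agent only ever
    sees noisy rewards sampled around them.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward observed for each arm
    counts = [0] * n_arms       # number of times each arm has been tried
    total_reward = 0.0          # accumulated reward the agent tries to maximize
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)                           # explore: random arm
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])  # exploit: best estimate so far
        reward = rng.gauss(true_means[action], 1.0)  # numerical reward from the environment
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]  # incremental mean
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total_reward = epsilon_greedy_bandit([0.1, 0.5, 1.5])
print([round(e, 2) for e in estimates], counts, round(total_reward, 1))
```

Raising `epsilon` makes the agent explore more, at the cost of pulling the best arm less often. In deep RL the per-action estimates would instead come from a neural network applied to raw observations, which this sketch does not attempt.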

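Q-learning (technique)

Q-learning is a reinforcement learning technique used in machine learning. Its goal is to learn a policy that tells an agent which action to take in which situation. It does not require a model of the environment and can handle problems with stochastic transitions and rewards without requiring adaptations.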
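As a minimal sketch of this model-free update, the tabular Q-learning agent below learns on a hypothetical six-state corridor. It can move left or right and receives reward 1 for reaching the right end. The `step` function plays the role of the environment: the agent only samples transitions from it and never inspects its dynamics, which is what "does not require a model of the environment" means in practice. The environment, the hyperparameters (`alpha`, `gamma`, `epsilon`), and the episode count are assumptions chosen for illustration.

```python
import random

# Hypothetical corridor: states 0..5, action 0 = left, 1 = right.
# Entering state 5 ends the episode with reward 1; all other steps give reward 0.
N_STATES = 6
ACTIONS = (0, 1)

def step(state, action):
    """Environment: the agent samples transitions from this but never inspects it."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy; ties are broken at random
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            # (no bootstrapping past a terminal state)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
greedy_policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]
print(greedy_policy)
```

The learned greedy policy should choose "right" (action 1) in every non-terminal state. Because the target uses `max(q[next_state])` rather than the action the agent actually takes next, Q-learning is off-policy, which is the main difference from SARSA.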

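Reinforcement learning (technique)

Reinforcement learning is a trial-and-error approach whose goal is to have a software agent take actions that maximize its return in a given environment. The environment is typically modeled as a Markov decision process, and many reinforcement learning algorithms for this setting use dynamic programming techniques. Popular methods include adaptive dynamic programming (ADP), temporal-difference (TD) learning, the State-Action-Reward-State-Action (SARSA) algorithm, Q-learning, and deep reinforcement learning (e.g., DQN). Applications include board games, robot control, and job scheduling.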
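To make the dynamic-programming connection concrete, here is a minimal value-iteration sketch for a small, fully known MDP. The MDP is hypothetical: a four-state corridor whose right end is terminal and pays reward 1 on entry, where each move fails with probability 0.2 and leaves the agent in place. Model-free methods such as Q-learning learn from sampled transitions. Dynamic programming instead needs the full transition model, here the hypothetical `transitions(s, a)` function. The discount factor and slip probability are illustrative assumptions.

```python
GAMMA = 0.9       # discount factor (illustrative)
SLIP = 0.2        # hypothetical probability that a move fails and the agent stays put
N = 4             # states 0..3; state 3 is terminal with value 0
ACTIONS = (0, 1)  # 0 = left, 1 = right

def transitions(s, a):
    """Known model: list of (probability, next_state, reward) for action a in state s."""
    intended = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    return [
        (prob, nxt, 1.0 if nxt == N - 1 else 0.0)  # reward 1 only on entering the terminal state
        for prob, nxt in ((1 - SLIP, intended), (SLIP, s))
    ]

def backup(v, s, a):
    """Expected one-step return of action a in state s under value estimates v."""
    return sum(p * (r + GAMMA * v[nxt]) for p, nxt, r in transitions(s, a))

def value_iteration(tol=1e-10):
    v = [0.0] * N
    while True:
        delta = 0.0
        for s in range(N - 1):  # the terminal state is never updated, so it stays 0
            best = max(backup(v, s, a) for a in ACTIONS)  # Bellman optimality backup
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place (Gauss-Seidel) update
        if delta < tol:
            return v

v = value_iteration()
policy = [max(ACTIONS, key=lambda a: backup(v, s, a)) for s in range(N - 1)]
print([round(x, 4) for x in v], policy)
```

With these numbers the sweep converges to V(2) = 0.8 / 0.82 ≈ 0.976, which can be checked by hand from the Bellman equation V(2) = 0.8 · 1 + 0.2 · 0.9 · V(2). The greedy policy moves right in every non-terminal state.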
