By SummerTree, from the WeChat official account 小样本学习与智能前沿 (Few-shot Learning and Frontiers of Intelligence)

Carnegie Mellon University (CMU) lectures on meta-learning and meta-reinforcement learning | Elements of Meta-Learning


Original post: https://mp.weixin.qq.com/s?__biz=MzU2OTgxNDgxNQ==&mid=2247485651&idx=1&sn=2a083ff0256924cb0e8da717f92f65b8&chksm=fcf9b2a3cb8e3bb5c6a7b54fcaa6b1279009a012598ef6ada88adfcef99d4e056861732bb851&token=1130212577&lang=zh_CN#rd

Goals for the lecture:

Introduction and overview of the key methods and developments. [A good starting point for reading and understanding papers!]

Probabilistic Graphical Models | Elements of Meta-Learning

01 Intro to Meta-Learning


Motivation and some examples

When is standard machine learning not enough? Standard ML works well for well-defined, stationary tasks. But what about the complex, dynamic world: heterogeneous data from people, and interactive robotic systems?

General formulation and probabilistic view

What is meta-learning?

Standard learning: given a distribution over examples (a single task), learn a function that minimizes the loss:

  min_θ E_{(x,y)~D} [ L(f_θ(x), y) ]

Learning-to-learn: given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description:

  min_θ E_{T~p(T)} [ L_T( Adapt(θ, D_T) ) ]

A Toy Example: Few-shot Image Classification

Other (practical) Examples of Few-shot Learning

Gradient-based and other types of meta-learning

Model-Agnostic Meta-Learning (MAML)

  • Start with a common model initialization θ.
  • Given a new task T_i, adapt the model using a gradient step: θ'_i = θ − α ∇_θ L_{T_i}(θ)
  • Meta-training learns a shared initialization across all tasks: min_θ Σ_i L_{T_i}(θ − α ∇_θ L_{T_i}(θ))
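The two-level update above can be sketched with first-order MAML on toy 1-D linear regression tasks (the task family, the model f(x) = θ·x, and the step sizes α, β are illustrative assumptions, not from the lecture):

```python
import numpy as np

# First-order MAML sketch on toy 1-D linear regression tasks.
# Model: f(x) = theta * x; each task i has its own slope w_i.

def loss(theta, X, y):
    return np.mean((X * theta - y) ** 2)

def loss_grad(theta, X, y):
    # d/dtheta of the mean squared error above
    return np.mean(2 * (X * theta - y) * X)

def maml_train(tasks, theta0=0.0, alpha=0.1, beta=0.05, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(steps):
        X, y = tasks[rng.integers(len(tasks))]   # sample a task T_i
        # inner loop: one gradient step adapts theta to the sampled task
        theta_adapted = theta - alpha * loss_grad(theta, X, y)
        # outer loop (first-order approximation): move the shared
        # initialization using the gradient at the adapted parameters
        theta = theta - beta * loss_grad(theta_adapted, X, y)
    return theta
```

At test time, a single inner gradient step from the learned θ adapts to an unseen task drawn from the same family.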

Does MAML Work?

MAML from a Probabilistic Standpoint

Training points D_i^train, testing points D_i^test. MAML with a log-likelihood loss maximizes the test likelihood after adapting on the training points:

  max_θ Σ_i log p(D_i^test | θ'_i),  where θ'_i = θ + α ∇_θ log p(D_i^train | θ)

One More Example: One-shot Imitation Learning

Prototype-based Meta-learning

Prototypes: c_k = (1/|S_k|) Σ_{(x_i, y_i) ∈ S_k} f_θ(x_i), the mean embedding of class k's support examples.
Predictive distribution: p(y = k | x) = exp(−d(f_θ(x), c_k)) / Σ_{k'} exp(−d(f_θ(x), c_{k'}))

Does Prototype-based Meta-learning Work?
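A minimal sketch of the prototype computation and the softmax-over-distances prediction (the embeddings are assumed precomputed; in Prototypical Networks they come from a learned embedding network f_θ):

```python
import numpy as np

# Prototype-based few-shot classification sketch.
# support_emb: (n, d) embeddings of the support set, with integer labels.

def prototypes(support_emb, support_labels, n_classes):
    # class prototype = mean embedding of that class's support examples
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def predict_proba(query_emb, protos):
    # softmax over negative squared Euclidean distances to the prototypes
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

Queries land on whichever class mean is nearest in embedding space, with soft probabilities rather than a hard nearest-neighbor rule.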

Rapid Learning or Feature Reuse?

Does MAML's success come from rapid adaptation of every parameter, or mostly from reusing a good shared representation learned during meta-training?

Neural processes and relation of meta-learning to GPs

Drawing parallels between meta-learning and GPs

In few-shot learning:

  • Learn to identify functions that generated the data from just a few examples.
  • The function class and the adaptation rule encapsulate our prior knowledge.

Recall Gaussian Processes (GPs):

  • Given a few (x, y) pairs, we can compute the predictive mean and variance.
  • Our prior knowledge is encapsulated in the kernel function.

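The GP predictive mean and variance from a few (x, y) pairs can be sketched as follows (an RBF kernel with fixed, illustrative hyperparameters; the kernel encodes the prior knowledge):

```python
import numpy as np

# GP regression sketch: predictive mean/variance at new inputs Xs,
# given a few observed (x, y) pairs.

def rbf(a, b, lengthscale=1.0):
    # RBF (squared exponential) kernel between two 1-D input arrays
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

def gp_predict(X, y, Xs, noise=1e-4):
    K = rbf(X, X) + noise * np.eye(len(X))     # train covariance
    Ks = rbf(Xs, X)                            # test-train covariance
    Kss = rbf(Xs, Xs)                          # test covariance
    mean = Ks @ np.linalg.solve(K, y)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)
```

Near the observed points the predictive variance collapses; far away it reverts to the prior variance, which mirrors how a meta-learner is confident only on inputs resembling its few-shot context.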

Conditional Neural Processes

A CNP encodes each context pair (x_i, y_i), aggregates the encodings into a single representation, and decodes a predictive distribution at any target input; the whole pipeline is trained end-to-end across many tasks.
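A shape-level skeleton of that pipeline, assuming small untrained MLPs for the encoder and decoder (all layer sizes are illustrative): encodings are aggregated by a mean, which makes the prediction order-invariant in the context set.

```python
import numpy as np

# Conditional Neural Process skeleton (untrained; random weights).
rng = np.random.default_rng(0)

def mlp_init(sizes):
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)   # ReLU on hidden layers
    return x

encoder = mlp_init([2, 32, 16])      # (x_i, y_i) -> r_i
decoder = mlp_init([17, 32, 2])      # (x*, r)   -> (mean, log sigma)

def cnp_predict(ctx_x, ctx_y, target_x):
    # encode each context pair, then aggregate by mean (order-invariant)
    r_i = mlp(encoder, np.stack([ctx_x, ctx_y], axis=1))
    r = r_i.mean(axis=0)
    inp = np.concatenate([target_x[:, None],
                          np.tile(r, (len(target_x), 1))], axis=1)
    out = mlp(decoder, inp)
    mean, log_sigma = out[:, 0], out[:, 1]
    return mean, np.exp(log_sigma)
```

With random weights the predictions are meaningless, but the skeleton shows the data flow; training would maximize the predictive log-likelihood of held-out target points across many sampled tasks.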

On software packages for meta-learning

There are many research code releases (the code is often fragile and sometimes broken). A few notable libraries implement specific methods:

  • Torchmeta (https://github.com/tristandeleu/pytorch-meta)
  • Learn2learn (https://github.com/learnables/learn2learn)
  • Higher (https://github.com/facebookresearch/higher)

Takeaways

  • Many real-world scenarios require building adaptive systems and cannot be solved with the “learn-once” standard ML approach.
  • Learning-to-learn (or meta-learning) attempts to extend ML to rich multitask scenarios: instead of learning a function, learn a learning algorithm.
  • Two families of widely popular methods:
    • Gradient-based meta-learning (MAML and such)
    • Prototype-based meta-learning (Protonets, Neural Processes, …)
    • Many hybrids, extensions, and improvements (CAVIA, Meta-SGD, …)
  • Is it about adaptation or learning good representations? Still unclear and depends on the task; having good representations might be enough.
  • Meta-learning can be used as a mechanism for causal discovery. (See Bengio et al., 2019.)

02 Elements of Meta-RL

What is meta-RL and why does it make sense?

Recall the definition of learning-to-learn.

Standard learning: given a distribution over examples (a single task), learn a function that minimizes the loss:

  min_θ E_{(x,y)~D} [ L(f_θ(x), y) ]

Learning-to-learn: given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description.

Meta reinforcement learning (meta-RL): given a distribution over environments, train a policy update rule that can solve new environments given only limited or no initial experience.

Meta-learning for RL

On-policy and off-policy meta-RL

On-policy RL: Quick Recap

REINFORCE algorithm (policy-gradient update):

  ∇_θ J(θ) = E_{τ~π_θ} [ Σ_t ∇_θ log π_θ(a_t | s_t) · R(τ) ]
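The REINFORCE update can be sketched on a toy two-armed bandit with a softmax policy (the reward values and step size are made-up illustrative numbers):

```python
import numpy as np

# REINFORCE sketch on a 2-armed bandit with a softmax policy.

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce(rewards=(0.0, 1.0), lr=0.5, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)              # one logit per arm
    for _ in range(steps):
        p = softmax(theta)
        a = rng.choice(2, p=p)       # sample an action from the policy
        r = rewards[a]
        # grad of log pi(a) w.r.t. theta for a softmax policy:
        # one_hot(a) - p
        grad_logp = -p
        grad_logp[a] += 1.0
        theta += lr * r * grad_logp  # REINFORCE update
    return softmax(theta)
```

The policy's probability mass shifts toward the higher-reward arm; in practice a baseline is subtracted from the reward to reduce the variance of the gradient estimate.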

On-policy Meta-RL: MAML (again!)

  • Start with a common policy initialization θ.
  • Given a new task T_i, collect data using the initial policy, then adapt using a policy-gradient step: θ'_i = θ + α ∇_θ J_{T_i}(θ)
  • Meta-training learns a shared initialization for all tasks: max_θ Σ_i J_{T_i}(θ + α ∇_θ J_{T_i}(θ))

Adaptation as Inference

Treat policy parameters, tasks, and all trajectories as random variables: meta-learning = learning a prior, and adaptation = inference.

Off-policy meta-RL: PEARL

Key points:

  • Infer latent representations z of each task from the trajectory data.
  • The inference network q is decoupled from the policy, which enables off-policy learning.
  • All objectives involve both the inference and policy networks.
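A shape-level sketch of that decoupling, with illustrative (untrained) linear networks: the inference network maps a set of transitions to a latent task variable z via permutation-invariant aggregation, and the policy conditions on (state, z). All names and sizes here are assumptions for illustration.

```python
import numpy as np

# PEARL-style task-inference sketch (shapes only; weights untrained).
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((4, 8)) * 0.1      # (s, a, r, s') -> features
W_pol = rng.standard_normal((1 + 8, 2)) * 0.1  # (state, z) -> action logits

def infer_z(context):
    # context: (n, 4) array of (s, a, r, s') transitions from one task
    feats = np.maximum(context @ W_enc, 0.0)   # per-transition encoding
    return feats.mean(axis=0)                  # permutation-invariant z

def policy_logits(state, z):
    # the policy conditions on the inferred task variable z
    return np.concatenate([[state], z]) @ W_pol
```

Because z is computed from stored transitions rather than fresh on-policy rollouts, the policy can be trained off-policy from a replay buffer while the inference network keeps z consistent with the current task.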

Adaptation in Nonstationary Environments

Classical few-shot learning setup:

  • The tasks are i.i.d. samples from some underlying distribution.
  • Given a new task, we get to interact with it before adapting.

What if we are in a nonstationary environment, i.e., one that changes over time? Can we still use meta-learning? Example: adaptation to a learning opponent. Each new round is a new task; a nonstationary environment is a sequence of tasks.

Continuous adaptation setup:

  • The tasks are sequentially dependent.
  • Meta-learn to exploit the dependencies.

Continuous adaptation

Treat policy parameters, tasks, and all trajectories as random variables.

RoboSumo: a multi-agent competitive environment in which an agent competes against an opponent whose behavior changes over time.

Takeaways

  • The learning-to-learn (or meta-learning) setup is particularly suitable for multi-task reinforcement learning.
  • Both on-policy and off-policy RL can be “upgraded” to meta-RL:
    • On-policy meta-RL is directly enabled by MAML
    • Decoupling task inference and policy learning enables off-policy methods
  • Is it about fast adaptation or learning good multitask representations? (See discussion in Meta-Q-Learning: https://arxiv.org/abs/1910.00125)
  • The probabilistic view of meta-learning allows its ideas to be used beyond distributions of i.i.d. tasks, e.g., for continuous adaptation.
  • Very active area of research.
