AI Research Society · Beginner Group | Issue 7: Learning Without a Teacher

In recent years, artificial intelligence has made breakthrough advances, drawing strong interest from students and practitioners across many technical fields. On the road into AI, *The Master Algorithm* is arguably required reading, and its importance needs little elaboration. The author, Pedro Domingos, appears to give only a broad tour of the main schools of thought in machine learning, yet he touches on nearly every important application, existing or yet to come. The book lets beginners survey machine learning from a high level while planting countless leads that an attentive reader can follow into specific technical topics, making it a rare guiding introduction; its witty style also keeps the reading enjoyable.

With this book as its text, Synced's "AI Research Society · Beginner Group" will soon officially open!

How to Join


We invite all beginners interested in artificial intelligence and machine learning to join us and, through reading and discussing *The Master Algorithm*, gain a broad, comprehensive view of AI's history and technical principles.


Chapter #7 Review

【Chapter Summary】

This chapter discusses three algorithms that use analogical reasoning: nearest-neighbor, k-nearest-neighbor (KNN), and the support vector machine (SVM). K-nearest-neighbor generalizes nearest-neighbor (the k = 1 case): it is a simple algorithm that stores all available cases and classifies a new case by a similarity measure over its k closest stored examples. KNN was already in use in statistical estimation and pattern recognition as a non-parametric technique by the beginning of the 1970s.
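
To make the mechanism concrete, here is a minimal KNN classifier in Python. The toy data, function name, and choice of Euclidean distance are our own illustration, not from the book:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    # Sort the stored (point, label) pairs by Euclidean distance to the query.
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set: two well-separated classes.
train = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"), ((0.9, 1.1), "red"),
         ((5.0, 5.0), "blue"), ((5.2, 4.9), "blue"), ((4.8, 5.1), "blue")]

knn_predict(train, (1.1, 1.0), k=3)   # the three nearest points are all "red"
```

Note that "training" is just storage: all the work happens at query time, which is exactly what makes the method non-parametric.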

The support vector machine (SVM) framework is currently the most popular approach for "off-the-shelf" supervised learning: if you have no specialized prior knowledge of a domain, an SVM is an excellent method. It was invented by the Soviet statistician Vladimir Vapnik, who developed the initial idea while working on his PhD thesis in the 1960s. In 1993, Bell Labs was investigating handwritten character recognition (HCR) using artificial neural networks (ANNs); Vapnik bet that an SVM would outperform an ANN on that task.

Three properties make SVMs attractive:

  • 1) An SVM constructs a maximum-margin separator, a decision boundary with the largest possible distance to the example points, which helps produce good generalization;
  • 2) SVMs create a linear separating hyperplane, but they can first embed the data into a higher-dimensional space via a kernel function. A linear separator in that high-dimensional space is generally nonlinear in the original space, so the hypothesis space is greatly expanded over methods that use strictly linear representations;
  • 3) SVMs are a nonparametric method: they retain training examples and potentially need to store them all. In practice, however, they often end up retaining only a small fraction of the original examples, sometimes as few as a small constant times the number of dimensions. SVMs thus combine the advantages of nonparametric and parametric models: they are flexible enough to represent complex functions, yet resistant to overfitting.
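
The maximum-margin idea in property 1) can be sketched with a toy linear soft-margin SVM trained by subgradient descent on the hinge loss. This is a simplified stand-in for the quadratic-programming solvers real SVM libraries use; the data and constants below are illustrative, not from the book:

```python
# Toy 2-D data: class +1 clustered near (2, 2), class -1 near (-2, -2).
data = [((2.0, 2.2), 1), ((2.5, 1.8), 1), ((1.7, 2.6), 1),
        ((-2.0, -2.1), -1), ((-2.4, -1.9), -1), ((-1.8, -2.5), -1)]

w, b = [0.0, 0.0], 0.0
lam, lr = 0.01, 0.1   # regularization strength and learning rate (illustrative values)

# Subgradient descent on the regularized hinge loss: lam*|w|^2 + max(0, 1 - y*(w.x + b)).
for epoch in range(200):
    for x, y in data:
        margin = y * (w[0] * x[0] + w[1] * x[1] + b)
        if margin < 1:   # point violates the margin: push the boundary away from it
            w = [wi + lr * (y * xi - 2 * lam * wi) for wi, xi in zip(w, x)]
            b += lr * y
        else:            # comfortably classified: only apply the regularizer (widen the margin)
            w = [wi * (1 - 2 * lr * lam) for wi in w]

def predict(point):
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else -1
```

The regularizer shrinks `w`, and a smaller `w` with margins still at least 1 corresponds to a wider geometric margin; the kernel trick of property 2) would replace the dot products here with a kernel function.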

Week 7 Q & A Collection

  1. How does k-nearest-neighbor improve on nearest-neighbor?
     A: It reduces the bias potentially caused by relying on a single nearest neighbor.
  2. Why does rising dimensionality create problems for nearest-neighbor algorithms?
     A: Higher dimensionality means more features enter the distance computation, which increases the cost dramatically; worse, many features are actually irrelevant, and nearest-neighbor handles irrelevant features poorly.
  3. Explain the mechanism of SVM.
     A: An SVM is a binary classification model that seeks the line or hyperplane that separates the data with the maximum margin.

Chapter #8 Preview

【Chapter Summary】

Chapter 8, "Learning without a teacher", opens with the difficulties raised by an audacious inquiry: what if a "baby robot" had to be taught the way a human child is? Through everyday examples of a person and the decisions he or she makes, the chapter asks which "type" of algorithm the brain uses to perform such actions, drawing analogies to the algorithms of the previous chapters. Each tribe's approach leaves some trace in the brain; the brain, therefore, embodies the Master Algorithm, which is just yet to be found.


【Important Sections】

  • Intro:
  •    Researchers develop machine learning algorithms based on the theories of child learning in cognitive science.
  • Putting together birds of a feather:
  •    The author introduces clustering, the family of algorithms that group similar objects together, such as the Expectation-Maximization (EM) algorithm and k-means.
  • Discovering the shape of the data:
  •    PCA and Isomap are two algorithms for dimensionality reduction.
  • The hedonistic robot:
  •    The author discusses reinforcement learning, a technique in which the learner adjusts its actions based on the environment's responses to them.
  • Practice makes perfect:
  •    Chunking is a preeminent learning algorithm inspired by psychology.
  • Learning to relate:
  •    The chapter ends with the author explaining another potentially killer algorithm: relational learning.
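
The clustering section above can be illustrated with a bare-bones k-means, which alternates between assigning points to their nearest center and moving each center to its cluster's mean. The toy points and starting centers are our own example:

```python
import math

def kmeans(points, centers, iters=10):
    """Bare-bones k-means: alternate assignment and centroid-update steps."""
    for _ in range(iters):
        # Assignment step: attach each point to its nearest current center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of the points assigned to it.
        centers = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers

# Two obvious blobs, one near the origin and one near (9, 9).
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
          (9.0, 9.1), (9.2, 8.9), (8.9, 9.0)]
centers = kmeans(points, [(0.0, 0.0), (1.0, 1.0)])
```

EM generalizes this picture: instead of hard assignments, each point is given a probability of belonging to each cluster, and the update step weights points by those probabilities.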


【Key Concepts】

  • Clustering
  •    k-means algorithm
  •    Naïve Bayes
  •    Expectation-Maximization (EM) algorithm
  • Dimensionality reduction
  •    Principal-component analysis (PCA)
  •    Isomap
  • Reinforcement learning
  •    The Law of Effect
  • Chunking
  •    A power law
  •    Soar
  • Relational learning
  •    Markov network
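
The Law of Effect listed above — actions followed by reward tend to be repeated — is what tabular Q-learning makes quantitative. Here is a toy sketch; the corridor environment and all constants are our own illustration:

```python
import random

random.seed(0)
# A tiny corridor world: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode.
N, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N)]
alpha, gamma, eps = 0.5, 0.9, 0.5   # learning rate, discount, exploration rate

for episode in range(300):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the current Q, sometimes explore.
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda a: Q[s][a])
        s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Law of Effect, made quantitative: move Q(s, a) toward the
        # immediate reward plus the discounted best estimated future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N)]
```

With a plain table of Q-values these updates settle down; the stability problem the quiz asks about arises when the table is replaced by a generalizing function approximator.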


【Quiz】

  1. Give two applications each of Markov models and the k-means algorithm.
  2. Describe the situations in which the Isomap algorithm performs best for dimensionality reduction.
  3. Reinforcement learning with generalization often fails to settle on a stable solution, why?
  4. What is the difference between clustering and chunking?