Study Group · Beginners | The Master Algorithm: Review of Chapters 1-2 and Study Guide for Chapter 3

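In recent years, artificial intelligence has made dramatic breakthroughs, drawing many students and practitioners in technical fields to the area. For anyone starting out in AI, The Master Algorithm is arguably required reading. At first glance its author, Pedro Domingos, gives only a broad-brush introduction to the main schools of thought in machine learning, yet the book touches on nearly every important application, whether already realized or still to come. It suits beginners who want a bird's-eye view of machine learning, and it also plants many leads that motivated readers can follow to study specific technical problems in depth, which makes it a rare and valuable introductory guide. The witty writing style makes it fun to read as well.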

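With this book as its core text, the "AI Study Group · Beginners" run by 机器之心 (Synced) will officially open soon. We invite all beginners interested in artificial intelligence and machine learning to join us and, by reading and discussing The Master Algorithm, gain a broad, well-rounded understanding of the history and technical principles of AI. This post briefly summarizes chapters 1 and 2 of the book and gives an outline for chapter 3 (the original provides it in both Chinese and English). A short quiz is included at the end, so give it a try!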

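Review of Chapters 1 and 2

Chapter Summary

Machine learning is famously multifaceted and goes by many names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and so on. In these two introductory chapters you will start to get familiar with some of the field's commonly used terminology, organized by application. Some notable trends are highlighted: finance (predicting stock ups and downs), mining corporate databases (customer relationship management, credit scoring, and fraud detection), and e-commerce (personalization).

The Master Algorithm is to algorithms what the hand is to pens, swords, screwdrivers, and forks. The author briefly introduces the five tribes of machine learning:

  • Symbolists

  • Connectionists

  • Evolutionaries

  • Bayesians

  • Analogizers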


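Machine learning has an almost uncountable number of technical terms, so you may feel overwhelmed at first. In most cases, however, a little jargon and a few algorithms are enough to understand the key idea behind the vast majority of applications. In the following chapters, the book takes us through a closer, more detailed look at each of the machine learning tribes. One indicator of a theory's power is how much it simplifies the real world when it is used to describe and model it. Can we do well enough? First, we will never have enough data to completely determine the world. Second, even if we had complete knowledge of the world at some point in time, the laws of physics would still not allow us to determine its past and future.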

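Chapter 3 Preview

Chapter Summary

To understand symbolism, we first need to know what deduction is and why it matters. The Master Algorithm should be able to start with a large body of knowledge and use it to guide new generalizations from data. The “divide and conquer” rule induction algorithm cannot do this, but learning rules by inverse deduction, that is, treating induction as the inverse of deduction, can.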

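Important Sections

The “no free lunch” theorem:

  • In machine learning, preconceived notions are indispensable. Our goal is to find the simplest program that will continue to write itself by reading data.

Priming the knowledge pump:

  • A typical strategy in machine learning is to start with restrictive assumptions and gradually relax them if they fail to explain the data.

  • We also meet the first actual learner in the book.

How to rule the world:

  • Examples of two typical types of learning algorithms

Between blindness and hallucination:

  • The problem of overfitting and several possible ways to address it

Accuracy you can believe in:

  • Principles to follow in order to avoid overfitting

  • The concepts of “bias” and “variance”

Induction is the inverse of deduction:

  • The basic concepts of “deduction” and “induction”

A game of twenty questions:

  • The basic concepts of a decision tree

The symbolists:

  • A summary of the fundamental philosophy of the symbolist school of machine learning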

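Key Concepts

  • Hume's problem

  • Overfitting

  • Bias and variance

  • Deduction and induction

  • Decision tree

Quiz

  1. What is overfitting?

  2. List three methods mentioned in this chapter for solving or mitigating overfitting.

  3. What is induction? Explain with an example.

  4. Build a decision tree based on an example of your own.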


Chapter #1-2 Review

【Chapter Summary】

Machine learning is famously multifaceted and goes by a variety of names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and so on. In these two introductory chapters, you will start to get familiar with some of the field's commonly used terminology, organized by application. Some notable trends are highlighted: finance (predicting stock ups and downs), mining corporate databases (customer relationship management, credit scoring, and fraud detection), and e-commerce (personalization).

The Master Algorithm is to algorithms what the hand is to pens, swords, screwdrivers, and forks. The author briefly introduces the five tribes of machine learning:

  • Symbolists

  • Connectionists

  • Evolutionaries

  • Bayesians

  • Analogizers


Machine learning has an almost uncountable number of technical terms, so you may feel overwhelmed at first. In most cases, however, a little jargon and a few algorithms are enough to understand the key idea behind the vast majority of applications. In the following chapters, the author guides us through a closer, more detailed look at each of the machine learning tribes. One indicator of a theory's power is how much it simplifies the real world when it is used to describe and model it. Can we do well enough? First, we will never have enough data to completely determine the world. Second, even if we had complete knowledge of the world at some point in time, the laws of physics would still not allow us to determine its past and future.

Chapter #3 Preview

【Chapter Summary】

To understand symbolism, we first need to know what deduction is and why it matters. The Master Algorithm should be able to start with a large body of knowledge and use it to guide new generalizations from data. The “divide and conquer” rule induction algorithm cannot do this, but learning rules by inverse deduction, that is, treating induction as the inverse of deduction, can.
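The idea that induction is the inverse of deduction (one of the sections outlined for this chapter) can be illustrated with a very small sketch. It is a toy, not the book's algorithm: the fact format and the helper names `deduce` and `induce` are assumptions made up for this example. Deduction applies a rule to known facts; induction works backwards and proposes the rule that would make an observation deducible from what is already known.

```python
# Hypothetical toy illustration of induction as the inverse of deduction.
# Facts are (predicate, subject) pairs; rules are (premise, conclusion) pairs.

def deduce(facts, rules):
    """Forward step: apply each rule to every fact whose predicate matches its premise."""
    derived = set(facts)
    for premise, conclusion in rules:
        for predicate, subject in facts:
            if predicate == premise:
                derived.add((conclusion, subject))
    return derived

def induce(facts, observations):
    """Inverse step: propose rules that would let us deduce each observation
    from a known fact about the same subject."""
    candidates = set()
    for obs_pred, obs_subj in observations:
        for fact_pred, fact_subj in facts:
            if fact_subj == obs_subj and fact_pred != obs_pred:
                candidates.add((fact_pred, obs_pred))
    return candidates

facts = {("human", "Socrates"), ("human", "Plato")}
observations = {("mortal", "Socrates")}

rules = induce(facts, observations)                   # proposes "humans are mortal"
print(rules)
print(("mortal", "Plato") in deduce(facts, rules))    # the induced rule generalizes to Plato
```

Here the prior knowledge ("Socrates is human") is what lets a single observation turn into a general rule, which is the sense in which existing knowledge guides new generalizations from data.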

【Important Sections】

The “no free lunch” theorem:

  • In machine learning, preconceived notions are indispensable. Our goal is to find the simplest program that will continue to write itself by reading data.

Priming the knowledge pump:

  • A typical strategy in machine learning is to start with restrictive assumptions and gradually relax them if they fail to explain the data.

  • We also meet the first actual learner in the book.

How to rule the world:

  • Examples of two typical types of learning algorithms

Between blindness and hallucination:

  • The problem of overfitting and several possible ways to address it

Accuracy you can believe in:

  • Principles to follow in order to avoid overfitting

  • The concepts of “bias” and “variance”

Induction is the inverse of deduction:

  • The basic concepts of “deduction” and “induction”

A game of twenty questions:

  • The basic concepts of a decision tree

The symbolists:

  • A summary of the fundamental philosophy of the symbolist school of machine learning
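The overfitting sections in this outline can be made concrete with a small experiment. The sketch below is only an illustration under invented assumptions: the data, the noise pattern, and the two learners (`memorizer` and `simple_rule`) are made up for this example and do not come from the book. A flexible learner that memorizes the training set is compared with a restricted learner that may only choose one cut-off, and both are scored on held-out examples they never saw.

```python
# Toy data: the true concept is "x >= 20", but a few labels are flipped (noise).
def label(x):
    noisy = (x % 10 == 3)          # hypothetical noise pattern for this sketch
    return (x >= 20) != noisy      # XOR: flip the label on the noisy points

train = [(x, label(x)) for x in range(1, 40, 2)]   # odd x, includes the noisy points
test = [(x, label(x)) for x in range(0, 40, 2)]    # even x, held out from training

def memorizer(x):
    """Flexible learner: copy the label of the nearest training example."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

def fit_threshold(data):
    """Restricted learner: pick the single cut-off with the fewest training errors."""
    def errors(t):
        return sum((x >= t) != y for x, y in data)
    return min(range(0, 41), key=errors)

threshold = fit_threshold(train)

def simple_rule(x):
    return x >= threshold

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

for name, model in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(name, "train:", accuracy(model, train), "held-out:", accuracy(model, test))
```

On this toy data the memorizer scores 1.0 on the training set but only 0.75 on the held-out set, because it has also memorized the flipped labels. The simple rule scores 0.8 on the training set yet 1.0 on the held-out set. Perfect training accuracy is therefore not accuracy you can believe in; the held-out score is the more trustworthy one.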

【Key Concepts】

  • Hume's Problem

  • Overfitting

  • Bias & variance

  • Deduction & Induction

  • Decision tree
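Of these key concepts, the decision tree is probably the easiest to show in code. The sketch below is a hand-built tree in the spirit of a game of twenty questions; the questions and animal labels are invented for illustration, and the tree is written by hand rather than learned from data.

```python
# A hand-built decision tree: inner nodes ask a yes/no question, leaves are answers.
# The attributes and labels are hypothetical examples.
tree = {
    "question": "has_feathers",
    "yes": {"question": "can_fly", "yes": "sparrow", "no": "penguin"},
    "no": {"question": "lives_in_water", "yes": "dolphin", "no": "dog"},
}

def classify(node, example):
    """Walk from the root, answering one yes/no question per level, until a leaf."""
    while isinstance(node, dict):
        branch = "yes" if example[node["question"]] else "no"
        node = node[branch]
    return node

example = {"has_feathers": True, "can_fly": False, "lives_in_water": True}
print(classify(tree, example))   # follows has_feathers=yes, then can_fly=no
```

Each example follows exactly one path from the root to a leaf, so only the questions on that path are ever asked. A decision tree learner would choose the questions automatically from labeled examples instead of having them written by hand.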

【Quiz】

  1. What is overfitting?

  2. List three methods mentioned in this chapter for solving or mitigating overfitting.

  3. What is induction? Explain with an example.

  4. Build a decision tree based on your own example.
