# Study Group · Beginner Group | "The Master Algorithm": Summary of the First Two Chapters and Study Guide for Chapter 3

## Summary of Chapters 1 and 2

The Master Algorithm is to algorithms what the hand is to pens, swords, screwdrivers, and forks. The author briefly introduces the five tribes of machine learning:

• Symbolists

• Connectionists

• Evolutionaries

• Bayesians

• Analogizers

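## Chapter 3 Preview

• The "no free lunch" theorem:

  • In machine learning, preconceived notions are indispensable. Our goal is to find the simplest algorithm that can continue to write itself by reading data.

• Priming the knowledge pump:

  • A typical strategy in machine learning is to start with restrictive assumptions and gradually relax them when they fail to explain the data.

  • We also encounter the first actual learner in the book.

• How to rule the world:

  • Examples of two typical learning algorithms

• Between blindness and hallucination:

  • The problem of overfitting and several possible ways to address it

• Accuracy you can believe in:

  • Principles to follow in order to avoid overfitting

  • The concepts of "bias" and "variance"

• Induction is the inverse of deduction:

  • Basic concepts of "deduction" and "induction"

• A game of twenty questions:

  • Basic concepts of a decision tree

• The symbolists:

  • Summary of the fundamental philosophy of the symbolist school of machine learning

Key concepts:

• Hume's problem

• Overfitting

• Bias and variance

• Deduction and induction

• Decision tree

Quiz:

1. What is overfitting?

2. List three methods mentioned in this chapter for addressing or mitigating overfitting.

3. What is induction? Explain with an example.

4. Build a decision tree based on your own example.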

## Chapter #1-2 Review

### 【Chapter Summary】

Machine learning is notably multifaceted and goes by a variety of names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and so on. In these two introductory chapters, you will start to get familiar with some commonly used terminology in the field, organized by application. Some notable application areas are highlighted here: finance (predicting stock ups and downs), mining corporate databases (customer relationship management, credit scoring, and fraud detection), and e-commerce (personalization).

The Master Algorithm is to algorithms what the hand is to pens, swords, screwdrivers, and forks. The author briefly introduces the five tribes of machine learning:

• Symbolists

• Connectionists

• Evolutionaries

• Bayesians

• Analogizers

The number of technical terms in machine learning is very large, so you may feel overwhelmed at the very beginning. However, in most cases, a handful of terms and a few algorithms are sufficient to understand the key idea behind the vast majority of applications. The author will guide us through a closer and more detailed look at each of the machine learning tribes in the following chapters.

An important indicator of the power of a theory is how much it simplifies the real world when it is used to describe and model it. Can we do well enough? First, we will never have enough data to completely determine the world. Second, even if we had complete knowledge of the world at some point in time, the laws of physics would still not allow us to determine its past and future.

## Chapter #3 Preview

### 【Chapter Summary】

To understand symbolism, we have to know what deduction is and why it is so important. The Master Algorithm should be able to start with a large body of knowledge and use it to guide new generalizations from data. The “divide and conquer” rule induction algorithm can’t do that, but inverse deduction, which treats induction as the inverse of deduction, can.
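To make the “divide and conquer” idea concrete, here is a minimal Python sketch of one common way to learn rules this way: grow a single rule by adding conditions until it covers only positive examples, set aside the positives it covers, and repeat. The toy data, the function names (`learn_rule`, `learn_rules`, `predict`), and the tie-breaking choices are all assumptions made for illustration, not the book's own code. Note that the learner starts from an empty rule set every time, which is exactly the limitation the summary points out: it has no way to begin from existing knowledge.

```python
# Toy data (hypothetical): attribute dictionaries paired with a True/False label.
EXAMPLES = [
    ({"outlook": "sunny", "windy": "no"}, True),
    ({"outlook": "sunny", "windy": "yes"}, False),
    ({"outlook": "overcast", "windy": "no"}, True),
    ({"outlook": "overcast", "windy": "yes"}, True),
    ({"outlook": "rainy", "windy": "no"}, False),
    ({"outlook": "rainy", "windy": "yes"}, False),
]


def matches(rule, features):
    """A rule maps attribute -> required value; an empty rule matches everything."""
    return all(features[attr] == value for attr, value in rule.items())


def learn_rule(examples):
    """Greedily add conditions until the rule covers only positive examples."""
    rule = {}
    covered = list(examples)
    while any(not label for _, label in covered):
        best = None  # (precision, coverage, attribute, value)
        for features, label in covered:
            if not label:
                continue  # only propose conditions satisfied by some positive example
            for attr, value in features.items():
                if attr in rule:
                    continue
                subset = [l for f, l in covered if f[attr] == value]
                candidate = (sum(subset) / len(subset), len(subset), attr, value)
                if best is None or candidate[:2] > best[:2]:
                    best = candidate
        if best is None:
            break  # no attributes left to test (noisy or contradictory data)
        rule[best[2]] = best[3]
        covered = [(f, l) for f, l in covered if f[best[2]] == best[3]]
    return rule


def learn_rules(examples):
    """Divide and conquer: learn a rule, drop the positives it covers, repeat."""
    rules = []
    remaining = list(examples)
    while any(label for _, label in remaining):
        rule = learn_rule(remaining)
        rules.append(rule)
        remaining = [(f, l) for f, l in remaining if not (l and matches(rule, f))]
    return rules


def predict(rules, features):
    """An example is predicted positive if any learned rule matches it."""
    return any(matches(rule, features) for rule in rules)


RULES = learn_rules(EXAMPLES)
print(RULES)
```

On this toy data the learned rule set classifies every training example correctly. That says nothing yet about how well it generalizes, which is where the chapter's discussion of overfitting comes in.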

### 【Important Sections】

• The “no free lunch” theorem:

  • In machine learning, preconceived notions are indispensable. Our goal is to find the simplest program that will continue to write itself by reading data.

• Priming the knowledge pump:

  • A typical strategy in machine learning is to start with restrictive assumptions and gradually relax them if they fail to explain the data.

  • We also encounter the first actual learner in the book.

• How to rule the world:

  • Examples of two typical types of learning algorithms

• Between blindness and hallucination:

  • The problem of overfitting, and several possible methods to address it

• Accuracy you can believe in:

  • Principles to follow in order to avoid overfitting

  • Concepts of “bias” and “variance”

• Induction is the inverse of deduction:

  • Basic concepts of “deduction” and “induction”

• A game of twenty questions:

  • Basic concepts of a decision tree

• The symbolists:

  • Summary of the fundamental philosophy of the symbolist school of machine learning
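As a small illustration of the "game of twenty questions" view of decision trees, the sketch below picks the first question to ask by measuring how much each attribute reduces the uncertainty (entropy) of the labels. Entropy-based information gain is one common criterion, used here as an assumption; the toy data and the function names (`entropy`, `information_gain`, `best_question`) are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2

# Toy data (hypothetical): attribute dictionaries paired with a True/False label.
EXAMPLES = [
    ({"outlook": "sunny", "windy": "no"}, True),
    ({"outlook": "sunny", "windy": "yes"}, False),
    ({"outlook": "overcast", "windy": "no"}, True),
    ({"outlook": "overcast", "windy": "yes"}, True),
    ({"outlook": "rainy", "windy": "no"}, False),
    ({"outlook": "rainy", "windy": "yes"}, False),
]


def entropy(labels):
    """Shannon entropy in bits; 0 for a pure group, 1 for a 50/50 binary split."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(examples, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    labels = [label for _, label in examples]
    groups = {}
    for features, label in examples:
        groups.setdefault(features[attr], []).append(label)
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_question(examples):
    """The attribute to ask about first: the one with the highest information gain."""
    attributes = examples[0][0].keys()
    return max(attributes, key=lambda attr: information_gain(examples, attr))


print(best_question(EXAMPLES))
```

A full decision tree learner would apply `best_question` recursively to each branch until the examples in a branch all share one label. The sketch stops at the first question to keep the idea visible.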

### 【Key Concepts】

• Hume's Problem

• Overfitting

• Bias & variance

• Deduction & induction

• Decision tree
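The overfitting and bias/variance concepts listed above can be seen in a very small experiment. The sketch below compares a flexible learner that memorizes its training data (1-nearest-neighbor) with a rigid one that can only learn a single threshold, and scores both on held-out data. Everything here is an assumption made for illustration: the data is invented, one training label is deliberately wrong to simulate noise, and the two learners are stand-ins rather than the algorithms discussed in the book.

```python
# Hypothetical 1-D data: the underlying concept is "x >= 6".
# The training point (8, False) is deliberate label noise.
TRAIN = [(1, False), (2, False), (3, False), (4, False),
         (6, True), (7, True), (8, False), (9, True)]
HOLDOUT = [(0.5, False), (3.5, False), (6.5, True),
           (7.6, True), (8.4, True), (9.5, True)]


def nearest_neighbor(train):
    """Flexible learner: predict the label of the closest training point."""
    def predict(x):
        _, label = min(train, key=lambda pair: abs(pair[0] - x))
        return label
    return predict


def threshold_rule(train):
    """Rigid learner: choose the single cutoff t with the fewest training errors."""
    def errors(t):
        return sum((x >= t) != label for x, label in train)
    best_t = min(sorted(x for x, _ in train), key=errors)
    return lambda x: x >= best_t


def accuracy(predict, data):
    """Fraction of examples in data that predict labels correctly."""
    return sum(predict(x) == label for x, label in data) / len(data)


for name, learner in [("1-NN", nearest_neighbor), ("threshold", threshold_rule)]:
    model = learner(TRAIN)
    print(name, accuracy(model, TRAIN), accuracy(model, HOLDOUT))
```

The memorizing learner is perfect on the training set because it also reproduces the noisy label, and that is what costs it accuracy on the held-out points near `x = 8`. The threshold learner misses the noisy training point but gets all held-out points right. This is the reason accuracy is best judged on data the learner has not seen.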

### 【Quiz】

1. What is overfitting?

2. List three ways to address or mitigate overfitting mentioned in this chapter.

3. What is induction? Explain with an example.

4. Build a decision tree based on your own example.