Auto Byte

Science AI

# 从七桥问题开始：全面介绍图论及其应用

## 七桥问题

Leonhard Euler

• 有限无向图 G(V,E) 的 Euler 路径是指 G 的每一个边都只出现一次的路径。如果 G 有一条 Euler 路径，它就被称之 Euler 图。[注释 1]

• 定理：有且仅有两个确定的节点存在奇数自由度，其它的节点都有偶数自由度，那么该有限无向图为 Euler 图。【1】

## 图表征：前言

``````struct BinTreeNode
{
T value; // don't bother with template<>
TreeNode* left;
TreeNode* right;
};
class BinTree
{
public:
// insert, remove, find, bla bla
private:
BinTreeNode* root_;
};``````

``````BinTreeNode<Apple>* root = new BinTreeNode<Apple>("Green");

root->left = new BinTreeNode<Apple>("Yellow");
root->right = new BinTreeNode<Apple>("Yellow 2");

BinTreeNode<Apple>* yellow_2 = root->right;

yellow_2->left = new BinTreeNode<Apple>("Almost red");
yellow_2->right = new BinTreeNode<Apple>("Red");
``````

### Airbnb

Airbnb 房源搜索一瞥：

``````// feel free to reorganize this struct to avoid redundant space
// usage because of aligning factor
// Remark 1: some of the properties could be expressed as enums,
// bitset is chosen for as multi-value enum holder.
// Remark 2: for most of the count values the maximum is 16
// Remark 3: price value considered as integer,
// int considered as 4 byte.
// Remark 4: neighborhoods property omitted
// Remark 5: to avoid spatial queries, we're
// using only country code and city name, at this point won't consider
// the actual coordinates (latitude and longitude)
struct AirbnbHome
{
wstring name; // wide string
uint price;
uchar rating;
uint rating_count;
vector<string> photos; // list of photo URLs
string host_id;
uchar children_number; // max is 5
uchar infants_number; // max is 5
bitset<3> home_type;
uchar beds_number;
uchar bedrooms_number;
uchar bathrooms_number;
bitset<21> accessibility;
bool superhost;
bitset<20> amenities;
bitset<6> facilities;
bitset<34> property_types;
bitset<32> host_languages;
bitset<3> house_rules;
ushort country_code;
string city;
};
``````

``````bool HasWasher(AirbnbHome* h)
{
return h->amenities[2];
}
``````

``````const int KITCHEN = 0;
const int HEATING = 1;
const int WASHER = 2;
//...
bool HasWasher(AirbnbHome* h)
{
return (h != nullptr) && h->amenities[WASHER];
}

bool HasWasherAndKitchen(AirbnbHome* h)
{
return (h != nullptr) && h->amenities[WASHER] && h->amenities[KITCHEN];
}

bool HasAllAmenities(AirbnbHome* h, const std::vector<int>& amenities)
{
bool has = (h != nullptr);
for (const auto a : amenities) {
has &= h->amenities[a];
}
return has;
}
``````

``````// Note the comments
struct AirbnbHome
{
wstring name; // 200 bytes
uint price; // 4 bytes
uchar rating; // 1 byte
uint rating_count; // 4 bytes
vector<string> photos; // 100 bytes
string host_id; // 20 bytes
uchar children_number; // 1 byte
uchar infants_number; // 1 byte
bitset<3> home_type; // 4 bytes
uchar beds_number; // 1 byte
uchar bedrooms_number; // 1 byte
uchar bathrooms_number; // 1 byte
bitset<21> accessibility; // 4 bytes
bool superhost; // 1 byte
bitset<20> amenities; // 4 bytes
bitset<6> facilities; // 4 bytes
bitset<34> property_types; // 8 bytes
bitset<32> host_languages; // 4 bytes, correct me if I'm wrong
bitset<3> house_rules; // 4 bytes
ushort country_code; // 2 bytes
string city; // 50 bytes
};
```
```

（1）目标节点被发现；

（2）如果要找的值小于节点的值，则匹配将继续到节点的左子树；

（3）如果要找的值大于节点的值，则匹配将继续到节点的右子树。

N-ary 树更像是模拟一个图。

``````struct NTreeNode
{
T value;
vector<NTreeNode*> children;
};
``````

``````// almost pseudocode
class NTree
{
public:
void Insert(const T&);
void Remove(const T&);
// lines of omitted code
private:
NTreeNode* root_;
};``````

``````struct GraphNode
{
T value;
};
``````

``````class ConnectedGraph
{
public:
// API

private:
GraphNode* root_;
};
``````

``````class DisconnectedGraphOrJustAGraph
{
public:
// API

private:
std::vector<GraphNode*> all_roots_;
};
``````

``````// A representation of a graph with both vertex and edge tables
// Vertex table is a hashtable of edges (mapped by label)
// Edge table is a structure with 4 fields
// VELO = Vertex Edge Label Only (e.g. no vertex payloads)

class ConnectedVELOGraph {
public:
struct Edge {
Edge(const std::string& f, const std::string& t)
: from(f)
, to(t)
, used(false)
, next(nullptr)
{}
std::string ToString() {
return (from + " - " + to + " [used:" + (used ? "true" : "false") + "]");
}

std::string from;
std::string to;
bool used;
Edge* next;
};

ConnectedVELOGraph() {}
~ConnectedVELOGraph() {
vertices_.clear();
for (std::size_t ix = 0; ix < edges_.size(); ++ix) {
delete edges_[ix];
}
}

public:
void InsertEdge(const std::string& from, const std::string& to) {
Edge* e = new Edge(from, to);
InsertVertexEdge_(from, e);
InsertVertexEdge_(to, e);
edges_.push_back(e);
}

public:
void Print() {
for (auto elem : edges_) {
std::cout << elem->ToString() << std::endl;
}
}

std::vector<std::string> Trace(const std::string& v) {
std::vector<std::string> path;
Edge* e = vertices_[v];
while (e != nullptr) {
if (e->used) {
e = e->next;
} else {
e->used = true;
path.push_back(e->from + ":-:" + e->to);
e = vertices_[e->to];
}
}
return path;
}

private:
void InsertVertexEdge_(const std::string& label, Edge* e) {
if (vertices_.count(label) == 0) {
vertices_[label] = e;
} else {
vertices_[label]->next = e;
}
}

private:
std::unordered_map<std::string, Edge*> vertices_;
std::vector<Edge*> edges_;
};
``````

``````// 'author' represents the User object, at this point we are interested only in author.id
//
// 'tw' is a Tweet object, at this point we are interested only in 'tw.id'

void DeliverATweet(User* author, Tweet* tw)
{
// we assume that 'tw' object is already stored in a database

// 1. Get the list of user's followers (author of the tweet)
vector<User*> user_followers = GetUserFollowers(author->id);

// 2. insert tweet into each timeline
for (auto follower : user_followers) {
InsertTweetIntoUserTimeline(follower->id, tw->id);
}
}
```
```

``````// Warning: a bunch of pseudocode ahead

void RangeInsertIntoTimelines(vector<long> user_ids, long tweet_id)
{
for (auto id : user_ids) {
InsertIntoUserTimeline(id, tweet_id);
}
}

void DeliverATweet(User* author, Tweet* tw)
{
// we assume that 'tw' object is already stored in a database

// 1. Get the list of user's (tweet author's) followers's ids
vector<long> user_followers = GetUserFollowers(author->id);

// 2. Insert tweet into each timeline in parallel
const int CHUNK_SIZE = 4000; // saw this somewhere
for (each CHUNK_SIZE elements in user_followers) {
t.Run(RangeInsertIntoTimelines, current_chunk, tw->id);
}
}
```
```

Airbnb 过滤器节选

``````house_type: "entire_place",
price_range_start: 56,
price_range_end: 80,
beds_number: 2,
amenities: ["tv", "wifi", "laptop friendly workspace"],
facilities: ["gym"]
``````

## 图算法

• Coloring

• Hopcroft–Karp

• Hungarian

• Prüfer coding

• Tarjan's off-line least common ancestors

• Topological sort

Hopcroft-Karp 算法（以及很多其它算法）都基于 DFS（深度优先搜索）和 BFS（广度优先搜索）的图遍历算法。说实话，这里介绍 Hopcroft-Karp 算法的真正原因是逐渐转换到图遍历算法的讨论，相比从二值树开始讨论会更好。

``````// struct ListNode {
//   ListNode* next;
//   T item;
// };

void TraverseRecursive(ListNode* node) // starting node, most commonly the list 'head'
{
if (!node) return; // stop
std::cout << node->item;
TraverseRecursive(node->next); // recursive call
}

void TraverseIterative(ListNode* node)
{
while (node) {
std::cout << node->item;
node = node->next;
}
}
``````

• 输出节点值，然后到达左节点，再到达右节点。

• 到达左节点，然后输出节点值，再到达右节点。

• 到达左节点，然后到达右节点，再输出节点值。

``````// struct TreeNode {
//   T item;
//   TreeNode* left;
//   TreeNode* right;
// }

// you cann pass a callback function to do whatever you want to do with the node's value
// in this particular example we are just printing its value.

// node is the "starting point", basically the first call is done with the "root" node
void PreOrderTraverse(TreeNode* node)
{
if (!node) return; // stop
std::cout << node->item;
PreOrderTraverse(node->left); // do the same for the left sub-tree
PreOrderTraverse(node->right); // do the same for the right sub-tree
}

void InOrderTraverse(TreeNode* node)
{
if (!node) return; // stop
InOrderTraverse(node->left);
std::cout << node->item;
InOrderTraverse(node->right);
}

void PostOrderTraverse(TreeNode* node)
{
if (!node) return; // stop
PostOrderTraverse(node->left);
PostOrderTraverse(node->right);
std::cout << node->item;
}
```
```

### 实例：Netflix

「KISK」或「让我们保持它的简单」或「为了简单起见」，这是教程编写者从真实问题中抽象出来的超级借口，并通过在伪代码中引入「abc」简单示例及其解决方案来做出大量假设，并且这些答案在很老的笔记本电脑上也能工作。

Netflix 和亚马逊。Netflix 提供电影服务，亚马逊提供产品，我们会将它们命名为「物品」，所以每当你阅读「物品」时，都会想到 Netflix 中的电影或亚马逊的任何 [合格] 产品。这些物品最常用的是解析其标题和描述（我们只处理标题），所以如果一个操作员（通常是一个人通过管理仪表板将项目的数据插入 Netflix / Amazon 数据库）插入新项目到数据库中，它的标题正在被一些「ItemTitleProcessor」处理以产生关键字。

``````// Cached representation of an Item
// Full Item object (with title, description, comments etc.)
// could be fetched from the database
struct Item
{
// ID_TYPE is the type of Item's unique id, might be an integer, or a string
ID_TYPE id;
int rating;
};
``````

``````// this is a pseudocode, that's why I didn't bother with "const&"'s and "std::"'s
// though it could have look better, forgive me C++ fellows

vector<Item*> GetItemsByKeywordInSortedOrder(string keyword)
{
// assuming IndexTable is a big hashtable mapping keywords to Item BSTs
BST<Item*> items = IndexTable[keyword];

// suppose BST has a function InOrderProduceVector(), which creates a vector and
// inserts into it items fetched via in-order traversing the tree
vector<Item*> sorted_result = items.InOrderProduceVector();
return sorted_result;
}
``````

``````template <typename BlaBla>
class BST
{
public:
// other code ...
vector<BlaBla*> InOrderProduceVector()
{
vector<BlaBla*> result;
result.reserve(1000); // magic number, reserving a space to avoid reallocation on inserts
InOrderProduceVectorHelper_(root_, result); // passing vector by reference
return result;
}

protected:
// takes a reference to vector
void InOrderProduceVectorHelper_(BSTNode* node, vector<BlaBla*>& destination)
{
if (!node) return;
InOrderProduceVectorHelper_(node->left, destination);
destination.push_back(node->item);
InOrderProduceVectorHelper_(node->right, destination);
}

private:
BSTNode* root_;
};
``````

``````// Reminder: this is pseudocode, no bother with "const&", "std::" or others
// forgive me C++ fellows

template <typename BlaBla>
class BST
{
public:
// other code ...

vector<BlaBla*> ReverseInOrderProduceVector(int offset, int limit)
{
vector<BlaBla*> result;
result.reserve(limit);
// passing result vector by reference
// and passing offset and limit
ReverseInOrderProduceVectorHelper_(root_, result, offset, limit);
return result;
}

protected:
// takes a reference to vector
// skips 'offset' nodes and inserts up to 'limit' nodes
void ReverseInOrderProduceVectorHelper_(BSTNode* node, vector<BlaBla*>& destination, int offset, int limit)
{
if (!node) return;
if (limit == 0) return;
--offset; // skipping current element
ReverseInOrderProduceVectorHelper_(node->right, destination, offset, limit);
if (offset <= 0) { // if skipped enough, insert
destination.push_back(node->value);
--limit; // keep the count of insertions
}
ReverseInOrderProduceVectorHelper_(node->left, destination, offset, limit);
}

private:
BSTNode* root_;
};

// ... other possibly useful code

// this is a pseudocode, that's why I didn't bother with "const&"'s and "std::"'s
// though it could have look better, forgive me C++ fellows

vector<Item*> GetItemsByKeywordInSortedOrder(string keyword, offset, limit) // pagination using offset and limit
{
// assuming IndexTable is a big hashtable mapping keywords to Item BSTs
BST<Item*> items = IndexTable[keyword];

// suppose BST has a function InOrderProduceVector(), which creates a vector and
// inserts into it items fetched via reverse in-order traversing the tree
// to get items in descending order (starting from the highest rated item)
vector<Item*> sorted_result = items.ReverseInOrderProduceVector(offset, limit);
return sorted_result;
}
``````

## DFS vs. BFS

• 深度优先搜索——更多动作，更少思考。

• 广度优先搜索——在进一步行动之前先仔细观察四周。

DFS 很像前序/顺序/后序遍历，而 BFS 用于分层输出树节点。我们需要一个队列（数据结构）来存储图的「层级」，同时输出（访问）其「父级」。在之前的插图中，节点是队列中浅蓝色的点。每一层的节点被从队列中取走，同时在访问每个被取走的节点时，我们还应该将其子节点插入队列（为下一层做准备）。下列代码很简单，可以帮助大家了解 BFS。代码假设图是连通的，尽管我们可以修改代码，使其应用于非连通图。

``````// Assuming graph is connected
// and a graph node is defined by this structure
// struct GraphNode {
//   T item;
//   vector<GraphNode*> children;
// }

// WARNING: untested code
void BreadthFirstSearch(GraphNode* node) // start node
{
if (!node) return;
queue<GraphNode*> q;
q.push(node);
while (!q.empty()) {
GraphNode* cur = q.front(); // doesn't pop
q.pop();
for (auto child : cur->children) {
q.push(child);
}

// do what you want with current node
cout << cur->item;
}
}
``````

### 示例：Uber

Uber 有 5000 万用户、700 万司机，对于 Uber 来说，最重要的事情之一就是高效匹配司机和乘客。该问题首先是定位的问题。后端要处理数百万份用户请求，将每份请求发送至一或多（通常是多）位附近的司机。尽管将用户请求发送至所有附近司机更加简单，有时也更加智能，但是预处理通常会有所帮助。

1. 标记所有未访问节点。创建所有未访问节点的集合「unvisited set」。

2. 为每个节点分配一个实验距离值：初始节点的距离值设置为 0，其他节点设置为无穷大。将初始节点设置为当前节点。

3. 对于当前节点，考虑其所有未访问近邻，通过当前节点计算它们的实验距离。对比新计算的实验距离和当前分配的值，分配较小的值。例如，如果当前节点 A 的距离是 6，连接 A 与近邻 B 的边长度为 2，则经过 A 到 B 的距离是 6 + 2 = 8。如果 B 之前标记的距离大于 8，则将其更改为 8。反之，保留当前值。

4. 当我们考虑完当前节点的所有近邻之后，将当前节点标记为已访问，并从 unvisited set 中移除。已访问节点无需再检查。

5. 如果目标节点已经标记为已访问（当规划两个特定节点之间路线的时候），或 unvisited set 中节点之间的最小实验距离是无穷大（当规划完整遍历时，初始节点和其余未访问节点之间没有连接时），则停止，算法结束。

6. 反之，选择标记有最小实验距离的未访问节点，将其设置为新的「当前节点」，并返回第 3 步。