
# Hands-On: Building a Machine Learning Model in Python to Predict Gold Prices

GLD is the largest ETF that invests directly in gold.

(For details, see: http://www.etf.com/GLD)

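We will predict the price of GLD with a linear regression model in the following steps:

• Import the Python libraries and read the gold ETF data

• Define the explanatory variables

• Split the data into a training set and a test set

• Build the linear regression model

• Predict the gold ETF price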

### Import the Python libraries and read the gold ETF data

```python
# LinearRegression is scikit-learn's ordinary least squares regression model
from sklearn.linear_model import LinearRegression

# pandas and numpy are used for data manipulation
import pandas as pd
import numpy as np

# matplotlib and seaborn are used for plotting graphs
import matplotlib.pyplot as plt
import seaborn

# yfinance (the renamed successor of fix_yahoo_finance) is used to fetch data
import yfinance as yf
```

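```python
# Read data: daily GLD prices from Yahoo Finance
# (the date range is an assumption; the article does not state it)
Df = yf.download('GLD', start='2008-01-01', end='2017-12-31')

# Recent yfinance versions may return (field, ticker) MultiIndex columns; flatten them
if isinstance(Df.columns, pd.MultiIndex):
    Df.columns = Df.columns.get_level_values(0)

# Only keep the close column
Df = Df[['Close']]

# Drop rows with missing values
Df = Df.dropna()

# Plot the closing price of GLD
Df.Close.plot(figsize=(10, 5))
plt.ylabel("Gold ETF Prices")
plt.show()
```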

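### Define the explanatory variables

The explanatory variables are the 3-day and 9-day moving averages of the closing price. Each average is computed on the series shifted by one day, so the features for a given day use only closing prices from earlier days.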

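```python
# 3-day and 9-day moving averages of the previous days' closing prices
Df['S_3'] = Df['Close'].shift(1).rolling(window=3).mean()
Df['S_9'] = Df['Close'].shift(1).rolling(window=9).mean()

# The first rows have no complete window yet; drop them
Df = Df.dropna()

X = Df[['S_3', 'S_9']]
```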
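To see why the moving averages are computed on `shift(1)`, here is a minimal sketch on a made-up six-day price series (the numbers are illustrative, not GLD data). The first three values of `S_3` are NaN because no complete three-day window of earlier prices exists yet; on day 3 (close 13.0), `S_3` is 11.0, the mean of days 0 to 2, so a day's own close never enters its own feature.

```python
import pandas as pd

# Made-up closing prices for six consecutive days (illustration only)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], name="Close")

# Same construction as S_3 above: shift by one day, then take a 3-day mean
s3 = close.shift(1).rolling(window=3).mean()

print(pd.DataFrame({"Close": close, "S_3": s3}))
```

Without `shift(1)`, `S_3` on day 3 would include that day's close of 13.0, a value that is not yet known when the prediction is made.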

### Define the dependent variable

```python
# The target is the GLD closing price
y = Df['Close']

# Show the first five rows
print(y.head())
```

### Output

```
2008-02-08    91.000000
2008-02-11    91.330002
2008-02-12    89.330002
2008-02-13    89.440002
2008-02-14    89.709999
Name: Close, dtype: float64
```

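### Split the data into training and test sets

• The first 80% of the data is used to train the model and the remaining 20% to test it. The split is chronological (the rows are not shuffled), so the model is always tested on dates after the training period.

• X_train and y_train form the training dataset.

• X_test and y_test form the test dataset.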

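```python
# Number of rows used for training (80% of the data)
t = .8
t = int(t * len(Df))

# Train dataset: the first 80% of rows, in date order
X_train = X.iloc[:t]
y_train = y.iloc[:t]

# Test dataset: the remaining rows
X_test = X.iloc[t:]
y_test = y.iloc[t:]
```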
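The slicing above keeps the rows in date order. As a sketch, the same 80/20 chronological split can be wrapped in a small helper; `chronological_split` is a hypothetical name introduced here, and the toy data is made up. A shuffled random split would instead let the model train on dates that come after some of the test dates.

```python
import numpy as np
import pandas as pd


def chronological_split(X, y, train_frac=0.8):
    """Hypothetical helper: the first train_frac of rows train, the rest test.

    Rows are never shuffled, so every test date comes after every training date.
    """
    t = int(train_frac * len(X))
    return X.iloc[:t], X.iloc[t:], y.iloc[:t], y.iloc[t:]


# Toy data: 10 business days of made-up feature and target values
idx = pd.bdate_range("2020-01-01", periods=10)
X_toy = pd.DataFrame({"S_3": np.arange(10.0), "S_9": np.arange(10.0)}, index=idx)
y_toy = pd.Series(np.arange(10.0), index=idx, name="Close")

X_tr, X_te, y_tr, y_te = chronological_split(X_toy, y_toy)
print(len(X_tr), len(X_te))
print(X_tr.index.max() < X_te.index.min())
```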

### Build the linear regression model

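The model takes the form

$$Y = m_1 X_1 + m_2 X_2 + c$$

where $m_1$ and $m_2$ are the regression coefficients and $c$ is the intercept. With the two moving averages as the explanatory variables, this becomes

$$\text{Gold ETF price} = m_1 \times \text{(3-day moving average)} + m_2 \times \text{(9-day moving average)} + c$$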

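```python
# Fit the linear regression on the training data
linear = LinearRegression().fit(X_train, y_train)

# Print the fitted equation (coefficients and intercept rounded to 2 decimals)
print("Gold ETF Price =", round(linear.coef_[0], 2),
      "* 3 Days Moving Average +", round(linear.coef_[1], 2),
      "* 9 Days Moving Average +", round(linear.intercept_, 2))
```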

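To make the roles of `coef_` and `intercept_` concrete, the sketch below fits `LinearRegression` to synthetic, noise-free data generated from known values $m_1 = 1.2$, $m_2 = -0.2$, $c = 0.5$ (arbitrary choices for illustration, not estimates for GLD). Because the target is an exact linear function of the two features, the fit recovers those values up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two moving averages (made-up data, not GLD prices)
X_syn = rng.uniform(100, 130, size=(200, 2))

# Known parameters chosen for illustration
m1, m2, c = 1.2, -0.2, 0.5
y_syn = m1 * X_syn[:, 0] + m2 * X_syn[:, 1] + c  # noise-free target

model = LinearRegression().fit(X_syn, y_syn)
print(np.round(model.coef_, 3), round(model.intercept_, 3))
```

On real prices the target is not an exact linear function of the features, so the fitted values are least-squares estimates rather than exact recoveries.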
### Predict the gold ETF price

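```python
# Predict prices for the test period
predicted_price = linear.predict(X_test)

# Wrap the predictions in a DataFrame indexed by date so they line up with y_test
predicted_price = pd.DataFrame(predicted_price, index=y_test.index, columns=['price'])

# Plot predicted vs. actual prices
predicted_price.plot(figsize=(10, 5))
y_test.plot()
plt.legend(['predicted_price', 'actual_price'])
plt.ylabel("Gold ETF Price")
plt.show()
```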
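`linear.predict` evaluates the fitted equation on each row of `X_test`. The sketch below, on a small made-up dataset, checks that `predict` matches computing $m_1 X_1 + m_2 X_2 + c$ by hand from `coef_` and `intercept_`. The four training rows were chosen so that they lie exactly on the plane with $m_1 = 1.75$, $m_2 = 0.75$, $c = -0.25$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: two feature columns and one target
X_tr = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_tr = np.array([3.0, 4.0, 8.0, 9.0])

model = LinearRegression().fit(X_tr, y_tr)

# New rows to predict (stand-ins for X_test)
X_new = np.array([[5.0, 5.0], [6.0, 2.0]])

# predict() is the fitted linear formula m1*X1 + m2*X2 + c applied row by row
manual = X_new @ model.coef_ + model.intercept_
print(model.predict(X_new), manual)
```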

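### Evaluate the model

`LinearRegression.score` returns the coefficient of determination R² of the model's predictions on the given data; multiplying it by 100 expresses it as a percentage.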

```python
# R² of the model on the test set, as a percentage
r2 = linear.score(X_test, y_test) * 100
print(round(r2, 2))
```
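The R² reported by `score` is $R^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared prediction errors and $SS_{tot}$ is the sum of squared deviations of the actual values from their mean. The sketch below computes it by hand on synthetic noisy data (not GLD prices) and checks that it agrees with both `LinearRegression.score` and `sklearn.metrics.r2_score`; the first-80%/last-20% split mirrors the article's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up data with some noise, so that R^2 is below 1
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(100, 2))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(0, 1.0, size=100)

# Train on the first 80 rows, evaluate on the last 20
model = LinearRegression().fit(X_demo[:80], y_demo[:80])
y_true, y_pred = y_demo[80:], model.predict(X_demo[80:])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(round(r2_manual * 100, 2))  # as a percentage, like the article
```

An R² close to 100 means the predictions track most of the variation in the test prices; it does not by itself say whether the model would be profitable to trade on.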