Moving Average

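A moving average is a common technique in time-series processing. Put simply, for a given sequence you choose a window size N and take the mean of items 1 through N, then items 2 through N+1, then items 3 through N+2, and so on.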
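
The definition above can be sketched directly in plain Python. This is a minimal illustration, not the code used later in this post; the function name window_means and the sample list are made up for the example.

```python
def window_means(values, n):
    """Return the mean of every full window of n consecutive items."""
    if n <= 0 or n > len(values):
        raise ValueError("n must be between 1 and len(values)")
    # Window k covers items k .. k+n-1 (0-based), for every k where it fits.
    return [sum(values[k:k + n]) / n for k in range(len(values) - n + 1)]


series = [1, 2, 3, 4, 5, 6]
print(window_means(series, 3))  # [2.0, 3.0, 4.0, 5.0]
```

A sequence of length L yields L - N + 1 full windows, which is why the result is shorter than the input.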

The data comes from 铁路客运量.csv (railway passenger volume, monthly data for 2005-2016).

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import pylab
from pylab import rcParams

pylab.style.use('bmh')
rcParams['figure.figsize'] = 10, 8

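def moving_average(l, N):
    # Trailing moving average: the first N-1 outputs average only the items seen so far
    total = 0
    result = [0] * len(l)

    for i in range(0, min(N, len(l))):
        # Build up the running sum from left to right over the first N items
        total = total + l[i]
        result[i] = total / (i + 1)

    for i in range(N, len(l)):
        # Slide the window: add the new rightmost item and drop the leftmost one
        total = total - l[i - N] + l[i]
        result[i] = total / N

    return result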

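# A more efficient version using numpy
# http://stackoverflow.com/questions/13728392/moving-average-or-running-mean
# Note: the default mode is 'full', so the last N-1 values overlap the data only partially
def fast_moving_average(x, N):
    return np.convolve(x, np.ones((N,)) / N)[(N - 1):]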

url = '铁路客运量.csv'

df = pd.read_csv(url)

# Column: railway passenger volume, current-period value (units of 10,000 passengers)
data = np.array(df['铁路客运量_当期值(万人)'])

# Compute one moving-average series per window size
dic = {}
for i in [3, 5, 10, 20]:
    ma_data = moving_average(data, i)
    dic[i] = ma_data
ma_data_df = pd.DataFrame(dic)

ma_data_df.plot()

As the window grows, the trend becomes progressively smoother, that is, less sensitive to local fluctuations.

[Figure: moving averages of the passenger-volume series for window sizes 3, 5, 10, and 20]
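
To see how a running-sum loop relates to the numpy.convolve approach, the sketch below compares the two on a small synthetic array. The function names running_mean_loop and running_mean_valid and the test data are assumptions made for this illustration; they mirror, but are not, the functions above.

```python
import numpy as np


def running_mean_loop(values, n):
    """Trailing mean; the first n-1 outputs average only the items seen so far."""
    total = 0.0
    out = []
    for i, v in enumerate(values):
        total += v
        if i >= n:
            total -= values[i - n]  # drop the item that left the window
        out.append(total / min(i + 1, n))
    return out


def running_mean_valid(values, n):
    """Mean of every full window, via convolution with a box kernel."""
    return np.convolve(values, np.ones(n) / n, mode="valid")


x = np.arange(1.0, 11.0)  # 1.0, 2.0, ..., 10.0
n = 4
loop = np.array(running_mean_loop(x, n))
conv = running_mean_valid(x, n)

# Once the window is full (index n-1 onward), both approaches agree.
print(np.allclose(loop[n - 1:], conv))  # True
```

The only difference is at the start of the series: the loop reports averages over a partially filled window, while mode="valid" simply omits those positions.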

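numpy.convolve is a more convenient way to compute this. Note that it has three modes: 'full' (outputs a value wherever the two sequences overlap, even by a single sample), 'same' (forces the output to the same length as the longer input), and 'valid' (outputs only where the two sequences overlap completely).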

def fast_moving_average(x, N, mode):
    # Box-kernel convolution; mode controls how the edges are handled
    return np.convolve(x, np.ones((N,)) / N, mode=mode)

modes = ['full', 'same', 'valid']
i = 10
for mode in modes:
    ma_data = fast_moving_average(data, i, mode)
    pylab.plot(ma_data)
pylab.legend(modes)

[Figure: window-10 moving average computed with the 'full', 'same', and 'valid' modes]
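
The practical difference between the three modes is easiest to see in the output lengths. The following is a small sketch on made-up data (a 7-item array and a window of 3), not on the railway series.

```python
import numpy as np

x = np.arange(1.0, 8.0)  # 7 samples: 1.0, 2.0, ..., 7.0
n = 3
kernel = np.ones(n) / n

# Run the same box-kernel convolution under each mode.
results = {mode: np.convolve(x, kernel, mode=mode)
           for mode in ("full", "same", "valid")}

for mode, out in results.items():
    print(mode, len(out))
# full 9   -> len(x) + n - 1
# same 7   -> len(x)
# valid 5  -> len(x) - n + 1
```

Only 'valid' contains true N-item averages throughout; in 'full' and 'same' the values near the edges are computed against implicit zero padding and therefore taper toward zero.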

Adapted from 斗大熊's blog post "MovingAverage-滑动平均" – WTF Daily Blog