利用Pandas读取远程mysql数据库

利用pymysql与Pandas读取远程数据库上的表格,并存储为本地csv文件。

因为业务需要读取服务器上的数据进行分析,能连接到sql服务器,但是当我企图用mysqldump备份(复制)database到本地却经常出现mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table错误,这个需要改net_write_timeout来增加等待时长来解决,苦于我的账号并没有权限,难不成只好之间上sql用odbs大法直接做数据分析?我的天,那得多麻烦,经过几番思考,我的解决方案是:用sql的python接口包pymysql进行sql登录以及操作,再用Pandas以DataFrame格式接受table数据,最后存为csv文件。

登录模块

注意,若数据库含有中文,需要use_unicode=True避免乱码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# -*_coding:utf8-*-
# Created by frank at 07/04/2017

import pymysql

host = '#########'
port = ####
user = '########'
password = '#######'

def get_database():
"""

:return: return the pre-connect database
"""

connection = pymysql.connect(host=host,
user=user,
password=password,
port=port,
charset='utf8mb4',
use_unicode=True,)
return connection

处理模块

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
# -*_coding:utf8-*-
# Created by frank at 07/04/2017
# 连接数据库
from connection import get_database
import pandas as pd

# 执行sql命令,打印返回结果
def execute(sql_command):
try:
with connection.cursor() as cursor:
cursor.execute(sql_command)
result = cursor.fetchall()
print("COMMAND: ", sql_command)
print("RESULT: ", result)
print()
dic[sql_command] = result
except:
print("Command %s failed!" % sql_command)
print()

# 建立连接
connection = get_database()
connection.commit()

# 存放命令与对应结果,便于之后调试
dic = {}

# SQL 查询语句
sql_commands = [
# "SHOW DATABASES;",
"use wind_data;",
# "show tables;",
# "SHOW COLUMNS FROM absdescription;",
# "SHOW COLUMNS FROM absdescription2;",
# "SHOW COLUMNS FROM ashareagency;"
# "select * from absdescription;",
# "select count(*) from absdescription;"
# "SHOW COLUMNS FROM xcashflow;",
]

for sql_command in sql_commands:
execute(sql_command)

# 将所有table名字存于list
tables = []
l = list(dic["show tables;"])
for i in l:
tables.append(i[0])

# 取得所有table,分别存放于csv文件
for table in tables:
# table = 'absdescription'
sql_cmd = "select * from %s" % table

df = pd.read_sql(sql=sql_cmd, con=connection)

df.to_csv("data/%s.csv" % table)

# 断开连接
connection.close()

参考自