A Brief Guide to Using Python Libraries

Based on Python 3. I only record the parts I use most often and list links for further reference.

Python Standard Library

Awesome Python

Introduction

The diagram below shows commonly used Python libraries.

PythonLib.png

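How to Import Modules

Installation (Python packaging tools include easy_install, pip, setuptools, and distribute):

# Using pip as an example, run these in a terminal (e.g. cmd on Windows)
pip install package_name      # install
pip install -U package_name   # upgrade
pip uninstall package_name    # uninstall

(If you use the Anaconda distribution, most common libraries come preinstalled.)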

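Import syntax: (More)

# Put these in a .py file; roughly analogous to #include <...> in C
import sys                          # import one library
import sys as ss                    # import it and give it an alias
import matplotlib.pyplot            # import a submodule
import os, sys, time                # import several libraries at once
from os import path, walk, unlink   # from ... import specific names
from os import *                    # import every public name from the library

Importing a library costs both time and memory, so import only what you need.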
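To make the cost of an import concrete, here is a minimal sketch that times the same import twice. The helper name `timed_import` is hypothetical, and `json` is used only as a convenient standard-library module. The second call is normally much cheaper because Python caches loaded modules in `sys.modules`.

```python
import importlib
import sys
import time


def timed_import(module_name):
    """Import a module by name and return (module, seconds spent importing)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    return module, time.perf_counter() - start


first_module, first_seconds = timed_import("json")
second_module, second_seconds = timed_import("json")  # served from the sys.modules cache

print(f"first import:  {first_seconds:.6f} s")
print(f"second import: {second_seconds:.6f} s")
```

One common way to limit the cost is to put a rarely needed import inside the function that uses it, so it is only paid for when that function runs.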

Standard Library

sys

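import sys

References:

A detailed guide to Python's sys module

  • sys.argv: the list of command-line arguments, used to pass parameters into a program from outside.
  • sys.exit([arg]): exits the program at any point; arg=0 means a normal exit.
  • sys.executable: path of the running interpreter (for example C:/Anaconda/python.exe).
  • sys.getdefaultencoding(): returns the default string encoding. In Python 3 this is 'utf-8'; it was 'ascii' in Python 2.
  • sys.setdefaultencoding(): sets the default encoding. This is Python 2 only and does not exist in Python 3.
    • In Python 2 it does not show up in dir(sys), and calling it directly in the interpreter fails.
    • To set UTF-8 in Python 2, call reload(sys) first and then sys.setdefaultencoding('utf8'). (See "Setting the system default encoding".) Python 3 needs none of this, because the default is already UTF-8.
  • sys.getfilesystemencoding(): returns the encoding used for file names. On macOS it returns 'utf-8'; on Windows it returned 'mbcs' before Python 3.6 and returns 'utf-8' since then.
  • sys.path: the list of directories searched for modules. Put your own module in one of them and import will find it.
  • sys.platform: identifies the current platform.
  • sys.stdin, sys.stdout, sys.stderr: the stream objects for standard input, output, and error. Use them when print does not give you enough control over output. You can also replace them to redirect input and output to another device, or to process them in a non-standard way.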
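The attributes above can be tried out in a few lines. This is a small sketch for Python 3; the variable names `buffer` and `captured_text` are only illustrative. It captures output with `contextlib.redirect_stdout`, which temporarily replaces `sys.stdout`.

```python
import io
import sys
from contextlib import redirect_stdout

print("script name:", sys.argv[0])            # argv[0] is the script itself
print("default encoding:", sys.getdefaultencoding())
print("platform:", sys.platform)
print("first search path entry:", sys.path[0])

# Temporarily replace sys.stdout so that print() writes into a string buffer.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("captured")
captured_text = buffer.getvalue()

# Writing to sys.stderr directly keeps diagnostics separate from normal output.
sys.stderr.write("captured %d characters\n" % len(captured_text))
```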

time

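import time as ti

References:

http://www.runoob.com/python/python-date-time.html

Time handling spans several modules. Commonly used functions of the time module:

time.time(): the current timestamp, in seconds since midnight (UTC) on January 1, 1970.

time.localtime(time.time()): converts a timestamp to local time (a struct_time).

time.asctime(time.localtime(time.time())): formats a struct_time as a readable string.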
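As a quick illustration of the three calls above, the sketch below chains them and adds `time.strftime` for a custom format. The format string and variable names are my own choices, not something from the referenced tutorial.

```python
import time as ti

now = ti.time()                 # float: seconds since the epoch
local = ti.localtime(now)       # struct_time in the local time zone
print(ti.asctime(local))        # fixed, human-readable format

# strftime gives control over the layout of the formatted string.
stamp = ti.strftime("%Y-%m-%d %H:%M:%S", local)
print(stamp)

# The epoch itself, expressed in UTC.
epoch = ti.gmtime(0)
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday)
```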

os

import os

References:

http://www.runoob.com/python/os-file-methods.html

The os module provides a rich set of functions for working with files and directories. See the reference above for details.
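A few of the most frequently used os functions, sketched inside a temporary directory so that nothing on disk is left behind. The directory and file names (`data`, `notes.txt`) are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "data")           # build paths portably
    os.makedirs(sub)                           # create nested directories
    file_path = os.path.join(sub, "notes.txt")
    with open(file_path, "w") as fh:
        fh.write("hello")

    # os.walk yields (directory, subdirectories, files) for the whole tree.
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))

    size = os.path.getsize(file_path)          # size in bytes
    os.unlink(file_path)                       # delete the file
    still_there = os.path.exists(file_path)

print(found, size, still_there)
```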

Numpy

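import numpy as np

References:

NumPy Chinese documentation. (More)

How to systematically learn matplotlib, numpy, scipy, and pandas in Python?

Python notes in Chinese

NumPy tutorial

Importing Python code into Jupyter Notebook

Converting between code and Markdown cells in Jupyter (with common shortcuts)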

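Where NumPy Is Used

  • Machine learning models
    • Matrix computation, storing training data, tuning model parameters.
  • Image processing and computer graphics
    • Fast image manipulation (mirroring, rotating by a given angle, and so on).
  • Mathematical tasks
    • Numerical integration, differentiation, interpolation, extrapolation, and so on.

Creating Tensors

NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is the rank.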

import numpy as np  # import NumPy

print("Defining a NumPy array....................")
# Define a NumPy array
my_array = np.array([1, 2, 3, 4, 5])  # the argument is a sequence, here a list of numbers

print("Printing....................")
# Printing
print(my_array)        # print the array: [1 2 3 4 5]
print(my_array.shape)  # print the shape: (5,)
print(my_array[0])     # print one element: 1
print(my_array[1])     # 2

print("Modifying an element....................")
# Modify an element
my_array[0] = -1
print(my_array)  # [-1  2  3  4  5]

print("Creating vectors quickly....................")
# Create vectors quickly
my_new_array = np.zeros((5))  # np.ones works the same way
np.ones((2, 3, 4), dtype=np.int16)  # all-ones array with an explicit dtype
np.arange(10, 30, 5)  # evenly spaced values in [10, 30) with step 5
b = np.arange(6)  # [0 1 2 3 4 5]
x = np.linspace(0, 2, 9)  # 9 evenly spaced numbers in [0, 2], both ends included
f = np.sin(x)  # handy for preparing data to plot with pyplot
my_random_array = np.random.random((2, 3))  # random array

print("2-D arrays....................")
# 2-D arrays
my_2d_array = np.zeros((2, 3))
my_array = np.array([[4, 5], [6, 1]])  # set the elements yourself
print(my_array)        # [[4 5] [6 1]]
print(my_array[0][1])  # 5 (row 0, column 1)
my_array_column_2 = my_array[:, 1]  # extract a sub-array (the column with index 1)

print("Tensors....................")
# Tensors
my_array = np.zeros(2)  # rank-1 tensor: a vector
print(my_array)
my_array = np.zeros((2, 3))  # rank-2 tensor: a matrix
print(my_array)
my_array = np.zeros((2, 3, 4))  # rank-3 tensor
print(my_array)
my_array = np.zeros((2, 3, 4, 5))  # rank-4 tensor
print(my_array)
# ... and so on up to rank n

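NumPy's array class is called ndarray, also known by the alias array. Note that numpy.array is not the same as the standard library class array.array, which only handles one-dimensional arrays and offers less functionality. The most important attributes of an ndarray object are:

  • ndarray.ndim: the number of axes (dimensions) of the array. In the Python world, the number of dimensions is called the rank.
  • ndarray.shape: the dimensions of the array, as a tuple of integers giving the size along each axis. For a matrix with n rows and m columns, shape is (n, m). The length of the shape tuple is therefore the rank, that is, ndim.
  • ndarray.size: the total number of elements, equal to the product of the elements of shape.
  • ndarray.dtype: an object describing the type of the elements. You can create or specify a dtype with standard Python types; NumPy also provides its own, such as numpy.int32, numpy.int16, and numpy.float64.
  • ndarray.itemsize: the size in bytes of each element. For example, an array of float64 has itemsize 8 (=64/8), and an array of complex64 also has itemsize 8 (=64/8). It equals ndarray.dtype.itemsize.
  • ndarray.data: the buffer holding the actual elements. You normally do not need this attribute, because elements are accessed through indexing.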
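The attributes listed above can be inspected directly. A minimal sketch, with array names (`grid`, `complex_arr`) chosen only for illustration:

```python
import numpy as np

grid = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

print(grid.ndim)      # number of axes
print(grid.shape)     # size along each axis
print(grid.size)      # total number of elements
print(grid.dtype)     # element type
print(grid.itemsize)  # bytes per element
print(grid.nbytes)    # size * itemsize

# A complex64 element stores two 32-bit floats, so it takes 8 bytes.
complex_arr = np.zeros(3, dtype=np.complex64)
print(complex_arr.itemsize)
```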

Matrix Operations

import numpy as np  # import NumPy

# Matrix arithmetic
a = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2x2 matrix
b = np.array([[5.0, 6.0], [7.0, 8.0]])  # matrix of the same shape
total = a + b        # addition (named total to avoid shadowing the built-in sum)
a += 1               # in-place addition
difference = a - b   # subtraction
product = a * b      # element-wise multiplication
a *= 2               # in-place multiplication
quotient = a / b     # element-wise division
matrix_product = a.dot(b)  # matrix multiplication
# or: matrix_product = np.dot(a, b)

# Reshaping
v = np.transpose(np.array([[2, 1, 3]]))  # transpose: 1x3 row becomes 3x1 column
b = np.arange(12).reshape(4, 3)  # reshape returns the reshaped array
c = np.arange(24).reshape(2, 3, 4)
b.resize(2, 6)  # resize modifies b itself
# Broadcasting (automatic shape matching)
a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b  # broadcasting: b is treated as [2.0, 2.0, 2.0]

# Universal functions
# NumPy provides common math functions such as sin, cos, and exp.
np.exp(np.arange(3))
a = np.ones((3, 4))
b = np.ones((3, 4))
np.add(a, b)
b = np.arange(12).reshape(3, 4)
b.sum(axis=0)     # reduce along axis 0 (down the rows): one sum per column
b.min(axis=1)     # min of each row; the result is a 1-D array
b.cumsum(axis=1)  # cumulative sum along each row
data = 10 * np.random.random((3, 4))
a = np.around(data)  # round to the nearest integer (exact halves go to the even neighbor)
a = np.floor(data)   # round down
a = np.ceil(data)    # round up
a = np.where(data > 0.5, data, 0)  # conditional filter: keep values above 0.5, else 0

# Solve a system of linear equations
A = np.array([[2, 1, -2], [3, 0, 1], [1, 1, -1]])
b = np.transpose(np.array([[-3, 5, -2]]))
x = np.linalg.solve(A, b)
# Linear regression via the normal equations; solve() avoids computing an explicit inverse
X = np.random.random((3, 2))  # 3 samples, 2 features (with more features than samples XtX would be singular)
y = np.transpose(np.array([[3, 2, 5]]))
Xt = np.transpose(X)
XtX = np.dot(Xt, X)
Xty = np.dot(Xt, y)
beta = np.linalg.solve(XtX, Xty)

# Indexing, slicing, and iterating
a = np.arange(10) ** 3  # ** is the power operator (like ^ in MATLAB; in Python ^ is XOR)
a[2:5]  # the numbers are indices and the range is a slice: array([ 8, 27, 64])
a[:6:2] = -1000  # assign to every 2nd element in [0, 6); same as a[0:6:2]
# Note: with colon syntax the range is always half-open, [a, b)
a[::-1]  # reversed a
for element in a.flat:
    print(element)  # flat is an iterator over all elements of the array
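The normal-equation approach used for linear regression above can be checked against NumPy's built-in least-squares solver. The sketch below is an illustration under assumptions of my own: synthetic noise-free data, a fixed seed, and 20 samples with 3 features so that the system is well-posed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 3))                  # 20 samples, 3 features
true_beta = np.array([1.5, -2.0, 0.5])   # coefficients used to generate y
y = X @ true_beta                        # noise-free targets

# Normal equations: solve (X^T X) beta = X^T y without forming an inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference answer from the least-squares routine.
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
```

For badly conditioned problems `np.linalg.lstsq` is usually the safer choice, because forming X^T X squares the condition number.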

Matplotlib

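import numpy as np
import matplotlib.pyplot as plt
# -------------------------------------------
# Roughly equivalent to the two lines above: imports everything from matplotlib (numpy is available as np).
# Convenient for interactive use, but it floods the namespace; explicit imports are preferable in scripts.
from pylab import *

References:

https://matplotlib.org/

Pyplot tutorial

Plotting with Matplotlib

Matplotlib tutorial

pylab is a procedural interface to matplotlib's object-oriented plotting library. Its syntax is very close to MATLAB's: the main plotting commands take arguments similar to their MATLAB counterparts. (MATLAB and Octave syntax are almost identical, so you can refer here.)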

Plotting functions plot()

import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
C,S = np.cos(X), np.sin(X)

plt.plot(X,C)
plt.plot(X,S)

plt.show()
# Import everything from matplotlib (numpy is available as np)
from pylab import *

# Create an 8 x 6 inch figure at a resolution of 80 dots per inch
figure(figsize=(8,6), dpi=80)

# Create a 1 x 1 grid of subplots and draw into its first (and only) cell
subplot(1,1,1)

X = np.linspace(-np.pi, np.pi, 256,endpoint=True)
C,S = np.cos(X), np.sin(X)

# Plot the cosine curve as a blue, solid line of width 1 (in points), labeled "cosine"
plot(X, C, color="blue", linewidth=1.0, linestyle="-", label="cosine")
# Plot the sine curve as a green, solid line of width 1 (in points), labeled "sine"
plot(X, S, color="green", linewidth=1.0, linestyle="-", label="sine")
# Legend position
legend(loc='upper left')

# Set the limits of the x and y axes
xlim(-4.0,4.0)
ylim(-1.0,1.0)

# Set the tick positions on the x and y axes
xticks(np.linspace(-4,4,9,endpoint=True))
yticks(np.linspace(-1,1,5,endpoint=True))

# Save the figure at 72 dots per inch
# savefig("exercice_2.png",dpi=72)

# Show the figure on screen
show()

There are also features for annotating particular points, adjusting axis ticks, and more. See the Matplotlib tutorial in the references.
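For comparison with the pylab style, the same cosine and sine curves can be drawn through matplotlib's object-oriented interface. This is a sketch only; the `Agg` backend is selected here as an assumption so that the code runs without a display, and `line_count` is just an illustrative variable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 256)

# plt.subplots returns the figure and its axes as explicit objects.
fig, ax = plt.subplots(figsize=(8, 6), dpi=80)
ax.plot(x, np.cos(x), color="blue", linewidth=1.0, label="cosine")
ax.plot(x, np.sin(x), color="green", linewidth=1.0, label="sine")
ax.set_xlim(-4.0, 4.0)
ax.set_ylim(-1.0, 1.0)
ax.legend(loc="upper left")

line_count = len(ax.get_lines())
# fig.savefig("curves.png", dpi=72)  # uncomment to write the figure to disk
```

Holding `fig` and `ax` explicitly avoids relying on pyplot's hidden "current figure" state, which tends to matter once a script draws more than one figure.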

Filled plot fill_between()

import numpy as np
import matplotlib.pyplot as plt

n = 256
X = np.linspace(-np.pi,np.pi,n,endpoint=True)
Y = np.sin(2*X)

plt.axes([0.025,0.025,0.95,0.95])  # set the position and size of the axes box

plt.plot(X, Y+1, color='blue', alpha=1.00)
plt.fill_between(X, 1, Y+1, color='blue', alpha=.25)  # fill between the curve and y = 1

plt.plot(X, Y-1, color='blue', alpha=1.00)
plt.fill_between(X, -1, Y-1, where=(Y-1) > -1, color='blue', alpha=.25)
plt.fill_between(X, -1, Y-1, where=(Y-1) < -1, color='red', alpha=.25)

plt.xlim(-np.pi,np.pi), plt.xticks([])
plt.ylim(-2.5,2.5), plt.yticks([])
# plt.savefig('../figures/plot_ex.png',dpi=48)
plt.show()

Bar chart bar()

import numpy as np
import matplotlib.pyplot as plt

n = 12
X = np.arange(n)
Y1 = (1-X/float(n)) * np.random.uniform(0.5,1.0,n)
Y2 = (1-X/float(n)) * np.random.uniform(0.5,1.0,n)

plt.axes([0.025,0.025,0.95,0.95])
plt.bar(X, +Y1, facecolor='#9999ff', edgecolor='white')  # bar chart
plt.bar(X, -Y2, facecolor='#ff9999', edgecolor='white')

# Add value labels (bars are centered on x by default since matplotlib 2.0)
for x,y in zip(X,Y1):
    plt.text(x, y+0.05, '%.2f' % y, ha='center', va= 'bottom')

for x,y in zip(X,Y2):
    plt.text(x, -y-0.05, '%.2f' % y, ha='center', va= 'top')

plt.xlim(-.5,n), plt.xticks([])
plt.ylim(-1.25,+1.25), plt.yticks([])

# plt.savefig('../figures/bar_ex.png', dpi=48)
plt.show()

Pie chart pie()

import numpy as np
import matplotlib.pyplot as plt

n = 20
Z = np.ones(n)
Z[-1] *= 2

plt.axes([0.025, 0.025, 0.95, 0.95])

plt.pie(Z, explode=Z*.05, colors=['%f' % (i/float(n)) for i in range(n)],
        wedgeprops={"linewidth": 1, "edgecolor": "black"})
plt.gca().set_aspect('equal')
plt.xticks([]), plt.yticks([])

# plt.savefig('../figures/pie_ex.png',dpi=48)
plt.show()

Scatter plot scatter()

import numpy as np
import matplotlib.pyplot as plt

n = 1024
X = np.random.normal(0,1,n)
Y = np.random.normal(0,1,n)
T = np.arctan2(Y,X)

plt.axes([0.025,0.025,0.95,0.95])
plt.scatter(X,Y, s=75, c=T, alpha=.5)

plt.xlim(-1.5,1.5), plt.xticks([])
plt.ylim(-1.5,1.5), plt.yticks([])
# plt.savefig('../figures/scatter_ex.png',dpi=48)
plt.show()

Grayscale image imshow()

import numpy as np
import matplotlib.pyplot as plt

def f(x,y):
    return (1-x/2+x**5+y**3)*np.exp(-x**2-y**2)

n = 10
x = np.linspace(-3,3,int(3.5*n))  # the number of samples must be an integer
y = np.linspace(-3,3,int(3.0*n))
X,Y = np.meshgrid(x,y)
Z = f(X,Y)

plt.axes([0.025,0.025,0.95,0.95])
plt.imshow(Z,interpolation='bicubic', cmap='bone', origin='lower')
plt.colorbar(shrink=.92)

plt.xticks([]), plt.yticks([])
# plt.savefig('../figures/imshow_ex.png', dpi=48)
plt.show()

3D plot*

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib versions

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself to the figure in recent matplotlib
X = np.arange(-4, 4, 0.25)
Y = np.arange(-4, 4, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=plt.cm.hot)
ax.contourf(X, Y, Z, zdir='z', offset=-2, cmap=plt.cm.hot)
ax.set_zlim(-2,2)

# plt.savefig('../figures/plot3d_ex.png',dpi=48)
plt.show()

Contour plot contourf()

import numpy as np
import matplotlib.pyplot as plt

def f(x,y):
    return (1-x/2+x**5+y**3)*np.exp(-x**2-y**2)

n = 256
x = np.linspace(-3,3,n)
y = np.linspace(-3,3,n)
X,Y = np.meshgrid(x,y)

plt.axes([0.025,0.025,0.95,0.95])

plt.contourf(X, Y, f(X,Y), 8, alpha=.75, cmap=plt.cm.hot)
C = plt.contour(X, Y, f(X,Y), 8, colors='black', linewidths=.5)  # the keyword is linewidths (plural)
plt.clabel(C, inline=1, fontsize=10)

plt.xticks([]), plt.yticks([])
# plt.savefig('../figures/contour_ex.png',dpi=48)
plt.show()

Vector field quiver()

import numpy as np
import matplotlib.pyplot as plt

n = 8
X,Y = np.mgrid[0:n,0:n]
T = np.arctan2(Y-n/2.0, X-n/2.0)
R = 10+np.sqrt((Y-n/2.0)**2+(X-n/2.0)**2)
U,V = R*np.cos(T), R*np.sin(T)

plt.axes([0.025,0.025,0.95,0.95])
plt.quiver(X,Y,U,V,R, alpha=.5)
plt.quiver(X,Y,U,V, edgecolor='k', facecolor='None', linewidth=.5)

plt.xlim(-1,n), plt.xticks([])
plt.ylim(-1,n), plt.yticks([])

# plt.savefig('../figures/quiver_ex.png',dpi=48)
plt.show()

Grid grid()

import numpy as np
import matplotlib.pyplot as plt

ax = plt.axes([0.025,0.025,0.95,0.95])

ax.set_xlim(0,4)
ax.set_ylim(0,3)
ax.xaxis.set_major_locator(plt.MultipleLocator(1.0))
ax.xaxis.set_minor_locator(plt.MultipleLocator(0.1))
ax.yaxis.set_major_locator(plt.MultipleLocator(1.0))
ax.yaxis.set_minor_locator(plt.MultipleLocator(0.1))
ax.grid(which='major', axis='x', linewidth=0.75, linestyle='-', color='0.75')
ax.grid(which='minor', axis='x', linewidth=0.25, linestyle='-', color='0.75')
ax.grid(which='major', axis='y', linewidth=0.75, linestyle='-', color='0.75')
ax.grid(which='minor', axis='y', linewidth=0.25, linestyle='-', color='0.75')
ax.set_xticklabels([])
ax.set_yticklabels([])

# plt.savefig('../figures/grid_ex.png',dpi=48)
plt.show()

Multiple subplots

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
fig.subplots_adjust(bottom=0.025, left=0.025, top = 0.975, right=0.975)

plt.subplot(2,1,1)
plt.xticks([]), plt.yticks([])

plt.subplot(2,3,4)
plt.xticks([]), plt.yticks([])

plt.subplot(2,3,5)
plt.xticks([]), plt.yticks([])

plt.subplot(2,3,6)
plt.xticks([]), plt.yticks([])

# plt.savefig('../figures/multiplot_ex.png',dpi=48)
plt.show()

Polar plot

import numpy as np
import matplotlib.pyplot as plt

ax = plt.axes([0.025,0.025,0.95,0.95], polar=True)

N = 20
theta = np.arange(0.0, 2*np.pi, 2*np.pi/N)
radii = 10*np.random.rand(N)
width = np.pi/4*np.random.rand(N)
bars = plt.bar(theta, radii, width=width, bottom=0.0)

for r,bar in zip(radii, bars):
    bar.set_facecolor( plt.cm.jet(r/10.))
    bar.set_alpha(0.5)

ax.set_xticklabels([])
ax.set_yticklabels([])
# plt.savefig('../figures/polar_ex.png',dpi=48)
plt.show()

Text text()

import numpy as np
import matplotlib.pyplot as plt

eqs = []
eqs.append((r"$W^{3\beta}_{\delta_1 \rho_1 \sigma_2} = U^{3\beta}_{\delta_1 \rho_1} + \frac{1}{8 \pi 2} \int^{\alpha_2}_{\alpha_2} d \alpha^\prime_2 \left[\frac{ U^{2\beta}_{\delta_1 \rho_1} - \alpha^\prime_2U^{1\beta}_{\rho_1 \sigma_2} }{U^{0\beta}_{\rho_1 \sigma_2}}\right]$"))
eqs.append((r"$\frac{d\rho}{d t} + \rho \vec{v}\cdot\nabla\vec{v} = -\nabla p + \mu\nabla^2 \vec{v} + \rho \vec{g}$"))
eqs.append((r"$\int_{-\infty}^\infty e^{-x^2}dx=\sqrt{\pi}$"))
eqs.append((r"$E = mc^2 = \sqrt{{m_0}^2c^4 + p^2c^2}$"))
eqs.append((r"$F_G = G\frac{m_1m_2}{r^2}$"))


plt.axes([0.025,0.025,0.95,0.95])

for i in range(24):
    index = np.random.randint(0,len(eqs))
    eq = eqs[index]
    size = np.random.uniform(12,32)
    x,y = np.random.uniform(0,1,2)
    alpha = np.random.uniform(0.25,.75)
    plt.text(x, y, eq, ha='center', va='center', color="#11557c", alpha=alpha,
             transform=plt.gca().transAxes, fontsize=size, clip_on=True)

plt.xticks([]), plt.yticks([])
# plt.savefig('../figures/text_ex.png',dpi=48)
plt.show()

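Scipy

import scipy as sp

References:

Scipy-Lecture-Notes, Chinese edition

[SciPy Chinese documentation] A one-article quick start to SciPy

Scientific computing. It covers statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.

| Module | Task |
| --- | --- |
| scipy.cluster | Vector quantization / K-means |
| scipy.constants | Physical and mathematical constants |
| scipy.fftpack | Fourier transforms (legacy; newer SciPy versions provide scipy.fft) |
| scipy.integrate | Integration routines |
| scipy.interpolate | Interpolation |
| scipy.io | Data input and output |
| scipy.linalg | Linear algebra routines |
| scipy.ndimage | n-dimensional image package |
| scipy.odr | Orthogonal distance regression |
| scipy.optimize | Optimization |
| scipy.signal | Signal processing |
| scipy.sparse | Sparse matrices |
| scipy.spatial | Spatial data structures and algorithms |
| scipy.special | Special mathematical functions |
| scipy.stats | Statistics |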
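As a taste of two of the modules in the table, the sketch below integrates the Gaussian exp(-x^2) over the whole real line with `scipy.integrate.quad` and minimizes a simple parabola with `scipy.optimize.minimize_scalar`. The integrand and the parabola are illustrative choices of mine.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the exact value of this integral is sqrt(pi).
value, abs_error = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(value, np.sqrt(np.pi))

# Scalar minimization: (x - 2)^2 + 1 has its minimum value 1 at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
print(result.x, result.fun)
```

Importing the submodules explicitly (`from scipy import integrate, optimize`) is the safest pattern across SciPy versions.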

(To be continued)

Pandas

import pandas as pd

References:

Pandas Chinese documentation

Pandas Cheat Sheet - Dataquest

Pandas-Cheat (PDF), Pandas cheat sheet, Chinese edition (Zhihu)

Github-Pandas

Pandassheet

Main Features

Here are just a few of the things that pandas does well:

  • Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
  • Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
  • Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
  • Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
  • Make it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
  • Intuitive merging and joining data sets
  • Flexible reshaping and pivoting of data sets
  • Hierarchical labeling of axes (possible to have multiple labels per tick)
  • Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving/loading data from the ultrafast HDF5 format
  • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.

Summary: data preprocessing, data flow and IO management, robust group-by operations, time-series processing, and so on.

pandas-cheat-sheet.png

Importing Data

pd.read_csv(filename) | From a CSV file (e.g. to read in training data)
pd.read_table(filename) | From a delimited text file (like TSV)
pd.read_excel(filename) | From an Excel file
pd.read_sql(query, connection_object) | Read from a SQL table/database
pd.read_json(json_string) | Read from a JSON formatted string, URL or file.
pd.read_html(url) | Parses an html URL, string or file and extracts tables to a list of dataframes
pd.read_clipboard() | Takes the contents of your clipboard and passes it to read_table()
pd.DataFrame(dict) | From a dict, keys for columns names, values for data as lists

Exporting Data

df.to_csv(filename) | Write to a CSV file (e.g. to output results as CSV)
df.to_excel(filename) | Write to an Excel file
df.to_sql(table_name, connection_object) | Write to a SQL table
df.to_json(filename) | Write to a file in JSON format (e.g. to save a model)

Create Test Objects

Useful for testing code segments

pd.DataFrame(np.random.rand(20,5)) | 5 columns and 20 rows of random floats
pd.Series(my_list) | Create a series from an iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]) | Add a date index

Viewing/Inspecting Data

df.head(n) | First n rows of the DataFrame
df.tail(n) | Last n rows of the DataFrame
df.shape | Number of rows and columns
df.info() | Index, Datatype and Memory information
df.describe() | Summary statistics for numerical columns
s.value_counts(dropna=False) | View unique values and counts
df.apply(pd.Series.value_counts) | Unique values and counts for all columns

Selection

df[col] | Returns column with label col as Series
df[[col1, col2]] | Returns columns as a new DataFrame
s.iloc[0] | Selection by position
s.loc['index_one'] | Selection by index
df.iloc[0,:] | First row
df.iloc[0,0] | First element of first column

Data Cleaning

df.columns = ['a','b','c'] | Rename columns
pd.isnull(df) | Checks for null values, returns a Boolean array
pd.notnull(df) | Opposite of pd.isnull()
df.dropna() | Drop all rows that contain null values
df.dropna(axis=1) | Drop all columns that contain null values
df.dropna(axis=1,thresh=n) | Drop all columns that have fewer than n non-null values
df.fillna(x) | Replace all null values with x
s.fillna(s.mean()) | Replace all null values with the mean (mean can be replaced with almost any function from the statistics section)
s.astype(float) | Convert the datatype of the series to float
s.replace(1,'one') | Replace all values equal to 1 with 'one'
s.replace([1,3],['one','three']) | Replace all 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1) | Mass renaming of columns
df.rename(columns={'old_name': 'new_name'}) | Selective renaming
df.set_index('column_one') | Change the index
df.rename(index=lambda x: x + 1) | Mass renaming of index
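A short worked example of several cleaning calls from this list, on a tiny hand-made frame. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

null_mask = pd.isnull(df)                    # Boolean frame marking missing cells
rows_kept = df.dropna()                      # rows without any missing value
cols_kept = df.dropna(axis=1)                # columns without any missing value
cols_thresh = df.dropna(axis=1, thresh=2)    # columns with at least 2 non-null values
filled = df["a"].fillna(df["a"].mean())      # replace missing values by the column mean
renamed = df.rename(columns={"a": "alpha"})  # selective renaming

print(rows_kept)
print(cols_thresh)
print(filled)
```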

Filter, Sort, and Groupby

df[df[col] > 0.5] | Rows where the column col is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] | Rows where 0.7 > col > 0.5
df.sort_values(col1) | Sort values by col1 in ascending order
df.sort_values(col2,ascending=False) | Sort values by col2 in descending order
df.sort_values([col1,col2],ascending=[True,False]) | Sort values by col1 in ascending order then col2 in descending order
df.groupby(col) | Returns a groupby object for values from one column
df.groupby([col1,col2]) | Returns groupby object for values from multiple columns
df.groupby(col1)[col2].mean() | Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section)
df.pivot_table(index=col1,values=[col2,col3],aggfunc='mean') | Create a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg(np.mean) | Find the average across all columns for every unique col1 group
df.apply(np.mean) | Apply the function np.mean() across each column
df.apply(np.max,axis=1) | Apply the function np.max() across each row
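The filtering, sorting, and grouping calls can be combined as in the sketch below. The frame and its column names (`team`, `score`, `games`) are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "y", "x", "y"],
    "score": [0.6, 0.4, 0.8, 0.9],
    "games": [10, 12, 14, 16],
})

high = df[df["score"] > 0.5]                          # Boolean filtering
ordered = df.sort_values("score", ascending=False)    # descending sort
mean_score = df.groupby("team")["score"].mean()       # one mean per team
pivot = df.pivot_table(index="team", values=["score", "games"], aggfunc="mean")
row_max = df[["score", "games"]].max(axis=1)          # maximum across each row

print(mean_score)
print(pivot)
```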

Join/Combine

pd.concat([df1, df2]) | Add the rows in df2 to the end of df1 (columns should be identical); this replaces df1.append(df2), which was removed in pandas 2.0
pd.concat([df1, df2],axis=1) | Add the columns in df2 to the end of df1 (rows should be identical)
df1.join(df2,on=col1,how='inner') | SQL-style join of the columns in df1 with the columns in df2 where the rows for col1 have identical values. how can be one of 'left', 'right', 'outer', 'inner'
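A minimal sketch of combining frames, using two invented frames that share a `key` column. `DataFrame.merge` is shown next to `join` because it matches on columns directly, whereas `join` with `on=` matches a column of the left frame against the index of the right frame.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

stacked = pd.concat([df1, df1], ignore_index=True)   # rows appended below
side_by_side = pd.concat([df1, df2], axis=1)         # columns placed next to each other

inner = df1.merge(df2, on="key", how="inner")        # keys present in both frames
outer = df1.merge(df2, on="key", how="outer")        # keys present in either frame

# join matches df1["key"] against the index of the other frame.
joined = df1.join(df2.set_index("key"), on="key", how="inner")

print(inner)
print(outer)
```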

Statistics

These can all be applied to a series as well.

df.describe() | Summary statistics for numerical columns
df.mean() | Returns the mean of all columns
df.corr() | Returns the correlation between columns in a DataFrame
df.count() | Returns the number of non-null values in each DataFrame column
df.max() | Returns the highest value in each column
df.min() | Returns the lowest value in each column
df.median() | Returns the median of each column
df.std() | Returns the standard deviation of each column

Sklearn (scikit-learn)

# Regression
from sklearn.linear_model import LinearRegression    # linear regression
from sklearn.tree import DecisionTreeRegressor       # decision tree regression
from sklearn.ensemble import RandomForestRegressor   # random forest regression
# Classification
from sklearn.linear_model import LogisticRegression  # logistic regression
from sklearn.svm import SVC, LinearSVC                # support vector machines
from sklearn.ensemble import RandomForestClassifier  # random forest classifier
from sklearn.neighbors import KNeighborsClassifier   # k-nearest neighbors
from sklearn.naive_bayes import GaussianNB            # Gaussian naive Bayes
from sklearn.linear_model import Perceptron           # perceptron
from sklearn.linear_model import SGDClassifier        # stochastic gradient descent classifier
from sklearn.tree import DecisionTreeClassifier       # decision tree classifier
# Feature engineering
import sklearn.preprocessing as preprocessing
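Most of the estimators imported above share the same fit / predict / score pattern. The sketch below shows it with one regressor and one classifier on synthetic data; the data, the seed, and the labeling rule are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((50, 2))

# Regression: targets follow an exact linear rule, so the fit recovers it.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)

# Classification: label is 1 when the first feature exceeds 0.5.
labels = (X[:, 0] > 0.5).astype(int)
X_scaled = StandardScaler().fit_transform(X)   # feature scaling step
clf = LogisticRegression().fit(X_scaled, labels)
accuracy = clf.score(X_scaled, labels)         # accuracy on the training data
print(accuracy)
```

In real use the score would be computed on held-out data rather than on the training set.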

References: Documentation, Chinese documentation, API reference

Source: yhat

sklearnsheet

(To be continued)

TensorFlow

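import tensorflow as tf

See the TensorFlow documentation. (More)

TensorFlow is Google's open-source deep learning framework. As the name suggests, it is built around data flowing through computations in the form of tensors.

(To be continued)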

Keras

from tensorflow import keras

Reference: Keras Chinese documentation

Source: datacamp

Keras_Cheat_Sheet_Python

(To be continued)

Pytorch

Reference: PyTorch Chinese documentation

(To be continued)

OpenCV

References:

OpenCV-Python Chinese documentation

(To be continued)

Scrapy

References:

Scrapy for beginners, part 1

Scrapy tutorial for beginners | Runoob (菜鸟教程)

Scrapy tutorial, Scrapy 0.24.6 documentation

(To be continued)

Pyspider

References:

pyspider: Introduction

Pyspider user guide

(To be continued)