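Hi everyone, I'm 编程小6. I write up the problems and ideas I have run into over years of development. Today's topic is merging data in Python: extracting and combining data with pandas. I hope it helps.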
The pandas merge function works much like a SQL join. Its parameters are:
merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
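Two parameters in this signature, suffixes and indicator, are easier to understand with a small example. The sketch below uses two hypothetical frames (left, right) that share a non-key column named score; these names are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# Hypothetical frames: both carry a non-key column called "score".
left = pd.DataFrame({"key": ["a", "b"], "score": [1, 2]})
right = pd.DataFrame({"key": ["a", "c"], "score": [10, 30]})

merged = pd.merge(
    left,
    right,
    on="key",
    how="outer",
    suffixes=("_left", "_right"),  # rename the clashing "score" columns
    indicator=True,                # add a "_merge" column with each row's origin
)
print(merged)
```

The _merge column holds both, left_only, or right_only for each row, which is a convenient way to check which keys failed to match.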
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'b'], 'value1': range(5)})
df2 = pd.DataFrame({'key': ['a', 'c', 'c', 'c', 'c'], 'value2': range(5)})
# display() is available in Jupyter/IPython; use print() in a plain script
display(df1, df2, pd.merge(df1, df2))
df1
  key  value1
0   a       0
1   b       1
2   a       2
3   b       3
4   b       4
df2
  key  value2
0   a       0
1   c       1
2   c       2
3   c       3
4   c       4
pd.merge(df1, df2)  # joins on the shared column name 'key'; how='inner' is the default, so this is equivalent to pd.merge(df1, df2, on='key', how='inner')
  key  value1  value2
0   a       0       0
1   a       2       0
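When on is omitted, pandas joins on every column name the two frames have in common, not just one. A minimal sketch, assuming two hypothetical frames (sales, costs) that share both key and year:

```python
import pandas as pd

# Hypothetical frames sharing two column names: "key" and "year".
sales = pd.DataFrame({"key": ["a", "a", "b"],
                      "year": [2020, 2021, 2020],
                      "sales": [1, 2, 3]})
costs = pd.DataFrame({"key": ["a", "a", "b"],
                      "year": [2020, 2022, 2020],
                      "cost": [5, 6, 7]})

# No "on" given: the join uses all shared columns, here ["key", "year"].
implicit = pd.merge(sales, costs)
# Spelling the keys out gives the same result and is easier to read.
explicit = pd.merge(sales, costs, on=["key", "year"])

print(implicit)
print(implicit.equals(explicit))
```

Passing on explicitly is usually the safer habit, since a new shared column would otherwise silently change the join keys.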
pd.merge(df1, df2, how='outer')  # full outer join: keeps the union of keys from both frames
  key  value1  value2
0   a     0.0     0.0
1   a     2.0     0.0
2   b     1.0     NaN
3   b     3.0     NaN
4   b     4.0     NaN
5   c     NaN     1.0
6   c     NaN     2.0
7   c     NaN     3.0
8   c     NaN     4.0
pd.merge(df1, df2, how='left')  # left join: keeps every row of the left frame; rows with no match on the right are filled with NaN
  key  value1  value2
0   a       0     0.0
1   b       1     NaN
2   a       2     0.0
3   b       3     NaN
4   b       4     NaN
pd.merge(df1, df2, how='right')  # right join: keeps every row of the right frame; rows with no match on the left are filled with NaN
  key  value1  value2
0   a     0.0       0
1   a     2.0       0
2   c     NaN       1
3   c     NaN       2
4   c     NaN       3
5   c     NaN       4
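The four join types above can be compared side by side by counting the rows each one returns. This is a small sketch that rebuilds df1 and df2 from the example; the row_counts name is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "a", "b", "b"], "value1": range(5)})
df2 = pd.DataFrame({"key": ["a", "c", "c", "c", "c"], "value2": range(5)})

# Run the same merge with each join type and record the number of result rows.
row_counts = {
    how: len(pd.merge(df1, df2, on="key", how=how))
    for how in ("inner", "outer", "left", "right")
}
print(row_counts)  # {'inner': 2, 'outer': 9, 'left': 5, 'right': 6}
```

The counts match the outputs shown above: only key a appears in both frames, so the inner join is the smallest and the outer join the largest.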
If the join key columns have different names in the two DataFrames, specify them with left_on and right_on.
df3 = pd.DataFrame({'lkey': ['a', 'b', 'a', 'b', 'b'], 'data1': range(5)})
df4 = pd.DataFrame({'rkey': ['a', 'c', 'c', 'c', 'c'], 'data2': range(5)})
df3
  lkey  data1
0    a      0
1    b      1
2    a      2
3    b      3
4    b      4
df4
  rkey  data2
0    a      0
1    c      1
2    c      2
3    c      3
4    c      4
pd.merge(df3, df4, left_on='lkey', right_on='rkey')  # inner join (the default, how='inner')
  lkey  data1 rkey  data2
0    a      0    a      0
1    a      2    a      0
pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='outer')  # full outer join
  lkey  data1 rkey  data2
0    a    0.0    a    0.0
1    a    2.0    a    0.0
2    b    1.0  NaN    NaN
3    b    3.0  NaN    NaN
4    b    4.0  NaN    NaN
5  NaN    NaN    c    1.0
6  NaN    NaN    c    2.0
7  NaN    NaN    c    3.0
8  NaN    NaN    c    4.0
pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='left')  # left join
  lkey  data1 rkey  data2
0    a      0    a    0.0
1    b      1  NaN    NaN
2    a      2    a    0.0
3    b      3  NaN    NaN
4    b      4  NaN    NaN
pd.merge(df3, df4, left_on='lkey', right_on='rkey', how='right')  # right join
  lkey  data1 rkey  data2
0    a    0.0    a      0
1    a    2.0    a      0
2  NaN    NaN    c      1
3  NaN    NaN    c      2
4  NaN    NaN    c      3
5  NaN    NaN    c      4
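As the outputs above show, merging with left_on and right_on keeps both key columns. For an inner join the two columns are identical, so one of them is redundant. The sketch below shows two possible ways to end up with a single key column; both are suggestions rather than something the original examples do, and the variable names merged and renamed are only for illustration.

```python
import pandas as pd

df3 = pd.DataFrame({"lkey": ["a", "b", "a", "b", "b"], "data1": range(5)})
df4 = pd.DataFrame({"rkey": ["a", "c", "c", "c", "c"], "data2": range(5)})

# Option 1: merge, then drop the duplicate key column.
merged = pd.merge(df3, df4, left_on="lkey", right_on="rkey").drop(columns="rkey")

# Option 2: rename the right key first so both frames share one key name.
renamed = pd.merge(df3, df4.rename(columns={"rkey": "lkey"}), on="lkey")

print(merged)
print(renamed)
```

Option 1 is only safe for inner joins: in an outer, left, or right join the dropped column may hold keys that the remaining column shows as NaN. Option 2 avoids that because pandas keeps a single combined key column.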
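To join on the row index instead of a column, set left_index=True and right_index=True.
import numpy as np

df5 = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('abc'), columns=['v1', 'v2', 'v3', 'v4'])
df6 = pd.DataFrame(np.arange(12, 24, 1).reshape(3, 4), index=list('abd'), columns=['v5', 'v6', 'v7', 'v8'])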
df5
   v1  v2  v3  v4
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
df6
   v5  v6  v7  v8
a  12  13  14  15
b  16  17  18  19
d  20  21  22  23
pd.merge(df5, df6, left_index=True, right_index=True)  # inner join on the index: only labels present in both frames (a, b) are kept
   v1  v2  v3  v4  v5  v6  v7  v8
a   0   1   2   3  12  13  14  15
b   4   5   6   7  16  17  18  19
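Index-based merges accept the same how options as column-based ones. The sketch below rebuilds df5 and df6 and shows an outer join on the index, plus DataFrame.join, which is a shorter way to write an index-on-index merge. The variable names outer, inner, and joined are only for illustration.

```python
import numpy as np
import pandas as pd

df5 = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=list("abc"), columns=["v1", "v2", "v3", "v4"])
df6 = pd.DataFrame(np.arange(12, 24).reshape(3, 4),
                   index=list("abd"), columns=["v5", "v6", "v7", "v8"])

# Outer join on the index: keeps labels a, b, c, d and fills the gaps with NaN.
outer = pd.merge(df5, df6, left_index=True, right_index=True, how="outer")

# DataFrame.join merges on the index by default (how="left" unless overridden).
inner = pd.merge(df5, df6, left_index=True, right_index=True)
joined = df5.join(df6, how="inner")

print(outer)
print(joined)
```

Here joined holds the same rows and columns as the inner merge shown above, so join is mostly a matter of readability when both keys are indexes.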
That's all for today. Thanks for reading, and if this helped you, feel free to share it with others.