Python Data Analysis: pandas Module Basics

2022-08-01

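Hi~ I am still learning data analysis myself, and these posts record the points I have summarized along the way. There are bound to be gaps, so if you spot a problem, message me and I will fix it; let's learn and improve together. I hope everyone keeps their enthusiasm for learning, sticks with it, and keeps outdoing themselves!
Blog: qxi's blog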

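You may want to review the earlier posts first:
pandas Basics (1)
pandas Basics (2)
pandas Basics (3)
pandas Basics (4)
pandas Basics (5)
pandas Basics (6)
# This post continues from the previous one on combining DataFrames, this time using the merge() function #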

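  1. The merge() function

① The on='key' argument of merge() specifies the column label on which the two DataFrames are joined; key is a column name. We start with a merge on a single key.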

import pandas as pd
import numpy as np
left=pd.DataFrame({'key':['K0','K1','K2','K3'],'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3']})
right=pd.DataFrame({'key':['K0','K1','K2','K3'],'C':['C0','C1','C2','C3'],'D':['D0','D1','D2','D3']})
print(left)
print(right)
res=pd.merge(left,right,on='key') # merge on the 'key' column
print(res)

Output:

  key   A   B
0  K0  A0  B0
1  K1  A1  B1
2  K2  A2  B2
3  K3  A3  B3
  key   C   D
0  K0  C0  D0
1  K1  C1  D1
2  K2  C2  D2
3  K3  C3  D3
  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3
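The same single-key merge can also be written with the DataFrame.merge method. The sketch below reuses the left and right frames from the example above and checks that both spellings give the same result; the names res_func and res_method are only illustrative.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Function form and method form of the same merge on the 'key' column
res_func = pd.merge(left, right, on='key')
res_method = left.merge(right, on='key')

print(res_func.equals(res_method))   # True
print(list(res_func.columns))        # ['key', 'A', 'B', 'C', 'D']
```

The key column appears once in the result, followed by the remaining columns of left and then those of right.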

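② The how='' argument of merge()
how='inner' keeps only the rows whose (key1, key2) combination appears in both DataFrames, similar to a set intersection; it is the default when how is not given. In the example the shared combinations are key1=K0, key2=K0 and key1=K1, key2=K0;
how='outer' keeps every (key1, key2) combination from either DataFrame, similar to a set union, and fills the missing values with NaN; see the example.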

import pandas as pd
import numpy as np
df1=pd.DataFrame({'key1':['K0','K0','K1','K2'],'key2':['K0','K1','K0','K1'],'A':['A0','A1','A2','A3'],'B':['B0','B1','B2','B3']})
df2=pd.DataFrame({'key1':['K0','K1','K1','K2'],'key2':['K0','K0','K0','K0'],'C':['C0','C1','C2','C3'],'D':['D0','D1','D2','D3']})
print(df1)
print(df2)
res1=pd.merge(df1,df2,on=['key1','key2'],how='inner') # inner is the default
print(res1)
res2=pd.merge(df1,df2,on=['key1','key2'],how='outer') # outer keeps every key combination
print(res2)

Output:

  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3
  key1 key2   C   D
0   K0   K0  C0  D0
1   K1   K0  C1  D1
2   K1   K0  C2  D2
3   K2   K0  C3  D3
  key1 key2   A   B   C   D
0   K0   K0  A0  B0  C0  D0
1   K1   K0  A2  B2  C1  D1
2   K1   K0  A2  B2  C2  D2  #only the matching rows are merged
  key1 key2    A    B    C    D
0   K0   K0   A0   B0   C0   D0
1   K0   K1   A1   B1  NaN  NaN
2   K1   K0   A2   B2   C1   D1
3   K1   K0   A2   B2   C2   D2
4   K2   K1   A3   B3  NaN  NaN
5   K2   K0  NaN  NaN   C3   D3  #every row is kept; missing values are filled with NaN
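To make the intersection and union analogy concrete, the sketch below collects the (key1, key2) pairs of each frame as Python sets and compares them with the keys in the merged results. The helper name key_pairs is made up for this illustration and is not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                    'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                    'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

def key_pairs(df):
    """Return the set of distinct (key1, key2) pairs in df (illustrative helper)."""
    return set(zip(df['key1'], df['key2']))

inner = pd.merge(df1, df2, on=['key1', 'key2'], how='inner')
outer = pd.merge(df1, df2, on=['key1', 'key2'], how='outer')

# inner keeps the intersection of the key pairs, outer keeps the union
print(key_pairs(inner) == key_pairs(df1) & key_pairs(df2))   # True
print(key_pairs(outer) == key_pairs(df1) | key_pairs(df2))   # True
```

The comparison is on sets of keys, so it does not depend on the row order of the merged frames, which can differ between pandas versions.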

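③ how='left' merges on the basis of the left frame df1: every row of df1 appears in the result, and df2 contributes only the rows whose keys match. Here row 3 of df2 does not appear, because its combination key1=K2, key2=K0 does not exist in df1. how='right' does the same on the basis of the right frame df2.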

print(df1)
print(df2)
res1=pd.merge(df1,df2,on=['key1','key2'],how='left')
print(res1)
res2=pd.merge(df1,df2,on=['key1','key2'],how='right')
print(res2)

Output:

  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3  #df1
  key1 key2   C   D
0   K0   K0  C0  D0
1   K1   K0  C1  D1
2   K1   K0  C2  D2
3   K2   K0  C3  D3  #df2
  key1 key2   A   B    C    D
0   K0   K0  A0  B0   C0   D0
1   K0   K1  A1  B1  NaN  NaN
2   K1   K0  A2  B2   C1   D1
3   K1   K0  A2  B2   C2   D2
4   K2   K1  A3  B3  NaN  NaN  #merged on the basis of df1
  key1 key2    A    B   C   D
0   K0   K0   A0   B0  C0  D0
1   K1   K0   A2   B2  C1  D1
2   K1   K0   A2   B2  C2  D2
3   K2   K0  NaN  NaN  C3  D3  #merged on the basis of df2
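One detail worth noticing in the output above: the left merge has 5 rows although df1 has only 4, because the combination key1=K1, key2=K0 matches two rows of df2. The following sketch, using the same df1 and df2, counts the rows and the unmatched entries; the variable names are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                    'key2': ['K0', 'K1', 'K0', 'K1'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                    'key2': ['K0', 'K0', 'K0', 'K0'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

left_res = pd.merge(df1, df2, on=['key1', 'key2'], how='left')
right_res = pd.merge(df1, df2, on=['key1', 'key2'], how='right')

# A left merge keeps every df1 row, and a df1 row that matches
# several df2 rows is repeated once per match.
print(len(df1), len(left_res))    # 4 5
print(len(df2), len(right_res))   # 4 4

# Rows with no partner on the other side have NaN in that side's columns.
print(left_res['C'].isna().sum())    # 2
print(right_res['A'].isna().sum())   # 1
```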

④ Setting indicator=True in merge() adds a _merge column that shows where each row of the result came from

import pandas as pd
import numpy as np
df1=pd.DataFrame({'col1':[0,1],'col_left':['a','b']})
df2=pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
print(df1)
print(df2)
res1=pd.merge(df1,df2,on='col1',how='outer',indicator=True) # shows which DataFrame each row came from
print(res1)

Output:

   col1 col_left
0     0        a
1     1        b
   col1  col_right
0     1          2
1     2          2
2     2          2
   col1 col_left  col_right      _merge
0     0        a        NaN   left_only #in left but not in right
1     1        b        2.0        both
2     2      NaN        2.0  right_only
3     2      NaN        2.0  right_only
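A common use of the _merge column is to filter on it. The sketch below, built on the same df1 and df2, picks out the rows that exist only in the left frame; the names res and left_only are chosen just for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})
df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]})

res = pd.merge(df1, df2, on='col1', how='outer', indicator=True)

# Keep only the rows whose key was found in df1 but not in df2,
# then drop the helper column again.
left_only = res[res['_merge'] == 'left_only'].drop(columns='_merge')

print(left_only['col1'].tolist())   # [0]
```

Filtering on 'right_only' or 'both' works the same way.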

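⑤ left_index and right_index make merge() join on the row index instead of a column. Setting both to True joins the two DataFrames on their row indexes; combined with how='outer' the result is the union of the index labels, and with how='inner' it is the intersection.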

import pandas as pd
import numpy as np
left=pd.DataFrame({'A':['A0','A1','A2'],'B':['B0','B1','B2']},index=['K0','K1','K2'])
right=pd.DataFrame({'C':['C0','C2','C3'],'D':['D0','D2','D3']},index=['K0','K2','K3'])
print(left)
print(right)
res1=pd.merge(left,right,left_index=True,right_index=True,how='outer') 
print(res1)
res2=pd.merge(left,right,left_index=True,right_index=True,how='inner') 
print(res2)

Output:

     A   B
K0  A0  B0
K1  A1  B1
K2  A2  B2
     C   D
K0  C0  D0
K2  C2  D2
K3  C3  D3
      A    B    C    D
K0   A0   B0   C0   D0
K1   A1   B1  NaN  NaN
K2   A2   B2   C2   D2
K3  NaN  NaN   C3   D3  #every label is shown (union)
     A   B   C   D
K0  A0  B0  C0  D0
K2  A2  B2  C2  D2  #only the index labels present in both, K0 and K2
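For index-on-index merges like this one, DataFrame.join is a shorter spelling, since it joins on the row index by default. The sketch below compares it with the merge() call from the example; it is a small check on these two frames, not a general statement about every option of the two functions.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'], 'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

merged = pd.merge(left, right, left_index=True, right_index=True, how='outer')
joined = left.join(right, how='outer')   # join() uses the row index by default

# Sort by index before comparing so that row order does not matter
print(merged.sort_index().equals(joined.sort_index()))

print(list(joined.index))
```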

⑥ The suffixes argument of merge() names the columns that share the same label in both DataFrames; see the example.

import pandas as pd
import numpy as np
boys=pd.DataFrame({'k':['K0','K1','K2'],'age':[1,2,3]})
girls=pd.DataFrame({'k':['K0','K0','K3'],'age':[4,5,6]})
print(boys)
print(girls)
res=pd.merge(boys,girls,on='k',how="inner")
print(res)
res=pd.merge(boys,girls,on='k',suffixes=['_boy','_girls'],how="inner")
print(res)

Output:

    k  age
0  K0    1
1  K1    2
2  K2    3
    k  age
0  K0    4
1  K0    5
2  K3    6
    k  age_x  age_y
0  K0      1      4
1  K0      1      5      #automatically named age_x, age_y
    k  age_boy  age_girls
0  K0        1          4
1  K0        1          5    #custom names
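The suffixes are applied only to columns that appear in both frames and are not merge keys. To show this, the sketch below adds a team column to boys; that column and its values are invented for the illustration and are not part of the example above.

```python
import pandas as pd

# 'team' is a hypothetical extra column that exists only in boys
boys = pd.DataFrame({'k': ['K0', 'K1', 'K2'],
                     'age': [1, 2, 3],
                     'team': ['red', 'blue', 'green']})
girls = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'age': [4, 5, 6]})

res = pd.merge(boys, girls, on='k', suffixes=('_boy', '_girl'), how='inner')

# Only the overlapping non-key column 'age' receives a suffix;
# the key 'k' and the boys-only column 'team' keep their names.
print(list(res.columns))   # ['k', 'age_boy', 'team', 'age_girl']
print(len(res))            # 2
```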

That wraps up merging DataFrames; these are the main points. If this helped, remember to like, bookmark, and follow~

Original article: https://blog.csdn.net/hswqxi/article/details/107430832
