Skip to content

How to merge multiple excel files using Python

Merge multiple excel files using python

To work with Ms Excel and Python seamlessly we should know how to merge multiple excel files using Python . In this tutorial we will see how to merge two excel files using Python . For this reason we will take two .xlsx file (named demo1.xlsx and demo2.xlsx) which will look like below –

Now we will merge data of demo2.xlsx after data of demo1.xlsx . We will do this in three ways –

Merge multiple excel files using Python by merge() method

import pandas as pd

# Reading the demo1.xlsx
df1=pd.read_excel("demo1.xlsx")

# reading the demo2.xlsx
df2=pd.read_excel("demo2.xlsx")

# merging df2 with df1
df3=df1.merge(df2)  

# creating a new excel file and save the data
df3.to_excel("demo3.xlsx",index=False)

In the first way we will load both the file in the program as pandas dataframe as df1 and df2 . Then we will create df3 from df1 and df2 using merge() method. After that we will create a new excel file demo3.xlsx and put all the data of df3 into that . Below is the image of demo3.xlsx –

Merge multiple excel files using Python by pd.concat() method

import pandas as pd

# Reading the demo1.xlsx
df1=pd.read_excel("demo1.xlsx")

# reading the demo2.xlsx
df2=pd.read_excel("demo2.xlsx")

# merging df2 with df1
df3=pd.concat([df1,df2.iloc[:,1:]],axis=1)

# creating a new excel file and save the data
df3.to_excel("demo3.xlsx",index=False)

In the first way we will load both the file in the program as pandas dataframe as df1 and df2 . Then we will create df3 from df1 and df2 using pd.concat() method. After that we will create a new excel file demo3.xlsx and put all the data of df3 into that . Below is the image of demo3.xlsx –

Merge three excel files using Python by pd.concat() method

import pandas as pd

# Reading the demo1.xlsx
df1=pd.read_excel("demo1.xlsx")

# reading the demo2.xlsx
df2=pd.read_excel("demo2.xlsx")

# reading the demo3.xlsx
df3=pd.read_excel("demo3.xlsx")

# appending df1 , df2 and df3
df4=pd.concat([df1,df2.iloc[:,1:],df3.iloc[:,1:]],axis=1)

# creating a new excel file and save the data
df4.to_excel("demo4.xlsx",index=False)

But if we don’t want to create a new excel file and want to get the merged data in any of the mentioned excel file that is demo1.xlsx or demo2.xlsx , we should check the third way . In this case we will use openpyxl module of Python .

Merge multiple excel files using Python openpyxl

import pandas as pd

# reading the demo2.xlsx file and deleting the row labels
df=pd.read_excel("demo2.xlsx")
df=df.iloc[:,1:]

# merging the data
with pd.ExcelWriter("demo1.xlsx",mode="a",engine="openpyxl",if_sheet_exists="overlay") as writer:
    df.to_excel(writer, sheet_name="Sheet1",columns=None, startcol=writer.sheets["Sheet1"].max_column,index=False)
  • In the above code first we have imported pandas library and read the demo2.xlsx file as df .
  • Then we need to create an ExcelWriter object which will open the demo1.xlsx file in append mode (mode="a") . We have used openpyxl as engine and overlay as if_sheet_exists . By default value of if_sheet_exists is error (which will give ValueError if the sheet exists) . Other values are new (will create a new sheet , name will be determined by the engine) , replace (will delete the old content before writing to the sheet) and overlay (will append the data in the existing sheet without removing the old data)
  • After that by to_excel() method we have merged data of demo2.xlsx after the data of demo1.xlsx in the demo1.xlsx file . Here we have used writer.sheets["Sheet1"].max_column as startcol which will automatically calculate the no of column of previous content .

We can also see the image of new demo1.xlsx for reference –

Note

We also need to install openpyxl to use the above example .To install openpyxl open PowerShell window , type the below code and press Enter key –

pip install openpyxl

Version of pandas should be greater than or equal to 1.4.0 for the above example .

Merge three excel files using Python openpyxl

import pandas as pd

# reading the demo2.xlsx file and deleting the row labels
df2=pd.read_excel("demo2.xlsx")
df2=df2.iloc[:,1:]

# reading the demo3.xlsx file and deleting the row labels
df3=pd.read_excel("demo3.xlsx")
df3=df3.iloc[:,1:]

# merging the data
with pd.ExcelWriter("demo1.xlsx",mode="a",engine="openpyxl",if_sheet_exists="overlay") as writer:
    for df in [df2,df3]:
        df.to_excel(writer, sheet_name="Sheet1",columns=None, startcol=writer.sheets["Sheet1"].max_column,index=False)

Conclusion

merge() method is dedicated method by pandas to merge two excel files but we can’t merge multiple excel files using merge() method . pd.concat() method is a general method , using this method we can merge any number of excel file into a single excel file . Using openpyxl module also we can merge any number of excel file into a single excel file .

So , it is on us which method we should use depending on the situation .

Thank you for reading this Article . If You enjoy it Please Share the article . If you want to say something Please Comment .

Leave a Reply