I want to append (merge) all the csv files in a folder using Python pandas.

For example: Say folder has two csv files test1.csv and test2.csv as follows:

A_Id    P_Id    CN1         CN2         CN3
AAA     111     702         709         740
BBB     222     1727        1734        1778

and

A_Id    P_Id    CN1         CN2         CN3
CCC     333     710        750          750
DDD     444     180        734          778

So the python script I wrote was as follows:

#!/usr/bin/python
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)

all_data.to_csv('testfolder/combined.csv')

Though the combined.csv seems to have all the appended rows, it looks as follows:

      CN1       CN2         CN3    A_Id    P_Id
  0   710      750         750     CCC     333
  1   180       734         778     DDD     444     
  0   702       709         740     AAA     111
  1  1727       1734        1778    BBB     222

Where as it should look like this:

A_ID   P_Id   CN1    CN2    CN2
AAA    111    702    709    740
BBB    222    1727   1734   1778
CCC    333    110    356    123
DDD    444    220    256    223
  • Why are the first two columns moved to the end?
  • Why is it appending in the first line rather than at the last line?

What am I missing? And how can I get get of 0s and 1s in the first column?

P.S: Since these are large csv files, I thought of using pandas.

Try this …..

all_data = all_data.append(df)[df.columns.tolist()]

Starting from version 0.23.0, you can prevent the append() method to sort the final appended DataFrame. In your case:

all_data = all_data.append(df, sort=False)

I had the same issue and it was painfull. I managed to solve it by reorganising columns based on source dataframe after it was appended to final dataframe. It would look like this:

#!/usr/bin/python
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)
    all_data = all_data[df.columns]

all_data.to_csv('testfolder/combined.csv') 

Since your issue was from almost two years ago, I’m posting solution which worked for me for enyone else who will also face similar issue.

You can use reindex to change to the original order:

all_data = all_data.append(df)
all_data = all_data.reindex(df.columns, axis=1)

I saw this here (more details in the link): https://github.com/pandas-dev/pandas/issues/4588#issuecomment-44421883

I tweaked the code as below. Comments in-line.

#!/usr/bin/python
import pandas as pd
import glob

# Grab all the csv files in the folder to a list.
fileList = glob.glob('input_folder/*.csv')

#Initialize an empty dataframe to grab the csv content.
all_data = pd.DataFrame()

#Initialize an empty list to grab the dataframes.
dfList= []

for files in  fileList:
    df =  pd.read_csv(files, index_col = None, header= False)
    dfList.append(df)

#The frames will be in reverse order i.e last read file's content in the begining. So reverse it again
Reversed_dfList =  dfList[::-1]
CombinedFrame =  pd.concat(Reversed_dfList)

# The "Combined.csv" file will have combination of all the files.
CombinedFrame.to_csv('output_folder/Combined.csv', index=False)

Please try:

df = df.append(pd.DataFrame(Added_rows).T)

all_data = all_data.append(df, ignore_index=True)

Put ignore_index=True so the order will not change.
By default is False so you need to change that.