Each Answer to this Q is separated by one/two green lines.
I want to append (merge) all the csv files in a folder using Python pandas.
For example: Say folder has two csv files
test2.csv as follows:
A_Id P_Id CN1 CN2 CN3 AAA 111 702 709 740 BBB 222 1727 1734 1778
A_Id P_Id CN1 CN2 CN3 CCC 333 710 750 750 DDD 444 180 734 778
So the python script I wrote was as follows:
#!/usr/bin/python import pandas as pd import glob all_data = pd.DataFrame() for f in glob.glob("testfolder/*.csv"): df = pd.read_csv(f) all_data = all_data.append(df) all_data.to_csv('testfolder/combined.csv')
combined.csv seems to have all the appended rows, it looks as follows:
CN1 CN2 CN3 A_Id P_Id 0 710 750 750 CCC 333 1 180 734 778 DDD 444 0 702 709 740 AAA 111 1 1727 1734 1778 BBB 222
Where as it should look like this:
A_ID P_Id CN1 CN2 CN2 AAA 111 702 709 740 BBB 222 1727 1734 1778 CCC 333 110 356 123 DDD 444 220 256 223
- Why are the first two columns moved to the end?
- Why is it appending in the first line rather than at the last line?
What am I missing? And how can I get get of 0s and 1s in the first column?
P.S: Since these are large csv files, I thought of using pandas.
Try this …..
all_data = all_data.append(df)[df.columns.tolist()]
Starting from version 0.23.0, you can prevent the
append() method to sort the final appended DataFrame. In your case:
all_data = all_data.append(df, sort=False)
I had the same issue and it was painfull. I managed to solve it by reorganising columns based on source dataframe after it was appended to final dataframe. It would look like this:
#!/usr/bin/python import pandas as pd import glob all_data = pd.DataFrame() for f in glob.glob("testfolder/*.csv"): df = pd.read_csv(f) all_data = all_data.append(df) all_data = all_data[df.columns] all_data.to_csv('testfolder/combined.csv')
Since your issue was from almost two years ago, I’m posting solution which worked for me for enyone else who will also face similar issue.
You can use reindex to change to the original order:
all_data = all_data.append(df) all_data = all_data.reindex(df.columns, axis=1)
I saw this here (more details in the link): https://github.com/pandas-dev/pandas/issues/4588#issuecomment-44421883
I tweaked the code as below. Comments in-line.
#!/usr/bin/python import pandas as pd import glob # Grab all the csv files in the folder to a list. fileList = glob.glob('input_folder/*.csv') #Initialize an empty dataframe to grab the csv content. all_data = pd.DataFrame() #Initialize an empty list to grab the dataframes. dfList=  for files in fileList: df = pd.read_csv(files, index_col = None, header= False) dfList.append(df) #The frames will be in reverse order i.e last read file's content in the begining. So reverse it again Reversed_dfList = dfList[::-1] CombinedFrame = pd.concat(Reversed_dfList) # The "Combined.csv" file will have combination of all the files. CombinedFrame.to_csv('output_folder/Combined.csv', index=False)
df = df.append(pd.DataFrame(Added_rows).T)
all_data = all_data.append(df, ignore_index=True)
Put ignore_index=True so the order will not change.
By default is
False so you need to change that.