Saving pandas dataframe as csv and overwrite existing file - python

I always have two dataframes, coming from different directories, whose filenames share the same last four digits. The directory paths are:
dir1 = "path/to/files1/"
dir2 = "path/to/files2/"
Then I use a loop to load and concatenate the dataframes that belong together into a dataframe df.
# For each file in the first directory
for i in os.listdir(dir1):
    # For each file in the second directory
    for j in os.listdir(dir2):
        # If the last 4 digits of the filename match (ignoring the extension)
        if i[-8:-4] == j[-8:-4]:
            print(i[-12:-4] + ' CPU Analysis')
            print('\n')
            # Load the CSVs into pandas
            df1 = pd.read_csv(dir1 + i, delimiter=',')
            df2 = pd.read_csv(dir2 + j, delimiter=';')
            df = pd.concat([df1, df2])
What I now want to do is store df in dir1 under the same filename as before, i.e. overwrite the existing file in dir1 and save it as csv.
So I think I should use something like this at the end of the loop:
df.to_csv(dir1, i[:-4])
But I am not sure about this.

I think it is possible to join the path parts with +:
df = pd.concat([df1, df2])
df.to_csv(dir1 + i[:-4] + '.csv', index=False)
Or use f-strings:
df = pd.concat([df1, df2])
df.to_csv(f'{dir1}{i[:-4]}.csv', index=False)
But if you need the original extension, use the same path you used for reading the file:
df = pd.concat([df1, df2])
df.to_csv(dir1 + i, index=False)
df = pd.concat([df1, df2])
df.to_csv(f'{dir1}{i}', index=False)
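Putting the pieces together, here is a minimal sketch of the full loop with the overwrite step included. It uses os.path.join instead of string concatenation so a missing trailing slash in dir1/dir2 does not break the paths; the matching rule and delimiters are taken from the question as-is:

```python
import os
import pandas as pd

def merge_and_overwrite(dir1, dir2):
    """Concatenate each matching pair of files and overwrite the file in dir1."""
    for i in os.listdir(dir1):
        for j in os.listdir(dir2):
            # Match on the last 4 characters before the ".csv" extension
            if i[-8:-4] == j[-8:-4]:
                df1 = pd.read_csv(os.path.join(dir1, i), delimiter=',')
                df2 = pd.read_csv(os.path.join(dir2, j), delimiter=';')
                df = pd.concat([df1, df2])
                # Same filename as before -> overwrites the existing file in dir1
                df.to_csv(os.path.join(dir1, i), index=False)
```

index=False keeps pandas from writing the row index as an extra column, which matters when a file gets read and rewritten repeatedly.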

Related

How do use python to iterate through a directory and delete specific columns from all csvs?

I have a directory with several csvs.
files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
Each csv has the same columns; reprex below:
yes  no  maybe  ofcourse
1    2   3      4
I want my script to iterate through all csvs in the folder and delete the columns maybe and ofcourse.
If glob provides you with the file paths, you can do the following with pandas:
import pandas as pd
from glob import glob

files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
drop = ['maybe ', 'ofcourse']
for file in files:
    df = pd.read_csv(file)
    for col in drop:
        if col in df:
            df = df.drop(col, axis=1)
    df.to_csv(file, index=False)  # index=False avoids writing an extra index column
Alternatively, a cleaner way to avoid KeyErrors from drop:
import pandas as pd
from glob import glob

files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
drop = ['maybe ', 'ofcourse']
for file in files:
    df = pd.read_csv(file)
    df = df.drop([c for c in drop if c in df], axis=1)
    df.to_csv(file, index=False)
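The membership check above can also be replaced entirely: DataFrame.drop accepts errors='ignore', which skips labels that are not present instead of raising a KeyError. A minimal sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({'yes': [1], 'no': [2], 'maybe': [3], 'ofcourse': [4]})

# errors='ignore' silently skips labels that do not exist in the frame,
# so 'not_a_column' causes no KeyError
df = df.drop(columns=['maybe', 'ofcourse', 'not_a_column'], errors='ignore')
print(list(df.columns))  # ['yes', 'no']
```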
Do you mean this?
files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
for filename in files:
    df = pd.read_csv(filename)
    df = df.drop(['maybe ', 'ofcourse'], axis=1)
    df.to_csv(filename, index=False)
This code removes the maybe and ofcourse columns and saves the result back to each csv.
You can use pandas to read a csv file into a dataframe and then use drop() to remove specific columns. Something like:
df = pd.read_csv(csv_filename)
df = df.drop(['maybe', 'ofcourse'], axis=1)  # drop returns a new DataFrame unless inplace=True
If the files look exactly like what you have there (tab-separated), then maybe something like this:
import pandas as pd
from glob import glob

files = glob(r'C:/Users/jj/Desktop/Bulk_Wav/*.csv')
for filename in files:
    df = pd.read_csv(filename, sep='\t')
    df.drop(['maybe', 'ofcourse'], axis=1, inplace=True)
    df.to_csv(filename, sep='\t', index=False)

Merge all dataframes together in a loop

I have several CSV files in a certain path and would like to add them all together. So far I have done this laboriously, assigning each individual dataframe by hand.
Is there a way to do everything in a for loop, so I don't have to write df1 = pd.read_csv(CSV_FILES[0]) and frames = [df1, df2, df3, df4]? As soon as I try to read the files in a loop, I get an error.
How can I improve this code so it doesn't refer to the individual entries CSV_FILES[0], ..., but does everything in a loop?
PATH = ''

def find_csv(path):
    csv_files = []
    print("Looking for files at ", path)
    for file in Path(path).glob('*.csv'):
        csv_files.append(str(file))
    print("Found ", len(csv_files), " csv files")
    return csv_files

CSV_FILES = find_csv(PATH)

df1 = pd.read_csv(CSV_FILES[0])
df2 = pd.read_csv(CSV_FILES[1])
df3 = pd.read_csv(CSV_FILES[2])
df4 = pd.read_csv(CSV_FILES[3])

frames = [df1, df2, df3, df4]
df = pd.concat(frames)
You can build a list of DataFrames instead. Change:
csv_files.append(str(file))
to:
csv_files.append(pd.read_csv(str(file)))
And then join them together:
df = pd.concat(CSV_FILES)
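If you prefer to keep find_csv returning paths (so the name still matches what it does), the reading can instead happen in one comprehension at concat time. A minimal sketch under that assumption:

```python
import pandas as pd
from pathlib import Path

def find_csv(path):
    """Return the paths of all csv files under `path`."""
    return [str(f) for f in Path(path).glob('*.csv')]

def load_all(path):
    # One read per file, then a single concat over the whole list
    return pd.concat((pd.read_csv(f) for f in find_csv(path)), ignore_index=True)
```

ignore_index=True renumbers the rows of the combined frame; drop it if you want to keep each file's original row labels.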

how to append .csv files in a folder ( side by side with 2 columns gap)

I am appending 10 CSV files in a folder. I want to merge them horizontally (side by side), not vertically stacked. This is the format I am looking for:
[df1 df2 df3 ......... df10]
but not like this:
[
df1
df2
...
df10
]
This is the code I am trying:
import pandas as pd
import glob

path = r' ' # use your path
all_files = glob.glob(path + "/*.csv")

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
frame1 = pd.concat(li, axis=0, ignore_index=True)
frame1.to_csv('name of folder')  # placeholder output path
Rather than just giving code, which might have errors since I am a newbie in Python, here is the step-by-step process:
1. Create ten dataframes from the ten csv files.
2. Transpose them using .T (rows become columns and vice versa). This is a lot easier if you write a simple function for it.
3. Convert the dataframes back to csv:
df.to_csv(file_name, sep=',')  # separator can be anything you want
4. Merge them with glob or glob2.
5. Convert to a dataframe again.
6. Transpose back using .T.
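The transpose round-trip above can usually be skipped: passing axis=1 to pd.concat places the frames next to each other directly. A minimal sketch with two small frames and a hypothetical two-column spacer for the requested gap (the gap1/gap2 labels are made up for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'b': [3, 4]})

# Two empty columns as the gap between the frames
gap = pd.DataFrame({'gap1': ['', ''], 'gap2': ['', '']})

# axis=1 places the frames next to each other instead of stacking them
side_by_side = pd.concat([df1, gap, df2], axis=1)
print(side_by_side.shape)  # (2, 4)
```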

to concat csv files having different dimensions

I have 10 csv files with different dimensions and I want to concatenate them into one file, but whenever I do, the format changes.
I want a single csv file.
dfs = glob.glob(path + '*.csv')
result = pd.concat([pd.read_csv(df,header=None) for df in dfs])
result.to_csv(path + 'merge.csv',header=None)
You may want to combine the csv files horizontally. Use axis=1, for example:
df1 = pd.read_csv('f1.txt')
df2 = pd.read_csv('f2.txt')
combined = pd.concat([df1, df2], axis=1)
combined.to_csv('merged_csv.csv')
This worked for me:
import pandas as pd
import os

os.chdir(path)
dfs = [pd.read_csv(f, parse_dates=[0])
       for f in os.listdir(os.getcwd()) if f.endswith('csv')]
result_csv = pd.concat(dfs, axis=1)
result_csv.to_csv('result.csv')
You have to use pd.concat.
The code below only works for a few csv files (3 here):
df1 = pd.read_csv(r"address\1.csv", index_col=[0], parse_dates=[0])
df2 = pd.read_csv(r"address\2.csv", index_col=[0], parse_dates=[0])
df3 = pd.read_csv(r"address\3.csv", index_col=[0], parse_dates=[0])
finaldf = pd.concat([df1, df2, df3], axis=1, join='inner').sort_index()
finaldf.to_csv('result.csv')
With the code below you can concatenate as many csv files as you want from the same directory:
import pandas as pd
import os

os.chdir(path)
dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
       for f in os.listdir(os.getcwd()) if f.endswith('csv')]
result_csv = pd.concat(dfs, axis=1, join='inner').sort_index()
result_csv.to_csv('result.csv')
The result is saved to result.csv in the same path used above.
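Since the files have different dimensions, the join argument decides what happens to the extra rows. A minimal sketch showing the difference between the default outer join and the join='inner' used above:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]})   # 3 rows
df2 = pd.DataFrame({'b': [4, 5]})      # 2 rows

# Default outer join keeps every index label and fills the gaps with NaN
outer = pd.concat([df1, df2], axis=1)
print(outer.shape)  # (3, 2)

# join='inner' keeps only the index labels present in every frame
inner = pd.concat([df1, df2], axis=1, join='inner')
print(inner.shape)  # (2, 2)
```

So join='inner' trims the result to the shortest file, while the default pads the shorter ones with NaN.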

Create a new dataframe out of dozens of df.sum() series

I have several pandas DataFrames of the same format, with five columns.
I would like to sum the values of each one of these dataframes using df.sum(). This creates a Series for each DataFrame, still with five values (one per column).
My problem is how to take these Series and build another DataFrame, with one column being the filename and the other columns being the five sums from df.sum().
import pandas as pd
import glob

batch_of_dataframes = glob.glob("*.txt")
newdf = []
for filename in batch_of_dataframes:
    df = pd.read_csv(filename)
    df['filename'] = str(filename)
    df = df.sum()
    newdf.append(df)
newdf = pd.concat(newdf, ignore_index=True)
Unfortunately this approach doesn't work. df['filename'] = str(filename) throws a TypeError, and the new dataframe newdf doesn't come out in the right shape.
How would one do this correctly?
How do you take a number of pandas.Series objects and create a DataFrame?
Try in this order:
1. Create an empty list, say list_of_series.
2. For every file:
   - load it into a data frame, then save the sum in a series s
   - add an element to s: s['filename'] = your_filename
   - append s to list_of_series
3. Finally, concatenate (and transpose if needed):
final_df = pd.concat(list_of_series, axis=1).T
Code
Preparation:
l_df = [pd.DataFrame(np.random.rand(3, 5), columns=list("ABCDE")) for _ in range(5)]
for i, df in enumerate(l_df):
    df.to_csv(str(i) + '.txt', index=False)
Files *.txt are comma-separated and contain headers.
! cat 1.txt
A,B,C,D,E
0.18021800981245173,0.29919271590063656,0.09527248614484807,0.9672038093199938,0.07655003742768962
0.35422759068109766,0.04184770882952815,0.682902924462214,0.9400817219440063,0.8825581077493059
0.3762875793116358,0.4745731412494566,0.6545473610147845,0.7479829630649761,0.15641907539706779
And, indeed, the rest is quite similar to what you did (I append the file name to the series, not to the data frame; otherwise it gets concatenated several times by sum()):
files = glob.glob('*.txt')
print(files)
['3.txt', '0.txt', '4.txt', '2.txt', '1.txt']
list_of_series = []
for f in files:
    df = pd.read_csv(f)
    s = df.sum()
    s['filename'] = f
    list_of_series.append(s)
final_df = pd.concat(list_of_series, axis=1).T
print(final_df)
A B C D E filename
0 1.0675 2.20957 1.65058 1.80515 2.22058 3.txt
1 0.642805 1.36248 0.0237625 1.87767 1.63317 0.txt
2 1.68678 1.26363 0.835245 2.05305 1.01829 4.txt
3 1.22748 2.09256 0.785089 1.87852 2.05043 2.txt
4 0.910733 0.815614 1.43272 2.65527 1.11553 1.txt
To answer this specific question:
@ThomasTu How do I go from a list of Series with 'Filename' as a column to a dataframe? I think that's the problem---I don't understand this
It's essentially what you have now, but instead of appending to an empty list, you append to an empty dataframe. Note that DataFrame.append returns a new DataFrame rather than modifying in place, so you do have to reassign newdf on each iteration; append was deprecated in pandas 1.4 and removed in 2.0 in favor of pd.concat.
import pandas as pd
import glob

batch_of_dataframes = glob.glob("*.txt")
newdf = pd.DataFrame()
for filename in batch_of_dataframes:
    df = pd.read_csv(filename)
    df['filename'] = str(filename)
    df = df.sum()
    newdf = newdf.append(df, ignore_index=True)
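Because DataFrame.append was removed in pandas 2.0, the same idea needs pd.concat on current versions. A minimal sketch that collects the per-file sums and concatenates once at the end (the numeric_only=True guard keeps the string filename out of the summation):

```python
import glob
import pandas as pd

def sum_per_file(pattern="*.txt"):
    """Column sums per file, one result row per filename."""
    sums = []
    for filename in glob.glob(pattern):
        df = pd.read_csv(filename)
        s = df.sum(numeric_only=True)   # column sums as a Series
        s['filename'] = filename
        sums.append(s)
    # A single concat at the end replaces the repeated append calls
    return pd.concat(sums, axis=1).T.reset_index(drop=True)
```

Collecting into a list and concatenating once is also faster than growing a DataFrame row by row.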
