I'm almost done merging Excel files with pandas in Python, but when I give the path it won't work. I get the error "No such file or directory: 'file1.xlsx'". When I leave the path empty it works, but I want to decide which folder the files are taken from. I saved the files in the folder 'excel'.
cwd = os.path.abspath('/Users/Viktor/downloads/excel') # If I leave it empty and have files in /Viktor it works, but the desired Excel files are in /excel
print(cwd)
files = os.listdir(cwd)
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file), ignore_index=True)
df.head()
df.to_excel(r'/Users/Viktor/Downloads/excel/resultat/merged.xlsx')
pd.read_excel(file) looks for the file relative to the path where the script is executed. If you execute it in '/Users/Viktor/', try:
import os
import pandas as pd

cwd = os.path.abspath('/Users/Viktor/downloads/excel')
#print(cwd)
files = os.listdir(cwd)
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        # prefix the folder so the path resolves from /Users/Viktor/
        # (note: DataFrame.append was removed in pandas 2.0; use pd.concat there)
        df = df.append(pd.read_excel('downloads/excel/' + file), ignore_index=True)
df.head()
df.to_excel(r'/Users/Viktor/downloads/excel/resultat/merged.xlsx')
How about actually changing the current working directory with
os.chdir(cwd)
Just printing the path doesn't help.
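A minimal sketch of that idea, with a throwaway temporary folder and a CSV file standing in for the real Excel folder: once os.chdir() has run, bare filenames passed to pandas resolve against that folder.

```python
import os
import tempfile
import pandas as pd

# A throwaway folder stands in for /Users/Viktor/downloads/excel here.
data_dir = tempfile.mkdtemp()
pd.DataFrame({'a': [1, 2]}).to_csv(os.path.join(data_dir, 'file1.csv'), index=False)

os.chdir(data_dir)             # make the data folder the working directory
df = pd.read_csv('file1.csv')  # a bare filename now resolves against data_dir
print(len(df))                 # 2
```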
Use pathlib
Path.glob() to find all the files
Use Path.rglob() if you want to include subdirectories
Use pandas.concat to combine the dataframes created with the pd.read_excel in the list comprehension
from pathlib import Path
import pandas as pd
# path to files
p = Path('/Users/Viktor/downloads/excel')
# find the xlsx files
files = p.glob('*.xlsx')
# create the dataframe
df = pd.concat([pd.read_excel(file) for file in files], ignore_index=True)
# save the file
df.to_excel(r'/Users/Viktor/Downloads/excel/resultat/merged.xlsx')
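To see the difference between glob and rglob mentioned above, here is a small sketch using a throwaway directory tree with one nested subfolder:

```python
from pathlib import Path
import tempfile

# Throwaway tree standing in for the real folder; 'sub' is a nested directory.
root = Path(tempfile.mkdtemp())
(root / 'sub').mkdir()
(root / 'top.xlsx').touch()
(root / 'sub' / 'nested.xlsx').touch()

top_only = sorted(p.name for p in root.glob('*.xlsx'))    # top level only
recursive = sorted(p.name for p in root.rglob('*.xlsx'))  # includes subdirectories
print(top_only)   # ['top.xlsx']
print(recursive)  # ['nested.xlsx', 'top.xlsx']
```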
I am trying to use this code to write my edited csv files to a new directory. Does anyone know how I specify the directory?
I have tried this but it doesn't seem to be working.
dir = r'C:/Users/PycharmProjects/pythonProject1'  # raw string for windows.
csv_files = [f for f in Path(dir).glob('*.csv')]  # finds all csvs in your folder.
cols = ['Temperature']
for csv in csv_files:  # iterate list
    df = pd.read_csv(csv)  # read csv
    df[cols].to_csv('C:/Users/Desktop', csv.name, index=False)
    print(f'{csv.name} saved.')
I think your only problem is the way you're calling to_csv(), passing a directory and a filename. I tried that and got this error:
IsADirectoryError: [Errno 21] Is a directory: '/Users/zyoung/Desktop/processed'
because to_csv() is expecting a path to a file, not a directory path and a file name.
You need to join the output directory and CSV's file name, and pass that, like:
out_dir = PurePath(base_dir, r"processed")
# ...
# ...
csv_out = PurePath(out_dir, csv_in)
df[cols].to_csv(csv_out, index=False)
I'm writing to the subdirectory processed, in my current dir ("."), and using the PurePath() function to do smart joins of the path components.
Here's the complete program I wrote for myself to test this:
import os
from pathlib import Path, PurePath
import pandas as pd

base_dir = r"."
out_dir = PurePath(base_dir, r"processed")
csv_files = [x for x in Path(base_dir).glob("*.csv")]
if not os.path.exists(out_dir):
    os.mkdir(out_dir)
cols = ["Temperature"]
for csv_in in csv_files:
    df = pd.read_csv(csv_in)
    csv_out = PurePath(out_dir, csv_in)
    df[cols].to_csv(csv_out, index=False)
    print(f"Saved {csv_out.name}")
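As an aside, pathlib paths also overload the / operator for joins, which can read a little more naturally than calling PurePath() with multiple arguments; a tiny sketch:

```python
from pathlib import PurePath

# PurePath(...) and the '/' operator build the same path
joined = PurePath('processed', 'data.csv')
slashed = PurePath('processed') / 'data.csv'
print(joined == slashed)  # True
```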
I have a folder with many zip files and within those zip files are multiple csv files.
Is there any way to get all of the .csv files in one dataframe in python?
Or any way I can pass a list of zip files?
The code I am currently trying is:
import glob
import zipfile
import pandas as pd
for zip_file in glob.glob(r"C:\Users\harsh\Desktop\Temp\data_00-01.zip"):
    # This is just one file. There are multiple zip files in the folder
    zf = zipfile.ZipFile(zip_file)
    dfs = [pd.read_csv(zf.open(f), header=None, sep=";", encoding='latin1') for f in zf.namelist()]
    df = pd.concat(dfs, ignore_index=True)
    print(df)
This code works for one zipfile but I have about 50 zip files in the folder and I would like to read and concatenate all csv files in those zip files in one dataframe.
Thanks
The following code should satisfy your requirements (just edit dir_name according to what you need):
import os
import zipfile
import pandas as pd

dfs = []
for filename in os.listdir(dir_name):
    if filename.endswith('.zip'):
        zip_file = os.path.join(dir_name, filename)
        zf = zipfile.ZipFile(zip_file)
        dfs += [pd.read_csv(zf.open(f), header=None, sep=";", encoding='latin1') for f in zf.namelist()]
df = pd.concat(dfs, ignore_index=True)
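If you prefer glob (as in your original attempt) over os.listdir, the same loop can be written with a '*.zip' pattern. A sketch with throwaway archives built in a temporary folder; in your script dir_name would point at the real folder, and the header=None / encoding arguments would carry over:

```python
import glob
import os
import tempfile
import zipfile
import pandas as pd

# Build two throwaway zip archives, each holding one small semicolon-separated CSV.
dir_name = tempfile.mkdtemp()
for i in range(2):
    with zipfile.ZipFile(os.path.join(dir_name, f'data_{i}.zip'), 'w') as zf:
        zf.writestr(f'part_{i}.csv', 'a;b\n1;2\n')

dfs = []
for zip_file in glob.glob(os.path.join(dir_name, '*.zip')):
    with zipfile.ZipFile(zip_file) as zf:
        dfs += [pd.read_csv(zf.open(f), sep=';') for f in zf.namelist()]
df = pd.concat(dfs, ignore_index=True)
print(len(df))  # 2
```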
I have dozens of tab-delimited text files in my local directory. When I copy and paste a text file into an Excel sheet, it becomes a file with hundreds of columns. Now I would like to read all the text files and convert them to corresponding Excel files.
If there was a single file, I would have done the following way:
import pandas as pd
df = pd.read_csv("H:\\Yugeen\\text1.txt", sep='\t')
df.to_excel('H:\\Yugeen\\output1.xlsx', 'Sheet1', index = False)
Is there any way to achieve the solution I am looking for?
I use this function to list all files in a directory, along with their file path:
import os

def list_files_in_directory(path):
    '''Return the paths of all files under the given directory.'''
    x = []
    for root, dirs, files in os.walk(path):
        for file in files:
            x.append(os.path.join(root, file))
    return x
Selecting for only text files:
files = list_files_in_directory('.')
filtered_files = [i for i in files if i.endswith('.txt')]
Like Sophia demonstrated, you can use pandas to create a dataframe. I'm assuming you want to merge these files as well.
import pandas as pd

dfs = []
for file in filtered_files:
    df = pd.read_csv(file, sep='\t')
    dfs.append(df)
df_master = pd.concat(dfs, axis=1)
filename = 'master_dataframe.csv'
df_master.to_csv(filename, index=False)
The saved file can then be opened in Excel.
Are you talking about how to get the filenames? You can use the glob library.
import glob
import os
import pandas as pd

file_paths = glob.glob('your-directory\\*.txt')
for file in file_paths:
    df = pd.read_csv(file, sep='\t')
    # name each output after its source file so the outputs don't overwrite each other
    out_name = os.path.splitext(os.path.basename(file))[0] + '.xlsx'
    df.to_excel('output-directory\\' + out_name, index=False)
Does this answer your question?
I have multiple Excel files in a folder ('folder_A') (file 1, 2, 3, etc.)
I want to import those files, do something with them (in pandas), and write each one to a CSV file in a different folder ('updated_folder_A').
I almost got it working, but for some reason it doesn't:
the files don't end up in 'updated_folder_A'. Can someone tell me what I'm doing wrong?
test.py:
import glob
import pandas as pd
files = glob.glob('folder_A/*.xlxs')
for file in files:
    df = pd.read_excel(file)
    df['Col1'] = df['Col1'] / 60
    df.to_csv('updated_{}'.format(file), index=False)
Expanding on @Anteino's answer, assuming your folder structure is like this:
Parent folder
    folder_A
        file1.xlsx
        file2.xlsx
    updated_folder_A
Then, if your script's inside Parent folder, this should work:
import glob
import os
import pandas as pd

files = glob.glob('folder_A/*.xlsx')  # note: .xlsx, not .xlxs
for file in files:
    df = pd.read_excel(file)
    df['Col1'] = df['Col1'] / 60
    name = os.path.basename(file)[:-5]  # strip the folder and the '.xlsx' extension
    df.to_csv('updated_folder_A/updated_{}.csv'.format(name), index=False)
Change the last line to:
df.to_csv('updated_folder_A/updated_{}'.format(os.path.basename(file)), index=False)
And make sure that folder exists too.
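One way to guarantee the folder exists is os.makedirs with exist_ok=True, which creates it and is a no-op if it is already there; a sketch using a throwaway base directory in place of your project folder:

```python
import os
import tempfile

# Create the output folder up front so to_csv has somewhere to write.
base = tempfile.mkdtemp()
out_dir = os.path.join(base, 'updated_folder_A')
os.makedirs(out_dir, exist_ok=True)  # no error if it already exists
print(os.path.isdir(out_dir))  # True
```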
I have a folder with lots of .txt files. How can I read all the files in the folder and get their content with pandas? I tried the following:
import pandas as pd
list_=pd.read_csv("/path/of/the/directory/*.txt",header=None)
print list_
Something like this:
import glob
l = [pd.read_csv(filename) for filename in glob.glob("/path/*.txt")]
df = pd.concat(l, axis=0)
You have to take the header into account; for example, if you want to ignore it, take a look at the skiprows option in read_csv.
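For instance, a sketch with an in-memory CSV standing in for a real file, showing both keeping the header and skipping it:

```python
import io
import pandas as pd

text = 'x,y\n1,2\n3,4\n'

with_header = pd.read_csv(io.StringIO(text))              # first row becomes the header
no_header = pd.read_csv(io.StringIO(text), skiprows=1,
                        header=None, names=['x', 'y'])    # skip it and name columns manually
print(list(with_header.columns))  # ['x', 'y']
print(len(no_header))             # 2
```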
I used this in my project for merging the csv files
import pandas as pd
import os

path = "path of the file"
files = [file for file in os.listdir(path) if not file.startswith('.')]
all_data = pd.DataFrame()
for file in files:
    current_data = pd.read_csv(path + "/" + file, encoding="ISO-8859-1")
    all_data = pd.concat([all_data, current_data])
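As a side note, concatenating inside the loop copies all_data on every pass; collecting the pieces in a list and calling pd.concat once at the end is the usual pattern. A sketch with two stand-in frames (in the real script each would come from pd.read_csv):

```python
import pandas as pd

# Stand-in frames; in the real script each would come from pd.read_csv.
frames = [pd.DataFrame({'a': [1]}), pd.DataFrame({'a': [2]})]

# Collect first, concatenate once at the end.
all_data = pd.concat(frames, ignore_index=True)
print(len(all_data))  # 2
```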