I have multiple Excel files in a folder ('folder_A') (file 1, 2, 3, etc.).
I want to import those files, do something with them (in pandas) and write each one to a CSV file in a different folder ('updated_folder_A').
I almost got it working, but for some reason it doesn't:
the files don't end up in 'updated_folder_A'. Can someone tell me what I'm doing wrong?
test.py:
import glob
import pandas as pd

files = glob.glob('folder_A/*.xlxs')

for file in files:
    df = pd.read_excel(file)
    df['Col1'] = df['Col1'] / 60
    df.to_csv('updated_{}'.format(file), index=False)
Expanding on @Anteino's answer, assuming your folder structure is like this:
Parent folder
    folder_A
        file1.xlsx
        file2.xlsx
    updated_folder_A
Then, if your script is inside Parent folder, this should work:
import glob
import os
import pandas as pd

files = glob.glob('folder_A/*.xlsx')  # note: the pattern must be .xlsx, not .xlxs

for file in files:
    df = pd.read_excel(file)
    df['Col1'] = df['Col1'] / 60
    # strip the 'folder_A/' prefix and the '.xlsx' extension from the file name,
    # otherwise the output path would contain a nested 'folder_A' that doesn't exist
    name = os.path.basename(file)[:-5]
    df.to_csv('updated_folder_A/updated_{}.csv'.format(name), index=False)
Change the last line to:
df.to_csv('updated_folder_A/updated_{}'.format(file), index = False)
And make sure that folder exists too.
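If 'updated_folder_A' might not exist yet, you can create it up front instead of checking by hand; a minimal sketch (folder name taken from the question):

```python
import os

out_dir = 'updated_folder_A'
os.makedirs(out_dir, exist_ok=True)  # creates the folder; does nothing if it already exists

# now writing 'updated_folder_A/updated_file1.csv' etc. will not fail
print(os.path.isdir(out_dir))
```

With `exist_ok=True` this is safe to run on every invocation of the script.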
I have 7 vcf files present in 2 directories:
dir
I want to concatenate all files present in both folders and then read them through Python.
I am trying this code:
# Import Modules
import os
import pandas as pd
import vcf

# Folder Path
path1 = "C://Users//USER//Desktop//Anas/VCFs_1/"
path2 = "C://Users//USER//Desktop//Anas/VCFs_2/"
#os.chdir(path1)

def read(f1,f2):
    reader = vcf.Reader(open(f1,f2))
    df = pd.DataFrame([vars(r) for r in reader])
    out = df.merge(pd.DataFrame(df.INFO.tolist()),
                   left_index=True, right_index=True)
    return out

# Read text File
def read_text_file(file_path1,file_path2):
    with open(file_path1, 'r') as f:
        with open(file_path2,'r') as f:
            print(read(path1,path2))

# iterate through all file
for file in os.listdir():
    # Check whether file is in text format or not
    if file.endswith(".vcf"):
        file_path1 = f"{path1}\{file}"
        file_path2 = f"{path2}\{file}"
        print(file_path1,"\n\n",file_path2)
        # call read text file function
        #data = read_text_file(path1,path2)
        print(read_text_file(path1,path2))
But it's giving me a PermissionError. I know we get this error when we try to read folders instead of files. But how can I read the files present in those folders? Any suggestions?
You may need to run your Python code with Administrator privileges, if you are trying to access another user's files.
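It is also worth checking the paths themselves: in the posted loop, read_text_file(path1, path2) passes the folder paths, not the file paths, to open(), and opening a directory is what typically raises PermissionError on Windows. A stdlib-only sketch (temporary folders stand in for the real VCF directories) of building full file paths with os.path.join and opening only files:

```python
import os
import tempfile

# Hypothetical stand-ins for the two VCF folders from the question.
dir1 = tempfile.mkdtemp()
dir2 = tempfile.mkdtemp()
for d in (dir1, dir2):
    with open(os.path.join(d, 'sample.vcf'), 'w') as f:
        f.write('##fileformat=VCFv4.2\n')

# Build full file paths with os.path.join instead of f"{path}\{file}",
# and collect files only, never the directories themselves.
paths = []
for d in (dir1, dir2):
    for name in os.listdir(d):
        if name.endswith('.vcf'):
            paths.append(os.path.join(d, name))

print(len(paths))  # one file per folder in this sketch
```

os.path.join also avoids the mixed `//` and `\` separators in the original paths.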
I have tens of tab-delimited text files in my local directory. When I copy and paste a text file into an Excel sheet, it becomes a file with hundreds of columns. Now, I would like to read all the text files and convert them to corresponding Excel files.
If there was a single file, I would have done the following way:
import pandas as pd
df = pd.read_csv("H:\\Yugeen\\text1.txt", sep='\t')
df.to_excel('H:\\Yugeen\\output1.xlsx', 'Sheet1', index = False)
Is there any way to achieve the solution I am looking for?
I use this function to list all files in a directory, along with their file path:
import os

def list_files_in_directory(path):
    '''Return every file under path, together with its path.'''
    x = []
    for root, dirs, files in os.walk(path):
        for file in files:
            x.append(os.path.join(root, file))
    return x
Selecting for only text files:
files = list_files_in_directory('.')
filtered_files = [i for i in files if i.endswith('.txt')]
Like Sophia demonstrated, you can use pandas to create a dataframe. I'm assuming you want to merge these files as well.
import pandas as pd

dfs = []
for file in filtered_files:
    df = pd.read_csv(file, sep='\t')
    dfs.append(df)

df_master = pd.concat(dfs, axis=1)
filename = 'master_dataframe.csv'
df_master.to_csv(filename, index=False)
The saved file can then be opened in Excel.
Are you talking about how to get the filenames? You can use the glob library.
import glob
import os
import pandas as pd

file_paths = glob.glob('your-directory\\*.txt')

for file in file_paths:
    df = pd.read_csv(file, sep='\t')
    # derive the output name from the input file so each file gets its own workbook
    name = os.path.splitext(os.path.basename(file))[0]
    df.to_excel('output-directory\\{}.xlsx'.format(name), index=False)
Does this answer your question?
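The output-name bookkeeping can also be done with pathlib; a small sketch (the directory names are placeholders, and with real files you would get `inputs` from `Path('your-directory').glob('*.txt')`) showing how each .txt maps to its own .xlsx name:

```python
from pathlib import Path

# Hypothetical input names standing in for the globbed files.
inputs = [Path('your-directory/text1.txt'), Path('your-directory/text2.txt')]

out_dir = Path('output-directory')
# with_suffix swaps the extension; .name drops the input directory
outputs = [out_dir / p.with_suffix('.xlsx').name for p in inputs]
print([o.name for o in outputs])  # ['text1.xlsx', 'text2.xlsx']
```

Each output path can then be handed straight to `df.to_excel()`.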
I'm almost done merging Excel files with pandas in Python, but when I give the path it won't work. I get the error "No such file or directory: 'file1.xlsx'". When I leave the path empty it works, but I want to decide which folder it takes files from. The files are saved in the folder 'excel'.
cwd = os.path.abspath('/Users/Viktor/downloads/excel') #If I leave it empty and have files in /Viktor it works but I have the desired excel files in /excel
print(cwd)
files = os.listdir(cwd)
df = pd.DataFrame()

for file in files:
    if file.endswith('.xlsx'):
        df = df.append(pd.read_excel(file), ignore_index=True)

df.head()
df.to_excel(r'/Users/Viktor/Downloads/excel/resultat/merged.xlsx')
pd.read_excel(file) looks for the file relative to the directory where the script is executed. You already have the absolute folder path in cwd, so join it with each file name:
import os
import pandas as pd

cwd = os.path.abspath('/Users/Viktor/downloads/excel')
files = os.listdir(cwd)
df = pd.DataFrame()

for file in files:
    if file.endswith('.xlsx'):
        # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
        df = df.append(pd.read_excel(os.path.join(cwd, file)), ignore_index=True)

df.head()
df.to_excel(r'/Users/Viktor/downloads/excel/resultat/merged.xlsx')
How about actually changing the current working directory with
os.chdir(cwd)
Just printing the path doesn't help.
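A minimal sketch of that idea (a temporary directory stands in for the real Excel folder):

```python
import os
import tempfile

target = tempfile.mkdtemp()  # stand-in for /Users/Viktor/downloads/excel
with open(os.path.join(target, 'file1.xlsx'), 'w') as f:
    f.write('')  # placeholder file, not a real workbook

os.chdir(target)  # after this, bare filenames resolve inside the target folder
print(os.path.exists('file1.xlsx'))
```

After the chdir, `pd.read_excel(file)` with a bare filename would find the files, at the cost of changing where every other relative path in the script resolves.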
Use pathlib
Path.glob() to find all the files
Use Path.rglob() if you want to include subdirectories
Use pandas.concat to combine the dataframes created with the pd.read_excel in the list comprehension
from pathlib import Path
import pandas as pd

# path to files
p = Path('/Users/Viktor/downloads/excel')

# find the xlsx files
files = p.glob('*.xlsx')

# create the dataframe; ignore_index belongs to concat, not read_excel
df = pd.concat([pd.read_excel(file) for file in files], ignore_index=True)

# save the file
df.to_excel(r'/Users/Viktor/downloads/excel/resultat/merged.xlsx')
I have several CSV files in a folder that I need to read, doing the same thing to each file. I want to name each dataframe after the file it was created from, but am not sure how. Could I store the file names in a list and then refer to them later somehow? My current code is below. Thank you in advance.
import os
import pandas as pd

Path = r"C:\Users\DATA"
filelist = os.listdir(Path)

for x in filelist:
    RawData = pd.read_csv(r"C:\Users\DATA\%s" % x)
What if you keep them all in a single dictionary, keyed by file name?
import os
import pandas as pd

path = r"C:\Users\DATA"
raw_data = {i: pd.read_csv(os.path.join(path, i)) for i in os.listdir(path)}
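The same pattern works with any reader; a stdlib-only sketch (the temporary folder and file names are made up) showing how the dict lets you refer to each file's contents by name later:

```python
import os
import tempfile

path = tempfile.mkdtemp()  # stand-in for C:\Users\DATA
for name in ('a.csv', 'b.csv'):
    with open(os.path.join(path, name), 'w') as f:
        f.write('col1\n1\n')

# one dict entry per file, keyed by the file name
raw_data = {}
for i in os.listdir(path):
    with open(os.path.join(path, i)) as f:
        raw_data[i] = f.read()

print(sorted(raw_data))  # ['a.csv', 'b.csv']
```

With pandas, each value would be the dataframe for that file, so `raw_data['a.csv']` replaces a per-file variable name.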
I have a folder with lots of .txt files. How can I read all the files in the folder and get their contents with pandas? I tried the following:
import pandas as pd
list_=pd.read_csv("/path/of/the/directory/*.txt",header=None)
print list_
Something like this:
import glob
import pandas as pd

l = [pd.read_csv(filename) for filename in glob.glob("/path/*.txt")]
df = pd.concat(l, axis=0)
You have to take the header into account; for example, if you want to ignore it, take a look at the skiprows option in read_csv.
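For instance, when every file carries its own header line, one way is to let read_csv parse it in each file and renumber the rows with ignore_index; a sketch with temporary stand-in files:

```python
import glob
import os
import tempfile

import pandas as pd

# Two throwaway tab-separated files with identical headers.
tmp = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    with open(os.path.join(tmp, name), 'w') as f:
        f.write('col1\tcol2\n1\t2\n')

# read_csv consumes each header; ignore_index gives the combined frame a fresh index
l = [pd.read_csv(f, sep='\t') for f in sorted(glob.glob(os.path.join(tmp, '*.txt')))]
df = pd.concat(l, axis=0, ignore_index=True)
print(df.shape)  # (2, 2)
```

If the files instead had no header, `pd.read_csv(f, sep='\t', header=None)` (optionally with `skiprows`) would be the variant to reach for.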
I used this in my project for merging the csv files
import pandas as pd
import os

path = "path of the file"
files = [file for file in os.listdir(path) if not file.startswith('.')]
all_data = pd.DataFrame()

for file in files:
    current_data = pd.read_csv(path + "/" + file, encoding="ISO-8859-1")
    all_data = pd.concat([all_data, current_data])