My Python code consolidates multiple Excel files from the folder "excel_Report"
into one master Excel file.
I have installed all the libraries: pyodbc, pandas, plyer, glob2.
But when I run the script, I get this error:
"NameError: name 'filenames' is not defined"
I don't know what is wrong with my code. Can you please help?
Thank you
import pyodbc
import pandas as pd
import os
from datetime import datetime
from plyer import notification
import glob
# getting excel files to be merged from the Desktop
path = "T:\excel_Report"
# read all the files with extension .xlsx i.e. excel
excel_files = glob.glob(path + "\*.xlsx")
print('File names:', filenames)
# empty data frame for the new output excel file with the merged excel files
outputxlsx = pd.DataFrame()
with xw.App(visible=False) as app:
    combined_wb = app.books.add()
    for excel_file in excel_files:
        wb = app.books.open(excel_file)
        for sheet in wb.sheets:
            sheet.copy(after=combined_wb.sheets[0])
        wb.close()
    #combined_wb.sheets[0].delete()
    combined_wb.save("T:/excel_Report/test.xlsx")
    combined_wb.close()
You have a typo.
You search for all files in the path that end with .xlsx and store the result in a variable named excel_files:
# read all the files with extension .xlsx i.e. excel
excel_files = glob.glob(path + "\*.xlsx")
yet you then try to access a variable called filenames, which was never defined:
print('File names:', filenames)
Print the variable you actually defined instead:
print('File names:', excel_files)
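For illustration, here is a minimal sketch of the file-listing step on its own. The helper name find_excel_files and the temporary demo folder are assumptions made for this example; they are not part of the original script. It builds the pattern with os.path.join instead of a hand-written backslash and prints the same variable that was assigned.

```python
import glob
import os
import tempfile


def find_excel_files(folder):
    """Return a sorted list of the .xlsx paths directly inside folder."""
    # os.path.join avoids hand-written backslashes such as "\*.xlsx"
    pattern = os.path.join(folder, "*.xlsx")
    return sorted(glob.glob(pattern))


# Demo on a throwaway folder (it stands in for T:\excel_Report)
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.xlsx", "b.xlsx", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    excel_files = find_excel_files(demo_dir)
    found_names = [os.path.basename(p) for p in excel_files]
    print("File names:", found_names)
```

Separately, the rest of the question's script uses the name xw without importing it. If that is meant to be xlwings (an assumption), the script would also need "import xlwings as xw" at the top, otherwise a second NameError is likely once the first one is fixed.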
Related
I am working on code that takes as input a zip file containing Excel files, extracts them into a folder, converts them to dataframes and loads all of these dataframes into a list. I would like to create a new folder, convert those dataframes to csv files and save them in that folder. The goal is to be able to download a folder of csv files as a zip file.
The main problem for me is making sure that every csv file has the name of the Excel file it originated from.
I'm adding my code: the first block contains the first part of the code, while the second block contains the part in which I have a problem.
Running this last part of the code, I get this error:
"XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf;data'"
%%capture
import os
import numpy as np
import pandas as pd
import glob
import os.path
!pip install xlrd==1.2.0
from google.colab import files
uploaded = files.upload()
%%capture
zipname = list(uploaded.keys())[0]
destination_path = 'files'
infolder = os.path.join('/content/', destination_path)
!unzip -o $zipname -d $destination_path
# Load an excel file, return events dataframe + file header dataframe
def load_xlsx(fullpath):
    return events, meta
tasks = [os.path.join(dp, fname) for dp, dn, filenames in os.walk(infolder) for fname in filenames if fname.lower().endswith('.xls')]
dfs = []
metas = []
for fname in tasks:
    df, meta = load_xlsx(fname)
    dfs.append(df)
    metas.append(meta)
newpath = 'csv2021'
if not os.path.exists(newpath):
    os.makedirs(newpath)
filepath = os.path.join('/content/files/', newpath)
for fname in tasks:
    filename = load_xlsx(fname)
    my_csv = filename.to_csv(os.path.join(filepath, filename), encoding="utf-8-sig" , sep = ';')
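One way to keep each CSV named after its source Excel file is to derive the output name from the input path rather than from the loaded data. The sketch below uses only the standard library; the helper name csv_path_for and the example paths are hypothetical and serve only to illustrate the idea.

```python
import os


def csv_path_for(excel_path, out_dir):
    """Build the CSV path that keeps the Excel file's base name."""
    base = os.path.basename(excel_path)    # e.g. 'report_01.xls'
    stem, _ext = os.path.splitext(base)    # ('report_01', '.xls')
    return os.path.join(out_dir, stem + ".csv")


# Hypothetical input paths, used only for illustration
tasks = ["/content/files/report_01.xls", "/content/files/sub/report_02.XLS"]
csv_paths = [csv_path_for(p, "csv2021") for p in tasks]
print(csv_paths)
```

In the loop from the question, load_xlsx returns a tuple (events, meta), so to_csv would have to be called on the events dataframe rather than on the tuple, for example df.to_csv(csv_path_for(fname, filepath), encoding="utf-8-sig", sep=';'). The output folder must exist before writing. Note also that the bytes b'\xef\xbb\xbf' in the error message are a UTF-8 byte order mark, which suggests that the file being opened is text (possibly an already exported CSV) rather than a real Excel workbook.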
I need to convert csv files into Excel files automatically. I am failing to name each Excel file after the corresponding csv file.
I saved the csv files as 'Trials_1', 'Trials_2', 'Trials_3', but with the code that I wrote Python raises an error and asks me for a csv file named 'Trials_4'. If I then rename the csv file 'Trials_1' to 'Trials_4', the program works and generates an Excel file named 'Trials_1'.
How can I correct my code?
'''
import csv
import openpyxl as xl
import os, os.path
directory=r'C:\\Users\\PycharmProjects\\input\\'
folder=r'C:\\Users\\PycharmProjects\\output\\'
for csv_file in os.listdir(directory):
    def csv_to_excel(csv_file, excel_file):
        csv_data=[]
        with open(os.path.join(directory, csv_file)) as file_obj:
            reader=csv.reader(file_obj)
            for row in reader:
                csv_data.append(row)
        workbook= xl.Workbook()
        sheet=workbook.active
        for row in csv_data:
            sheet.append(row)
        workbook.save(os.path.join(folder,excel_file))
if __name__=="__main__":
    m = sum(1 for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f)))
    new_name = "{}Trial_{}.csv".format(directory, m + 1)
    k = sum(1 for file in os.listdir(folder) if os.path.isfile(os.path.join(folder, file)))
    new_name_e = "{}Trial_{}.xlsx".format(folder, k + 1)
    csv_to_excel(new_name,new_name_e)
'''
Thanks.
Hi Annachiara welcome to StackOverflow,
I would modify the "csv_to_excel" function by using only pandas.
Before that you should install 'xlsxwriter' with:
pip install XlsxWriter
Then the function would be like this:
import pandas as pd
def csv_to_excel(csv_file,excel_file,csv_sep=';'):
    # read the csv file with pandas
    df=pd.read_csv(csv_file,sep=csv_sep)
    # create the excel file
    writer=pd.ExcelWriter(excel_file, engine='xlsxwriter')
    # copy the csv content (df) into the excel file
    df.to_excel(writer,index=False)
    # save the excel file (writer.save() was removed in pandas 2.0; close() writes and closes it)
    writer.close()
    # print what you converted for reference
    print(f'csv file {csv_file} saved as excel in {excel_file}')
Just make sure that the csv is read correctly: I added only the separator parameter, but you might want to add other parameters (such as parse_dates).
Then you can convert the list of csv files with a for loop (I used more steps than necessary to make it clearer):
import os
dir_in=r'C:\Users\PycharmProjects\input'
dir_out=r'C:\Users\PycharmProjects\output'
csvs_to_convert=os.listdir(dir_in)
for csv_file_in in csvs_to_convert:
    # remove extension from csv files
    file_name_no_extension=os.path.splitext(csv_file_in)[0]
    # add excel extension .xlsx
    excel_name_out=file_name_no_extension+'.xlsx'
    # write names with their directories
    complete_excel_name_out=os.path.join(dir_out,excel_name_out)
    complete_csv_name_in=os.path.join(dir_in,csv_file_in)
    # convert csv file to excel file
    csv_to_excel(complete_csv_name_in,complete_excel_name_out,csv_sep=';')
Each csv as a separate Excel file:
import glob
import pandas as pd
import os
csv_files = glob.glob('*.csv')
for filename in csv_files:
    # output file name: same base name, .xlsx extension
    excel_name = os.path.splitext(os.path.basename(filename))[0] + '.xlsx'
    df = pd.read_csv(filename)
    df.to_excel(excel_name, index=False)
All csv files in the same Excel file, each on its own sheet:
import glob
import pandas as pd
import os
# Create excel file
writer = pd.ExcelWriter('all_csv.xlsx')
csv_files = glob.glob('*.csv')
for filename in csv_files:
    sheet_name = os.path.splitext(os.path.basename(filename))[0]
    df = pd.read_csv(filename)
    # Append each csv as sheet
    df.to_excel(writer, sheet_name=sheet_name, index=False)
# writer.save() was removed in pandas 2.0; close() writes the file
writer.close()
Assuming you would like to keep the same structure of your code, I just fixed some technical issues in your code to make it work (please change the folders path to your own):
import csv
import openpyxl as xl
import glob, os, os.path
directory= 'input'
folder= '../output' # Since 'input' would be my cwd, need to step back a directory to reach 'output'
# Using your function, just passing different arguments for convenience.
def csv_to_excel(f_path, f_name):
    csv_data=[]
    with open(f_path, 'r', newline='') as file_obj:
        reader=csv.reader(file_obj)
        for row in reader:
            csv_data.append(row)
    workbook= xl.Workbook()
    sheet=workbook.active
    for row in csv_data:
        sheet.append(row)
    workbook.save(os.path.join(folder, f_name + ".xlsx"))
def main():
    os.chdir(directory) # Make the input directory your cwd
    # Search for all files with the csv extension and send each one to your function
    for file in glob.glob("*.csv"):
        f_path = os.path.join(os.getcwd(), file) # Absolute path to the file
        f_name = os.path.splitext(file)[0] # File name without the extension
        csv_to_excel(f_path, f_name)
if __name__=="__main__":
    main()
P.S.:
Please avoid defining a function inside a loop; a function only needs to be defined once.
I would like this script to open an Excel file and save it again as a .csv or .txt file. I'm pretty sure the problem is the iteration: I haven't coded it correctly to iterate over the contents of the folder. I am new to Python, and with the commented-out part I managed to get this code to successfully print the list of items in the folder. Can someone please advise what needs to be fixed?
My error is: raise XLRDError('Unsupported format, or corrupt file: ' + msg)
from xlrd import open_workbook
import csv
import glob
import os
import openpyxl
cwd= os.getcwd()
print (cwd)
FileList = glob.glob('*.xlsx')
#print(FileList)
for i in FileList:
    rb = open_workbook(i)
    wb = copy(rb)
    wb.save('new_document.csv')
I would just use:
import pandas as pd
import glob
import os
file_list = glob.glob('*.xlsx')
for file in file_list:
    # keep the base name and swap only the extension
    csv_name = os.path.splitext(os.path.basename(file))[0] + '.csv'
    pd.read_excel(file).to_csv(csv_name, index=False)
It appears that your error is related to the Excel files, not to your code. Check the following, in this order:
Check that your files aren't also open in Excel at the same time.
Check that your files aren't encrypted.
Check that your version of xlrd supports the files you are reading (xlrd 2.0 and later reads only .xls files, not .xlsx).
Any of these could have caused your error.
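As a rough aid for these checks, the first bytes of a file can be inspected to see what it really contains, regardless of its extension. The sketch below is an assumption-laden illustration: the function name sniff_excel_format and the demo files are made up for this example, and the labels it returns are only guesses based on well-known file signatures.

```python
import os
import tempfile

ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx files are ZIP archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls, also used by encrypted Office files
UTF8_BOM = b"\xef\xbb\xbf"                        # plain text saved with a byte order mark


def sniff_excel_format(path):
    """Guess what a file really is from its first bytes, ignoring the extension."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip-based (likely .xlsx)"
    if head.startswith(OLE2_MAGIC):
        return "OLE2 (likely .xls, or a password-protected workbook)"
    if head.startswith(UTF8_BOM):
        return "text with UTF-8 BOM (likely CSV)"
    return "unknown"


# Demo with two fake files that both carry an .xlsx extension
with tempfile.TemporaryDirectory() as demo_dir:
    samples = {"fake.xlsx": UTF8_BOM + b";data", "real.xlsx": ZIP_MAGIC + b"rest"}
    results = {}
    for name, payload in samples.items():
        path = os.path.join(demo_dir, name)
        with open(path, "wb") as fh:
            fh.write(payload)
        results[name] = sniff_excel_format(path)
print(results)
```

A file reported as text or unknown is unlikely to open with xlrd or pandas.read_excel, whatever its extension says.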
I have a folder JanuaryDataSentToResourcePro that contains multiple .xlsx files.
I want to iterate through the folder, convert all of them to .csv and keep the same file names.
For that I'm trying to use glob, but I get an error: TypeError: 'module' object is not callable
import glob
excel_files = glob('*xlsx*')
for excel in excel_files:
    out = excel.split('.')[0]+'.csv'
    df = pd.read_excel(r'''C:\Users\username\Documents\TestFolder\JanuaryDataSentToResourcePro\ResourceProDailyDataset_01_01_2018.xlsx''', 'ResourceProDailyDataset')
    df.to_csv(out)
I am new to python. Does it look right?
UPDATE:
import pandas as pd
import glob
excel_files = glob.glob("*.xlsx")
for excel in excel_files:
    out = excel.split('.')[0]+'.csv'
    df = pd.read_excel(excel, 'ResourceProDailyDataset')
    df.to_csv(out)
But it is still not converting the .xlsx files to .csv.
The glob module should be used like this:
import glob
f = glob.glob("*.xlsx")
glob is a module, not a function; glob.glob is the function you need to call.
========================================
import glob
import os
import pandas as pd
excel_files = glob.glob('C:/Users/username/Documents/TestFolder/JanuaryDataSentToResourcePro/*.xlsx') # assume the path
for excel in excel_files:
    out = os.path.splitext(excel)[0]+'.csv' # swap only the final extension
    df = pd.read_excel(excel) # if only the first sheet is needed.
    df.to_csv(out)
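The question's snippets build the output name with excel.split('.')[0], which cuts the string at the first dot anywhere in the path. That can silently produce the wrong file name when a folder or file name contains a dot. A small sketch of the difference, using a hypothetical Windows path chosen only to show the effect:

```python
from pathlib import PureWindowsPath

# Hypothetical path with dots in a folder name and in the file name
excel = PureWindowsPath(r"C:\Users\user.name\Reports\Dataset_01.01.2018.xlsx")

# Splitting on the first dot cuts the path off inside the user name
naive = str(excel).split(".")[0] + ".csv"

# with_suffix swaps only the final extension
safe = str(excel.with_suffix(".csv"))

print(naive)
print(safe)
```

With paths that contain no other dots, both approaches give the same result, which may be why the problem is easy to miss.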
I am relatively new to Python and Stack Overflow and am hoping someone can shed some light on my current problem. I have a Python script that takes Excel files (.xls and .xlsx) from one directory and converts them to .csv files in another directory. It works perfectly fine on my sample Excel files (consisting of 4 columns and 1 row for the purpose of testing), but when I run my script against a different directory that has Excel files that are a lot larger, I get an assertion error. I have attached my code and the error. Looking forward to some guidance on this problem. Thanks!
import os
import pandas as pd
source = "C:/.../TestFolder"
output = "C:/.../OutputCSV"
dir_list = os.listdir(source)
os.chdir(source)
for i in range(len(dir_list)):
    filename = dir_list[i]
    book = pd.ExcelFile(filename)
    #writing to csv
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        for i in range(len(book.sheet_names)):
            df = pd.read_excel(book, book.sheet_names[i])
            os.chdir(output)
            new_name = filename.split('.')[0] + str(book.sheet_names[i])+'.csv'
            df.to_csv(new_name, index = False)
            os.chdir(source)
print "New files: ", os.listdir(output)
Since you use Windows, consider the Jet/ACE SQL engine (Windows .dll files) to query Excel workbooks and export them to CSV files, bypassing the need to load and export with pandas dataframes.
Specifically, use pyodbc to make the ODBC connection to the Excel files, iterate through each sheet and export to csv files using a SELECT * INTO ... SQL action query. The openpyxl module is used to retrieve sheet names. The script below does not rely on relative paths, so it can be run from anywhere. It assumes that each Excel file has complete header columns (no missing cells in the used range of the top row).
import os
import pyodbc
from openpyxl import load_workbook
source = "C:/Path/To/TestFolder"
output = "C:/Path/To/OutputCSV"
dir_list = os.listdir(source)
for xlfile in dir_list:
    strfile = os.path.join(source, xlfile)
    if strfile.endswith('.xlsx') or strfile.endswith('.xls'):
        # CONNECT TO WORKBOOK
        conn = pyodbc.connect(r'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};' + \
                              'DBQ={};'.format(strfile), autocommit=True)
        # RETRIEVE WORKBOOK SHEETS (read_only replaces the old use_iterators flag;
        # note that openpyxl itself cannot open legacy .xls files)
        sheets = load_workbook(filename = strfile, read_only = True).sheetnames
        # ITERATIVELY EXPORT SHEETS TO CSV IN OUTPUT FOLDER
        for s in sheets:
            outfile = os.path.join(output, '{0}_{1}.csv'.format(xlfile.split('.')[0], s))
            if os.path.exists(outfile): os.remove(outfile)
            strSQL = " SELECT * " + \
                     " INTO [text;HDR=Yes;Database={0};CharacterSet=65001].[{1}]" + \
                     " FROM [{2}$]"
            # three values: output folder, csv file name, sheet name
            conn.execute(strSQL.format(output, os.path.basename(outfile), s))
        conn.close()
Note: this process creates a schema.ini file that is appended to on each iteration. It can be deleted afterwards.
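To make the structure of the export query easier to follow, here is a small sketch that only builds the SQL string and does not connect to anything. The helper name build_export_sql and the workbook and sheet names are hypothetical; the template mirrors the SELECT * INTO statement in the script above, which takes three values: the output folder, the csv file name and the sheet name.

```python
import os


def build_export_sql(output_dir, csv_name, sheet_name):
    """Fill the SELECT ... INTO template for one worksheet."""
    template = (
        "SELECT * "
        "INTO [text;HDR=Yes;Database={0};CharacterSet=65001].[{1}] "
        "FROM [{2}$]"
    )
    return template.format(output_dir, csv_name, sheet_name)


# Hypothetical output file for sheet 'Sheet1' of a workbook named Book1
outfile = os.path.join("C:/Path/To/OutputCSV", "Book1_Sheet1.csv")
sql = build_export_sql("C:/Path/To/OutputCSV", os.path.basename(outfile), "Sheet1")
print(sql)
```

Printing the statement before executing it can help when a sheet name or path contains characters that the driver rejects.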