Convert multiple csv files to Excel files? - python

In Python 2.7, I have a lot of csv files I want to convert to Excel.
The names of the csv files are abcd1.csv, abcd2.csv and so on.
I want to convert them to abcd1.xls, abcd2.xls and so on.
While I am able to do it on one file, I don't know how to do it on multiple files.
This is the code I have used so far:
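from openpyxl import Workbook
import csv

wb = Workbook()
ws = wb.active
# count3 comes from the surrounding while loop
file_name = "COUNT16_DISTRIBUTION" + str(count3*1) + ".csv"
with open(file_name, 'r') as f:
    for row in csv.reader(f):
        ws.append(row)
# Workbook.save() needs a target filename; openpyxl writes .xlsx, not .xls
wb.save(file_name.replace(".csv", ".xlsx"))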
The file_name can be built inside a while loop so I can go through each csv file, but I don't know how to save them as .xls.

Here is an example with pandas:
import pandas as pd
import os

# Create a function that converts CSV to Excel
def csv2excel(filepath, sep=','):
    df = pd.read_csv(filepath, sep=sep)
    newpath = os.path.splitext(filepath)[0] + '.xlsx'
    df.to_excel(newpath, index=False)

# Loop through the files and call the function
for f in os.listdir('.'):
    if f.endswith('.csv') and f.startswith('abcd'):
        csv2excel(f)
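If you would rather stay with openpyxl, as in the question, the single-file code only needs to move into a function that is called once per file. A minimal sketch, assuming Python 3 and files matching abcd*.csv in the working directory; the names csv_to_xlsx and convert_all are hypothetical. Note that openpyxl writes .xlsx, not the legacy .xls format, so the output files get an .xlsx extension.

```python
import csv
import glob
import os

from openpyxl import Workbook


def csv_to_xlsx(csv_path):
    """Copy every row of one CSV file into a new .xlsx file next to it."""
    wb = Workbook()
    ws = wb.active
    # newline="" is what the csv module expects on Python 3
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            ws.append(row)  # values are written as strings, as read from the CSV
    xlsx_path = os.path.splitext(csv_path)[0] + ".xlsx"
    wb.save(xlsx_path)
    return xlsx_path


def convert_all(pattern="abcd*.csv"):
    """Convert every CSV matching the glob pattern and return the new paths."""
    return [csv_to_xlsx(path) for path in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    for out in convert_all():
        print("wrote", out)
```

Unlike the pandas version, this keeps every value as text; pandas infers numeric columns before writing.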

Related

How to copy multiple .xlsx files into a respective .csv file?

I have 24 Excel files, and I'm aiming to copy the .xlsx data into their respective 24 .csv files. I have copied the data over, however it's creating 10 copies in each .csv file; I believe it has something to do with the for loops. I've tried using writerow() rather than writerows(), but that doesn't help. I'm trying to understand openpyxl and its writer and reader objects.
import openpyxl, os, csv
from pathlib import Path

for excelFile in os.listdir('./excelspreadsheets'):
    if excelFile.endswith('.xlsx'):  # Skip non xlsx files, load the workbook object
        wb = openpyxl.load_workbook('./excelspreadsheets/' + excelFile)
        for sheetName in wb.sheetnames:
            # Loop through every sheet in the workbook
            sheet = wb[sheetName]
            sheetTitle = sheet.title
            # Create the CSV filename from the Excel filename and sheet title
            p = Path(excelFile)
            excelFileStemName = p.stem
            CsvFilename = excelFileStemName + '_' + sheetTitle + '.csv'
            # Create the csv.writer object for this CSV file
            print(f'Creating filename {CsvFilename}...')
            outputFile = open(CsvFilename, 'w', newline='')
            outputWriter = csv.writer(outputFile)
            # Create reader object for each excel sheet
            fileObj = open('./excelspreadsheets/' + excelFile)
            fileReaderObj = csv.reader(fileObj)
            # Loop through every row in the excel sheet
            for rowNum in range(1, sheet.max_row + 1):
                rowData = []  # append each cell to this list
                # Loop through each cell in the row
                for colNum in range(1, sheet.max_column + 1):
                    rowData.append(sheet.values)
            # write the rowData list to the CSV file.
            for row in rowData:
                outputWriter.writerows(row)
            outputFile.close()
So, each of the newly created .csv files writes the correct data but does it 10 times, rather than once.
Appreciate any feedback thanks.
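The repeated output most likely comes from rowData.append(sheet.values): sheet.values yields every row of the sheet, not a single cell, and it is appended once per column, so writerows then writes the entire sheet once for each of those entries (a 10-column sheet would give 10 copies). The csv.reader opened on the .xlsx file is not needed either, since openpyxl already reads the sheet. A minimal openpyxl-only sketch, assuming the same folder layout and naming scheme as the question; the function name export_sheets and its parameters are hypothetical:

```python
import csv
import os
from pathlib import Path

import openpyxl


def export_sheets(excel_dir="./excelspreadsheets", out_dir="."):
    """Write every sheet of every .xlsx file in excel_dir to its own CSV."""
    written = []
    for excel_file in sorted(os.listdir(excel_dir)):
        if not excel_file.endswith(".xlsx"):
            continue  # skip non-xlsx files
        wb = openpyxl.load_workbook(os.path.join(excel_dir, excel_file))
        for sheet in wb.worksheets:
            csv_name = f"{Path(excel_file).stem}_{sheet.title}.csv"
            csv_path = os.path.join(out_dir, csv_name)
            with open(csv_path, "w", newline="") as out:
                writer = csv.writer(out)
                # iter_rows(values_only=True) yields one tuple of cell values
                # per row, so each row is written exactly once
                for row in sheet.iter_rows(values_only=True):
                    writer.writerow(row)
            written.append(csv_path)
    return written
```

Using a with block for each output file also guarantees the CSV is closed even if a sheet fails to export.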
You can use read_excel and to_csv, which come with pandas, to read the Excel file and write the data to a CSV file. It is simpler from a coding perspective, as each read and each write is done in one line. pandas also uses openpyxl underneath to read .xlsx files. The updated code is below.
import os
from pathlib import Path
import pandas as pd

for excelFile in os.listdir('./excelspreadsheets'):
    if excelFile.endswith('.xlsx'):  # Skip non-xlsx files
        xls = pd.ExcelFile('./excelspreadsheets/' + excelFile)
        for sheetname in xls.sheet_names:
            # Read each sheet into df
            df = pd.read_excel(xls, sheet_name=sheetname)
            # Remove .xlsx from the filename and create the CSV name.
            # Path.stem drops the suffix; str.rstrip('.xlsx') would strip any
            # trailing '.', 'x', 'l', 's' characters ('sales.xlsx' -> 'sale')
            CsvFilename = Path(excelFile).stem + '_' + sheetname + '.csv'
            print(f'Creating filename {CsvFilename}...')
            # Write df as CSV to file
            df.to_csv(CsvFilename, index=False)
Let me know if you see any errors...
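pandas can also read every sheet in one call: with sheet_name=None, read_excel returns a dictionary mapping each sheet name to its DataFrame, so the separate ExcelFile object is not needed. A minimal sketch, assuming the same folder and naming scheme as above; excel_dir_to_csv and its parameters are hypothetical names:

```python
from pathlib import Path

import pandas as pd


def excel_dir_to_csv(excel_dir="./excelspreadsheets", out_dir="."):
    """Export every sheet of every .xlsx in excel_dir as <stem>_<sheet>.csv."""
    written = []
    for excel_path in sorted(Path(excel_dir).glob("*.xlsx")):
        # sheet_name=None makes read_excel return {sheet name: DataFrame}
        sheets = pd.read_excel(excel_path, sheet_name=None)
        for sheet_name, df in sheets.items():
            # Path.stem drops only the final ".xlsx" suffix
            csv_path = Path(out_dir) / f"{excel_path.stem}_{sheet_name}.csv"
            df.to_csv(csv_path, index=False)
            written.append(csv_path)
    return written
```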

How to combine multi excel workbook into single workbook with multiple worksheets

I have 3 workbooks, each with a single sheet. I need to combine them into a single workbook with 3 sheets.
I tried the code below:
import glob
import os
import pandas as pd
from pandas import ExcelWriter

writer = ExcelWriter("Sample.xlsx")
for filename in glob.glob("*.xlsx"):
    df_excel = pd.read_excel(filename, engine='openpyxl')
    (_, f_name) = os.path.split(filename)
    (f_short_name, _) = os.path.splitext(f_name)
    df_excel.to_excel(writer, sheet_name=f_short_name, index=False)
writer.close()  # older pandas used writer.save(), which was removed in pandas 2.0
I got an error like "File is not a zip file".
"Sample.xlsx" is created in the same directory as the input workbooks, and it is created before you look for all files with glob.glob("*.xlsx"). The glob therefore also returns "Sample.xlsx", your own output file, which is not yet a valid workbook, so reading it fails.
Make sure to iterate only over the real input workbooks, e.g. like this:
import pandas as pd
from pandas import ExcelWriter
import glob
import os

writer = ExcelWriter("Sample.xlsx")
input_workbooks = glob.glob("*.xlsx")
input_workbooks.remove("Sample.xlsx")  # skip the output workbook itself
for filename in input_workbooks:
    df_excel = pd.read_excel(filename, engine='openpyxl')
    (_, f_name) = os.path.split(filename)
    (f_short_name, _) = os.path.splitext(f_name)
    df_excel.to_excel(writer, sheet_name=f_short_name, index=False)
writer.close()  # writer.save() in pandas versions before 2.0
It would be better to save the output workbook ("Sample.xlsx") to another directory to avoid this confusion. When you do that, "Sample.xlsx" is no longer in the list, so delete the line input_workbooks.remove("Sample.xlsx"); otherwise it raises a ValueError.
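Writing the output to a separate directory could look like the sketch below. It assumes the inputs live in one folder and the output path points somewhere else; combine_workbooks is a hypothetical name. The with block closes (and saves) the writer automatically.

```python
import glob
import os

import pandas as pd


def combine_workbooks(input_dir, output_path):
    """Copy the first sheet of each .xlsx in input_dir into one workbook,
    one sheet per input file, named after that file."""
    # output_path is assumed to be outside input_dir, so the glob
    # can never pick up the output workbook
    input_workbooks = sorted(glob.glob(os.path.join(input_dir, "*.xlsx")))
    with pd.ExcelWriter(output_path, engine="openpyxl") as writer:
        for filename in input_workbooks:
            df = pd.read_excel(filename, engine="openpyxl")  # first sheet only
            short_name = os.path.splitext(os.path.basename(filename))[0]
            # Excel limits sheet names to 31 characters
            df.to_excel(writer, sheet_name=short_name[:31], index=False)
    return input_workbooks
```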

Python Pandas csv files to Excel worksheets - Cleanup

I want to take multiple .csv files and convert them to Excel worksheets in one workbook, specifically using Pandas.
I finally got this to work, but I know the code itself is poorly written.
Any suggestions on how to clean this up?
"Beautiful is better than ugly."
Here is the code:
import pandas as pd
import os
import openpyxl as xl

directory = os.path.join(os.curdir, "data/")
new_xl_file_path = "csv_merge.xlsx"
new_xl_file = xl.Workbook()  # Create a new Excel workbook
new_xl_file.save(new_xl_file_path)
name_list = os.listdir(directory)  # file1.csv, file2.csv, file3.csv, etc...
full_path_list = []  # For reading with pd.read_csv()
data_frame_list = []  # List to save .csv dataframes
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)  # Get full path name
    df = pd.read_csv(f)
    data_frame_list.append(df)
counter = 0
with pd.ExcelWriter(new_xl_file_path) as writer:
    for dataframe in data_frame_list:
        dataframe.to_excel(writer, index=False, sheet_name=name_list[counter])
        counter += 1
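One possible cleanup, sketched under the assumption that every .csv file in the folder should become one sheet: pd.ExcelWriter creates the workbook itself, so the separate openpyxl Workbook().save() step, the unused full_path_list, and the manual counter can go. Iterating once over the paths keeps each DataFrame paired with its own name, and Path.stem drops ".csv" from the sheet name. The function name csvs_to_workbook is hypothetical.

```python
from pathlib import Path

import pandas as pd


def csvs_to_workbook(csv_dir="data", xlsx_path="csv_merge.xlsx"):
    """Write each CSV in csv_dir to its own worksheet of one workbook."""
    csv_paths = sorted(Path(csv_dir).glob("*.csv"))
    with pd.ExcelWriter(xlsx_path) as writer:
        for csv_path in csv_paths:
            df = pd.read_csv(csv_path)
            # file name without ".csv"; Excel caps sheet names at 31 characters
            df.to_excel(writer, sheet_name=csv_path.stem[:31], index=False)
    return [p.stem for p in csv_paths]
```

Sorting the paths gives a predictable sheet order, whereas os.listdir returns files in arbitrary order.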

How to work around limitations when converting text files to Excel files?

I want to convert text files to Excel .xls files using Python. I used an existing Python script that I found on this website, but it reports that more than 65,536 rows are not supported by the .xls format. Can anyone help me with this?
Here is the existing Python script for converting a text file to an Excel file:
from os import listdir
from os.path import isfile, join
import xlwt

mypath = 'S://Input'

def is_number(s):
    # helper the script relies on: True if the string parses as a float
    try:
        float(s)
        return True
    except ValueError:
        return False

textfiles = [join(mypath, f) for f in listdir(mypath)
             if isfile(join(mypath, f)) and '.txt' in f]
style = xlwt.XFStyle()
style.num_format_str = '#,###0.00'
for textfile in textfiles:
    row_list = []
    with open(textfile, 'r') as f:
        for row in f:
            row_list.append(row.split(' '))
    column_list = zip(*row_list)
    workbook = xlwt.Workbook()
    worksheet = workbook.add_sheet('Sheet1')
    i = 0
    for column in column_list:
        for item in range(len(column)):
            value = column[item].strip()
            if is_number(value):
                worksheet.write(item, i, float(value), style=style)
            else:
                worksheet.write(item, i, value)
        i += 1
    workbook.save(textfile.replace('.txt', '.xls'))
See this question for reading data from a txt file into a pandas DataFrame.
You can use the following code to read your file into a DataFrame:
import pandas as pd
data = pd.read_csv(FullFilePath, sep="\t", header=None)
data.columns = ["a", "b", "c", "etc."]
Then, once the data is in a DataFrame, use the to_excel method:
data.to_excel(FullOutputPath, index=False)
Give FullOutputPath an .xlsx extension: an .xlsx sheet holds up to 1,048,576 rows, while .xls stops at 65,536 (and pandas 2.0 and later can no longer write .xls at all).
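Put together for a whole folder, the pandas approach replaces both the column-by-column loop and the is_number() helper, because read_csv already parses numeric-looking columns as numbers. A minimal sketch, assuming whitespace-separated text files without a header row (the sep=r"\s+" choice is an assumption; use "\t" for tab-separated files); text_files_to_xlsx is a hypothetical name.

```python
from pathlib import Path

import pandas as pd


def text_files_to_xlsx(input_dir, sep=r"\s+"):
    """Convert every .txt file in input_dir to an .xlsx file next to it."""
    written = []
    for txt_path in sorted(Path(input_dir).glob("*.txt")):
        # header=None: the text files are assumed to have no header row
        df = pd.read_csv(txt_path, sep=sep, header=None)
        out_path = txt_path.with_suffix(".xlsx")
        # header=False keeps pandas from writing its 0, 1, 2... column labels
        df.to_excel(out_path, index=False, header=False)
        written.append(out_path)
    return written
```

For example, text_files_to_xlsx('S://Input') would write one .xlsx per .txt file in that folder.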

Converting multiple xls files to xlsx- issues with scaling up from single file

We have a few thousand xls files, with dozens of sheets in each file. We are working on a larger project to combine the files and sheets, but first need to convert them to xlsx.
The following code works fine on a single file:
import xlrd
from openpyxl.workbook import Workbook as openpyxlWorkbook

xlsBook = xlrd.open_workbook("C://path")
workbook = openpyxlWorkbook()
for i in range(0, xlsBook.nsheets):
    xlsSheet = xlsBook.sheet_by_index(i)
    sheet = workbook.active if i == 0 else workbook.create_sheet()
    sheet.title = xlsSheet.name
    for row in range(0, xlsSheet.nrows):
        for col in range(0, xlsSheet.ncols):
            sheet.cell(row=row+1, column=col+1).value = xlsSheet.cell_value(row, col)
workbook.save("C://path/workbook.xlsx")
This works perfectly.
When attempting to loop through all files, we use:
import xlrd
from openpyxl.workbook import Workbook as openpyxlWorkbook
import glob
import pandas as pd
from pandas import ExcelWriter
import os

path = "C://path"
path2 = "C://path2"
allFiles = glob.glob(path + "/*.xls")
for file_ in allFiles:
    xlsBook = xlrd.open_workbook(file_)
    workbook = openpyxlWorkbook()
    for i in range(0, xlsBook.nsheets):
        xlsSheet = xlsBook.sheet_by_index(i)
        sheet = workbook.active if i == 0 else workbook.create_sheet()
        sheet.title = xlsSheet.name
        for row in range(0, xlsSheet.nrows):
            for col in range(0, xlsSheet.ncols):
                sheet.cell(row=row+1, column=col+1).value = xlsSheet.cell_value(row, col)
    ##workbook.save(os.path.join(path2,file_))
    ##workbook.to_excel(os.path.join(path2,file_))
workbook.save("C://path/workbook.xlsx")
For the first two commented-out save methods, workbook.save seems to do absolutely nothing, and to_excel tells me that workbook has no attribute to_excel. Is that because I didn't call pandas in the loop?
The final workbook.save was a test: I assumed it would save the final iteration of the loop correctly, since it worked in the script with just one file.
Instead, it creates the file, with all of the worksheets correctly named, but no data in any of the worksheets.
Any idea what I am missing? To be clear, I want each file to keep its original filename at the end of the loop, with a valid .xlsx extension.
I'd try this way instead. The code is simpler, and it worked when I tested it.
import glob
import os
import pandas as pd

def converter(filename):
    xl = pd.ExcelFile(filename)  # reads the file in (.xls needs xlrd installed)
    sheet_names = xl.sheet_names  # gets the sheet names of the file
    sheets_dict = {}  # dictionary with sheet_names as keys and data as values
    for sheet in sheet_names:
        sheets_dict[sheet] = xl.parse(sheet)
    # keep only the file name, swap .xls for .xlsx
    stem = os.path.splitext(os.path.basename(filename))[0]
    out_path = os.path.join(r'C:\Users\you\Desktop', stem + '.xlsx')
    with pd.ExcelWriter(out_path) as writer:  # saved when the block exits
        for sheet_name, data in sheets_dict.items():
            data.to_excel(writer, sheet_name=sheet_name, index=False)

files = glob.glob(os.path.join(r'C:\Users\you\Desktop', '*.xls'))
for file in files:
    converter(file)
Edit: I'm not too familiar with openpyxl, but its Workbook has no .to_excel method; to_excel belongs to pandas DataFrames. I think you were creating an openpyxl workbook and then trying to save it with a pandas method.
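If you want to keep the xlrd/openpyxl loop from the question, the commented-out save is the likely culprit: glob returns full paths, and on Windows os.path.join(path2, file_) discards path2 when file_ already starts with a drive, so that save would most likely write back into the source folder under the old .xls name instead of into path2. A minimal sketch that builds the output name from the base name only; xlsx_path_for and convert_xls are hypothetical helper names, and xlrd is assumed to be version 2.0 or later, which still reads .xls files.

```python
import os

from openpyxl import Workbook


def xlsx_path_for(xls_path, out_dir):
    """Build <out_dir>/<original name>.xlsx from a full .xls path."""
    stem = os.path.splitext(os.path.basename(xls_path))[0]
    return os.path.join(out_dir, stem + ".xlsx")


def convert_xls(xls_path, out_dir):
    """Copy every sheet's cell values from an .xls file into a new .xlsx file."""
    import xlrd  # only needed for the conversion itself

    xls_book = xlrd.open_workbook(xls_path)
    workbook = Workbook()
    for i in range(xls_book.nsheets):
        xls_sheet = xls_book.sheet_by_index(i)
        sheet = workbook.active if i == 0 else workbook.create_sheet()
        sheet.title = xls_sheet.name
        for row in range(xls_sheet.nrows):
            for col in range(xls_sheet.ncols):
                sheet.cell(row=row + 1, column=col + 1).value = xls_sheet.cell_value(row, col)
    out_path = xlsx_path_for(xls_path, out_dir)
    workbook.save(out_path)  # one save per input file
    return out_path
```

Calling convert_xls(file_, path2) inside the question's for loop would then write each file to path2 under its original name with an .xlsx extension. Only cell values are copied; formatting is not.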
