python: creating excel workbook and dumping csv files as worksheets

python: creating excel workbook and dumping csv files as worksheets - python

I have few csv files which I would like to dump as new worksheets in a excel workbook(xls/xlsx).
How do I achieve this?
Googled and found 'pyXLwriter' but it seems the project was stopped.
While Im trying out 'pyXLwriter' would like to know are there any alternatives/suggestions/modules?
Many Thanks.
[Edit]
Here is my solution: (anyone has much leaner, much pythonic solution? do comment. thx)
import glob
import csv
import xlwt
import os
wb = xlwt.Workbook()
for filename in glob.glob("c:/xxx/*.csv"):
(f_path, f_name) = os.path.split(filename)
(f_short_name, f_extension) = os.path.splitext(f_name)
ws = wb.add_sheet(str(f_short_name))
spamReader = csv.reader(open(filename, 'rb'), delimiter=',',quotechar='"')
row_count = 0
for row in spamReader:
for col in range(len(row)):
ws.write(row_count,col,row[col])
row_count +=1
wb.save("c:/xxx/compiled.xls")
print "Done"

Not sure what you mean by "much leaner, much pythonic" but you certainly could spruce it up a bit:
import glob, csv, xlwt, os
wb = xlwt.Workbook()
for filename in glob.glob("c:/xxx/*.csv"):
(f_path, f_name) = os.path.split(filename)
(f_short_name, f_extension) = os.path.splitext(f_name)
ws = wb.add_sheet(f_short_name)
spamReader = csv.reader(open(filename, 'rb'))
for rowx, row in enumerate(spamReader):
for colx, value in enumerate(row):
ws.write(rowx, colx, value)
wb.save("c:/xxx/compiled.xls")

You'll find all you need in this xlwt tutorial. This libraries (xlrd and xlwt) are the most popular choices for managing Excel interaction in Python. The downside is that, at the moment, they only support Excel binary format (.xls).

Use xlsxwriter to create and write in a excel file in python.
Install it by : pip install xlsxwriter
import xlsxwriter
# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()
# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)
# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})
# Write some simple text.
worksheet.write('A1', 'Hello')
# Text with formatting.
worksheet.write('A2', 'World', bold)
# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)
# Insert an image.
worksheet.insert_image('B5', 'logo.png')
workbook.close()

I always just write the Office 2003 XML format through strings. It's quite easy to do and much easier to manage than writing and zipping up what constitutes a xlsx document. It also doesn't require any external libraries. (though one could easily roll their own)
Also, Excel supports loading CSV files. Both space delimited or character delimited. You can either load it right in, or try to copy & paste it, then press the Text-To-Columns button in the options. This option has nothing to do with python, of course.

Also available in GitHub repo "Kampfmitexcel"...
import csv, xlwt, os
def input_from_user(prompt):
return raw_input(prompt).strip()
def make_an_excel_file_from_all_the_txtfiles_in_the_following_directory(directory):
wb = xlwt.Workbook()
for filename in os.listdir(data_folder_path):
if filename.endswith(".csv") or filename.endswith(".txt"):
ws = wb.add_sheet(os.path.splitext(filename)[0])
with open('{}\\{}'.format(data_folder_path,filename),'rb') as csvfile:
reader = csv.reader(csvfile, delimiter=',')
for rowx, row in enumerate(reader):
for colx, value in enumerate(row):
ws.write(rowx, colx, value)
return wb
if __name__ == '__main__':
path_to_data = input_from_user("Where is the data stored?: ")
xls = make_an_excel_file_from_all_the_txtfiles_in_the_following_directory(path_to_data)
xls_name = input_from_user('What do you want to name the excel file?: ')
xls.save('{}\\{}{}'.format(data_folder_path,xls_name,'.xls'))
print "Your file has been saved in the data folder."

This is basing on the answer your answer itself. But the reason I am using xlsxwriter is because, it accepts more data in xlsx format. Where as the xlwt limits you to 65556 rows and xls format.
import xlsxwriter
import glob
import csv
workbook = xlsxwriter.Workbook('compiled.xlsx')
for filename in glob.glob("*.csv"):
ws = workbook.add_worksheet(str(filename.split('.')[0]))
spamReader = csv.reader(open(filename, 'rb'), delimiter=',',quotechar='"')
row_count = 0
print filename
for row in spamReader:
for col in range(len(row)):
ws.write(row_count,col,row[col])
row_count +=1
workbook.close()

Related

ExcelWriter overwriting sheets when writing

This program should take the contents of individual sheets and put them into one excel workbook. It almost does that, but it is overwriting instead of appending new sheets into the final workbook. I read that pandas excel writer is the way to go with this, any ideas as to why its having this behavior?
import xlwt, csv, os
from openpyxl import load_workbook
import pandas as pd
from pandas import ExcelWriter
csv_folder = r'C:\Users\Me\Desktop\Test_Folder\\'
for fil in os.listdir(csv_folder):
if '.xlsx' not in fil:
continue
else:
pass
df = pd.read_excel(csv_folder+fil, encoding = 'utf8')
file_name = fil.replace('.xlsx','')
writer = pd.ExcelWriter('condensed_output.xlsx', engine = 'xlsxwriter')
df.to_excel(writer, sheet_name = file_name)
writer.save()
#writer.close()

Make sure the writer.save() is outside of the loop. Also be aware of the character limit on sheetnames, so if the file names are the same up to a certain point, you run the risk of writing over a sheetname that way as well.
import xlwt, csv, os
from openpyxl import load_workbook
import pandas as pd
from pandas import ExcelWriter
csv_folder = r'C:\Users\Me\Desktop\Test_Folder\\'
writer = pd.ExcelWriter('condensed_output.xlsx', engine = 'xlsxwriter')
for fil in os.listdir(csv_folder):
if '.xlsx' not in fil:
continue
else:
pass
df = pd.read_excel(csv_folder+fil, encoding = 'utf8')
file_name = fil.replace('.xlsx','')
df.to_excel(writer, sheet_name = file_name)
writer.save() #make sure this is outside of the loop.
ETA: establish the writer outside of the loop as well

Print Excel workbook using python

Suppose I have an excel file excel_file.xlsx and i want to send it to my printer using Python so I use:
import os
os.startfile('path/to/file','print')
My problem is that this only prints the first sheet of the excel workbook but i want all the sheets printed. Is there any way to print the entire workbook?
Also, I used Openpyxl to create the file, but it doesn't seem to have any option to select the number of sheets for printing.
Any help would be greatly appreciated.

from xlrd import open_workbook
from openpyxl.reader.excel import load_workbook
import os
import shutil
path_to_workbook = "/Users/username/path/sheet.xlsx"
worksheets_folder = "/Users/username/path/worksheets/"
workbook = open_workbook(path_to_workbook)
def main():
all_sheet_names = []
for s in workbook.sheets():
all_sheet_names.append(s.name)
for sheet in workbook.sheets():
if not os.path.exists("worksheets"):
os.makedirs("worksheets")
working_sheet = sheet.name
path_to_new_workbook = worksheets_folder + '{}.xlsx'.format(sheet.name)
shutil.copyfile(path_to_workbook, path_to_new_workbook)
nwb = load_workbook(path_to_new_workbook)
print "working_sheet = " + working_sheet
for name in all_sheet_names:
if name != working_sheet:
nwb.remove_sheet(nwb.get_sheet_by_name(name))
nwb.save(path_to_new_workbook)
ws_files = get_file_names(worksheets_folder, ".xlsx")
# Uncomment print command
for f in xrange(0, len(ws_files)):
path_to_file = worksheets_folder + ws_files[f]
# os.startfile(path_to_file, 'print')
print 'PRINT: ' + path_to_file
# remove worksheets folder
shutil.rmtree(worksheets_folder)
def get_file_names(folder, extension):
names = []
for file_name in os.listdir(folder):
if file_name.endswith(extension):
names.append(file_name)
return names
if __name__ == '__main__':
main()
probably not the best approach, but it should work.
As a workaround you can create separate .xlsx files where each has only one spreadsheet and then print them with os.startfile(path_to_file, 'print')

I have had this issue(on windows) and it was solved by using pywin32 module and this code block(in line 5 you can specify the sheets you want to print.)
import win32com.client
o = win32com.client.Dispatch('Excel.Application')
o.visible = True
wb = o.Workbooks.Open('/Users/1/Desktop/Sample.xlsx')
ws = wb.Worksheets([1 ,2 ,3])
ws.printout()

you could embed vBa on open() command to print the excel file to a default printer using xlsxwriter's utility mentioned in this article:
PBPYthon's Embed vBA in Excel

Turns out, the problem was with Microsoft Excel,
os.startfile just sends the file to the system's default app used to open those file types. I just had to change the default to another app (WPS Office in my case) and the problem was solved.

Seems like you should be able to just loop through and change which page is active. I tried this and it did print out every sheet, BUT for whatever reason on the first print it grouped together two sheets, so it gave me one duplicate page for each workbook.
wb = op.load_workbook(filepath)
for sheet in wb.sheetnames:
sel_sheet = wb[sheet]
# find the max row and max column in the sheet
max_row = sel_sheet.max_row
max_column = sel_sheet.max_column
# identify the sheets that have some data in them
if (max_row > 1) & (max_column > 1):
# Creating new file for each sheet
sheet_names = wb.sheetnames
wb.active = sheet_names.index(sheet)
wb.save(filepath)
os.startfile(filepath, "print")

openpyxl overwrites all data instead of update the excel sheet

I'm trying to read a string from a text file and write it into an excel sheet without overwriting. I found somewhere that to update excel sheets, openpyxl in used. But my script just overwrites the entire sheet. I want other data to be the same.
python script:
from openpyxl import Workbook
file_name="D:\\a.txt"
content={}
with open(file_name) as f:
for line in f:
(key,value)=line.split(":")
content[key]=value
wb=Workbook()
ws=wb.active
r = 2
for item in content:
ws.cell(row=r, column=3).value = item
ws.cell(row=r, column=4).value = content[item]
r += 1
wb.save("D:\\Reports.xlsx")
Excel sheet before script:
Excel sheet after script :
How do I write the data to excel with overwriting other things ? Help.

Overwriting is due to both saving the file with wb.save() and your hard coded starting row number r = 2.
1) If you don't care of overwriting the rows each time you execute your script you could use something like this:
from openpyxl import Workbook
from openpyxl import load_workbook
path = 'P:\Desktop\\'
file_name = "input.txt"
content= {}
with open(path + file_name) as f:
for line in f:
(key,value)=line.split(":")
content[key]=value
wb = load_workbook(path + 'Reports.xlsx')
ws = wb.active
r = 2
for item in content:
ws.cell(row=r, column=3).value = item
ws.cell(row=r, column=4).value = content[item]
r += 1
wb.save(path + "Reports.xlsx")
2) If you care about overwriting rows and the column numbers (3 & 4) you could try something like this:
from openpyxl import Workbook
from openpyxl import load_workbook
path = 'P:\Desktop\\'
file_name = "input.txt"
content= []
with open(path + file_name) as f:
for line in f:
key, value = line.split(":")
content.append(['','', key, value]) # adding empty cells in col 1 + 2
wb = load_workbook(path + 'Reports.xlsx')
ws = wb.active
for row in content:
ws.append(row)
wb.save(path + "Reports.xlsx")

Importing Multiple Excel Files using OpenPyXL

I am trying to read in multiple excel files and append the data from each file into one master file. Each file will have the same headers (So I can skip the import of the first row after the initial file).
I am pretty new to both Python and the OpenPyXL module. I am able to import the first workbook without problem. My problem comes in when I need to open the subsequent file and copy the data to paste into the original worksheet.
Here is my code so far:
# Creating blank workbook
from openpyxl import Workbook
wb = Workbook()
# grab active worksheet
ws = wb.active
# Read in excel data
from openpyxl import load_workbook
wb = load_workbook('first_file.xlsx') #explicitly loading workbook, will automate later
# grab active worksheet in current workbook
ws = wb.active
#get max columns and rows
sheet = wb.get_sheet_by_name('Sheet1')
print ("Rows: ", sheet.max_row) # for debugging purposes
print ("Columns: ", sheet.max_column) # for debugging purposes
last_data_point = ws.cell(row = sheet.max_row, column = sheet.max_column).coordinate
print ("Last data point in current worksheet:", last_data_point) #for debugging purposes
#import next file and add to master
append_point = ws.cell(row = sheet.max_row + 1, column = 1).coordinate
print ("Start new data at:", append_point)
wb = load_workbook('second_file.xlsx')
sheet2 = wb.get_sheet_by_name('Sheet1')
start = ws.cell(coordinate='A2').coordinate
print("New data start: ", start)
end = ws.cell(row = sheet2.max_row, column = sheet2.max_column).coordinate
print ("New data end: ", end)
# write a value to selected cell
#sheet[append_point] = 311
#print (ws.cell(append_point).value)
#save file
wb.save('master_file.xlsx')
Thanks!

I don't really understand your code. It looks too complicated. When copying between worksheets you probably want to use ws.rows.
wb1 = load_workbook('master.xlsx')
ws2 = wb1.active
for f in files:
wb2 = load_workbook(f)
ws2 = wb2['Sheet1']
for row in ws2.rows[1:]:
ws1.append((cell.value for cell in row))

Converting XLSX to CSV while maintaining timestamps

I'm trying to convert a directory full of XLSX files to CSV. Everything is working except I'm encountering an issue with the columns containing time information. The XLSX file is being created by another program that I can not modify. But I want to maintain the same times that show up when I view the XLSX file in Excel as when it is converted to CSV and viewed in any text editor.
My code:
import csv
import xlrd
import os
import fnmatch
import Tkinter, tkFileDialog, tkMessageBox
def main():
root = Tkinter.Tk()
root.withdraw()
print 'Starting .xslx to .csv conversion'
directory = tkFileDialog.askdirectory()
for fileName in os.listdir(directory):
if fnmatch.fnmatch(fileName, '*.xlsx'):
filePath = os.path.join(directory, fileName)
saveFile = os.path.splitext(filePath)[0]+".csv"
savePath = os.path.join(directory, saveFile)
workbook = xlrd.open_workbook(filePath)
sheet = workbook.sheet_by_index(0)
csvOutput = open(savePath, 'wb')
csvWriter = csv.writer(csvOutput, quoting=csv.QUOTE_ALL)
for row in xrange(sheet.nrows):
csvWriter.writerow(sheet.row_values(row))
csvOutput.close()
print '.csv conversion complete'
main()
To add some detail, if I open one file in Excel I see this in a time column:
00:10.3
00:14.2
00:16.1
00:20.0
00:22.0
But after I convert to CSV I see this in the same location:
0.000118981
0.000164005
0.000186227
0.000231597
0.000254861
Thanks to seanmhanson with his answer https://stackoverflow.com/a/25149562/1858351 I was able to figure out that Excel is dumping the times as decimals of a day. While I should try to learn and use xlrd better, for a quick short term fix I was instead able to convert that into seconds and then from seconds back into the time format originally seen of HH:MM:SS. My (probably ugly) code below in case anyone might be able to use it:
import csv
import xlrd
import os
import fnmatch
from decimal import Decimal
import Tkinter, tkFileDialog
def is_number(s):
try:
float(s)
return True
except ValueError:
return False
def seconds_to_hms(seconds):
input = Decimal(seconds)
m, s = divmod(input, 60)
h, m = divmod(m, 60)
hm = "%02d:%02d:%02.2f" % (h, m, s)
return hm
def main():
root = Tkinter.Tk()
root.withdraw()
print 'Starting .xslx to .csv conversion'
directory = tkFileDialog.askdirectory()
for fileName in os.listdir(directory):
if fnmatch.fnmatch(fileName, '*.xlsx'):
filePath = os.path.join(directory, fileName)
saveFile = os.path.splitext(filePath)[0]+".csv"
savePath = os.path.join(directory, saveFile)
workbook = xlrd.open_workbook(filePath)
sheet = workbook.sheet_by_index(0)
csvOutput = open(savePath, 'wb')
csvWriter = csv.writer(csvOutput, quoting=csv.QUOTE_ALL)
rowData = []
for rownum in range(sheet.nrows):
rows = sheet.row_values(rownum)
for cell in rows:
if is_number(cell):
seconds = float(cell)*float(86400)
hms = seconds_to_hms(seconds)
rowData.append((hms))
else:
rowData.append((cell))
csvWriter.writerow(rowData)
rowData = []
csvOutput.close()
print '.csv conversion complete'
main()

Excel stores time as a float in terms of days. You will need to use XLRD to determine if a cell is a date, and then convert it as needed. I'm not great with XLRD, but you might want something akin to this, changing the string formatting if you want to keep leading zeroes:
if cell.ctype == xlrd.XL_CELL_DATE:
try:
cell_tuple = xldate_as_tuple(cell, 0)
return "{hours}:{minutes}:{seconds}".format(
hours=cell_tuple[3], minutes=cell_tuple[4], seconds=cell_tuple[5])
except (any exceptions thrown by xldate_as_tuple):
//exception handling
The XLRD date to tuple method's documentation can be found here: https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html?p=4966#xldate.xldate_as_tuple-function
For a similar issue already answered, see also this question: Python: xlrd discerning dates from floats

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python: creating excel workbook and dumping csv files as worksheets - python

You'll find all you need in this xlwt tutorial. This libraries (xlrd and xlwt) are the most popular choices for managing Excel interaction in Python. The downside is that, at the moment, they only support Excel binary format (.xls).

Related

ExcelWriter overwriting sheets when writing

Print Excel workbook using python

openpyxl overwrites all data instead of update the excel sheet

Importing Multiple Excel Files using OpenPyXL

Converting XLSX to CSV while maintaining timestamps

Categories

Resources