Excel python openpyxl methods - python

Python Excel methods don't work correctly for me.
import pandas as pd
import os
from openpyxl import load_workbook
path = "C:\\My files\\Staff\\xProject\\ProjektExcelPython\\test_files\\"
spreadsheet_file = pd.read_excel(os.path.join(path, "PlikExcelDoKonwersji.xlsx"), engine='openpyxl', header = 1)
print(spreadsheet_file)
This works perfectly, but when I try to use methods from openpyxl I get an error.
import pandas as pd
import os
from openpyxl import load_workbook
path = "C:\\My files\\Staff\\xProject\\ProjektExcelPython\\test_files\\"
spreadsheet_file = pd.read_excel(os.path.join(path, "PlikExcelDoKonwersji.xlsx"), engine='openpyxl', header = 1)
#sheet = spreadsheet_file.sheet_by_name('sheet')
book = load_workbook(path, "PlikExcelDoKonwersji.xlsx")
sheet = book['SendMail1']
data = []
for row in sheet.rows:
    print(row[1].value)
Error:
line 94, in _validate_archive
raise InvalidFileException(msg)
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm
Process finished with exit code 1

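The load_workbook function takes a single file path as its first argument. In your call the file name lands in the second positional parameter (read_only), so openpyxl is asked to open the folder itself, which has no supported extension.
Try:
book = load_workbook(os.path.join(path, "PlikExcelDoKonwersji.xlsx"))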

Pass the path as one value instead of two.
For example, try:
book = load_workbook("C:\\My files\\Staff\\xProject\\ProjektExcelPython\\test_files\\PlikExcelDoKonwersji.xlsx")
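Both answers amount to giving load_workbook one complete path. A minimal sketch of that idea, using only the standard library: the helper name `workbook_path` and the folder name in the usage line are made up for illustration, and the suffix list is copied from the error message in the question.

```python
from pathlib import Path

# Suffixes listed in openpyxl's error message quoted in the question.
SUPPORTED_SUFFIXES = {".xlsx", ".xlsm", ".xltx", ".xltm"}


def workbook_path(folder, filename):
    """Join folder and file name into one path and check its suffix.

    Hypothetical helper: it only builds and validates the path; the
    result is what would be passed to load_workbook as a single argument.
    """
    full_path = Path(folder) / filename
    if full_path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported workbook suffix: {full_path.name!r}")
    return full_path


full_path = workbook_path("test_files", "PlikExcelDoKonwersji.xlsx")
print(full_path.name)  # PlikExcelDoKonwersji.xlsx
```

The result could then be handed over in one piece, for example as load_workbook(str(full_path)).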

Thanks guys, it works, but now another problem occurs.
I would like to skip the first row as before. With pandas I used header = 1, but that doesn't work here, so I checked the documentation:
def load_workbook(filename, read_only=False, keep_vba=KEEP_VBA,
                  data_only=False, keep_links=True):
So there is no such parameter. How do you handle this in openpyxl?
I would like to do something like this:
Find a name in column[1] (the same name will be in column[2]) and, using column[1] and column[2], get one value from row[7].
Thanks for any suggestions.
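One way to approach both parts, sketched under assumptions: openpyxl has no header argument, but sheet.iter_rows(min_row=2, values_only=True) starts at the second row and yields plain tuples of cell values. The helper below works on such tuples. Its name `find_value`, the sample rows, and the reading of column[1], column[2] and row[7] as zero-based positions within each row are guesses at what the question intends, not part of openpyxl.

```python
def find_value(rows, name, name_col=1, match_col=2, value_col=7):
    """Return the value at value_col from the first row whose name_col
    and match_col cells both equal name, or None if no row matches.

    rows is any iterable of tuples, for example the tuples produced by
    sheet.iter_rows(min_row=2, values_only=True) in openpyxl.
    """
    for row in rows:
        # Skip rows too short to hold every column we need.
        if len(row) <= max(name_col, match_col, value_col):
            continue
        if row[name_col] == name and row[match_col] == name:
            return row[value_col]
    return None


# Made-up sample data standing in for worksheet rows (header already skipped).
sample_rows = [
    (1, "Anna", "Ewa", "a", "b", "c", "d", "first@example.com"),
    (2, "Ewa", "Ewa", "a", "b", "c", "d", "second@example.com"),
]
print(find_value(sample_rows, "Ewa"))  # second@example.com
```

Keeping the lookup separate from the worksheet access means it can be tried on plain tuples before pointing it at the real sheet.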

Related

Combining excel workbook sheet into one using python

I have roughly 30 excel workbooks I need to combine into one. Each workbook has a variable number of sheets but the sheet I need to combine from each workbook is called "Output" and the format of the columns in this sheet is consistent.
I need to import the Output sheet from the first file, then append the remaining files and ignore the header row.
I have tried to do this using glob/pandas to no avail.
You could use openpyxl. Here's a sketch of the code:
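from openpyxl import load_workbook

compiled_wb = load_workbook(filename='yourfile1.xlsx')
compiled_ws = compiled_wb['Output']
# yourfile1.xlsx is already loaded, so append files 2 through 30
# (adjust the upper bound to the number of files)
for i in range(2, 31):
    wb = load_workbook(filename='yourfile{}.xlsx'.format(i))
    ws = wb['Output']
    # min_row=2 skips the header row; values_only yields plain cell values
    for row in ws.iter_rows(min_row=2, values_only=True):
        compiled_ws.append(row)
compiled_wb.save('compiled.xlsx')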
Method shown by Clinton W. Brownley in Foundations for Analytics with Python:
Run the script from the shell, passing the path to the folder with the Excel files (make sure the argument used to build all_workbooks is correct), followed by the Excel output file:
python script.py <the/path/to/excel/folder/> <your/final/output.xlsx>
script.py:
import pandas as pd
import sys
import os
import glob

input_path = sys.argv[1]
output_file = sys.argv[2]
all_workbooks = glob.glob(os.path.join(input_path, '*.xlsx'))
all_df = []
for workbook in all_workbooks:
    # A single sheet_name returns one DataFrame; the header row becomes the column names
    data = pd.read_excel(workbook, sheet_name='Output', index_col=None)
    all_df.append(data)
data_concatenated = pd.concat(all_df, axis=0, ignore_index=True)
# The context manager saves the file when the block exits
with pd.ExcelWriter(output_file) as writer:
    data_concatenated.to_excel(writer, sheet_name='concatenated_Output', index=False)
This will probably get down-voted because this isn't a Python answer, but honestly, I wouldn't use Python for this kind of task. I think you are far better off installing the AddIn below, and using that for the job.
https://www.rondebruin.nl/win/addins/rdbmerge.htm
Click 'Merge all files from the folder in the Files location selection' and click 'Use a Worksheet name' = 'Output', and finally, I think you want 'First cell'. Good luck!

Save selection of multiple Excel workbooks to one pdf with Python

I want to make a PDF composed of ranges from all Excel workbooks located in a given folder (folderwithallfiles). All workbooks have the same structure, so the range reference will be the same for every workbook.
I thought I got it with the script below, but it does not work.
import win32com.client as win32
import glob
import os
xlfiles = sorted(glob.glob("*.xlsx"))
#print "Reading %d files..."%len(xlfiles)
cwd = "C:\\Users\\user\folderwithallfiles"
#cwd = os.getcwd()
path_to_pdf = r'C:\\Users\\user\folderwithallfiles\multitest.pdf'
excel = win32.gencache.EnsureDispatch('Excel.Application')
for xlfile in xlfiles:
    wb = excel.Workbooks.Open(cwd+"\\"+xlfile)
    ws = wb.Sheets('sheet 1')
    ws.Range("A1:Q59").Select()
    wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)
Please check whether the code below works. I wrote it on the fly, so let me know if you find any issues in it.
import glob
import pandas as pd
import pdfkit as pdf

# Raw strings keep "\f" and "\a" from being read as escape sequences
frames = [pd.read_excel(f) for f in glob.glob(r"filepath\file*.xlsx")]
all_data = pd.concat(frames, ignore_index=True)
all_data.to_html(r"filepath\all_data.html")
pdf.from_file(r"filepath\all_data.html", r"filepath\all_data.pdf")
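Separately from the approach above, the paths in the question's script may be part of the problem, so here is a small check using only the standard library. In a normal string literal `\f` is a form feed, and a raw string keeps doubled backslashes as two characters. The variable names are illustrative, and `ntpath` is used only so the Windows-style join behaves the same on any platform.

```python
import ntpath

# As written in the question: "\f" is read as a form-feed character.
broken_dir = "C:\\Users\\user\folderwithallfiles"
# As written in the question: the raw string keeps both backslashes.
doubled = r'C:\\Users\\user\folderwithallfiles\multitest.pdf'

# A raw string with single backslashes gives the intended folder.
folder = r"C:\Users\user\folderwithallfiles"
path_to_pdf = ntpath.join(folder, "multitest.pdf")

print("\f" in broken_dir)   # True
print("\\\\" in doubled)    # True
print(path_to_pdf)          # C:\Users\user\folderwithallfiles\multitest.pdf
```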

Pandas Excel Writer using Openpyxl with existing workbook

I have code from a while ago that I am re-using for a new task. The task is to write a new DataFrame into a new sheet in an existing Excel file. There is one part of the code that I do not understand, yet it is what makes the code "work".
Working code:
from openpyxl import load_workbook
import pandas as pd
file = r'YOUR_PATH_TO_EXCEL_HERE'
df1 = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
book = load_workbook(file)
writer = pd.ExcelWriter(file, engine='openpyxl')
writer.book = book # <---------------------------- piece i do not understand
df1.to_excel(writer, sheet_name='New', index=None)
writer.save()
The little line writer.book = book has me stumped. Without it, the Excel file loses all other sheets except the one named in the sheet_name= parameter of df1.to_excel.
I looked at xlsxwriter's documentation as well as openpyxl's, but cannot figure out why that line gives me my expected output. Any ideas?
Edit: I believe this post is where I got the original idea from.
In the source code of ExcelWriter, the openpyxl engine initializes an empty workbook and deletes all of its sheets. That's why you need to assign the loaded workbook explicitly:
class _OpenpyxlWriter(ExcelWriter):
    engine = 'openpyxl'
    supported_extensions = ('.xlsx', '.xlsm')

    def __init__(self, path, engine=None, **engine_kwargs):
        # Use the openpyxl module as the Excel writer.
        from openpyxl.workbook import Workbook
        super(_OpenpyxlWriter, self).__init__(path, **engine_kwargs)
        # Create workbook object with default optimized_write=True.
        self.book = Workbook()
        # Openpyxl 1.6.1 adds a dummy sheet. We remove it.
        if self.book.worksheets:
            try:
                self.book.remove(self.book.worksheets[0])
            except AttributeError:
                # compat
                self.book.remove_sheet(self.book.worksheets[0])
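Newer pandas releases offer another route to the same result: opening the writer in append mode. The sketch below assumes a pandas version whose ExcelWriter accepts mode="a" together with the openpyxl engine; the function name `add_sheet` and the file and sheet names in the usage lines are made up for illustration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook


def add_sheet(path, df, sheet_name):
    """Write df to a new sheet of an existing workbook, keeping the others."""
    # mode="a" loads the existing file instead of starting from an empty book.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


# Build a throwaway workbook with one sheet so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.xlsx")
wb = Workbook()
wb.active.title = "Existing"
wb.save(demo_path)

add_sheet(demo_path, pd.DataFrame({"Data": [10, 20, 30]}), "New")
print(load_workbook(demo_path).sheetnames)  # ['Existing', 'New']
```

In append mode the writer starts from the workbook already on disk, which is the effect the writer.book = book line achieves by hand.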

How can I convert a sheet to a string? Or do I even need to?

I'm having a problem taking an xlrd document and placing it into an xlwt file to be saved. I keep getting the error:
decode() argument 1 must be string, not Sheet
How can I change a Sheet back into a string? Here is my code:
import xlrd
import xlwt
wb = xlrd.open_workbook("Workbook1.xlsx")
sh = wb.sheet_by_name("worksheet")
wbk = xlwt.Workbook(sh)
sheet = wbk.add_sheet("sheet1")
You need to use xlutils to bridge the two. The copy function lives in the xlutils.copy module, so your code becomes:
import xlrd
from xlutils.copy import copy

read_book = xlrd.open_workbook("Workbook1.xlsx")
# copy() turns the xlrd workbook into a writable xlwt workbook
write_book = copy(read_book)
write_sheet = write_book.add_sheet("sheet1")

How do you read excel files with xlrd on Appengine

I am using xlrd on App Engine with Flask.
I can't read the input file, and it keeps showing the same error message.
The code is:
def read_rows(inputfile):
    rows = []
    wb = xlrd.open_workbook(inputfile)
    sh = wb.sheet_by_index(0)
    for rownum in range(sh.nrows):
        rows.append(sh.row_values(rownum))
    return rows

@app.route('/process_input/', methods=['POST', 'GET'])
def process_input():
    inputfile = request.files['file']
    rows = read_rows(request.files['file'])
    payload = json.dumps(dict(rows=rows))
    return payload
I realize that this might be caused by not uploading and saving it as a file. Is there any workaround for this? It would help many others as well. Any help is appreciated, thanks.
Update: I found a solution and posted it below. Anyone confused about using xlrd can refer to the open source project repo I posted. The key is passing the content of the file instead of the filename.
I finally found a solution.
Here's how I do it: instead of saving the file, I read its content and let xlrd read that.
def read_rows(inputfile):
    rows = []
    wb = xlrd.open_workbook(file_contents=inputfile.read())
    sh = wb.sheet_by_index(0)
    for rownum in range(sh.nrows):
        rows.append(sh.row_values(rownum))
    return rows
This worked nicely and turned the Excel files into JSON-able formats. If you want to output the JSON, simply use json.dumps().
A full code example can be found at https://github.com/cjhendrix/HXLator/blob/master/gae/main.py; it features a full implementation of xlrd and shows how to work with the data.
Thanks for the pointers.
Use:
wb = xlrd.open_workbook(file_contents=inputfile.read())
The way you are invoking open_workbook expects a filename, not a Flask FileStorage object wrapping the actual file. The file_contents argument takes the file's content, hence the read() call.
Judging from your traceback:
File "/Users/fauzanerichemmerling/Desktop/GAEHxl/gae/lib/xlrd/__init__.py", line 941, in biff2_8_load
    f = open(filename, open_mode)
You can try changing this line to:
    f = filename
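For .xlsx uploads today, note that xlrd 2.0 and later read only .xls files. The same idea of passing content rather than a filename can be sketched with openpyxl instead, which is a swapped-in library, not what the answers above use. `read_rows` mirrors the function in the question, and the in-memory workbook is made-up sample data.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def read_rows(file_contents):
    """Return the rows of the first sheet from raw .xlsx bytes."""
    # BytesIO wraps the bytes in a file-like object, so nothing touches disk.
    wb = load_workbook(BytesIO(file_contents), read_only=True, data_only=True)
    sheet = wb.worksheets[0]
    rows = [list(row) for row in sheet.iter_rows(values_only=True)]
    wb.close()
    return rows


# Stand-in for an upload: build a small workbook and keep it as bytes.
buffer = BytesIO()
sample = Workbook()
sample.active.append(["name", "qty"])
sample.active.append(["apple", 3])
sample.save(buffer)

print(read_rows(buffer.getvalue()))  # [['name', 'qty'], ['apple', 3]]
```

In a Flask view this would presumably be called as read_rows(request.files['file'].read()), since the uploaded object exposes its content through read().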
