Differences between xlwings vs openpyxl Reading Excel Workbooks - python

I've mostly only used xlwings to open (read-write) workbooks (since the workbooks I read have complicated macros). But I've recently begun using openpyxl to open (read-only) workbooks when I've needed to read thousands of workbooks to scrape some data.
I've noticed that there is a considerable difference between how xlwings and openpyxl read workbooks. I believe xlwings relies on pywin32 to read workbooks. When you read a workbook with xlwings.Book(<filename>) the actual workbook opens up. I have a feeling this is a result of pywin32.
However, when using openpyxl.load_workbook(<filename>) a workbook window does not appear. I have a feeling this is a result of not using pywin32.
Beyond this, I have no further understanding of how the backends work for each library. Could someone shed some light on this? Is there a benefit/cost to relying on xlwings and pywin32 for reading workbooks, as opposed to openpyxl, which does not seem to use pywin32?

You are correct that xlwings relies on pywin32 (on Windows; on macOS it uses appscript instead), whereas openpyxl does not.
openpyxl
An ".xlsx" Excel file is essentially a zip file containing multiple XML files formatted according to Microsoft's OOXML specification. With this specification it's possible to create a program capable of directly reading/writing Excel files in just about any programming language. This is the approach applied in openpyxl: it uses Python code to read/write Excel files directly.
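To make the "zip file of XML" point concrete, here is a minimal sketch that inspects an .xlsx package using only the Python standard library. It is not how openpyxl is implemented internally; the function names are hypothetical and only illustrate the file layout described above.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts of an OOXML workbook
MAIN_NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_xlsx_parts(source):
    """Return the names of all parts stored in the .xlsx zip package.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def sheet_names(source):
    """Return the worksheet names declared in xl/workbook.xml."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("main:sheets/main:sheet", MAIN_NS)]
```

Calling `list_xlsx_parts("some_workbook.xlsx")` (a hypothetical file name) on a real workbook typically shows entries such as `xl/workbook.xml` and `xl/worksheets/sheet1.xml`. No Excel installation is involved at any point.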
xlwings
A Microsoft Excel application can be started and controlled by an external program through the Win32 COM API. The pywin32 package provides an interface between Win32 COM and Python. Through a Python script with the right pywin32 commands you can fully control an Excel application (open Excel files, query data from cells, write data to cells, save Excel files, etc.). The pywin32 commands that you can use mirror the Excel VBA commands, albeit with Python syntax.
xlwings is (among other things) a user-friendly wrapper around pywin32. It introduces several concise yet powerful methods. An example would be the methods for direct conversion of an Excel cell range to a numpy array or pandas DataFrame (and vice versa).
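As a rough sketch of that range-to-DataFrame conversion, assuming Excel, xlwings and pandas are all installed and that the data is a contiguous table starting at cell A1 of the first sheet (both the layout and the function name are assumptions for illustration):

```python
def read_range_as_dataframe(path, top_left="A1"):
    """Open a workbook in a hidden Excel instance and return a table as a DataFrame."""
    # Third-party imports are done lazily so this module still loads
    # on machines without Excel, xlwings or pandas.
    import pandas as pd
    import xlwings as xw

    app = xw.App(visible=False)  # starts a real Excel process, just without a window
    try:
        book = app.books.open(path)
        sheet = book.sheets[0]
        # expand() grows the range from the top-left cell over the contiguous table
        table = sheet.range(top_left).expand()
        frame = table.options(pd.DataFrame, index=False, header=True).value
        book.close()
        return frame
    finally:
        app.quit()  # always shut Excel down so no hidden process lingers
```

Because a full Excel process is started for every call, this approach is noticeably slower than parsing the file directly when many workbooks have to be read.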
Summary
A fundamental difference between xlwings and openpyxl is that the former requires that MS Excel is installed on your machine, whereas the latter does not.

Related

How can I use the Python language to put add-ins in Excel without using Pyxll or xlwings or VBA?

I am trying to figure out how to use Python-based functions in excel. I came across Pyxll which can make Python add-ins instead of using VBA. But Pyxll is not free after their 30-day trial.
I also came across xlwings, which worked fine and served the purpose of adding UDFs in Excel. The problem is that it is not very user-friendly: I have to put the Excel file in the same folder as the Python file, and they must have the same name with different extensions. Alternatively, I can use the xlwings quickstart command to do that.
This means I have to create such folders every time I wish to include my Python-based functions in a new Excel project file, and copy-paste the functions from the previous Python files to the newly created quickstart files.
I was wondering if there is any way to use only one python file to import user-defined functions using xlwings or perhaps a different library/module which is free to use and does that?
(PS: According to the xlwings documentation, we can point to a UDF module under the xlwings tab in Excel, but even after many attempts I am not able to make it work.)

Pandas: Does pd.read_excel() need Excel to be installed for it to work?

I was wondering if pd.read_excel() needs Microsoft Excel to be installed on the computer for it to work? I'm not sure if my customer will have Excel installed or not so I don't want the program to break down if it is not available.
Thanks!
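pd.read_excel relies on a pure-Python reader package to parse the file: historically xlrd, and in current pandas versions openpyxl for .xlsx files (xlrd is still used for legacy .xls files).
So it won't need Microsoft Excel, but one of those packages must be installed.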

Can I create Excel workbooks with only Pandas (Python)?

In the pandas documentation, it says that the optional dependencies for Excel I/O are:
xlrd/xlwt: Excel reading (xlrd) and writing (xlwt)
openpyxl: openpyxl > version 2.4.0 for writing .xlsx files (xlrd >= 0.9.0)
XlsxWriter: Alternative Excel writer
I can't install any external modules. Is there any way to create an .xlsx file with just a pandas installation?
Edit: My question is - is there any built-in pandas functionality to create Excel workbooks, or is one of these optional dependencies required to create any Excel workbook at all?
I thought that openpyxl was part of a pandas install, but turns out I had XlsxWriter installed.
The pandas codebase does not duplicate Excel reading or writing functionality provided by the external libraries you listed.
Unlike the csv format, which Python itself provides native support for, if you don't have any of those libraries installed, you cannot read or write Excel spreadsheets.
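If it helps to check what is available before attempting Excel I/O, the following sketch reports which of the optional packages named above can be imported in the current environment. It uses only the standard library; the function name is hypothetical, and the import names (for example `xlsxwriter` for XlsxWriter) are the ones these packages are normally installed under.

```python
from importlib.util import find_spec

# Import names of the optional Excel packages mentioned in the pandas documentation
EXCEL_ENGINES = ("openpyxl", "xlsxwriter", "xlrd", "xlwt")


def available_excel_engines(candidates=EXCEL_ENGINES):
    """Return the candidate packages that are importable, without importing them."""
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    found = available_excel_engines()
    if found:
        print("Excel I/O packages found:", ", ".join(found))
    else:
        print("No Excel I/O package installed; pandas can still read and write CSV.")
```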

how to execute vba macro outside of excel

I have an excel spreadsheet with a massive amount of VBA and macros that include a button.
How do I execute VBA code in Excel (specifically, clicking the button and triggering its onclick event) from outside Excel, for example from Python?
Note: I'm willing to take answers in different languages like C++, C#, or Java; but, by far I would prefer them in python since it'll more smoothly connect with the remainder of my python applications.
Note 2: I may need to manipulate the excel spreadsheet with python using one of the python excel libraries available
Version Numbers:
Microsoft Excel Office 365 Version 1708 Build 8431.2079
python 2.7
You can use the win32com library to interact with COM objects through Python. You can install the win32com library with pip. That's how I did it at least. Essentially, you are not clicking the button on your worksheet, but instead calling the subroutine embedded in your VBA code to run on your worksheet.
Also, I used Python 3.6 for this, but I believe Python 2.7 should be compatible. Here is a basic outline.
Install Win32Com
Open up a command prompt and type:
pip install pypiwin32
Code
import win32com.client as win32
excel = win32.Dispatch("Excel.Application")  # create an instance of Excel
# excel.Visible = True  # uncomment this to show Excel working through the code
try:
    book = excel.Workbooks.Open(Filename=r'C:\YourBookHere.xlsm')
    # Call the VBA subroutine directly; this runs the macro that is on Sheet1
    excel.Run("YourBookHere.xlsm!Sheet1.MacroName")
    book.Save()
    book.Close()
finally:
    # Quit Excel even if something above fails, so no hidden process is left running
    excel.Quit()
Hope this helps, this is my first post.
To expand upon dylan's excellent answer: you can also accomplish the same using the python xlwings package. In essence xlwings is a fancy wrapper around pywin32 that allows control of an excel application with even simpler syntax. The following python code should do the same as the code from dylan's answer:
import xlwings as xw
# Connect to existing workbook containing VBA macro
wb = xw.Book(r'C:\YourBookHere.xlsm')
# Run the VBA macro named 'MacroName'
wb.macro('MacroName')()
In addition, you can also use Windows Task Scheduler to fire the event. If a button-click-event fires the code, you can do exactly the same thing, by following the instructions in the link below.
https://www.sevenforums.com/tutorials/11949-elevated-program-shortcut-without-uac-prompt-create.html
Also, you will need a Workbook-open-event, so, something like this...
Private Sub Workbook_Open()
Msgbox "Welcome to ANALYSIS TABS"
End Sub

Which library to import in Python to read data from an Excel file, for automation testing using Selenium?

Which library should I import in Python to read data from an Excel file? I want to store different XPaths in an Excel file for automation testing using Selenium.
You may use XlsxWriter. It is a Python module for writing Excel files (note that it can only write files, not read them).
xlutils is also a very useful collection of utilities for automating Excel sheet operations.
https://xlsxwriter.readthedocs.io/
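The xlrd library is what you are looking for to read Excel files (note that xlrd 2.0 and later reads only the legacy .xls format; for .xlsx files, openpyxl is the usual choice). And to write, you can use xlwt.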
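For the XPath use case in the question, a minimal sketch with openpyxl might look like the following. It assumes an .xlsx file whose first row is a header and whose first two columns hold a locator name and its XPath; that layout, the file name and both function names are hypothetical.

```python
def rows_to_locators(rows):
    """Build a {name: xpath} dict from (name, xpath, ...) rows, skipping blank rows."""
    locators = {}
    for row in rows:
        if not row or len(row) < 2:
            continue
        name, xpath = row[0], row[1]
        if name is None or xpath is None:
            continue
        locators[str(name).strip()] = str(xpath).strip()
    return locators


def load_locators(path, sheet_name=None):
    """Read locator rows from an .xlsx file with openpyxl (Excel is not required)."""
    from openpyxl import load_workbook  # third-party; imported lazily

    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        sheet = workbook[sheet_name] if sheet_name else workbook.active
        # min_row=2 skips the assumed header row; values_only yields plain tuples
        return rows_to_locators(sheet.iter_rows(min_row=2, values_only=True))
    finally:
        workbook.close()
```

A test could then call `load_locators("locators.xlsx")["login_button"]` and pass the resulting string to Selenium's XPath lookup.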
