Using Python, I need to perform the following operations on an Excel 2007 workbook:
deleting rows
sorting a worksheet
getting distinct values from a column
I am looking into openpyxl; however, it seems to have limited capabilities.
Can anyone please recommend a library that can do the above tasks?
I want to preface this by saying that this is a Windows-only solution. If you are on Windows, I would recommend win32com, which ships with the pywin32 package. This module gives Python programmatic access to any Microsoft Office application (including Excel) and uses many of the same methods used in VBA. Usually you record a macro (or recall from memory) how to do something in VBA and then call the same functions from Python.
To start we want to connect to Excel and get access to the first sheet as an example
#First we need to access the module that lets us connect to Excel
import win32com.client
# Next we want to create a variable that represents Excel
app = win32com.client.Dispatch("Excel.Application")
# Lastly we will assume that the workbook is active and get the first sheet
wbk = app.ActiveWorkbook
sheet = wbk.Sheets(1)
At this point we have a variable named sheet that represents the Excel worksheet we will be working with. There are of course multiple ways to access the sheet; this is usually how I demo win32com with Excel because it is very intuitive.
Now assume I have the following values on the first sheet and I will go over one by one how to answer what you were asking:
A
1 "d"
2 "c"
3 "b"
4 "a"
5 "c"
Delete Rows:
Let's assume that you want to delete the first row in your active sheet.
sheet.Rows(1).Delete()
This creates:
A
1 "c"
2 "b"
3 "a"
4 "c"
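If more than one row has to go, a common pattern is to walk the rows from the bottom up, so that deleting a row never shifts a row that has not been checked yet. Here is a minimal sketch of that idea. The helper name delete_rows_where and its parameters are made up for illustration, and sheet is assumed to be the worksheet object from above:

```python
def delete_rows_where(sheet, column, last_row, should_delete):
    """Delete every row whose cell in `column` satisfies `should_delete`.

    `sheet` is expected to behave like the win32com worksheet used above,
    i.e. it offers Cells(row, col).Value and Rows(row).Delete().
    Returns the number of rows that were deleted.
    """
    deleted = 0
    # Walk upwards: deleting row N only renumbers the rows below N,
    # and those have already been checked.
    for row in range(last_row, 0, -1):
        if should_delete(sheet.Cells(row, column).Value):
            sheet.Rows(row).Delete()
            deleted += 1
    return deleted
```

With the original five-row sample, delete_rows_where(sheet, 1, 5, lambda value: value == "c") would remove both "c" rows.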
Next, let's sort the cells in ascending order (although I would recommend extracting the values to Python, sorting them in a list, and sending the values back).
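rang = sheet.Range("A1","A4")
# Tell Excel which key to sort on (ascending is the default order)
sheet.Sort.SortFields.Clear()
sheet.Sort.SortFields.Add(rang)
sheet.Sort.SetRange(rang)
sheet.Sort.Apply()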
This creates:
A
1 "a"
2 "b"
3 "c"
4 "c"
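The alternative mentioned in the parenthetical above (pull the values into Python, sort them there, and push them back) could look roughly like this. It is only a sketch: the function name is hypothetical, and it assumes a single-column range with more than one cell whose values can be compared with each other (no mix of text, numbers, and empty cells):

```python
def sort_column_in_python(sheet, first_cell, last_cell):
    """Sort a one-column range with Python's sort and write it back."""
    rng = sheet.Range(first_cell, last_cell)
    # For a multi-cell range, .Value is a sequence of rows,
    # each row being a sequence of cell values.
    values = [row[0] for row in rng.Value]
    values.sort()
    # Writing back needs the same rows-of-cells shape.
    rng.Value = [[value] for value in values]
    return values
```

Calling sort_column_in_python(sheet, "A1", "A4") on the sample data should leave the column in the same order as the table above.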
And now we will get distinct values from the column. The main thing to take away here is how to extract values from cells. You can either select many cells at once with sheet.Range("A1","A4") or access the values cell by cell with sheet.Cells(row,col). Range is orders of magnitude faster, but Cells is slightly easier for debugging.
#Get a list of all Values using Range
valLstRange = [val[0] for val in sheet.Range("A1","A4").Value]
#Get a list of all Values using Cells
valLstCells = [sheet.Cells(row,1).Value for row in range(1,5)]
#valLstCells and valLstRange both = ["a","b","c","c"]
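Once the values are in a Python list, getting the distinct ones no longer involves Excel at all. One way to do it while keeping the order in which the values first appear (the helper name distinct is just for illustration, and this relies on dicts preserving insertion order, which they do in Python 3.7 and later):

```python
def distinct(values):
    """Return the unique values in first-seen order."""
    return list(dict.fromkeys(values))


column_values = ["a", "b", "c", "c"]  # same content as valLstRange above
unique_values = distinct(column_values)
print(unique_values)
```

If the order does not matter, set(column_values) does the same job.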
Lastly, save the workbook with the following:
wbk.SaveAs("C:/savedWorkbook.xlsx")
And you are done!
INFO About COM
If you have worked with VBA, .NET, VBScript or any other language to work with Excel, many of these Excel methods will look the same. That is because they all use the same library provided by Microsoft. This library uses COM, which is Microsoft's way of providing language-agnostic APIs to programmers. COM itself is an older technology and can be tricky to debug. If you want more information on Python and COM, I highly recommend Python Programming on Win32 by Mark Hammond. He is the guy who gets a shoutout after you install Python on Windows with the official .msi installer.
ALTERNATIVES TO WIN32COM
I also need to point out that there are several fantastic open source alternatives that can be faster than COM in most situations and work on any OS (Mac, Linux, Windows, etc.). These tools all parse the zipped files that make up a .xlsx. If you did not know that a .xlsx file is a .zip, just change the extension to .zip and you can then explore the contents (kind of interesting to do at least once in your career). Of these I recommend openpyxl, which I have used for parsing and creating Excel files on a server where performance was critical. Never use win32com for server activities: every instance launches an out-of-process excel.exe, and those processes can leak.
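You can also peek inside the archive without renaming anything, using the standard library's zipfile module. A small sketch (the function name is made up, and the argument can be a path or an open binary file):

```python
import zipfile


def list_xlsx_parts(xlsx_file):
    """Return the names of the files stored inside an .xlsx archive."""
    with zipfile.ZipFile(xlsx_file) as archive:
        return archive.namelist()
```

For a typical workbook, the returned list contains entries such as xl/workbook.xml and one XML file per worksheet.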
RECOMMENDATION
I would recommend win32com for users who work intimately with individual data sets (analysts, financial services, researchers, accountants, business operations, etc.) and perform data discovery activities, as it works great with open workbooks. However, developers or users who need to run very large tasks with a small footprint, extremely large manipulations, or processing in parallel should use a package such as openpyxl.
Thanks for taking the time to read my question.
I am working on a personal project to learn python scripting for excel, and I want to learn how to move data from one workbook to another.
In this example, I am emulating a company employee ledger that has name, position, address, and more (the ledger is organized by row, so every employee takes up one row). The project is to transfer a selected number of people to a new ledger (another Excel file). I have a list of emails in a .txt file (it could even be another Excel file, but I thought .txt would be easier), and I want the script to run through the .txt file, get the emails, and look for any rows that have a matching email address (all emails are in column B). If any are found, it should copy that entire row to the new Excel file.
I tried a lot of ways to make this work, but I could not figure it out. I am really new to python so I am not even sure if this is possible. Would really appreciate some help!
You essentially have two packages that allow manipulation of Excel files. For reading in data and performing analysis, the standard package is pandas. You can save the results as .xlsx; however, you are only really working with the table data and not the file itself (i.e., you are extracting data FROM the file, not working WITH the file).
However, what you really need is to manipulate Excel files directly, which is better done with openpyxl.
You can also read files (such as your text file) using the with open construct, which is native to Python and not a third-party import like pandas or openpyxl.
Part of learning to program includes learning how to use documentation.
As such, here is the documentation you require with sufficient examples to learn openpyxl: https://openpyxl.readthedocs.io/en/stable/
And you can learn about pandas here: https://pandas.pydata.org/docs/user_guide/index.html
And you can learn about python with open here: https://docs.python.org/3/tutorial/inputoutput.html
Hope this helps.
EDIT: It's possible I or another person can give you a specific example using your data / code etc, but you would have to provide it fully. Since you're learning, I suggest using the documentation or youtube.
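For what it's worth, here is a minimal sketch of the approach described above: read the emails with plain open, then use openpyxl to copy the matching rows. Everything about the layout is an assumption taken from the question (one email per line in the text file, emails in column B of the first sheet), and the function names are made up:

```python
def read_emails(lines):
    """Collect one email per line: strip whitespace, lower-case, skip blanks."""
    return {line.strip().lower() for line in lines if line.strip()}


def rows_with_email(rows, emails, email_index=1):
    """Yield the rows whose email cell (column B is index 1) is in `emails`."""
    for row in rows:
        cell = row[email_index] if len(row) > email_index else None
        if isinstance(cell, str) and cell.strip().lower() in emails:
            yield row


def copy_matching_rows(source_path, emails_path, target_path):
    # openpyxl is a third-party package (pip install openpyxl). It is
    # imported here so the two helpers above can be used without it.
    from openpyxl import Workbook, load_workbook

    with open(emails_path) as handle:
        emails = read_emails(handle)

    source_book = load_workbook(source_path, read_only=True)
    target_book = Workbook()
    target_sheet = target_book.active
    # values_only=True yields plain tuples of cell values, one per row
    all_rows = source_book.active.iter_rows(values_only=True)
    for row in rows_with_email(all_rows, emails):
        target_sheet.append(row)
    target_book.save(target_path)
    source_book.close()
```

Note that a header row is only copied if it happens to match, so you may want to append it to the new sheet yourself. The comparison ignores case and surrounding spaces, which is a design choice rather than a requirement.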
I am trying to parse through an Excel sheet that has columns for the website name (column A), the number of visitors (F), a contact at that website's first name (B), one for last name (C), for email (E), and date it was last modified (L).
I want to write a python script that goes through the sheet and looks at sites that have been modified in the last 3 months and prints out the name of the website and an email.
It is pretty straightforward to do this. I think a little bit of googling can help you a lot. But in short, you need to use a library called Pandas which is a really powerful tool for handling spreadsheets, datasets, and table-based files.
Pandas documentation is very well written. You can use the tutorials provided within the documentation to work your way through the problem easily. However, I'll give you a brief overview of what you should do.
First, open the spreadsheet (Excel file) in Python using pandas and load it into a data frame (read the docs and you'll understand).
Second, using one of the methods pandas provides, such as where (there are actually a couple of methods), you can set a condition (for example, the date is older than some cutoff date) and get the masked data frame (which represents your spreadsheet) back from the method.
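As a rough sketch of those two steps: this version uses plain boolean indexing rather than where, treats "3 months" as 90 days, and assumes the sheet has a header row and that column L holds real Excel dates. The column labels site, email and modified, and both function names, are made up for the example:

```python
from datetime import datetime, timedelta


def recently_modified(frame, now, days=90):
    """Keep rows modified within `days` of `now`; return site and email."""
    cutoff = now - timedelta(days=days)
    recent = frame[frame["modified"] >= cutoff]
    return recent[["site", "email"]]


def load_sites(path):
    # pandas is third-party; reading .xlsx files also needs openpyxl.
    import pandas as pd

    # Only pull the columns from the question: A (site), E (email), L (date)
    frame = pd.read_excel(path, usecols="A,E,L")
    frame.columns = ["site", "email", "modified"]
    return frame
```

Printing the result is then a loop over recently_modified(load_sites("sites.xlsx"), datetime.now()).itertuples(index=False), where sites.xlsx is a placeholder file name.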
I've been asked to create a Python script to automate a server deployment for 80 retail stores.
As part of this script, I call a secondary script that changes multiple values in 9 XML files. The values are unique for each store, so this script has to be changed each time. After I am gone, this will be done by semi-technical or non-technical people, so we don't want them to edit the Python scripts directly for fear of breaking them.
With this in mind, I would like these people to enter the store details into an XLS sheet, and have a Python file read this sheet and feed the data it finds into the existing Python script.
The file will be 2 columns, with the required data in the 2nd one.
I'm sorry if this is a long explanation, but that is the gist of it. I'm using python 2.6. Does anyone have a clue about how I can do this? Or which language might be better for this. I also know Bash and Javascript.
Thanks in advance
It depends on the complexity and the volume of your data: for small files use openpyxl, for large ones use pandas.
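As a rough sketch of the openpyxl route for the two-column sheet described in the question: read every row as a (name, value) pair and hand the resulting dict to the existing script. Note the assumptions: openpyxl reads .xlsx rather than the old .xls format, it needs a current Python 3 rather than 2.6, and the function names are invented for this example:

```python
def rows_to_settings(rows):
    """Turn (name, value) rows into a dict, skipping rows without a name."""
    settings = {}
    for row in rows:
        if len(row) < 2 or row[0] is None:
            continue
        settings[str(row[0]).strip()] = row[1]
    return settings


def load_store_settings(path):
    # Third-party import kept local so rows_to_settings works on its own.
    from openpyxl import load_workbook

    book = load_workbook(path, read_only=True, data_only=True)
    try:
        rows = book.active.iter_rows(min_col=1, max_col=2, values_only=True)
        return rows_to_settings(rows)
    finally:
        book.close()
```

The non-technical users then only ever touch the spreadsheet, and the deployment script looks values up by name, for example settings["store_id"] if the sheet has a row labelled store_id.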
I created a little script in python to generate an excel compatible xml file (saved with xls extension). The file is generated from a part database so I can place an order with the extracted data.
On the website for ordering the parts, you can import the excel file so the order fills automatically. The problem here is that each time I want to make an order, I have to open excel and save the file with xls extension of type MS Excel 97-2003 to get the import working.
The excel document then looks exactly the same, but when opened with notepad, we cannot see the xml anymore, only binary dump.
Is there a way to automate this process, by running a bat file or maybe adding some line to my python script so it is converted in the proper format?
(I know that question has been asked before, but it never has been answered)
There are two basic approaches to this.
You asked about the first: Automating Excel to open and save the file. There are in fact two ways to do that. The second is to use Python tools that can create the file directly in Python without Excel's help. So:
1a: Automating Excel through its automation interface.
Excel is designed to be controlled by external apps, through COM automation. Python has a great COM-automation interface inside of pywin32. Unfortunately, the documentation on pywin32 is not that great, and all of the documentation on Excel's COM automation interface is written for JScript, VB, .NET, or raw COM in C. Fortunately, there are a number of questions on this site about using win32com to drive Excel, so you can probably figure it out yourself. It would look something like this:
import win32com.client
# Start (or attach to) Excel through COM
excel = win32com.client.Dispatch('Excel.Application')
spreadsheet = excel.Workbooks.Open('C:/path/to/spreadsheet.xml')
# 56 is the numeric value of xlExcel8, the 97-2003 binary .xls format
spreadsheet.SaveAs('C:/path/to/spreadsheet.xls', FileFormat=56)
That isn't tested in any way, because I don't have a Windows box with Excel handy. And I vaguely remember having problems getting access to the fileformat names from win32com and just punting and looking up the equivalent numbers (a quick google for "fileformat xlExcel8" shows that the numerical equivalent is 56, and confirms that's the right format for 97-2003 binary xls).
Of course if you don't need to do it in Python, MSDN is full of great examples in JScript, VBA, etc.
The documentation you need is all on MSDN (since the Office Developer Network for Excel was merged into MSDN, and then apparently became a 404 page). The top-level page for Excel is Welcome to the Excel 2013 developer reference (if you want a different version, click on "Office client development" in the navigation thingy above and pick a different version), and what you mostly care about is the Object model reference. You can also find the same documentation (often links to the exact same webpages) in Excel's built-in help. For example, that's where you find out that the Application object has a Workbooks property, which is a Workbooks object, which has Open and Add methods that return a Workbook object, which has a SaveAs method, which takes an optional FileFormat parameter of type XlFileFormat, which has a value xlExcel8 = 56.
As I implied earlier, you may not be able to access enumeration values like xlExcel8 for some reason which I no longer remember, but you can look the value up on MSDN (or just Google it) and put the number 56 instead.
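To make that fallback concrete, here is an untested sketch (I still have no Windows box with Excel at hand) that asks win32com for the named constant and falls back to the literal 56. The function name save_as_xls and the constant XL_EXCEL8 are my own. The gencache.EnsureDispatch call is what makes win32com generate the wrappers that populate win32com.client.constants:

```python
import os

XL_EXCEL8 = 56  # numeric value of XlFileFormat.xlExcel8 (97-2003 .xls)


def save_as_xls(source_path, target_path):
    """Open a file in Excel and save it again as a binary .xls."""
    import win32com.client  # pywin32, Windows only

    excel = win32com.client.gencache.EnsureDispatch("Excel.Application")
    # Use the named constant when it is available, else the plain number.
    file_format = getattr(win32com.client.constants, "xlExcel8", XL_EXCEL8)
    excel.DisplayAlerts = False  # no overwrite or compatibility prompts
    try:
        # Excel resolves relative paths against its own working directory,
        # so hand it absolute paths.
        workbook = excel.Workbooks.Open(os.path.abspath(source_path))
        workbook.SaveAs(os.path.abspath(target_path), FileFormat=file_format)
        workbook.Close(SaveChanges=False)
    finally:
        excel.Quit()
```

This could be called at the end of the script that generates the XML file, or from a .bat file through python -c.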
The other documentation (both here and elsewhere within MSDN) is usually either stuff you can guess yourself, or stuff that isn't relevant from win32com. Unfortunately, the already-sparse win32com documentation expects you to have read that documentation—but fortunately, the examples are enough to muddle your way through almost everything but the object model.
1b: Automating Excel via its GUI.
Automating a GUI on Windows is a huge pain, but there are a number of tools that make it a whole lot easier, such as pywinauto. You may be able to just use swapy to write the pywinauto script for you.
If you don't need to do it in Python, separate scripting systems like AutoIt have an even larger user base and even more examples to make your life easier.
2: Doing it all in Python.
xlutils, part of python-excel, may be able to do what you want, without touching Excel at all.
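I have not checked whether xlutils can read the XML flavour of spreadsheet. A different route, shown here only as a sketch, is to skip the XML step and have the script write the binary .xls itself with xlwt (one of the packages xlutils builds on). The function name and the sample rows are made up:

```python
def write_xls(rows, path, sheet_name="Order"):
    """Write a list of rows to a 97-2003 .xls file, one cell per value."""
    import xlwt  # third-party (pip install xlwt)

    book = xlwt.Workbook()
    sheet = book.add_sheet(sheet_name)
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            sheet.write(row_index, col_index, value)
    book.save(path)
```

For example, write_xls([["part", "qty"], ["R-100", 25]], "order.xls") would produce a two-row sheet. Whether the ordering website accepts the result is something you would have to try.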
I have an Excel file with the following fields:
Software_name, Version and Count.
This file is an inventory of all software installed on an organization's network, generated using LANdesk.
I have another Excel file, a purchase inventory of this software, which is maintained manually.
I need to compare these sheets and create a report stating whether the organization is compliant or not.
How do I compare these two files?
There are cases where Microsoft Office is listed as just "office" and "server" is spelled "svr".
How do I go about it?
The first step as SeyZ mentions is to determine how you want to read these Excel files. I don't have experience with the libraries he refers to. Instead I use COM programming to read and write Excel files, which of course requires that you have Excel installed. This capability comes from PyWin32 which is installed by default if you use the ActiveState Python installer, or can be installed separately if you got Python from Python.org.
The next step would be to convert things into a common format for comparing, or searching for elements from one file within the other. My first thought here would be to load the contents of the LANdesk software inventory into a database table using something quick and easy like SQLite.
Then for each item of the manual purchase list, normalize the product name and search for it within the inventory table.
Normalizing the values would be a process of splitting a name into parts and replacing partial words and phrases with their full versions. For example, you could create a lookup table of conversions:
partial               full
--------------------  --------------------
svr                   server
srv                   server
SRV                   Stevie Ray Vaughan
office                Microsoft Office
etc                   et cetera
You would want to run your manual list data through the normalizing process and add partial values and their full version to this table until it handles all of the cases that you need. Then run the comparison. Here is some Pythonish pseudocode:
compliant, non_compliant = [], []
for row in range(1, last_row + 1):        # each row of the manual inventory worksheet
    product = sh.Cells(row, 1).Value      # get contents of row n, column 1
                                          # adjust based on the structure of this sheet
    parts = product.split(" ")            # split on spaces into a list
    for n, part in enumerate(parts):
        parts[n] = Normalize(part)        # look up part in conversion table
    normalProduct = " ".join(parts)
    if LookupProduct(normalProduct):      # look up normalized name in LANdesk list
        compliant.append(normalProduct)
    else:
        non_compliant.append(normalProduct)
if len(non_compliant) > 0:
    TimeForShopping(non_compliant)
If you have experience with using SQLite or any other database with Python, then creating the LANdesk product table, and the normalize and lookup routines should be fairly straightforward, but if not then more pseudocode and examples would be in order. Let me know if you need those.
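In case it helps, here is one way the table, the normalize routine and the lookup could be sketched with the sqlite3 module from the standard library. The conversion entries, table layout and function names are illustrative assumptions. Matching is case-insensitive here, and immediate word repeats are dropped so that "Microsoft Office" does not turn into "microsoft microsoft office":

```python
import sqlite3

# Hypothetical conversion table; extend it until it covers your data.
CONVERSIONS = {"svr": "server", "srv": "server", "office": "Microsoft Office"}


def normalize(name):
    """Lower-case `name` and expand every known abbreviation in it."""
    words = []
    for word in name.lower().split():
        for full in CONVERSIONS.get(word, word).lower().split():
            if not words or words[-1] != full:  # skip immediate repeats
                words.append(full)
    return " ".join(words)


def build_inventory(rows):
    """Load (software_name, version, count) rows into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE inventory (name TEXT, version TEXT, installed INTEGER)")
    db.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [(normalize(name), version, count) for name, version, count in rows],
    )
    return db


def check_purchases(db, purchased_names):
    """Split purchased products into those found in the inventory and the rest."""
    compliant, non_compliant = [], []
    for name in purchased_names:
        found = db.execute(
            "SELECT 1 FROM inventory WHERE name = ? LIMIT 1", (normalize(name),)
        ).fetchone()
        (compliant if found else non_compliant).append(name)
    return compliant, non_compliant
```

The rows passed to build_inventory would come from the LANdesk worksheet, and purchased_names from the manual one, read with whichever Excel library you settle on.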
There are several libraries for manipulating .xls files.
xlrd lets you extract data from Excel spreadsheet files, so you can easily compare two files. (read)
xlwt lets you create Excel files. (write)
xlutils requires both the xlrd and xlwt packages, so this library lets you read and write easily.
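A small sketch of the read-and-compare part with xlrd. The layout is an assumption (first sheet, a header row, names in the first column and counts in the third), the function names are made up, and recent xlrd versions only read .xls, not .xlsx:

```python
def read_counts(path, name_col=0, count_col=2):
    """Read {software name: total count} from the first sheet of an .xls file."""
    import xlrd  # third-party (pip install xlrd)

    sheet = xlrd.open_workbook(path).sheet_by_index(0)
    counts = {}
    for row in range(1, sheet.nrows):  # row 0 is assumed to be the header
        name = str(sheet.cell_value(row, name_col)).strip().lower()
        count = int(sheet.cell_value(row, count_col) or 0)
        counts[name] = counts.get(name, 0) + count
    return counts


def over_installed(installed, purchased):
    """Return {name: excess} for software installed more often than purchased."""
    return {
        name: count - purchased.get(name, 0)
        for name, count in installed.items()
        if count > purchased.get(name, 0)
    }
```

Calling over_installed(read_counts("landesk.xls"), read_counts("purchases.xls")) with your own file names gives the entries that need attention. The names still have to be spelled the same way in both files for this to work.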