Editing Excel spreadsheets with Python


I'll start off by saying that I'm new to Python. I'm trying to create a simple Q&A application that exports the answers to specific cells of an Excel spreadsheet. I have an existing spreadsheet that I would like to modify and save as a separate output file, leaving the original untouched. I've seen various ways to append to the file, but they all overwrite the original.
As an example, I would like this code:
hq = input('Headquarters: ')
to put the response in cell S1.
Am I way off base trying to use Python for this task? Any help would be greatly appreciated!
-Paul

There may not be a very straightforward solution, but there are a couple of tools which might help you.
The first one is openpyxl: https://openpyxl.readthedocs.org/en/2.0.2/# If you have xlsx files, you should be able to modify them with this.
You might also be able to do what you want with the xlutils module: http://pythonhosted.org/xlutils/index.html However, you'll then need to first read the file, then edit it, and then save it to another file. Formatting may be lost, etc.
Your mileage may vary a lot here because the file format is not well defined, but I'd start with openpyxl.
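A minimal sketch of the openpyxl route, assuming an .xlsx workbook whose active (first) sheet should receive the answer. The function name write_answer and the file names in the usage line are made up for illustration and are not from the question.

```python
from openpyxl import load_workbook


def write_answer(src_path, dst_path, cell, value):
    """Copy src_path to dst_path with `value` written into `cell`.

    The source file is only read; all changes go to dst_path.
    """
    wb = load_workbook(src_path)   # read the existing workbook
    ws = wb.active                 # assumption: the answer goes on the active sheet
    ws[cell] = value               # e.g. cell "S1"
    wb.save(dst_path)              # saving under a new name leaves the original untouched
```

It could be called as write_answer("template.xlsx", "answers.xlsx", "S1", input("Headquarters: ")), where both file names are hypothetical.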

Related

How do I make my program independent of an excel sheet it currently relies on?

I'm designing a program in Python with tkinter that displays information that's currently in an Excel spreadsheet. Ideally, I'd like to be able to share this program without needing to share the Excel workbook as well. Is it possible to import the Excel workbook into the program to make the program independent of that Excel file? Let me know if I can provide any more clarification. Thank you!

You would need to assign the Excel file's content to a variable in your program to do so. I hope it isn't very large, but to make it easier I recommend:
save your Excel file in .csv format
read it from Python
convert it to the data type you want (e.g. string, list, ...)
save it back to a .txt file
Now just copy the content of your .txt file and assign it to a variable somewhere in your code.
Edit: what Alphy13 said is also a solution
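The end result of the steps above might look roughly like this sketch, which uses only the standard library. The column names and rows in EMBEDDED_CSV are invented placeholder data standing in for whatever the CSV export contains, and load_embedded_rows is a hypothetical helper name.

```python
import csv
import io

# Placeholder data: paste the text of the exported .csv file here.
EMBEDDED_CSV = """name,position
Alice,Engineer
Bob,Analyst
"""


def load_embedded_rows(text=EMBEDDED_CSV):
    """Parse the embedded CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))
```

The program then reads its data from load_embedded_rows() instead of opening the workbook, so only the .py file needs to be shared.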

Typically it's best to create an example workbook that you can share. Give it all of the right parameters and leave out the sensitive data, or fill it in with fake data.
You could also give every variable that comes from the Excel file a default value that only changes when the workbook is present. This can be the first step toward creating proper error handling for your program.
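A rough sketch of the default-value idea, assuming the data lives on the workbook's active sheet and openpyxl is used to read it. DEFAULT_ROWS and load_rows are hypothetical names, and the default rows are fake data.

```python
from pathlib import Path

from openpyxl import load_workbook

# Fake fallback data used when the workbook is not shipped with the program.
DEFAULT_ROWS = [("Sample item", 1), ("Another item", 2)]


def load_rows(path):
    """Return the rows of the workbook at `path`, or the defaults if it is missing."""
    if not Path(path).is_file():
        return list(DEFAULT_ROWS)
    ws = load_workbook(path).active
    # values_only=True yields plain tuples of cell values instead of Cell objects
    return list(ws.iter_rows(values_only=True))
```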

How to use Python to automate the movement of data between two Excel workbooks with specific parameters

Thanks for taking the time to read my question.
I am working on a personal project to learn Python scripting for Excel, and I want to learn how to move data from one workbook to another.
In this example, I am emulating a company employee ledger that has name, position, address, and more (the ledger is organized by row, so every employee takes up one row). The project is to transfer a selected number of people to a new ledger (another Excel file). I have a list of emails in a .txt file (it could even be another Excel file, but I thought .txt would be easier), and I want the script to run through the .txt file, get the emails, and look for any rows with a matching email address (all emails are in column B). If any are found, it should copy the entire row to the new Excel file.
I tried a lot of ways to make this work, but I could not figure it out. I am really new to Python, so I am not even sure if this is possible. I would really appreciate some help!

There are essentially two packages for manipulating Excel files. For reading in data and performing analysis, the standard package is pandas. You can save the results as .xlsx, but you are only really working with the table data and not the file itself (i.e., you are extracting data FROM the file, not working WITH the file).
What you need, however, is to manipulate Excel files directly, which is better done with openpyxl.
You can also read files (such as your text file) with Python's built-in open function in a with statement; it is not a third-party import like pandas or openpyxl.
Part of learning to program includes learning how to use documentation.
As such, here is the documentation you require with sufficient examples to learn openpyxl: https://openpyxl.readthedocs.io/en/stable/
And you can learn about pandas here: https://pandas.pydata.org/docs/user_guide/index.html
And you can learn about python with open here: https://docs.python.org/3/tutorial/inputoutput.html
Hope this helps.
EDIT: It's possible I or another person can give you a specific example using your data / code etc, but you would have to provide it fully. Since you're learning, I suggest using the documentation or youtube.
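For orientation, here is one possible sketch that combines the pieces mentioned above. It assumes that the ledger is on the active sheet, that row 1 is a header row, that emails are in column B, and that the .txt file has one email per line. The function name copy_matching_rows and those layout assumptions are mine, not taken from the question. It copies cell values only, not formatting.

```python
from openpyxl import Workbook, load_workbook


def copy_matching_rows(ledger_path, emails_path, out_path):
    """Copy ledger rows whose column-B email is listed in emails_path.

    Returns the number of data rows copied.
    """
    # Read the wanted emails with the built-in open(), one per line.
    with open(emails_path, encoding="utf-8") as f:
        wanted = {line.strip().lower() for line in f if line.strip()}

    src_ws = load_workbook(ledger_path).active
    out_wb = Workbook()
    out_ws = out_wb.active

    copied = 0
    for i, row in enumerate(src_ws.iter_rows(values_only=True)):
        if i == 0:
            out_ws.append(row)  # assumption: first row is a header, keep it
            continue
        email = row[1] if len(row) > 1 else None  # column B is index 1
        if isinstance(email, str) and email.strip().lower() in wanted:
            out_ws.append(row)
            copied += 1

    out_wb.save(out_path)
    return copied
```

The comparison is case-insensitive on purpose, since email lists are often typed inconsistently; drop the .lower() calls if exact matching is wanted.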

Any way to save format when importing an excel file in Python?

I'm doing some work on the data in an Excel sheet using pandas. When I write and save the data, it seems that pandas only cares about the raw data on import, meaning a lot of things I really want to keep, such as cell colouring, font size, borders, etc., get lost. Does anyone know of a way to make pandas save such things?
From what I've read so far, it doesn't appear to be possible. The best solution I've found so far is to use xlsxwriter to format the file in my code before exporting. This seems like a very tedious task that will involve a lot of testing to figure out how to achieve the various formats and aesthetic changes I need. I haven't found anything so far, but is there any way for that writer to preserve the sheet's formatting on import?
Alternatively, what would you suggest I do to solve the problem that I have described?

Separate data from formatting. Have a sheet that contains only the data – that's the one you will be reading/writing to – and another that has formatting and reads the data from the first sheet.
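One way this could look in code, sketched with openpyxl under the assumption that the workbook has a plain sheet named "Data" and that the formatted sheet only references it through formulas. The sheet name and the function name replace_data_sheet are hypothetical. openpyxl keeps the cell styles it can read when re-saving, but it does not necessarily preserve every Excel feature, so test this on a copy first.

```python
from openpyxl import load_workbook


def replace_data_sheet(path, rows, sheet_name="Data"):
    """Rewrite only the data sheet, leaving the other sheets as loaded."""
    wb = load_workbook(path)
    position = wb.index(wb[sheet_name])   # remember where the data sheet sits
    wb.remove(wb[sheet_name])             # drop the old data
    ws = wb.create_sheet(sheet_name, position)
    for row in rows:
        ws.append(list(row))              # write the new raw values
    wb.save(path)
```

The rows argument could come from a pandas DataFrame, for example via df.values.tolist(), so the data wrangling stays in pandas while the formatted sheet is never rewritten by it.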

Modifying and creating xlsx files with Python, specifically formatting single words within a cell (e.g. in a sentence)

I work a lot with Excel xlsx files, which I convert into pandas DataFrames using Python 3, wrangle with pandas, and finally write back into xlsx files.
The files also contain text data which may be formatted. While most modifications (which I have done) have been pretty straightforward, I run into problems when it comes to partly formatted text within a single cell:
Example of cell content: "Medical device with remote control and a Bluetooth module for communication"
The formatting in the example is bold and italic but may also be a color.
So, I have two questions:
Is there a way of preserving such formatting in xlsx files when importing the file into a Python environment?
Is there a way of creating/modifying such formatting using a specific python library?
So far I have been using pandas, openpyxl, and XlsxWriter but have not succeeded yet, so I would appreciate your help!
As pointed out below in a comment and in the linked question, openpyxl does not allow for this kind of formatting.
Any other ideas on how to tackle my task?

I have been working with openpyxl recently. Generally, if a whole cell has the same style (font/color), you can get the style from cell.font: cell.font.b means bold, cell.font.i means italic, and cell.font.color contains the color object.
But if the style differs within one cell, this does not help; there is only a minor indication in cell.value.
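A small sketch of reading those attributes, assuming the cell is on the active sheet. The helper name describe_cell_font is made up. The color is only returned when it is stored as an RGB value; theme or indexed colors are reported as None here.

```python
from openpyxl import load_workbook


def describe_cell_font(path, cell_ref):
    """Return the cell-level bold/italic/color settings of one cell."""
    ws = load_workbook(path).active
    font = ws[cell_ref].font
    color = font.color
    rgb = color.rgb if color is not None and color.type == "rgb" else None
    return {"bold": bool(font.b), "italic": bool(font.i), "color": rgb}
```

This reports the style applied to the whole cell only, which matches the limitation described above for cells with mixed formatting.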

xlsx file extension not valid after saving with openpyxl and keep_vba=True. Which is the best way?

In our environment, we have an Excel file which includes raw data in one sheet and a pivot table and charts in another sheet.
I need to append rows to the raw data every day, automatically, using a Python job.
I am not sure, but there may be some VB script running on the front end which refreshes the pivot tables.
I used openpyxl and, by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=True while loading the workbook to keep the VBA modules inside and enable pivoting. But after saving the workbook, MS Office no longer opens the xlsx file and says the format or the extension is not valid. I can see the data using Python, but with Office it's not working anymore. If I don't use keep_vba=True, then pivoting is not working and only the previous values are present (of course, as I understood it, the VBA script is needed for pivoting).
Could you explain to me what's happening? I am new to Python and don't know its concepts well.
How can I fix this in openpyxl, or is there a better alternative to openpyxl? Data connections in MS Office are not an option for me.
As I understood it, xlsx may need special modules to save the VB script in the same way as it would be saved using MS Office. If so, then what is the purpose of keep_vba=True?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thank you!

You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.
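Putting both suggestions together, a sketch might tie keep_vba to the file extension so that it is only switched on for .xlsm files. The function name append_rows and its parameters are hypothetical, and this has not been checked against the asker's workbook, so try it on a copy first.

```python
from openpyxl import load_workbook


def append_rows(path, rows, sheet_name):
    """Append rows to one sheet and save the workbook back to `path`."""
    # Only ask openpyxl to keep VBA when the file is macro-enabled (.xlsm);
    # for a plain .xlsx this stays False, as suggested above.
    macro_enabled = path.lower().endswith(".xlsm")
    wb = load_workbook(path, keep_vba=macro_enabled)
    ws = wb[sheet_name]
    for row in rows:
        ws.append(list(row))
    wb.save(path)  # same name, so the extension matches how the file was loaded
```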
