I created an Excel spreadsheet using pandas and xlsxwriter, and it has all the data in the right rows and columns. However, the formatting in xlsxwriter is pretty basic, so I want to solve this problem by writing my pandas spreadsheet on top of a template spreadsheet with openpyxl.
First, however, I need to get openpyxl to import data only up to the first blank row, and to drop the column headings. That way I could write my Excel data from the xlsxwriter output into the template.
I have no clue how to go about this and can't find it here or in the docs. Any ideas?
How about if I want to read data from the first column after the first blank column? (I can think of a workaround for this, but it would help if I knew how)
To be honest, I'd be tempted to suggest you use openpyxl all the way if there is something that xlsxwriter doesn't do, though I think that its formatting options are pretty extensive. The most recent version of openpyxl is as fast as xlsxwriter if lxml is installed.
However, it's worth noting that pandas has tended to support only an older version of openpyxl because we changed the style API.
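If you stay with xlsxwriter, a fair amount of formatting can be applied after pandas writes the data, by reaching into the underlying xlsxwriter workbook and worksheet objects. A minimal sketch, assuming pandas and xlsxwriter are installed; the function name write_formatted, the sheet name "Report" and the chosen formats are illustrative, not taken from the question:

```python
import pandas as pd


def write_formatted(df: pd.DataFrame, target, sheet_name: str = "Report") -> None:
    """Write df with pandas, then style it through the xlsxwriter objects."""
    with pd.ExcelWriter(target, engine="xlsxwriter") as writer:
        # Write the data one row down and without pandas' own header,
        # so that row 0 is free for a custom-styled header.
        df.to_excel(writer, sheet_name=sheet_name, index=False,
                    startrow=1, header=False)

        workbook = writer.book                 # the xlsxwriter Workbook
        worksheet = writer.sheets[sheet_name]  # the xlsxwriter Worksheet

        header_fmt = workbook.add_format(
            {"bold": True, "bg_color": "#DDEBF7", "border": 1}
        )
        # Write the column headings ourselves so they use header_fmt.
        for col, name in enumerate(df.columns):
            worksheet.write(0, col, name, header_fmt)

        # Widen every used column and keep the header visible when scrolling.
        worksheet.set_column(0, len(df.columns) - 1, 18)
        worksheet.freeze_panes(1, 0)
```

Usage: write_formatted(pd.DataFrame({"name": ["a", "b"], "value": [1, 2]}), "report.xlsx"). The target can be a path or an in-memory buffer such as io.BytesIO.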
Otherwise you can use max_row to get the highest row but this won't check for an empty row.
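Since max_row only reports the highest used row, one option is to walk the rows yourself and stop at the first completely blank one, skipping the header row. A minimal sketch, assuming a worksheet loaded with openpyxl's load_workbook in normal (not read_only) mode, because iter_cols is not available on read-only worksheets; the helper names are hypothetical. The second helper covers the follow-up about the first column after the first blank column:

```python
def is_blank(values):
    """True when every value is None or a whitespace-only string."""
    return all(v is None or (isinstance(v, str) and not v.strip()) for v in values)


def rows_until_blank(ws, skip_header=True):
    """Yield row tuples, stopping at the first completely blank row.

    With skip_header=True, iteration starts at row 2, so the column
    headings in row 1 are left out.
    """
    start = 2 if skip_header else 1
    for row in ws.iter_rows(min_row=start, values_only=True):
        if is_blank(row):
            break
        yield row


def first_column_after_blank(ws):
    """Return the 1-based index of the first non-blank column that follows
    the first completely blank column, or None if there is none."""
    seen_blank = False
    for idx, col in enumerate(ws.iter_cols(values_only=True), start=1):
        if is_blank(col):
            seen_blank = True
        elif seen_blank:
            return idx
    return None
```

Usage: ws = openpyxl.load_workbook("data.xlsx").active, then data = list(rows_until_blank(ws)).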
I've switched over to Jupyter Lab recently, and discovered that pd.read_excel() now requires engine='openpyxl' in its arguments to avoid a known error when it defaults to xlrd. Unfortunately, openpyxl as an engine is introducing issues that none of my previous code accounted for.
In particular, it appears to append rows of NaN values to the end of dataframes when I import an xlsx file. I'm aware of the issue where blank rows at the start of an Excel sheet get pushed to the end of the import, and that's not the case here. I have an Excel file with multiple tabs, 16 unique column headers in the first row of each tab (identical between tabs), and every row filled with data. Previously, in Jupyter Notebook (and without engine='openpyxl'), read_excel() with sheet_name=None would create a dictionary of dataframes from each tab, reading no additional rows beyond the end of the data. Now, I get upwards of one thousand blank rows at the end of some of the dataframes.
I'm not looking forward to going through all of my old code and adding dropna(how='all') to every import, and I'm afraid that this might be indicative of a larger issue I'm not catching. Has anyone experienced something similar? Below are the import in Jupyter Lab of one of the tabs in question, as an example, and the Excel sheet for that tab itself, with no data beyond row 5226.
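To avoid editing every call site, one option is to wrap read_excel once and strip rows in which every column is NaN. A minimal sketch, assuming those all-NaN rows are the only unwanted artifact; it does not explain why openpyxl reports them, and the helper names are hypothetical:

```python
import pandas as pd


def drop_blank_rows(frames):
    """Drop rows where every column is NaN from each DataFrame in a dict."""
    return {
        name: df.dropna(how="all").reset_index(drop=True)
        for name, df in frames.items()
    }


def read_all_sheets(path):
    """Read every tab into a dict of DataFrames and strip all-NaN rows."""
    frames = pd.read_excel(path, sheet_name=None, engine="openpyxl")
    return drop_blank_rows(frames)
```

Replacing each pd.read_excel(path, sheet_name=None, ...) call with read_all_sheets(path) keeps the cleanup in one place.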
Thanks for the help!
With pandas 1.1.4 you needed engine="openpyxl", but with pandas 1.2.4 you do not need the engine parameter.
So an upgrade might be worthwhile, but I'm not sure it will fix your issue.
pip install pandas --upgrade
To check which versions you have installed:
import pandas as pd
import openpyxl
print(pd.__version__)
print(openpyxl.__version__)
Suppose I have an Excel sheet already open, make some changes in the file, and use pd.read_excel to create a dataframe from that sheet. I understand that the dataframe will only reflect the data in the last saved version of the Excel file, so I would have to save the sheet first for the pandas dataframe to take the change into account.
Is there any way for pandas or another Python package to read an open Excel file and refresh its data in real time (without saving or closing the file)?
Have you tried the mitosheet package? It doesn't answer your question directly, but it lets you work on pandas dataframes as you would in Excel sheets. That way, you can edit the data on the fly as in Excel and still get a pandas dataframe as a result (while it generates the Python code that performs the same operations). Does this help?
There is no way to do this. The table is not saved to disk, so pandas cannot read it from disk.
Be careful not to over-engineer. That being said:
Depending on your use case, if this is really needed, I could theoretically imagine a Robotic Process Automation (RPA) tool such as Blue Prism, UiPath or Power Automate continuously loading live data from Excel into a Python environment with a pandas DataFrame and then changing it.
This use case would have to be a really important process though, otherwise licensing RPA is not worth it here.
import pandas as pd
df = pd.read_excel("path")
If you run the program in the Spyder IDE, you can see the data in the Variable Explorer.
I work a lot with Excel xlsx files, which I convert into pandas dataframes using Python 3; I wrangle the data with pandas and finally write the modified data back into xlsx files.
The files also contain text data, which may be formatted. While most modifications (which I have done) have been pretty straightforward, I run into problems when it comes to partly formatted text within a single cell:
Example of cell content: "Medical device with remote control and a Bluetooth module for communication"
The formatting in the example is bold and italic but may also be a color.
So, I have two questions:
Is there a way of preserving such formatting in xlsx files when importing the file into a Python environment?
Is there a way of creating/modifying such formatting using a specific python library?
So far I have been using pandas, OpenPyxl, and XlsxWriter but have not succeeded yet, so I would appreciate your help!
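For creating such formatting, XlsxWriter has write_rich_string(), which writes one cell made of several text fragments, each optionally preceded by its own format. XlsxWriter can only write files, so this does not help with reading existing formatting. A minimal sketch; the function name and the chosen formats are illustrative:

```python
import xlsxwriter


def write_rich_cell(target):
    """Write one cell whose text mixes plain, bold, italic and red runs.

    Returns XlsxWriter's status code for write_rich_string (0 means success).
    """
    workbook = xlsxwriter.Workbook(target, {"in_memory": True})
    worksheet = workbook.add_worksheet()

    bold = workbook.add_format({"bold": True})
    italic = workbook.add_format({"italic": True})
    red = workbook.add_format({"font_color": "red"})

    # Fragments are plain strings, or a format immediately followed by
    # the string it applies to.
    status = worksheet.write_rich_string(
        "A1",
        "Medical device with ",
        bold, "remote control",
        " and a ",
        italic, "Bluetooth",
        " module for ",
        red, "communication",
    )
    worksheet.set_column("A:A", 70)  # make the cell wide enough to read
    workbook.close()
    return status
```

Usage: write_rich_cell("rich.xlsx") creates the file; an io.BytesIO object also works as the target.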
As pointed out in a comment below and in the linked question, OpenPyxl does not allow for this kind of formatting.
Any other ideas on how to tackle my task?
I have recently been working with openpyxl. Generally, if a whole cell has the same style (font/color), you can get the style from cell.font: cell.font.b means bold, cell.font.i means italic, and cell.font.color contains a color object.
But if the style varies within one cell, this does not help; cell.value gives only a minor indication.
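A minimal sketch of reading the cell-level font described above with openpyxl. describe_font and roundtrip_example are hypothetical helpers; the color is only reported when it is stored as an explicit RGB value (theme or indexed colors return None here), and mixed formatting inside one cell is not captured:

```python
import io

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font


def describe_font(cell):
    """Summarise the font that applies to the whole cell."""
    font = cell.font
    color = font.color
    # A Color object records whether it is an rgb, theme, indexed or auto color.
    rgb = color.rgb if color is not None and color.type == "rgb" else None
    return {"bold": bool(font.b), "italic": bool(font.i), "rgb": rgb}


def roundtrip_example():
    """Save a styled cell to memory, reload it and describe its font."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = "Medical device with remote control"
    ws["A1"].font = Font(bold=True, italic=True, color="FFFF0000")
    buf = io.BytesIO()
    wb.save(buf)
    buf.seek(0)
    return describe_font(load_workbook(buf).active["A1"])
```

roundtrip_example() should report bold and italic as True and the color as "FFFF0000".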
I do a lot of data analysis in Excel and have been exploring Python and DataNitro to streamline my workflow. Specifically, I am trying to copy certain cells from one sheet in one Excel workbook and paste them into certain cells in a certain sheet in another Excel workbook.
I have been storing ("copying") the cells using CellRange (DataNitro), but am not sure how to copy the stored contents into a particular sheet in another Excel workbook. Any clue how I may go about this? Also, is it possible to make the range defined for a CellRange conditional on certain cell properties?
I would really appreciate any help! Thank you, all.
Here's an example of copying:
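data = CellRange("A1:A10").value  # read the values from the current workbook
active_wkbk("Book2.xlsx")         # switch the active workbook to Book2.xlsx
CellRange("A1:A10").value = data  # write the values into the same cells there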
You can make the range conditional using regular Python logic (if statements, etc.).
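As an illustration of that plain-Python logic, the hypothetical helper below builds an A1-style address covering the leading run of values that pass a test (by default, non-empty cells). It is untested against DataNitro itself and assumes you have already read a candidate block of values, for example with CellRange("A1:A100").value as in the answer above:

```python
def column_range(column, values, first_row=1, keep=lambda v: v not in (None, "")):
    """Return an address such as "A1:A7" covering the leading run of values
    for which keep(value) is true, or None if the first value fails.

    `values` are the cell values of one column, starting at first_row.
    """
    count = 0
    for v in values:
        if not keep(v):
            break  # stop at the first cell that fails the condition
        count += 1
    if count == 0:
        return None
    return f"{column}{first_row}:{column}{first_row + count - 1}"
```

In a DataNitro script this might be used as addr = column_range("A", CellRange("A1:A100").value) followed by data = CellRange(addr).value; the keep argument lets you swap in any other condition, for example keep=lambda v: v is not None and v >= 0.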
I create new workbooks with xlsxwriter. Each of them needs a formatted header sheet, which is stored in another template workbook. I know this is impossible with xlsxwriter alone, because xlsxwriter cannot open the template workbook.
I thought of reading the template with xlrd, copying this sheet, and then writing it with xlsxwriter to the newly created workbook.
But is that possible? Can I use a combination of those two libraries?
I know this question doesn't include any code, but I'm new to Python, and I would be grateful for any advice on how to deal with my problem.
xlrd and xlsxwriter aren't really designed to work together. Consider switching to the openpyxl library, which allows both reading and writing of spreadsheets and might let you do what you need quite easily.
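A minimal sketch of that openpyxl route: load the template, write values below its formatted header, and save the result under a new name so the template itself is never overwritten. The function name, the start row and the default of using the active sheet are assumptions. Because only cell values are written, the template's cell formatting should carry over, although openpyxl does not preserve every Excel feature (images and charts in an existing file, for example, are not kept):

```python
from openpyxl import load_workbook


def fill_template(template_path, output_path, rows, sheet_name=None, start_row=2):
    """Write rows of values into a copy of a formatted template.

    rows: iterable of sequences, one per spreadsheet row.
    start_row: first row to fill (row 1 is assumed to hold the header).
    """
    wb = load_workbook(template_path)
    ws = wb[sheet_name] if sheet_name else wb.active
    for r_offset, row in enumerate(rows):
        for c_idx, value in enumerate(row, start=1):
            ws.cell(row=start_row + r_offset, column=c_idx, value=value)
    wb.save(output_path)  # a new file; the template stays untouched
```

Usage: fill_template("template.xlsx", "report_2024.xlsx", [("a", 1), ("b", 2)]). Rows from a pandas DataFrame can be passed as df.itertuples(index=False).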