I get data measurements from instruments. These measurements depend on several parameters, and a pivot table is a good way to represent the data. Every measurement can be associated with a scope screenshot to make it more explicit. I get all the data in the following CSV format:
The number of measurements and parameters can change.
I am trying to write a Python script (for now with the Pandas library) that creates a pivot table in Excel. With Pandas, I can color the data that falls in and out of a defined range. However, I would also like to add a link to every cell that takes me to the corresponding screenshot. This is where I am stuck.
I would like a result like the following (but with the link to the corresponding screenshot):
Actually, I found a way to add the link to all the cells, using the =HYPERLINK() Excel function together with the Pandas apply() function.
However, I can then no longer apply conditional formatting with XlsxWriter, because the cells no longer have numerical content.
I could apply the conditional formatting first and then iterate through the whole sheet to add the links, but it would be a mess to retrieve the relation between the data and the corresponding parameter measurements.
I would appreciate any ideas and efficient ways to achieve this.
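For reference, here is a minimal sketch of that formula approach (the column names and screenshot paths are made up):

```python
import pandas as pd

# Hypothetical pivoted measurements: one column per parameter.
df = pd.DataFrame({"param_a": [1.2, 3.4], "param_b": [5.6, 7.8]})

def make_link(value):
    # The screenshot path is derived from the value here only for
    # illustration; in practice it would come from the CSV.
    return f'=HYPERLINK("screenshots/{value}.png", "{value}")'

# Element-wise: apply() over the columns, then map() over each Series.
linked = df.apply(lambda col: col.map(make_link))
print(linked.iloc[0, 0])  # =HYPERLINK("screenshots/1.2.png", "1.2")
```

This is exactly what breaks the conditional formatting: every cell now holds a text formula instead of a number.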
XlsxWriter has a function called write_url(), but you must apply write_url() first, while creating the new worksheet, and then use openpyxl to insert your Pandas DataFrame:
1) create the worksheet and insert the links with write_url();
2) use openpyxl to write the data into the already formatted cells.
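A variant that stays within a single library: openpyxl lets you set a cell's numeric value and its hyperlink independently, so conditional formatting still sees numbers. A hedged sketch (the file name, screenshot paths, and threshold are assumptions):

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill

df = pd.DataFrame({"param_a": [1.2, 9.9], "param_b": [5.6, 7.8]})
df.to_excel("measurements.xlsx", index=False)  # assumed output file

wb = load_workbook("measurements.xlsx")
ws = wb.active

# Attach a hyperlink to every data cell while keeping the numeric value.
for row in ws.iter_rows(min_row=2):  # row 1 holds the headers
    for cell in row:
        # Screenshot path derived from the coordinate: an assumption.
        cell.hyperlink = f"screenshots/{cell.coordinate}.png"

# Conditional formatting still works because the values stay numeric.
red = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
ws.conditional_formatting.add(
    "A2:B3",
    CellIsRule(operator="notBetween", formula=["0", "8"], fill=red),
)
wb.save("measurements.xlsx")
```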
I'm having trouble finding a solution to fill out an Excel template using Python. I currently have a pandas DataFrame and use openpyxl to write the necessary data to specific rows and cells in a for loop. The issue is that in my next project several of the cells I have to write are not contiguous, so for example instead of going A1, A2, A3 it can go A1, A5, A9. If I were to list the cells like I did in the past, it would be impractical.
So I was looking for something that works like a VLOOKUP in Excel: Python would match the necessary row and column in the template and drop the information there. I know I might need to use different commands.
I added a picture below as an example. I need to drop values into the empty cells; ideally Python would read "USA" and "Revenue" and know to drop that information into cell B2. I know I might also need something to map it; I am just not sure how to start or whether it is even possible.
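A rough sketch of that lookup idea with openpyxl (the template layout, labels, and values here are assumptions standing in for the real file):

```python
from openpyxl import Workbook

# Tiny stand-in for the template: countries down column A,
# metrics across row 1.
wb = Workbook()
ws = wb.active
ws["B1"], ws["C1"] = "Revenue", "Cost"
ws["A2"], ws["A3"] = "USA", "Canada"

def find_cell(ws, row_label, col_label):
    """VLOOKUP-style: locate the cell at the intersection of a row
    label (column A) and a column label (row 1)."""
    col = next(c.column for c in ws[1] if c.value == col_label)
    row = next(c.row for c in ws["A"] if c.value == row_label)
    return ws.cell(row=row, column=col)

# "USA" + "Revenue" resolves to B2, no hard-coded coordinates needed.
find_cell(ws, "USA", "Revenue").value = 1000
print(ws["B2"].value)  # 1000
```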
I am using Pandas for Excel manipulation.
I am creating hyperlinks that take me from one cell to another.
After creating the hyperlinks, my cells contain data like:
=HYPERLINK("#Sheet1!A20", "Dog")
=HYPERLINK("#Sheet1!B20", "Cat")
After creating the hyperlinks, I need to compare the values of the cells.
Here, for example, I want to check whether Dog is equal to Cat.
But I am not able to access the values of the cells (Dog, Cat).
Is there a way I can access the value of a cell for comparison using Pandas?
No, you cannot do what you are wanting to do.
Any calculations must be carried out in Python and not attempted through the Excel worksheet.
This is because behind the scenes Pandas is using the Openpyxl library to manipulate any Excel content, and neither Openpyxl nor Pandas has a copy of the Excel formula resolution engine to generate the output of any Excel formulas.
As commented in the documentation: "data_only controls whether cells with formulae have either the formula (default) or the value stored the last time Excel read the sheet." [Emphasis mine]
If the file has never been opened in Excel, no calculated value will be available, and only the formula can be accessed.
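This is easy to demonstrate: write a formula with openpyxl, then reload the file with and without data_only (the file name is illustrative):

```python
from openpyxl import Workbook, load_workbook

wb = Workbook()
wb.active["A1"] = "=1+1"
wb.save("formula_demo.xlsx")

# Reload asking for values: Excel has never opened this file, so no
# cached result exists and openpyxl returns None.
cached = load_workbook("formula_demo.xlsx", data_only=True)
print(cached.active["A1"].value)  # None

# Reload normally: only the formula string itself is available.
raw = load_workbook("formula_demo.xlsx")
print(raw.active["A1"].value)  # =1+1
```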
Essentially this question is a duplicate of this: Read Excel cell value and not the formula computing it -openpyxl
If your goal is to compare the values of two cells ("A20" and "B20") and you don't actually care about the formula (you were just following "Excel-think"), then that is a very different question and you need a different approach: compare your original source data, whether that is a Pandas DataFrame, Excel cells, etc.
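For example, do the comparison on the source values before they are turned into hyperlink formulas (the column names here are made up):

```python
import pandas as pd

# Hypothetical source data that will later become HYPERLINK formulas.
df = pd.DataFrame({"animal_a": ["Dog"], "animal_b": ["Cat"]})

# Compare here, in Python, not in the generated worksheet.
same = df["animal_a"].eq(df["animal_b"])
print(same.iloc[0])  # False

# Only afterwards build the formulas for Excel.
links = df["animal_a"].map(lambda v: f'=HYPERLINK("#Sheet1!A20", "{v}")')
```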
I'm new to the community and I only recently started to use Python and more specifically Pandas.
In my data set I would like the columns to be the dates. For each date I would like a customer list that then breaks down into more specific row elements. Everything would be rolled up by order number, specifically a distinct count on the order number, because sometimes a client purchases more than one item. In Excel I create a pivot table and process it by distinct order, then sort each row element by the distinct count of the order number. I collapse each row down until I just have the client name; if I click to expand the cell, I see each row element.
So my question: if I'm pulling these huge data sets in as a DataFrame, can I pull the xlsx in as an array? I know it will strip the values, so I would have to set the dates as datetime64 elements. I've been trying to reshape the array so that the dates become the columns with the rows I want, but so far I haven't had luck. I have tried pivot_table and groupby with some success, but I wasn't able to move the date to the columns.
Summary: overall, what I'm looking to know is whether I'm going down the wrong rabbit hole altogether. I'm looking to create a collapsible pivot table with specific color parameters so that the automated spreadsheet will look identical to the one I currently build by hand.
I really appreciate any help; as I said, I'm brand new to Pandas, so direction is key, especially on the best way of handling the export to Excel after I've imported and modified the data. I get a single sheet of raw data kicked out in .xlsx form. Thanks again!
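A sketch of the distinct-count pivot described above, with the dates moved to the columns (the column names and sample rows are assumptions about the raw export):

```python
import pandas as pd

# Hypothetical raw export: one row per line item, so an order can
# appear several times when a client buys more than one item.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"]),
    "client": ["Acme", "Acme", "Beta"],
    "order_id": [101, 101, 102],
})

# Distinct count of orders per client, with one column per date.
pivot = df.pivot_table(
    index="client",
    columns="date",
    values="order_id",
    aggfunc="nunique",
    fill_value=0,
)
print(pivot)
```

Here aggfunc="nunique" is the pandas equivalent of the Excel pivot's distinct count: Acme's two line items on 2023-01-01 collapse to a single order.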
I have got a spreadsheet in which some cells are marked as Input cells. I would like to extract only those cells into a Python variable using, for example, the read_excel() function from pandas.
Is this possible at all?
Sure, if you know beforehand where they are, you can specify which columns to read with the usecols parameter (called parse_cols in older pandas versions). But reading through the pandas.read_excel docs, it doesn't look like you can programmatically select particular cells within the function call.
However, you could always read in everything and then discard what you don't need, based on how the Input cells are represented in the DataFrame. Without an example it is hard to guess exactly how to do this, but pandas is well suited to this type of data cleaning.
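For example, if the Input cells are flagged by a marker column (an assumption about the layout; the file and column names are made up), you could filter after reading:

```python
import pandas as pd

# Fabricate a workbook standing in for the real spreadsheet, where a
# "type" column marks which rows hold Input cells.
pd.DataFrame({
    "type": ["Input", "Output", "Input"],
    "value": [10, 20, 30],
}).to_excel("inputs_demo.xlsx", index=False)

# Read everything, then keep only the marked rows.
df = pd.read_excel("inputs_demo.xlsx")
inputs = df.loc[df["type"] == "Input", "value"].tolist()
print(inputs)  # [10, 30]
```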
I am writing a program that will process a bunch of data and fill a column in Excel. I am using openpyxl, strictly in write-only mode. Each column will have a fixed size of 75 cells, and each cell in the row will have the same formula applied to it. However, I can only process the data one column at a time; I cannot process an entire row and then iterate through all of the rows.
How can I write to a column, then move onto the next column once I have filled the previous one?
This is a rather open-ended question, but may I suggest using Pandas. Without an example of what you are trying to achieve it's difficult to make a specific recommendation, but I have used pandas extensively in the past for automating the processing of Excel files. Basically you would load your data into a Pandas DataFrame, do your transformations/calculations, and when you are done write it back to either the same or a new Excel file (or a number of other formats).
Because the OOXML file format is row-oriented, you must write in rows in write-only mode; it is simply not possible otherwise.
What you might be able to do is create some kind of transitional object that you fill with columns and then use to write to openpyxl. A Pandas DataFrame would probably be suitable for this, and openpyxl supports converting DataFrames into rows.
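A sketch of that transitional-object idea: collect the columns in a DataFrame, then stream it row by row into a write-only worksheet (column names, values, and the file name are placeholders, and the columns are trimmed to 3 cells instead of 75 to keep the example short):

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Collect the data column by column, as the processing requires.
columns = {}
for name in ("col_a", "col_b"):
    columns[name] = [1, 2, 3]  # stand-in for the per-column processing

df = pd.DataFrame(columns)

# Write-only mode only appends whole rows, so let openpyxl convert
# the DataFrame into rows for us.
wb = Workbook(write_only=True)
ws = wb.create_sheet()
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)
wb.save("columns_demo.xlsx")
```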