I have a pivot table in Excel, and I want to read the raw data behind that table into Python. Is this possible? I don't see anything about it in the documentation or on Stack Overflow.
If the community could provide some examples of how to read the raw data that drives pivot tables, it would greatly help with routine analytical tasks.
EDIT:
In this scenario there are no raw data tabs. I want to know how to query the pivot table, get the raw data, and read it into Python.
First, recreate the raw data from the pivot table. Excel keeps enough information with the pivot table to rebuild the raw data.
Make sure that none of the items in the pivot table fields are hidden -- clear all the filters and Slicers that have been applied.
The pivot table does not need to contain all the fields -- just make sure that there is at least one field in the Values area.
Show the grand totals for rows and columns. If the totals aren't visible, select a cell in the pivot table, and on the Ribbon, under PivotTable Tools, click the Design tab. In the Layout group, click Grand Totals, then click On for Rows and Columns.
Double-click the grand total cell at the bottom right of the pivot table. This should create a new sheet with the related records from the original source data.
Then you can read the raw data from that new sheet into Python.
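Once Excel has created the drill-down sheet and the workbook is saved, pandas can read it like any other sheet. A minimal sketch, assuming the records start at cell A1 (which is how Excel normally lays out a drill-down sheet) and using "Sheet2" only as a hypothetical sheet name; reading .xlsx files with pandas also needs openpyxl installed.

```python
import pandas as pd

def tidy_drilldown(df):
    """Drop fully empty rows and strip stray whitespace from column names."""
    df = df.dropna(how="all")
    df.columns = [str(col).strip() for col in df.columns]
    return df

def read_drilldown(path, sheet_name="Sheet2"):
    """Read the sheet Excel creates when you double-click the grand total.

    "Sheet2" is a hypothetical default; pass the name Excel gave the new sheet.
    """
    raw = pd.read_excel(path, sheet_name=sheet_name)  # first row becomes the header
    return tidy_drilldown(raw)

# Usage (hypothetical file name):
# df = read_drilldown("report.xlsx", sheet_name="Sheet2")
```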
I am trying to extract a table like this into a DataFrame. How can I do that with Python, including names that are split across several lines?
Also, I want the solution to be general, so that it can be applied to any table (even one that doesn't have this structure); giving the coordinates of each separate table won't work well.
I don't know your exact problem, but if you want to extract data or tables from a PDF, try the camelot-py library. It is easy to use, and in my experience its accuracy is around 90% or better.
I am working on a similar project myself.
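import camelot
pdf_file_path = "your_file.pdf"  # placeholder: path to your PDF
# flavor='stream' suits tables without ruling lines; table_areas is "x1,y1,x2,y2" (top-left, bottom-right, in PDF points)
tables = camelot.read_pdf(pdf_file_path, flavor='stream', pages='1', table_areas=['5,530,620,180'])
print(tables[0].parsing_report)  # accuracy, whitespace, order, page
df = tables[0].df  # first detected table as a pandas DataFrame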
The main parameters and attributes used above are:
filepath (the first argument): the path to the PDF file;
table_areas: optional. If you know exactly where the table is, give its location; otherwise camelot tries to detect the tables on each page by itself;
pages: which pages to parse, e.g. '1', '1,3-5' or 'all' (the default is '1');
.parsing_report: a summary of the parse, e.g. accuracy and whitespace;
.df: the table as a pandas DataFrame. tables[0] is the first detected table; how many tables are found depends on your PDF.
You can read more about them in the camelot documentation.
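For names that wrap onto several lines, camelot often returns the overflow as an extra row whose other cells are empty. A minimal post-processing sketch, assuming (hypothetically) that a continuation row is recognizable by an empty first column; camelot's .df holds strings, so "empty" here means an empty or whitespace-only string. Check this assumption against your own tables before relying on it.

```python
import pandas as pd

def merge_continuation_rows(df, key_col=0):
    """Join rows whose key column is blank onto the row above them."""
    merged = []
    for _, row in df.iterrows():
        cells = [str(v).strip() for v in row.tolist()]
        if merged and cells[key_col] == "":
            # Continuation row: append each non-empty fragment to the cell above
            previous = merged[-1]
            for i, text in enumerate(cells):
                if text:
                    previous[i] = (previous[i] + " " + text).strip()
        else:
            merged.append(cells)
    return pd.DataFrame(merged, columns=df.columns)

# Usage: df = merge_continuation_rows(tables[0].df)
```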
I get data measurements from instruments. These measurements depend on several parameters, and a pivot table is a good way to represent the data. Each measurement can be associated with a scope screenshot to make it more explicit. I get all the data in the following CSV format:
The number of measurements and parameters can change.
I am trying to write a Python script (for now with the pandas library) that creates a pivot table in Excel. With pandas, I can color the data inside and outside a defined range. However, I would also like to add a link to every cell that takes me to the corresponding screenshot, and I am stuck here.
I would like a result like the following (but with a link to the corresponding screenshot):
I did find a way to add the links: I used pandas' apply() to wrap every cell in the Excel =HYPERLINK() function.
However, I can no longer apply conditional formatting with XlsxWriter, because the cells no longer have numeric content.
I could apply the conditional formatting first and then iterate through the whole sheet to add the links, but it would be a real mess to recover the relationship between each value and its measurement parameters.
I'd appreciate ideas for an efficient way to do this.
XlsxWriter has a function called write_url(). Since XlsxWriter only creates new files, apply write_url() while you are creating the worksheet, and then use openpyxl to insert your pandas DataFrame:
1) Create the worksheet and add the links with write_url().
2) Reopen the saved file with openpyxl and write the data into the already-formatted cells.
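As an alternative that stays within XlsxWriter (a sketch, not the method from the answer above): the XlsxWriter documentation notes that a cell written with write_url() can be overwritten by another write_*() call and still keep its link. Writing the number over the link keeps the cell numeric, so conditional_format() works again. The grid-shaped inputs and the range limits are hypothetical.

```python
import io
import xlsxwriter

def write_linked_table(values, links, low, high):
    """Write a grid of numbers in which every cell is also a hyperlink.

    values: rows of numbers; links: same-shaped rows of URLs (hypothetical
    screenshot locations). Cells outside [low, high] get a red fill.
    Returns the .xlsx file as bytes.
    """
    output = io.BytesIO()
    workbook = xlsxwriter.Workbook(output, {"in_memory": True})
    ws = workbook.add_worksheet("Measurements")
    link_format = workbook.add_format({"font_color": "blue", "underline": 1})
    out_of_range = workbook.add_format({"bg_color": "#FFC7CE"})

    for r, row in enumerate(values):
        for c, value in enumerate(row):
            # write_url() records the hyperlink for this cell...
            ws.write_url(r, c, links[r][c], link_format)
            # ...and overwriting the cell with a number keeps the link, so the
            # cell stays numeric and conditional formatting still applies
            ws.write_number(r, c, value, link_format)

    ws.conditional_format(0, 0, len(values) - 1, len(values[0]) - 1, {
        "type": "cell",
        "criteria": "not between",
        "minimum": low,
        "maximum": high,
        "format": out_of_range,
    })
    workbook.close()
    return output.getvalue()
```

For screenshots stored as local files, XlsxWriter expects an "external:" prefix on the link, e.g. "external:screenshots/shot_001.png" (a hypothetical relative path).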
I'm new to the community and I only recently started to use Python and more specifically Pandas.
In my data set, I would like the columns to be the dates. For each date I would like a list of customers that then breaks down into more specific row elements. Everything is rolled up by order number, using a distinct count of order numbers, because a client sometimes purchases more than one item. In Excel I create a pivot table and summarize it by the distinct count of orders. Then I sort each row element by that distinct count. I collapse each row down until only the client name is shown; if I click to expand a cell, I see each row element.
So my question: if I'm pulling in these huge data sets as a DataFrame, can I pull the .xlsx in as an array? I know it will lose the data types, so I would have to set the dates as datetime64. I've been trying to reshape the array so that the date becomes the columns, with the rows I want, but so far without luck. I have tried pivot_table and groupby with some success, but I wasn't able to move the date to the columns.
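Moving the date to the columns is what the columns= argument of pd.pivot_table does, and aggfunc="nunique" plays the role of Excel's distinct count. A minimal sketch, assuming hypothetical column names Date, Client, Item and Order; the client-level table is recounted rather than summed, because one order with two items would otherwise be counted twice.

```python
import pandas as pd

def distinct_orders_by_date(df):
    """Return (item_level, client_level) tables with dates as columns."""
    df = df.assign(Date=pd.to_datetime(df["Date"]))  # make sure dates are datetime64
    item_level = pd.pivot_table(df, index=["Client", "Item"], columns="Date",
                                values="Order", aggfunc="nunique", fill_value=0)
    # Keep each client's rows together, largest distinct-order total first
    totals = item_level.sum(axis=1)
    row_order = sorted(item_level.index, key=lambda key: (key[0], -totals[key]))
    item_level = item_level.loc[row_order]
    # "Collapsed" view: distinct orders per client, counted directly
    client_level = pd.pivot_table(df, index="Client", columns="Date",
                                  values="Order", aggfunc="nunique", fill_value=0)
    return item_level, client_level
```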
Summary: overall, what I want to know is whether I'm going down the wrong rabbit hole altogether. I'm basically looking to create a collapsible pivot table with specific colors, so that the spreadsheet I'm automating looks identical to the current one.
I really appreciate any help; as I said, I'm brand new to pandas, so direction is key. I'd like to know whether I'm on the "best" path for exporting to Excel after I've imported and modified the spreadsheet. The input I get is a single sheet of raw data in .xlsx form. Thanks again!
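pandas' to_excel() does not create collapsible groups, but XlsxWriter's row outlines can imitate a collapsed pivot table: client rows stay visible, and item rows sit one outline level below them, hidden until expanded. A minimal sketch; the input shape (a list of client tuples with per-date counts) and the colors are hypothetical.

```python
import io
import xlsxwriter

def write_outline(groups, date_labels):
    """Write client rows with collapsible item rows underneath.

    groups: list of (client, client_counts, [(item, item_counts), ...]),
    a hypothetical input shape; date_labels: header strings, one per column.
    Returns the .xlsx file as bytes.
    """
    output = io.BytesIO()
    workbook = xlsxwriter.Workbook(output, {"in_memory": True})
    ws = workbook.add_worksheet("Orders")
    bold = workbook.add_format({"bold": True})
    header = workbook.add_format({"bold": True, "bg_color": "#DDEBF7"})
    # Show the +/- buttons on the summary row above each group, as a pivot table does
    ws.outline_settings(True, False, True, False)
    ws.write_row(0, 0, ["Client / Item"] + list(date_labels), header)
    row = 1
    for client, client_counts, items in groups:
        ws.write(row, 0, client, bold)
        ws.write_row(row, 1, client_counts, bold)
        # Mark the summary row as collapsed so Excel shows a "+" button
        ws.set_row(row, None, None, {"collapsed": True})
        row += 1
        for item, item_counts in items:
            ws.write(row, 0, item)
            ws.write_row(row, 1, item_counts)
            # Outline level 1 groups the row under its client; hidden = collapsed
            ws.set_row(row, None, None, {"level": 1, "hidden": True})
            row += 1
    workbook.close()
    return output.getvalue()
```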
I'm scraping a site.
It has several tables that together represent the attributes of one observation.
I'm not sure images would be useful in this post, because the page is in Korean.
I have inserted an explanatory image.
There are many tables. I want to reshape them into one table, where each observation is one record with many fields.
But I have a problem:
a few tables have a variable number of columns.
I'd like to store this data in SQL.
As far as I know, an SQL table has a fixed number of fields.
Is there a solution, or what should I search for?
Here is the link. http://goodauction.land.naver.com/auction/ca_view.php?product_id=1698750&class1=5&ju_price1=&ju_price2=&bi_price1=&bi_price2=&num1=&num2=&lawsup=0&lesson=0&next_biddate1=&next_biddate2=&state=91&b_count1=0&b_count2=0&b_area1=&b_area2=&special=0&e_area1=&e_area2=&si=11&gu=0&dong=0&apt_no=0&order=&start=0&total_record_val=&detail_search=&detail_class=1&recieveCode=
The variables in the tables at this link indicate the winning bid, the number of floors in the apartment, the size of the area, the use of each floor, and so on.
Also, can you recommend sites where I can learn to scrape tables whose cells span multiple rows and columns using Python?
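Cells that span rows or columns can be flattened by copying each spanning cell into every grid position it covers. Recent pandas versions do this inside pandas.read_html() (which needs lxml, or BeautifulSoup4 with html5lib); the sketch below shows the same idea with only the standard library. It is an illustration under simple assumptions: nested tables are not handled, and rowspan/colspan values are assumed to be plain integers.

```python
from html.parser import HTMLParser

class TableGridParser(HTMLParser):
    """Collect every <table> as rows of {text, rowspan, colspan} cells."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._cell = None  # the cell currently being read, if any

    def _flush(self):
        if self._cell is not None:
            self.tables[-1][-1].append(self._cell)
            self._cell = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._flush()  # </td> and </tr> are optional in HTML
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._flush()
            self._cell = {"text": "",
                          "rowspan": int(attrs.get("rowspan") or 1),
                          "colspan": int(attrs.get("colspan") or 1)}

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

def expand_spans(rows):
    """Turn span-annotated rows into a rectangular grid of strings."""
    grid = {}
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            while (r, c) in grid:  # skip positions filled by a rowspan from above
                c += 1
            for dr in range(cell["rowspan"]):
                for dc in range(cell["colspan"]):
                    grid[(r + dr, c + dc)] = cell["text"].strip()
            c += cell["colspan"]
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]

# Usage: parser = TableGridParser(); parser.feed(html); grid = expand_spans(parser.tables[0])
```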
If you have an apartment table, you need a separate floor table related to it: one row per floor, each pointing back to its apartment with a foreign key.
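The two-table design can be sketched with Python's built-in sqlite3 module: the fixed attributes go into apartment, and each floor becomes its own row in floor, so a variable number of floors never changes the number of columns. All table and column names here are hypothetical.

```python
import sqlite3

def create_schema(conn):
    # One row per apartment (fixed fields) and one row per floor (variable count)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS apartment (
            id INTEGER PRIMARY KEY,
            auction_id TEXT UNIQUE,
            winning_bid INTEGER
        );
        CREATE TABLE IF NOT EXISTS floor (
            id INTEGER PRIMARY KEY,
            apartment_id INTEGER NOT NULL REFERENCES apartment(id),
            floor_number INTEGER,
            floor_use TEXT,
            area_m2 REAL
        );
    """)

def store_apartment(conn, auction_id, winning_bid, floors):
    """floors: list of (floor_number, floor_use, area_m2) tuples of any length."""
    with conn:  # commit both inserts together, or roll back on error
        cur = conn.execute(
            "INSERT INTO apartment (auction_id, winning_bid) VALUES (?, ?)",
            (auction_id, winning_bid))
        apartment_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO floor (apartment_id, floor_number, floor_use, area_m2) "
            "VALUES (?, ?, ?, ?)",
            [(apartment_id, *f) for f in floors])
    return apartment_id
```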
I have an old Excel spreadsheet with a lot of data in a relational-database-style format, with one main primary key that I need to go through.
I want to compare some rows, but there are many entries (thousands of rows, dozens of columns), and Excel doesn't really have built-in features for this. After looking around, I found that the best way to do it is with a Python script, but I have no programming skills in Python or any other language. I need to find duplicate values in the key column, merge the rows that share a key into a single new row, and then write a new Excel file or sheet that separates the merged rows from the non-merged rows.
I don't know whether this sounds too complicated. I am new here, and I did search the internet for scripts that do this, but without much luck. Here are the closest posts I found; most of what I found is about merging two different Excel files together:
http://pbpython.com/excel-file-combine.html
Looking to merge two Excel files by ID into one Excel file using Python 2.7
(I have more links but could only post two.)
Basically, I'm looking for duplicate rows and want to merge them into a new file or sheet in Excel, separating them from the non-duplicates, and then putting it all back together.
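In pandas, this can be sketched as: flag every row whose key occurs more than once with duplicated(keep=False), group those rows by the key, and merge each column. The merge rule below is an assumption: if the duplicates agree (ignoring blanks), keep that value; if they differ, join the distinct values with "; ". The key column name "ID" and the file names are hypothetical, and writing .xlsx needs openpyxl or XlsxWriter installed.

```python
import pandas as pd

def merge_column(values):
    """Merge one column of a duplicate group (assumed rule, adjust as needed)."""
    distinct = values.dropna().unique()
    if len(distinct) == 0:
        return None
    if len(distinct) == 1:
        return distinct[0]
    return "; ".join(str(v) for v in distinct)

def split_and_merge(df, key):
    """Return (rows with a unique key, one merged row per duplicated key)."""
    is_dup = df.duplicated(subset=[key], keep=False)  # True for every copy
    singles = df[~is_dup]
    merged = df[is_dup].groupby(key, sort=False).agg(merge_column).reset_index()
    return singles, merged

def write_report(singles, merged, path):
    # Separate sheets for merged and untouched rows, plus everything together
    with pd.ExcelWriter(path) as writer:
        merged.to_excel(writer, sheet_name="Merged", index=False)
        singles.to_excel(writer, sheet_name="Unique", index=False)
        pd.concat([merged, singles]).to_excel(writer, sheet_name="All", index=False)

# Usage (hypothetical names):
# df = pd.read_excel("old_data.xlsx")
# singles, merged = split_and_merge(df, "ID")
# write_report(singles, merged, "deduplicated.xlsx")
```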