embed additional second dataframe into plot - python

I want my plot to take its data from one dataframe, but when hovering over the data I want it to incorporate data from both dataframes.
Example: my current plot is produced by the following code:
import plotly.express as px

fig = px.scatter(
    X_reduced_df,
    x='EXTRACTION_DATE_SAMPLE',
    y='score_IF',
    color='anomaly_class_IF',
    hover_data=['score_IF', 'delta', 'omicron',
                'NH4-N (titr.)(W)_Zusatzparameter-ESRI',
                'P ges. filtr. (photom.)(W) mA_Zusatzparameter-ESRI',
                'BSB5 (mit Verd) DIN 1899_Zusatzparameter-ESRI',
                'N org. (Ber. TKN)(W)_Zusatzparameter-ESRI'],
    # range_x=['2015-12-01', '2016-01-15'],
    title="Default Display with Gaps",
)
fig.show()
Here I want the hover value "delta" to come with additional information on "delta" from another dataframe, i.e. I want "delta = 0, add info", where add info is a list, a dataframe column or similar. It is a list of names, each associated with a float, like:
column name: delta
column entries:
gamma: 1.2
alpha: 1.3
...
Basically it's a correlation matrix, and I want the correlations associated with each entry to be displayed.
Regrettably, the second dataframe (the correlation matrix) does not have the same columns as the original dataframe, so the two cannot be joined. I want the column names to be associated with the add info. I thought about categories, but I cannot see how they would help produce a compact add info.
I also do not want to meddle with the column names (e.g. by forcing a rename that includes the add info).
The plotly library only accepts one dataframe as input, right? How can I add my add info the way I described?
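A minimal sketch of one way to do this, assuming the correlation matrix is a pandas DataFrame (called `corr` here, a hypothetical name) whose index and columns are variable names such as 'delta'. In its dict form, `px.scatter`'s `hover_data` accepts list-like data as values, so an extra hover entry can come from outside the plotted dataframe without joining or renaming anything. The helper names below are illustrative, not part of plotly.

```python
import pandas as pd


def correlation_hover_text(corr: pd.DataFrame, column: str, fmt: str = "{:.2f}") -> str:
    """Format one column of a correlation matrix as a compact 'name: value; ...' string."""
    # Drop the self-correlation (always 1.0) so only the other variables are listed.
    entries = corr[column].drop(labels=column, errors="ignore")
    return "; ".join(f"{name}: {fmt.format(value)}" for name, value in entries.items())


def scatter_with_correlations(df: pd.DataFrame, corr: pd.DataFrame, column: str = "delta"):
    """Scatter plot whose hover label also lists the correlations of `column` from `corr`."""
    import plotly.express as px  # imported here so the formatting helper works without plotly

    # The correlations describe the column, not individual rows, so the same text is
    # repeated for every point. hover_data accepts list-like values in its dict form,
    # which adds a hover entry without adding or renaming any column of df.
    info = [correlation_hover_text(corr, column)] * len(df)
    return px.scatter(
        df,
        x="EXTRACTION_DATE_SAMPLE",
        y="score_IF",
        color="anomaly_class_IF",
        hover_data={"score_IF": True, column: True, f"{column} correlations": info},
    )
```

Usage would be along the lines of `scatter_with_correlations(X_reduced_df, corr_df).show()`, where `corr_df` is a hypothetical name for the correlation matrix; the other hover columns from the question can be added to the `hover_data` dict as `True` entries in the same way.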

Related

Dataframe value_counts() to barplot

I have a dataframe with multiple columns such as product name, reviews, origin, and so on.
Here, I want to create a barplot with only the data from the "Origin" column.
To do this, I used the code:
origin = df['Origin'].value_counts()
With this, I was able to get a list of countries with their corresponding frequencies (counts). Now I want to create a barplot with each country on the X-axis and the counted frequencies on the Y-axis. Although the column of frequencies has a column label, I am unable to set the X-axis because the countries are merely stored as the index. Is there a better way to count the "Origin" column and make it into a barplot?
Thanks in advance.
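A minimal sketch, assuming `df` has an "Origin" column as described. The countries in the `value_counts()` result live in the index, which `Series.plot.bar` already uses for the x-axis; if a regular two-column table is preferred, `rename_axis` plus `reset_index` moves the index into a named column. The function names are illustrative.

```python
import pandas as pd


def origin_counts(df: pd.DataFrame, column: str = "Origin") -> pd.DataFrame:
    """Turn value_counts() output into a two-column DataFrame: category and count."""
    counts = df[column].value_counts()
    # value_counts() stores the categories in the index; rename_axis + reset_index
    # moves them into a regular column so they can be addressed by name.
    return counts.rename_axis(column).reset_index(name="count")


def plot_origin_counts(df: pd.DataFrame, column: str = "Origin"):
    """Bar plot with one bar per category on the x-axis and its frequency on the y-axis."""
    # Series.plot.bar uses the index (the countries) as the x-axis labels, so the
    # value_counts() result can be plotted directly; this needs matplotlib installed.
    ax = df[column].value_counts().plot.bar()
    ax.set_xlabel(column)
    ax.set_ylabel("count")
    return ax
```

The DataFrame from `origin_counts` can also be passed to plotting libraries that expect named x and y columns.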

How to extract values based on column header in excel?

I have an Excel file containing values. I need the highlighted values in a single column and the rest deleted. But because the row dates and the column headers do not line up, I am not able to extract them. Once you see the Excel file you will understand which values I need. This is just a sample of my data.
The dates in column A (A2:A17) are continuous, but a few dates repeat; the dates in the header row (D1:K1) do not repeat. So when a date repeats, the values for that date occur one just below the other.
How can I get these values into one column?
Is there a way to highlight the values where the same date occurs in the row and in the column? The highlighting in the sample data was done manually; I have a huge dataset that cannot be highlighted manually.
With a colour code I could also get the required values.
Here is the file I am attaching:
https://docs.google.com/spreadsheets/d/1-xBMKRP1_toA_Ky8mKxCKAFi4uQ8YWJq/edit?usp=sharing&ouid=110042758694954349181&rtpof=true&sd=true
Please visit the link and help me to find the solution.
Thank you
I'm not clear on what the values in columns D to K are.
If only the shaded ones matter and they can be derived from the Latitude and Longitude for each row separately:
Insert a column titled "Row", say in A, and populate it 1,2,3...
I think you also want a column E holding whatever calculation you currently have in D-K. Is this "Distance"?
Then create a Pivot Table on columns A to E and you can do anything you are likely to need: https://support.microsoft.com/en-us/office/create-a-pivottable-to-analyze-worksheet-data-a9a84538-bfe9-40a9-a8e9-f99134456576
Dates as Column Labels, Row numbers as Row Labels, and Sum of "Distance" as Values.
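If the sheet is processed in Python rather than in Excel, a minimal pandas sketch of the same extraction could look like this. It assumes the sheet has been loaded into a DataFrame (e.g. with `pd.read_excel`), that the row dates are in one column (`date_col`) and that the header dates (D1:K1) are the column names listed in `header_dates`; these names are hypothetical. It keeps only the cells whose header date equals the row's date, which are the values highlighted by hand in the sample.

```python
import pandas as pd


def values_where_dates_match(sheet: pd.DataFrame, date_col: str, header_dates: list) -> pd.DataFrame:
    """Keep only the cells whose column-header date equals the date in the same row."""
    # Reshape the wide block (one column per header date) into long form with one
    # row per cell: original row number, row date, header date and cell value.
    long = sheet.rename_axis("row").reset_index().melt(
        id_vars=["row", date_col], value_vars=header_dates,
        var_name="header_date", value_name="value",
    )
    # Compare as timestamps so a text header like '2021-01-05' matches a real date.
    same_day = pd.to_datetime(long[date_col]) == pd.to_datetime(long["header_date"])
    # Sort by the original row number so repeated dates stay in sheet order.
    return long[same_day].sort_values("row")[["row", date_col, "value"]]
```

The resulting "value" column is the single column of matching values; repeated row dates simply yield one entry per row.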

How can I divide a dataframe into multiple dataframes based on identical values in columns?

In the above image, I colored some rows with the same colors, and I want to create new dataframes from the rows of the same color. As you can see, rows of the same color have the same values, for example the rows with multiple 0.8 and option type CE. In the entire dataframe these same values come up 2 times, so I want to create a new dataframe of these 3 rows, and I want to do the same for all rows.
Below is some code that may help you.
df_dictionary = dict(tuple(your_dataframe.groupby('columns_to_groupby')))
This will produce a dictionary whose keys are the grouped values (in your case, "CE, PE, etc...") and whose values are the dataframes split by the grouping specified. Hope this helps.
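Extending that one-liner, a minimal sketch that groups on several columns at once, assuming hypothetical column names such as "multiple" and "option_type" for the values in the image. With a list of grouping columns the dictionary keys become tuples, and an optional `min_size` keeps only value combinations that occur more than once.

```python
import pandas as pd


def split_by_columns(df: pd.DataFrame, columns: list, min_size: int = 1) -> dict:
    """Return {group key: sub-DataFrame} for each distinct combination of `columns`."""
    # Iterating over a GroupBy yields (key, sub_frame) pairs. With several grouping
    # columns the keys are tuples, e.g. (0.8, 'CE'). Rows with NaN in a grouping
    # column are dropped by default (dropna=True).
    return {key: group for key, group in df.groupby(columns) if len(group) >= min_size}
```

For example, `split_by_columns(df, ["multiple", "option_type"], min_size=2)` would keep only the repeated combinations.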

Plotting only selected rows in python

I have a dataframe called "df" with the columns "date", "regions" and "transactions". I want to plot the dataframe so that I can see transactions only for selected regions, not for all the regions in my df.
For example, I want to see a plot with transactions for regions "a", "X" and "z" only, all in the same graph, with "date" as the x-axis.
So far, I have been able to plot transactions data for all the regions in one graph but not able to slice my data for the regions that I want.
Can someone please help?
You can use df.loc to access only a group of rows or columns; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html
In your case, something like this would return the df with just the required regions
required_regions = ['a','X','z']
df.loc[df['regions'].isin(required_regions)]
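To get all selected regions into one graph, a minimal sketch (assuming the column names from the question) filters with the same `isin` mask and then pivots so each region becomes its own column; `DataFrame.plot` then draws one line per column against the date index. Summing duplicate date/region pairs is an assumption about how repeated entries should be combined.

```python
import pandas as pd


def transactions_by_region(df: pd.DataFrame, regions: list) -> pd.DataFrame:
    """Wide table: one row per date, one column per selected region."""
    selected = df.loc[df["regions"].isin(regions)]
    # pivot_table sums duplicate (date, region) pairs; each column becomes one line.
    return selected.pivot_table(index="date", columns="regions",
                                values="transactions", aggfunc="sum")


def plot_selected_regions(df: pd.DataFrame, regions: list):
    # DataFrame.plot draws one line per column sharing the date index as the x-axis
    # (requires matplotlib).
    return transactions_by_region(df, regions).plot()
```

Calling `plot_selected_regions(df, ['a', 'X', 'z'])` would then draw the three regions in a single figure.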

How to calculate based on multiple conditions using Python data frames?

I have an Excel data file with thousands of rows and columns.
I am using Python and have started using pandas dataframes to analyze data.
In column D, I want to calculate the annual change of the values in column C, for each year and each ID.
I can do this in Excel: if the org ID is the same as that in the prior row, calculate the annual change (leaving the cells highlighted in blue empty, because that's the first period for that particular ID). I don't know how to do this using Python. Can anyone help?
Assuming the dataframe is already sorted by ID and year:
df.groupby('ID').Cash.pct_change()
However, you can speed things up by relying on that sort order, because it's not necessary to group in order to calculate the percentage change from one row to the next; you only need to mask the rows where the ID changes:
df.Cash.pct_change().mask(
df.ID != df.ID.shift()
)
Either expression should produce the column values you are looking for. To add the column, assign the result to a column (or create a new dataframe with the new column):
df['AnnChange'] = df.groupby('ID').Cash.pct_change()
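A self-contained sketch of the mask-based approach that does not require the input to be pre-sorted, assuming columns named "ID", "Year" and "Cash" (the "Year" name is a hypothetical stand-in for whatever column holds the period). It sorts internally, computes the row-to-row change, blanks each ID's first period, and returns the result in the caller's row order so it can be assigned directly.

```python
import pandas as pd


def annual_change(df: pd.DataFrame, id_col: str = "ID", year_col: str = "Year",
                  value_col: str = "Cash") -> pd.Series:
    """Year-over-year fractional change of value_col within each ID (NaN for each ID's first year)."""
    # Sort so that consecutive rows are consecutive years of the same ID.
    ordered = df.sort_values([id_col, year_col])
    change = ordered[value_col].pct_change()
    # Blank out rows whose ID differs from the previous row: that is the ID's first period.
    change = change.mask(ordered[id_col] != ordered[id_col].shift())
    # Restore the caller's row order so the result can be assigned back as a column.
    return change.reindex(df.index)
```

Usage: `df['AnnChange'] = annual_change(df)`. The values are fractions (0.1 means +10%); multiply by 100 if percentages are wanted.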
