Plotting only selected rows in python

I have a data frame called "df" with the columns "date", "regions", and "transactions". I want to plot it so that I see transactions for only selected regions, not for every region in my df.
For example, I want a plot with transactions for regions "a", "X", and "z" only, all in the same graph, with "date" as the x-axis.
So far, I have been able to plot the transactions for all regions in one graph, but I have not been able to slice my data down to the regions I want.
Can someone please help?

You can use df.loc to select a subset of rows or columns. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html
In your case, something like this returns a DataFrame containing only the required regions:
required_regions = ['a','X','z']
df.loc[df['regions'].isin(required_regions)]
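To get from the filtered frame to the plot the question asks for, one option is to pivot the selected rows so that each region becomes its own column. This is only a sketch: the sample data is made up, and the names selected and wide are hypothetical.

```python
import pandas as pd

# Made-up sample data with the three columns from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02"] * 4),
    "regions": ["a", "a", "X", "X", "z", "z", "b", "b"],
    "transactions": [10, 12, 7, 9, 3, 5, 20, 22],
})

required_regions = ["a", "X", "z"]
selected = df.loc[df["regions"].isin(required_regions)]

# One column per region, indexed by date, so each region is drawn as its own line.
wide = selected.pivot(index="date", columns="regions", values="transactions")
ax = wide.plot(marker="o")
ax.set_ylabel("transactions")
```

In a script, import matplotlib.pyplot as plt and call plt.show() afterwards to display the figure. The pivot assumes each (date, region) pair occurs only once; with duplicates, pivot_table with an aggregation function would be needed instead.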

Related

embed additional second dataframe into plot

I want my plot to retrieve its data from one dataframe, but when hovering over the data I want it to incorporate data from both dataframes.
Example (the current plot results from the following code):
import plotly.express as px
fig = px.scatter(
    X_reduced_df,
    x='EXTRACTION_DATE_SAMPLE',
    y='score_IF',
    color='anomaly_class_IF',
    hover_data=[
        'score_IF', 'delta', 'omicron',
        'NH4-N (titr.)(W)_Zusatzparameter-ESRI',
        'P ges. filtr. (photom.)(W) mA_Zusatzparameter-ESRI',
        'BSB5 (mit Verd) DIN 1899_Zusatzparameter-ESRI',
        'N org. (Ber. TKN)(W)_Zusatzparameter-ESRI',
    ],
    # range_x=['2015-12-01', '2016-01-15'],
    title="Default Display with Gaps",
)
fig.show()
Here I want the value "delta" to be shown together with additional information about "delta" from another dataframe, i.e. I want "delta = 0, add info", where add info is a list, a dataframe column, or similar. It is a list of names, each associated with a double, like this:
column name: delta
column entries:
gamma: 1.2
alpha: 1.3
...
Basically, it is a correlation matrix, and I want the correlations associated with each entry to be displayed.
The second dataframe, the correlation matrix, regrettably does not have the same columns as the original dataframe, so the two cannot be joined. I want column names to be associated with the add info. I thought about categories, but I cannot see how they would help produce a compact add info.
Also, I do not want to meddle with the column names (for example, by forcing a rename that includes the add info).
The plotly library only allows one dataframe as input, right? How can I add my add info in the way I described?

Dataframe value_counts() to barplot

I have a dataframe with multiple columns such as product name, reviews, origin, etc.
I want to create a barplot using only the data from the "Origin" column.
To do this, I used the code:
origin = df['Origin'].value_counts()
With this, I was able to get a list of countries with their corresponding frequencies (counts). Now I want to create a barplot with each country on the x-axis and the counted frequencies on the y-axis. Although the frequencies have a column label, I am unable to set the x-axis because the countries are only stored as the index. Is there a better way to count the "Origin" column and turn the result into a barplot?
Thanks in advance.
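One possible approach, sketched with made-up data: the Series returned by value_counts() can be plotted directly, because Series.plot uses the index (here the countries) for the x-axis. The sample values and the name counts are hypothetical.

```python
import pandas as pd

# Made-up sample data; only the "Origin" column matters here.
df = pd.DataFrame(
    {"Origin": ["Kenya", "Brazil", "Kenya", "Peru", "Brazil", "Kenya"]}
)

origin = df["Origin"].value_counts()

# The countries live in the index, and Series.plot puts the index on the x-axis.
ax = origin.plot(kind="bar")
ax.set_xlabel("Origin")
ax.set_ylabel("count")

# If regular columns are preferred, move the index into a column of its own.
counts = origin.rename_axis("Origin").reset_index(name="count")
```

With counts, the countries and frequencies are ordinary columns, so they can be passed by name to any plotting function that expects x and y columns.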

Trying to plot a graph with Pandas but only wish to display data from 1 row of a csv between two dates

I have a CSV file of weather data.
The file is indexed by date (i.e., the date of the reading).
One of my columns is 'Mean Humidity' and contains humidity data.
I want to use the .plot function, but limit the data to the range between two dates.
To filter by time, I used this to view my rows:
london.loc[datetime(2021,3,1) : datetime(2021,5,31)]
where london is:
london = read_csv('London_2021.csv')
My question is: how can I modify this
london['Mean Humidity'].plot(grid=True, figsize=(10,5))
To only display the data between the two dates?

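What about selecting the date range and the column with .loc before plotting? This assumes the index is a sorted DatetimeIndex:
london.loc[start_date : end_date, 'Mean Humidity'].plot(grid=True, figsize=(10,5))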
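A minimal, self-contained sketch of slicing a date-indexed frame with .loc and plotting one column. The data below is made up as a stand-in for London_2021.csv, and the name spring is hypothetical.

```python
import pandas as pd

# Made-up stand-in for London_2021.csv: one reading per day, indexed by date.
# For the real file, something like
#   pd.read_csv("London_2021.csv", index_col="Date", parse_dates=True)
# should give the same kind of index, where "Date" is an assumed column name.
dates = pd.date_range("2021-01-01", "2021-12-31", freq="D")
london = pd.DataFrame(
    {"Mean Humidity": [50 + i % 30 for i in range(len(dates))]},
    index=dates,
)

start_date, end_date = "2021-03-01", "2021-05-31"

# Label slices with .loc on a sorted DatetimeIndex include both endpoints.
spring = london.loc[start_date:end_date, "Mean Humidity"]
ax = spring.plot(grid=True, figsize=(10, 5))
```

If the index is made of plain strings rather than datetimes, the slice compares labels as text, so parsing the dates when reading the file is the safer route.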

How can we extract duplicate values from multiple columns?

I have a dataset on Big Mart sales. (You can find it here: https://www.kaggle.com/brijbhushannanda1979/bigmart-sales-data)
The dataset has columns such as 'Outlet_Location_Type' and 'Outlet_Size'.
I want to find how many Tier 1 locations have a Medium 'Outlet_Size' and visualize this using a grouped bar chart. I need a Pythonic solution.
Any help is appreciated.

You need to use the groupby method:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Test.csv')
df = df[df['Outlet_Location_Type']=='Tier 1'].groupby(['Outlet_Size']).count()
After count(), every column holds the number of non-null values per outlet size, so any column without missing values can be used to plot the count (Item_Identifier is assumed to be one here):
df['Item_Identifier'].plot(kind='bar', stacked=True)
plt.show()
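For the grouped bar chart across all location types, a different technique from the groupby above may be simpler: pd.crosstab counts every combination of the two columns in one step. This is only a sketch on made-up rows; the names counts and tier1_medium are hypothetical.

```python
import pandas as pd

# Made-up rows using the two column names from the question.
df = pd.DataFrame({
    "Outlet_Location_Type": ["Tier 1", "Tier 1", "Tier 1", "Tier 2", "Tier 3", "Tier 3"],
    "Outlet_Size": ["Medium", "Small", "Medium", "Small", "High", "Medium"],
})

# Rows are location types, columns are outlet sizes, cells are row counts.
counts = pd.crosstab(df["Outlet_Location_Type"], df["Outlet_Size"])

# Number of rows that are both Tier 1 and Medium.
tier1_medium = counts.loc["Tier 1", "Medium"]

# A DataFrame bar plot draws one group per row and one bar per column.
ax = counts.plot(kind="bar")
ax.set_ylabel("number of rows")
```

Note that crosstab ignores rows where either column is missing, so outlets with an unknown size are not counted.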

Factorplot with multiindex dataframe

This is the dataframe I am working with:
(Only the first two years lack data for country 69; I will fix this.) nkill is the number of people killed in that year, summed from the original long-form dataframe.
I am trying to produce something similar to this plot, but with the country code as the hue. I know there are similar posts, but none have helped me solve this. Thank you in advance.
By hue I mean the term as seaborn uses it, as shown in this third picture. In that example, hue draws a separate series for every value in the given column. So if I had two country codes in the country column, it would plot two bars (one per country) side by side for every year.

Just looking at the data, it should be possible to use the hue argument directly.
But first you need to turn the index levels into actual columns:
import seaborn as sns
df.reset_index(inplace=True)
Then something like
sns.barplot(x="year", y="nkill", hue="country", data=df)
should give you the desired plot.
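To see what reset_index does to a multi-indexed frame, here is a small sketch on made-up numbers (country code 603 and all nkill values are invented). It also shows a plain-pandas way to get side-by-side bars per country, which is a swapped-in alternative to the seaborn hue argument, not the same call.

```python
import pandas as pd

# Made-up data shaped like the question: nkill indexed by (year, country).
index = pd.MultiIndex.from_product(
    [[1970, 1971], [69, 603]], names=["year", "country"]
)
df = pd.DataFrame({"nkill": [0, 5, 2, 7]}, index=index)

# After reset_index, "year" and "country" are ordinary columns, which is
# the long format that hue-style plotting functions expect.
flat = df.reset_index()

# Plain-pandas counterpart of hue="country": one column per country,
# so each year gets one bar per country, side by side.
wide = flat.pivot(index="year", columns="country", values="nkill")
ax = wide.plot(kind="bar")
ax.set_ylabel("nkill")
```

The flat frame is what would be passed as data to a seaborn call with x="year", y="nkill", and hue="country".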
