Factorplot with multiindex dataframe - python

This is the dataframe I am working with:
(Only the first two years lack data for country 69; I will fix this.) nkill is the number of people killed in that year, summed from the original long-form dataframe.
I am trying to do something similar to this plot:
However, I want the country code as the hue. I know there are similar posts, but none of them helped me solve this. Thank you in advance.
By hue I mean the hue argument in seaborn's syntax, as pictured in the third picture. In that example, hue draws a separate bar for every value in the chosen column. So if I had two country codes in the country column, it would plot two bars side by side for every year, one for each country.

Just looking at the data it should be possible to directly use the hue argument.
But first you would need to turn the index levels of the dataframe into actual columns:
df.reset_index(inplace=True)
Then something like
sns.barplot(x = "year", y="nkill", hue="country", data=df)
should give you the desired plot.
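
The two steps above can be sketched end to end as follows. This is only an illustration: the sample values are made up, and it assumes the frame is indexed by (year, country) with a single nkill column, as the question describes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up sample: nkill summed per (year, country), stored in a MultiIndex.
index = pd.MultiIndex.from_product(
    [[1970, 1971], [69, 217]], names=["year", "country"]
)
df = pd.DataFrame({"nkill": [3, 10, 0, 7]}, index=index)

# Turn the index levels into ordinary columns so seaborn can refer to them by name.
flat = df.reset_index()

# One bar per country within each year, drawn side by side.
ax = sns.barplot(x="year", y="nkill", hue="country", data=flat)
```

Using a non-inplace reset_index keeps the original multi-indexed frame available for other work.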

Related

Dataframe value.counts() to barplot

I have a dataframe with multiple columns such as product name, reviews, origin, and etc.
Here, I want to create a barplot with only the data from "Origin" column.
To do this, I used the code:
origin = df['Origin'].value_counts()
With this, I was able to get a list of countries with their corresponding frequencies (or counts). Now, I want to create a barplot with each country on the X-axis and the counted frequencies on the Y-axis. Although the column of frequencies has a column label, I am unable to set the X-axis because the countries are merely saved as the index. Would there be a better way to count the column "Origin" and make it into a barplot?
Thanks in advance.
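
One possible approach, sketched with made-up data: a Series returned by value_counts() can be plotted directly, because pandas uses its index for the x-axis labels. If a regular column is needed instead, the index can be moved into one. The sample rows and the column name "count" are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Made-up sample standing in for the real product data.
df = pd.DataFrame({"Origin": ["Japan", "Korea", "Japan", "China", "Japan", "Korea"]})

origin = df["Origin"].value_counts()

# Option 1: plot the Series directly; the index (countries) becomes the x-axis.
ax = origin.plot(kind="bar")
ax.set_xlabel("Origin")
ax.set_ylabel("count")

# Option 2: move the index into a named column for libraries that expect columns.
counts = origin.rename_axis("Origin").reset_index(name="count")
```

The counts frame can then be passed to a call such as sns.barplot(x="Origin", y="count", data=counts).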

How to create a line plot using the mean of a column and extracting year from a Date column?

Update: I've now managed to solve this. For extracting the year this is what I used,
df['year'] = pd.DatetimeIndex(df['Date']).year
this allowed me to add a new column for the year and then use that column to plot the chart.
sns.lineplot(y="Class", x="year", data=df)
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
Now I get the right chart.
I'm trying to get a line plot using the mean of a column and linking that to a value (the year) extracted from the date column. However, I can't seem to get the right outcome.
Here's how I extracted the Year value from the date column,
year=[]
def Extract_year(date):
    for i in df["Date"]:
        year.append(i.split("-")[0])
    return year
And here's how I plotted the values to create a line plot,
sns.lineplot(y=df['Class'].mean(), x=Extract_year(df))
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
But instead of seeing a trend (see screenshot-1), it only displays a straight line (see screenshot-2) for the mean value. Could someone please explain to me, what I am doing wrong and how can I correct it?
Thanks!
What you are plotting is df['Class'].mean(), which of course is a single fixed value. I don't know which time format you're using, but you probably need to calculate a separate mean for each year.
EDIT:
Yes there is:
df = pd.DataFrame({'Date':['2020-01-20','2019-01-20','2022-01-20','2021-01-20','2012-01-20','2013-01-20','2016-01-20','2018-01-20']})
years = pd.to_datetime(df['Date'],format='%Y-%m-%d').dt.year.sort_values().tolist()
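
To illustrate the point about per-year means, here is a minimal sketch. It assumes Class is a 0/1 outcome column and Date uses the YYYY-MM-DD format; the sample rows are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Invented sample: one row per event, Class is 1 for success and 0 for failure.
df = pd.DataFrame({
    "Date": ["2019-03-01", "2019-07-15", "2020-01-20", "2020-06-30", "2021-02-11"],
    "Class": [0, 1, 1, 1, 0],
})

# Extract the year as an integer column.
df["year"] = pd.to_datetime(df["Date"], format="%Y-%m-%d").dt.year

# One mean per year instead of a single overall mean.
yearly = df.groupby("year")["Class"].mean()

ax = yearly.plot(marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Success Rate")
```

Grouping first gives one point per year, which is what produces a trend rather than a flat line.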

Why the plot is not stacked bar in python(pandas)?

I have melted the data using the pd.melt function in pandas and pivoted the table, keeping the name and year as id. That gave me the table I want. But when plotting the graph, I am not getting what I want. Below is the code showing the work done so far.
I preferred this method since I have other variables with the same name and years (maybe some other method exists).
But I want a graph with bars representing 'Estimated Number of Pregnacies' for each state (including all India) over the years, as side-by-side bars.
How to achieve this?
Here's a minimal example of what you are doing. Hope this gives you some hint:
# sample data
df = pd.DataFrame({'name': ['a','a','b','b','c','c'],
                   'class' : [1,2,1,2,1,2],
                   'vals':[122,1122,3342,4431,4311,1989]})
# use groupby on columns you want to see on x axis
df.groupby(['name','class'])['vals'].sum().unstack().plot(kind='bar')

producing a scatter plot from multi-level dataframe [pandas]

I have a big data frame, on which I've done a df.groupby(["event_type", "day"]).count() and gotten the following multi-indexed df:
My aim is to produce a scatter plot that shows the number of occurrences of an event per day, sorted by event_type. So a scatter plot where the x axis is "day" and the y axis would be "id" from the above table (which is a count). But I don't know how to go about making it.
Background: event_type has only 3 types. day covers about 2 years of dates. "id" is the id of the things I'm tracking, but in the above .groupby() data frame it's actually the count of ids. I'd ideally like to get 3 separate lines plotted (one per event_type) of the id counts versus day of the year. Thanks!
I hope this will help:
a['date'] = pd.to_datetime(a['date'])
for name, group in a.groupby(['type','date']).count().groupby('type'):
    plt.plot(group.reset_index().set_index('date')['v1'], marker='o', linestyle='', label=name)
plt.legend()
If you want normal plot instead of scatter, remove marker and linestyle arguments.
My DF looks like this:

Complex dataframe plotting with Pandas / Matplotlib

I'd like to create a single time-series graph from a pandas dataframe that looks like the following:
*sample of a simplified version of my dataframe:*
index    to_network    count
201401   net_1         100
201401   net_2         200
201401   net_3         150
201402   net_1         300
201402   net_2         250
201403   net_1         175
Ultimately, the final graph should be a time-series line graph (x-axis being the index and the y-axis being 'count') with multiple lines, and each line being a network in the to_network column (e.g., one line should be net_1).
I've been reading the 'python for data analysis' book, but the examples there don't appear to be this complex.
Does it work?
df.groupby('to_network')['count'].plot()
If you want to show the date correctly, you can try:
df.index=pd.to_datetime(df.index,format='%Y%m')
The default behavior of plot in pandas is to use the index as an x-axis and plot one line per column. So you want to reshape your data frame to mirror that structure. You can do the following:
df.pivot_table(index='index', columns = 'to_network', values = 'count', aggfunc = 'sum').plot()
This will pivot your df (which is in the long format, à la ggplot) into a wide frame from which pandas' default plot behavior produces the desired result: one line per network, with the index on the x-axis and count as the value.
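
As a rough, self-contained sketch of that pivot, using the sample rows from the question: it assumes 'index' is an ordinary column holding YYYYMM integers, which may differ from the asker's actual frame.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Sample rows from the question; 'index' is assumed to be a regular column here.
df = pd.DataFrame({
    "index": [201401, 201401, 201401, 201402, 201402, 201403],
    "to_network": ["net_1", "net_2", "net_3", "net_1", "net_2", "net_1"],
    "count": [100, 200, 150, 300, 250, 175],
})

# Long to wide: one row per month, one column per network.
wide = df.pivot_table(index="index", columns="to_network",
                      values="count", aggfunc="sum")

# Parse the YYYYMM values so the x-axis is a real date axis.
wide.index = pd.to_datetime(wide.index.astype(str), format="%Y%m")

# pandas draws one line per column, with the index on the x-axis.
ax = wide.plot()
```

Months in which a network has no rows show up as missing values in the wide frame, so that network's line simply has no point there.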
To answer your question, I have checked in a notebook here: http://nbviewer.ipython.org/github/ericmjl/Stack-Overflow-Answers/blob/master/20141020%20Complex%20Pandas%20Plotting/Untitled0.ipynb
The core idea is to do a groupby, and then plot only the column that you're interested in.
Code is also pasted below here:
df = pd.read_csv("data.csv")
df.groupby("to_network")['count'].plot()
Also, be sure to add in Daniele's contribution, where you format the index correctly:
df.index=pd.to_datetime(df.index,format='%Y%m')
For attribution, I have up-voted her answer in addition to citing it here.
I hope this answers the question; if it did, please accept the answer!
