Dataframe value_counts() to barplot - python

I have a dataframe with multiple columns such as product name, reviews, origin, etc.
Here, I want to create a barplot with only the data from "Origin" column.
To do this, I used the code:
origin = df['Origin'].value_counts()
With this, I was able to get a list of countries with their corresponding frequencies (counts). Now, I want to create a barplot with each country on the x-axis and the counted frequencies on the y-axis. Although the frequency column has a label, I am unable to set the x-axis because the countries are stored only as the index. Is there a better way to count the "Origin" column and turn it into a barplot?
Thanks in advance.
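Because value_counts() returns a Series whose index holds the countries, pandas can plot it directly: Series.plot.bar() puts the index on the x-axis and the values on the y-axis. A minimal sketch, assuming a small hypothetical DataFrame in place of the real one:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the real DataFrame
df = pd.DataFrame({
    "Product": ["a", "b", "c", "d", "e"],
    "Origin": ["France", "Italy", "France", "Spain", "France"],
})

origin = df["Origin"].value_counts()  # index = country, values = counts

# The Series index (the countries) becomes the x-axis automatically
ax = origin.plot.bar()
ax.set_xlabel("Origin")
ax.set_ylabel("Count")
plt.tight_layout()

# If you want the countries as an explicit column instead of the index:
origin_df = origin.rename_axis("Origin").reset_index(name="count")
```

If seaborn is available, sns.countplot(x="Origin", data=df) does the counting and plotting in one step.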

Related

Want to display only specific values on a graph's x-axis, but it's showing repeated values from a CSV column

I need to display only unique values on the x-axis, but it is showing every value in that column of the CSV file. Any suggestions on how to fix this?
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('//media//HOTEL MANAGEMENT.csv')
df.plot('Room_Type', 'Charges', color='g')  # one point per row, so repeated room types repeat on the x-axis
plt.show()
My assumption is that you want to plot the result of some aggregation, for example, either:
The total charges per room type, or
The average charge per room type, or
The minimum/maximum charge per room type.
If so, you could do it like this:
df=pd.read_csv('//media//HOTEL MANAGEMENT.csv')
# And use any of the following:
df.groupby('Room_Type')['Charges'].sum().plot(color='g')
df.groupby('Room_Type')['Charges'].mean().plot(color='g')
df.groupby('Room_Type')['Charges'].min().plot(color='g')
df.groupby('Room_Type')['Charges'].max().plot(color='g')
Since the x-axis values (room types) are not necessarily sequential, a comparative bar graph could be another way to plot it:
df.groupby('Room_Type')['Charges'].mean().plot.bar(color=['r','g'])
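To see why the grouping removes the repeated x-values, here is a self-contained sketch with a few hypothetical rows in place of the CSV file: after groupby, each room type appears exactly once, so each gets one bar.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; in the question these come from the CSV file
df = pd.DataFrame({
    "Room_Type": ["Deluxe", "Standard", "Deluxe", "Standard", "Suite"],
    "Charges": [200, 100, 240, 120, 500],
})

# One value per unique room type: the repeated x-values collapse into groups
mean_charges = df.groupby("Room_Type")["Charges"].mean()

ax = mean_charges.plot.bar(color="g")
ax.set_ylabel("Mean charge")
plt.tight_layout()
```

Swapping .mean() for .sum(), .min(), or .max() gives the other aggregations listed above.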

Plotting only selected rows in python

I have a data frame called "df" with columns "date", "regions", and "transactions". I want to plot it so that I can see transactions for only selected regions, not all the regions in my df.
For example, I want to see a plot with transactions for regions "a", "X", and "z" only, all in the same graph, with "date" as my x-axis.
So far, I have been able to plot transactions for all the regions in one graph, but I have not been able to slice the data down to the regions I want.
Can someone please help?
You can use df.loc to access only a subset of rows or columns. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html
In your case, something like this would return the df with just the required regions:
required_regions = ['a','X','z']
df.loc[df['regions'].isin(required_regions)]
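To get all selected regions into the same graph with date on the x-axis, one option is to pivot the filtered rows so that each region becomes its own column, then let DataFrame.plot draw one line per column. A minimal sketch, assuming hypothetical sample data with the question's column names and at most one row per (date, region) pair (pivot raises an error on duplicates; pivot_table with an aggfunc would handle them):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01"] * 4 + ["2021-01-02"] * 4),
    "regions": ["a", "X", "z", "b"] * 2,
    "transactions": [10, 20, 30, 40, 11, 21, 31, 41],
})

required_regions = ["a", "X", "z"]
subset = df.loc[df["regions"].isin(required_regions)]  # drops region "b"

# Reshape: one column per region, indexed by date
wide = subset.pivot(index="date", columns="regions", values="transactions")

ax = wide.plot()  # one line per selected region, date on the x-axis
ax.set_ylabel("transactions")
plt.tight_layout()
```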

How do I combine dataframe (pandas) columns to plot confidence intervals in a Seaborn box plot?

My dataframe looks something like this:
I want to combine Column 1 and Column 2 into a confidence interval (take the mean of Column 1 and Column 2, find the standard deviation, etc.) and then plot that using Seaborn box plots. Basically, I want to compare the values across the 4 rows (each row is an experimental subject), where each value has multiple measurements. In other words, I need to combine columns (replicate experiments) in the DataFrame. How would I go about doing this?
Here is what I'm hoping to get:
Thanks!
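Seaborn's plotting functions work on long-form data, so the usual first step is to melt the replicate columns into one value column with one row per measurement. A minimal sketch, assuming a hypothetical "subject" column and made-up values in place of the real frame:

```python
import pandas as pd

# Hypothetical wide frame: one row per subject, replicates in separate columns
df = pd.DataFrame({
    "subject": ["s1", "s2", "s3", "s4"],
    "Column 1": [1.0, 2.0, 3.0, 4.0],
    "Column 2": [1.2, 2.2, 2.8, 4.4],
})

# Long form: one row per (subject, replicate) measurement
long_df = df.melt(id_vars="subject", value_vars=["Column 1", "Column 2"],
                  var_name="replicate", value_name="value")

# Per-subject mean and sample standard deviation across the replicates
summary = long_df.groupby("subject")["value"].agg(["mean", "std"])
```

long_df can then be passed to seaborn, e.g. sns.boxplot(x="subject", y="value", data=long_df). Note that a box plot shows quartiles rather than a confidence interval; sns.pointplot or sns.barplot with the same x, y, and data arguments draw the mean with a confidence interval instead.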

Factorplot with multiindex dataframe

This is the dataframe I am working with:
(Only the first two years lack data for country 69; I will fix this.) nkill is the number of people killed in that year, summed from the original long-form dataframe.
I am trying to do something similar to this plot:
However, I want the country code as the hue. I know there are similar posts, but none of them helped me solve this. Thank you in advance.
By hue I mean the hue parameter in seaborn's syntax, as pictured in the third picture. In that example, hue creates a separate bar for every value in that column. So if I had two country codes in the country column, it would plot two bars (one for each country) side by side for every year.
Judging from the data, it should be possible to use the hue argument directly.
But first you need to turn the index levels into actual columns:
df.reset_index(inplace=True)
Then something like
sns.barplot(x = "year", y="nkill", hue="country", data=df)
should give you the desired plot.
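A small sketch of what reset_index does here, assuming the frame is indexed by a (year, country) MultiIndex; the second country code (217), the years, and the nkill values are hypothetical. After reset_index, "year" and "country" are ordinary columns that seaborn's x and hue arguments can refer to. The pivot at the end is a pandas-only alternative that draws the same side-by-side bars without seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the question's frame: (year, country) MultiIndex
idx = pd.MultiIndex.from_tuples(
    [(1970, 69), (1970, 217), (1971, 69), (1971, 217)],
    names=["year", "country"],
)
df = pd.DataFrame({"nkill": [5, 12, 3, 8]}, index=idx)

df = df.reset_index()  # index levels become the columns "year" and "country"

# Pandas-only alternative: one column per country, grouped bars per year
wide = df.pivot(index="year", columns="country", values="nkill")
ax = wide.plot.bar()
ax.set_ylabel("nkill")
plt.tight_layout()
```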

How do I create a column chart with an aggregated column

I have two columns, customer_id and revenue, and I'm trying to figure out how to use matplotlib (or seaborn) to create a histogram/bar/column chart with an aggregated column on the right. Every time I change the range, it just cuts off the values above my max range. Instead, I want a bin that counts the instances above that max value.
For the example chart linked below, if I define my range as 0-1558, I want there to be a column that counts all values of $1558 and above and displays them as a column.
Example Chart
Cap the values above the limit:
df.loc[df['revenue'] > limit, 'revenue'] = limit  # cap only the revenue column, not whole rows
Now, plot the histogram.
Same concept as @DYZ, but my code ended up being:
df.loc[df.revenue > limit, 'revenue'] = limit  # .ix was removed in pandas 1.0; .loc does the same here
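Putting the capping and the histogram together: after clipping, every value at or above the limit equals the limit, so a final bin starting at the limit collects all of them. A minimal sketch, assuming hypothetical revenue values and four regular bins below the limit (Series.clip is used here as an equivalent to the .loc assignment above):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data; replace with your own DataFrame
df = pd.DataFrame({
    "customer_id": range(8),
    "revenue": [100, 250, 400, 900, 1500, 1600, 2500, 9000],
})
limit = 1558
n_regular_bins = 4  # hypothetical choice

# Regular bins from 0 to limit, plus one overflow bin starting at limit
regular = np.linspace(0, limit, n_regular_bins + 1)
edges = np.append(regular, limit + limit / n_regular_bins)

capped = df["revenue"].clip(upper=limit)  # everything >= limit becomes limit
counts, _ = np.histogram(capped, bins=edges)

fig, ax = plt.subplots()
ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge", edgecolor="black")
ax.set_xlabel(f"revenue (last bar: {limit} and above)")
ax.set_ylabel("count")
```

np.histogram treats the last bin as closed on both ends, so values exactly equal to the limit land in the overflow bar, matching "$1558 and above".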
