I've imported data from a .csv. One column in the data holds the number of people, which ranges from 1 to 100 across the rows. My goal is to plot a histogram of only the rows where the number of people is less than 50.
I know how to plot the histogram.
df['people'].hist()
But, how do I specify the range of people?
I've tried df[df['people']< 50].hist() but that did not work.
I know this should be easy but I just don't get it! This is using python and pandas.
Try it with the query method:
df.query("people < 50")['people'].hist()
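For comparison, here is a minimal sketch of the same filter written two ways. The sample values and the extra 'rooms' column are made up for illustration and are not the asker's CSV. One possible reason the original attempt looked wrong is that calling .hist() on a filtered DataFrame draws one histogram per numeric column, whereas selecting the 'people' column first draws only that one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import pandas as pd

# Hypothetical sample data standing in for the imported CSV
df = pd.DataFrame({"people": [3, 12, 49, 50, 51, 75, 100],
                   "rooms": [1, 2, 5, 5, 6, 8, 10]})

# Two equivalent ways to keep only the rows with fewer than 50 people
by_query = df.query("people < 50")["people"]
by_mask = df.loc[df["people"] < 50, "people"]

# Plot the filtered column only; Series.hist returns a matplotlib Axes
ax = by_mask.hist(bins=5)
ax.set_xlabel("people")
```

In an interactive session you would finish with plt.show() (after importing matplotlib.pyplot as plt) instead of selecting the Agg backend.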
I used some sample data and tried to plot the histogram with df['people'][df['people']<50].hist() but df[df['people']<50].hist() also seems to work for me.
import pandas as pd

df = pd.DataFrame(
[1,1,2,3,3,5,7,8,9,10,
10,11,11,13,13,15,16,17,18,18,
18,19,20,21,21,23,24,24,25,25,
25,25,26,26,26,27,27,27,27,27,
29,30,30,31,33,34,34,34,35,36,
36,37,37,38,38,39,40,41,41,42,
43,44,45,45,46,47,48,48,49,50,
51,52,53,54,55,55,56,57,58,60,
61,63,64,65,66,68,70,71,72,74,
75,77,81,83,84,87,89,90,90,91], columns=['people'])
df.head()
df['people'][df['people']<50].hist()
I have attached a screenshot of the histogram.
I came across this video and one part of it puzzles me.
Essentially, at 5:50, they calculate Z-score for the whole data frame by the following snippet:
df_z = (df - df.describe().T['mean'])/df.describe().T['std']
It is a neat and beautiful line.
However, df.describe().T looks like this and df looks like this
df.describe().T['mean'] and df.describe().T['std'] are two individual Series. Both come from df.describe().T, which has the df column names as its index and the descriptive statistics as its columns, whereas df is an ordinary pd.DataFrame with a numerical index and the column names in the usual place.
My question is: how does that line make sense when the shapes do not match at all? In particular, how is each observation (x_i) matched with the mean and std of its own column?
Thank you.
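A small sketch of what seems to be going on, using a made-up two-column frame (the names df, mean, std and the values are assumptions for illustration). When a DataFrame is combined with a Series in arithmetic, pandas aligns the Series index with the DataFrame's column labels and applies the operation down every row, so each column is paired with the statistic that carries the same label.

```python
import pandas as pd

# Hypothetical data: two numeric columns
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})

stats = df.describe().T   # rows are 'a' and 'b'; columns include 'mean' and 'std'
mean = stats["mean"]      # Series indexed by the column names 'a', 'b'
std = stats["std"]

# DataFrame (op) Series: the Series index is matched against df's columns
df_z = (df - mean) / std

# The same computation with the alignment axis spelled out
df_z_explicit = df.sub(df.mean(), axis="columns").div(df.std(), axis="columns")
```

Because the matching is by label rather than by position, the order of the entries in mean and std does not matter, as long as their index labels equal the column names of df.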
I have melted a dataset using the pd.melt function in pandas and pivoted the table, keeping the name and year as ids. That gives me the table I want, but the plot does not come out the way I want. Below is the code showing the work done so far.
I preferred this method since I have other variables with the same names and years (maybe some other method exists).
But I want a graph with bars representing 'Estimated Number of Pregnacies' for each state (including all India) over the years, as side-by-side bars.
How to achieve this?
Here's a minimal example of what you are doing. Hope this gives you some hint:
# sample data
df = pd.DataFrame({'name': ['a','a','b','b','c','c'],
'class' : [1,2,1,2,1,2],
'vals':[122,1122,3342,4431,4311,1989]})
# use groupby on columns you want to see on x axis
df.groupby(['name','class'])['vals'].sum().unstack().plot(kind='bar')
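To relate this to the question, here is a hedged sketch with a hypothetical long-format table. The column names 'state', 'year', 'value' and all numbers are invented for illustration. The idea is the same as above: put the category for the x axis in the index and the category for the side-by-side bars in the columns before calling plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import pandas as pd

# Hypothetical long-format data, like the output of pd.melt
long_df = pd.DataFrame({
    "state": ["All India", "All India", "Kerala", "Kerala"],
    "year": [2015, 2016, 2015, 2016],
    "value": [100, 110, 10, 12],   # made-up numbers
})

# One row per state, one column per year
wide = long_df.pivot(index="state", columns="year", values="value")

# Each column becomes one bar series, drawn side by side for every state
ax = wide.plot(kind="bar")
ax.set_ylabel("value")
```

Swapping index and columns in the pivot call would instead put the years on the x axis with one bar per state.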
I have grouped some variables using groupby and now I want to plot them and edit the plot using matplotlib. In the code below, I have plotted the data using pandas, which gives me very little room to edit the graph (I think).
a = df_08.groupby('new_time').symbol.count()/len(set(df_08['date']))
a.plot()
The problem with using matplotlib and doing
plt.plot()
is that my data for 'a', after using 'groupby', is a pandas Series, and matplotlib does not seem to accept that.
'a' comes out like this, in Series format:
new_time symbol
09:30 224.2
09:31 133.8
09:32 117.6
09:33 113.5
09:34 108.4
The first column has the name 'Index', but I can't seem to treat it as the column name. I would like the first column to be on the x axis and the second column to be on the y axis.
Anyway, I guess my question is how to transform the data from a Series into a format matplotlib accepts.
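One way this could be approached, sketched with the five values shown above rebuilt by hand (the construction of a here is an assumption standing in for the groupby result): the "first column" of a Series is its index, so it can be passed to matplotlib as the x values, with the Series values as y.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the groupby result: the index holds the times, the values the averages
a = pd.Series(
    [224.2, 133.8, 117.6, 113.5, 108.4],
    index=pd.Index(["09:30", "09:31", "09:32", "09:33", "09:34"], name="new_time"),
    name="symbol",
)

fig, ax = plt.subplots()
ax.plot(a.index, a.values)       # index on the x axis, values on the y axis
ax.set_xlabel(a.index.name)
ax.set_ylabel(a.name)

# Alternative: turn the index into a regular column
a_df = a.reset_index()
```

Note that a.plot() also returns a matplotlib Axes, so the usual Axes methods can be applied to the pandas plot as well.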
This is the dataframe I am working with:
(Only the first two years have no data for country 69; I will fix this.) nkill is the number killed in that year, summed from the original long-form dataframe.
I am trying to do something similar to this plot:
However, I want the country code as the hue. I know there are similar posts, but none have helped me solve this. Thank you in advance.
By hue I mean the seaborn argument of that name, as pictured in this third picture. In this example, hue creates a separate bar for every value in that column. So if I had two country codes in the country column, it would plot two bars side by side for every year (one for each country).
Just looking at the data it should be possible to directly use the hue argument.
But first you would need to turn the index levels of the dataframe into actual columns:
df.reset_index(inplace=True)
Then something like
sns.barplot(x = "year", y="nkill", hue="country", data=df)
should give you the desired plot.
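As a rough illustration of what reset_index does here, the sketch below builds a small hypothetical frame indexed by year and country (all numbers are invented, and long_df, grouped and flat are names chosen for this example) and flattens it into the column layout that the hue argument expects.

```python
import pandas as pd

# Hypothetical long-form data with made-up numbers
long_df = pd.DataFrame({
    "year": [1972, 1972, 1973, 1973, 1973],
    "country": [69, 217, 69, 217, 217],
    "nkill": [2, 5, 1, 3, 4],
})

# Summing per year and country leaves 'year' and 'country' in the index
grouped = long_df.groupby(["year", "country"])[["nkill"]].sum()

# reset_index moves both index levels back into ordinary columns
flat = grouped.reset_index()
```

The flat frame has one row per (year, country) pair with columns year, country and nkill, which is the shape the sns.barplot call quoted above takes as data.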