I have a basic DataFrame in pandas and am using matplotlib to create a chart.
I have followed advice found on SO and in the docs for labelling the values on the x axis, but they won't change from the indices.
I have this:
import matplotlib.pyplot as plt
Presc_df_asc = Presc_df.sort_values('Total Items', ascending=True)
Presc_df_asc['Total Items'].plot.bar(x="Practice", ylim=[Presc_df_asc['Total Items'].min(), Presc_df_asc['Total Items'].max()])
plt.xlabel('Practice')
plt.ylabel('Total Items')
plt.title('practice total items')
plt.legend(('Items',),loc='upper center')
From what I have found, plot.bar(x="Practice") should set the x-axis to show the values in the Practice column under each bar.
But no matter what I try, the x-axis is labelled with the indices, and only the axis title says Practice.
For the plotting command to be able to access the "Practice" column, you need to call the plot function on the entire DataFrame (or a sub-DataFrame that contains at least these two columns), not on the 'Total Items' Series, which ignores x. The code below then places the corresponding Practice value below each bar. The rot=0 argument prevents the labels from being rotated by 90°.
Presc_df_asc.plot.bar(x="Practice", y="Total Items",
                      ylim=[Presc_df_asc['Total Items'].min(),
                            Presc_df_asc['Total Items'].max()], rot=0)
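A self-contained sketch of the same fix, with a small hypothetical DataFrame standing in for Presc_df (the practice names and totals are made up). It leaves out ylim: with the lower limit at the minimum, the shortest bar has no visible height, so add it back only if you want that. The Agg backend line just lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for Presc_df
Presc_df = pd.DataFrame({
    "Practice": ["Practice A", "Practice B", "Practice C"],
    "Total Items": [5200, 3100, 4400],
})

Presc_df_asc = Presc_df.sort_values("Total Items", ascending=True)

# Plot from the DataFrame (not the Series) so x= can refer to the Practice column
ax = Presc_df_asc.plot.bar(x="Practice", y="Total Items", rot=0, legend=False)
ax.set_ylabel("Total Items")
ax.set_title("Practice total items")

ax.figure.canvas.draw()  # make sure the tick label text is populated
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)
```

The bars are drawn in sorted order, so the labels read Practice B, Practice C, Practice A from left to right.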
Related
I'm generating a scatterplot for a Pandas DataFrame data, containing amongst others the numeric column 'year' with the unique values
array([2010., 2011., 2012., 2013., 2014., 2015., 2016., 2017., 2018.])
as shown with data.year.unique().
Displaying the plot like this:
ax = sns.scatterplot(x='x', y='y', hue='name', size='year', data=data, palette=sns.color_palette('deep', 7))
generates a legend with the groupings for year listed as [legend screenshot not preserved].
This is misleading, as the plot only contains data from 2010 to 2018.
I've tried passing a tuple (min, max) to the sns.scatterplot function as described in the documentation, to no avail.
Changing the data type of the column 'year' to categorical does show the range of years correctly in the legend, but yields a legend entry for every single year, which is unnecessary and takes up a lot of space.
I've also tried the solution from this related thread, but it doesn't change the range of the legend entries.
How can I force seaborn to show the actual range of values in the legend? Alternatively, if it only works by using a categorical column, how can I only show every second entry in the legend?
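For the second part (showing only every second entry), one option is to rebuild the legend from every second handle/label pair returned by ax.get_legend_handles_labels(). The sketch below uses plain matplotlib with hypothetical random data and one labelled scatter artist per year. On an axes returned by sns.scatterplot the same call applies, but seaborn may also include entries for the hue variable and section-title entries, so inspect the labels before slicing.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2010, 2019)  # the nine years from data.year.unique()
rng = np.random.default_rng(0)  # hypothetical random points

fig, ax = plt.subplots()
for year in years:
    # One labelled artist per year; marker size grows with the year
    ax.scatter(rng.random(5), rng.random(5), s=(year - 2005) * 10, label=str(year))

handles, labels = ax.get_legend_handles_labels()
# Keep every second entry: 2010, 2012, ..., 2018
ax.legend(handles[::2], labels[::2], title="year")

shown = [t.get_text() for t in ax.get_legend().get_texts()]
print(shown)
```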
I have a pandas dataframe and want to plot one value versus another, based on a particular field.
So I have 5 different types in 'Topic' and I want to plot each. Code as below at present.
dfCombinedToPlot.groupby('Topic').plot(x='DataValue', y='CasesPer100kPop', style='o')
# plt.title() Want this to equal "'Topic' vs number of cases"
# plt.xlabel() Want this to equal 'Topic'
# plt.ylabel()
plt.show()
I have 3 questions.
1. Can I add a title/xlabel to each of these which matches that from the Topic? So if the topic was "Asthma", I want the title/x label to be "Asthma", and then the next one to be "Bronchitis", and so on.
2. I want to add these to the same plot if possible (I will decide how many look good when I see them). How do I do this?
3. (Bonus question!!) Can I easily add a "best fit" line to each plot?
Thanks all.
The groupby(...).plot call returns a pandas Series of Axes objects, one per group and indexed by the group name, so you can use it to set each title and x label from the Topic:
import matplotlib.pyplot as plt

axes = dfCombinedToPlot.groupby('Topic').plot(x='DataValue', y='CasesPer100kPop', style='o')
for topic, ax in axes.items():  # topic is the group name, e.g. 'Asthma'
    ax.set_title(topic)
    ax.set_xlabel(topic)
plt.show()
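For question 2, one way to put every topic into a single figure is to create a grid of subplots yourself and pass each Axes to the group's plot call via ax=. This is a sketch with a hypothetical three-topic stand-in for dfCombinedToPlot (the COPD topic and all values are invented); ncols is an arbitrary choice you can tune once you see the result.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for dfCombinedToPlot
dfCombinedToPlot = pd.DataFrame({
    "Topic": ["Asthma"] * 3 + ["Bronchitis"] * 3 + ["COPD"] * 3,
    "DataValue": [1, 2, 3, 2, 4, 6, 1, 3, 5],
    "CasesPer100kPop": [10, 20, 25, 5, 9, 14, 30, 28, 35],
})

groups = dfCombinedToPlot.groupby("Topic")
ncols = 2
nrows = -(-groups.ngroups // ncols)  # ceiling division
fig, axes = plt.subplots(nrows, ncols, figsize=(10, 4 * nrows), squeeze=False)

for ax, (topic, grp) in zip(axes.flat, groups):
    grp.plot(x="DataValue", y="CasesPer100kPop", style="o", ax=ax, legend=False)
    ax.set_title(f"{topic} vs number of cases")
    ax.set_xlabel(topic)
    ax.set_ylabel("CasesPer100kPop")

# Hide any unused panels when the grid has more slots than topics
for ax in axes.flat[groups.ngroups:]:
    ax.set_visible(False)

fig.tight_layout()
titles = [ax.get_title() for ax in axes.flat if ax.get_visible()]
print(titles)
```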
For the best-fit line, assuming you mean a linear regression, it might be worth checking seaborn's regplot.
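regplot draws the fit for you; if you would rather stay with pandas and matplotlib, a least-squares straight line from numpy.polyfit with deg=1 does the same job. This sketch overlays each topic and its fitted line on one Axes. The data are hypothetical and chosen so the fits are exact (Asthma: y = 10x, Bronchitis: y = 2x + 1).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data in the same shape as dfCombinedToPlot
df = pd.DataFrame({
    "Topic": ["Asthma"] * 3 + ["Bronchitis"] * 3,
    "DataValue": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
    "CasesPer100kPop": [10.0, 20.0, 30.0, 5.0, 9.0, 13.0],
})

fig, ax = plt.subplots()
fits = {}
for topic, grp in df.groupby("Topic"):
    x = grp["DataValue"].to_numpy()
    y = grp["CasesPer100kPop"].to_numpy()
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
    fits[topic] = (slope, intercept)
    ax.plot(x, y, "o", label=topic)
    xs = np.linspace(x.min(), x.max(), 50)
    ax.plot(xs, slope * xs + intercept, "-", label=f"{topic} fit")

ax.legend()
print(fits)
```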
I'm looking to plot two columns of a time series based on a groupby of a third column. It works as intended more or less, but I can't tell which subgroup is being plotted in the output as it is not included in the legend or anywhere else in the graphs outputted.
Is there a way to include the subgroup name in the graphs outputted?
This is what I've attempted on the dataframe as follows:
awareness.groupby('campaign_name')[['sum_purchases_value', 'sum_ad_spend']].plot(figsize=(20, 8), legend=True);
Try this:
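grouped = awareness.groupby('campaign_name')
titles = [name for name, data in grouped]
# Select several columns with a list (double brackets); a tuple is rejected by recent pandas
plots = grouped[['sum_purchases_value', 'sum_ad_spend']].plot(figsize=(20, 8), legend=True)
for plot, label in zip(plots, titles):
    plot.set(title=label)  # name each figure after its campaign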
groupby(...).plot returns a pandas Series of matplotlib Axes objects, one per group, so inside the for loop you can customize whatever you like (x labels, y labels, font sizes, etc.).
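Another way to get the subgroup into the legend itself is to loop over the groups and rename the two columns with the campaign name before plotting. A sketch, assuming awareness has a DatetimeIndex; the campaign names and values below are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the awareness DataFrame (daily DatetimeIndex)
idx = pd.date_range("2021-01-01", periods=4, freq="D")
awareness = pd.DataFrame({
    "campaign_name": ["spring", "summer"] * 2,
    "sum_purchases_value": [100.0, 80.0, 120.0, 90.0],
    "sum_ad_spend": [20.0, 15.0, 25.0, 18.0],
}, index=idx)

cols = ["sum_purchases_value", "sum_ad_spend"]
legend_texts = {}
for name, grp in awareness.groupby("campaign_name"):
    # Prefix each column with the campaign so the legend identifies the subgroup
    labelled = grp[cols].rename(columns=lambda c: f"{name}: {c}")
    ax = labelled.plot(figsize=(20, 8), title=name)
    legend_texts[name] = [t.get_text() for t in ax.get_legend().get_texts()]
    plt.close(ax.figure)  # free the figure once inspected

print(legend_texts)
```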
Let's look at a swarmplot, made with Python 3.5 and Seaborn on some data (stored in a pandas DataFrame df, with the column labels stored in another class; this does not matter for now, just look at the plot):
ax = sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df)
Now the data is more readable if plotted on a log scale on the y-axis, because it spans several decades.
So let's change the scaling to logarithmic:
ax.set_yscale("log")
ax.set_ylim(bottom = 5*10**-10)
Well, I have a problem with the gaps in the swarms. I guess they are there because the dot positions were computed for a linear axis, where the dots would otherwise overlap. But on the log axis they look kind of strange, and there is enough space to form 4 equal-looking swarms.
My question is: How can I force seaborn to recalculate the position of the dots to create better looking swarms?
mwaskom hinted to me in the comments how to solve this.
It is even stated in the swarmplot documentation:
Note that arranging the points properly requires an accurate transformation between data and point coordinates. This means that non-default axis limits should be set before drawing the swarm plot.
Set the axis to log scale first, then pass it to the plot:
import matplotlib.pyplot as plt
import seaborn as sns

fig, log_ax = plt.subplots()  # create a figure with one axis (or use an existing axis)
log_ax.set_yscale("log")  # log scale first, so the swarm is laid out in log coordinates
sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data=df, ax=log_ax)
This yields the correct, desired plotting behaviour.
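A self-contained version of the same idea, with hypothetical column names temperature, current and voltage standing in for the self.dte.label_* attributes and randomly generated currents spanning four decades. Both the scale and the limits are set before sns.swarmplot is called, as the quoted documentation asks. It assumes seaborn is installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical measurements whose current spans several decades
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "temperature": rng.choice([25, 50, 75], size=n),
    "voltage": rng.choice([1.0, 2.0], size=n),
    "current": 10.0 ** rng.uniform(-9, -5, size=n),
})

fig, log_ax = plt.subplots()
log_ax.set_yscale("log")        # scale first ...
log_ax.set_ylim(5e-10, 1e-4)    # ... and any non-default limits, before the swarm is laid out
sns.swarmplot(x="temperature", y="current", hue="voltage", data=df, ax=log_ax)
```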
I have a TimeSeries in Pandas that I want to plot. I have 336 records in the TimeSeries. I only want to show the date/time (index of the TimeSeries) on the x-axis once per every 20 or so data points.
Here is how I am trying to do this:
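import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, NullLocator

ax = stats.plot()  # keep the Axes returned by pandas
ax.set_xticklabels(stats.index, rotation=45)
ax.xaxis.set_major_locator(MultipleLocator(20))
ax.xaxis.set_minor_locator(NullLocator())
ax.yaxis.set_major_locator(MultipleLocator(.075))
plt.draw()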
My x-axis shows the correct number of labels (18), but they are the first 18 entries of the series and do not correspond to the data points in the plot.
The problem is that you are using set_xticklabels, which sets the tick label text independently of the data: the ticks are labelled sequentially from the list you pass in.
From this I can't really tell what you are trying to do, but the behavior you are seeing is the 'correct' behavior for the library (it's doing exactly what you told it to, but that isn't what you want it to do).
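One way to get labels that do match the data is to place the ticks yourself and take each label from the same position in the index, so the tick at position i is labelled with stats.index[i]. A sketch, assuming stats is a Series of 336 hourly values (hypothetical data) plotted against integer positions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for stats: 336 hourly points (two weeks)
index = pd.date_range("2013-01-01", periods=336, freq=pd.Timedelta(hours=1))
stats = pd.Series(np.sin(np.arange(336) / 24.0), index=index)

fig, ax = plt.subplots()
positions = np.arange(len(stats))
ax.plot(positions, stats.to_numpy())

# Tick every 20th data point and label it with that point's own timestamp
tick_pos = positions[::20]
ax.set_xticks(tick_pos)
ax.set_xticklabels(stats.index[tick_pos].strftime("%Y-%m-%d %H:%M"), rotation=45, ha="right")
fig.tight_layout()

fig.canvas.draw()  # make sure the tick label text is populated
labels = [t.get_text() for t in ax.get_xticklabels()]
print(len(labels), labels[:2])
```

Because the positions and the labels come from the same tick_pos array, each label stays attached to its data point no matter how many ticks you choose.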