This should be very simple, but I'm trying to order a seaborn countplot by month.
The default is in reverse order (latest months first), so I would like to either simply reverse the order or specify the order - ideally I'd like to understand how to do both.
This is the code I have:
sns.countplot(data=cycling, x=cycling['Date'].dt.strftime('%Y-%m'))
plt.xticks(rotation=45)
plt.show()
I tried adding order = cycling['Date'].dt.strftime('%Y-%m') but it just splits the bars further based on how many entries I had for that month. So it goes from this: Barplot image 1: wrong order
To this: Barplot image 2: wrong order + sliced too much
Any help would be great, thanks!
By default, the order of appearance in the 'Date' column is used. If your dataframe is strictly from newest to oldest, you could just invert the dataframe. If there isn't a strict order, you can sort the dataframe.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
cycling = pd.DataFrame({'Date': np.random.choice(pd.date_range('20210801', '20230123', freq='D'), 500)})
ax = sns.countplot(x=cycling.sort_values('Date')['Date'].dt.strftime('%Y-%m'))
ax.tick_params(axis='x', rotation=45)
ax.set_xlabel('')
plt.tight_layout()
plt.show()
You can use order = list(set(order)) to remove duplicates from your list.
The set removes the duplicates and list converts the result back to a list. Note that a set does not preserve order, so sort the result (for example with sorted(set(order))) before passing it to countplot.
You can also reverse the resulting order list by using order.reverse().
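A minimal sketch of building both orders, assuming a small made-up cycling dataframe (the dates are illustrative, not the asker's data). Because '%Y-%m' strings sort chronologically, sorting the unique labels gives the oldest month first:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe
cycling = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2022-03-05", "2021-11-20", "2022-03-17", "2022-01-02", "2021-11-03"]
    )
})

# One label per row, e.g. "2022-03"
months = cycling["Date"].dt.strftime("%Y-%m")

# Unique labels, oldest month first
order = sorted(set(months))

# The same labels, newest month first
reversed_order = order[::-1]

print(order)
print(reversed_order)
```

Either list can then be passed to seaborn, for example sns.countplot(x=months, order=order).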
I am using the integrated plot() function in pandas to generate a graph with two y-axes. This works well and the legend even points to the (right) y-axis for the second data set. But imho the legend's position is bad.
However, when I update the legend position I get two legends the correct one ('A', 'B (right)') at an inconvenient location, and a wrong one ('A' only) at the chosen location.
So now I want to generate a legend on my own and was looking for the second <matplotlib.lines.Line2D>, but it is not contained in the ax environment.
import pandas as pd
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'])
len(ax.lines)
>>> 1
My ultimate objective is to be able to move the correct legend around, but I am confident I could manually place a legend, if only I had access to the second line container.
If I had, I was going to suppress the original legend by invoking df.plot(...,legend=None) and do something like plt.legend([ax.lines[0],ax.lines[1]],['A','B (right)'],loc='center left',bbox_to_anchor=(1.2, 0.5)). But ax only stores the first line "A", where is the second?
Also ax.get_legend_handles_labels() only contains ([<matplotlib.lines.Line2D at 0x2630e2193c8>], ['A']).
You create two axes. Each contains a line. So you need to loop over the axes and take the line(s) from each of them.
import pandas as pd
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'])
# Collect the lines from every axes in the figure (primary and secondary)
lines = [line for axes in ax.figure.axes for line in axes.lines]
print(lines)
For the purpose of creating a single legend, however, you may just use a figure legend:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A":[1,2,3],"B":[1/4,1/5,1/6]})
ax = df.plot(secondary_y=['B'], legend=False)
ax.figure.legend()
plt.show()
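If you want to place the legend manually, as the question describes, the two ideas can be combined: collect the lines from both axes and hand them to a single legend call. This is only a sketch; the labels are passed explicitly, and the anchor position (given in figure coordinates) is an illustrative assumption:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1/4, 1/5, 1/6]})

# Suppress the automatic legend; the secondary axes is still created
ax = df.plot(secondary_y=["B"], legend=False)

# One line lives on the primary axes, the other on the secondary axes
lines = [line for axes in ax.figure.axes for line in axes.lines]

# Build one legend from both lines and place it where you like
legend = ax.figure.legend(
    lines,
    ["A", "B (right)"],
    loc="center left",
    bbox_to_anchor=(0.15, 0.5),  # illustrative position in figure coordinates
)
```

With an interactive backend, drop the matplotlib.use line and call plt.show() at the end.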
I know it has already been asked, but I could not solve my problem.
I have three pandas columns: one with dates and two others with values.
I can get my graph with the two curves depending on date.
However, I cannot display all dates in the x axis. Can you help me?
import pandas as pd
import matplotlib.pyplot as plt
# mau_file is the pandas dataframe with three columns.
plt.figure()
mau_file.plot(x='month_date', y=['mau', 'nb_migs'], figsize=(10,5), grid=True)
plt.set_xticklabels(mau_file['month_date'])
plt.legend(loc='best')
plt.show()
Usually, plt.xticks() is used to display x axis values.
As I'm not sure it is 100% compatible with a pandas structure, you may need to store your data in a classical table or a numpy array.
Documentation of plt.xticks()
EDIT: It is possible to choose the orientation of the labels.
For example, plt.xticks(x, labels, rotation='vertical') will give you vertical labels.
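A rough sketch of one way to show every date on the x axis. The column names come from the question, but the values are made up, and plotting against integer positions (rather than the dates themselves) is a simplification chosen here so that every tick can be labelled explicitly:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mau_file
mau_file = pd.DataFrame({
    "month_date": pd.date_range("2021-01-01", periods=8, freq="MS"),
    "mau": [10, 12, 15, 14, 18, 21, 19, 25],
    "nb_migs": [1, 2, 2, 3, 3, 5, 4, 6],
})

# One text label per row, e.g. "2021-01"
labels = mau_file["month_date"].dt.strftime("%Y-%m").tolist()
positions = list(range(len(mau_file)))

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(positions, mau_file["mau"], label="mau")
ax.plot(positions, mau_file["nb_migs"], label="nb_migs")

# Put a tick at every row and label it with the corresponding date
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation="vertical")
ax.grid(True)
ax.legend(loc="best")
fig.tight_layout()
```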
I have
x = collections.Counter(df.f.values.tolist())
if 'nan' in x:
    del x['nan']
plt.bar(range(len(x)), x.values(), align='center')
plt.xticks(range(len(x)), list(x.keys()))
plt.show()
My question is, how can I remove the NaNs from the dictionary that is created, and how can I change the order of the bar plot to go from 1 to 5? The first 3 NaNs are empty spots in the data (intentional, since it's from a poll), and the last one is the title of the column. I tried manually changing the range part of plt.bar to be 1-5 but it does not seem to work.
You can use .value_counts on a pandas.Series to simply get how many times each value occurs. This makes it simple to then make a barplot.
By default, value_counts will ignore the NaN values, so that takes care of that, and by using .sort_index() we can guarantee the values are plotted in order. It seems we need to use .to_frame() so that it only plots one color for the column (it chooses one color per row for a Series).
Sample Data
import pandas as pd
import numpy as np
# Get your plot settings
import seaborn as sns
sns.set()
np.random.seed(123)
df = pd.DataFrame({'f': np.random.randint(1,6,100)})
df = pd.concat([df, pd.DataFrame({'f': np.repeat(np.nan, 1000)})])
Code
df.f.value_counts().to_frame().sort_index().plot(kind='bar', legend=False)
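If you would rather keep the collections.Counter approach from the question, a sketch along these lines may work. It assumes the missing entries are real float NaN values rather than the string 'nan', and the sample values are made up:

```python
import collections
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical poll answers with a few missing entries
values = [3, 1, float("nan"), 5, 2, 2, float("nan"), 4, 1]

def is_missing(v):
    """True for float NaN; a NaN never equals itself, so test it explicitly."""
    return isinstance(v, float) and math.isnan(v)

# Count only the real answers
counts = collections.Counter(v for v in values if not is_missing(v))

# Sort the keys so the bars run from 1 to 5
keys = sorted(counts)
heights = [counts[k] for k in keys]

fig, ax = plt.subplots()
ax.bar(range(len(keys)), heights, align="center")
ax.set_xticks(range(len(keys)))
ax.set_xticklabels([str(k) for k in keys])
```

Filtering before counting avoids the `'nan' in x` check, which only matches a key that is literally the string 'nan'.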
I have the following dataframe (with different campaigns)
When I use groupby and try to plot, I get several graphs
df.groupby("Campaign").plot(y=["Visits"], x = "Week")
I would like to have only one graph with all the visits in the same graph by every campaign during the week time. Also because the graphs show up separated, I do not know which one belongs to each campaign.
I would appreciate any tips regarding this.
You could do this:
df.set_index(['Week','Campaign'])['Visits'].unstack().plot(title='Visits by Campaign')
For multiple values of Week/Campaign let's aggregate them with sum or you could use mean to average the values:
df.groupby(['Week','Campaign'])['Visits'].sum().unstack().plot(title='Visits by Campaign')
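To see what unstack does here, a small sketch on made-up data (the column names follow the question, the numbers are invented): each campaign becomes its own column, so a single plot call draws one line per campaign on the same axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical data in the same long format as the question
df = pd.DataFrame({
    "Week": [1, 1, 2, 2, 3, 3],
    "Campaign": ["a", "b", "a", "b", "a", "b"],
    "Visits": [10, 5, 12, 7, 9, 11],
})

# Rows indexed by Week, one column per Campaign
wide = df.groupby(["Week", "Campaign"])["Visits"].sum().unstack()
print(wide)

# One line per column, labelled by campaign in the legend
ax = wide.plot(title="Visits by Campaign")
```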
Another possible solution would be to use seaborn
import seaborn as sns
ax = sns.lineplot(x="Week",
                  y="Visits",
                  hue="Campaign",
                  estimator=None,
                  lw=1,
                  data=df)
The documentation is here
I have some data, based on which I am trying to build a countplot in seaborn. So I do something like this:
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(x=data)
and get my countplot:
The problem is that ticks on the x-axis are too dense (which makes them useless). I tried to decrease the density with plot_.xticks=np.arange(0, 40, 10) but it didn't help.
Also is there a way to make the plot in one color?
Tick frequency
There seem to be multiple issues here:
1. You are using the = operator while using plt.xticks. You should use a function call instead (but not here; read point 2 first)!
2. seaborn's countplot returns an axes object, not a figure
   - you need to use the axes-level approach of changing x-ticks (which is not plt.xticks())
Try this:
for ind, label in enumerate(plot_.get_xticklabels()):
    if ind % 10 == 0:  # every 10th label is kept
        label.set_visible(True)
    else:
        label.set_visible(False)
Colors
I think the data setup is not optimal here for this type of plot. Seaborn will interpret each unique value as a new category and introduce a new color. If I'm right, the number of colors and x-ticks equals the number of unique values, len(np.unique(data)).
Compare your data to seaborn's examples (which are all based on data which can be imported to check).
I also think working with seaborn is much easier using pandas dataframes (and not numpy arrays; I often prepare my data in the wrong way and subset selection needs preprocessing; dataframes offer more). I think most of seaborn's examples use this kind of data input.
Even though this was answered a while ago, I am adding another, perhaps simpler, alternative that is more flexible.
You can use a matplotlib axis tick locator to control which ticks will be shown.
In this example you can use LinearLocator to achieve the same thing:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.ticker as ticker
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(x=data)
plot_.xaxis.set_major_locator(ticker.LinearLocator(10))
Since you have tagged matplotlib, one solution different from setting the ticks visible True/False is to plot every nth label as following
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(123)
fig = plt.figure()
data = np.hstack((np.random.normal(10, 5, 10000), np.random.normal(30, 8, 10000))).astype(int)
plot_ = sns.countplot(x=data)
fig.canvas.draw()  # draw once so the tick labels are populated
new_ticks = [i.get_text() for i in plot_.get_xticklabels()]
plt.xticks(range(0, len(new_ticks), 10), new_ticks[::10])
As a slight modification of the accepted answer, we typically select labels based on their value (and not index), e.g. to display only values which are divisible by 10, this would work:
for label in plot_.get_xticklabels():
    # np.int was removed from NumPy; the built-in int does the same job here
    if int(label.get_text()) % 10 == 0:
        label.set_visible(True)
    else:
        label.set_visible(False)