Is it possible to add space between my axis labels? They overlap (30 labels crunched together). I'm using Python pandas:
genreplot.columns =['genres','pct']
genreplot = genreplot.set_index(['genres'])
genreplot.plot(kind='barh',width = 1)
I would post a picture, but I don't have 10 reputation yet.
I tried recreating your problem, but without knowing exactly what your labels are I can only give general advice. There are a few things you can adjust to reduce overlapping labels: their number, their font size, and their rotation.
Here is an example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# np.random.random_integers is deprecated; randint's upper bound is exclusive
genreplot = pd.DataFrame({'genres': np.random.randint(1, 11, 20),
                          'pct': np.random.randint(1, 101, 20)})
genreplot = genreplot.set_index(['genres'])
ax = genreplot.plot(kind='barh', width=1)
Now you can choose which tick labels to show (here one every 5 units) and rotate them:
pct_labels = np.arange(0, 100, 5)
ax.set_xticks(pct_labels)
ax.set_xticklabels(pct_labels, rotation=45)
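The rotation above acts on the x-axis (the pct values), but in a barh plot the genre names sit on the y-axis. Below is a minimal sketch of the other two levers mentioned (font size and number of labels), plus a taller figure. It assumes 30 made-up genre names; the 0.25 inch per bar and the "every second label" choice are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical data: 30 genre names, each with a percentage
genres = [f'genre_{i}' for i in range(30)]
pct = np.random.default_rng(0).integers(1, 100, size=len(genres))
genreplot = pd.DataFrame({'pct': pct}, index=pd.Index(genres, name='genres'))

# give each horizontal bar about 0.25 inch of vertical space
ax = genreplot.plot(kind='barh', width=0.8, legend=False,
                    figsize=(6, 0.25 * len(genreplot)))
# smaller font for the genre labels on the y-axis
ax.tick_params(axis='y', labelsize=8)
# show only every second genre label
for i, label in enumerate(ax.get_yticklabels()):
    label.set_visible(i % 2 == 0)
```

Hiding labels loses information, so a taller figure with a smaller font is usually worth trying first.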
For further reference, you can take a look at this page for documentation on xticks and yticks:
If your labels are quite long and you specify them yourself (e.g. from a list), you could also consider inserting line breaks:
labels = ['longgggggg_labelllllll_1',
'longgggggg_labelllllll_2']
new_labels = [label.replace('_', '\n') for label in labels]
new_labels
# ['longgggggg\nlabelllllll\n1', 'longgggggg\nlabelllllll\n2']
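If the labels contain spaces rather than underscores, `textwrap.fill` from the standard library can break them at word boundaries instead. A minimal sketch with made-up labels; the width of 15 characters is an arbitrary choice:

```python
from textwrap import fill

labels = ['Brake and crash into the bus',
          'Steer into the passing car on the left']
# break each label into lines of at most 15 characters, at spaces
wrapped = [fill(label, width=15) for label in labels]
print(wrapped)
```

The resulting strings can then be passed to `ax.set_yticklabels` or `ax.set_xticklabels`.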
I am visualizing the results of a survey. The answers are long and I would like to fit them entirely into the graph. Therefore, I would be very grateful if you could point me to a way to have multi-line xticklabels, or include the xticklabels in a legend on the side as seen in this example:
Because otherwise I would have to make the graph very wide to fit the entire answer. My current code and the resulting plot look as follows:
import matplotlib.pyplot as plt
import seaborn as sns
from textwrap import wrap
sns.set(style="dark")
catp = (sns.catplot(data=results, x='1',
                    kind='count',
                    hue_order=results.sort_values('1')['1'],
                    palette='crest',
                    height=3.3,
                    aspect=17.4/7)
        .set(xlabel=None,
             ylabel='Number of Participants',
             title="\n".join(wrap("Question 1: Out of the three options, please choose the one you would prefer your fully autonomous car to choose, if you sat in it.", 90)))
        )
plt.tight_layout()
catp.ax.set_yticks((0,10,20,30,40))
for p in catp.ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_height()/92)
    x = p.get_x() + p.get_width() / 2 - 0.05
    y = p.get_y() + p.get_height() + 0.3
    catp.ax.annotate(percentage, (x, y), size = 12)
plt.show()
Best regards!
Edit: You can create a sample dataframe with this code:
import pandas as pd
import numpy as np
from itertools import chain
x = (np.repeat('Brake and crash into the bus', 37),
np.repeat('Steer into the passing car on the left', 22),
np.repeat('Steer into the right hand sidewall', 39))
results = pd.DataFrame({'1': list(chain(*x))})
Extract the xticklabels and wrap them with textwrap.wrap, as you did with the title.
Since matplotlib 3.4.0, .bar_label makes it easier to annotate bars.
See this answer for customizing the bar annotation labels.
The height and aspect of the figure will still need some adjusting, depending on the wrap width.
An alternative is to wrap the values in the dataframe itself:
df['1'] = df['1'].apply(lambda row: '\n'.join(wrap(row, 30)))
To do the same for all columns:
for col in df.columns:
    df[col] = df[col].apply(lambda row: '\n'.join(wrap(row, 30)))
The list comprehension for labels uses an assignment expression (:=), which requires python >= 3.8. This can be rewritten as a standard for loop.
labels = [f'{v.get_height()/len(df)*100:0.1f}%' for v in c] works without an assignment expression, but doesn't check if the bar height is 0.
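For Python versions before 3.8, the annotation step can be written as a plain for loop that still skips empty bars. A minimal sketch on a standalone bar chart; the heights are made up, and `total = 98` stands in for `len(df)` (the sample dataframe has 37 + 22 + 39 = 98 rows). `bar_label` itself still needs matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

heights = [37, 0, 39]  # hypothetical counts; the 0 simulates an empty category
total = 98             # stands in for len(df)

fig, ax = plt.subplots()
ax.bar(['A', 'B', 'C'], heights)

for c in ax.containers:
    labels = []
    for v in c:
        h = v.get_height()
        # percent of all participants, or no label for a zero-height bar
        labels.append(f'{h / total * 100:0.1f}%' if h > 0 else '')
    ax.bar_label(c, labels=labels, label_type='edge')
```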
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2, seaborn 0.11.2
import matplotlib.pyplot as plt
import seaborn as sns
from textwrap import wrap
from itertools import chain
import pandas as pd
import numpy as np
# sample dataframe
x = (np.repeat('Brake and crash into the bus, which will result in the killing of the children on the bus, but save your life', 37),
     np.repeat('Steer into the passing car on the left, pushing it into the wall, saving your life, but killing passengers in the other car', 22),
     np.repeat('Steer into the right hand sidewall, killing you but saving the lives of all other passengers', 39))
df = pd.DataFrame({'1': list(chain(*x))})
# plotting
sns.set(style="dark")
catp = (sns.catplot(data=df, x='1',
                    kind='count',
                    hue_order=df.sort_values('1')['1'],
                    palette='crest',
                    height=5,
                    aspect=17.4/7)
        .set(xlabel=None,
             ylabel='Number of Participants',
             title="\n".join(wrap("Question 1: Out of the three options, please choose the one you would prefer your fully autonomous car to choose, if you sat in it.", 90)))
        )
plt.tight_layout()
catp.ax.set_yticks((0,10,20,30,40))
for ax in catp.axes.ravel():
    # extract labels
    labels = ax.get_xticklabels()
    # fix the labels
    for v in labels:
        text = v.get_text()
        text = '\n'.join(wrap(text, 30))
        v.set_text(text)
    # set the new labels
    ax.set_xticklabels(labels)
    # annotate the bars
    for c in ax.containers:
        # create a custom annotation: percent of total
        labels = [f'{w/len(df)*100:0.1f}%' if (w := v.get_height()) > 0 else '' for v in c]
        ax.bar_label(c, labels=labels, label_type='edge')
My data looks like this:
Most values are < 60, plus a few outliers in the 2000s.
I want to display it in a histogram with the following bin ranges:
0-1, 1-2, 2-3, 3-4, ..., 59-60, 60-max
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.axes as axes
b = list(range(61)) + [2000] # will make [0, 1, ..., 60, 2000]
plt.hist(b, bins=b, edgecolor='black')
plt.xticks(b)
plt.show()
This shows the following:
Essentially what you see is all the numbers 0 .. 60 squished together on the left, and the 2000 on the right. This is not what I want.
So I remove the [2000] and get something like what I am looking for:
As you can see, it is better now, but I still have two problems:
How do I get rid of the white space around the bars (there's a big gap before 0 and after 60)?
How do I add a 2000 tick at the very end, after 60, while keeping roughly the same spacing (unlike the first plot)?
Here is one hacky solution using some random data. I still don't quite understand your second question, but I tried something based on your wording.
import numpy as np
import matplotlib.pyplot as plt
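fig, ax = plt.subplots(figsize=(12, 6))
data = np.random.normal(10, 5, 5000)
upper = 31
outlier = 2000
# stand-in for the outliers: 100 values placed at the upper edge (last bin)
data = np.append(data, 100*[upper])
b = list(range(upper)) + [upper]
plt.hist(data, bins=b, edgecolor='black')
plt.xticks(b)
# relabel the last tick so it shows the outlier value
b[-1] = outlier
ax.set_xticklabels(b)
# remove the empty space left of 0 and right of the last bin
plt.xlim(0, upper)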
plt.show()
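A variant that keeps the real data, rather than appending fake values, is to clip everything above 60 into one final "60-max" bin with `np.clip`, then relabel that bin's right edge with the actual maximum. A minimal sketch, assuming made-up data (uniform values below 60 plus three outliers); the 0.5 offset just places clipped values inside the last bin.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# hypothetical data: mostly below 60, plus three outliers in the 2000s
data = np.concatenate([rng.uniform(0, 60, 500), [2010, 2050, 2100]])

upper = 60
edges = np.arange(upper + 2)  # 0, 1, ..., 60, 61; bin [60, 61] is the "60-max" bin
# move everything above upper into the last bin, so outliers are counted
# but do not stretch the x-axis
clipped = np.clip(data, 0, upper + 0.5)

fig, ax = plt.subplots(figsize=(14, 4))
counts, _, _ = ax.hist(clipped, bins=edges, edgecolor='black')
ax.set_xlim(edges[0], edges[-1])  # no empty margin before 0 or after the last bin

# a tick every 10 units, plus the right edge of the last bin labelled with the real maximum
ticks = list(range(0, upper + 1, 10)) + [upper + 1]
ax.set_xticks(ticks)
ax.set_xticklabels([str(t) for t in ticks[:-1]] + [f'{data.max():.0f}'])
# left-align the last label so it does not collide with the "60" label
ax.get_xticklabels()[-1].set_horizontalalignment('left')
```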
Following up on my previous question: Sorting datetime objects by hour to a pandas dataframe then visualize to histogram
I need to plot 3 bars for each X-axis value, representing viewer counts. Right now the plot shows viewers who watched for under one minute and those who watched longer; I also need a bar showing all viewers. I have the DataFrame, but I can't get the bars to look right. With just 2 bars there is no problem; it looks exactly as I want:
The relevant part of the code for this:
# Time and date stamp variables
allviews = int(df['time'].dt.hour.count())
date = str(df['date'][0].date())
hours = df_hist_short.index.tolist()
hours[:] = [str(x) + ':00' for x in hours]
The hours variable that I use for the X-axis may be problematic, since I convert it to strings so the hours read as 23:00 instead of the bare pandas index value 23. I have seen examples where people add or subtract values from the X positions to shift the bars.
fig, ax = plt.subplots(figsize=(20, 5))
short_viewers = ax.bar(hours, df_hist_short['time'], width=-0.35, align='edge')
long_viewers = ax.bar(hours, df_hist_long['time'], width=0.35, align='edge')
I set align='edge' and gave the two bars widths of equal magnitude and opposite sign. But I have no idea how to make it look right with 3 bars, and I couldn't find any positioning arguments for the bars. I also tried plt.hist(), but I couldn't get the same output as with plt.bar().
So the result I want is the graph shown above with a 3rd bar on the left side, a bit wider than the other two.
pandas will do this alignment for you, if you make the bar plot in one step rather than two (or three). Consider this example (adapted from the docs to add a third bar for each animal).
import pandas as pd
import matplotlib.pyplot as plt
speed = [0.1, 17.5, 40, 48, 52, 69, 88]
lifespan = [2, 8, 70, 1.5, 25, 12, 28]
height = [1, 5, 20, 3, 30, 6, 10]
index = ['snail', 'pig', 'elephant',
'rabbit', 'giraffe', 'coyote', 'horse']
df = pd.DataFrame({'speed': speed,
'lifespan': lifespan,
'height': height}, index=index)
ax = df.plot.bar(rot=0)
plt.show()
In pure matplotlib, instead of using the width parameter to position the bars as you've done, you can adjust the x-values for your plot:
import numpy as np
import matplotlib.pyplot as plt
# Make some fake data:
n_series = 3
n_observations = 5
x = np.arange(n_observations)
data = np.random.random((n_observations,n_series))
# Plotting:
fig, ax = plt.subplots(figsize=(20,5))
# Determine bar widths
width_cluster = 0.7
width_bar = width_cluster/n_series
for n in range(n_series):
    x_positions = x+(width_bar*n)-width_cluster/2
    ax.bar(x_positions, data[:,n], width_bar, align='edge')
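The same idea also covers the wider third bar and the "23:00" labels from the question: plot at numeric positions, give each series its own left edge and width, and only attach the string labels as tick labels at the end. A minimal sketch with made-up hourly counts standing in for `df_hist_short` and `df_hist_long`; the widths 0.35 and 0.25 are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt

# hypothetical hourly counts
hours = np.array([20, 21, 22, 23])
short = np.array([5, 8, 6, 3])
long_ = np.array([7, 4, 9, 2])
overall = short + long_

x = np.arange(len(hours))  # numeric positions, one per hour
fig, ax = plt.subplots(figsize=(10, 4))
# left edges chosen so the cluster (0.35 + 0.25 + 0.25 = 0.85 wide) is centred on x
ax.bar(x - 0.425, overall, width=0.35, align='edge', label='overall')  # wider bar on the left
ax.bar(x - 0.075, short, width=0.25, align='edge', label='< 1 min')
ax.bar(x + 0.175, long_, width=0.25, align='edge', label='>= 1 min')

# string labels only for display; the bar positions stay numeric
ax.set_xticks(x)
ax.set_xticklabels([f'{h}:00' for h in hours])
ax.legend()
```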
In your particular case, seaborn is probably a good option. You should (almost always) try to keep your data in long form: instead of three separate data frames for short, medium and long, it is much better practice to keep a single data frame and add a column that labels each row as short, medium or long. Use this new column as the hue parameter of Seaborn's barplot.
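A minimal sketch of that long-form approach, assuming made-up counts and hypothetical column names (`hour`, `group`, `viewers`); `DataFrame.melt` turns one column per group into one row per (hour, group) pair:

```python
import matplotlib
matplotlib.use("Agg")
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# hypothetical wide-form counts: one row per hour, one column per viewer group
wide = pd.DataFrame({'hour': ['22:00', '23:00'],
                     'overall': [30, 25], 'short': [12, 10], 'long': [18, 15]})
# long form: one row per (hour, group) pair
long_df = wide.melt(id_vars='hour', var_name='group', value_name='viewers')

fig, ax = plt.subplots(figsize=(8, 4))
# seaborn places the bars for each hue level side by side at every hour
sns.barplot(data=long_df, x='hour', y='viewers', hue='group', ax=ax)
```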
[The resolution is described below.]
I'm trying to create a PairGrid. The X-axis has at least 2 different value ranges, and even when 'cvar' below is plotted by itself, its x-axis tick labels overlap.
My question: is there a way to tilt the x-axis labels to be vertical or have fewer x-axis labels so they don't overlap? Is there another way to solve this issue?
====================
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
columns = ['avar', 'bvar', 'cvar']
index = np.arange(10)
# fill the frame with random values directly (DataFrame.ix was removed in pandas 1.0)
df = pd.DataFrame(np.random.random((10, 3)), columns=columns, index=index)
df['cvar'] = [400,450,43567,23000,19030,35607,38900,30202,24332,22322]
fig1 = sns.PairGrid(df, y_vars=['avar'],
x_vars=['bvar', 'cvar'],
palette="GnBu_d")
fig1.map(plt.scatter, s=40, edgecolor="white")
# The fix: rotate the x-axis tick labels (plt.xticks acts on the current Axes only)
plt.xticks( rotation= -45 )
=====================
The code above produces this image
Thanks!
I finally figured it out. I added "plt.xticks( rotation= -45 )" to the original code above. More can be found on the Matplotlib site here.
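Since `plt.xticks` only changes the current Axes, the other subplot in the grid keeps its original labels. A minimal sketch that loops over every Axes in the PairGrid, rotates its x tick labels with `tick_params`, and also caps the number of ticks with `MaxNLocator`; the random data and `nbins=4` are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 3)), columns=['avar', 'bvar', 'cvar'])
df['cvar'] = [400, 450, 43567, 23000, 19030, 35607, 38900, 30202, 24332, 22322]

g = sns.PairGrid(df, y_vars=['avar'], x_vars=['bvar', 'cvar'])
g.map(plt.scatter, s=40, edgecolor="white")

for ax in g.axes.flat:
    # fewer x ticks: at most about 4 intervals per subplot
    ax.xaxis.set_major_locator(MaxNLocator(nbins=4))
    # rotate the x tick labels on this subplot
    ax.tick_params(axis='x', labelrotation=-45)
```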