Plotting a histogram gives height error msg - python

I'm trying to plot a histogram based on percentages, but I keep getting the error below:
ValueError: incompatible sizes: argument 'height' must be length 6 or scalar
It's something to do with this line but I'm not sure what's wrong with the height argument.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
xaxis=['epic1', 'epic2', 'epic3', 'epic4', 'epic5', 'epic6']
n=len(xaxis)
names = ('epic1', 'epic2', 'epic3', 'epic4', 'epic5', 'epic6')
data = {'done': [57,53,49,65,78,56,89],
'progress': [23,12,34,11,34,12,12],
'todo' :[11,5,6,7,8,4,6]}
df = pd.DataFrame(data)
df['total'] = df['done'] + df['progress'] + df['todo']
df['done_per'] = df['done'] / df['total'] * 100
df['progress_per'] = df['progress'] / df['total'] * 100
df['todo_per'] = df['todo'] / df['total'] * 100
barWidth = 0.25
# Create green Bars
plt.bar(xaxis, done_per, color='#b5ffb9', edgecolor='green', width=barWidth)
# Create orange Bars
plt.bar(xaxis, progress_per, bottom=done_per, color='#f9bc86',
edgecolor='orange', width=barWidth)
# Create blue Bars
plt.bar(xaxis, todo_per, bottom=[i+j for i,j in zip(done_per, progress_per)],
color='blue', edgecolor='blue', width=barWidth)
plt.xticks(xaxis, names)
plt.xlabel("epics")
plt.show()

There are 7 items in each X_per series (the y values) but only 6 items in xaxis.
If you want 7 bars, adding 'epic7' to xaxis should do the job:
xaxis.append('epic7')
I think you also missed a few lines in your code:
done_per = df['done_per']
progress_per = df['progress_per']
todo_per = df['todo_per']
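Putting both points together, a minimal sketch of the corrected script might look like the following. It assumes seven epics are wanted (the alternative would be trimming each data list to six values), and the Agg backend line and the explicit length check are additions for illustration, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the plot with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Seven labels to match the seven values per series in the question's data.
xaxis = ['epic1', 'epic2', 'epic3', 'epic4', 'epic5', 'epic6', 'epic7']
data = {'done': [57, 53, 49, 65, 78, 56, 89],
        'progress': [23, 12, 34, 11, 34, 12, 12],
        'todo': [11, 5, 6, 7, 8, 4, 6]}
df = pd.DataFrame(data)

# Fail early with a readable message instead of matplotlib's 'height' error.
if len(xaxis) != len(df):
    raise ValueError(f"{len(xaxis)} labels but {len(df)} rows of data")

# Convert each row to percentages of its own total.
percent = df.div(df.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots()
colors = {'done': '#b5ffb9', 'progress': '#f9bc86', 'todo': 'blue'}
bottom = pd.Series(0.0, index=df.index)  # running top of the stack
for column in ['done', 'progress', 'todo']:
    ax.bar(xaxis, percent[column], bottom=bottom,
           color=colors[column], width=0.25, label=column)
    bottom = bottom + percent[column]

ax.set_xlabel("epics")
ax.legend()
```

Keeping a running bottom avoids the nested zip expression and extends naturally if more status columns are added.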

Related

How to extend a matplotlib axis if the ticks are labels and not numeric?

I have a number of charts, made with matplotlib and seaborn, that look like the example below.
I show how certain quantities evolve over time on a lineplot
The x-axis labels are not numbers but strings (e.g. 'Q1' or '2018 first half' etc)
I need to "extend" the x-axis to the right, with an empty period. The chart must show from Q1 to Q4, but there is no data for Q4 (the Q4 column is full of nans)
I need this because I need the charts to be side-by-side with others which do have data for Q4
matplotlib doesn't display the column full of nans
If the x-axis were numeric, it would be easy to extend the range of the plot; since it's not numeric, I don't know which x_range each tick corresponds to
I have found the solution below. It works, but it's not elegant: I use integers for the x-axis, add 1, then set the labels back to the strings. Is there a more elegant way?
This is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
from matplotlib.ticker import FuncFormatter
import seaborn as sns
df =pd.DataFrame()
df['period'] = ['Q1','Q2','Q3','Q4']
df['a'] = [3,4,5,np.nan]
df['b'] = [4,4,6,np.nan]
df = df.set_index( 'period')
fig, ax = plt.subplots(1,2)
sns.lineplot( data = df, ax =ax[0])
df_idx = df.index
df2 = df.set_index( np.arange(1, len(df_idx) + 1 ))
sns.lineplot(data = df2, ax = ax[1])
ax[1].set_xlim(1,4)
ax[1].set_xticklabels(df.index)
You can add these lines of code for ax[0]
left_buffer,right_buffer = 3,2
labels = ['Q1','Q2','Q3','Q4']
extended_labels = ['']*left_buffer + labels + ['']*right_buffer
left_range = list(range(-left_buffer,0))
right_range = list(range(len(labels),len(labels)+right_buffer))
ticks_range = left_range + list(range(len(labels))) + right_range
aux_range = list(range(len(extended_labels)))
ax[0].set_xticks(ticks_range)
ax[0].set_xticklabels(extended_labels)
xticks = ax[0].xaxis.get_major_ticks()
for ind in aux_range[0:left_buffer]: xticks[ind].tick1line.set_visible(False)
for ind in aux_range[len(labels)+left_buffer:len(labels)+left_buffer+right_buffer]: xticks[ind].tick1line.set_visible(False)
Here left_buffer and right_buffer are the numbers of empty tick positions to add on the left and on the right, respectively.
I may have actually found a simpler solution: draw a transparent line (alpha = 0) with x = the index of the dataframe (i.e. all the labels, including those for which every value is NaN) and y = the mean value of the dataframe, so that it is certain to fall within the plotted range:
sns.lineplot(x = df.index, y = np.ones(df.shape[0]) * df.mean().mean() , ax = ax[0], alpha =0 )
This assumes the scale of the y-axis has not been changed manually; a more robust approach is to use the centre of the current y-limits instead:
y_centre = np.mean([ax[0].get_ylim()])
sns.lineplot(x = df.index, y = np.ones(df.shape[0]) * y_centre , ax = ax[0], alpha =0 )
Drawing a transparent line forces matplotlib to extend the axes so as to show all the x values, even those for which all the other values are nans.
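For comparison, here is a sketch of another route that uses plain matplotlib instead of seaborn. It relies on matplotlib placing string x values at integer positions 0, 1, 2, ... in order of appearance, so the view can be widened with set_xlim. Whether this fits depends on how the rest of the charts are drawn; the Agg backend line and the variable names are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

periods = ['Q1', 'Q2', 'Q3', 'Q4']
series = {'a': [3, 4, 5, np.nan], 'b': [4, 4, 6, np.nan]}

fig, ax = plt.subplots()
for name, values in series.items():
    # String x values are placed at positions 0, 1, 2, ... in order of appearance.
    ax.plot(periods, values, label=name)

# Widen the view so the last category (position 3) lies inside the axis.
ax.set_xlim(-0.5, len(periods) - 0.5)
ax.legend()

fig.canvas.draw()  # force the tick labels to be computed
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

No invisible helper line is needed here, because all four labels are registered as categories even though Q4 has no finite data.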

How to set size of AxesSubplot in relatively simple Python program?

Python 3.7 environment
I want to create a stacked bar plot with a label on top of each subcategory displayed in the bar. The data comes from a CSV file, and some of the labels are rather long, so they are wider than the bars. The problem could easily be solved by scaling the whole graphic so that the bars become large enough for the labels, but I have failed to resize the plot as a whole. Here is the code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
dataset = 'Number'
dataFrame: pd.DataFrame = pd.read_csv('my_csv_file_with_data.csv', sep=',', header=2)
dataFrame['FaultDuration [h]'] = dataFrame['DurationH']
# ***********************************************************
# Data gymnastics to transform data in desired format
# determine the main categories
mainCategories: pd.Series = dataFrame['MainCategory']
mainCategories = mainCategories.drop_duplicates()
mainCategories = mainCategories.sort_values()
print('Main Categories: '+ mainCategories)
# subcategories
subCategories: pd.Series = pd.Series(data=dataFrame['SubCategorie'].drop_duplicates().sort_values().values)
subCategories = subCategories.sort_values()
print('Sub Categories: '+ subCategories)
# Build new frame with subcategories as headers
columnNames = pd.Series(data=['SubCategory2'])
columnNames = columnNames.append(mainCategories)
rearrangedData: pd.DataFrame = pd.DataFrame(columns=columnNames.values)
for subCategory in subCategories:
    subset: pd.DataFrame = dataFrame.loc[dataFrame['SubCategorie'] == subCategory]
    rearrangedRow = pd.DataFrame(columns=mainCategories.values)
    rearrangedRow = rearrangedRow.append(pd.Series(), ignore_index=True)
    rearrangedRow['SubCategory2'] = subCategory
    for mainCategory in mainCategories:
        rowData: pd.DataFrame = subset.loc[subset['MainCategorie'] == mainCategory]
        if (rowData is not None and rowData.size > 0):
            rearrangedRow[mainCategory] = float(rowData[dataset].values)
        else:
            rearrangedRow[mainCategory] = 0.0
    rearrangedData = rearrangedData.append(rearrangedRow, ignore_index=True)
# *********************************************************************
# here the plot is created:
thePlot = rearrangedData.set_index('SubCategory2').T.plot.bar(stacked=True, width=1, cmap='rainbow')
thePlot.get_legend().remove()
labels = []
# *************************************************************
# creation of bar patches and labels in bar chart
rowIndex = 0
for item in rearrangedData['SubCategory2']:
    colIndex = 0
    for colHead in rearrangedData.columns:
        if colHead != 'SubCategory2':
            if rearrangedData.iloc[rowIndex, colIndex] > 0.0:
                label = item + '\n' + str(rearrangedData.iloc[rowIndex, colIndex])
                labels.append(item)
            else:
                labels.append('')
        colIndex = colIndex + 1
    rowIndex = rowIndex + 1
patches = thePlot.patches
for label, rect in zip(labels, patches):
    width = rect.get_width()
    if width > 0:
        x = rect.get_x()
        y = rect.get_y()
        height = rect.get_height()
        thePlot.text(x + width/2., y + height/2., label, ha='center', va='center', size = 7 )
# Up to here things work like expected...
# *******************************************************
# now I want to produce output in the desired format/size
# things I tried:
# 1) thePlot.figure(figsize=(40,10))  <---- fails with error: 'Figure' object is not callable
# 2) plt.figure(figsize=(40,10))  <---- creates a second, empty plot of the right size, but the bar chart remains unchanged
# 3) plt.figure(num=1, figsize=(40,10))  <---- leaves the chart unchanged
plt.tight_layout()
plt.show()
The object "thePlot" is an AxesSubplot. How do I get to a properly scaled chart?
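You can set the size in inches. Note that set_size_inches is a method of the Figure, not of the AxesSubplot, so call it on the figure that owns the axes:
thePlot.figure.set_size_inches(18.5, 10.5, forward=True)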
For example see:
How do you change the size of figures drawn with matplotlib?
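As a small sketch of the two usual options: the size can be passed as figsize when pandas creates the plot, or the figure that owns the axes can be resized afterwards. The toy frame and variable names below are placeholders rather than the question's CSV data, and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data standing in for the rearranged CSV frame.
frame = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])

# Option 1: ask pandas for the size when the plot is created.
ax = frame.plot.bar(stacked=True, figsize=(12, 4))
size_at_creation = tuple(ax.figure.get_size_inches())

# Option 2: resize afterwards through the Figure that owns the Axes.
ax.figure.set_size_inches(18.5, 10.5, forward=True)
size_after_resize = tuple(ax.figure.get_size_inches())
```

Calling plt.figure(figsize=...) after the plot exists opens a new figure, which would explain the second, empty plot described in the question.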

Grouped Bar graph Pandas

I have a table in a pandas DataFrame named df:
+---------+------------+-------------+----------+------------+-----------+
|avg_views| avg_orders | max_views |max_orders| min_views |min_orders |
+---------+------------+-------------+----------+------------+-----------+
| 23 | 123 | 135 | 500 | 3 | 1 |
+---------+------------+-------------+----------+------------+-----------+
What I want now is to plot a grouped bar graph that shows the
(avg, max, min) of views and orders in one single bar chart,
i.e. on the x-axis there would be views and orders, separated by a gap,
with 3 bars (avg, max, min) for views and likewise for orders.
I have attached a sample bar graph image to show how the bar graph should look.
Green should be for avg, yellow for max and pink for min.
I took the following code from setting spacing between grouped bar plots in matplotlib but it is not working for me:
plt.figure(figsize=(13, 7), dpi=300)
groups = [[23, 135, 3], [123, 500, 1]]
group_labels = ['views', 'orders']
num_items = len(group_labels)
ind = np.arange(num_items)
margin = 0.05
width = (1. - 2. * margin) / num_items
s = plt.subplot(1, 1, 1)
for num, vals in enumerate(groups):
    print 'plotting: ', vals
    # The position of the xdata must be calculated for each of the two data
    # series.
    xdata = ind + margin + (num * width)
    # Removing the "align=center" feature will left align graphs, which is
    # what this method of calculating positions assumes.
    gene_rects = plt.bar(xdata, vals, width)
s.set_xticks(ind + 0.5)
s.set_xticklabels(group_labels)
plotting: [23, 135, 3]
...
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Using pandas:
import pandas as pd
groups = [[23,135,3], [123,500,1]]
group_labels = ['views', 'orders']
# Convert data to pandas DataFrame.
df = pd.DataFrame(groups, index=group_labels).T
# Plot.
pd.concat(
[df.mean().rename('average'), df.min().rename('min'),
df.max().rename('max')],
axis=1).plot.bar()
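For reference, the ValueError in the question comes from the shapes: each inner list in groups holds three values (avg, max, min), while ind holds only two x positions (views, orders). A sketch of the matplotlib loop with the data transposed, so that each bar series has one value per x position, could look like this. The dictionary names and the Agg backend line are assumptions for the example; the numbers and colors are taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

group_labels = ['views', 'orders']
# One series per statistic, each with one value per group (views, orders).
series = {'avg': [23, 123], 'max': [135, 500], 'min': [3, 1]}
colors = {'avg': 'green', 'max': 'yellow', 'min': 'pink'}

ind = np.arange(len(group_labels))
margin = 0.05
width = (1.0 - 2.0 * margin) / len(series)  # three bars share each group

fig, ax = plt.subplots(figsize=(13, 7))
for num, (name, vals) in enumerate(series.items()):
    # align='edge' makes x the left edge of each bar, which this offset scheme assumes.
    ax.bar(ind + margin + num * width, vals, width,
           align='edge', color=colors[name], label=name)

ax.set_xticks(ind + 0.5)
ax.set_xticklabels(group_labels)
ax.legend()
heights = [patch.get_height() for patch in ax.patches]
```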
You shouldn't have to modify your dataframe just to plot it in a certain way, right?
Use seaborn!
import seaborn as sns
sns.catplot(x = "x", # x variable name
y = "y", # y variable name
hue = "type", # group variable name
data = df, # dataframe to plot
kind = "bar")
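The call above expects a long-format frame with columns named x, y and type, which the one-row table in the question does not have. One possible way to build such a frame with pandas is sketched below; the reshaping steps and the intermediate names are assumptions, and only the column names x, y and type come from the snippet above.

```python
import pandas as pd

# The one-row table from the question.
wide = pd.DataFrame(
    [[23, 123, 135, 500, 3, 1]],
    columns=['avg_views', 'avg_orders', 'max_views',
             'max_orders', 'min_views', 'min_orders'])

# One row per (statistic, metric) pair.
long_df = wide.melt(var_name='name', value_name='y')

# Split 'avg_views' into the statistic ('avg') and the metric ('views').
parts = long_df['name'].str.split('_', expand=True)
long_df['type'] = parts[0]
long_df['x'] = parts[1]
long_df = long_df[['x', 'y', 'type']]
```

Passing long_df as the data argument should then give one group per metric with one bar per statistic.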

Histogram bars overlapping matplotlib

I am able to build the histogram I need. However, the bars overlap one another.
As you can see, I changed the width of the bars to 0.2, but they still overlap. What am I doing wrong?
from matplotlib import pyplot as plt
import numpy as np
from matplotlib.font_manager import FontProperties
from random import randrange
color = ['r', 'b', 'g','c','m','y','k','darkgreen', 'darkkhaki', 'darkmagenta', 'darkolivegreen', 'darkorange', 'darkorchid', 'darkred']
label = ['2','6','10','14','18','22','26','30','34','38','42','46']
file_names = ['a','b','c']
diff = [[randrange(10) for a in range(0, len(label))] for a in range(0, len(file_names))]
print diff
x = diff
name = file_names
y = zip(*x)
pos = np.arange(len(x))
width = 1. / (1 + len(x))
fig, ax = plt.subplots()
for idx, (serie, color,label) in enumerate(zip(y, color,label)):
    ax.bar(pos + idx * width, serie, width, color=color, label=label)
ax.set_xticks(pos + width)
plt.xlabel('foo')
plt.ylabel('bar')
ax.set_xticklabels(name)
ax.legend()
plt.savefig("final" + '.eps', bbox_inches='tight', pad_inches=0.5,dpi=100,format="eps")
plt.clf()
Here is the graph:
As the example below shows, you can easily get non-overlapping bars using a heavily simplified version of your plotting code. I'd suggest taking a closer look at whether x and y really are what you expect them to be. (And try to simplify your code as much as possible when you are hunting for an error.)
Also have a look at how the width of the bars is computed. You appear to use the number of subjects for this, whereas it should be the number of bars per subject.
Have a look at this example:
import numpy as np
import matplotlib.pyplot as plt
subjects = ('Tom', 'Dick', 'Harry', 'Sally', 'Sue')
# number of bars per subject
n = 5
# y-data per subject
y = np.random.rand(n, len(subjects))
# x-positions for the bars
x = np.arange(len(subjects))
# plot bars
width = 1./(1+n) # <-- n.b., use number of bars, not number of subjects
for i, yi in enumerate(y):
    plt.bar(x+i*width, yi, width)
# add labels
plt.xticks(x+n/2.*width, subjects)
plt.show()
This is the result image:
For reference:
http://matplotlib.org/examples/api/barchart_demo.html
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar
The problem is that the width of your bars is calculated from the three subjects, not the twelve bars per subject. That means you're placing multiple bars at each x-position. Try swapping in these lines where appropriate to fix that:
n = len(x[0]) # New variable with the right length to calculate bar width
width = 1. / (1 + n)
ax.set_xticks(pos + n/2. * width)
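The width rule can be checked numerically without drawing anything. The helper below is a hypothetical illustration (grouped_bar_layout is not a matplotlib function) that computes the left edge of every bar, assuming the bars are drawn with align='edge', and lets you confirm that the twelve bars of one group end before the next group starts.

```python
import numpy as np

def grouped_bar_layout(n_groups, n_bars):
    """Return (width, left_edges) for n_bars side-by-side bars in each group.

    left_edges[i, g] is the left edge of bar i in group g, with groups
    starting at x = 0, 1, 2, ... and one bar-width of spacing left over.
    """
    width = 1.0 / (1 + n_bars)
    group_start = np.arange(n_groups)
    left_edges = group_start[None, :] + np.arange(n_bars)[:, None] * width
    return width, left_edges

# Three files (groups) with twelve bars each, as in the question.
width, left_edges = grouped_bar_layout(n_groups=3, n_bars=12)
right_edges = left_edges + width

# With the question's width of 1 / (1 + 3), twelve bars would span 3.0 units
# and spill into the neighbouring groups.
buggy_span = 12 * (1.0 / (1 + 3))
```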

Python Grouped bar chart. Count doesn't work

I'm working on a school project and I'm stuck in making a grouped bar chart. I found this article online with an explanation: https://www.pythoncharts.com/2019/03/26/grouped-bar-charts-matplotlib/
I have a dataset with an Age column and a Sex column. The Age column holds the client's age in years, and the Sex column holds 0 for female and 1 for male. I want to plot the age difference between males and females. I tried the following code, as in the example:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import pylab as pyl
fig, ax = plt.subplots(figsize=(12, 8))
x = np.arange(len(data.Age.unique()))
# Define bar width. We'll use this to offset the second bar.
bar_width = 0.4
# Note we add the `width` parameter now which sets the width of each bar.
b1 = ax.bar(x, data.loc[data['Sex'] == '0', 'count'], width=bar_width)
# Same thing, but offset the x by the width of the bar.
b2 = ax.bar(x + bar_width, data.loc[data['Sex'] == '1', 'count'], width=bar_width)
This raised the following error: KeyError: 'count'
Then I tried to change the code a bit and got another error:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import pylab as pyl
fig, ax = plt.subplots(figsize=(12, 8))
x = np.arange(len(data.Age.unique()))
# Define bar width. We'll use this to offset the second bar.
bar_width = 0.4
# Note we add the `width` parameter now which sets the width of each bar.
b1 = ax.bar(x, (data.loc[data['Sex'] == '0'].count()), width=bar_width)
# Same thing, but offset the x by the width of the bar.
b2 = ax.bar(x + bar_width, (data.loc[data['Sex'] == '1'].count()), width=bar_width)
This raised the error: ValueError: shape mismatch: objects cannot be broadcast to a single shape
How do I count the results so that I can make this grouped bar chart?
It seems like the article goes to too much trouble just to plot a grouped bar chart:
np.random.seed(1)
data = pd.DataFrame({'Sex':np.random.randint(0,2,1000),
'Age':np.random.randint(20,50,1000)})
(data.groupby('Age')['Sex'].value_counts() # count the Sex values for each Age
.unstack('Sex') # turn Sex into columns
.plot.bar(figsize=(12,6)) # plot grouped bar
)
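As background on the two errors: the KeyError arises because the frame has no column named count, and in the second attempt .count() returns one number per column rather than one per age, so its length does not match x. If Sex is stored as integers, comparing against the string '0' would also select no rows. The count table itself can be built with pd.crosstab, sketched here on the same random data and checked against the groupby version; the variable names are chosen for the example.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
data = pd.DataFrame({'Sex': np.random.randint(0, 2, 1000),
                     'Age': np.random.randint(20, 50, 1000)})

# Rows are ages, columns are the Sex values, cells are the number of clients.
counts = pd.crosstab(data['Age'], data['Sex'])

# The groupby / value_counts / unstack chain produces the same table.
via_groupby = data.groupby('Age')['Sex'].value_counts().unstack('Sex')
```

Calling counts.plot.bar(figsize=(12, 6)) on this table should draw the same grouped chart.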
Or even simpler with seaborn:
fig, ax = plt.subplots(figsize=(12,6))
sns.countplot(data=data, x='Age', hue='Sex', ax=ax)
Output:
