How to generate a histogram with the list below? - python

How to generate a histogram with the list below?
[[0, 0, 0, 19, 7], [0, 0, 0, 21, 7], [0, 0, 0, 21, 7], [0, 0, 0, 29, 0]]
Explaining the list: [0, 0, 0, 19, 7]
First value = repetition average between 0-20
Second value = repetition average between 20-40
Third value = repetition average between 40-60
Fourth value = average repetition between 60-80
Fifth value = repetition average between 80-100
The number of sublists within the list can grow considerably. I would like the sublists to be spaced apart from each other, so that the graph is easier to interpret.
What I have achieved so far:
result = [[[0, 0, 0, 19, 7], [0, 0, 0, 21, 7], [0, 0, 0, 21, 7], [0, 0, 0, 29, 0]]]
fig, ax = plt.subplots(figsize=(10,6))
for i in range(len(result)):
    data = np.array(result[i])
    x=np.arange(len(data)) + i*6
    # draw means
    ax.bar(x-0.2, data[:,0], color='blue', width=0.4)
    ax.bar(x+0.2, data[:,1], color='green', width=0.4)
    ax.bar(x-0.2, data[:,2], color='yellow', width=0.4)
    ax.bar(x+0.2, data[:,3], color='orange', width=0.4)
    ax.bar(x+0.2, data[:,4], color='red', width=0.4)
# separation line
ax.axvline(4.75)
# turn off xticks
ax.set_xticks([])
ax.legend(labels=['0-20', '20-40', '40-60', '60-80', '80-100'])
leg = ax.get_legend()
leg.legendHandles[0].set_color('blue')
leg.legendHandles[1].set_color('green')
leg.legendHandles[2].set_color('yellow')
leg.legendHandles[3].set_color('orange')
leg.legendHandles[4].set_color('red')
plt.title("Histogram")
plt.ylabel('Consume')
plt.xlabel('Percent')
plt.show()
Any suggestions?

Here is an approach to draw the described plot. Note that normally matplotlib only sets one legend entry for a complete bar graph. To have an entry for individual bars, a label needs to be set to each of them explicitly. In the code below such a label is added to each bar in the first set.
(Note that I left out one set of square brackets for result, as in the original post it is a 3D list. If such a 3D list is necessary, you could write the loop as for i, data in enumerate(result[0])).
import numpy as np
import matplotlib.pyplot as plt
result = [[0, 0, 0, 19, 7], [0, 0, 0, 21, 7], [0, 0, 0, 21, 7], [0, 0, 0, 29, 0]]
colors = ['blue', 'green', 'yellow', 'orange', 'red']
labels = ['0-20', '20-40', '40-60', '60-80', '80-100']
fig, ax = plt.subplots(figsize=(10, 6))
for i, data in enumerate(result):
    x = np.arange(len(data)) + i*6
    bars = ax.bar(x, data, color=colors, width=0.4)
    if i == 0:
        for bar, label in zip(bars, labels):
            bar.set_label(label)
    if i < len(result) - 1:
        # separation line after each part, but not after the last
        ax.axvline(4.75 + i*6, color='black', linestyle=':')
ax.set_xticks([])
ax.legend()
ax.set_title("Histogram")
ax.set_ylabel('Consume')
ax.set_xlabel('Percent')
plt.show()
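The i*6 offset in the loop above encodes two things at once: five bars per group plus one empty slot between groups. As a minimal sketch, that spacing logic can be pulled out into a standalone helper. The name group_positions and its gap parameter are hypothetical and not part of matplotlib.

```python
def group_positions(n_groups, bars_per_group, gap=1):
    """Return (x positions for each group, x positions of separator lines).

    Hypothetical helper: bars are one unit apart, and `gap` empty slots
    are left between consecutive groups.
    """
    stride = bars_per_group + gap
    positions = [
        [g * stride + b for b in range(bars_per_group)]
        for g in range(n_groups)
    ]
    # Each separator sits midway between the last bar of one group
    # and the first bar of the next group.
    separators = [
        g * stride + (bars_per_group - 1) + (gap + 1) / 2
        for g in range(n_groups - 1)
    ]
    return positions, separators


positions, separators = group_positions(n_groups=4, bars_per_group=5)
```

Each inner list can be passed as the x argument of ax.bar, and each separator value to ax.axvline. With gap=1 the separators land at 5.0, 11.0 and 17.0, slightly to the right of the hand-picked 4.75 used above.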

Related

Sharing Y-axis in a matplotlib subplots

I have been trying to create a matplotlib subplot (1 x 3) with horizontal bar plots on either side of a lineplot.
It looks like this:
The code for generating the above plot -
u_list = [2, 0, 0, 0, 1, 5, 0, 4, 0, 0]
n_list = [0, 0, 1, 0, 4, 3, 1, 1, 0, 6]
arr_ = list(np.arange(10, 11, 0.1))
data_ = pd.DataFrame({
'points': list(np.arange(0, 10, 1)),
'value': [10.4, 10.5, 10.3, 10.7, 10.9, 10.5, 10.6, 10.3, 10.2, 10.4][::-1]
})
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 8))
ax1 = plt.subplot(1, 3, 1)
sns.barplot(u_list, arr_, orient="h", ax=ax1)
ax2 = plt.subplot(1, 3, 2)
x = data_['points'].tolist()
y = data_['value'].tolist()
ax2.plot(x, y)
ax2.set_yticks(arr_)
plt.gca().invert_yaxis()
ax3 = plt.subplot(1, 3, 3, sharey=ax1, sharex=ax1)
sns.barplot(n_list, arr_, orient="h", ax=ax3)
fig.tight_layout()
plt.show()
Edit
How do I share the y-axis of the central line plot with the other horizontal bar plots?
I would set the limits of all y-axes to the same range, set the ticks in all axes, and then set the ticks/tick labels of all but the leftmost axis to be empty. Here is what I mean:
from matplotlib import pyplot as plt
import numpy as np
u_list = [2, 0, 0, 0, 1, 5, 0, 4, 0, 0]
n_list = [0, 0, 1, 0, 4, 3, 1, 1, 0, 6]
arr_ = list(np.arange(10, 11, 0.1))
x = list(np.arange(0, 10, 1))
y = [10.4, 10.5, 10.3, 10.7, 10.9, 10.5, 10.6, 10.3, 10.2, 10.4]
fig, axs = plt.subplots(1, 3, figsize=(20, 8))
axs[0].barh(arr_,u_list,height=0.1)
axs[0].invert_yaxis()
axs[1].plot(x, y)
axs[1].invert_yaxis()
axs[2].barh(arr_,n_list,height=0.1)
axs[2].invert_yaxis()
for i in range(1,len(axs)):
    axs[i].set_ylim( axs[0].get_ylim() ) # align axes
    axs[i].set_yticks([]) # set ticks to be empty (no ticks, no tick-labels)
fig.tight_layout()
plt.show()
This is a minimal example and, for the sake of conciseness, I refrained from mixing matplotlib and seaborn. Since seaborn uses matplotlib under the hood, you can reproduce the same output there (but with nicer bars).

Plotting by ignoring missing data in matplotlib

I have been trying to make a program that plots the frequency of usage of a word during Whatsapp chats between 2 people. The word night, for example, has been used a couple of times on a few days, and 0 times on most days. The graph I have is as follows
Here is the code
word_occurances = [0 for i in range(len(just_dates))]
for i in range(len(just_dates)):
    for j in range(len(df_word)):
        if just_dates[i].date() == word_date[j].date():
            word_occurances[i] += 1
title = person2.rstrip(':') + ' with ' + person1.rstrip(':') + ' usage of the word - ' + word
plt.plot(just_dates, word_occurances, color = 'purple')
plt.gcf().autofmt_xdate()
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title(title)
plt.savefig('Graphs/Words/' + title + '.jpg', dpi = 200)
plt.show()
word_occurances is a list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 2, 0, 0, 0, 1, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
What I want is for the graph to only connect the points where it has been used while showing the entire timeline on the x axis. I don't want the graph to touch 0. How can I do this? I have searched and found similar answers but none have worked the way I wanted them to.
You simply have to find the indices of word_occurances on which the corresponding value is greater than zero. With this you can index just_dates to get the corresponding dates.
word_counts = [] # Only word counts > 0
dates = [] # Date of > 0 word count
for i, val in enumerate(word_occurances):
    if val > 0:
        word_counts.append(val)
        dates.append(just_dates[i])
You may want to plot with an underlying bar plot in order to maintain the original scale.
plt.bar(just_dates, word_occurances)
plt.plot(dates, word_counts, 'r--')
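The same filtering can be written without explicit index bookkeeping by pairing the two lists with zip. This is only a sketch: the sample values below are hypothetical stand-ins for just_dates and word_occurances, which are not shown in full in the question.

```python
from datetime import date, timedelta

# Hypothetical sample data standing in for just_dates and word_occurances.
start = date(2021, 1, 1)
just_dates = [start + timedelta(days=i) for i in range(6)]
word_occurances = [0, 1, 0, 0, 3, 2]

# Keep only the (date, count) pairs whose count is non-zero.
nonzero = [(d, n) for d, n in zip(just_dates, word_occurances) if n > 0]

# Split the pairs back into two parallel lists for plotting.
dates = [d for d, _ in nonzero]
word_counts = [n for _, n in nonzero]
```

The resulting dates and word_counts lists can be passed to plt.plot in the same way as in the loop-based version.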
One way to address this is to plot only data that contain entries but label all dates where a conversation took place to indicate the zero values in your graph:
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator
#fake data generation, this block just imitates your unknown data and can be deleted
import numpy as np
import pandas as pd
np.random.seed(12345)
n = 30
just_dates = pd.to_datetime(np.random.randint(1, 100, n)+18500, unit="D").sort_values().to_list()
word_occurances = [0]*n
for i in range(10):
    word_occurances[np.random.randint(n)] = np.random.randint(1, 10)
fig, ax = plt.subplots(figsize=(15,5))
#generate data to plot by filtering out zero values
plot_data = [(just_dates[i], word_occurances[i]) for i, num in enumerate(word_occurances) if num > 0]
#plot these data with marker to indicate each point
#think 1-1-1-1-1 would only be visible as two points with lines only
ax.plot(*zip(*plot_data), color = 'purple', marker="o")
#label all dates where conversations took place
ax.xaxis.set_major_locator(FixedLocator(mdates.date2num(just_dates)))
#prevent that matplotlib autoscales the y-axis
ax.set_ylim(0, )
ax.tick_params(axis="x", labelrotation= 90)
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title("Conversations at night")
plt.tight_layout()
plt.show()
Sample output:
This can get quite busy soon with all these date labels (and might or might not work with your datetime objects in just_dates, which might differ in structure from my sample data). Another way would be to indicate each conversation with vlines:
...
fig, ax = plt.subplots(figsize=(15,5))
plot_data = [(just_dates[i], word_occurances[i]) for i, num in enumerate(word_occurances) if num > 0]
ax.plot(*zip(*plot_data), color = 'purple', marker="o")
ax.vlines((just_dates), 0, max(word_occurances), color="red", ls="--")
ax.set_ylim(0, )
plt.gcf().autofmt_xdate()
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title("Conversations at night")
plt.tight_layout()
plt.show()
Sample output:

How do I color background based on 1 or 0 in Python

I want the background of the graph of x to be grey when y=1 and white when y=0
#some random data
x = np.random.random(12)
#0's and 1's
y = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
plt.plot(np.linspace(0, 12, 12), x);
So it looks something like this instead of this
You can try manually drawing the rectangles using a loop:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
idx = np.linspace(0, 12, 12)
x = np.random.random(12)
y = [0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1]
fig, ax = plt.subplots(1)
ax.plot(idx, x)
rect_height = np.max(x)
rect_width = 1
for i, draw_rect in enumerate(y):
    if draw_rect:
        rect = patches.Rectangle(
            (i, 0),
            rect_width,
            rect_height,
            linewidth=1,
            edgecolor='grey',
            facecolor='grey',
            fill=True
        )
        ax.add_patch(rect)
plt.show()
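Instead of drawing one rectangle per sample, consecutive 1s can first be merged into (start, end) spans, so that each shaded region is drawn only once. The following is a minimal sketch of that grouping step; the helper name one_runs is hypothetical.

```python
def one_runs(flags):
    """Return half-open (start, end) index spans covering each run of truthy values."""
    spans = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            # A new run of 1s begins here.
            start = i
        elif not flag and start is not None:
            # The current run ended just before index i.
            spans.append((start, i))
            start = None
    if start is not None:
        # The last run extends to the end of the list.
        spans.append((start, len(flags)))
    return spans


y = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
spans = one_runs(y)
```

Each span could then be shaded with a single call such as ax.axvspan(start, end, color='grey', alpha=0.3), which covers the full height of the axes regardless of the data range.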

Pyplot contourf don't fill in "0" level

I'm plotting precipitation data from weather model output. I'm contouring the data I have, using contourf. However, I don't want it to fill in the "0" level with color (only the values >0). Is there a good way to do this? I've tried messing around with the levels.
Here's the code I'm using to plot:
m = Basemap(projection='stere', lon_0=centlon, lat_0=centlat,
            lat_ts=centlat, width=width, height=height)
m.drawcoastlines()
m.drawstates()
m.drawcountries()
parallels = np.arange(0., 90, 10.)
m.drawparallels(parallels, labels=[1, 0, 0, 0], fontsize=10)
meridians = np.arange(180., 360, 10.)
m.drawmeridians(meridians, labels=[0, 0, 0, 1], fontsize=10)
lons, lats = m.makegrid(nx, ny)
x, y = m(lons, lats)
cs = m.contourf(x, y, snowfall)
cbar = plt.colorbar(cs)
cbar.ax.set_ylabel("Accumulated Snow (km/m^2)")
plt.show()
And here's the image I'm getting.
An example snowfall dataset would look something like:
0 0 0 0 0 0
0 0 1 1 1 0
0 1 2 2 1 0
0 2 3 2 1 0
0 1 0 1 2 0
0 0 0 0 0 0
This can also be achieved using the locator argument with MaxNLocator(prune='lower') from the matplotlib.ticker module. See docs.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
a = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 1, 2, 2, 1, 0],
[0, 2, 3, 2, 1, 0],
[0, 1, 0, 1, 2, 0],
[0, 0, 0, 0, 0, 0]
])
fig, ax = plt.subplots(1)
p = ax.contourf(a, locator = ticker.MaxNLocator(prune = 'lower'))
fig.colorbar(p)
plt.show()
Image of output
The 'nbins' parameter of MaxNLocator can be used to control the number of intervals (levels)
p = ax.contourf(a, locator = ticker.MaxNLocator(nbins = 5, prune = 'lower'))
If you don't include 0 in your levels, you won't plot a contour at the 0 level.
For example:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 1, 2, 2, 1, 0],
[0, 2, 3, 2, 1, 0],
[0, 1, 0, 1, 2, 0],
[0, 0, 0, 0, 0, 0]
])
fig, ax = plt.subplots(1)
p = ax.contourf(a, levels=np.linspace(0.5, 3.0, 11))
fig.colorbar(p)
plt.show()
yields:
An alternative is to mask any datapoints which are 0:
p = ax.contourf(np.ma.masked_array(a, mask=(a==0)),
levels=np.linspace(0.0, 3.0, 13))
fig.colorbar(p)
Which looks like:
I suppose it's up to you which of those matches your desired plot the most.
I was able to figure things out myself. There are two ways I found of solving this problem:
1. Mask out all data <0.01 from the data set using
np.ma.masked_less(snowfall, 0.01)
or
2. Set the levels of the plot to be from 0.01 -> whatever maximum value
levels = np.linspace(0.1, 10, 100)
then
cs = m.contourf(x, y, snowfall, levels)
I found that option 1 worked best for me.

How to annotate a stacked bar plot and add legend labels

In short:
Height of bars does not match the numbers.
Labels seem to be placed on the wrong height. (should be right in the middle of each bar)
On the very bottom I also see the '0' labels which I really don't want to see in the graph.
Explained:
I'm trying to make a stacked bar chart and label each bar with its value. But for some reason the height of the bars is completely wrong. For the first week, the green bar should be 20 points long but it is only 10, and the red bar should be 10 points long but it is only 8 or so. Week 17 should have multiple bars in it but instead has only one (the white one).
I am guessing that because of the wrong bar heights the labels are misplaced too. I have no idea why the 0's on the very bottom are also showing but that's a problem too.
I don't know if these are all separate questions and should be asked in separate posts, but I feel like they are all connected and that there is an answer that solves them all.
import matplotlib.pyplot as plt
import numpy as np
newYearWeek =[201613, 201614, 201615, 201616, 201617, 201618, 201619, 201620, 201621, 201622]
uniqueNames = ['Word1', 'Word2', 'Word3', 'Word4', 'Word5', 'Word6',
'Word7', 'Word8', 'Word9', 'Word10', 'Word11']
#Each column in the multiarray from top to bottom represents 1 week
#Each row from left to right represents the values of that word.
#So that makes 11 rows and 10 columns.
#And yes the multidimensional array have to be like this with the 0's in it.
keywordsMuliarray = [
[20, 3, 1, 0, 0, 1, 6, 3, 1, 2],
[10, 1, 0, 0, 3, 1, 3, 1, 0, 2],
[2, 2, 5, 3, 5, 4, 5, 4, 3, 2],
[0, 4, 3, 3, 1, 0, 2, 7, 1, 2],
[0, 0, 2, 0, 1, 1, 1, 0, 1, 3],
[0, 0, 3, 2, 0, 0, 0, 1, 0, 0],
[1, 0, 1, 0, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 7, 6, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 1]]
fig = plt.figure(figsize=(8.5, 5.5))
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
N = len(newYearWeek)
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars: can also be len(x) sequence
colors = ['seagreen', 'indianred', 'steelblue', 'darkmagenta', 'wheat',
'orange', 'mediumslateblue', 'silver',
'whitesmoke', 'black', 'darkkhaki', 'dodgerblue', 'crimson',
'sage', 'navy', 'plum', 'darkviolet', 'lightpink']
def autolabel(rects, values):
    # Attach some text labels.
    for (rect, value) in zip(rects, values):
        ax.text(rect.get_x() + rect.get_width() / 2.,
                rect.get_y() + rect.get_height() / 2.,
                '%d'%value,
                ha = 'center',
                va = 'center')
left = np.zeros(len(uniqueNames)) # left alignment of data starts at zero
helpingNumber = 0
for i in range(0, len(newYearWeek)):
    rects1 = plt.bar(ind, keywordsMuliarray[helpingNumber][:],width, color=colors[helpingNumber], label=uniqueNames[helpingNumber])
    autolabel(rects1, keywordsMuliarray[helpingNumber][:])
    helpingNumber = helpingNumber+1
# Shrink current axis by 20%
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 1, box.height])
# Put a legend to the right of the current axis
ax.legend(loc='center left', fontsize=9, bbox_to_anchor=(1, 0.5))
#plt.ylabel('Scores')
plt.xticks(ind + width/2., newYearWeek, fontsize=8)
#plt.yticks(np.arange(0, 81, 10))
plt.margins(x=0.02)
plt.tight_layout(rect=[0,0,0.8,1])
plt.show()
This is how the graph looks now:
To get what you want, you have to sum the heights of all previous bars in the current column (the list bot_heights) and pass that sum as the bottom argument, like here:
import matplotlib.pyplot as plt
import numpy as np
newYearWeek =[201613, 201614, 201615, 201616, 201617, 201618, 201619, 201620, 201621, 201622]
uniqueNames = ['Word1', 'Word2', 'Word3', 'Word4', 'Word5', 'Word6',
'Word7', 'Word8', 'Word9', 'Word10', 'Word11']
#Each column in the multiarray from top to bottom represents 1 week
#Each row from left to right represents the values of that word.
#So that makes 11 rows and 10 columns.
#And yes the multidimensional array have to be like this with the 0's in it.
keywordsMuliarray = [
[20, 3, 1, 0, 0, 1, 6, 3, 1, 2],
[10, 1, 0, 0, 3, 1, 3, 1, 0, 2],
[2, 2, 5, 3, 5, 4, 5, 4, 3, 2],
[0, 4, 3, 3, 1, 0, 2, 7, 1, 2],
[0, 0, 2, 0, 1, 1, 1, 0, 1, 3],
[0, 0, 3, 2, 0, 0, 0, 1, 0, 0],
[1, 0, 1, 0, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 7, 6, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 1]]
fig = plt.figure(figsize=(8.5, 5.5))
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
N = len(newYearWeek)
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars: can also be len(x) sequence
colors = ['seagreen', 'indianred', 'steelblue', 'darkmagenta', 'wheat',
'orange', 'mediumslateblue', 'silver',
'whitesmoke', 'black', 'darkkhaki', 'dodgerblue', 'crimson',
'sage', 'navy', 'plum', 'darkviolet', 'lightpink']
def autolabel(rects, values):
    # Attach some text labels
    for (rect, value) in zip(rects, values):
        if value > 0:
            ax.text(rect.get_x() + rect.get_width() / 2.,
                    rect.get_y() + rect.get_height() / 2.,
                    '%d'%value, ha = 'center', va = 'center', size = 9)
left = np.zeros(len(uniqueNames)) # left alignment of data starts at zero
# plot the first bars
rects1 = plt.bar(ind, keywordsMuliarray[0][:],width,
                 color=colors[0], label=uniqueNames[0])
autolabel(rects1, keywordsMuliarray[0][:])
# put the other bars on top of the previous ones
bot_heights = [0.] * len(keywordsMuliarray[0][:])
for i in range(1,N):
    bot_heights = [bot_heights[j] + keywordsMuliarray[i-1][j] for j in range(len(bot_heights))]
    rects1 = plt.bar(ind, keywordsMuliarray[i][:],width,
                     color=colors[i], label=uniqueNames[i],
                     bottom=bot_heights)
    autolabel(rects1, keywordsMuliarray[i][:])
# Shrink current axis by 20%
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 1, box.height])
# Put a legend to the right of the current axis
ax.legend(loc='center left', fontsize=9, bbox_to_anchor=(1, 0.5))
#plt.ylabel('Scores')
plt.xticks(ind + width/2., newYearWeek, fontsize=8)
plt.yticks(np.arange(0, 41, 5))
plt.margins(x=0.02)
plt.tight_layout(rect=[0,0,0.8,1])
plt.show()
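The bottom of each layer is the column-wise sum of all rows stacked before it. As a minimal sketch, those running sums can be computed up front for every row. The helper name stacked_bottoms is hypothetical, and the sample rows are just the first three columns of the first three rows of keywordsMuliarray, kept short for readability.

```python
def stacked_bottoms(rows):
    """For each row, return the per-column sum of all earlier rows."""
    running = [0] * len(rows[0])
    bottoms = []
    for row in rows:
        # The bottom for this row is whatever has been stacked so far.
        bottoms.append(list(running))
        # Add this row so the next one starts on top of it.
        running = [r + v for r, v in zip(running, row)]
    return bottoms


rows = [
    [20, 3, 1],
    [10, 1, 0],
    [2, 2, 5],
]
bottoms = stacked_bottoms(rows)
```

Row i could then be drawn with plt.bar(ind, rows[i], width, bottom=bottoms[i]). Looping over all rows, rather than over the number of weeks, covers every word in the data.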
To prevent bar labels from overlapping, I recommend not adding a label if a value is zero (see the modified autolabel function). As a result I get:
The other answer doesn't plot data for 'Word11'
Lists and arrays of data can most easily be plotted by loading them into pandas
Plot the dataframe with pandas.DataFrame.plot and kind='bar'
When plotting data from pandas, the index values become the axis tick labels and the column names are the segment labels
matplotlib.pyplot.bar_label can be used to add annotations
See Adding value labels on a matplotlib bar chart for more options using .bar_label.
Tested in pandas 1.3.1, python 3.8 [1], and matplotlib 3.4.2 [1].
[1] Minimum version required.
For python versions before 3.8, use labels = [f'{v.get_height():0.0f}' if v.get_height() > 0 else '' for v in c ] without the assignment expression (:=).
import pandas as pd
import matplotlib.pyplot as plt
# create a dataframe from the data in the OP and transpose it with .T
df = pd.DataFrame(data=keywordsMuliarray, index=uniqueNames, columns=newYearWeek).T
# display(df.head())
        Word1  Word2  Word3  Word4  Word5  Word6  Word7  Word8  Word9  Word10  Word11
201613     20     10      2      0      0      0      1      0      0       0       0
201614      3      1      2      4      0      0      0      0      1       0       0
201615      1      0      5      3      2      3      1      0      0       0       0
201616      0      0      3      3      0      2      0      1      0       0       0
201617      0      3      5      1      1      0      1      0      7       0       0
colors = ['seagreen', 'indianred', 'steelblue', 'darkmagenta', 'wheat', 'orange', 'mediumslateblue', 'silver', 'whitesmoke', 'black', 'darkkhaki']
# plot the dataframe
ax = df.plot(kind='bar', stacked=True, figsize=(9, 6), color=colors, rot=0, ec='k')
# Put a legend to the right of the current axis
ax.legend(loc='center left', fontsize=9, bbox_to_anchor=(1, 0.5))
# add annotations
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{h:0.0f}' if (h := v.get_height()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center', fontsize=8)
plt.show()
