I have two data arrays that I want to draw in a single plot using matplotlib.
The data arrays are:
date_array=['2018-03-26', '2018-03-27', '2018-03-28', '2018-03-29', '2018-04-02', '2018-04-03', '2018-04-04', '2018-04-05', '2018-04-06', '2018-04-09', '2018-04-10', '2018-04-11', '2018-04-12', '2018-04-13', '2018-04-16', '2018-04-17', '2018-04-18', '2018-04-19', '2018-04-20', '2018-04-23', '2018-04-24', '2018-04-25', '2018-04-26', '2018-04-27', '2018-04-30', '2018-05-01', '2018-05-02', '2018-05-03', '2018-05-04', '2018-05-07', '2018-05-08', '2018-05-09', '2018-05-10', '2018-05-11', '2018-05-14', '2018-05-15', '2018-05-16', '2018-05-17', '2018-05-18', '2018-05-21', '2018-05-22', '2018-05-23', '2018-05-24', '2018-05-25', '2018-05-29', '2018-05-30', '2018-05-31', '2018-06-01', '2018-06-04', '2018-06-05', '2018-06-06', '2018-06-07', '2018-06-08', '2018-06-11', '2018-06-12', '2018-06-13', '2018-06-14', '2018-06-15', '2018-06-18', '2018-06-19', '2018-06-20', '2018-06-21', '2018-06-22', '2018-06-25', '2018-06-26', '2018-06-27', '2018-06-28', '2018-06-29', '2018-07-02', '2018-07-03', '2018-07-05', '2018-07-06', '2018-07-09', '2018-07-10', '2018-07-11', '2018-07-12', '2018-07-13', '2018-07-16', '2018-07-17', '2018-07-18', '2018-07-19', '2018-07-20', '2018-07-23', '2018-07-24', '2018-07-25', '2018-07-26', '2018-07-27', '2018-07-30', '2018-07-31', '2018-08-01', '2018-08-02', '2018-08-03', '2018-08-06', '2018-08-07', '2018-08-08', '2018-08-09', '2018-08-10', '2018-08-13', '2018-08-14', '2018-08-15']
value_1 = [45.27, 44.53, 44.68, 45.29, 44.43, 44.88, 45.85, 45.7, 44.76, 44.22, 44.81, 44.54, 44.13, 44.0, 43.41, 43.68, 43.29, 42.33, 42.18, 41.8, 41.78, 42.46, 43.67, 43.92, 44.75, 44.33, 44.41, 45.7, 43.8, 44.16, 44.9, 45.07, 46.24, 48.3, 49.21, 49.84, 50.34, 50.4, 49.98, 50.7, 49.15, 48.5, 48.53, 47.65, 48.52, 47.36, 46.13, 46.01, 47.27, 48.04, 49.48, 49.96, 50.48, 51.3, 52.29, 51.86, 50.2, 49.42, 50.0, 52.42, 52.32, 52.62, 52.13, 51.13, 50.24, 48.66, 48.99, 48.05, 48.33, 49.22, 50.62, 51.39, 51.87, 47.37, 49.53, 49.54, 51.82, 51.65, 52.98, 52.09, 54.24, 53.98, 52.72, 51.09, 49.99, 48.55, 47.98, 48.67, 48.87, 48.45, 48.65, 50.06, 52.64, 54.6, 56.61, 55.77, 55.59, 56.5, 56.31, 54.0]
value_2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95.39398869716304, 95.39398869716304, 0, 0, 95.39398869716304, 95.39398869716304, 0, 0, 0, 0, 0, 0, 0, 95.39398869716304]
The thing is that value_1 has a data point for every date in date_array, but value_2 does not, so wherever a value is missing I have filled in a zero (that is one of my questions, as you'll see below).
When I plot it using this code:
x = date_array
y1 = value_1
y2 = value_2
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(x, y1, s=10, c='b', marker="s", label='fig 1')
ax1.scatter(x,y2, s=10, c='r', marker="o", label='fig 2')
plt.legend(loc='upper left');
plt.show()
I get this:
My questions:
How do I work around the fact that I don't have all values available for value_2 and still get the plot? I don't want the red dots with value 0 to show in the plot, but I am not sure how to achieve that. Note: an entry in value_2 can never legitimately be 0, so a 0 means the value is not present.
How do I fix the messed-up tick labels on the x-axis? It would look neater with only 10-12 labels on the x-axis.
Thanks!
You can convert the zeros to NaN and they won't be plotted (this needs import numpy as np):
value_2 = [np.nan if x==0 else x for x in value_2]
For the second question, I would convert the date strings to datetime objects, so that the tick spacing is adjusted automatically, and then rotate the labels:
from datetime import datetime
date_array = [datetime.strptime(i, '%Y-%m-%d').date() for i in date_array]
plt.xticks(rotation=70)
Complete code:
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
date_array = [datetime.strptime(i, '%Y-%m-%d').date() for i in date_array]
value_2 = [np.nan if x==0 else x for x in value_2]
x = date_array
y1 = value_1
y2 = value_2
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot_date(x, y1, c='b', label='fig 1')
ax1.plot_date(x, y2, c='r', label='fig 2')
plt.legend(loc='upper left')
plt.xticks(rotation=70)
plt.show()
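As a variation on the answer above, the missing entries can also be dropped before plotting instead of being converted to NaN, and the number of date ticks can be capped with a date locator. The following is only a minimal sketch under those assumptions: the helper name drop_missing is made up for illustration, the sample values are illustrative rather than the question's exact data, and maxticks=12 is just one reasonable reading of "10-12 markers".

```python
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def drop_missing(dates, values, missing=0):
    """Return (dates, values) with every entry equal to `missing` removed."""
    kept = [(d, v) for d, v in zip(dates, values) if v != missing]
    if not kept:
        return [], []
    kept_dates, kept_values = zip(*kept)
    return list(kept_dates), list(kept_values)


# Short illustrative sample in the same shape as the question's arrays.
date_strings = ['2018-07-26', '2018-07-27', '2018-07-30', '2018-07-31', '2018-08-01']
value_1 = [48.55, 47.98, 48.67, 48.87, 48.45]
value_2 = [0, 95.39, 95.39, 0, 0]

dates = [datetime.strptime(s, '%Y-%m-%d') for s in date_strings]
dates_2, values_2 = drop_missing(dates, value_2)

fig, ax = plt.subplots()
ax.scatter(dates, value_1, s=10, c='b', marker='s', label='fig 1')
ax.scatter(dates_2, values_2, s=10, c='r', marker='o', label='fig 2')

# Let matplotlib choose the date ticks, but never more than 12 of them.
ax.xaxis.set_major_locator(mdates.AutoDateLocator(maxticks=12))
fig.autofmt_xdate(rotation=70)
ax.legend(loc='upper left')
# Call plt.show() to display the figure interactively.
```

Dropping the entries keeps the two series at different lengths, which scatter handles fine because each call gets its own x values; the NaN approach is simpler when both series must stay aligned with date_array.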
I want the background of the plot of x to be grey wherever y = 1 and white wherever y = 0:
#some random data
x = np.random.random(12)
#0's and 1's
y = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
plt.plot(np.linspace(0, 12, 12), x);
So that it looks something like this instead of this:
You can try manually drawing the rectangles using a loop:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
idx = np.linspace(0, 12, 12)
x = np.random.random(12)
y = [0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1]
fig, ax = plt.subplots(1)
ax.plot(idx, x)
rect_height = np.max(x)
rect_width = 1
for i, draw_rect in enumerate(y):
    if draw_rect:
        rect = patches.Rectangle(
            (i, 0),
            rect_width,
            rect_height,
            linewidth=1,
            edgecolor='grey',
            facecolor='grey',
            fill=True
        )
        ax.add_patch(rect)
plt.show()
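A possible alternative to drawing rectangles by hand is Axes.axvspan, which shades a vertical band over the full height of the axes. The sketch below merges consecutive 1s into a single band first; the helper name find_runs is hypothetical, and centring each band on its sample (the 0.5 offsets) is an assumption about the desired look rather than something the question specifies.

```python
import matplotlib.pyplot as plt
import numpy as np


def find_runs(flags):
    """Return [(start, stop), ...] index ranges where consecutive flags are truthy.

    `stop` is exclusive, so (1, 4) covers indices 1, 2 and 3.
    """
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


y = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
x = np.random.random(len(y))
idx = np.arange(len(y))  # one x position per sample

fig, ax = plt.subplots()
ax.plot(idx, x)
for start, stop in find_runs(y):
    # Shade from half a step before the first flagged sample to half a step
    # after the last one, so each sample sits in the middle of its band.
    ax.axvspan(start - 0.5, stop - 0.5, facecolor='grey', alpha=0.5, linewidth=0)
# Call plt.show() to display the figure interactively.
```

Because axvspan spans the whole y-range of the axes, the bands stay full height even if the y-limits change later, which a fixed-height rectangle does not.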
I'm plotting precipitation data from weather model output. I'm contouring the data I have, using contourf. However, I don't want it to fill in the "0" level with color (only the values >0). Is there a good way to do this? I've tried messing around with the levels.
Here's the code I'm using to plot:
m = Basemap(projection='stere', lon_0=centlon, lat_0=centlat,
lat_ts=centlat, width=width, height=height)
m.drawcoastlines()
m.drawstates()
m.drawcountries()
parallels = np.arange(0., 90, 10.)
m.drawparallels(parallels, labels=[1, 0, 0, 0], fontsize=10)
meridians = np.arange(180., 360, 10.)
m.drawmeridians(meridians, labels=[0, 0, 0, 1], fontsize=10)
lons, lats = m.makegrid(nx, ny)
x, y = m(lons, lats)
cs = m.contourf(x, y, snowfall)
cbar = plt.colorbar(cs)
cbar.ax.set_ylabel("Accumulated Snow (kg/m^2)")
plt.show()
And here's the image I'm getting.
An example snowfall dataset would look something like:
0 0 0 0 0 0
0 0 1 1 1 0
0 1 2 2 1 0
0 2 3 2 1 0
0 1 0 1 2 0
0 0 0 0 0 0
This can also be achieved by passing a 'locator' to contourf, using MaxNLocator(prune='lower') from the matplotlib.ticker module. See the docs.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
a = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 1, 2, 2, 1, 0],
[0, 2, 3, 2, 1, 0],
[0, 1, 0, 1, 2, 0],
[0, 0, 0, 0, 0, 0]
])
fig, ax = plt.subplots(1)
p = ax.contourf(a, locator = ticker.MaxNLocator(prune = 'lower'))
fig.colorbar(p)
plt.show()
The 'nbins' parameter of MaxNLocator can be used to control the number of intervals (levels):
p = ax.contourf(a, locator = ticker.MaxNLocator(nbins = 5, prune = 'lower'))
If you don't include 0 in your levels, you won't plot a contour at the 0 level.
For example:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0],
[0, 1, 2, 2, 1, 0],
[0, 2, 3, 2, 1, 0],
[0, 1, 0, 1, 2, 0],
[0, 0, 0, 0, 0, 0]
])
fig, ax = plt.subplots(1)
p = ax.contourf(a, levels=np.linspace(0.5, 3.0, 11))
fig.colorbar(p)
plt.show()
yields:
An alternative is to mask any datapoints which are 0:
p = ax.contourf(np.ma.masked_array(a, mask=(a==0)),
levels=np.linspace(0.0, 3.0, 13))
fig.colorbar(p)
Which looks like:
I suppose it's up to you which of those best matches your desired plot.
I was able to figure things out myself. I found two ways of solving this problem:
1. Mask out all data < 0.01 from the data set using
np.ma.masked_less(snowfall, 0.01)
or
2. Set the levels of the plot to run from 0.01 up to whatever the maximum value is
levels = np.linspace(0.01, 10, 100)
then
cs = m.contourf(x, y, snowfall, levels)
I found that option 1 worked best for me.
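The two options can be compared side by side on the small example grid from the question. This is a rough sketch that assumes plain matplotlib axes instead of Basemap, a 0.01 threshold, and an arbitrary choice of 7 levels; only np.ma.masked_less and the levels argument of contourf come from the answers above.

```python
import matplotlib.pyplot as plt
import numpy as np

snowfall = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 2, 2, 1, 0],
    [0, 2, 3, 2, 1, 0],
    [0, 1, 0, 1, 2, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=float)

# Option 1: hide everything below the threshold; masked cells are left unfilled.
masked = np.ma.masked_less(snowfall, 0.01)

# Option 2: keep the data as-is but start the contour levels just above zero.
levels = np.linspace(0.01, snowfall.max(), 7)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
cs1 = ax1.contourf(masked)
ax1.set_title('masked_less')
fig.colorbar(cs1, ax=ax1)
cs2 = ax2.contourf(snowfall, levels=levels)
ax2.set_title('levels start at 0.01')
fig.colorbar(cs2, ax=ax2)
# Call plt.show() to display the figure interactively.
```

Masking removes the zero cells from the contouring entirely, while the levels approach still interpolates between zero and non-zero cells, so the filled region usually extends a little further out.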
In short:
Height of bars does not match the numbers.
Labels seem to be placed at the wrong height (they should be right in the middle of each bar).
On the very bottom I also see the '0' labels which I really don't want to see in the graph.
Explained:
I'm trying to make a stacked bar chart and label each bar segment with its value. But for some reason the heights of the bars are completely wrong. For the first week, the green bar should be 20 points tall but it is only 10, and the red bar should be 10 points tall but it is only 8 or so. Week 17 should have multiple bars in it but instead has only one (the white one).
I am guessing that the labels are misplaced because of the wrong bar heights. I have no idea why the 0s at the very bottom are also showing, but that's a problem too.
I don't know if these are all separate questions and should be asked in separate posts, but I feel like they are all connected and that there is an answer that solves them all.
import matplotlib.pyplot as plt
import numpy as np
newYearWeek =[201613, 201614, 201615, 201616, 201617, 201618, 201619, 201620, 201621, 201622]
uniqueNames = ['Word1', 'Word2', 'Word3', 'Word4', 'Word5', 'Word6',
'Word7', 'Word8', 'Word9', 'Word10', 'Word11']
#Each column in the multiarray from top to bottom represents 1 week
#Each row from left to right represents the values of that word.
#So that makes 11 rows and 10 columns.
#And yes the multidimensional array have to be like this with the 0's in it.
keywordsMuliarray = [
[20, 3, 1, 0, 0, 1, 6, 3, 1, 2],
[10, 1, 0, 0, 3, 1, 3, 1, 0, 2],
[2, 2, 5, 3, 5, 4, 5, 4, 3, 2],
[0, 4, 3, 3, 1, 0, 2, 7, 1, 2],
[0, 0, 2, 0, 1, 1, 1, 0, 1, 3],
[0, 0, 3, 2, 0, 0, 0, 1, 0, 0],
[1, 0, 1, 0, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 7, 6, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 1]]
fig = plt.figure(figsize=(8.5, 5.5))
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
N = len(newYearWeek)
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars: can also be len(x) sequence
colors = ['seagreen', 'indianred', 'steelblue', 'darkmagenta', 'wheat',
'orange', 'mediumslateblue', 'silver',
'whitesmoke', 'black', 'darkkhaki', 'dodgerblue', 'crimson',
'sage', 'navy', 'plum', 'darkviolet', 'lightpink']
def autolabel(rects, values):
    # Attach some text labels.
    for (rect, value) in zip(rects, values):
        ax.text(rect.get_x() + rect.get_width() / 2.,
                rect.get_y() + rect.get_height() / 2.,
                '%d'%value,
                ha = 'center',
                va = 'center')
left = np.zeros(len(uniqueNames)) # left alignment of data starts at zero
helpingNumber = 0
for i in range(0, len(newYearWeek)):
    rects1 = plt.bar(ind, keywordsMuliarray[helpingNumber][:],width, color=colors[helpingNumber], label=uniqueNames[helpingNumber])
    autolabel(rects1, keywordsMuliarray[helpingNumber][:])
    helpingNumber = helpingNumber+1
# Shrink current axis by 20%
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 1, box.height])
# Put a legend to the right of the current axis
ax.legend(loc='center left', fontsize=9, bbox_to_anchor=(1, 0.5))
#plt.ylabel('Scores')
plt.xticks(ind + width/2., newYearWeek, fontsize=8)
#plt.yticks(np.arange(0, 81, 10))
plt.margins(x=0.02)
plt.tight_layout(rect=[0,0,0.8,1])
plt.show()
This is how the graph looks now:
To get what you want, you have to pass the summed heights of all previous bars in each column as the bottom of the next bar (the list bot_heights), like this:
import matplotlib.pyplot as plt
import numpy as np
newYearWeek =[201613, 201614, 201615, 201616, 201617, 201618, 201619, 201620, 201621, 201622]
uniqueNames = ['Word1', 'Word2', 'Word3', 'Word4', 'Word5', 'Word6',
'Word7', 'Word8', 'Word9', 'Word10', 'Word11']
#Each column in the multiarray from top to bottom represents 1 week
#Each row from left to right represents the values of that word.
#So that makes 11 rows and 10 columns.
#And yes the multidimensional array have to be like this with the 0's in it.
keywordsMuliarray = [
[20, 3, 1, 0, 0, 1, 6, 3, 1, 2],
[10, 1, 0, 0, 3, 1, 3, 1, 0, 2],
[2, 2, 5, 3, 5, 4, 5, 4, 3, 2],
[0, 4, 3, 3, 1, 0, 2, 7, 1, 2],
[0, 0, 2, 0, 1, 1, 1, 0, 1, 3],
[0, 0, 3, 2, 0, 0, 0, 1, 0, 0],
[1, 0, 1, 0, 1, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 7, 6, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 2, 0, 1]]
fig = plt.figure(figsize=(8.5, 5.5))
ax = fig.add_subplot(111)
fig.subplots_adjust(top=0.85)
N = len(newYearWeek)
ind = np.arange(N) # the x locations for the groups
width = 0.35 # the width of the bars: can also be len(x) sequence
colors = ['seagreen', 'indianred', 'steelblue', 'darkmagenta', 'wheat',
'orange', 'mediumslateblue', 'silver',
'whitesmoke', 'black', 'darkkhaki', 'dodgerblue', 'crimson',
'sage', 'navy', 'plum', 'darkviolet', 'lightpink']
def autolabel(rects, values):
    # Attach some text labels
    for (rect, value) in zip(rects, values):
        if value > 0:
            ax.text(rect.get_x() + rect.get_width() / 2.,
                    rect.get_y() + rect.get_height() / 2.,
                    '%d'%value, ha = 'center', va = 'center', size = 9)
left = np.zeros(len(uniqueNames)) # left alignment of data starts at zero
# plot the first bars
rects1 = plt.bar(ind, keywordsMuliarray[0][:],width,
color=colors[0], label=uniqueNames[0])
autolabel(rects1, keywordsMuliarray[0][:])
# put the other bars on top of the previous ones
# note: range(1, N) stops at row N-1, so with N = 10 weeks the 11th row ('Word11') is never drawn
bot_heights = [0.] * len(keywordsMuliarray[0][:])
for i in range(1,N):
    bot_heights = [bot_heights[j] + keywordsMuliarray[i-1][j] for j in range(len(bot_heights))]
    rects1 = plt.bar(ind, keywordsMuliarray[i][:],width,
                     color=colors[i], label=uniqueNames[i],
                     bottom=bot_heights)
    autolabel(rects1, keywordsMuliarray[i][:])
# Shrink current axis by 20%
box = ax.get_position()
ax.set_position([box.x0, box.y0, box.width * 1, box.height])
# Put a legend to the right of the current axis
ax.legend(loc='center left', fontsize=9, bbox_to_anchor=(1, 0.5))
#plt.ylabel('Scores')
plt.xticks(ind + width/2., newYearWeek, fontsize=8)
plt.yticks(np.arange(0, 41, 5))
plt.margins(x=0.02)
plt.tight_layout(rect=[0,0,0.8,1])
plt.show()
To prevent bar labels from overlapping, I recommend not adding a label when the value is zero (see the modified autolabel function). As a result I get:
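The running sum of the previous bar heights can also be computed in one step with a cumulative sum over the rows. The following is a small sketch of that idea, not the code used above: the helper name stack_bottoms is hypothetical, and the data is just the first three rows and four columns of the question's array to keep the example short.

```python
import matplotlib.pyplot as plt
import numpy as np


def stack_bottoms(data):
    """For a (n_series, n_groups) array, return the bottom of every bar segment."""
    data = np.asarray(data, dtype=float)
    tops = np.cumsum(data, axis=0)  # running total per column
    return tops - data              # each segment starts where the previous ones end


# Small excerpt: 3 series (rows) over 4 groups (columns).
data = np.array([
    [20, 3, 1, 0],
    [10, 1, 0, 0],
    [2, 2, 5, 3],
])
names = ['Word1', 'Word2', 'Word3']
bottoms = stack_bottoms(data)

fig, ax = plt.subplots()
x = np.arange(data.shape[1])
# Iterating over the rows themselves means no series can be skipped by a
# loop bound that was taken from the wrong dimension.
for row, bottom, name in zip(data, bottoms, names):
    ax.bar(x, row, 0.35, bottom=bottom, label=name)
ax.legend()
# Call plt.show() to display the figure interactively.
```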
The other answer doesn't plot data for 'Word11'
Lists and arrays of data can most easily be plotted by loading them into pandas
Plot the dataframe with pandas.DataFrame.plot and kind='bar'
When plotting data from pandas, the index values become the axis tick labels and the column names are the segment labels
matplotlib.pyplot.bar_label can be used to add annotations
See Adding value labels on a matplotlib bar chart for more options using .bar_label.
Tested in pandas 1.3.1, python 3.8 [1], and matplotlib 3.4.2 [1].
[1] Minimum version required.
For python versions below 3.8, use labels = [f'{v.get_height():0.0f}' if v.get_height() > 0 else '' for v in c ] without the assignment expression (:=).
import pandas as pd
import matplotlib.pyplot as plt
# create a dataframe from the data in the OP and transpose it with .T
df = pd.DataFrame(data=keywordsMuliarray, index=uniqueNames, columns=newYearWeek).T
# display(df.head())
Word1 Word2 Word3 Word4 Word5 Word6 Word7 Word8 Word9 Word10 Word11
201613 20 10 2 0 0 0 1 0 0 0 0
201614 3 1 2 4 0 0 0 0 1 0 0
201615 1 0 5 3 2 3 1 0 0 0 0
201616 0 0 3 3 0 2 0 1 0 0 0
201617 0 3 5 1 1 0 1 0 7 0 0
colors = ['seagreen', 'indianred', 'steelblue', 'darkmagenta', 'wheat', 'orange', 'mediumslateblue', 'silver', 'whitesmoke', 'black', 'darkkhaki']
# plot the dataframe
ax = df.plot(kind='bar', stacked=True, figsize=(9, 6), color=colors, rot=0, ec='k')
# Put a legend to the right of the current axis
ax.legend(loc='center left', fontsize=9, bbox_to_anchor=(1, 0.5))
# add annotations
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{h:0.0f}' if (h := v.get_height()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center', fontsize=8)
plt.show()
I have following unsorted dict (dates are keys):
{"23-09-2014": 0, "11-10-2014": 0, "30-09-2014": 0, "26-09-2014": 0,
"03-10-2014": 0, "19-10-2014": 0, "15-10-2014": 0, "22-09-2014": 0,
"17-10-2014": 0, "29-09-2014": 0, "13-10-2014": 0, "16-10-2014": 0,
"12-10-2014": 0, "25-09-2014": 0, "14-10-2014": 0, "08-10-2014": 0,
"02-10-2014": 0, "09-10-2014": 0, "18-10-2014": 0, "24-09-2014": 0,
"28-09-2014": 0, "10-10-2014": 0, "21-10-2014": 0, "20-10-2014": 0,
"06-10-2014": 0, "04-10-2014": 0, "27-09-2014": 0, "05-10-2014": 0,
"01-10-2014": 0, "07-10-2014": 0}
I am trying to sort it from oldest to newest.
I've tried this code to sort it:
mydict = OrderedDict(sorted(mydict.items(), key=lambda t: t[0], reverse=True))
and it almost worked. It produced a sorted dict, but it ignored the months:
{"01-10-2014": 0, "02-10-2014": 0, "03-10-2014": 0, "04-10-2014": 0,
"05-10-2014": 0, "06-10-2014": 0, "07-10-2014": 0, "08-10-2014": 0,
"09-10-2014": 0, "10-10-2014": 0, "11-10-2014": 0, "12-10-2014": 0,
"13-10-2014": 0, "14-10-2014": 0, "15-10-2014": 0, "16-10-2014": 0,
"17-10-2014": 0, "18-10-2014": 0, "19-10-2014": 0, "20-10-2014": 0,
"21-10-2014": 0, "22-09-2014": 0, "23-09-2014": 0, "24-09-2014": 0,
"25-09-2014": 0, "26-09-2014": 0, "27-09-2014": 0, "28-09-2014": 0,
"29-09-2014": 0, "30-09-2014": 0}
How can I fix this?
EDIT:
I need this to count objects created in django application in past X days, for each day.
event_chart = {}
date_list = [datetime.datetime.today() - datetime.timedelta(days=x) for x in range(0, 30)]
for date in date_list:
    event_chart[formats.date_format(date, "SHORT_DATE_FORMAT")] = Event.objects.filter(project=project_name, created=date).count()
event_chart = OrderedDict(sorted(event_chart.items(), key=lambda t: t[0]))
return HttpResponse(json.dumps(event_chart))
You can use the datetime module to parse the strings into actual dates and sort on those. Leave out reverse=True to get the oldest date first:
>>> from datetime import datetime
>>> sorted(mydict.items(), key=lambda t: datetime.strptime(t[0], '%d-%m-%Y'))
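To see the difference from plain string sorting, here is a small self-contained sketch using a handful of the keys from the question. The helper name parse_key is made up for illustration; the '%d-%m-%Y' format is assumed from the way the keys are written.

```python
from collections import OrderedDict
from datetime import datetime

mydict = {"23-09-2014": 0, "11-10-2014": 0, "30-09-2014": 0,
          "01-10-2014": 0, "22-09-2014": 0}


def parse_key(item):
    """Turn a ("dd-mm-yyyy", value) pair into a datetime usable as a sort key."""
    return datetime.strptime(item[0], '%d-%m-%Y')


# Sorting on the raw strings compares the day digits first, so months are ignored.
by_string = sorted(mydict)

# Sorting on parsed dates gives true chronological order, oldest first.
oldest_first = OrderedDict(sorted(mydict.items(), key=parse_key))

print(by_string)
# ['01-10-2014', '11-10-2014', '22-09-2014', '23-09-2014', '30-09-2014']
print(list(oldest_first))
# ['22-09-2014', '23-09-2014', '30-09-2014', '01-10-2014', '11-10-2014']
```

If the keys can be chosen freely, an ISO-style 'YYYY-MM-DD' format would sort correctly as plain strings and make the parsing step unnecessary.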
If you want to create a JSON response in the format {"22-09-2014": 0, "23-09-2014": 0, "localized date": count_for_that_date}, with the oldest dates appearing first in the output, you could make event_chart an OrderedDict and fill it in chronological order:
# assumes: import datetime as DT, import json, from collections import OrderedDict
event_chart = OrderedDict()
today = DT.date.today() # use DT.datetime.combine(date, DT.time()) if needed
for day in range(29, -1, -1): # last 30 days
    date = today - DT.timedelta(days=day)
    localized_date = formats.date_format(date, "SHORT_DATE_FORMAT")
    day_count = Event.objects.filter(project=name, created=date).count()
    event_chart[localized_date] = day_count
return HttpResponse(json.dumps(event_chart))