How to plot scatter graph with SCATTER fill_between in Python?


I am a manufacturing engineer, very new to Python and Matplotlib. Currently, I am trying to plot a scatter time graph, where for every single record, I have the data (read from a sensor) and upper and lower limits for that data that will stop the tool if data is not between them.
So for a simple set of data like this:
time = [1, 2, 3, 7, 8, 9, 10]
data = [5, 6, 5, 5, 6, 7, 8]
lower_limit = [4, 4, 5, 5, 5, 5, 5]
upper_limit = [6, 6, 6, 7, 7, 7, 7]
When the tool is not working, nothing is recorded, hence the gap between 3 and 7 in the time records.
The desired graph would look like this:
A few rules that I am trying to stick to:
All three graphs (data, upper_limit, and lower_limit) are required to be scattered points and not lines, with the x-axis (time) being shared among them. - required.
A green highlight that fills between upper and lower limits, considering only the two points with the same time for each highlight. - highly recommended.
(I tried matplotlib.pyplot.fill_between, but it creates a polygon between the trend lines rather than straight vertical lines between matching pairs of lower-limit and upper-limit dots. Therefore it won't be accurate, and it fills the gap between times 3 and 7, which is not desired. I also tried matplotlib.pyplot.bar for the limits alongside the scatter plot for the 'data', but I was not able to make the bars start at lower_limit.)
When the value of data is not equal to or between the limits, the representing dot should appear in red rather than the original color. - highly recommended.
So, with all of that in mind, and thousands of records per day, a regular graph, for a 24hr time span, should look like the following: (notice the gap due to possible lack of records in a time span, as well as vertical green lines, for the limits.)
Thanks for your time and help!

This is a version using NumPy's boolean masking and Matplotlib's errorbar:
import matplotlib.pyplot as plt
import numpy as np
time = np.array( [0, 1, 2, 3, 7, 8, 9, 10] )
data = np.array([2, 5, 6, 5, 5, 6, 7, 8] )
lower = np.array([4, 4, 4, 5, 5, 5, 5, 5] )
upper = np.array([6, 6, 6, 6, 7, 7, 7, 7] )
nn = len( lower )
delta = upper - lower
### creating masks
inside = ( ( upper - data ) >= 0 ) & ( ( data - lower ) >= 0 )
outside = np.logical_not( inside )
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.errorbar( time, lower, yerr=( nn*[0], delta), ls='', ecolor="#00C023" )
ax.scatter( time[ inside ], data[ inside ], c='k' )
ax.scatter( time[ outside ], data[ outside ], c='r' )
plt.show()
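
If the limit dots themselves are also required, one possible variation is to draw the vertical highlights with `Axes.vlines` and scatter the limits on top. The sketch below reuses the arrays from the answer above; the helper name `plot_with_limits`, the colours, and the marker sizes are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_with_limits(ax, time, data, lower, upper):
    """Draw limit dots, one vertical segment per record, and coloured data dots."""
    # True where lower <= data <= upper (the limits themselves count as inside)
    inside = (data >= lower) & (data <= upper)
    # One green segment per time stamp, so gaps in the time axis stay empty
    segments = ax.vlines(time, lower, upper, colors="lightgreen", linewidth=3, zorder=1)
    ax.scatter(time, lower, c="green", s=15, zorder=2)
    ax.scatter(time, upper, c="green", s=15, zorder=2)
    ax.scatter(time[inside], data[inside], c="k", zorder=3)
    ax.scatter(time[~inside], data[~inside], c="r", zorder=3)
    return inside, segments


time = np.array([0, 1, 2, 3, 7, 8, 9, 10])
data = np.array([2, 5, 6, 5, 5, 6, 7, 8])
lower = np.array([4, 4, 4, 5, 5, 5, 5, 5])
upper = np.array([6, 6, 6, 6, 7, 7, 7, 7])

fig, ax = plt.subplots()
inside, segments = plot_with_limits(ax, time, data, lower, upper)
```

Call `plt.show()` afterwards to display the figure. Because each segment is tied to a single time stamp, nothing is drawn between times 3 and 7.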

Something like this should work, plotting each component separately:
import pandas as pd
import matplotlib.pyplot as plt
time = [1, 2, 3, 7, 8, 9, 10]
data = [5, 6, 5, 5, 6, 7, 8]
lower_limit = [4, 4, 5, 5, 5, 5, 5]
upper_limit = [6, 6, 6, 7, 7, 7, 7]
# put data into dataframe and identify which points are out of range (not between the lower and upper limit)
df = pd.DataFrame({'time': time, 'data': data, 'll': lower_limit, 'ul': upper_limit})
df.loc[:, 'in_range'] = 0
df.loc[((df['data'] >= df['ll']) & (df['data'] <= df['ul'])), 'in_range'] = 1
# make the plot
fig, ax = plt.subplots()
# plot lower-limit and upper-limit points
plt.scatter(df['time'], df['ll'], c='green')
plt.scatter(df['time'], df['ul'], c='green')
# plot data points in range
plt.scatter(df.loc[df['in_range']==1, :]['time'], df.loc[df['in_range']==1, :]['data'], c='black')
# plot data points out of range (in red)
plt.scatter(df.loc[df['in_range']==0, :]['time'], df.loc[df['in_range']==0, :]['data'], c='red')
# plot lines between lower limit and upper limit
plt.plot((df['time'],df['time']),([i for i in df['ll']], [j for j in df['ul']]), c='lightgreen')
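
The two filtered scatter calls above can also be collapsed into a single `scatter` call by passing one colour per point. A minimal sketch with plain NumPy, assuming the same sample values as in the question; the variable names `in_range` and `point_colors` are chosen here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

time = np.array([1, 2, 3, 7, 8, 9, 10])
data = np.array([5, 6, 5, 5, 6, 7, 8])
lower_limit = np.array([4, 4, 5, 5, 5, 5, 5])
upper_limit = np.array([6, 6, 6, 7, 7, 7, 7])

# Boolean mask: True where the reading lies within its own limits (inclusive)
in_range = (data >= lower_limit) & (data <= upper_limit)
# One colour name per point: black inside the limits, red outside
point_colors = np.where(in_range, "black", "red").tolist()

fig, ax = plt.subplots()
ax.scatter(time, data, c=point_colors)
```

With thousands of records per day this avoids slicing the data twice, at the cost of building one colour entry per point.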

Related

Plotting averages of box plots as a box plot

I have a set of lists (about 100) of the form [6, 17, 5, 1, 4, 7, 14, 19, 0, 10] and I want to get one box plot which plots the averages of box-plot information (i.e. median, max, min, Q1, Q3, outliers) of all of the lists.
For example, if I have 2 lists
l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]
I can get averages of max, median, and min of the lists as follows:
maxs = np.array([])
mins = np.array([])
medians = np.array([])
for l in [l1, l2]:
    medians = np.append(medians, np.median(l))
    maxs = np.append(maxs, np.max(l))
    mins = np.append(mins, np.min(l))
averMax = np.mean(maxs)
averMin = np.mean(mins)
averMedian = np.mean(medians)
I should do the same for other info in the box plot such as average Q1, average Q3. I then need to use this information (averMax, averMin, etc.) to plot just one single box plot (not multiple box plots in one graph).
I know from Draw Box-Plot with matplotlib that you don't have to calculate the values for a normal box plot. You just need to specify the data as a variable.
Is it possible to do the same for my case instead of manually calculating the averages of the values of all the lists?

df.describe() will get the quartiles, so you can make a graph based on them. I customized the calculated numbers with the help of this answer and the example graph from the official reference.
import pandas as pd
import numpy as np
import io
l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]
df = pd.DataFrame({'l1':l1, 'l2':l2}, index=np.arange(len(l1)))
df.describe()
l1 l2
count 10.000000 10.000000
mean 8.300000 8.400000
std 6.532823 5.561774
min 0.000000 0.000000
25% 4.250000 4.250000
50% 6.500000 7.500000
75% 13.000000 13.500000
max 19.000000 16.000000
import matplotlib.pyplot as plt
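# lower spread, Q1, median, upper spread, taken from the describe() rows
stats = df.describe()
q1, med, q3 = stats.loc['25%'], stats.loc['50%'], stats.loc['75%']
x1 = [q1['l1']-1.5*(q3['l1']-q1['l1']), q1['l1'], med['l1'], med['l1']+1.5*(q3['l1']-q1['l1'])]
x2 = [q1['l2']-1.5*(q3['l2']-q1['l2']), q1['l2'], med['l2'], med['l2']+1.5*(q3['l2']-q1['l2'])]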
fig = plt.figure(figsize=(8,6))
plt.boxplot([x for x in [x1, x2]], 0, 'rs', 1)
plt.xticks([y+1 for y in range(len([x1, x2]))], ['x1', 'x2'])
plt.xlabel('measurement x')
t = plt.title('Box plot')
plt.show()
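
For the original goal of a single box built from averaged statistics, matplotlib can also draw a box directly from precomputed numbers via `Axes.bxp`, with `matplotlib.cbook.boxplot_stats` supplying the per-list statistics. The sketch below is one possible approach, not the method of the answer above; averaging the whisker positions and dropping the outliers are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cbook

l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]

# boxplot_stats returns a list with one dict of statistics per dataset
per_list = [cbook.boxplot_stats(values)[0] for values in (l1, l2)]

# Average each statistic that bxp needs to draw a box
keys = ("med", "q1", "q3", "whislo", "whishi")
averaged = {key: float(np.mean([stats[key] for stats in per_list])) for key in keys}
averaged["fliers"] = []        # assumption: outliers are not averaged, so none are drawn
averaged["label"] = "average"  # tick label for the single box

fig, ax = plt.subplots()
ax.bxp([averaged], showfliers=False)
```

With about 100 lists, only the tuple `(l1, l2)` needs to be replaced by the full collection.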

DOATools.py - Using my own signal source (NOT generated)

I'm using doatools.py library (https://github.com/morriswmz/doatools.py)
Now, my code looks like:
import numpy as np
from scipy import constants as const
import math
import doatools.model as model
import doatools.estimation as estimation
def calculate_wavelength(frequency):
    return const.speed_of_light / frequency
# Uniform circular array
# X
# |
# X---------X
# |
# X
NUMBER_OF_ELEMENTS = 4 # elements are shown as "X"
RADIUS = 0.47 / 2
FREQ_MHZ = 315
freq = FREQ_MHZ * const.mega
wavelength = calculate_wavelength(freq)
antenna_array = model.UniformCircularArray(NUMBER_OF_ELEMENTS, RADIUS)
# Create a MUSIC-based estimator.
grid = estimation.FarField1DSearchGrid()
estimator = estimation.MUSIC(antenna_array, wavelength, grid)
R = np.array([[1.5, 2, 3, 4], [4, 5, 6, 5], [45, 5, 5, 6], [5, 1, 0, 5]])
_, estimates = estimator.estimate(R, 1, return_spectrum=False, refine_estimates=True)
print('Estimates: {0}'.format(estimates.locations))
I can generate a signal with this library, but how do I use my own? For example, a signal from an ADC like this:
-> Switching to antenna 0 : [0, 4, 7, 10]
-> Switching to antenna 1 : [5, 6, 11, 83]
-> Switching to antenna 2 : [0, 23, 2, 34]
-> Switching to antenna 3 : [23, 105, 98, 200]

I think your question is how you should feed the real data from antennas, right?
I assume your data is in time order. That is, for "antenna 0 : [0, 4, 7, 10]", 0 is the first sample, followed by 4 and 7, and 10 is the last one in time.
If yes, you could leave them as a simple matrix like what you typed above:
r = matrix 4x4 of
0, 4, 7, 10
5, 6, 11, 83
0, 23, 2, 34
23, 105, 98, 200
//===============
r(0,0) = 0, r(0,1) = 4, r(0,2) = 7, r(0,3) = 10
r(1,0) = 5, r(1,1) = 6, ... etc.
r(2,0) = 0, ...etc.
//==============
R = the product of r and its Hermitian (conjugate) transpose (r.conj().T for a NumPy array, or r.H for np.matrix).
R = r @ r.conj().T
And this is the covariance matrix that you need to pass as the first argument to estimator.estimate.
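
As a rough sketch of that step in NumPy, using the sample values from the question: each row is one antenna and each column one snapshot in time order. Dividing by the number of snapshots is a common convention for a sample covariance and is an assumption here, as is treating the four readings per antenna as simultaneous snapshots. The `complex` dtype is used only so the same code would handle I/Q samples; the values shown are real.

```python
import numpy as np

# Rows: antennas 0..3, columns: samples in time order (values from the question)
r = np.array(
    [
        [0, 4, 7, 10],
        [5, 6, 11, 83],
        [0, 23, 2, 34],
        [23, 105, 98, 200],
    ],
    dtype=complex,
)

n_snapshots = r.shape[1]
# Sample covariance: r times its conjugate transpose, averaged over snapshots
R = (r @ r.conj().T) / n_snapshots

print(R.shape)
```

The resulting `R` is a 4x4 Hermitian matrix and would take the place of the hard-coded `R` in the question's code.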

Matplotlib's bar chart displays uneven bars

If we look at this code and x,y data,
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],[1, 2, 4, 10, 5, 9, 1,4, 9, 9],edgecolor='black')
plt.xlabel('Sample Mean')
plt.ylabel('Probability')
this displays the following graph
I cannot understand how the x values go beyond 1 and even take negative values. Also, why do the bars have different widths?

The problem is that your x-values are spaced 0.1 apart while the default bar width is 0.8, so the bars overlap. The solution is to set the bar width explicitly. In your case, any width smaller than 0.1 will work. For instance, with width=0.05 you will get the following graph.
Why negative?: By default each bar is centered on its x-value (align='center'). So the first bar in the question was centered at 0 with a width of 0.8, which is why it spanned from -0.4 to +0.4.
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],
[1, 2, 4, 10, 5, 9, 1,4, 9, 9], width=0.05, edgecolor='black')
plt.xlabel('Sample Mean')
plt.ylabel('Probability')
If you don't want bars at x<0: You can make each bar start at its x-value, so that it extends to the right, by passing the argument align='edge'.
rects1 = plt.bar([0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1],
[1, 2, 4, 10, 5, 9, 1,4, 9, 9], width=0.05, align='edge',
edgecolor='black')
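
To check where the bars actually sit, the rectangles returned by `bar` can be inspected directly. The sketch below compares the first bar under matplotlib's documented default width of 0.8 with the narrow, edge-aligned version; the variable names and the two-panel layout are illustrative choices.

```python
import matplotlib.pyplot as plt

x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1]
heights = [1, 2, 4, 10, 5, 9, 1, 4, 9, 9]

fig, (ax_default, ax_narrow) = plt.subplots(1, 2)
default_bars = ax_default.bar(x, heights, edgecolor="black")
narrow_bars = ax_narrow.bar(x, heights, width=0.05, align="edge", edgecolor="black")

# Each bar is a Rectangle: get_x() is its left edge, get_width() its width
first_default = default_bars[0]
first_narrow = narrow_bars[0]
default_span = (first_default.get_x(), first_default.get_x() + first_default.get_width())
narrow_span = (first_narrow.get_x(), first_narrow.get_x() + first_narrow.get_width())

print("default first bar spans:", default_span)
print("narrow first bar spans:", narrow_span)
```

The default bar straddles x=0 and reaches well past its neighbours, while the narrow bar starts at 0 and stays within the 0.1 spacing.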

Annotate values for stacked horizontal bar plot

I'm trying to annotate the values for a stacked horizontal bar graph created using pandas. Current code is below
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
ax = df.plot.barh(stacked=True, figsize=(10,12))
for p in ax.patches:
    ax.annotate(str(p.get_x()), xy=(p.get_x(), p.get_y()+0.2))
plt.legend(bbox_to_anchor=(0, -0.15), loc=3, prop={'size': 14}, frameon=False)
The problem is the annotation method I used gives the x starting points and not the values of each segment. I'd like to be able to annotate values of each segment in the center of each segment for each of the bars.
edit: for clarity, what I would like to achieve is something like this where the values are centered horizontally (and vertically) for each segment:

You can use the patches bbox to get the information you want.
ax = df.plot.barh(stacked=True, figsize=(10, 12))
for p in ax.patches:
    left, bottom, width, height = p.get_bbox().bounds
    ax.annotate(str(width), xy=(left+width/2, bottom+height/2),
                ha='center', va='center')

Another possible solution is to flatten df.values into a 1-D array in column order via values = df.values.flatten("F"), which matches the order of ax.patches:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
ax = df.plot.barh(stacked=True, figsize=(10,12))
values = df.values.flatten("F")
for i, p in enumerate(ax.patches):
    ax.annotate(str(values[i]), xy=(p.get_x()+ values[i]/2, p.get_y()+0.2))
plt.legend(bbox_to_anchor=(0, -0.15), loc=3, prop={'size': 14}, frameon=False);

From matplotlib 3.4.0 use matplotlib.pyplot.bar_label
The labels parameter can be used to customize annotations, but it's not required.
See this answer for additional details and examples.
Each group of containers must be iterated through to add labels.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1
Horizontal Stacked
d = {'group 1': [1, 2, 5, 7, 4, 5, 10],
'group 2': [5, 6, 1, 8, 2, 6, 2],
'group 3': [12, 2, 2, 4, 4, 8, 4]}
df = pd.DataFrame(d)
# add tot to sort the bars
df['tot'] = df.sum(axis=1)
# sort
df = df.sort_values('tot')
# plot all columns except tot
ax = df.iloc[:, :-1].plot.barh(stacked=True, figsize=(10, 12))
# iterate through each group of bars
for c in ax.containers:
    # format the number of decimal places (if needed) and replace 0 with an empty string
    labels = [f'{w:.0f}' if (w := v.get_width()) > 0 else '' for v in c]
    ax.bar_label(c, labels=labels, label_type='center')
Horizontal Grouped
Not stacked is a better presentation of the data, because it is easier to compare bar lengths visually.
# plot all columns except tot
ax = df.iloc[:, :-1].plot.barh(stacked=False, figsize=(8, 9))
# iterate through each group of bars
for c in ax.containers:
    # format the number of decimal places (if needed) and replace 0 with an empty string
    labels = [f'{w:.0f}' if (w := v.get_width()) > 0 else '' for v in c]
    ax.bar_label(c, labels=labels, label_type='center')
df view
group 1 group 2 group 3 tot
2 5 1 2 8
1 2 6 2 10
4 4 2 4 10
6 10 2 4 16
0 1 5 12 18
3 7 8 4 19
5 5 6 8 19

matplotlib, pyplot : custom color for a specific data value

I am generating a heat map for my data.
Everything works fine, but I have a small problem. My data (numbers) range from 0 to 10.000.
0 means nothing (no data), and at the moment a field with 0 just takes the lowest color of my color scale. How can I give the fields with 0 a completely different color (e.g. black or white)?
See the picture to better understand what I mean:
My code (snippet) looks like this:
matplotlib.pyplot.imshow(results, interpolation='none')
matplotlib.pyplot.colorbar();
matplotlib.pyplot.xticks([0, 1, 2, 3, 4, 5, 6, 7, 8], [10, 15, 20, 25, 30, 35, 40, 45, 50]);
matplotlib.pyplot.xlabel('Population')
matplotlib.pyplot.yticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 'serial']);
matplotlib.pyplot.ylabel('Communication Step');
axis.xaxis.tick_top();
matplotlib.pyplot.savefig('./results_' + optimisationProblem + '_dim' + str(numberOfDimensions) + '_' + statisticType + '.png');
matplotlib.pyplot.close();

If you are not interested in a smooth transition between the values 0 and 0.0001, you can just set every value that equals 0 to NaN. NaN cells are not drawn, so the axes background (white by default) shows through, whereas 0.0001 will still take the lowest color of the color scale.
In the following code I include an example. I generate the data randomly. I therefore select a single element from my array and set it to NaN. This results in the color white. I also included a line in which you can set every data point that equals 0 to NaN.
import numpy
import matplotlib.pyplot as plt
#Random data
data = numpy.random.random((10, 10))
#Set all data points equal to zero to NaN
#data[data == 0.] = float("NaN")
#Set single data value to nan
data[2][2] = float("NaN")
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.imshow(data, interpolation = "nearest")
plt.show()
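
If a specific colour such as black is wanted for the empty cells, one option is to mask the zeros and set the colormap's "bad" colour with `Colormap.set_bad`. This is a sketch under assumptions: the random `results` array stands in for the question's data, and `viridis` is an arbitrary choice of colormap.

```python
import copy

import numpy as np
import matplotlib.pyplot as plt

# Stand-in data: values 1..10000, with two cells set to 0 to mean "no data"
rng = np.random.default_rng(0)
results = rng.integers(1, 10001, size=(11, 9)).astype(float)
results[2, 2] = 0
results[5, 7] = 0

# Mask every cell equal to 0 so it is treated as "bad" by the colormap
masked = np.ma.masked_equal(results, 0)

# Copy the colormap before modifying it, then choose the colour for masked cells
cmap = copy.copy(plt.get_cmap("viridis"))
cmap.set_bad(color="black")

fig, ax = plt.subplots()
image = ax.imshow(masked, interpolation="none", cmap=cmap)
fig.colorbar(image)
```

Call `plt.show()` or `fig.savefig(...)` afterwards. The colorbar then covers only the real data range, and the masked cells appear in the chosen colour.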
