Getting data of a box plot - Matplotlib - python

I have to plot a boxplot of some data, which I could easily do with Matplotlib. However, I was requested to provide a table with the data presented there, like the whiskers, the medians, standard deviation, and so on.
I know that I could calculate these "by hand", but I also know, from the reference, that the boxplot method:
Returns a dictionary mapping each component of the boxplot to a list of the matplotlib.lines.Line2D instances created. That dictionary has the following keys (assuming vertical boxplots):
boxes: the main body of the boxplot showing the quartiles and the median’s confidence intervals if enabled.
medians: horizontal lines at the median of each box.
whiskers: the vertical lines extending to the most extreme, non-outlier data points.
caps: the horizontal lines at the ends of the whiskers.
fliers: points representing data that extend beyond the whiskers (outliers).
So I'm wondering how I could get these values, since they are matplotlib.lines.Line2D instances.
Thank you.

As you've figured out, you need to access the members of the return value of boxplot.
For example, if your return value is stored in bp:
bp['medians'][0].get_ydata()
>> array([ 2.5, 2.5])
As the boxplot is vertical, and the median line is therefore a horizontal line, you only need to focus on one of the y-values; i.e. the median is 2.5 for my sample data.
For each "key" in the dictionary, the value will be a list to handle for multiple boxes. If you have just one boxplot, the list will only have one element, hence my use of bp['medians'][0] above.
If you have multiple boxes in your boxplot, you will need to iterate over them using e.g.
for medline in bp['medians']:
    linedata = medline.get_ydata()
    median = linedata[0]
CT Zhu's answer unfortunately doesn't work, because the different elements behave differently: for example, each box has only one median line but two whiskers. It is therefore safest to treat each quantity manually, as outlined above.
NB: the closest you can come is the following:
res = {}
for key, value in bp.items():
    res[key] = [v.get_data() for v in value]
or equivalently
res = {key : [v.get_data() for v in value] for key, value in bp.items()}
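As a rough sketch of treating each quantity manually, the snippet below pulls the median, quartiles, whisker ends and outliers out of the returned dictionary for a single vertical box. The sample data, the variable names and the Agg backend are assumptions made for illustration. It also relies on each whisker line running from the box edge to the whisker end, with the lower whisker listed before the upper one, which is how current Matplotlib versions appear to draw them. The standard deviation is not drawn on a box plot, so it still has to be computed from the data separately.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up sample values

fig, ax = plt.subplots()
bp = ax.boxplot(data)

# The median line is horizontal, so both of its y-values are the same.
median = float(bp['medians'][0].get_ydata()[0])

# Two whiskers per box: each line goes from the box edge (a quartile)
# to the end of the whisker.
lower_whisker = bp['whiskers'][0].get_ydata()
upper_whisker = bp['whiskers'][1].get_ydata()
q1, whisker_low = float(lower_whisker[0]), float(lower_whisker[1])
q3, whisker_high = float(upper_whisker[0]), float(upper_whisker[1])

# Fliers are plotted as points; their y-values are the outliers.
fliers = [float(y) for y in bp['fliers'][0].get_ydata()]

plt.close(fig)
```

With several boxes, the same indexing would need to be repeated per box (two whisker entries for every median entry).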

Related

Differentiate some points in a python plot

I have a plot in Python of the following
a = (1,2,3,4,5)
b = (1,2,3,4,5)
plot(a,b)
I want to highlight some of the x-axis points in the plot with dots of a distinct color,
for example these points:
c = (2,4)
I tried the following:
a = (1,2,3,4,5)
b = (1,2,3,4,5)
plot(a,b)
matplotlib.pyplot.scatter(a, c)
But I got the error "x and y must be the same size"
If you google "matplotlib scatter" or something like that, one of the first links says that the function gives "a scatter plot of y vs. x". So the error message you shared makes sense, since the length of a is greater than the length of c. I hope that makes it clear why your original code gives that error.
Your problem is specific enough that there isn't an out-of-the-box solution in matplotlib that I'm aware of, so this will require the use of your own custom functions. I'll give one approach that might be helpful to you. I'm structuring my answer here so you can see how to solve the issue to your own specifications, instead of relying too heavily on copying & pasting other people's code, since the latter makes it harder for you to do exactly what you want to do.
To restate the problem in more concise terms: How can a matplotlib user plot a line, and then put markers on a subset of the line's points, specified by their x-values?
To begin, here is what your program might look like currently:
import matplotlib.pyplot as plt
x_coords = [ ] # fill in x_coords here
y_coords = [ ] # fill in y_coords here
marker_x_coords = [ ] # fill in with x coordinates of points you want to have markers
plt.plot(x_coords, y_coords) # plots the line
#### TODO: plot the markers ####
Now, you have the x-values of the points you want to put markers on. How might you get their corresponding y-values? Well, you can make a function that searches for the index of the x-value in x_coords, and then gives the corresponding value at the same index of y_coords:
def getYVals(x_coords_lst, y_coords_lst, marker_x_coords_lst):
    marker_y_coords = [] # start with an empty list
    for x_point in marker_x_coords_lst:
        point_index = x_coords_lst.index(x_point) # get the index of a point in the x list
        marker_y_coords.append(y_coords_lst[point_index]) # add the value of the y list at that index to the list that will be returned
    return marker_y_coords
This isn't the fastest method, but it is the clearest about what is happening. Here's a more compact alternative that gives the same results using something called a "list comprehension" (any speed gain is likely to be small, since both versions still call index once per marker):
def getYVals(x_coords_lst, y_coords_lst, marker_x_coords_lst):
    return [y_coords_lst[x_coords_lst.index(x_val)] for x_val in marker_x_coords_lst]
The output of either of the getYVals functions above will work as the y values of the markers. This can be passed as the y argument of plt.scatter, and you already have the x values, so from there you should be good to go!
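If the lists are long, one possible variation is to build the x-to-y lookup once as a dictionary instead of searching the list for every marker. This is only a sketch: the function name marker_y_values is made up here, and it assumes every x-value occurs only once in the line (with duplicates, the dictionary keeps the last y-value, whereas index returns the first).

```python
def marker_y_values(x_coords, y_coords, marker_x_coords):
    """Return the y-value paired with each x in marker_x_coords."""
    # Build the x -> y lookup once; assumes each x-value is unique.
    y_by_x = dict(zip(x_coords, y_coords))
    return [y_by_x[x] for x in marker_x_coords]

# The data from the question.
a = (1, 2, 3, 4, 5)
b = (1, 2, 3, 4, 5)
c = (2, 4)
marker_y = marker_y_values(a, b, c)
```

With matplotlib.pyplot imported as plt, calling plt.scatter(c, marker_y, color="red", zorder=3) after plt.plot(a, b) should then draw the markers; the zorder argument is there to keep them on top of the line.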

Creating pie chart from given percentage, not values

When I print the following pct_tested, I get a nicely displayed percentage (for example 27.83%) which corresponds to the percentage of tested elements to total elements.
pct_tested = "{:.2%}".format(number_of_tested/len(df.index))
Now, I want to create a pie chart where I input this pct_tested as the value and it outputs a pie chart with a slice = 27.83% and the other slice is the remainder (so 72.17%).
My trial:
plt.pie([pct_tested], colors = colors, autopct='%.2f%%')
Does not work because "%" is present in the value of pct_tested.
So I tried again with a purely numerical value just to see if that would fix it.
plt.pie([number_of_tested], colors = colors, autopct='%.2f%%')
This did create a pie chart, but the result is a single-slice pie chart labelled 100%.
Then I realised that for my data I could simply use [pct_tested, 100-pct_tested], but I still run into the problem that it doesn't work because the value is not purely numerical. How can I fix it?
I am new to coding and realise this is quite a trivial request but I am not sure where to go from here, I simply need a bit of guidance. Thank you for your help
You need to pass a list of numbers with one value for each slice to plt.pie. There are a couple of ways to do this given the information you have.
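Using pct_tested as a number (the percentage itself, e.g. 100*number_of_tested/len(df.index), rather than the formatted string), you can calculate the other percentage by subtracting it from 100.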
plt.pie([pct_tested, 100-pct_tested], autopct='%.2f%%')
Using the number_of_tested value, you can calculate the other value by subtracting from the total. I'll assume that total is len(df.index), since you divided by that to get the pct_tested value.
plt.pie([number_of_tested, len(df.index)-number_of_tested], autopct='%.2f%%')
Either one of those should give you a pie chart with two sections.
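To make the numeric-versus-string distinction concrete, here is a small sketch that keeps the percentage as a float for plotting and formats it only for display. The helper names pie_slices and percent_string_to_float and the counts 2783 and 10000 are made up for illustration; Python 3 division is assumed.

```python
def pie_slices(number_of_tested, total):
    """Return [tested %, untested %] as plain floats."""
    pct_tested = 100.0 * number_of_tested / total
    return [pct_tested, 100.0 - pct_tested]

def percent_string_to_float(text):
    """Turn a formatted string such as '27.83%' back into the number 27.83."""
    return float(text.rstrip("%"))

# Hypothetical counts standing in for number_of_tested and len(df.index).
slices = pie_slices(2783, 10000)

# The formatted string is only for printing, not for plotting.
label = "{:.2%}".format(2783 / 10000)
```

The list in slices can then be passed straight to plt.pie(slices, autopct='%.2f%%'). If only the formatted string is available, percent_string_to_float recovers the number first.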

Maya - query animation curve data

I'm having trouble querying Maya's animation curve data at a deeper level.
As input I want to use an animation curve that is not connected to any attribute, just a single node.
Given this, I want to create a function that returns either:
Value at a given time (I know that this can easily be done by connecting the anim curve to any attribute and then grabbing the value with a command like cmds.getAttr(objName + '.' + attrName, t=timeValue), but how can I do it without connecting the animation curve to anything?)
Time at a given value - I couldn't get it using any Maya commands, and this is what I need the most.
This basically comes down to playing with the Bezier curve equation, and I'm pretty sure the Maya API is needed here, but I'm very much a beginner with the API, so I need your help. Any clues on how to solve this? Thanks!
You can use keyframe to query the values of an anim curve that isn't connected to any attribute. Here's an example that will create an anim curve with 2 keys, print the key values, then print its value at each frame:
import maya.cmds as cmds
# Create an animation curve with 2 keys.
anim_curve = cmds.createNode("animCurveTL")
cmds.setKeyframe(anim_curve, t=0, v=0)
cmds.setKeyframe(anim_curve, t=10, v=10)
# Get its key count.
key_count = cmds.keyframe(anim_curve, q=True, keyframeCount=True)
# Iterate through key count and print all key values.
# valueChange=True asks explicitly for the key values.
for i in range(key_count):
    print(cmds.keyframe(anim_curve, q=True, index=(i, i), valueChange=True)[0])
# Get the scene's frame ranges.
start = int(cmds.playbackOptions(q=True, min=True))
end = int(cmds.playbackOptions(q=True, max=True))
# Iterate through each frame (including the last one) and print the anim curve's value.
for f in range(start, end + 1):
    value = cmds.keyframe(anim_curve, q=True, eval=True, time=(f, f))[0]
    print("{} {}".format(f, value))
Getting a value at a given time is easy enough, that's what the last portion is doing.
Getting a time for a given value isn't as easy, though, and you may need to rethink your approach. First of all, the curve can reach the value on a sub-frame, but I'm assuming you'd want to ignore that. You're also trying to match a float value, which is tricky in programming: a value of 3.5 will not match a value of 3.5000001, which is likely the kind of value the curve will have since it's interpolating between the keyed values.
But if you insist on doing it then you probably need to compare the value on every frame between the curve's first/last key times. When you compare the value you might need some threshold that is deemed acceptable, or close enough, to solve precision issues. If it passes then you can append the time to a list and return it later.
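The frame-by-frame comparison with a threshold could be sketched roughly as follows. The function name, the default tolerance and the evaluate callback are all assumptions for illustration; the callback is a stand-in so the logic can be tried outside Maya, and the linear test curve is made up.

```python
def frames_matching_value(evaluate, target, first_frame, last_frame, tolerance=1e-3):
    """Return every whole frame whose curve value is within tolerance of target."""
    matches = []
    for frame in range(first_frame, last_frame + 1):
        # Compare with a threshold rather than ==, to sidestep float precision issues.
        if abs(evaluate(frame) - target) <= tolerance:
            matches.append(frame)
    return matches

# Made-up stand-in for an anim curve: the value simply equals the frame number.
def linear_curve(frame):
    return float(frame)

hits = frames_matching_value(linear_curve, 3.0000001, 0, 10)
```

Inside Maya, evaluate could be something like lambda f: cmds.keyframe(anim_curve, q=True, eval=True, time=(f, f))[0], with the first and last key times as the frame range. Values that are only reached between whole frames will not be found by this approach.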
Hope that points you in the right direction.

Map a tuple of arbitrary length to an RGB value

I need to map tuples (of integers) of arbitrary (but the same) length onto RGB values. It would be especially nice if I could have them ordered more-or-less by magnitude, with any standard way of choosing the sub-ordering between (0,1) and (1,0).
Here's how I'm doing this now:
I have a long list of RGB values of colors.
colors = [(0,0,0),(255,255,255),...]
I take the hash of the tuple mod the number of colors, and use this as the index.
def tuple_to_rgb(atuple):
    index = hash(atuple) % len(colors)
    return colors[index]
This works OK, but I'd like it to work more like a heatmap value, where (5,5,5) has a larger value than (0,0,0), so that adjacent colors make some sense, maybe getting "hotter" as the values get larger.
I know how to map integers onto RGB values, so perhaps if I just had a decent way of generating a unique integer from a tuple that sorted first by the magnitude of the tuple and then by the interior values it might work.
I could simply write my own sort comparator, generate the entire list of possible tuples in advance, and use the order in the list as the unique integer, but it would be much easier if I didn't have to generate all of the possible tuples in advance.
Does anyone have any suggestions? This seems like something do-able, and I'd appreciate any hints to push me in the right direction.
For those who are interested, I'm trying to visualize predictions of electron occupations of quantum dots, like those in Fig 1b of this paper, but with arbitrary number of dots (and thus an arbitrary tuple length). The tuple length is fixed in a given application of the code, but I don't want the code to be specific to double-dots or triple-dots. Probably won't get much bigger than quadruple dots, but experimentalists dream up some pretty wild systems.
Here's an alternative method. Since the dots I've generated so far only have a subset of the possible occupations, the color maps were skewed one way, and didn't look as good. This method requires a list of possible states to be passed in, and thus these must be generated in advance, but the resulting colormaps look much nicer.
class Colormapper2:
    """
    Like Colormapper, but uses a list of possible occupations to
    generate the maps, rather than generating all possible occupations.
    The difference is that the systems I've explored only have a subset
    of the possible states occupied, and the colormaps look better
    this way.
    """
    def __init__(self,occs,**kwargs):
        import matplotlib.pyplot as plt
        colormap = kwargs.get('colormap','hot')
        self.occs = sorted(list(occs),key=sum)
        self.n = float(len(self.occs))
        self.cmap = plt.get_cmap(colormap)
        return
    def __call__(self,occ):
        ind255 = int(255*self.occs.index(occ)/self.n)
        return self.cmap(ind255)
Here's an example of the resulting image:
You can see the colors are better separated than the other version.
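The ordering step can also be tried on its own, without matplotlib. The sketch below is a variation rather than the class above: it sorts by sum first and then by the tuple itself, so that ties such as (0,1) and (1,0) get a fixed order, and it stores the ranks in a dictionary to avoid a list search on every lookup. The function name build_rank_lookup and the scaling to the range 0 to 1 are assumptions made here.

```python
from itertools import product

def build_rank_lookup(occupations):
    """Map each tuple to a position in [0, 1], ordered by sum, then by the tuple itself."""
    ordered = sorted(set(occupations), key=lambda occ: (sum(occ), occ))
    last = max(len(ordered) - 1, 1)  # avoid dividing by zero for a single state
    return {occ: i / last for i, occ in enumerate(ordered)}

# All occupations of a double dot with at most one electron per dot.
lookup = build_rank_lookup(product(range(2), repeat=2))
```

A matplotlib colormap also accepts a float between 0 and 1, so plt.get_cmap('hot')(lookup[occ]) would give the RGBA value for a state.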
Here's the code I came up with:
class Colormapper:
    """
    Create a colormap to map tuples onto RGBA values produced by matplotlib's
    cmap function.
    Arguments are the maximum value of each place in the tuple. Dimension of
    the tuple is inferred from the length of the args array.
    """
    def __init__(self,*args):
        from itertools import product
        import matplotlib.pyplot as plt
        # range replaces the Python 2-only xrange
        self.occs = sorted(list(product(*[range(arg+1) for arg in args])),key=sum)
        self.n = float(len(self.occs))
        self.hotmap = plt.get_cmap('hot')
        return
    def __call__(self,occ):
        ind255 = int(255*self.occs.index(occ)/self.n)
        return self.hotmap(ind255)
Here's an example of the result of this code:

Sort arrays by two criteria

My figure has a very large legend, and to make it easier to find each corresponding line, I want to sort the legend by the y value of the line at the last datapoint.
plots[] contains a list of Line2D objects,
labels[] contains the corresponding label for each Line2D object, generated through labels = [plot._label for plot in plots]
I want to sort each/both arrays by plots._y[-1], the value of y at the last point
Bonus points if I can also sort first by _linestyle (a string) and then by the y value.
I am unsure of how to do this well, I wouldn't think it would require a loop, but it might because I am sorting by 2 criteria, one of which will be tricky to deal with (':' and '-' are the values of linestyle). Is there a function that can help me out here?
edit: it just occurred to me that I can generate labels after I sort, so that uncomplicates things a bit. However, I still have to sort plots by each object's linestyle and y[-1] value.
I believe this may work:
sorted(plots, key = lambda plot :(plot._linestyle, plot._y[-1]))
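A variation on the same idea uses Line2D's public getters (get_linestyle, get_ydata, get_label) instead of the underscore attributes, and rebuilds the labels after sorting. This is a sketch: sort_for_legend is a made-up name, and FakeLine is a hypothetical stand-in for Line2D so that the ordering can be checked without drawing anything.

```python
def sort_for_legend(lines):
    """Sort line objects by linestyle first, then by their last y-value."""
    return sorted(lines, key=lambda line: (line.get_linestyle(), line.get_ydata()[-1]))

class FakeLine:
    """Hypothetical stand-in exposing the three Line2D getters used here."""
    def __init__(self, label, linestyle, ydata):
        self._label = label
        self._linestyle = linestyle
        self._ydata = ydata

    def get_label(self):
        return self._label

    def get_linestyle(self):
        return self._linestyle

    def get_ydata(self):
        return self._ydata

plots = [FakeLine("a", ":", [0, 5]), FakeLine("b", "-", [0, 9]),
         FakeLine("c", "-", [0, 2]), FakeLine("d", ":", [0, 1])]
ordered = sort_for_legend(plots)
labels = [line.get_label() for line in ordered]  # regenerate labels after sorting
```

Tuples compare element by element, so the linestyle string decides first and the last y-value only breaks ties. With real lines, ax.legend(ordered, labels) should then produce the legend in that order.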
