Sort arrays by two criteria - python

My figure has a very large legend, and to make it easier to find each corresponding line, I want to sort the legend by the y value of the line at the last datapoint.
plots is a list of Line2D objects.
labels is the list of corresponding labels, generated with labels = [plot._label for plot in plots].
I want to sort both lists by plot._y[-1], the value of y at the last point.
Bonus points if I can also sort first by _linestyle (a string) and then by the y value.
I am unsure how to do this well. I wouldn't think it requires a loop, but it might, because I am sorting by two criteria, one of which is tricky to deal with (the linestyle values are ':' and '-'). Is there a function that can help me out here?
Edit: it just occurred to me that I can generate labels after I sort, which simplifies things a bit. However, I still have to sort plots by each object's linestyle and y[-1] value.

I believe this may work, since tuples compare element by element (linestyle first, then the last y value):
sorted(plots, key=lambda plot: (plot._linestyle, plot._y[-1]))
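For reference, here is a minimal sketch of the same idea using Line2D's public getters (get_linestyle, get_ydata, get_label) instead of the private attributes. The sample data and labels are made up for illustration, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [0, 1, 2]
# Made-up sample lines: two solid, two dotted, with different final y values.
ax.plot(x, [0, 1, 5], "-", label="solid high")
ax.plot(x, [0, 1, 2], ":", label="dotted low")
ax.plot(x, [0, 1, 3], "-", label="solid low")
ax.plot(x, [0, 1, 4], ":", label="dotted high")

# Sort by linestyle first, then by the y value at the last data point.
# Tuples compare element by element, so no explicit loop is needed.
plots = sorted(
    ax.get_lines(),
    key=lambda line: (line.get_linestyle(), line.get_ydata()[-1]),
)

# Build the labels after sorting so they stay aligned with the lines.
labels = [line.get_label() for line in plots]
ax.legend(plots, labels)
```

The linestyle strings sort by character code, so '-' lands before ':'. Passing reverse=True to sorted would flip both criteria at once.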

Related

Differentiate some points in a python plot

I have a plot in Python of the following
a = (1,2,3,4,5)
b = (1,2,3,4,5)
plot(a,b)
I want to mark some of the points along the x axis with dots of a different color,
for example these points:
c = (2,4)
I tried the following:
a = (1,2,3,4,5)
b = (1,2,3,4,5)
plot(a,b)
matplotlib.pyplot.scatter(a, c)
But I got the error "x and y must be the same size"
If you search for "matplotlib scatter" or something like that, one of the first links describes the function as giving "a scatter plot of y vs. x". So the error message you shared makes sense, since a is longer than c: scatter needs one y value for every x value. I hope that makes it clear why your original code gives that error.
Your problem is specific enough that there isn't an out-of-the-box solution in matplotlib that I'm aware of, so this will require the use of your own custom functions. I'll give one approach that might be helpful to you. I'm structuring my answer here so you can see how to solve the issue to your own specifications, instead of relying too heavily on copying & pasting other people's code, since the latter makes it harder for you to do exactly what you want to do.
To restate the problem in more concise terms: How can a matplotlib user plot a line, and then put markers on a subset of the line's points, specified by their x-values?
To begin, here is what your program might look like currently:
import matplotlib.pyplot as plt
x_coords = [ ] # fill in x_coords here
y_coords = [ ] # fill in y_coords here
marker_x_coords = [ ] # fill in with x coordinates of points you want to have markers
plt.plot(x_coords, y_coords) # plots the line
#### TODO: plot the markers ####
Now, you have the x-values of the points you want to put markers on. How might you get their corresponding y-values? Well, you can make a function that searches for the index of the x-value in x_coords, and then gives the corresponding value at the same index of y_coords:
def getYVals(x_coords_lst, y_coords_lst, marker_x_coords_lst):
    marker_y_coords = []  # start with an empty list
    for x_point in marker_x_coords_lst:
        point_index = x_coords_lst.index(x_point)  # get the index of a point in the x list
        marker_y_coords.append(y_coords_lst[point_index])  # add the value of the y list at that index to the list that will be returned
    return marker_y_coords
This isn't the fastest method, but it is the clearest about what is happening. Here's a more compact alternative that gives the same results and is likely a little faster (it uses something called a "list comprehension"):
def getYVals(x_coords_lst, y_coords_lst, marker_x_coords_lst):
    return [y_coords_lst[x_coords_lst.index(x_val)] for x_val in marker_x_coords_lst]
The output of either of the getYVals functions above will work as the y values of the markers. It can be passed as the y argument of plt.scatter, and you already have the x values, so from there you should be good to go!
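If the line has many points, repeated calls to index get slow, since each call scans the sequence from the start. One possible variation (a sketch, not part of the original answer) builds a dictionary from x to y once and looks each marker up in it. The name get_y_vals_fast and the y values are made up here, and the sketch assumes the x values are unique.

```python
def get_y_vals_fast(x_coords, y_coords, marker_x_coords):
    """Return the y value paired with each marker x value.

    Assumes every x value is unique; a missing x raises KeyError.
    """
    y_by_x = dict(zip(x_coords, y_coords))  # one pass over the line's points
    return [y_by_x[x] for x in marker_x_coords]


a = (1, 2, 3, 4, 5)
b = (1, 4, 9, 16, 25)  # made-up y values so the lookup is visible
c = (2, 4)
marker_y = get_y_vals_fast(a, b, c)
```

marker_y can then be used as the y argument, for example plt.scatter(c, marker_y).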

Creating pie chart from given percentage, not values

When I print the following pct_tested, I get a nicely displayed percentage (for example 27.83%) which corresponds to the percentage of tested elements to total elements.
pct_tested = "{:.2%}".format(number_of_tested/len(df.index))
Now, I want to create a pie chart where I input this pct_tested as the value and it outputs a pie chart with a slice = 27.83% and the other slice is the remainder (so 72.17%).
My trial:
plt.pie([pct_tested], colors = colors, autopct='%.2f%%')
Does not work because "%" is present in the value of pct_tested.
So I tried again with a purely numerical value just to see if that would fix it.
plt.pie([number_of_tested], colors = colors, autopct='%.2f%%')
This worked to create a pie chart; however, the result is simply a one-slice pie chart with 100% written on it.
Then I realised that for my data I could simply use [pct_tested, 100-pct_tested], but I still run into the problem that the value is not purely numerical. How can I fix it?
I am new to coding and realise this is quite a trivial request, but I am not sure where to go from here; I simply need a bit of guidance. Thank you for your help.
You need to pass a list of numbers with one value for each slice to plt.pie. There are a couple of ways to do this given the information you have.
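Using the percentage, first keep it as a number rather than a formatted string, for example pct_tested = 100 * number_of_tested / len(df.index). Then you can calculate the other percentage by subtracting from 100.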
plt.pie([pct_tested, 100-pct_tested], autopct='%.2f%%')
Using the number_of_tested value, you can calculate the other value by subtracting from the total. I'll assume that total is len(df.index), since you divided by that to get the pct_tested value.
plt.pie([number_of_tested, len(df.index)-number_of_tested], autopct='%.2f%%')
Either one of those should give you a pie chart with two sections.
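As a minimal sketch, the following keeps the percentage numeric for plotting and formats it only for display. The counts are made up to stand in for number_of_tested and len(df.index), the slice labels are illustrative, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt

number_of_tested = 2783  # made-up count
total = 10000            # stands in for len(df.index)

pct_tested = 100 * number_of_tested / total  # numeric, used for plotting
pct_label = "{:.2f}%".format(pct_tested)     # string, used for printing only

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    [pct_tested, 100 - pct_tested],  # one number per slice
    labels=["tested", "not tested"],
    autopct="%.2f%%",
)
```

pie normalises the values by their sum, which is why the raw counts and the percentages give the same chart.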

How to prevent x-axis values ranging from least to greatest?

I am unable to prevent the x values from being ordered from least to greatest, and I need them in a specific order. Is there a way to do this in Python?
This is what the order of the x values needs to be instead.
You can plot them with "default" x values and change the tick labels.
plt.plot(Y)
plt.xticks(ticks=range(len(Y)), labels=X) # where X is your list with the order you want
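A self-contained sketch of this approach, with made-up data: X holds the labels in the order they should appear and Y holds the matching values. The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt

X = [30, 10, 20, 5]       # made-up x labels, in the required (unsorted) order
Y = [1.0, 2.5, 2.0, 3.5]  # made-up y values, one per label

positions = list(range(len(Y)))  # evenly spaced positions 0, 1, 2, ...
fig, ax = plt.subplots()
ax.plot(positions, Y, marker="o")

# Put one tick at each position and label it with the matching X entry.
ax.set_xticks(positions)
ax.set_xticklabels([str(value) for value in X])

tick_texts = [tick.get_text() for tick in ax.get_xticklabels()]
```

The points are spaced evenly regardless of the numeric gaps between the labels, which is usually what is wanted when the order matters more than the scale.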

Getting data of a box plot - Matplotlib

I have to plot a boxplot of some data, which I could easily do with Matplotlib. However, I was requested to provide a table with the data presented there, like the whiskers, the medians, standard deviation, and so on.
I know that I could calculate these "by hand", but I also know, from the reference, that the boxplot method:
Returns a dictionary mapping each component of the boxplot to a list of the matplotlib.lines.Line2D instances created. That dictionary has the following keys (assuming vertical boxplots):
boxes: the main body of the boxplot showing the quartiles and the median’s confidence intervals if enabled.
medians: horizontal lines at the median of each box.
whiskers: the vertical lines extending to the most extreme, non-outlier data points.
caps: the horizontal lines at the ends of the whiskers.
fliers: points representing data that extend beyond the whiskers (outliers).
So I'm wondering how I could get these values, since they are matplotlib.lines.Line2D.
Thank you.
As you've figured out, you need to access the members of the return value of boxplot.
For example, if your return value is stored in bp:
bp['medians'][0].get_ydata()
>> array([ 2.5, 2.5])
As the boxplot is vertical, and the median line is therefore a horizontal line, you only need to focus on one of the y-values; i.e. the median is 2.5 for my sample data.
For each "key" in the dictionary, the value will be a list to handle for multiple boxes. If you have just one boxplot, the list will only have one element, hence my use of bp['medians'][0] above.
If you have multiple boxes in your boxplot, you will need to iterate over them using e.g.
for medline in bp['medians']:
    linedata = medline.get_ydata()
    median = linedata[0]
CT Zhu's answer doesn't work unfortunately, as the different elements behave differently. For example, there's only one median but two whiskers, so it's safest to treat each quantity manually as outlined above.
NB the closest you can come is the following:
res = {}
for key, value in bp.items():
    res[key] = [v.get_data() for v in value]
or equivalently
res = {key : [v.get_data() for v in value] for key, value in bp.items()}
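For the table itself, one possible sketch pulls each number out of the returned dictionary. The data is made up. Each box contributes two whisker lines, stored one after the other, and each whisker line runs from the box edge to the whisker end. Standard deviation is not part of a boxplot, so here it is computed separately with the standard library.

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs without a display
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up sample; 30 is an outlier

fig, ax = plt.subplots()
bp = ax.boxplot(data)

# The median line is horizontal, so both of its y values are the same.
median = bp["medians"][0].get_ydata()[0]

# Whisker lines go from the quartile (index 0) to the whisker end (index 1).
q1, low_whisker = bp["whiskers"][0].get_ydata()
q3, high_whisker = bp["whiskers"][1].get_ydata()

fliers = list(bp["fliers"][0].get_ydata())
std = statistics.stdev(data)  # not drawn by boxplot; computed separately
```

With several boxes, box i uses bp["whiskers"][2 * i] and bp["whiskers"][2 * i + 1]. matplotlib.cbook.boxplot_stats is another option worth checking, as it returns the statistics as plain numbers without drawing anything.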

Plotting two objects using a 4-item list

I have this simulator (gravitation) I've been working on, and I've dissected the equations, math, etc. and it's totally legitimate. However, when I animate the thing I get weird behavior. I'd rather not bore everyone with the entire script because it's sorta lengthy, but the method I'm calling in line.set under the animate(i) function returns a list of four values, which are the positions of my two particles in Cartesian (x,y) coordinates. For example my list looks like:
[1.2, 3.2, 4.5, 5.1]
where the first element is the x-position of the first particle, the second is its y-position, and likewise the last two elements (indices 2 and 3) correspond to the second particle.
My question is whether line.set_data(force.updatePosition(dt)) should be working the way I think it does, i.e. plotting the first particle with indices 0 and 1 and the second particle with indices 2 and 3, or am I missing the point? The plotting works and the particles show up, but their movement is weird and nonsensical.
If it's completely necessary here is the script in its entirety...again it's long-ish that's why I didn't post it directly. Also, it's pretty messy as I'm still fighting with it and haven't cleaned it up yet.
TL;DR: Should line.set_data() be able to plot two separate objects if it is fed a list with 4 items?
def init():
    line.set_data([], [])
    return line,
def animate(i):
    line.set_data(force.updatePosition(dt))
    return line,
The docs say:
Definition: l.set_data(self, *args)
Docstring:
Set the x and y data
ACCEPTS: 2D array (rows are x, y) or two 1D arrays
So I imagine you want to give it two lists:
line.set_data([x1, x2], [y1, y2])
But it seems that force.updatePosition already returns a list of two lists ([pos1] + [pos2]), so you can maybe try:
line.set_data(np.transpose(force.updatePosition(dt)))
My opinion is that you might be better off keeping all this info in arrays, which would let you remove half the lines of your code, since you write every line two or four times for each element.
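To make the expected shape concrete, here is a small sketch of regrouping the positions into the two sequences that set_data expects (all x values, then all y values). Both helper names are made up, and which one applies depends on what updatePosition actually returns, which the question does not show.

```python
def split_flat(positions):
    """Regroup [x1, y1, x2, y2] into ([x1, x2], [y1, y2])."""
    return list(positions[0::2]), list(positions[1::2])


def split_pairs(positions):
    """Regroup [[x1, y1], [x2, y2]] into ([x1, x2], [y1, y2])."""
    xs, ys = zip(*positions)
    return list(xs), list(ys)


# The flat list from the question: particle 1 at (1.2, 3.2), particle 2 at (4.5, 5.1).
xs, ys = split_flat([1.2, 3.2, 4.5, 5.1])
```

Either result can be passed as line.set_data(xs, ys). split_pairs does the same regrouping as the np.transpose call above, without needing NumPy.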
