Plotting different colors in matplotlib - python

I'm working on a small project on the vehicle routing problem, where a set of vehicles delivers goods to a set of customers from a depot.
The solution would be something like:
Sub-route 1: Depot Customer4 Customer7
Sub-route 2: Depot Customer1 Customer5 Customer3
Sub-route 3: Depot Customer2 Customer6
where the depot always has x-y coordinates (0,0), so x_all and y_all would be something like
x_all = [0,x4,x7,0,x1,x5,x3,0,...]
y_all = [0,y4,y7,0,y1,y5,y3,0,...]
plt.plot(x_all, y_all)
How could I plot a graph that has different colors for different routes? In other words, the colors would change when (x,y) = (0,0).
Thanks

You could do something like this:
# Find indices where routes get back to the depot
depot_stops = [i for i in range(len(x_all)) if x_all[i] == y_all[i] == 0]
# Split route into sub-routes
sub_routes = [(x_all[i:j+1], y_all[i:j+1]) for i, j in zip(depot_stops[:-1], depot_stops[1:])]
for xs, ys in sub_routes:
    # (Consecutive calls will automatically use different colours)
    plt.plot(xs, ys)
plt.show()
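One thing to watch with the snippet above: pairing consecutive depot indices only produces a sub-route when it is followed by another depot visit, so a final route that does not return to (0,0) would be dropped. Here is a minimal sketch of a variant that keeps it, assuming the flat x_all/y_all layout from the question. The helper names split_routes and plot_routes are made up for illustration, and unlike the snippet above the segments are not closed back to the depot.

```python
def split_routes(x_all, y_all, depot=(0, 0)):
    """Split flat coordinate lists into one (xs, ys) pair per sub-route."""
    routes = []
    for x, y in zip(x_all, y_all):
        # Every depot visit (and the very first point) starts a new sub-route
        if (x, y) == depot or not routes:
            routes.append(([], []))
        routes[-1][0].append(x)
        routes[-1][1].append(y)
    return routes


def plot_routes(routes):
    # Imported here so that split_routes can be used without matplotlib
    import matplotlib.pyplot as plt
    for xs, ys in routes:
        plt.plot(xs, ys, marker="o")  # each call takes the next colour in the cycle
    plt.show()


# Made-up coordinates, only for illustration
x_all = [0, 4, 7, 0, 1, 5, 3, 0, 2, 6]
y_all = [0, 1, 3, 0, 2, 2, 6, 0, 5, 1]
routes = split_routes(x_all, y_all)
```

Calling plot_routes(routes) would then draw the three sub-routes, each in its own colour.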

There are a few ways you could do this, but I would suggest using a multidimensional list:
x = [[0, 4, 7],
     [0, 5, 3],
     [0, 2, 1]]
y = [[0, 4, 7],
     [0, 1, 5],
     [0, 2, 1]]
for i in range(len(x)):
    plt.plot(x[i], y[i])
plt.show()
And matplotlib will take care of the coloring for you.
This is an advisable way of managing your data, because you can index each route independently and the routes do not all have to be the same length. With flat lists, getting the stops of one route (say, one with 4 stops) means knowing where that route starts and ends inside x and y. With nested lists, you just index the route, for example the second route (index 1):
x[1]
>> [0, 5, 3]
y[1]
>> [0, 1, 5]

Related

perform numpy mean over matrix using labels as indicators

import numpy as np
arr = np.random.random((5, 3))
labels = [1, 1, 2, 2, 3]
arr
Out[136]:
array([[0.20349907, 0.1330621 , 0.78268978],
[0.71883378, 0.24783927, 0.35576746],
[0.17760916, 0.25003952, 0.29058267],
[0.90379712, 0.78134806, 0.49941208],
[0.08025936, 0.01712403, 0.53479622]])
labels
Out[137]: [1, 1, 2, 2, 3]
Assume I have this dataset.
I would like to use the labels as indicators and take np.mean over the rows that share a label.
(The labels indicate the class of each row.
They could also be [0, 1, 1, 0, 4, 1, 4], so make no assumptions about them.)
So the output here will be an average over:
the 1st and 2nd rows,
the 3rd and 4th rows,
the 5th row,
in the most efficient way numpy offers, like so:
[np.mean(arr[:2], axis=0),
np.mean(arr[2:4], axis=0),
np.mean(arr[4:], axis=0)]
Out[180]:
[array([0.46116642, 0.19045069, 0.56922862]),
array([0.54070314, 0.51569379, 0.39499737]),
array([0.08025936, 0.01712403, 0.53479622])]
(In a real-life scenario the matrix dimensions could be (100000, 256).)
First, sort the labels and the matrix together (matrix here is the data array, called arr in the question):
labels = np.array(labels)
# Getting the indices of a sorted array
sorted_indices = np.argsort(labels)
# Use the indices to sort both labels and matrix
sorted_labels = labels[sorted_indices]
sorted_matrix = matrix[sorted_indices]
Then we compute the start and end indices of each label group in the sorted matrix, sum the rows of each group, and divide each sum by the group's row count.
# Find the index where each label group starts in sorted_labels.
# The leading 0 and the trailing len(sorted_labels) close the first and last groups.
label_indices = np.concatenate(([0], np.where(np.diff(sorted_labels) != 0)[0] + 1, [len(sorted_labels)]))
# using add + reduceat to add all rows with regard to the label indices
group_sums = np.add.reduceat(sorted_matrix, label_indices[:-1], axis=0)
# getting count for each group using the diff in label_indices
group_counts = np.diff(label_indices)
# Calculating the mean
group_means = group_sums / group_counts[:, np.newaxis]
Example:
matrix
Out[265]:
array([[0.69524902, 0.22105336, 0.65631557, 0.54823511, 0.25248685],
[0.61675048, 0.45973729, 0.22410694, 0.71403135, 0.02391662],
[0.02559926, 0.41640708, 0.27931808, 0.29139379, 0.76402121],
[0.27166955, 0.79121862, 0.23512671, 0.32568048, 0.38712154],
[0.94519182, 0.99834516, 0.23381289, 0.40722346, 0.95857389],
[0.01685432, 0.8395658 , 0.73460083, 0.08056013, 0.02522956],
[0.27274409, 0.64602305, 0.05698037, 0.23214598, 0.75130743],
[0.65069115, 0.32383729, 0.86316629, 0.69659358, 0.26667206],
[0.91971818, 0.02011127, 0.91776206, 0.79474582, 0.39678431],
[0.94645805, 0.18057829, 0.23292538, 0.93111373, 0.44815706]])
labels
Out[266]: array([3, 3, 2, 3, 1, 0, 2, 0, 2, 5])
group_means
Out[267]:
array([[0.33377274, 0.58170155, 0.79888356, 0.38857686, 0.14595081],
[0.94519182, 0.99834516, 0.23381289, 0.40722346, 0.95857389],
[0.40602051, 0.36084713, 0.41802017, 0.43942853, 0.63737099],
[0.52788969, 0.49066976, 0.37184974, 0.52931565, 0.221175 ],
[0.94645805, 0.18057829, 0.23292538, 0.93111373, 0.44815706]])
The rows of group_means are in the same order as np.unique(sorted_labels):
np.unique(sorted_labels)
Out[271]: array([0, 1, 2, 3, 5])
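For comparison, here is a sketch of the same group mean without the explicit sort, using np.unique with return_inverse and np.add.at. This swaps in a different technique from the reduceat approach above. The function name label_means is made up for illustration, and I have not benchmarked it against the reduceat version.

```python
import numpy as np


def label_means(matrix, labels):
    """Return (unique_labels, means), with one row of means per unique label."""
    labels = np.asarray(labels)
    # inverse[k] is the position of labels[k] inside unique_labels
    unique_labels, inverse, counts = np.unique(
        labels, return_inverse=True, return_counts=True)
    sums = np.zeros((len(unique_labels), matrix.shape[1]), dtype=float)
    # Unbuffered add: rows that share a label accumulate into the same output row
    np.add.at(sums, inverse, matrix)
    return unique_labels, sums / counts[:, np.newaxis]


arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0],
                [7.0, 8.0, 9.0],
                [1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
unique_labels, means = label_means(arr, [0, 1, 1, 0, 4])
```

As in the answer above, the rows of means follow the sorted order of the unique labels.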
I did not understand the labels part of your question, but np.mean(arr, axis=1) calculates the mean of each row of a matrix.
If the labels are to be used, sort the rows by label first and then average each group, as in the script below.
import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9],
                [1, 2, 3],
                [4, 5, 6]])
labels = np.array([0, 1, 1, 0, 4])
# Sort the rows so that rows with equal labels are adjacent
order = labels.argsort()
sorted_labels = labels[order]
sorted_arr = arr[order]
print(sorted_labels)
print(sorted_arr)
# Mean over the rows of each label group
for label in np.unique(sorted_labels):
    print(label, np.mean(sorted_arr[sorted_labels == label], axis=0))

Find the highest value of y for each x value and connect the points with a line

I am exploring the best way to do this.
I have a scatter plot of y versus x, where x is income per capita.
After plotting all values as a scatter plot, I would like to find the highest value for y for each x value (i.e., at each income level) and then connect these points with a line.
How can I do this in Python?
You could use pandas, because it has a convenient groupby method and plays well with matplotlib:
import pandas as pd
# example data
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
'y': [3, 7, 9, 4, 1, 2, 8, 6, 4, 4, 3, 1]})
# standard scatter plot
ax = df.plot.scatter('x', 'y')
# find max. y value for each x
groupmax = df.groupby('x').max()
# connect max. values with lines
groupmax.plot(ax=ax, legend=False);
You have two parallel lists: x and y. You want to group them by x and take the maximum in y. First, you should sort the lists together. Zip them into a list of tuples and sort:
xy = sorted(zip(x, y))
Now, group the sorted list by the first element ("x"). The result is an iterator of (x, group) pairs, where the first element is x and the second yields all points with that x. Naturally, each point is also a tuple, and the first element of each tuple is the same x:
from itertools import groupby
grouped = groupby(xy, lambda item: item[0])
Finally, take the x and the max of the points for each group:
envelope = [(xp, max(points)[1]) for xp, points in grouped]
envelope is a list of (x, y) tuples that form the upper envelope of your scatter plot. You can further unzip it into xs and ys:
x1, y1 = zip(*envelope)
Putting it all together:
x1, y1 = zip(*[(xp, max(points)[1])
               for xp, points
               in groupby(sorted(zip(x, y)), lambda item: item[0])])
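If you would rather avoid sorting every point, a dictionary that remembers the largest y seen for each x gives the same envelope. This is only a sketch of an alternative to the groupby version: upper_envelope is a made-up helper name, and the data is the example data from the pandas answer.

```python
def upper_envelope(x, y):
    """Return (xs, ys): the sorted distinct x values and the maximum y for each."""
    best = {}
    for xi, yi in zip(x, y):
        # Keep only the largest y seen so far for this x
        if xi not in best or yi > best[xi]:
            best[xi] = yi
    xs = sorted(best)
    return xs, [best[xi] for xi in xs]


x = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [3, 7, 9, 4, 1, 2, 8, 6, 4, 4, 3, 1]
x1, y1 = upper_envelope(x, y)
print(x1, y1)  # [1, 2, 3, 4] [9, 4, 8, 4]
```

The result can be drawn over the scatter plot with plt.plot(x1, y1).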

Google ORTools CP-SAT | Get list of index of 1's from a list of ortools-variables

I want to convert [0, 0, 1, 0, 1, 0, 1, 0] to [2, 4, 6] using ortools.
Where "2", "4", "6" in the second list are the index of "1" in the first list.
Using the below code I could get a list [0, 0, 2, 0, 4, 0, 6, 0]. How can I get [2, 4, 6]?
from ortools.sat.python import cp_model
model = cp_model.CpModel()
solver = cp_model.CpSolver()
work = {}
days = 8
horizon = 7
for i in range(days):
    work[i] = model.NewBoolVar("work(%i)" % (i))
model.Add(work[0] == 0)
model.Add(work[1] == 0)
model.Add(work[2] == 1)
model.Add(work[3] == 0)
model.Add(work[4] == 1)
model.Add(work[5] == 0)
model.Add(work[6] == 1)
model.Add(work[7] == 0)
v1 = [model.NewIntVar(0, horizon, "") for _ in range(days)]
for d in range(days):
    model.Add(v1[d] == d * work[d])
status = solver.Solve(model)
print("status:", status)
vec = []
for i in range(days):
    vec.append(solver.Value(work[i]))
print("work",vec)
vec = []
for v in v1:
    vec.append(solver.Value(v))
print("vec1",vec)
You should see this output on the console:
status: 4
work [0, 0, 1, 0, 1, 0, 1, 0]
vec1 [0, 0, 2, 0, 4, 0, 6, 0]
Thank you.
Edit:
I also wish to get a result as [4, 6, 2].
For just 3 variables, this is easy. In pseudo code:
The max index is max(work[i] * i)
The min index is min(horizon - (horizon - i) * work[i])
The middle index is sum(i * work[i]) - max_index - min_index
But that is cheating.
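The three formulas can be sanity-checked in plain Python on the fixed 0/1 values from the question. Note that this is only an arithmetic check outside the solver, not a CP-SAT model. Inside a model, the max and min would have to be expressed through solver constraints.

```python
work = [0, 0, 1, 0, 1, 0, 1, 0]  # fixed values from the question
horizon = 7

# Unselected entries contribute 0 to the max and horizon to the min
max_index = max(i * w for i, w in enumerate(work))
min_index = min(horizon - (horizon - i) * w for i, w in enumerate(work))
# With exactly three selected entries, the remaining one is the middle index
middle_index = sum(i * w for i, w in enumerate(work)) - max_index - min_index

print(min_index, middle_index, max_index)  # 2 4 6
```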
If you want more than 3 variables, you will need parallel arrays of Boolean variables that indicate the rank of each variable.
Let me sketch the full solution.
You need to build a graph. The X axis holds the variables. The Y axis holds the ranks. You have horizontal arcs going right, and diagonal arcs going right and up. If the variable is selected, you need to use a diagonal arc, otherwise a horizontal arc.
If using a diagonal arc, you will assign the current variable to the rank of the tail of the arc.
Then you need to add constraints to make it a contiguous path:
mass conservation at each node
variable is selected -> one of the diagonal arcs must be selected
variable is not selected -> one of the horizontal arcs must be selected
bottom left node has one outgoing arc
top right node has one incoming arc

Clipping tensor data to a bounding volume

I have 2 questions about TensorFlow 2.0, with the focus on how TensorFlow handles combined conditional tests in its operations graph.
The task: cut up a volume of data points into blocks, and store the indices to the samples that belong to the volume (not the samples themselves).
My initial approach: loop over all elements and collect the indices of the data points that are inside the 'bounding volume'. This was pretty slow, no matter how I reordered the comparisons on the coordinates.
# X.shape == [elements,features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    i = 0
    # start from an empty index tensor
    indices = tf.zeros([0], dtype = tf.int32)
    for x in X:
        if (x[0] > xmin[0]):
            if (x[1] > xmin[1]):
                if (x[2] <= xmax[2]):
                    # ...and so on...
                    indices = tf.concat([indices, [i]], axis = 0)
        i = i + 1
    return indices
I then came up with the idea to produce boolean tensors and logically 'and' them to get the indices of the elements I need. A whole lot faster, as shown in the next sample:
# X.shape == [elements,features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    # example of 3 different conditions to clip to (a part of) the bounding volume
    # X is the data and xmin and xmax are tensors containing the bounding volume
    c0 = (X[:,0] > xmin[0])
    c1 = (X[:,1] > xmin[1]) # processing all elements
    c2 = (X[:,2] <= xmax[2]) # idem
    # ... there could be many more conditions, you get the idea..
    indices = tf.where(tf.math.logical_and(c0, tf.math.logical_and(c1, c2)))
    return indices
# ...
indices = getIndices(X, xmin, xmax)
trimmedX = tf.gather(X, indices)
This code produces the correct result, but I wonder if it is optimal.
The first question is about scheduling:
Will the TensorFlow graph that holds the operations cull (blocks of)
conditional tests if it knows that some (blocks of) elements have already
tested False? Because logical_and combines the conditions, no
subsequent test on these elements can ever change the result to
True.
Indeed, in the above example c1 and c2 test elements that c0 may already have excluded from the set. Especially when you have a high number of elements to test, this could be a waste of time, even on parallel hardware platforms.
So, what if we cascade the tests based on the results of a previous test? Although it seems like a solved problem, this solution is incorrect, because the final indices tensor will refer to a subset _X, not to the total set X:
# X.shape == [elements,features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    c0 = (X[:,0] > xmin[0])
    indices = tf.where(c0)
    _X = tf.gather(X, indices)
    c1 = (_X[:,1] > xmin[1]) # processing only trimmed elements
    indices = tf.where(c1)
    _X = tf.gather(_X, indices)
    c2 = (_X[:,2] <= xmax[2]) # idem
    indices = tf.where(c2)
    return indices
# ...
indices = getIndices(X, xmin, xmax)
trimmedX = tf.gather(X, indices) # fails: indices refer to a trimmed subset, not X
I could of course 'solve' this by simply expanding X, so that each element also includes the index of itself in the original list, and then proceed as before.
So my second question is about functionality:
Does tf have a method to make the GPU/tensor infrastructure provide
the bookkeeping without spending memory / time on this seemingly
simple problem?
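This returns the indices of all entries that are larger than minimum and smaller than maximum, where both bounds have the same number of features as x. Each row of the result is a (row, feature) pair, so the test is applied per entry rather than per data point: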
import tensorflow as tf
minimum = tf.random.uniform((1, 5), 0., 0.5)
maximum = tf.random.uniform((1, 5), 0.5, 1.)
x = tf.random.uniform((10, 5))
indices = tf.where(
    tf.logical_and(
        tf.greater(x, minimum),
        tf.less(x, maximum)
    )
)
<tf.Tensor: shape=(22, 2), dtype=int64, numpy=
array([[0, 3],
[0, 4],
[1, 1],
[1, 2],
[1, 3],
[1, 4],
[3, 1],
[3, 3],
[3, 4],
[4, 0],
[4, 4],
[5, 3],
[6, 2],
[6, 3],
[7, 1],
[7, 4],
[8, 2],
[8, 3],
[8, 4],
[9, 1],
[9, 3],
[9, 4]], dtype=int64)>
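On the bookkeeping question, one way to cascade the tests and still end up with indices into the original X is to carry an index vector through every step and filter that vector, instead of calling where on each trimmed subset. The sketch below illustrates the idea in NumPy rather than TensorFlow. The function name cascaded_indices and the sample data are made up. The same pattern should carry over to TensorFlow with tf.range, tf.gather and tf.boolean_mask, but that translation is an assumption and is not shown or tested here.

```python
import numpy as np


def cascaded_indices(X, xmin, xmax):
    # idx always holds row numbers of the ORIGINAL X that passed every test so far
    idx = np.arange(X.shape[0])
    idx = idx[X[idx, 0] > xmin[0]]   # test all rows
    idx = idx[X[idx, 1] > xmin[1]]   # test only the surviving rows
    idx = idx[X[idx, 2] <= xmax[2]]  # idem
    return idx


X = np.array([[1.0, 1.0, 1.0],
              [0.0, 5.0, 5.0],
              [2.0, 2.0, 9.0],
              [3.0, 0.0, 1.0],
              [4.0, 4.0, 3.0]])
xmin = np.array([0.5, 0.5, 0.5])
xmax = np.array([9.0, 9.0, 3.0])
indices = cascaded_indices(X, xmin, xmax)
trimmedX = X[indices]  # valid, because indices refer to X itself
```

Whether this beats the single combined mask depends on how many elements each test removes. Each step here adds a gather, so measure it on your data.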

Indexing from an ndimensional array - numpy/ python

I am working with matrices of (x,y,z) dimensions, and would like to index numerous values from this matrix simultaneously.
i.e. if A[0,0,0] = 5
and A[1,1,1] = 10,
then I want something like A[[1,1,1], [5,5,5]] to give [5, 10].
However, indexing like this seems to return huge chunks of the matrix.
Does anyone know how I can accomplish this? I have a large array of indices (n, x, y, z) that I need to use to index from A.
Thanks
You are trying to use 1 as the first index 3 times and 5 as the index into the second dimension (again three times). This will give you the element at A[1,5,:] repeated three times.
A = np.random.rand(6,6,6)
B = A[[1,1,1], [5,5,5]]
# [[ 0.17135991, 0.80554887, 0.38614418, 0.55439258, 0.66504806, 0.33300839],
# [ 0.17135991, 0.80554887, 0.38614418, 0.55439258, 0.66504806, 0.33300839],
# [ 0.17135991, 0.80554887, 0.38614418, 0.55439258, 0.66504806, 0.33300839]]
B.shape
# (3, 6)
Instead, you will want to specify [1,5] for each axis of your matrix, which selects the two elements A[1,1,1] and A[5,5,5]:
A[[1,5], [1,5], [1,5]]  # -> [A[1,1,1], A[5,5,5]]
Advanced indexing works like this:
A[I, J, K][n] == A[I[n], J[n], K[n]]
with A, I, J, and K all arrays. That's not the full, general rule, but it's what the rules simplify down to for what you need.
For example, if you want output[0] == A[0, 0, 0] and output[1] == A[1, 1, 1], then your I, J, and K arrays should look like np.array([0, 1]). Lists also work:
A[[0, 1], [0, 1], [0, 1]]
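For the "large array of indices" case, here is a minimal sketch that assumes the indices are stored as an (n, 3) integer array with one (i, j, k) triple per row. That layout is an assumption, since the question does not show it. Splitting the columns gives the I, J and K arrays described above.

```python
import numpy as np

A = np.arange(6 * 6 * 6).reshape(6, 6, 6)  # A[i, j, k] == 36*i + 6*j + k

# One (i, j, k) triple per row
idx = np.array([[0, 0, 0],
                [1, 1, 1],
                [1, 5, 2]])

# One index array per axis: picks A[0,0,0], A[1,1,1] and A[1,5,2]
values = A[idx[:, 0], idx[:, 1], idx[:, 2]]

# Equivalent shorthand: a tuple holding the three column arrays
same_values = A[tuple(idx.T)]
print(values)  # [ 0 43 68]
```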
