Merging arrays and plots - python

Let's say I have 2 arrays like these:
x1 = [ 1.2, 1.8, 2.3, 4.5, 20.0]
y1 = [10.3, 11.8, 12.3, 11.5, 11.5]
and two others that represent the same function, sampled at different x values:
x2 = [ 0.2, 1.8, 5.3, 15.5, 17.2, 18.3, 20.0]
y2 = [10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0]
Is there a way in numpy to merge x1 and x2 and, following that merge, also combine the related y values without explicitly looping over the arrays (for example, averaging y or taking its maximum for that interval)?

I don't know if you can find something in numpy, but here is a solution using pandas instead. (Pandas is using numpy behind the scenes, so there isn't so much data conversion.)
import numpy as np
import pandas as pd
x1 = np.asarray([ 1.2, 1.8, 2.3, 4.5, 20.0])
y1 = np.asarray([10.3, 11.8, 12.3, 11.5, 11.5])
x2 = np.asarray([ 0.2, 1.8, 5.3, 15.5, 17.2, 18.3, 20.0])
y2 = np.asarray([10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0])
c1 = pd.DataFrame({'x': x1, 'y': y1})
c2 = pd.DataFrame({'x': x2, 'y': y2})
c = pd.concat([c1, c2]).groupby('x').mean().reset_index()
x = c['x'].values
y = c['y'].values
# Result:
x = array([ 0.2, 1.2, 1.8, 2.3, 4.5, 5.3, 15.5, 17.2, 18.3, 20. ])
y = array([10.3 , 10.3, 11.8, 12.3, 11.5, 12.3, 12.5, 15.2, 10.3, 10.75])
Here I concatenate the two frames and do a groupby operation to collect the rows with equal values of 'x'. For each of these "groups" I then take the mean(). reset_index() then moves the index 'x' back to a column. To get the result back as a numpy array I use .values. (Use to_numpy() for pandas version 0.24.0 and higher.)

How about using numpy.hstack followed by sorting using numpy.sort ?
In [101]: x1_arr = np.array(x1)
In [102]: x2_arr = np.array(x2)
In [103]: y1_arr = np.array(y1)
In [104]: y2_arr = np.array(y2)
In [111]: np.sort(np.hstack((x1_arr, x2_arr)))
Out[111]:
array([ 0.2, 1.2, 1.8, 1.8, 2.3, 4.5, 5.3, 15.5, 17.2, 18.3, 20. ,
20. ])
In [112]: np.sort(np.hstack((y1_arr, y2_arr)))
Out[112]:
array([10. , 10.3, 10.3, 10.3, 11.5, 11.5, 11.8, 11.8, 12.3, 12.3, 12.5,
15.2])
If you want to get rid of the duplicates, you can apply numpy.unique on top of the above results.
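Note that sorting x and y independently, as above, loses the pairing between each x and its y. A minimal NumPy-only sketch that keeps the pairs together is shown below; it assumes the goal is one y per distinct x, taken either as the mean or as the maximum of the duplicates. The names x_all, y_all, y_mean and y_max are only illustrative.

```python
import numpy as np

x1 = np.array([1.2, 1.8, 2.3, 4.5, 20.0])
y1 = np.array([10.3, 11.8, 12.3, 11.5, 11.5])
x2 = np.array([0.2, 1.8, 5.3, 15.5, 17.2, 18.3, 20.0])
y2 = np.array([10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0])

# Concatenate both samplings, keeping every (x, y) pair aligned by position.
x_all = np.concatenate((x1, x2))
y_all = np.concatenate((y1, y2))

# x holds the sorted distinct values; inverse maps each original element
# to its group in x; counts is the size of each group.
x, inverse, counts = np.unique(x_all, return_inverse=True, return_counts=True)

# Mean of y per distinct x: sum the y values of each group, divide by its size.
y_mean = np.bincount(inverse, weights=y_all) / counts

# Maximum of y per distinct x, using an unbuffered in-place maximum.
y_max = np.full(x.shape, -np.inf)
np.maximum.at(y_max, inverse, y_all)

print(x)
print(y_mean)
print(y_max)
```

With the data from the question, the two results differ only at x = 20.0, where y_mean averages 11.5 and 10.0 while y_max keeps 11.5.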

I'd propose a solution based on the accepted answer of this question:
import numpy as np
import matplotlib.pyplot as plt
x1 = [1.2, 1.8, 2.3, 4.5, 20.0]
y1 = [10.3, 11.8, 12.3, 11.5, 11.5]
x2 = [0.2, 1.8, 5.3, 15.5, 17.2, 18.3, 20.0]
y2 = [10.3, 11.8, 12.3, 12.5, 15.2, 10.3, 10.0]
# create a merged and sorted x array
x = np.concatenate((x1, x2))
ids = x.argsort(kind='mergesort')
x = x[ids]
# find unique values
flag = np.ones_like(x, dtype=bool)
np.not_equal(x[1:], x[:-1], out=flag[1:])
# discard duplicated values
x = x[flag]
# merge, sort and select values for y
y = np.concatenate((y1, y2))[ids][flag]
plt.plot(x, y, marker='s', color='b', ls='-.')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
This is the result:
x = [ 0.2 1.2 1.8 2.3 4.5 5.3 15.5 17.2 18.3 20. ]
y = [10.3 10.3 11.8 12.3 11.5 12.3 12.5 15.2 10.3 11.5]
As you can see, this code keeps only one value of y when several are available for the same x (since the sort is stable, it is the first one in concatenation order, here the value from y1), which makes the code faster.
Bonus solution: the following is based on a loop and mostly standard Python functions and objects (not numpy), so I know it does not meet your requirement; still, it is very concise and elegant and it handles multiple values of y, so I decided to include it here as a plus:
x = sorted(set(x1 + x2))
y = np.nanmean([[d.get(i, np.nan) for i in x]
for d in map(lambda a: dict(zip(*a)), ((x1, y1), (x2, y2)))], axis=0)
In this case, you get the following results:
x = [0.2, 1.2, 1.8, 2.3, 4.5, 5.3, 15.5, 17.2, 18.3, 20.0]
y = [10.3 10.3 11.8 12.3 11.5 12.3 12.5 15.2 10.3 10.75]

Related

How to sum up elements in list in a moving range

I'm trying to sum up elements of a list over a moving range. For instance, when the user inputs a custom range 'n', list[0] to list[n] are added up and stored in a new list, followed by list[1] to list[n+1], and so on until the end. Finally, the maximum number in the new list is printed. However, in my code the elements seem to keep summing up continuously.
Thanks a lot for your help.
The list is:
[5.8, 1.2, 5.8, 1.0, 6.9, 0.8, 6.0, 18.4, 18.6, 1.0, 0.8, 6.4, 12.2, 18.2, 1.4, 6.8, 41.8, 3.6, 5.2, 5.2, 4.6, 8.6, 16.6, 13.2, 9.6, 41.6, 37.2, 110.0, 30.0, 34.8, 24.6, 7.0, 13.4, 0.5, 37.0, 18.8, 20.4, 0.6, 6.4, 2.4, 1.0, 7.6, 6.6, 4.4, 2.4, 0.6, 3.2, 21.2, 28.2, 3.2, 2.4, 14.4, 0.6, 1.6, 4.4, 0.8, 0.6, 1.6, 1.0, 27.0, 52.6, 10.2, 1.0, 4.2]
My code:
days = int(input('Enter customized range: '))
n = np.arange(days)
total = 0
count = 1
max_total = []
while (count + len(n) - 2) <= (len(rain_b) - 1):
    for i in range(count+len(n)-4, count+len(n)-2):
        total += rain_c[i]
    #print(rain_b[count+number-1])
    #total = sum([(rain_c(count+number-4)) : (count+number-2)])
    max_total.append(total)
    count += 1
print(max_total)
Since you're already using numpy, you can use np.convolve() with an array of ones with length n:
>>> n = 5
>>> x = np.arange(10)
>>> np.max(np.convolve(x, np.ones(n, dtype=x.dtype), mode="valid"))
35
This has the effect of performing the dot product of np.ones(n) with each n-element "window" of the array x. The sliding_window_view() from numpy.lib.stride_tricks is analogous and helps explain:
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> windows = np.lib.stride_tricks.sliding_window_view(x, n)
>>> windows
array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8],
[5, 6, 7, 8, 9]])
>>> windows.sum(axis=1)
array([10, 15, 20, 25, 30, 35])
>>> np.convolve(x, np.ones(n, dtype=x.dtype), mode="valid")
array([10, 15, 20, 25, 30, 35])
Try this (lst being your list and n being your range):
print(max(sum(lst[i:i+n+1]) for i in range(len(lst)-n)))
So for example:
>>> lst = [5.8, 1.2, 5.8, 1.0, 6.9, 0.8, 6.0, 18.4]
>>> n = 5
>>> print([sum(lst[i:i+n+1]) for i in range(len(lst)-n)])
[21.5, 21.7, 38.9]
>>> print(max(sum(lst[i:i+n+1]) for i in range(len(lst)-n)))
38.9
I would clean up your loop conditionals to be more clear and idiomatic.
I believe the problem is that you're not zeroing total out between iterations.
What are rain_b and rain_c? There should only be 1 input list and 1 output list.
Why not store n as an integer instead of some object? I don't have numpy on my pc so I just removed that part.
Here's pseudocode of how I would do this:
for x in range 0 up to len(input_list) - n:
    window_total = 0
    for y in range x to x+n-1:
        window_total += input_list[y]
    output_list.append(window_total)
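The pseudocode above can be sketched in plain Python as follows. This is only one reading of it: it assumes a window holds exactly n consecutive elements, and the function name rolling_sums is made up for illustration.

```python
def rolling_sums(input_list, n):
    """Return the sum of every window of n consecutive elements."""
    output_list = []
    # The last valid window starts at index len(input_list) - n.
    for x in range(len(input_list) - n + 1):
        window_total = 0  # reset for every window, unlike the original loop
        for y in range(x, x + n):
            window_total += input_list[y]
        output_list.append(window_total)
    return output_list


sums = rolling_sums([1, 2, 3, 4, 5], 3)
print(sums)       # [6, 9, 12]
print(max(sums))  # 12
```

Resetting window_total inside the outer loop is the step the original code was missing.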
Based on an iterator/array containing cumulative sum of numbers, you can get the rolling sum of n values by subtracting the cumulative values that are n positions behind. This approach has an O(N) time complexity (as opposed to computing the sum of every subrange, which is O(N x W) where W is the rolling window size)
Without numpy:
L = [5.8, 1.2, 5.8, 1.0, 6.9, 0.8, 6.0, 18.4, 18.6, 1.0, 0.8, 6.4, 12.2, 18.2, 1.4, 6.8, 41.8, 3.6, 5.2, 5.2, 4.6, 8.6, 16.6, 13.2, 9.6, 41.6, 37.2, 110.0, 30.0, 34.8, 24.6, 7.0, 13.4, 0.5, 37.0, 18.8, 20.4, 0.6, 6.4, 2.4, 1.0, 7.6, 6.6, 4.4, 2.4, 0.6, 3.2, 21.2, 28.2, 3.2, 2.4, 14.4, 0.6, 1.6, 4.4, 0.8, 0.6, 1.6, 1.0, 27.0, 52.6, 10.2, 1.0, 4.2]
n = 3
from itertools import accumulate
S = (a-b for a,b in zip(accumulate(L),accumulate([0]*n+L)))
print(max(S)) # 188.8
Using numpy
import numpy as np
L = np.array([5.8, 1.2, 5.8, 1.0, 6.9, 0.8, 6.0, 18.4, 18.6, 1.0, 0.8, 6.4, 12.2, 18.2, 1.4, 6.8, 41.8, 3.6, 5.2, 5.2, 4.6, 8.6, 16.6, 13.2, 9.6, 41.6, 37.2, 110.0, 30.0, 34.8, 24.6, 7.0, 13.4, 0.5, 37.0, 18.8, 20.4, 0.6, 6.4, 2.4, 1.0, 7.6, 6.6, 4.4, 2.4, 0.6, 3.2, 21.2, 28.2, 3.2, 2.4, 14.4, 0.6, 1.6, 4.4, 0.8, 0.6, 1.6, 1.0, 27.0, 52.6, 10.2, 1.0, 4.2])
n = 3
S = np.cumsum(L)
S[n:] -= S[:-n]
print(np.max(S)) # 188.8

How to combine these two numpy arrays?

How would I combine these two arrays:
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
Into something like this:
xy = [[0.1, [1.0, 1.1, 1.2, 1.3]], [0.2, [2.0, 2.1, 2.2, 2.3]...
Thank you for the assistance!
Someone suggested I post the code I have tried, and I realized I had forgotten to:
xy = np.array(list(zip(x, y)))
This is my current solution, however it is extremely inefficient.
You can use zip to combine
[[a,b] for a,b in zip(y,x)]
Out:
[[array([0.1]), array([1. , 1.1, 1.2, 1.3])],
[array([0.2]), array([2. , 2.1, 2.2, 2.3])],
[array([0.3]), array([3. , 3.1, 3.2, 3.3])],
[array([0.4]), array([4. , 4.1, 4.2, 4.3])],
[array([0.5]), array([5. , 5.1, 5.2, 5.3])]]
A pure numpy solution will be much faster than a list comprehension for large arrays.
I do have to say that your use case makes no sense to me, as there is no logic in putting these arrays into a single data structure, and I believe you should re-check your design.
As @user2357112 supports Monica was subtly implying, this is very likely an XY problem. Check whether this is really what you are trying to solve, and not something else. If you want something else, try asking about that.
I strongly suggest checking what you want to do before moving on, or you will end up with a bad design.
That aside, here's a solution
import numpy as np
x = np.asarray([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3], [3.0, 3.1, 3.2, 3.3],
[4.0, 4.1, 4.2, 4.3], [5.0, 5.1, 5.2, 5.3]])
y = np.asarray([[0.1], [0.2], [0.3], [0.4], [0.5]])
xy = np.hstack([y, x])
print(xy)
prints
[[0.1 1. 1.1 1.2 1.3]
[0.2 2. 2.1 2.2 2.3]
[0.3 3. 3.1 3.2 3.3]
[0.4 4. 4.1 4.2 4.3]
[0.5 5. 5.1 5.2 5.3]]

Numpy - custom sort of rows and columns in array

Can I sort the rows or columns of an array according to values stored in a separate list?
For example:
row_keys = [10, 11, 5, 6]
z = np.array([[2.77, 11., 4.1, 7.2],
[3.7, 2.2, 1.1, 0.5],
[2.5, 3.5, 5.0, 9.0],
[4.3, 2.2, 5.1, 6.1]])
Should produce something like
array([[ 2.5, 3.5, 5. , 9. ],
[ 4.3, 2.2, 5.1, 6.1],
[ 2.77, 11. , 4.1, 7.2],
[ 3.7, 2.2, 1.1, 0.5]])
And similar functionality applied to the columns, please.
Another way for rows
z_rows = z[np.argsort(row_keys)]
and for columns
z_columns = z.T[np.argsort(row_keys)].T
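As a hedged sketch of the same idea with separate keys per axis: col_keys below is a hypothetical list that is not in the question, and np.ix_ is used to apply both orderings at once.

```python
import numpy as np

row_keys = [10, 11, 5, 6]
col_keys = [3, 1, 2, 0]  # hypothetical column keys, for illustration only
z = np.array([[2.77, 11., 4.1, 7.2],
              [3.7, 2.2, 1.1, 0.5],
              [2.5, 3.5, 5.0, 9.0],
              [4.3, 2.2, 5.1, 6.1]])

row_order = np.argsort(row_keys)  # indices that sort the row keys
col_order = np.argsort(col_keys)  # indices that sort the column keys

z_rows = z[row_order]             # reorder rows only
z_cols = z[:, col_order]          # reorder columns only, no transpose needed
z_both = z[np.ix_(row_order, col_order)]  # reorder rows and columns together

print(z_rows)
print(z_cols)
print(z_both)
```

Indexing with z[:, col_order] gives the same result as the double transpose, and it reads more directly.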

Creating a numpy array of 3D coordinates from three 1D arrays, first index changing fastest

similar to the question here
I have three arbitrary 1D arrays, for example:
x_p = np.array((0.0,1.1, 2.2, 3.3, 4.4))
y_p = np.array((5.5,6.6,7.7))
z_p = np.array((8.8, 9.9))
I need
points = np.array([[0.0, 5.5, 8.8],
[1.1, 5.5, 8.8],
[2.2, 5.5, 8.8],
...
[4.4, 7.7, 9.9]])
1) with the first index changing fastest.
2) points are float coordinates, not integer indices.
3) I noticed that from version 1.7.0 numpy.meshgrid has changed behavior, with default indexing='xy', and I need to use
np.vstack(np.meshgrid(x_p,y_p,z_p,indexing='ij')).reshape(3,-1).T
to get the result points with the last index changing fastest, which is not what I want. (It was mentioned that meshgrid supports dimension > 2 only from 1.7.0; I didn't check.)
I found this with some trial and error.
I think the ij v xy indexing has been in meshgrid forever (it's the sparse parameter that's newer). It just affects the order of the 3 returned elements.
To get x_p varying fastest I put it last in the argument list, and then used a ::-1 to reverse column order at the end.
I used stack to join the arrays on a new axis at the end, so I don't need to transpose. But the reshapes and transposes are all cheap (time-wise), so they can be used in any combination that works and is understandable.
In [100]: np.stack(np.meshgrid(z_p, y_p, x_p, indexing='ij'),3).reshape(-1,3)[:,::-1]
Out[100]:
array([[ 0. , 5.5, 8.8],
[ 1.1, 5.5, 8.8],
[ 2.2, 5.5, 8.8],
[ 3.3, 5.5, 8.8],
[ 4.4, 5.5, 8.8],
[ 0. , 6.6, 8.8],
...
[ 2.2, 7.7, 9.9],
[ 3.3, 7.7, 9.9],
[ 4.4, 7.7, 9.9]])
You might permute axes with np.transpose to achieve the output in that desired format -
np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3,1,2,0).reshape(-1,3)
Sample output -
In [104]: np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3,1,2,0).reshape(-1,3)
Out[104]:
array([[ 0. , 5.5, 8.8],
[ 1.1, 5.5, 8.8],
[ 2.2, 5.5, 8.8],
[ 3.3, 5.5, 8.8],
[ 4.4, 5.5, 8.8],
[ 0. , 6.6, 8.8],
[ 1.1, 6.6, 8.8],
[ 2.2, 6.6, 8.8],
[ 3.3, 6.6, 8.8],
[ 4.4, 6.6, 8.8],
[ 0. , 7.7, 8.8],
[ 1.1, 7.7, 8.8],
....
[ 3.3, 7.7, 9.9],
[ 4.4, 7.7, 9.9]])
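To check that the two one-liners above give the same ordering, here is a small sketch that compares both against an explicit nested loop. The names via_stack, via_transpose and reference are only illustrative.

```python
import numpy as np

x_p = np.array((0.0, 1.1, 2.2, 3.3, 4.4))
y_p = np.array((5.5, 6.6, 7.7))
z_p = np.array((8.8, 9.9))

# Approach 1: put x last so it varies fastest, then reverse the column order.
via_stack = np.stack(np.meshgrid(z_p, y_p, x_p, indexing='ij'), 3).reshape(-1, 3)[:, ::-1]

# Approach 2: default 'xy' indexing, then permute axes so x varies fastest.
via_transpose = np.array(np.meshgrid(x_p, y_p, z_p)).transpose(3, 1, 2, 0).reshape(-1, 3)

# Reference: plain loops with x innermost, i.e. the first coordinate changes fastest.
reference = np.array([[x, y, z] for z in z_p for y in y_p for x in x_p])

print(np.array_equal(via_stack, reference))      # True
print(np.array_equal(via_transpose, reference))  # True
```

The loop version is slow for large grids, but it makes the intended ordering explicit and is handy as a test.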

matplotlib: grouping error bars for each x-axes tick

I am trying to use matplotlib to plot error bars but have slightly different requirements. So, the setup is as follows:
I have 3 different methods that I am comparing across 10 different parameter settings. So, on the y-axis I have the model fitting errors as given by the 3 methods and on the x-axis I have the different parameter settings.
So, for each parameter setting, I would like to get 3 error bar plots corresponding to the three methods. Ideally, I would like to plot the 95% confidence interval and also the minimum and maximum for each method for each parameter setting.
Some example data can be simulated as:
parameters = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
mean_1 = [10.1, 12.1, 13.6, 14.5, 18.8, 11.8, 28.5]
std_1 = [2.6, 5.7, 4.3, 8.5, 11.8, 5.3, 2.5]
mean_2 = [10.1, 12.1, 13.6, 14.5, 18.8, 11.8, 28.5]
std_2 = [2.6, 5.7, 4.3, 8.5, 11.8, 5.3, 2.5]
mean_3 = [10.1, 12.1, 13.6, 14.5, 18.8, 11.8, 28.5]
std_3 = [2.6, 5.7, 4.3, 8.5, 11.8, 5.3, 2.5]
I have kept the values same as it does not change anything from the plotting point of view. I see matplotlib.errorbar method but I do not know how to extend it for multiple methods over one single x-axes value as I have in my case. Additionally, I am not sure how to add the [min, max] markers for each of the methods.
Taking your parameters list as x axis, mean_1 as y value and std_1 as errors you can plot an errorbar chart with
pylab.errorbar(parameters, mean_1, yerr=std_1, fmt='bo')
In case the error bars are not symmetric, i.e. you have lower_err and upper_err, the statement reads
pylab.errorbar(parameters, mean_1, yerr=[lower_err, upper_err], fmt='bo')
The same works with keyword xerr for errors in x direction, which is now hopefully self-explanatory.
To show several (in your case 3) different datasets, you can go the following way:
# import pylab and numpy
import numpy as np
import pylab as pl
# define datasets
parameters = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
mean_1 = [10.1, 12.1, 13.6, 14.5, 18.8, 11.8, 28.5]
std_1 = [2.6, 5.7, 4.3, 8.5, 11.8, 5.3, 2.5]
mean_2 = [10.1, 12.1, 13.6, 14.5, 18.8, 11.8, 28.5]
std_2 = [2.6, 5.7, 4.3, 8.5, 11.8, 5.3, 2.5]
mean_3 = [10.1, 12.1, 13.6, 14.5, 18.8, 11.8, 28.5]
std_3 = [2.6, 5.7, 4.3, 8.5, 11.8, 5.3, 2.5]
# here comes the plotting;
# to achieve a grouping, two things are extra here:
# 1. Don't use line plot but circular markers and different marker color
# 2. slightly displace the datasets in x direction to avoid overlap
# and create visual grouping
pl.errorbar(np.array(parameters)-0.01, mean_1, yerr=std_1, fmt='bo')
pl.errorbar(parameters, mean_2, yerr=std_2, fmt='go')
pl.errorbar(np.array(parameters)+0.01, mean_3, yerr=std_3, fmt='ro')
pl.show()
This is about pylab.errorbar, where you have to give the errors explicitly. An alternative approach is to use pylab.boxplot and produce a boxplot for each model, but for that I guess you would need the full distribution per model per parameter instead of just the mean and std.
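For the 95% confidence interval and the [min, max] markers asked about in the question, a rough sketch could look like the one below. Several things here are assumptions and not from the question: the sample size n_samples, the normal approximation 1.96 * std / sqrt(n) for the interval, and the made-up mins and maxs arrays, which you would replace with the real extremes of your runs.

```python
import numpy as np
import matplotlib.pyplot as plt

parameters = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
# One row per method; the values are identical, as in the question.
means = np.array([[10.1, 12.1, 13.6, 14.5, 18.8, 11.8, 28.5]] * 3)
stds = np.array([[2.6, 5.7, 4.3, 8.5, 11.8, 5.3, 2.5]] * 3)

n_samples = 30  # hypothetical number of runs behind each mean
# Half-width of a 95% confidence interval of the mean (normal approximation).
ci95 = 1.96 * stds / np.sqrt(n_samples)

# Placeholder extremes, invented for illustration only.
mins = means - 2 * stds
maxs = means + 2 * stds

# Small x offsets so the three methods sit side by side at each tick.
offsets = np.linspace(-0.02, 0.02, len(means))

fig, ax = plt.subplots()
for k, (offset, color) in enumerate(zip(offsets, "bgr")):
    x = parameters + offset
    ax.errorbar(x, means[k], yerr=ci95[k], fmt=color + "o",
                capsize=3, label=f"method {k + 1}")
    # Horizontal dashes mark the minimum and maximum of each method.
    ax.scatter(x, mins[k], marker="_", color=color)
    ax.scatter(x, maxs[k], marker="_", color=color)

ax.set_xticks(parameters)
ax.set_xlabel("parameter setting")
ax.set_ylabel("model fitting error")
ax.legend()
fig.savefig("grouped_errorbars.png")
```

Keeping the means and errors in 2D arrays, one row per method, avoids repeating the errorbar call by hand for every method.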
