Boolean sum with numpy finding matching pairs

I am trying to find mirror images in a numpy array. In particular, (x,y) == (y,x), but I want to rule out tuples with identical values (x,x).
I am given a numpy array pckLst with shape (198L, 3L) containing floats.
I have the following code:
np.sum([x==pckLst[:,2] for x in pckLst[:,1]])
which returns a given number, let's say 73.
np.sum([x==pckLst[:,2] for x in pckLst[:,1]] and [x==pckLst[:,1] for x in pckLst[:,1]])
returns a larger number, let's say 266.
Can someone please explain how this comes about?
I thought the first line returns True when, seen as tuples, (x,y) == (any,y), and the second line returns True only when (x,y) == (y,x).
Is this correct?
EDIT:
Further explanation:
pckLst=[[ 112.066, 6.946, 6.938],
[ 111.979, 6.882, 7.634],
[ 112.014, 6.879, 7.587],
[ 112.005, 6.887, 7.554],
[ 111.995, 6.88, 6.88 ],
[ 112.048, 6.774, 6.88 ],
[ 111.808, 7.791, 7.566],
[ 111.802, 6.88, 6.774]]
Now I would like to find [ 112.048, 6.774, 6.88 ], since (6.88, 6.774) == (6.774, 6.88). However, [ 111.995, 6.88, 6.88 ] should not be considered a match.

Rather than commenting on your code, here is a simpler implementation:
a=np.array([[1,1,10],[1,2,20],[2,1,30],[1,3,40],[2,3,50]])
xy= a[:,:2].tolist()
[[x,y,z] for [x,y,z] in a if [y,x] in xy and x!=y]
[[1, 2, 20], [2, 1, 30]]
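For larger arrays, the same check can be vectorized with broadcasting. The following is a minimal sketch rather than the answerer's code: it reuses the sample data from the question, the names x, y, mirror and has_mirror are made up for illustration, and it relies on exact float equality just as the question's own comparison does.

```python
import numpy as np

pckLst = np.array([[112.066, 6.946, 6.938],
                   [111.979, 6.882, 7.634],
                   [112.014, 6.879, 7.587],
                   [112.005, 6.887, 7.554],
                   [111.995, 6.88, 6.88],
                   [112.048, 6.774, 6.88],
                   [111.808, 7.791, 7.566],
                   [111.802, 6.88, 6.774]])

x = pckLst[:, 1]
y = pckLst[:, 2]

# mirror[i, j] is True when row i's (x, y) equals row j's (y, x)
mirror = (x[:, None] == y[None, :]) & (y[:, None] == x[None, :])

# rule out rows whose two values are identical, such as (6.88, 6.88)
mirror &= (x != y)[:, None]

# a row qualifies if it has at least one mirrored partner
has_mirror = mirror.any(axis=1)
print(pckLst[has_mirror])
```

The intermediate mirror array has one entry per pair of rows, so for very long arrays the set-based approach may be the more memory-friendly choice.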

The arguments to "and" in your example are Python lists. The truth value of a list is True if it is not empty, so "a and b" with a non-empty list a simply evaluates to b. That's why you get a bigger sum in the latter case: it only counts the matches from the second comprehension.
This will return the number of elements with (x,y) == (y,x). It obviously only works if you're just interested in the count and not in particular indices:
import numpy
pckLst = numpy.array([[ 112.066, 6.946, 6.938],
[ 111.979, 6.882, 7.634],
[ 112.014, 6.879, 7.587],
[ 112.005, 6.887, 7.554],
[ 111.995, 6.88, 6.88 ],
[ 112.048, 6.774, 6.88 ],
[ 111.808, 7.791, 7.566],
[ 111.802, 6.88, 6.774]])
coords = pckLst[:,1:]
# boolean mask of the rows whose two coordinates differ
unequal_ids = numpy.ravel(coords[:,:1] != coords[:,1:])
unequal_coords = coords[unequal_ids]
flipped = numpy.fliplr(unequal_coords)
coords_tuple_set = set(tuple(map(tuple, unequal_coords)))
flipped_tuple_set = set(tuple(map(tuple, flipped)))
print(coords_tuple_set)
print(flipped_tuple_set)
# need to divide by two, because the intersection contains both (x,y) and (y,x)
print("number of mirrored points:", len(coords_tuple_set.intersection(flipped_tuple_set)) // 2)
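To see the behaviour of "and" on lists in isolation, here is a small sketch with two made-up lists of boolean arrays (first and second stand in for the two comprehensions in the question; they are not the original data).

```python
import numpy as np

# Two lists of boolean arrays, standing in for the two comprehensions
first = [np.array([True, False]), np.array([False, False])]
second = [np.array([True, True]), np.array([True, True])]

# "and" looks only at the truthiness of the first list as a whole.
# A non-empty list is truthy, so the expression evaluates to the second list.
combined = first and second
print(combined is second)          # True
print(np.sum(first and second))    # 4, the sum of `second` alone

# An element-wise AND needs np.logical_and (or the & operator on arrays)
elementwise = np.logical_and(first, second)
print(np.sum(elementwise))         # 1
```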

Related

How to select in a numpy array all pairs with a defined index difference?

Let's say that I have this numpy array:
import numpy as np
np.random.seed(0)
data = np.random.normal(size=(5,5))
which results in:
I would like to select all pairs with a specific index distance along each row.
For example, if I choose an index distance of 4 along each row, I expect to have:
res[0,0]=1.76,res[0,1]=2.24
res[1,0]=0.40,res[1,1]=1.86
res[2,0]=-0.97,res[2,1]=-0.10
res[3,0]=0.95,res[3,1]=0.41
...
....
I know that I could do that with a for loop, but I would like to have something smarter. I was thinking of creating two lists of indexes and then filling res, but this also needs a loop.
Best

hstack
I guess that something along the lines of
win=3 # Size of window. You say 4, but what you describe is 3 in my view. But you know how to add 1 if needed :D
np.hstack((data[:, :data.shape[1]-win].reshape(-1,1), data[:, win:].reshape(-1,1)))
should do
Result is
array([[ 1.76405235, 2.2408932 ],
[ 0.40015721, 1.86755799],
[-0.97727788, -0.10321885],
[ 0.95008842, 0.4105985 ],
[ 0.14404357, 0.12167502],
[ 1.45427351, 0.44386323],
[ 0.33367433, 0.3130677 ],
[ 1.49407907, -0.85409574],
[-2.55298982, -0.74216502],
[ 0.6536186 , 2.26975462]])
Explanation:
data[:,:data.shape[1]-win] is
array([[ 1.76405235, 0.40015721],
[-0.97727788, 0.95008842],
[ 0.14404357, 1.45427351],
[ 0.33367433, 1.49407907],
[-2.55298982, 0.6536186 ]])
So, just the first columns of data. The number of columns, data.shape[1]-win, is the number of pairs that fit in a row given data's width and the window size win.
Likewise, data[:, win:] is
array([[ 2.2408932 , 1.86755799],
[-0.10321885, 0.4105985 ],
[ 0.12167502, 0.44386323],
[ 0.3130677 , -0.85409574],
[-0.74216502, 2.26975462]])
These are, this time, the last columns (the same number of columns), but shifted by win indexes.
.reshape(-1,1) flattens that data vertically, if I may use this "flatten vertically" description. For example, data[:,:data.shape[1]-win].reshape(-1,1) holds the same values, but as 10 rows of 1 column instead of 5 rows of 2 columns:
array([[ 1.76405235],
[ 0.40015721],
[-0.97727788],
[ 0.95008842],
[ 0.14404357],
[ 1.45427351],
[ 0.33367433],
[ 1.49407907],
[-2.55298982],
[ 0.6536186 ]])
hstack puts those two together.
Indexation
Another method, maybe closer to the index lists you were apparently about to create, would be
W=data.shape[1]-win # number of pair per row
iy=np.arange(len(data)*2*W)//W//2
ix=np.array([[i,i+win] for i in range(W)]*len(data)).flatten()
data[iy,ix].reshape(-1,2)
That takes about 2 times longer in terms of CPU time. But it is worth noting that most of the CPU time is spent creating the indexes ix and iy. So if you have many data sets of the same shape, this option could be faster, since you compute ix and iy once and for all.

You can take elements by pairs of indices with numpy.take:
np.take(data, [[0, 3], [1, 4]], axis=1).reshape(data.shape[0] * 2, 2)
array([[ 1.76405235, 2.2408932 ],
[ 0.40015721, 1.86755799],
[-0.97727788, -0.10321885],
[ 0.95008842, 0.4105985 ],
[ 0.14404357, 0.12167502],
[ 1.45427351, 0.44386323],
[ 0.33367433, 0.3130677 ],
[ 1.49407907, -0.85409574],
[-2.55298982, -0.74216502],
[ 0.6536186 , 2.26975462]])
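The hard-coded index pairs [[0, 3], [1, 4]] can be avoided for an arbitrary offset. Below is a minimal sketch under the assumption that 0 < offset < number of columns; pairs_with_offset is a hypothetical helper name, not part of numpy.

```python
import numpy as np

def pairs_with_offset(data, offset):
    """Return an (n, 2) array of pairs (data[i, k], data[i, k + offset])."""
    n_pairs = data.shape[1] - offset      # pairs that fit in one row
    left = data[:, :n_pairs]              # first element of each pair
    right = data[:, offset:]              # second element, shifted by offset
    # stack along a new last axis, then list the pairs row by row
    return np.stack((left, right), axis=-1).reshape(-1, 2)

np.random.seed(0)
data = np.random.normal(size=(5, 5))
res = pairs_with_offset(data, 3)
print(res[:2])
```

With offset 3 on the 5x5 example this selects the same pairs as the np.take call above.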

Create a new array of all values from an array with step

I am new to Python. I would like to create a new array that contains all values from an existing array, split by a given step.
I tried to implement it, but I think there is another way with better performance. Any attempt or recommendation is highly appreciated.
Ex: Currently, I have:
An array: 115,200 values (2D)
Step: 10,000
....
array([[ 0.2735, -0.308 ],
[ 0.287 , -0.3235],
[ 0.2925, -0.324 ],
[ 0.312 , -0.329 ],
[ 0.3275, -0.345 ],
[ 0.3305, -0.352 ],
[ 0.332 , -0.3465],
...
[ 0.3535, -0.353 ],
[ 0.361 , -0.3445],
[ 0.3545, -0.329 ]])
Expectation: a new array obtained by slicing the array above in steps of 10,000.
Below is my code:
for x in ecg_data:
    number_samples_by_duration_exp_temp = 10000
    # len(ecg_property.sample) = 115200
    times = len(ecg_property.sample) / number_samples_by_duration_exp_temp
    index_by_time = [int(y)*number_samples_by_duration_exp_temp for y in np.arange(1, times, 1)]
    list = []
    temp = 0
    for z in index_by_time:
        arr_samples_by_duration = ecg_property.sample[temp:z]
        list.append(arr_samples_by_duration)
        temp = z

A single numpy array cannot hold the result, because len(ecg_property.sample) (115,200) is not evenly divisible by number_samples_by_duration_exp_temp (10,000) and numpy does not allow rows of varying lengths :)
You can try a list comprehension:
result_list = [ecg_property.sample[start:start + step] for start in range(0, len(ecg_property.sample), step)]
where
step = 10000
Using range with integer start positions keeps the slice indices as integers and also includes the shorter last chunk.
It can be further modified as required.
(You can try out each step in above line of code in this answer and see the output to understand each step )
Hope this works out.
ty!
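A runnable sketch of the chunking, using a zero-filled stand-in for ecg_property.sample (the real data is not available here, so samples and its contents are assumptions). It also shows np.split with explicit cut positions, which returns a list of arrays and therefore copes with a shorter last chunk.

```python
import numpy as np

# Stand-in for ecg_property.sample: 115,200 rows of 2 values
samples = np.zeros((115200, 2))
step = 10000

# Plain slicing: the last chunk is simply shorter
chunks = [samples[start:start + step] for start in range(0, len(samples), step)]

# np.split with explicit cut positions gives the same list of arrays
cut_points = np.arange(step, len(samples), step)
chunks_split = np.split(samples, cut_points)

print(len(chunks), chunks[0].shape, chunks[-1].shape)
```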

How to slice an array around its minimum

I am trying to define a function that finds the minimum value of an array and slices it around that value (plus or minus 5 positions). My array looks something like this:
[[ 0. 9.57705087]
[ 0.0433 9.58249315]
[ 0.0866 9.59745942]
[ 0.1299 9.62194967]
[ 0.1732 9.65324278]
[ 0.2165 9.68725702]
[ 0.2598 9.72263184]
[ 0.3031 9.75256437]
[ 0.3464 9.77025178]
[ 0.3897 9.76889121]
[ 0.433 9.74167982]
[ 0.4763 9.68589645]
[ 0.5196 9.59881999]
[ 0.5629 9.48861383]
[ 0.6062 9.3593597 ]]
However, I am dealing with much larger sets and need a function that can do it automatically, without me having to manually find the minimum and then slice the array around it. I want to find the minimum of the array[:,1] values and then apply the slicing to the whole array.

Use np.argmin() to get the index of the minimum value. This will do it using the second column only (you haven't specified if it's the minimum value across columns or not).
your_array[:np.argmin(your_array[:, 1]), :]
To slice it 5 values further than the minimum, use:
your_array[:np.argmin(your_array[:, 1]) + 5, :]

Given your objective array:
import numpy as np
anarray = np.array([[ 0., 9.57705087],
[ 0.0433, 9.58249315],
[ 0.0866, 9.59745942],
[ 0.1299, 9.62194967],
[ 0.1732, 9.65324278],
[ 0.2165, 9.68725702],
[ 0.2598, 9.72263184],
[ 0.3031, 9.75256437],
[ 0.3464, 9.77025178],
[ 0.3897, 9.76889121],
[ 0.433, 9.74167982],
[ 0.4763, 9.68589645],
[ 0.5196, 9.59881999],
[ 0.5629, 9.48861383],
[ 0.6062, 9.3593597]])
This function will do the job:
def slice_by_five(array):
    argmin = np.argmin(array[:,1])
    if argmin < 5:
        return array[:argmin+6,:]
    return array[argmin-5:argmin+6,:]
check = slice_by_five(anarray)
print(check)
Output:
[[0.3897 9.76889121]
[0.433 9.74167982]
[0.4763 9.68589645]
[0.5196 9.59881999]
[0.5629 9.48861383]
[0.6062 9.3593597 ]]
The function can certainly be generalized to account for any neighborhood of size n:
def slice_by_n(array, n):
    argmin = np.argmin(array[:,1])
    if argmin < n:
        return array[:argmin+n+1,:]
    return array[argmin-n:argmin+n+1,:]
check = slice_by_n(anarray, 2)
print(check)
Output:
[[0.5196 9.59881999]
[0.5629 9.48861383]
[0.6062 9.3593597 ]]
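The branch for a minimum near the start exists because a negative start index would wrap around to the end of the array. As a hedged variant, the two branches can be folded into one by clamping the start with max; slice_around_min and the demo array below are made up for illustration.

```python
import numpy as np

def slice_around_min(array, n):
    """Rows within n positions of the minimum of column 1."""
    idx = np.argmin(array[:, 1])
    start = max(idx - n, 0)   # clamp so the start never goes negative
    # a stop index past the end is fine: slicing truncates it
    return array[start:idx + n + 1, :]

# column 0 is the row number, column 1 holds the values to minimize
demo = np.column_stack((np.arange(10), [5, 4, 3, 9, 8, 7, 6, 2, 5, 6]))
print(slice_around_min(demo, 2))
```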

Filter numpy ndarray with another ndarray, row by row

I have 2 numpy ndarrays.
The first contains x and y values:
xy_arr = [[ 736190.125 1130. ]
[ 736190.16666667 1130. ]
[ 736190.20833333 1130. ]
...,
[ 736190.375 1140. ]
[ 736190.41666667 1140. ]
[ 736190.45833333 1140. ]
[ 736190.5 1140. ]]
The second has x, y and index values and is much bigger than the first:
xyind_arr = [[ 7.35964000e+05 1.02000000e+03 0.00000000e+00]
[ 7.35964042e+05 1.02000000e+03 1.00000000e+00]
[ 7.35964083e+05 1.02000000e+03 2.00000000e+00]
...,
[ 7.36613397e+05 1.09500000e+03 3.07730000e+04]
[ 7.36613404e+05 1.10000000e+03 3.07740000e+04]
[ 7.36613411e+05 1.10500000e+03 3.07750000e+04]]
I want to keep all rows of xyind_arr whose x and y values also appear as a row of xy_arr, like:
(xyind_arr[:,0] == xy_arr[:,0]) and (xyind_arr[:,1] == xy_arr[:,1])
My code :
sub_array = xyind_arr[((xyind_arr[:, 0] == xy_arr[:, 0]) &
(xyind_arr[:, 1] == xy_arr[:, 1]))]
This only works if xy_arr has one element.
For example :
import numpy as np
xy_arr = np.array([[56, 400]])
xyind_arr = np.array([[5, 6, 0],[8, 12, 1],[9, 17, 2],[56, 400, 3],[23, 89, 4]])
sub_array = xyind_arr[((xyind_arr[:, 0] == xy_arr[:, 0]) &
(xyind_arr[:, 1] == xy_arr[:, 1]))]
print(sub_array)
result OK :
[[ 56 400 3]]
But with
xy_arr = np.array([[5, 6],[8, 12],[23, 89]])
The result is
[]
And I expected
[[5, 6, 0],[8, 12, 1],[23, 89, 4]]
Is there any clean numpy method to obtain this filtered sub array ?
Edit :
Finally I gave up on the numpy solution and used a Python set():
xy_arr_set = set(map(tuple, xy_arr))
xyind_arr_set = set(map(tuple, xyind_arr))
for x, y, ind in xyind_arr_set:
    if (x,y) in xy_arr_set:
        "do what i need"

There is numpy.isin, but it only tests element-wise membership of scalar values; there is no tuple comparison in it. You could use this method to find all rows of xyind_arr where the 0th column entry is in the 0th column of xy_arr, and also the 1st column entry is in the 1st column of xy_arr. But this is different from your task, because there is no guarantee that both the 0th and the 1st entry were found in the same row of xy_arr.
Since xyind_arr is much larger, I think it should be acceptable to loop over the smaller array xy_arr, applying one of the xy_arr filters at a time, and concatenate the results. For this to work, the rows of xy_arr must be unique, so better check that first:
xy_arr = np.unique(xy_arr, axis=0)
sub_array = np.concatenate([xyind_arr[(xyind_arr[:, 0] == xy_arr[k, 0]) &
(xyind_arr[:, 1] == xy_arr[k, 1])]
for k in np.arange(xy_arr.shape[0])], axis=0)
Note: the order of rows will not be preserved.
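If the product of the two array lengths fits in memory, the loop can be replaced by a broadcast comparison. This is a sketch on the small example from the question, not a drop-in answer for the full data: the names pair_equal and mask are illustrative, and for float coordinates it depends on exact equality.

```python
import numpy as np

xy_arr = np.array([[5, 6], [8, 12], [23, 89]])
xyind_arr = np.array([[5, 6, 0], [8, 12, 1], [9, 17, 2], [56, 400, 3], [23, 89, 4]])

# pair_equal[i, j] is True when row i of xyind_arr has the same (x, y) as row j of xy_arr
pair_equal = (xyind_arr[:, None, :2] == xy_arr[None, :, :]).all(axis=2)

# keep a row if it matches any row of xy_arr
mask = pair_equal.any(axis=1)
sub_array = xyind_arr[mask]
print(sub_array)
```

Because the mask is applied to xyind_arr directly, this variant keeps the original row order.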

List of List of List slicing in Python

I have simulated 10000 scenarios for 4 variables during 120 months.
Hence, I have a scenarios list of lists of lists, in which to get an element I would have to use scenarios[1][1][1], for example, and this would give me a float.
I want to slice this in two, dividing by the second list. Which means I want to keep the 10000 scenarios for 4 variables for the first 60 months.
How would I go about doing this?
My intuition would tell me to do
scenarios[:][0:60]
but this does not work. Instead of cutting the second list, it cuts the first. What is wrong?
Example:
Q = data.cov().as_matrix() # monthly covariance matrix Q
r=[0.00565,0.00206,0.00368,0.00021] # monthly return
scenarios = [[]]*10000
for i in range(10000):
    scenarios[i] = np.random.multivariate_normal(r, Q, size = 120) # monthly scenarios
In my case, Q=
2.167748064990633258e-03 -8.736421379048196659e-05 1.457397098602368978e-04 2.799384719379381381e-06
-8.736421379048196659e-05 9.035930360181909865e-04 3.196576120840064102e-04 3.197146643002681875e-06
1.457397098602368978e-04 3.196576120840064102e-04 2.390042779951682440e-04 2.312645986876262622e-06
2.799384719379381381e-06 3.197146643002681875e-06 2.312645986876262622e-06 4.365866475269951553e-06

Use a list comprehension:
early_scenarios = [x[:60] for x in scenarios]

So, you are trying to use multidimensional slicing on Python list objects, but fundamentally, list objects do not have dimensions. They have no inherent knowledge of their contents, other than the total number of them. But you shouldn't be working with list objects at all! Instead, replace this:
scenarios = [[]]*10000
for i in range(10000):
    scenarios[i] = np.random.multivariate_normal(r, Q, size = 120) # monthly scenarios
With this:
scenarios = np.random.multivariate_normal(r, Q, size=(10000, 120))
In a REPL:
>>> scenarios = np.random.multivariate_normal(r, Q, size=(10000, 120))
>>> scenarios.shape
(10000, 120, 4)
Then, you can slice to your heart's content in N dimensions using:
scenarios[:, 0:60]
Or, a more wieldy slice:
>>> scenarios[500:520, 0:60]
array([[[-0.05785267, 0.01122828, 0.00786622, -0.00204875],
[ 0.01682276, 0.00163375, 0.00439909, -0.0022255 ],
[ 0.02821342, -0.01634708, 0.01175085, -0.00194007],
...,
[ 0.04918003, -0.02146014, 0.00071328, -0.00222226],
[-0.03782566, -0.00685615, -0.00837397, -0.00095019],
[-0.06164655, 0.02817698, 0.01001757, -0.00149662]],
[[ 0.00071181, -0.00487313, -0.01471801, -0.00180559],
[ 0.05826763, 0.00978292, 0.02442642, -0.00039461],
[ 0.04382627, -0.00804489, 0.00046985, 0.00086524],
...,
[ 0.01231702, 0.01872649, 0.01534518, -0.0022179 ],
[ 0.04212831, -0.05289387, -0.03184881, -0.00078165],
[-0.04361605, -0.01297212, 0.00135886, 0.0057856 ]],
[[ 0.00232622, 0.01773357, 0.00795682, 0.00016406],
[-0.04367355, -0.02387383, -0.00448453, 0.0008559 ],
[ 0.01256918, 0.06565425, 0.05170755, 0.00046948],
...,
[ 0.04457427, -0.01816762, 0.00068176, 0.00186112],
[ 0.00220281, -0.01119046, 0.0103347 , -0.00089715],
[ 0.02178122, 0.03183001, 0.00959293, -0.00057862]],
...,
[[ 0.06338153, 0.01641472, 0.01962643, -0.00256244],
[ 0.07537754, -0.0442643 , -0.00362656, 0.00153777],
[ 0.0505006 , 0.0070783 , 0.01756948, 0.0029576 ],
...,
[ 0.03524508, -0.03547517, -0.00664972, -0.00095385],
[-0.03699107, 0.02256328, 0.00300107, 0.00253193],
[-0.0199608 , -0.00536222, 0.01370301, -0.00131981]],
[[ 0.08601913, -0.00364473, 0.00946769, 0.00045275],
[ 0.01943327, 0.07420857, 0.00109217, -0.00183334],
[-0.04481884, -0.02515305, -0.02357894, -0.00198166],
...,
[-0.01221928, -0.01241903, 0.00928084, 0.00066379],
[ 0.10871802, -0.01264407, 0.00601223, 0.00090526],
[-0.02603179, -0.00413112, -0.006037 , 0.00522712]],
[[-0.02929114, 0.02188803, -0.00427137, 0.00250174],
[ 0.02479416, -0.01470632, -0.01355196, 0.00338125],
[-0.01915726, -0.00869161, 0.01451885, -0.00137969],
...,
[ 0.05398784, -0.00834729, -0.00437888, 0.00081602],
[ 0.00626345, -0.0261016 , -0.01484753, 0.00060499],
[ 0.05427697, 0.04006612, 0.03371313, -0.00203731]]])
>>>

You need to explicitly slice each secondary list, either in a loop or in list comprehensions. I built a 10x10 set of lists so you have to change the indexing to fit your problem:
x = []
for a in range(10):
    x.append([10*a+n for n in range(10)])
# x is now a list of 10 lists, each of which has 10 elements
print(x)
x1 = [a[:5] for a in x]
# x1 is a list containing the low elements of the secondary lists
x2 = [a[5:] for a in x]
# x2 is a list containing the high elements of the secondary lists
print(x1, x2)

Python slicing doesn't consider all dimensions like this. Your expression makes a copy of the entire list, scenarios[:], and then takes the first 60 elements of the copy. You need to write a comprehension to grab the elements you want.
Perhaps (note that this yields a single flat list of floats):
[scenarios[x][y][z]
    for x in range(len(scenarios))
    for y in range(60)
    for z in range(len(scenarios[0][0])) ]
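The difference between the list expression and a true multidimensional slice can be checked on a small stand-in. In this sketch the nested list has 3 scenarios, 4 months and 2 variables instead of the real 10000 x 120 x 4, and all values are made up.

```python
import numpy as np

# Small stand-in: 3 scenarios, 4 months, 2 variables
scenarios = [[[s * 100 + m * 10 + v for v in range(2)]
              for m in range(4)]
             for s in range(3)]

# scenarios[:] is a shallow copy of the outer list, so this slices scenarios, not months
print(scenarios[:][0:2] == scenarios[0:2])   # True

# Slice the month axis inside each scenario instead
first_months = [scenario[:2] for scenario in scenarios]

# With a NumPy array the same selection is a single multidimensional slice
arr = np.array(scenarios)
print(arr[:, :2].shape)                      # (3, 2, 2)
```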
