NumPy - generate multiple intervals - python

I have an array like this:
[[0.13, 0.19],
[0.25, 0.6 ],
[0.7 , 0.89]]
I want, given the above array, to create a result like this:
[[0, 0.12],
[0.13, 0.19],
[0.20, 0.24],
[0.25, 0.60],
[0.61, 0.69],
[0.70, 0.89],
[0.90, 1]]
Namely, I want to create a complete matrix of intervals, given a set of predefined intervals.

This isn't specific to numpy, but maybe it will point you in the correct direction.
Basically, you need to know where to start, where to end, and the 'resolution' (for lack of a better word), that is, how far apart the gaps are. With that you can loop through the existing intervals and fill in the others. You'll want to watch the edge cases where the intervals are already filled in, like one starting at 0 or [0.6, 0.8], [0.9, 0.95], so you don't fill those in twice. This might look something like:
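def fill_intervals(existing_intervals, start=0, end=1.0, inc=0.01):
    # Assumes existing_intervals is sorted and non-overlapping.
    filled = []
    for interval in existing_intervals:
        if start < interval[0]:
            # Fill the gap before this interval; round() keeps float noise out.
            filled.append([start, round(interval[0] - inc, 10)])
        filled.append(interval)
        start = round(interval[1] + inc, 10)
    # Fill the trailing gap, if there is one.
    if start < end:
        filled.append([start, end])
    return filled

l = [
    [0.13, 0.19],
    [0.25, 0.6 ],
    [0.7 , 0.89]
]
fill_intervals(l)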
Returning:
[[0, 0.12],
[0.13, 0.19],
[0.2, 0.24],
[0.25, 0.6],
[0.61, 0.69],
[0.7, 0.89],
[0.9, 1.0]]

You can repeat each endpoint and reshape, which gets you quite close:
import numpy as np
arr = np.array([[0.13, 0.19], [0.25, 0.6 ], [0.7 , 0.89]])
consecutive = np.r_[0, np.repeat(arr, 2), 1]
intervals = consecutive.reshape(-1, 2)
intervals:
array([[0. , 0.13], # required: [0, 0.12]
[0.13, 0.19], # OK
[0.19, 0.25], # required: [0.20, 0.24]
[0.25, 0.6 ], # OK
[0.6 , 0.7 ], # required: [0.61, 0.69]
[0.7 , 0.89], # OK
[0.89, 1. ]])# required: [0.9, 1]
Only the alternate rows (the gaps) need fixing, so shift their endpoints inward by 0.01:
intervals[2::2,0] = intervals[2::2,0] + 0.01
intervals[:-1:2,1] = intervals[:-1:2,1] - 0.01
intervals:
array([[0. , 0.12],
[0.13, 0.19],
[0.2 , 0.24],
[0.25, 0.6 ],
[0.61, 0.69],
[0.7 , 0.89],
[0.9 , 1. ]])
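The repeat-and-reshape idea above can be wrapped into a small function. This is only a sketch: the name fill_gaps and the decimals parameter are made up for illustration, the input is assumed to be sorted and non-overlapping, and empty gaps (for example when an interval already starts at 0) are dropped with a mask.

```python
import numpy as np

def fill_gaps(arr, start=0.0, end=1.0, inc=0.01, decimals=10):
    """Hypothetical helper: interleave the given intervals with the gaps between them."""
    arr = np.asarray(arr, dtype=float)
    # Rows 0, 2, 4, ... are gaps; rows 1, 3, 5, ... are the original intervals.
    bounds = np.r_[start, np.repeat(arr, 2), end].reshape(-1, 2)
    bounds[2::2, 0] += inc     # a gap starts one step after the previous interval
    bounds[:-1:2, 1] -= inc    # a gap ends one step before the next interval
    bounds = np.round(bounds, decimals)  # remove floating-point noise
    # Drop gaps that turned out empty (start greater than end).
    return bounds[bounds[:, 0] <= bounds[:, 1]]

result = fill_gaps([[0.13, 0.19], [0.25, 0.6], [0.7, 0.89]])
print(result)
```

With the intervals from the question this should reproduce the seven rows shown above; with input such as [[0.0, 0.5], [0.51, 1.0]] no gap rows are left, so only the two original intervals come back.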

You can use linspace to create evenly spaced interval start points:
>>> import numpy as np
>>> np.linspace(0, 1, num=3, endpoint=False)
array([0. , 0.33333333, 0.66666667])

Related

Add a scalar to a numpy matrix based on the indices in a different numpy array

I'm sorry if this question isn't framed well. So I would rather explain with an example.
I have a numpy matrix:
a = np.array([[0.5, 0.8, 0.1], [0.6, 0.9, 0.3], [0.7, 0.4, 0.8], [0.8, 0.7, 0.6]])
And another numpy array as shown:
b = np.array([1, 0, 2, 2])
The values in b are guaranteed to be in range(a.shape[1]), and b.shape[0] == a.shape[0]. Now this is the operation I need to perform.
For every row index i of a and the corresponding entry b[i], I need to subtract 1 from a[i][j], where j == b[i].
So in my example, a[0] == [0.5 0.8 0.1] and b[0] == 1. Therefore I need to subtract 1 from a[0][b[0]] so that a[0] = [0.5, -0.2, 0.1]. This has to be done for all rows of a. Is there a direct solution that doesn't require iterating through the rows or columns one by one?
Thanks.
Use numpy indexing. See this post for a nice introduction:
import numpy as np
a = np.array([[0.5, 0.8, 0.1], [0.6, 0.9, 0.3], [0.7, 0.4, 0.8], [0.8, 0.7, 0.6]])
b = np.array([1, 0, 2, 2])
a[np.arange(a.shape[0]), b] -= 1
print(a)
Output
[[ 0.5 -0.2 0.1]
[-0.4 0.9 0.3]
[ 0.7 0.4 -0.2]
[ 0.8 0.7 -0.4]]
As an alternative, use np.subtract.at (applied here to the original a, not the one already modified above):
np.subtract.at(a, (np.arange(a.shape[0]), b), 1)
print(a)
Output
[[ 0.5 -0.2 0.1]
[-0.4 0.9 0.3]
[ 0.7 0.4 -0.2]
[ 0.8 0.7 -0.4]]
The main idea is that:
np.arange(a.shape[0]) # a.shape[0] is the number of rows
generates the indices of the rows:
[0 1 2 3]
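In this question every row index appears exactly once, so both approaches give the same result. They differ when the same (row, column) pair shows up more than once in the index arrays. A small sketch with made-up index arrays:

```python
import numpy as np

a = np.zeros((2, 3))
rows = np.array([0, 0, 1])  # position (0, 1) is listed twice on purpose
cols = np.array([1, 1, 2])

buffered = a.copy()
buffered[rows, cols] -= 1   # the duplicate (0, 1) is only subtracted once

unbuffered = a.copy()
np.subtract.at(unbuffered, (rows, cols), 1)  # the duplicate (0, 1) is subtracted twice

print(buffered[0, 1], unbuffered[0, 1])  # -1.0 -2.0
```

So if b could ever point at the same cell twice, np.subtract.at is the variant that accumulates.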

Extract arrays based on positions indicated in another array

I have the data below as an example:
import numpy as np
data=[np.array([[0.9,0.6,0.5,0.4,0.7],[0.8,0.0,0.0,0.8,0.2],
[0.9,0.0,0.4,0.4,0.3],[0.9,0.6,0.3,0.2,0.5],[0.8,0.0,0.3,0.1,0.5]]),
np.array([[0.9,0.0,0.2,0.4,0.3],[0.0,0.2,0.4,0.0,0.0],
[0.0,0.0,0.0,0.2,0.0],[0.5,0.0,0.3,0.6,0.8],[0.5,0.6,0.9,0.0,0.0]])]
and I want to extract the relevant data based on these positions below:
positions_non_zero=[np.array([2,3,4]),np.array([1,4])]
the desired output should be this:
[array([[0.9, 0. , 0.4, 0.4, 0.3],
[0.9, 0.6, 0.3, 0.2, 0.5],
[0.8, 0. , 0.3, 0.1, 0.5]]),
array([[0. , 0.2, 0.4, 0. , 0. ],
[0.5, 0.6, 0.9, 0. , 0. ]])]
The reason is this:
The problem with my code is that only the np.array([1,4]) is taken under consideration.
My code:
df_class11=[]
for n in data:
    def data_target(df_class_target):
        for z in df_class_target:
            x_classA=[n[i] for i in z]
            x_classA=np.vstack(x_classA)
        return x_classA
    df_class11.append(data_target(positions_non_zero))
df_class11
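One possible way to get the desired output is to pair each array with its own positions using zip and let NumPy's integer-array indexing pick the rows. This is a sketch on small stand-in arrays (the name blocks and its contents are made up; only the positions are taken from the question):

```python
import numpy as np

# Stand-in data: two 5x4 arrays with easily recognisable values.
blocks = [np.arange(20).reshape(5, 4), np.arange(100, 120).reshape(5, 4)]
positions_non_zero = [np.array([2, 3, 4]), np.array([1, 4])]

# Pair the k-th array with the k-th set of row positions.
selected = [block[rows] for block, rows in zip(blocks, positions_non_zero)]

for part in selected:
    print(part)
```

The loop in the question applies every set of positions to every array and keeps only the last one, which would explain why only np.array([1,4]) seems to be used.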

Using np.arange to create list of coordinate pairs

I am trying to make a program faster and I found this post and I want to implement a solution that resembles the fourth case given in that question.
Here is the relevant part of the code I am using:
count = 0
hist_dat = np.zeros(r**2)
points = np.zeros((r**2, 2))
for a in range(r):
    for b in range(r):
        for i in range(N):
            for j in range(N):
                hist_dat[count] += retval(a/r, (a+1)/r, data_a[i][j])*retval(b/r, (b+1)/r, data_b[i][j])/N
        points[count][0], points[count][1] = (a+0.5)/r, (b+0.5)/r
        count += 1
What this code does is generate the values of a normalized 2D histogram (with "r" divisions in each direction) and the coordinates for those values as numpy.ndarray.
As you can see in the other question linked, I am currently using the second worst possible solution and it takes several minutes to run.
For starters I want to change what the code is doing for the points array (I think that once I can see how that is done I could figure something out for hist_dat). Which is basically this:
In the particular case I am working on, both A and B are the same. So for example, it could be like going from array([0, 0.5, 1]) to array([[0,0], [0,0.5], [0,1], [0.5,0], [0.5,0.5], [0.5,1], [1,0], [1,0.5], [1,1]])
Is there any method for numpy.ndarray or an operation with the np.arange() that does what the above diagram shows without requiring for loops?
Or is there any alternative that can do this as fast as what the linked post showed for the np.arange()?
You can use np.c_ to combine the result of np.repeat and np.tile:
import numpy as np
start = 0.5
end = 5.5
step = 1.0
points = np.arange(start, end, step) # [0.5, 1.5, 2.5, 3.5, 4.5]
n_elements = len(points) # number of points along each axis
output = np.c_[np.repeat(points, n_elements), np.tile(points, n_elements)]
print(output)
Output:
[[0.5 0.5]
[0.5 1.5]
[0.5 2.5]
[0.5 3.5]
[0.5 4.5]
[1.5 0.5]
[1.5 1.5]
[1.5 2.5]
[1.5 3.5]
[1.5 4.5]
[2.5 0.5]
[2.5 1.5]
[2.5 2.5]
[2.5 3.5]
[2.5 4.5]
[3.5 0.5]
[3.5 1.5]
[3.5 2.5]
[3.5 3.5]
[3.5 4.5]
[4.5 0.5]
[4.5 1.5]
[4.5 2.5]
[4.5 3.5]
[4.5 4.5]]
Maybe np.mgrid would help?
import numpy as np
np.mgrid[0:2:.5,0:2:.5].reshape(2,4**2).T
Output:
array([[0. , 0. ],
[0. , 0.5],
[0. , 1. ],
[0. , 1.5],
[0.5, 0. ],
[0.5, 0.5],
[0.5, 1. ],
[0.5, 1.5],
[1. , 0. ],
[1. , 0.5],
[1. , 1. ],
[1. , 1.5],
[1.5, 0. ],
[1.5, 0.5],
[1.5, 1. ],
[1.5, 1.5]])
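Another option along the same lines is np.meshgrid with indexing="ij", which works on any 1D array rather than on slice notation. The sketch below uses the example array from the question and then the bin centres (a + 0.5) / r; the helper name cartesian_pairs and the value r = 4 are assumptions for illustration only.

```python
import numpy as np

def cartesian_pairs(values):
    """Hypothetical helper: all ordered pairs (x, y) drawn from a 1D array."""
    xx, yy = np.meshgrid(values, values, indexing="ij")
    # Flatten both grids and put them side by side as two columns.
    return np.column_stack([xx.ravel(), yy.ravel()])

pairs = cartesian_pairs(np.array([0, 0.5, 1]))
print(pairs)

r = 4  # assumed number of divisions per direction
centres = (np.arange(r) + 0.5) / r
points = cartesian_pairs(centres)
print(points.shape)  # (16, 2)
```

The rows come out in the same order as the nested a, b loops in the question, so points here should line up with the loop-built points array.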

How to select the first 3 rows of every group in pandas?

I get a pandas dataframe like this:
id prob
0 1 0.5
1 1 0.6
2 1 0.4
3 1 0.2
4 2 0.3
6 2 0.5
...
I want to group it by 'id', sort each group by prob in descending order, and get the first 3 prob values of every group. Note that some groups contain fewer than 3 rows.
Finally I want to get a 2D array like:
[[1, [0.6, 0.5, 0.4]], [2, [0.5, 0.3]], ...]
How can I do that with pandas?
Thanks!
Use sort_values, groupby, and head:
df.sort_values(by=['id','prob'], ascending=[True,False]).groupby('id').head(3).values
Output:
array([[ 1. , 0.6],
[ 1. , 0.5],
[ 1. , 0.4],
[ 2. , 0.5],
[ 2. , 0.3]])
Following @COLDSPEED's lead:
df.sort_values(by=['id','prob'], ascending=[True,False])\
.groupby('id').agg(lambda x: x.head(3).tolist())\
.reset_index().values.tolist()
Output:
[[1, [0.6, 0.5, 0.4]], [2, [0.5, 0.3]]]
You can use groupby and nlargest
df.groupby('id').prob.nlargest(3).reset_index(1,drop = True)
id
1 0.6
1 0.5
1 0.4
2 0.5
2 0.3
For the array
df1 = df.groupby('id').prob.nlargest(3).unstack(1)
np.column_stack((df1.index.values, df1.values))
You get
array([[ 1. , 0.5, 0.6, 0.4, nan, nan],
[ 2. , nan, nan, nan, 0.3, 0.5]])
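The NaN-padded columns above come from unstacking on the original row labels, so each value stays in the column of the row it came from. If a compact rectangular array is preferred, one possible variation is to rank the rows within each group with cumcount and pivot on that rank. A sketch, with the frame rebuilt from the rows visible in the question (the rank column is a made-up helper name):

```python
import numpy as np
import pandas as pd

# Rebuild the visible part of the question's frame, keeping its index labels.
df = pd.DataFrame(
    {"id": [1, 1, 1, 1, 2, 2], "prob": [0.5, 0.6, 0.4, 0.2, 0.3, 0.5]},
    index=[0, 1, 2, 3, 4, 6],
)

# Top 3 rows per id, largest prob first.
top = df.sort_values("prob", ascending=False).groupby("id").head(3)
# Position of each row within its group: 0, 1, 2.
top = top.assign(rank=top.groupby("id").cumcount())
wide = top.pivot(index="id", columns="rank", values="prob")

result = np.column_stack((wide.index.to_numpy(), wide.to_numpy()))
print(result)
```

Groups with fewer than 3 rows are still padded with NaN, but only at the end of their row.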
If you're looking for a dataframe of array columns, you can use np.sort:
df = df.groupby('id').prob.apply(lambda x: np.sort(x.values)[:-4:-1])
df
id
1 [0.6, 0.5, 0.4]
2 [0.5, 0.3]
To retrieve the values, reset_index and access:
df.reset_index().values
array([[1, array([ 0.6, 0.5, 0.4])],
[2, array([ 0.5, 0.3])]], dtype=object)
[[n, g.nlargest(3).tolist()] for n, g in df.groupby('id').prob]
[[1, [0.6, 0.5, 0.4]], [2, [0.5, 0.3]]]

Python array using numpy

I am confused about doing vectorization using numpy.
In particular, I have a matrix of this form:
of type <type 'list'>
[[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]]
How do I make it look like the following using numpy?
[[ 0.0 0.0 0.0 0.0 ]
[ 0.02 0.04 0.0325 0.04 ]
[ 1 2 3 4 ]]
Yes, I know I can do it using:
np.array([[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]])
But I have a very long matrix, and I can't just type out each row like that. How can I handle the case when I have a very long matrix?
This is not a matrix of type list; it is a list that contains lists. You may think of it as a matrix, but to Python it is just a list:
alist = [[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]]
arr = np.array(alist)
works just the same as
arr = np.array([[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]])
This creates a 2D array with shape (3, 4) and dtype float:
In [212]: arr = np.array([[0.0, 0.0, 0.0, 0.0], [0.02, 0.04, 0.0325, 0.04], [1, 2, 3, 4]])
In [213]: arr
Out[213]:
array([[ 0. , 0. , 0. , 0. ],
[ 0.02 , 0.04 , 0.0325, 0.04 ],
[ 1. , 2. , 3. , 4. ]])
In [214]: print(arr)
[[ 0. 0. 0. 0. ]
[ 0.02 0.04 0.0325 0.04 ]
[ 1. 2. 3. 4. ]]
Assuming you start with one long flat list (called array here), why not split it into rows of the right size (n):
splitted = [array[i:i + n] for i in range(0, len(array), n)]
and make the matrix from that:
np.array(splitted)
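If the data really is one long flat sequence, NumPy can also do the splitting itself with reshape, so no list comprehension is needed. A minimal sketch, assuming the flat list below and a row length of n = 4 (both made up to match the matrix in the question):

```python
import numpy as np

flat = [0.0, 0.0, 0.0, 0.0, 0.02, 0.04, 0.0325, 0.04, 1, 2, 3, 4]
n = 4  # assumed row length

# -1 lets NumPy work out the number of rows from the total length.
arr = np.array(flat).reshape(-1, n)
print(arr.shape)  # (3, 4)
print(arr)
```

reshape raises a ValueError if the length of the flat list is not a multiple of n, whereas the slicing approach would leave a shorter last row.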
If you're saying you have a list of lists stored in Python object A, all you need to do is call np.array(A) which will return a numpy array using the elements of A. Otherwise, you need to specify what form your data is in right now to clarify how you want to load your data.
