Mapping in Python list

I have a list of indices stored as a list of tuples:
index=[(0,0), (0,1), (1,0), (1,1) ....]
These indices will be used to calculate the energy of an image im (a numpy array) using the following formula:
(1-im[0,0])^2+(1-im[0,1])^2+....
im here is a two-dimensional numpy array. Here's an example:
im=Image.open('lena_noisy.png')
im=numpy.array(im)
print im
[[168 133 131 ..., 127 213 107]
[174 151 111 ..., 191 88 122]
[197 173 143 ..., 182 153 125]
...,
[ 34 15 6 ..., 111 95 104]
[ 37 15 57 ..., 121 133 134]
[ 49 39 58 ..., 115 74 107]]
How can I use map (or a similar construct) to perform this calculation over the list?

If you break index into two tuples, xidx and yidx, then you can use fancy indexing to access all the im values as one numpy array.
Then the calculation becomes simple to express, and faster than doing a Python loop (or list comprehension):
import numpy as np
xidx, yidx = zip(*index)
print(((1-im[xidx, yidx])**2).sum())
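As a small worked check (on a hypothetical 2x2 image, not the Lena image used below), the fancy-indexing sum matches the explicit formula term by term:

```python
import numpy as np

# Hypothetical 2x2 "image" and the full list of its coordinates
im = np.array([[168, 133],
               [174, 151]])
index = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Fancy indexing: gather all pixels at once, then sum the squared terms
xidx, yidx = zip(*index)
fancy = ((1 - im[xidx, yidx]) ** 2).sum()

# The same energy computed element by element, as in the question's formula
explicit = sum((1 - im[i][j]) ** 2 for i, j in index)

print(fancy, explicit)  # both print 97742
```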
import numpy as np
import scipy.misc as misc
im = misc.lena()
n = min(im.shape)
index = np.random.randint(n, size = (10000,2)).tolist()
def using_fancy_indexing(index, im):
    xidx, yidx = zip(*index)
    return ((1 - im[xidx, yidx]) ** 2).sum()
def using_generator_expression(index, im):
    return sum((1 - im[i[0], i[1]]) ** 2 for i in index)
Here is a comparison using timeit:
In [27]: %timeit using_generator_expression(index, im)
100 loops, best of 3: 17.9 ms per loop
In [28]: %timeit using_fancy_indexing(index, im)
100 loops, best of 3: 2.07 ms per loop
Thus, depending on the size of index, using fancy indexing could be 8x faster than using a generator expression.

Like this, using a generator expression:
sum((1-im[i][j])**2 for i, j in index)
That is, assuming that im is a two-dimensional list and index is a list of coordinates in im. Notice that in Python, a two-dimensional list is accessed like this: m[i][j] and not like this: m[i,j].
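To see that distinction concretely: a nested Python list only supports chained indexing, while a numpy array accepts both forms. A minimal sketch:

```python
import numpy as np

m_list = [[1, 2], [3, 4]]   # nested Python list
m_arr = np.array(m_list)    # numpy array with the same values

print(m_list[1][0])  # 3 -- lists need chained indexing
print(m_arr[1, 0])   # 3 -- arrays also accept a tuple index

try:
    m_list[1, 0]     # tuple indexing fails on a plain list
except TypeError as e:
    print("TypeError:", e)
```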

Using sum and a generator expression:
sum(((1 - im[i[0], i[1]]) ** 2) for i in index)
If index is also a numpy array, convert each row to a tuple before indexing (with an array i, im[i] would select whole rows rather than a single element):
sum(((1 - im[tuple(i)]) ** 2) for i in index)

Related

Create matrix 100x100 each row with next ordinal number

I am trying to create a 100x100 matrix in which each row holds the next ordinal number, like below:
I created a vector from 1 to 100 and then, using a for loop, copied this vector 100 times. I received an array with the correct data, so I tried to sort it using np.argsort, but it didn't work as I wanted (I don't even know why there are zeros after sorting).
Is there any option to get this matrix using other functions? I tried many approaches, but the final layout was never what I expected.
max_x = 101
z = np.arange(1, 101)
print(z)
x = []
for i in range(1, max_x):
    x.append(z.copy())
print(x)
y = np.argsort(x)
y
argsort returns the indices that would sort the array, which is why you get zeros. You don't need it; what you want is to transpose the array.
Make x a numpy array and use .T:
y = np.array(x).T
Output
[[ 1 1 1 ... 1 1 1]
[ 2 2 2 ... 2 2 2]
[ 3 3 3 ... 3 3 3]
...
[ 98 98 98 ... 98 98 98]
[ 99 99 99 ... 99 99 99]
[100 100 100 ... 100 100 100]]
You also don't need to loop to copy the array, use np.tile instead
z = np.arange(1, 101)
x = np.tile(z, (100, 1))
y = x.T
# or one liner
y = np.tile(np.arange(1, 101), (100, 1)).T
import numpy as np
np.asarray([ (k+1)*np.ones(100) for k in range(100) ])
Or simply
np.tile(np.arange(1,101),(100,1)).T

replace repeated values with counting up values in Numpy (vectorized)

I have an array of repeated values that are used to match datapoints to some ID.
How can I replace the IDs with counting up index values in a vectorized manner?
Consider the following minimal example:
import numpy as np
n_samples = 10
ids = np.random.randint(0,500, n_samples)
lengths = np.random.randint(1,5, n_samples)
x = np.repeat(ids, lengths)
print(x)
Output:
[129 129 129 129 173 173 173 207 207 5 430 147 143 256 256 256 256 230 230 68]
Desired solution:
indices = np.arange(n_samples)
y = np.repeat(indices, lengths)
print(y)
Output:
[0 0 0 0 1 1 1 2 2 3 4 5 6 7 7 7 7 8 8 9]
However, in the real code, I do not have access to variables like ids and lengths, but only x.
It does not matter what the values in x are, I just want an array with counting up integers which are repeated the same amount as in x.
I can come up with solutions using for-loops or np.unique, but both are too slow for my use case.
Has anyone an idea for a fast algorithm that takes an array like x and returns an array like y?
You can do:
y = np.r_[False, x[1:] != x[:-1]].cumsum()
Or with one less temporary array:
y = np.empty(len(x), int)
y[0] = 0
np.cumsum(x[1:] != x[:-1], out=y[1:])
print(y)
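To see why the trick works, here it is applied to the exact x printed in the question: the comparison marks every position where the value changes from its predecessor, and the cumulative sum counts how many changes have occurred so far, which is exactly the group index. (Note it would merge two adjacent groups that happened to share the same id.)

```python
import numpy as np

x = np.array([129, 129, 129, 129, 173, 173, 173, 207, 207, 5,
              430, 147, 143, 256, 256, 256, 256, 230, 230, 68])

# False for the first element, True wherever x changes; cumsum counts changes
y = np.r_[False, x[1:] != x[:-1]].cumsum()
print(y)  # [0 0 0 0 1 1 1 2 2 3 4 5 6 7 7 7 7 8 8 9]
```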

Selecting image region based on boolean image mask values

I have a rgb image, for example
img_rgb[:,:,0] = [ 125 160; 130 125];
img_rgb[:,:,1] = [ 125 160; 130 125];
img_rgb[:,:,2] = [ 125 160; 130 125];
and a mask boolean image whose size equals the size of img_rgb e.g
mask[:,:] = [ 1 0; 0 1]
For every zero value of mask, I would like to assign a NaN value in img_rgb, obtaining the following:
img_rgb[:,:,0] = [ 125 nan; nan 125]
img_rgb[:,:,1] = [ 125 nan; nan 125]
img_rgb[:,:,2] = [ 125 nan; nan 125]
Since my image array is really big (on the order of 10000 px per side), I would like to do this as fast as possible, avoiding a double for loop. In Matlab I would use logical indexing:
img_rgb(repmat(mask,1,1,3)==0)=nan;
How can I do something similar in Python (v2.7)?
Thanks in advance
With numpy arrays, you can use boolean indexing in Python much like logical indexing in Matlab.
Broadcasting takes care of the repmat for you, so you can do just:
import numpy as np
img_rgb[mask == 0] = np.nan
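One caveat: NaN only exists for floating-point dtypes, so an integer image must be converted first. A minimal sketch using the 2x2 example from the question (built with np.dstack just to get a 3-channel array):

```python
import numpy as np

# Three identical channels, converted to float so NaN can be stored
img_rgb = np.dstack([np.array([[125, 160], [130, 125]])] * 3).astype(float)
mask = np.array([[1, 0], [0, 1]])

# The 2-D boolean mask broadcasts across all three channels
img_rgb[mask == 0] = np.nan
print(img_rgb[:, :, 0])  # [[125. nan] [nan 125.]]
```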

Indexing Multi-dimensional arrays

I know that multidimensional numpy arrays may be indexed with other arrays, but I could not figure out how the following works:
I would like to get the items from raster, a 3d numpy array, based on indx, a 3d index array:
raster=np.random.rand(5,10,50)
indx=np.random.randint(0, high=50, size=(5,10,3))
What I want is another array with the dimensions of indx that holds the values of raster selected according to indx.
What we need in order to properly resolve your indices during broadcasting are two arrays a and b so that raster[a[i,j,k],b[i,j,k],indx[i,j,k]] will be raster[i,j,indx[i,j,k]] for i,j,k in corresponding ranges for indx's axes.
The easiest solution would be:
x,y,z = indx.shape
a,b,_ = np.ogrid[:x,:y,:z]
raster[a,b,indx]
Where np.ogrid[...] creates three arrays with shapes (x,1,1), (1,y,1) and (1,1,z). We don't need the last one so we throw it away. Now when the other two are broadcast with indx they behave exactly the way we need.
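A quick sanity check of this recipe against an explicit triple loop, on a small random example (shapes as in the question):

```python
import numpy as np

rng = np.random.default_rng(0)
raster = rng.random((5, 10, 50))
indx = rng.integers(0, 50, size=(5, 10, 3))

# The ogrid recipe from above
x, y, z = indx.shape
a, b, _ = np.ogrid[:x, :y, :z]
out = raster[a, b, indx]

# Explicit triple loop for comparison
expected = np.empty_like(out)
for i in range(x):
    for j in range(y):
        for k in range(z):
            expected[i, j, k] = raster[i, j, indx[i, j, k]]

print(np.array_equal(out, expected))  # True
```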
If I understood the question correctly, for each row of indx, you are trying to index into the corresponding row in raster, but the column numbers vary depending on the actual values in indx. So, with that assumption, you can use a vectorized approach that uses linear indexing, like so -
M,N,R = raster.shape
linear_indx = R*np.arange(M*N)[:,None] + indx.reshape(M*N,-1)
out = raster.ravel()[linear_indx].reshape(indx.shape)
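For reference, newer NumPy (1.15+) offers np.take_along_axis, which performs the same per-row gathering in one call; a sketch comparing it with the manual linear-indexing recipe above:

```python
import numpy as np

rng = np.random.default_rng(1)
raster = rng.random((5, 10, 50))
indx = rng.integers(0, 50, size=(5, 10, 3))

# One-call equivalent: pick indx[i, j, :] along the last axis
out = np.take_along_axis(raster, indx, axis=2)

# Manual linear indexing, as in the answer above
M, N, R = raster.shape
linear_indx = R * np.arange(M * N)[:, None] + indx.reshape(M * N, -1)
out2 = raster.ravel()[linear_indx].reshape(indx.shape)

print(np.array_equal(out, out2))  # True
```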
I'm assuming that you want to get 3 random values from each of the 3rd dimension arrays.
You can do this via a list comprehension, thanks to advanced indexing.
Here's an example using fewer values, and integers, so the output is easier to read:
import numpy as np
raster=np.random.randint(0, high=1000, size=(2,3,10))
indices=np.random.randint(0, high=10, size=(2,3,3))
results = np.array([
    np.array([column[col_indices]
              for (column, col_indices) in zip(row, row_indices)])
    for (row, row_indices) in zip(raster, indices)
])
print("Raster:")
print(raster)
print("Indices:")
print(indices)
print("Results:")
print(results)
Output:
Raster:
[[[864 353  11  69 973 475 962 181 246 385]
  [ 54 735 871 218 143 651 159 259 785 383]
  [532 476 113 888 554 587 786 172 798 232]]

 [[891 263  24 310 652 955 305 470 665 893]
  [260 649 466 712 229 474   1 382 269 502]
  [323 513  16 236 594 347 129  94 256 478]]]
Indices:
[[[0 1 2]
  [7 5 1]
  [7 8 9]]

 [[4 0 2]
  [6 1 4]
  [3 9 2]]]
Results:
[[[864 353  11]
  [259 651 735]
  [172 798 232]]

 [[652 891  24]
  [  1 649 229]
  [236 478  16]]]
It iterates simultaneously over the corresponding 3rd dimension arrays in raster and indices and uses advanced indexing to slice the desired indices from raster.
Here's a more verbose version that does the exact same thing:
results = []
for i in range(len(raster)):
    row = raster[i]
    row_indices = indices[i]
    row_results = []
    for j in range(len(row)):
        column = row[j]
        column_indices = row_indices[j]
        column_results = column[column_indices]
        row_results.append(column_results)
    results.append(np.array(row_results))
results = np.array(results)

Finding range of a numpy array elements

I have a NumPy array of size 94 x 155:
a = [1 2 20 68 210 290..
2 33 34 55 230 340..
.. .. ... ... .... .....]
I want to calculate the range of each row, so that I get 94 ranges in a result. I tried looking for a numpy.range function, which I don't think exists. If this can be done through a loop, that's also fine.
I'm looking for something like numpy.mean, which, if we set the axis parameter to 1, returns the mean for each row in the N-dimensional array.
I think np.ptp might do what you want:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ptp.html
r = np.ptp(a,axis=1)
where r is your range array.
Try this:
def range_of_vals(x, axis=0):
    return np.max(x, axis=axis) - np.min(x, axis=axis)
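Both approaches give identical per-row ranges; a small check on the first two rows sketched in the question:

```python
import numpy as np

a = np.array([[1,  2, 20, 68, 210, 290],
              [2, 33, 34, 55, 230, 340]])

# "peak to peak": max - min along each row
print(np.ptp(a, axis=1))                      # [289 338]
print(np.max(a, axis=1) - np.min(a, axis=1))  # [289 338]
```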
