Batch-wise loop over array Python to create new array without overwriting

Batch-wise loop over array Python to create new array without overwriting - python

I want to iterate over a 3d array (sequences) with shape (1134500, 1, 50)
array([[[1000, 1000, 1000, ..., 1005, 1005, 1005]],
[[1000, 1000, 1000, ..., 1004, 1005, 1004]],
[[1000, 1000, 1000, ..., 1004, 1005, 1004]],
...,
[[1000, 1000, 1000, ..., 1005, 1005, 1004]],
[[1000, 1000, 1000, ..., 1005, 1005, 1005]],
[[1000, 1000, 1000, ..., 1004, 1005, 1004]]], dtype=int32)
To do this, I use the following for loop, which works well except for it overwriting the results from the batch before:
batchsize = 500
for i in range(0, sequences.shape[0], batchsize):
batch = sequences[i:i+batchsize]
relevances = lrp_model.lrp(batch)
As a result, I want an array (relevances) with shape (1134500, 1, 50), but I get one with shape (500, 1, 50)
Can someone tell me what's going wrong?

In case you want to save the relevances, maybe
batchsize = 500
relevances = np.zeros(sequences.shape)
for i in range(0, sequences.shape[0], batchsize):
batch = sequences[i:i+batchsize]
relevances[i:i+batchsize, :, :] = lrp_model.lrp(batch)

Related

Consecutively split an array by the next max value

Suppose I have an array (the elements can be floats also):
D = np.array([0,0,600,160,0,1200,1800,0,1800,900,900,300,1400,1500,320,0,0,250])
The goal is, starting from the beginning of the array, to find the max value (the last one if there are several equal ones) and cut the anterior part of the array. Then consecutively repeat this procedure till the end of the array. So, the expected result would be:
[[0,0,600,160,0,1200,1800,0,1800],
[900,900,300,1400,1500],
[320],
[0,0,250]]
I managed to find the last max value:
D_rev = D[::-1]
last_max_index = len(D_rev) - np.argmax(D_rev) - 1
i.e. I can get the first subarray of the desired answer. And then I can use a loop to get the rest.
My question is, if there is a numpy way to do it without looping?

IIUC, you can take the reverse cumulated max (see accumulate) of D to form groups, then split with itertools.groupby:
D = np.array([0,0,600,160,0,1200,1800,0,1800,900,900,300,1400,1500,320,0,0,250])
groups = np.maximum.accumulate(D[::-1])[::-1]
# array([1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1800, 1500, 1500,
# 1500, 1500, 1500, 320, 250, 250, 250])
from itertools import groupby
out = [list(list(zip(*g))[0]) for _, g in groupby(zip(D, groups), lambda x: x[1])]
# [[0, 0, 600, 160, 0, 1200, 1800, 0, 1800],
# [900, 900, 300, 1400, 1500],
# [320],
# [0, 0, 250]]

Shifting values with np.apply_along_axis returns inconsistent results

TL;DR: np.apply_along_axis works for a certain array with shape (1561, 338) which is a subset of another array with shape (351225, 338) for which it fails.
I am trying to apply the following function:
def add_min(a):
return a + abs(a.min()) if a.min() < 0 else a
x_train has shape (1561, 15, 15, 338) (n * height * width * channels) and I need to shift all values to positive to be able to log normalize my data. I want to do that per channel, for obvious reasons.
Now if I reshape x_train: x_train = x_train.reshape(-1, 338) and get shape (351225, 338)
I should be able to perform:
x_train = np.apply_along_axis(add_min, 0, x_train)
However...
Before:
x_train.min()
>> -2147483648
After:
x_train.min()
>> -2147370103
In other words, it does not work. On the other hand, if I only keep the center pixel:
# Keep the center value of axes (1, 2)
x_train = x_train[:, x_train.shape[1]//2, x_train.shape[2]//2, :]
x_train.shape
>> (1561, 338)
x_train.min()
>> -32768 # strange coincidence that this location in the image has a much smaller value range
x_train = np.apply_along_axis(add_min, 0, x_train)
x_train.min()
>> 0
I think it has something to do with the large negative values, because if I select random indices in the 2 center axes (i.e. 1 and 8) instead of the middle (7, 7) I again get x_train.min() of -2147483648 and -2147369934, before and after np.apply_along_axis, respectively.
So what am I doing wrong? Is there a better way I can achieve my goal?

The overflow on int32 is a good guess if the problem is "random". But using apply_along_axis may be making it harder to diagnose the issue, since it's wrapping your function in an (obscure) iteration. It should be easier to diagnose things with whole array calculations.
Make a modest array a mix of min values:
In [77]: A = np.random.randint(-50,1000,(4,8))
In [78]: A
Out[78]:
array([[151, 531, 765, 379, 89, 499, 818, 848],
[873, -12, -45, 900, 416, 838, 603, 849],
[540, 0, 1, 589, 297, 566, 688, 556],
[ 53, 170, 461, -16, -6, 480, 321, 392]])
Your function:
In [79]: np.apply_along_axis(add_min, 0, A)
Out[79]:
array([[151, 543, 810, 395, 95, 499, 818, 848],
[873, 0, 0, 916, 422, 838, 603, 849],
[540, 12, 46, 605, 303, 566, 688, 556],
[ 53, 182, 506, 0, 0, 480, 321, 392]])
Let's create an whole-array equivalent. First find the min with a axis specification:
In [80]: am = np.min(A, axis=0, keepdims=True)
In [81]: am
Out[81]: array([[ 53, -12, -45, -16, -6, 480, 321, 392]])
Now create a shift array that imitates your function (without the if that only works for scalars):
In [82]: shift=np.abs(am)
In [83]: shift[am>=0]=0
In [84]: shift
Out[84]: array([[ 0, 12, 45, 16, 6, 0, 0, 0]])
In [85]: A+shift
Out[85]:
array([[151, 543, 810, 395, 95, 499, 818, 848],
[873, 0, 0, 916, 422, 838, 603, 849],
[540, 12, 46, 605, 303, 566, 688, 556],
[ 53, 182, 506, 0, 0, 480, 321, 392]])
There are other ways of getting that shift, but the same basic idea applies, using the am<0 to determine which columns get the shift.
This will also be faster.

Make an array like numpy.array() without numpy

I've an image processing task and we're prohibited to use NumPy so we need to code from scratch. I've done the logic image transformation but now I'm stuck on creating an array without numpy.
So here's my last output code :
Output :
new_log =
[[236,
232,
226,
.
.
.
198,
204]]
I need to convert this to an array so I can write the image like this (with Numpy)
new_log =
array([[236, 232, 226, ..., 208, 209, 212],
[202, 197, 187, ..., 198, 200, 203],
[192, 188, 180, ..., 205, 206, 207],
...,
[233, 226, 227, ..., 172, 189, 199],
[235, 233, 228, ..., 175, 182, 192],
[235, 232, 228, ..., 195, 198, 204]], dtype=uint8)
cv.imwrite('log_transformed.jpg', new_log)
# new_log must be shaped like the second output

You can make a straightforward function to take your list and reshape it in a similar way to NumPy's np.reshape(). But it's not going to be fast, and it doesn't know anything about data types (NumPy's dtype) so... my advice is to challenge whoever it is that doesn't like NumPy. Especially if you're using OpenCV — it depends on NumPy!
Here's an example of what you could do in pure Python:
def reshape(l, shape):
"""Reshape a list.
Example
-------
>>> l = [1,2,3,4,5,6,7,8,9]
>>> reshape(l, shape=(3, -1))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
"""
nrows, ncols = shape
if ncols == -1:
ncols = len(l) // nrows
if nrows == -1:
nrows = len(l) // ncols
array = []
for r in range(nrows):
row = []
for c in range(ncols):
row.append(l[ncols*r + c])
array.append(row)
return array

Python find convolution kernel if input image and output image is known

I have a problem with convolution kernel in python. It is about simple convolution operator. I have input matrix and output matrix. I want to find a possible convolution kernel with size(5x5). How to solve this problem with python, numpy or tensorflow ?
import scipy.signal as ss
input_img = np.array([[94, 166, 76, 106, 152, 232],
[48, 242, 30, 98, 46, 210],
[52, 60, 86, 60, 216, 248],
[52, 236, 116, 240, 224, 184],
[138, 160, 146, 254, 236, 252],
[94, 100, 224, 246, 152, 74]], dtype=float)
output_img = np.array([[15, 49, 23, 105, 0, 0],
[43,30, 108, 124, 0, 0],
[58, 120, 112, 92, 0, 0],
[73, 127, 118, 126, 0, 0],
[112, 123, 76, 37, 0, 0],
[0, 0, 0, 0, 0, 0]], dtype=float)
# I want to find this kernel
conv = np.zeros((5,5), dtype=int)
# So if I do convolution operator, output_img will resulting a value same as I defined above
output_img = ss.convolve2d(input_img, conv, padding='same')

As far as I understood, you need to reconstruct window weights by given input, output arrays and window size. This is possible, I think, especially, if input array (image) is sufficiently big.
Look at the code below:
import scipy.signal as ss
import numpy as np
source_dataset = np.random.rand(20, 10)
sample_convolution = np.diag([1, 1, 1])
output_dataset = ss.convolve2d(data, sample_convolution, mode='same')
conv_size = c.shape[0]
# Given output_dataset, source_datset, and conv_size we need to reconstruct
# window weights.
def reconstruct(data, output, csize):
half_size = int(csize / 2)
min_row_ind = half_size
max_row_ind = int(data.shape[0]) - half_size
min_col_ind = half_size
max_col_ind = int(data.shape[1]) - half_size
A = list()
b = list()
for i in np.arange(min_row_ind, max_row_ind, dtype=int):
for j in np.arange(min_col_ind, max_col_ind, dtype=int):
A.append(data[(i - half_size):(i + half_size + 1), (j - half_size):(j + half_size + 1)].ravel().tolist())
b.append(output[i, j])
if len(A) == csize * csize and np.linalg.matrix_rank(A) == csize * csize:
return (np.linalg.pinv(A)#np.array(b)[:, np.newaxis]).reshape(csize, csize)
if len(A) < csize*csize:
raise Exception("Insufficient data")
result = reconstruct(source_dataset, output_dataset, 3)
I got the following result
array([[ 1.00000000e+00, -1.77635684e-15, -1.11022302e-16],
[ 0.00000000e+00, 1.00000000e+00, -8.88178420e-16],
[ 0.00000000e+00, -1.22124533e-15, 1.00000000e+00]])
So, it works as expected; but definitely need to be improved to take into account edge effects, case when size of window is even etc.

Subtract 3 lists in a tuple from 1 tuple

What is the best way to do this? Looking to take the difference but not like this horrible way. For each A, B, C it is subtracted from subtract from
A = [500, 500, 500, 500, 5000]
B = [100, 100, 540, 550, 1200]
C = [540, 300, 300, 100, 10]
triples= [tuple(A),tuple(B), tuple(C)]
subtract_from = tuple([1234,4321,1234,4321,5555])
diff = []
for main in subtract_from:
for i in range(len(triples)):
for t in triples[i]:
diff[i].append(main-t)

Try something like this:
all_lists = [A, B, C]
[[i-j for i,j in zip(subtract_from,l)] for l in all_lists]
[
[734, 3821, 734, 3821, 555],
[1134, 4221, 694, 3771, 4355],
[694, 4021, 934, 4221, 5545]
]
It is the best practice of doing this. no need to import any library, just use builtins.

You could try using map and operator:
import operator
A = [500, 500, 500, 500, 5000]
B = [100, 100, 540, 550, 1200]
C = [540, 300, 300, 100, 10]
l = [A, B, C]
subtract_from = [1234,4321,1234,4321,5555]
diff = list((list(map(operator.sub, subtract_from , i)) for i in l))
print(diff)
# [[734, 3821, 734, 3821, 555], [1134, 4221, 694, 3771, 4355], [694, 4021, 934, 4221, 5545]]

First of all, if you want tuples, use tuples explicitly without converting lists. That being said, you should write something like this:
a = 500, 500, 500, 500, 5000
b = 100, 100, 540, 550, 1200
c = 540, 300, 300, 100, 10
vectors = a, b, c
data = 1234, 4321, 1234, 4321, 5555
diff = [
[de - ve for de, ve in zip(data, vec)]
for vec in vectors
]
If you want list of tuples, use tuple(de - ve for de, ve in zip(data, vec)) instead of [de - ve for de, ve in zip(data, vec)].

I think everyone else nails it with list comprehensions already so here's a few odd ones in cases if you are using a mutable lists and reusing it in an imperative style is acceptable style, then the following code can be done
A = [500, 500, 500, 500, 5000]
B = [100, 100, 540, 550, 1200]
C = [540, 300, 300, 100, 10]
subtract_from = (1234,4321,1234,4321,5555)
for i,x in enumerate(subtract_from):
A[i], B[i], C[i] = x-A[i], x-B[i], x-C[i]
# also with map
#for i,x in enumerate(zip(subtract_from,A,B,C)):
# A[i], B[i], C[i] = map(x[0].__sub__, x[1:])
diff = [A,B,C]
It's less elegant but more efficient*(...I have not done any benchmark for this claim)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Batch-wise loop over array Python to create new array without overwriting - python

In case you want to save the relevances, maybe batchsize = 500 relevances = np.zeros(sequences.shape) for i in range(0, sequences.shape[0], batchsize): batch = sequences[i:i+batchsize] relevances[i:i+batchsize, :, :] = lrp_model.lrp(batch)

Related

Consecutively split an array by the next max value

Shifting values with np.apply_along_axis returns inconsistent results

Make an array like numpy.array() without numpy

Python find convolution kernel if input image and output image is known

Subtract 3 lists in a tuple from 1 tuple

Categories

Resources