Struggling with numpy's where() - python

I somehow got mixed up with primitive AI and came across this code, which I am having a hard time understanding.
I read some sites, but none seem to have the answer I am looking for. :(
Could anyone explain the np.where() function in this scenario?
It occurred to me that this line of code makes child_pos an empty 2D array
if curr_node.get_curr_child() == 0,
but I am not sure... Glad for any response.
The code in question is:
child_pos = np.where(np.asarray(curr_node.get_curr_child()) == 0)[0][0]

Setting your code aside for a moment: np.where returns the positions of the values that match the condition you pass it.
For example:
Let's assume
matrix = np.array([[1., 1., 1.],
                   [1., 0., 1.],
                   [1., 1., 0.]])
If we were to run np.where(matrix == 0) what we would get is
(array([1, 2], dtype=int64),
 array([1, 2], dtype=int64))
Which basically gives you the row/column positions of the value 0 in the original 2-dimensional array. The first array represents the row positions and the second array represents the column positions.
This logic extends to higher/lower dimensions as well.
Returning to your code: you turn the result of get_curr_child() into a NumPy array, and then you fetch the first entry of the first index array that np.where returns, i.e. the first position at which the value is 0.
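For instance, if get_curr_child() returned a flat list of child slots (just a guess at your data; children below is a made-up stand-in), the line would behave like this:
import numpy as np

children = [3, 0, 7, 0]  # hypothetical stand-in for curr_node.get_curr_child()
result = np.where(np.asarray(children) == 0)
print(result)        # (array([1, 3]),) -- a 1-tuple of index arrays for 1-D input
child_pos = result[0][0]
print(child_pos)     # 1 -- the first position holding a 0
Note that if no element equals 0, result[0] is empty and the [0][0] indexing raises an IndexError rather than producing an empty array.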

Related

Calculate the mean over a mixed data structure

I have a list of lists that looks something like:
data = [
    [1., np.array([2., 3., 4.]), ...],
    [5., np.array([6., 7., 8.]), ...],
    ...
]
where each of the internal lists is the same length and contains the same data type/shape at each entry. I would like to calculate the mean over corresponding entries and return something of the same structure as the internal lists. For example, in the above case (assuming only two entries) I want the result to be:
[3., np.array([4., 5., 6.]), ...]
What is the best way to do this with Python?
data is a list, so a list comprehension seems like a natural option. And since it's jagged, it wouldn't benefit from being wrapped in an ndarray anyway, so a list comp would still be the best option, in my opinion.
Anyway, use zip() to "transpose" data and call np.mean() in a loop to find the mean along the first axis.
[np.mean(x, axis=0) for x in zip(*data)]
# [3.0, array([4., 5., 6.]), array([[2., 2.], [2., 2.]])]
If you have a list structured exactly like the one shown in the example, you can do it with the following code.
First we declare some variables to store our results:
number_sum = 0
list_sum = np.zeros(3)  # float zeros, one slot per entry of the inner arrays
It is important to initialize list_sum to zeros of the right length: if the inner arrays contain 5 elements, it should be list_sum = np.zeros(5). Using float zeros (rather than np.array([0, 0, 0]), which is integer-typed) keeps the float sums from being truncated.
The next step is to sum all the elements in data. First we add the scalar values, and then we add each element of the array, as follows:
for number, nparray in data:
    number_sum += number
    for index, item in enumerate(nparray):
        list_sum[index] += item
Since we know how data is structured (each entry consists of a scalar and an np.array), we can do the addition that way. Be careful with the computational cost, though: with longer arrays the nested loops add up.
Finally, you can check that if you divide the sum of the elements by the length of data you get the desired value:
print(number_sum/len(data))
print(list_sum/len(data))
Now you just have to add those two new values to a new list. I hope it helps, greetings and good luck!
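Putting it all together, a minimal end-to-end version of the same idea (still assuming every inner list is exactly [number, nparray], with arrays of length 3):
import numpy as np

data = [
    [1., np.array([2., 3., 4.])],
    [5., np.array([6., 7., 8.])],
]

number_sum = 0.0
list_sum = np.zeros(3)      # float zeros, matching the inner array length
for number, nparray in data:
    number_sum += number
    list_sum += nparray     # element-wise sum, replacing the index loop

result = [number_sum / len(data), list_sum / len(data)]
print(result)               # [3.0, array([4., 5., 6.])]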
The following works:
import numpy as np

data = [
    [1., np.array([2., 3., 4.]), np.array([[1., 1.], [1., 1.]])],
    [5., np.array([6., 7., 8.]), np.array([[3., 3.], [3., 3.]])],
]

number_of_samples = len(data)
number_of_elements = len(data[0])

means = []
for ielement in range(number_of_elements):
    mean_list = []
    for isample in range(number_of_samples):
        mean_list.append(data[isample][ielement])
    mean_list = np.stack(mean_list)
    mean = mean_list.mean(axis=0)
    means.append(mean)
print(means)
but it is a bit ugly, nests for loops, and does not seem very pythonic. Any improvements are welcome.

How to slice a np.array using a list of unknown length?

I am probably using the wrong names/notation (and an answer probably exists on SO, but I can't find it). Please help me clarify so that I can update the post, and help people like me in the future.
I have an array A of unknown dimension n, and a list of indexes of unknown length l, where l<=n.
I want to be able to select the slice of A that corresponds to the indexes in l. I.e. I want:
A = np.zeros([3,4,5])
idx = [1,3]
B = # general command I am looking for
B_bad = A[idx] # shape = (2,4,5), not what I want!
B_manual = A[idx[0], idx[1]] # shape = (5,), what I want, but I need a general expression
# it is okay that the indexes are in order i.e. 0, 1, 2, ...
You need a tuple:
>>> A[tuple(idx)]
array([0., 0., 0., 0., 0.])
>>> A[tuple(idx)].shape
(5,)
Indexing with a list doesn't have the same meaning. See numpy's documentation on indexing for more information.
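For comparison, here is the difference between the two indexing styles on the example above:
import numpy as np

A = np.zeros([3, 4, 5])
idx = [1, 3]

print(A[tuple(idx)].shape)  # (5,)      -- equivalent to A[1, 3]
print(A[idx].shape)         # (2, 4, 5) -- fancy indexing: rows 1 and 3 along axis 0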

More pythonic way of getting the mean value of an array

I am still having trouble adjusting to "more pythonic" ways of writing code sometimes... Right now I am iterating over some values (x). I have many arrays, and I always compare the first value of all the arrays, then the second value, and so on. In short: I want the mean value of all the entries of the arrays, by position in the array.
mean_x = []
sum_mean_x = []
for i in range(0, int_points):
    for j in range(0, len(x)):
        mean_x.append(x[j][i])
    sum_mean_x.append(sum(mean_x) / len(x))
    mean_x = []
I am pretty sure this can be done more beautifully. I know I could change the second-to-last line to something like sum_mean_x.append(mean_x.mean), but I guess I am missing some serious magic this way.
Use the numpy package for numeric processing. Suppose you have the following three lists in plain Python:
a1 = [1., 4., 6.]
a2 = [3., 7., 3.]
a3 = [2., 0., -1.]
And you want to get the mean value for each position. Arrange the vectors in a single array:
import numpy as np
a = np.array([a1, a2, a3])
Then you can get the per-column mean like this:
>>> a.mean(axis=0)
array([ 2. , 3.66666667, 2.66666667])
It sounds like what you're trying to do is treat your list of lists as a 2D array where each list is a row, and then average each column.
The obvious way to do this is to use NumPy, make it an actual 2D array, and just call mean by columns. See simleo's answer, which is better than what I was going to add here. :)
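For reference, that NumPy version is a one-liner (using the same sample data as below):
import numpy as np

arrs = [[1., 2., 3.], [0., 0., 0.], [2., 4., 6.]]
print(np.array(arrs).mean(axis=0))  # [1. 2. 3.]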
But if you want to stick with lists of lists, going by column effectively means transposing, and that means zip:
>>> from statistics import mean
>>> arrs = [[1., 2., 3.], [0., 0., 0.], [2., 4., 6.]]
>>> column_means = [mean(col) for col in zip(*arrs)]
>>> column_means
[1.0, 2.0, 3.0]
That statistics.mean is only in the stdlib in 3.4+, but it's based on stats on PyPI, and if your Python is too old even for that, you can write it on your own. Getting the error handling right on the edge cases is tricky, so you probably want to look at the code from statistics, but if you're only dealing with values near 1, you can just do it the obvious way:
def mean(iterable):
    total, length = 0.0, 0
    for value in iterable:
        total += value
        length += 1
    return total / length
ar1 = [1,2,3,4,5,6]
ar2 = [3,5,7,2,5,7]
means = [ (i+j)/2.0 for (i,j) in zip(ar1, ar2)]
print(means)
You mean something like
import numpy as np

ar1 = [1, 2, 3, 4, 5, 6]
ar2 = [3, 5, 7, 2, 5, 7]
mean_list = []
for i, j in zip(ar1, ar2):
    mean_list.append(np.array([i, j]).mean())
print(mean_list)
[2.0, 3.5, 5.0, 3.0, 5.0, 6.5]

Efficient way to drop a column from a Numpy array?

If I have a very large numpy array with one useless column, how could I drop it without creating a copy of the original array?
np.delete(my_np_array, 0, 1)
The above code will return a copy of the array without the zero-th column. But instead I would like to simply delete that column from my_np_array since I don't need it. For very large datasets, the memory management becomes important and copying may not be an option.
If memory is the main concern, what you can do is move columns around within your array so that the unneeded column ends up at the very end, then use ndarray.resize, which modifies the array in place, to shrink it down and discard the final column.
You cannot simply remove the first column of an array in-place using the provided API, and I suspect it is because of the memory layout of an ndarray that maps multidimensional indexing to unidimensional byte-oriented addressing within blocks of contiguous memory.
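Note that a plain slice already gives you the array minus a column without copying any data; the catch is that the view keeps the whole original buffer alive, so no memory is reclaimed:
import numpy as np

A = np.arange(12).reshape(3, 4)
B = A[:, 1:]                   # basic slicing returns a view, not a copy
print(np.shares_memory(A, B))  # True: B reuses A's buffer, so nothing is freed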
The following example copies the last column into the first, compacts the rows (a plain resize would shift data across row boundaries, since resize only truncates the flat buffer), and then shrinks the array, immediately purging the memory of one column. So it removes the obsolete column from memory completely, at the cost of changing your column order.
D1, D2 = A.shape
A[:, 0] = A[:, D2 - 1]  # keep the last column's data by moving it into column 0
for i in range(1, D1):  # compact each row's first D2-1 entries into the flat buffer
    A.flat[i * (D2 - 1):(i + 1) * (D2 - 1)] = A[i, :D2 - 1].copy()
A.resize((D1, D2 - 1), refcheck=False)
A.shape
# => would be (5, 4) if the shape was initially (5, 5), for example
If you use slicing, numpy won't make a copy; in other words:
import numpy
a = numpy.array([1, 2, 3, 4, 5])
b = a[1:]  # view of the elements from the second to the last, NOT a copy
b[0] = 12  # change the first element of b, i.e. the second element of a
print(a)
will print [ 1 12  3  4  5].
If you need to delete an element in the middle however a single slicing won't work.
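In that case you end up stitching two slices together, e.g. with np.concatenate, which necessarily allocates a new array:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.concatenate((a[:2], a[3:]))  # drop the element at index 2; this copies
print(b)  # [1 2 4 5]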
NumPy arrays have a fixed size, so they can't be resized without creating an intermediate copy.
How to remove specific elements in a numpy array
Creating a view with slicing and making a copy of that is probably the fastest you can do.
In [804]: a = np.ones((2,2))

In [805]: a
Out[805]:
array([[ 1.,  1.],
       [ 1.,  1.]])

In [806]: np.resize(a, (3,2))
Out[806]:
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

In [807]: a  # a would now be (3, 2) if np.resize worked in place
Out[807]:
array([[ 1.,  1.],
       [ 1.,  1.]])

clearing elements of numpy array

Is there a simple way to clear all elements of a numpy array? I tried:
del arrayname
This removes the array completely. I am using this array inside a for loop that iterates thousands of times, so I prefer to keep the array but populate it with new elements every time.
I tried numpy.delete, but I don't see how its subarray specification applies to my requirement.
Edit:
The array size is not going to be the same.
I allocate the space, inside the loop at the beginning, as follows. Please correct me if this is a wrong way to go about:
arrname = arange(x*6).reshape(x,6)
I read a dataset and construct this array for each tuple in the dataset. All I know is the number of columns is going to be the same but not the number of rows. For example, the first time I might need an array of size (3,6), for the next tuple as (1,6) and the next time as (4,6) and so on. The way I populate the array is as follows:
arrname[:,0] = lstname1
arrname[:,1] = lstname2
...
In other words, the columns are filled from lists constructed from the tuples. So, before the next loop begins, I want to clear the array's elements and make it ready for the next iteration, since I don't want remnants from the previous loop mixing with the current contents.
I'm not sure what you mean by clear; the array will always have some values stored in it, but you can set all of them to a given value, for example:
>>> A = numpy.array([[1, 2], [3, 4], [5, 6]], dtype=float)
>>> A
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
>>> A.fill(0)
>>> A
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> A[:] = 1.
>>> A
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])
Update
First, your question is very unclear. The more effort you put into writing a good question the better answers you'll get. A good question should make it clear to us what you're trying to do and why. Also example data is very helpful, just a small amount, so we can see exactly what you're trying to do.
That being said, it seems like maybe you should just create a new array for each iteration. Creating arrays is pretty fast, and it's not clear why you would want to reuse an array when the size and contents need to change. If you're trying to reuse it for performance reasons, you're probably not going to see any measurable difference; resizing arrays is not noticeably faster than creating new ones. You can create a new array by calling numpy.zeros((X, 6)).
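In other words, something along these lines, with the row counts (3, 1, 4) standing in for whatever your dataset produces:
import numpy as np

for x in (3, 1, 4):             # hypothetical row counts, one per tuple
    arrname = np.zeros((x, 6))  # a fresh array each time; nothing to clear
    # ... fill arrname[:, 0], arrname[:, 1], ... from your lists as before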
Also in your question you say:
the columns are filled from lists constructed from the tuples
If your data is already stored as a list of tuples, you can use numpy.array to convert it to an array. You don't need to go to the trouble of creating an array and filling it. For example, if I wanted to get a (2, 3) array from a list of tuples I would do:
data = [(0, 0, 1), (0, 0, 2)]
A = numpy.array(data)
# or if the data is stored like this
data = [(0, 0), (0, 0), (1, 2)]
A = numpy.array(data).T
Hope that helps.
With a wag of the finger for possible premature optimization, I will offer some thoughts:
You say you don't want any remnants left over from previous iterations. From your code it looks like you populate each new element column by column, for each of the known number of columns, so "left over" values don't look like a problem. Consider:
Using arange and reshape serves no purpose here; use np.empty((n, 6)). It is faster than ones or zeros by a hair.
You could alternatively construct your new array from the constituent lists.
See:
lstname1 = np.arange(3)
lstname2 = 22 * np.arange(3)

np.vstack((lstname1, lstname2)).T
# returns
array([[ 0,  0],
       [ 1, 22],
       [ 2, 44]])

# or
np.hstack((lstname1[:, np.newaxis], lstname2[:, np.newaxis]))
array([[ 0,  0],
       [ 1, 22],
       [ 2, 44]])
Lastly, if you are really, really concerned about speed, you could allocate the largest expected size up front (or, if it is not known, keep track of the largest size requested so far and, whenever a larger one comes in, use np.empty((rows, cols)) to grow the buffer).
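A rough sketch of that grow-only idea (the buffer and function names here are made up):
import numpy as np

_buf = np.empty((0, 6))            # grows to the largest size seen, never shrinks

def workspace(rows):
    global _buf
    if rows > _buf.shape[0]:       # reallocate only when a new maximum is requested
        _buf = np.empty((rows, 6))
    return _buf[:rows]             # a view into the buffer; no allocation here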
Then, at each iteration, you create a view of the larger matrix with just the number of rows you want. This lets numpy reuse the same buffer space instead of allocating on every iteration. Notice:
In [36]: big = np.vstack((lstname1, lstname2)).T

In [37]: smaller = big[:2]

In [38]: smaller[:, 1] = 33

In [39]: smaller
Out[39]:
array([[ 0, 33],
       [ 1, 33]])

In [40]: big
Out[40]:
array([[ 0, 33],
       [ 1, 33],
       [ 2, 44]])
Note: these are suggestions that fit your expanded, clarified question rather than your earlier question about "clearing" the array. Even in the latter example, you could easily say smaller.fill(0) to allay concerns, depending on whether you reliably reassign all elements of the array in your iterations.
If you want to keep the array allocated, and with the same size, you don't need to clear the elements. Simply keep track of where you are, and overwrite the values in the array. This is the most efficient way of doing it.
I would simply begin putting the new values into the array.
But if you insist on clearing out the array, try making a new one of the same size using zeros or empty.
>>> A = numpy.array([[1, 2], [3, 4], [5, 6]])
>>> A
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> A = numpy.zeros(A.shape)
>>> A
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
