How to implement fast numpy array computation with multiple occurring slice indices? - python

I was recently wondering how I could bypass the following numpy behavior.
Starting with a simple example:
import numpy as np
a = np.array([[1,2,3,4,5,6,7,8,9,0], [11, 12, 13, 14, 15, 16, 17, 18, 19, 10]])
then:
b = a.copy()
b[:, [0,1,4,8]] = b[:, [0,1,4,8]] + 50
print(b)
...results in printing:
[[51 52 3 4 55 6 7 8 59 0]
[61 62 13 14 65 16 17 18 69 10]]
but if we include one index twice in the fancy index:
c = a.copy()
c[:, [0,1,4,4,8]] = c[:, [0,1,4,4,8]] + 50
print(c)
giving:
[[51 52 3 4 55 6 7 8 59 0]
[61 62 13 14 65 16 17 18 69 10]]
(in short: they do the same thing)
Could I instead have the addition executed twice for index 4?
Or, more generally: if a slice element i is given r times, can the expression above be applied r times, instead of numpy taking it into account only once? And what if we replace "50" by something that differs for every occurrence of i?
For my current code, I used:
w[p1] = w[p1] + D[pix]
where "pix" and "p1" are numpy arrays with dtype int, of the same length, in which some integers may appear multiple times.
(So one may have pix = [..., 1,1,1,2,2,3,...] at the same time as p1 = [..., 21,32,13,23,11,78,...]; when an index appears several times in p1, only one of the corresponding updates takes effect and the rest are discarded.)
Of course a for loop would solve the problem easily. The point is that both the integers and the sizes of the arrays are huge, so for-loops would cost far more than efficient numpy array routines. Any ideas, or links to existing documentation?
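(One way to get the accumulating behavior, offered as a sketch: the unbuffered `np.add.at` ufunc method applies the addition once per occurrence of an index, duplicates included.)

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 10]])

c = a.copy()
# Unlike the buffered c[:, idx] += 50, np.add.at performs the addition
# once per listed index, so the duplicated column 4 gets +50 twice.
np.add.at(c, (slice(None), [0, 1, 4, 4, 8]), 50)
print(c[0])   # [ 51  52   3   4 105   6   7   8  59   0]
```

For the `w[p1] = w[p1] + D[pix]` case this becomes `np.add.at(w, p1, D[pix])`, which also allows a different addend for every occurrence of an index.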

Assign values from small matrix to specified places in larger matrix

I would like to know if there exists a similar way of doing this (a Mathematica construct) in Python.
I have tried it in Python and it does not work. I have also tried it with numpy.put() and with two simple for loops. Both of these approaches work properly, but I find them very time-consuming with larger matrices (3000×3000 elements, for example).
The problem in Python:
import numpy as np
a = np.arange(0, 25, 1).reshape(5, 5)
b = np.arange(100, 500, 100).reshape(2, 2)
p = np.array([0, 3])
a[p][:, p] = b
which outputs the unchanged matrix a.
Perhaps you are looking for this:
a[p[...,None], p] = b
Array a after the above assignment looks like this:
[[100 1 2 200 4]
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[300 16 17 400 19]
[ 20 21 22 23 24]]
As documented in Integer Array Indexing, the two integer index arrays are broadcast together and iterated together, which effectively indexes the locations a[0,0], a[0,3], a[3,0], and a[3,3]. The assignment then writes element-wise to these locations of a, using the corresponding elements from the right-hand side.
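(An equivalent spelling, for what it's worth: `np.ix_` constructs the same broadcastable index arrays, which some find more readable. A small self-contained sketch:)

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
b = np.arange(100, 500, 100).reshape(2, 2)
p = np.array([0, 3])

# np.ix_(p, p) returns index arrays of shapes (2, 1) and (1, 2), which
# broadcast to select the 2x2 cross product of rows p and columns p.
a[np.ix_(p, p)] = b
```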

Input file to multiple matrices

I know there are related threads, but I've searched and none of them helped me with my problem.
I have an input text file with square matrices that looks as follows:
1 2 3
4 5 6
7 8 9
*
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
25 26 27 28 29
30 31 32 33 34
Now there could be more matrices following, all with a * in between them.
I want to separate the matrices (without using numpy!) and then use lists within lists to work with them (list comprehension might be useful, I guess); all the entries are integers. The first matrix would then look like [[1,2,3],[4,5,6],[7,8,9]] and the second like [[10,11,12,13,14],[15,16,17,18,19],[20,21,22,23,24],[25,26,27,28,29],[30,31,32,33,34]], and so on.
My plan was to separate the matrices into their own list entries and then (using the fact that the matrices are square, so the number of integers per row is easy to determine) use some list comprehension to convert the strings into integers. But I'm stuck at the very beginning.
matrix = open('input.txt','r').read()
matrix = matrix.replace('\n',' ')
list = matrix.split('* ')
Now if I print list I get ['1 2 3 4 5 6 7 8 9', '10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34']
The problem is that I'm now stuck with two strings instead of a list of integers.
Any suggestions?
mat_list = [[[int(num_str) for num_str in line.split()]
             for line in inner_mat.split('\n')]
            for inner_mat in open('input_mat.txt', 'r').read().split('\n*\n')]
[[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24], [25, 26, 27, 28, 29], [30, 31, 32, 33, 34]]]
Now that you have strings, split each string on whitespace and convert each small string to an integer:
newlist = [[int(nstr) for nstr in v.split()] for v in mylist]
# each entry is now a flat list of ints; since the matrices are square,
# it can then be chunked into rows to form the "matrix" (list of lists)
An additional point: do not use list as a variable name; that shadows the built-in list type. In the code above I used the name mylist instead.
However, note that the approach given by @Selcuk in his comment is a better way to solve your overall problem. It would be a little easier still to read the entire file into memory, split on the stars, and then split each line into its integers; that gives simpler code at the cost of more memory.
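(A sketch of the line-by-line variant, for comparison; the separator handling assumes each `*` sits on its own line, as in the sample input:)

```python
def read_matrices(path):
    """Parse a file of whitespace-separated integer matrices delimited by '*' lines."""
    matrices, current = [], []
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if line == '*':        # separator: close off the current matrix
                matrices.append(current)
                current = []
            elif line:             # data row: convert its tokens to ints
                current.append([int(tok) for tok in line.split()])
    if current:                    # no trailing '*' after the last matrix
        matrices.append(current)
    return matrices
```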

How to index a ndarray with another ndarray?

I am doing some machine learning stuff in python/numpy in which I want to index a 2-dimensional ndarray with a 1-D ndarray, so that I get a 1-D array with the indexed values.
I got it to work with an ugly piece of code, and I would like to know if there is a better way, because this just seems unnatural for such a nice language and module combination as python+numpy.
a = np.arange(50).reshape(10, 5) # Array to be indexed
b = np.arange(9, -1, -2) # Indexing array
print(a)
print(b)
print(a[b, np.arange(0, a.shape[1]).reshape(1,a.shape[1])])
#Prints:
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]
[40 41 42 43 44]
[45 46 47 48 49]]
[9 7 5 3 1]
[[45 36 27 18 9]]
This is exactly what I want (even though it is technically a 2-D ndarray), but it seems very complicated. Is there a neater, tidier way?
Edit:
To clarify: I do not actually want a 1-D array; that was poorly explained. I want one dimension with length 1, because that is what I need for processing it later, but that is easily achieved with a reshape() call. Sorry for the confusion; I mixed my actual code into the more general question.
You want a 1D array, yet you included a reshape call whose only purpose is to take the array from the format you want to a format you don't want.
Stop reshaping the arange output. Also, you don't need to specify the 0 start value explicitly:
result = a[b, np.arange(a.shape[1])]
You can just use np.diagonal to get what you want, with no need for reshape or extra indexing. The tricky part was identifying the pattern you want, which is simply the diagonal of the matrix a[b].
a = np.arange(50).reshape(10, 5) # Array to be indexed
b = np.arange(9, -1, -2) # Indexing array
print(np.diagonal(a[b]))
# [45 36 27 18 9]
As @user2357112 mentioned in the comments, the return value of np.diagonal is read-only. That is only a problem if you plan to modify the values afterward; if you just want to print them or use them for further indexing, it is fine.
As per the docs:
Starting in NumPy 1.9 it returns a read-only view on the original array. Attempting to write to the resulting array will produce an error.
In some future release, it will return a read/write view and writing to the returned array will alter your original array. The returned array will have the same type as the input array.
If you don’t write to the array returned by this function, then you can just ignore all of the above.
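(If the result does need to be writable, note that the fancy-indexing expression from the other answer already returns a fresh array, while the diagonal view just needs a `copy()`. A quick sketch:)

```python
import numpy as np

a = np.arange(50).reshape(10, 5)
b = np.arange(9, -1, -2)

diag = np.diagonal(a[b]).copy()       # copy() detaches the read-only view
direct = a[b, np.arange(a.shape[1])]  # fancy indexing returns a new array

diag[0] = -1     # both of these writes are fine;
direct[0] = -1   # neither touches the original a
```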

Why does numpy.concatenate work along axis=1 for small one dimensional arrays, but not larger ones?

I couldn't get my 4 arrays of year, day of year, hour, and minute to concatenate the way I wanted, so I decided to test several variations on shorter arrays than my data.
I found that it worked using method "t" from my test code:
import numpy as np
a = np.array([[1, 2, 3, 4, 5, 6]])
b = np.array([[11, 12, 13, 14, 15, 16]])
c = np.array([[21, 22, 23, 24, 25, 26]])
d = np.array([[31, 32, 33, 34, 35, 36]])
print(a)
print(b)
print(c)
print(d)
q = np.concatenate((a, b, c, d), axis=0)
# concatenation along 1st axis
print(q)
t = np.concatenate((a.T, b.T, c.T, d.T), axis=1)
# transpose each array before concatenation along 2nd axis
print(t)
x = np.concatenate((a, b, c, d), axis=1)
# concatenation along 2nd axis
print(x)
But when I tried this with the larger arrays it behaved the same as method "q".
I found an alternative approach of using vstack over here that did what I wanted, but I am trying to figure out why concatenation sometimes works for this, but not always.
Thanks for any insights.
Also, here are the outputs of the code:
q:
[[ 1 2 3 4 5 6]
[11 12 13 14 15 16]
[21 22 23 24 25 26]
[31 32 33 34 35 36]]
t:
[[ 1 11 21 31]
[ 2 12 22 32]
[ 3 13 23 33]
[ 4 14 24 34]
[ 5 15 25 35]
[ 6 16 26 36]]
x:
[[ 1 2 3 4 5 6 11 12 13 14 15 16 21 22 23 24 25 26 31 32 33 34 35 36]]
EDIT: I added method t to the end of a section of code that was already fixed with vstack, so you can compare how vstack works with this data while concatenate does not. Again, to clarify: I already found a workaround, but I don't know why concatenate behaves inconsistently.
Here is the code:
import numpy as np
BAO10m = np.genfromtxt('BAO_010_2015176.dat', delimiter=",", usecols=range(0, 5), dtype=[('h', int), ('year', int), ('day', int), ('time', int), ('temp', float)])
# 10 meter weather readings at BAO tower site for June 25, 2015
hourBAO = BAO10m['time'] // 100
minuteBAO = BAO10m['time'] % 100
# print(hourBAO)
# print(minuteBAO)
# time arrays
dayBAO = BAO10m['day']
yearBAO = BAO10m['year']
# date arrays
datetimeBAO = np.vstack((yearBAO, dayBAO, hourBAO, minuteBAO))
# t = np.concatenate((a.T, b.T, c.T, d.T), axis=1)  <- this gave desired results in simple tests
# not working for this data, use vstack instead, with transposition after stacking
print(datetimeBAO)
test = np.transpose(datetimeBAO)
# rotate array
print(test)
# this prints something that can be used for datetime
t = np.concatenate((yearBAO.T, dayBAO.T, hourBAO.T, minuteBAO.T), axis=1)
print(t)
# this prints a 1D array of all the year values, then all the day values, etc...
# but this method worked for the shorter arrays
The file I used can be found at this site.
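(A note on the likely cause, offered as an assumption since the behavior depends on the NumPy version in use: the test arrays a through d were built with double brackets and are 2-D, while genfromtxt field arrays such as yearBAO are 1-D. `.T` is a no-op on a 1-D array, and axis=1 does not exist for 1-D inputs; older NumPy silently treated it as axis 0, which matches the flattened output described above, while newer versions raise an error. A minimal demonstration:)

```python
import numpy as np

a2d = np.array([[1, 2, 3]])   # double brackets -> shape (1, 3)
a1d = np.array([1, 2, 3])     # genfromtxt-style field -> shape (3,)

assert a2d.T.shape == (3, 1)  # transpose rearranges a 2-D array
assert a1d.T.shape == (3,)    # but is a no-op on a 1-D array

# Promoting each 1-D field to a column restores the behavior of the
# small test case (np.vstack(...).T, as used above, does the same job):
t = np.concatenate((a1d[:, None], a1d[:, None]), axis=1)
assert t.shape == (3, 2)
```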

Mapping functions of 2D numpy arrays

I have a function foo that takes a NxM numpy array as an argument and returns a scalar value. I have a AxNxM numpy array data, over which I'd like to map foo to give me a resultant numpy array of length A.
Currently, I'm doing this:
result = numpy.array([foo(x) for x in data])
It works, but it seems like I'm not taking advantage of the numpy magic (and speed). Is there a better way?
I've looked at numpy.vectorize, and numpy.apply_along_axis, but neither works for a function of 2D arrays.
EDIT: I'm doing boosted regression on 24x24 image patches, so my AxNxM is something like 1000x24x24. What I called foo above applies a Haar-like feature to a patch (so, not terribly computationally intensive).
If NxM is big (say, 100×100), then the cost of iterating over A will be amortized into basically nothing.
Say the array is 1000 × 100 × 100. The loop itself contributes O(1000) overhead, but the cumulative cost of the function calls is O(1000 × 100 × 100), which is 10,000 times larger, so the Python-level loop is not the bottleneck. (My terminology is a bit wonky, but the point stands.)
I'm not sure, but you could try this:
result = numpy.empty(data.shape[0])
for i in range(len(data)):
    result[i] = foo(data[i])
You would save a bit of memory allocation by not building the intermediate list ... but the loop overhead would be greater.
Or you could write a parallel version of the loop and split it across multiple processes. That could be a lot faster, depending on how expensive foo is (the speedup has to offset the cost of shipping the data between processes).
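(A minimal sketch of that multiprocessing variant; `foo` below is a stand-in for the real patch function, and the pool size is arbitrary:)

```python
import numpy as np
from multiprocessing import Pool

def foo(patch):
    # stand-in for the real NxM -> scalar function
    return float(patch.sum())

def parallel_map(func, data, processes=2):
    # Each NxM slice is pickled and shipped to a worker; this only pays
    # off if func is expensive relative to that transfer cost.
    with Pool(processes) as pool:
        return np.array(pool.map(func, list(data)))

if __name__ == "__main__":
    data = np.arange(24).reshape(2, 3, 4)
    print(parallel_map(foo, data))   # sums of the two 3x4 slices
```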
You can achieve this by reshaping your 3D array to a 2D array with the same leading dimension, and wrapping foo in a function that works on 1-D arrays by reshaping them back to NxM. An example (using trace in place of foo):
import numpy as np

def apply2d_along_first(func2d, arr3d):
    a, n, m = arr3d.shape
    def func1d(arr1d):
        return func2d(arr1d.reshape((n, m)))
    arr2d = arr3d.reshape((a, n * m))
    return np.apply_along_axis(func1d, -1, arr2d)

A, N, M = 3, 4, 5
data = np.arange(A * N * M).reshape((A, N, M))
print(data)
print(apply2d_along_first(np.trace, data))
Output:
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
[[20 21 22 23 24]
[25 26 27 28 29]
[30 31 32 33 34]
[35 36 37 38 39]]
[[40 41 42 43 44]
[45 46 47 48 49]
[50 51 52 53 54]
[55 56 57 58 59]]]
[ 36 116 196]
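(Another option, assuming a NumPy recent enough that `np.vectorize` accepts the `signature` parameter: you can declare the 2-D core dimensions directly. Note that vectorize still loops in Python internally, so like apply_along_axis this is a readability win rather than a speed win:)

```python
import numpy as np

def foo(patch):
    # stand-in for the real NxM -> scalar function
    return np.trace(patch)

# signature='(n,m)->()' declares that foo consumes a 2-D core and
# returns a scalar, so vfoo maps over the leading A axis only.
vfoo = np.vectorize(foo, signature='(n,m)->()')

data = np.arange(3 * 4 * 5).reshape(3, 4, 5)
result = vfoo(data)   # the traces [36, 116, 196], matching the output above
```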
