Generate pairs from a numpy array in a specific pattern without loops - python

Let's consider an array [1,2,3] say, what I would like to generate is the list containing the pairs [[1,2], [1,3], [2,3]]. This can be done using itertools. However, I would like to produce them using pure numpy operations, and no loops or branching is allowed.
A close solution is provided here, but it generates all the possible pairs, instead of a particular fashion as in my case.
Can you please suggest a way to do that? The array will always be 1D.
Also, this is my first question on SE. If it requires any edit, please let me know.

Here is a one-liner using np.triu_indices:
>>> a = np.array([1, 2, 3])
>>> a[np.transpose(np.triu_indices(len(a), 1))]
array([[1, 2],
[1, 3],
[2, 3]])

Related

Quick way to iterate through two arrays python

I've had multiple scenarios, where i had to find a huge array's items in another huge array.
I usually solved it like this:
for i in range(0,len(arr1)):
for k in range(0,len(arr1)):
print(arr1[i],arr2[k])
Which works fine, but its kinda slow.
Can someone help me, how to make the iteration faster?
arr1 = [1,2,3,4,5]
arr2 = [4,5,6,7]
same_items = set(arr1).intersection(arr2)
print(same_items)
Out[5]: {4,5}
Sets hashes items so instead of O(n) look up time for any element it has O(1). Items inside need to be hashable for this to work. And if they are not, I highly suggest you find a way to make them hashable.
If you need to handle huge arrays, you may want to use Python's numpy library, which assists you with high-efficiency manipulation methods and avoids you to use loops at all in most cases.
if your array has duplicates and you want to keep them all:
arr1 = [1,2,3,4,5,7,5,4]
arr2 = [4,5,6,7]
res = [i for i in arr1 if i in arr2]
>>> res
'''
[4, 5, 7, 5, 4]
or using numpy:
import numpy as np
res = np.array(arr1)[np.isin(arr1, arr2)].tolist()
>>> res
'''
[4, 5, 7, 5, 4]

Nested array computations in Python using numpy

I am trying to use numpy in Python in solving my project.
I have a random binary array rndm = [1, 0, 1, 1] and a resource_arr = [[2, 3], 4, 2, [1, 2]]. What I am trying to do is to multiply the array element wise, then get their sum. As an expected output for the sample above,
output = 5 0 2 3. I find hard to solve such problem because of the nested array/list.
So far my code looks like this:
def fitness_score():
output = numpy.add(rndm * resource_arr)
return output
fitness_score()
I keep getting
ValueError: invalid number of arguments.
For which I think is because of the addition that I am trying to do. Any help would be appreciated. Thank you!
Numpy treats its arrays as matrices, and resource_arr is not a (valid) matrix. In your case a python list is more suitable:
def sum_nested(l):
tmp = []
for element in l:
if isinstance(element, list):
tmp.append(numpy.sum(element))
else:
tmp.append(element)
return tmp
In this function we check for each element inside l if it is a list. If so, we sum its elements. On the other hand, if the encountered element is just a number, we leave it untouched. Please note that this only works for one level of nesting.
Now, if we run sum_nested([[2, 3], 4, 2, [1, 2]]) we will get [5 4 2 3]. All that's left is multiplying this result by the elements of rndm, which can be achieved easily using numpy:
def fitness_score(a, b):
return numpy.multiply(a, sum_nested(b))
Numpy is all about the non-jagged arrays. You can do things with jagged arrays, but doing so efficiently and elegantly isnt trivial.
Almost always, trying to find a way to map your datastructure to a non-nested one, for instance, encoding the information as below, will be more flexible, and more performant.
resource_arr = (
[0, 0, 1, 2, 3, 3]
[2, 3, 4, 2, 1, 2]
)
That is, an integer denoting the 'row' each value belongs to, paired with an array of equal size of the values themselves.
This may 'feel' wasteful when coming from a C-style way of doing arrays (omg more memory consumption), but staying away from nested datastructures is almost certainly your best bet in terms of performance, and the amount of numpy/scipy ecosystem that will actually be compatible with your data representation. If it really uses more memory is actually rather questionable; every new python object uses a ton of bytes, so if you have only few elements per nesting, it is the more memory efficient solution too.
In this case, that would give you the following efficient solution to your problem:
output = np.bincount(*resource_arr) * rndm
I have not worked much with pandas/numpy so I'm not sure if this is most efficient way, but it works (atleast for the example you have shown):
import numpy as np
rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]
multiplied_output = np.multiply(rndm, resource_arr)
print(multiplied_output)
output = []
for elem in multiplied_output:
output.append(sum(elem)) if isinstance(elem, list) else output.append(elem)
final_output = np.array(output)
print(final_output)

Find the number of occurrences of a sequence

I'm looking for an efficient way (maybe numpy?) to count the number of occurrences of a sequence of numbers in a 2D array.
e.g
count_seq_occ([2,3],
array([[ 2, 3 , 5, 2, 3],
[ 5, 2, 3],
[ 1]]))
Will output the result 3.
The Three way nested loop option is clear, but maybe a better approach exists?Thanks
EDITED
KMP search
Try using this code and editing it to search in every vector of the matrix:
http://code.activestate.com/recipes/117214/
It's a KMP (Knuth-Morris-Pratt) python function for finding a pattern in a text or list. You can slightly optimize it by creating the search pattern's shifts array once, then running the rest of the algorithm on every 1D sub-array.
Alternative
How about converting the array into a string representation and counting occurrences in the string?
repr(your_array).count("2, 3")
Notice: you should really format the representation or counted substring to both match the same style. For example, sometimes a repr() of a numpy array would return something like this inside: "1., 2., 3., " and you might want to fix this somehow.
Alternatively you can flatten the array and join all rows into a string, but be careful and add an extra unique character after every row.
The method could vary a bit regarding how you turn it into a string, but it should be fast enough. Searching for substrings in a string is O(n) time so you shouldn't worry about that. The only possible reason not to use this method would be if you don't want to allocate the temporary string object if the array is very large.
This is one way, but I hope there is a better solution. It would be helpful if you showed us your nested loop and provided some data for benchmarking.
from itertools import chain
x = [2, 3]
A = np.array([[ 2, 3, 5, 2, 3],
[ 5, 2, 3],
[ 1]])
arr = list(chain.from_iterable(A))
res = sum(arr[i:i+len(x)] == x for i in range(len(arr))) # 3

Dynamic matrix in Python

I'm new to Python and I need a dynamic matrix that I can manipulate adding more columns and rows to it. I read about numpy.matrix, but I can't find a method in there that does what I mentioned above. It occurred to me to use lists but I want to know if there is a simpler way to do it or a better implementation.
Example of what I look for:
matrix.addrow ()
matrix.addcolumn ()
matrix.changeValue (0, 0, "$200")
Am I asking for too much? If so, any ideas of how to implement something like that? Thanks!
You can do all of that in numpy (np.concatenate for example) or native python (my_list.append()). Which one is more efficient will depend on what else your program will do: numpy will be probably less efficient if all you are doing is adding / changing values one at a time, or do a lot of column 'adding' or 'removing'. However if you do matrix or column operations, the overhead of adding new columns to a numpy array maybe offset by the vectorized computation speed offered by numpy. So pick which ever you prefer, and if speed is an issue, then you need to experiment yourself with both approaches...
There are several ways to represent matrices in Python. You can use List of lists or numpy arrays. For example if you were to use numpy arrays
>>> import numpy as np
>>> a = np.array([[1,2,3], [2,3,4]])
>>> a
array([[1, 2, 3],
[2, 3, 4]])
To add a row
>>> np.vstack([a, [7,8,9]])
array([[1, 2, 3],
[2, 3, 4],
[7, 8, 9]])
To add a column
>>> np.hstack((a, [[7],[8]]))
array([[1, 2, 3, 7],
[2, 3, 4, 8]])

How to simplify a for loop in python

I have this piece of Python code that fills up a 2d matrix in a for loop
img=zeros((len(bins_x),len(bins_y)))
for i in arange(0,len(ix)):
img[ix[i]][iy[i]]=dummy[i]
Is it possible to use a vectorial operation for the last two lines of code? Is there also something that might speed up the calculation?
If ix, iy are index sequences:
img[ix, iy] = dummy
It might be useful to use numpy. In particular, the reshape method might be useful. Here is an example (adapted from the second link):
>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6])
>>> np.reshape(a, (3,2))
array([[1, 2],
[3, 4],
[5, 6]])

Categories

Resources