Replace elements in a matrix (Need help to make this faster) - python

C is composed of elements of the array B, and I want to replace each of them with the corresponding element of A.
#Program Block starts
import numpy as np
A= np.array([1, 2, 3, 4, 5])
B= np.array([1, 5, 3, 9, 15] )
# I have a 3*3 matrix
C = [[0 for x in range(3)] for x in range(3)]
C[0][:]=[1,5,3]
C[1][:]=[7,9,15]
C[2][:]=[2,9,15]
flag=(A==B).astype(int) # comparing for equality of 2 arrays A and B, and storing as binary
C_new=np.copy(C)
flag_ind=[i for i, e in enumerate(flag) if e==0] # storing the indices where A and B differ
for x in flag_ind:
    C_new[C_new==B[x]]=A[x]
The output will be C_new = [[1, 2, 3], [7, 4, 5], [2, 4, 5]]
The actual sizes of A and B are ~600000 and the size of C is 4000000*4. The simulation takes ~14 hrs. If there is a way to do the same operation faster, kindly let me know.

This way you are scanning the entire C array once for every position where A and B differ.
What I suggest is creating a dictionary that maps the B values to the A values, so that you can look up each replacement in approximately constant time.
This is what I did; it took 9s to run with arrays of the same size as you specified.
A_dict = dict((k, v) for k, v in zip(B, A) if k != v)
map_c = np.vectorize(lambda x: A_dict.get(x, x))
C_new = map_c(C)
First I created the dictionary that maps every value from B to its counterpart in A whenever the two differ, then I created the function that applies this dictionary to the C array.
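Since np.vectorize is documented as a convenience wrapper that still loops in Python, a fully vectorized lookup may be worth trying. The following is only a sketch, not the answerer's method: it assumes the values in B are unique, and the helper name replace_values is made up for illustration. It sorts B once and uses np.searchsorted to find each element of C in it.

```python
import numpy as np

def replace_values(C, B, A):
    """Replace every element of C that occurs in B by the matching element of A.

    Hypothetical helper; assumes the values in B are unique.
    """
    C = np.asarray(C)
    order = np.argsort(B)            # sort B once so it can be binary-searched
    B_sorted = B[order]
    A_sorted = A[order]              # keep A aligned with the sorted B
    pos = np.searchsorted(B_sorted, C)
    pos = np.clip(pos, 0, len(B_sorted) - 1)   # guard values larger than max(B)
    found = B_sorted[pos] == C       # True where the element of C really is in B
    return np.where(found, A_sorted[pos], C)

A = np.array([1, 2, 3, 4, 5])
B = np.array([1, 5, 3, 9, 15])
C = [[1, 5, 3], [7, 9, 15], [2, 9, 15]]
C_new = replace_values(C, B, A)
```

Unlike the original loop, this applies every replacement in a single pass, so a value that has already been replaced is never replaced a second time.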

Related

Fill an array using the values of another array as the indices. If an index is repeated, prioritize according to a parallel array

Description
I have an array a with N integer elements that range from 0 to M-1. I have another array b with N positive numbers.
Then, I want to create an array c with M elements. The i-th element of c should be the index of the element of a that has a value of i.
If more than one of these indices existed, then we take the one with a higher value in b.
If none existed, the i-th element of c should be -1.
Example
N = 5, M = 3
a = [2, 1, 1, 2, 2]
b = [1, 3, 5, 7, 3]
Then, c should be...
c = [-1, 2, 3]
My Solution 1
A possible approach would be to initialize an array d that stores the current max and then loop through a and b updating the maximums.
c = -np.ones(M)
d = np.zeros(M)
for i, (idx, val) in enumerate(zip(a, b)):
    if d[idx] <= val:
        c[idx] = i
        d[idx] = val
This solution is O(N) in time but requires iterating the array with Python, making it slow.
My Solution 2
Another solution would be to sort a using b as the key. Then, we can just assign a indices to c (max elements will be last).
sort_idx = np.argsort(b)
a_idx = np.arange(len(a))
a = a[sort_idx]
a_idx = a_idx[sort_idx]
c = -np.ones(M)
c[a] = a_idx
This solution does not require Python loops but requires sorting b, making it O(N*log(N)).
Ideal Solution
Is there a solution to this problem in linear time without having to loop the array in Python?
AFAIK, this cannot currently be implemented in O(n) with NumPy (mainly because the index table is both read and written depending on the values of another array). Note that np.argsort(b) can theoretically be implemented in O(n) using a radix sort, but such a sort is not implemented yet in NumPy (it would not be much faster in practice due to the bad cache locality of the algorithm on big arrays).
One solution is to use Numba to speed up your algorithmically-efficient solution. Numba uses a JIT compiler to speed up loops. Here is an example (working with np.int32 types):
import numpy as np
import numba as nb
M = 3  # number of output slots; compute reads it as a global constant
@nb.njit('int32[:](int32[:], int32[:])')
def compute(a, b):
    c = np.full(M, -1, dtype=np.int32)
    d = np.zeros(M, dtype=np.int32)
    for i, (idx, val) in enumerate(zip(a, b)):
        if d[idx] <= val:
            c[idx] = i
            d[idx] = val
    return c
a = np.array([2, 1, 1, 2, 2], dtype=np.int32)
b = np.array([1, 3, 5, 7, 3], dtype=np.int32)
c = compute(a, b)
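If Numba is not available, a loop-free variant in plain NumPy is possible, although it is still O(N*log(N)) because it sorts. This is only a sketch under that assumption, and the function name argmax_per_group is made up for illustration. It sorts by a with ties broken by b using np.lexsort, then keeps the last entry of every run of equal values in a, so it does not depend on the order in which repeated indices are written.

```python
import numpy as np

def argmax_per_group(a, b, M):
    """For each i in range(M), return the index j with a[j] == i and largest b[j], or -1.

    Hypothetical helper, O(N log N) because of the sort.
    """
    order = np.lexsort((b, a))       # primary key: a, secondary key: b (ascending)
    a_sorted = a[order]
    # Mark the last element of every run of equal values in a_sorted
    is_last = np.ones(len(a), dtype=bool)
    is_last[:-1] = a_sorted[1:] != a_sorted[:-1]
    c = np.full(M, -1, dtype=np.intp)
    c[a_sorted[is_last]] = order[is_last]   # each target index is written only once
    return c

a = np.array([2, 1, 1, 2, 2])
b = np.array([1, 3, 5, 7, 3])
c = argmax_per_group(a, b, 3)
```

Because np.lexsort is stable, ties in b go to the later index, which matches the `<=` comparison in the loop version.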

compute matrix from list and two numbers that powers the elements

I'm trying to define a function. It should take a list of numbers and two additional numbers that give the range of exponents each element of the list is raised to, and compute the resulting matrix when called from the command line.
For example, if I enter powers([2,3,4],0,2) in the command line, the output should be a 3x3 matrix with the first row [2^0,2^1,2^2], the second row [3^0,3^1,3^2] and the third row [4^0,4^1,4^2].
It should look something like:
input: powers([2,3,4],0,2)
output: [[1, 2, 4],[1,3,9],[1,4,16]]
Does anyone know how to do something like that without importing any additional package into Python?
So far I have
def powers(C,a,b):
    for c in C:
        matrix=[]
        for i in range(a,b):
            c = c**i
            matrix.append(c)
    print(matrix)
But that only gives me one row of ones.
In your outer loop, you're emptying the matrix in each iteration. In your inner loop you're appending the powers directly to the matrix, when you should instead create a sub-list and append the numbers to it, then, append the sub-list to the matrix. All you need for this is a simple list comprehension:
def powers(C, a, b):
    matrix = [[c ** i for i in range(a, b + 1)] for c in C]
    return matrix
Test:
>>> powers([2, 3, 4], 0, 2)
[[1, 2, 4], [1, 3, 9], [1, 4, 16]]
The range is range(a, b + 1) because Python's range stops one step before the end (it doesn't include the end), so to include b use b + 1.
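For comparison, the same fix can be written with explicit loops, which follows the description above step by step: one sub-list per element of C, appended to the matrix once it is complete. This is only an illustrative sketch; the name powers_loop is made up here to keep it apart from the comprehension version.

```python
def powers_loop(C, a, b):
    matrix = []                      # created once, outside both loops
    for c in C:
        row = []                     # fresh sub-list for each element of C
        for i in range(a, b + 1):    # b + 1 so that the exponent b is included
            row.append(c ** i)       # c itself is never overwritten
        matrix.append(row)
    return matrix

result = powers_loop([2, 3, 4], 0, 2)
```

Note that the power is appended directly instead of being assigned back to c, which is what turned every entry into 1 in the original attempt.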

User defined tie breaker for argsort() in numpy

I have two arrays v and c (can read as value and cost).
I need to perform argsort() on v such that if 2 elements in v are the same, then they need to be sorted according to their corresponding elements in c.
Example
v = [4,1,4,4] # Here the 0th, 2nd and 3rd elements are equal
c = [5,0,30,10]
numpy.argsort(v) = [1,0,2,3] # equal values sorted by index
Required output
[1,0,3,2] # c[0] < c[3] < c[2]
How to achieve this in Python?
The function argsort receives an order parameter, from the docs:
When a is an array with fields defined, this argument specifies which
fields to compare first, second, etc.
So you could create a structured array from the two values, and then pass the fields in order:
import numpy as np
v = [4, 1, 4, 4]
c = [5, 0, 30, 10]
s = np.array(list(zip(v, c)), dtype=[('value', 'i4'), ('cost', 'i4')])
result = np.argsort(s, order=['value', 'cost'])
print(result)
Output
[1 0 3 2]
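Another option that avoids building a structured array is np.lexsort, which sorts by several keys at once. A minimal sketch with the same data follows; keep in mind that np.lexsort treats the last key in the tuple as the primary one, so v goes last and c acts as the tie breaker.

```python
import numpy as np

v = [4, 1, 4, 4]
c = [5, 0, 30, 10]

# Keys are given from least to most significant: c breaks ties, v is the primary key
result = np.lexsort((c, v))
print(result)
```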

Loop over clump_masked indices

I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of each value as well, so I can calculate its new value based on its neighbours. Enumerate (logically) returns 0, 1, etc. instead of the indices I need.
Here's the solution I came up with.
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])
It is a very ugly method i.m.o. and I think there has to be a better way to do this. Any suggestions?
Thanks!
EDIT:
I figured out a better way to achieve what I think you want to do. This code takes every window of 5 elements and computes its (masked) average, then uses those values to fill the gaps in the original array. If an index does not have any unmasked value close enough, it is simply left masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
                  constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
    data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
    mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
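On NumPy 1.20 or later, the manual stride arithmetic can probably be replaced by sliding_window_view from the same stride_tricks module. The sketch below is a variation on the code above, not a drop-in replacement: it assumes that NumPy version, and it deliberately keeps the averages as floats instead of casting them back to integers.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SMOOTH_MARGIN = 2
k = 2 * SMOOTH_MARGIN + 1            # window length

x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])

# Pad both ends so every element gets a full window; the padding is masked
pad_data = np.pad(x.data, SMOOTH_MARGIN, mode='constant')
pad_mask = np.pad(np.ma.getmaskarray(x), SMOOTH_MARGIN, mode='constant',
                  constant_values=True)

# One row per element of x, holding that element's window of k values
windows = np.ma.array(data=sliding_window_view(pad_data, k),
                      mask=sliding_window_view(pad_mask, k))
x_avg = windows.mean(axis=1)         # masked mean ignores masked neighbours

# Fill only positions that were masked and have at least one unmasked neighbour
fill_mask = ~np.ma.getmaskarray(x_avg) & np.ma.getmaskarray(x)
result = x.astype(float)
result[fill_mask] = x_avg[fill_mask]
print(result)
```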
The originally posted code has a few errors. First, it both reads and writes values from y_filtered in the loop, so the results for later indices are affected by the previous iterations; this could be fixed by reading from a copy of the original y_filtered. Second, [i-2:i+2] should probably be [max(i-2, 0):i+3], so that the window is symmetric and never starts before index zero.
You could do this:
from itertools import chain
import numpy as np
import numpy.ma as ma
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
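To also address the first point (reading and writing the same array), the loop can read from an untouched copy. The following is a small self-contained sketch; the sample data and the name original are made up for illustration.

```python
from itertools import chain

import numpy.ma as ma

# Made-up sample data: indices 2 and 3 are masked
y_filtered = ma.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], mask=[0, 0, 1, 1, 0, 0])

original = y_filtered.copy()         # read from this copy, write to y_filtered
masked_slices = ma.clump_masked(original)

for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    # Masked mean over a symmetric window, clipped at the start of the array
    y_filtered[idx] = ma.mean(original[max(idx - 2, 0):idx + 3])
```

Every new value is computed from the original neighbours only, so the order in which the indices are visited no longer matters.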

Find whether a numpy array is a subset of a larger array in Python

I have 2 arrays, for the sake of simplicity let's say the original one is a random set of numbers:
import numpy as np
a=np.random.rand(N)
Then I sample and shuffle a subset from this array:
b=np.array() # <------ size < N
The shuffling I do does not store the index values, so b is an unordered subset of a.
Is there an easy way to get the original indexes of the elements of b in a? Say, if element 2 of b has the index 4 in a, I want to create an array holding these assignments.
I could use a for loop checking element by element, but perhaps there is a more pythonic way.
Thanks
I think the most computationally efficient thing to do is to keep track of the indices that associate b with a as b is created.
For example, instead of sampling a, sample the indices of a:
import random
indices = random.sample(range(len(a)), k) # k < N
b = a[indices]
On the off chance a happens to be sorted you could do:
>>> from numpy import array
>>> a = array([1, 3, 4, 10, 11])
>>> b = array([11, 1, 4])
>>> a.searchsorted(b)
array([4, 0, 2])
If a is not sorted you're probably best off going with something like @unutbu's answer.
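For an unsorted a, one possible approach (a sketch, not necessarily what the answer referred to above does) is to let searchsorted work through an argsort of a by means of its sorter argument. It assumes that the values in a are unique and that every element of b really occurs in a; the sample data is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10)                           # unsorted, values assumed unique
true_indices = rng.permutation(len(a))[:4]   # made-up shuffled sample of positions
b = a[true_indices]

sorter = np.argsort(a)                       # positions of a in ascending order
# Locate each element of b in the sorted view, then map back to positions in a
indices = sorter[np.searchsorted(a, b, sorter=sorter)]
```

If b may contain values that are not in a, the result should be verified afterwards, for example by checking that a[indices] equals b.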
