"shape mismatch" error using numpy in python - python

I am trying to generate a random array of 0s and 1s, and I am getting the error: shape mismatch: objects cannot be broadcast to a single shape. The error seems to be occurring in the line randints = np.random.binomial(1,p,(n,n)). Here is the function:
import numpy as np

def rand(n, p):
    '''Generates a random array subject to parameters p and N.'''
    # Create an array using a random binomial with one trial and specified
    # parameters.
    randints = np.random.binomial(1, p, (n, n))
    # Use nested while loops to go through each element of the array
    # and assign True to 1 and False to 0.
    i = 0
    j = 0
    rand = np.empty(shape=(n, n), dtype=bool)
    while i < n:
        while j < n:
            if randints[i][j] == 0:
                rand[i][j] = False
            if randints[i][j] == 1:
                rand[i][j] = True
            j = j + 1
        i = i + 1
        j = 0
    # Return the new array.
    return rand

print rand
When I run it by itself, it returns <function rand at 0x1d00170>. What does this mean? How should I convert it to an array that can be worked with in other functions?

You needn't go through all of that:
randints = np.random.binomial(1, p, (n, n))
produces your array of 0 and 1 values, and
rand_true_false = randints == 1
will produce another array, just with the 1s replaced with True and the 0s with False.
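Putting it together, a minimal sketch of the whole function. Note that print rand in your code prints the function object itself, which is exactly what <function rand at 0x1d00170> means; you have to call it, e.g. rand(5, 0.3), to get an array back:

import numpy as np

def rand(n, p):
    '''Generate an (n, n) boolean array; each entry is True with probability p.'''
    randints = np.random.binomial(1, p, (n, n))
    return randints == 1

print(rand(5, 0.3))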

Obviously, the answer by @danodonovan is the most Pythonic, but if you really want something closer to your looping code, here is an example that fixes the name conflicts and loops more simply.
import numpy as np

def my_rand(n, p):
    '''Generates a random array subject to parameters p and N.'''
    # Create an array using a random binomial with one trial and specified
    # parameters.
    randInts = np.random.binomial(1, p, (n, n))
    # Use nested for loops to go through each element of the array
    # and assign True to 1 and False to 0.
    randBool = np.empty(shape=(n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if randInts[i][j] == 0:
                randBool[i][j] = False
            else:
                randBool[i][j] = True
    return randBool

newRand = my_rand(5, 0.3)
print(newRand)
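As a small aside (not from the original answer): since the array holds only 0s and 1s, a cast does the same job as the nested loops:

randBool = randInts.astype(bool)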


Get a set of maximum number of dissimilar arrays

I have an array A of length n; each element of this array (say Wi) is itself an array of length 10. There is a function match_check(Wi, Wj) defined as:
def match_check(Wi, Wj):
    n = len(Wi)
    num_matches = 0
    for i in range(n):
        if round(Wi[i], 4) == round(Wj[i], 4):
            num_matches += 1
    if num_matches >= 3:
        return True
    else:
        return False
I want to get a maximum-size set of elements from this array A such that for no two elements in the set match_check is True. I have thought of this as a DP problem and written the following solution.
def maximum_arrays(start, end, curr_items=[], match_dict={}, lookup_dict={}):
    key = str(start) + "|" + str(end)
    if lookup_dict.get(key):
        return lookup_dict[key]
    if start == end:
        for items in curr_items:
            match_key = str(start) + ":" + str(items)
            if match_dict[match_key]:
                lookup_dict[key] = len(curr_items)
                return lookup_dict[key]
        lookup_dict[key] = 1 + len(curr_items)
        return lookup_dict[key]
    match_flag = False
    for items in curr_items:
        match_key = str(start) + ":" + str(items)
        if match_dict.get(match_key):
            match_flag = True
            break
    if match_flag:
        lookup_dict[key] = maximum_arrays(start+1, end, curr_items, match_dict, lookup_dict)
    else:
        curr_items_new = curr_items + [start]
        lookup_dict[key] = max(1 + maximum_arrays(start+1, end, curr_items_new, match_dict, lookup_dict),
                               maximum_arrays(start+1, end, curr_items, match_dict, lookup_dict))
    return lookup_dict[key]
Here match_dict contains the result of match_check for all possible pairs of indices from the array A. But I doubt that dynamic programming helps here; the solution would be O(2^n), since we have to evaluate all possible cases (keeping and dropping each element of the set).
A simple first step, which takes O(n^2), is to build an adjacency matrix for these arrays by applying match_check to every pair of arrays. An edge is added iff the function match_check returns False.
Then the problem reduces to finding the maximum clique within the graph and returning its size.
Here is a simple demo:
import networkx as nx
import numpy as np

def match_check(Wi, Wj):
    n = len(Wi)
    num_matches = 0
    for i in range(n):
        if round(Wi[i], 4) == round(Wj[i], 4):
            num_matches += 1
    if num_matches >= 3:
        return True
    else:
        return False

check_arr = [list(10*np.random.rand(5)) for k in range(10)]
n = len(check_arr)
graph_adjacency_mat = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        graph_adjacency_mat[i][j] = not match_check(check_arr[i], check_arr[j])
        graph_adjacency_mat[j][i] = graph_adjacency_mat[i][j]

# Note: from_numpy_matrix was removed in NetworkX 3.0; use nx.from_numpy_array there
G = nx.from_numpy_matrix(graph_adjacency_mat)
print(max([len(clique) for clique in nx.find_cliques(G)]))
Note that here I've used the find_cliques function from NetworkX, which is NOT O(n^2) (it is O(3^(n/3))), because NetworkX's max_clique function appears to have been removed. Finding a maximum clique is NP-hard in general, so any exact method is exponential in the worst case; you can implement your own exact search by branching on each vertex (keep it or drop it) and saving the largest clique found so far.

How to find steps in a vector (1d array, list) in Python?

I want to get the borders of data in a list using Python.
For example, I have this list:
a = [1,1,1,1,4,4,4,6,6,6,6,6,1,1,1]
I want a function that returns the indices of the borders, for example:
a = [1,1,1,1,4,4,4,6,6,6,6,6,1,1,1]
     ^       ^     ^         ^
b = get_border_index(a)
print(b)
output:
[0,4,7,12]
How can I implement the get_border_index(lst: list) -> list function?
The scalable answer, which also works for very long lists or arrays, is to use np.diff. In that case you should avoid a Python-level for loop at all costs.
import numpy as np
a = [1,1,1,1,4,4,4,6,6,6,6,6,1,1,1]
a = np.array(a)
# this is unequal 0 if there is a step
d = np.diff(a)
# boolean array where the steps are
is_step = d != 0
# get the indices of the steps (first one is trivial).
ics = np.where(is_step)
# get the first dimension and shift by one as you want
# the index of the element right of the step
ics_shift = ics[0] + 1
# and if you need a list
ics_list = ics_shift.tolist()
print(ics_list)
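Note that this prints [4, 7, 12]; as the comment above says, the first border index 0 is trivial, so if you want it included (to match the expected output), prepend it yourself:

ics_list = [0] + ics_shift.tolist()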
You can use a for loop with enumerate:
def get_border_index(a):
    last_value = None
    result = []
    for i, v in enumerate(a):
        if v != last_value:
            last_value = v
            result.append(i)
    return result

a = [1,1,1,1,4,4,4,6,6,6,6,6,1,1,1]
b = get_border_index(a)
print(b)
Output
[0, 4, 7, 12]
This code checks whether an element of the a list is different from the element before it, and if so appends the element's index to the result list.

Why does this code take so long to execute? - Python

I coded this in Python. It took way too long to finish even when the input was just 40x40 (it's for image processing using numpy). Its behaviour is the following: there is an array of objects (each with an image attribute that is a numpy array); for each object I check whether it appears somewhere in another array, then take the next object from the first array and repeat until all have been checked:
# __sub_images is the array containing the objects to be compared.
# to_compare_image is the image received as a parameter, in which we check whether the objects exist.
# get_sub_images() retrieves the objects from the received array.
# get_image() retrieves the image attribute from the objects.
same = True
rows_pixels = 40  # problem size
cols_pixels = 40  # problem size
i = 0  # row index into the array of objects whose existence must be checked
j = 0  # col index into the array of objects whose existence must be checked
k = 0  # row index into the array where the objects are searched for
l = 0  # col index into the array where the objects are searched for
while i < len(self.__sub_images) and k < len(to_compare_image.get_sub_images()) and l < len(to_compare_image.get_sub_images()[0]):
    if not np.array_equal(self.__sub_images[i][j].get_image(), to_compare_image.get_sub_images()[k][l].get_image()):
        same = False
    else:
        same = True
        k = 0
        l = 0
        if j == len(self.__sub_images[0]) - 1:
            j = 0
            i += 1
        else:
            j += 1
    if not same:
        if l == len(to_compare_image.get_sub_images()[0]) - 1:
            l = 0
            k += 1
        else:
            l += 1
I managed to code it with just one while loop, instead of the 4 nested for loops I used before. Why is it still taking so long? Is this normal, or is there something wrong? The complexity is supposed to be x, not x⁴.
The code that is not included is just getters; I hope you can understand it with the comments at the beginning.
Thanks.
Instead of this:
if not np.array_equal(self.__sub_images[i][j].get_image(), to_compare_image.get_sub_images()[k][l].get_image()):
    same = False
else:
    same = True
    #snip
if not same:
    #snip
You can do this:
same = np.array_equal(self.__sub_images[i][j].get_image(), to_compare_image.get_sub_images()[k][l].get_image())
if same:
    #snip
else:
    #snip
This uses fewer if-branches than before.
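Beyond the branch cleanup, note that to_compare_image.get_sub_images() is re-evaluated in the while condition on every iteration; hoisting such calls into local variables is another cheap win. A minimal sketch of the pattern, with a hypothetical stand-in class (not the poster's actual code):

import numpy as np

class Sheet:
    '''Hypothetical stand-in holding a 2-D grid of sub-images.'''
    def __init__(self, sub_images):
        self._sub_images = sub_images

    def get_sub_images(self):
        return self._sub_images

a = Sheet([[np.full((4, 4), r + c) for c in range(3)] for r in range(3)])
b = Sheet([[np.full((4, 4), r * c) for c in range(3)] for r in range(3)])

# Call the getters once, outside the loops, instead of in the loop condition:
subs_a = a.get_sub_images()
subs_b = b.get_sub_images()

# Check that every sub-image of `a` appears somewhere in `b`:
all_found = all(
    any(np.array_equal(sa, sb) for row_b in subs_b for sb in row_b)
    for row_a in subs_a for sa in row_a
)
print(all_found)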

Does NumPy have a function equivalent to Matlab's buffer?

I see there are array_split and split methods, but these are not very handy when you have to split an array whose length is not an integer multiple of the chunk size. Moreover, these methods' input is the number of slices rather than the slice size. I need something more like Matlab's buffer function, which is more suitable for signal processing.
For example, if I want to buffer a signal into chunks of size 60, I need to do np.vstack(np.hsplit(x.iloc[0:((len(x)//60)*60)], len(x)//60)), which is cumbersome.
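(Aside, not in the original question: for the no-overlap case the same truncate-and-stack can be written with a plain reshape; a minimal sketch, assuming x is a 1-D NumPy array:)

import numpy as np

x = np.arange(150)  # stand-in signal
chunk = 60
chunks = x[:len(x) // chunk * chunk].reshape(-1, chunk)  # shape (2, 60)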
I wrote the following routine to handle the use cases I needed, but I have not implemented/tested for "underlap".
Please feel free to make suggestions for improvement.
def buffer(X, n, p=0, opt=None):
    '''Mimic MATLAB routine to generate buffer array

    MATLAB docs here: https://se.mathworks.com/help/signal/ref/buffer.html

    Parameters
    ----------
    X : ndarray
        Signal array
    n : int
        Length of each data segment (column)
    p : int
        Number of values to overlap
    opt : str
        Initial condition options. Default sets the first `p` values to zero,
        while 'nodelay' begins filling the buffer immediately.

    Returns
    -------
    result : (n, m) ndarray
        Buffer array created from X
    '''
    import numpy as np

    if opt not in [None, 'nodelay']:
        raise ValueError('{} not implemented'.format(opt))

    i = 0
    first_iter = True
    while i < len(X):
        if first_iter:
            if opt == 'nodelay':
                # No zeros at array start
                result = X[:n]
                i = n
            else:
                # Start with `p` zeros
                result = np.hstack([np.zeros(p), X[:n-p]])
                i = n - p
            # Make 2D array and pivot
            result = np.expand_dims(result, axis=0).T
            first_iter = False
            continue

        # Create next column, add `p` results from last column if given
        col = X[i:i+(n-p)]
        if p != 0:
            col = np.hstack([result[:, -1][-p:], col])
        i += n - p

        # Pad with zeros if the last column is shorter than `n`
        if len(col) < n:
            col = np.hstack([col, np.zeros(n - len(col))])

        # Combine result with next column
        result = np.hstack([result, np.expand_dims(col, axis=0).T])

    return result
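For example, a quick check against MATLAB's buffer(1:10, 4, 2) (output traced by hand; the default option prepends p zeros):

import numpy as np

print(buffer(np.arange(1, 11), 4, 2))
# [[ 0.  1.  3.  5.  7.]
#  [ 0.  2.  4.  6.  8.]
#  [ 1.  3.  5.  7.  9.]
#  [ 2.  4.  6.  8. 10.]]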
def buffer(X=np.array([]), n=1, p=0):
    # Buffers data vector X into length-n column vectors with overlap p;
    # excess data at the end of X is discarded.
    n = int(n)  # length of each data vector
    p = int(p)  # overlap of data vectors, 0 <= p < n
    L = len(X)  # length of data to be buffered
    m = int(np.floor((L - n) / (n - p)) + 1)  # number of sample vectors (no padding)
    data = np.zeros([n, m])  # initialize data matrix
    # Note: the upper bound must be L - n + 1 (not L - n), otherwise the
    # final column counted by m is never filled in.
    for startIndex, column in zip(range(0, L - n + 1, n - p), range(0, m)):
        data[:, column] = X[startIndex:startIndex + n]  # fill in by column
    return data
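For example (with the loop bound fixed to L - n + 1 as above, so the last full frame is filled in):

import numpy as np

print(buffer(np.arange(1, 11), 4, 2))
# [[ 1.  3.  5.  7.]
#  [ 2.  4.  6.  8.]
#  [ 3.  5.  7.  9.]
#  [ 4.  6.  8. 10.]]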
This Keras function may be considered a Python equivalent of MATLAB's buffer(). See the sample code:

import numpy as np
import tensorflow.keras.preprocessing as kp

S = np.arange(1, 99)  # a demo array
batches = list(kp.timeseries_dataset_from_array(S, targets=None, sequence_length=7, sequence_stride=7, batch_size=5))
print(batches)
Same as the other answer, but faster.
def buffer(X, n, p=0):
    '''
    Parameters
    ----------
    X : ndarray
        Signal array
    n : int
        Length of each data segment
    p : int
        Number of values to overlap

    Returns
    -------
    result : (n, m) ndarray
        Buffer array created from X
    '''
    import numpy as np

    # Note: assumes p <= n - p (overlap no larger than the hop size)
    d = n - p
    m = len(X) // d
    if m * d != len(X):
        m = m + 1

    # Zero-pad X to a multiple of the hop size d and reshape into m rows
    Xn = np.zeros(d * m)
    Xn[:len(X)] = X
    Xn = np.reshape(Xn, (m, d))

    # Append the first p values of each following row to get rows of length n,
    # then drop the (incomplete) final frame
    Xne = np.concatenate((Xn, np.zeros((1, d))))
    Xn = np.concatenate((Xn, Xne[1:, 0:p]), axis=1)
    return np.transpose(Xn[:-1])
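A quick check (output traced by hand; note the zero-padded tail, and that this version appears to behave like opt='nodelay' above, i.e. with no leading zeros):

import numpy as np

print(buffer(np.arange(1, 10), 4, 2))
# [[1. 3. 5. 7.]
#  [2. 4. 6. 8.]
#  [3. 5. 7. 9.]
#  [4. 6. 8. 0.]]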
ryanjdillon's answer rewritten for a significant performance improvement: it appends to a list instead of concatenating arrays; the latter copies the array on every iteration and is much slower.
import numpy as np

def buffer(x, n, p=0, opt=None):
    if opt not in ('nodelay', None):
        raise ValueError('{} not implemented'.format(opt))

    i = 0
    if opt == 'nodelay':
        # No zeros at array start
        result = x[:n]
        i = n
    else:
        # Start with `p` zeros
        result = np.hstack([np.zeros(p), x[:n-p]])
        i = n - p
    # Make 2D array, cast to list for .append()
    result = list(np.expand_dims(result, axis=0))

    while i < len(x):
        # Create next column, add `p` results from last column if given
        col = x[i:i+(n-p)]
        if p != 0:
            col = np.hstack([result[-1][-p:], col])

        # Pad with zeros if the last column is not length `n`
        if len(col) < n:
            col = np.hstack([col, np.zeros(n - len(col))])

        # Combine result with next column
        result.append(np.array(col))
        i += n - p

    return np.vstack(result).T
def buffer(X, n, p=0):
    '''
    Parameters
    ----------
    X : ndarray
        Signal array; input a long vector such as raw speech samples
    n : int
        Frame length
    p : int
        Number of values to overlap

    Returns
    -------
    result : (n, m) ndarray
        Buffer array created from X
    '''
    import numpy as np

    d = n - p
    m = len(X) // d
    c = n // d
    if m * d != len(X):
        m = m + 1

    # Zero-pad X to a multiple of the hop size d and reshape into m rows
    Xn = np.zeros(d * m)
    Xn[:len(X)] = X
    Xn = np.reshape(Xn, (m, d))

    # Stitch shifted copies of the rows together to build frames of length n,
    # which also handles overlaps larger than the hop size
    Xn_out = Xn
    for i in range(c - 1):
        Xne = np.concatenate((Xn, np.zeros((i + 1, d))))
        Xn_out = np.concatenate((Xn_out, Xne[i + 1:, :]), axis=1)
    if n - d * c > 0:
        Xne = np.concatenate((Xn, np.zeros((c, d))))
        Xn_out = np.concatenate((Xn_out, Xne[c:, :n - p * c]), axis=1)
    return np.transpose(Xn_out)
Here is an improved version of Ali Khodabakhsh's sample code above, which did not work in my cases. Feel free to comment and use it.
Comparing the execution times of the proposed answers, by running:
from timeit import default_timer as timer
import numpy as np

x = np.arange(1, 200000)
start = timer()
y = buffer(x, 60, 20)
end = timer()
print(end - start)
the results are:
Andrzej May, 0.005595300000095449
OverLordGoldDragon, 0.06954789999986133
ryanjdillon, 2.427092700000003

filling numpy array by index

I have a function which gives me the index for a given value, e.g.:
def F(value):
    index = do_something(value)
    return index
I want to use this index to fill a huge numpy array with 1s. Let's call the array features:
l = [1,4,2,3,7,5,3,6,.....]
NOTE: features.shape[0] = len(l)
for i in range(features.shape[0]):
    idx = F(l[i])
    features[i, idx] = 1
Is there a pythonic way to perform this (as the loop takes a lot of time if the array is huge)?
If you can vectorize F(value), you could write something like:
indices = np.arange(features.shape[0])
feature_indices = F(l)
features[indices, feature_indices] = 1
Try this:
i = np.arange(features.shape[0])  # rows
j = np.vectorize(F)(np.array(l))  # columns
features[i, j] = 1
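A minimal end-to-end sketch of that approach, with a toy F standing in for do_something (hypothetical: it just maps a value to a column index modulo the number of columns):

import numpy as np

def F(value):
    return value % 8  # hypothetical stand-in for do_something(value)

l = [1, 4, 2, 3, 7, 5, 3, 6]
features = np.zeros((len(l), 8), dtype=int)

rows = np.arange(features.shape[0])   # one row index per element of l
cols = np.vectorize(F)(np.array(l))   # one column index per value
features[rows, cols] = 1
print(features)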
