I have an array, say p = [2,3,2,4], and a number, say n = 4. I need to generate an array of ones and zeros according to the pattern p, n-p. That is, for each element u in p, there are u ones followed by n-u zeros. It's easy to do this with NumPy's np.insert operation, but Theano doesn't have any insert op. Is it possible to achieve this without using a loop? Also, given multiple such p's and corresponding n's, is it possible to generate the ones-and-zeros patterns without using a loop?
Here's an example:
A single value of p:
p = [2,3,2,4,1], n=4
n-p = [2,1,2,0,3]
result = [1,1,0,0,1,1,1,0,1,1,0,0,1,1,1,1,1,0,0,0]
Multiple values of p: in this case all the p's have the same dimension (p is a 2D array).
p = [[2,3,2,4,1],[2,2,3,5,4]], n = [4, 5]
n-p = [[2,1,2,0,3],[3,3,2,0,1]]
result = [[1,1,0,0,1,1,1,0,1,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0],[1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,1,0]]
Please note that I've padded result[0] with 0s at the end to match the dimensions of result[0] and result[1]
p = numpy.array([2, 3, 2, 4])
n = 4
result = (p[:, None] > numpy.arange(n)).ravel().astype(int)
We compare
[[2]
 [3]
 [2]
 [4]]
to [0 1 2 3] to get an array of booleans, then flatten it and convert it to integers to get the output you want.
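For the multiple-p case, here is a NumPy sketch of the same broadcasting idea, assuming the end-padded layout from the example above (the variable names are mine; Theano has analogous tensor ops, so this should translate, but I haven't tested a Theano version):
import numpy as np

p = np.array([[2, 3, 2, 4, 1],
              [2, 2, 3, 5, 4]])
n = np.array([4, 5])

rows, cols = p.shape
max_len = n.max() * cols
t = np.arange(max_len)                          # output positions for every row
block = np.minimum(t // n[:, None], cols - 1)   # which element of p each slot belongs to
slot = t % n[:, None]                           # position inside that element's block
valid = t < (n * cols)[:, None]                 # False in the trailing padding region
result = (valid & (slot < np.take_along_axis(p, block, axis=1))).astype(int)
For the first row this reproduces the 20-slot pattern built with n=4, followed by five padding zeros, matching the example output.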
Related
Given a numpy ndarray like the following
x = [[4.,0.,2.,0.,8.],
[1.,3.,0.,9.,5.],
[0.,0.,4.,0.,1.]]
I want to find the indices of the top k (e.g. k=3) elements of each row, excluding 0, if possible. If there are fewer than k positive elements, then just return their indices (in sorted order).
The result should look like this (a list of arrays):
res = [[4, 0, 2],
[3, 4, 1],
[2, 4]]
or just one flattened array
res = [4, 0, 2, 3, 4, 1, 2, 4]
I know argsort can find the indices of top k elements in a sorted order. But I am not sure how to filter out the 0.
You can use numpy.argsort on the negated array (-x) to get the indices in descending order, then use numpy.take_along_axis to gather the values of the 2D array in that sorted order.
Because you want to ignore zeros, you can set the columns after the first three to zero (as you mention in the question). At the end, return the indices whose sorted value is non-zero.
import numpy as np

x = np.array([[4.,0.,2.,0.,8.],[1.,3.,0.,9.,5.],[0.,0.,4.,0.,1.]])
idx_srt = np.argsort(-x)                           # column indices, descending by value
val_srt = np.take_along_axis(x, idx_srt, axis=-1)  # values in that sorted order
val_srt[:, 3:] = 0                                 # keep only the top k=3 values per row
res = idx_srt[val_srt != 0]                        # drop indices whose value is 0
print(res)
[4 0 2 3 4 1 2 4]
Try one of these two:
k = 3
res = [sorted(range(len(r)), key=(lambda i: r[i]), reverse=True)[:min(k, len([n for n in r if n > 0]))] for r in x]
or
res1 = [np.argsort(r)[::-1][:min(k, len([n for n in r if n > 0]))] for r in x]
I came up with the following solution:
top_index = score.argsort(axis=1) # score here is my x
positive = (score > 0).sum(axis=1)
positive = np.minimum(positive, k) # top k
# broadcasting trick to get mask matrix that selects top k (k = min(2000, num of positive scores))
r = np.arange(score.shape[1])
mask = (positive[:,None] > r)
top_index_flatten = top_index[:, ::-1][mask]
I compared my result with the one suggested by @I'mahdi and they are consistent.
I'm trying to write a function that:
Check if all elements in a list are different
Multiply all elements in the list, except the zeros
But I can't find a way to compare all the elements in one list. Do you have any idea?
Thanks!
PS: I use arr = np.random.randint(10, size=a) to create a random list
EDIT:
More precisely, I'm trying to check whether all the elements of a numpy array are different from each other; if they are, the function should return True.
Also, once that is done, multiply all the elements in the array except the zeros.
For example:
If I have the array [4,2,6,8,9,0], the algorithm first returns True because all the elements are different, then it multiplies them (4*2*6*8*9), skipping the 0.
To check if all elements in a list are different, you can convert the list into a set, which removes duplicates, and compare the length of the set to that of the original list. If the lengths differ, then there are duplicates.
x = np.random.randint(10, size=10)
len(set(x)) == len(x)
To multiply all values except 0, you can use a list comprehension to remove the 0s and np.prod to multiply the values in the new list.
np.prod([i for i in x if i != 0])
Example:
x = np.random.randint(10, size=10)
if len(set(x)) == len(x):
    print(np.prod([i for i in x if i != 0]))
else:
    print("array is not unique")
You can use numpy.unique.
The following code snippet checks whether all elements in the array are unique (different from each other) and, if so, multiplies the non-zero values by the factor factor:
import numpy as np
factor = 5
if np.unique(arr).size == arr.size:
    arr[arr != 0] = arr[arr != 0] * factor
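If you instead want the product of the non-zero entries themselves, as in the example from the question, the same uniqueness check can be combined with np.prod (a minimal sketch):
import numpy as np

arr = np.array([4, 2, 6, 8, 9, 0])
if np.unique(arr).size == arr.size:
    print(np.prod(arr[arr != 0]))   # 3456 == 4*2*6*8*9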
You can use collections.Counter to find the unique numbers. I have included code that solves your problem.
import numpy as np
from collections import Counter
a = 5
arr = np.random.randint(10, size=a)
result = 1  # Variable that will store the product
flag = 0    # Set to 1 if a duplicate is found
# Check if all the numbers are unique
for i in Counter(arr):
    if Counter(arr)[i] > 1:
        flag = 1
        break
# Convert the dictionary into a list
l = [i for i in Counter(arr)]
# Return the product of all the numbers in the list except 0
if flag == 0:
    for i in l:
        if i != 0:
            result = result * i
else:
    print("The numbers are not unique")
Just for fun, here's a one-liner:
arr = np.array([1, 2, 3, 4, 0])
np.prod(arr[arr!=0]) if np.unique(arr).size == arr.size else False
>>> 24
If the array is [1, 2, 3, 4, 4] the result is False
I am quite new to Python programming. What's an efficient and Pythonic way to find the most frequent progressive digit from a list of 4-digit numbers?
Let's say I have the following list: [6111, 7111, 6112, 6121, 6115, 6123].
The logic is to observe that, for the first digit, 6 is the most frequent, so I can eliminate the number 7111 from further consideration.
For the second digit I consider the new candidates [6111, 6112, 6121, 6115, 6123], observe that 1 is the most frequent digit, and so on.
At the end of the algorithm I'll have just one number from the list left.
If two or more digits have the same number of occurrences, I can either pick the smaller one or a random one among them.
A simple approach could be to convert the list into an Nx4 matrix and consider, for each column, the most frequent digit. This could work, but it seems like a clumsy and inefficient way to solve this problem. Can anyone help?
EDIT: my code for this solution (NOTE: this code does not always work, something is wrong; for the solution to this problem please refer to @MadPhysicist's answer):
import numpy as np
import pandas as pd
from collections import Counter
numbers_list = [6111, 7111, 6112, 6121, 6115, 6123]
my_list = []
for number in numbers_list:
    digit_list = []
    for c in str(number):
        digit_list.append(c)
    my_list.append(digit_list)
matrix = np.array(my_list)
matrix0 = matrix
my_counter = Counter(matrix.T[0]).most_common(1)
i=0
for digit0 in matrix.T[0]:
    if digit0 != my_counter[0][0]:
        matrix0 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix0
matrix1 = matrix
my_counter = Counter(matrix.T[1]).most_common(1)
i=0
for digit1 in matrix.T[1]:
    if digit1 != my_counter[0][0]:
        matrix1 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix1
matrix2 = matrix
my_counter = Counter(matrix.T[2]).most_common(1)
i=0
for digit2 in matrix.T[2]:
    if digit2 != my_counter[0][0]:
        matrix2 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix2
matrix3 = matrix
my_counter = Counter(matrix.T[3]).most_common(1)
i=0
for digit3 in matrix.T[3]:
    if digit3 != my_counter[0][0]:
        matrix3 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix3
print (matrix[0])
Your idea of converting to a numpy array is solid. You don't need to split it up-front. A series of masks and histograms will pare down the array fairly quickly.
z = np.array([6111, 7111, 6112, 6121, 6115, 6123])
The nth digits (zero-based) can be obtained with something like
nth = (z // 10**n) % 10
Counting the most frequent one can be accomplished quickly with np.bincount as shown here:
frequentest = np.argmax(np.bincount(nth))
You can select the elements that have that digit in the nth place with simply
mask = nth == frequentest
So now run this in a loop over n (going backwards):
# Input array
z = np.array([6111, 7111, 6112, 6121, 6115, 6123])
# Compute the maximum number of decimal digits in the list.
# You can just manually set this to 4 if you prefer
n = int(np.ceil(np.log10(z + 1).max()))
# Empty output array
output = np.empty(n, dtype=int)
# Loop over the number of digits in reverse.
# In this case, i will be 3, 2, 1, 0.
for i in range(n - 1, -1, -1):
    # Get the ith digit from each element of z.
    # The operators //, ** and % are vectorized: they operate
    # on each element of an array to return an array.
    ith = (z // 10**i) % 10
    # Count the number of occurrences of each number 0-9 in the ith digit.
    # Bincount returns an array of 10 elements: counts[0] is the number of 0s,
    # counts[1] is the number of 1s, ..., counts[9] is the number of 9s.
    counts = np.bincount(ith)
    # argmax finds the index of the maximum element: the digit with the
    # highest count.
    output[i] = np.argmax(counts)
    # Trim down the array to numbers that have the requested digit in the
    # right place. ith == output[i] is a boolean mask. It is 1 where ith
    # is the most common digit and 0 where it is not. Indexing with such a
    # mask selects the elements at locations that are non-zero.
    z = z[ith == output[i]]
As it happens, np.argmax will return the index of the first maximum count if there are multiple available, meaning that it will always select the smallest number.
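For instance, here is a small illustration of that tie-breaking behaviour (the digits are made up for the example):
import numpy as np

digits = np.array([6, 7, 6, 2, 2])   # digits 2 and 6 are tied with two occurrences each
counts = np.bincount(digits)         # [0, 0, 2, 0, 0, 0, 2, 1]
print(np.argmax(counts))             # 2, the smaller of the tied digits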
You can recover the number from output with something like
>>> output
array([1, 1, 1, 6])
>>> (output * 10**np.arange(output.size)).sum()
6111
You can also just get the remaining element of z:
>>> z[0]
6111
I am trying to split a list into I sublists where the size of each sublist is random (with at least one entry; assume P > I). I used the numpy.split function, which works fine but does not satisfy my randomness condition. You may ask which distribution the randomness should follow; I think it should not matter. I checked several posts, but they were not equivalent to mine, as they were trying to split into almost equally sized chunks. If this is a duplicate, let me know. Here is my approach:
import numpy as np
P = 10
I = 5
mylist = range(1, P + 1)
[list(x) for x in np.split(np.array(mylist), I)]
This approach collapses when P is not divisible by I. Further, it creates equal-sized chunks, not randomly sized chunks. Another constraint: I do not want to use the random package, but I am fine with numpy. Don't ask me why; I wish I had a logical reason for it.
Based on the answer provided by the mad scientist, this is the code I tried:
P = 10
I = 5
data = np.arange(P) + 1
indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
result = np.split(data, indices)
result
Output:
[array([1, 2]),
array([3, 4, 5, 6]),
array([], dtype=int32),
array([4, 5, 6, 7, 8, 9]),
array([10])]
The problem can be refactored as choosing I-1 random split points from {1,2,...,P-1}, which can be viewed using stars and bars.
Therefore, it can be implemented as follows:
import numpy as np
split_points = np.random.choice(P - 1, I - 1, replace=False) + 1  # split points drawn from {1, ..., P-1}
split_points.sort()
result = np.split(data, split_points)
np.split is still the way to go. If you pass in a sequence of integers, split will treat them as cut points. Generating random cut points is easy. You can do something like
P = 10
I = 5
data = np.arange(P) + 1
indices = np.random.randint(P, size=I - 1)
You want I - 1 cut points to get I chunks. The indices need to be sorted, and duplicates need to be removed. np.unique does both for you. You may end up with fewer than I chunks this way:
indices = np.unique(indices)  # sorts and removes duplicates
result = np.split(data, indices)
If you absolutely need to have exactly I chunks, sample the cut points without replacement. That can be implemented, for example, with np.random.shuffle:
indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
indices.sort()
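Putting the pieces together, a minimal end-to-end sketch using the same P, I and data as above:
import numpy as np

P, I = 10, 5
data = np.arange(P) + 1

indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
indices.sort()

result = np.split(data, indices)  # always exactly I non-empty chunks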
When we multiply two matrices A of size m x k and B of size k x n we use the following code:
# for resultant matrix rows
for i in range(m):
    # for resultant matrix column
    for j in range(n):
        for l in range(k):
            # A's row x B's columns
            c[i][j] = c[i][j] + a[i][l]*b[l][j]
Are my comments in the code the right explanation of the loops? Is there a better explanation of the loops, or a better thought process for coding matrix multiplication?
EDIT1: I am not looking for better code. My question is about the thought process involved when we transform the math of matrix multiplication into code.
Your code is correct, but if you want to add the more detailed comments/explanations you ask for, you can do so:
# for resultant matrix rows
for i in range(m):
    # for resultant matrix column
    for j in range(n):
        # for each entry in the resultant matrix we have k entries to sum
        for l in range(k):
            # each i, j entry in the result matrix is given by multiplying the
            # entries A[i][l] (across row i of A) by the entries B[l][j] (down
            # column j of B), for l = 1, 2, ..., k, and summing the results over l
            c[i][j] = c[i][j] + a[i][l]*b[l][j]
EDIT: if you want a better explanation of the loop or thought process, then take out the #A's row x B's columns comment and replace it with "each i, j entry in the result matrix is given by multiplying the entries A[i][l] (across row i of A) by the entries B[l][j] (down column j of B), for l = 1, 2, ..., k, and summing the results over l". Also, don't use l as an iterator; it looks like a 1.
You can use the numpy.dot function. Here's the documentation. Example (extracted from the documentation):
>>> a = [[1, 0], [0, 1]]
>>> b = [[4, 1], [2, 2]]
>>> np.dot(a, b)
array([[4, 1],
       [2, 2]])
The condition that must always hold in order to multiply two matrices is that the first matrix must have the same number of columns as the second matrix has rows.
So if matrix_1 is m x n, then matrix_2 should be n x p. The result of the two will have dimension m x p.
The pseudocode will be:
multiplyMatrix(matrix1, matrix2)
    -- Multiplies one row by one column and sums the products
    multiplyRowAndColumn(row, column) returns number
    var
        total: number
    begin
        total := 0
        for each rval in row and cval in column
        begin
            total += rval*cval
        end
        return total
    end
begin
    -- If the inner dimensions don't match up then the function fails
    if matrix1:columns != matrix2:rows return failure;
    -- The result has matrix1's row count and matrix2's column count
    newmat = new matrix(matrix1:rows, matrix2:columns)
    for each row r in matrix1 and each column c in matrix2
    begin
        newmat[r][c] = multiplyRowAndColumn(r, c)
    end
    return newmat
end
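Here is a minimal Python sketch of the pseudocode above (the function and variable names are illustrative, not taken from any library):
def multiply_row_and_column(row, column):
    # Dot product of one row of A with one column of B
    return sum(r * c for r, c in zip(row, column))

def multiply_matrix(a, b):
    # a is m x k, b is k x n; a's column count must equal b's row count
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    # For each row of a and each column index j of b, gather column j and
    # take the dot product with the row
    return [[multiply_row_and_column(row, [b_row[j] for b_row in b])
             for j in range(len(b[0]))]
            for row in a]

a = [[1, 0], [0, 1]]
b = [[4, 1], [2, 2]]
print(multiply_matrix(a, b))   # [[4, 1], [2, 2]]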
In Python you can either do what you did, or you can use the ijk algorithm, the ikj algorithm, the Psyco ikj algorithm, NumPy, or SciPy to accomplish this. NumPy appears to be the fastest and most efficient.
Your code looks right, and your comments also look correct.