python convert prediction result into one hot - python

I ran a Python neural network prediction expecting a one-hot result and got back numbers as follows:
[[0.33058667182922363, 0.3436272442340851, 0.3257860243320465],
[0.32983461022377014, 0.3487854599952698, 0.4213798701763153],
[0.3311253488063812, 0.3473075330257416, 0.3215670585632324],
[0.38368630170822144, 0.35151687264442444, 0.3247968554496765],
[0.3332786560058594, 0.343686580657959, 0.32303473353385925]]
How can I convert the array into a one-hot result, i.e.
[[0,1,0],
[0,0,1],
[0,1,0],
[1,0,0],
[0,1,0]]

By "one-hot result" I assume you want the max value of each sub-list to be 1 and the rest to be 0 (based on the pattern in the expected result). You may do it using a list comprehension:
>>> [[int(item == max(sublist)) for item in sublist] for sublist in my_list]
# ^ `int` converts the bool returned by `==`: True -> 1, False -> 0
[[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
where my_list is your initial list.
But in the above approach, you will be calling max() for every item while iterating over the sub-list. A better way is:
def get_hot_value(my_list):
    max_val = max(my_list)
    return [int(item == max_val) for item in my_list]

hot_list = [get_hot_value(sublist) for sublist in my_list]
Edit: If you are supposed to have exactly one 1 in the result (in case more than one element shares the maximum value), you may modify the get_hot_value function as:
def get_hot_value(my_list):
    max_val, hot_list, is_max_found = max(my_list), [], False
    for item in my_list:
        if item == max_val and not is_max_found:
            hot_list.append(1)
            is_max_found = True
        else:
            hot_list.append(0)
    return hot_list

The other solutions are good, and solve the problem. Alternatively, if you have numpy,
import numpy as np
n = [[0.33058667182922363, 0.3436272442340851, 0.3257860243320465],
[0.32983461022377014, 0.3487854599952698, 0.4213798701763153],
[0.3311253488063812, 0.3473075330257416, 0.3215670585632324],
[0.38368630170822144, 0.35151687264442444, 0.3247968554496765],
[0.3332786560058594, 0.343686580657959, 0.32303473353385925]]
max_indices = np.argmax(n, axis=1)
one_hot = [[int(col == idx) for col in range(len(row))] for idx, row in zip(max_indices, n)]
argmax finds the index of the maximum value in each row; then one list comprehension over those indices builds the one-hot rows. Should be pretty fast, I guess?

I would suggest this:
n = [[0.33058667182922363, 0.3436272442340851, 0.3257860243320465],
[0.32983461022377014, 0.3487854599952698, 0.4213798701763153],
[0.3311253488063812, 0.3473075330257416, 0.3215670585632324],
[0.38368630170822144, 0.35151687264442444, 0.3247968554496765],
[0.3332786560058594, 0.343686580657959, 0.32303473353385925]]
hot_results = []
for row in n:
    hot_index = row.index(max(row))
    hot_result = [0] * len(row)
    hot_result[hot_index] = 1
    hot_results.append(hot_result)
print(hot_results)


compare each row with each column in matrix using only pure python

I have a certain function that I made and I want to run it on each column and each row of a matrix, to check if there are rows and columns that produce the same output.
for example:
matrix = [[1,2,3],
[7,8,9]]
I want to run the function, let's call it myfun, on each column ([1,7], [2,8] and [3,9]) separately, and also on each row ([1,2,3] and [7,8,9]). If a row and a column produce the same result, the counter ct goes up by 1. All of this happens in another function, count_good, which basically counts rows and columns that produce the same result.
here is the code so far:
def count_good(mat):
    ct = 0
    for i in mat:
        for j in mat:
            if myfun(i) == myfun(j):
                ct += 1
    return ct
However, when I use print to check my code I get this:
mat = [[1,2,3],[7,8,9]]

for i in mat:
    for j in mat:
        print(i,j)

[1, 2, 3] [1, 2, 3]
[1, 2, 3] [7, 8, 9]
[7, 8, 9] [1, 2, 3]
[7, 8, 9] [7, 8, 9]
I see that the code does not return what I need, which means the count_good function won't work. How can I run a function on each row and each column? I need to do it without any help from outside libraries: no map, zip or stuff like that, only very pure Python.
Let's start by using itertools and collections for this, then translate it back to "pure" python.
from itertools import product, starmap, chain # combinations?
from collections import Counter
To iterate in a nested loop efficiently, you can use itertools.product. You can use starmap to expand the arguments of a function as well. Here is a generator of the values of myfun over pairs of rows:
starmap(myfun, product(matrix, repeat=2))
To transpose the matrix and iterate over the columns, use the zip(*matrix) idiom:
starmap(myfun, product(zip(*matrix), repeat=2))
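As a quick illustration of what the zip(*matrix) transpose idiom yields on the question's matrix:

```python
matrix = [[1, 2, 3],
          [7, 8, 9]]

# zip(*matrix) pairs up the i-th element of every row, i.e. the columns
columns = list(zip(*matrix))
print(columns)  # → [(1, 7), (2, 8), (3, 9)]
```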
You can use collections.Counter to map all the repeats for each possible return value:
Counter(starmap(myfun, chain(product(matrix, repeat=2), product(zip(*matrix), repeat=2))))
If you want to avoid running myfun on the same elements, replace product(..., repeat=2) with combinations(..., 2).
Now that you have the layout of how to do this, replace all the external library stuff with equivalent builtins:
counter = {}
for i in range(len(matrix)):
    for j in range(len(matrix)):
        result = myfun(matrix[i], matrix[j])
        counter[result] = counter.get(result, 0) + 1
for i in range(len(matrix[0])):
    for j in range(len(matrix[0])):
        c1 = [matrix[row][i] for row in range(len(matrix))]
        c2 = [matrix[row][j] for row in range(len(matrix))]
        result = myfun(c1, c2)
        counter[result] = counter.get(result, 0) + 1
If you want combinations instead, replace the loop pairs with
for i in range(len(...) - 1):
    for j in range(i + 1, len(...)):
Using native python:
def count_good(mat):
    ct = 0
    columns = [[row[col_idx] for row in mat] for col_idx in range(len(mat[0]))]
    for row in mat:
        for column in columns:
            if myfun(row) == myfun(column):
                ct += 1
    return ct
However, this is inefficient, since myfun is recomputed for every pair inside a doubly nested loop. I would suggest using numpy instead, e.g.:
import numpy as np

def count_good(mat):
    ct = 0
    mat = np.array(mat)
    for row in mat:
        for column in mat.T:
            if myfun(row) == myfun(column):
                ct += 1
    return ct
TL;DR
To get a column from a 2D list of N lists of M elements, first flatten the list to a 1D list of N×M elements; choosing elements from the 1D list with a stride equal to M, the number of columns, then gives you a column of the original 2D list.
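A minimal sketch of that flatten-then-stride idea (variable names are mine):

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# flatten the N x M list into a single list of N*M elements
flat = [value for row in matrix for value in row]   # [1, 2, 3, 4, 5, 6]

# elements of column j sit at positions j, j+M, j+2M, ... in the flat list
ncols = len(matrix[0])
second_column = flat[1::ncols]
print(second_column)  # → [2, 5]
```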
First, I create a matrix of random integers, as a list of lists of equal length. Here I take some liberty with the objective of "pure" Python; the OP will probably input some assigned matrix by hand.
from random import randrange, seed
seed(20220914)
dim = 5
matrix = [[randrange(dim) for column in range(dim)] for row in range(dim)]
print(*matrix, sep='\n')
We need a function to be applied to each row and each column of the matrix,
that I intend must be supplied as a list. Here I choose a simple summation of
the elements.
def myfun(l_st):
    the_sum = 0
    for value in l_st:
        the_sum = the_sum + value
    return the_sum
To proceed, we do something unexpected: we unwrap the matrix. Starting from an empty list, we loop over the rows and "sum" the current row into unwrapped; note that adding two lists gives a single list containing all the elements of both.
unwrapped = []
for row in matrix:
    unwrapped = unwrapped + row
In the following we will need the number of columns in the matrix; this can be computed by counting the elements in the last row of the matrix (row still refers to it after the loop above).
ncols = 0
for value in row:
    ncols = ncols + 1
Now we can compute the values produced by applying myfun to each column, counting how many times we get the same value.
We use an auxiliary variable, start, initialized to zero and incremented on every iteration of the following loop, which scans (using a dummy variable) all the elements of the current row; hence start takes the values 0, 1, ..., ncols-1, so that unwrapped[start::ncols] is a list containing exactly one of the columns of the matrix.
count_of_column_values = {}
start = 0
for dummy in row:
    column_value = myfun(unwrapped[start::ncols])
    if column_value not in count_of_column_values:
        count_of_column_values[column_value] = 1
    else:
        count_of_column_values[column_value] = count_of_column_values[column_value] + 1
    start = start + 1
At this point, we are ready to apply myfun to the rows:
count = 0
for row in matrix:
    row_value = myfun(row)
    if row_value in count_of_column_values:
        count = count + count_of_column_values[row_value]
print(count)
Executing the code above prints
[1, 4, 4, 1, 0]
[1, 2, 4, 1, 4]
[1, 4, 4, 0, 1]
[4, 0, 3, 1, 2]
[0, 0, 4, 2, 2]
3

Function Failing at Large List Sizes

I have a question: starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each array element between two given indices, inclusive. Once all operations have been performed, return the maximum value in the array.
Example: n = 10, Queries = [[1,5,3],[4,8,7],[6,9,1]]
The following will be the resultant output after iterating through the queries; indices 1-5 will have 3 added to them, and so on:
[0,0,0, 0, 0,0,0,0,0, 0]
[3,3,3, 3, 3,0,0,0,0, 0]
[3,3,3,10,10,7,7,7,0, 0]
[3,3,3,10,10,8,8,8,1, 0]
Finally you output the max value in the final list:
[3,3,3,10,10,8,8,8,1, 0]
My current solution:
def Operations(size, Array):
    ResultArray = [0] * size
    Values = [[i.pop(2)] for i in Array]
    for index, i in enumerate(Array):
        # Current values = sum between the current values in the Results Array
        # AND the added operation of equal length
        ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
    Result = max(ResultArray)
    return Result
def main():
    nm = input().split()
    n = int(nm[0])
    m = int(nm[1])
    queries = []
    for _ in range(m):
        queries.append(list(map(int, input().rstrip().split())))
    result = Operations(n, queries)

if __name__ == "__main__":
    main()
Example input: The first line contains two space-separated integers n and m, the size of the array and the number of operations.
Each of the next m lines contains three space-separated integers a,b and k, the left index, right index and summand.
5 3
1 2 100
2 5 100
3 4 100
Runtime error at large sizes:
Currently this solution works for final lists up to length 4,000; however, on test cases where the length is 10,000,000 it fails. I do not know why this is the case, and I cannot provide the example input since it is so massive. Is there anything obvious as to why it would fail for larger cases?
I think the problem is that you make too many intermediary throw-away lists here:
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
ResultArray[i[0]-1:i[1]] produces a new list, and you build it twice; one of the copies exists just to get its length, which is a complete waste of resources. Then you make another list with Values[index]*len(...), and finally list(map(...)) compiles everything into yet another list that is thrown away once it is assigned into the original. That is four throw-away lists. If the slice size is 5,000,000, you are allocating some 20,000,000 elements of extra space, 15,000,000 of which you don't really need, and if your original list already holds 10,000,000 elements, well, just do the math...
You can get the same result as your list(map(...)) with a list comprehension:
[v + Values[index][0] for v in ResultArray[i[0]-1:i[1]]]
Now we use two fewer lists, and we can drop one more by making it a generator expression, given that slice assignment does not require a list specifically, just something iterable:
(v + Values[index][0] for v in ResultArray[i[0]-1:i[1]])
I don't know whether the slice assignment internally makes a list first, but hopefully it doesn't, and with that we are down to just one extra list. Here is an example:
>>> a=[0]*10
>>> a
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a[1:5] = (3+v for v in a[1:5])
>>> a
[0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
>>>
We can reduce it to zero extra lists (assuming that internally it doesn't make one) by using itertools.islice:
>>> import itertools
>>> a[3:7] = (1+v for v in itertools.islice(a,3,7))
>>> a
[0, 3, 3, 4, 4, 1, 1, 0, 0, 0]
>>>
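For completeness: the standard way to make this problem fast is to avoid touching the slices altogether and use a difference array, recording +k at the left endpoint and -k just past the right endpoint, then taking a single running-sum pass. This is not what the answer above does; it is a sketch of the usual alternative, with names of my own choosing:

```python
def operations_diff(size, queries):
    # diff[i] holds the change in value when moving from index i-1 to i
    diff = [0] * (size + 1)
    for a, b, k in queries:
        diff[a - 1] += k   # start adding k at 1-indexed position a
        diff[b] -= k       # stop adding k after 1-indexed position b
    # a single prefix-sum pass recovers the array; track the max as we go
    best = running = 0
    for delta in diff:
        running += delta
        best = max(best, running)
    return best

print(operations_diff(10, [[1, 5, 3], [4, 8, 7], [6, 9, 1]]))  # → 10
```

Each query is O(1) and the final pass is O(n), so even 10,000,000 elements with many queries stay cheap, with no intermediate lists at all.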

Rearrange array element based on the number sequence and represent by array id

Suppose I have multiple arrays inside one array, containing the numbers from 0 to n in some order.
For example,
x = [[0,2,3,5],[1,4]]
Here we have two arrays in x. There could be more than two.
I want to rearrange all the array elements based on their number sequence, but have each element represented by its array id. The result should look like this:
y = [0,1,0,0,1,0]
That means 0, 2, 3 and 5 are in the array with id 0, so they show that id at their respective positions; the same goes for 1 and 4. Can anyone help me solve this? [N.B. There could be more than two arrays, so it would be highly appreciated if the code works for any number of arrays.]
You can do this by using a dictionary:
x = [[0,2,3,5],[1,4]]
lst = {}
for i in range(len(x)):
    for j in range(len(x[i])):
        lst[x[i][j]] = i
print(lst)
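The dict maps each value to its array id; to recover the flat list y from it, read the dict back in sorted key order (a small follow-up sketch):

```python
x = [[0, 2, 3, 5], [1, 4]]
lst = {}
for i in range(len(x)):
    for j in range(len(x[i])):
        lst[x[i][j]] = i

# read the mapping back in value order to get the flat result
y = [lst[k] for k in sorted(lst)]
print(y)  # → [0, 1, 0, 0, 1, 0]
```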
You can also do this using a list. list.insert(idx, value) inserts value into the list at index idx. Here we traverse all the values of x, and the value x[i][j] belongs to the i-th array:
x = [[0,2,3,5],[1,4]]
lst = []
for i in range(len(x)):
    for j in range(len(x[i])):
        lst.insert(x[i][j], i)
print(lst)
Output: [0, 1, 0, 0, 1, 0]
You might also consider using np.argsort to rearrange your array values, and create the index array with a list comprehension:
import numpy as np
x = [[0,2,3,5],[1,4]]
order = np.concatenate(x).argsort()
np.concatenate([[i]*len(e) for i, e in enumerate(x)])[order]
array([0, 1, 0, 0, 1, 0])

quickly subset array if values are matched in dictionary in python

I am trying to speed up this function. It checks whether the sums of the values of a list exist inside a dictionary. For example, if the value that x takes after adding [0, 1], [1, 0], [0, -1], or [-1, 0] exists inside layout, then that option is removed from the output. For example:
layout = { 0:[2,1], 1:[3,1], 2:[2,2], 3:[6,3] }
x = [2, 1]
possibilities = numpy.zeros(shape=(4,2))
possibilities[0] = [1, 0]
possibilities[1] = [-1, 0]
possibilities[2] = [0, 1]
possibilities[3] = [0, -1]
def myFun(x, layout, possibilities):
    new_possibilities = possibilities + x
    output_direction = []
    for i in new_possibilities:
        i = list(i)
        output_direction.append(i in layout.values())
    output_direction = true_to_false(output_direction)
    possibilities = possibilities[output_direction]
    if possibilities.size == 0:
        possibilities = [0, 0]
    return possibilities
# This changes True to False and vice versa
def true_to_false(y):
    output = []
    for i in y:
        if i == True:
            output.append(False)
        elif i == False:
            output.append(True)
    return output
If I now run this function I get the following output:
myFun(x, layout, possibilities)
array([[-1.,  0.],
       [ 0., -1.]])
The reason I get this output is that [0, 0] + x is occupied by [2,1] in layout, [0, 1] + x is occupied by [2,2], and [1, 0] + x is occupied by [3,1], whereas [-1, 0] + x and [0, -1] + x do not exist in layout, which is why those two remain.
This function works fine I would just like it to be faster, since layout can get quite large (tens of thousands of items) and this function is already being run inside a for loop.
style
Please don't say, e.g., print((((42)))), when it suffices to say print(42). The superfluous parentheses make your code harder to read.
negation
Your negation function could be simplified to this:
def true_to_false(y):
    return [not b for b in y]
But you don't even need that. You can delete the function and avoid the cost of a function call by using not when you append:
output_direction = []
for i in new_possibilities:
    output_direction.append(list(i) not in layout.values())
possibilities = possibilities[output_direction]
...
Even that much is on the verbose side, as it naturally fits in a list comprehension:
output_direction = [list(i) not in layout.values()
                    for i in new_possibilities]
speed
The trouble with repeatedly asking whether i is within .values() is that it's a linear scan. If len(layout.values()) gets at all large, you really want to throw those values into a hash set. Note that lists aren't hashable, so convert them to tuples first:
vals = {tuple(v) for v in layout.values()}
output_direction = [tuple(i) not in vals
                    for i in new_possibilities]
Now the O(n) linear scans become O(1) constant-time hash lookups.
If vals usually doesn't change between one myFun invocation and the next, then consider passing it in as a parameter alongside layout. BTW, you could elide the x parameter if the caller is willing to pass in x + possibilities.
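Putting the pieces together, here is one possible revised myFun. This is a sketch rather than the only way to do it; the key points are the tuple conversion (tuples are hashable, lists are not) and building the keep mask in one comprehension:

```python
import numpy as np

layout = {0: [2, 1], 1: [3, 1], 2: [2, 2], 3: [6, 3]}
x = [2, 1]
possibilities = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)

def myFun(x, layout, possibilities):
    # hashable copies of the occupied cells: O(1) membership tests
    vals = {tuple(v) for v in layout.values()}
    keep = [tuple(p) not in vals for p in possibilities + x]
    remaining = possibilities[keep]
    return [0, 0] if remaining.size == 0 else remaining

print(myFun(x, layout, possibilities))
# rows [-1, 0] and [0, -1] survive, matching the question's expected output
```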
Have you considered using set intersection instead?

numpy: ravel_multi_index increment different results from iterating over indices loop

I have an array of indices (possibly with duplicates), and for each of these indices I increment the corresponding element of another 2D matrix by 1. There have been several suggestions, and this answer proposes using np.ravel_multi_index.
So I've tried it out, but the two approaches don't give me the same answers. Any idea why?
raveled = np.ravel_multi_index(legit_indices.T, acc.shape)
counts = np.bincount(raveled)
acc = np.resize(counts, acc.shape)

acc2 = np.zeros(acc.shape)
for i in legit_indices:
    acc2[i[0], i[1]] += 1
(Pdb) np.array_equal(acc, acc2)
False
(Pdb) acc[493][5]
135
(Pdb) acc2[493][5]
0.0
There are a few problems with your current approach. Firstly, np.bincount(x)
will give you the counts for every non-negative integer value of x, starting at 0
and ending at max(x):
print(np.bincount([1, 1, 3, 3, 3, 4]))
# [0, 2, 0, 3, 1]
# i.e. [count for 0, count for 1, count for 2, count for 3, count for 4]
Therefore, if not every location in acc.flat gets indexed, the length of
np.bincount(raveled) will be greater than the number of unique indices. What
you actually want is the counts only for those locations in acc.flat that are
indexed at least once.
Secondly, what you want to do is assign the bin counts to the corresponding
indices into acc.flat. What your call to np.resize does is to repeat parts
of your array of bincounts in order to make it the same size as acc.flat,
then reshape it to the same shape as acc. This will not result in the bin
counts being assigned to the correct locations in acc!
The way I would solve this problem would be to use np.unique instead of
np.bincount, and use it to return both the unique indices and their corresponding
counts. These can then be used to assign the correct counts to the correct unique locations within acc:
import numpy as np
# some example data
acc = np.zeros((4, 3))
legit_indices = np.array([[0, 1],
                          [0, 1],
                          [1, 2],
                          [1, 0],
                          [1, 0],
                          [1, 0]])
# convert the index array into a set of indices into acc.flat
flat_idx = np.ravel_multi_index(legit_indices.T, acc.shape)
# get the set of unique indices and their corresponding counts
uidx, ucounts = np.unique(flat_idx, return_counts=True)
# assign the count value to each unique index in acc.flat
acc.flat[uidx] = ucounts
# confirm that this matches the result of your for loop
acc2 = np.zeros_like(acc)
for ii, jj in legit_indices:
    acc2[ii, jj] += 1
assert np.array_equal(acc, acc2)
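As an aside (not part of the original answer): NumPy also has a dedicated idiom for incrementing with repeated indices, np.add.at, which performs unbuffered in-place addition so that every duplicate index is applied:

```python
import numpy as np

acc3 = np.zeros((4, 3))
legit_indices = np.array([[0, 1], [0, 1], [1, 2], [1, 0], [1, 0], [1, 0]])

# unlike acc3[rows, cols] += 1, np.add.at counts each duplicate index
np.add.at(acc3, (legit_indices[:, 0], legit_indices[:, 1]), 1)
print(acc3)
# acc3[0, 1] == 2, acc3[1, 0] == 3, acc3[1, 2] == 1, all other entries 0
```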
