Quickly subset array if values are matched in dictionary in Python

I am trying to speed up this function. It checks whether the sums of a list with a set of offsets exist among the values of a dictionary. For example, if the value that x takes after adding [0, 1], [1, 0], [0, -1], or [-1, 0] exists inside layout, then that offset is removed as an option in the output. For example:
import numpy

layout = { 0:[2,1], 1:[3,1], 2:[2,2], 3:[6,3] }
x = [2, 1]
possibilities = numpy.zeros(shape=(4,2))
possibilities[0] = [1, 0]
possibilities[1] = [-1, 0]
possibilities[2] = [0, 1]
possibilities[3] = [0, -1]
def myFun(x, layout, possibilities):
    new_possibilities = possibilities + x
    output_direction = []
    for i in new_possibilities:
        i = list(i)
        output_direction.append( (i in layout.values()) )
    output_direction = true_to_false(output_direction)
    possibilities = possibilities[output_direction]
    if possibilities.size == 0:
        possibilities = [0, 0]
        return possibilities
    else:
        return possibilities
# This changes True to False
def true_to_false(y):
    output = []
    for i in y:
        if i == True:
            output.append((False))
        elif i == False:
            output.append((True))
    return output
If I now run this function I get the following output:
myFun(x, layout, possibilities)
array([[-1.,  0.],
       [ 0., -1.]])
The reason I get this output is because [0, 1] + x is occupied by [2,2] in layout and [1, 0] + x is occupied by [3,1] in layout, whereas [-1, 0] + x and [0, -1] + x do not exist in layout, and therefore those remain in the output.
This function works fine I would just like it to be faster, since layout can get quite large (tens of thousands of items) and this function is already being run inside a for loop.

style
Please don't say, e.g., print((((42)))), when it suffices to say print(42). The superfluous parentheses make your code harder to read.
negation
Your negation function could be simplified to this:
def true_to_false(y):
    return [not b for b in y]
But you don't even need that. You can delete the function and avoid the cost of a function call by using not when you append:
output_direction = []
for i in new_possibilities:
    output_direction.append(list(i) not in layout.values())
possibilities = possibilities[output_direction]
...
Even that much is on the verbose side, as it naturally fits in a list comprehension:
output_direction = [list(i) not in layout.values()
                    for i in new_possibilities]
speed
The trouble with repeatedly asking whether i is within .values() is that each test is a linear scan. If len(layout.values()) gets to be at all large, you really want to put those values into a hash-based set:
vals = {tuple(v) for v in layout.values()}
output_direction = [tuple(i) not in vals
                    for i in new_possibilities]
(Lists are not hashable, so the values have to be converted to tuples before going into the set.)
Now the O(n) linear scans become O(1) constant time hash lookups.
If vals usually doesn't change between one myFun invocation and the next, then consider passing it in as a parameter alongside layout. BTW, you could elide the x parameter if the caller is willing to pass in x + possibilities.
Have you considered using set intersection instead?
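To illustrate the set-based idea end to end, here is a minimal sketch of the whole function with a precomputed set of tuples. The set of tuples and the names my_fun / vals are my additions, not the OP's code; the point is that vals is built once, outside the hot loop:

```python
import numpy as np

layout = {0: [2, 1], 1: [3, 1], 2: [2, 2], 3: [6, 3]}
x = [2, 1]
possibilities = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)

def my_fun(x, vals, possibilities):
    # One boolean per candidate move: keep it only if the target cell is free
    keep = [tuple(p) not in vals for p in (possibilities + x).tolist()]
    remaining = possibilities[keep]
    return remaining if remaining.size else [0, 0]

vals = {tuple(v) for v in layout.values()}  # built once; O(1) membership tests
print(my_fun(x, vals, possibilities))
```

This reproduces the array([[-1., 0.], [0., -1.]]) result from the question while doing constant-time lookups per candidate.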

Related

Function Failing at Large List Sizes

I have a question: Starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each array element between two given indices, inclusive. Once all operations have been performed, return the maximum value in the array.
Example: n = 10, Queries = [[1,5,3],[4,8,7],[6,9,1]]
The following will be the resultant output after iterating through the array, Index 1-5 will have 3 added to it etc...:
[0,0,0, 0, 0,0,0,0,0, 0]
[3,3,3, 3, 3,0,0,0,0, 0]
[3,3,3,10,10,7,7,7,0, 0]
[3,3,3,10,10,8,8,8,1, 0]
Finally you output the max value in the final list:
[3,3,3,10,10,8,8,8,1, 0]
My current solution:
def Operations(size, Array):
    ResultArray = [0]*size
    Values = [[i.pop(2)] for i in Array]
    for index, i in enumerate(Array):
        # Current values = sum between the current values in the Results Array
        # AND the added operation of equal length
        ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
    Result = max(ResultArray)
    return Result
def main():
    nm = input().split()
    n = int(nm[0])
    m = int(nm[1])
    queries = []
    for _ in range(m):
        queries.append(list(map(int, input().rstrip().split())))
    result = Operations(n, queries)

if __name__ == "__main__":
    main()
Example input: The first line contains two space-separated integers n and m, the size of the array and the number of operations.
Each of the next m lines contains three space-separated integers a,b and k, the left index, right index and summand.
5 3
1 2 100
2 5 100
3 4 100
Compiler Error at Large Sizes:
Runtime Error
Currently this solution works for smaller final lists of length 4,000; however, in test cases where the length is 10,000,000 it fails. I do not know why this is the case, and I cannot provide the example input since it is so massive. Is there anything clear as to why it would fail in larger cases?
I think the problem is that you make too many intermediate throwaway lists here:
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
This ResultArray[i[0]-1:i[1]] results in a list, and you build it twice; one of those is used only to get the size, which is a complete waste of resources. Then you make another list with Values[index]*len(...), and finally compile all of that into yet another list that is itself thrown away once it is assigned into the original. That is four throwaway lists per operation. For example, say the slice size is 5,000,000: then you are making four of those, or 20,000,000 elements of extra space, 15,000,000 of which you don't really need. And if your original list has 10,000,000 elements, well, just do the math...
You can get the same result as your list(map(...)) with a list comprehension like
[v + Values[index][0] for v in ResultArray[i[0]-1:i[1]]]
Now we use two fewer lists, and we can drop one more by making it a generator expression, since slice assignment does not require that you assign a list specifically, just something that is iterable:
(v + Values[index][0] for v in ResultArray[i[0]-1:i[1]])
I don't know whether the slice assignment internally makes it a list first or not, but hopefully it doesn't, and with that we are back to just one extra list.
here is an example
>>> a=[0]*10
>>> a
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a[1:5] = (3+v for v in a[1:5])
>>> a
[0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
>>>
We can reduce it to zero extra lists (assuming that internally it doesn't make one) by using itertools.islice:
>>> import itertools
>>> a[3:7] = (1+v for v in itertools.islice(a,3,7))
>>> a
[0, 3, 3, 4, 4, 1, 1, 0, 0, 0]
>>>
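Putting the pieces above together, the whole function could look like this. This is a sketch of the same approach, not the OP's code; the lowercase name operations and the tuple unpacking of each query are my choices:

```python
from itertools import islice

def operations(size, queries):
    result = [0] * size
    for a, b, k in queries:
        # Generator over an islice view: no throwaway slice copies are built
        result[a-1:b] = (v + k for v in islice(result, a - 1, b))
    return max(result)

print(operations(5, [[1, 2, 100], [2, 5, 100], [3, 4, 100]]))  # 200
```

This matches the example input from the question (5 3 with the three 100-summand queries), whose answer is 200.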

python convert prediction result into one hot

I run a Python neural network prediction expecting a one-hot result and got back numbers as follows:
[[0.33058667182922363, 0.3436272442340851, 0.3257860243320465],
[0.32983461022377014, 0.3487854599952698, 0.4213798701763153],
[0.3311253488063812, 0.3473075330257416, 0.3215670585632324],
[0.38368630170822144, 0.35151687264442444, 0.3247968554496765],
[0.3332786560058594, 0.343686580657959, 0.32303473353385925]]
how can I convert the array into one hot result, i.e.
[[0,1,0],
 [0,0,1],
 [0,1,0],
 [1,0,0],
 [0,1,0]]
By one hot result I assume you want max value of each sub-list to be 1 and rest to be 0 (based on the pattern in current result). You may do it using list comprehension as:
>>> [[int(item == max(sublist)) for item in sublist] for sublist in my_list]
#     ^ converts bool value returned by `==` into `int`. True -> 1, False -> 0
[[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
where my_list is your initial list.
But in the above approach, you will be calculating max() on every iteration over the sub-list. A better way is to do it like:
def get_hot_value(my_list):
    max_val = max(my_list)
    return [int(item == max_val) for item in my_list]

hot_list = [get_hot_value(sublist) for sublist in my_list]
Edit: If you are supposed to have just one 1 in the list (in case of more than 1 element of maximum value), you may modify the get_hot_value function as:
def get_hot_value(my_list):
    max_val, hot_list, is_max_found = max(my_list), [], False
    for item in my_list:
        if item == max_val and not is_max_found:
            hot_list.append(1)
            is_max_found = True
        else:
            hot_list.append(0)
    return hot_list
The other solutions are good, and solve the problem. Alternatively, if you have numpy,
import numpy as np

n = [[0.33058667182922363, 0.3436272442340851, 0.3257860243320465],
     [0.32983461022377014, 0.3487854599952698, 0.4213798701763153],
     [0.3311253488063812, 0.3473075330257416, 0.3215670585632324],
     [0.38368630170822144, 0.35151687264442444, 0.3247968554496765],
     [0.3332786560058594, 0.343686580657959, 0.32303473353385925]]
max_indices = np.argmax(n, axis=1)
final_values = [[int(j == i) for j in range(len(n[0]))] for i in max_indices]
argmax is able to find the index of the maximum value in that row, then you just need to do one list comprehension over that. Should be pretty fast I guess?
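If you want the one-hot rows directly as an array rather than via a comprehension, indexing an identity matrix with the argmax result is a common NumPy idiom (a sketch of mine, not from the answer above):

```python
import numpy as np

n = [[0.33058667182922363, 0.3436272442340851, 0.3257860243320465],
     [0.32983461022377014, 0.3487854599952698, 0.4213798701763153]]

# Row i of eye(k) is the one-hot vector for class i, so fancy-indexing
# with the argmax of each row yields the whole one-hot matrix in one step.
one_hot = np.eye(len(n[0]), dtype=int)[np.argmax(n, axis=1)]
print(one_hot)  # [[0 1 0]
                #  [0 0 1]]
```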
I would suggest this:
n = [[0.33058667182922363, 0.3436272442340851, 0.3257860243320465],
     [0.32983461022377014, 0.3487854599952698, 0.4213798701763153],
     [0.3311253488063812, 0.3473075330257416, 0.3215670585632324],
     [0.38368630170822144, 0.35151687264442444, 0.3247968554496765],
     [0.3332786560058594, 0.343686580657959, 0.32303473353385925]]

hot_results = []
for row in n:
    hot_index = row.index(max(row))
    hot_result = [0] * len(row)
    hot_result[hot_index] = 1
    hot_results.append(hot_result)
print(hot_results)

Python - Select elements from matrix within range

I have a question regarding python and selecting elements within a range.
If I have an n x m matrix with n rows and m columns, I have a defined range for each column (so I have m min and max values).
Now I want to select those rows, where all values are within the range.
Looking at the following example:
input = matrix([[1, 2], [3, 4],[5,6],[1,8]])
boundaries = matrix([[2,1],[8,5]])
#Note:
#col1min = 2
#col1max = 8
#col2min = 1
#col2max = 5
print(input)
desired_result = matrix([[3, 4]])
print(desired_result)
Here, 3 rows were discarded because they contained values beyond the boundaries.
While I was able to get values within one range for a given array, I did not manage to solve this problem efficiently.
Thank you for your help.
I believe there is a more elegant solution, but I came up with this:
def foo(data, boundaries):
    zipped_bounds = list(zip(*boundaries))
    output = []
    for item in data:
        for index, bound in enumerate(zipped_bounds):
            if not (bound[0] <= item[index] <= bound[1]):
                break
        else:
            output.append(item)
    return output

data = [[1, 2], [3, 4], [5, 6], [1, 8]]
boundaries = [[2, 1], [8, 5]]
foo(data, boundaries)
Output:
[[3, 4]]
And I know there is no checking and raising of exceptions if the sizes of the arrays don't match. I leave that to the OP to implement.
Your example data syntax, matrix([[],..]), is not valid on its own, so it needs to be restructured like this:
matrix = [[1, 2], [3, 4],[5,6],[1,8]]
bounds = [[2,1],[8,5]]
I'm not sure exactly what you mean by "efficient", but this solution is readable, computationally efficient, and modular:
# Test columns in row against column bounds or first bounds
def row_in_bounds(row, bounds):
    for ci, colVal in enumerate(row):
        bi = ci if len(bounds[0]) >= ci + 1 else 0
        if not bounds[1][bi] >= colVal >= bounds[0][bi]:
            return False
    return True

# Use a list comprehension to apply the test to n rows
print([r for r in matrix if row_in_bounds(r, bounds)])
>>> [[3, 4]]
First we create a reusable test function for rows accepting a list of bounds lists, tuples are probably more appropriate, but I stuck with list as per your specification.
Then apply the test to your matrix of n rows with a list comprehension. If a column index exceeds the number of bounds columns, the first set of bounds provided is used as a fallback.
Keeping the row iterator out of the row parser function allows you to do things like get min/max from the filtered elements as required. This way you will not need to define a new function for every manipulation of the data required.
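If the data is already a NumPy array, the same row test can be vectorized with a boolean mask. This is a sketch of mine, assuming (as in the question) that the bounds rows hold per-column minima and maxima:

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6], [1, 8]])
bounds = np.array([[2, 1], [8, 5]])  # row 0: per-column minima, row 1: maxima

# A row survives only if every column lies inside its [min, max] interval
mask = ((data >= bounds[0]) & (data <= bounds[1])).all(axis=1)
print(data[mask])  # [[3 4]]
```

Broadcasting compares every row against the bounds at once, so there is no Python-level loop over rows.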

Turning a list of coordinates into a set of list/grid references

I have a list which will get the top-left section of a grid of size 2n+1.
n = 1
section = [[x, y] for x in range (n+1) for y in range (n+1)]
which looks like: [[0, 0], [0, 1], [1, 0], [1, 1]] and will get the coordinates of the top left hand corner of the 3x3(2n+1) grid, e.g:
[[1, 2, 3],
[4, 0, 5],
[6, 7, 8]]
I can then get every single quarter section of the grid by editing the values of the coords in this list:
for coOrd in topRight:
    coOrd[1] += n
for coOrd in botLeft:
    coOrd[0] += n
for coOrd in botRight:
    coOrd[0] += n
    coOrd[1] += n
I need to somehow individualise every second list element from
[0, 0], [0, 1], [1, 0]... -> [0][0], [0][1], [1][0]
so that I can use it to reference a grid list which has the values for the entire grid, which will look something like this: gridList[0][0], gridList[0][1]
I would prefer to be able to get each section via a for loop that will go through each coordinate in topLeft, topRight etc, and apply these to the gridList.
gridTL = [gridList[0][0], gridList[0][1], gridList[1][0], gridList[1][1]]
gridTR = [gridList[0][1], gridList[0][2], gridList[1][1], gridList[1][2]]
gridBL = [gridList[1][0], gridList[1][1], gridList[2][0], gridList[2][1]]
gridBR = [gridList[1][1], gridList[1][2], gridList[2][1], gridList[2][2]]
Hopefully I have explained well enough for you to understand, If I haven't sorry, please do ask more and I will try my best to explain.
Essentially it boils down to: I need to split a (3x3) grid into 4 sections, I have come up with a loop to grab the coordinates, I am struggling to implement these coordinates to be able to grab values from the grid.
Thanks.
You can write a helper function:
def gL(gridList, values):
    return gridList[values[0]][values[1]]
Then, your
gridTL = list(map(lambda m: gL(gridList, m), section))
Where section is any of the sections you want ...
Note:
map isn't considered very Pythonic. You can consider using list comprehensions for similar things. Something like this:
gridTL = [gridList[m[0]][m[1]] for m in section]
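Putting it together for the 3x3 example, a single helper with row/column offsets can produce all four quadrants from the one top-left section. The quadrant helper and its dr/dc parameters are my names, not from the question or answer:

```python
n = 1
gridList = [[1, 2, 3],
            [4, 0, 5],
            [6, 7, 8]]
section = [[x, y] for x in range(n + 1) for y in range(n + 1)]

def quadrant(grid, coords, dr=0, dc=0):
    # Shift each top-left coordinate by (dr, dc) and look up that cell
    return [grid[r + dr][c + dc] for r, c in coords]

gridTL = quadrant(gridList, section)              # [1, 2, 4, 0]
gridTR = quadrant(gridList, section, dc=n)        # [2, 3, 0, 5]
gridBL = quadrant(gridList, section, dr=n)        # [4, 0, 6, 7]
gridBR = quadrant(gridList, section, dr=n, dc=n)  # [0, 5, 7, 8]
```

This avoids mutating the coordinate lists in place for each quarter, which the += loops above have to do.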

Is there a 'skip-range' technique in FOR-loops in Python?

Let's pretend (since it's true) that I have a Python (3) script that needs to iterate over a 2D array (any length, but each element is just an array of 2 ints, as per the list below).
linCirc = [[1, 10],
           [2, 1],
           [0, 2],
           [2, 2],
           [2, 3],
           [2, 4],
           [2, 0],
           [2, 5]]
I want to iterate over this lovely thing so that
for element in linCirc:
    if element[0] == 0:
        # skip element[1] elements
Essentially, all I need to know is the best way to loop over linCirc, and then when certain conditions are met, instead of going from linCirc.index(element) to linCirc.index(element) + 1, I can control the skip, and skip zero or more elements. For instance, instead of going from [0, 2] to [2, 2], I could go from [0, 2] to [2, 4]. Is this the best way to do this? Should a for loop be involved at all?
For the curious: This code is intended to linearize an electric circuit so that any circuit (with limited components; say, just resistors and batteries for now) can be represented by a 2D array (like linCirc). I will post my full code if you want, but I don't want to clog this up with useless code.
index = 0
while index < len(linCirc):
    if linCirc[index][0] == 0:
        index = index + linCirc[index][1]
    else:
        index = index + 1
Hopefully this provides the functionality you're looking for. Obviously you'll have to add some useful code to this - it just runs from start to end of the array. It may also be helpful to add an index check, to make sure it doesn't run out of bounds when it encounters [0, i] less than i elements from the end of the array.
To support an arbitrary iterable (not just sequences such as list) you could use the consume() recipe:
from itertools import islice

it = iter(linCirc)
for element in it:
    if element[0] == 0:
        # skip element[1] elements
        n = element[1]
        next(islice(it, n, n), None)  # see consume() recipe from itertools docs
    print(element)
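For example, running the islice-based loop over linCirc from the question skips the two elements after [0, 2]. Here is a self-contained sketch of the same approach that collects the visited elements instead of printing them:

```python
from itertools import islice

linCirc = [[1, 10], [2, 1], [0, 2], [2, 2], [2, 3], [2, 4], [2, 0], [2, 5]]

visited = []
it = iter(linCirc)
for element in it:
    if element[0] == 0:
        n = element[1]
        next(islice(it, n, n), None)  # consume() recipe: advance n elements
    visited.append(element)

print(visited)  # [[1, 10], [2, 1], [0, 2], [2, 4], [2, 0], [2, 5]]
```

[2, 2] and [2, 3] are skipped because the preceding [0, 2] element requested a skip of 2.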
