Related
I have a bottleneck in a piece of code that is ruining the performance of my code. I re-wrote the section, but, after timing it, things didn't improve.
The problem is as follows. Given a list of fixed-length-lists of integers
data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]
I need to append the index of each sublist to a separate list as many times as its list value at a given column index. There is a separate list for each column in the data.
For instance, for the above data, there will be three resulting lists since the sub-lists have three columns.
There are 4 sublists, so we expect the numbers 0-3 to appear in each of the final lists.
We expect the following three lists to be generated from the above data
[[0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3],
[0, 0, 1, 1, 2, 3, 3, 3],
[0, 0, 0, 1, 3, 3, 3, 3]]
I have two ways of doing this:
processed_data = list([] for _ in range(len(data[0])))
for n in range(len(data)):
sub_list = data[n]
for k, proc_list in enumerate(processed_data):
for _ in range(sub_list[k]):
proc_list.append(n)
processed_data = []
for i, col in enumerate(zip(*data)):
processed_data.append([j for j,count in enumerate(col) for _ in range(count)])
The average size of the data list is around 100,000.
Is there a way I can speed this up?
You can't improve the computational complexity of your algorithm unless you're able to tweak the output format (see below). In other words, you'll at best be able to improve the speed by a modest percentage (and the percentage will be independent of the size of the input).
I don't see any obvious implementation issues. The one idea I had was to get rid of the large number of append() calls and the overhead that is incurred by gradual list expansions by preallocating the output matrix, but #juanpa.arrivillaga suggests in their comment that append() is in fact very optimized on CPython. If you're on another interpreter, you could try it: you know that the length of the output list for column c will be equal to the sum of all the input numbers in column c. So you can just preallocate each output list by using [0] * sum_of_input_values_at_column_c, and then do proc_list[i] = n instead of proc_list.append(n) (and manually increment i). This does, however, require two passes over the input, so it might not actually be an improvement - your problem is quite memory-intensive as its core computation is extremely simple.
The reason that you can't improve the computational complexity is that it is already optimal: any algorithm needs to spend time on generating its output, so the size of the output is a lower bound for how fast the algorithm can possibly be. And in your case, the size of the output is equal to the sum of the values in your input matrix (and it's generally considered bad when you depend on the input values themselves rather than on the number of input values). And that's the number of iterations that your algorithm spends, so it is optimal. However, if the output of this function is going to reside in memory to be consumed by another function (rather than being written to a file), and you are able to make some adaptations in that function, you could instead output a matrix of generators, where each generator knows that it needs to generate sub_list[k] occurrences of n. Then, the complexity of your algorithm becomes proportional to the size of the input matrix (but consuming the output will still take the same amount of time that it would have taken to generate the full output).
Perhaps itertools can make this go faster for you by minimizing the amount of python code inside loops:
data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]
from itertools import chain,repeat,starmap
result = [ list(chain.from_iterable(starmap(repeat,r)))
for r in map(enumerate,zip(*data)) ]
print(result)
[[0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3],
[0, 0, 1, 1, 2, 3, 3, 3],
[0, 0, 0, 1, 3, 3, 3, 3]]
If you're processing the output in the same order as the result's rows come out, you can convert this to a generator and use it directly in your main process:
iResult = ( chain.from_iterable(starmap(repeat,r))
for r in map(enumerate,zip(*data)) )
for iRow in iResult: # iRow is also an iterator
for resultItem in iRow:
# Perform your item processing here
print(resultItem, end=" ")
print()
0 1 1 1 2 2 2 2 2 2 2 2 3
0 0 1 1 2 3 3 3
0 0 0 1 3 3 3 3
This will avoid creating and storing the lists of indexes altogether (i.e. bringing that bottleneck down to zero). But that's only if you process the result sequentially
I have a question: Starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each the array element between two given indices, inclusive. Once all operations have been performed, return the maximum value in the array.
Example: n = 10, Queries = [[1,5,3],[4,8,7],[6,9,1]]
The following will be the resultant output after iterating through the array, Index 1-5 will have 3 added to it etc...:
[0,0,0, 0, 0,0,0,0,0, 0]
[3,3,3, 3, 3,0,0,0,0, 0]
[3,3,3,10,10,7,7,7,0, 0]
[3,3,3,10,10,8,8,8,1, 0]
Finally you output the max value in the final list:
[3,3,3,10,10,8,8,8,1, 0]
My current solution:
def Operations(size, Array):
ResultArray = [0]*size
Values = [[i.pop(2)] for i in Array]
for index, i in enumerate(Array):
#Current Values in = Sum between the current values in the Results Array AND the added operation of equal length
#Results Array
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
Result = max(ResultArray)
return Result
def main():
nm = input().split()
n = int(nm[0])
m = int(nm[1])
queries = []
for _ in range(m):
queries.append(list(map(int, input().rstrip().split())))
result = Operations(n, queries)
if __name__ == "__main__":
main()
Example input: The first line contains two space-separated integers n and m, the size of the array and the number of operations.
Each of the next m lines contains three space-separated integers a,b and k, the left index, right index and summand.
5 3
1 2 100
2 5 100
3 4 100
Compiler Error at Large Sizes:
Runtime Error
Currently this solution is working for smaller final lists of length 4000, however in order test cases where length = 10,000,000 it is failing. I do not know why this is the case and I cannot provide the example input since it is so massive. Is there anything clear as to why it would fail in larger cases?
I think the problem is that you make too many intermediary trow away list here:
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
this ResultArray[i[0]-1:i[1]] result in a list and you do it twice, and one is just to get the size, which is a complete waste of resources, then you make another list with Values[index]*len(...) and finally compile that into yet another list that will also be throw away once it is assigned into the original, so you make 4 throw away list, so for example lets said the the slice size is of 5.000.000, then you are making 4 of those or 20.000.000 extra space you are consuming, 15.000.000 of which you don't really need, and if your original list is of 10.000.000 elements, well just do the math...
You can get the same result for your list(map(...)) with list comprehension like
[v+Value[index][0] for v in ResultArray[i[0]-1:i[1]] ]
now we use two less lists, and we can reduce one list more by making it a generator expression, given that slice assignment does not need that you assign a list specifically, just something that is iterable
(v+Value[index][0] for v in ResultArray[i[0]-1:i[1]] )
I don't know if internally the slice assignment it make it a list first or not, but hopefully it doesn't, and with that we go back to just one extra list
here is an example
>>> a=[0]*10
>>> a
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a[1:5] = (3+v for v in a[1:5])
>>> a
[0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
>>>
we can reduce it to zero extra list (assuming that internally it doesn't make one) by using itertools.islice
>>> import itertools
>>> a[3:7] = (1+v for v in itertools.islice(a,3,7))
>>> a
[0, 3, 3, 4, 4, 1, 1, 0, 0, 0]
>>>
I am trying to use numpy in Python in solving my project.
I have a random binary array rndm = [1, 0, 1, 1] and a resource_arr = [[2, 3], 4, 2, [1, 2]]. What I am trying to do is to multiply the array element wise, then get their sum. As an expected output for the sample above,
output = 5 0 2 3. I find hard to solve such problem because of the nested array/list.
So far my code looks like this:
def fitness_score():
output = numpy.add(rndm * resource_arr)
return output
fitness_score()
I keep getting
ValueError: invalid number of arguments.
For which I think is because of the addition that I am trying to do. Any help would be appreciated. Thank you!
Numpy treats its arrays as matrices, and resource_arr is not a (valid) matrix. In your case a python list is more suitable:
def sum_nested(l):
tmp = []
for element in l:
if isinstance(element, list):
tmp.append(numpy.sum(element))
else:
tmp.append(element)
return tmp
In this function we check for each element inside l if it is a list. If so, we sum its elements. On the other hand, if the encountered element is just a number, we leave it untouched. Please note that this only works for one level of nesting.
Now, if we run sum_nested([[2, 3], 4, 2, [1, 2]]) we will get [5 4 2 3]. All that's left is multiplying this result by the elements of rndm, which can be achieved easily using numpy:
def fitness_score(a, b):
return numpy.multiply(a, sum_nested(b))
Numpy is all about the non-jagged arrays. You can do things with jagged arrays, but doing so efficiently and elegantly isnt trivial.
Almost always, trying to find a way to map your datastructure to a non-nested one, for instance, encoding the information as below, will be more flexible, and more performant.
resource_arr = (
[0, 0, 1, 2, 3, 3]
[2, 3, 4, 2, 1, 2]
)
That is, an integer denoting the 'row' each value belongs to, paired with an array of equal size of the values themselves.
This may 'feel' wasteful when coming from a C-style way of doing arrays (omg more memory consumption), but staying away from nested datastructures is almost certainly your best bet in terms of performance, and the amount of numpy/scipy ecosystem that will actually be compatible with your data representation. If it really uses more memory is actually rather questionable; every new python object uses a ton of bytes, so if you have only few elements per nesting, it is the more memory efficient solution too.
In this case, that would give you the following efficient solution to your problem:
output = np.bincount(*resource_arr) * rndm
I have not worked much with pandas/numpy so I'm not sure if this is most efficient way, but it works (atleast for the example you have shown):
import numpy as np
rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]
multiplied_output = np.multiply(rndm, resource_arr)
print(multiplied_output)
output = []
for elem in multiplied_output:
output.append(sum(elem)) if isinstance(elem, list) else output.append(elem)
final_output = np.array(output)
print(final_output)
I have a question regarding python and selecting elements within a range.
If I have a n x m matrix with n row and m columns, I have a defined range for each column (so I have m min and max values).
Now I want to select those rows, where all values are within the range.
Looking at the following example:
input = matrix([[1, 2], [3, 4],[5,6],[1,8]])
boundaries = matrix([[2,1],[8,5]])
#Note:
#col1min = 2
#col1max = 8
#col2min = 1
#col2max = 5
print(input)
desired_result = matrix([[3, 4]])
print(desired_result)
Here, 3 rows where discarded, because they contained values beyond the boundaries.
While I was able to get values within one range for a given array, I did not manage to solve this problem efficiently.
Thank you for your help.
I believe that there is more elegant solution, but i came to this:
def foo(data, boundaries):
zipped_bounds = list(zip(*boundaries))
output = []
for item in data:
for index, bound in enumerate(zipped_bounds):
if not (bound[0] <= item[index] <= bound[1]):
break
else:
output.append(item)
return output
data = [[1, 2], [3, 4], [5, 6], [1, 8]]
boundaries = [[2, 1], [8, 5]]
foo(data, boundaries)
Output:
[[3, 4]]
And i know that there is not checking and raising exceptions if the sizes of arrays won't match each concrete size. I leave it OP to implement this.
Your example data syntax is not correct matrix([[],..]) so it needs to be restructured like this:
matrix = [[1, 2], [3, 4],[5,6],[1,8]]
bounds = [[2,1],[8,5]]
I'm not sure exactly what you mean by "efficient", but this solution is readable, computationally efficient, and modular:
# Test columns in row against column bounds or first bounds
def row_in_bounds(row, bounds):
for ci, colVal in enumerate(row):
bi = ci if len(bounds[0]) >= ci + 1 else 0
if not bounds[1][bi] >= colVal >= bounds[0][bi]:
return False
return True
# Use a list comprehension to apply test to n rows
print ([r for r in matrix if row_in_bounds(r,bounds)])
>>>[[3, 4]]
First we create a reusable test function for rows accepting a list of bounds lists, tuples are probably more appropriate, but I stuck with list as per your specification.
Then apply the test to your matrix of n rows with a list comprehension. If n exceeds the bounds column index or the bounds column index is falsey use the first set of bounds provided.
Keeping the row iterator out of the row parser function allows you to do things like get min/max from the filtered elements as required. This way you will not need to define a new function for every manipulation of the data required.
I'm trying to identify if a large list has consecutive elements that are the same.
So let's say:
lst = [1, 2, 3, 4, 5, 5, 6]
And in this case, I would return true, since there are two consecutive elements lst[4] and lst[5], are the same value.
I know this could probably be done with some sort of combination of loops, but I was wondering if there were a more efficient way to do this?
You can use itertools.groupby() and a generator expression within any()
*:
>>> from itertools import groupby
>>> any(sum(1 for _ in g) > 1 for _, g in groupby(lst))
True
Or as a more Pythonic way you can use zip(), in order to check if at least there are two equal consecutive items in your list:
>>> any(i==j for i,j in zip(lst, lst[1:])) # In python-2.x,in order to avoid creating a 'list' of all pairs instead of an iterator use itertools.izip()
True
Note: The first approach is good when you want to check if there are more than 2 consecutive equal items, otherwise, in this case the second one takes the cake!
* Using sum(1 for _ in g) instead of len(list(g)) is very optimized in terms of memory use (not reading the whole list in memory at once) but the latter is slightly faster.
You can use a simple any condition:
lst = [1, 2, 3, 4, 5, 5, 6]
any(lst[i]==lst[i+1] for i in range(len(lst)-1))
#outputs:
True
any return True if any of the iterable elements are True
If you're looking for an efficient way of doing this and the lists are numerical, you would probably want to use numpy and apply the diff (difference) function:
>>> numpy.diff([1,2,3,4,5,5,6])
array([1, 1, 1, 1, 0, 1])
Then to get a single result regarding whether there are any consecutive elements:
>>> numpy.any(~numpy.diff([1,2,3,4,5,5,6]).astype(bool))
This first performs the diff, inverts the answer, and then checks if any of the resulting elements are non-zero.
Similarly,
>>> 0 in numpy.diff([1, 2, 3, 4, 5, 5, 6])
also works well and is similar in speed to the np.any approach (credit for this last version to heracho).
Here a more general numpy one-liner:
number = 7
n_consecutive = 3
arr = np.array([3, 3, 6, 5, 8, 7, 7, 7, 4, 5])
# ^ ^ ^
np.any(np.convolve(arr == number, v=np.ones(n_consecutive), mode='valid')
== n_consecutive)[0]
This method always searches the whole array, while the approach from #Kasramvd ends when the condition is first met. So which method is faster dependents on how sparse those cases of consecutive numbers are.
If you are interested in the positions of the consecutive numbers, and have to look at all elements of the array this approach should be faster (for larger arrays (or/and longer sequences)).
idx = np.nonzero(np.convolve(arr==number, v=np.ones(n_consecutive), mode='valid')
== n_consecutive)
# idx = i: all(arr[i:i+n_consecutive] == number)
If you are not interested in a specific value but at all consecutive numbers in general a slight variation of #jmetz's answer:
np.any(np.convolve(np.abs(np.diff(arr)), v=np.ones(n_consecutive-1), mode='valid') == 0)
# ^^^^^^
# EDIT see djvg's comment
Starting in Python 3.10, the new pairwise function provides a way to slide through pairs of consecutive elements, so that we can test the quality between consecutive elements:
from itertools import pairwise
any(x == y for (x, y) in pairwise([1, 2, 3, 4, 5, 5, 6]))
# True
The intermediate result of pairwise:
pairwise([1, 2, 3, 4, 5, 5, 6])
# [(1, 2), (2, 3), (3, 4), (4, 5), (5, 5), (5, 6)]
A simple for loop should do it:
def check(lst):
last = lst[0]
for num in lst[1:]:
if num == last:
return True
last = num
return False
lst = [1, 2, 3, 4, 5, 5, 6]
print (check(lst)) #Prints True
Here, in each loop, I check if the current element is equal to the previous element.
The convolution approach suggested in scleronomic's answer is very promising, especially if you're looking for more than two consecutive elements.
However, the implementation presented in that answer might not be the most efficient, because it consists of two steps: diff() followed by convolve().
Alternative implementation
If we consider that the diff() can also be calculated using convolution, we can combine the two steps into a single convolution.
The following alternative implementation only requires a single convolution of the full signal, which is advantageous if the signal has many elements.
Note that we cannot take the absolute values of the diff (to prevent false positives, as mentioned in this comment), so we add some random noise to the unit kernel instead.
# example signal
signal = numpy.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0])
# minimum number of consecutive elements
n_consecutive = 3
# convolution kernel for weighted moving sum (with small random component)
rng = numpy.random.default_rng()
random_kernel = 1 + 0.01 * rng.random(n_consecutive - 1)
# convolution kernel for first-order difference (similar to numpy.diff)
diff_kernel = [1, -1]
# combine the kernels so we only need to do one convolution with the signal
combined_kernel = numpy.convolve(diff_kernel, random_kernel, mode='full')
# convolve the signal to get the moving weighted sum of differences
moving_sum_of_diffs = numpy.convolve(signal, combined_kernel, mode='valid')
# check if moving sum is zero anywhere
result = numpy.any(moving_sum_of_diffs == 0)
See the DSP guide for a detailed discussion of convolution.
Timing
The difference between the two implementations boils down to this:
def original(signal, unit_kernel):
return numpy.convolve(numpy.abs(numpy.diff(signal)), unit_kernel, mode='valid')
def alternative(signal, combined_kernel):
return numpy.convolve(signal, combined_kernel, mode='valid')
where unit_kernel = numpy.ones(n_consecutive - 1) and combined_kernel is defined above.
Comparison of these two functions, using timeit, shows that alternative() can be several times faster, for small kernel sizes (i.e. small value of n_consecutive). However, for large kernel sizes the advantage becomes negligible, because the convolution becomes dominant (compared to the diff).
Notes:
For large kernel sizes I would prefer the original two-step approach, as I think it is easier to understand.
Due to numerical issues it may be necessary to replace numpy.any(moving_sum_of_diffs == 0) by numpy.any(numpy.abs(moving_sum_of_diffs) < very_small_number), see e.g. here.
My solution for this if you want to find out whether 3 consecutive values are equal to 7. For example, a tuple of intList = (7, 7, 7, 8, 9, 1):
for i in range(len(intList) - 1):
if intList[i] == 7 and intList[i + 2] == 7 and intList[i + 1] == 7:
return True
return False