Function Failing at Large List Sizes


I have a question: Starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each array element between two given indices, inclusive. Once all operations have been performed, return the maximum value in the array.
Example: n = 10, Queries = [[1,5,3],[4,8,7],[6,9,1]]
The following will be the resultant output after iterating through the array, Index 1-5 will have 3 added to it etc...:
[0,0,0, 0, 0,0,0,0,0, 0]
[3,3,3, 3, 3,0,0,0,0, 0]
[3,3,3,10,10,7,7,7,0, 0]
[3,3,3,10,10,8,8,8,1, 0]
Finally you output the max value in the final list:
[3,3,3,10,10,8,8,8,1, 0]
My current solution:
def Operations(size, Array):
    ResultArray = [0]*size
    Values = [[i.pop(2)] for i in Array]
    for index, i in enumerate(Array):
        # New values in the slice of ResultArray = element-wise sum of the current
        # values in that slice and the operation's value repeated to equal length
        ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
    Result = max(ResultArray)
    return Result
def main():
    nm = input().split()
    n = int(nm[0])
    m = int(nm[1])
    queries = []
    for _ in range(m):
        queries.append(list(map(int, input().rstrip().split())))
    result = Operations(n, queries)
if __name__ == "__main__":
    main()
Example input: The first line contains two space-separated integers n and m, the size of the array and the number of operations.
Each of the next m lines contains three space-separated integers a,b and k, the left index, right index and summand.
5 3
1 2 100
2 5 100
3 4 100
Compiler Error at Large Sizes:
Runtime Error
Currently this solution works for smaller final lists of length 4000, but in other test cases where length = 10,000,000 it fails. I do not know why this is the case, and I cannot provide the example input since it is so massive. Is there an obvious reason why it would fail in larger cases?

I think the problem is that you create too many intermediate throwaway lists here:
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
The slice ResultArray[i[0]-1:i[1]] produces a new list, and you take it twice, once just to get its length, which is a complete waste of resources. Then you build another list with Values[index]*len(...), and finally you collect the result into yet another list that is also thrown away once it has been assigned into the original. That makes 4 throwaway lists. For example, if the slice has 5,000,000 elements, you are creating 4 of those, which is 20,000,000 elements of extra space, 15,000,000 of which you don't really need; and if your original list has 10,000,000 elements, well, just do the math...
You can get the same result as your list(map(...)) with a list comprehension like
[v+Values[index][0] for v in ResultArray[i[0]-1:i[1]] ]
Now we use two fewer lists, and we can drop one more by turning it into a generator expression, since slice assignment does not require a list specifically, just something iterable:
(v+Values[index][0] for v in ResultArray[i[0]-1:i[1]] )
I don't know whether slice assignment internally converts it to a list first, but hopefully it doesn't, and with that we are back to just one extra list.
Here is an example:
>>> a=[0]*10
>>> a
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a[1:5] = (3+v for v in a[1:5])
>>> a
[0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
>>>
We can reduce it to zero extra lists (assuming that internally it doesn't make one) by using itertools.islice:
>>> import itertools
>>> a[3:7] = (1+v for v in itertools.islice(a,3,7))
>>> a
[0, 3, 3, 4, 4, 1, 1, 0, 0, 0]
>>>
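Putting the pieces above together, the whole function could look roughly like the sketch below. The name operations and the unpacking of each query into a, b, k are my own choices for illustration, not part of the original code, and I have not measured whether this is enough to pass the large test cases.

```python
from itertools import islice

def operations(size, queries):
    """Apply each (a, b, k) query to a zero-filled list and return the maximum."""
    result = [0] * size
    for a, b, k in queries:
        # a and b are 1-indexed and inclusive, so the matching slice is [a-1:b].
        # islice reads the elements without building a sliced copy first.
        result[a - 1:b] = (v + k for v in islice(result, a - 1, b))
    return max(result)

print(operations(10, [[1, 5, 3], [4, 8, 7], [6, 9, 1]]))  # 10
```

Unlike the original, this version does not pop the third element out of each query, so the input list is left unchanged.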

Related

What is the best way to find unique sublists of a given length that are present in a list?

I have built a function that finds all of the unique sublists, of length i, present in a given list.
For example if you have list=[0,1,1,0,1] and i=1, you just get [1,0]. If i=2, you get [[0,1],[1,1],[1,0]], but not [0,0] because while it is a possible combination of 1 and 0, it is not present in the given list. The code is listed below.
While the code works, I do not believe it is the most efficient. It relies on finding all possible sublists and testing for the presence of each one, which becomes impractical at i > 4 (for, say, a list length of 100). I was hoping I could get help in finding a more efficient method for computing this. I am fully aware that this is probably not a great way to do this, but with what little knowledge I have it's the first thing that I could come up with.
The code I have written:
import itertools

def present_sublists (l,of_length):
    """
    takes a given list of 1s and 0s and returns all the unique sublist of that
    string that are of a certain length
    """
    l_str=[str(int) for int in l] #converts entries in input to strings
    l_joined="".join(l_str) #joins input into one strings, i.e. "101010"
    sublist_sets=set(list(itertools.combinations(l_joined,of_length)))
    #uses itertools to get all possible combinations of substrings, and then set
    #properties to removes duplicates
    pos_sublists=list(sublist_sets) #returns the set to a list
    sublists1=[]
    for entry in pos_sublists: #returns the entries to a list
        sublists1.append(list(entry))
    for entry in sublists1: #returns the "1"s and "0" to 1s and 0s
        for entry2 in entry:
            entry[entry.index(entry2)]=int(entry2)
    present_sublists=[]
    for entry in sublists1: #tests whether the possible sublist is
                            #present in the input list
        for x in range(len(l) - len(entry) + 1):
            if entry not in present_sublists:
                if l[x: x + len(entry)] == entry:
                    present_sublists.append(entry)
    output=present_sublists
    return output

Given your code and sample, it looks like you want all the unique contiguous sub-sequences of the given input. If so, you don't need to compute all combinations, or convert back and forth between strings, lists and sets, let alone loop over the data multiple times. Slice notation is more than enough to get the desired result:
>>> [0,1,2,3,4][0:2]
[0, 1]
>>> [0,1,2,3,4][1:3]
[1, 2]
>>> [0,1,2,3,4][2:4]
[2, 3]
>>> [0,1,2,3,4][3:5]
[3, 4]
>>>
Choosing the slice indexes appropriately gets us all the contiguous sub-sequences of any given size (2 in the example).
Now, to make this more automatic, we write an appropriate for loop:
>>> seq=[0,1,2,3,4]
>>> size=2
>>> for i in range(len(seq)-size+1):
...     print(seq[i:i+size])
...
[0, 1]
[1, 2]
[2, 3]
[3, 4]
>>>
Now that we know how to get all the sub-sequences we care about, we focus on keeping only the unique ones. For that we of course use a set, but a list can't be put in a set, so we need something that can: a tuple (which is basically an immutable list). That is everything you need, so let's put it all together:
>>> def sub_sequences(seq,size):
...     """return a set with all the unique contiguous sub-sequences of the given size of the given input"""
...     seq = tuple(seq) #make it into a tuple so it can be used in a set
...     if size>len(seq) or size<0: #base/trivial case
...         return set() #or raise an exception like ValueError
...     return {seq[i:i+size] for i in range(len(seq)-size+1)} #a set comprehension version of the previous mentioned loop
...
>>> sub_sequences([0,1,2,3,4],2)
{(0, 1), (1, 2), (2, 3), (3, 4)}
>>>
>>> #now lets use your sample
>>>
>>> sub_sequences([0,1,1,0,1],2)
{(0, 1), (1, 0), (1, 1)}
>>> sub_sequences([0,1,1,0,1],3)
{(1, 0, 1), (1, 1, 0), (0, 1, 1)}
>>> sub_sequences([0,1,1,0,1],4)
{(1, 1, 0, 1), (0, 1, 1, 0)}
>>> sub_sequences([0,1,1,0,1],5)
{(0, 1, 1, 0, 1)}
>>>

Let's label the bits 0, 1, 2, 3, .....
Let's also define a function f(len, n), where f(len, n) is the set of all the strings of length len that occur in the first n bits.
So
f(0, n) = {''} since you can always make the empty string
f(len, 0) = set() if len > 0
So what is the value of f(len, n) if len > 0 and n > 0? It contains everything in f(len, n - 1), plus everything in f(len - 1, n - 1) with l[n-1] appended to it.
You now have everything you need to find f(of_length, len(l)) reasonably efficiently.
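As a rough sketch, the recurrence above can be evaluated bottom-up with one set per length. The function name strings_by_recurrence is made up for this example. One thing to be aware of: implemented exactly as stated, the recurrence also produces strings whose bits are not adjacent in the input (subsequences), so for [0,1,1,0,1] and length 2 it includes '00', which the question wanted to exclude.

```python
def strings_by_recurrence(bits, length):
    """Evaluate f(length, len(bits)) from the recurrence, one bit at a time."""
    # table[k] holds f(k, n) for the n bits processed so far.
    table = [set() for _ in range(length + 1)]
    table[0].add('')  # f(0, n) = {''}
    for bit in bits:
        # Walk k downwards so table[k - 1] is still f(k - 1, n - 1)
        # when it is used to extend table[k].
        for k in range(length, 0, -1):
            table[k] |= {s + str(bit) for s in table[k - 1]}
    return table[length]

print(sorted(strings_by_recurrence([0, 1, 1, 0, 1], 2)))
# ['00', '01', '10', '11']
```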

To stick to your function signature I would suggest something like:
Iterate through each sublist and put them into a set() to ensure uniqueness.
The sublists need to be converted to tuples, since lists cannot be hashed and therefore cannot be put into sets as they are.
Convert the resulting tuples in the set back to the required format.
When creating new lists, a list comprehension is the most effective and pythonic choice.
>>> def present_sublists(l,of_length):
... sublists = set([tuple(l[i:i+of_length]) for i in range(0,len(l)+1-of_length)])
... return [list(sublist) for sublist in sublists]
...
>>> present_sublists([0,1,1,0,1], 1)
[[0], [1]]
>>> present_sublists([0,1,1,0,1], 2)
[[0, 1], [1, 0], [1, 1]]
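Note that a set does not promise any particular order, so the order of the lists shown above may differ from run to run or between Python versions. If the order of first appearance matters, a possible variation (the name present_sublists_ordered is just for this sketch) is to deduplicate with dict.fromkeys, which keeps insertion order on Python 3.7+:

```python
def present_sublists_ordered(l, of_length):
    """Unique contiguous sublists of the given length, in order of first appearance."""
    windows = (tuple(l[i:i + of_length]) for i in range(len(l) - of_length + 1))
    # dict keys are unique and preserve insertion order
    return [list(w) for w in dict.fromkeys(windows)]

print(present_sublists_ordered([0, 1, 1, 0, 1], 2))  # [[0, 1], [1, 1], [1, 0]]
```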

speed up list iteration bottleneck

I have a bottleneck in a piece of code that is ruining the performance of my code. I re-wrote the section, but, after timing it, things didn't improve.
The problem is as follows. Given a list of fixed-length-lists of integers
data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]
I need to append the index of each sublist to a separate list as many times as its list value at a given column index. There is a separate list for each column in the data.
For instance, for the above data, there will be three resulting lists since the sub-lists have three columns.
There are 4 sublists, so we expect the numbers 0-3 to appear in each of the final lists.
We expect the following three lists to be generated from the above data
[[0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3],
[0, 0, 1, 1, 2, 3, 3, 3],
[0, 0, 0, 1, 3, 3, 3, 3]]
I have two ways of doing this:
# Method 1: nested loops with append
processed_data = list([] for _ in range(len(data[0])))
for n in range(len(data)):
    sub_list = data[n]
    for k, proc_list in enumerate(processed_data):
        for _ in range(sub_list[k]):
            proc_list.append(n)

# Method 2: one list comprehension per column
processed_data = []
for i, col in enumerate(zip(*data)):
    processed_data.append([j for j,count in enumerate(col) for _ in range(count)])
The average size of the data list is around 100,000.
Is there a way I can speed this up?

You can't improve the computational complexity of your algorithm unless you're able to tweak the output format (see below). In other words, you'll at best be able to improve the speed by a modest percentage (and the percentage will be independent of the size of the input).
I don't see any obvious implementation issues. The one idea I had was to get rid of the large number of append() calls, and the overhead incurred by gradually growing the lists, by preallocating the output matrix, but @juanpa.arrivillaga suggests in their comment that append() is in fact very well optimized on CPython. If you're on another interpreter, you could try it: you know that the length of the output list for column c will be equal to the sum of all the input numbers in column c. So you can preallocate each output list using [0] * sum_of_input_values_at_column_c, and then do proc_list[i] = n instead of proc_list.append(n) (and manually increment i). This does, however, require two passes over the input, so it might not actually be an improvement: your problem is quite memory-intensive, as its core computation is extremely simple.
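For what it's worth, the preallocation idea could be sketched like this. The function name expand_preallocated is invented for the example, and I have not timed it, so treat it as something to benchmark rather than a guaranteed speed-up.

```python
def expand_preallocated(data):
    """Build one index list per column, writing into preallocated lists."""
    result = []
    for col in zip(*data):
        out = [0] * sum(col)  # first pass: the final length is the column sum
        i = 0
        for n, count in enumerate(col):  # second pass: fill in the row indices
            for _ in range(count):
                out[i] = n
                i += 1
        result.append(out)
    return result

data = [[1, 2, 3], [3, 2, 1], [8, 1, 0], [1, 3, 4]]
print(expand_preallocated(data)[1])  # [0, 0, 1, 1, 2, 3, 3, 3]
```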
The reason that you can't improve the computational complexity is that it is already optimal: any algorithm needs to spend time on generating its output, so the size of the output is a lower bound for how fast the algorithm can possibly be. And in your case, the size of the output is equal to the sum of the values in your input matrix (and it's generally considered bad when you depend on the input values themselves rather than on the number of input values). And that's the number of iterations that your algorithm spends, so it is optimal. However, if the output of this function is going to reside in memory to be consumed by another function (rather than being written to a file), and you are able to make some adaptations in that function, you could instead output a matrix of generators, where each generator knows that it needs to generate sub_list[k] occurrences of n. Then, the complexity of your algorithm becomes proportional to the size of the input matrix (but consuming the output will still take the same amount of time that it would have taken to generate the full output).
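A minimal sketch of the lazy output described above, assuming the consumer can work with iterators instead of lists. The name lazy_expand is hypothetical; itertools.repeat(n, count) stands in for "a generator that knows it needs to generate count occurrences of n".

```python
from itertools import chain, repeat

def lazy_expand(data):
    """Return, per column, a list of iterators; each yields row index n, count times."""
    # Building this costs time proportional to the size of the input matrix only.
    return [[repeat(n, count) for n, count in enumerate(col)]
            for col in zip(*data)]

data = [[1, 2, 3], [3, 2, 1], [8, 1, 0], [1, 3, 4]]
lazy = lazy_expand(data)
# Consuming the iterators is where the output-sized work happens.
print(list(chain.from_iterable(lazy[2])))  # [0, 0, 0, 1, 3, 3, 3, 3]
```

Each repeat iterator can only be consumed once, so the consumer has to process the values in a single pass.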

Perhaps itertools can make this go faster for you by minimizing the amount of python code inside loops:
data = [[1,2,3], [3,2,1], [8,1,0], [1,3,4]]
from itertools import chain,repeat,starmap
result = [ list(chain.from_iterable(starmap(repeat,r)))
for r in map(enumerate,zip(*data)) ]
print(result)
[[0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3],
[0, 0, 1, 1, 2, 3, 3, 3],
[0, 0, 0, 1, 3, 3, 3, 3]]
If you're processing the output in the same order as the result's rows come out, you can convert this to a generator and use it directly in your main process:
iResult = ( chain.from_iterable(starmap(repeat,r))
            for r in map(enumerate,zip(*data)) )
for iRow in iResult: # iRow is also an iterator
    for resultItem in iRow:
        # Perform your item processing here
        print(resultItem, end=" ")
    print()
0 1 1 1 2 2 2 2 2 2 2 2 3
0 0 1 1 2 3 3 3
0 0 0 1 3 3 3 3
This will avoid creating and storing the lists of indexes altogether (i.e. bringing that bottleneck down to zero), but only if you process the result sequentially.

Rearrange array element based on the number sequence and represent by array id

Suppose I have multiple arrays inside one array, which together contain the numbers from 0 to n, split across the arrays in some order.
For example,
x = [[0,2,3,5],[1,4]]
Here we have two arrays in x. There could be more than two.
I want to rearrange all the array elements based on their number sequence, but have each position show the ID of the array the number came from. The result should look like this:
y = [0,1,0,0,1,0]
That means 0,2,3,5 are in array id 0, so the positions 0, 2, 3 and 5 of the result show that id. The same goes for 1 and 4. Can anyone help me solve this? [N.B. There could be more than two arrays, so it would be highly appreciated if the code works for any number of arrays.]

You can do this by using a dictionary
x = [[0,2,3,5],[1,4]]
lst = {}
for i in range(len(x)):
    for j in range(len(x[i])):
        lst[x[i][j]] = i
print(lst)
You can also do this with a list. list.insert(idx, value) inserts value into the list at index idx. Here we traverse all the values of x, and the value x[i][j] belongs to array number i.
x = [[0,2,3,5],[1,4]]
lst = []
for i in range(len(x)):
    for j in range(len(x[i])):
        lst.insert(x[i][j], i)
print(lst)
Output: [0, 1, 0, 0, 1, 0]
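One caveat with the insert-based version: list.insert shifts later elements, so the result depends on the order in which the values arrive. For example, with x = [[0,3],[2],[1]] it gives [0, 3, 0, 2] rather than [0, 3, 2, 0]. A sketch that avoids this by preallocating the result and assigning by index is shown below; the name array_ids is made up for the example, and it assumes the values are exactly 0 to n with no gaps or repeats.

```python
def array_ids(x):
    """For each number 0..n, record the id of the sub-array that contains it."""
    total = sum(len(sub) for sub in x)
    y = [None] * total  # one slot per number; assumes values cover 0..total-1
    for array_id, sub in enumerate(x):
        for value in sub:
            y[value] = array_id
    return y

print(array_ids([[0, 2, 3, 5], [1, 4]]))  # [0, 1, 0, 0, 1, 0]
print(array_ids([[0, 3], [2], [1]]))      # [0, 2, 1, 0]
```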

You might also consider using np.argsort for rearranging your array values and create the index-array with list comprehension:
import numpy as np
x = [[0,2,3,5],[1,4]]
order = np.concatenate(x).argsort()
np.concatenate([ [i]*len(e) for i,e in enumerate(x) ])[order]
array([0, 1, 0, 0, 1, 0])

Loop over clump_masked indices

I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of each value as well, so I can calculate its new value based on its neighbours. Enumerate (logically) returns 0, 1, etc. instead of the indices I need.
Here's the solution I came up with.
import numpy as np
import numpy.ma as ma

# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])
It is a very ugly method in my opinion, and I think there has to be a better way to do this. Any suggestions?
Thanks!

EDIT:
I figured out a better way to achieve what I think you want to do. This code takes every window of 5 elements and computes its (masked) average, then uses those values to fill the gaps in the original array. If an index does not have any unmasked value close enough, it is simply left masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
The originally posted code has a couple of errors. First, it both reads from and writes to y_filtered inside the loop, so the results for later indices are affected by earlier iterations; this could be fixed by reading from a copy of the original y_filtered. Second, [i-2:i+2] should probably be [max(i-2, 0):i+3], so that the window is symmetric and never starts before index zero.
You could do this:
from itertools import chain
import numpy as np
import numpy.ma as ma
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
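To also address the first point (reading and writing the same array), one option is to read from a snapshot and write into a separate copy. The following is only a sketch: fill_masked and its margin parameter are names I made up, and it assumes a 1-D masked array with a float dtype (with an integer dtype the averages would be truncated on assignment).

```python
import numpy as np
import numpy.ma as ma

def fill_masked(y, margin=2):
    """Fill masked entries with the mean of unmasked neighbours within `margin`."""
    source = y.copy()  # snapshot: every window is read from the original values
    result = y.copy()  # filled values are written here
    for sl in ma.clump_masked(source):
        for idx in range(sl.start, sl.stop):
            window = source[max(idx - margin, 0):idx + margin + 1]
            if window.count() > 0:  # at least one unmasked neighbour in the window
                result[idx] = window.mean()  # assigning a value unmasks the entry
    return result

y = ma.array([1.0, 2.0, 0.0, 4.0, 5.0], mask=[0, 0, 1, 0, 0])
print(fill_masked(y))  # [1.0 2.0 3.0 4.0 5.0]
```

Entries with no unmasked neighbour inside the window stay masked, which matches the behaviour described for the strided version above.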

Rearranging order of elements in python list

I have the following list in python:
[(0.12156862745098039, 0.4666666666666667, 0.7058823529411765), (1.0, 0.4980392156862745, 0.054901960784313725), (0.17254901960784313, 0.6274509803921569, 0.17254901960784313), (0.8392156862745098, 0.15294117647058825, 0.1568627450980392), (0.5803921568627451, 0.403921568627451, 0.7411764705882353), (0.5490196078431373, 0.33725490196078434, 0.29411764705882354), (0.8901960784313725, 0.4666666666666667, 0.7607843137254902), (0.4980392156862745, 0.4980392156862745, 0.4980392156862745), (0.7372549019607844, 0.7411764705882353, 0.13333333333333333), (0.09019607843137255, 0.7450980392156863, 0.8117647058823529)]
It consists of multiple tuples.
How can I rearrange it so that all the elements at even number positions are moved to the end of the list? Not really sure how to approach this.

Use slicing, and specify a step value of 2 for alternate values.
So for example:
l = [0,1,2,3,4,5,6]
print(l[1::2] + l[::2])
Result is:
[1, 3, 5, 0, 2, 4, 6]
That is, all the values at odd indices followed by all the values at even indices, with the index counting from 0.

You can simply append a list containing only the even elements to a list containing only the odd elements. The even and odd elements are extracted using array slicing.
If you consider the first element to be even (because the index, 0, is even)
new = data[1::2] + data[::2]
If you consider the first element to be odd (it's at position 1 and 1 is odd), you would reverse the order
data[::2] + data[1::2]
And for an example
data = [0,1,2,3,4,5]
new = data[1::2] + data[::2]
# [1, 3, 5, 0, 2, 4]
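Wrapped up as a small helper, this could look like the sketch below. The function name and the first_is_even flag are invented for illustration; the flag just switches between the two interpretations described above. It works the same way for a list of tuples, since slicing does not care what the elements are.

```python
def move_even_to_end(data, first_is_even=True):
    """Return a new list with the 'even' positions moved after the 'odd' ones."""
    if first_is_even:
        # index 0 counts as even: odd indices first, then even indices
        return data[1::2] + data[::2]
    # position 1 (index 0) counts as odd: keep it in the front half
    return data[::2] + data[1::2]

colors = [(0.1, 0.2, 0.3), (1.0, 0.5, 0.0), (0.2, 0.6, 0.2)]
print(move_even_to_end(colors))
# [(1.0, 0.5, 0.0), (0.1, 0.2, 0.3), (0.2, 0.6, 0.2)]
```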
