Flattening nested loops / decreasing complexity - complementary pairs counting algorithm - python

I was recently trying to solve a task in Python, and I found a solution that appears to have O(n log n) complexity, but I believe it is very inefficient for some inputs (such as the first parameter being 0 and values being a very long list of zeros).
It also has three levels of for loops. I believe it can be optimized, but at the moment I cannot optimize it further; I am probably just missing something obvious ;)
So, basically, the problem is as follows:
Given a list of integers (values), the function needs to return the number of index pairs that meet the following criterion:
assuming a single index pair is a tuple like (index1, index2),
values[index1] == complementary_diff - values[index2] must be true.
Example:
If given a list like [1, 3, -4, 0, -3, 5] as values and 1 as complementary_diff, the function should return 4 (which is the length of the following list of indexes' pairs: [(0, 3), (2, 5), (3, 0), (5, 2)]).
This is what I have so far. It should work well most of the time, but, as I said, in some cases it can run very slowly: despite my estimate of O(n log n) average complexity, the worst case looks like O(n^2).
def complementary_pairs_number(complementary_diff, values):
    value_key = {}  # dictionary storing indexes indexed by values
    for index, item in enumerate(values):
        try:
            value_key[item].append(index)
        except KeyError:  # the item has not been found in value_key's keys
            value_key[item] = [index]
    key_pairs = set()  # key pairs are unique by nature
    for pos_value in value_key:  # iterate through keys of value_key dictionary
        sym_value = complementary_diff - pos_value
        if sym_value in value_key:  # checks if the symmetric value has been found
            for i1 in value_key[pos_value]:  # iterate through pos_value's indexes
                for i2 in value_key[sym_value]:  # as above, through sym_value's
                    # add index pairs, or ignore if already added to the set
                    key_pairs.add((i1, i2))
                    key_pairs.add((i2, i1))
    return len(key_pairs)
For the given example it behaves as follows:
>>> complementary_pairs_number(1, [1, 3, -4, 0, -3, 5])
4
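To make the degenerate case I mentioned concrete (first parameter 0 and values all zeros), the single key 0 holds all n indices, so the two inner loops insert n^2 tuples into the set:
>>> complementary_pairs_number(0, [0] * 4)  # 4*4 ordered pairs, O(n^2) work
16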
If you see how the code could be "flattened" or "simplified", please let me know.
I am not sure if just checking for complementary_diff == 0 etc. is the best approach - if you think it is, please let me know.
EDIT: I have corrected the example (thanks, unutbu!).

I think this improves the complexity to O(n):
1) value_key.setdefault(item, []).append(index) is faster than using the try..except blocks. It is also faster than using a collections.defaultdict(list). (I tested this with ipython %timeit.)
2) The original code visits every solution twice: for each pos_value in value_key there is a unique sym_value associated with pos_value, and there are solutions only when sym_value is also in value_key. As the iteration over the keys proceeds, pos_value is eventually assigned the value of an old sym_value, making the code repeat a calculation it has already done. You can cut the work in half by stopping pos_value from equaling an old sym_value. I implemented that with a seen = set() that keeps track of seen sym_values.
3) The code only cares about len(key_pairs), not the key_pairs themselves. So instead of keeping track of the pairs (with a set), we can simply keep track of the count (with num_pairs) and replace the two inner for loops with
num_pairs += 2*len(value_key[pos_value])*len(value_key[sym_value])
or half that in the "unique diagonal" case, pos_value == sym_value.
def complementary_pairs_number(complementary_diff, values):
    value_key = {}  # dictionary storing indexes indexed by values
    for index, item in enumerate(values):
        value_key.setdefault(item, []).append(index)
    # print(value_key)
    num_pairs = 0
    seen = set()
    for pos_value in value_key:
        if pos_value in seen:
            continue
        sym_value = complementary_diff - pos_value
        seen.add(sym_value)
        if sym_value in value_key:
            # print(pos_value, sym_value, value_key[pos_value], value_key[sym_value])
            n = len(value_key[pos_value]) * len(value_key[sym_value])
            if pos_value == sym_value:
                num_pairs += n
            else:
                num_pairs += 2 * n
    return num_pairs
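For instance, on the question's example and on the degenerate all-zeros input, the counting version returns the same answers without materializing any pairs:
>>> complementary_pairs_number(1, [1, 3, -4, 0, -3, 5])
4
>>> complementary_pairs_number(0, [0] * 10000)  # 10**8 ordered pairs, counted in O(n)
100000000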

You may want to look into functional programming idioms, such as reduce, etc.
Nested array logic can often be simplified by functions like reduce, map, and reject.
For an example (in JavaScript) check out underscore.js. I'm not terribly smart at Python, so I don't know which libraries it has available.
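In Python, the same counting can be done in a functional style with collections.Counter and a generator expression. A minimal sketch of that idea (my own illustration, not code from either answer above):
from collections import Counter

def complementary_pairs_number(diff, values):
    counts = Counter(values)
    # Each occurrence of v pairs with each occurrence of diff - v;
    # ordered pairs are counted, so no halving is needed.
    return sum(counts[v] * counts[diff - v] for v in counts)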

I think (some or all of) these would help, but I'm not sure how I would prove it yet.
1) Take values and reduce it to a distinct set of values, recording the count of each element. (O(n))
2) Sort the resulting array. (O(n log n))
3) If you can allocate lots of memory, you might be able to populate a sparse array with the values: if the range of values is -100 to +100, allocate an array of 201 slots, and any value that exists in the reduced set sets a one at that value's index in the large sparse array (see the sketch after this list).
4) Any value that you want to check against your condition now only has to look at the index in the sparse array given by the x - y relationship and see if a value exists there.
5) As unutbu pointed out, the relation is trivially symmetric, so if {a, b} is a pair, so is {b, a}.
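A rough sketch of the sparse-array idea from steps 3 and 4, assuming values fall in a known fixed range (the bounds lo and hi are hypothetical, for illustration only); it stores counts rather than just ones so that duplicate values are handled:
def complementary_pairs_number(diff, values, lo=-100, hi=100):
    counts = [0] * (hi - lo + 1)  # one slot per possible value
    for v in values:
        counts[v - lo] += 1
    total = 0
    for v in range(lo, hi + 1):
        c = diff - v  # the complementary value (the x - y relationship)
        if counts[v - lo] and lo <= c <= hi:
            total += counts[v - lo] * counts[c - lo]
    return total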

I think you can improve this by separating the algebra from the search and using smarter data structures.
Go through the list and subtract each item from the complementary diff:
resultlist[index] = complementary_diff - originallist[index]
You can use either a map or a simple loop. This takes O(n) time.
Then see whether each number in the resulting list exists in the original list.
With a naive list you would get O(n^2), because you can end up searching the whole original list for each item in the resulting list.
However, there are smarter ways to organize your data. If the original list is sorted, the search time reduces to O(n log n + n log n) = O(n log n): n log n for the sort, and n log n for the binary search per element.
If you want to be even smarter, you can turn the list into a dictionary (or hash table), and this step becomes O(n + n) = O(n): n to build the dictionary and 1 * n to look up each element. (EDIT: since you cannot assume each value in the original list is unique, you might want to keep count of how many times each value appears.)
So with this you get O(n) total runtime.
Using your example:
1, [1, 3, -4, 0, -3, 5]
Generate the result list:
>>> resultlist
[0, -2, 5, 1, 4, -4]
Now we search:
Flatten the original list into a dictionary, mapping each value to the list of indices where it appears (the index being the side data you're interested in):
>>> original_table
{1: [0], 3: [1], -4: [2], 0: [3], -3: [4], 5: [5]}
For each element in the result list, search in the hash table and make the tuple:
(resultlist_index, original_table[resultlist[resultlist_index]])
This should look like the example solution you had.
Now you just find the length of the resulting list of tuples.
Now here's the code:
example_diff = 1
example_values = [1, 3, -4, 0, -3, 5]
example2_diff = 1
example2_values = [1, 0, 1]
def complementary_pairs_number(complementary_diff, values):
    """
    Given an integer complement and a list of values, count how many
    complementary pairs there are in the list.
    """
    print "Input:", complementary_diff, values
    # Step 1. Result list
    resultlist = [complementary_diff - value for value in values]
    print "Result List:", resultlist
    # Step 2. Flatten into dictionary
    original_table = {}
    for original_index in xrange(len(values)):
        if values[original_index] in original_table:
            original_table[values[original_index]].append(original_index)
        else:
            original_table[values[original_index]] = [original_index]
    print "Flattened dictionary:", original_table
    # Step 2.5. Search through the dictionary and count up the resulting pairs.
    pair_count = 0
    for resultlist_index in xrange(len(resultlist)):
        if resultlist[resultlist_index] in original_table:
            pair_count += len(original_table[resultlist[resultlist_index]])
    print "Complementary Pair Count:", pair_count
    # (Optional) Step 2.6. Search through the dictionary and build the pairs
    # themselves. Can add O(n^2) work, since there can be that many pairs.
    pairs = []
    for resultlist_index in xrange(len(resultlist)):
        if resultlist[resultlist_index] in original_table:
            pairs += [(resultlist_index, original_index) for original_index in
                      original_table[resultlist[resultlist_index]]]
    print "Complementary Pair Indices:", pairs
    # Step 3
    return pair_count

if __name__ == "__main__":
    complementary_pairs_number(example_diff, example_values)
    complementary_pairs_number(example2_diff, example2_values)
Output:
$ python complementary.py
Input: 1 [1, 3, -4, 0, -3, 5]
Result List: [0, -2, 5, 1, 4, -4]
Flattened dictionary: {0: [3], 1: [0], 3: [1], 5: [5], -4: [2], -3: [4]}
Complementary Pair Count: 4
Complementary Pair Indices: [(0, 3), (2, 5), (3, 0), (5, 2)]
Input: 1 [1, 0, 1]
Result List: [0, 1, 0]
Flattened dictionary: {0: [1], 1: [0, 2]}
Complementary Pair Count: 4
Complementary Pair Indices: [(0, 1), (1, 0), (1, 2), (2, 1)]
Thanks!

Modified the solution provided by @unutbu:
The problem can be reduced to comparing these two dictionaries:
1) value_key: indexes keyed by values[i]
2) answer_key: indexes keyed by the pre-computed (complementary_diff - values[i])
def complementary_pairs_number(complementary_diff, values):
    value_key = {}  # dictionary storing indexes indexed by values
    for index, item in enumerate(values):
        value_key.setdefault(item, []).append(index)
    answer_key = {}  # dictionary storing indexes indexed by (complementary_diff - values)
    for index, item in enumerate(values):
        answer_key.setdefault(complementary_diff - item, []).append(index)
    num_pairs = 0
    print(value_key)
    print(answer_key)
    for pos_value in value_key:
        if pos_value in answer_key:
            num_pairs += len(value_key[pos_value]) * len(answer_key[pos_value])
    return num_pairs
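As a quick check, running this on the question's example (on Python 3.7+, where dicts preserve insertion order) prints the two dictionaries and returns 4:
>>> complementary_pairs_number(1, [1, 3, -4, 0, -3, 5])
{1: [0], 3: [1], -4: [2], 0: [3], -3: [4], 5: [5]}
{0: [0], -2: [1], 5: [2], 1: [3], 4: [4], -4: [5]}
4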

Related

Finding the sum of each level in a list with nested lists

I need to create a python function that takes a list of numbers (and possibly lists) and returns a list of both the nested level and the sum of that level. For example:
given a list [1,4,[3,[100]],3,2,[1,[101,1000],5],1,[7,9]] I need to count the values of all the integers at level 0 and sum them together, then count the integers at level 1 and sum them together, and so on, until I have found the sum of the deepest nested level.
The return output for the example list mentioned above should be:
[[0,11], [1,25], [2,1201]]
where the first value in each of the lists is the level, and the second value is the sum. I am supposed to use recursion or a while loop, without importing any modules.
My original idea was to create a loop that goes through the list and finds any integers (ignoring nested lists), calculates their sum, then removes those integers from the list, turns the next-highest level into integers, and repeats. However, I could not find a way to convert a list inside of a list into individual integer values (essentially removing the 0th level and turning the 1st level into the new 0th level).
The code that I am working with now is as follows:
def sl(lst, p=0):
    temp = []
    lvl = 0
    while lst:
        if type(lst[0]) == int:
            temp.append(lst[0])
            lst = lst[1:]
            return [lvl, sum(temp)]
        elif type(lst[0]) == list:
            lvl += 1
            return [lvl, sl(lst[1:], p=0)]
Basically, I created a while loop to iterate through, find any integers, and append them to a temp list where I can then find the sum. But I cannot find a way to make the loop access the next level to do the same, especially since the original list goes up and down in levels from left to right.
I would do it like this:
array = [1, 4, [3, [100]], 3, 2, [1, [101, 1000], 5], 1, [7, 9]]
sum_by_level = []
while array:
    numbers = (element for element in array if not isinstance(element, list))
    sum_by_level.append(sum(numbers))
    array = [element for list_element in array if isinstance(list_element, list) for element in list_element]
print(sum_by_level)
print(list(enumerate(sum_by_level)))
Gives the output:
[11, 25, 1201]
[(0, 11), (1, 25), (2, 1201)]
So I sum up the non-list elements, then take the list elements and strip off the outer lists. I repeat this until the array is empty, which means all levels have been stripped off. I decided against saving the level information directly, as it is just the index; if you need it, you can use enumerate (which gives tuples instead of lists), as shown below.
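If you need exactly the [[level, sum], ...] output format from the question, one extra list comprehension over the enumerate pairs does it (a small addition of mine, not part of the answer above):
print([[level, total] for level, total in enumerate(sum_by_level)])
# [[0, 11], [1, 25], [2, 1201]]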

Best way to find unique sublists of a given length that are present in a list?

I have built a function that finds all of the unique sublists, of length i, present in a given list.
For example if you have list=[0,1,1,0,1] and i=1, you just get [1,0]. If i=2, you get [[0,1],[1,1],[1,0]], but not [0,0] because while it is a possible combination of 1 and 0, it is not present in the given list. The code is listed below.
While the code functions, I do not believe it is the most efficient. It relies on finding all possible sublists and testing for the presence of each one, which becomes impractical at i > 4 (for, say, a list of length 100). I was hoping for help in finding a more efficient method for computing this. I am fully aware that this is probably not a great way to do it, but with what little knowledge I have, it's the first thing I could come up with.
The code I have written:
import itertools

def present_sublists(l, of_length):
    """
    takes a given list of 1s and 0s and returns all the unique sublists of
    that list that are of a certain length
    """
    l_str = [str(x) for x in l]  # converts entries in input to strings
    l_joined = "".join(l_str)  # joins input into one string, e.g. "101010"
    sublist_sets = set(list(itertools.combinations(l_joined, of_length)))
    # uses itertools to get all possible combinations of substrings, and then
    # set properties to remove duplicates
    pos_sublists = list(sublist_sets)  # returns the set to a list
    sublists1 = []
    for entry in pos_sublists:  # returns the entries to a list
        sublists1.append(list(entry))
    for entry in sublists1:  # returns the "1"s and "0"s to 1s and 0s
        for entry2 in entry:
            entry[entry.index(entry2)] = int(entry2)
    present_sublists = []
    for entry in sublists1:  # tests whether the possible sublist is
                             # present in the input list
        for x in range(len(l) - len(entry) + 1):
            if entry not in present_sublists:
                if l[x: x + len(entry)] == entry:
                    present_sublists.append(entry)
    output = present_sublists
    return output
Given your code and sample, it looks like you want all the unique contiguous sub-sequences of the given input. If so, you don't need to compute all combinations, nor shuffle around between strings, lists, sets and back, let alone loop multiple times over the thing: slice notation is more than enough to get the desired result.
>>> [0,1,2,3,4][0:2]
[0, 1]
>>> [0,1,2,3,4][1:3]
[1, 2]
>>> [0,1,2,3,4][2:4]
[2, 3]
>>> [0,1,2,3,4][3:5]
[3, 4]
>>>
Appropriate use of the slice indexes gets us all the contiguous sub-sequences of any given size (2 in the example).
Now, to make this more automatic, we write an appropriate for loop:
>>> seq=[0,1,2,3,4]
>>> size=2
>>> for i in range(len(seq)-size+1):
...     print(seq[i:i+size])
...
[0, 1]
[1, 2]
[2, 3]
[3, 4]
>>>
Now that we know how to get all the sub-sequences we care about, we focus on getting only the unique ones. For that we of course use a set, but a list can't go in a set, so we need something that can: a tuple (which is basically an immutable list). That is everything we need; let's put it all together:
>>> def sub_sequences(seq, size):
...     """return a set with all the unique contiguous sub-sequences of the given size of the given input"""
...     seq = tuple(seq)  # make it into a tuple so it can be used in a set
...     if size > len(seq) or size < 0:  # base/trivial case
...         return set()  # or raise an exception like ValueError
...     return {seq[i:i+size] for i in range(len(seq)-size+1)}  # a set-comprehension version of the previous loop
...
>>> sub_sequences([0,1,2,3,4],2)
{(0, 1), (1, 2), (2, 3), (3, 4)}
>>>
>>> #now lets use your sample
>>>
>>> sub_sequences([0,1,1,0,1],2)
{(0, 1), (1, 0), (1, 1)}
>>> sub_sequences([0,1,1,0,1],3)
{(1, 0, 1), (1, 1, 0), (0, 1, 1)}
>>> sub_sequences([0,1,1,0,1],4)
{(1, 1, 0, 1), (0, 1, 1, 0)}
>>> sub_sequences([0,1,1,0,1],5)
{(0, 1, 1, 0, 1)}
>>>
Let's label the bits 0, 1, 2, 3, ...
Let's also define a function f(len, n), where f(len, n) is the set of all the strings of length len that occur in the first n bits.
So:
f(0, n) = {''}, since you can always make the empty string;
f(len, 0) = set() if len > 0.
So what is the value of f(len, n) if len > 0 and n > 0? It contains everything in f(len, n - 1), plus everything in f(len - 1, n - 1) with l[n-1] appended to it.
You now have everything you need to compute f(of_length, len(l)) reasonably efficiently.
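A direct memoized transcription of that recurrence (my own sketch; note that, taken literally, it enumerates subsequences rather than contiguous runs, so it is broader than the slice-based answers elsewhere on this page):
from functools import lru_cache

def subsequences_of_length(l, of_length):
    bits = tuple(l)

    @lru_cache(maxsize=None)
    def f(length, n):
        # f(length, n): all length-`length` subsequences occurring in the first n bits
        if length == 0:
            return frozenset({()})
        if n == 0:
            return frozenset()
        with_last = {s + (bits[n - 1],) for s in f(length - 1, n - 1)}
        return frozenset(f(length, n - 1) | with_last)

    return f(of_length, len(bits))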
To stick to your function signature, I would suggest something like this:
1) Iterate through each sublist and put them into a set() to ensure uniqueness.
2) Convert the sublists to tuples first, since lists cannot be hashed and therefore cannot be put into sets as they are.
3) Convert the resulting tuples in the set back to the required format.
When creating new lists, a list comprehension is the most effective and Pythonic choice.
>>> def present_sublists(l,of_length):
... sublists = set([tuple(l[i:i+of_length]) for i in range(0,len(l)+1-of_length)])
... return [list(sublist) for sublist in sublists]
...
>>> present_sublists([0,1,1,0,1], 1)
[[0], [1]]
>>> present_sublists([0,1,1,0,1], 2)
[[0, 1], [1, 0], [1, 1]]

Function Failing at Large List Sizes

I have a question: starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each array element between two given indices, inclusive. Once all operations have been performed, return the maximum value in the array.
Example: n = 10, Queries = [[1,5,3],[4,8,7],[6,9,1]]
The following shows the array after each operation (indices 1-5 have 3 added to them, and so on):
[0,0,0, 0, 0,0,0,0,0, 0]
[3,3,3, 3, 3,0,0,0,0, 0]
[3,3,3,10,10,7,7,7,0, 0]
[3,3,3,10,10,8,8,8,1, 0]
Finally you output the max value in the final list:
[3,3,3,10,10,8,8,8,1, 0]
My current solution:
def Operations(size, Array):
    ResultArray = [0] * size
    Values = [[i.pop(2)] for i in Array]
    for index, i in enumerate(Array):
        # New values = sum of the current values in the results array and an
        # added-operation list of equal length
        ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]],
                                                     Values[index] * len(ResultArray[i[0]-1:i[1]]))))
    Result = max(ResultArray)
    return Result

def main():
    nm = input().split()
    n = int(nm[0])
    m = int(nm[1])
    queries = []
    for _ in range(m):
        queries.append(list(map(int, input().rstrip().split())))
    result = Operations(n, queries)

if __name__ == "__main__":
    main()
Example input: The first line contains two space-separated integers n and m, the size of the array and the number of operations.
Each of the next m lines contains three space-separated integers a,b and k, the left index, right index and summand.
5 3
1 2 100
2 5 100
3 4 100
Error at large sizes: Runtime Error
Currently this solution works for final lists up to length 4000, but it fails on test cases where the length is 10,000,000. I do not know why this is the case, and I cannot provide the example input since it is so massive. Is there anything obvious as to why it would fail on larger cases?
I think the problem is that you create too many intermediate throwaway lists here:
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
Each ResultArray[i[0]-1:i[1]] produces a new list, and you do it twice; one of those exists just to get the size, which is a complete waste of resources. Then you make another list with Values[index]*len(...), and finally compile all that into yet another list that is thrown away as soon as it is assigned into the original. That is four throwaway lists per operation. If the slice size is 5,000,000, you are making four of those, or 20,000,000 elements of extra space, 15,000,000 of which you don't really need; and if your original list has 10,000,000 elements, well, just do the math...
You can get the same result as your list(map(...)) with a list comprehension:
[v + Values[index][0] for v in ResultArray[i[0]-1:i[1]]]
Now we use two fewer lists, and we can drop one more by making it a generator expression, given that slice assignment does not require a list specifically, just something iterable:
(v + Values[index][0] for v in ResultArray[i[0]-1:i[1]])
I don't know whether slice assignment internally makes a list first, but hopefully it doesn't; with that we are back to just one extra list.
here is an example
>>> a=[0]*10
>>> a
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a[1:5] = (3+v for v in a[1:5])
>>> a
[0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
>>>
We can reduce it to zero extra lists (assuming slice assignment doesn't internally make one) by using itertools.islice:
>>> import itertools
>>> a[3:7] = (1+v for v in itertools.islice(a,3,7))
>>> a
[0, 3, 3, 4, 4, 1, 1, 0, 0, 0]
>>>
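Putting that back into the question's Operations function gives a sketch like this (assuming the queries are kept as untouched [a, b, k] triples rather than popped apart as in the original):
import itertools

def Operations(size, Array):
    ResultArray = [0] * size
    for a, b, k in Array:  # left index, right index, summand
        # islice reads the old values; slice assignment consumes the generator
        # fully before mutating the list, so this is safe and avoids the copies
        ResultArray[a-1:b] = (v + k for v in itertools.islice(ResultArray, a - 1, b))
    return max(ResultArray)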

Accessing grouped items in arrays

I'm new to Python and have a list of numbers. e.g.
5,10,32,35,64,76,23,53....
and I've grouped them into fours (5,10,32,35, 64,76,23,53 etc..) using the code from this post.
def group_iter(iterator, n=2, strict=False):
    """ Transforms a sequence of values into a sequence of n-tuples.
    e.g. [1, 2, 3, 4, ...] => [(1, 2), (3, 4), ...] (when n == 2)
    If strict, then it will raise ValueError if there is a group of fewer
    than n items at the end of the sequence. """
    accumulator = []
    for item in iterator:
        accumulator.append(item)
        if len(accumulator) == n:  # tested as fast as separate counter
            yield tuple(accumulator)
            accumulator = []  # tested faster than accumulator[:] = []
                              # and tested as fast as re-using one list object
    if strict and len(accumulator) != 0:
        raise ValueError("Leftover values")
How can I access the individual arrays so that I can perform functions on them. For example, I'd like to get the average of the first values of every group (e.g. 5 and 64 in my example numbers).
Let's say you have the following tuple of tuples:
a=((5,10,32,35), (64,76,23,53))
To access the first element of each tuple, use a for loop:
for i in a:
    print i[0]
To calculate average for the first values:
elements=[i[0] for i in a]
avg=sum(elements)/float(len(elements))
Ok, this is yielding a tuple of four numbers each time it's iterated. So, convert the whole thing to a list:
L = list(group_iter(your_list, n=4))
Then you'll have a list of tuples:
>>> L
[(5, 10, 32, 35), (64, 76, 23, 53), ...]
You can get the first item in each tuple this way:
firsts = [tup[0] for tup in L]
(There are other ways, of course.)
You've created a tuple of tuples, or a list of tuples, or a list of lists, or a tuple of lists, or whatever...
You can access any element of any nested list directly:
toplist[x][y] # yields the yth element of the xth nested list
You can also access the nested structures by iterating over the top structure:
for sublist in lists:
    print sublist[y]
Might be overkill for your application but you should check out my library, pandas. Stuff like this is pretty simple with the GroupBy functionality:
http://pandas.sourceforge.net/groupby.html
To do the 4-at-a-time thing you would need to compute a bucketing array:
import numpy as np
bucket_size = 4
n = len(your_list)
buckets = np.arange(n) // bucket_size
Then it's as simple as:
data.groupby(buckets).mean()
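A minimal end-to-end sketch of that pandas approach (the Series construction is my own illustration; your_list stands in for your data):
import numpy as np
import pandas as pd

your_list = [5, 10, 32, 35, 64, 76, 23, 53]
bucket_size = 4
buckets = np.arange(len(your_list)) // bucket_size

data = pd.Series(your_list)
print(data.groupby(buckets).mean())  # one mean per group of four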

Python: Fast extraction of intersections among all possible 2-combinations in a large number of lists

I have a dataset of ca. 9K lists of variable length (1 to 100K elements). I need to calculate the length of the intersection of all possible 2-list combinations in this dataset. Note that elements in each list are unique so they can be stored as sets in python.
What is the most efficient way to perform this in python?
Edit: I forgot to specify that I need to be able to match the intersection values to the corresponding pair of lists. Thanks everybody for the prompt response and apologies for the confusion!
If your sets are stored in s, for example:
s = [set([1, 2]), set([1, 3]), set([1, 2, 3]), set([2, 4])]
Then you can use itertools.combinations to take them two by two, and calculate the intersection (note that, as Alex pointed out, combinations is only available since version 2.6). Here with a list comprehension (just for the sake of the example):
from itertools import combinations
[ i[0] & i[1] for i in combinations(s,2) ]
Or, in a loop, which is probably what you need:
for i in combinations(s, 2):
    inter = i[0] & i[1]
    # process the intersection set result "inter"
So, to have the length of each one of them, that "processing" would be:
l = len(inter)
This would be quite efficient, since it uses iterators to compute each combination and does not prepare all of them in advance.
Edit: Note that with this method, each set in the list "s" can actually be something else that returns a set, like a generator. The list itself could simply be a generator if you are short on memory. It could be much slower though, depending on how you generate these elements, but you wouldn't need to have the whole list of sets in memory at the same time (not that it should be a problem in your case).
For example, if each set is made from a function gen:
def gen(parameter):
    while more_sets():
        # ... some code to generate the next set 'x'
        yield x

with open("results", "wt") as f_results:
    for i in combinations(gen("data"), 2):
        inter = i[0] & i[1]
        f_results.write("%d\n" % len(inter))
Edit 2: How to collect indices (following redrat's comment).
Besides the quick solution I gave in a comment, a more efficient way to collect the set indices would be to have a list of (index, set) tuples instead of a list of sets.
Example with new format:
s = [(0, set([1, 2])), (1, set([1, 3])), (2, set([1, 2, 3]))]
If you are building this list to calculate the combinations anyway, it should be simple to adapt to your new requirements. The main loop becomes:
with open("results", "wt") as f_results:
for i in combinations(s, 2):
inter = i[0][1] & i[1][1]
f_results.write("length of %d & %d: %d\n" % (i[0][0],i[1][0],len(inter))
In the loop, i[0] and i[1] would be a tuple (index, set), so i[0][1] is the first set, i[0][0] its index.
As you need to produce N*(N-1)/2 results, i.e., O(N squared) outputs, no approach can be less than O(N squared) -- in any language, of course (N is "about 9K" in your question). So I see nothing intrinsically faster than (a) making the N sets you need and (b) iterating over them to produce the output -- i.e., the simplest approach. IOW:
def lotsofintersections(manylists):
    manysets = [set(x) for x in manylists]
    moresets = list(manysets)
    for s in reversed(manysets):
        moresets.pop()
        for z in moresets:
            yield s & z
This code's already trying to add some minor optimization (e.g. by avoiding slicing or popping off the front of lists, which might add other O(N squared) factors).
If you have many cores and/or nodes available and are looking for parallel algorithms, it's a different case of course -- if that's your case, can you mention the kind of cluster you have, its size, how nodes and cores can best communicate, and so forth?
Edit: as the OP has casually mentioned in a comment (!) that they actually need the numbers of the sets being intersected (really, why omit such crucial parts of the specs?! at least edit the question to clarify them...), this would only require changing this to:
L = len(manysets)
for i, s in enumerate(reversed(manysets)):
    moresets.pop()
    for j, z in enumerate(moresets):
        yield L - i, j + 1, s & z
(if you need to "count from 1" for the progressive identifiers -- otherwise obvious change).
But if that's part of the specs you might as well use simpler code -- forget moresets, and:
L = len(manysets)
for i in xrange(L):
    s = manysets[i]
    for j in range(i + 1, L):
        yield i, j, s & manysets[j]
this time assuming you want to "count from 0" instead, just for variety;-)
Try this:
_lists = [[1, 2, 3, 7], [1, 3], [1, 2, 3], [1, 3, 4, 7]]
_sets = map( set, _lists )
_intersection = reduce( set.intersection, _sets )
And to obtain the indexes:
_idxs = [ map(_i.index, _intersection ) for _i in _lists ]
Cheers,
José María García
PS: Sorry I misunderstood the question
