I have two lists, each with over 100,000 tuples. Each tuple in the first list holds two strings; each tuple in the second holds five. Every tuple in the first list shares a value with one tuple in the second list. For example:
tuple1 = [('a','1'), ('b','2'), ('c','3')]
tuple2 = [('$$$','a','222','###','HHH'), ('ASA','b','QWER','TY','GFD'), ('aS','3','dsfs','sfs','sfs')]
I have a function that is able to remove redundant tuple values and match on the information that is important:
def match_comment_and_thread_data(tuple1, tuple2):
    i = 0
    out_thread_tuples = [(b, c, d, e) for a, b, c, d, e in tuple2]
    print('Out Thread Tuples Done')
    final_list = [x + y for x in tuple2 for y in tuple1 if x[0] == y[0]]
    return final_list
which ought to return:
final_list = [('a','1','222','###','HHH'), ('b','2','QWER','TY','GFD'), ('c','3','dsfs','sfs','sfs')]
However, the lists are insanely long. Is there any way to get around the computational time commitment of for loops when comparing and matching tuple values?
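With a dictionary, this can be done in O(n):
dict1 = dict(tuple1)
final_list = [(tup[1], dict1[tup[1]]) + tup[2:] for tup in tuple2]
This assumes the second field of every tuple in tuple2 is a key of dict1. A tuple such as ('aS','3','dsfs','sfs','sfs'), which matches on the value '3' rather than the key 'c', raises a KeyError.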
tuple1 = [('a','1'), ('b','2'), ('c','3')]
tuple2 = [('$$$','a','222','###','HHH'), ('ASA','b','QWER','TY','GFD'), ('aS','3','dsfs','sfs','sfs')]
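def match_comment_and_thread_data(tuple1, tuple2):
    # Map the second field of each 5-tuple to its last three fields.
    out_thread_dict = {b: (c, d, e) for a, b, c, d, e in tuple2}
    # Look up each pair by its first element, falling back to its second.
    final_list = [x + out_thread_dict.get(x[0], out_thread_dict.get(x[1])) for x in tuple1]
    return final_list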
By using a dictionary instead, each lookup is O(1). You still have to visit every item in tuple1, but each match is fast, although you need a lot more than 3 values to see the benefit.
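As a rough sketch of the dictionary approach (Python 3; the name match_tuples and its variable names are illustrative, not from the posts above), the 5-tuples can be indexed once by their second field and each pair looked up by either of its elements. It assumes that second field is unique across the 5-tuples, since a later duplicate would overwrite an earlier one, and it silently skips pairs with no match.

```python
def match_tuples(pairs, records):
    """Join each 2-tuple with the 5-tuple whose second field equals either element of the pair."""
    # Build the index once: second field -> last three fields.
    tails_by_key = {rec[1]: rec[2:] for rec in records}
    matched = []
    for pair in pairs:
        # Try the first element of the pair, then the second.
        for candidate in pair:
            tail = tails_by_key.get(candidate)
            if tail is not None:
                matched.append(pair + tail)
                break
    return matched


tuple1 = [('a', '1'), ('b', '2'), ('c', '3')]
tuple2 = [('$$$', 'a', '222', '###', 'HHH'),
          ('ASA', 'b', 'QWER', 'TY', 'GFD'),
          ('aS', '3', 'dsfs', 'sfs', 'sfs')]
result = match_tuples(tuple1, tuple2)
# result == [('a', '1', '222', '###', 'HHH'),
#            ('b', '2', 'QWER', 'TY', 'GFD'),
#            ('c', '3', 'dsfs', 'sfs', 'sfs')]
```

Building the index is one pass over tuple2 and the join is one pass over tuple1, so the total work grows with len(tuple1) + len(tuple2) rather than with their product.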
I have a string from the user. It must contain a comma so that I can split on it and assign the parts to variables. But what if the user omits a comma? I could check the length of the split result in if-else branches, but maybe there is another way, where the assignment still works when the list has fewer values than names. For example:
a, b, c, d, e = list(range(3))  # 'd' and 'e' should be None or not exist
You could do something like this:
>>> alist=list(range(3))
>>> alist
[0, 1, 2]
>>> a,b,c,*d=alist
>>> a,b,c
(0, 1, 2)
>>> d
[]
If there are no more elements, d is an empty list. This uses the unpacking operator *. It is not the best solution for large lists, so I would still define a function for that, but for small cases it works well. (You can assume that if d == [], there are no more elements in alist.) For example, you could add:
return False if not d else True
You are free to extend the list with None values before extraction.
li = list(range(3))
expected_size = 5
missing_size = expected_size - len(li)
none_li = [None] * missing_size
new_li = li + none_li
a, b, c, d, e = new_li
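The padding idea above can be wrapped in a small helper. This is only a sketch (Python 3); the name pad and its parameters are made up for illustration. It also truncates extra values, which may or may not be what you want.

```python
def pad(values, size, fill=None):
    """Return a list of exactly `size` items: extras are dropped, gaps are filled with `fill`."""
    values = list(values)
    # A negative multiplier yields an empty list, so over-long input is simply truncated.
    return values[:size] + [fill] * (size - len(values))


# User input that is missing the comma still unpacks cleanly.
first, second = pad("hello".split(","), 2)
# first == 'hello', second is None
```

After unpacking, checking `second is None` tells you whether the comma was missing.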
I want to check if all lists in a list of lists are equal.
One example for which I succeeded is a list of two lists, l2, for which
all([a == b for a, b in zip(*l2)])
correctly returns True if l2 = [[1,2],[1,2]] and False when l2 = [[1,2],[1,666]].
I expected to be able to use the same code when the list of lists l contains more lists, but it does not work.
For example, when
l=[[1,2],[1,2],[1,2]]
all([a == b for a, b in zip(*l)])
returns the following error:
ValueError: too many values to unpack (expected 2)
I do not understand why this is the case as the result of zip(*l) looks like it should work:
list(zip(*l))
>> [(1, 1, 1), (2, 2, 2)]
Use the observation that a == b and a == c imply b == c: you only need to test the first list against the other lists.
def equalLists(lists):
    return not lists or all(lists[0] == b for b in lists[1:])
>>> equalLists([])
True
>>> equalLists([[1,2],[1,2]])
True
>>> equalLists([[1,2],[1,2],[1,2]])
True
>>> equalLists([[1,2],[1,2],[1,3]])
False
You could create a set of tuples and check the size is 1, otherwise they are not all the same:
len(set(tuple(elem) for elem in l)) == 1
This will work for a list of lists of any length. It takes time linear in the total number of elements, but unlike an all(...) comparison it cannot stop early at the first mismatch.
(You have to convert to a tuple first because a list is not hashable and a set requires its members to be hashable.)
Your method (and the other answers here) does not account for lists of different lengths: zip truncates to the length of the shortest:
all(a == b for a,b in zip([1,2], [1,2,3]))
>>> True
First, note that it's not necessary to build a list inside all, as in all([...]): that creates the whole list before all iterates over it, whereas the generator expression I used above is evaluated lazily, so all can stop at the first False.
If each list has hashable elements, I'd exploit set to calculate the distinct elements and check there's only 1:
len(set(tuple(x) for x in l)) == 1
If the elements aren't hashable but do define equality (unlike your examples, since int is hashable), I'd compare each list to the first, using an iterator to avoid comparing the first list to itself:
li = iter(l)
first = next(li)
all(x == first for x in li)
This still makes use of python's built-in list equals method and won't do more comparisons than any zip methods in the case that all the lists are equal.
The only case where the above is inefficient is if you have a list of long lists, where most but not all are equal. In that case it's possible a zip method would be quicker:
from itertools import zip_longest
all(len(set(x)) == 1 for x in zip_longest(*l))
Here I used zip_longest to handle lists of unequal length. If you know the lengths are equal you can use zip. By default, zip_longest fills in None for the shorter lists once they run out, so only use this if your lists contain no legitimate Nones. (Otherwise you can set zip_longest(..., fillvalue="<something not in the lists>").)
Equivalent for non-hashable list elements (with equals method):
all(all(i == x[0] for i in x[1:]) for x in zip_longest(*l))
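One way to get a fill value that cannot collide with real data is a private sentinel object. The sketch below (Python 3) is a variation on the zip_longest approach, not part of the answer above; the names _MISSING and all_lists_equal are hypothetical. It relies on a plain object() comparing equal only to itself.

```python
from itertools import zip_longest

# A unique object that cannot appear in the caller's data.
_MISSING = object()


def all_lists_equal(lists):
    """True if every list has the same length and equal elements at each position."""
    return all(
        # Compare every value in the column against the column's first value.
        all(item == column[0] for item in column[1:])
        for column in zip_longest(*lists, fillvalue=_MISSING)
    )
```

For example, all_lists_equal([[1, 2], [1, 2], [1, 2]]) is True, while all_lists_equal([[1, 2], [1, 2, 3]]) and all_lists_equal([[1, None], [1]]) are both False, because a padded position never equals a real value, not even None.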
l=[[1,2],[1,2],[1,2]]
all([a == b == c for a, b, c in zip(*l)])
The comprehension unpacks each tuple from zip() into 2 names, but zip was given 3 lists, so each tuple has 3 values.
You have three items in each tuple after zip, e.g. (1,1,1).
Use
l=[[1,2],[1,2],[1,2]]
all([a == b for a, b, c in zip(*l)])
or
l=[[1,2],[1,2],[1,2]]
all([a == b for a, b, *_ in zip(*l)])
# OR all([a == b for a, b, _ in zip(*l)])
# OR all([i[0] == i[1] for i in zip(*l)])
Edit as per comment.
To test that all corresponding elements are equal:
all([len(set(i)) == 1 for i in zip(*l)])
I have two lists of lists, a and b. I want to compare indexes 1, 2 and 3 of each inner list in a with indexes 1, 2 and 3 of each inner list in b. If there is a match, ignore it; if not, return the whole inner list.
a = [['Eth1/1/13', 'Marketing', 'connected', '10', 'full', 'a-1000'], ['Eth1/1/14', 'NETFLOW02', 'connected', '10', 'full', '100']]
b = [['Eth1/1/13', 'NETFLOW02', 'connected', '15', 'full', '100'], ['Eth1/1/14', 'Marketing', 'connected', '10', 'full', 'a-1000']]
Desired Output :
Diff a:
Eth1/1/14 NETFLOW02 connected 10 full 100
Diff b:
Eth1/1/13 NETFLOW02 connected 15 full 100
What I am trying:
p = [i for i in a if i not in b]
for item in p:
    print item[0]
print "\n++++++++++++++++++++++++++++++\n"
q = [i for i in b if i not in a]
for item in q:
    print item[0]
I tried the following, but it only matches index 1 of the inner lists; indexes 2 and 3 still need to be matched:
[o for o in a if o[1] not in [n[1] for n in b]]
I am not getting the expected output. Any idea how to do this?
for sublista in a:
    if not any(sublista[1:4] == sublistb[1:4] for sublistb in b):
        print(sublista)
You need an inner loop so that each sub-list from list a can be compared to each sub-list in list b. The inner loop is accomplished with a generator expression. Slices are used to compare only a portion of the sub-lists. The built-in function any consumes the generator expression; evaluation is lazy, so any returns True at the first comparison that is True. This will print each sub-list in a that does not have a match in b. To print each sub-list in b that does not have a match in a, put b in the outer loop and a in the inner loop.
Here is an equivalent without using a generator expression or any:
for sublista in a:
    for sublistb in b:
        if sublista[1:4] == sublistb[1:4]:
            break
    else:
        # Runs only when the inner loop finished without a break, i.e. no match.
        print(sublista)
Sometimes it is nice to use operator.itemgetter so you can give the slice a name, which can make the code more intelligible:
import operator
good_stuff = operator.itemgetter(1,2,3)
for sublista in a:
    if not any(good_stuff(sublista) == good_stuff(sublistb) for sublistb in b):
        print(sublista)
itertools.product conveniently generates pairs and can be used as a substitute for the nested loops above. The following uses a dictionary (defaultdict) to hold comparison results for each sublist in a and b, then checks to see if there were matches - it does both the a to b and b to a comparisons.
import itertools, collections
pairs = itertools.product(a, b)
results = collections.defaultdict(list)
for sub_one, sub_two in pairs:
    comparison = good_stuff(sub_one) == good_stuff(sub_two)
    results[tuple(sub_one)].append(comparison)
    results[tuple(sub_two)].append(comparison)
for sublist, comparisons in results.items():
    if any(comparisons):
        continue
    print(sublist)
# or
from pprint import pprint
results = [sublist for sublist, comparisons in results.items() if not any(comparisons)]
pprint(results)
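The nested comparisons above do len(a) * len(b) slice comparisons. If the lists are long, a possible alternative is to hash the compared fields of one list into a set first. This is a sketch only (Python 3); the helper name unmatched and its fields parameter are invented here, and it assumes the compared fields are hashable (strings, in the question's data).

```python
def unmatched(rows, others, fields=slice(1, 4)):
    """Return the rows whose `fields` slice does not appear in any row of `others`."""
    # Collect the compared fields of `others` once, as hashable tuples.
    seen = {tuple(other[fields]) for other in others}
    return [row for row in rows if tuple(row[fields]) not in seen]


a = [['Eth1/1/13', 'Marketing', 'connected', '10', 'full', 'a-1000'],
     ['Eth1/1/14', 'NETFLOW02', 'connected', '10', 'full', '100']]
b = [['Eth1/1/13', 'NETFLOW02', 'connected', '15', 'full', '100'],
     ['Eth1/1/14', 'Marketing', 'connected', '10', 'full', 'a-1000']]

diff_a = unmatched(a, b)
diff_b = unmatched(b, a)
# diff_a == [['Eth1/1/14', 'NETFLOW02', 'connected', '10', 'full', '100']]
# diff_b == [['Eth1/1/13', 'NETFLOW02', 'connected', '15', 'full', '100']]
```

Calling it once in each direction gives the "Diff a" and "Diff b" lists from the question.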
for v in a,b:
    for items in v:
        if 'NETFLOW02' in items:
            print('\t'.join(items))
I'm not sure this is OK for your purpose, but you seem to want to capture the results for a network interface called NETFLOW02 from these two lists.
There may be a reason this is unacceptable, but you could also extend it to other keywords and to lists of any length, as long as they are nested as in your question. To do this, create another list, hypothetically keywords = ['NETFLOW02','ETH01'].
Then we simply iterate over this list as well.
results = []
for v in a,b:
    for item in v:
        for kw in keywords:
            if kw in item:
                results.append(item)
                print('\t'.join(item))
print(results)
I am stuck on the following problem: I need to find the maximum at each position across several lists. The map function works well for separate lists, but how do I make it work for a list of lists? Using map(max,d) gives the maximum of each list instead. The number of lists in the list is variable. Any suggestions are welcome!
The input to the problem is d, not a, b, c: d is a list of lists, and the comparison is element-wise per position in the lists.
a = [0,1,2,6]
b = [5,1,0,7]
c = [3,8,0,8]
map(max,a,b,c)
# [5,8,2,8]
d = [a,b,c]
map(max,d)
# [6,7,8]
a = [0,1,2,6]
b = [5,1,0,7]
c = [3,8,0,8]
print [max(itm) for itm in zip(a, b, c)]
or even shorter:
print map(max, zip(a, b, c))
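Since the question asks about d, a list holding a variable number of lists, the same zip idea can be applied by unpacking d with *. A minimal sketch in Python 3 syntax (where map returns an iterator, so a list comprehension is used here):

```python
a = [0, 1, 2, 6]
b = [5, 1, 0, 7]
c = [3, 8, 0, 8]
d = [a, b, c]

# zip(*d) regroups the values by position: (0, 5, 3), (1, 1, 8), (2, 0, 0), (6, 7, 8)
position_max = [max(column) for column in zip(*d)]
# position_max == [5, 8, 2, 8]
```

This works for any number of inner lists. Note that zip stops at the shortest inner list if the lengths differ.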
If you want the single largest value across all the lists, how about this:
max(map(max,d))
After entering a command I am given data, that I then transform into a list. Once transformed into a list, how do I copy ALL of the data from that list [A], and save it - so when I enter a command and am given a second list of data [B], I can compare the two; and have data that is the same from the two lists cancel out - so what is not similar between [A] & [B] is output. For example...
List [A]
1
2
3
List [B]
1
2
3
4
Using Python, I now want to compare the two lists to each other, and then output the differences.
Output = 4
Hopefully this makes sense!
You can use set operations.
a = [1,2,3]
b = [1,2,3,4]
print set(b) - set(a)
To output the data as a list, use the following print statement:
print list(set(b) - set(a))
>>> b=[1,2,3,4]
>>> a=[1,2,3]
>>> [x for x in b if x not in a]
[4]
for element in b:
    if element in a:
        a.remove(element)
This approach leaves a list, not a set, and takes duplicates into account: it removes from a (in place) one occurrence of each element of b. That way [1,2,1] - [1,2] gives [1], not [].
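A duplicate-aware difference can also be written without mutating either list, using collections.Counter to track how many occurrences remain to be removed. This is a sketch (Python 3), not the code from the answer above; the name list_difference is hypothetical, and it assumes the elements are hashable.

```python
from collections import Counter


def list_difference(items, to_remove):
    """Return `items` minus one occurrence for each element of `to_remove`, keeping order."""
    remaining = Counter(to_remove)
    result = []
    for item in items:
        if remaining[item] > 0:
            # This occurrence is cancelled out by one in `to_remove`.
            remaining[item] -= 1
        else:
            result.append(item)
    return result
```

For example, list_difference([1, 2, 1], [1, 2]) gives [1] and list_difference([1, 2, 3, 4], [1, 2, 3]) gives [4]. Each membership check is a dictionary lookup, which avoids the repeated list scans done by `in` and `remove`.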
Try itertools.izip_longest (named itertools.zip_longest in Python 3):
import itertools
a = [1,2,3]
b = [1,2,3,4]
[y for x, y in itertools.izip_longest(a, b) if x != y]
# [4]
You could easily modify this further to return a tuple for each difference, where the first item in the tuple is the position in b and the second item is the value.
[(i, pair[1]) for i, pair in enumerate(itertools.izip_longest(a, b)) if pair[0] != pair[1]]
# [(3, 4)]
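For Python 3, where izip_longest is named zip_longest, a comparable sketch might look like the following. The function name positional_diff is made up for illustration, and the default None fill value means it assumes neither list contains None as a real value.

```python
from itertools import zip_longest


def positional_diff(first, second):
    """Return (index, value_in_first, value_in_second) for every position that differs."""
    return [
        (index, x, y)
        for index, (x, y) in enumerate(zip_longest(first, second))
        if x != y
    ]


diff = positional_diff([1, 2, 3], [1, 2, 3, 4])
# diff == [(3, None, 4)]
```

The None in the result marks the position where the shorter list had already run out.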
For entering the data use a loop:
def enterList():
    result = []
    while True:
        value = raw_input()
        if value:
            result.append(value)
        else:
            return result
A = enterList()
B = enterList()
For comparing you can use zip to build pairs and compare each of them:
for a, b in zip(A, B):
    if a != b:
        print a, "!=", b
This will truncate the comparison at the length of the shorter list; use the solution in another answer given here using itertools.izip_longest() to handle that.