Comparing items within a list with each other - python

If I have a list
lst = [1, 2, 3, 4, 5]
and I want to show that two items exist, one of which is larger than the other by 1, can I do this without specifying which items in the list?
i.e. without having to do something like:
lst[1] - lst[0] == 1
I am looking for general code that works for any int items in lst.

You can pair the numbers if the one less than the number is in the list:
new = [(i, i - 1) for i in lst if i - 1 in lst]

This version first builds a set from the list for faster membership checks, then uses the short-circuiting any() to check whether i + 1 is in that set for each i in the list (I iterate over the list instead of the newly created set because it should be slightly faster). As soon as some i + 1 is found in the set, the function returns True; if none is found, it returns False.
def has_n_and_n_plus_1(lst):
    lset = set(lst)
    return any(i + 1 in lset for i in lst)
Testing:
>>> has_n_and_n_plus_1([6,2,7,11,42])
True
>>> has_n_and_n_plus_1([6,2,9,11,42])
False
The all-tricks-in-one-basket brain-teaser version:
from operator import sub
from itertools import starmap, tee
a, b = tee(sorted(lst))
next(b, None)
exists = 1 in starmap(sub, zip(b, a))
What this code does: it sorts the list in increasing order; iterates pairwise so that a, b = lst[i], lst[i + 1]; uses starmap to pass each (b, a) pair to the sub operator, giving b - a; and finally uses the in operator to check whether the resulting iterator contains a 1.
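The steps described above can be wrapped in a function with one comment per step. This is only a sketch: the name has_adjacent_pair is made up for illustration, and it assumes the items are ints, so that n and n + 1 end up next to each other after sorting.

```python
from itertools import tee
from operator import sub

def has_adjacent_pair(lst):
    """Return True if some two items of lst differ by exactly 1."""
    # Step 1: sort, so that n and n + 1 (if both present) become neighbours.
    a, b = tee(sorted(lst))
    # Step 2: advance the second iterator by one to get (lst[i], lst[i + 1]) pairs.
    next(b, None)
    # Step 3: subtract each pair and stop at the first difference equal to 1.
    return any(sub(y, x) == 1 for x, y in zip(a, b))

print(has_adjacent_pair([6, 2, 7, 11, 42]))  # True
print(has_adjacent_pair([6, 2, 9, 11, 42]))  # False
```

Using any() instead of the in operator keeps the same short-circuiting behaviour while making the comparison explicit.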

You could zip the list with itself shifted by one.
>>> lst = [1,2,3,4,5]
>>> zip(lst, lst[1:])
[(1, 2), (2, 3), (3, 4), (4, 5)]
This assumes that the list is ordered. If it is not, then you could sort it first and then filter it to exclude non matches (perhaps including the indexes in the original list if that is important). So if it's a more complex list of integers this should work:
>>> lst = [99,12,13,44,15,16,45,200]
>>> lst.sort()
>>> [(x,y) for (x,y) in zip(lst, lst[1:]) if x + 1 == y]
[(12, 13), (15, 16), (44, 45)]
The following is the equivalent using functions. The use of izip from itertools ensures the list is only iterated over once when we are looking for matches with the filter function:
>>> from itertools import izip
>>> lst = [99,12,13,44,15,16,45,200]
>>> lst.sort()
>>> filter(lambda (x,y): x+1==y, izip(lst, lst[1:]))
[(12, 13), (15, 16), (44, 45)]
The same could be written using for comprehensions, but personally I prefer using functions.
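The izip and lambda (x,y) forms above are Python 2 only: itertools.izip no longer exists in Python 3, and a lambda can no longer unpack a tuple in its parameter list. A rough Python 3 equivalent, assuming the same sample list, might look like this:

```python
lst = [99, 12, 13, 44, 15, 16, 45, 200]
ordered = sorted(lst)  # sorted() leaves the original list untouched

# zip is already lazy in Python 3, so izip is not needed;
# the lambda receives each pair as a single tuple argument.
matches = list(filter(lambda pair: pair[0] + 1 == pair[1],
                      zip(ordered, ordered[1:])))
print(matches)  # [(12, 13), (15, 16), (44, 45)]
```

filter also returns a lazy iterator in Python 3, which is why the result is wrapped in list() before printing.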

Related

How do you merge two lists and return them as a tuple in a new list?

rules: If one list is shorter than the other, the last element of the shorter list should be repeated as often as necessary. If one or both lists are empty, the empty list should be returned.
merge([0, 1, 2], [5, 6, 7])
should return [(0, 5), (1, 6), (2, 7)]
merge([2, 1, 0], [5, 6])
should return [(2, 5), (1, 6), (0, 6)]
merge([ ], [2, 3])
should return []
This is what I've written so far:
def merge(a, b):
    mergelist = []
    for pair in zip(a, b):
        for item in pair:
            mergelist.append(item)
    return mergelist
print(merge([0, 1, 2], [5, 6]))
Thanks for asking the question.
I tried to amend your code as it is always easier to understand our own code.
Please find the modifications below:
def merge(a, b):
    mergelist = []
    if not a or not b:
        return []
    elif len(a) > len(b):
        occ = len(a)-len(b)
        b.extend([b[len(b)-1] for i in range(occ)])
    elif len(a) < len(b):
        occ = len(b)-len(a)
        a.extend([a[len(a)-1] for i in range(occ)])
    for pair in zip(a, b):
        mergelist.append(pair)
    return mergelist
print(merge(l,l1))
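One thing to watch with the version above is that extend changes the lists passed in by the caller. If that matters, a variant that pads copies instead could look like the sketch below (merge_padded is a hypothetical name, not part of the original answer):

```python
def merge_padded(a, b):
    """Pair items of a and b, repeating the last item of the shorter list."""
    if not a or not b:
        return []
    # Build padded copies so the caller's lists are left unchanged.
    longest = max(len(a), len(b))
    a_padded = list(a) + [a[-1]] * (longest - len(a))
    b_padded = list(b) + [b[-1]] * (longest - len(b))
    return list(zip(a_padded, b_padded))

first, second = [2, 1, 0], [5, 6]
print(merge_padded(first, second))  # [(2, 5), (1, 6), (0, 6)]
print(second)                       # [5, 6]
```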
You need to append each tuple to the result list manually, because you have to check the length of the second list as you go. This is one way of solving it:
def merge(l1, l2):
    new = []
    for i in range(len(l1)):
        if i > len(l2)-1:
            s2 = l2[len(l2)-1] # use the last element of second list if there are no more elements
        else:
            s2 = l2[i]
        new.append((l1[i], s2)) # append takes one argument, so pass the pair as a tuple
    return new
"""
>>> merge([0,1,2],[5,6,7])
[(0, 5), (1, 6), (2, 7)]
>>> merge([2,1,0],[5,6])
[(2, 5), (1, 6), (0, 6)]
>>> merge([],[2,3])
[]
"""
This is actually somewhat tricky.
You would think something simple like this would work:
def merge(a, b):
    # use iterator to keep iterations state after zip
    a, b = iter(a), iter(b)
    rtrn = list(zip(a, b))
    try:
        taila, tailb = rtrn[-1]
    except IndexError: # one or both empty
        return rtrn
    # only one of these loops will run, draining the longer input list
    rtrn.extend((ai, tailb) for ai in a)
    rtrn.extend((taila, bi) for bi in b)
    return rtrn
Here the trick is to use an iterator, not an iterable. An iterator keeps its state. So after the zip, both iterators should still point at the place where zip stopped.
However, this does not work if b is the shorter list. Because then zip will have removed one value from a and will discard it. You have to be careful to avoid this.
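The lost value can be seen directly in a small experiment. This is just an illustration with made-up variable names: zip pulls from its first argument first, so when the second argument runs out, the item already taken from the first one is dropped.

```python
longer = iter([2, 1, 0])
shorter = iter([5, 6])
# zip takes 0 from `longer`, then finds `shorter` exhausted and discards the 0.
pairs = list(zip(longer, shorter))
lost_case_leftover = list(longer)
print(pairs)               # [(2, 5), (1, 6)]
print(lost_case_leftover)  # []

longer = iter([2, 1, 0])
shorter = iter([5, 6])
# With the shorter iterator first, zip stops before touching the extra item.
swapped_pairs = list(zip(shorter, longer))
kept_case_leftover = list(longer)
print(swapped_pairs)       # [(5, 2), (6, 1)]
print(kept_case_leftover)  # [0]
```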
The easiest way is to just materialize two lists and deal with the length differences explicitly.
def merge(a, b):
    # ensure that we have lists, not anything else like iterators, sets, etc
    a, b = list(a), list(b)
    rtrn = list(zip(a, b))
    try:
        taila, tailb = rtrn[-1]
    except IndexError: # one or both empty
        return rtrn
    rtrnlen = len(rtrn)
    # only one of these loops will run, draining the longer input list
    # You could also use itertools.zip_longest for this
    rtrn.extend((ai, tailb) for ai in a[rtrnlen:])
    rtrn.extend((taila, bi) for bi in b[rtrnlen:])
    return rtrn
I'd use zip_longest:
from itertools import zip_longest
def merge(a, b):
    return list(a and b and zip_longest(a, b, fillvalue=min(a, b, key=len)[-1]))
Same thing, different style:
def merge(a, b):
    if a and b:
        short = min(a, b, key=len)
        return list(zip_longest(a, b, fillvalue=short[-1]))
    return []
from itertools import zip_longest
def merge(a,b):
    if len(a) > len(b):
        return list((zip_longest(a,b,fillvalue=b[-1])))
    else:
        return list((zip_longest(a,b,fillvalue=a[-1])))
for example
a = [2,3,5]
b = [1,2]
merge(a,b)
[(2, 1), (3, 2), (5, 2)]
Link to documentation for zip_longest
https://docs.python.org/3/library/itertools.html#itertools.zip_longest

Fast removal of consecutive duplicates in a list and corresponding items from another list

My question is similar to this previous SO question.
I have two very large lists of data (almost 20 million data points) that contain numerous consecutive duplicates. I would like to remove the consecutive duplicate as follows:
list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2] # This is 20M long!
list2 = ... # another list of size len(list1), also 20M long!
i = 0
while i < len(list1)-1:
    if list1[i] == list1[i+1]:
        del list1[i]
        del list2[i]
    else:
        i = i+1
And the output should be [1, 2, 3, 4, 5, 1, 2] for the first list.
Unfortunately, this is very slow since deleting an element in a list is a slow operation by itself. Is there any way I can speed up this process? Please note that, as shown in the above code snippet, I also need to keep track of the index i so that I can remove the corresponding element in list2.
Python has groupby in the standard library (itertools) for exactly this:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
You can tweak it using the keyfunc argument, to also process the second list at the same time.
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]
If you want to split those pairs back into separate sequences again:
>>> zip(*_) # "unzip" them
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
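The session above relies on Python 2 behaviour (zip returning a list) and on the interactive `_` variable. A sketch of the same idea as a Python 3 function follows; dedupe_parallel is a name chosen here for illustration only.

```python
from itertools import groupby
from operator import itemgetter

def dedupe_parallel(list1, list2):
    """Drop consecutive duplicates in list1 and the matching items of list2."""
    # Group the (item1, item2) pairs by item1 and keep the first pair of each run.
    kept = [next(group)
            for _, group in groupby(zip(list1, list2), key=itemgetter(0))]
    if not kept:
        return [], []
    # zip(*kept) "unzips" the pairs; it yields tuples, so convert back to lists.
    out1, out2 = zip(*kept)
    return list(out1), list(out2)

list1 = [1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2]
list2 = [9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 5]
print(dedupe_parallel(list1, list2))
# ([1, 2, 3, 4, 5, 1, 2], [9, 7, 7, 7, 6, 6, 5])
```

Note that this keeps the list2 item belonging to the first element of each run, whereas the del-based loop in the question ends up keeping the one belonging to the last element of each run. If that difference matters, the selection inside the comprehension would need to change.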
You can use collections.deque and its maxlen argument to set a window size of 2. Then just compare the 2 entries in the window, and append the newer one to the results if they differ.
def remove_adj_dups(x):
    """
    :parameter x is something like '1, 1, 2, 3, 3'
    from an iterable such as a string or list or a generator
    :return 1,2,3, as list
    """
    result = []
    from collections import deque
    d = deque([object()], maxlen=2) # 1st entry is object() which only matches with itself. Kudos to Trey Hunner -->object()
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            result.append(b)
    return result
I generated a random sequence with duplicates, drawing numbers from range(10); the test below uses range_len = 2000000, i.e. 2 million numbers.
def random_nums_with_dups(number_range=None, range_len=None):
    """
    :parameter
    :param number_range: use the numbers between 0 and number_range. The smaller this is then the more dups
    :param range_len: max len of the results list used in the generator
    :return: a generator
    Note: If number_range = 2, then random binary is returned
    """
    import random
    return (random.choice(range(number_range)) for i in range(range_len))
I then tested with
range_len = 2000000
def mytest():
    for i in [1]:
        return [remove_adj_dups(random_nums_with_dups(number_range=10, range_len=range_len))]
big_result = mytest()
big_result = mytest()[0]
print(len(big_result))
The length was 1800197 (that is, with the dups removed), in under 5 seconds, which includes the random list generator spinning up.
I lack the experience/know-how to say whether it is memory efficient as well. Could someone comment, please?
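On the memory question, one possible direction is to avoid building the result list at all. The sketch below (iter_without_adj_dups is a made-up name) is a generator that only remembers the previous item, so the caller decides whether to materialize the output. It is an illustration under that assumption, not a measured comparison.

```python
def iter_without_adj_dups(items):
    """Lazily yield items, skipping any item equal to the one just before it."""
    previous = object()  # sentinel that compares unequal to any real item
    for item in items:
        if item != previous:
            yield item
        previous = item

print(list(iter_without_adj_dups([1, 1, 2, 3, 3, 1])))  # [1, 2, 3, 1]
```

Wrapping the call in list(), as in the print above, still stores every kept item, so any saving only applies when the output is consumed item by item.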

Converting a nested list to a nested tuple

How can I loop through a nested list and get a nested list of tuples out of it? For instance, loop through pot to get rslt:
pot = [[1,2,3,4],[5,6,7,8]]
I tried
b = []
for i in pot:
    for items in i:
        b = zip(pot[0][0:],pot[0][1:])
but I didn't get the desired output. Thanks.
Desired Result =
rslt = [[(1,2),(3,4)],[(5,6),(7,8)]]
Based on the grouper recipe in the itertools documentation, you might try something like this (assuming your sublists are the length you have indicated):
>>> def grouper(iterable, n):
...     args = [iter(iterable)] * n # creates a list of n references to the same iterator object (which is exhausted after one iteration)
...     return list(zip(*args)) # list() so the result is a list in Python 3 as well
...
Now you can test it out:
>>> pot = [[1,2,3,4],[5,6,7,8]]
>>> rslt = []
>>> for sublist in pot:
...     rslt.append(grouper(sublist, 2))
...
>>> rslt
[[(1, 2), (3, 4)], [(5, 6), (7, 8)]]
You can also use a list comprehension (this only works when every sublist has exactly four items):
[[(a, b), (c, d)] for a, b, c, d in pot]
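If the sublists are not always four items long, slicing with a step may be a more general option. A small sketch, assuming each sublist has an even number of items:

```python
pot = [[1, 2, 3, 4], [5, 6, 7, 8]]

# sub[::2] takes the items at even positions, sub[1::2] those at odd positions;
# zipping them pairs up neighbours. A trailing unpaired item would be dropped.
rslt = [list(zip(sub[::2], sub[1::2])) for sub in pot]
print(rslt)  # [[(1, 2), (3, 4)], [(5, 6), (7, 8)]]
```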

Removing specific tuples from List

I've got a list
a = [(1,2),(1,4),(2,6),(1,8),(3,6),(1,10),(1,6)]
If I say that:
for x in a:
    if x[0]==1:
        print(x)
I get the expected result : (1,2) (1,4) (1,8) (1,10) (1,6)
However, I want to remove all occurrences of tuples of the form (1, x), so:
for x in a:
    if x[0]==1:
        a.remove(x)
I thought that all the occurrences would be removed. However, when I run
print(a)
I get [(1,4),(2,6),(3,6),(1,6)]
Not all the tuples were removed. How do I do it?
Thanks
I'd use list comprehension:
def removeTuplesWithOne(lst):
    return [x for x in lst if x[0] != 1]
a = removeTuplesWithOne([(1,2),(1,4),(2,6),(1,8),(3,6),(1,10),(1,6)])
For me it's more Pythonic than the built-in filter function.
P.S. This function does not change your original list; it creates a new one. If your original list is huge, I'd probably use a generator expression like so:
def removeTuplesWithOne(lst):
    return (x for x in lst if x[0] != 1)
This isn't the same approach as yours but should work
a = filter(lambda x: x[0] != 1, a)
You can use list comprehension like this, to filter out the items which have 1 as the first element.
>>> original = [(1, 2), (1, 4), (2, 6), (1, 8), (3, 6), (1, 10), (1, 6)]
>>> [item for item in original if item[0] != 1]
[(2, 6), (3, 6)]
This creates a new list, rather than modifying the existing one. 99% of the time, this will be fine, but if you need to modify the original list, you can do that by assigning back:
original[:] = [item for item in original if item[0] != 1]
Here we use slice assignment, which works by replacing every item from the start to the end of the original list (the [:]) with the items from the list comprehension. If you just used normal assignment, you would just change what the name original pointed to, not actually modify the list itself.
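The difference between the two kinds of assignment shows up as soon as a second name refers to the same list. A short illustration (the names alias and original are only for this example):

```python
original = [(1, 2), (2, 6), (1, 8)]
alias = original  # a second name for the same list object

# Plain assignment rebinds the name; the list that `alias` refers to is untouched.
original = [item for item in original if item[0] != 1]
print(alias)  # [(1, 2), (2, 6), (1, 8)]

# Slice assignment changes the list in place, so every name sees the change.
original = alias
original[:] = [item for item in original if item[0] != 1]
print(alias)  # [(2, 6)]
```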
You can do it with a generator expression if you're dealing with huge amounts of data:
a = [(1,2),(1,4),(2,6),(1,8),(3,6),(1,10),(1,6)]
# create a generator that skips the tuples starting with 1
a = ((x,y) for x, y in a if x != 1)
# simply convert it to a list if you need to...
>>> print(list(a))
[(2, 6), (3, 6)]
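As for why the loop in the question leaves some tuples behind: each remove shifts the remaining items one position to the left while the loop's internal index still moves forward, so the item right after a removed one is never visited. If the loop form is preferred, iterating over a copy avoids this; a minimal sketch:

```python
a = [(1, 2), (1, 4), (2, 6), (1, 8), (3, 6), (1, 10), (1, 6)]

# Iterate over a shallow copy (a[:]) while removing from the real list,
# so removals no longer shift the items the loop has yet to visit.
for x in a[:]:
    if x[0] == 1:
        a.remove(x)

print(a)  # [(2, 6), (3, 6)]
```

Each remove call scans the list from the start, so for large lists the comprehension-based answers are likely to be faster.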

Find an element in a list of tuples

I have a list 'a'
a= [(1,2),(1,4),(3,5),(5,7)]
I need to find all the tuples for a particular number. say for 1 it will be
result = [(1,2),(1,4)]
How do I do that?
If you just want the first number to match you can do it like this:
[item for item in a if item[0] == 1]
If you are just searching for tuples with 1 in them:
[item for item in a if 1 in item]
There is actually a clever way to do this that is useful for any list of tuples where the size of each tuple is 2: you can convert your list into a single dictionary.
For example,
test = [("hi", 1), ("there", 2)]
test = dict(test)
print(test["hi"]) # prints 1
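One caveat with the dictionary approach: a dict keeps a single value per key, so with the list from the question the tuple (1, 2) would be overwritten by (1, 4). If all matches are needed, grouping into lists is one option. The sketch below uses collections.defaultdict; the name by_first is made up for this example.

```python
from collections import defaultdict

a = [(1, 2), (1, 4), (3, 5), (5, 7)]

# dict(a) keeps only the last value per key, so (1, 2) is lost.
print(dict(a))  # {1: 4, 3: 5, 5: 7}

# Grouping the tuples in lists keeps every tuple for a given first element.
by_first = defaultdict(list)
for first, second in a:
    by_first[first].append((first, second))

print(by_first[1])  # [(1, 2), (1, 4)]
```

Building the index costs one pass over the list, which pays off mainly when many lookups follow.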
Read up on List Comprehensions
[ (x,y) for x, y in a if x == 1 ]
Also read up on generator functions and the yield statement.
def filter_value( someList, value ):
    for x, y in someList:
        if x == value :
            yield x,y
result= list( filter_value( a, 1 ) )
[tup for tup in a if tup[0] == 1]
for item in a:
    if 1 in item:
        print(item)
The filter function can also provide an interesting solution:
result = list(filter(lambda x: x.count(1) > 0, a))
which searches the tuples in the list a for any occurrences of 1. If the search is limited to the first element, the solution can be modified into:
result = list(filter(lambda x: x[0] == 1, a))
Or use takewhile, which stops at the first tuple that does not match (the example also shows a list with more values):
>>> a= [(1,2),(1,4),(3,5),(5,7),(0,2)]
>>> import itertools
>>> list(itertools.takewhile(lambda x: x[0]==1,a))
[(1, 2), (1, 4)]
>>>
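If the list is unsorted, sort the matching tuples to the front first (False sorts before True, so the key must be x[0]!=1):
>>> a= [(1,2),(3,5),(1,4),(5,7)]
>>> import itertools
>>> list(itertools.takewhile(lambda x: x[0]==1,sorted(a,key=lambda x: x[0]!=1)))
[(1, 2), (1, 4)]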
>>>
Using the filter function:
>>> def get_values(iterables, key_to_find):
...     return list(filter(lambda x:key_to_find in x, iterables))
...
>>> a = [(1,2),(1,4),(3,5),(5,7)]
>>> get_values(a, 1)
[(1, 2), (1, 4)]
>>> [i for i in a if 1 in i]
[(1, 2), (1, 4)]
If you want to find the tuples that contain a given number at any position, you can use
a= [(1,2),(1,4),(3,5),(5,7)]
i=1
result=[]
for j in a:
    if i in j:
        result.append(j)
print(result)
You can also use if i==j[0] or i==j[index] if you want to match the number at a particular index.
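Both variants can be combined in one helper with an optional position. This is a sketch only: find_tuples and its index parameter are hypothetical names, and the extra tuple (4, 1) is added to the sample data just to show the difference between the two modes.

```python
def find_tuples(tuples, value, index=None):
    """Return tuples containing value, or having it at position index if given."""
    if index is None:
        # No position given: match the value anywhere in the tuple.
        return [t for t in tuples if value in t]
    # Position given: match only at that index.
    return [t for t in tuples if t[index] == value]

a = [(1, 2), (1, 4), (3, 5), (5, 7), (4, 1)]
print(find_tuples(a, 1))           # [(1, 2), (1, 4), (4, 1)]
print(find_tuples(a, 1, index=0))  # [(1, 2), (1, 4)]
```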
