Reverse lexicographical using heapq

Essentially, I am looking for an efficient way to implement custom comparators with heapq.
For instance, take x = [('a',5),('c',3),('d',2),('e',1)].
I could heapify it with heapq.heapify(x) and then pop the minimum with heapq.heappop(x), which would return ('a', 5). How can I make it pop in reverse lexicographical order, so that the first pop returns ('e', 1)?
I know the convention for numbers is simply to multiply the first element of the tuple by -1. Is there a similarly simple trick for strings? I know I could implement a mapping from a to z ... z to a, but that sounds cumbersome.

For numbers you would do something like this:
import heapq
x = [(1, 5), (3, 3), (4, 2), (5, 1)]
x = [(-a, b) for a, b in x]
heapq.heapify(x)
result = heapq.heappop(x)
result = (-result[0], result[1])
Similarly, I would do this with letters:
import heapq
x = [('a',5), ('c',3), ('d',2), ('e',1)]
x = [(-ord(a), b) for a, b in x]
heapq.heapify(x)
result = heapq.heappop(x)
result = (chr(-result[0]), result[1])
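You may also want to treat the second element of each tuple similarly. Note that this only works when each string is a single character: ord() raises a TypeError for longer strings.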
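For multi-character strings, one option is a small wrapper class that inverts the < comparison. heapq orders items using the < operator, so inverting it turns the min-heap into a "largest string first" heap. This is a minimal sketch, and the class name ReverseStr is hypothetical (it is not part of heapq):

```python
import heapq


class ReverseStr:
    """Hypothetical wrapper that inverts string ordering for heapq."""

    __slots__ = ("s",)

    def __init__(self, s):
        self.s = s

    def __lt__(self, other):
        # heapq compares with "<", so inverting it pops the largest string first
        return self.s > other.s

    def __eq__(self, other):
        # lets tuple comparison fall through to the second element on ties
        return self.s == other.s


x = [('a', 5), ('c', 3), ('d', 2), ('e', 1)]
heap = [(ReverseStr(s), n) for s, n in x]
heapq.heapify(heap)
s, n = heapq.heappop(heap)
result = (s.s, n)
print(result)  # ('e', 1)

# Unlike -ord(), this also works for multi-character strings
words = ['apple', 'ab', 'banana', 'abc']
wheap = [ReverseStr(w) for w in words]
heapq.heapify(wheap)
order = [heapq.heappop(wheap).s for _ in range(len(words))]
print(order)  # ['banana', 'apple', 'abc', 'ab']
```

The extra object per item costs a little memory and speed compared with the -ord() trick. In exchange, it handles strings of any length and any characters.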

Related

I'm not able to understand this code in tuple

init_tuple = [(0, 1), (1, 2), (2, 3)]
result = sum(n for _, n in init_tuple)
print(result)
The output for this code is 6. Could someone explain how it works?

Your code goes through each tuple and sums the values in the second position (i.e. index [1]).
If you rewrite it as a loop, it may be easier to understand:
init_tuple = [(0, 1), (1, 2), (2, 3)]
result = 0
for (val1, val2) in init_tuple:
    result = result + val2
print(result)

The expression (n for _, n in init_tuple) is a generator expression. You can iterate over such an expression to get all the values it generates. In this case it reads as: generate the second component of each tuple of init_tuple.
(Note on _: the _ here stands for the first component of the tuple. It is common in Python to use this name for a variable you don't care about (i.e., one you don't plan to use), as is the case here. Another way to write your generator would then be (tup[1] for tup in init_tuple).)
You can iterate over a generator expression using a for loop. For example:
>>> for x in (n for _, n in init_tuple):
...     print(x)
...
1
2
3
And of course, since you can iterate over a generator expression, you can sum it as you have done in your code.
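Because it is a generator rather than a list, it can only be consumed once. This short sketch shows that a second sum() over the same generator object sees no values:

```python
init_tuple = [(0, 1), (1, 2), (2, 3)]

gen = (n for _, n in init_tuple)  # generator expression: values are produced lazily
first = sum(gen)   # consumes every value: 1 + 2 + 3
second = sum(gen)  # the generator is already exhausted, so this sums nothing
print(first, second)  # 6 0

# A list comprehension builds a real list, which can be iterated repeatedly
values = [n for _, n in init_tuple]
print(sum(values), sum(values))  # 6 6
```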

To get a better understanding, first look at this:
init_tuple = [(0, 1), (1, 2), (2, 3)]
total = 0  # avoid naming this "sum", which would shadow the built-in sum()
for x, y in init_tuple:
    total = total + y
print(total)
This code calculates the sum of the second elements of the tuples, so it is equivalent to your code: both do the same job.
In for x, y in init_tuple:, x holds the first value of each tuple and y holds the second. In the first iteration:
x = 0, y = 1
then in the second iteration:
x = 1, y = 2, and so on.
In your case you don't need the first element of the tuple, so you just use _ instead of a named variable.
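The three spellings mentioned in these answers all compute the same result. This quick sketch compares them: unpacking with _, indexing with tup[1], and operator.itemgetter(1):

```python
from operator import itemgetter

init_tuple = [(0, 1), (1, 2), (2, 3)]

# 1. unpack each tuple, ignoring the first element via the conventional name _
by_unpacking = sum(n for _, n in init_tuple)
# 2. index the second element directly
by_indexing = sum(tup[1] for tup in init_tuple)
# 3. itemgetter(1) returns tup[1] for each tuple it is applied to
by_itemgetter = sum(map(itemgetter(1), init_tuple))

print(by_unpacking, by_indexing, by_itemgetter)  # 6 6 6
```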

Comparing items within a list with each other

If I have a list
lst = [1, 2, 3, 4, 5]
and I want to check whether two items exist, one of which is larger than the other by 1, can I do this without specifying which items in the list?
i.e. without having to do something like:
lst[1] - lst[0] == 1
I want general code that works for any int items in lst.

You can pair each number with the one below it, whenever that number is also in the list:
new = [(i, i - 1) for i in lst if i - 1 in lst]

This one first builds a set from the list for faster membership checks. It then checks, with short-circuiting, whether i + 1 is in that set for each i in the list. (I iterate over the list instead of the newly created set because it should be slightly faster.) As soon as some i + 1 is found in the list, the function returns True; otherwise it returns False.
def has_n_and_n_plus_1(lst):
    lset = set(lst)
    return any(i + 1 in lset for i in lst)
Testing:
>>> has_n_and_n_plus_1([6,2,7,11,42])
True
>>> has_n_and_n_plus_1([6,2,9,11,42])
False
The all-tricks-in-one-basket brain teaser:
from operator import sub
from itertools import starmap, tee
a, b = tee(sorted(lst))
next(b, None)
exists = 1 in starmap(sub, zip(b, a))
What this code does:
1. It sorts the list in increasing order.
2. It iterates over consecutive pairs of the sorted list, a, b = lst[i], lst[i + 1].
3. It starmaps each (b, a) pair into the sub operator, giving b - a.
4. It uses the in operator to check whether the resulting iterator contains a 1.

You could zip the list with itself shifted by one.
>>> lst = [1,2,3,4,5]
>>> list(zip(lst, lst[1:]))
[(1, 2), (2, 3), (3, 4), (4, 5)]
This assumes that the list is ordered. If it is not, you could sort it first and then filter out the non-matches (perhaps also recording their indexes in the original list, if that is important). So for a more complex list of integers, this should work:
>>> lst = [99,12,13,44,15,16,45,200]
>>> lst.sort()
>>> [(x,y) for (x,y) in zip(lst, lst[1:]) if x + 1 == y]
[(12, 13), (15, 16), (44, 45)]
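The following is the equivalent using functions. Note that it is Python 2 only: itertools.izip and tuple-unpacking lambda parameters such as lambda (x,y) do not exist in Python 3. Using izip from itertools avoids building an intermediate list of pairs; the pairs are generated lazily and consumed by filter in a single pass while it looks for matches: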
>>> from itertools import izip
>>> lst = [99,12,13,44,15,16,45,200]
>>> lst.sort()
>>> filter(lambda (x,y): x+1==y, izip(lst, lst[1:]))
[(12, 13), (15, 16), (44, 45)]
The same could be written using list comprehensions, but personally I prefer using functions.
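For reference, here is a sketch of the same filter-based approach in Python 3. Here zip is already lazy, izip no longer exists, and a lambda cannot unpack a tuple parameter, so it indexes into the pair instead:

```python
lst = [99, 12, 13, 44, 15, 16, 45, 200]
lst.sort()

# zip is lazy in Python 3; the lambda takes one pair and indexes into it
matches = list(filter(lambda pair: pair[0] + 1 == pair[1], zip(lst, lst[1:])))
print(matches)  # [(12, 13), (15, 16), (44, 45)]
```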

Removing specific tuples from List

I've got a list
a = [(1,2),(1,4),(2,6),(1,8),(3,6),(1,10),(1,6)]
If I say:
for x in a:
    if x[0]==1:
        print x
I get the expected result: (1,2) (1,4) (1,8) (1,10) (1,6)
However, I want to remove all occurrences of the tuples of the form (1,x), so:
for x in a:
    if x[0]==1:
        a.remove(x)
I thought that all the occurrences would be removed. However, when I say
print a
I get [(1,4),(2,6),(3,6),(1,6)]
Not all the tuples were removed. How do I do it?
Thanks

I'd use a list comprehension:
def removeTuplesWithOne(lst):
    return [x for x in lst if x[0] != 1]
a = removeTuplesWithOne([(1,2),(1,4),(2,6),(1,8),(3,6),(1,10),(1,6)])
To me it's more pythonic than the built-in filter function.
P.S. This function does not change your original list; it creates a new one. If your original list is huge, I'd probably use a generator expression instead:
def removeTuplesWithOne(lst):
    return (x for x in lst if x[0] != 1)

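This isn't the same approach as yours, but it should work:
a = filter(lambda x: x[0] != 1, a)
(In Python 2 this returns a list; in Python 3 filter returns a lazy iterator, so use list(filter(...)) if you need a list.)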

You can use a list comprehension like this to filter out the items that have 1 as the first element:
>>> original = [(1, 2), (1, 4), (2, 6), (1, 8), (3, 6), (1, 10), (1, 6)]
>>> [item for item in original if item[0] != 1]
[(2, 6), (3, 6)]
This creates a new list rather than modifying the existing one. 99% of the time this will be fine, but if you need to modify the original list, you can do so by assigning back:
original[:] = [item for item in original if item[0] != 1]
Here we use slice assignment, which replaces every item from the start to the end of the original list (the [:]) with the items from the list comprehension. If you just used normal assignment, you would only change what the name original points to, not modify the list itself.
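The difference matters when another name refers to the same list. This sketch uses a second name, alias, introduced here for illustration. It shows that slice assignment mutates the shared list object, while plain assignment only rebinds one name:

```python
original = [(1, 2), (1, 4), (2, 6), (1, 8), (3, 6), (1, 10), (1, 6)]
alias = original  # a second name for the same list object

# Slice assignment mutates the existing list, so alias sees the change
original[:] = [item for item in original if item[0] != 1]
print(alias)  # [(2, 6), (3, 6)]

# Plain assignment only rebinds the name; alias still points at the old list
original = [item for item in original if item[0] != 2]
print(original, alias)  # [(3, 6)] [(2, 6), (3, 6)]
```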

You can do it with a generator expression if you're dealing with huge amounts of data:
a = [(1,2),(1,4),(2,6),(1,8),(3,6),(1,10),(1,6)]
# create a generator that keeps only the tuples whose first element is not 1
a = ((x,y) for x, y in a if x != 1)
# simply convert it to a list if you need to...
>>> print list(a)
[(2, 6), (3, 6)]
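The answers above don't explain why the loop in the question misses some tuples. list.remove shifts the remaining items one place to the left, but the for loop's internal index still advances, so the item right after each removed one is skipped. This sketch reproduces the result from the question and shows one common fix, iterating over a shallow copy:

```python
a = [(1, 2), (1, 4), (2, 6), (1, 8), (3, 6), (1, 10), (1, 6)]

# Buggy: removing from the list being iterated over skips elements
buggy = list(a)
for x in buggy:
    if x[0] == 1:
        buggy.remove(x)  # shifts later items left; the loop index still moves on
print(buggy)  # [(1, 4), (2, 6), (3, 6), (1, 6)]

# Fix: iterate over a shallow copy (fixed[:]) while removing from the list itself
fixed = list(a)
for x in fixed[:]:
    if x[0] == 1:
        fixed.remove(x)
print(fixed)  # [(2, 6), (3, 6)]
```

Each remove() call scans the list, so for large lists the comprehension-based answers are usually the better choice.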

Adding-up selected portion of vectors in a list, and outputting the total weight vector (using python)

I have the following vectors in a list:
Q= [(0, 0.2815994630432826), (2, 0.678491614691639), (3, 0.678491614691639)]
I tried summing up only the floating-point parts
(i.e. 0.2815994630432826 + 0.678491614691639 + 0.678491614691639), because the first parts are indices, which I do not want. See the code below:
aba=[]
for doc in corpus_tfidf:
    con = round(np.sum(doc),2)
    aba.append(con)
print aba
Here is the result I got: (6.64)
My code added up the indices as well as the floats. My intention was to add up only the floats and output the total. Any ideas? Thanks in advance.
Note: the for loop is there because Q is just one of hundreds of documents with such vectors.

Something like this? (This requires that the elements you want to sum up are in the second position in the tuples)
>>> Q= [(0, 0.2815994630432826), (2, 0.678491614691639), (3, 0.678491614691639)]
>>> from operator import itemgetter
>>> sum(map(itemgetter(1), Q))
1.6385826924265605
Otherwise, you could just build up a list and sum that up.
>>> sum([val for _, val in Q])
1.6385826924265605

You could use
reduce(lambda x, y: x + y, map(lambda x: x[1], Q))
to get the sum of the floating-point values in Q. In Python 3, reduce must first be imported with from functools import reduce.
map extracts the second part of each tuple, and reduce calculates the sum of those values.
Or, more simply, you could do this:
sum([x[1] for x in Q])
Or this one:
reduce(lambda x, y: (0, x[1] + y[1]), Q)[1]

I think the simplest way to go for you would be the sum function with a list comprehension:
>>> Q
[(0, 0.2815994630432826), (2, 0.678491614691639), (3, 0.678491614691639)]
>>> sum([i[1] for i in Q])
1.6385826924265605
>>>
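Here is a sketch that applies this to the loop from the question. It assumes each doc in corpus_tfidf is a list of (index, weight) tuples like Q; the two-document corpus below is made up for illustration:

```python
# Hypothetical stand-in for corpus_tfidf: each document is a list of
# (index, weight) tuples, as in Q from the question
corpus_tfidf = [
    [(0, 0.2815994630432826), (2, 0.678491614691639), (3, 0.678491614691639)],
    [(1, 0.5), (4, 0.25)],
]

aba = []
for doc in corpus_tfidf:
    # sum only the weights (second element), skipping the indices
    total = sum(weight for _, weight in doc)
    aba.append(round(total, 2))
print(aba)  # [1.64, 0.75]
```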

checking if combination already exists from list comprehension

As part of learning Python, I have set myself some challenges to see the various ways of doing things. My current challenge is to create a list of pairs using a list comprehension. Part one is to make a list of pairs where x and y must not be the same (x != y) and order matters ((x,y) is not equal to (y,x)).
return [(x,y) for x in listOfItems for y in listOfItems if not x==y]
Using my existing code, is it possible to modify it so that if (x,y) already exists in the list as (y,x), it is excluded from the results? I know I could compare items afterwards, but I want to see how much control you can have with a list comprehension.
I am using Python 2.7.

You should use a generator function here:
def func(listOfItems):
    seen = set()  # use a set to keep track of already seen pairs; sets provide average O(1) lookup
    for x in listOfItems:
        for y in listOfItems:
            if x != y and (y, x) not in seen:
                seen.add((x, y))
                yield x, y
>>> lis = [1,2,3,1,2]
>>> list(func(lis))
[(1, 2), (1, 3), (1, 2), (2, 3), (1, 2), (1, 3), (1, 2), (2, 3)]

def func(seq):
    seen_pairs = set()
    all_pairs = ((x, y) for x in seq for y in seq if x != y)
    for x, y in all_pairs:
        if ((x, y) not in seen_pairs) and ((y, x) not in seen_pairs):
            yield (x, y)
            seen_pairs.add((x, y))
This also uses a generator expression (here: all_pairs), which is like a list comprehension but lazily evaluated. Generator expressions are very helpful, especially when iterating over combinations, products, etc.
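The question also asks how far a plain list comprehension can go. If the items are distinct, you can drop the seen set entirely by pairing each item only with the items that come after it, which never produces both (x, y) and (y, x). This sketch assumes distinct items, since duplicates in the input would still yield repeated or reversed pairs. The function name unordered_pairs is hypothetical:

```python
def unordered_pairs(items):
    # For each position i, pair items[i] only with the items after it,
    # so each unordered pair of distinct items is produced exactly once
    return [(x, y) for i, x in enumerate(items) for y in items[i + 1:] if x != y]


print(unordered_pairs([1, 2, 3]))  # [(1, 2), (1, 3), (2, 3)]
```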

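Using product and ifilter from itertools (Python 2), plus the unique_everseen recipe from the itertools documentation. That recipe is not part of the itertools module itself, so copy it from the docs or take it from the more_itertools package:
>>> from itertools import product, ifilter
>>> x = [1, 2, 3, 1, 2]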
>>> x = product(x, x)
>>> x = unique_everseen(x)
>>> x = ifilter(lambda z: z[0] != z[1], x)
>>> for y in x:
... print y
...
(1, 2)
(1, 3)
(2, 1)
(2, 3)
(3, 1)
(3, 2)
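This output still contains both (1, 2) and (2, 1). If the goal is each unordered pair once, itertools.combinations over the distinct values does that directly. In this sketch, dict.fromkeys is used to drop duplicates while keeping first-seen order, which is guaranteed from Python 3.7 onward:

```python
from itertools import combinations

x = [1, 2, 3, 1, 2]
unique = list(dict.fromkeys(x))        # [1, 2, 3]: duplicates removed, order kept
pairs = list(combinations(unique, 2))  # each unordered pair exactly once
print(pairs)  # [(1, 2), (1, 3), (2, 3)]
```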
