Returning lists of tuples' keys and values - python

I understand that the zip() function is used to construct a list of tuples like this:
x = ['a', 'b', 'c']
y = ['x', 'y', 'z', 'l']
lstTupA = zip(x,y)
lstTupA would be [('a', 'x'), ('b', 'y'), ('c', 'z')].
lstA, lstB = zip(*lstTupA)
The above operation extracts the keys in the list of tuples to lstA and values in the list of tuples to lstB.
lstA was ('a', 'b', 'c') and lstB was ('x', 'y', 'z').
My query is this: Why are lstA and lstB tuples instead of lists? a, b and c are homogeneous and so are x, y and z. It's not logical to group them as tuples, is it?
Ideally lstA, lstB = zip(*lstTupA) should have assigned ['a', 'b', 'c'] to lstA and ['x', 'y', 'z'] to lstB (lists) right?
Someone please clarify!
Thanks.

"It's not logical to group them as tuples, is it?"
Yes. It is logical.
Python has two general-purpose built-in sequence types: lists and tuples.
Calling zip() with n arguments fixes the size of every resulting tuple at n.
A list would only be appropriate if other arguments were somehow, magically, appended or not appended to the resulting sequence. This would mean sequences of variable length, not defined by the number of arguments to zip(). That would be a rather complex structure to build with a single function call.

zip is simply defined to behave this way:
In [2]: help(zip)
Help on built-in function zip in module __builtin__:
zip(...)
zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
--> Return a list of tuples <--, where each tuple contains the i-th element
from each of the argument sequences. The returned list is truncated
in length to the length of the shortest argument sequence.

What *lstTupA does in lstA, lstB = zip(*lstTupA) (or, more generally, what the * operator does in a call) is unpack an iterable into separate positional arguments. So zip(*lstTupA) is equivalent to zip(lstTupA[0], lstTupA[1], ...). zip then builds one tuple per position from those arguments, and that's exactly the reason why lstA and lstB are tuples.
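To make the unpacking concrete, here is a small sketch that reuses the variable names from the question. The list() calls are an addition so that it behaves the same on Python 3, where zip returns an iterator rather than a list; spelled_out is a name made up for the comparison.

```python
x = ['a', 'b', 'c']
y = ['x', 'y', 'z', 'l']

# list() makes the result a list on Python 3, where zip returns an iterator.
lstTupA = list(zip(x, y))

# zip(*lstTupA) passes each tuple in lstTupA as a separate argument...
unpacked = list(zip(*lstTupA))
# ...so it is the same call as spelling the arguments out by hand.
spelled_out = list(zip(lstTupA[0], lstTupA[1], lstTupA[2]))

print(unpacked)                 # [('a', 'b', 'c'), ('x', 'y', 'z')]
print(unpacked == spelled_out)  # True
```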

zip doesn't know what is on the left hand side of the equal sign. As far as it knows, lstTupA = zip(x,y) and lstA, lstB = zip(*lstTupA) are the same thing. zip is defined to do one thing and it is consistent in doing that one thing. You have decided to break apart the list of tuples in the second statement, so you are the one that is adding extra context to the second statement.

Ideally lstA, lstB = zip(*lstTupA) should have assigned ['a', 'b', 'c'] to lstA and ['x', 'y', 'z'] to lstB (lists) right?
No, that is not right. Remember that zip returns a list of tuples; that's exactly the way you expect it to behave when you say
lstTupA would be [('a', 'x'), ('b', 'y'), ('c', 'z')].
So, why would it return something different in the case of zip(*lstTupA)? It would still return the list of tuples, in this case [('a', 'b', 'c'), ('x', 'y', 'z')]. By performing assignment to lstA and lstB, you simply extract the tuples from the list.

Yes you have to do something stupid like
[list(t) for t in zip(*lst)]
Just to get lists.
What the 'pythonistas' rushing to defend the braindead choice of lists of tuples fail to remember is that tuples are immutable, so their items cannot be assigned to. Which makes zip(*m) useless for matrices or anything else where you want to alter items later.
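If mutable rows are what you need, a minimal sketch is to convert each tuple as it comes out of zip. The names m and transposed are made up for illustration and are not from the posts above.

```python
# A hypothetical 2x3 matrix stored as a list of lists.
m = [[1, 2, 3],
     [4, 5, 6]]

# zip(*m) yields one tuple per column; list(col) turns each into a mutable list.
transposed = [list(col) for col in zip(*m)]

transposed[0][1] = 40   # items can now be reassigned
print(transposed)       # [[1, 40], [2, 5], [3, 6]]
```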

How does zip(*) generate n-grams?

I am reviewing some notes on n-grams, and I came across a couple of interesting functions. First there's this one to generate bigrams:
def bigrams(word):
    return sorted(list(set(''.join(bigram)
                           for bigram in zip(word, word[1:]))))
def bigram_print(word):
    print("The bigrams of", word, "are:")
    print(bigrams(word))
bigram_print("ababa")
bigram_print("babab")
After doing some reading and playing on my own with Python I understand why this works. However, I am very puzzled by the use of zip(*[word[i:] for i in range(n)]) in the function below. I understand that the * is an unpacking operator (as explained here), but I am getting tripped up by how it works in combination with the list comprehension. Can anyone explain?
def ngrams(word, n):
    return sorted(list(set(''.join(ngram)
                           for ngram in zip(*[word[i:]
                                              for i in range(n)]))))
def ngram_print(word, n):
    print("The {}-grams of {} are:".format(n, word))
    print(ngrams(word, n))
for n in [2, 3, 4]:
    ngram_print("ababa", n)
    ngram_print("babab", n)
    print()
The following example should explain how this works. I have added code and a visual representation of it.
Intuition
The core idea is to zip together multiple versions of the same list where each of them starts from the next subsequent element.
Let's say L is a list of words/elements ['A', 'B', 'C', 'D'].
Then, what's happening here is that L, L[1:] and L[2:] get zipped, which means the first elements of each of these (which are the 1st, 2nd, and 3rd elements of L) get grouped together, the second elements get grouped together, and so on.
Visually this can be shown as:
The statement we are worried about -
zip ( * [L[i:] for i in range(n)])
#|    | |
#|    | versions of L with the first 0 to n-1 elements skipped
#|    unpack
#zip
Code example
l = ['A','B','C','D']
print('original list: '.ljust(27),l)
print('list skipping 1st element: ',l[1:])
print('list skipping 2 elements: '.ljust(27),l[2:])
print('bi-gram: '.ljust(27), list(zip(l,l[1:])))
print('tri-gram: '.ljust(27), list(zip(l,l[1:],l[2:])))
original list: ['A', 'B', 'C', 'D']
list skipping 1st element: ['B', 'C', 'D']
list skipping 2 elements: ['C', 'D']
bi-gram: [('A', 'B'), ('B', 'C'), ('C', 'D')]
tri-gram: [('A', 'B', 'C'), ('B', 'C', 'D')]
As you can see, you are basically zipping the list with a copy of itself that skips the first element. This zips (A, B), (B, C), ... together for bigrams.
The * operator is for unpacking. As i changes, a different number of leading elements is skipped, so you are building the list [l[0:], l[1:], l[2:], ...]. The * unpacks that list so that each slice is passed to zip() as a separate argument.
zip(*[word[i:] for i in range(n)]) #where word is the list of words
Alternate to list comprehension
The above list comprehension is equivalent to -
n = 3
lists = []
for i in range(n):
    print(l[i:])  # comment this out if not needed
    lists.append(l[i:])
out = list(zip(*lists))
print(out)
['A', 'B', 'C', 'D']
['B', 'C', 'D']
['C', 'D']
[('A', 'B', 'C'), ('B', 'C', 'D')]
If you break down
zip(*[word[i:] for i in range(n)])
You get:
[word[i:] for i in range(n)]
Which is equivalent to:
[word[0:], word[1:], word[2:], ... word[n-1:]]
Each of these is a string that starts from a different position in word.
Now, if you apply the unpacking * operator to it:
*[word[0:], word[1:], word[2:], ... word[n-1:]]
You get each of the slices word[0:], word[1:] etc. passed to zip() as a separate argument.
So, zip is getting called like this:
zip(word[0:], word[1:], word[2:], ... word[n-1:])
Which, according to how zip works, would create n-tuples, with each entry coming from the corresponding argument:
[(word[0:][0], word[1:][0], ...),
 (word[0:][1], word[1:][1], ...),
 ...]
If you map the indexes, you'll see that these values correspond to the n-gram definitions for word.
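The breakdown above can be sketched as a runnable function. This is an illustrative rewrite of the ngrams function from the question, with the intermediate list given the made-up name shifted so each step is visible.

```python
def ngrams(word, n):
    # Build n shifted copies of word: word[0:], word[1:], ..., word[n-1:].
    shifted = [word[i:] for i in range(n)]
    # zip(*shifted) groups the i-th character of every shifted copy and
    # stops at the shortest copy, so only full-length n-grams are produced.
    return sorted(set(''.join(gram) for gram in zip(*shifted)))

# The tuples zip produces for n = 3, before joining and de-duplicating.
raw_trigrams = list(zip("ababa", "baba", "aba"))
print(raw_trigrams)  # [('a', 'b', 'a'), ('b', 'a', 'b'), ('a', 'b', 'a')]

bigrams_of_ababa = ngrams("ababa", 2)
trigrams_of_ababa = ngrams("ababa", 3)
print(bigrams_of_ababa)   # ['ab', 'ba']
print(trigrams_of_ababa)  # ['aba', 'bab']
```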

What does a list sort(key=str.lower) do?

Can you explain what the second line does?
spam = ['a', 'z', 'A', 'Z']
spam.sort(key=str.lower)
print(spam)
spam is a list, and lists in Python have a built-in sort method that reorders the list in place from low to high values.
e.g.
Nums = [2,1,3]
Nums.sort()
Nums
Output
[1,2,3]
The key parameter of sort is a function that is applied to each element of the list before comparison in the sorting algorithm. str.lower is a function that returns characters in lowercase.
e.g.
A -> a
So, the second line sorts spam by the lowercase values of its elements.
Which should result in
['a', 'A', 'z', 'Z']
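As a rough illustration of what the key function does, the sketch below prints the keys that get compared and then the sorted results. It uses sorted() instead of list.sort() only so that spam is left unchanged between the two calls; keys, with_key and without_key are names made up for the example.

```python
spam = ['a', 'z', 'A', 'Z']

# The keys that the sort compares: one per element, computed with str.lower.
keys = [str.lower(s) for s in spam]
print(keys)         # ['a', 'z', 'a', 'z']

# sorted() returns a new list, so spam itself is left untouched here.
with_key = sorted(spam, key=str.lower)
without_key = sorted(spam)
print(with_key)     # ['a', 'A', 'z', 'Z']
print(without_key)  # ['A', 'Z', 'a', 'z']
```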
spam.sort(key=str.lower)
sorts your spam list alphabetically, ignoring case. So when your 3rd line,
print(spam)
runs, you get
['a', 'A', 'z', 'Z']
Without the key argument, the sorted list would be
['A', 'Z', 'a', 'z']
spam.sort(key=str.lower)
sort() is the built-in list method for sorting in Python, and it can take a key function as a parameter.
Here key=str.lower is that key function, which means that every string is converted to lower case before it is compared.
So this is a case-insensitive sort.
It performs case insensitive sorting.
Let's modify your example a bit, to include another entry "a":
spam = ['a', 'z', 'A', 'Z','a']
Naturally, you'd expect the first "a" and the second "a" to end up next to each other. A plain list.sort call does that, because every uppercase letter orders before every lowercase one:
spam = ['a', 'z', 'A', 'Z','a']
spam.sort()
print(spam)
output
['A', 'Z', 'a', 'a', 'z']
On the other hand, specifying str.lower gives you this:
spam = ['a', 'z', 'A', 'Z','a']
spam.sort(key=str.lower)
print(spam)
output:
['a', 'A', 'a', 'z', 'Z']
Here, the original list elements are sorted with respect to their lowercased equivalents. 'a', 'A' and 'a' all compare equal under that key, so they keep their original relative order.
Basically, when str.lower is applied, the keys for spam become 'a', 'z', 'a', 'z', so when the sort runs both 'a' keys come first and then the 'z' keys. The actual elements of spam are not altered; only the case used for comparison changes, so you get the output 'a', 'A', 'z', 'Z'.
You get the same result with key=str.upper.
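A short sketch of those two claims, using the five-element list from the earlier answer. by_lower and by_upper are illustrative names; the behaviour relied on is that Python's sort is stable, so elements with equal keys stay in their original relative order.

```python
spam = ['a', 'z', 'A', 'Z', 'a']

by_lower = sorted(spam, key=str.lower)
by_upper = sorted(spam, key=str.upper)

# 'a', 'A' and 'a' all share one key, so the stable sort keeps them in
# their original relative order (index 0, then 2, then 4).
print(by_lower)              # ['a', 'A', 'a', 'z', 'Z']
print(by_lower == by_upper)  # True
print(spam)                  # unchanged: ['a', 'z', 'A', 'Z', 'a']
```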
I have never programmed in Python, but here is what I see:
spam is an array of strings
spam.sort sorts the array; key is a function applied to each value before it is compared
str.lower means each string is converted to lower case (L => l)
and the last line prints that array

Cleanest way to iterate over pair of iterables of different lengths, wrapping the shorter iterable? [duplicate]

If I have two iterables of different lengths, how can I most cleanly pair them, re-using values from the shorter one until all values from the longer are consumed?
For example, given two lists
l1 = ['a', 'b', 'c']
l2 = ['x', 'y']
It would be desirable to have a function fn() resulting in pairs:
>>> fn(l1, l2)
[('a', 'x'), ('b', 'y'), ('c', 'x')]
I found I could write a function to perform this as such
def fn(l1, l2):
if len(l1) > len(l2):
return [(v, l2[i % len(l2)]) for i, v in enumerate(l1)]
return [(l1[i % len(l1)], v) for i, v in enumerate(l2)]
>>> fn(l1, l2)
[('a', 'x'), ('b', 'y'), ('c', 'x')]
>>> l2 = ['x', 'y', 'z', 'w']
>>> fn(l1,l2)
[('a', 'x'), ('b', 'y'), ('c', 'z'), ('a', 'w')]
However, I'm greedy and curious about what other methods exist, so that I may select the most obvious and elegant one and be wary of the others.
itertools.zip_longest as suggested in many similar questions is very close to my desired use case as it has a fillvalue argument which will pad the longer pairs. However, this only takes a single value, instead of wrapping back to the first value in the shorter list.
As a note: in my use case one list will always be much shorter than the other and this may allow a short-cut, but a generic solution would be exciting too!
You may use itertools.cycle() with zip to get the desired behavior.
As the itertools.cycle() documentation says, it:
Make an iterator returning elements from the iterable and saving a copy of each. When the iterable is exhausted, return elements from the saved copy.
For example:
>>> l1 = ['a', 'b', 'c']
>>> l2 = ['x', 'y']
>>> from itertools import cycle
>>> list(zip(l1, cycle(l2)))  # list() is needed on Python 3, where zip returns an iterator
[('a', 'x'), ('b', 'y'), ('c', 'x')]
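Since in your case the lengths of l1 and l2 could vary, your generic fn() should look like:
from itertools import cycle
def fn(l1, l2):
    # cycle whichever list is shorter; list() gives a list on Python 3 as well
    pairs = zip(l1, cycle(l2)) if len(l1) > len(l2) else zip(cycle(l1), l2)
    return list(pairs)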
Sample Run:
>>> l1 = ['a', 'b', 'c']
>>> l2 = ['x', 'y']
# when second parameter is shorter
>>> fn(l1, l2)
[('a', 'x'), ('b', 'y'), ('c', 'x')]
# when first parameter is shorter
>>> fn(l2, l1)
[('x', 'a'), ('y', 'b'), ('x', 'c')]
If you're not sure which one is the shortest, cycle both lists and call next on the zipped pairs once for each element of the longer list (this relies on Python 3, where zip is lazy):
import itertools
def fn(l1, l2):
    pairs = zip(itertools.cycle(l1), itertools.cycle(l2))
    return (next(pairs) for _ in range(max(len(l1), len(l2))))
>>> list(fn(l1, l2))
[('a', 'x'), ('b', 'y'), ('c', 'x')]
itertools.cycle will repeat each list infinitely. Zipping the two infinite iterators together gives the pairing you want, but repeated infinitely, so it needs to be trimmed to the right size. max(len(l1), len(l2)) finds the length of the longer list, and next is called on the zipped iterator that many times. The zip object has to be created once, outside the generator expression; creating it inside would start again from the first pair on every step. Note that this returns a generator, so to get the output you want, pass it to list.
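The same trimming can also be sketched with itertools.islice instead of repeated next calls. This is an alternative, not what the answer above uses; pair_wrapping is a hypothetical helper name, and the sketch assumes both inputs support len().

```python
from itertools import cycle, islice

def pair_wrapping(l1, l2):
    # Hypothetical helper. Both inputs are cycled forever, then the stream
    # of pairs is cut off at the length of the longer input.
    longest = max(len(l1), len(l2))
    return list(islice(zip(cycle(l1), cycle(l2)), longest))

short_second = pair_wrapping(['a', 'b', 'c'], ['x', 'y'])
short_first = pair_wrapping(['a', 'b', 'c'], ['x', 'y', 'z', 'w'])
print(short_second)  # [('a', 'x'), ('b', 'y'), ('c', 'x')]
print(short_first)   # [('a', 'x'), ('b', 'y'), ('c', 'z'), ('a', 'w')]
```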

How to get elements out from list of lists when having the same position

I have a list of lists where I want to extract the element from each list at the same position. How do I do so? As an example, I have:
L = [[A,B,C,D][B,C,D,E][C,D,E,F]]
Now I want all the letters from position 0 which would give me:
A, B, C - > L[0][0], L[1][0], L[2][0]
I tried to use:
[row[0] for row in L]
and
L[:-1][0]
But none of them works for me.
The reason this is happening to you is because of the way you made your list.
[[A,B,C,D][B,C,D,E][C,D,E,F]]
You have to separate the list (i.e you forgot the commas in between each list). Change your list to something like this
[[A,B,C,D],[B,C,D,E],[C,D,E,F]]
Also, when testing this it doesn't work because the letters are not in quotation marks, but I'm guessing there's a reason for that.
Hope I could help :3
You are very close. Try this,
[v[0] for i, v in enumerate(L)]
This should work,
I don't understand why it does not work for you:
[row[0] for row in L]
output:
['A', 'B', 'C']
Transpose your list with zip.
>>> L = [['A','B','C','D'],['B','C','D','E'],['C','D','E','F']]
>>> t = zip(*L) # list(zip(*L)) in Python 3
t[position] will give you all the elements for a specific position.
>>> t[0]
('A', 'B', 'C')
>>> t[1]
('B', 'C', 'D')
>>> t[2]
('C', 'D', 'E')
>>> t[3]
('D', 'E', 'F')
By the way, your attempted solution should have worked.
>>> [row[0] for row in L]
['A', 'B', 'C']
If you only care about one specific index, this is perfectly fine. If you want the information for all indices, transposing the whole list of lists with zip is the way to go.
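For reference, a minimal Python 3 version of both approaches might look like this. The only change from the session above is the list() call, which is needed because zip returns an iterator on Python 3 and iterators cannot be indexed; first_column is a name made up for the example.

```python
L = [['A', 'B', 'C', 'D'],
     ['B', 'C', 'D', 'E'],
     ['C', 'D', 'E', 'F']]

# On Python 3, wrap zip in list() before indexing into the result.
t = list(zip(*L))

# One specific position, without transposing everything.
first_column = [row[0] for row in L]

print(t[0])          # ('A', 'B', 'C')
print(first_column)  # ['A', 'B', 'C']
print(len(t))        # 4, one tuple per position
```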

Custom Python Sort: Double Precedence

I want to make a custom sort for my Python code that uses the built-in sort() function but can sort a list based on two values. The list I want sorted is a list of tuples, each containing 2 integers. I want to sort the list of tuples by their first integers, and if two first integers are tied, fall back to their second integers, which are unique and therefore never tie. I want to use the speed of the built-in sort() function, but be able to sort in this way. Any and all help is GREATLY APPRECIATED!
Built in sorted does this.
>>> l = [(1, 1), (1, 2), (2, 5), (2, 4)]
>>> sorted(l)
[(1, 1), (1, 2), (2, 4), (2, 5)]
The difference between sort() and sorted() is that sort() modifies the given list (and therefore, any other lists that are sharing its structure), while sorted() accepts an iterable, and returns a brand new list object.
For instance:
>>> a = list("alphabet")
>>> a
['a', 'l', 'p', 'h', 'a', 'b', 'e', 't']
>>> b = a
>>> b
['a', 'l', 'p', 'h', 'a', 'b', 'e', 't']
>>> b.sort()
>>> #this has modified the shared structure
>>> a
['a', 'a', 'b', 'e', 'h', 'l', 'p', 't']
As opposed to sorted()
>>> c = list("alphabet")
>>> d = c
>>> sorted(d)
['a', 'a', 'b', 'e', 'h', 'l', 'p', 't']
>>> c
['a', 'l', 'p', 'h', 'a', 'b', 'e', 't']
sorted() is safer when other names refer to the same list, because it leaves the original untouched.
You have just described the exact behavior of the list.sort() method, so once you have the tuples in a list, simply call the list's sort method (l.sort) with no arguments and it will be put in the desired order.
When more complex sorts are required you can pass a "key function" as a named argument, key. The function is applied to each element of the list to generate a sort key, then the elements are sorted in the order of their sort keys.
The sorted built-in function is convenient when you require a sorted copy of a list - it simply saves you the trouble of creating a copy then calling its sort method.
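As a sketch of how the default ordering and a key function relate for this question, the example below uses a made-up list called pairs. The last variation (ties broken in descending order) goes beyond what was asked and is included only to show where a key becomes necessary.

```python
from operator import itemgetter

pairs = [(2, 5), (1, 2), (2, 4), (1, 1)]

# Tuples already compare element by element, so no key is needed for
# "first integer, then second integer" in ascending order.
default_order = sorted(pairs)

# The same ordering, with the two sort fields spelled out explicitly.
explicit_order = sorted(pairs, key=itemgetter(0, 1))

# A variation: first integer ascending, ties broken by the second descending.
second_descending = sorted(pairs, key=lambda p: (p[0], -p[1]))

print(default_order)      # [(1, 1), (1, 2), (2, 4), (2, 5)]
print(explicit_order)     # [(1, 1), (1, 2), (2, 4), (2, 5)]
print(second_descending)  # [(1, 2), (1, 1), (2, 5), (2, 4)]
```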
