Making my dictionary comprehension more efficient - python

I have a list of strings. For example:
lst = ['aa bb cc', 'dd ee ff gg']
Each string in the list is known to contain 2 or more whitespace delimited tokens.
I want to build a dictionary keyed by the last token with the first token as its value.
The following dictionary comprehension achieves this:
d = {e.split()[-1]: e.split()[0] for e in lst}
This gives me:
{'cc': 'aa', 'gg': 'dd'}
...which is exactly what I want.
However, this means that the element e will have its split() function called twice per iteration over lst.
I can't help thinking that there must be a way to avoid this but I just can't figure it out.
Any ideas?

Using map:
d = {v[-1]: v[0] for v in map(str.split, lst)}

You can use the walrus operator (Python 3.8+). But a comprehension is not meant to be more efficient than a loop, it's just shorter (I think).
lst = ['aa bb cc', 'dd ee ff gg']
d = {j[-1]: j[0] for i in lst if (j := i.split())}
print(d)
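One side effect of the `if (j := i.split())` filter is worth knowing: an empty list is falsy, so strings that are empty or contain only whitespace are silently skipped. A minimal sketch, with a made-up input list that is not from the question:

```python
# Hypothetical input: the second and third entries contain no tokens at all.
lst = ['aa bb cc', '', '   ', 'dd ee ff gg']

# i.split() returns [] for '' and '   ', which is falsy, so the filter drops them.
d = {j[-1]: j[0] for i in lst if (j := i.split())}
print(d)  # {'cc': 'aa', 'gg': 'dd'}
```

For the question's data (every string has at least two tokens) the filter is always true, so it only serves to bind `j`.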

As an alternative to other answers, you can make use of unpacking to get rid of the indexing, IF the elements in your list are always going to have at least 2 strings separated by spaces (which as you say, they do), otherwise this won't work:
d = {last: first for first, *_, last in map(str.split, lst)}
print(d)
# {'cc': 'aa', 'gg': 'dd'}
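To illustrate the caveat about needing at least two tokens, here is a small sketch of what happens otherwise. The `'solo'` entry is an assumption added for the demonstration and does not appear in the question:

```python
lst = ['aa bb cc', 'solo']  # 'solo' is a made-up entry with a single token

error = None
try:
    d = {last: first for first, *_, last in map(str.split, lst)}
except ValueError as exc:
    # Raised because ['solo'] cannot fill both `first` and `last`
    error = str(exc)

print(error)
```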

A helper function would be cleaner here, since you are doing more than just building a collection: you need to hold on to the split result so that you can read both index 0 and index -1 from it.
But it can be done in a single expression like so:
d = {e[-1]: e[0] for e in [item.split() for item in lst]}
This should work for you.
Updated answer for Python 3.8+:
I just recalled a recent addition that helps here. With an assignment expression you can also achieve this without the inner comprehension:
d = {parts[-1]: parts[0] for e in lst if (parts := e.split())}
https://www.python.org/dev/peps/pep-0572/
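The function-based approach this answer mentions could look roughly like the sketch below. The helper name `last_and_first` is hypothetical; the idea is simply to split once inside the function and hand back a (key, value) pair:

```python
def last_and_first(text):
    """Split once and return (last_token, first_token). Hypothetical helper."""
    tokens = text.split()
    return tokens[-1], tokens[0]

lst = ['aa bb cc', 'dd ee ff gg']

# dict() accepts an iterable of (key, value) pairs
d = dict(last_and_first(e) for e in lst)
print(d)  # {'cc': 'aa', 'gg': 'dd'}
```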

Related

Executing join of dictionary of list items

I have a dictionary of the form dict[keyA][key1] where 'key1' is a dictionary of lists. i.e., keyA is a dictionary of dictionaries of lists. Below is a sample of how the dictionary could be created.
dict = { 'keyA': { 'keyA1': ['L1','L2',...], 'keyA2': ['L','L',...], ... },
'keyB': { 'keyB1': ['L1','L2',...], 'key...': ['L','L',..], ...}
}
I need to join the values of the lists together and would like to do this with a construct like:
newStr = ' A B C '.join(val for val in (dict[keyA][k] for k in dict[keyA]))
This fails with an error that val is a 'list' vs. a string.
When I resolve val via two for loops, I get the individual strings, as I would expect the above to provide.
A simple example that works for one entry in the outer dictionary and prints each string:
for k in dict[keyA]:
    for val in dict[keyA][k]:
        print(val)
Example that does not work and prints a 'list':
for val in (dict[keyA][k] for k in dict[keyA]): print(val)
Output from the failing test above (note the enclosing brackets). Calling type() on this value confirms that it is indeed a list:
['some text', 'some more text', ....]
The working nested 'for' loops above produce the same text on separate lines without the brackets, as expected, and that output should work in the join to give the desired result.
Can anyone explain what I am missing here?
Your syntax for the "nested" comprehension isn't quite correct.
If you separate out the second portion, you'll see what's tripping it up:
>>> [_dict['keyA'][k] for k in _dict['keyA']]
[['L1', 'L2', 'L3'], ['Q1', 'Q2', 'Q3']]
The order for a nested comprehension isn't intuitive (IMO) and reads left-to-right in order of descending loops, instead of in an unrolling fashion which I think most people would initially assume.
In any case, you just need to adjust your comprehension:
>>> ' A B C '.join(val for key in _dict['keyA'] for val in _dict['keyA'][key])
'L1 A B C L2 A B C L3 A B C Q1 A B C Q2 A B C Q3'
Or using dict.items:
(Note: _ is used as a placeholder/throwaway here, since you don't actually need the "key" loop variable in this case)
>>> ' A B C '.join(val for _, v in _dict['keyA'].items() for val in v)
'L1 A B C L2 A B C L3 A B C Q1 A B C Q2 A B C Q3'
Also, as an aside, avoid using python built-ins as variable names, or you won't be able to use that built-in later on in your code.
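As a further option alongside the comprehensions above, `itertools.chain.from_iterable` flattens the inner lists directly. A minimal sketch, assuming sample data shaped like the question's (the name `data` and the list values are made up for illustration):

```python
from itertools import chain

# Sample data shaped like the question's dictionary (values are made up).
data = {'keyA': {'keyA1': ['L1', 'L2', 'L3'], 'keyA2': ['Q1', 'Q2', 'Q3']}}

# chain.from_iterable yields every item of every inner list, in order
joined = ' A B C '.join(chain.from_iterable(data['keyA'].values()))
print(joined)  # L1 A B C L2 A B C L3 A B C Q1 A B C Q2 A B C Q3
```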

Convert if-else to concise list comprehension

Suppose I have a string 'Hello World' and want a dictionary mapping each character to its frequency. The following code suffices, but if I need to use a list comprehension, how can I use 'if' and 'else'? Please provide your solutions.
for i in s:
    if i in d:
        d[i] = d[i] + 1
    else:
        d[i] = 1
You can use a dictionary comprehension (it doesn't make sense to use a list comprehension to build a dictionary):
s = 'Hello world'
d = {char: s.count(char) for char in set(s)}
The set(s) is a set of the unique characters in your string, and the comprehension creates a dictionary with the character as key and the number of occurrences (using str.count) as value.
But you don't need to use a comprehension at all; Python comes with "batteries included". In this case the battery is collections.Counter:
import collections
collections.Counter(s)
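A short sketch of what the resulting Counter offers beyond a plain dictionary, using the same string as above:

```python
from collections import Counter

s = 'Hello world'
counts = Counter(s)

print(counts['l'])            # 3
print(counts['z'])            # 0 (missing keys count as zero, no KeyError)
print(counts.most_common(2))  # [('l', 3), ('o', 2)]
```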
In case you really want to use a list comprehension (my personal opinion: you don't and shouldn't!) you need to work with side-effects, for example:
s = 'Hello world'
d = {}
[d.__setitem__(i, d[i]+1) if i in d else d.__setitem__(i, 1) for i in s]
print(d)
The __setitem__ calls are necessary because d[i] = 1 or d[i] = d[i] + 1 are assignments and therefore forbidden in comprehensions. But __setitem__ is the functional alternative.
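If the goal is only to shorten the if/else from the question, and not specifically to use a comprehension, `dict.get` with a default is one common way to collapse the branch. This is a sketch of that alternative, not something the answer above proposes:

```python
s = 'Hello world'
d = {}
for ch in s:
    # d.get(ch, 0) returns the current count, or 0 if ch has not been seen yet
    d[ch] = d.get(ch, 0) + 1

print(d)
```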

Get related dictionaries from lists

I have two lists of different dictionaries (ListA and ListB).
All dictionaries in listA have field "id" and "external_id"
All dictionaries in listB have field "num" and "external_num"
I need to get all pairs of dictionaries where value of external_id = num and value of external_num = id.
I can achieve that using this code:
for dictA in ListA:
    for dictB in ListB:
        if dictA["id"] == dictB["external_num"] and dictA["external_id"] == dictB["num"]:
            ...  # handle the matching pair here
But I have seen many beautiful Python expressions, and I guess it is possible to get that result in a more Pythonic style, isn't it?
Something like:
res = [A, B for A, B in listA, listB if A['id'] == B['extnum'] and A['ext'] == B['num']]
You are pretty close, but you aren't telling Python how you want to connect the two lists to get the pairs of dictionaries A and B.
If you want to compare all dictionaries in ListA to all in ListB, you need itertools.product:
from itertools import product
res = [(A, B) for A, B in product(ListA, ListB) if ...]
Alternatively, if you want pairs at the same indices, use zip:
res = [(A, B) for A, B in zip(ListA, ListB) if ...]
If you don't need the whole list built at once, note that in Python 2 you can use itertools.ifilter to pick the pairs you want lazily:
from itertools import ifilter, product
for A, B in ifilter(lambda (A, B): ...,
                    product(ListA, ListB)):
    # do whatever you want with A and B
(If you do this with zip, use itertools.izip instead to maximise performance.)
Notes on Python 3.x:
zip and filter no longer return lists, therefore itertools.izip and itertools.ifilter no longer exist (just as range has pushed out xrange) and you only need product from itertools; and
lambda (A, B): is no longer valid syntax; you will need to write the filtering function to take a single tuple argument lambda t: and e.g. replace A with t[0].
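Putting the Python 3 notes together, the lazy filtering approach might be sketched as follows. The sample dictionaries and the function name `is_match` are assumptions for illustration; the field names come from the question:

```python
from itertools import product

# Made-up sample data using the field names from the question.
ListA = [{"id": 1, "external_id": 10}, {"id": 2, "external_id": 20}]
ListB = [{"num": 10, "external_num": 1}, {"num": 20, "external_num": 99}]

def is_match(pair):
    # Python 3 lambdas and functions cannot unpack a tuple in the
    # parameter list, so unpack inside the body instead.
    A, B = pair
    return A["id"] == B["external_num"] and A["external_id"] == B["num"]

# filter() is lazy in Python 3, so pairs are produced one at a time
for A, B in filter(is_match, product(ListA, ListB)):
    print(A, B)
```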
Firstly, for code clarity, I actually would probably go with your first option - I don't think using for loops is particularly un-Pythonic, in this case. However, if you want to try using a list comprehension, there are a few things to be aware of:
Each item returned by the list comprehension needs to be just a singular item. Trying to return A, B is going to give you a SyntaxError. However, you can return either a list or a tuple (or anything else, that is a single object), so something like res = [(A,B) for...] would start working.
Another concern is how you're iterating over these lists - from your first snippet of code, it appears you don't make any assumptions about these lists lining up, meaning: you seem to be ok if the 2nd item in listA matches the 14th item in listB, so long as they match on the appropriate fields. That's perfectly reasonable, but just be aware that means you will need two for loops no matter how you try to do it*. And you still need your comparisons. So, as a list comprehension, you might try:
res = [(A, B) for A in listA for B in listB if A['id']==B['extnum'] and A['extid']==B['num']]
Then, in res, you'll have 0 or more tuples, and each tuple will contain the respective dictionaries you're interested in. To use them:
for tup in res:
    A = tup[0]
    B = tup[1]
    #....
or more concisely (and Pythonically):
for A, B in res:
    #...
since Python is smart enough to know that it's yielding an item (the tuple) that has 2 elements, and so it can directly assign them to A and B.
EDIT:* In retrospect, it isn't completely true that you need two for loops, and if your lists are big enough, it may be helpful, performance-wise, to make an intermediate dictionary such as this:
# make a dictionary with key=tuple, value=dictionary
interim = {(A['id'], A['extid']): A for A in listA}
for B in listB:
    tup = (B['extnum'], B['num'])  # order matters! match-up with A
    if tup in interim:
        A = interim[tup]
        print(A, B)
and, if the id-extid pair is not expected to be unique across all items in listA, then you'd want to look into collections.defaultdict with a list... but I'm not sure this still fits in the 'more Pythonic' category.
I realize this is likely overkill for the question you asked, but I couldn't let my 'two for loops' statement stand, since it's not entirely true.
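For the non-unique case mentioned above, a `collections.defaultdict` with a list could be used roughly like this. The sample data, including the extra `tag` field, is made up; the key names follow the snippets in this answer:

```python
from collections import defaultdict

# Made-up data: two entries in listA share the same (id, extid) pair.
listA = [{"id": 1, "extid": 10, "tag": "first"},
         {"id": 1, "extid": 10, "tag": "second"},
         {"id": 2, "extid": 20, "tag": "third"}]
listB = [{"num": 10, "extnum": 1}, {"num": 30, "extnum": 3}]

# Group every A under its (id, extid) key instead of overwriting duplicates
interim = defaultdict(list)
for A in listA:
    interim[(A["id"], A["extid"])].append(A)

res = []
for B in listB:
    # .get avoids inserting empty lists for keys that never occurred in listA
    for A in interim.get((B["extnum"], B["num"]), []):
        res.append((A, B))

print(len(res))  # 2
```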

Sort lines in a string, group data

I'm trying to group string like a map output.
Ex:
String = "
a,a
a,b
a,c
b,a
b,b
b,c"
Op:
a a,b,c
b a,b,c
Is this kind of output possible in a single step??
Use the built-in sorted:
In [863]: st=sorted(String.split())
Out[863]: ['aa', 'ab', 'ba', 'bb']
to print it:
In [865]: print '\n'.join(st)
aa
ab
ba
bb
list.sort sorts the list in place and returns None, which is why print(lines.sort()) prints None instead of your list! Show your list with lines.sort(); print(lines) ;)
Note that list.sort() sorts the list in-place, and does not return a new list. That's why
print(lines.sort())
is printing None. Try:
lines.sort() # This modifies lines to become a sorted version
print(lines)
Alternatively, there is the built-in sorted() function, which returns a sorted copy of the list, leaving the original unmodified. Use it like this:
print(sorted(lines))
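The difference between the two can be seen side by side in a small sketch (the list contents are made up):

```python
lines = ['b,c', 'a,b', 'b,a', 'a,a']

result = lines.sort()      # sorts in place and returns None
print(result)              # None
print(lines)               # ['a,a', 'a,b', 'b,a', 'b,c']

original = ['b', 'a']
sorted_copy = sorted(original)   # returns a new list
print(original, sorted_copy)     # ['b', 'a'] ['a', 'b']
```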
Because so far the other answers focus on sorting, I want to contribute this for the grouping issue:
String = """
a a
a b
a c
b a
b b
b c"""
from itertools import groupby
from operator import itemgetter

pairs = sorted(line.split() for line in String.split('\n') if line.strip())
for first, grouper in groupby(pairs, itemgetter(0)):
    print(first, "\t", ', '.join(second for _, second in grouper))
Out:
a a, b, c
b a, b, c
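The same grouping idea can be adapted to the comma-separated input shown in the question. This is a sketch under the assumption that each non-empty line has exactly one comma:

```python
from itertools import groupby

String = """
a,a
a,b
a,c
b,a
b,b
b,c"""

# Split each non-empty line into a [first, second] pair and sort,
# because groupby only groups adjacent items with equal keys.
pairs = sorted(line.split(',') for line in String.splitlines() if line.strip())

grouped = []
for first, group in groupby(pairs, key=lambda pair: pair[0]):
    grouped.append(first + ' ' + ','.join(second for _, second in group))

print('\n'.join(grouped))
# a a,b,c
# b a,b,c
```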

How to separate one list in two via list comprehension or otherwise

I have a list of dictionary items like so:
L = [{"a":1, "b":0}, {"a":3, "b":1}...]
I would like to split these entries based upon the value of "b", either 0 or 1.
A(b=0) = [{"a":1, "b":1}, ....]
B(b=1) = [{"a":3, "b":2}, .....]
I am comfortable with using simple list comprehensions, and i am currently looping through the list L two times.
A = [d for d in L if d["b"] == 0]
B = [d for d in L if d["b"] != 0]
Clearly this is not the most efficient way.
An else clause does not seem to be available within the list comprehension functionality.
Can I do what I want via list comprehension?
Is there a better way to do this?
I am looking for a good balance between readability and efficiency, leaning towards readability.
Thanks!
update:
Thanks everyone for the comments and ideas! The easiest one for me to read is the one by Thomas, but I will look at Alex's suggestion as well. I had not found any reference to the collections module before.
Don't use a list comprehension. List comprehensions are for when you want a single list result. You obviously don't :) Use a regular for loop:
A = []
B = []
for item in L:
    if item['b'] == 0:
        target = A
    else:
        target = B
    target.append(item)
You can shorten the snippet by doing, say, (A, B)[item['b'] != 0].append(item), but why bother?
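For completeness, the shortened form mentioned above works because a comparison result is a bool, and `False` and `True` index a tuple as 0 and 1. A runnable sketch with made-up data:

```python
L = [{"a": 1, "b": 0}, {"a": 3, "b": 1}, {"a": 5, "b": 0}]

A, B = [], []
for item in L:
    # False selects A (index 0), True selects B (index 1)
    (A, B)[item['b'] != 0].append(item)

print(A)  # [{'a': 1, 'b': 0}, {'a': 5, 'b': 0}]
print(B)  # [{'a': 3, 'b': 1}]
```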
If the b value can be only 0 or 1, @Thomas's simple solution is probably best. For a more general case (in which you want to discriminate among several possible values of b -- your sample "expected results" appear to be completely divorced from and contradictory to your question's text, so it's far from obvious whether you actually need some generality;-):
from collections import defaultdict
separated = defaultdict(list)
for x in L:
    separated[x['b']].append(x)
When this code executes, separated ends up with a dict (actually an instance of collections.defaultdict, a dict subclass) whose keys are all values for b that actually occur in dicts in list L, the corresponding values being the separated sublists. So, for example, if b takes only the values 0 and 1, separated[0] would be what (in your question's text as opposed to the example) you want as list A, and separated[1] what you want as list B.
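A runnable version of the description above, with made-up data that includes a third value of b to show the more general case:

```python
from collections import defaultdict

# Made-up data; b takes the values 0, 1 and 2 here.
L = [{"a": 1, "b": 0}, {"a": 3, "b": 1}, {"a": 5, "b": 0}, {"a": 7, "b": 2}]

separated = defaultdict(list)
for x in L:
    separated[x['b']].append(x)

print(sorted(separated))  # [0, 1, 2]
print(separated[0])       # [{'a': 1, 'b': 0}, {'a': 5, 'b': 0}]
```

Note that indexing a defaultdict with a key that never occurred creates an empty list for it, so use `separated.get(key, [])` if that side effect is unwanted.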
