Executing join of dictionary of list items


I have a dictionary of the form dict[keyA][key1], where dict[keyA] is a dictionary of lists, i.e., dict is a dictionary of dictionaries of lists. Below is a sample of how the dictionary could be created.
dict = { 'keyA': { 'keyA1': ['L1','L2',...], 'keyA2': ['L','L',...], ... },
'keyB': { 'keyB1': ['L1','L2',...], 'key...': ['L','L',..], ...}
}
I need to join the values of the lists together and would like to do this with a construct like:
newStr = ' A B C '.join(val for val in (dict[keyA][k] for k in dict[keyA]))
This fails with an error saying that val is a 'list' rather than a string.
When I resolve val via two for loops, I get the individual strings, which is what I expected the expression above to provide.
A simple example that works for one entry in the outer dictionary and prints each string:
for k in dict[keyA]:
    for val in dict[keyA][k]:
        print(val)
Example that does not work and prints a 'list':
for val in (dict[keyA][k] for k in dict[keyA]): print(val)
Output from the failing test above (note the enclosing brackets). If I call type() on this value, it confirms that the value is indeed a list:
['some text', 'some more text', ....]
The working nested 'for' loops above produce the same text on separate lines without the brackets, as expected, and that output should work in the join to give the desired result.
Can anyone explain what I am missing here?

Your syntax for the "nested" comprehension isn't quite correct.
If you separate out the second portion, you'll see what's tripping it up:
>>> [_dict['keyA'][k] for k in _dict['keyA']]
[['L1', 'L2', 'L3'], ['Q1', 'Q2', 'Q3']]
The order for a nested comprehension isn't intuitive (IMO): the for clauses read left to right in the same order as the equivalent nested loops, outermost first, rather than inside-out, which I think most people would initially assume.
In any case, you just need to adjust your comprehension:
>>> ' A B C '.join(val for key in _dict['keyA'] for val in _dict['keyA'][key])
'L1 A B C L2 A B C L3 A B C Q1 A B C Q2 A B C Q3'
Or using dict.items:
(Note: _ is used as a placeholder/throwaway here, since you don't actually need the "key" loop variable in this case)
>>> ' A B C '.join(val for _, v in _dict['keyA'].items() for val in v)
'L1 A B C L2 A B C L3 A B C Q1 A B C Q2 A B C Q3'
Also, as an aside, avoid using Python built-ins such as dict as variable names; doing so shadows the built-in, and you won't be able to use it later on in your code.
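To make the loop order concrete, here is a minimal sketch. The name `data` and its contents are made-up sample values standing in for the question's dictionary. It shows that the flattened comprehension produces the same items as the two explicit for loops, and that `itertools.chain.from_iterable` (an alternative not used in the answer above) flattens the inner lists as well:

```python
from itertools import chain

# Hypothetical sample data standing in for the question's dictionary.
data = {'keyA': {'keyA1': ['L1', 'L2', 'L3'], 'keyA2': ['Q1', 'Q2', 'Q3']}}

# Explicit nested loops: outer loop first, inner loop second.
flat_loops = []
for key in data['keyA']:
    for val in data['keyA'][key]:
        flat_loops.append(val)

# The same loops as a comprehension, written in the same left-to-right order.
flat_comp = [val for key in data['keyA'] for val in data['keyA'][key]]

# Alternative: chain the inner lists together without naming the keys.
flat_chain = list(chain.from_iterable(data['keyA'].values()))

joined = ' A B C '.join(flat_comp)
print(joined)
```

All three approaches rely on the dictionary preserving insertion order, which is guaranteed from Python 3.7 onward.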

Related

Making my dictionary comprehension more efficient

I have a list of strings. For example:
lst = ['aa bb cc', 'dd ee ff gg']
Each string in the list is known to contain 2 or more whitespace delimited tokens.
I want to build a dictionary keyed by the last token with the first token as its value.
The following dictionary comprehension achieves this:
d = {e.split()[-1]: e.split()[0] for e in lst}
This gives me:
{'cc': 'aa', 'gg': 'dd'}
...which is exactly what I want.
However, this means that the element e will have its split() function called twice per iteration over lst.
I can't help thinking that there must be a way to avoid this but I just can't figure it out.
Any ideas?

Using map:
d = {v[-1]: v[0] for v in map(str.split, lst)}

You can use the walrus operator. Note that a comprehension is not necessarily more efficient than a loop; it's mainly shorter (I think).
lst = ['aa bb cc', 'dd ee ff gg']
d={j[-1]:j[0] for i in lst if (j:=i.split())}
print(d)

As an alternative to other answers, you can make use of unpacking to get rid of the indexing, IF the elements in your list are always going to have at least 2 strings separated by spaces (which as you say, they do), otherwise this won't work:
d = {last: first for first, *_, last in map(str.split, lst)}
print(d)
# {'cc': 'aa', 'gg': 'dd'}

A helper function would be cleaner here, because you are carrying out an extra step (the split) and not just building a collection. Since you want to hold the split result once, you need some way to access index 0 and index -1 of it.
But it can be done like so:
d = {e[-1]: e[0] for e in [item.split() for item in lst]}
This should work for you.
Updated answer for Python 3.8+
I just recalled a recent addition that helps here. You can also achieve this without the inner comprehension:
d = {parts[-1]: parts[0] for e in lst if (parts := e.split())}
https://www.python.org/dev/peps/pep-0572/
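For comparison, the helper-function idea mentioned in this answer could look like the sketch below. The function name `last_to_first` is made up for illustration; it splits each string exactly once and skips strings that contain no tokens at all:

```python
def last_to_first(strings):
    """Map the last whitespace-delimited token of each string to its first."""
    result = {}
    for s in strings:
        tokens = s.split()  # split exactly once per string
        if tokens:          # skip empty or whitespace-only strings
            result[tokens[-1]] = tokens[0]
    return result

lst = ['aa bb cc', 'dd ee ff gg']
d = last_to_first(lst)
print(d)
```

A plain loop like this is a little longer than the comprehension, but it leaves room for extra validation if the input turns out to be less regular than expected.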

Matching a list of dictionaries A to list C with list B having common properties of A and C in Python?

I have three lists of dictionaries, A, B and C. They look like:
A = [{propA1: valueA1}, {propA1: valueA2}, ...]
B = [{propB1: valueB1, propB2: valueB2}, {propB1: valueB3, propB2: value4}, ...]
C = [{propC1: valueC1}, {propC1: valueC2}, ...]
propA1 and propB1 are the same property under different names, and so are propB2 and propC1.
However, propA1 and propB1 do not always have the same values, and I am only interested in the "set intersection" of [valueA1, valueA2, ...] and [valueB1, valueB3, ...]. Here is the goal: I want to return every propB2 from B whose propB1 counterpart (in the same dictionary) matches a propA1 in A. Then I will use that propB2 set to match against propC1 in C.
What I have tried:
propB2_match = set()
for elementB in B:
    for elementA in A:
        if elementB['propB1'] == elementA['propA1']:
            propB2_match.add(elementB['propB2'])
            break
At the end of this loop, I have propB2_match containing all of propB2 that I can use to match with propC1.
However, as you can see, this is an expensive O(n^2) loop. Is there a way to handle this in O(n)? If not, is there any Pythonic optimization that can be applied to it?
Note: I do not want to put it in a database and use relational database SQL to handle the join operation.

If I understand correctly, you are essentially trying to do a JOIN on A and B where A['propA1'] == B['propB1'].
Here's one way using defaultdict that's O(len(A)+len(B)):
from collections import defaultdict
A = [{'pA1': 'vA1'}, {'pA1': 'vA2'}]
B = [{'pB1': 'vA1', 'pB2': 'vB2'}, {'pB1': 'vB3', 'pB2': 'v4'}]
# Key by the value you want to group on
kA = [(x['pA1'],x) for x in A]
kB = [(x['pB1'],x) for x in B]
# Combine the lists
kAB = kA+kB
# Map each unique key to a list of elements that have that key
results = defaultdict(list)
for x in kAB:
    results[x[0]].append(x[1])
for x in results:
    print(results[x])
Outputs (Python 3.7+, where dictionaries keep insertion order):
[{'pA1': 'vA1'}, {'pB1': 'vA1', 'pB2': 'vB2'}]
[{'pA1': 'vA2'}]
[{'pB1': 'vB3', 'pB2': 'v4'}]
At this point you could merge each list of dicts into a single dict or whatever you need, and use the result to JOIN with the third list C.
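If only the matching propB2 values are needed, as in the question, a set lookup is another way to stay linear. The following is a minimal sketch with made-up sample values; it assumes the property values are hashable, and the names `a_values` and `matched_C` are introduced here for illustration:

```python
# Hypothetical sample data; the key names follow the question.
A = [{'propA1': 'v1'}, {'propA1': 'v2'}]
B = [{'propB1': 'v1', 'propB2': 'x'}, {'propB1': 'v9', 'propB2': 'y'}]
C = [{'propC1': 'x'}, {'propC1': 'z'}]

# One pass over A: collect the values to match against.
a_values = {a['propA1'] for a in A}

# One pass over B: keep propB2 where propB1 also appears in A.
propB2_match = {b['propB2'] for b in B if b['propB1'] in a_values}

# One pass over C: keep the dictionaries whose propC1 was matched.
matched_C = [c for c in C if c['propC1'] in propB2_match]

print(propB2_match)
print(matched_C)
```

Set membership tests take constant time on average, so the total work is roughly O(len(A) + len(B) + len(C)).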

Get related dictionaries from lists

I have two lists of different dictionaries (ListA and ListB).
All dictionaries in listA have field "id" and "external_id"
All dictionaries in listB have field "num" and "external_num"
I need to get all pairs of dictionaries where value of external_id = num and value of external_num = id.
I can achieve that using this code:
for dictA in ListA:
    for dictB in ListB:
        if dictA["id"] == dictB["external_num"] and dictA["external_id"] == dictB["num"]:
            pass  # use the matching pair (dictA, dictB) here
But I have seen many beautiful Python expressions, and I guess it is possible to get this result in a more Pythonic style, isn't it?
Something like:
res = [A, B for A, B in listA, listB if A['id'] == B['extnum'] and A['ext'] == B['num']]

You are pretty close, but you aren't telling Python how you want to connect the two lists to get the pairs of dictionaries A and B.
If you want to compare all dictionaries in ListA to all in ListB, you need itertools.product:
from itertools import product
res = [(A, B) for A, B in product(ListA, ListB) if ...]
Alternatively, if you want pairs at the same indices, use zip:
res = [(A, B) for A, B in zip(ListA, ListB) if ...]
If you don't need the whole list building at once, note that you can use itertools.ifilter to pick the pairs you want:
from itertools import ifilter, product
for A, B in ifilter(lambda (A, B): ...,
                    product(ListA, ListB)):
    # do whatever you want with A and B
(if you do this with zip, use itertools.izip instead to maximise performance).
Notes on Python 3.x:
zip and filter no longer return lists, therefore itertools.izip and itertools.ifilter no longer exist (just as range has pushed out xrange) and you only need product from itertools; and
lambda (A, B): is no longer valid syntax; you will need to write the filtering function to take a single tuple argument lambda t: and e.g. replace A with t[0].
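Putting the Python 3 notes together, a lazy version could be sketched as follows. The sample dictionaries are made up, the field names are taken from the question, and the helper name `is_match` is hypothetical:

```python
from itertools import product

# Hypothetical sample data using the field names from the question.
ListA = [{'id': 1, 'external_id': 10}, {'id': 2, 'external_id': 20}]
ListB = [{'num': 10, 'external_num': 1}, {'num': 99, 'external_num': 2}]

def is_match(pair):
    """Take one (A, B) tuple, since Python 3 functions cannot unpack it in the signature."""
    A, B = pair
    return A['id'] == B['external_num'] and A['external_id'] == B['num']

# filter() is lazy in Python 3, so pairs are tested one at a time.
matches = list(filter(is_match, product(ListA, ListB)))
print(matches)
```

Wrapping the result in list() is only for display; iterating over the filter object directly keeps the evaluation lazy.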

Firstly, for code clarity, I actually would probably go with your first option - I don't think using for loops is particularly un-Pythonic, in this case. However, if you want to try using a list comprehension, there are a few things to be aware of:
Each item returned by the list comprehension needs to be just a singular item. Trying to return A, B is going to give you a SyntaxError. However, you can return either a list or a tuple (or anything else, that is a single object), so something like res = [(A,B) for...] would start working.
Another concern is how you're iterating over these lists. From your first snippet of code, it appears you don't make any assumptions about these lists lining up, meaning you seem to be OK if the 2nd item in listA matches the 14th item in listB, so long as they match on the appropriate fields. That's perfectly reasonable, but just be aware that it means you will need two for loops no matter how you try to do it*. And you still need your comparisons. So, as a list comprehension, you might try:
res = [(A, B) for A in listA for B in listB if A['id']==B['extnum'] and A['extid']==B['num']]
Then, in res, you'll have 0 or more tuples, and each tuple will contain the respective dictionaries you're interested in. To use them:
for tup in res:
    A = tup[0]
    B = tup[1]
    #....
or more concisely (and Pythonically):
for A, B in res:
    #...
since Python is smart enough to know that it's yielding an item (the tuple) that has 2 elements, and so it can directly assign them to A and B.
EDIT:* In retrospect, it isn't completely true that you need two for loops, and if your lists are big enough, it may be helpful, performance-wise, to make an intermediate dictionary such as this:
# make a dictionary with key=tuple, value=dictionary
interim = {(A['id'], A['extid']): A for A in listA}
for B in listB:
    tup = (B['extnum'], B['num'])  ## order matters! match-up with A
    if tup in interim:
        A = interim[tup]
        print(A, B)
and, if the id-extid pair is not expected to be unique across all items in listA, then you'd want to look into collections.defaultdict with a list... but I'm not sure this still fits in the 'more Pythonic' category.
I realize this is likely overkill for the question you asked, but I couldn't let my 'two for loops' statement stand, since it's not entirely true.
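The non-unique case mentioned above could be sketched like this, assuming the same 'id', 'extid', 'extnum' and 'num' keys used in this answer. The sample data and the extra 'tag' field are made up so that the duplicates can be told apart:

```python
from collections import defaultdict

# Hypothetical sample data; two entries in listA share the same key pair.
listA = [{'id': 1, 'extid': 10, 'tag': 'first'},
         {'id': 1, 'extid': 10, 'tag': 'second'},
         {'id': 2, 'extid': 20, 'tag': 'third'}]
listB = [{'num': 10, 'extnum': 1}, {'num': 30, 'extnum': 3}]

# Map each (id, extid) pair to every dictionary in listA that has it.
interim = defaultdict(list)
for A in listA:
    interim[(A['id'], A['extid'])].append(A)

res = []
for B in listB:
    # .get() avoids creating empty entries for keys that are missing
    for A in interim.get((B['extnum'], B['num']), []):
        res.append((A, B))

print(res)
```

Each dictionary in listB is then paired with every dictionary in listA that shares its key, rather than with only the last one stored.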

Sort lines in a string, group data

I'm trying to group the lines of a string, like the output of a map step.
Ex:
String = "
a,a
a,b
a,c
b,a
b,b
b,c"
Op:
a a,b,c
b a,b,c
Is this kind of output possible in a single step??

Use the built-in sorted:
In [863]: st=sorted(String.split())
Out[863]: ['aa', 'ab', 'ba', 'bb']
to print it:
In [865]: print('\n'.join(st))
aa
ab
ba
bb
list.sort sorts the list in place and returns None, which is why print(lines.sort()) shows nothing useful. To show your sorted list, use lines.sort(); print(lines) ;)

Note that list.sort() sorts the list in-place, and does not return a new list. That's why
print(lines.sort())
is printing None. Try:
lines.sort() # This modifies lines to become a sorted version
print(lines)
Alternatively, there is the built-in sorted() function, which returns a sorted copy of the list, leaving the original unmodified. Use it like this:
print(sorted(lines))

Because so far the other answers focus on sorting, I want to contribute this for the grouping issue:
String = """
a a
a b
a c
b a
b b
b c"""
pairs = sorted(line.split() for line in String.split('\n') if line.strip())
from operator import itemgetter
from itertools import groupby
for first, grouper in groupby(pairs, itemgetter(0)):
    print(first, "\t", ', '.join(second for _, second in grouper))
Out:
a a, b, c
b a, b, c
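As a variation on the grouping above, the comma-separated form from the question can be grouped with a collections.defaultdict instead of groupby, which means the lines do not have to be sorted first. This is only a sketch: the variable names are made up, and it assumes every non-blank line contains exactly one comma:

```python
from collections import defaultdict

# The question's data, written as a triple-quoted string.
text = """
a,a
a,b
a,c
b,a
b,b
b,c"""

groups = defaultdict(list)
for line in text.splitlines():
    if line.strip():  # skip blank lines
        first, second = line.strip().split(',')
        groups[first].append(second)

# Build one output line per group, e.g. "a a,b,c".
lines_out = [f"{first} {','.join(seconds)}" for first, seconds in groups.items()]
print('\n'.join(lines_out))
```

The groups come out in the order in which each first field was first seen; wrap `groups.items()` in `sorted()` if sorted output is required.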

How to get Python to tell equal integers apart

I have a bit of a problem distinguishing between identical integers.
In the following (which is obviously a trivial case), a, b, c are integers. I wish to create a dictionary, diction, which will contain {a: 'foo', b: 'bar', c: 'baz'}
diction = {}
for i in (a, b, c):
    j = ('foo', 'bar', 'baz')[(a, b, c).index(i)]
    diction[i] = j
All runs very nicely until, for example, a and b are the same: the third line will give index 0 for both a and b, resulting in j = 'foo' for each case.
I know lists can be copied by
list_a = [1, 2, 3]
list_b = list(list_a)
or
list_b = list_a[:]
So, is there any way of maybe doing this with my identical integers?
(I tried making one a float, but the value remains the same, so that doesn't work.)

To create a dictionary from two different iterables, you can use the following code:
d = dict(zip((a, b, c), ('foo', 'bar', 'baz')))
where zip is used to combine both iterables in a list of tuples that can be passed to the dictionary constructor.
Note that if a == b, then 'foo' will be overwritten with 'bar', since the values are added to the dictionary in the same order as they appear in the iterable, as if you were using this code:
d[a] = 'foo'
d[b] = 'bar'
d[c] = 'baz'
This is just the standard behaviour of a dictionary: when a new value is assigned to a key that is already present, the old value is overwritten.
If you prefer to keep all values in a list, then you can use a collections.defaultdict as follows:
from collections import defaultdict
d = defaultdict(list)
for key, value in zip((a, b, c), ('foo', 'bar', 'baz')):
    d[key].append(value)
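A small worked example may make the difference between the two approaches clearer. The values assigned to a, b and c are made up, with a and b deliberately equal:

```python
from collections import defaultdict

# Hypothetical values: a and b are equal on purpose.
a, b, c = 1, 1, 2

# Plain dict: the later value for an equal key replaces the earlier one.
overwritten = dict(zip((a, b, c), ('foo', 'bar', 'baz')))

# defaultdict(list): every value for an equal key is kept.
grouped = defaultdict(list)
for key, value in zip((a, b, c), ('foo', 'bar', 'baz')):
    grouped[key].append(value)

print(overwritten)
print(dict(grouped))
```

With the plain dictionary, 'foo' is lost because a and b are the same key; with the defaultdict, both 'foo' and 'bar' are stored under that key.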

You can't distinguish between identical objects.

You can tell them apart if they do not fall between -5 and 256
See also "is" operator behaves unexpectedly with integers
http://docs.python.org/c-api/int.html
The current implementation keeps an array of integer objects for all
integers between -5 and 256, when you create an int in that range you
actually just get back a reference to the existing object. So it
should be possible to change the value of 1. I suspect the behaviour
of Python in this case is undefined. :-)
In [30]: a = 257
In [31]: a is 257
Out[31]: False
In [32]: a = 256
In [33]: a is 256
Out[33]: True
You may have to roll your own dictionary like object that implements this sort of behavior though... and it still wouldn't be able to do anything between -5 and 256. I'd need to do more digging to be sure though.

If a and b have the same value, then you can't expect them to point to different positions in a dictionary when used as keys. Keys in a dictionary must be unique.
Also if you have two sequences the simplest way to make a dictionary out of them is to zip them together:
tup = (a,b,c)
val = ('foo', 'bar', 'baz')
diction = dict(zip(tup, val))

All of the answers so far are correct - identical keys can't be re-used in a dictionary. If you absolutely have to try to do something like this, but can't ensure that a, b, and c have distinct values you could try something like this:
d = dict(zip((id(k) for k in (a,b,c)), ('foo', 'bar', 'baz')))
When you go to look up your values though, you'll have to remember to do so like this:
d[id(a)]
That might help, but I am not certain what you're actually after here.
