Python consolidate dictionary keys and values using comprehension - python

How do I CONSOLIDATE the following using python COMPREHENSION
FROM (list of dicts)
[
{'server':'serv1','os':'Linux','archive':'/my/folder1'}
,{'server':'serv2','os':'Linux','archive':'/my/folder1'}
,{'server':'serv3','os':'Linux','archive':'/my/folder2'}
,{'server':'serv4','os':'AIX','archive':'/my/folder1'}
,{'server':'serv5','os':'AIX','archive':'/my/folder1'}
]
TO (list of dicts with tuple as key and list of 'server#'s as value
[
{('Linux','/my/folder1'):['serv1','serv2']}
,('Linux','/my/folder2'):['serv3']}
.{('AIX','/my/folder1'):['serv4','serv5']}
]

the need to be able to set default values to your dictionary and to have the same key several times may make a dict-comprehension a bit clumsy. i'd prefer something like this:
a defaultdict may help:
from collections import defaultdict
lst = [
{'server':'serv1','os':'Linux','archive':'/my/folder1'},
{'server':'serv2','os':'Linux','archive':'/my/folder1'},
{'server':'serv3','os':'Linux','archive':'/my/folder2'},
{'server':'serv4','os':'AIX','archive':'/my/folder1'},
{'server':'serv5','os':'AIX','archive':'/my/folder1'}
]
dct = defaultdict(list)
for d in lst:
key = d['os'], d['archive']
dct[key].append(d['server'])
if you prefer to have a standard dictionary in the end (actually i do not really see a good reason for that) you could use dict.setdefault in order to create an empty list where the key does not yet exist:
dct = {}
for d in lst:
key = d['os'], d['archive']
dct.setdefault(key, []).append(d['server'])
the documentation on defaultdict (vs. setdefault):
This technique is simpler and faster than an equivalent technique
using dict.setdefault()

It's difficult to achieve with list comprehension because of the accumulation effect. However, it's possible using itertools.groupby on the list sorted by your keys (use the same key function for both sorting and grouping).
Then extract the server info in a list comprehension and prefix by the group key. Pass the resulting (group key, server list) to dictionary comprehension and here you go.
import itertools
lst = [
{'server':'serv1','os':'Linux','archive':'/my/folder1'}
,{'server':'serv2','os':'Linux','archive':'/my/folder1'}
,{'server':'serv3','os':'Linux','archive':'/my/folder2'}
,{'server':'serv4','os':'AIX','archive':'/my/folder1'}
,{'server':'serv5','os':'AIX','archive':'/my/folder1'}
]
sortfunc = lambda x : (x['os'],x['archive'])
result = {k:[x['server'] for x in v] for k,v in itertools.groupby(sorted(lst,key=sortfunc),key = sortfunc)}
print(result)
I get:
{('Linux', '/my/folder1'): ['serv1', 'serv2'], ('AIX', '/my/folder1'): ['serv4', 'serv5'], ('Linux', '/my/folder2'): ['serv3']}
Keep in mind that it's not because it can be written in one line that it's more efficient. The defaultdict approach doesn't require sorting for instance.

Related

accelerate comparing dictionary keys and values to strings in list in python

Sorry if this is trivial I'm still learning but I have a list of dictionaries that looks as follow:
[{'1102': ['00576', '00577', '00578', '00579', '00580', '00581']},
{'1102': ['00582', '00583', '00584', '00585', '00586', '00587']},
{'1102': ['00588', '00589', '00590', '00591', '00592', '00593']},
{'1102': ['00594', '00595', '00596', '00597', '00598', '00599']},
{'1102': ['00600', '00601', '00602', '00603', '00604', '00605']}
...]
it contains ~89000 dictionaries. And I have a list containing 4473208 paths. example:
['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv',
'/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv',
...]
and what I want to do is group each path that contains the grouped values in the dict in the folder containing the key together.
I tried using for loops like this:
grpd_cts = []
for elem in tqdm(dict_list):
temp1 = []
for file in ct_paths:
for key, val in elem.items():
if (file[16:20] == key) and (any(x in file[21:26] for x in val)):
temp1.append(file)
grpd_cts.append(temp1)
but this takes around 30hours. is there a way to make it more efficient? any itertools function or something?
Thanks a lot!
ct_paths is iterated repeatedly in your inner loop, and you're only interested in a little bit of it for testing purposes; pull that out and use it to index the rest of your data, as a dictionary.
What does make your problem complicated is that you're wanting to end up with the original list of filenames, so you need to construct a two-level dictionary where the values are lists of all originals grouped under those two keys.
ct_path_index = {}
for f in ct_paths:
ct_path_index.setdefault(f[16:20], {}).setdefault(f[21:26], []).append(f)
grpd_cts = []
for elem in tqdm(dict_list):
temp1 = []
for key, val in elem.items():
d2 = ct_path_index.get(key)
if d2:
for v in val:
v2 = d2.get(v)
if v2:
temp1 += v2
grpd_cts.append(temp1)
ct_path_index looks like this, using your data:
{'1102': {'00575': ['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv'],
'00578': ['/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv']}}
The use of setdefault (which can be a little hard to understand the first time you see it) is important when building up collections of collections, and is very common in these kinds of cases: it makes sure that the sub-collections are created on demand and then re-used for a given key.
Now, you've only got two nested loops; the inner checks are done using dictionary lookups, which are close to O(1).
Other optimizations would include turning the lists in dict_list into sets, which would be worthwhile if you made more than one pass through dict_list.

Is there any equivalent of in-place slice assignment for dict objects?

In Python, the content of a list instance can be transformed nicely using a list comprehension—for example:
lst = [item for item in lst if item.startswith( '_' )]
Furthermore, if you want to change lst in place (so that any existing references to the old lst now see the updated content) then you can use the following slice-assignment idiom:
lst[:] = [item for item in lst if item.startswith( '_' )]
My question is: is there any equivalent to this for dict objects? Sure, you can transform content using a dict comprehension—for example:
dct = { key:value for key,value in dct.items() if key.startswith( '_' ) }
but this is not an in-place change—the name dct gets re-bound to a new dict instance. Since dct{:} = ... is not legal syntax, the best I can come up with to change a dict in-place is:
tmp = { key:value for key,value in dct.items() if key.startswith( '_' ) }
dct.clear()
dct.update(tmp)
Any time I see a simple operation require three lines and a variable called tmp, I tend to think it's not very Pythonic. Is there any slicker way to do this?
edit: I guess the filtering and the assignment are separate issues. Really this question is about the assignment—I want to allow arbitrary transformations of the dict content, rather than necessarily just selective removal of some keys.

How to convert a list of tuples containing two lists into dictionary of key value pairs?

I have a list like this-
send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
I want something like
[ {['produce_send']:['consume_recv']},{['Send']:['Recv']},{['sender2']:['receiver2']}
How to do this?
You can not use list as the key of dictionary.
This Article explain the concept,
https://wiki.python.org/moin/DictionaryKeys
To be used as a dictionary key, an object must support the hash function (e.g. through hash), equality comparison (e.g. through eq or cmp), and must satisfy the correctness condition above.
And
lists do not provide a valid hash method.
>>> d = {['a']: 1}
TypeError: unhashable type: 'list'
If you want to specifically differentiate the key values you can use tuple as they hash able
{ (i[0][0], ): (i[1][0], ) for i in send_recv_pairs}
{('Send',): ('Recv',),
('produce_send',): ('consume_recv',),
('sender2',): ('receiver2',)}
You can't have lists as keys, only hashable types - strings, numbers, None and such.
If you still want to use a dictionary knowing that, then:
d={}
for tup in send_recv_pairs:
d[tup[0][0]]=tup[1]
If you want the value to be string as well, use tup[1][0] instead of tup[1]
As a one liner:
d={tup[0][0]]:tup[1] for tup in list} #tup[1][0] if you want values as strings
You can check it over here, in the second way of creating distionary.
https://developmentality.wordpress.com/2012/03/30/three-ways-of-creating-dictionaries-in-python/
A Simple way of doing it,
First of all, your tuple is tuple of lists, so better change it to tuple of strings (It makes more sense I guess)
Anyway simple way of working with your current tuple list can be like :
mydict = {}
for i in send_recv_pairs:
print i
mydict[i[0][0]]= i[1][0]
As others pointed out, you cannot use list as key to dictionary. So the term i[0][0] first takes the first element from the tuple - which is a list- and then the first element of list, which is the only element anyway for you.
Do you mean like this?
send_recv_pairs = [(['produce_send'], ['consume_recv']),
(['Send'], ['Recv']),
(['sender2'], ['receiver2'])]
send_recv_dict = {e[0][0]: e[1][0] for e in send_recv_pairs}
Resulting in...
>>> {'produce_send': 'consume_recv', 'Send': 'Recv', 'sender2': 'receiver2'}
As mentioned in other answers, you cannot use a list as a dictionary key as it is not hashable (see links in other answers).
You can therefore just use the values in your lists (assuming they stay as simple as in your example) to create the following two possibilities:
send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
result1 = {}
for t in send_recv_pairs:
result1[t[0][0]] = t[1]
# without any lists
result2 = {}
for t in send_recv_pairs:
result2[t[0][0]] = t[1][0]
Which respectively gives:
>>> result1
{'produce_send': ['consume_recv'], 'Send': ['Recv'], 'sender2': ['receiver2']}
>>> result2
{'produce_send': 'consume_recv', 'Send': 'Recv', 'sender2': 'receiver2'}
Try like this:
res = { x[0]: x[1] for x in pairs } # or x[0][0]: x[1][0] if you wanna store inner values without list-wrapper
It's for Python 3 and when keys are unique. If you need collect list of values per key, instead of single value, than you may use something like itertools.groupby or map+reduce. Wrote about this in comments and I'll provide example.
And yes, list cannot store key-values, only dict's, but maybe it's just typo in question.
You can not use list as the dictionary key, but instead you may type-cast it as tuple to create the dict object.
Below is the sample example using a dictionary comprehension:
>>> send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
>>> {tuple(k): v for k, v in send_recv_pairs}
{('sender2',): ['receiver2'], ('produce_send',): ['consume_recv'], ('Send',): ['Recv']}
For details, take a look at: Why can't I use a list as a dict key in python?
However if your nested tuple pairs were not list, but any other hashable object pairs, you may have type-casted it to dict for getting the desired result. For example:
>>> my_list = [('key1', 'value1'), ('key2', 'value2')]
>>> dict(my_list)
{'key1': 'value1', 'key2': 'value2'}

Converting name-value pair into Python dictionary

I have a long list of name-value pairs in Python 3 that represent a single row from a database. Since the number of attributes is fairly large per row, I'm wondering if there is a faster or more pythonic way to convert this into a dict than the following:
name_value_pairs = [{'Name':'id', 'Value':1}, {'Name':'age', 'Value':22}]
for pair in name_value_pairs:
result[pair['Name']] = pair['Value']
Use a dictionary comprehension:
result = dict( (item['Name'], item['Value']) for item in name_value_pairs)
I would suggest the newer dictionary comprehension syntax:
{item['Name']: item['Value'] for item in name_value_pairs}
as already mentioned in the comments by #instabrite.
It's been available since Python 2.7 (2010) and is faster. Some profiling:
Using the dict() constructor: dict( (item['Name'], item['Value']) for item in name_value_pairs)
>>> import timeit
>>> timeit.timeit("dict( (item['Name'], item['Value']) for item in name_value_pairs)",
... setup="name_value_pairs = [{'Name':'id', 'Value':1}, {'Name':'age', 'Value':22}]")
0.6748202270264301
Using dict comprehensions: {item['Name']: item['Value'] for item in name_value_pairs}
>>> timeit.timeit("{item['Name']: item['Value'] for item in name_value_pairs}",
... setup="name_value_pairs = [{'Name':'id', 'Value':1}, {'Name':'age', 'Value':22}]")
0.2638828816925525
As you can see, the dict comprehension is 2.5x faster. And imho, more Pythonic and more readable.

Slowness on iterating over namedtuple and dicts for defined pairs

This code runs quickly on the sample data, but when iterating over a large file, it seems to run slowly, perhaps because of the nested for loops. Is there any other reason why iterating over items in a defaultdict is slow?
import itertools
sample_genes1={"0002":["GENE1", "GENE2", "GENE3", "GENE4"],
"0003":["GENE1", "GENE2", "GENE3", "GENE6"],
"0202":["GENE4", "GENE2", "GENE1", "GENE7"]}
def get_common_gene_pairs(genelist):
genedict={}
for k,v in sample_genes1.items():
listofpairs=[]
for i in itertools.combinations(v,2):
listofpairs.append(i)
genedict[k]=listofpairs
return genedict
from collections import namedtuple,defaultdict
def get_gene_pair_pids(genelist):
i=defaultdict(list)
d=get_common_gene_pairs(sample_genes1)
Pub_genes=namedtuple("pair", ["gene1", "gene2"])
for p_id,genepairs in d.iteritems():
for p in genepairs:
thispair=Pub_genes(p[0], p[1])
if thispair in i.keys():
i[thispair].append(p_id)
else:
i[thispair]=[p_id,]
return i
if __name__=="__main__":
get_gene_pair_pids(sample_genes1)
One big problem: this line:
if thispair in i.keys():
doesn't take advantage of dictionary search, it's linear search. Drop the keys() call, let the dictionary do its fast lookup:
if thispair in i:
BUT since i is a default dict which creates a list when key doesn't exist, just replace:
if thispair in i.keys():
i[thispair].append(p_id) # i is defaultdict: even if thispair isn't in the dict, it will create a list and append p_id.
else:
i[thispair]=[p_id,]
by this simple statement:
i[thispair].append(p_id)
(it's even faster because there's only one hashing of p_id)
to sum it up:
don't do thispair in i.keys(): it's slow, in python 2, or 3, defaultdict or not
you have defined a defaultdict, but your code just assumed a dict, which works but slower.
Note: without defaultdict you could have just removed the .keys() or done this with a simple dict:
i.setdefault(thispair,list)
i[thispair].append(p_id)
(here the default item depends on the key)
Aside:
def get_common_gene_pairs(genelist):
genedict={}
for k,v in sample_genes1.items(): # should be genelist, not the global variable, you're not using the genelist parameter at all
And you're not using the values of sample_genes1 at all in your code.
Adding to Jean's answer, your get_common_gene_pairs function could be optimized by using dict comprehension as:
def get_common_gene_pairs(genelist):
return {k : list(itertools.combinations(v,2)) for k,v in genelist.items()}
list.append() is much more time consuming compared to it's list comprhension counter part. Also, you don't have to iterate over itertools.combinations(v,2) in order to convert it to list. Type-casting it to list does that.
Here is the comparison I made between list comprehension and list.append() in answer to Comparing list comprehensions and explicit loops, in case you are interested in taking a look.

Categories

Resources