Sorting python dictionary by the keys - python

>>> the_values = [u'abc', u'kfc', u'listproperty', u'models', u'new', u'newer', u'note', u'order', u'tag', u'test', u'type']
>>> the_keys = [1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1]
>>> d2 = dict(zip(the_keys, the_values))
>>> d2
{1: u'type', 2: u'tag'}
Can you give me a clue why only "type" and "tag" are taken?
I am trying to sort the_values by the_keys.
I noticed that switching the order of the_values and the_keys works:
>>> d2 = dict(zip(the_values, the_keys))
>>> d2
{u'abc': 1, u'models': 2, u'note': 1, u'tag': 2, u'kfc': 2, u'newer': 1, u'listproperty': 1, u'test': 1, u'new': 1, u'type': 1, u'order': 1}
Thanks.

Keys must be unique, so using 1 and 2 as the only keys means you can only have two values associated with them.
When you create the dictionary, each key is set to correspond to a value: first 1 -> abc and 2 -> kfc. Every later pair with the same key then overwrites the value stored for it, so in the end only the last value for each key is kept (1 -> type, 2 -> tag).

As others have said, keys must be unique. In order to get around that, you must not use a dictionary.
>>> import operator
>>> [x[1] for x in sorted(zip(the_keys, the_values), key=operator.itemgetter(0))]
[u'abc', u'listproperty', u'new', u'newer', u'note', u'order', u'test', u'type', u'kfc', u'models', u'tag']

Because a dictionary, by definition, has unique keys. (Otherwise, how would it know which value to return when you look up the key?) The dict constructor iterates through the key-value pairs and assigns the value to the corresponding key in the dictionary, overwriting the previous value for keys that are already present.
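To make the overwriting concrete, here is a minimal sketch using the data from the question. It first shows the plain dict losing values, then one possible alternative: grouping the values in a list per key (the names `overwritten`, `grouped` and `sorted_values` are only illustrative).

```python
from collections import defaultdict

the_values = ['abc', 'kfc', 'listproperty', 'models', 'new', 'newer',
              'note', 'order', 'tag', 'test', 'type']
the_keys = [1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1]

# Plain dict: each repeated key overwrites the value stored before it.
overwritten = dict(zip(the_keys, the_values))

# Grouping: keep every value by collecting them in a list per key.
grouped = defaultdict(list)
for key, value in zip(the_keys, the_values):
    grouped[key].append(value)

# Flatten the groups in key order to get the values sorted by key.
sorted_values = [value for key in sorted(grouped) for value in grouped[key]]

print(overwritten)
print(sorted_values)
```

The grouped version gives the same ordering as the sorted(zip(...)) answer, while also keeping the key-to-values mapping around if you need it later.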

Related

How to append (key, value) pairs in a loop in Python

I want to create a new dict in a loop, but I can't find a way to add each key and value with append. I tried something like this, but I'm still looking for the right way.
frigo = {"mangue" : 2, "orange" : 8, "cassoulet" : 1, "thon" : 2, "coca" : 8, "fenouil" : 1, "lait" : 3}
new_frigo = {}
for i, (key, value) in enumerate(frigo.items()):
    print(i, key, value)
    new_frigo[i].append{key,value}
There's already a python function for that:
new_frigo.update(frigo)
No need for a loop! dict.update(other_dict) just goes and adds all content of the other_dict to the dict.
Anyway, if you wanted for some reason to do it with a loop,
for key, value in frigo.items():
    new_frigo[key] = value
would do that. Using an i here makes no sense: the dictionary new_frigo has keys, not indices.
You can use update to append the key and values in the dictionary as follows:
frigo = {"mangue": 2, "orange": 8, "cassoulet": 1, "thon": 2, "coca": 8, "fenouil": 1, "lait": 3}
new_frigo = {}
for (key, value) in frigo.items():
    new_frigo.update({key:value})
print(new_frigo)
Result:
{'mangue': 2, 'orange': 8, 'cassoulet': 1, 'thon': 2, 'coca': 8, 'fenouil': 1, 'lait': 3}
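For comparison, a short sketch of two loop-free variants, using a shortened frigo. The second one is only a guess at what the enumerate() in the question was aiming for (an index mapped to each key/value pair); the name `indexed_frigo` is made up for illustration.

```python
frigo = {"mangue": 2, "orange": 8, "cassoulet": 1}

# A shallow copy needs neither a loop nor update().
copy_frigo = dict(frigo)

# If the integer index really is wanted as the key, store the
# (key, value) pair under it (assumes Python 3.7+ insertion order).
indexed_frigo = {i: (key, value) for i, (key, value) in enumerate(frigo.items())}

print(copy_frigo)
print(indexed_frigo)
```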

connecting two dictionaries and storing it into an RDD

I have a dictionary users with 1748 elements (showing only the first 12 elements):
defaultdict(int,
{'470520068': 1,
'2176120173': 1,
'145087572': 3,
'23047147': 1,
'526506000': 1,
'326311693': 1,
'851106379': 4,
'161900469': 1,
'3222966471': 1,
'2562842034': 1,
'18658617': 1,
'73654065': 4,})
and another dictionary partition with 452743 elements (showing the first 42 elements):
{'609232972': 4,
'975151075': 4,
'14247572': 4,
'2987788788': 4,
'3064695250': 2,
'54097674': 3,
'510333371': 0,
'34150587': 4,
'26170001': 0,
'1339755391': 3,
'419536996': 4,
'2558131184': 2,
'23068646': 6,
'2781517567': 3,
'701206260771905541': 4,
'754263126': 4,
'33799684': 0,
'1625984816': 4,
'4893416104': 3,
'263520530': 3,
'60625681': 4,
'470528618': 3,
'4512063372': 6,
'933683112': 3,
'402379005': 4,
'1015823005': 2,
'244673821': 0,
'3279677882': 4,
'16206240': 4,
'3243924564': 6,
'2438275574': 6,
'205941266': 3,
'330723222': 1,
'3037002897': 0,
'75454729': 0,
'3033154947': 6,
'67475302': 3,
'922914019': 6,
'2598199242': 6,
'2382444216': 3,
'1388012203': 4,
'3950452641': 5,}
The keys in users (all unique) all appear in partition, where they are repeated with different values (partition also contains some extra keys that are of no use to us). What I want is a new dictionary final that pairs the keys of users with the matching values from partition. For example, if I have '145087572' as a key in users and the same key is repeated two or three times in partition with different values, as in {'145087572':2, '145087572':3,'145087572':7}, then I should get all three of these elements in the new dictionary final. I also have to store this dictionary as a key:value RDD.
Here's what I tried:
user_key=list(users.keys())
final=[]
for x in user_key:
    s={x:partition.get(x) for x in partition}
    final.append(s)
After running this code my laptop stops responding (the cell still shows [*]) and I have to restart it. Is there a problem with my code, and is there a more efficient way to do this?
First, a dictionary cannot hold duplicate keys; the value of a duplicate key is overwritten by the last value assigned to that key.
Now let's analyze your code:
user_key=list(users.keys()) # here you get all the keys, say (1, 2, 3)
final=[]
for x in user_key: # you are iterating over the keys, so x will be 1, 2, 3
    s={x:partition.get(x) for x in partition} # this is the reason for the hang
    ''' Broken down, the line above looks like this:
    s = {}
    for x in partition:
        s[x] = partition.get(x)
    The outer for loop and the inner loop use the same variable name x,
    so instead of looking up the keys of users you are
    iterating over all the keys of partition
    (inside the comprehension, x holds the keys of partition).
    '''
    final.append(s)
Now the reason for the hang: say you have 10 keys in the users dictionary. The outer for loop iterates 10 times, and each time the inner loop iterates over all the keys of partition and builds a full copy of it. With 1748 users and 452743 partition entries, that is roughly 791 million stored items, which exhausts memory until your system eventually hangs.
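The effect is easy to reproduce on toy data. The sketch below uses made-up keys and values (not the real users and partition) to show that every pass of the outer loop stores a complete copy of partition, and contrasts it with a direct lookup.

```python
users = {'a': 1, 'b': 3}
partition = {'a': 4, 'b': 2, 'c': 0, 'd': 6}

final = []
for x in users:
    # The comprehension has its own x, which shadows the loop's x,
    # so the user key is never used here.
    s = {x: partition.get(x) for x in partition}
    final.append(s)

# Every entry in final is a full copy of partition.
copies_are_full = all(entry == partition for entry in final)

# What was probably intended: look up only the keys of users.
matched = {key: partition[key] for key in users if key in partition}

print(len(final), copies_are_full)
print(matched)
```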
I think this will work for you.
Store the partition data in a Python defaultdict(list):
from collections import defaultdict
user_keys = users.keys()
part_dict = defaultdict(list)
# partition = [[key1, value], [key2, value], ....]
# store your partition data in this way (list inside list)
for key, value in partition:
    # defaultdict(list) creates the empty list on first access,
    # so no membership check is needed before appending
    part_dict[key].append(value)
# part_dict = {key1: [1, 2, 3], key2: [1, 2, 3], key3: [4, 5], ....}
final = []
for x in user_keys:
    for values in part_dict[x]:
        final.append([x, values])
        # if you want the result in dictionary format (I don't think it's required) you can use
        # final.append({x: values})
        # final = [{key1: 1}, {key2: 2}, ....]
# final = [[key1, 1], [key1, 2], [key1, 3], .....]
The above code is not tested; some minor changes may be required.
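A small runnable variant of the grouping approach, on made-up sample pairs. It assumes the partition data is available as (key, value) pairs before it was collapsed into a dict, since a dict cannot hold the repeated keys; `partition_pairs` is a hypothetical name for that input.

```python
from collections import defaultdict

users = {'145087572': 3, '470520068': 1}

# Assumed input shape: one (key, value) pair per partition record.
partition_pairs = [
    ('145087572', 2), ('609232972', 4), ('145087572', 3),
    ('470520068', 1), ('145087572', 7),
]

# Group all values under their key.
part_dict = defaultdict(list)
for key, value in partition_pairs:
    part_dict[key].append(value)

# Keep only the keys that occur in users, one (key, value) tuple per value.
# .get() avoids creating empty entries for users missing from the partition.
final = [(key, value) for key in users for value in part_dict.get(key, [])]

print(final)
```

A flat list of (key, value) tuples like `final` is the shape SparkContext.parallelize accepts for building a key-value RDD, assuming a SparkContext is available in your session.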

How to check a key's value within an if statement

I hope you are all well.
This is how my data looks:
dictionary1 = {2876: 1, 9212: 1, 953997: 1, 9205: 1, 9206: 1, 9207: 1, 9208: 1, 9209: 1, 9210: 1, 9211: 1, 6908: 1, 1532: 1, 945237: 1, 6532: 2, 6432: 4}
data1 = [[2876, 5423],[2312, 4532],[953997, 5643]...]
I am trying to run a statement that looks like this:
for y in data1:
    if y[0] in dictionary1 and dictionary1[y[0]] == 1:
        dictionary1[y[1]] = 2
Presumably this would create a new dataset looking like this:
dictionary1 = {5423: 2, 953997: 2, 2876: 1, 9212: 1, 953997: 1, 9205: 1, 9206: 1, 9207: 1, 9208: 1, 9209: 1, 9210: 1, 9211: 1, 6908: 1, 1532: 1, 945237: 1, 6532: 2, 6432: 4}
What am I doing wrong? Is dictionary1[y[0]] == 1 the correct way to check a key's value?
Thank you everyone.
A dictionary comprehension converts the list of lists to a dictionary:
dict1 = {t[0]: t[1] for t in data1}
Then it should be easy to do what you want:
for x, y in dict1.items():
    if x in dictionary1 and dictionary1[x] == 1:
        dictionary1[y] = 2
You can use dict.get(key, default) to avoid an exception for missing values, and provide a safe default. This reduces your loop to a single condition:
#!python3
dictionary1 = {2876: 1, 9212: 1, 953997: 1, 9205: 1, 9206: 1, 9207: 1, 9208: 1, 9209: 1, 9210: 1, 9211: 1, 6908: 1, 1532: 1, 945237: 1, 6532: 2, 6432: 4}
data1 = [[2876, 5423],[2312, 4532],[953997, 5643]]
for x,y in data1:
    if dictionary1.get(x, 0) == 1:
        dictionary1[y] = 2
print(dictionary1)
You could use dict.update(other) to bulk-overwrite the values in dictionary1 with a one-liner dict comprehension:
dictcompr = {b:2 for a,b in data1 if dictionary1.get(a,0) == 1}
dictionary1.update(dictcompr)
And then you can combine them into one single, unholy, unmaintainable, barely-readable mess:
dictionary1.update({b:2 for a,b in data1 if dictionary1.get(a,0) == 1})
Update:
To delete all keys having a value of 1, you have some choices:
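# Iterate over a snapshot: deleting from a dict while iterating over
# its live items() view raises RuntimeError in Python 3.
for k,v in list(dictionary1.items()):
    if v == 1:
        del dictionary1[k]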
# Versus:
d2 = dict(filter(lambda item: item[1] != 1, dictionary1.items()))
dictionary1 = d2
# or
dictionary1.clear()
dictionary1.update(d2)
Frankly, for your purposes the for loop is probably better. The filter approach can take the lambda as a parameter, to configure what gets filtered. Using clear()/update() is a win if you expect multiple references to the dictionary. That is, A = B = dictionary1. In this case, clear/update would keep the same underlying object, so the linkage still holds. (This is also true of the for loop - the benefit is solely for the filter which requires a temporary.)
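The point about multiple references can be seen in a few lines. This is only an illustrative sketch with a shortened dictionary; `alias` and `other_alias` stand in for whatever other names refer to the same dict in your program.

```python
original = {2876: 1, 6532: 2, 6432: 4}
alias = original  # a second reference to the same dict object

kept = {k: v for k, v in original.items() if v != 1}

# In place: clear() and update() mutate the shared object,
# so the alias sees the filtered contents too.
original.clear()
original.update(kept)

# Rebinding: the name now points at a new dict, and the alias
# still refers to the old, unfiltered one.
other = {2876: 1, 6532: 2}
other_alias = other
other = {k: v for k, v in other.items() if v != 1}

print(alias)
print(other, other_alias)
```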
Please try this:
for y in data1:
    if y[0] in dictionary1.keys() and dictionary1[y[0]] == 1:
        dictionary1[y[1]] = 2
If you only need to test that the key exists, you can simply use:
for y in data1:
    if y[0] in dictionary1:  # dict.has_key() was removed in Python 3
        dictionary1[y[1]] = 2
Hope this is what you are looking for.

list union with duplicates

I need to unite two lists in Python 3 where duplicates can exist. For each value, the resulting list should contain as many copies as the larger of its counts in the two lists. An example might clarify it:
[1,2,2,5] (some operator) [2,5,5,5,9] = [1,2,2,5,5,5,9]
Ideas?
You can use the collections.Counter class:
>>> from collections import Counter
>>> combined = Counter([1,2,2,5]) | Counter([2,5,5,5,9])
>>> list(combined.elements())
[1, 2, 2, 5, 5, 5, 9]
It functions as a multiset (an unordered collection where each element can appear multiple times). The | operator gives you the union of the multisets, where each element appears max(appearances_in_counter1, appearances_in_counter2) times.
This class was added in Python 2.7 and 3.1.
Why use lists in the first place? That data looks like a dict to me:
[1,2,2,5] -> {1: 1, 2: 2, 5: 1}
[2,5,5,5,9] -> {2: 1, 5: 3, 9: 1}
Then it's simple:
keys = set(a) | set(b)
vals = [max(a.get(n, 0), b.get(n, 0)) for n in keys]
d = dict(zip(keys, vals))
print(d)
Result:
{1: 1, 2: 2, 5: 3, 9: 1}
Convert the lists to dictionaries with a[key] = count.
Create a new dictionary with the rule c[key] = max(a.get(key, 0), b.get(key, 0)). You need to iterate through the keys of both a and b.
Expand the dictionary: result += [key] * count.
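The three steps above can be sketched without collections.Counter as follows. The function name `multiset_union` is made up for illustration, and sorting the keys assumes the items are mutually comparable.

```python
def multiset_union(first, second):
    # Step 1: count the occurrences in each list.
    counts_a, counts_b = {}, {}
    for item in first:
        counts_a[item] = counts_a.get(item, 0) + 1
    for item in second:
        counts_b[item] = counts_b.get(item, 0) + 1

    # Steps 2 and 3: take the larger count per key and expand it.
    result = []
    for key in sorted(set(counts_a) | set(counts_b)):
        result += [key] * max(counts_a.get(key, 0), counts_b.get(key, 0))
    return result


print(multiset_union([1, 2, 2, 5], [2, 5, 5, 5, 9]))
```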

Why does a dictionary meant to use lists as keys still show tuples as keys

When I define a dictionary that uses a list as the key:
collections.defaultdict(list)
and print it out, it shows that it is using tuples as keys.
May I know why?
import collections
tuple_as_dict_key = collections.defaultdict(tuple)
tuple_as_dict_key['abc', 1, 2] = 999
tuple_as_dict_key['abc', 3, 4] = 999
tuple_as_dict_key['abc', 5, 6] = 888
# defaultdict(<type 'tuple'>, {('abc', 5, 6): 888, ('abc', 1, 2): 999, ('abc', 3, 4): 999})
print tuple_as_dict_key
list_as_dict_key = collections.defaultdict(list)
list_as_dict_key['abc', 1, 2] = 999
list_as_dict_key['abc', 3, 4] = 999
list_as_dict_key['abc', 5, 6] = 888
# defaultdict(<type 'list'>, {('abc', 5, 6): 888, ('abc', 1, 2): 999, ('abc', 3, 4): 999})
# Shouldn't it be defaultdict(<type 'list'>, {['abc', 5, 6]: 888, ...
print list_as_dict_key
The parameter to defaultdict is not the type of the key, it is a function that creates default data. Your test cases don't exercise this because you're filling the dict with defined values and not using any defaults. If you were to try to get the value list_as_dict_key['abc', 7, 8] it would return an empty list, since that is what you defined as a default value and you never set the value at that index.
When you're adding values to your dictionary you're doing it the same way in both cases, and the keys are treated as tuples. What you're passing to the constructor is the factory that produces the default value for any key that is not present. The factory in this case happens to be a type (tuple or list), but that has absolutely nothing to do with how keys are treated.
There's a nice article explaining the answer to why you can't use a list as key here.
Dictionary keys must be hashable, which in practice means immutable types. A list is mutable, so it cannot be used as a dictionary key at all. No conversion is happening here, though: the subscript 'abc', 1, 2 is a comma-separated expression, which Python evaluates as a tuple before the dictionary ever sees it.
defaultdict is not setting the key as a list. It's setting the default value.
>>> from collections import defaultdict
>>> d1 = defaultdict(list)
>>> d1['foo']
[]
>>> d1['foo'] = 37
>>> d1['foo']
37
>>> d1['bar']
[]
>>> d1['bar'].append(37)
>>> d1['bar']
[37]
The way that you're getting a tuple as the key type is normal dict behaviour:
>>> d2 = dict()
>>> d2[37, 19, 2] = [14, 19]
>>> d2
{(37, 19, 2): [14, 19]}
The way Python works with subscripting is that x[a] passes a itself, x[a, b] passes the tuple (a, b), and x[a:b] passes a slice object. See how it works with a list:
>>> mylist = [1, 2, 3]
>>> mylist[4, 5]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
It's taken 4, 5 as a tuple. The dict has done the same.
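You can inspect what the subscript syntax hands over with a tiny probe class. `ShowKey` is a made-up helper that simply returns whatever key it receives. The last part also shows that an actual list is rejected as a dict key rather than converted.

```python
class ShowKey:
    """Return the subscript exactly as Python passes it to __getitem__."""

    def __getitem__(self, key):
        return key


probe = ShowKey()
single = probe[4]     # a plain int
pair = probe[4, 5]    # a tuple, built by the comma
span = probe[1:3]     # a slice object

# A real list as a dict key is an error; nothing converts it to a tuple.
list_key_error = None
try:
    {}[['abc', 1, 2]] = 999
except TypeError as exc:
    list_key_error = str(exc)

print(single, pair, span)
print(list_key_error)
```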
