I have a rather algorithmic question. I'm using Python, so a Python answer would be a bonus, but in essence it's about how to approach the problem.
I have a nested dictionary that stores, for each customer, their relations with other customers. Basically, if two customers enjoy the same things, they are more compatible. The first customer's compatibility is calculated with every other customer. In the second customer's loop, the first one is skipped because that pair has already been calculated. You end up with a dictionary something like this:
Custdict = {'1': {'2': 1, '3': 0, '4': 3}, '2': {'3': 1, '4': 2}…}
So for the last customer (say the 10th), there is no key at all, because all of its values were calculated in the earlier entries. My question is: how can I take this data from the earlier entries and add it as key/value pairs to the later ones? The dictionary above should become:
Custdict = {'1': {'2': 1, '3': 0, '4': 3}, '2': {'1': 1, '3': 1, '4': 2}…}
I searched online for an existing algorithm like this but couldn't find anything.
As a simple solution, you can just iterate over all values in the dictionary, and for each customer-customer pair (i, j), you also set the value for (j, i).
from collections import defaultdict

cust = {
    1: {
        2: 1,
        3: 0
    },
    2: {
        3: 1
    }
}

new_cust = defaultdict(dict)
for customer in cust.keys():
    for neighbour in cust[customer].keys():
        # Store the pair in both directions.
        new_cust[neighbour][customer] = cust[customer][neighbour]
        new_cust[customer][neighbour] = cust[customer][neighbour]
print(dict(new_cust))
# prints {2: {1: 1, 3: 1}, 1: {2: 1, 3: 0}, 3: {1: 0, 2: 1}}
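You may prefer to fill in the missing entries in the original dictionary itself, so that the last customer also gets its own key. One way is to iterate over a snapshot of the existing pairs and mirror each one with `setdefault`. This is a minimal sketch that assumes the relation is symmetric (the score for (i, j) equals the score for (j, i)). The helper name `mirror_in_place` is hypothetical.

```python
def mirror_in_place(cust):
    """Hypothetical helper: for every (i, j) already in cust, also store (j, i)."""
    # Snapshot the pairs first so the dict is not mutated while it is being iterated.
    pairs = [(i, j, score) for i, inner in cust.items() for j, score in inner.items()]
    for i, j, score in pairs:
        # setdefault creates an empty inner dict for customers that had no key yet.
        cust.setdefault(j, {})[i] = score
    return cust


custdict = {'1': {'2': 1, '3': 0, '4': 3}, '2': {'3': 1, '4': 2}}
mirror_in_place(custdict)
print(custdict['2'])  # {'3': 1, '4': 2, '1': 1}
```

The mirrored key is appended after the existing ones. Dict equality ignores order, so `custdict['2'] == {'1': 1, '3': 1, '4': 2}` still holds.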
Regarding the output of this code:
d = {k: v for k in range(1, 4) for v in range(1, 3)}
print(d)
The output is:
{1: 2, 2: 2, 3: 2}
but I thought the output should be:
{1: 1, 2: 1, 3: 1}
Why is it taking 2 as the value of v?
Python lets you produce the same key multiple times in a dictionary comprehension, but the final dictionary can only contain each key once. Your inner loop runs v = 1 and then v = 2 for every k, so each key is first set to 1 and then overwritten with 2. The value that survives is the last one produced, as per the Python reference manual, 6.2.7 Dictionary Displays:
When the comprehension is run, the resulting key and value elements are inserted in the new dictionary in the order they are produced.
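The overwriting is easier to see when the comprehension is unrolled into the nested loops it corresponds to. The sketch below does that and then shows one way to get the output the question expected.

```python
# The comprehension from the question behaves like these nested loops.
d = {}
for k in range(1, 4):
    for v in range(1, 3):
        d[k] = v  # v=1 is written first, then overwritten by v=2
print(d)  # {1: 2, 2: 2, 3: 2}

# To get {1: 1, 2: 1, 3: 1}, give each key a single value instead of looping over v.
expected = {k: 1 for k in range(1, 4)}
print(expected)  # {1: 1, 2: 1, 3: 1}
```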
I am not used to coding in Python, but I have to do this task in it. I am trying to reproduce the result of an SQL statement like this:
SELECT T2.item, AVG(T1.Value) AS MEAN FROM TABLE_DATA T1 INNER JOIN TABLE_ITEMS T2 ON T1.ptid = T2.ptid GROUP BY T2.item;
In Python, I have two lists of dictionaries with the common key 'ptid'. My dctData contains around 100,000 ptids and dctItems around 7,000. A nested comprehension like [i for i in dctData for j in dctItems if i['ptid'] == j['ptid']] takes forever, because it performs about 700 million comparisons. This is how I build the lists:
dctData = []
ptid = 1
for line in lines[6:]:  # Skipping header
    data = line.split()
    for d in data:
        dctData.append({'ptid': ptid, 'Value': float(d)})
        ptid += 1
# Example contents of dctData:
dctData = [{'ptid':1,'Value': 0}, {'ptid':2,'Value': 2}, {'ptid':3,'Value': 2}, {'ptid':4,'Value': 5}, {'ptid':5,'Value': 3}, {'ptid':6,'Value': 2}]
dctItems = []
for line in lines[1:]:  # Skipping header
    data = line.split(';')
    dctItems.append({'ptid': int(data[1]), 'item': data[3]})
# Example contents of dctItems:
dctItems = [{'item':21, 'ptid':1}, {'item':21, 'ptid':2}, {'item':21, 'ptid':6}, {'item':22, 'ptid':2}, {'item':22, 'ptid':5}, {'item':23, 'ptid':4}]
Now, the result I would like is a third list giving the average value for each item in dctItems, with the two lists linked by the 'ptid' value.
For example, for item 21 it would calculate a mean value of about 1.3 from the values (0, 2, 2) of ptids 1, 2 and 6.
Finally, the result would look something like this, where the key 'Value' holds the calculated mean:
dctResults = [{'id':21, 'Value':1.3}, {'id':22, 'Value':2.5}, {'id':23, 'Value':5}]
How can I achieve this?
Thank you all for your help.
With the data structures you are using, this is not trivial, but it becomes much easier if you instead use a single dictionary mapping items to their values.
First, let's try to re-structure your data in that way:
values = {entry['ptid']: entry['Value'] for entry in dctData}
items = {}
for item in dctItems:
    items.setdefault(item['item'], []).append(values[item['ptid']])
Now, items has the form {21: [0, 2, 2], 22: [2, 3], 23: [5]}. Of course, it would be even better if you could create the dictionary in this form in the first place.
Now, we can pretty easily calculate the average for all those lists of values:
avg = lambda lst: float(sum(lst))/len(lst)
result = {item: avg(values) for item, values in items.items()}
This way, result is {21: 1.3333333333333333, 22: 2.5, 23: 5.0}
Or if you prefer your "list of dictionaries" style:
dctResult = [{'id': item, 'Value': avg(values)} for item, values in items.items()]
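A variant that avoids building the intermediate lists keeps a running sum and count per item, which mirrors what AVG does inside a GROUP BY. This is a sketch on the sample dctData and dctItems from the question. Rows whose ptid has no match in dctData are skipped, as an INNER JOIN would drop them.

```python
from collections import defaultdict

dctData = [{'ptid': 1, 'Value': 0}, {'ptid': 2, 'Value': 2}, {'ptid': 3, 'Value': 2},
           {'ptid': 4, 'Value': 5}, {'ptid': 5, 'Value': 3}, {'ptid': 6, 'Value': 2}]
dctItems = [{'item': 21, 'ptid': 1}, {'item': 21, 'ptid': 2}, {'item': 21, 'ptid': 6},
            {'item': 22, 'ptid': 2}, {'item': 22, 'ptid': 5}, {'item': 23, 'ptid': 4}]

# Index values by ptid once, so each join lookup is O(1) instead of a nested scan.
values = {entry['ptid']: entry['Value'] for entry in dctData}

# Running [sum, count] per item: the GROUP BY step.
totals = defaultdict(lambda: [0.0, 0])
for row in dctItems:
    if row['ptid'] not in values:  # no match: an INNER JOIN drops the row
        continue
    acc = totals[row['item']]
    acc[0] += values[row['ptid']]
    acc[1] += 1

dctResults = [{'id': item, 'Value': s / n} for item, (s, n) in totals.items()]
print(dctResults)
```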
How do I add a key-value pair to a dictionary in Python?
I defined an empty dictionary, and now I want to take a bunch of keys from a list and set each of their values to 1. With my code it seems to create a new dictionary on every iteration, but I want to keep adding key-value pairs so that in the end I get only one dictionary.
This is my code:
def count_d(words):
    count_dict = {}
    words_set = set(words)
    for i in words_set:
        count_dict[i] = 1
        print(count_dict)
What you are looking for is dict.fromkeys along with dict.update.
From the docs:
Create a new dictionary with keys from seq and values set to value.
Example Usage
>>> d = {}
>>> d.update(dict.fromkeys(range(10), 1))
>>> d
{0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}
Note that you need dict.update along with dict.fromkeys in order to update an existing dictionary in place.
If instead you want to create a new dictionary and assign it, use:
>>> d = dict.fromkeys(range(10), 1)
>>> d
{0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}
Unfortunately, for an in-place update you need to create a throwaway dictionary before passing it to dict.update. A less intuitive way to avoid that is to use itertools:
from itertools import izip, repeat  # Python 2 only
d.update(izip(xrange(10), repeat(1)))
# Python 3 equivalent: d.update(zip(range(10), repeat(1)))
The same idea extends to OrderedDict, which is often used as a substitute for an ordered set because the standard library does not provide one.
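To illustrate that ordered-set use, here is a small sketch with made-up sample words. OrderedDict.fromkeys drops duplicates while keeping first-seen order. On Python 3.7+ a plain dict.fromkeys behaves the same way.

```python
from collections import OrderedDict

words = ['b', 'a', 'b', 'c', 'a']  # made-up sample data
# Keys keep their first-insertion order, so the result acts like an ordered set.
ordered_unique = list(OrderedDict.fromkeys(words))
print(ordered_unique)  # ['b', 'a', 'c']
```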
The fromkeys method does what you want:
count_dict = dict.fromkeys(words_set, 1)
This gives count_dict its keys from words_set, each with a value of 1.
More info on fromkeys is in the Python documentation for dict.
How do I add a key-value pair to a dictionary in Python?
Exactly as you are:
count_dict[i] = 1
From what I did it creates me every iteration a new dictionary
No, it doesn't. It keeps adding to the same dictionary. The problem is that you print the dictionary-so-far on every iteration.
So, you were very close to what you wanted. I believe that all you want to do is unindent the print statement, so it only happens once at the end, instead of each time through the loop:
def count_d(words):
    count_dict = {}
    words_set = set(words)
    for i in words_set:
        count_dict[i] = 1
    print(count_dict)
Of course, it will probably be a lot more useful to return count_dict than to print it. And Abhijit's answer shows a much simpler way to do this. (Another simple way: count_dict = dict(collections.Counter(words_set)). But I think dict.fromkeys is more meaningful in this case, despite your variable names.)
But I think this is the part you were actually struggling with.
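Here is a sketch of count_d that returns the dictionary instead of printing it, with made-up sample words. Because fromkeys collapses duplicate keys by itself, the set() call is no longer needed. If you really wanted occurrence counts, as the name count_d suggests, collections.Counter on the original list gives them.

```python
from collections import Counter


def count_d(words):
    """Return a dict mapping each distinct word to 1."""
    # Duplicate words collapse into a single key automatically.
    return dict.fromkeys(words, 1)


result = count_d(['spam', 'eggs', 'spam'])
print(result)  # {'spam': 1, 'eggs': 1}

# Real occurrence counts, if that is what is wanted:
counts = Counter(['spam', 'eggs', 'spam'])
print(counts)  # Counter({'spam': 2, 'eggs': 1})
```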
I am working on a dictionary structure where I have a dictionary of documents and each document has a dictionary of words (where each key is word_id (integer) and values are counts) such that:
document_dict = { "doc1": {1:2, 2:10, 10:2, 100: 1}, "doc2": {10:2, 20:10, 30:2, 41: 19},...}
Note that the inner dictionaries are pretty sparse, so even though I have 250K words, I don't expect to have more than 1K keys per document.
In each iteration, I need to add a dict of word:count pairs to one of the documents. For example, I need to union a new dict {1:2, 2:10, 10:2, 120: 1} with "doc1": {1:2, 2:10, 10:2, 100: 1}, summing the counts.
Right now, my implementation runs quite fast. However, after 2 hours it runs out of memory (I am using a 40GB server).
The way I was summing up the keys was something like this:
Assume that new_dict is the new word:count pairs that I want to add to doc1 such as:
new_dict = {1:2, 2:10, 10:2, 120: 1}
doc1 = {1:2, 2:10, 10:2, 100: 1}
for item in new_dict:
    doc1[item] = doc1.get(item, 0) + new_dict[item]
My dicts grow very large in a very short time, so running the code with dictionaries was simply impossible. Instead, I tried to implement each dictionary as a list of two lists, e.g. doc1 = [[],[]], where the first list holds the keys and the second list holds the values.
Now, to union two structures like this, I first try to get the index of each key of new_dict in doc1. If I find an index, the key is already in doc1, so I just update the corresponding value. Otherwise it is not in doc1 yet, so I append() the new key and value to the end of the lists. However, this approach runs extremely slowly. With the dict version I could process up to 600K documents in 2 hours; now I could only process 250K documents in 15 hours.
So my question is: I need a dictionary-like structure of (key, val) pairs where, in each iteration, I union the keys of two dicts and sum their values. Is there a more space-efficient way to implement this?
It's not necessarily more space efficient, but I would suggest switching to a disk-based dictionary by using the shelve module so you don't have to have the entire dictionary in memory at once.
They're very easy to use since they support the familiar dictionary interface, as shown below:
import shelve
document_dict = shelve.open('document_dict', writeback=True)
document_dict.update({"doc1": {1:2, 2:10, 10:2, 100: 1},
                      "doc2": {10:2, 20:10, 30:2, 41: 19},
                      "doc3": {1:2, 2:10, 10:2, 100: 1},})
new_dict = {1:2, 2:10, 10:2, 120: 1}
doc = document_dict.get("doc3", {}) # get current value, if any
for item in new_dict:
doc[item] = doc.get(item, 0) + new_dict[item] # update version in memory
document_dict["doc3"] = doc # write modified (or new) entry to disk
document_dict.sync()  # write back cached entries and clear the cache
print(dict(document_dict))
document_dict.close()
Output:
{'doc2': {41: 19, 10: 2, 20: 10, 30: 2},
'doc3': {120: 1, 1: 4, 2: 20, 100: 1, 10: 4},
'doc1': {1: 2, 2: 10, 100: 1, 10: 2}}
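Separately from the shelve approach: the question's two-list version is slow because list.index scans the keys linearly. Below is a hypothetical sketch, not taken from the answer above. It keeps the keys sorted in a typed array and locates them with bisect, so each lookup is O(log n). Because array('l') stores raw machine integers rather than Python int objects plus hash-table slots, it should need considerably less memory per entry than a dict. Inserting a new key still shifts the later elements, though. The class name SparseCounts is made up for illustration, so measure it on your own data before relying on it.

```python
from array import array
from bisect import bisect_left


class SparseCounts:
    """Hypothetical compact {word_id: count} store: two parallel typed arrays,
    with keys kept sorted so lookups can use binary search."""

    def __init__(self):
        self.keys = array('l')    # sorted word ids, stored as C longs
        self.counts = array('l')  # counts, aligned index-for-index with self.keys

    def add(self, new_counts):
        """Sum the counts from a plain dict {word_id: count} into this store."""
        for word_id, count in new_counts.items():
            i = bisect_left(self.keys, word_id)  # binary search instead of list.index
            if i < len(self.keys) and self.keys[i] == word_id:
                self.counts[i] += count          # existing key: update in place
            else:
                self.keys.insert(i, word_id)     # new key: insert at its sorted position
                self.counts.insert(i, count)

    def to_dict(self):
        return dict(zip(self.keys, self.counts))


doc1 = SparseCounts()
doc1.add({1: 2, 2: 10, 10: 2, 100: 1})
doc1.add({1: 2, 2: 10, 10: 2, 120: 1})
print(doc1.to_dict())  # {1: 4, 2: 20, 10: 4, 100: 1, 120: 1}
```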
Let us consider a dictionary:
sample_dict={1:'r099',2:'g444',3:'t555',4:'f444',5:'h666'}
I want to re-order this dictionary in an order specified by a list containing the order of the dictionary keys that I desire. Let us say the desired order list is:
desired_order_list=[5,2,4,3,1]
So, I want my dictionary to appear like this:
{5:'h666',2:'g444',4:'f444',3:'t555',1:'r099'}
If I can get a list of values that is fine too. Meaning, the result can be this:
['h666','g444','f444','t555','r099']
How do I achieve this in the least complex way possible?
Answer for Python 3.6+
Guido has confirmed that dictionaries preserve insertion order as a language guarantee from Python 3.7 onwards. In CPython 3.6 they already did so as an implementation detail. The answer has already been expanded on in Fastest way to sort a python 3.7+ dictionary.
In this case, building a new dict with a simple dictionary comprehension over the keys in desired_order_list will do the trick.
sample_dict = {1: 'r099', 2: 'g444', 3: 't555', 4: 'f444', 5: 'h666'}
print(sample_dict)
# {1: 'r099', 2: 'g444', 3: 't555', 4: 'f444', 5: 'h666'}
desired_order_list = [5, 2, 4, 3, 1]
reordered_dict = {k: sample_dict[k] for k in desired_order_list}
print(reordered_dict)
# {5: 'h666', 2: 'g444', 4: 'f444', 3: 't555', 1: 'r099'}
If you're using an OrderedDict, you can do
for key in [5, 2, 4, 3, 1]:
    my_ordered_dict[key] = my_ordered_dict.pop(key)
This reinserts everything in your ordered dict in the sequence you want, such that later you can do
my_ordered_dict.values()
And get the list you suggested in the question.
If you wrap the reinsertion in a try: ...; except KeyError: pass, you can reorder an OrderedDict even if not all the keys in your list are present.
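On Python 3.2+, OrderedDict.move_to_end can replace the pop-and-reinsert step. The sketch below combines it with the try/except idea, using the question's sample data plus one deliberately missing key. Keys that are not in the list are never moved, so they end up at the front.

```python
from collections import OrderedDict

my_ordered_dict = OrderedDict([(1, 'r099'), (2, 'g444'), (3, 't555'), (4, 'f444'), (5, 'h666')])
desired_order_list = [5, 2, 4, 3, 1, 6]  # 6 is deliberately not a key

for key in desired_order_list:
    try:
        # Move the key to the end without removing and re-adding it.
        my_ordered_dict.move_to_end(key)
    except KeyError:
        pass  # skip keys that are not in the dict

print(list(my_ordered_dict.values()))  # ['h666', 'g444', 'f444', 't555', 'r099']
```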
Python dictionaries were unordered before Python 3.7.
Use OrderedDict instead.
Using an OrderedDict or Eli's solution will probably be a good way to go, but for reference here is a simple way to obtain the list of values you want:
[sample_dict[k] for k in desired_order_list]
If you aren't completely sure that every element of desired_order_list is a key in sample_dict, use either [sample_dict.get(k) ...] or [... for k in desired_order_list if k in sample_dict]. The first method puts None in for missing keys. The second only includes values for the keys that are in the dict.
What does reordering the dictionary mean for you? By nature, dictionaries are data structures for lookup rather than for order (and before Python 3.7 they were unordered).
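Both variants side by side, on the question's sample_dict with a made-up key list that includes one missing key (7):

```python
sample_dict = {1: 'r099', 2: 'g444', 3: 't555', 4: 'f444', 5: 'h666'}
desired_order_list = [5, 2, 7, 4]  # 7 is not a key of sample_dict

# .get() substitutes None for keys that are missing.
with_none = [sample_dict.get(k) for k in desired_order_list]
# The filter skips missing keys entirely.
only_present = [sample_dict[k] for k in desired_order_list if k in sample_dict]

print(with_none)     # ['h666', 'g444', None, 'f444']
print(only_present)  # ['h666', 'g444', 'f444']
```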
Do you want to iterate over the dictionary in some specific order? Then just use your desired_order_list:
for key in desired_order_list:
    # d is the dictionary
    value = d[key]
    # do stuff with value
As others have mentioned, Python has an OrderedDict (in 2.7 or 3.x), but I don't think it's what you need here. "Reordering" it is just too inefficient. It's much better to just carry your dictionary along with the list of keys in desired order, together.
If you still insist on an OrderedDict, just create a new OrderedDict, inserting the values into it in the order of desired_order_list.
The existing answers more than cover the question, except in one special case: the list is incomplete, and we want to keep all the values, with the unlisted ones at the end. In that case, after creating the dictionary, update it with the old one to add the missing values:
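A minimal sketch of building that new OrderedDict from the question's sample data:

```python
from collections import OrderedDict

sample_dict = {1: 'r099', 2: 'g444', 3: 't555', 4: 'f444', 5: 'h666'}
desired_order_list = [5, 2, 4, 3, 1]

# Insert the pairs in the desired order; OrderedDict remembers insertion order.
reordered = OrderedDict((k, sample_dict[k]) for k in desired_order_list)
print(list(reordered.values()))  # ['h666', 'g444', 'f444', 't555', 'r099']
```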
sample_dict = {1: 'r099', 2: 'g444', 3: 't555', 4: 'f444', 5: 'h666'}
print(sample_dict)
# {1: 'r099', 2: 'g444', 3: 't555', 4: 'f444', 5: 'h666'}
desired_order_list = [5, 2]
reordered_dict = {k: sample_dict[k] for k in desired_order_list}
print(reordered_dict)
# {5: 'h666', 2: 'g444'}
reordered_dict.update(sample_dict)
print(reordered_dict)
# {5: 'h666', 2: 'g444', 1: 'r099', 3: 't555', 4: 'f444'}
Wouldn't it be easier to just do it this way?
sample_dict = {1: 'r099', 2: 'g444', 3: 't555', 4: 'f444', 5: 'h666'}
# Keep each key paired with its own value; only the insertion order changes.
new_sample_dict = {
    5: sample_dict[5],
    2: sample_dict[2],
    4: sample_dict[4],
    3: sample_dict[3],
    1: sample_dict[1]
}
Use SortedDict provided by Django (from django.utils.datastructures import SortedDict). SortedDict stores its order in its keyOrder attribute, which is just a list, so you can reorder it any way you like, at any time. Note that SortedDict was deprecated in Django 1.7 and removed in Django 1.9 in favor of collections.OrderedDict.
If you don't have Django installed or have no use for it, just lift the implementation from django.utils.datastructures. It doesn't depend on any other parts of Django.