Run calculation multiple times with different values - python

I have two dictionaries, that look like:
dict1 = {1: 10, 2: 23, .... 999: 12}
dict2 = {1: 42, 2: 90, .... 999: 78}
I want to perform a simple calculation: Multiply value of dict1 with value of dict2 for 1 and 2 each.
The code so far is:
dict1[1] * dict2[1]
This calculates 10*42, which is exactly what i want.
Now i want to perform this calculation for every index in the dictionary, so for 1 up to 999.
I tried:
i = {1,2,3,4,5,6 ... 999}
dict1[i] * dict2[i]
But it didnt work.

This creates a new dict with the results:
out = { i: dict1[i] * dict2[i] for i in range(1,1000) }
If you need to work with vectors and matrices take a look at the numpy module. It has data structures and a huge collection of tools for working with them.

Related

Python summing up values in a nested dictionary

I have a dictionary P which represents a dictionary within a dictionary within a dictionary. It looks something like this.
P={key1:{keyA:{value1: 1, value2:3}, keyB:{value1:3,value2:4}},
key2:{keyA:{value1: 1, value2:3}, keyB:{value1:3,value2:4}}, key3{...:{...:}}}
What I am trying to do is to write each value of value1,value 2 in terms of their percentages of the totalPopulation from whichever is there base key.
For example key1 should look like
key1:{keyA:{value1: 1/(1+3+3+4), value2:3/(1+3+3+4)}, keyB:
{value1:3/(1+3+3+4),value2:4/(1+3+3+4)}
What I am not sure about is how to iterate over this dictionary and only collect the innermost values of a certain key so I can then sum up all the values and divide each value by that sum.
This can be done in single line using dict comprehension and map like this:
#from __future__ import division # use for Python 2.x
p = {"key1":{"keyA":{"value1": 1, "value2":3}, "keyB":{"value1":3,"value2":4}}}
p = {kOuter:{kInner:{kVal: vVal/sum(map(lambda x: sum(x.values()), vOuter.values())) for kVal, vVal in vInner.iteritems()} for kInner, vInner in vOuter.iteritems()} for kOuter, vOuter in p.iteritems()}
A more readable version of above :
p = {
kOuter:{
kInner:{
kVal: vVal/sum(map(lambda x: sum(x.values()), vOuter.values())) for kVal, vVal in vInner.iteritems()
}
for kInner, vInner in vOuter.iteritems()
}
for kOuter, vOuter in p.iteritems()
}
OUTPUT
>>> p
>>>
{'key1': {'keyB': {'value2': 0.36363636363636365, 'value1': 0.2727272727272727}, 'keyA': {'value2': 0.2727272727272727, 'value1': 0.09090909090909091}}}
The only problem with this is that the sum is calculated repeatedly, you can fix that by calculating the sum for each of your key1, key2... before this dict comprehension and use the stored values instead, like this :
keyTotals = {kOuter:sum(map(lambda x: sum(x.values()), vOuter.values())) for kOuter, vOuter in p.iteritems()}
and then you can simply access the sums calculated above by keys, like this:
p = {kOuter:{kInner:{kVal: vVal/keyTotals[kOuter] for kVal, vVal in vInner.iteritems()} for kInner, vInner in vOuter.iteritems()} for kOuter, vOuter in p.iteritems()}
test = {"key1":{"keyA":{"value1": 1, "value2":3}, "keyB":{"value1":3,"value2":4}}}
for a in test:
s = 0
for b in test[a]:
for c in test[a][b]:
s += test[a][b][c]
print(s)
for b in test[a]:
for c in test[a][b]:
test[a][b][c] = test[a][b][c] / s
This should do what you want. I've only included "key1" in this example.

How do I split a dictionary into specific number of smaller dictionaries using python?

I am trying to split up a large dictionary into n number of smaller dictionaries. Each dictionary entry contains a webaddress, and the point of splitting up the dictionary is so webscraping of these addresses can be spread over several computers.
The dictionary is in the form:
{
u'25637293':
[u'Methyldopa',u'http://www.ncbi.nlm.nih.gov/pubmed/25637293', 43579],
u'25672666':
[u'Furosemide', u'http://www.ncbi.nlm.nih.gov/pubmed/25672666', 40750]
}
with 13000 key/value pairs.
The last entry in the value is an index from 0 to 13000
This is what I have tried. (although I have probably overcomplicated things)
1) Create a list of 13000 values
2) Split this by n amount
3) Ensure that the dictionary has an entry of 1-13000
4) Iterate through the list. if (i in list == the entry of dictionary) then the web address can be extracted for scraping (last part not included in the code)
smalldict={}
#create a list from 0-13000 and split it into dictionaries of n number
def chunks(l, n):
n = max(1, n)
return [l[i:i + n] for i in range(0, len(l), n)]
#here I am inserting the values for the number of computers and how many dictionaries the big dictionary needs to be divided into
number = len(dictionary)
#entry for the number of dictionaries to divide it into
computers =4
#this is the 'name' of the computer that is running the script
compno = 1
#-1 because of 0 indexing
compm=compno-1
listlength = number/computers
divider= range(number)
division = chunks(divider, listlength)
for entry in dictionary:
#get all of the values from the value
value=dictionary[entry]
#specify the smaller dictionary that will be created
for i in division[compm]:
#if the number up to 13000 is in the dictionary
if i == value[2]
smalldict[value[1]]=value
I would have thought len(smalldict) would have been 13000/4 (since len(dictionary) is 13000, and len(division[0]) when there is only one list in the division) but it returns a few hundred only. It is not splitting as it supposed to.
I've been working on it for a lot of days. Can anyone help?
Easy. Do it this way. For example, we have a dictionary with five keys and we want to split it into two dictionaries with even sizes.
>>> d = {'key1': 1, 'key2': 2, 'key3': 3, 'key4': 4, 'key5': 5}
>>> d1 = dict(d.items()[len(d)/2:])
>>> d2 = dict(d.items()[:len(d)/2])
>>> print d1
{'key1': 1, 'key5': 5, 'key4': 4}
>>> print d2
{'key3': 3, 'key2': 2}
This gist worked for me: https://gist.github.com/miloir/2196917
I re-wrote it with itertools.cycle.
import itertools
def split_dict(x, chunks):
i = itertools.cycle(range(chunks))
split = [dict() for _ in range(chunks)]
for k, v in x.items():
split[next(i)][k] = v
return split

Link two dictionaries lists and calculate average values

I am not used to code with Python, but I have to do this one with it. What I am trying to do, is something that would reproduce the result of SQL statment like this :
SELECT T2.item, AVG(T1.Value) AS MEAN FROM TABLE_DATA T1 INNER JOIN TABLE_ITEMS T2 ON T1.ptid = T2.ptid GROUP BY T2.item.
In Python, I have two lists of dictionnaries, with the common key 'ptid'. My dctData contains around 100 000 pdit and around 7000 for the dctItems. Using a comparator like [i for i in dctData for j in dctItems if i['ptid'] == j['ptid']] is endless:
ptid = 1
for line in lines[6:]: # Skipping header
data = line.split()
for d in data:
dctData.append({'ptid' : ptid, 'Value': float(d)})
ptid += 1
dctData = [{'ptid':1,'Value': 0}, {'ptid':2,'Value': 2}, {'ptid':3,'Value': 2}, {'ptid':4,'Value': 5}, {'ptid':5,'Value': 3}, {'ptid':6,'Value': 2}]
for line in lines[1:]: # Skipping header
data = line.split(';')
dctItems.append({'ptid' : int(data[1]), 'item' : data[3]})
dctItems = [{'item':21, 'ptid':1}, {'item':21, 'ptid':2}, {'item':21, 'ptid':6}, {'item':22, 'ptid':2}, {'item':22, 'ptid':5}, {'item':23, 'ptid':4}]
Now, what I would like to get for result, is a third list that would present the average values according to each item in dctItems dictionnary, while the link between the two dictionnaries would be based on the 'pdit' value.
Where for example with the item 21, it would calculate the mean value of 1.3 by getting the values (0, 2, 2) of the ptid 1, 2 and 6:
And finally, the result would look something like this, where the key Value represents the mean calculated :
dctResults = [{'id':21, 'Value':1.3}, {'id':22, 'Value':2.5}, {'id':23, 'Value':5}]
How can I achieve this?
Thanks you all for your help.
Given those data structures that you use, this is not trivial, but it will become much easier if you use a single dictionary mapping items to their values, instead.
First, let's try to re-structure your data in that way:
values = {entry['ptid']: entry['Value'] for entry in dctData}
items = {}
for item in dctItems:
items.setdefault(item['item'], []).append(values[item['ptid']])
Now, items has the form {21: [0, 2, 2], 22: [2, 3], 23: [5]}. Of course, it would be even better if you could create the dictionary in this form in the first place.
Now, we can pretty easily calculate the average for all those lists of values:
avg = lambda lst: float(sum(lst))/len(lst)
result = {item: avg(values) for item, values in items.items()}
This way, result is {21: 1.3333333333333333, 22: 2.5, 23: 5.0}
Or if you prefer your "list of dictionaries" style:
dctResult = [{'id': item, 'Value': avg(values)} for item, values in items.items()]

Summing up numbers in a defaultdict(list)

I've been experimenting trying to get this to work and I've exhausted every idea and web search. Nothing seems to do the trick. I need to sum numbers in a defaultdict(list) and i just need the final result but no matter what i do i can only get to the final result by iterating and returning all sums adding up to the final. What I've been trying generally,
d = { key : [1,2,3] }
running_total = 0
#Iterate values
for value in d.itervalues:
#iterate through list inside value
for x in value:
running_total += x
print running_total
The result is :
1,3,6
I understand its doing this because its iterating through the for loop. What i dont get is how else can i get to each of these list values without using a loop? Or is there some sort of method iv'e overlooked?
To be clear i just want the final number returned e.g. 6
EDIT I neglected a huge factor , the items in the list are timedealta objects so i have to use .seconds to make them into integers for adding. The solutions below make sense and I've tried similar but trying to throw in the .seconds conversion in the sum statement throws an error.
d = { key : [timedelta_Obj1,timedelta_Obj2,timedelta_Obj3] }
I think this will work for you:
sum(td.seconds for sublist in d.itervalues() for td in sublist)
Try this approach:
from datetime import timedelta as TD
d = {'foo' : [TD(seconds=1), TD(seconds=2), TD(seconds=3)],
'bar' : [TD(seconds=4), TD(seconds=5), TD(seconds=6), TD(seconds=7)],
'baz' : [TD(seconds=8)]}
print sum(sum(td.seconds for td in values) for values in d.itervalues())
You could just sum each of the lists in the dictionary, then take one final sum of the returned list.
>>> d = {'foo' : [1,2,3], 'bar' : [4,5,6,7], 'foobar' : [10]}
# sum each value in the dictionary
>>> [sum(d[i]) for i in d]
[10, 6, 22]
# sum each of the sums in the list
>>> sum([sum(d[i]) for i in d])
38
If you don't want to iterate or to use comprehensions you can use this:
d = {'1': [1, 2, 3], '2': [3, 4, 5], '3': [5], '4': [6, 7]}
print(sum(map(sum, d.values())))
If you use Python 2 and your dict has a lot of keys it's better you use imap (from itertools) and itervalues
from itertools import imap
print sum(imap(sum, d.itervalues()))
Your question was how to get the value "without using a loop". Well, you can't. But there is one thing you can do: use the high performance itertools.
If you use chain you won't have an explicit loop in your code. chain manages that for you.
>>> data = {'a': [1, 2, 3], 'b': [10, 20], 'c': [100]}
>>> import itertools
>>> sum(itertools.chain.from_iterable(data.itervalues()))
136
If you have timedelta objects you can use the same recipe.
>>> data = {'a': [timedelta(minutes=1),
timedelta(minutes=2),
timedelta(minutes=3)],
'b': [timedelta(minutes=10),
timedelta(minutes=20)],
'c': [timedelta(minutes=100)]}
>>> sum(td.seconds for td in itertools.chain.from_iterable(data.itervalues()))
8160

How can I mimic a space efficient version of dict in Python?

I am working on a dictionary structure where I have a dictionary of documents and each document has a dictionary of words (where each key is word_id (integer) and values are counts) such that:
document_dict = { "doc1": {1:2, 2:10, 10:2, 100: 1}, "doc2": {10:2, 20:10, 30:2, 41: 19},...}
Note that the inner dictionaries are pretty sparse, so even though I have 250K words, I don't expect to have more than 1K keys per document.
In each iteration, I need to sum up a dict of words:counts to one of the documents, e.g. I need to union a new dict of {1:2, 2:10, 10:2, 120: 1} to "doc1": {1:2, 2:10, 10:2, 100: 1}.
Right now, my implementation runs quite fast, however after 2 hours it runs out of memory (I am using a 40GB server).
The way I was summing up the keys was something like this:
Assume that new_dict is the new word:count pairs that I want to add to doc1 such as:
new_dict = {1:2, 2:10, 10:2, 120: 1}
doc1 = {1:2, 2:10, 10:2, 100: 1}
for item in new_dict:
doc1[item] = doc1.get(item, 0) + new_dict[item]
Then since it was simply impossible to run the code with dictionaries because my dicts get quite large in a very short time, I tried to implement dictionaries as a list of 2 lists: e.g. doc1 = [[],[]] where first list keeps the keys and second key keeps the values.
Now when I want to union 2 structure like this, I first try to get the index of each item of new_dict in doc1. If I successfully obtain an index, it means the key is already in the doc1 so I can just update the corresponding value. Otherwise, it is not in the doc1 yet, so I am append()ing the new key and value to the end of the lists. However this approach runs extremely slow (in dict version, I was able to process up to 600K documents in 2 hours, now I could only processed 250K documents in 15 hours).
So my question is: If I want to use a dictionary structure (key, val) pairs where I need to union keys of 2 dicts and sum their values in each iteration, is there a way to implement this more space efficiently?
It's not necessarily more space efficient, but I would suggest switching to a disk-based dictionary by using the shelve module so you don't have to have the entire dictionary in memory at once.
They're very easy to use since they support the familiar dictionary interface, as shown below:
import shelve
document_dict = shelve.open('document_dict', writeback=True)
document_dict.update({"doc1": {1:2, 2:10, 10:2, 100: 1},
"doc2": {10:2, 20:10, 30:2, 41: 19},
"doc3": {1:2, 2:10, 10:2, 100: 1},})
new_dict = {1:2, 2:10, 10:2, 120: 1}
doc = document_dict.get("doc3", {}) # get current value, if any
for item in new_dict:
doc[item] = doc.get(item, 0) + new_dict[item] # update version in memory
document_dict["doc3"] = doc # write modified (or new) entry to disk
document_dict.sync() # clear cache
print document_dict
document_dict.close()
Output:
{'doc2': {41: 19, 10: 2, 20: 10, 30: 2},
'doc3': {120: 1, 1: 4, 2: 20, 100: 1, 10: 4},
'doc1': {1: 2, 2: 10, 100: 1, 10: 2}}

Categories

Resources