Faster way to add values to existing dictionary? - python

I have a dictionary of dictionaries of dictionaries
#Initialize the dictionary
myDict=dict()
for f in ncp:
myDict[f]={}
for t in ncp:
myDict[f][t] = {}
And now I go through and add a value to the lowest level (which happens to be a dictionary key and value of None), like so, but my current method is very slow
for s in subsetList:
stIndex = 0
for f in list(allNodes.intersection(set(s)))
for t in list(allNodes.difference(set( allNodes.intersection(s)))):
myDict[f][t]['st_'+str(stIndex)]=None
stIndex+=1
I try to do it with principles of comprehension, but I fail miserably because the examples I find for comprehension are creating the dictionary, not iterating through an already existing one to add. My attempt to do so wont even 'compile':
myDict[f][t]['st_'+str(stIndex)]
for f in list(allNodes.intersection(set(s)))
for t in list(allNodes.difference(set( allNodes.intersection(s)))) = None

I would write your code like this:
myDict = {}
for i, s in enumurate(subsetList):
tpl = ('st_%d' % (i,), None) # Used to create a new {'st_n': None} later
x = allNodes.intersection(s)
for f in x:
myDict[f] = {}
for t in allNodes.difference(x):
myDict[f][t] = dict([tpl])
This cuts down on the number of new objects you need to create, as well as initializing myDict on-demand.

This should be faster...
from itertools import product
from collections import defaultdict
mydict = defaultdict(dict)
for f, t in product(ncp, repeat=2):
myDict[f][t] = {}
for s in subsetList:
myDict[f][t]['st_'+str(stIndex)] = None
Or if the innermost key level is the same each time...
from itertools import product
from collections import defaultdict
innerDict = {}
for s in subsetList:
innerDict['st_'+str(stIndex)] = None
mydict = defaultdict(dict)
for f, t in product(ncp, repeat=2):
myDict[f][t] = innerDict.copy()
But I'm not sure whether creating a copy of the innermost dictionary is faster than iterating through your subsetList and creating the new dictionary each time. You'd need to time the two options.

Answering my own question here with a theory on best approach after much trial: The final result is myDict and it is a function of 2 elements: allNodes and subsetList, both of which are effectively static tables imported from SQL at the start of my program. So, why not calculate myDict once and store it in SQL and import it also. So instead of rebuilding it every time the program runs which takes 2 minutes, it is just a couple second pyodbc read. I know its kind of a cop out, but it works for the time being.

Related

accelerate comparing dictionary keys and values to strings in list in python

Sorry if this is trivial I'm still learning but I have a list of dictionaries that looks as follow:
[{'1102': ['00576', '00577', '00578', '00579', '00580', '00581']},
{'1102': ['00582', '00583', '00584', '00585', '00586', '00587']},
{'1102': ['00588', '00589', '00590', '00591', '00592', '00593']},
{'1102': ['00594', '00595', '00596', '00597', '00598', '00599']},
{'1102': ['00600', '00601', '00602', '00603', '00604', '00605']}
...]
it contains ~89000 dictionaries. And I have a list containing 4473208 paths. example:
['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv',
'/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv',
...]
and what I want to do is group each path that contains the grouped values in the dict in the folder containing the key together.
I tried using for loops like this:
grpd_cts = []
for elem in tqdm(dict_list):
temp1 = []
for file in ct_paths:
for key, val in elem.items():
if (file[16:20] == key) and (any(x in file[21:26] for x in val)):
temp1.append(file)
grpd_cts.append(temp1)
but this takes around 30hours. is there a way to make it more efficient? any itertools function or something?
Thanks a lot!
ct_paths is iterated repeatedly in your inner loop, and you're only interested in a little bit of it for testing purposes; pull that out and use it to index the rest of your data, as a dictionary.
What does make your problem complicated is that you're wanting to end up with the original list of filenames, so you need to construct a two-level dictionary where the values are lists of all originals grouped under those two keys.
ct_path_index = {}
for f in ct_paths:
ct_path_index.setdefault(f[16:20], {}).setdefault(f[21:26], []).append(f)
grpd_cts = []
for elem in tqdm(dict_list):
temp1 = []
for key, val in elem.items():
d2 = ct_path_index.get(key)
if d2:
for v in val:
v2 = d2.get(v)
if v2:
temp1 += v2
grpd_cts.append(temp1)
ct_path_index looks like this, using your data:
{'1102': {'00575': ['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv'],
'00578': ['/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv']}}
The use of setdefault (which can be a little hard to understand the first time you see it) is important when building up collections of collections, and is very common in these kinds of cases: it makes sure that the sub-collections are created on demand and then re-used for a given key.
Now, you've only got two nested loops; the inner checks are done using dictionary lookups, which are close to O(1).
Other optimizations would include turning the lists in dict_list into sets, which would be worthwhile if you made more than one pass through dict_list.

iter through the dict store the key value and iter again to look for similar word in dict and delete form dict eg(Light1on,Light1off) in Python

[I had problem on how to iter through dict to find a pair of similar words and output it then the delete from dict]
My intention is to generate a random output label then store it into dictionary then iter through the dictionary and store the first key in the list or some sort then iter through the dictionary to search for similar key eg Light1on and Light1off has Light1 in it and get the value for both of the key to store into a table in its respective columns.
such as
Dict = {Light1on,Light2on,Light1off...}
store value equal to Light1on the iter through the dictionary to get eg Light1 off then store its Light1on:value1 and Light1off:value2 into a table or DF with columns name: On:value1 off:value2
As I dont know how to insert the code as code i can only provide the image sry for the trouble,its my first time asking question here thx.
from collections import defaultdict
import difflib, random
olist = []
input = 10
olist1 = ['Light1on','Light2on','Fan1on','Kettle1on','Heater1on']
olist2 = ['Light2off','Kettle1off','Light1off','Fan1off','Heater1off']
events = list(range(input + 1))
for i in range(len(olist1)):
output1 = random.choice(olist1)
print(output1,'1')
olist1.remove(output1)
output2 = random.choice(olist2)
print(output2,'2')
olist2.remove(output2)
olist.append(output1)
olist.append(output2)
print(olist,'3')
outputList = {olist[i]:events[i] for i in range(10)}
print (str(outputList),'4')
# Iterating through the keys finding a pair match
for s in range(5):
for i in outputList:
if i == list(outputList)[0]:
skeys = difflib.get_close_matches(i, outputList, n=2, cutoff=0.75)
print(skeys,'5')
del outputList[skeys]
# Modified Dictionary
difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animaltion'])
['animal']
Updated: I was unable to delete the pair of similar from the list(Dictionary) after founding par in the dictionary
You're probably getting an error about a dictionary changing size during iteration. That's because you're deleting keys from a dictionary you're iterating over, and Python doesn't like that:
d = {1:2, 3:4}
for i in d:
del d[i]
That will throw:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around that, one solution is to store a list of the keys you want to delete, then delete all those keys after you've finished iterating:
keys_to_delete = []
d = {1:2, 3:4}
for i in d:
if i%2 == 1:
keys_to_delete.append(i)
for i in keys_to_delete:
del d[i]
Ta-da! Same effect, but this way avoids the error.
Also, your code above doesn't call the difflib.get_close_matches function properly. You can use print(help(difflib.get_close_matches)) to see how you are meant to call that function. You need to provide a second argument that indicates the items to which you wish to compare your first argument for possible matches.
All of that said, I have a feeling that you can accomplish your fundamental goals much more simply. If you spend a few minutes describing what you're really trying to do (this shouldn't involve any references to data types, it should just involve a description of your data and your goals), then I bet someone on this site can help you solve that problem much more simply!

Appending to list in Python dictionary [duplicate]

This question already has answers here:
Initialize List to a variable in a Dictionary inside a loop
(2 answers)
Closed 8 years ago.
Is there a more elegant way to write this code?
What I am doing: I have keys and dates. There can be a number of dates assigned to a key and so I am creating a dictionary of lists of dates to represent this. The following code works fine, but I was hoping for a more elegant and Pythonic method.
dates_dict = dict()
for key, date in cur:
if key in dates_dict:
dates_dict[key].append(date)
else:
dates_dict[key] = [date]
I was expecting the below to work, but I keep getting a NoneType has no attribute append error.
dates_dict = dict()
for key, date in cur:
dates_dict[key] = dates_dict.get(key, []).append(date)
This probably has something to do with the fact that
print([].append(1))
None
but why?
list.append returns None, since it is an in-place operation and you are assigning it back to dates_dict[key]. So, the next time when you do dates_dict.get(key, []).append you are actually doing None.append. That is why it is failing. Instead, you can simply do
dates_dict.setdefault(key, []).append(date)
But, we have collections.defaultdict for this purpose only. You can do something like this
from collections import defaultdict
dates_dict = defaultdict(list)
for key, date in cur:
dates_dict[key].append(date)
This will create a new list object, if the key is not found in the dictionary.
Note: Since the defaultdict will create a new list if the key is not found in the dictionary, this will have unintented side-effects. For example, if you simply want to retrieve a value for the key, which is not there, it will create a new list and return it.
Is there a more elegant way to write this code?
Use collections.defaultdict:
from collections import defaultdict
dates_dict = defaultdict(list)
for key, date in cur:
dates_dict[key].append(date)
dates_dict[key] = dates_dict.get(key, []).append(date) sets dates_dict[key] to None as list.append returns None.
In [5]: l = [1,2,3]
In [6]: var = l.append(3)
In [7]: print var
None
You should use collections.defaultdict
import collections
dates_dict = collections.defaultdict(list)

Multiple levels dictionary python

I spent my morning reading similar questions/answers (What is the best way to implement nested dictionaries?, Multiple levels of keys and values in Python, Python: How to update value of key value pair in nested dictionary?) but I'm still not able to solve the problem.
I have this tab dictionary with a tuple as key and I want as values: an integer, a dictionary, another dictionary and some lists. Then for every key, something like this: (str,str,str,str):{int, {}, {}, [], [] ...}
I want to be able to update these values structures and I need defaultdict because I don't know all the keys and anyway they are too much to be declared one by one manually.
I'm able to do this for a structure like this (str,str,str,str):{int} in this way:
tab=defaultdict(lambda: defaultdict(int))
tab[key][0]+=1
Or for a structure like this (str,str,str,str):{{}, {}} in this way:
tab=defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
tab[key][1][str]+=1
tab[key][2][str]+=1
But not for what I really need.
Thank you!
Ok, thank to #RemcoGerlich I'm trying to fix the problem, but I never used class before and maybe there's still something wrong in my code... Btw the int is a counter, the two dictionary have ip addresses like keys and the number of occurrences as values.
class flux(object):
def __init__(self, count_flux=0, ip_c_dict=None, ip_s_dict=None):
self.count_flux = count_flux
self.ip_c_dict = ip_c_dict if ip_c_dict is not None else {}
self.ip_s_dict = ip_s_dict if ip_s_dict is not None else {}
def log_to_dict(dir_file,dictionary):
f = gzip.open(dir_file,'r')
for line in f:
line = line.strip('\n')
if not line: break
elements = line.split(" ")
key=elements[40],elements[18],elements[41],elements[37]
dictionary[key].count_flux+=1
dictionary[key].ip_c_dict[elements[0]]+=1
dictionary[key].ip_s_dict[elements[19]]+=1
###Main
tab=defaultdict(flux)
log_to_dict('/home/-/-.txt',tab)
I would create a class for your values, it's obviously complicated.
class YourClass(object):
def __init__(self, anint=0, adict=None, anotherdict=None, somelists=None):
self.anint = anint
self.adict = adict if adict is not None else {}
self.anotherdict = anotherdict if anotherdict is not None else {}
self.somelists = somelists if somelists is not None else []
(don't use {} or [] as default arguments, that leads to them being shared between all instances).
Then you can use a defaultdict(YourClass) and also set things like tab[key].anotherdict[str] ...

How to go from a values_list to a dictionary of lists

I have a django queryset that returns a list of values:
[(client pk, timestamp, value, task pk), (client pk, timestamp, value, task pk),....,].
I am trying to get it to return a dictionary of this format:
{client pk:[[timestamp, value],[timestamp, value],...,], client pk:[list of lists],...,}
The values_list may have multiple records for each client pk. I have been able to get dictionaries of lists for client or task pk using:
def dict_from_vl(vls_list):
keys=[values_list[x][3] for x in range(0,len(values_list),1)]
values = [[values_list[x][1], values_list[x][2]] for x in range(0,len(values_list),1)]
target_dict=dict(zip(keys,values))
return target_dict
However using this method, values for the same key write over previous values as it iterates through the values_list, rather than append them to a list. So this works great for getting the most recent if the values list is sorted oldest records to newest, but not for the purpose of creating a list of lists for the dict value.
Instead of target_dict=dict(zip(keys,values)), do
target_dict = defaultdict(list)
for i, key in enumerate(keys):
target_dict[k].append(values[i])
(defaultdict is available in the standard module collections.)
from collections import defaultdict
d = defaultdict(list)
for x in vls_list:
d[x].append(list(x[1:]))
Although I'm not sure if I got the question right.
I know in Python you're supposed to cram everything into a single line, but you could do it the old fashioned way...
def dict_from_vl(vls_list):
target_dict = {}
for v in vls_list:
if v[0] not in target_dict:
target_dict[v[0]] = []
target_dict[v[0]].append([v[1], v[2]])
return target_dict
For better speed, I suggest you don't create the keys and values lists separately but simply use only one loop:
tgt_dict = defaultdict(list)
for row in vas_list:
tgt_dict[row[0]].append([row[1], row[2]])

Categories

Resources