Multiple levels dictionary python - python

I spent my morning reading similar questions/answers (What is the best way to implement nested dictionaries?, Multiple levels of keys and values in Python, Python: How to update value of key value pair in nested dictionary?) but I'm still not able to solve the problem.
I have this tab dictionary with a tuple as key, and as values I want an integer, a dictionary, another dictionary and some lists. So for every key, something like this: (str,str,str,str):{int, {}, {}, [], [] ...}
I want to be able to update these value structures, and I need defaultdict because I don't know all the keys in advance, and in any case there are too many to declare one by one manually.
I'm able to do this for a structure like this (str,str,str,str):{int} in this way:
tab=defaultdict(lambda: defaultdict(int))
tab[key][0]+=1
Or for a structure like this (str,str,str,str):{{}, {}} in this way:
tab=defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
tab[key][1][str]+=1
tab[key][2][str]+=1
But not for what I really need.
Thank you!
Ok, thanks to @RemcoGerlich I'm trying to fix the problem, but I have never used classes before and maybe there's still something wrong in my code... Btw the int is a counter, and the two dictionaries have IP addresses as keys and the number of occurrences as values.
class flux(object):
    def __init__(self, count_flux=0, ip_c_dict=None, ip_s_dict=None):
        self.count_flux = count_flux
        self.ip_c_dict = ip_c_dict if ip_c_dict is not None else {}
        self.ip_s_dict = ip_s_dict if ip_s_dict is not None else {}
def log_to_dict(dir_file,dictionary):
    f = gzip.open(dir_file,'r')
    for line in f:
        line = line.strip('\n')
        if not line: break
        elements = line.split(" ")
        key=elements[40],elements[18],elements[41],elements[37]
        dictionary[key].count_flux+=1
        dictionary[key].ip_c_dict[elements[0]]+=1
        dictionary[key].ip_s_dict[elements[19]]+=1
###Main
tab=defaultdict(flux)
log_to_dict('/home/-/-.txt',tab)

I would create a class for your values, it's obviously complicated.
class YourClass(object):
    def __init__(self, anint=0, adict=None, anotherdict=None, somelists=None):
        self.anint = anint
        self.adict = adict if adict is not None else {}
        self.anotherdict = anotherdict if anotherdict is not None else {}
        self.somelists = somelists if somelists is not None else []
(don't use {} or [] as default arguments, that leads to them being shared between all instances).
Then you can use a defaultdict(YourClass) and also set things like tab[key].anotherdict[str] ...
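As a minimal sketch of how this could fit the counting use case from the question: if the two inner dictionaries are themselves defaultdict(int), then += 1 works on IP addresses that have not been seen yet. The class name Flux and the sample rows below are made up for illustration; real log parsing is left out.

```python
from collections import defaultdict


class Flux(object):
    """Per-key record: one counter plus two per-IP occurrence counters."""

    def __init__(self):
        self.count_flux = 0
        # defaultdict(int) lets `+= 1` work for IPs not seen before
        self.ip_c_dict = defaultdict(int)
        self.ip_s_dict = defaultdict(int)


# Missing keys get a fresh Flux() automatically
tab = defaultdict(Flux)

# Hypothetical pre-parsed rows: (key tuple, client IP, server IP)
rows = [
    (("a", "b", "c", "d"), "10.0.0.1", "10.0.0.9"),
    (("a", "b", "c", "d"), "10.0.0.1", "10.0.0.8"),
    (("w", "x", "y", "z"), "10.0.0.2", "10.0.0.9"),
]

for key, client_ip, server_ip in rows:
    record = tab[key]
    record.count_flux += 1
    record.ip_c_dict[client_ip] += 1
    record.ip_s_dict[server_ip] += 1

print(tab[("a", "b", "c", "d")].count_flux)
```

Plain {} dictionaries as in YourClass also work, but then each counter update needs dict.get(ip, 0) + 1 or a membership check.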

Related

How to make multiple variables from one string

I have a string containing a few variables that I would like to store.
data = '{name:ItCameFr0mmars,id:2110939,score:2088205,level:43,levelProgress:35,kills:18412,deaths:6821,kdr:2.70,kpg:12.03,spk:113.42,totalGamesPlayed:1530,wins:913,loses:617,wl:0.60,playTime:2d 15h 1m,funds:2265,clan:TyDE,featured:No,hacker:false,following:0,followers:3,shots:117902,hits:38132,nukes:6,meleeKills:377,createdDate:2019-03-13,createdTime:21:38:39,lastPlayedClass:Triggerman}'
I want to assign a variable for each bit of data. For example:
level = 43
kills = 18412
and so on.
Is there a way to do this, so that each name:number pair becomes a variable with that number stored? Also, how could I make a dictionary from it?
Here is a basic parser:
for name, val in [item.split(':', maxsplit=1) for item in data.strip("{}").split(",")]:
    globals()[name] = val
print(featured)
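This works at module level. Inside a function, replacing globals with locals does not work reliably, because assigning to the dictionary returned by locals() does not create local variables.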
Usually it is better to put it into an object:
class Data():
    def __init__(self, data):
        for name, val in [item.split(':', maxsplit=1) for item in data.strip("{}").split(",")]:
            setattr(self, name, val)
obj = Data(data)
print(obj.featured)
Why don't you make it a dictionary, like this:
data = {"name":"ItCameFr0mmars","id":2110939,"score":2088205}
So you can get each value based on its key.
data["id"] will be 2110939
And if you want to print all them, you could write
for key,value in data.items():
    print(key,":",value)
But I guess this is not what you wanted to do?
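To get from the original string to such a dictionary, the same split logic can fill a dict instead of creating variables. This is only a sketch: parse_fields is a made-up helper name, the sample string is shortened, and every value stays a string unless converted explicitly.

```python
def parse_fields(data):
    """Parse a '{key:value,key:value}' string into a dict of strings."""
    result = {}
    for item in data.strip("{}").split(","):
        # maxsplit=1 keeps values that contain ':' (such as times) intact
        name, val = item.split(":", maxsplit=1)
        result[name] = val
    return result


sample = "{name:ItCameFr0mmars,level:43,kills:18412,createdTime:21:38:39}"
stats = parse_fields(sample)

# Convert individual fields only where a number is actually needed
level = int(stats["level"])
kills = int(stats["kills"])
print(stats["name"], level, kills)
```

Keeping the values in one dictionary avoids polluting the namespace and makes it easy to loop over all fields.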

How to not remove duplicates automatically when using method json.loads? [duplicate]

I need to parse a JSON file which, unfortunately for me, does not follow the expected format. I have two issues with the data, but I've already found a workaround for the second one, so I'll just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test":{
    "entry":{
        "Type":"Something"
    },
    "entry":{
        "Type":"Something_Else"
    }
}, ...
The default JSON parser updates the dictionary and therefore keeps only the last entry. I HAVE to somehow store the other one as well, and I have no idea how to do this. I also HAVE to store the keys in the several dictionaries in the same order they appear in the file, which is why I am using an OrderedDict. It works fine, so if there is any way to extend this to the duplicate entries I'd be grateful.
My second issue is that this very same JSON file contains entries like this:
"Test":{
    {
        "Type":"Something"
    }
}
The json.load() function raises an exception when it reaches that line in the JSON file. The only workaround I found was to remove the inner brackets manually.
Thanks in advance
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
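from json import JSONDecoder
def parse_object_pairs(pairs):
    # Return the (key, value) pairs untouched instead of building a dict
    return pairs
data = """
{"foo": {"baz": 42}, "foo": 7}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output (Python 2; Python 3 prints the same structure without the u prefixes):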
[(u'foo', [(u'baz', 42)]), (u'foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder
def make_unique(key, dct):
    # Append _1, _2, ... until the key no longer collides
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key
def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct
data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output (Python 2):
OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n where n is an incremental counter - just adapt it to your needs.
Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict, I included that as well.
Thanks a lot @Lukas Graf, I got it working as well by implementing my own version of the hook function:
import collections
def dict_raise_on_duplicates(ordered_pairs):
    count=0
    d=collections.OrderedDict()
    for k,v in ordered_pairs:
        if k in d:
            d[k+'_dupl_'+str(count)]=v
            count+=1
        else:
            d[k]=v
    return d
Only thing remaining is to automatically get rid of the double brackets and i am done :D Thanks again
If you would prefer to convert those duplicated keys into an array, instead of having separate copies, this could do the work:
def dict_raise_on_duplicates(ordered_pairs):
    """Convert duplicate keys to JSON array."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) is list:
                d[k].append(v)
            else:
                d[k] = [d[k],v]
        else:
            d[k] = v
    return d
And then you just use:
result = json.loads(yourString, object_pairs_hook=dict_raise_on_duplicates)
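A small end-to-end sketch of this approach, applied to input shaped like the "entry" example from the question. The hook is renamed collect_duplicates here purely for illustration, and the input is wrapped in an outer object so that it is otherwise valid JSON.

```python
import json


def collect_duplicates(ordered_pairs):
    """Build a dict, gathering values of repeated keys into a list."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if isinstance(d[k], list):
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d


text = '{"Test": {"entry": {"Type": "Something"}, "entry": {"Type": "Something_Else"}}}'
parsed = json.loads(text, object_pairs_hook=collect_duplicates)

# Both "entry" objects survive, in document order
for entry in parsed["Test"]["entry"]:
    print(entry["Type"])
```

One caveat of this design: if a duplicated key's first value is already a JSON array, later values are appended to that array, so the result cannot be distinguished from a single longer array.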

Faster way to add values to existing dictionary?

I have a dictionary of dictionaries of dictionaries
#Initialize the dictionary
myDict=dict()
for f in ncp:
    myDict[f]={}
    for t in ncp:
        myDict[f][t] = {}
And now I go through and add a value to the lowest level (which happens to be a dictionary key and value of None), like so, but my current method is very slow
stIndex = 0
for s in subsetList:
    for f in list(allNodes.intersection(set(s))):
        for t in list(allNodes.difference(set(allNodes.intersection(s)))):
            myDict[f][t]['st_'+str(stIndex)]=None
    stIndex+=1
I tried to do it with a comprehension, but I failed miserably, because the examples I find for comprehensions create the dictionary rather than iterate through an existing one to add to it. My attempt won't even 'compile':
myDict[f][t]['st_'+str(stIndex)]
for f in list(allNodes.intersection(set(s)))
for t in list(allNodes.difference(set( allNodes.intersection(s)))) = None
I would write your code like this:
myDict = {}
for i, s in enumerate(subsetList):
    key = 'st_%d' % (i,)  # One 'st_n' key per subset
    x = allNodes.intersection(s)
    others = allNodes.difference(x)  # Computed once per subset
    for f in x:
        inner = myDict.setdefault(f, {})  # Create on demand, keep earlier entries
        for t in others:
            inner.setdefault(t, {})[key] = None
This cuts down on the number of new objects you need to create, as well as initializing myDict on-demand.
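A self-contained sketch of the same on-demand idea using nested defaultdicts, with tiny made-up inputs so it can be run as is. The function name build_table and the sample data are assumptions; unlike the code in the question, pairs (f, t) that never occur are not created at all.

```python
from collections import defaultdict


def build_table(all_nodes, subset_list):
    """Map f -> t -> {'st_n': None} for f inside subset n and t outside it."""
    table = defaultdict(lambda: defaultdict(dict))
    for index, subset in enumerate(subset_list):
        key = "st_%d" % index
        inside = all_nodes.intersection(subset)
        outside = all_nodes.difference(inside)  # computed once per subset
        for f in inside:
            for t in outside:
                table[f][t][key] = None
    # Convert to plain dicts so later lookups do not create entries by accident
    return {f: dict(inner) for f, inner in table.items()}


table = build_table({1, 2, 3}, [[1], [1, 2]])
print(table)
```

Whether this is faster than the original depends on the data, so it is worth timing both on the real inputs.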
This should be faster...
from itertools import product
from collections import defaultdict
myDict = defaultdict(dict)
for f, t in product(ncp, repeat=2):
    myDict[f][t] = {}
    for stIndex, s in enumerate(subsetList):
        myDict[f][t]['st_'+str(stIndex)] = None
Or if the innermost key level is the same each time...
from itertools import product
from collections import defaultdict
innerDict = {}
for stIndex, s in enumerate(subsetList):
    innerDict['st_'+str(stIndex)] = None
myDict = defaultdict(dict)
for f, t in product(ncp, repeat=2):
    myDict[f][t] = innerDict.copy()
But I'm not sure whether creating a copy of the innermost dictionary is faster than iterating through your subsetList and creating the new dictionary each time. You'd need to time the two options.
Answering my own question here with a theory on the best approach after much trial: the final result is myDict, and it is a function of two inputs, allNodes and subsetList, both of which are effectively static tables imported from SQL at the start of my program. So why not calculate myDict once, store it in SQL, and import it as well? Instead of rebuilding it every time the program runs, which takes 2 minutes, it is just a pyodbc read of a couple of seconds. I know it's kind of a cop-out, but it works for the time being.

Pandas Dataframe to Dictionary with Multiple Keys

I am currently working with a dataframe consisting of a column of 13 letter strings ('13mer') paired with ID codes ('Accession') as such:
However, I would like to create a dictionary in which the Accession codes are the keys with values being the 13mers associated with the accession so that it looks as follows:
{'JO2176': ['IGY....', 'QLG...', 'ESS...', ...],
'CYO21709': ['IGY...', 'TVL...',.............],
...}
Which I've accomplished using this code:
Accession_13mers = {}
for group in grouped:
    Accession_13mers[group[0]] = []
    for item in group[1].items():  # Series.iteritems() in older pandas versions
        Accession_13mers[group[0]].append(item[1])
However, now I would like to go back and iterate through the keys for each Accession code and run a function I've defined as find_match_position(reference_sequence, 13mer), which finds the 13mer in a reference sequence and returns its position. I would then like to store the position as the value, with the 13mer as the key.
If anyone has any ideas for how I can expedite this process that would be extremely helpful.
Thanks,
Justin
I would suggest creating a new dictionary, whose values are another dictionary. Essentially a nested dictionary.
position_nmers = {}
for key in Accession_13mers:
    position_nmers[key] = {}  # replicate each key in the new dictionary, with a dictionary as its value
    for value in Accession_13mers[key]:
        position_nmers[key][value] = find_match_position(reference_sequence, value)
To introspect the dictionary and make sure it's okay:
print(position_nmers)
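The nested-dictionary idea can be tried without pandas on a toy example. In this sketch, find_match_position is a hypothetical stand-in built on str.find (the question's real function is not shown), and the sequences and accession codes are invented.

```python
def find_match_position(reference_sequence, mer):
    """Hypothetical stand-in: 0-based index of mer in the reference, or -1."""
    return reference_sequence.find(mer)


# Made-up data: short fragments instead of real 13mers
reference_sequence = "MKTIGYAAQLGESS"
accession_mers = {"ACC1": ["IGY", "QLG"], "ACC2": ["ESS", "XXX"]}

# accession -> {fragment: position in the reference}
position_mers = {
    accession: {mer: find_match_position(reference_sequence, mer) for mer in mers}
    for accession, mers in accession_mers.items()
}

print(position_mers["ACC1"])
```

If each accession has its own reference sequence, the lookup of the reference would move inside the outer comprehension.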
You can iterate over the groupby more cleanly by unpacking:
d = {}
for key, s in df.groupby('Accession')['13mer']:
    d[key] = list(s)
This also makes it much clearer where you should put your function!
... However, I think that it might be better suited to an enumerate:
d2 = {}
for pos, val in enumerate(df['13mer']):
    d2[val] = pos

How to go from a values_list to a dictionary of lists

I have a django queryset that returns a list of values:
[(client pk, timestamp, value, task pk), (client pk, timestamp, value, task pk),....,].
I am trying to get it to return a dictionary of this format:
{client pk:[[timestamp, value],[timestamp, value],...,], client pk:[list of lists],...,}
The values_list may have multiple records for each client pk. I have been able to get dictionaries of lists for client or task pk using:
def dict_from_vl(vls_list):
    keys=[vls_list[x][3] for x in range(0,len(vls_list),1)]
    values = [[vls_list[x][1], vls_list[x][2]] for x in range(0,len(vls_list),1)]
    target_dict=dict(zip(keys,values))
    return target_dict
However using this method, values for the same key write over previous values as it iterates through the values_list, rather than append them to a list. So this works great for getting the most recent if the values list is sorted oldest records to newest, but not for the purpose of creating a list of lists for the dict value.
Instead of target_dict=dict(zip(keys,values)), do
target_dict = defaultdict(list)
for i, key in enumerate(keys):
    target_dict[key].append(values[i])
(defaultdict is available in the standard module collections.)
from collections import defaultdict
d = defaultdict(list)
for x in vls_list:
    d[x[0]].append(list(x[1:3]))  # key on client pk, keep [timestamp, value]
Although I'm not sure if I got the question right.
I know in Python you're supposed to cram everything into a single line, but you could do it the old fashioned way...
def dict_from_vl(vls_list):
    target_dict = {}
    for v in vls_list:
        if v[0] not in target_dict:
            target_dict[v[0]] = []
        target_dict[v[0]].append([v[1], v[2]])
    return target_dict
For better speed, I suggest you don't create the keys and values lists separately but simply use only one loop:
tgt_dict = defaultdict(list)
for row in vls_list:
    tgt_dict[row[0]].append([row[1], row[2]])
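For reference, a runnable sketch of the single-loop version with invented sample rows in the (client pk, timestamp, value, task pk) layout from the question. The timestamps are plain strings here, and no Django is involved.

```python
from collections import defaultdict


def dict_from_vl(vls_list):
    """Group (client_pk, timestamp, value, task_pk) rows by client_pk."""
    grouped = defaultdict(list)
    for client_pk, timestamp, value, _task_pk in vls_list:
        grouped[client_pk].append([timestamp, value])
    return dict(grouped)


# Made-up rows standing in for the values_list result
rows = [
    (1, "2024-01-01", 10, 7),
    (2, "2024-01-01", 5, 8),
    (1, "2024-01-02", 12, 7),
]

result = dict_from_vl(rows)
print(result[1])
```

Rows are appended in the order they arrive, so sorting the queryset by timestamp first keeps each client's list chronological.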
