Python Best Way to construct Dictionary from Inputs

I'm trying to construct a nested dictionary from user inputs. The only issue is that the user can opt not to enter some of these inputs. In those cases, I want the dictionary to exclude that field completely. For instance:
ids = 1234
dmas = 5678
I would like the dictionary to look like this:
d = {"profile": {"dma_targets": dmas, "id": ids}}
However, if user decides not to include certain input:
ids = None
dmas = 5678
I would like the dictionary to look like this:
d = {"profile": {"dma_targets": dmas}}
I'm a bit stuck here, and it seems like a very simple thing, as it would be easy to do if I wanted a list instead of a dict. One of the problems I'm running into is:
x = "dma_targets":dmas
is not a valid object, so I'm having a hard time constructing the pieces of this, then adding them into the dictionary. Thanks so much!

How about a little dict comprehension?
fkeys = ['dma_targets', 'ids']
fvals = [5678, None]
d = {'profile': {k:v for (k,v) in zip(fkeys, fvals) if v is not None}}
which yields d as
{'profile': {'dma_targets': 5678}}
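The same filter can be wrapped in a small helper so that the fields are passed by name. This is only a sketch: build_profile is a hypothetical name, not something from the question.

```python
def build_profile(**fields):
    """Return {'profile': {...}} containing only the fields that are not None."""
    return {'profile': {k: v for k, v in fields.items() if v is not None}}


full = build_profile(dma_targets=5678, ids=1234)
partial = build_profile(dma_targets=5678, ids=None)
print(full)     # {'profile': {'dma_targets': 5678, 'ids': 1234}}
print(partial)  # {'profile': {'dma_targets': 5678}}
```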

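d = {'profile': {}}
# Compare against None so that falsy but valid inputs such as 0 are kept
if ids is not None:
    d['profile']['ids'] = ids
if dmas is not None:
    d['profile']['dma_targets'] = dmas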
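One detail matters when optional inputs are filtered: a truthiness test treats 0, empty strings and empty lists as missing, whereas a comparison against None keeps them. A minimal sketch with made-up values:

```python
ids = 0        # a legitimate value that happens to be falsy
dmas = 5678
pairs = [('ids', ids), ('dma_targets', dmas)]

by_truthiness = {k: v for k, v in pairs if v}
by_none_check = {k: v for k, v in pairs if v is not None}

print(by_truthiness)  # {'dma_targets': 5678}
print(by_none_check)  # {'ids': 0, 'dma_targets': 5678}
```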

If I understand correctly, you want a nested dictionary holding values of different types (Python allows this).
from collections import defaultdict
d = defaultdict(lambda: defaultdict(list))
d['a']['b'].append('bla')
d[15] = 15

Related

Lambda in defaultdict in Python

I'm following the book 'Data Science from Scratch' and this is a piece of code in it:
dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1 # now dd_pair contains {2: [0, 1]}
Can someone please help me understand why and how it works?
defaultdict takes a callable (very often a type such as list) that it uses to produce the value for a missing key. Let's say we have a dictionary called "users" with an "ID" as the key and a list as the value.
We have to check whether an "ID" exists in the dictionary; if it does not, we first put an empty list in that place, and then we append something to the list.
So with a regular dictionary, we do something like:
users = {}
if "id1" not in users:
    users["id1"] = []
users["id1"].append("log")
Now with defaultdict, all we have to do is to set an initialiser as:
from collections import defaultdict
users = defaultdict(list) # Any key not existing in the dictionary will get assigned a `list()` object, which is an empty list
users["id1"].append("log")
So coming to your code,
dd_pair = defaultdict(lambda: [0, 0])
This says, any key which doesn't exist in dd_pair will get a list of two elements initialised to 0 as their initial value. So if you just do print(dd_pair["somerandomkey"]) it should print [0,0].
Therefore, dd_pair[2][1] translates roughly to look like this:
dd_pair[2] = [0,0] # dd_pair looks like: {2:[0,0]}
dd_pair[2][1] = 1 # dd_pair looks like: {2:[0,1]}
Why the need for lambda, why not just use [0,0] ?
The defaultdict constructor expects a callable (The constructor actually expects a default_factory, check out Python docs). In extremely simple terms, if we do defaultdict(somevar), somevar() should be valid.
So, if you just pass [0,0] to defaultdict it'll be wrong since [0,0]() is not valid at all.
So what you need is a function which returns [0,0], which can be simply implemented using lambda:[0,0]. (To verify, just do (lambda:[0,0])() , it will return [0,0]).
One more way is to create a class for your specific type, which is better explained in this answer: https://stackoverflow.com/a/36320098/
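A quick way to check the points above is to run them. The sketch below suggests that the factory is called once per missing key, so every key gets its own fresh list, and that a non-callable argument is rejected.

```python
from collections import defaultdict

dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1          # missing key 2 -> factory called -> [0, 0], then index 1 set
dd_pair[3][0] = 7          # key 3 gets its own, separate list

print(dict(dd_pair))       # {2: [0, 1], 3: [7, 0]}

# A list instance is not callable, so it cannot serve as the factory.
try:
    defaultdict([0, 0])
    rejected = False
except TypeError:
    rejected = True
print(rejected)            # True
```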

Elegant way to set values in a nested json in python

I am setting up some values in a nested JSON. In the JSON, it is not necessary that the keys would always be present.
My sample code looks like below.
if 'key' not in data:
    data['key'] = {}
if 'nested_key' not in data['key']:
    data['key']['nested_key'] = some_value
Is there any other elegant way to achieve this? Simply assigning the value without if's like - data['key']['nested_key'] = some_value can sometimes throw KeyError.
I referred multiple similar questions about "getting nested JSON" on StackOverflow but none fulfilled my requirement. So I have added a new question. In case, this is a duplicate question then I'll remove this one once guided towards the right question.
Thanks
Note that a plain assignment such as data['key'] = value never needs a key check; only the nested assignment does, because data['key'] must already exist. A defaultdict takes care of that, and it is particularly helpful in case of values like lists.
from collections import defaultdict
data = defaultdict(dict)
data['key']['nested_key'] = some_value
defaultdict ensures that you never get a KeyError for a missing top-level key. If the key doesn't exist, it creates an empty object of the type you initialized it with, stores it under that key, and returns it.
List based example:
from collections import defaultdict
data = defaultdict(list)
data['key'].append(1)
which otherwise will have to be done like below:
data = {}
if 'key' not in data:
    data['key'] = []
data['key'].append(1)
Example based on existing dict:
from collections import defaultdict
data = {'key1': 'sample'}
data_new = defaultdict(dict,data)
data_new['key']['something'] = 'nothing'
print(data_new)
Output:
defaultdict(<class 'dict'>, {'key1': 'sample', 'key': {'something': 'nothing'}})
You can write in one statement:
data.setdefault('key', {})['nested_key'] = some_value
but I am not sure it looks more elegant.
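For deeper nesting, the setdefault idea can be put in a loop. The helper below is a hypothetical sketch (set_nested is not a standard function), and unlike the question's snippet it overwrites a value that is already present.

```python
def set_nested(data, keys, value):
    """Set data[k1][k2]...[kn] = value, creating intermediate dicts as needed."""
    *parents, last = keys
    for key in parents:
        data = data.setdefault(key, {})
    data[last] = value


data = {'other': 1}
set_nested(data, ['key', 'nested_key'], 'some_value')
set_nested(data, ['key', 'deeper', 'leaf'], 42)
print(data)
# {'other': 1, 'key': {'nested_key': 'some_value', 'deeper': {'leaf': 42}}}
```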
PS: if you prefer to use defaultdict as proposed by Jay, you can initialize the new dict with the original one returned by json.loads(), then pass it to json.dumps():
data2 = defaultdict(dict, data)
data2['key']['nested_key'] = some_value
json.dumps(data2) # returns the expected dict as a JSON string
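A complete round trip might look like the sketch below. The JSON text is a stand-in for the real document, and only the first level is protected against missing keys.

```python
import json
from collections import defaultdict

raw = '{"key1": "sample"}'                 # stand-in for the real JSON document
data2 = defaultdict(dict, json.loads(raw))
data2['key']['nested_key'] = 'some_value'  # 'key' is created on first access

text = json.dumps(data2)                   # defaultdict is a dict subclass, so this works
print(text)  # {"key1": "sample", "key": {"nested_key": "some_value"}}
```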

Faster way to add values to existing dictionary?

I have a dictionary of dictionaries of dictionaries
# Initialize the dictionary
myDict = dict()
for f in ncp:
    myDict[f] = {}
    for t in ncp:
        myDict[f][t] = {}
And now I go through and add a value to the lowest level (which happens to be a dictionary key and value of None), like so, but my current method is very slow
stIndex = 0
for s in subsetList:
    for f in list(allNodes.intersection(set(s))):
        for t in list(allNodes.difference(set(allNodes.intersection(s)))):
            myDict[f][t]['st_'+str(stIndex)] = None
    stIndex += 1
I tried to do it with a comprehension, but failed miserably because the examples I find for comprehensions create a new dictionary rather than iterating through an existing one to add to it. My attempt won't even 'compile':
myDict[f][t]['st_'+str(stIndex)]
for f in list(allNodes.intersection(set(s)))
for t in list(allNodes.difference(set( allNodes.intersection(s)))) = None
I would write your code like this:
myDict = {}
for i, s in enumerate(subsetList):
    key = 'st_%d' % (i,)  # Key of the {'st_n': None} entry added below
    x = allNodes.intersection(s)  # Computed once per subset
    for f in x:
        inner = myDict.setdefault(f, {})  # Keep entries from earlier subsets
        for t in allNodes.difference(x):
            inner.setdefault(t, {})[key] = None
This cuts down on the number of new objects you need to create, as well as initializing myDict on-demand.
This should be faster...
from itertools import product
from collections import defaultdict
myDict = defaultdict(dict)
for f, t in product(ncp, repeat=2):
    myDict[f][t] = {}
    for stIndex, s in enumerate(subsetList):
        myDict[f][t]['st_'+str(stIndex)] = None
Or if the innermost key level is the same each time...
from itertools import product
from collections import defaultdict
innerDict = {}
for stIndex, s in enumerate(subsetList):
    innerDict['st_'+str(stIndex)] = None
myDict = defaultdict(dict)
for f, t in product(ncp, repeat=2):
    myDict[f][t] = innerDict.copy()
But I'm not sure whether creating a copy of the innermost dictionary is faster than iterating through your subsetList and creating the new dictionary each time. You'd need to time the two options.
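One way to time the two options is the standard timeit module. The sketch below uses made-up sizes for ncp and subsetList, so the numbers only indicate how the comparison could be run, not which variant wins on the real data.

```python
from itertools import product
from timeit import timeit

ncp = range(50)                 # made-up sizes, for illustration only
subsetList = [None] * 20

def rebuild_each_time():
    """Build a new innermost dictionary for every (f, t) pair."""
    result = {}
    for f, t in product(ncp, repeat=2):
        result.setdefault(f, {})[t] = {'st_' + str(i): None
                                       for i in range(len(subsetList))}
    return result

def copy_template():
    """Build the innermost dictionary once and copy it for every pair."""
    template = {'st_' + str(i): None for i in range(len(subsetList))}
    result = {}
    for f, t in product(ncp, repeat=2):
        result.setdefault(f, {})[t] = template.copy()
    return result

print('rebuild:', timeit(rebuild_each_time, number=20))
print('copy:   ', timeit(copy_template, number=20))
```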
Answering my own question with a theory on the best approach after much trial: the final result, myDict, is a function of two inputs, allNodes and subsetList, both of which are effectively static tables imported from SQL at the start of my program. So why not calculate myDict once, store it in SQL, and import it as well? Instead of rebuilding it on every run, which takes 2 minutes, loading it is just a pyodbc read of a couple of seconds. I know it's kind of a cop-out, but it works for the time being.
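The same compute-once idea can be sketched without a database. The example below swaps in a pickle file for the SQL table, which is an assumption made purely for illustration; load_or_build and the cache path are hypothetical, and the cache has to be deleted whenever the inputs change.

```python
import os
import pickle
import tempfile

def load_or_build(path, build):
    """Return the cached object at `path`, building and saving it on first use."""
    if os.path.exists(path):
        with open(path, 'rb') as fh:
            return pickle.load(fh)
    result = build()
    with open(path, 'wb') as fh:
        pickle.dump(result, fh)
    return result

cache_path = os.path.join(tempfile.mkdtemp(), 'myDict.pickle')  # hypothetical location
first = load_or_build(cache_path, lambda: {'a': {'b': {'st_0': None}}})  # built and saved
second = load_or_build(cache_path, lambda: {})                           # read from the cache
print(first == second)  # True
```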

How do I make a list with the same name as a dictionary key?

I have a dictionary, containing several hundred entries, of format:
>>>dict
{'1620': 'aaaaaa'}
I would like to make new empty lists named '1620', etc. I have tried variations of the following but it doesn't recognize eachkey as a variable to be used when creating the list. Instead, it names the list literally "eachkey" and my key, in this example '1620', is not connected to the new list.
>>>for eachkey in dict.keys():
>>> eachkey=[]
>>>
>>>eachkey
[]
>>>'1620'
1620
Edited to add:
Maybe I could make the list at the same time as I make the dictionary? Slip it in here below? The str(eachfile[-4:]) is what I want the list named.
files=open(sys.argv[1])
dict={}
for eachfile in files:
    value=open(eachfile)
    key=str(eachfile[-4:])
    dict[key]=value
    eachfile.close()
Edit: it would be fine for me to add letters along w/ the numbers if that's what it needs.
I don't think it's possible to change the integer literal 1620 so that it gives you an object other than the integer 1620. Similarly I don't think you can change the string literal '1620' to give you a list instead of a string.
You could do it if you prefix the variable names with some letters to make them valid names. For example you could use my1620 instead of 1620. I wouldn't advise doing this, and writing to locals() only has an effect at module level (where it is the same dictionary as globals()), but it's possible:
>>> d = {'1620': 'aaaaaa'}
>>> for k, v in d.items():
...     locals()['my'+k] = []
...
>>> my1620
[]
With a dict like this:
d1 = {'foo':'bar', '1621':'hello'}
Try doing this:
d2 = dict((k,list()) for k in d1.keys())
Now d2 is:
{'foo': [], '1621': []}
And you can reference your lists like so:
d2['1621'].append(20)
d2['foo'].append(5)
d2['foo'].append('zig')
Which makes d2:
{'foo': [5, 'zig'], '1621': [20]}
As Gareth said, it's VERY unlikely you really want to do what you're asking to do. This is probably better.
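If the lists are meant to be filled while the keys are being discovered, a defaultdict(list) creates each list on first use. A minimal sketch, assuming made-up file names whose last four characters serve as the key:

```python
from collections import defaultdict

# Hypothetical file names; the last four characters serve as the key.
filenames = ['sample_1620', 'sample_1621', 'other_1620']

lists_by_key = defaultdict(list)
for name in filenames:
    key = name[-4:]
    lists_by_key[key].append(name)   # the list for `key` is created on first use

print(dict(lists_by_key))
# {'1620': ['sample_1620', 'other_1620'], '1621': ['sample_1621']}
```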

Pythonic way to parse list of dictionaries for a specific attribute?

I want to cross reference a dictionary and django queryset to determine which elements have unique dictionary['name'] and djangoModel.name values, respectively. The way I'm doing this now is to:
Create a list of the dictionary['name'] values
Create a list of djangoModel.name values
Generate the list of unique values by checking for inclusion in those lists
This looks as follows:
alldbTests = dbp.test_set.exclude(end_date__isnull=False) #django queryset
vctestNames = [vctest['name'] for vctest in vcdict['tests']] #from dictionary
dbtestNames = [dbtest.name for dbtest in alldbTests] #from django model
# Compare tests in protocol in fortytwo's db with protocol from vc
obsoleteTests = [dbtest for dbtest in alldbTests if dbtest.name not in vctestNames]
newTests = [vctest for vctest in vcdict if vctest['name'] not in dbtestNames]
It feels unpythonic to have to generate the intermediate list of names (lines 2 and 3 above), just to be able to check for inclusion immediately after. Am I missing anything? I suppose I could put two list comprehensions in one line like this:
obsoleteTests = [dbtest for dbtest in alldbTests if dbtest.name not in [vctest['name'] for vctest in vcdict['tests']]]
But that seems harder to follow.
Edit:
Think of the initial state like this:
# alldbTests is a queryset of django models where the following are all true
alldbTests[0].name == 'test1'
alldbTests[1].name == 'test2'
alldbTests[2].name == 'test4'
dict1 = {'name':'test1', 'status':'pass'}
dict2 = {'name':'test2', 'status':'pass'}
dict3 = {'name':'test5', 'status':'fail'}
vcdict = [dict1, dict2, dict3]
I can't convert to sets and take the difference unless I strip things down to just the name string, but then I lose access to the rest of the model/dictionary, right? Sets only would work here if I had the same type of object in both cases.
vctestNames = dict((vctest['name'], vctest) for vctest in vcdict['tests'])
dbtestNames = dict((dbtest.name, dbtest) for dbtest in alldbTests)
# Obsolete: present in the database but no longer in vc
obsoleteTests = [dbtestNames[key]
                 for key in set(dbtestNames.keys()) - set(vctestNames.keys())]
# New: present in vc but not yet in the database
newTests = [vctestNames[key]
            for key in set(vctestNames.keys()) - set(dbtestNames.keys())]
You're working with basic set operations here. You could convert the names to sets and just take the difference (think Venn diagrams):
obsoleteTests = list(set([a.name for a in alldbTests]) - set(vctestNames))
Sets are really useful when comparing two collections of objects:
set(a) - set(b)   # elements in a but not in b
set(a) | set(b)   # elements in a or in b
set(a) & set(b)   # elements in both a and b
The intersection and difference operations of sets should help you solve your problem more elegantly.
But as you're originally dealing with dicts these examples and discussion may provide some inspirations: http://code.activestate.com/recipes/59875-finding-the-intersection-of-two-dicts
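Applied to the sample data from the question's edit, the idea of indexing both sides by name can be sketched as follows. SimpleNamespace objects stand in for the Django model instances, and db_by_name and vc_by_name are illustrative names.

```python
from types import SimpleNamespace

# Stand-ins for the Django model instances and the vc dictionaries.
alldbTests = [SimpleNamespace(name=n) for n in ('test1', 'test2', 'test4')]
vctests = [
    {'name': 'test1', 'status': 'pass'},
    {'name': 'test2', 'status': 'pass'},
    {'name': 'test5', 'status': 'fail'},
]

db_by_name = {t.name: t for t in alldbTests}
vc_by_name = {t['name']: t for t in vctests}

# dict.keys() views support set operations directly.
obsoleteTests = [db_by_name[n] for n in db_by_name.keys() - vc_by_name.keys()]
newTests = [vc_by_name[n] for n in vc_by_name.keys() - db_by_name.keys()]

print([t.name for t in obsoleteTests])  # ['test4']
print(newTests)                         # [{'name': 'test5', 'status': 'fail'}]
```

Because the full objects are kept as dictionary values, nothing is lost by comparing on the name alone.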
