From a python dictionary to list of settings - python

I want to be able to transform a dictionary into a list of options that
can be set (with the full path). For example, this should pass:
def test_dic_to_args(self):
    dic = {"x1": {"x2": "val1"}, "x2": "val3"}
    des = ["x1.x2:val1", "x2:val3"]
    self.assertEqual(conf.dict_to_args(dic), des)
Now I started to write it, thinking it was easy, but it's trickier
than I thought, with queues, type checking and so on.
Is there a smart way to solve this problem?
Maybe the best option is still a recursive DFS, what do you think?

If the dictionary is supposed to be arbitrarily nested, a recursive approach is most probably easiest.
def dict_to_args(d, prefix=()):
    for k, v in d.iteritems():
        if isinstance(v, dict):
            for x in dict_to_args(v, prefix + (k,)):
                yield x
        else:
            yield ".".join(prefix + (k,)) + ":" + v
Example:
>>> list(dict_to_args(dic))
['x2:val3', 'x1.x2:val1']
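For Python 3 readers: iteritems() is gone and yield from can replace the inner loop. A minimal sketch of the same generator (sorted in the demo only to make the output deterministic):

```python
def dict_to_args(d, prefix=()):
    # Depth-first walk, carrying the key path as a tuple.
    for k, v in d.items():
        if isinstance(v, dict):
            yield from dict_to_args(v, prefix + (k,))
        else:
            yield ".".join(prefix + (k,)) + ":" + v

dic = {"x1": {"x2": "val1"}, "x2": "val3"}
print(sorted(dict_to_args(dic)))  # ['x1.x2:val1', 'x2:val3']
```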

Related

Print 'key' if given 'value' matches in python dictionary

I want to check whether a 'value' exists in a Python dictionary and, if it matches, print the 'key'. The problem is that the values are stored in lists.
This script gives the server name based on the provided domain name. It queries the predefined nameserver and prints output accordingly.
I have tried following but it keeps giving me the same output.
if [k for k, v in servers.iteritems() if answer in v]:
    print "\nThe domain is in " + v + ".\n"
The script is as follows. Any suggestions other than the original one are welcome.
#!/usr/bin/python
import dns.resolver
import sys

servers = {
    'UK': ['127.0.0.1'],
    'USA': ['127.0.0.2', '127.0.0.3', '127.0.0.4'],
    'AUS': ['127.0.1.1', '127.0.1.2']
}

website = sys.argv[1]

try:
    nameserver = dns.resolver.Resolver(configure=False)
    nameserver.nameservers = ['198.40.3.6', '198.40.3.7']
    answer = nameserver.query(website)[0]
    answer = str(answer)
    if [k for k, v in servers.iteritems() if answer in v]:
        print "\nThe domain is in " + v + ".\n"
except Exception as e:
    print str(e)
It should give the correct 'key', but it doesn't; it keeps giving the same output.
The logic of your if check and the print statement following it are faulty. There's nothing specifically incorrect about how you're finding keys (though you could do it more efficiently), but you're not using that result at all in the rest of your code, so it doesn't really matter.
Try changing your code to this:
matched_keys = [k for k, v in servers.iteritems() if answer in v]  # same list comp as before
if matched_keys:
    print "\nThe domain is in " + str(matched_keys) + ".\n"  # or maybe use matched_keys[0]?
The way I coded it above will print out the list of all keys that have the answer in them, if there are any. If you're sure there can only be one result, you can use matched_keys[0].
Note that if you expect to be doing this sort of check a lot, with the same set of servers, you should probably change your data structure so that you can do a more efficient check. The current one is O(M*N), where M is the number of checks you need to do and N is the number of values in the dictionary. You could turn that into O(M) by constructing a reversed dictionary:
reversed_servers = {}
for k, v in servers.iteritems():
    for address in v:
        reversed_servers.setdefault(address, []).append(k)  # or reversed_servers[address] = k
You only need to do this setup once. Later you can do any number of efficient lookups with just reversed_servers[answer], with no loop needed.
Note that the code above sets up a dictionary containing lists of all matching keys. If there can be only one key for each address (because the values are unique), you can use the alternative version in the comment, which maps from address to key directly (without a list).
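A self-contained Python 3 sketch of this reversed-dictionary setup, using the servers data from the question (items() in place of iteritems()):

```python
servers = {
    'UK': ['127.0.0.1'],
    'USA': ['127.0.0.2', '127.0.0.3', '127.0.0.4'],
    'AUS': ['127.0.1.1', '127.0.1.2'],
}

reversed_servers = {}
for k, v in servers.items():
    for address in v:
        # Collect every key that lists this address.
        reversed_servers.setdefault(address, []).append(k)

# Each later lookup is a single O(1) dictionary access.
print(reversed_servers.get('127.0.0.3'))  # ['USA']
```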
Try this:
result = [k for k in servers.keys() if answer in servers[k]]
if result:
    print "\nThe domain is in " + result[0] + ".\n"
To print the corresponding key, you can use result[0].

Advice on making my JSON find-all-nested-occurrences method cleaner

I am parsing an unknown nested JSON object; I do not know the structure or depth ahead of time. I am trying to search through it to find a value. This is what I came up with, but I find it fugly. Could anybody let me know how to make this look more Pythonic and cleaner?
def find(d, key):
    if isinstance(d, dict):
        for k, v in d.iteritems():
            try:
                if key in str(v):
                    return 'found'
            except:
                continue
            if isinstance(v, dict):
                for key, value in v.iteritems():
                    try:
                        if key in str(value):
                            return "found"
                    except:
                        continue
                if isinstance(v, dict):
                    find(v)
                elif isinstance(v, list):
                    for x in v:
                        find(x)
    if isinstance(d, list):
        for x in d:
            try:
                if key in x:
                    return "found"
            except:
                continue
            if isinstance(v, dict):
                find(v)
            elif isinstance(v, list):
                for x in v:
                    find(x)
    else:
        if key in str(d):
            return "found"
        else:
            return "Not Found"
It is generally more "Pythonic" to use duck typing; i.e., just try to search for your target rather than using isinstance. See What are the differences between type() and isinstance()?
However, your need for recursion makes it necessary to recurse the values of the dictionaries and the elements of the list. (Do you also want to search the keys of the dictionaries?)
The in operator works on strings, lists, and dictionaries alike, so there is no need to separate the dictionaries from the lists when testing for membership. Assuming you don't want to match the target as a substring, do use an isinstance check against basestring per the previous link. To test whether your target is among the values of a dictionary, test for membership in your_dictionary.values(). See Get key by value in dictionary
Because the dictionary values might themselves be lists or dictionaries, I still might test for dictionary and list types the way you did. But since you ask about being Pythonic: you can cover both list elements and dictionary keys with a single statement, and using an overloaded operator like in across two types is typical of Python.
Your idea to use recursion is necessary, but I wouldn't name the function find: that name collides with common names like str.find, so another programmer might mistakenly think you're calling one of those, and the recursive call becomes less readable.
To test for numeric types, use `numbers.Number` as described at How can I check if my python object is a number?
Also, there is a solution to a variation of your problem at https://gist.github.com/douglasmiranda/5127251 . I found that before posting because ColdSpeed's regex suggestion in the comment made me wonder if I were leading you down the wrong path.
So something like
import numbers

def recursively_search(object_from_json, target):
    if isinstance(object_from_json, (basestring, numbers.Number)):
        return object_from_json == target  # the recursion base cases
    elif isinstance(object_from_json, list):
        for element in object_from_json:
            if recursively_search(element, target):
                return True  # quit at first match
    elif isinstance(object_from_json, dict):
        if target in object_from_json:
            return True  # among the keys
        else:
            for value in object_from_json.values():
                if recursively_search(value, target):
                    return True  # quit upon finding first match
    else:
        print("recursively_search() did not anticipate type ", type(object_from_json))
        return False
    return False  # no match found among the list elements, dict keys, nor dict values
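For Python 3, where basestring no longer exists, the same idea can be sketched with str and any() (the sample data here is mine, just for illustration):

```python
import numbers

def recursively_search(obj, target):
    """Return True if target appears anywhere in the nested structure."""
    if isinstance(obj, (str, numbers.Number)):
        return obj == target          # base cases: compare the scalar itself
    if isinstance(obj, list):
        return any(recursively_search(element, target) for element in obj)
    if isinstance(obj, dict):
        if target in obj:             # match among the keys
            return True
        return any(recursively_search(value, target) for value in obj.values())
    return False                      # unanticipated type: no match

data = {"a": [1, {"b": "needle"}], "c": 2}
print(recursively_search(data, "needle"))   # True
print(recursively_search(data, "missing"))  # False
```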

Fastest way to update a dictionary in python

I have a dictionary A and a possible entry foo. I know that A[foo] should be equal to x, but I don't know whether A[foo] has already been defined. In any case, if A[foo] has been defined, it already has the correct value.
Is it faster to execute:
if foo not in A.keys():
    A[foo] = x
or to simply update:
A[foo] = x
I suspect the second, because by the time the computer has found the foo entry, it can just as well update it; otherwise it would have to search the hash table twice.
Thanks.
Just add items to the dictionary without checking for their existence. I added 100,000 items to a dictionary using 3 different methods and timed it with the timeit module.
if k not in d: d[k] = v
d.setdefault(k, v)
d[k] = v
Option 3 was the fastest, but not by much.
[ Actually, I also tried if k not in d.keys(): d[k] = v, but that was slower by a factor of 300 (each iteration built a list of keys and performed a linear search). It made my tests so slow that I left it out here. ]
Here's my code:
import timeit

setup = """
import random
random.seed(0)
item_count = 100000
# divide key range by 5 to ensure lots of duplicates
items = [(random.randint(0, item_count/5), 0) for i in xrange(item_count)]
"""
in_dict = """
d = {}
for k, v in items:
    if k not in d:
        d[k] = v
"""
set_default = """
d = {}
for k, v in items:
    d.setdefault(k, v)
"""
straight_add = """
d = {}
for k, v in items:
    d[k] = v
"""

print 'in_dict      ', timeit.Timer(in_dict, setup).timeit(1000)
print 'set_default  ', timeit.Timer(set_default, setup).timeit(1000)
print 'straight_add ', timeit.Timer(straight_add, setup).timeit(1000)
And the results:
in_dict 13.090878085
set_default 21.1309413091
straight_add 11.4781760635
Note: This is all pretty pointless. We get many questions daily about the fastest way to do x or y in Python. In most cases, it is clear that the question was asked before any performance issues were encountered. My advice? Focus on writing the clearest program you can, and if it's too slow, profile it and optimize where needed. In my experience, I almost never get to the profile-and-optimize step. From the description of the problem, it seems as if dictionary storage will not be the major bottleneck in your program.
Using the built-in update() function is even faster. I tweaked Steven Rumbalski's example above a bit and it shows how update() is the fastest. There are at least two ways to use it (with a list of tuples or with another dictionary). The former (shown below as update_method1) is the fastest. Note that I also changed a couple of other things about Steven Rumbalski's example. My dictionaries will each have exactly 100,000 keys but the new values have a 10% chance of not needing to be updated. This chance of redundancy will depend on the nature of the data that you're updating your dictionary with. In all cases on my machine, my update_method1 was the fastest.
import timeit

setup = """
import random
random.seed(0)
item_count = 100000
existing_dict = dict([(str(i), random.randint(1, 10)) for i in xrange(item_count)])
items = [(str(i), random.randint(1, 10)) for i in xrange(item_count)]
items_dict = dict(items)
"""
in_dict = """
for k, v in items:
    if k not in existing_dict:
        existing_dict[k] = v
"""
set_default = """
for k, v in items:
    existing_dict.setdefault(k, v)
"""
straight_add = """
for k, v in items:
    existing_dict[k] = v
"""
update_method1 = """
existing_dict.update(items)
"""
update_method2 = """
existing_dict.update(items_dict)
"""

print 'in_dict        ', timeit.Timer(in_dict, setup).timeit(1000)
print 'set_default    ', timeit.Timer(set_default, setup).timeit(1000)
print 'straight_add   ', timeit.Timer(straight_add, setup).timeit(1000)
print 'update_method1 ', timeit.Timer(update_method1, setup).timeit(1000)
print 'update_method2 ', timeit.Timer(update_method2, setup).timeit(1000)
This code resulted in the following results:
in_dict 10.6597309113
set_default 19.3389420509
straight_add 11.5891621113
update_method1 7.52693581581
update_method2 9.10132408142
if foo not in A.keys():
    A[foo] = x
is very slow, because A.keys() creates a list, which has to be searched in O(N).
if foo not in A:
    A[foo] = x
is faster, because it takes O(1) to check whether foo exists in A.
A[foo] = x
is even better, because you already have the object x and you just add (if it does not already exist) a pointer to it in A.
There are certainly faster ways than your first example. But I suspect the straight update will be faster than any test.
foo not in A.keys()
will, in Python 2, create a new list with the keys and then perform linear search on it. This is guaranteed to be slower (although I mainly object to it because there are alternatives that are faster and more elegant/idiomatic).
A[foo] = x
and
if foo not in A:
A[foo] = x
are different if A[foo] already exists but is not x. But since you "know" A[foo] will be x, it doesn't matter semantically. Anyway, both will be fine performance-wise (hard to tell without benchmarking, although intuitively I'd say the if takes much more time than copying a pointer).
So the answer is clear anyway: Choose the one that is much shorter code-wise and just as clear (the first one).
If you "know" that A[foo] "should be" equal to x, then I would just do:
assert A[foo] == x
which will tell you if your assumption is wrong!
A.setdefault(foo, x), but I'm not sure it is faster than if not A.has_key(foo): A[foo] = x. Should be tested.

Concatenate sequence from a predefined datastructure

I've been struggling a little to build this piece of code, and I was wondering if there is a simpler/more efficient way of doing it:
fsSchema = {'published': {'renders': {'SIM': ('fold1', 'fold2'), 'REN': ('fold1', 'fold2')}}}

def __buildPathFromSchema(self, schema, root=''):
    metaDirs = []
    for dir_ in schema.keys():
        root = os.path.join(root, dir_)
        if isinstance(schema[dir_], dict):
            return self.__buildPathFromSchema(schema[dir_], root)
        if isinstance(schema[dir_], tuple):
            for i in schema[dir_]:
                bottom = os.path.join(root, i)
                metaDirs.append(bottom)
            root = os.sep.join(os.path.split(root)[:-1])
    return metaDirs
Basically what I want to do is generate paths from a predefined structure like fsSchema. Note that the deepest level is always a tuple.
The output looks like:
['published\\renders\\REN\\fold1',
 'published\\renders\\REN\\fold2',
 'published\\renders\\SIM\\fold1',
 'published\\renders\\SIM\\fold2']
Thanks!
You can use a recursive function to generate all the paths:
def flatten(data):
    if isinstance(data, tuple):
        for v in data:
            yield v
    else:
        for k in data:
            for v in flatten(data[k]):
                yield k + '\\' + v
This should be able to handle any kind of nested dictionaries:
>>> fsSchema = {'published': {'renders': {'SIM': ('fold1', 'fold2'), 'REN': ('fold1', 'fold2')}}}
>>> list(flatten(fsSchema))
['published\\renders\\REN\\fold1', 'published\\renders\\REN\\fold2', 'published\\renders\\SIM\\fold1', 'published\\renders\\SIM\\fold2']
Note that the paths are generated in "random" order, since these dictionaries make no ordering guarantee (CPython dicts do preserve insertion order as of Python 3.7, but this code targets Python 2).
Instead of:
for dir_ in schema.keys():
    ...
    if isinstance(schema[dir_], dict):
you can do:
for dir_name, dir_content in schema.iteritems():
    ...
    if isinstance(dir_content, tuple):
It's both faster and more readable.
I would keep doing it recursively like you already are but split the walker off from the path generator:
def walk(data):
    if hasattr(data, 'items'):
        for outer_piece, subdata in data.items():
            for inner_piece in walk(subdata):
                yield (outer_piece,) + inner_piece
    else:
        for piece in data:
            yield (piece,)

def paths(data):
    for path in walk(data):
        yield os.sep.join(path)
The reason is that these are really two separate pieces of functionality, and having them implemented as separate functions is hence easier to debug, maintain, implement, and just generally think about.
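Run against the schema from the question, the split version produces the expected paths. A self-contained Python 3 rendering (sorted in the demo only so the output order is stable):

```python
import os

def walk(data):
    # Dictionaries recurse; the bottom-level tuples yield one-element paths.
    if hasattr(data, 'items'):
        for outer_piece, subdata in data.items():
            for inner_piece in walk(subdata):
                yield (outer_piece,) + inner_piece
    else:
        for piece in data:
            yield (piece,)

def paths(data):
    for path in walk(data):
        yield os.sep.join(path)

fsSchema = {'published': {'renders': {'SIM': ('fold1', 'fold2'),
                                      'REN': ('fold1', 'fold2')}}}
print(sorted(paths(fsSchema)))
```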

Finding matching keys in two large dictionaries and doing it fast

I am trying to find corresponding keys in two different dictionaries. Each has about 600k entries.
Say for example:
myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
myNames = { 'Actinobacter': '8924342' }
I want to print out the value for Actinobacter (8924342) since it matches a value in myRDP.
The following code works, but is very slow:
for key in myRDP:
    for jey in myNames:
        if key == jey:
            print key, myNames[key]
I've tried the following but it always results in a KeyError:
for key in myRDP:
    print myNames[key]
Is there perhaps a function implemented in C for doing this? I've googled around but nothing seems to work.
Thanks.
Use sets, because they have a built-in intersection method which ought to be quick:
myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
myNames = { 'Actinobacter': '8924342' }

rdpSet = set(myRDP)
namesSet = set(myNames)
for name in rdpSet.intersection(namesSet):
    print name, myNames[name]
# Prints: Actinobacter 8924342
You could do this:
for key in myRDP:
    if key in myNames:
        print key, myNames[key]
Your first attempt was slow because you were comparing every key in myRDP with every key in myNames. In algorithmic jargon, if myRDP has n elements and myNames has m elements, then that algorithm would take O(n×m) operations. For 600k elements each, this is 360,000,000,000 comparisons!
But testing whether a particular element is a key of a dictionary is fast -- in fact, this is one of the defining characteristics of dictionaries. In algorithmic terms, the key in dict test is O(1), or constant-time. So my algorithm will take O(n) time, which is one 600,000th of the time.
In Python 3 you can just do
myNames.keys() & myRDP.keys()
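In Python 3, dict key views behave like sets, so the & intersection works on them directly. A quick demonstration with the question's data:

```python
myRDP = {'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT'}
myNames = {'Actinobacter': '8924342'}

# Key views support set operations, no explicit set() needed.
for key in myNames.keys() & myRDP.keys():
    print(key, myNames[key])  # Actinobacter 8924342
```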
for key in myRDP:
    name = myNames.get(key, None)
    if name:
        print key, name
dict.get returns the default value you give it (in this case, None) if the key doesn't exist.
You could start by finding the common keys and then iterating over them. Set operations should be fast because they are implemented in C, at least in modern versions of Python.
common_keys = set(myRDP).intersection(myNames)
for key in common_keys:
    print key, myNames[key]
The best and easiest way is simply to perform common set operations (Python 3).
a = {"a": 1, "b":2, "c":3, "d":4}
b = {"t1": 1, "b":2, "e":5, "c":3}
res = a.items() & b.items() # {('b', 2), ('c', 3)} For common Key and Value
res = {i[0]:i[1] for i in res} # In dict format
common_keys = a.keys() & b.keys() # {'b', 'c'}
Cheers!
Use the get method instead:
for key in myRDP:
    value = myNames.get(key)
    if value is not None:
        print key, "=", value
You can simply write this code and it will save the common keys in a list.
common = [i for i in myRDP.keys() if i in myNames.keys()]
Copy both dictionaries into one dictionary/array. This makes sense as you have 1:1 related values. Then you need only one search, no comparison loop, and can access the related value directly.
Example Resulting Dictionary/Array:
[Name][Value1][Value2]
[Actinobacter][GATCGA...TCA][8924342]
[XYZbacter][BCABCA...ABC][43594344]
...
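One way to sketch that combined structure in Python 3 is a single dict mapping each name to a (sequence, id) pair; names absent from myNames simply get None. The tuple layout is my choice for illustration, not something specified by the answer:

```python
myRDP = {'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT'}
myNames = {'Actinobacter': '8924342'}

# One dict per name, holding (sequence, id); a single lookup
# then retrieves both related values at once.
combined = {name: (seq, myNames.get(name)) for name, seq in myRDP.items()}
print(combined['Actinobacter'])  # ('GATCGA...TCA', '8924342')
```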
Here is my code for doing intersections, unions, differences, and other set operations on dictionaries:
class DictDiffer(object):
    """
    Calculate the difference between two dictionaries as:
    (1) items added
    (2) items removed
    (3) keys same in both but changed values
    (4) keys same in both and unchanged values
    """
    def __init__(self, current_dict, past_dict):
        self.current_dict, self.past_dict = current_dict, past_dict
        self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
        self.intersect = self.set_current.intersection(self.set_past)
    def added(self):
        return self.set_current - self.intersect
    def removed(self):
        return self.set_past - self.intersect
    def changed(self):
        return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
    def unchanged(self):
        return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])

if __name__ == '__main__':
    import unittest
    class TestDictDifferNoChanged(unittest.TestCase):
        def setUp(self):
            self.past = dict((k, 2*k) for k in range(5))
            self.current = dict((k, 2*k) for k in range(3, 8))
            self.d = DictDiffer(self.current, self.past)
        def testAdded(self):
            self.assertEqual(self.d.added(), set((5, 6, 7)))
        def testRemoved(self):
            self.assertEqual(self.d.removed(), set((0, 1, 2)))
        def testChanged(self):
            self.assertEqual(self.d.changed(), set())
        def testUnchanged(self):
            self.assertEqual(self.d.unchanged(), set((3, 4)))
    class TestDictDifferNoCUnchanged(unittest.TestCase):
        def setUp(self):
            self.past = dict((k, 2*k) for k in range(5))
            self.current = dict((k, 2*k+1) for k in range(3, 8))
            self.d = DictDiffer(self.current, self.past)
        def testAdded(self):
            self.assertEqual(self.d.added(), set((5, 6, 7)))
        def testRemoved(self):
            self.assertEqual(self.d.removed(), set((0, 1, 2)))
        def testChanged(self):
            self.assertEqual(self.d.changed(), set((3, 4)))
        def testUnchanged(self):
            self.assertEqual(self.d.unchanged(), set())
    unittest.main()
def combine_two_json(json_request, json_request2):
    intersect = {}
    for item in json_request.keys():
        if item in json_request2.keys():
            intersect[item] = json_request2.get(item)
    return intersect
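For comparison, the same intersection can be written as a single dict comprehension (a Python 3 sketch; behavior matches the loop version, keeping only keys present in both and taking values from the second dict):

```python
def combine_two_json(json_request, json_request2):
    # Keep only the keys present in both dicts, with values from the second.
    return {k: json_request2[k] for k in json_request if k in json_request2}

print(combine_two_json({"x": 1, "y": 2}, {"y": 20, "z": 30}))  # {'y': 20}
```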
