How to traverse a JSON tree (dictionary) in parallel using Python? - python

{
    'A': {
        'a': [],
        'aa': val
    },
    'B': {
        'b': [],
        'bb': val
    },
    'C': {
        'c': [],
        'cc': val
    }
}
Suppose this tree has a thousand branches like 'A', 'B', 'C'. It's easy to traverse and search them one by one, from 'A' to 'C', but that consumes a lot of time.
So I want to traverse and search for 'val' in the A, B, and C branches at once, using some kind of parallelism.

I doubt you can speed this up much with parallelization, but if you have a lot of keys you can use joblib to try to make it faster:
from joblib import Parallel, delayed

def parallel_process(key):
    # run your search/processing function on one branch of the dict
    return your_function(your_dict[key])

# n_jobs=-1 uses all available CPU cores
result = Parallel(n_jobs=-1)(delayed(parallel_process)(key) for key in your_dict)
The result will be a list with one entry per key. Note that with joblib's default process-based backend the workers do not share memory, so mutating a common variable inside parallel_process will not propagate back to the parent; collecting the returned list (and, say, zipping it with the keys into a result dict) is the reliable approach.
Your question does not say what you want the end result to be, but the above is a solid starting point.
You should measure whether it actually makes things faster; generally speaking, my intuition says your_function has to be quite "heavy", and you need an enormous number of keys, for the parallel overhead to be worth it.
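For the concrete search in the question, your_function could be a small depth-first scan of one branch. A minimal sketch, assuming the find_value helper below (a made-up name, not part of joblib):

from joblib import Parallel, delayed

def find_value(node, target):
    # depth-first scan of one branch; True if target appears anywhere below it
    if isinstance(node, dict):
        return any(find_value(v, target) for v in node.values())
    if isinstance(node, list):
        return any(find_value(v, target) for v in node)
    return node == target

tree = your_dict  # the JSON tree from the question
hits = Parallel(n_jobs=-1)(delayed(find_value)(tree[key], 'val') for key in tree)
matching_branches = [key for key, hit in zip(tree, hits) if hit]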

Related

Pythonic way to write REST wrapper function that takes optional arguments

I have a REST API and want to write a wrapper around it in Python for others to use. It's a search API and each parameter is treated as an AND.
Example:
api/search/v1/search_parameters[words]=cat cute fluffy&search_parameters[author_id]=12345&search_parameters[filter][orientation]=horizontal
What's the most Pythonic way to write a function that takes all these arguments, requiring at least one search_parameters key and value to be specified?
My wrapper function would look something like the stub below, but I'm lost as to how the user can pass multiple search parameters into this search API call:
def search(self):
    url = BASE_URL + search_param_url
    response = self.session.get(url)
    return response.json()
In the end, users should be able to just call something like api.search()
Disclaimer: questions about what is most Pythonic (best/prettiest) can attract unnecessary discussion (and create a distraction), yielding inconclusive results. My personal recommendation, above reusing the preferences of any particular part of the community, is: be consistent across your code and in how you design your interfaces, and think of those who will use them (including yourself 12 months down the road). "The best" solution is usually a function of the intended purpose, not a universal constant (even though some ways are more recommendable than others). That said:
If I understand correctly, your parameters are key=value pairs in nature (and you will expand them into the URL as search_parameters[key]=value); the filter and orientation in your example throw me off, though. If that reading is wrong, please describe it a bit more and I can revisit my suggestion. For key=value pairs, a dictionary offers itself as a good choice. To get one, your method could be either:
def search(self, search_kwargs):
    ...
And you expect your user to pass a dict of parameters (args_dict = {'string': 'xxx', ...}; c.search(args_dict)). Or:
def search(self, **kwargs):
    ...
And you expect your user to pass key/value pairs as keyword arguments of the method (c.search(string='xxx')). I would probably favor the former option. A dict is flexible when you prepare the parameters (and yes, you could also pass a dict in the latter case, but that rather defeats the purpose of keyword-argument expansion; always choose the simpler option that achieves the same goal).
In any case, you can just take the dict (my_args stands for either of the two above). Check that you have at least one of the required keys:
if not ('string' in my_args or 'value' in my_args):
    raise SearchParamsError("Require 'string' or 'value'.")
Perform any other sanity checks. Prepare params to be appended to the URL:
url_params = '&'.join('{}={}'.format(k, my_args[k]) for k in my_args)
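A note in passing: the one-liner above does not URL-escape the values. If that matters, the standard library can join and quote in one go; a sketch using urllib.parse.urlencode (Python 3), with the bracketed key names following the question's convention:

from urllib.parse import urlencode

my_args = {'string': 'cat cute fluffy', 'author_id': 12345}
url_params = urlencode({'search_parameters[{}]'.format(k): v for k, v in my_args.items()})
# 'search_parameters%5Bstring%5D=cat+cute+fluffy&search_parameters%5Bauthor_id%5D=12345'
# (the brackets are percent-encoded, which servers normally accept)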
That's the trivial stuff. But depending on your needs and usage, you might actually introduce a (e.g.) SearchRequest class whose constructor takes an initial set of parameters, similar to the method described above, but with further method(s) allowing you to manipulate the search (add more parameters) before executing it. Each parameter addition could then be subject to its own validity check. You could make the instance callable to execute the search itself (in the corresponding method), or pass the instance to a search method that takes a prepared request as its argument.
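A minimal sketch of that idea, assuming session is a requests.Session and BASE_URL is the constant from the question (class and method names here are illustrative, not a fixed API):

class SearchRequest:
    def __init__(self, session, **params):
        self.session = session
        self.params = {}
        self.add(**params)

    def add(self, **params):
        # per-parameter validity checks could live here
        self.params.update(params)
        return self  # allow chaining: req.add(a=1).add(b=2)

    def __call__(self):
        url_params = '&'.join(
            'search_parameters[{}]={}'.format(k, v) for k, v in self.params.items())
        return self.session.get(BASE_URL + '?' + url_params).json()

Usage could then read: result = SearchRequest(session, string='xxx').add(author_id=12345)().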
Updated based on a bit more insight from the comments.
If your API actually uses (arbitrarily) nested mapping objects, a dictionary is still a good structure to hold your parameters. I'd pick one of the following two options.
You can use nested dictionaries, which gives you flexibility in describing the request and more accurately reflects how your REST API understands its data -> the way you form your request is more similar to how the REST API describes it. However, the keyword arguments mentioned above are no longer an option (or not without extra work similar to the next option, plus some more translation), and the nested structure can make simple cases less convenient to write. E.g.:
my_dict = {'string': 'foo bar',
           'author_id': 12345,
           'filter': {'orientation': 'horizontal',
                      'other': 'baz'},
           'other': {'more': {'nested': 1,
                              'also': 2},
                     'less': 'flat'}}
def par_dict_format(in_dict, *, _pfx='search_parameters'):
    ret = []
    for key, value in in_dict.items():
        if isinstance(value, dict):
            ret.append(par_dict_format(value, _pfx='{}[{}]'.format(_pfx, key)))
        else:
            ret.append('{}[{}]={}'.format(_pfx, key, value))
    return '&'.join(ret)
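Calling this on the example dictionary produces the bracketed notation from the question (key order follows the dict's insertion order):

print(par_dict_format(my_dict))
# search_parameters[string]=foo bar&search_parameters[author_id]=12345&
# search_parameters[filter][orientation]=horizontal&search_parameters[filter][other]=baz&...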
Or you can opt for a flat structure of key/value pairs, introducing a notation with a reasonable, non-conflicting separator between the elements. Depending on the separator used, you could even get keyword arguments back into play (though not with the . used in my example). One downside is that you effectively create a new/parallel interface and notation. E.g.:
my_dict = {'string': 'foo bar',
           'author_id': 12345,
           'filter.orientation': 'horizontal',
           'filter.other': 'baz',
           'other.more.nested': 1,
           'other.more.also': 2,
           'other.less': 'flat'}
def par_dict_format(in_dict):
    ret = []
    for key, value in in_dict.items():
        key_str = ''.join('[{}]'.format(p) for p in key.split('.'))
        ret.append('{}={}'.format(key_str, value))
    return '&'.join('search_parameters{}'.format(i) for i in ret)
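For equivalent inputs, the flat variant emits the same query string as the nested one above:

print(par_dict_format(my_dict))
# search_parameters[string]=foo bar&search_parameters[author_id]=12345&
# search_parameters[filter][orientation]=horizontal&...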
My take on these two: if I mostly construct the query programmatically (for instance, having different methods launch different queries), I'd lean towards nested dictionaries. If the expected usage is geared more towards people writing queries directly, calling the search method, or perhaps even a CLI exposed on top of it, the latter (flat) structure could be easier to use and write.

Performance implications of unpacking dictionaries in Python

I have a chunk of code that takes a series of variables and passes them to N modules. To keep the code readable, rather than passing the variables over and over again I created a dictionary and unpack it into the modules as follows:
message_package = {
    'v1': v1,
    'v2': v2,
    'v3': v3
}

for mod in mods:
    mod.f1(**message_package)
    [...]
    if condition:
        mod.f2(**message_package)
Each module then grabs the variables it needs and ignores the rest:
# in mod1
def f1(v1=None, **kwargs):
    do_something()
From a readability/usability standpoint I find this quite nice -- variables are immediately available without having to pull them out of **kwargs, and if I add a variable to the message package it's only one line and I don't have to update all modules.
As I'm somewhat new to Python I'm wondering... is this very unpythonic? Is there a big performance impact from constantly unpacking these dictionaries and/or is there a better way to do this?
Thanks to all the comments above. I followed martijn's suggestion and ran a simple test using timeit.
The results using my data are as follows:
>>> timeit.timeit('passdict()',setup=setup,number=1000000)
0.1841774140484631
>>> timeit.timeit('unpack()',setup=setup,number=1000000)
0.43643336702371016
>>>
Looks like Cyphase was correct that there would only be a performance issue if I were doing this a "whoooole lot" -- unpacking is roughly twice as slow as passing the dictionary, but the difference is only about 250 ms over 1M iterations. For me this is negligible, as I'm only dealing with 5-10 calls in one function.
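The question doesn't show the setup string, but a harness along these lines reproduces the comparison (an assumed reconstruction, not the poster's exact code):

import timeit

setup = """
d = {'v1': 1, 'v2': 2, 'v3': 3}

def takes_dict(pkg):
    pass

def takes_kwargs(v1=None, v2=None, v3=None, **kwargs):
    pass

def passdict():
    takes_dict(d)

def unpack():
    takes_kwargs(**d)
"""

print(timeit.timeit('passdict()', setup=setup, number=1000000))
print(timeit.timeit('unpack()', setup=setup, number=1000000))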

Python - get object - Is dictionary or if statements faster?

I am making a POST to a Python script; the POST has 2 parameters, Name and Location, and it returns one string. My question is: since I am going to have hundreds of these options, is it faster to do it in a dictionary like this:
myDictionary = {"Name": {"Location": "result", "LocationB": "resultB"},
                "Name2": {"Location2": "result2A", "Location2B": "result2B"}}
And then I would use myDictionary.get("Name").get("Location") to get the results.
OR do something like this:
if Name = "Name":
if Location = "Location":
result = "result"
elif Location = "LocationB":
result = "resultB"
elif Name = "Name2":
if Location = "Location2B":
result = "result2A"
elif Location = "LocationB":
result = "result2B"
Now if there are hundreds or thousands of these, which is faster? Or is there a better way altogether?
First of all:
Generally, it's much more Pythonic to match keys to values using dictionaries; you should prefer that as a matter of style.
Secondly:
If you really care about performance, Python might not always be the optimal tool. However, the dict approach should be much, much faster, unless your lookups happen about as rarely as the creation of these dicts; creating thousands and thousands of PyObjects just to check one case is a really bad idea.
Thirdly:
If you care about your application that much, you might really want to benchmark both solutions -- as usual with performance questions, there are a million factors, including your computing platform, that only experiments will help sort out.
Fourth(ly?):
It looks like you're building something like a protocol parser. That's really not Python's forte, performance-wise. Maybe you'd want to look into one of the dozens of tools that can generate C parser code for you and wrap that in a native module; done right, it's pretty sure to be faster than either of your implementations.
Here's the python documentation on Extending Python with C or C++
I decided to test the two scenarios of 1000 Names and 2 locations
The Test Samples
Team Dictionary:
di = {}
for i in range(1000):
    di["Name{}".format(i)] = {'Location': 'result{}'.format(i),
                              'LocationB': 'result{}B'.format(i)}

def get_dictionary_value():
    di.get("Name999").get("LocationB")
Team If Statement:
I used a Python script to generate a 5000-line function if_statements(name, location) following this pattern:
elif name == 'Name994':
    if location == 'Location':
        return 'result994'
    elif location == 'LocationB':
        return 'result994B'
# Some time later ...

def get_if_value():
    if_statements("Name999", "LocationB")
Timing Results
You can use the timeit module to measure how long a function takes to complete.
import timeit
print(timeit.timeit(get_dictionary_value))
# 0.06353...
print(timeit.timeit(get_if_value))
# 6.3684...
So there you have it: on my machine the dictionary lookup was 100 times faster than the hefty 165 KB if-statement function.
I will root for dict().
In most cases, [key] lookup is much faster than a chain of conditional checks; as a rule of thumb, conditionals are for boolean logic, not for dispatching on a value.
The reason is this: when you create a dictionary, you essentially build a registry of that data, stored as hashes in buckets. When you then write dictionary_name['key'], Python hashes the key, jumps straight to the matching bucket, and returns the value almost instantly if it exists.
Conditionals are different. They are sequential checks, meaning that in the worst case every condition provided has to be evaluated before the value's existence is established and the respective data returned.
As you can see, with hundreds of statements this becomes problematic, so in this case dictionaries are faster. Also be aware of when your lookups happen relative to the building of the dictionary: if a lookup can run before the dictionary is fully populated, you may get a key-not-found error.

Implicit functions, looking to perform different math functions based on a key

The title is a bit weird but I couldn't think of a better way to phrase it. I'm working on a plugin for a 3D software. It stores data in things called "channels", and the plugin I'm working on is synchronising different "channels" based on some mathematical relationships. Basically something like this:
If the user updates channel "quality" I want to update "rays" and "ratio" according to a formula. If they update "rays" or "ratio" I want to update just "quality". There are a whole bunch of different things I need to update, not just these ones, but I'll keep it simple for the sake of... well, simplicity.
At the moment I've got a dictionary that maps each channel-that-got-updated key to the list of channels to be updated, like this:
channel_relationships = {
    "quality": ["rays", "ratio"],
    "rays": ["quality"],
    "ratio": ["quality"]
}
That's working well for figuring out what's changing. My code gets a value sent to it to notify it of which channel was changed by the user, so I simply access the right list by using the right key from the dictionary:
channels_to_update = channel_relationships[incoming_channel]
for channel in channels_to_update:
    UpdateChannel(channel)
That's all fine and dandy. But the tricky thing is that the different channels affect each other in different ways, via some rather long math expressions. Currently I'm solving that with a bunch of if/elif statements in my UpdateChannel function, something like this:
def UpdateChannel(channel):
    if channel == "rays":
        pass  # do math here
    elif channel == "ratio":
        pass  # do other math...
    elif channel == "quality":
        pass  # do other math...
Which is not very elegant. Ideally I'd want to store the functions themselves in an implicit way in a dictionary, like this:
functions = {"quality": {"rays": 0.5*x**2+3.2*x+2,
"ratio": 7.3*x**2+1.2*x-5}}
Basically storing the functions themselves directly in the dictionary. Note that I'd rather not actually calculate all the values in the dictionary. I want them to only be evaluated as needed. So when I access functions["quality"]["rays"] I want to get either the implicit function, or the value as calculated based on "x". Something like that. The problem is, I have no idea if this is even possible in Python, nor do I have any idea where to start looking for such a thing. I could just define my functions explicitly for each relationship, but I'd end up with a LOT of functions. Because all the functions are literally just a single floating point value based on an input floating point value, this seems like the most elegant way of doing it.
Functions are first-class objects: you can store them in dictionaries, retrieve them, and call them, no problem.
A lambda will produce a function from a single expression:
functions = {
    "quality": {
        "rays": lambda x: 0.5*x**2+3.2*x+2,
        "ratio": lambda x: 7.3*x**2+1.2*x-5
    }
}
Now look up your functions and call them:
functions['quality']['rays'](input_float_value)
Yes, it is that simple. :-)
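Tying this back to the update loop from the question, a variant of UpdateChannel could look the formula up instead of branching. A sketch, where SetChannel and GetChannel stand in for whatever the plugin API actually provides:

def UpdateChannel(source_channel, target_channel, x):
    # x is the new value of the channel the user edited
    formula = functions[source_channel][target_channel]
    SetChannel(target_channel, formula(x))

for channel in channel_relationships[incoming_channel]:
    UpdateChannel(incoming_channel, channel, GetChannel(incoming_channel))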

In Python, is it possible for an external function when passed a dict key, to find the other keys in that dictionary?

Let's say I have the following code:
def function(k):
    pass  # do something here

d = {0: 'a', 1: 'b', 2: 'c'}
function(d[0])
Is it possible for the function to find out what the other keys in d are? For example, is there such a thing as:
def function(k):
    print k.__parent__.keys()
I don't think there is such a feature (it would arguably be a significant security issue), but I don't know enough about the technical implementation to say for sure.
EDIT: The motivation was: if everything in Python is an object, is it possible to find the other objects bound to it, in this case the parent dictionary?
No, there isn't. Values in Python do not track where they originated from, nor are they 'bound' to one another. Containers can refer to other values, but since any value can be referenced from multiple locations there is no point in tracking back-references.
From function()'s point of view, there is absolutely no difference between the following two invocations:
function(d[0])
function('a')
In both cases, k in the function is bound to a Python string value, 'a'.
If you need more context in a function, you need to pass it in explicitly:
def function(key, mapping):
    k = mapping[key]

function(0, d)
Now you have the dictionary itself too.
