Python: Using a multidimensional multiprocessing.manager.list()

This might not be its intended use, but I would like to know how to use a multidimensional manager.list(). I can create one just fine, something like this:
from multiprocessing import Manager
manager = Manager()
test = manager.list([manager.list()])
However, whenever I try to access the first element of the test list, it returns the element's value and not its proxy object:
test[0] # returns [] and not the proxy, since I think Python is running __getitem__.
Is there any way for me to get around this and use the manager.list() in this way?

The multiprocessing documentation has a note on this:
Note
Modifications to mutable values or items in dict and list proxies will
not be propagated through the manager, because the proxy has no way of
knowing when its values or items are modified. To modify such an item,
you can re-assign the modified object to the container proxy:
# create a list proxy and append a mutable object (a dictionary)
lproxy = manager.list()
lproxy.append({})
# now mutate the dictionary
d = lproxy[0]
d['a'] = 1
d['b'] = 2
# at this point, the changes to d are not yet synced, but by
# reassigning the dictionary, the proxy is notified of the change
lproxy[0] = d
So, the only way to use a multidimensional list is to actually reassign any changes you make to the second dimension of the list back to the top-level list, so instead of:
test[0][0] = 1
You do:
tmp = test[0]
tmp[0] = 1
test[0] = tmp
Not the most pleasant way to do things, but you can probably write some helper functions to make it a bit more tolerable.
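A helper along those lines might look like the sketch below. The names set_nested and demo are made up for illustration, and the example assumes the inner lists are plain Python lists stored inside a list proxy.

```python
from multiprocessing import Manager


def set_nested(outer, i, j, value):
    """Emulate outer[i][j] = value for a list proxy that holds plain lists."""
    inner = outer[i]   # a local copy of the inner list
    inner[j] = value   # mutate the copy
    outer[i] = inner   # reassign so the proxy is notified of the change


def demo():
    """Hypothetical demo: compare a direct nested write with the helper."""
    with Manager() as manager:
        grid = manager.list([[0, 0], [0, 0]])
        grid[0][0] = 1             # only changes a temporary copy
        set_nested(grid, 1, 1, 5)  # goes through the reassignment pattern
        return list(grid)
```

Calling demo() from under an if __name__ == "__main__": guard keeps the manager start-up safe on platforms that spawn rather than fork worker processes.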
Edit:
It seems the reason that you get a plain list back when you append a ListProxy to another ListProxy is how pickling proxies works. BaseProxy.__reduce__ returns RebuildProxy as the callable that is actually used to unpickle the proxy. RebuildProxy looks like this:
def RebuildProxy(func, token, serializer, kwds):
    '''
    Function used for unpickling proxy objects.
    If possible the shared object is returned, or otherwise a proxy for it.
    '''
    server = getattr(process.current_process(), '_manager_server', None)
    if server and server.address == token.address:
        return server.id_to_obj[token.id][0]
    else:
        incref = (
            kwds.pop('incref', True) and
            not getattr(process.current_process(), '_inheriting', False)
            )
        return func(token, serializer, incref=incref, **kwds)
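As the docstring says, if the unpickling is occurring inside a manager server, the actual shared object is returned, rather than a proxy for it. This is probably a bug, and there is actually one filed against this behavior already. Since Python 3.6, the multiprocessing documentation states that shared objects can be nested, so this applies to earlier versions.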

Related

append to request.sessions[list] in Django

Something is bugging me.
I'm following along with this beginner tutorial for django (cs50) and at some point we receive a string back from a form submission and want to add it to a list:
https://www.youtube.com/watch?v=w8q0C-C1js4&list=PLhQjrBD2T380xvFSUmToMMzERZ3qB5Ueu&t=5777s
def add(request):
    if 'tasklist' not in request.session:
        request.session['tasklist'] = []
    if request.method == 'POST':
        form_data = NewTaskForm(request.POST)
        if form_data.is_valid():
            task = form_data.cleaned_data['task']
            request.session['tasklist'] += [task]
            return HttpResponseRedirect(reverse('tasks:index'))
I've checked the type of request.session['tasklist'] and Python shows it's a list.
The task variable is a string.
So why doesn't request.session['tasklist'].append(task) work properly? I can see it being added to the list via some print statements but then it is 'forgotten again' - it doesn't seem to be permanently added to the tasklist.
Why do we use this request.session['tasklist'] += [task] instead?
The only thing I could find is https://ogirardot.wordpress.com/2010/09/17/append-objects-in-request-session-in-django/ but that refers to a site that no longer exists.
The code works fine, but I'm trying to understand why you need to use a different operation and can't / shouldn't use the append method.
Thanks.
The reason it does not work is that Django does not see that you have changed anything in the session when you call the append() method on a list that is stored in the session.
What you are doing here is essentially pulling out the reference to the list and making changes to it without the session backend knowing anything about it. Another way to explain:
The append() method is on the list itself not on the session object
When you call append() on the list you are only talking to the list and the list's parent (the session) has no idea what you guys are doing
When you however do an assignment on the session itself session['whatever'] = 'something' then it knows that something is up and changes are made
So the key here is that you need to operate on the session object directly if you want your changes to be updated automatically
Django only thinks it needs to save a changed session item if the item got reassigned to the session. See the __setitem__ method in Django's session base code, which contains a self.modified = True statement.
The session['list'] += [new_element] adds a new list item (mutates the list stored in the session, so the list reference stays the same) and then gets it reassigned to the session again -> thus triggering first a __getitem__ call -> then your += / __iadd__ runs on the value read -> then a __setitem__ call is made (with the list ref. passed to it). You can see it in the django codebase that it marks the session after each __setitem__ call as modified.
Writing session['list'] = session['list'] + [new_item] does the same thing but creates a new list every time it runs, so it's a bit less efficient. You should not store hundreds of items in the session anyway, so you're probably fine. It triggers __setitem__ exactly as above.
However, if you use sub-keys in the session like session['list']['x'] = 'whatever', the session will not see itself as modified, so you need to mark it as modified yourself by setting request.session.modified = True
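The behaviour described above can be reproduced without Django. The sketch below uses a hypothetical TrackingSession class as a stand-in for the session object (it is not Django's implementation, only a dict that sets a modified flag in __setitem__):

```python
class TrackingSession(dict):
    """Hypothetical stand-in for a session; not Django's actual class."""

    def __init__(self):
        super().__init__()
        self.modified = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.modified = True  # same idea as the flag in Django's session base


session = TrackingSession()
session['tasklist'] = []
session.modified = False            # pretend the session was just saved

session['tasklist'].append('a')     # talks to the list only
after_append = session.modified     # the session has not noticed anything

session['tasklist'] += ['b']        # read, extend in place, assign back
after_iadd = session.modified       # the assignment set the flag
```

Here after_append stays False while after_iadd becomes True, even though both operations changed the same list object.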
Short answer: It's about how Python chooses to implement the dict data structure.
Long answer:
Let's start by saying that request.session is a dictionary-like object.
Quoting Django's documentation, "By default, Django only saves to the session database when the session has been modified – that is if any of its dictionary values have been assigned or deleted".
So, the problem is that the session database is not being modified by
request.session['tasklist'].append(task)
Looking at the related parts of Django's session base code (as posted by @Csaba K. in an answer), self.modified is set to True when the __setitem__ dunder method is called.
Now, at this step the problem seems to be that __setitem__ is not called by request.session['tasklist'].append(task), whereas it is called by request.session['tasklist'] += [task]. This does not depend on whether the reference to request.session['tasklist'] changes, as suggested in another answer, because the reference to the underlying list remains the same.
To confirm, let's create a custom dictionary which extends the Python dict, and print something when __setitem__ is called.
class MyDict(dict):
    def __init__(self, globalVar):
        super().__init__()
        self.globalVar = globalVar
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        print("Called Set item when: ", end="")
myDict = MyDict(0)
print("Creating Dict")
print("-----")
myDict["y"] = []
print("Adding a new key-value pair")
print("-----")
myDict["y"] += ["x"]
print(" using +=")
print("-----")
myDict["y"].append("x")
print("append")
print("-----")
myDict["y"].extend(["x"])
print("extend")
print("-----")
myDict["y"] = myDict["y"] + ["x"]
print(" using +",)
print("-----")
It prints:
Creating Dict
-----
Called Set item when: Adding a new key-value pair
-----
Called Set item when: using +=
-----
append
-----
extend
-----
Called Set item when: using +
-----
As we can see, __setitem__ is called, and in turn self.modified is set to True, only when adding a new key-value pair, or using += or +, but not when appending to or extending the stored list. Now, the operators + and += do very different things in Python, as explained in the other answer. += behaves more like the append method on the list itself, but in this case I guess it's more about how Python handles item assignment on the dict than about how +, += and append behave on lists: myDict["y"] += ["x"] reads the value with __getitem__, extends it in place, and then stores it back with __setitem__, whereas append and extend never go through the dictionary again.
I found this while doing some more searching:
https://code.djangoproject.com/wiki/NewbieMistakes
Scroll to 'Appending to a list in session doesn't work'
Again, it is a very dated entry but still seems to hold true.
Not completely satisfied because this does not answer the question as to 'why' this doesn't work, but at the very least confirms 'something's up' and you should probably still use the recommendations there.
(if anyone out there can actually explain this in a more verbose manner then I'd be happy to hear it)

I want to use a variable globally in views.py

views.py
def getBusRouteId(strSrch):
    end_point = "----API url----"
    parameters = "?ServiceKey=" + "----my servicekey----"
    parameters += "&strSrch=" + strSrch
    url = end_point + parameters
    retData = get_request_url(url)
    asd = xmltodict.parse(retData)
    json_type = json.dumps(asd)
    data = json.loads(json_type)
    if (data == None):
        return None
    else:
        return data

def show_list(request):
    Nm_list=[]
    dictData_1 = getBusRouteId("110")
    for i in range(len(dictData_1['ServiceResult']['msgBody']['itemList'])):
        Nm_list.append(dictData_1['ServiceResult']['msgBody']['itemList'][i]['busRouteNm'])
    return render(request, 'list.html', {'Nm_list': Nm_list})
The API returns XML data.
In getBusRouteId, that XML data is converted to a dict.
In show_list, I call getBusRouteId, so dictData_1 holds that dict.
I want to refer to this dictData_1 in another function.
Is there any way to use dictData_1 globally?
Either store those data in a session (if those are short-lived data) or in the database (if you want to persist them).
The point is that a WSGI app is typically deployed as a pool of long-running processes, with a "supervisor" process that will dispatch incoming HTTP requests to the first available process (or to a newly spawned one etc), so using process-wide globals to store per-user data does NOT work, as you always end up with user A getting data from user B, or no data at all, etc.
NB: this kind of issue may not appear when testing with a single user on the dev server, but it's still GUARANTEED to break in production.
Also, totally unrelated but:
1/ this bit seems totally useless - you serialize a dict to JSON then unserialize it to a dict, which, unless you have custom serialization / unserialization hooks (which is not the case here), is functionally a no-op.
json_type = json.dumps(asd)
data = json.loads(json_type)
2/ Here:
end_point = "----API url----"
parameters = "?ServiceKey=" + "----my servicekey----"
parameters += "&strSrch=" + strSrch
url = end_point + parameters
retData = get_request_url(url)
I don't know how get_request_url is implemented but if you are using python-requests, it already knows how to turn a dict into a (properly encoded) querystring. And if you're using the standard urllib packages, they ALSO provide a way to turn a dict into a properly built querystring. This makes for more robust AND more maintainable code.
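As a minimal sketch of the standard-library route, urllib.parse.urlencode builds an encoded query string from a dict. The function name build_url, the example endpoint and the example key below are placeholders, not values from the question.

```python
from urllib.parse import urlencode


def build_url(end_point, service_key, search):
    """Build the request URL with a properly encoded query string."""
    params = {"ServiceKey": service_key, "strSrch": search}
    return end_point + "?" + urlencode(params)


# Placeholder endpoint and key; special characters in the key get escaped.
url = build_url("https://api.example.invalid/busRoute", "my key/+=", "110")
```

With python-requests, passing the same dict as the params argument of requests.get achieves the same result without building the URL by hand.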
3/ you may want to learn how to properly use Python's for loops
Here:
Nm_list=[]
dictData_1 = getBusRouteId("110")
for i in range(len(dictData_1['ServiceResult']['msgBody']['itemList'])):
    Nm_list.append(dictData_1['ServiceResult']['msgBody']['itemList'][i]['busRouteNm'])
A Python for loop naturally iterates over the sequence, yielding an item from the sequence in each iteration. So the proper way to write this is:
Nm_list=[]
for item in dictData_1['ServiceResult']['msgBody']['itemList']:
    Nm_list.append(item['busRouteNm'])
which is both much more readable AND much more efficient.
Also, this can be further improved using list comprehension:
# intermediate var for readability
source = dictData_1['ServiceResult']['msgBody']['itemList']
Nm_list = [item['busRouteNm'] for item in source]
which is usually a bit faster still (it avoids looking up and calling the append method on every iteration).
4/ this:
if (data == None):
    return None
else:
    return data
is a very convoluted way to write:
return data
(also note that since None is a singleton, the preferred way is to use the identity test operator is, ie if data is None - same result but more idiomatic).
I understand that you want to perform some operations on the dict data returned by getBusRouteId() and pass it to another function.
SOLUTION - Just passing the dict_data as an argument to another function will work. No need to make global variables.
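A minimal sketch of that idea follows. The function names and the sample dict are hypothetical; the sample only mimics the shape of the response used in the question.

```python
def extract_route_names(data):
    """Return the busRouteNm of every item in the parsed API response."""
    items = data['ServiceResult']['msgBody']['itemList']
    return [item['busRouteNm'] for item in items]


def count_routes(data):
    """A second function that receives the same dict as an argument."""
    return len(data['ServiceResult']['msgBody']['itemList'])


# Hypothetical sample shaped like the response in the question.
sample = {'ServiceResult': {'msgBody': {'itemList': [
    {'busRouteNm': '110A'}, {'busRouteNm': '110B'}]}}}

names = extract_route_names(sample)
total = count_routes(sample)
```

Both functions work on whatever dict they are handed, so nothing has to live in a module-level global.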

Python - Dynamically created dictionary can't be returned or added to another dictionary

I've got a function that dynamically creates some number of dictionaries. What I'm trying to do, is to add each of those dictionaries to another dictionary, and then return that dictionary of dictionaries.
The dictionaries are all created just fine. I've stepped through the process and inspected the variables, it all looks good.
But when I try to add them to another dictionary (as a value), all that gets added is None.
I think this might have something to do with global and local variables, but I'm not sure.
This is the code that creates the dictionaries inside the function
def build_DoD(block, page, PROJECT, master_string, logging):
    # other code up here
    exec("{} = {{}}".format(base))
    exec("{0}['section'] = '{1}'".format(base, section))
    exec("{}['question_number'] = '{}'".format(base, question_number))
    exec("{}['sub_question_label'] = '{}'".format(base, sub_question_label))
    exec("{}['sub_question_text'] = '{}'".format(base, sub_question_text))
    exec("{}['display_function'] = '{}'".format(base, display_function))
    exec("{}['has_other'] = '{}'".format(base, has_other))
    exec("{}['has_explain'] = '{}'".format(base, has_explain))
    exec("{}['has_text_year'] = '{}'".format(base, has_text_year))
    exec("{}['randomize_response'] = '{}'".format(base, randomize_response))
    exec("{}['response_table'] = '{}'".format(base, resp_table))
    # here is where I try to add the dynamically created dict to a larger dict.
    dict_of_dicts[str(base)] = exec("{}".format(base))
OK, don't do this. Ever.
Don't use exec outside of really special cases. But to answer the question, even if I don't think you should use it: the problem you are having is that exec does not return anything. It just executes the code; it does not evaluate it. For that, there is the function eval. Replace exec in the last line with eval.
I however fully agree with @RafaelC. This is an XY problem.
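For comparison, here is a sketch of how the same structure could be built without exec at all. The function build_question, the key 'q1_a' and the field values are hypothetical; only a few of the fields from the question are shown.

```python
def build_question(section, question_number, sub_question_label):
    """Build one question dictionary directly, without exec."""
    return {
        'section': section,
        'question_number': question_number,
        'sub_question_label': sub_question_label,
    }


dict_of_dicts = {}
base = 'q1_a'  # hypothetical key, standing in for the dynamic name
dict_of_dicts[base] = build_question('A', '1', 'a')

# exec runs code and always returns None; eval returns the expression's value.
exec_result = exec('1 + 1')
eval_result = eval('1 + 1')
```

Using the dynamic name as a dictionary key, rather than as a variable name, removes the need for either exec or eval.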

Keep data in a variable after deletion of an object in Django

I am on Django 1.9
I would like to keep a list of ids after I deleted the objects having these ids (to be sent back to an Ajax function).
But because I delete these objects, the list is also emptied.
Here is my code:
relatedsteps = Step.objects.filter(theplace=theplaceclicked)
listofrelatedstepsid = relatedsteps.values('id')
response_data = {}
response_data['listofrelatedstepsid'] = listofrelatedstepsid
print(response_data['listofrelatedstepsid'])
relatedsteps.delete()
print(response_data['listofrelatedstepsid'])
The first
print(response_data['listofrelatedstepsid'])
returns
[{u'id': 589}]
But the second one returns:
[]
Any clue? Thanks a lot
QuerySet.values does not actually return a list; it returns a lazy queryset (a clone of the original). Nothing in your code evaluates and caches it, so each print runs the query again, and the second print therefore re-executes it after the delete.
What you should do instead is:
response_data['listofrelatedstepsid'] = list(listofrelatedstepsid)
As stated in the Django documentation:
values(*fields)
Returns a ValuesQuerySet — a QuerySet subclass that
returns dictionaries when used as an iterable, rather than
model-instance objects.
So to fix your problem you have to evaluate relatedsteps.values('id') before deleting; calling list or tuple on it is enough:
response_data['listofrelatedstepsid'] = list(listofrelatedstepsid)
print(response_data['listofrelatedstepsid'])
relatedsteps.delete()
print(response_data['listofrelatedstepsid'])

__iter__() implemented as a generator

I have an object subclass which implements a dynamic dispatch __iter__ using a caching generator (I also have a method for invalidating the iter cache) like so:
def __iter__(self):
    print("iter called")
    if self.__iter_cache is None:
        iter_seen = {}
        iter_cache = []
        for name in self.__slots:
            value = self.__slots[name]
            iter_seen[name] = True
            item = (name, value)
            iter_cache.append(item)
            yield item
        for d in self.__dc_list:
            for name, value in iter(d):
                if name not in iter_seen:
                    iter_seen[name] = True
                    item = (name, value)
                    iter_cache.append(item)
                    yield item
        self.__iter_cache = iter_cache
    else:
        print("iter cache hit")
        for item in self.__iter_cache:
            yield item
It seems to be working... Are there any gotchas I may not be aware of? Am I doing something ridiculous?
container.__iter__() returns an iterator object. The iterator objects themselves are required to support the two following methods, which together form the iterator protocol:
iterator.__iter__()
Returns the iterator object itself.
iterator.next() (iterator.__next__() in Python 3)
Return the next item from the container.
That's exactly what every generator has, so don't be afraid of any side-effects.
It seems like a very fragile approach. It is enough to change any of __slots, __dc_list, __iter_cache during active iteration to put the object into an inconsistent state.
You need either to forbid changing the object during iteration or generate all cache items at once and return a copy of the list.
It might be better to separate the iteration of the object from the caching of the values it returns. That would simplify the iteration process and allow you to easily control how the caching is accomplished as well as whether it is enabled or not, for example.
Another possibly important consideration is the fact that your code would not predictably handle the situation where the object being iterated over gets changed between successive calls to the method. One simple way to deal with that would be to populate the cache's contents completely on the first call, and then just yield what it contains for each call, and document the behavior.
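A sketch of that "build the whole cache first" approach is shown below. The class name CachedPairs and its attributes are hypothetical, and it assumes the primary store and the fallback sources are plain dicts, which the question does not confirm.

```python
class CachedPairs:
    """Iterate over (name, value) pairs, with the first source winning."""

    def __init__(self, slots, fallbacks):
        self._slots = slots          # assumed: dict of primary values
        self._fallbacks = fallbacks  # assumed: list of dicts, lower priority
        self._iter_cache = None

    def _build_cache(self):
        seen = set()
        items = []
        for source in (self._slots, *self._fallbacks):
            for name, value in source.items():
                if name not in seen:
                    seen.add(name)
                    items.append((name, value))
        return tuple(items)  # immutable snapshot

    def invalidate(self):
        """Drop the snapshot; the next iteration rebuilds it."""
        self._iter_cache = None

    def __iter__(self):
        if self._iter_cache is None:
            self._iter_cache = self._build_cache()
        return iter(self._iter_cache)
```

Because the snapshot is completed before the first item is handed out, an abandoned or interleaved iteration cannot leave a half-built cache behind; changes to the sources only show up after invalidate() is called.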
What you're doing is valid albeit weird. What is a __slots or a __dc_list ?? Generally it's better to describe the contents of your object in an attribute name, rather than its type (eg: self.users rather than self.u_list).
You can use my LazyProperty decorator to simplify this substantially.
Just decorate your method with @LazyProperty. It will be called the first time, and the decorator will then replace the attribute with the results. The only requirement is that the value is repeatable; it doesn't depend on mutable state. You also have that requirement in your current code, with your self.__iter_cache.
def __iter__(self):
    # __iter__ must return an iterator, so wrap the cached tuple
    return iter(self.__iter)
@LazyProperty
def __iter(self):
    def my_generator():
        yield whatever
    return tuple(my_generator())
