l[:] performance problem - python

Recently I was debugging code like the following:
def getList():
    #query db and return a list
    total_list = Model.objects.all()
    result = list()
    for item in total_list:
        if item.attr1:
            result.append(item)
    return result
# in main code
org_list = getList()
list = orgList[:]#this line cause cpu problems.
if len(org_list)>0 and org_list[0].is_special:
    my_item = org_list[0]
for i in list:
    print_item(i)
doSomethingelse(list[0])
To simplify the code I changed most of it, but the main part is here.
In the getList method we query the db and get 20~30 rows. Then we create a Python list from them and return it.
In the main code we get the org_list variable from getList,
slice it with orgList[:],
then loop over the resulting list and access specific elements of it, like list[0].
The problem is that this code runs on a really busy server, and unfortunately it uses most of the CPU and eventually locks up our servers.
The culprit seems to be the line where we slice the list variable with [:].
If we don't do that and just use the org_list variable instead, our servers don't have a problem. Does anybody have any idea why that might happen? Does slicing use a lot of CPU, or does using a sliced list use a lot of CPU?

The code that you are showing would run in the 0.1 microseconds that it would take to raise an exception:
org_list = getList()
list = orgList[:]#this line cause cpu problems.
orgList should be org_list. Show the minimal code that actually reproduces the problem.
Also, that kills the list built-in function. Don't do that.
Update: Another thought: a common response to "My Django app runs slowly!" is "turn off the debug flag" ... evidently it doesn't free up memory in debug mode (with DEBUG = True, Django keeps a record of every SQL query it runs).
Update 2, about """I found out that when you slice a list. it actually works as a view to that list and just call original list methods to get actual item.""": Where did you get that idea? That can't be the case with a plain old list! Have you redefined list somewhere?
In your getList function:
(1) put in print type(list)
(2) replace result = list() with result = [] and see whether the problem goes away.

list = org_list[:] makes a copy of org_list. Usually you see something like this when you need to modify the list but you want to keep the original around as well.
The code you are showing me doesn't seem like it actually modifies org_list, and if doSomethingelse(list[0]) doesn't modify list, I don't see why it's actually being copied in the first place. Even if it does modify list, as long as org_list isn't needed after those few lines of code, you could probably get away with just using org_list and not doing the slice copy.
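To make the cost of the slice concrete, here is a minimal sketch using a stand-in list of 30 plain objects (an assumption; the real elements are model instances). It shows that [:] builds a new, independent list that shares the same elements, and it measures how long one copy takes with timeit.

```python
import timeit

# Stand-in for the 20~30 model instances returned by getList() (assumption).
org_list = [object() for _ in range(30)]
copied = org_list[:]

# The slice is a new list object, not a view onto the original...
is_new_list = copied is not org_list
# ...but it is a shallow copy: both lists hold the very same element objects.
shares_items = all(a is b for a, b in zip(copied, org_list))

# Changing the copy leaves the original untouched.
copied.append("extra")
original_len = len(org_list)

# Average time for a single slice copy of a 30-element list.
seconds_per_copy = timeit.timeit(lambda: org_list[:], number=100_000) / 100_000
print(is_new_list, shares_items, original_len, seconds_per_copy)
```

For a list this small the copy is negligible next to a database query, which suggests looking elsewhere (for example at what produces the list) for the CPU usage.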

I would like to know:
what is the type of total_list? Please do print type(total_list)
what are the types of the elements of total_list?
do you have an idea of the sizes of these elements?
what is the number of elements in the list returned by getList()? 20-30?
Remarks:
- using the built-in name "list" as the name of a variable isn't a good idea (in list = orgList[:])
- your getList function can be written:
def getList():
    #query db and return a list
    total_list = Model.objects.all()
    return [ item for item in total_list if item.attr1 ]

Unfortunately the actual code is the property of the company I worked for, so I cannot give it exactly as it is, but here is part of it (variable names are changed, but the argument inside the main_tree method really was named list).
mainObj = get_object_or_404(MainObj, code_name=my_code_name)
if (not mainObj.is_now()) and (not request.user.is_staff):
    return HttpResponseRedirect("/")
main_tree = mainObj.main_tree()
objects = main_tree[:] # !!!
if objects.__len__() == 1 and not objects[0].has_special_page:
    my_code_name = objects[0].code_name
...
...
...
mainObj.main_tree() is the following:
def main_tree(self):
    def append_obj_recursive(list, obj):
        list.append(obj)
        for c in self.obj_set.filter(parent_obj=obj):
            append_obj_recursive(list, c)
        return list
    root_obj = self.get_or_create_root_obj();
    top_level_objs = self.obj_set.filter(parent_obj=root_obj).order_by("weight")
    objs = []
    for tlc in top_level_objs:
        append_obj_recursive(objs, tlc)
    return objs
This is a really weird problem. Slicing should not cause such a bizarre problem. If I can find an answer I will post it here too.

Related

How can I modify this class to check if a list of lists is empty or not?

I am trying to build a class object that can take in a list of lists and check the entries to see if one is empty, and if it does, returns a message.
class EmptyCheck():
    def __init__(self, some_list):
        self.some_list = some_list
    def check_empty(some_list):
        for item in some_list:
            if '' in item:
                print('please fill out survey')
data = [[1],[2],[''],['apples']]
x = EmptyCheck(data)
x.check_empty()
The problem seems to be that the EmptyCheck object is not iterable. But this is where I get a bit confused because I am not trying to iterate over the EmptyCheck object, I am trying to iterate over some_list. So I am hoping someone could help to clarify what is going on and help me to understand this issue a bit deeper. I suspect I will need to add some of the special dunder methods, but maybe I don't?
Your check_empty method checks a new list passed as a function argument, not the list you store as an instance attribute. Change it to:
def check_empty(self):
    for item in self.some_list:
As a side note, you should return right after the print statement so that you don't print the same message multiple times in case there are multiple sub-lists with empty strings.
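Putting both suggestions together, a minimal sketch of the corrected class could look like this. The boolean return value is an addition for illustration and is not part of the original code.

```python
class EmptyCheck:
    def __init__(self, some_list):
        self.some_list = some_list

    def check_empty(self):
        """Print a reminder once and return True if any sub-list contains ''."""
        for item in self.some_list:
            if '' in item:
                print('please fill out survey')
                return True   # stop at the first hit so the message prints once
        return False          # assumption: report "nothing empty" as False


data = [[1], [2], [''], ['apples']]
x = EmptyCheck(data)
found_empty = x.check_empty()
```

In the original version, x.check_empty() passed the instance itself as some_list, which is why Python complained that the EmptyCheck object is not iterable.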

append to request.sessions[list] in Django

Something is bugging me.
I'm following along with this beginner tutorial for django (cs50) and at some point we receive a string back from a form submission and want to add it to a list:
https://www.youtube.com/watch?v=w8q0C-C1js4&list=PLhQjrBD2T380xvFSUmToMMzERZ3qB5Ueu&t=5777s
def add(request):
    if 'tasklist' not in request.session:
        request.session['tasklist'] = []
    if request.method == 'POST':
        form_data = NewTaskForm(request.POST)
        if form_data.is_valid():
            task = form_data.cleaned_data['task']
            request.session['tasklist'] += [task]
            return HttpResponseRedirect(reverse('tasks:index'))
I've checked the type of request.session['tasklist'] and Python shows it's a list.
The task variable is a string.
So why doesn't request.session['tasklist'].append(task) work properly? I can see it being added to the list via some print statements but then it is 'forgotten again' - it doesn't seem to be permanently added to the tasklist.
Why do we use this request.session['tasklist'] += [task] instead?
The only thing I could find is https://ogirardot.wordpress.com/2010/09/17/append-objects-in-request-session-in-django/ but that refers to a site that no longer exists.
The code works fine, but I'm trying to understand why you need to use a different operation and can't / shouldn't use the append method.
Thanks.
The reason it does not work is that Django does not see that you have changed anything in the session when you call append() on a list that is stored in the session.
What you are doing here is essentially pulling out a reference to the list and changing it without the session backend knowing anything about it. Another way to explain it:
The append() method is on the list itself, not on the session object.
When you call append() on the list you are only talking to the list, and the list's parent (the session) has no idea what the two of you are doing.
When you instead do an assignment on the session itself, session['whatever'] = 'something', it knows that something is up and that changes were made.
So the key here is that you need to operate on the session object directly if you want your changes to be saved automatically.
Django only thinks it needs to save a changed session item if the item was reassigned to the session. See the Django session base code: the __setitem__ method contains a self.modified = True statement.
session['list'] += [new_element] adds a new item (it mutates the list stored in the session, so the list reference stays the same) and then reassigns the list to the session. It first triggers a __getitem__ call, then += / __iadd__ runs on the value that was read, then a __setitem__ call is made with the list reference passed to it. You can see in the Django codebase that the session is marked as modified after each __setitem__ call.
session['list'] = session['list'] + [new_item] does the same thing but creates a new list every time it runs, so it is a bit less efficient; you should not store hundreds of items in the session anyway, so you're probably fine. It triggers __setitem__ exactly as above.
However, if you use sub-keys in the session, like session['list']['x'] = 'whatever', the session will not see itself as modified, so you need to mark it yourself with request.session.modified = True.
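The behaviour described above can be reproduced without Django. The following is a minimal sketch using FakeSession, a hypothetical stand-in class written for this example; it only imitates the "set modified in __setitem__" idea and is not Django's implementation.

```python
class FakeSession(dict):
    """Hypothetical stand-in for a session object; not Django code."""

    def __init__(self):
        super().__init__()
        self.modified = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.modified = True   # only assignments on the session flip the flag


session = FakeSession()
session['tasklist'] = []
session.modified = False             # pretend a new request has started

session['tasklist'].append('a')      # mutates the list; the session is not told
flag_after_append = session.modified

session['tasklist'] += ['b']         # read, in-place add, then write back
flag_after_iadd = session.modified

session.modified = False
session['tasklist'].append('c')
session.modified = True              # explicit flag, like request.session.modified = True
```

After the append the flag is still False, while after += it is True, which matches why the tutorial's version persists the task and the append version appears to forget it.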
Short answer: It's about how Python chooses to implement the dict data structure.
Long answer:
Let's start by saying that request.session is a dictionary.
Quoting Django's documentation: "By default, Django only saves to the session database when the session has been modified – that is if any of its dictionary values have been assigned or deleted".
So the problem is that the session is not marked as modified by
request.session['tasklist'].append(task)
Looking at the relevant parts of Django's session base code (as posted by @Csaba K. in another answer), self.modified is set to True when the __setitem__ dunder method is called.
So at this point the problem seems to be that __setitem__ is not called by request.session['tasklist'].append(task), but it is called by request.session['tasklist'] += [task]. This is not because the reference held in request.session['tasklist'] changes, as another answer suggested; the reference to the underlying list remains the same.
To confirm, let's create a custom dictionary that extends the Python dict and prints something when the __setitem__ dunder method is called.
class MyDict(dict):
    def __init__(self, globalVar):
        super().__init__()
        self.globalVar = globalVar
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        print("Called Set item when: ", end="")
myDict = MyDict(0)
print("Creating Dict")
print("-----")
myDict["y"] = []
print("Adding a new key-value pair")
print("-----")
myDict["y"] += ["x"]
print(" using +=")
print("-----")
myDict["y"].append("x")
print("append")
print("-----")
myDict["y"].extend(["x"])
print("extend")
print("-----")
myDict["y"] = myDict["y"] + ["x"]
print(" using +",)
print("-----")
It prints:
Creating Dict
-----
Called Set item when: Adding a new key-value pair
-----
Called Set item when: using +=
-----
append
-----
extend
-----
Called Set item when: using +
-----
As we can see, the __setitem__ dunder method is called (and in turn self.modified is set to True) only when adding a new key-value pair, or when using += or +, but not when initializing the dict or when appending to or extending the stored list. Now, the operators + and += do very different things in Python, as explained in the other answer. += behaves more like the append method, but in this case I guess it's more about how Python chooses to implement the dict data structure than about how +, += and append behave on lists.
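One way to see where the __setitem__ call comes from: an augmented assignment to a subscript, d[k] += v, is evaluated as a read of d[k], an in-place add on the value, and a write of the result back to d[k]. The sketch below records the calls in order, using LoggingDict, a hypothetical helper class written for this illustration.

```python
calls = []


class LoggingDict(dict):
    """Hypothetical helper that records which dunder methods get called."""

    def __getitem__(self, key):
        calls.append("__getitem__")
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        calls.append("__setitem__")
        super().__setitem__(key, value)


d = LoggingDict()
d["y"] = []
original = d["y"]
calls.clear()

d["y"] += ["x"]                   # read, list.__iadd__, write back
iadd_calls = list(calls)
same_list = d["y"] is original    # list.__iadd__ returns the same list object

calls.clear()
d["y"].append("x")                # read only; nothing is written back
append_calls = list(calls)
```

Here iadd_calls ends up as a read followed by a write, and append_calls contains only the read, even though both operations mutate the very same list.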
I found this while doing some more searching:
https://code.djangoproject.com/wiki/NewbieMistakes
Scroll to 'Appending to a list in session doesn't work'
Again, it is a very dated entry but still seems to hold true.
Not completely satisfied because this does not answer the question as to 'why' this doesn't work, but at the very least confirms 'something's up' and you should probably still use the recommendations there.
(if anyone out there can actually explain this in a more verbose manner then I'd be happy to hear it)

Checking for type `NoneType` fails randomly

I'm currently working on a function that gets some data from an API and outputs it to a csv file. In this function there is a part where I filter the retrieved data before passing it on. Since there is a possibility that the request returns None, I decided to explicitly check for that and implement a special behavior in this case. Please take a look at the following code snippet.
r = requests.post(self.settings['apiurl'],
                  headers=self.settings['header'],
                  json={'query': query_string, 'variables': vars})
jsd = json.loads(r.text)
if jsd is not None:
    lists = jsd["data"]["User"]["stats"]["favouredGenres"]
    newlist = [entry for entry in lists if entry["meanScore"] is not None]
    if not newlist:
        return None
    else:
        jsd["data"]["User"]["stats"]["favouredGenres"] = newlist
        try:
            jsd = jsd # json.loads(jsd)
        except ValueError:
            return None
        else:
            return jsd
else:
    return None
The if jsd is not None part is the aforementioned check. If jsd is not None, I again filter out some parts that I don't need and return a modified version of jsd.
The problem now is that I sporadically get the error message:
lists = jsd["data"]["User"]["stats"]["favouredGenres"]
TypeError: 'NoneType' object is not subscriptable
The first thing that really confuses me is that this error appears completely randomly. In one run it fails on user_id=7; in the next one it fails on user_id=8475 but works fine for user_id=7, etc. The second thing that confuses me is that the error can pop up at all, since before accessing the variable I explicitly check whether it is of type NoneType. Aside from these isolated cases where an error occurs, the code produces exactly the results I expect.
I hope I provided you with everything necessary, if not, please let me know. Any kind of suggestions on how to approach this kind of problem are welcome.
Blindly sub-scripting a dictionary is usually fraught with this kind of problem. You should switch to using the get method on dictionaries. It can handle this case pretty nicely:
lists = (jsd.get("data", {}).get("User", {})
.get("stats", {}).get("favouredGenres", []))
Here we exploit the fact that get() can take a default argument, which it will use if the given key isn't found. In the first 3 cases, we provide an empty dictionary as the default, and we provide an empty list for the last case (since that's what I'm guessing you're expecting).
If you get into the habit of using this accessing method, you'll avoid these kinds of problems.
The Python Anti-Patterns book lists this as one to be aware of. The corollary is the setdefault() call when setting values in a dictionary.
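One caveat worth checking: get() only falls back to its default when the key is missing. If a key is present with a null value, get("data", {}) returns None and the next call in the chain fails. The sketch below assumes, without confirmation from the question, that the API sometimes answers with "data": null; dig is a hypothetical helper and both payloads are made up for illustration.

```python
def dig(obj, *keys, default=None):
    """Walk nested dicts; return default if a level is missing, None, or not a dict."""
    for key in keys:
        if not isinstance(obj, dict):
            return default
        obj = obj.get(key)
    return default if obj is None else obj


# Hypothetical payloads, invented for this example.
ok = {"data": {"User": {"stats": {"favouredGenres": [
    {"meanScore": 80}, {"meanScore": None}]}}}}
null_data = {"data": None, "errors": [{"message": "rate limited"}]}

path = ("data", "User", "stats", "favouredGenres")
genres_ok = dig(ok, *path, default=[])
genres_null = dig(null_data, *path, default=[])   # "data" is present but None
genres_none = dig(None, *path, default=[])        # the whole response is None

scored = [g for g in genres_ok if g["meanScore"] is not None]
```

With this shape the caller always gets a list back, so the filtering step can run unconditionally and an empty result can be handled in one place.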

Keep data in a variable after deletion of an object in Django

I am on Django 1.9
I would like to keep a list of ids after I deleted the objects having these ids (to be sent back to an Ajax function).
But because I delete these objects, the list is also emptied.
Here is my code:
relatedsteps = Step.objects.filter(theplace=theplaceclicked)
listofrelatedstepsid = relatedsteps.values('id')
response_data = {}
response_data['listofrelatedstepsid'] = listofrelatedstepsid
print(response_data['listofrelatedstepsid'])
relatedsteps.delete()
print(response_data['listofrelatedstepsid'])
The first
print(response_data['listofrelatedstepsid'])
returns
[{u'id': 589}]
But the second one returns:
[]
Any clue? Thanks a lot
QuerySet.values does not actually return a list, it returns a clone of the queryset. Each time you iterate on it (e.g. what print does) it hits the db, so the second print re-executes the query after delete.
What you should do instead is:
response_data['listofrelatedstepsid'] = list(listofrelatedstepsid)
As stated in the Django documentation:
values(*fields)
Returns a ValuesQuerySet — a QuerySet subclass that
returns dictionaries when used as an iterable, rather than
model-instance objects.
So to fix your problem you have to consume relatedsteps.values('id') as an iterable before deleting; calling list or tuple on it is completely fine:
response_data['listofrelatedstepsid'] = list(listofrelatedstepsid)
print(response_data['listofrelatedstepsid'])
relatedsteps.delete()
print(response_data['listofrelatedstepsid'])
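The difference between holding a lazy query and holding a list can be imitated in plain Python. This is only an analogy: LazyIds is a hypothetical class written for this sketch, the "database" is an ordinary list, and none of it is Django code.

```python
class LazyIds:
    """Plain-Python stand-in for a lazy query; this is not Django code."""

    def __init__(self, table):
        self._table = table   # the "database" is just a list of dicts here

    def __iter__(self):
        # Re-read the backing store on every iteration, like re-running a query.
        return iter([{'id': row['id']} for row in self._table])


table = [{'id': 589, 'name': 'step'}]
lazy = LazyIds(table)
snapshot = list(lazy)         # evaluated now, so it no longer depends on the table

table.clear()                 # stands in for relatedsteps.delete()

after_lazy = list(lazy)       # re-evaluated: sees the emptied table
after_snapshot = snapshot     # still holds the ids captured before the delete
```

The snapshot plays the role of list(listofrelatedstepsid): it is taken before the delete and is unaffected by it.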

Python: Return function won’t return a list

The following function prints installed apps from a server.
def getAppNames():
    for app in service.apps:
        print app.name
It works absolutely fine and it prints a list of installed apps like so:
App A
App B
App C
App D
App E
App F
However, when I change the "print" to a "return", all I get is "App A". I have seen similar questions on this but I can't find a solution, and I have explored different methods. Basically I need the function to return what the print version prints. I would appreciate any help.
Thanks.
The return statement causes your function to immediately exit. From the documentation:
return leaves the current function call with the expression list (or
None) as return value.
The quick fix is to save the names in a temporary list, then return the list:
def getAppNames():
    result = []
    for app in service.apps:
        result.append(app.name)
    return result
Since this is such a common thing to do -- iterate over a list and return a new list -- python gives us a better way: list comprehensions.
You can rewrite the above like this:
def getAppNames():
    return [app.name for app in service.apps]
This is considered a "pythonic" solution, which means it uses special features of the language to make common tasks easier.
Another "pythonic" solution involves the use of a generator. Creating a generator involves taking your original code as-is, but replacing return with yield. However, this affects how you use the function. Since you didn't show how you are using the function, I'll not show that example here since it might add more confusion than clarity. Suffice it to say there are more than two ways to solve your problem.
There are two solutions:
def getAppNames():
    return [app.name for app in service.apps]
or
def getAppNames():
    for app in service.apps:
        yield app.name
Unlike return, yield will stay in the loop. This is called a "generator" in Python. You can then use list() to turn the generator into a list or iterate over it:
for name in getAppNames():
    ...
The advantage of the generator is that it doesn't have to build the whole list in memory.
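As a minimal sketch of how the generator version is consumed: the service object below is a hypothetical stand-in built from SimpleNamespace, since the real one is not shown in the question.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the question's `service` object.
service = SimpleNamespace(
    apps=[SimpleNamespace(name=n) for n in ("App A", "App B", "App C")]
)


def getAppNames():
    for app in service.apps:
        yield app.name        # hand back one name, resume here on the next request


names = getAppNames()
first = next(names)           # pulls a single value from the generator
rest = list(names)            # drains whatever is left

all_names = list(getAppNames())   # a fresh generator yields every name again
```

A generator can be consumed only once, which is why all_names is built from a new call rather than from the already drained names.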
You should read basic docs about Python :)
try this:
def getAppNames():
    return [app.name for app in service.apps]
After a return statement the function "ends" - it leaves the function so your for loop actually does only a single iteration.
To return a list you can do -
return [app for app in service.apps]
or just -
return service.apps
The first time you hit a return command, the function returns which is why you only get one result.
Some options:
Use 'yield' to yield results and make the function act as a generator
Collect all the items in a list (or some other collection) and return that.
