What's wrong with this caching function in Django?

I've created the model for counting the number of views of my page:
class RequestCounter(models.Model):
    count = models.IntegerField(default=0)

    def __unicode__(self):
        return str(self.count)
For incrementing the counter I use:
def inc_counter():
    counter = RequestCounter.objects.get_or_create(id=1)[0]
    counter.count = F('count') + 1  # F() makes the increment atomic at the database level
    counter.save()
Then I show the number of page views on my page and it works fine.
But now I need to cache my counter for some time. I use:
def get_view_count():
    view_count = cache.get('v_count')
    if view_count is None:
        cache.set('v_count', RequestCounter.objects.filter(id=1)[0], 15)
        view_count = cache.get('v_count')
    return view_count
After this I'm passing the result of get_view_count to my template.
So I expect my counter to stand still for 15 seconds and then change to a new value. But actually it isn't quite so: when I test this from my virtual Ubuntu machine it jumps, for example, between 55 and 56, then after 15 seconds it changes and jumps between 87 and 88.
The values are always alternating and they don't differ much from each other.
If I try this locally from Windows, the counter seems fine, until I open more than one browser.
I've got no idea what to do with it. Do you see what the problem could be?
P.S. I tried using caching in the templates and got the same result.

What CACHE_BACKEND are you using? If it's locmem:// and you're running Apache, you'll have a different cache active for each Apache child process, which would explain the differing results. I hit this a while ago and it was a subtle one to work out. I'd recommend switching to memcached if you're not already on it, as it won't give you the multiple-caches problem.
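For reference, a minimal sketch of pointing Django at memcached (the host and port below are the usual local defaults, not taken from the question; which setting you need depends on your Django version):

# settings.py, old-style setting (Django < 1.3), matching the CACHE_BACKEND string above:
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

# settings.py, newer CACHES dict (Django 1.3+):
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

Because memcached is a single shared daemon, every Apache child talks to the same cache, so the counter can no longer alternate between per-process values.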

Related

Waiting for MySQL execution before rendering template

I am using Flask and MySQL. I have an issue with updated data not showing up after the execution.
Currently, I am deleting and redirecting back to the admin page so that I then get a refreshed version of the website. However, I still get old entries showing up in the table I have in the front end. After refreshing manually, everything works normally. Sometimes the issue is also that data is not sent to the front end at all; I assume the template is rendered before the MySQL execution finishes, so an empty list is passed forward.
On the Python side I have:
@app.route("/admin")
def admin_main():
    query = "SELECT * FROM categories"
    conn.executing = True
    results = conn.exec_query(query)
    return render_template("src/admin.html", categories=results)

@app.route('/delete_category', methods=['POST'])
def delete_category():
    id = request.form['category_id']
    query = "DELETE FROM categories WHERE cid = '{}'".format(id)
    conn.delete_query(query)
    return redirect("admin", code=302)
admin_main is the main page. I tried adding some sort of "semaphore" system, only executing once "conn.executing" became false, but that did not work out either. I also tried playing around with async and await, but no luck ("The view function did not return a valid response").
I am pretty much out of options in this case and do not really know how to treat the problem. Any help is appreciated!
I figured out that the problem was not the data not being properly read, but the page not being refreshed. Though the console prints a GET request for the page, it does not redirect since it is already on that same page.
The only workaround I am currently working on is a socket.io implementation to have the content updated dynamically.
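For completeness, a minimal sketch of the post/redirect/get flow the delete handler is aiming for; url_for and the 303 status code are my suggestions rather than anything from the question, and conn with a parameterized delete_query signature is assumed from the question's own wrapper:

from flask import redirect, request, url_for

@app.route('/delete_category', methods=['POST'])
def delete_category():
    cid = request.form['category_id']
    # Passing the value separately (assuming the wrapper supports it)
    # avoids the SQL injection risk of the original format() call.
    conn.delete_query("DELETE FROM categories WHERE cid = %s", (cid,))
    # 303 "See Other" makes the browser issue a fresh GET for the admin page.
    return redirect(url_for('admin_main'), code=303)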

python requests...or something else... mysteriously caching? hashes don't change right when file does

I have a very odd bug. I'm writing some code in Python 3 to check a URL for changes by comparing sha256 hashes. The relevant part of the code is as follows:
from copy import deepcopy  # used below; missing from the original excerpt
from hashlib import sha256
from requests import get

def fetch_and_hash(url):
    file = get(url, stream=True)
    f = file.raw.read()
    hash = sha256()
    hash.update(f)
    return hash.hexdigest()

def check_url(target):  # passed a dict containing the hash from a previous examination of the url
    t = deepcopy(target)
    old_hash = t["hash"]
    new_hash = fetch_and_hash(t["url"])
    if old_hash == new_hash:
        t["justchanged"] = False
        return t
    else:
        t["justchanged"] = True
        return handle_changes(t, new_hash)  # records the changes
So I was testing this on an old webpage of mine. I ran the check, recorded the hash, and then changed the page. Then I re-ran it a few times, and the code above did not reflect a new hash (i.e., it followed the old_hash == new_hash branch).
Then I waited maybe 5 minutes and ran it again without changing the code at all except to add a couple of debugging calls to print(). And this time, it worked.
Naturally, my first thought was "huh, requests must be keeping a cache for a few seconds." But then I googled around and learned that requests doesn't cache.
Again, I changed no code except for print calls. You might not believe me. You might think "he must have changed something else." But I didn't! I can prove it! Here's the commit!
So what gives? Does anyone know why this is going on? If it matters, the webpage is hosted on a standard commercial hosting service, IIRC using Apache, and I'm on a lousy local phone company DSL connection. I don't know if there are any server-side caching settings in play, but it's not on any kind of CDN.
So I'm trying to figure out whether there is some mysterious ISP cache thing going on, or whether I'm somehow misusing requests... the former I can handle; the latter I need to fix!
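One thing worth ruling out (my own guess, not established in the question): with stream=True, file.raw.read() returns the bytes exactly as they came off the wire, so if the server toggles Content-Encoding (gzip on one response, identity on the next), identical content can hash differently between runs. Letting requests decode the body, plus a request header asking intermediate caches to revalidate, gives a more stable baseline:

from hashlib import sha256
from requests import get

def fetch_and_hash(url):
    # .content is the decoded response body; the header asks any
    # intermediate cache to revalidate (it cannot force a misbehaving
    # server-side cache to do so).
    response = get(url, headers={'Cache-Control': 'no-cache'})
    digest = sha256()
    digest.update(response.content)
    return digest.hexdigest()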

Deciphering datetime discrepancy between application output and server

In a Django/Python application where I'm using redis, I do:
my_server = redis.Redis(connection_pool=POOL)
updated_at = time.time()
object_hash = "np:" + str(object_id)
sorted_set = "sn:" + str(user_id)
my_server.zadd(sorted_set, object_hash, updated_at)
This is straight forward. Essentially, I'm maintaining a sorted set that contains objects sorted by time of updating the object.
The problem is that if I use redis-cli to run zrange sorted_set 0 -1 WITHSCORES, the score shown is precisely 5 hours older than what was originally in updated_at.
E.g. if updated_at was fed 1479646405.21, the redis sorted set score ends up being 1479628405.497179 (as per the output from redis-cli), i.e. 5 hours behind. This looks like a timezone issue: my location is 5 hours ahead of UTC.
My question is: why does the score jump back 5 hours when it hits the redis server? Whenever I print the updated_at variable from within my application, I get the correct number. Is this a Linux issue (the OS my application runs on is Ubuntu 14.04), and if so, can you explain precisely what could be going on? Being a beginner, I'm trying to understand the dynamics at play here. Thanks!
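A note on the mechanics (my reading, not an answer from the thread): time.time() returns seconds since the Unix epoch, which has no timezone attached, and Redis stores the score as the bare float it is given. A constant 5-hour gap therefore usually means one of the two numbers being compared was derived from a local wall-clock reading treated as if it were UTC. A quick way to see the distinction:

import time
from datetime import datetime

now = time.time()                      # epoch seconds, timezone-free
print(datetime.utcfromtimestamp(now))  # that instant on a UTC wall clock
print(datetime.fromtimestamp(now))     # the same instant in local time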

Updated approach to Google search with python

I was trying to use xgoogle, but it has not been updated for 3 years, and I just keep getting no more than 5 results even if I set 100 results per page. If anyone uses xgoogle without any problem, please let me know.
Now, since the only (apparently) available wrapper is xgoogle, the alternative is to use some sort of browser automation, like mechanize, but that is going to make the code entirely dependent on Google's HTML, which they might change a lot.
The final option is to use the Custom Search API that Google offers, but it has a ridiculous 100-requests-per-day limit and pricing after that.
I need help on which direction I should go: what other options do you know of, and what works for you?
Thanks!
It only needs a minor patch.
The function GoogleSearch._extract_result (Line 237 of search.py) calls GoogleSearch._extract_description (Line 258), which fails, causing _extract_result to return None for most of the results and therefore showing fewer results than expected.
Fix:
In search.py, change Line 259 from this:
desc_div = result.find('div', {'class': re.compile(r'\bs\b')})
to this:
desc_div = result.find('span', {'class': 'st'})
I tested using:
#!/usr/bin/python
#
# This program does a Google search for "quick and dirty" and returns
# 200 results.
#
from xgoogle.search import GoogleSearch, SearchError

class give_me(object):
    def __init__(self, query, target):
        self.gs = GoogleSearch(query)
        self.gs.results_per_page = 50
        self.current = 0
        self.target = target
        self.buf_list = []

    def __iter__(self):
        return self

    def next(self):
        if self.current >= self.target:
            raise StopIteration
        else:
            if not self.buf_list:
                self.buf_list = self.gs.get_results()
            self.current += 1
            return self.buf_list.pop(0)

try:
    sites = {}
    for res in give_me("quick and dirty", 200):
        t_dict = {
            "title": res.title.encode('utf8'),
            "desc": res.desc.encode('utf8'),
            "url": res.url.encode('utf8'),
        }
        sites[t_dict["url"]] = t_dict
        print t_dict
except SearchError, e:
    print "Search failed: %s" % e
I think you misunderstand what xgoogle is. xgoogle is not a wrapper; it's a library that fakes being a human user with a browser, and scrapes the results. It's heavily dependent on the format of Google's search queries and results pages as of 2009, so it's no surprise that it doesn't work the same in 2013. See the announcement blog post for more details.
You can, of course, hack up the xgoogle source and try to make it work with Google's current format (as it turns out, they've only broken xgoogle by accident, and not very badly…), but it's just going to break again.
Meanwhile, you're trying to get around Google's Terms of Service:
Don’t misuse our Services. For example, don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide.
They've been specifically asked about exactly what you're trying to do, and their answer is:
Google's Terms of Service do not allow the sending of automated queries of any sort to our system without express permission in advance from Google.
And you even say that's explicitly what you want to do:
The final option is to use the Custom Search API that Google offers, but it has a ridiculous 100-requests-per-day limit and pricing after that.
So, you're looking for a way to access Google search using a method other than the interface they provide, in a deliberate attempt to get around their free usage quota without paying. They are completely within their rights to do anything they want to break your code, and if they get enough hits from people doing this kind of thing, they will do so.
(Note that when a program is scraping the results, nobody's seeing the ads, and the ads are what pay for the whole thing.)
Of course nobody's forcing you to use Google. EntireWeb has a free "unlimited" (as in "as long as you don't use too much, and we haven't specified the limit") search API. Bing gives you a higher quota, amortized by month instead of by day. Yahoo BOSS is flexible and super-cheap (and even offers a "discount server" that provides lower-quality results if it's not cheap enough), although I believe you're forced to type the ridiculous exclamation point. If none of them are good enough for you… then pay for Google.
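If you do end up paying for Google, a minimal sketch of the Custom Search JSON API (API_KEY and ENGINE_ID are placeholders you create in the Google API console, not values from this thread):

import requests

API_KEY = 'your-api-key'      # placeholder
ENGINE_ID = 'your-engine-id'  # placeholder custom search engine id

def google_search(query, start=1):
    resp = requests.get(
        'https://www.googleapis.com/customsearch/v1',
        params={'key': API_KEY, 'cx': ENGINE_ID, 'q': query, 'start': start},
    )
    resp.raise_for_status()
    return resp.json().get('items', [])

for item in google_search('quick and dirty'):
    print(item['link'])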

Datastore performance: my code or the datastore latency?

For the last month I've had a bit of a problem with a quite basic datastore query. It involves two db.Models, with one referring to the other through a db.ReferenceProperty.
The problem is that, according to the admin logs, the request takes about 2-4 seconds to complete. I stripped it down to a bare form and a list to display the results.
The put works fine, but the get accumulates (in my opinion) way too much CPU time.
# The get looks like this:
outputData['items'] = {}
labelsData = Label.all()
for label in labelsData:
    labelItem = label.item.name  # dereferences the ReferenceProperty: one datastore get per label
    if labelItem not in outputData['items']:
        outputData['items'][labelItem] = {'item': labelItem, 'labels': []}
    outputData['items'][labelItem]['labels'].append(label.text)
path = os.path.join(os.path.dirname(__file__), 'index.html')
self.response.out.write(template.render(path, outputData))
# And the models:
class Item(db.Model):
    name = db.StringProperty()

class Label(db.Model):
    text = db.StringProperty()
    lang = db.StringProperty()
    item = db.ReferenceProperty(Item)
I've tried to structure it a number of different ways, e.g. instead of the ReferenceProperty, storing all the Label keys in the Item model as a db.ListProperty.
My test data is just 10 rows in Item and 40 in Label.
So my question: is it a fool's errand to try to optimize this because the high CPU usage is due to problems with the datastore, or have I just screwed up somewhere in the code?
..fredrik
EDIT:
I got a great response from djidjadji on the Google App Engine mailing list.
The new code looks like this:
outputData['items'] = {}
labelsData = Label.all().fetch(1000)
# Read the stored keys without dereferencing, then fetch all items in one batch get:
labelItems = db.get([Label.item.get_value_for_datastore(label) for label in labelsData])
for label, labelItem in zip(labelsData, labelItems):
    name = labelItem.name
    try:
        outputData['items'][name]['labels'].append(label.text)
    except KeyError:
        outputData['items'][name] = {'item': name, 'labels': [label.text]}
There are certainly things you can do to optimize your code. For example, you're iterating over a query, which is less efficient than fetching the results and iterating over those.
I'd recommend using Appstats to profile your app, and check out the Patterns of Doom series of posts.
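To make the fetch-versus-iterate point concrete, a small sketch against the old GAE db API (process is a stand-in for whatever you do per entity; the 1000-result fetch matches the edit above):

# Iterating the query object pulls results lazily in small batches,
# paying RPC overhead as it goes:
for label in Label.all():
    process(label)

# fetch() retrieves up to the given number of results in one round trip;
# iteration afterwards is purely in memory:
for label in Label.all().fetch(1000):
    process(label)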
Don't just try things. That's guessing. You'll only be right some of the time. Don't ask other people to guess either, for the same reason.
Be right every time.
Just pause the code several times and look at the call stack. That will tell you exactly what's going on.
