OFFSET must not be negative - python

When a pagination object has zero results and we force ?page=-1, we get the error OFFSET must not be negative. By default, -1 returns the last page. So anyone who adds that parameter to the URL can trigger an internal error whenever there is nothing to paginate.
Example:
page = request.args.get('page', 1, type=int)
pagination = company.comments.order_by(Comment.timestamp.asc()).paginate(
    page, per_page=current_app.config['COMMENTS_PER_PAGE'],
    error_out=False)
This will avoid the error, but it is annoying to always add this kind of validation to handle potentially empty paginations:
if company.comments.count() > 0:
    pagination = ...
else:
    pagination = None
My question is about the best way to handle this particular Internal server error.

Getting the last page is probably what you are trying to do with -1, but SQLAlchemy won't evaluate that for you. My suggestion is to calculate the last page number yourself whenever a negative page is requested:
import math
from sqlalchemy import func

if page < 1:
    count = session.query(func.count(Comment.id)).scalar()
    comments_per_page = current_app.config['COMMENTS_PER_PAGE']
    page = max(int(math.ceil(count / float(comments_per_page))), 1)  # the last page, never below 1
Please be aware that this is untested.
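For completeness, here is a minimal sketch of how that could plug into the view from the question; company, Comment and COMMENTS_PER_PAGE are taken from the snippets above, and the fall-back-to-last-page behaviour is an assumption, not something tested:
import math
from flask import request, current_app

per_page = current_app.config['COMMENTS_PER_PAGE']
page = request.args.get('page', 1, type=int)

if page < 1:
    # Assumption: treat a negative page as "last page" instead of passing a negative offset.
    count = company.comments.count()
    page = max(int(math.ceil(count / float(per_page))), 1)

pagination = company.comments.order_by(Comment.timestamp.asc()).paginate(
    page, per_page=per_page, error_out=False)
This keeps the paginate() call itself unchanged and only clamps the page number in front of it.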

Related

How do I make sure a model field is an incremental number for my model?

I have the following model in Django:
class Page(models.Model):
    page_number = models.IntegerField()
    ...
and I would like to make sure that this page number remains a sequence of integers without gaps, even if I delete some pages in the middle of the existing pages in the database. For example, if I have pages 1, 2 and 3 and delete page 2, page 3 should become page 2.
At the moment, I am not updating the page_number, but rather reconstructing an increasing sequence without gaps in my front end by:
querying the pages
sorting them according to page_number
assigning a new page_order which is incremental and without gaps
But this does not seem to be the best way to go...
Basically you'd have to manually bump all of the later pages down.
In a custom view you'd do something like this:
from django.db.models import F

def deletePage(request):
    if request.method == 'POST':
        pageObj = Page.objects.filter(page_number=request.POST.get('page_number')).first()
        if pageObj:
            pageObj.delete()
            # Note: using F() means Django doesn't need to fetch the value
            # from the db before subtracting - it's a blind (but faster) change.
            for i in Page.objects.filter(page_number__gt=request.POST.get('page_number')):
                i.page_number = F('page_number') - 1
                i.save()
        else:
            # No Page object found - raise some error here
            pass
The admin page is tougher though; you'd basically do the same thing, but in the functions described in: Django admin: override delete method
Note: deleting multiple pages at once would be tougher, especially if you're deleting pages 2 + 4 + 5 together. Possible, but it takes a lot more thought.
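If you do need to handle multi-page deletes, one option (a rough sketch, not part of the answer above) is to simply renumber every remaining page after the deletes, which sidesteps the per-gap bookkeeping entirely:
from django.db import transaction

def resequence_pages():
    # Renumber all remaining pages 1..N in their current order,
    # regardless of how many pages were deleted or where the gaps are.
    with transaction.atomic():
        pages = list(Page.objects.order_by('page_number'))
        for index, page in enumerate(pages, start=1):
            page.page_number = index
        Page.objects.bulk_update(pages, ['page_number'])
This does more work per delete than the F() approach, but it is easy to reason about and handles deleting pages 2 + 4 + 5 in one go.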

Imgur API - How do I retrieve all favorites without pagination?

According to the Imgur Docs, the "GET Account Favorites" API call takes optional arguments for pagination, implying that all objects are returned without it.
However, when I use the following code snippet (the application has been registered and OAuth has already been performed against my account for testing), I get only the first 30 JSON objects. In the snippet below I already have an access_token for an authorized user and can retrieve data for that username, but the returned list always contains only the first 30 items.
import requests

username = token['username']
bearer_headers = {
    'Authorization': 'Bearer ' + token['access_token']
}
fav_url = 'https://api.imgur.com/3/account/' + username + '/' + 'favorites'
r = requests.get(fav_url, headers=bearer_headers)
r_json = r.json()
favorites = r_json['data']
len(favorites)
print(favorites)
The requests response returns a dictionary with three keys: status (the HTTP status code), success (true or false), and data, of which the value is a list of dictionaries (one per favorited item).
I'm trying to retrieve this without pagination so I can extract specific metadata values into a Pandas dataframe (id, post date, etc).
I originally thought this was a Pandas display problem in the Jupyter notebook, but tracked it back to the API only returning the newest 30 list items, despite the docs indicating otherwise. If I place an arbitrary page number at the end (e.g. "/favorites/1"), it returns the 30 items appropriate to that page, but there doesn't seem to be an option to get all items, or to retrieve a count of the total items or number of pages in advance.
What am I missing?
Postscript: It appears that none of the URIs work without pagination, e.g. get account images, get gallery submissions, etc. Anything with an optional "/{{page}}" parameter defaults to the first page if none is specified. So I guess the larger question is: does the Imgur API even support non-paginated data, and how is that accessed?
Paginated data is usually used when the possible size of the response can be arbitrarily large. I would be surprised if a major service like Imgur had an API that didn't work this way.
As you have found, the page attribute may be optional, and if you don't provide it, you get the first page as your response.
If you want to get more than the first page, you will need to loop over the page number:
data = []
page = 0
while block := connection.get(page=page):
    data.append(block)
    page += 1
This assumes Python3.8+ due to the := assignment expression. If you are on an older version you'll need to set block in the loop body, but the same idea applies.
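Applied to the snippet from the question (reusing username and bearer_headers from above), a concrete version might look like the sketch below. The path-based page number comes from the question's observation that "/favorites/1" works; whether pages start at 0 or 1, and that an empty data list marks the end, are assumptions to verify against the docs:
import requests

favorites = []
page = 0
while True:
    page_url = 'https://api.imgur.com/3/account/' + username + '/favorites/' + str(page)
    chunk = requests.get(page_url, headers=bearer_headers).json()['data']
    if not chunk:  # an empty page is assumed to mean there are no more favorites
        break
    favorites.extend(chunk)
    page += 1

print(len(favorites))
The combined favorites list can then be fed into a Pandas DataFrame to pull out id, post date and the other metadata fields.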

Getting the quantity of issues through the GitHub API

My task is to get the number of open issues using the GitHub API. Unfortunately, whichever repository I parse, I get the same number: 30.
import requests

r = requests.get('https://api.github.com/repos/grpc/grpc/issues')
count = 0
for item in r.json():
    if item['state'] == 'open':
        count += 1
print(count)
Is there any way to get a real quantity of issues?
See the documentation about the Link response header; you can also pass the state or other filters.
https://developer.github.com/v3/guides/traversing-with-pagination/
https://developer.github.com/v3/issues/
You'll have to page through.
http://.../issues?page=1&state=open
http://.../issues?page=2&state=open
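A minimal sketch of paging through with requests; the per_page parameter and the empty-page stopping condition follow GitHub's documented pagination behaviour, but treat the details as assumptions to check:
import requests

count = 0
page = 1
while True:
    r = requests.get('https://api.github.com/repos/grpc/grpc/issues',
                     params={'state': 'open', 'per_page': 100, 'page': page})
    items = r.json()
    if not items:
        break
    # The issues listing also contains open pull requests; skip items that
    # carry a 'pull_request' key so only plain issues are counted.
    count += sum(1 for item in items if 'pull_request' not in item)
    page += 1

print(count)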
The /issues/ endpoint is paginated, which means that you will have to iterate through several pages to get all the issues:
https://api.github.com/repos/grpc/grpc/issues?page=1
https://api.github.com/repos/grpc/grpc/issues?page=2
...
But there is a better way to get what you want. The GET /repos/:owner/:repo directly gives the number of open issues on a repository.
For instance, on https://api.github.com/repos/grpc/grpc, you can see:
"open_issues_count": 1052,
Have a look at the documentation for this endpoint.
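For instance, a minimal sketch with requests (note that open_issues_count, like the issues listing, also includes open pull requests):
import requests

repo = requests.get('https://api.github.com/repos/grpc/grpc').json()
print(repo['open_issues_count'])  # e.g. 1052 at the time the answer was written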

how to find the last available url which does not return 302 (Redirect) status code in a url list quickly

Now I am facing a problem like this:
Say I have a list of urls, e.g.
['http://example.com/1',
'http://example.com/2',
'http://example.com/3',
'http://example.com/4',
...,
'http://example.com/100']
Some of them are unavailable; requesting those URLs results in a 302 redirect status code. For example, .../1 through .../50 are available, but .../51 causes a 302; then .../50 is the URL I want.
I want to find out which URL is the last available one (the last that does not return a 302 code). I believe a binary search would do the job, but I wonder how to implement it more efficiently. I use Python's urllib2 to detect the 302 status code.
This answer makes the assumption that your URLs are currently ordered in a meaningful way, and that all URLs up to some value n will be available and all URLs after n will result in a 302.
If this is the case, then you can adapt this binary search answer to fit your needs:
import requests

def binary_search_urls(urls, lo=0, hi=None):
    if hi is None:
        hi = len(urls)
    while lo < hi:
        mid = (lo + hi) // 2
        status = requests.head(urls[mid]).status_code
        if status != 302:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1
This will give you the index of the last good URL, or -1 if there are no good URLs.
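A hypothetical usage, assuming the ordered list from the question:
urls = ['http://example.com/%d' % i for i in range(1, 101)]

last_good_index = binary_search_urls(urls)
if last_good_index >= 0:
    print(urls[last_good_index])  # with the example above this would be .../50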
I would just check the entire lot; however, I would use requests instead of urllib2 and make sure to only make HEAD requests to keep bandwidth down (which is probably going to be your greatest bottleneck anyway).
import requests
urls = [...]
results = [(url, requests.head(url).status_code) for url in urls]
Then go from there...
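From there, picking out the last URL that did not return a 302 could look like this (a small sketch, assuming the list is ordered as described in the question):
good = [url for url, status in results if status != 302]
last_good_url = good[-1] if good else None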
I don't see how a binary search could be faster than a straightforward in-order iteration here, and in most cases it would end up being slower. Given that n is the length of the list, if you are searching for the last good URL of the first good batch, only the case where urls[n/2 - 1] is your target would take the same number of requests as brute-force iteration; every other case would take more. If you are looking for the last good URL in the entire list, the only target that would take the same number of requests as a reverse-order iteration is again urls[n/2 - 1]. Binary search is only faster when your dataset is ordered. For an unordered dataset, sampling the middle of the set tells you nothing about which values on either side can be excluded, so you still have to process the entire sequence before you can conclude anything.
I suspect what you really want here is a way to sample your dataset at intervals so that you can run fewer requests before finding your target, which isn't quite the same thing as a binary search. Binary search relies on the fact that sampling a point in the sequence lets you exclude one side or the other from subsequent searches based on a binary condition. What you have is a system where a sample that fails the test lets you exclude one side, but a sample that passes the test tells you nothing about the remaining values in the list. That doesn't really work for a binary search.

GAE Datastore - Is there a next page / Are there x+1 entities?

Currently, to determine whether or not there is a next page of entities I'm using the following code:
q = Entity.all()
results = q.fetch(10)
cursor = q.cursor()
extra = q.fetch(1)
has_next_page = False
if extra:
    has_next_page = True
However, this is very expensive in terms of the time it takes to execute the 'extra' query. I need to extract the cursor after 10 results, but I need to fetch 11 to see if there is a succeeding page.
Anyone have any better methods?
If you fetch 11 items straight away you'll only have to fetch 1 extra item to know if there is a next page or not. And you can just display the first 10 results and use the 11th result only as a "next page" indicator.
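A minimal sketch of that approach with the db API from the question (treat the cursor handling as an assumption; q.cursor() here is positioned after the extra item, so it needs care before being reused for the next page):
PAGE_SIZE = 10

q = Entity.all()
results = q.fetch(PAGE_SIZE + 1)          # ask for one extra entity
has_next_page = len(results) > PAGE_SIZE
page_items = results[:PAGE_SIZE]          # only the first 10 are displayed
cursor = q.cursor()                       # note: points past the extra item
This replaces the separate extra query with a single fetch of 11 entities.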
