Going to the next page using the Django paginator sends the request again - Python

My Google search application makes a request each time I use the paginator. Suppose I have 100 records. Each page shows 10 records, so ten pages. When I click the 2nd page it sends the request again. Ideally it should not send the request.

"When I click the 2nd page it sends the request again. Ideally it should not send the request."
What do you mean by "request"? Is it a request to Google?
Your application apparently does not cache the results. If your request to Google returns 100 records, then you should cache all 100. When you request the second page, the view should retrieve this cache and return the second page to you.
If you mean a request to your own app, then @Daniel's comment has it right. You can get around this by sending all the results to the browser and then doing the pagination in JavaScript.
A more detailed answer is difficult without seeing some code.
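For illustration, a minimal sketch of that caching approach in a Django view; search_google(), the query-string names and the template are hypothetical placeholders, not from the original post:

from django.core.cache import cache
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.shortcuts import render

def search_results(request):
    query = request.GET.get('q', '')
    cache_key = 'google_results:%s' % query

    results = cache.get(cache_key)
    if results is None:
        results = search_google(query)          # the one expensive external call
        cache.set(cache_key, results, 60 * 15)  # keep the 100 records for 15 minutes

    paginator = Paginator(results, 10)          # 10 records per page -> ten pages
    try:
        page = paginator.page(request.GET.get('page', 1))
    except (PageNotAnInteger, EmptyPage):
        page = paginator.page(1)
    return render(request, 'results.html', {'page': page})

Paging through the cached list this way never repeats the Google call until the cache entry expires.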

Related

How to identify the original HTTP request from the client?

"To display a Web page, the browser sends an original request to fetch
the HTML document that represents the page. It then parses this file,
making additional requests corresponding to execution scripts, layout
information (CSS) to display, and sub-resources contained within the
page (usually images and videos)."
The previous quote is from MDN Web Docs, An overview of HTTP.
My question is: I want to identify the original request from the client, and then temporarily store that request and all subrequests made to the server; when the client makes another original request, I want to replace the temporarily stored data with the new requests.
For example, say I have an HTML page that, when parsed by the client, makes additional requests for some resources on the server. When the user reloads the page he is just making another original request, so the temporarily stored request data should be replaced by the new original request and its subrequests. The same happens when the client requests another HTML page.
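One way to approximate this, sketched below and not taken from the thread, is to lean on the Fetch Metadata headers modern browsers send: a top-level navigation arrives with Sec-Fetch-Dest: document, while the subrequests for scripts, CSS and images arrive with other values. A rough Python/Django-flavoured middleware sketch (request.headers needs Django 2.2+; the session key is illustrative):

class RequestGroupingMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Top-level navigations carry Sec-Fetch-Dest: document; subrequests
        # for scripts, CSS and images carry values like script, style, image.
        if request.headers.get('Sec-Fetch-Dest') == 'document':
            # A new original request: drop the old group, start a new one.
            request.session['request_log'] = [request.path]
        else:
            log = request.session.get('request_log', [])
            log.append(request.path)
            request.session['request_log'] = log  # reassign so the session saves
        return self.get_response(request)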

Logging and potentially blocking XHR Requests by javascript using selenium

I have some kind of single-page application which composes XHR requests on the fly. It implements pagination for a list of links I want to click on using Selenium.
The page only provides a "Go to next page" link. When clicking the next-page link, a JavaScript function creates an XHR request and updates the page content.
Now when I click on one of the links in the list I get redirected to a new page (again through JavaScript with obfuscated request generation). Though this is exactly the behaviour I want, when going back to the previous page I have to start over from the beginning (i.e. start at page 0 and click through to page n).
There are a few solutions which came to my mind:
Block the second XHR request when clicking on the links in the list, store it and replay it later. This way I can skim through the pages but keep my links for replay later.
Somehow 'inject' the first XHR request which does the pagination, in order to save myself from clicking through all the pages again.
I was also trying out some simple proxies, but HTTPS is causing trouble for me, and I was wondering if there is any simple solution I might have missed.
BrowserMob Proxy integrates easily and will allow you to capture all the requests made. It should also allow you to block certain calls from returning.
It does sound like you are scraping a site, so it might be worth parsing the data the XHR calls return and mimicking the calls directly.
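A hedged sketch of wiring BrowserMob Proxy into Selenium with the browsermob-proxy Python package; the paths, URLs and the blacklist pattern are placeholders for your own setup:

from browsermobproxy import Server
from selenium import webdriver

server = Server("/path/to/browsermob-proxy/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()
# proxy.blacklist(r'https?://tracking\.example\.com/.*', 404)  # optionally block calls

options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=%s" % proxy.proxy)
options.add_argument("--ignore-certificate-errors")  # HTTPS runs through the proxy
driver = webdriver.Chrome(options=options)

proxy.new_har("pagination", options={"captureContent": True})
driver.get("https://example.com/list")
# ... click the 'Go to next page' link here ...

for entry in proxy.har["log"]["entries"]:            # every captured request
    print(entry["request"]["method"], entry["request"]["url"])

driver.quit()
server.stop()

The captured HAR entries give you the obfuscated pagination XHRs verbatim, so you can replay them later instead of clicking through from page 0 again.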

Scraping a site that uses AJAX

I've read some relevant posts here but couldn't figure out an answer.
I'm trying to crawl a web page with reviews. When the site is visited there are only 10 reviews at first, and a user has to press "Show more" to get 10 more reviews (which also adds #add10 to the end of the site's address) every time he scrolls down to the end of the reviews list. Actually, a user can get the full review list by adding #add1000 (where 1000 is the number of additional reviews) to the end of the site's address. The problem is that I get only the first 10 reviews using site_url#add1000 in my spider, just like with site_url, so this approach doesn't work.
I also can't find a way to make an appropriate Request imitating the original one from the site. The original AJAX URL is of the form 'domain/ajaxlst?par1=x&par2=y' and I tried all of these:
Request(url='domain/ajaxlst?par1=x&par2=y', callback=self.parse_all)
Request(url='domain/ajaxlst?par1=x&par2=y', callback=self.parse_all,
        headers={all_headers})
Request(url='domain/ajaxlst?par1=x&par2=y', callback=self.parse_all,
        headers={all_headers}, cookies={all_cookies})
But every time I'm getting a 404 error. Can anyone explain what I'm doing wrong?
What you need is a headless browser for this, since the requests module cannot handle AJAX well.
One such tool is Selenium.
e.g.:
driver.find_element_by_id("show more").click() # This is just an example case
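Expanded into a hedged sketch (the URL and button id are placeholders, and it uses the same find_element_by_id API as above, which newer Selenium versions replace with find_element(By.ID, ...)):

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time

driver = webdriver.Firefox()
driver.get("https://example.com/reviews")   # placeholder URL

while True:
    try:
        driver.find_element_by_id("show_more").click()  # hypothetical button id
        time.sleep(1)          # crude wait for the AJAX content to render
    except NoSuchElementException:
        break                  # button gone: all reviews are loaded

html = driver.page_source      # the full page, ready for parsing
driver.quit()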
Normally, when you scroll down the page, AJAX sends a request to the server, and the server then responds with a JSON/XML file back to your browser to refresh the page.
You need to figure out the URL linked to this JSON/XML file. Normally, you can open your Firefox browser, open Tools / Web Developer / Web Console, monitor the network activity, and easily catch this JSON/XML file.
Once you find this file, you can parse the reviews directly from it (I recommend the Python modules requests and bs4 for this work) and save a huge amount of time. Remember to use some different clients and IPs. Be nice to the server and it won't block you.
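As a hedged sketch of that approach, reusing the ajaxlst URL shape from the question (the header values and the CSS selector are guesses to be replaced with what the network monitor shows):

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",   # many AJAX endpoints expect this
}
resp = requests.get("https://domain/ajaxlst",
                    params={"par1": "x", "par2": "y"},
                    headers=headers)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for review in soup.select(".review-text"):   # hypothetical selector
    print(review.get_text(strip=True))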

Double form submission

I have a Django 1.6 app and I'm facing a problem with double submissions.
I could use the ideas spread all over here, like redirections, tokens, etc. (I have JS prevention in place but I don't trust it), but after the submission I have to make another request to an API. The external API request takes, let's say, 20 seconds, so there's plenty of time to play with the submit button.
The best solution I have right now is to save the CSRF token (or any other unique token) in the DB and check whether that token exists; if so, 'kill' the request.
But that's the thing: can I kill the request? I can't respond with a 500 or 404 because it's going to be delivered to the browser faster than the first/original request that makes the API call.
Is there a way to kill/drop/suspend a request with Django? Maybe this idea is crappy? Please share your knowledge.
I hope my English is understandable.
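For what it's worth, a sketch of the token idea that skips the DB round trip: Django's cache.add() stores a key only if it is absent (atomic with a memcached-style backend), so the first submission wins and duplicates can be answered quietly. form_token, call_external_api() and the redirect target are illustrative names:

from django.core.cache import cache
from django.http import HttpResponse
from django.shortcuts import redirect

def submit_view(request):
    if request.method == 'POST':
        token = request.POST.get('form_token', '')   # hidden per-form token
        # cache.add() only stores the key if it does not exist yet, so the
        # check-and-set is atomic; duplicates within 60 seconds fall through.
        if not cache.add('form_submit:%s' % token, True, 60):
            # Duplicate submission: reply quietly instead of a 500/404.
            return HttpResponse(status=204)
        call_external_api()     # the slow (~20 s) external request
        return redirect('done')
    return HttpResponse(status=405)  # GET etc. not handled in this sketch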

Parse HTML Infinite Scroll

I'm trying to parse the HTML of a page with infinite scrolling. I want to load all of the content so that I can parse it all. I'm using Python. Any hints?
Those pages update their HTML with AJAX. Usually you just need to find the new AJAX requests sent by the browser, guess the meaning of the AJAX URL parameters, and fetch the data from the API.
API servers may validate the user agent, referer, cookies, oauth_token ... of the AJAX request; keep an eye on them.
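A sketch of that replay loop with requests; the endpoint and the offset/limit parameters are stand-ins for whatever the network monitor reveals:

import requests

session = requests.Session()
# Mirror whatever the server validates: user agent, referer, cookies, ...
session.headers.update({
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://example.com/list",   # placeholder
})

items, offset = [], 0
while True:
    resp = session.get("https://example.com/api/items",   # placeholder endpoint
                       params={"offset": offset, "limit": 20})
    resp.raise_for_status()
    batch = resp.json()
    if not batch:
        break              # an empty batch means the scroll is exhausted
    items.extend(batch)
    offset += len(batch)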
The data is either loaded in advance, or the page sends a request while you scroll. You can use HttpFox to find that request and send it yourself.
