creating non-reloading dynamic webapps using Django - python

As far as I know, for a new request coming from a webapp, you need to reload the page to process and respond to that request.
For example, if you want to show a comment on a post, you need to reload the page, process the comment, and then show it. What I want, however, is I want to be able to add comments (something like facebook, where the comment gets added and shown without having to reload the whole page, for example) without having to reload the web-page. Is it possible to do with only Django and Python with no Javascript/AJAX knowledge?
I have heard it's possible with AJAX (I don't know how), but I was wondering if it was possible to do with Django.
Thanks,

You want to do that with out any client side code (javascript and ajax are just examples) and with out reloading your page (or at least part of it)?
If that is your question, then the answer unfortunately is you can't. You need to either have client side code or reload your page.
Think about it, once the client get's the page it will not change unless
The client requests the same page from the server and the server returns and updated one
the page has some client side code (eg: javascript) that updates the page.

You definitely want to use AJAX. Which means the client will need to run some javascript code.
If you don't want to learn javascript you can always try something like pyjamas. You can check out an example of it's HttpRequest here
But I always feel that using straight javascript via a library (like jQuery) is easier to understand than trying to force one language into another one.

To do it right, ajax would be the way to go BUT in a limited sense you can achieve the same thing by using a iframe, iframe is like another page embedded inside main page, so instead of refreshing whole page you may just refresh the inner iframe page and that may give the same effect.
More about iframe patterns you can read at
http://ajaxpatterns.org/IFrame_Call

Maybe a few iFrames and some Comet/long-polling? Have the comment submission in an iFrame (so the whole page doesn't reload), and then show the result in the long-polled iFrame...
Having said that, it's a pretty bad design idea, and you probably don't want to be doing this. AJAX/JavaScript is pretty much the way to go for things like this.
I have heard it's possible with AJAX...but I was
wondering if it was possible to do
with Django.
There's no reason you can't use both - specifically, AJAX within a Django web application. Django provides your organization and framework needs (and a page that will respond to AJAX requests) and then use some JavaScript on the client side to make AJAX calls to your Django-backed page that will respond correctly.
I suggest you go find a basic jQuery tutorial which should explain enough basic JavaScript to get this working.

Related

Login and Clicking Buttons with Python "requests" Module

I have been playing with requests module on Python for a while as part of studying HTTP requests/responses; and I think I grasped most of the fundamental things on the topic that are supposed to be understood. With a naive analogy it basically works on ping-pong principle. You send a request in a packet to server and then it send back to you another packet. For instance, logging in to a site is simply sending a post request to server, I managed to do that. However, what I have trouble is to fail clicking on buttons through HTTP post request. I searched for it here and there, but I could not find a valid answer to my inquiry other than utilizing selenium module, which is what I do not want to if there is another way with requests module too. I am also aware of the fact that they created such a module called selenium for a thing.
QUESTIONS:
1) What kind of parameters do I have to take into account for being able to click on buttons or links from the account I accessed through HTTP requests? For instance, when I watch network activity for request header and response header with my browser's built-in inspect tool, I get so many parameters sent back by server, e.g. sec-fetch-dest, sec-fetch-mode, etc.
2) Is it too complicated for a beginner or is there too much advanced stuff going on behind the scene to do that so selenium was created for that reason?
Theoretically, you could write a program to do this with requests, but you would be duplicating much of the functionality that is already built and optimized in other tools and APIs. The general process would be:
Load the HTML that is normally rendered in your browser using a get request.
Process the HTML to find the button in question.
Then, if it's a simple form:
Determine the request method the button will carry out (e.g. using the formmethod argument, see here).
Perform the specified request with the required information in your request packet.
If it's a complex page (i.e. it uses JavaScript):
Find the button's unique identifier.
Process the JavaScript code to determine what action is performed when the button is clicked.
If possible, perform the JavaScript action using requests (e.g. following a link or something like that). I say if possible because JavaScript can do many things that, to my knowledge, simple HTTP request cannot, like changing rendered CSS in order to change the background color of a <div> when a button is clicked.
You are much better off using a tool like selenium or beautiful soup, as they have created APIs that do a lot of the above for you. If you've used the built-in requests library to learn about the basic HTTP request types and how they work, awesome--now move on to the plethora of excellent tools that wrap requests up into a more functional and robust API.

Python: How to retrieve POST parameters from HTML? [duplicate]

I have been using mechanize to fill in a form from a website but this now has changed and some of the required fields seem to be hidden and cannot be accessed using mechanize any longer - when printing all available forms.
I assume it has been modified to use more current methods (application/x-www-form-urlencoded) but I have not found a way to update my script to continue using this form programmatically.
From what I have read, I should be able to send a dict (key/value pair) to the submit button directly rather than filling the form in the first place - please correct me if I am wrong.
BUT I have not been able to find a way to obtain what keys are required...
I would massively appreciate it if someone could point me in the right direction or put me straight in case this is no longer possible.
You cannot, in all circumstances, extract all fields a server expects.
The post target, the code handling the POST, is a black box. You cannot look inside the code that the server runs. The best information you have about what it expects is what the original form tells your browser to post. That original form consists not only of the HTML, but also of the headers that were sent with it (cookies for example) and any JavaScript code that is run by the browser.
In many cases, parsing the HTML sent for the form is enough; that's what Mechanize (or a recent more modern framework like robobrowser) does, plus a little cookie handling and making sure typical headers such as the referrer are included. But if any JavaScript code manipulated the HTML or intercepts the form submission to add or remove data then Mechanize or other Python form parsers cannot replicate that step.
Your options then are to:
Reverse engineer what the Javascript code does and replicate that in Python code. The development tools of your browser can help here; observe what is being posted on the network tab, for example, or use the debugger to step through the JavaScript code to see what it does.
Use an actual browser, controlled from Python. Selenium could do this for you; it can drive a desktop browser (Chrome, Firefox, etc.) or it can be used to drive a headless browser implementation such as PhantomJS. This is heavier on the resources, but will actually run the JavaScript code and let you post a form just as your browser would, in each and every way.

Obtaining required keys for application/x-www-form-urlencoded

I have been using mechanize to fill in a form from a website but this now has changed and some of the required fields seem to be hidden and cannot be accessed using mechanize any longer - when printing all available forms.
I assume it has been modified to use more current methods (application/x-www-form-urlencoded) but I have not found a way to update my script to continue using this form programmatically.
From what I have read, I should be able to send a dict (key/value pair) to the submit button directly rather than filling the form in the first place - please correct me if I am wrong.
BUT I have not been able to find a way to obtain what keys are required...
I would massively appreciate it if someone could point me in the right direction or put me straight in case this is no longer possible.
You cannot, in all circumstances, extract all fields a server expects.
The post target, the code handling the POST, is a black box. You cannot look inside the code that the server runs. The best information you have about what it expects is what the original form tells your browser to post. That original form consists not only of the HTML, but also of the headers that were sent with it (cookies for example) and any JavaScript code that is run by the browser.
In many cases, parsing the HTML sent for the form is enough; that's what Mechanize (or a recent more modern framework like robobrowser) does, plus a little cookie handling and making sure typical headers such as the referrer are included. But if any JavaScript code manipulated the HTML or intercepts the form submission to add or remove data then Mechanize or other Python form parsers cannot replicate that step.
Your options then are to:
Reverse engineer what the Javascript code does and replicate that in Python code. The development tools of your browser can help here; observe what is being posted on the network tab, for example, or use the debugger to step through the JavaScript code to see what it does.
Use an actual browser, controlled from Python. Selenium could do this for you; it can drive a desktop browser (Chrome, Firefox, etc.) or it can be used to drive a headless browser implementation such as PhantomJS. This is heavier on the resources, but will actually run the JavaScript code and let you post a form just as your browser would, in each and every way.

Redirecting all links from scraped webpage

The basic idea is that the web-application fetches an external website and overlays it some JavaScript, for additional functionality.
However the links on the webpage I fetched shouldn't navigate to the external website, but stay on my website. I figured that converting the links with regular expressions (or a similar method) would be inefficient as it would not cover the links dynamically generated, like AJAX-requests or other JavaScript functionality. So basically what I can't seem to find is a method to change/intercept/redirect all links of the scraped website.
So, what is a (good) way to change/intercept the dynamically generated links of a scraped website? Preferably a python-method.
Unless you're changing the URL's on the scraped web page (including dynamic ones), you can't do what you ask.
If a client is served up a web page with a URL pointing to an external site, your website will have no opportunity to intercept this or change it since their browser will navigate away without even going to your site (not strictly true though - read on). Theoretically, you could possibly attach event handlers to all links (before serving up the scraped page), and even intercept dynamically created ones (by parsing their javascript), but this might prove pretty difficult. You also have to stop other methods of the URL changing as well (like header redirection).
Clients themselves can use proxies in their browsers (that affect all outgoing URLs), but this is the client deciding that all traffic should be routed through a proxy server. You can't do this on their behalf (without actually changing the URLs).
EDIT: Since OP removed suggestion of using a web proxy, the answer details change slightly, but the end result is the same. For all practical purposes, it's nearly impossible to do this.
You could try parsing the javascript on the page and be successful for some pages (or possibly with a sophisticated enough script for many typical pages); but throw in one little eval on the page, and you'll need your own javascript engine written in javascript to try to figure out every possible external request on a page. ...and even then you couldn't do it.
Basically, give me a script which someone says can parse any webpage (including javascript) to intercept any external calls, and I'll give you a webpage that this script won't work for. Disclaimer: I'm speaking about intercepting the links, but letting the site function normally after...not just parsing the page to remove all javascript entirely.
Someone else may be able to provide you an answer that works sometimes on some web pages - maybe that would be good enough for your purposes.
Also, have you considered that most javascript on a page isn't embedded, but rather either loaded via <script> tags, or possibly even loaded dynamically, from the original server. I assume you'd want to distinguish "stuff loaded from original server needed to make page function and look correctly", from "stuff loaded from original server for other things". How does your program "know" this?
You could try parsing the page and removing all javascript...but even this would be very difficult, since there are still tricky ways of getting around this.

Retrieve cookie created using javascript in python

I've had a look at many tutorials regarding cookiejar, but my problem is that the webpage that i want to scape creates the cookie using javascript and I can't seem to retrieve the cookie. Does anybody have a solution to this problem?
If all pages have the same JavaScript then maybe you could parse the HTML to find that piece of code, and from that get the value the cookie would be set to?
That would make your scraping quite vulnerable to changes in the third party website, but that's most often the case while scraping. (Please bear in mind that the third-party website owner may not like that you're getting the content this way.)
I responded to your other question as well: take a look at mechanize. It's probably the most fully featured scraping module I know: if the cookie is sent, then I'm sure you can get to it with this module.
Maybe you can execute the JavaScript code in a JavaScript engine with Python bindings (like python-spidermonkey or pyv8) and then retrieve the cookie. Or, as the javascript code is executed client side anyway, you may be able to convert the cookie-generating code to Python.
You could access the page using a real browser, via PAMIE, win32com or similar, then the JavaScript will be running in its native environment.

Categories

Resources