Python Selenium WebDriver - Possible to monitor XHR AJAX Calls?

I am using Python with Selenium WebDriver to browse through a website.
Now I have the problem that I want to monitor the XHR AJAX calls fired on the current page.
For example: I am on a specific page and the Python Selenium script clicks a button on this page. The button triggers an AJAX request within the site.
Is it possible to monitor this XHR AJAX request and get hold of it in my Python script, so I can handle the called AJAX URL?
Thanks!
UPDATE: I want to do exactly something like this (but in Python, obviously):
https://dmitripavlutin.com/catch-the-xmlhttp-request-in-plain-javascript/

You can use browser.execute_script to catch the calls, as explained in the link you mentioned. In addition, start a fake website using Django on a separate thread, and in the JavaScript handler (sendReplacement) replace the URL with that of your Django server.
On that server you will receive the AJAX call and be able to examine it.
You may be able to implement a simpler solution without the Django server by monitoring the calls directly from the JavaScript snippet and making it return the values you want, as sketched below. But if you need to monitor many calls and perform more complex examination of the requests, the former solution is more powerful.
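For the simpler variant, a minimal sketch of intercepting XHR calls from Selenium might look like the following. The window._xhrLog name and the button locator are arbitrary choices for illustration, not a Selenium API:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Patch XMLHttpRequest.open so every request is logged on the page.
driver.execute_script("""
    window._xhrLog = [];
    var origOpen = XMLHttpRequest.prototype.open;
    XMLHttpRequest.prototype.open = function(method, url) {
        window._xhrLog.push({method: method, url: url});
        return origOpen.apply(this, arguments);
    };
""")

# Click the button that triggers the AJAX request (hypothetical locator).
driver.find_element(By.ID, "some-button").click()

# Read the logged calls back into Python.
for call in driver.execute_script("return window._xhrLog;"):
    print(call["method"], call["url"])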

Related

Requests: How to use two different websites and switch between them?

Hello, is there a way to use two different website URLs and switch between them?
I mean, I have two different websites, like:
import requests
session = requests.Session()
firstPage = session.get("https://stackoverflow.com")
print("Hey! I'm on the first page now!")
secondPage = session.get("https://youtube.com")
print("Hey! I'm on the second page now!")
I know a way to do it in Selenium like this: driver.switch_to.window(driver.window_handles[1])
But I want to do it in Requests, so is there a way to do it?
Selenium and Requests are two fundamentally different tools. Selenium drives a real (optionally headless) browser and fully simulates a user. Requests is a Python library that simply sends HTTP requests.
Because of this, Requests is particularly good for scraping static data and data that does not involve JavaScript rendering (through jQuery or similar), such as RESTful APIs, which often return JSON-formatted data (with no HTML styling or page rendering at all). With Requests, after the initial HTTP request is made, the data is saved in an object and the connection is closed.
Selenium allows you to traverse complex, JavaScript-rendered menus and the like, since each page is actually built (under the hood) as if it were being displayed to a user. Selenium encapsulates everything that your browser does except displaying the HTML (including the HTTP requests that Requests is built to perform). After connecting to a page with Selenium, the connection remains open. This lets you navigate through a complex site where, with Requests, you would need to already know the full URL of the final page.
Because of this distinction, it makes sense that Selenium would have a switch_to.window method but Requests would not. The way your code is written, you can access the responses to the HTTP GET calls you've made directly through your variables (firstPage contains the response from Stack Overflow, secondPage contains the response from YouTube). While using Requests, you are never "in" a page in the sense that you can be with Selenium, since it is an HTTP library and not a full browser.
Depending on what you're looking to scrape, either Requests or Selenium may be the better fit.

Is there any other way to extract data from a dynamic website, rather than using Selenium?

I am trying to extract data from the website https://shop.nordstrom.com/ for all the products (like shirts, t-shirts, and so on). The page is dynamically loaded. I know I can use Selenium with a headless browser, but that is a time-consuming process, and looking up the elements, which have strange IDs and class names, is also not very promising.
So I thought of looking at the Network tab to see if I could find the path to the API from which the data is being loaded (the XHR request), but I could not find anything helpful. So is there a way to get the data from the website?
If you don't want to use Selenium, the alternative is to use an HTML parser like bs4 (Beautiful Soup) or simply the requests module.
You are on the right path in looking for the call to the API. XHR requests can be seen under the Network tab, but the multitude of resources that appears makes it hard to understand the requests being made. A simple way around this is the following method:
Instead of the Network tab, go to the Console tab. There, click on the settings icon and tick just the option Log XMLHttpRequests.
Now refresh the page and scroll down to trigger the dynamic calls. You will be able to see the logs of all XHRs much more clearly.
For example
(index):29 Fetch finished loading: GET "https://shop.nordstrom.com/api/recs?page_type=home&placement=HP_SALE%2CHP_TOP_RECS%2CHP_CUST_HIS%2CHP_AFF_BRAND%2CHP_FTR&channel=web&bound=24%2C24%2C24%2C24%2C6&apikey=9df15975b8cb98f775942f3b0d614157&session_id=0&shopper_id=df0fdb2bb2cf4965a344452cb42ce560&country_code=US&experiment_id=945b2363-c75d-4950-b255-194803a3ee2a&category_id=2375500&style_id=0%2C0%2C0%2C0&ts=1593768329863&url=https%3A%2F%2Fshop.nordstrom.com%2F&zip_code=null".
Making a GET request to that URL returns a bunch of JSON objects. You can now use this URL, and others you can derive the same way, to make requests straight to the API, as in the sketch below.
See the answer here on how you can integrate such a URL with the requests module to fetch the data.
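A hedged sketch of that last step: once the endpoint is captured from the console log, you can call it directly with requests. The query parameters (apikey, session_id, and so on) come from the logged URL above and are session-specific, so they may need to be refreshed from your own browser session:
import requests

# Endpoint discovered via the console log above.
url = "https://shop.nordstrom.com/api/recs"
params = {
    "page_type": "home",
    "channel": "web",
    # ... remaining query parameters copied from the logged request ...
}
# Some APIs reject clients without a browser-like User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, params=params, headers=headers)
response.raise_for_status()
data = response.json()  # the endpoint returns JSON
print(type(data))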

Login and Clicking Buttons with Python "requests" Module

I have been playing with the requests module in Python for a while as part of studying HTTP requests and responses, and I think I have grasped most of the fundamentals of the topic. With a naive analogy, it basically works on a ping-pong principle: you send a request in a packet to the server, and it sends another packet back to you. For instance, logging in to a site is simply sending a POST request to the server; I managed to do that. However, what I am having trouble with is clicking buttons through an HTTP POST request. I searched for it here and there, but I could not find a valid answer to my inquiry other than using the selenium module, which is what I want to avoid if there is another way with the requests module. I am also aware that the selenium module was created for exactly this kind of thing.
QUESTIONS:
1) What kind of parameters do I have to take into account to be able to click buttons or links from the account I accessed through HTTP requests? For instance, when I watch the network activity for request and response headers with my browser's built-in inspect tool, I see so many parameters sent back by the server, e.g. sec-fetch-dest, sec-fetch-mode, etc.
2) Is it too complicated for a beginner? Is there so much advanced stuff going on behind the scenes that selenium was created for exactly that reason?
Theoretically, you could write a program to do this with requests, but you would be duplicating much of the functionality that is already built and optimized in other tools and APIs. The general process would be:
Load the HTML that is normally rendered in your browser using a get request.
Process the HTML to find the button in question.
Then, if it's a simple form:
Determine the request method the button will carry out (e.g. using the formmethod argument, see here).
Perform the specified request with the required information in your request packet.
If it's a complex page (i.e. it uses JavaScript):
Find the button's unique identifier.
Process the JavaScript code to determine what action is performed when the button is clicked.
If possible, perform the JavaScript action using requests (e.g. following a link or something like that). I say "if possible" because JavaScript can do many things that, to my knowledge, simple HTTP requests cannot, like changing rendered CSS to change the background color of a <div> when a button is clicked.
You are much better off using a tool like Selenium or Beautiful Soup, as they provide APIs that do a lot of the above for you. If you've used the built-in requests library to learn about the basic HTTP request types and how they work, awesome; now move on to the plethora of excellent tools that wrap requests up into a more functional and robust API.
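For the simple-form path above, a minimal sketch with requests plus Beautiful Soup might look like this. The login URL and the username/password field names are hypothetical; inspect the actual page to find the real ones:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
page = session.get("https://example.com/login")  # hypothetical URL
soup = BeautifulSoup(page.text, "html.parser")

form = soup.find("form")
action = form.get("action")          # where the form submits to
method = form.get("method", "get")   # the request method the button triggers

# Collect the form's fields, including hidden ones such as CSRF tokens.
payload = {
    field.get("name"): field.get("value", "")
    for field in form.find_all("input")
    if field.get("name")
}
payload["username"] = "my_user"      # hypothetical field names
payload["password"] = "my_password"

submit_url = requests.compat.urljoin(page.url, action)
response = session.request(method, submit_url, data=payload)
print(response.status_code)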

Using Python to parse a webpage that is already open

From this question, the last responder seems to think that it is possible to use Python to open a webpage, let me sign in manually, go through a bunch of menus, and then let Python parse the page once I get where I want. The website has a weird sign-in procedure, so using requests and passing a username and password will not be sufficient.
However, it seems from this question that it's not a possibility.
So the question is: is it possible? If so, do you know of some example code out there?
The way to approach this problem is to log in normally with the developer tools open next to you and see what the request sends.
When logging in to Bandcamp, for example, you can see from the response to the login XHR request that an identity cookie is being set. That's probably how they identify that you are logged in, so once that cookie is set you are authorized to view logged-in pages.
So in your program you could log in normally using requests, save the cookie, and then apply it to further requests.
Of course, login procedures and authorization mechanisms may differ, but that's the general gist of it.
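A hedged sketch of that flow, assuming a hypothetical login endpoint and field names (find the real ones in the Network tab while logging in manually):
import requests

session = requests.Session()

# The Session object stores any cookies the server sets,
# e.g. an identity cookie on a successful login.
session.post(
    "https://example.com/login",  # hypothetical login endpoint
    data={"username": "my_user", "password": "my_password"},
)

# Further requests on the same session automatically carry those cookies,
# so logged-in pages become accessible.
profile = session.get("https://example.com/account")
print(profile.status_code)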
So when do you actually need Selenium? You need it when a lot of the page is rendered by JavaScript. requests is only able to get the HTML, so if the menus and such are rendered with JavaScript, you won't ever see that information using requests.

creating non-reloading dynamic webapps using Django

As far as I know, for a new request coming into a webapp, you need to reload the page to process and respond to that request.
For example, if you want to show a comment on a post, you need to reload the page, process the comment, and then show it. What I want, however, is to be able to add comments (something like Facebook, where the comment gets added and shown without reloading the whole page) without having to reload the webpage. Is it possible to do with only Django and Python, with no JavaScript/AJAX knowledge?
I have heard it's possible with AJAX (I don't know how), but I was wondering if it was possible to do with Django.
Thanks,
You want to do that without any client-side code (JavaScript and AJAX are just examples) and without reloading your page (or at least part of it)?
If that is your question, then the answer unfortunately is you can't. You need to either have client-side code or reload your page.
Think about it: once the client gets the page, it will not change unless
The client requests the same page from the server and the server returns an updated one, or
The page has some client-side code (e.g. JavaScript) that updates the page.
You definitely want to use AJAX, which means the client will need to run some JavaScript code.
If you don't want to learn JavaScript, you can always try something like Pyjamas. You can check out an example of its HttpRequest here.
But I always feel that using straight JavaScript via a library (like jQuery) is easier to understand than trying to force one language into another.
To do it right, AJAX would be the way to go, BUT in a limited sense you can achieve the same thing by using an iframe. An iframe is like another page embedded inside the main page, so instead of refreshing the whole page you may just refresh the inner iframe page, and that may give the same effect.
You can read more about iframe patterns at
http://ajaxpatterns.org/IFrame_Call
Maybe a few iFrames and some Comet/long-polling? Have the comment submission in an iFrame (so the whole page doesn't reload), and then show the result in the long-polled iFrame...
Having said that, it's a pretty bad design idea, and you probably don't want to be doing this. AJAX/JavaScript is pretty much the way to go for things like this.
"I have heard it's possible with AJAX... but I was wondering if it was possible to do with Django."
There's no reason you can't use both: specifically, AJAX within a Django web application. Django provides your organization and framework needs (and a view that will respond to AJAX requests), and then you use some JavaScript on the client side to make AJAX calls to that Django-backed view.
I suggest you go find a basic jQuery tutorial, which should explain enough basic JavaScript to get this working.
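A minimal sketch of the Django side of this pattern. The view name and URL are hypothetical; the client would POST the comment via AJAX (e.g. with jQuery's $.post) and insert the returned data into the page without a reload (a real app would also handle Django's CSRF token):
from django.http import JsonResponse
from django.views.decorators.http import require_POST

@require_POST
def add_comment(request):
    text = request.POST.get("text", "")
    # ... save the comment to the database here ...
    return JsonResponse({"ok": True, "text": text})

# urls.py (hypothetical):
# path("comments/add/", add_comment, name="add_comment")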
