Capture HTTP calls and headers in Python

We are testing video playback on our website. Before a video plays, the site has to authenticate the user, get authorization for the device it is playing on, check the user's entitlements, and so on.
We have many combinations of networks and video types to test, and I am in the process of writing a script that checks that one of those calls works correctly for every video type.
That call is a POST, and I need to build the params/data to post. There is no direct way to obtain one of the parameter values, so this is what we currently do: open the browser and play the video, open the dev tools (e.g. Firebug), capture the parameter value from the request headers of that call, and then reuse it in my script to verify the other ~100 calls programmatically.
Is there a way in Python to do the steps we are doing manually, i.e. open a URL and capture all the calls that happen in the background, just like Firebug does?
I have been trying FirePython, FireLogger, and mechanize to see if they help, but I have invested so much time figuring this out that I thought it was time to ask for expert advice.

If you haven't looked at the Requests library, it's generally quite pleasant to work with and might make your life easier.
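If the goal is specifically to capture the calls a browser makes in the background (what Firebug shows), one option not mentioned above is the third-party selenium-wire package, which drives a real browser and records every request it makes. The URL and the filter below are placeholders, not your actual endpoints:

    # Sketch using selenium-wire (a third-party Selenium wrapper) to record the
    # HTTP calls a page triggers in the background; URL and filter are placeholders.
    from seleniumwire import webdriver  # pip install selenium-wire

    driver = webdriver.Chrome()
    driver.get("https://example.com/video-page")

    for request in driver.requests:
        if request.method == "POST" and request.response is not None:
            print(request.url)
            print(request.headers)   # the request headers Firebug would show
            print(request.body)      # the POSTed params/data

    driver.quit()

Once the parameter value has been captured this way, the remaining calls can be replayed with the Requests library mentioned above.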

Related

How can I take screenshots of a website while using Python Requests?

I currently scrape a website and take a screenshot when a certain case occurs.
I want to consume less bandwidth, so I'm trying to do the scraping via Requests.
I can't figure out how to take screenshots with it, but I thought of a workaround:
once the certain case occurs, open Chrome as usual, take the screenshot, and close Chrome.
Is there a smarter way I'm not thinking of?
Thanks!
Requests is a library for making HTTP requests; it never renders the page, so "taking a screenshot" with it makes no sense.
Maybe try Selenium instead.
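A minimal sketch of the workaround described in the question, using Selenium (assumes a Chrome driver is installed; the URL and filename are placeholders):

    # Open Chrome only when the case of interest occurs, grab a screenshot, close it.
    from selenium import webdriver

    def capture_screenshot(url, path="screenshot.png"):
        driver = webdriver.Chrome()        # open Chrome
        try:
            driver.get(url)                # load the page that triggered the case
            driver.save_screenshot(path)   # write a PNG of the rendered page
        finally:
            driver.quit()                  # close Chrome again

    capture_screenshot("https://example.com/page-of-interest")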

Login and Clicking Buttons with Python "requests" Module

I have been playing with the requests module in Python for a while as part of studying HTTP requests/responses, and I think I have grasped most of the fundamentals. By a naive analogy, it works on a ping-pong principle: you send a request packet to the server and it sends another packet back. For instance, logging in to a site is simply sending a POST request to the server, and I managed to do that. What I am struggling with, however, is clicking buttons through an HTTP POST request. I searched here and there, but I could not find a valid answer to my question other than using the selenium module, which is what I want to avoid if there is another way with the requests module. I am also aware that selenium was created precisely for this kind of thing.
QUESTIONS:
1) What kind of parameters do I have to take into account to be able to click buttons or links from the account I logged into through HTTP requests? For instance, when I watch the network activity for the request and response headers with my browser's built-in inspect tool, I see a great many parameters sent back by the server, e.g. sec-fetch-dest, sec-fetch-mode, etc.
2) Is this too complicated for a beginner, or is there so much advanced machinery behind the scenes that selenium was created for exactly this reason?
Theoretically, you could write a program to do this with requests, but you would be duplicating much of the functionality that is already built and optimized in other tools and APIs. The general process would be:
Load the HTML that is normally rendered in your browser using a get request.
Process the HTML to find the button in question.
Then, if it's a simple form:
Determine the request method the button will carry out (e.g. via the form's method or the button's formmethod attribute).
Perform that request with the required information in your request packet (a rough sketch follows after this list).
If it's a complex page (i.e. it uses JavaScript):
Find the button's unique identifier.
Process the JavaScript code to determine what action is performed when the button is clicked.
If possible, perform the JavaScript action using requests (e.g. following a link or something like that). I say "if possible" because JavaScript can do many things that, to my knowledge, a simple HTTP request cannot, like changing the rendered CSS to alter the background color of a <div> when a button is clicked.
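A rough sketch of the simple-form case, assuming the page contains an ordinary HTML form; the URL, form id, and field names below are placeholders, not any real site's markup:

    # Sketch only: find a form, read its action/method, and submit it with requests.
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    session = requests.Session()
    page = session.get("https://example.com/login")        # 1. load the HTML
    soup = BeautifulSoup(page.text, "html.parser")

    form = soup.find("form", id="login-form")              # 2. locate the form/button
    action = form.get("action", "")
    method = form.get("method", "get").lower()             # 3. request method (or formmethod)

    # 4. perform the request the button would have triggered
    payload = {"username": "alice", "password": "secret"}  # fields the form expects (placeholder)
    url = urljoin(page.url, action)
    if method == "post":
        response = session.post(url, data=payload)
    else:
        response = session.get(url, params=payload)
    print(response.status_code)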
You are much better off using a tool like Selenium or Beautiful Soup, as they provide APIs that do a lot of the above for you. If you've used the requests library to learn about the basic HTTP request types and how they work, great; now move on to the plethora of excellent tools that wrap requests up in a more functional and robust API.

Big requests issue: GET doesn't release/reset TCP connections, loop crashes

I'm using Python 3.3 and the requests module to scrape links from arbitrary webpages. My program works as follows: I have a list of URLs which, at the start, contains just the starting URL.
The program loops over that list and passes each URL to a procedure GetLinks, where I use requests.get and BeautifulSoup to extract all links. Before that procedure appends links to my URL list, it passes them to another procedure, testLinks, to determine whether each is an internal, external, or broken link. In testLinks I use requests.get as well, so that I can handle redirects etc.
The program has worked really well so far; I tested it on quite a few websites and was able to collect all links of pages with around 2000 sub-pages. But yesterday I ran into a problem on one page, which I noticed in the Kaspersky Network Monitor: on this page some TCP connections just don't reset. It looks to me as if the initial request for my first URL never gets reset; the connection stays open for as long as my program runs.
OK so far. My first attempt was to use requests.head instead of .get in my testLinks procedure, and then everything works fine: the connections are released as intended. But the problem is that the information I get from requests.head is not sufficient; I can't see the redirected URL or how many redirects took place.
Then I tried requests.head with
allow_redirects=True
but unfortunately that is no longer a real .head request, it behaves like a usual .get request, so I got the same problem. I also tried setting the parameter
keep_alive=False
but that didn't work either. I even tried urllib.request(url).geturl() in testLinks for the redirect handling, but the same problem occurs there: the TCP connections don't get reset.
I have tried a lot to avoid this problem. I used requests sessions, but they had the same problem. I also tried requests.post with the header Connection: close, but it didn't work either.
I analyzed some of the links where I think it gets stuck, and so far I believe it has something to do with redirects like 301 -> 302. But I'm really not sure, because such redirects are quite common and must have occurred on the other websites I tested as well.
I hope someone can help me. For information: I'm using a VPN connection to be able to see all websites, because the country I'm in right now blocks some pages that are of interest to me. But of course I tested without the VPN and had the same problem.
Maybe there's a workaround, because requests.head in testLinks would be sufficient if, in case of redirects, I could just see the final URL and maybe the number of redirects.
If the text is hard to follow, I will provide a scheme of my code.
Thanks a lot!
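Below is a minimal sketch of the flow described above, reconstructed from the description (not the poster's actual code). It shows the redirect information requests already exposes via response.history and response.url, with responses closed explicitly so connections are returned to the pool; all URLs are placeholders.

    # Sketch of the described GetLinks / testLinks flow; URLs are placeholders.
    import requests
    from bs4 import BeautifulSoup

    session = requests.Session()

    def test_link(url):
        # Classify a link and report its final URL and redirect count.
        try:
            resp = session.get(url, allow_redirects=True, timeout=10)
        except requests.RequestException:
            return "broken", None, 0
        final_url, redirects = resp.url, len(resp.history)   # redirect chain info
        resp.close()                                          # release the connection
        kind = "internal" if "example.com" in final_url else "external"
        return kind, final_url, redirects

    def get_links(url):
        # Extract all href values from a page.
        resp = session.get(url, timeout=10)
        soup = BeautifulSoup(resp.text, "html.parser")
        links = [a["href"] for a in soup.find_all("a", href=True)]
        resp.close()
        return links

    for link in get_links("https://example.com/"):
        print(link, *test_link(link))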

How to run a python program in realtime?

My Python program basically submits a form by loading a URL. There is a security code that seems to change every second, so you have to actually be on the website to submit the form.
For example,
http://www.locationary.com/prizes/index.jsp?ACTION_TOKEN=index_jsp$JspView$BetAction&inTickets=125000000&inSecureCode=091823021&inCampaignId=3060745
The only solution I can think of is using something like Selenium... I don't know any other way of simulating a web browser without it being as heavy and slow as a real browser. Any ideas? Or is there a way I can do this without browser automation?
EDIT:
Response to the first answer: I DID get the security code using urllib... the problem is that it seems to have already changed by the time I try to load my submission URL... so I'm guessing/assuming that you have to do it in real time...
Yes, you'll need to get the security code programmatically since it changes every time. You can do this manually with urllib, or you can use mechanize or Selenium to make things easier.
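A minimal sketch of doing both steps back to back, so the code is submitted immediately after it is fetched. The base URL is the one from the question, but the regex and the assumption that the code appears in the page source are guesses, not the site's real markup:

    # Sketch: fetch the page, pull out the current security code, submit it right away.
    import re
    import requests

    session = requests.Session()
    page = session.get("http://www.locationary.com/prizes/index.jsp")
    match = re.search(r"inSecureCode=(\d+)", page.text)     # placeholder pattern
    if match:
        submit_url = (
            "http://www.locationary.com/prizes/index.jsp"
            "?ACTION_TOKEN=index_jsp$JspView$BetAction"
            "&inTickets=125000000"
            "&inSecureCode=" + match.group(1) +
            "&inCampaignId=3060745"
        )
        result = session.get(submit_url)
        print(result.status_code)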

Capturing browser specific rendering of a webpage?

Is there any way to capture (as an image, PDF, etc.) how a webpage will look in, let's say, Chrome or IE? I am guessing there will be different ways to do this for different browsers, but is there any API, library, or add-on that does this?
Use Selenium WebDriver (it has a Python API) to remote-control a browser and take a screenshot. It supports all major browsers as far as I'm aware.
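A short sketch of that approach, rendering the same page in two browsers and saving a screenshot of each (assumes the matching drivers are installed; the URL is a placeholder, and other browsers follow the same pattern):

    # Capture the same page as rendered by Chrome and by Firefox.
    from selenium import webdriver

    for name, make_driver in [("chrome", webdriver.Chrome), ("firefox", webdriver.Firefox)]:
        driver = make_driver()
        try:
            driver.get("https://example.com/")
            driver.save_screenshot("rendering_" + name + ".png")
        finally:
            driver.quit()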
Yes, there are a few websites providing this service, along with APIs ranging from primitive to fairly advanced, for capturing browser screenshots.
Browsershots.org
It's quite slow most of the time, maybe due to the heavy traffic it has to handle, but it's one of the best screenshot providers.
Check http://browsershots.org/xmlrpc/ to see how to use the XML-RPC based API for Browsershots.
And if you want a primitive, straightforward thumbnailing service, maybe the following sites will work well for you:
http://www.thumbalizr.com/
http://api1.thumbalizr.com/?url=http://acpmasquerade.com&width=some_width
I checked another website, webshotspro.com, and when I queued a snapshot, it said my request was behind 7053 others. The loading icon keeps rotating :P
Give the XML-RPC call from Browsershots.org a try.
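A tiny sketch of calling the thumbnail URL format above with Requests and saving the returned image, assuming the service still answers that URL with image bytes; the target URL comes from the example above and the width is a placeholder:

    # Sketch: request a thumbnail via the URL format shown above and save it.
    import requests

    resp = requests.get(
        "http://api1.thumbalizr.com/",
        params={"url": "http://acpmasquerade.com", "width": 640},  # width is a placeholder
        timeout=30,
    )
    if resp.ok and resp.headers.get("Content-Type", "").startswith("image/"):
        with open("thumbnail.png", "wb") as f:
            f.write(resp.content)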
