What would be a good way to monitor network requests made from a Python application, in the same way that the browser console does (usually in the "Network" tab)?
Ideally, this would display information such as:
URL requested
headers
payload (if any)
time the request was sent
time the response was received
response headers and payload
timeline
This is mostly for debugging purposes, as the actual requests I want to track are made by third-party imports. A rich console similar to Chrome or Safari network tabs would be the absolute dream obviously, but there might be some "functional equivalents" in command-line mode as well.
Update: using macOS with root access
Without details of the operating system you are using it on, and whether you have root access, it is difficult to give a definitive answer.
However, you should consider using Wireshark (https://www.wireshark.org), which gives you pretty good insights into exactly what traffic is going from your application to the Internet and vice versa.
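If the third-party imports make their requests through urllib or requests under the hood, a lighter-weight, console-only option is to turn on the standard library's HTTP debug output together with urllib3's logging. This shows URLs, headers, and bodies (though not a Chrome-style timeline); a minimal sketch, assuming the code runs in a process you control:

```python
import logging
import http.client

import requests

# The stdlib HTTP machinery prints the raw request and response lines
# (method, URL, headers, body) when debuglevel is set; this also covers
# libraries built on top of it, such as requests.
http.client.HTTPConnection.debuglevel = 1

# urllib3 (used internally by requests) logs connection and response details.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)

requests.get("https://example.com")
```

For anything that bypasses the Python HTTP stack, a proxy or packet capture tool such as Wireshark remains the more complete option.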
Related
Is it technically possible to FAKE website’s link click referrer when using flask or any other python web framework?
I mean when I open website A and click the link_1, logs should indicate that the click was made from website B.
I am not hoping for a full solution - just give me some start code/ tips of what to look for because I have no idea of how this could be done.
Thanks.
Folks writing Selenium tests come across such issues often enough. Use Chrome's dev tools Network tab to see the detailed outbound headers sent to the web server, then recreate those headers with your requests call. Pay attention to the Referer: header, and also the User-Agent.
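For example, a minimal requests call that sends a spoofed Referer and a browser-like User-Agent (the URLs and UA string below are placeholders; in practice you would copy them from the dev tools capture):

```python
import requests

# Illustrative values only: "website B" as the fake referrer,
# "website A" as the site whose link is being "clicked".
headers = {
    "Referer": "https://website-b.example/",
    "User-Agent": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/96.0.4664.110 Safari/537.36"),
}

resp = requests.get("https://website-a.example/link_1", headers=headers)
print(resp.status_code)
```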
I've been trying to write a script that logs into an account and grabs data for the last few days, but I can't manage to get it to log in, and I always encounter this error message:
Your computer or network may be sending automated queries. To protect
our users, we can't process your request right now.
I assume this is the error message provided by reCAPTCHA v2. I'm using a reCAPTCHA service, but I even get this error message on my machine locally, with or without a proxy.
I've tried different proxies, different proxy sources, headers, and user agents; nothing seems to work. I've used requests, Selenium, and even my own browser, and I still get this error message.
What kind of workaround is there to prevent this?
I am writing this answer from my general experience with web scraping.
Different web applications react differently under different conditions, so the solutions I am giving here may not fully solve your problem.
Here are a few workaround approaches:
Use Selenium only and set a proper window size. Most modern web apps identify users based on window size and user agent. In your case, other solutions such as requests are not recommended, since they have no notion of a window size at all.
Use a modern, valid user agent (Mozilla/5.0 compatible). Usually a Chrome > 60.0 UA will work well.
Keep rotating proxies every xxx requests (depending on your workload).
Use a single user agent per proxy. If your UA keeps changing for a specific IP, reCAPTCHA will flag you as automated.
Handle cookies properly. Make sure the cookies set by the server are sent with subsequent requests (for a single proxy).
Leave a time gap between requests. Use time.sleep() to delay consecutive requests; usually a delay of 2 seconds is enough.
I know this will considerably slow down your work, but reCAPTCHA is designed to prevent exactly this kind of automated querying/scraping. A minimal sketch combining several of these points is below.
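A requests-based sketch of the cookie, single-user-agent, proxy, and delay points (the proxy address, UA string, and URLs are placeholders, not values from the question):

```python
import time
import requests

PROXY = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}
UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36")

session = requests.Session()         # keeps server-set cookies between requests
session.headers["User-Agent"] = UA   # one consistent UA for this proxy
session.proxies.update(PROXY)

for url in ["https://example.com/page1", "https://example.com/page2"]:
    resp = session.get(url)
    print(url, resp.status_code)
    time.sleep(2)                    # pause between consecutive requests
```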
I'm currently working on a web-based educational tool where students can look at example code in a browser and edit it in the browser. I have been trying to implement a system whereby they can interface with a client-side compiler and run/debug the code within the browser. The more research I do, the more I see that browsers are designed to prevent this because of the security issues it creates. I was wondering: is there any way to run a compiler locally, i.e. via an extension, AJAX, or some other method?
The aim is to accommodate as many languages as possible, although we are starting off with Python.
I'm aware that I could run the script server-side and display the output; however, this is limited in application (to my knowledge), specifically with regard to GUIs.
I needed to do something like this (though not a compiler) for a project of mine. It had to download a resource given its URL and process it into a format that could be read on a Kindle. It's not exactly the same as your situation, since I had a browser plugin (rather than a web page) that triggered the operation, and even that was not allowed to "leave" the browser.
In the end, I was forced to write a little app that ran on the client side, to which the plugin submitted the URL and which then processed it.
The setup is something like this
browser plugin (via ajax) <------> web app on client ----> compiler/etc.
The browser sends the code snippet (in your case, a URL in mine) to a web app that runs on the local machine listening on some port (say 9999). It can access local resources and so can actually run the code and then return something to the browser which can then render it.
In my case, the browser sends a JSON string to the web app which just contains a URL. The web app fetches the resource, processes it and converts it into a .mobi file which the kindle can read and then drops it into a directory. The result of the conversion (success/failure) and the location of the converted file is sent back to the browser which informs you that it's done.
I don't think you can write a plugin that directly accesses the compiler. It'll have to communicate with a local app. The setup is complicated for non-technical users (look at the README on my project), but it works.
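A minimal sketch of such a local helper app using only the standard library. The port matches the one mentioned above; the JSON fields, output path, and CORS handling are assumptions for illustration, not the project's actual protocol:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body the browser plugin/page POSTs to us
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        url = payload.get("url")  # the resource to fetch and convert

        # ... download `url`, convert it, drop the result in a directory ...
        result = {"status": "ok", "path": "/tmp/output.mobi"}

        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Let a page/plugin running in the browser call this local service
        self.send_header("Access-Control-Allow-Origin", "*")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Listen locally on the port the browser-side code knows about
    HTTPServer(("127.0.0.1", 9999), Handler).serve_forever()
```

For the compiler use case, the handler would receive the code snippet instead of a URL, run it locally, and return the program's output for the page to render.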
I've written a Python application that makes web requests using the urllib2 library, after which it scrapes the data. I could deploy it as a web application, which means all urllib2 requests go through my web server. This leads to the danger of the server's IP being banned due to the high number of web requests for many users. The other option is to create a desktop application, which I don't want to do. Is there any way I could deploy my application so that the web requests are made from the client side? One option was to use Jython to create an applet, but I've read that Java applets can only make web requests to the server they are deployed on, and the only way to circumvent this is to create a server-side proxy, which leads us back to the problem of the server's IP getting banned.
This might sound like an impossible situation, and I'll probably end up creating a desktop application, but I thought I'd ask if anyone knew of an alternative solution.
Thanks.
You can use a signed Java applet; signed applets can use the Java security mechanism to gain access to any site.
This tutorial explains exactly what you have to do: http://www-personal.umich.edu/~lsiden/tutorials/signed-applet/signed-applet.html
The same might be possible from a Flash applet. JavaScript is likewise restricted to the publishing site and, AFAIK, has no equivalent signing or security-exception mechanism.
You can probably use AJAX requests made from client-side JavaScript: use server → client communication to send the commands and data needed to make a request, and then have the client make the AJAX request to the third-party server.
This depends on the form of "scraping" you intend to do:
You might run into problems running an AJAX call to a third-party site. Please see Screen scraping through AJAX and javascript.
An alternative would be to do it server-side, but to cache the results so that you don't hit the third-party server unnecessarily.
Check out diggstripper on google code.
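As a toy illustration of that server-side caching idea (the TTL and in-memory dict are assumptions; a real deployment might use a persistent cache):

```python
import time
import urllib.request

_cache = {}   # url -> (fetched_at, body); a real deployment might persist this
TTL = 3600    # assumed cache lifetime in seconds

def fetch_cached(url):
    """Return the page body, hitting the third-party site at most once per TTL."""
    now = time.time()
    if url in _cache and now - _cache[url][0] < TTL:
        return _cache[url][1]
    body = urllib.request.urlopen(url).read()
    _cache[url] = (now, body)
    return body
```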
I'd like to retrieve data from a specific webpage using the urllib library.
The problem is that in order to open this page, some data has to be sent to
the server first. If I do it with IE, I need to first check some checkboxes and
then press the "display data" button, which opens the desired page.
Looking into the source code, I see that pressing "display data" submits some kind of
form - there is no specific URL there. I cannot figure out by looking
at the code what parameters are sent to the server...
I think that maybe the simplest way to do this would be to analyze the communication
between IE and the web server after pressing the "display data" button.
If I could see exactly what IE does, I could mimic it with urllib.
What is the easiest way to do that?
An HTTP debugging proxy would be the best tool to use in this situation. As you're using IE, I recommend Fiddler, as it is developed by Microsoft and integrates with Internet Explorer automatically through a plugin. I personally use Fiddler all the time, and it is a really helpful tool, as I'm building an app that mimics a user's browsing session with a website. Fiddler has really good inspection of request parameters and responses, and can even decode encrypted traffic.
You can use a web debugging proxy (e.g. Fiddler, Charles) or a browser addon (e.g. HttpFox, TamperData) or a packet sniffer (e.g. Wireshark).
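Once the proxy shows which form fields the "display data" button submits, the request can be replayed with urllib. A minimal sketch, with a hypothetical URL and field names standing in for the captured parameters:

```python
import urllib.parse
import urllib.request

# Hypothetical field names; replace them with the exact parameters the proxy
# shows being submitted when "display data" is pressed.
data = urllib.parse.urlencode({
    "show_data": "1",
    "option_checkbox": "on",
}).encode()

# Supplying `data` makes this a POST, mimicking the form submission.
req = urllib.request.Request("https://example.com/display", data=data)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
```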