Is it technically possible to fake a website's link-click referrer when using Flask or any other Python web framework?
I mean: when I open website A and click link_1, the logs should indicate that the click was made from website B.
I am not hoping for a full solution - just give me some starter code / tips on what to look for, because I have no idea how this could be done.
Thanks.
Folks writing Selenium tests come across such issues often enough.
Use the Network tab in Chrome's dev tools to see the detailed outbound headers sent to the webserver.
Recreate those headers with your requests call.
Pay attention to the Referer: header, and also the User-Agent.
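A minimal sketch with the requests library - the URLs and User-Agent string below are placeholders, not values from the question:

    import requests

    # Hypothetical URLs: replace with the real target and the referrer
    # you want the server's logs to show.
    target_url = "https://website-a.example/link_1"
    fake_referrer = "https://website-b.example/"

    headers = {
        # The header name is historically misspelled "Referer" in HTTP.
        "Referer": fake_referrer,
        # Copy the User-Agent your browser sends so the request looks browser-like.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    }

    response = requests.get(target_url, headers=headers)
    print(response.status_code)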
Related
Are there any alternatives to Selenium that don't require a web driver or browser to operate? I recently moved my code over to a Google Cloud VM instance, and when I run it there are multiple errors. I've been trying to get it to work for hours but just can't (no luck with PhantomJS, Chrome and GeckoDriver - tried re-downloading browsers, editing the sources.list file, etc.).
The page I'm web scraping uses JavaScript to load in numbers, which is why I initially chose Selenium. Everything else works perfectly though!
You could simply use the requests library.
https://requests.readthedocs.io/en/master/
https://anaconda.org/anaconda/requests
You would then need to send a GET or POST request to the server.
If you do not know how to generate a proper POST request, simply try to "record" it.
If you have Chrome, go to the page you want to navigate, press F12, open the "Network" section and type method:POST into the filter.
Further info here:
https://stackoverflow.com/a/39661536/11971785
At first it is a bit more confusing than Selenium, but once you understand it, it's way better in my opinion.
Also, the JavaScript values shown on the page can usually simply be read out of the JavaScript code returned by your request.
No web driver or anything required, and a lot more stable and customizable.
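As a rough sketch of replaying a recorded POST with requests (the URL and form fields are hypothetical - substitute whatever the Network tab shows for your site):

    import requests

    # Hypothetical endpoint and form fields: copy the real ones from the
    # recorded request in the Network tab.
    url = "https://example.com/login"
    payload = {"username": "me", "password": "secret"}
    headers = {"User-Agent": "Mozilla/5.0"}

    # A Session keeps cookies between requests, like a browser tab does.
    with requests.Session() as session:
        response = session.post(url, data=payload, headers=headers)
        print(response.status_code)
        # JavaScript-rendered numbers are often present in the raw response
        # body, or fetched from a separate JSON endpoint visible in the Network tab.
        print(response.text[:500])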
I want to write a Python script for a website that requires a login to enable some features, and I want to find out what I need to put in the headers of my script's requests (e.g. an authentication token and other parameters) so they are executed the same way as requests from the browser.
Does Wireshark help with this if the website uses HTTPS?
Or is my only option executing a browser script with Selenium after a manual login?
For anyone else with the same issue: you don't need to capture your traffic from outside the browser. Just:
use Google Chrome
open developer tools
click on the Network tab
clear the data
and do a request in the tab where the dev tools are open
You should see the initial request at the top, followed by subsequent ones (advertising, external image servers, etc.).
You can right-click the initial request, save it as a .har file, and use something like https://toolbox.googleapps.com/apps/har_analyzer/ to extract the headers of both the request and the response.
Now you know what parameters (key and value) you need in your headers, and you can even use submitted values like tokens and cookies in your Python script.
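A short sketch of plugging those values into requests - every header, token, and cookie name below is a placeholder for whatever the HAR analyzer shows you:

    import requests

    # All names and values below are placeholders: copy the actual
    # key/value pairs out of the HAR analyzer for your own site.
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Authorization": "Bearer <token copied from the HAR file>",
    }
    cookies = {"sessionid": "<value copied from the HAR file>"}

    response = requests.get(
        "https://example.com/protected", headers=headers, cookies=cookies
    )
    print(response.status_code)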
I am currently trying to write a small bot for a banking site that doesn't supply an API. Nevertheless, the security of the login page seems a little more ingenious than what I'd have expected: even though I don't see any significant difference between Chrome and Python, it doesn't let requests made from Python through (I accounted for things such as headers and cookies).
I've been wondering if there is a tool to record requests in Firefox/Chrome/any browser and replicate them in Python (or any other language)? Think Selenium, but without the overhead of Selenium :p
You can use Selenium web drivers to actually use browsers to make the requests for you.
In such cases, I usually check out the request made by Chrome in my dev tools' "Network" tab. Then I right-click on the request and copy it as cURL to run on the command line and see if it works. If it does, then I can be certain it can be achieved using Python's requests package.
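As a hypothetical illustration, a copied cURL command maps almost one-to-one onto a requests call:

    import requests

    # Hypothetical translation of a "Copy as cURL" command such as:
    #   curl 'https://example.com/api' -H 'Accept: application/json' -b 'sessionid=abc'
    response = requests.get(
        "https://example.com/api",
        headers={"Accept": "application/json"},  # each -H flag becomes a header entry
        cookies={"sessionid": "abc"},            # the -b flag becomes the cookies dict
    )
    print(response.json())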
Look into PhantomJS or CasperJS. PhantomJS is a complete headless browser that can be programmed using JavaScript.
We have developed a web-based application, with user login etc., and we developed a Python application that has to get some data from this page.
Is there any way for Python to communicate with the system default browser?
Our main goal is to open a webpage with the system browser and get the HTML source code from it. We tried with Python's webbrowser module and opened the web page successfully, but could not get the source code. We also tried urllib2, but in that case I think we would have to use the system default browser's cookies etc., and I don't want to do this, because of security.
https://pypi.python.org/pypi/selenium
You can try Selenium; it was made for testing, but nothing prevents you from using it for other purposes.
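A minimal sketch of that idea (assuming ChromeDriver is installed and on your PATH; the URL is a placeholder):

    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/login")  # placeholder URL
        # ... perform the login steps here, manually or via find_element ...
        html = driver.page_source  # the rendered HTML of the current page
        print(html[:500])
    finally:
        driver.quit()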
If your website is navigable without JavaScript, then you could try Mechanize or zope.testbrowser. These tools offer a higher-level API than urllib2, letting you do things like follow links on pages and fill out HTML forms.
This can be helpful in navigating a site that uses cookie based authentication with HTML forms for login, for example.
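A rough Mechanize sketch - the URL and form field names are hypothetical and would come from the site's own HTML:

    import mechanize

    br = mechanize.Browser()
    br.open("https://example.com/login")  # placeholder URL
    br.select_form(nr=0)                  # pick the first form on the page
    br["username"] = "me"                 # field names are hypothetical:
    br["password"] = "secret"             # use the ones in the page's HTML
    br.submit()
    # The browser object now carries the session cookie, so you can follow links.
    print(br.response().read()[:500])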
Have a look at the nltk module - it has some utilities for looking at web pages and getting text. There's also BeautifulSoup, which is a bit more elaborate. I'm currently using both to scrape web pages for a learning algorithm - they're pretty widely used modules, so you can find lots of hints here :)
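A small BeautifulSoup sketch of pulling the visible text out of a page (the URL is a placeholder):

    import urllib.request
    from bs4 import BeautifulSoup

    html = urllib.request.urlopen("https://example.com").read()  # placeholder URL
    soup = BeautifulSoup(html, "html.parser")
    # get_text() strips the tags and returns just the visible text.
    print(soup.get_text(separator=" ", strip=True)[:500])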
I'd like to retrieve data from a specific webpage by using the urllib library.
The problem is that in order to open this page, some data must be sent to the server first. If I do it with IE, I need to update some checkboxes first and then press the "display data" button, which opens the desired page.
Looking into the source code, I see that pressing "display data" submits some kind of form - there is no specific URL address there. I cannot figure out from the code what parameters are sent to the server...
I think that maybe the simpler way to do this would be to analyze the communication between IE and the webserver after pressing the "display data" button.
If I could see explicitly what IE does, I could mimic it with urllib.
What is the easiest way to do that?
An HTTP debugging proxy would be the best tool to use in this situation. As you're using IE, I recommend Fiddler, as it is developed by Microsoft and automatically integrates with Internet Explorer through a plugin. I personally use Fiddler all the time, and it is a really helpful tool, as I'm building an app that mimics a user's browsing session with a website. Fiddler has really good inspection of request parameters and responses, and can even decrypt HTTPS traffic.
You can use a web debugging proxy (e.g. Fiddler, Charles) or a browser addon (e.g. HttpFox, TamperData) or a packet sniffer (e.g. Wireshark).
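If you go the proxy route, you can also point a Python script itself at the proxy and compare its traffic with the browser's side by side. A minimal sketch with requests, assuming Fiddler's default listener on 127.0.0.1:8888 (adjust for Charles etc.):

    import requests

    # Fiddler listens on 127.0.0.1:8888 by default.
    proxies = {
        "http": "http://127.0.0.1:8888",
        "https": "http://127.0.0.1:8888",
    }

    # verify=False because the proxy re-signs HTTPS traffic with its own certificate.
    response = requests.get("https://example.com", proxies=proxies, verify=False)
    print(response.status_code)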