In addition to sending cookies with my python requests request I would also like to send a localstorage key value pair.
I tried looking at the requests docs and it does not look like it is capable in doing this, is it?
Is there a way for me to do this without using a headless browser?
Local storage is not sent with requests. It is accessible via javascript through client-side code. This website has more information about the differences between local storage and cookies.
Related
i am confused on this particular topic, i built a bot for two different websites making use of python's requests module to manually simulate the sending of HTTP PoST and GET requests.
I implemented socks proxies and also used user agents in my requests as well as referrer URL;s when neccesary (i verified actual requests sent by a browser when on these sites using burpsuite) in order to make it look genuine.
However, any accounts i run through my bots keep getting suspended. It got me wondering what i'm doing wrong, a friend suggested that maybe i should use one of these headless solutions(phantomJS) and i am leaning towards that route but i am still confused and would like to know what the difference is between using HTTP requests module and using headless browser like phantomJS.
I am not sure if there is any need to paste my source code here. Just looking for some direction on this project. thank you for taking your time to read such a long wall of text :)
Probably, you have to set cookies.
To make your requests more genuine, you should set other headers such as Host and Referer. However, the Cookies header should change every time. You can get them in this way:
from requests import Session
with Session() as session:
# Send request to get cookies.
response = session.get('your_url', headers=your_headers, proxies=proxies) # eventually add params keyword
cookies = response.cookies.get_dict()
response = session.get('your_url', headers=your_headers, cookies=cookies, proxy=proxy)
Or maybe, the site is scanning for bots in some way.
In this case, you could try to add a delay between requests with time.sleep(). You can see timings in Dev Tools on your browser. Alternatively, you could emulate all the requests you send when you connect to the site on your browser, such as ajax scripts, etc.
In my experience, using requests or using Selenium webdrivers doesn't make much difference in terms of detection, because you can't access headers and even request and response data. Also, note that Phantom Js is no longer supported. It's preferred to use headless Chrome instead.
If none of requests approach doesn't work, I suggest using Selenium-wire or Mobilenium, modified versions of Selenium, that allow accessing requests and response data.
Hope it helps.
I am fairly proficient in Python and have started exploring the requests library to formulate simple HTTP requests. I have also taken a look at Sessions objects that allow me to login to a website and -using the session key- continue to interact with the website through my account.
Here comes my problem: I am trying to build a simple API in Python to perform certain actions that I would be able to do via the website. However, I do not know how certain HTTP requests need to look like in order to implement them via the requests library.
In general, when I know how to perform a task via the website, how can I identify:
the type of HTTP request (GET or POST will suffice in my case)
the URL, i.e where the resource is located on the server
the body parameters that I need to specify for the request to be successful
This has nothing to do with python, but you can use a network proxy to examine your requests.
Download a network proxy like Burpsuite
Setup your browser to route all traffic through Burpsuite (default is localhost:8080)
Deactivate packet interception (in the Proxy tab)
Browse to your target website normally
Examine the request history in Burpsuite. You will find every information you need
i want to access browser name and version in python by sending out a request.is this the ideal method or is there any other way? because all the methods which provide user agents give PythonUserlib2.7 as user agent,i want my actual user agent.
I'll assume you're familiar with HTTP requests and their structure, if not, here's a link to the RFC documentation for HTTP/1.1 requests, at the bottom of the page, there is a list of links to header fields.
The user-agent is a field in the HTTP request header that identifies the entity that sends the request, by entity I mean the program you used to send the request, that's hosted on your machine. Things like the browser type, version and operating system, are sent in the user-agent field.
So, when you use urllib.request to send a request, urllib fills the HTTP request headers with the values you provide to it, otherwise, default values are used. That's why you get PythonUserLib2.7 as a user-agent.
If you need the user-agent of a specific browser, you need to send request using that browser, you can do that in python by using a browser automation tool, like selenium webdriver. Which you can use to launch an instance of your browser, and go to websites.
I've worked only with selenium webdriver, and it doesn't have the capability to inspect sent/received packets/requests, in other words, you can't get HTTP requests/responses directly from selenium.
As a work around, you can use selenium(or any other automation tool) to launch your browser, then go to a website that will give you your user-agent, and may even parse it.
Here's a link to selenium documentation, it explains how to get started with selenium and how to download the required packages.
If you search on google for user-agent online, Google will tell you what's your user agent.
I have a python script that needs to download a csv file from:
https://myasx.asx.com.au/home/watchlist/download.do
The issue I have is you have to log in to the website first, It is Cookie based authentication (HTML form login).
So far I have looked at urllib2 and Requests and haven't had much luck.
The requests library should do what you want. You can use Session objects to persist the authentication.
To quote the request docs -
The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance.
Post your code if you are still experiencing problems.
The site that I'm trying to scrape uses js to create a cookie. What I was thinking was that I can create a cookie in python and then use that cookie to scrape the site. However, I don't know any way of doing that. Does anybody have any ideas?
Please see Python httplib2 - Handling Cookies in HTTP Form Posts for an example of adding a cookie to a request.
I often need to automate tasks in web
based applications. I like to do this
at the protocol level by simulating a
real user's interactions via HTTP.
Python comes with two built-in modules
for this: urllib (higher level Web
interface) and httplib (lower level
HTTP interface).
If you want to do more involved browser emulation (including setting cookies) take a look at mechanize. It's simulation capabilities are almost complete (no Javascript support unfortunately): I've used it to build several scrapers with much success.