I'm trying to log into Instagram using Python Requests. I figured it would be as simple as creating a requests.Session object and then sending a POST request, i.e.:
session.post(login_url, data={'username':****, 'password':****})
This didn't work. I didn't know why, so I tried manually entering the browser's headers (I used Chrome dev tools to see the headers of the POST request) and passing them along with the request (headers={...}), even though I figured the session would deal with that. I also tried sending a GET request to the login URL first in order to get a cookie (and a CSRF token, I think) and then repeating the steps above. None of this worked.
I don't have much experience with this type of thing, and I just don't understand what differentiates my POST requests from Google Chrome's (I must be doing something wrong). Thanks.
Related
I am confused about this particular topic. I built a bot for two different websites, using Python's requests module to manually simulate sending HTTP POST and GET requests.
I implemented SOCKS proxies and also set user agents in my requests, as well as referrer URLs when necessary (I verified the actual requests sent by a browser on these sites using Burp Suite), in order to make the traffic look genuine.
However, any accounts I run through my bots keep getting suspended. It got me wondering what I'm doing wrong. A friend suggested that maybe I should use one of these headless solutions (PhantomJS), and I am leaning towards that route, but I am still confused and would like to know what the difference is between using the requests module and using a headless browser like PhantomJS.
I am not sure if there is any need to paste my source code here. I'm just looking for some direction on this project. Thank you for taking the time to read such a long wall of text :)
Probably, you have to set cookies.
To make your requests look more genuine, you should set other headers such as Host and Referer. However, the Cookie header should change every time. You can get the cookies this way:
from requests import Session

# your_headers and proxies are placeholders for your own dicts.
with Session() as session:
    # Send a first request to get cookies.
    response = session.get('your_url', headers=your_headers, proxies=proxies)  # eventually add params keyword
    cookies = response.cookies.get_dict()
    # Send the cookies back explicitly (the session also stores them on its own).
    response = session.get('your_url', headers=your_headers, cookies=cookies, proxies=proxies)
Or maybe, the site is scanning for bots in some way.
In this case, you could try adding a delay between requests with time.sleep(). You can see realistic timings in the Dev Tools of your browser. Alternatively, you could emulate all the requests your browser sends when you connect to the site, such as AJAX calls, etc.
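The delay idea above can be sketched roughly as follows. This is only an illustration: the function name fetch_with_delays, its parameters, and the 2 to 5 second range are assumptions, not values taken from any particular site.

```python
import random
import time


def fetch_with_delays(session, urls, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Fetch each URL in order, pausing a random interval between requests.

    `session` is anything with a .get(url) method, e.g. a requests.Session.
    The delay bounds are illustrative assumptions; tune them to the timings
    you observe in your browser's Dev Tools.
    """
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        responses.append(session.get(url))
    return responses
```

You would call it as fetch_with_delays(session, list_of_urls) with the session from the snippet above. The random interval is there so that requests do not arrive at perfectly regular times.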
In my experience, using requests or using Selenium webdrivers doesn't make much difference in terms of detection, and with plain Selenium you can't access headers or even request and response data. Also, note that PhantomJS is no longer maintained. It's preferable to use headless Chrome instead.
If none of the requests-based approaches works, I suggest using Selenium-wire or Mobilenium, modified versions of Selenium that allow access to request and response data.
Hope it helps.
I am trying to scrape a news article. I am trying to log in to the website with Python so that I can access the whole web page. I have looked at many tutorials but it still fails.
Here is the code. Can anyone tell me why?
The code runs without errors, but I still cannot see the full text, which means I am still not logged in.
`
url='https://id.wsj.com/access/pages/wsj/us/signin.html?mg=id-wsj&mg=id-wsj'
payload={'username':'my_user_name',
'password':'******'}
session=requests.Session()
session.get(url)
response=session.post(url,data=payload)
print(response.cookies)
r=requests.get('https://www.wsj.com/articles/companies-push-to-repeal-amt-after-senates-last-minute-move-to-keep-it-alive-1512435711')
print(r.text)
`
Try sending your last GET request through the session variable. After all, it's the one that performed the login and holds the cookies (if there are any). You made your last request with a plain requests.get call, which starts from scratch and ignores the login you just made.
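A small offline sketch of the difference, assuming the login stored a cookie on the session. The cookie name auth, its value, and the example.com URL are made up for illustration, and nothing is actually sent over the network:

```python
import requests


def prepare_with_session(session, url):
    """Build (without sending) the GET that `session` would send for `url`."""
    return session.prepare_request(requests.Request("GET", url))


session = requests.Session()
# Stand-in for the cookie a successful login POST would leave on the session.
session.cookies.set("auth", "example-token")

article_url = "https://www.example.com/articles/some-article"
with_session = prepare_with_session(session, article_url)
without_session = requests.Request("GET", article_url).prepare()

print(with_session.headers.get("Cookie"))     # the login cookie is attached
print(without_session.headers.get("Cookie"))  # None: a bare request has no cookies
```

In the question's code this means replacing the final requests.get(...) with session.get(...). Whether that alone is enough depends on how the site's sign-in actually works.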
From this question, the last responder seems to think that it is possible to use Python to open a webpage, let me sign in manually and go through a bunch of menus, and then let Python parse the page once I get where I want. The website has a weird sign-in procedure, so using requests and passing a username and password will not be sufficient.
However, it seems from this question that it's not a possibility.
So the question is: is it possible? If so, do you know of some example code out there?
The way to approach this problem is to log in normally with the developer tools open next to you and see what the request is sending.
When logging in to bandcamp the XHR request that's being sent is the following:
From that response you can see that an identity cookie is being sent. That's probably how they identify that you are logged in. So when you've got that cookie set you would be authorized to view logged in pages.
So in your program you could log in normally using requests, save the cookie in a variable, and then apply the cookie to further requests.
Of course login procedures and how this authorization mechanism works may differ, but that's the general gist of it.
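As a rough sketch of "save the cookie, then apply it to further requests": the function names, the form-encoded login, and the timeout are assumptions, and only the cookie name identity comes from the response described above.

```python
import requests


def login_and_save_cookies(login_url, credentials):
    """Log in with a form POST and return the session's cookies as a dict.

    Assumes the site accepts a plain form-encoded POST; real login flows
    may need extra fields or headers.
    """
    with requests.Session() as session:
        response = session.post(login_url, data=credentials, timeout=10)
        response.raise_for_status()
        return session.cookies.get_dict()


def session_from_cookies(saved_cookies):
    """Create a new session that sends the previously saved cookies."""
    session = requests.Session()
    session.cookies.update(saved_cookies)
    return session
```

For example, session_from_cookies({"identity": "<value from the login>"}) gives a session whose later get() calls carry the identity cookie.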
So when do you actually need Selenium? You need it if a lot of the page is rendered by JavaScript. requests only fetches the HTML the server returns and does not execute scripts. So if the menus and such are rendered with JavaScript, you won't ever be able to see that information using requests.
I am using crhym3/simpleauth for OAuth authentication with Google, LinkedIn and Twitter in my project. It uses GAE's urlfetch.
Google is planning to change the behaviour of urlfetch in late April. I reproduce their notice here:
Currently, the URL Fetch service preserves your original HTTP method
(e.g., GET, POST) when it receives and responds to a 302 Moved
Temporarily response. Modern user agents typically issue a GET request
in response to a 302. After the update, URL Fetch will only issue a
GET request after receiving a 302 response, rather than preserving the
original method. This may cause requests to be routed differently
and/or return 404s or other errors, and will drop the message body
from POST requests.
I have posted a question on the project's forum but I haven't got a reply yet.
My question is:
What is the best way to test that this piece of software is safe from the change? I am thinking of adding follow_redirects=False to the urlfetch calls to see what redirections I get from Google, LinkedIn and Twitter.
They are just following the specifications. I'm pretty sure that all of them (Google, LinkedIn and Twitter) accept a GET request after a redirect, as defined in the specifications.
So I think that you don't need to do anything.
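To make the rule concrete, here is a small sketch of how common user agents pick the method for the follow-up request after a redirect. It is not URL Fetch's code, and the function name method_after_redirect is made up for illustration. It only reflects the usual convention: 303 becomes GET, a POST turns into GET after 301 or 302, and 307 and 308 keep the original method.

```python
def method_after_redirect(status_code, method):
    """Return the HTTP method a typical user agent uses after a redirect."""
    method = method.upper()
    if status_code == 303 and method != "HEAD":
        # "See Other": fetch the new location with GET.
        return "GET"
    if status_code in (301, 302) and method == "POST":
        # Long-standing browser behaviour: re-issue as GET, dropping the body.
        return "GET"
    # 307/308 (and everything else) keep the original method.
    return method


print(method_after_redirect(302, "POST"))  # GET
print(method_after_redirect(307, "POST"))  # POST
```

If one of the providers answered a POST with a 302 and still expected the body at the new location, that call would be the one affected by the change.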
I am wondering how to log in to a specific site, but I have had no luck so far.
The way it happens in the browser is that you click a button, which triggers a jQuery AJAX request to /ajax/authorize_ajax.html with the POST variables login and pass. When it returns result = true, it reloads the document and you are logged in.
When I go to /ajax/authorize_ajax.html in my browser, it gives me {"data": [{"result":false}]} in response. Using C#, I went to this address and posted login and pass, and it gave me {"data": [{"result":true}]} in response. However, when I then go back to the main folder of the website, I'm not logged in.
Can anyone help me solve this problem? I think that the cookies are set via JavaScript; is it even possible in that case? I did some research and this is as far as I got, so please help me get around this problem. I used urllib in Python and the web libraries in .NET.
EDIT 0
It is setting a cookie in the response headers, with sid, path and domain.
Example: sid=bf32b9ff0dfd24059665bf1d767ad401; path=/; domain=site
I don't know how to save this cookie and go back to / using this cookie. I've never done anything like this before, can someone give me some example using python?
EDIT 1
All done, thanks to this post - How to use Python to login to a webpage and retrieve cookies for later usage?
Here's a blog post I did a while ago about using an HttpWebRequest to post to a site when cookies are involved:
http://crazorsharp.blogspot.com/2009/06/c-html-screen-scraping-part-2.html
The idea is, when you get a Response using the HttpWebRequest, you can get access to the Cookies that are sent down. For every subsequent request, you can new up a CookieContainer on the request object, and add the cookies that you got into that container.
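The same idea can be sketched in Python with only the standard library, using the Set-Cookie value from the question. The function name request_with_saved_cookie and the URL http://site.example/ are placeholders, and no request is actually sent here.

```python
from http.cookies import SimpleCookie
import urllib.request


def request_with_saved_cookie(url, set_cookie_value):
    """Build a request for `url` that sends back the cookie(s) found in a
    Set-Cookie header value (attributes such as path and domain are dropped)."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    cookie_header = "; ".join(
        f"{name}={morsel.value}" for name, morsel in parsed.items()
    )
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


set_cookie = "sid=bf32b9ff0dfd24059665bf1d767ad401; path=/; domain=site"
request = request_with_saved_cookie("http://site.example/", set_cookie)
print(request.get_header("Cookie"))  # sid=bf32b9ff0dfd24059665bf1d767ad401
```

In practice, an opener built with urllib.request.HTTPCookieProcessor and an http.cookiejar.CookieJar does this bookkeeping automatically, much like the CookieContainer does in .NET.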