Cookie authentication with Python requests

I am trying to mimic a user action on a site programmatically using the Python requests API.
To accomplish this, the request must use user/pass authentication and must also pass a few name-value pairs (NVPs) as cookies in the header.
To get the NVPs I initially make a dummy request, and the server returns the cookies.
I acquire the required values from these cookies and use them to send the actual request.
But the request doesn't succeed, and the server complains that I am not logged in.
However, if I use the cookie value from my browser, the request succeeds.
The dummy request to programmatically acquire the JSESSIONID, glide_user and glide_user_session params in the cookie is:
response = requests.get('http://example.com/make_dummy_get', auth=('username', 'password'))
cookie_params = response.cookies.items()
Below is the actual request:
headers = {
    'Host': 'example.com',
    'Connection': 'keep-alive',
    'Content-Length': '113',  # header values must be strings, not ints
    'Cache-Control': 'max-age=0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Origin': 'example.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'www.example.com/asdas/',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Cookie': 'JSESSIONID=B6F7371A11825472CAB0366A4DCDD8EFB; glide_user="SC:Z3Vlc3Q=:b890b38b7f000001121dbe81a08c413ca5"; glide_user_session="SC:Z3Vlc3Q=:b890b38b7f000001121dbe81a08c413ca5"'
}
form_data = {
    'param1': 'value1',
    'param2': 'value2',
    'param3': 'value3'
}
res = requests.post('http://example.com/make_post_request', auth=('username', 'password'), data=form_data, headers=headers)
It seems to me that the session created by my dummy request is for some reason getting closed, and hence the second request is rejected; the HTML response says I must log in to access the requested resource.
I did the same exercise with Java's Apache HttpClient and ended up with the same issue. What am I missing here to make the request succeed without any login or authentication issues?

First, you should be using a Session object from requests. This will manage cookies (and prepare the Cookie header) for you, so you do not have to build it yourself.
s = requests.Session()
s.get('http://example.com/make_dummy_get', auth=('username', 'password'))
print(s.cookies)
Next, I have to strongly advise you to stop setting the following headers:
Host
Content-Length
Content-Type
Cookie
All four of those headers will be generated by requests for you. The Cookie header will be generated using the CookieJar that the Session uses. The Content-Length and Content-Type will be computed while requests prepares the body.
Also, if you're trying to use cookies to authenticate, the server is likely becoming confused because you're also passing auth=('username', 'password') in your second request. That generates an Authorization header, so you're sending both a Cookie header and an Authorization header. The server most likely sees this as suspicious and rightly refuses to accept your request as authenticated.
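Putting both points together, a minimal sketch of the whole flow (reusing the placeholder URLs and credentials from the question) could look like this:
import requests

s = requests.Session()

# Authenticate once; the server's session cookies (JSESSIONID, glide_user, ...)
# land in s.cookies automatically
s.get('http://example.com/make_dummy_get', auth=('username', 'password'))

form_data = {'param1': 'value1', 'param2': 'value2', 'param3': 'value3'}

# No manual Host/Content-Length/Content-Type/Cookie headers and no second auth=:
# the Session replays the cookies it received above
res = s.post('http://example.com/make_post_request', data=form_data)
print(res.status_code)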

Related

Python requests PUT

I need to send a PUT request with authentication in a single call.
When I use Postman and set
headers = {'Authorization': 'Basic Token', 'Content-Type': 'application/json'}
Authorization = Basic Auth, Username = 'login', Password = 'pass'
Body = data
everything goes well.
If I try to write the request in Python:
req = r.put(url, headers={'Authorization': 'Basic Token', 'Content-Type': 'application/json'}, auth=HTTPBasicAuth('login','password'), data=data)
I get a 400 Bad Request response.
What's wrong with my request?
I don't know if this works for your case, but I did use Basic authentication a while ago to authenticate with the Reddit API.
Here's my code:
import requests

client_auth = requests.auth.HTTPBasicAuth("put something here", "put something here")
headers = {"User-Agent": "manage your reddit easily by u/0xff"}
code = "ajkldjfalkdjflajfd;lakdjfa"
data = {
    "code": code,
    "grant_type": "authorization_code",
    "redirect_uri": "http://127.0.0.1:3000/authorize_callback"
}
r = requests.post("https://www.reddit.com/api/v1/access_token", auth=client_auth, data=data, headers=headers)
print(r.content)
Just make the appropriate changes for your case and try it.
You are setting authorization information twice, and different HTTP libraries will handle this conflict in different ways.
HTTP Basic Authorization uses the Authorization header, encoding the username and password (separated by :) as base64 and setting the header value to Basic plus a space plus the base64-encoded string. You are telling both Postman and requests to set the Authorization header to the string Basic Token and to use a username and password for Basic Auth, so the clients have to choose between these two options.
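You can reproduce that encoding with the standard library to see what the header value will be:
>>> import base64
>>> base64.b64encode(b"login:password")
b'bG9naW46cGFzc3dvcmQ='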
Trying this out in requests version 2.25.1 I see that the auth information will win here:
>>> from requests import Session, Request
>>> from requests.auth import HTTPBasicAuth
>>> req = Request(
...     "PUT",
...     "http://example.com",
...     headers={
...         'Authorization': 'Basic Token',
...         'Content-Type': 'application/json'
...     },
...     auth=HTTPBasicAuth('login', 'password'),
...     data=b"{}"
... )
>>> session = Session()
>>> prepped = session.prepare_request(req)
>>> from pprint import pp
>>> pp(dict(prepped.headers))
{'User-Agent': 'python-requests/2.25.1',
 'Accept-Encoding': 'gzip, deflate',
 'Accept': '*/*',
 'Connection': 'keep-alive',
 'Authorization': 'Basic bG9naW46cGFzc3dvcmQ=',
 'Content-Type': 'application/json',
 'Content-Length': '2'}
The above session creates a prepared request so I can inspect the effect of the auth argument on the headers given to the request, and as you can see the Authorization header has been set to a base64 value created from the login and password pair.
It looks like Postman will do the same; its UI even tells you so.
You didn't share any details about what web service you are using or what expectations that service has for headers or request contents. If this is an OAuth2-protected service, then you should not confuse obtaining a token with using that token for subsequent requests to protected URLs. For a grant_type="password" token request, it could be that the server expects you to use the username and password in a Basic Auth header, but it may also expect you to use client_id and client_secret values for that purpose and put the username and password in the POST body. You'll need to read the documentation carefully.
Other than that, you could replace your destination URL with an online HTTP echo service such as httpbin. The URL https://httpbin.org/put will give you a JSON response with the headers that the service received as well as the body of your request.
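For example, a quick sketch against httpbin (the payload is made up):
import requests

# httpbin echoes back what it received, so you can inspect the final headers/body
r = requests.put("https://httpbin.org/put",
                 json={"key": "value"},
                 auth=("login", "password"))
print(r.json()["headers"])
print(r.json()["data"])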
Further things you probably should be aware of:
requests can encode JSON data for you if you use the json argument, and if you do, the Content-Type header is generated for you.
You don't need to import the HTTPBasicAuth object, as auth=(username, password) (as a tuple) works too.
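Taken together, the request from the question shrinks to something like this sketch (URL and payload are placeholders for yours):
import requests

url = "https://example.com/endpoint"  # your real endpoint here
data = {"key": "value"}               # your real payload here

# auth= generates the Authorization header and json= the Content-Type header,
# so neither needs to be set by hand
resp = requests.put(url, json=data, auth=("login", "password"))
print(resp.status_code, resp.text)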

Python webscrape from company sharepoint

I need to scrape data from my company's SharePoint site using Python, but I am stuck at the authentication phase. I have tried HttpNtlmAuth from requests_ntlm, HttpNegotiateAuth from requests_negotiate_sspi, and mechanize, and none of them worked. I am new to web scraping and have been stuck on this issue for a few days already. I just need to get the HTML source so I can start filtering for the data I need. Please give me some guidance on this issue.
Methods I've tried:
import requests
from requests_negotiate_sspi import HttpNegotiateAuth

# this is the security certificate I downloaded using Chrome
cert = 'certsharepoint.cer'

response = requests.get(
    r'https://company.sharepoint.com/xxx/xxx/xxx/xxx/xxx.aspx',
    auth=HttpNegotiateAuth(),
    verify=cert)
print(response.status_code)
Error:
[X509: NO_CERTIFICATE_OR_CRL_FOUND] no certificate or crl found (_ssl.c:4293)
Another method:
import sharepy

s = sharepy.connect("https://company.sharepoint.com/xxx/xxx/xxx/xxx/xxx.aspx",
                    username="username",
                    password="password")
Error:
Invalid Request: AADSTS90023: Invalid STS request
There seems to be a problem with the certificate in the first method, and researching the Invalid STS request does not bring up any solutions that work for me.
Another method:
import requests
from requests_ntlm import HttpNtlmAuth

r = requests.get("http://ntlm_protected_site.com", auth=HttpNtlmAuth('domain\\username', 'password'))
Error:
403 FORBIDDEN
Using requests.get with headers like so:
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                         'AppleWebKit/537.11 (KHTML, like Gecko) '
                         'Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}

auth = HttpNtlmAuth(username=username,
                    password=password)

responseObject = requests.get(url, auth=auth, headers=headers)
returns a 200 response, whereas requests.get without headers returns a 403 Forbidden. The returned HTML, however, is of no use, because it is the HTML for Microsoft's "We can't sign you in" page.
Moreover, removing the auth parameter (responseObject = requests.get(url, headers=headers)) does not change anything: it still returns a 200 response with the same HTML for the "We can't sign you in" page.
If you're doing this interactively, try Selenium (https://selenium-python.readthedocs.io/) together with webdriver_manager (https://pypi.org/project/webdriver-manager/), so you can skip having to download the web browser driver yourself. Selenium will not only allow you to authenticate to your tenant interactively, but also makes it possible to collect dynamic content that may require interaction after the page loads, like pushing a button to reveal a table.
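A minimal sketch of that interactive approach (the SharePoint URL is the placeholder from the question; assumes Chrome plus the selenium 4 and webdriver-manager packages are installed):
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# webdriver_manager fetches a chromedriver matching the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://company.sharepoint.com/xxx/xxx/xxx/xxx/xxx.aspx")

# sign in by hand in the opened browser window, then continue
input("Press Enter once you are signed in...")
html = driver.page_source  # the rendered HTML you wanted to scrape
driver.quit()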
I managed to connect to my company's sharepoint by using https://pypi.org/project/sharepy/2.0.0b1.post2/ instead of https://pypi.org/project/sharepy/
Using the current release of sharepy (1.3.0) and this code:
s = sharepy.connect("https://company.sharepoint.com",
                    username=username,
                    password=password)

responseObject = s.get("https://company.sharepoint.com/teams/xxx/xxx/xxx.aspx")
I got this error:
Authentication Failure: AADSTS50126: Error validating credentials due to invalid username or password
BUT using sharepy 2.0.0b1.post2 with the same code returns no error and successfully authenticates to SharePoint.
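For reference, the beta can be installed by pinning the exact version from the PyPI link above: pip install sharepy==2.0.0b1.post2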

Login into Duolingo using Python Requests

I want to land on the main (learning) page of my Duolingo profile, but I am having a little trouble finding the correct way to sign in to the website with my credentials using Python Requests.
I have tried making requests as well as I understood them, but I am pretty much a noob at this, so it has all been in vain so far.
Help would be really appreciated!
This is what I was trying on my own, by the way:
# The dictionary keys/values and the POST request URL were taken from the
# Network tab of Chrome DevTools
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
login_data = {
    'identifier': 'something#email.com',
    'password': 'myPassword'
}
with requests.Session() as s:
    url = "https://www.duolingo.com/2017-06-30/login?fields="
    s.post(url, headers=headers, params=login_data)
    r = s.get("https://www.duolingo.com/learn")
    print(r.content)
The post request receives the following content:
b'{"details": "Malformed JSON: No JSON object could be decoded", "error": "BAD_REQUEST_SCHEMA"}'
And since the login fails, the get request for the learn page receives this:
b'<html>\n <head>\n <title>401 Unauthorized</title>\n </head>\n <body>\n <h1>401
Unauthorized</h1>\n This server could not verify that you are authorized to access the document you
requested. Either you supplied the wrong credentials (e.g., bad password), or your browser does not
understand how to supply the credentials required.<br/><br/>\n\n\n\n </body>\n</html>'
Sorry if I am making any stupid mistakes. I do not know a lot about all this. Thanks!
If you inspect the POST request carefully you can see that:
accepted content type is application/json
there are more fields than you have supplied (distinctId, landingUrl)
the data is sent as a json request body and not url params
The only thing you need to figure out is how to get distinctId; then you can do the following:
EDIT:
Sending email/password as json body appears to be enough and there is no need to get distinctId, example:
import requests
import json

headers = {'content-type': 'application/json'}
data = {
    'identifier': 'something#email.com',
    'password': 'myPassword',
}
with requests.Session() as s:
    url = "https://www.duolingo.com/2017-06-30/login?fields="
    # use json.dumps to convert the dict to a serialized JSON string
    s.post(url, headers=headers, data=json.dumps(data))
    r = s.get("https://www.duolingo.com/learn")
    print(r.content)
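As a side note, requests can serialize the JSON itself if you pass the dict via the json keyword; that also sets the Content-Type header, so the manual headers dict and json.dumps above become optional:
s.post(url, json=data)  # requests serializes data and sets Content-Type itself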

requests.post from python script to my Django website hosted using Apache giving 403 Forbidden

My Django website is hosted using an Apache server. I want to send data to it with requests.post from a Python script on my PC, but it gives a 403 Forbidden.
import json
import requests

url = "http://54.161.205.225/Project/devicecapture"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
           'content-type': 'application/json'}
data = {
    "nb_imsi": "test API",
    "tmsi1": "test",
    "tmsi2": "test",
    "imsi": "test API",
    "country": "USA",
    "brand": "Vodafone",
    "operator": "test",
    "mcc": "aa",
    "mnc": "jhj",
    "lac": "hfhf",
    "cellIid": "test cell"
}
response = requests.post(url, data=json.dumps(data), headers=headers)
print(response.status_code)
I have also given permission to the directory containing the views.py this request will go to.
I have gone through many other answers, but they didn't help.
I have tried the code without json.dumps as well, but it isn't working that way either.
How do I resolve this?
After investigating, it looks like the URL you need to post to in order to log in is: http://54.161.205.225/Project/accounts/login/?next=/Project/
You can work out what you need to send in a post request by looking at the Network tab in Chrome DevTools. It tells us that you need to send the fields username, password and csrfmiddlewaretoken, the last of which you need to pull from the page.
You can get it by extracting it from the response of the first get request. It is stored on the page like this:
<input type="hidden" name="csrfmiddlewaretoken" value="OspZfYQscPMHXZ3inZ5Yy5HUPt52LTiARwVuAxpD6r4xbgyVu4wYbfpgYMxDgHta">
So you'll need some kind of regex to get it (the example below uses one).
So first you create a session, then load the login page with a GET request, then send a POST request with your login credentials to that same URL. Your session will then hold the cookies that allow you to post to your desired URL. Here is an example:
import re
import requests

# Create session
session = requests.session()

# Add user-agent string
session.headers.update({'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) ' +
                        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'})

# Get login page
response = session.get('http://54.161.205.225/Project/accounts/login/?next=/Project/')

# Get csrf by pulling it out of the hidden input field
# (a simple regex works here; an HTML parser would do too)
csrf = re.search(r'name="csrfmiddlewaretoken" value="([^"]+)"', response.text).group(1)

# Post to login
response = session.post('http://54.161.205.225/Project/accounts/login/?next=/Project/', data={
    'username': 'example123',
    'password': 'examplexamplexample',
    'csrfmiddlewaretoken': csrf,
})

# Post desired data
response = session.post('http://url.url/other_page', data={
    'data': 'something',
})
print(response.status_code)
Hopefully this should get you there. Good luck.
For more information check out this question on requests: Python Requests and persistent sessions
I have faced this situation many times.
The problems were:
54.161.205.225 is not added to ALLOWED_HOSTS in settings.py (see the sketch after this list)
the Apache WSGI is not correctly configured
Things that might help with debugging:
check the Apache error logs to investigate what went wrong
try running the server locally and posting to it, to make sure the problem is not related to Apache
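For the first problem, the fix is a one-line change in the project's settings.py; a sketch, using the IP from the question's URL:
# settings.py
ALLOWED_HOSTS = ['54.161.205.225']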

Extremely strange Web-Scraping issue: Post request not behaving as expected

I'm attempting to programmatically submit some data to a form on our company's admin page rather than doing it by hand.
I've written numerous other tools which scrape this website and manipulate data. However, for some reason, this particular one is giving me a ton of trouble.
Walking through with a browser:
Below are the pages I'm attempting to scrape and post data to. Note that these pages usually show up in JS shadowboxes; however, the site works fine with JavaScript disabled, so I'm assuming JavaScript is not the cause of the scraper trouble.
(Note: since this is a company page, I've replaced all the form fields with junk titles, so, for instance, the client numbers are completely made up.)
Also, it being a company page behind a username/password wall, I can't give out the website for testing, so I've attempted to inject as much detail as possible into this post!
The main entry point is the admin forms page.
From this page, I click "Add New form", which opens the next page in a new tab (since JavaScript is disabled).
On that page, I fill out the small form and click submit, which returns a page displaying a success message.
Should be simple, right?
Code attempt 1: Mechanize
import mechanize
import base64
import cookielib

br = mechanize.Browser()
username = 'USERNAME'
password = 'PASSWORD'
br.addheaders.append(('Authorization',
                      'Basic %s' % base64.encodestring('%s:%s' % (username, password))))
br.addheaders = [('User-agent',
                  'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML,'
                  ' like Gecko) Chrome/25.0.1364.172 Safari/537.22')]
br.open('www.our_company_page.com/adm/add_forms.php')

links = [link for link in br.links()]

# Follow the "Add a form" link
response = br.follow_link(links[0])

br.select_form(nr=0)
br.form.set_all_readonly(False)
br.form['formNumber'] = "FROM_PYTHON"
br.form['RevisionNumber'] = ['20']
br.form['FormType'] = ['H(num)']

response = br.submit()
print response.read()  # Shows the exact same page! >:(
So, as you can see, I attempt to duplicate the steps I would take in a browser: load the initial /adm/forms page, follow the first link (Add a Form), fill out the form, and click the submit button. But here's where it gets screwy. The response that mechanize returns is the exact same page with the form. No error messages, no success messages, and when I manually check our admin page, no changes have been made.
Inspecting Network Activity
Frustrated, I opened Chrome and watched the Network tab as I manually filled out and submitted the form in the browser.
Upon submitting the form, the network activity was as follows:
Seems pretty straightforward to me. There's the POST, then a GET for the CSS files, and another GET for the jQuery library. There's also a GET for some kind of image, but I have no idea what that is for.
Inspecting the details of the POST request:
After some Googling about scraping problems, I saw a suggestion that the server may be expecting a certain header, and that I should simply copy everything in the browser's POST request and then slowly take away headers until I figure out which one is the important one. So I did just that: I copied every bit of information from the Network tab and stuck it in my POST request.
Code Attempt 2: Urllib
I had some trouble figuring out all of the header stuff with Mechanize, so I switched over to urllib2.
import urllib
import urllib2
import base64

url = 'www.our_company_page.com/adm/add_forms.php'

values = {
    'SID': '',  # Hidden field
    'FormNumber': 'FROM_PYTHON1030PM',
    'RevisionNumber': '5',
    'FormType': 'H(num)',
    'fsubmit': 'Save Page'
}

username = 'USERNAME'
password = 'PASSWORD'

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Authorization': 'Basic %s' % base64.encodestring('%s:%s' % (username, password)),
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'ID=201399',
    'Host': 'our_company_page.com',
    'Origin': 'http://our_company_page.com',
    'Referer': 'http://our_company_page.com/adm/add_form.php',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, '
                  'like Gecko) Chrome/26.0.1410.43 Safari/537.31'
}

data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
print response.read()
As you can see, I added every header present in Chrome's Network tab to the POST request in urllib2.
One additional change from the Mechanize version is that I now access the add_form.php page directly, by adding the relevant cookie to my Request.
However, even with duplicating everything I can, I still have the exact same issue: the response is the exact same page I started on. No errors, no success messages, no changes on the server, just a blank form again.
Final step: desperation sets in, I install Wireshark
Time to do some traffic sniffing. I'm determined to see WTF is going on in this magical POST request!
I download, install, and fire up Wireshark, filter for http, and then first submit the form manually in the browser and afterwards run my code that attempts to submit it programmatically.
This is the network traffic, browser capture versus Python capture:
Aside from the headers being in a slightly different order (does that matter?), they look exactly the same!
So that's where I am: completely confused as to why a POST request that is (as far as I can tell) nearly identical to the one made by the browser isn't making any changes on the server.
Has anyone ever encountered anything like this? Am I missing something obvious? What's going on here?
Edit
As per Ric's suggestion, I replicated the POST data exactly, copying it directly from the Network tab in Chrome.
The modified code looks as follows:
data = 'SegmentID=&Segment=FROMPYTHON&SegmentPosition=1&SegmentContains=Sections&fsubmit=Save+Page'
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
print response.read()
The only thing I changed was the Segment value, from FROMBROWSER to FROMPYTHON.
Unfortunately, this still yields the same result: the response is the same page I started from.
Update
Working, but not solved
I checked out the requests library and duplicated my efforts using its API, and lo and behold, it magically worked! The POST actually went through. The question remains: why?! I took another snapshot with Wireshark, and as near as I can tell it is exactly the same as the POST made from the browser.
The Code
import requests
import auth  # local module holding my credentials

def post(eventID, name, pos, containsID):
    segmentContains = ["Sections", "Products"]
    url = 'http://my_site.com/adm/add_page.php'
    cookies = dict(EventID=str(eventID))
    payload = {"SegmentID": "",
               "FormNumber": name,
               "RevisionNumber": str(pos),
               "FormType": containsID,
               "fsubmit": "Save Page"}
    r = requests.post(
        url,
        auth=(auth.username, auth.password),
        allow_redirects=True,
        cookies=cookies,
        data=payload)
The Wireshark output for the requests version and for the browser again looked identical.
So, to summarize the current state of the question: it works, but nothing has really been explained. I have no idea why the attempts with both Mechanize and urllib2 failed. What is going on that allows the requests POST to actually go through?
Edit -- Wing Tang Wong's suggestion:
At Wing Tang Wong's suggestion, I created a cookie handler and attached it to the urllib2 opener. So no more cookies are being sent manually in the headers; in fact, I don't assign any at all now.
I first connect to the adm page that has the link to the form, rather than connecting to the form right away:
'http://my_web_page.com/adm/segments.php?&n=201399'
This gives the ID cookie to my urllib2 CookieJar. From this point I follow the link to the page that has the form, and then attempt to submit to it as usual.
Full Code:
import os
import base64
import urllib
import urllib2
import cookielib
import auth  # local module holding my credentials

url = 'http://my_web_page.com/adm/segments.php?&n=201399'
post_url = 'http://my_web_page.com/adm/add_page.php'

values = {
    'SegmentID': '',
    'Segment': 'FROM_PYTHON1030PM',
    'SegmentPosition': '5',
    'SegmentContains': 'Products',
    'fsubmit': 'Save Page'
}

username = auth.username
password = auth.password

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Authorization': 'Basic %s' % base64.encodestring('%s:%s' % (username, password)),
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'my_site.com',
    'Origin': 'http://my_site.com',
    'Referer': 'http://my_site.com/adm/add_page.php',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.43 Safari/537.31'
}

COOKIEFILE = 'cookies.lwp'
cj = cookielib.LWPCookieJar()
if os.path.isfile(COOKIEFILE):
    cj.load(COOKIEFILE)

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

data = urllib.urlencode(values)

# First GET picks up the ID cookie
req = urllib2.Request(url, headers=headers)
handle = urllib2.urlopen(req)

# Then POST the form
req = urllib2.Request(post_url, data, headers)
handle = urllib2.urlopen(req)
print handle.info()
print handle.read()
print

if cj:
    print 'These are the cookies we have received so far :'
    for index, cookie in enumerate(cj):
        print index, ' : ', cookie
    cj.save(COOKIEFILE)
Same thing as before: no changes get made on the server. To verify that the cookies are indeed there, I print them to the console after submitting the form, which gives the output:
These are the cookies we have received so far :
<Cookie EventID=201399 for my_site.com/adm>
So, the cookie is there and has been sent along with the request... and I'm still not sure what's going on.
I read and re-read your post and the other folks' answers a few times. My thoughts:
When you implemented this in mechanize and urllib2, the cookies were hard-coded into the request headers. This would most likely cause the form to kick you out.
When you switched to the web browser and the Python requests library, the cookie and session handling was taken care of behind the scenes.
I believe that if you change your code to take cookie and session state into account (i.e., assume the automated session starts with an empty cookie jar and no session data for the site, but properly track and manage both during the session), it should work.
Simply copying and substituting the header data will not work, and a properly coded site should bounce you back to the beginning.
Without seeing the backend code for the website, the above is my observation: cookies and session data are the culprit.
Edit:
Found this link: http://docs.python-requests.org/en/latest/
It describes accessing a site with authentication, and the format of the authentication is similar to the requests implementation you are using. It links to a Gist with a urllib2 implementation that does the same thing, and I noticed that the authentication bits are different from how you are doing them:
https://gist.github.com/kennethreitz/973705
from the page:
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, gh_url, 'user', 'pass')
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)
I wonder whether, if you changed the way you implement the authentication bits in your urllib2 version, it would then work.
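As a sketch, combining that auth handler with the cookie processor from your last attempt would look like this (Python 2, placeholder URL and credentials, untested against your site):
import urllib2
import cookielib

password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, 'http://my_site.com/', 'user', 'pass')
auth_handler = urllib2.HTTPBasicAuthHandler(password_manager)
cookie_handler = urllib2.HTTPCookieProcessor(cookielib.CookieJar())

# One opener that does both the Basic auth challenge-response and cookie tracking
opener = urllib2.build_opener(auth_handler, cookie_handler)
urllib2.install_opener(opener)
# subsequent urllib2.urlopen(...) calls now authenticate and carry cookies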
I think that the PHP script is erroring out and not displaying anything because your form data is not exactly identical. Try replicating the POST request so it is completely identical, including all the values. I see that the line-based text data in your Wireshark screenshot for the browser includes parameters such as SegmentPosition, which is 0, but your Python screenshot has no value for SegmentPosition. The format of some of the parameters, such as Segment, also seems to differ between the browser and the Python request, which may be causing an error as the script tries to parse them.
