Log in to website using requests - python

I am currently trying to get data off http://www.spotrac.com/ that requires being signed in. My current attempt uses this code(which I got by going through a bunch of other stack overflow questions on a similar topic)
from bs4 import BeautifulSoup as bs
from requests import session
payload = {
'id': 'contactForm',
'cmd': 'http://www.spotrac.com/signin/submit/',
'email': '*****',
'password': '*****'
}
with session() as c:
r_login = c.post('http://www.spotrac.com/signin/', data=payload)
print(r_login.headers)
response = c.get('http://www.spotrac.com/nba/cleveland-cavaliers/lebron-james')
print(response.cookies)
soup=bs(response.text, 'html.parser')
with open('ex.html','w') as f:
f.write(soup.prettify())
My current code does everything right, except I am not logged in when I'm making the request.
Thanks

You're sending POST request to a wrong URL, and with an incorrect payload as well.
POST http://www.spotrac.com/signin/submit/ HTTP/1.1
Host: www.spotrac.com
Connection: keep-alive
Content-Length: 86
Cache-Control: max-age=0
Origin: http://www.spotrac.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: http://www.spotrac.com/signin/
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: cisession=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%2206021e191bdbbaf955f111f67b961056%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A11%3A%22119.9.105.6%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A108%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F55.0.2883.87+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1485487245%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7Dd6089620b21ecce6837161605055ae04; _ga=GA1.2.910256341.1481865346; _gali=contactForm
redirect=http%3A%2F%2Fwww.spotrac.com%2F&email=sdfs%40gmail.com&password=lkasjdflksjad
HTTP/1.1 302 Found
Server: nginx
Date: Fri, 27 Jan 2017 04:21:16 GMT
Content-Type: text/html
Content-Length: 0
Connection: keep-alive
Set-Cookie: cisession=a%3A5%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22badb1275aee1cdad6736a6b4bb1ce809%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A11%3A%22119.9.105.6%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A108%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F55.0.2883.87+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1485490876%3Bs%3A9%3A%22user_data%22%3Bs%3A0%3A%22%22%3B%7Dad486866c32cac526487707cea85b8a9; expires=Fri, 10-Feb-2017 04:21:16 GMT; path=/
Location: http://www.spotrac.com/register/
X-Powered-By: PleskLin
MS-Author-Via: DAV
As you can see from above session, the correct url should be http://www.spotrac.com/signin/submit/, and payload string is redirect=http%3A%2F%2Fwww.spotrac.com%2F&email=sdfs%40gmail.com&password=lkasjdflksjad, which is basically:
payload = {'redirect': 'http://www.spotrac.com/',
'email': mail_address,
'password': password}
Also make sure simulate headers with correct parameters, then you're good to go.

Related

How Send Cookie and Payload to Broker?

I need to send a buy request to my broker from c#.
when i send this request from google chrome, right click > inspect > network > Postdata give me this informations:
General
> Request URL: https://api2.mofidonline.com/Web/V1/Order/Post Request
> Method: POST Status Code: 200 Remote Address: 109.94.166.83:443
> Referrer Policy: strict-origin-when-cross-origin
Response Headers
access-control-allow-credentials: true
access-control-allow-origin: https://www.mofidonline.com
api-supported-versions: 1.0
content-encoding: gzip
content-type: application/json; charset=utf-8
date: Sat, 25 Sep 2021 20:46:10 GMT
set-cookie: TS0182ab0d=0180bb6f2279578c94b3b606e45b8e8836f5652f16390e58a9d78a748f534b2a9f44b16e8f6f38db6302791734e4522bb1cd199d2a; Path=/; Domain=.api2.mofidonline.com
vary: Accept-Encoding
vary: Origin
Request Headers
:authority: api2.mofidonline.com
:method: POST
:path: /Web/V1/Order/Post
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9,fa;q=0.8
authorization: BasicAuthentication ca4f8b23-2921-4f51-ab7c-ed9fcb04bb4d
content-length: 372
content-type: application/json
origin: https://www.mofidonline.com
referer: https://www.mofidonline.com/
sec-ch-ua: "Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-site
user-agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36
Request Payload
{IsSymbolCautionAgreement: false, CautionAgreementSelected: false, IsSymbolSepahAgreement: false,…}
CautionAgreementSelected: false
FinancialProviderId: 1
IsSymbolCautionAgreement: false
IsSymbolSepahAgreement: false
SepahAgreementSelected: false
isin: "IRT1PLSH0001"
maxShow: 0
minimumQuantity: 0
orderCount: 1000
orderId: 0
orderPrice: 82870
orderSide: 65
orderValidity: 74
orderValiditydate: null
shortSellIncentivePercent: 0
shortSellIsEnabled: false
how can i send this information manual with httpwebrequest ( or any way other)? Please help me...

Unable to log in to a site requiring unconventional payload to be sent with post requests

I'm trying to log in to a website using requests module. While creating a script to do so, I could notice that the payload used in there is completely different from the conventional approach. This is exactly how the payload +åEMAIL"PASSWORD(0 looks like. This is the content type parameters content-type: application/grpc-web+proto.
The following is what I see in dev tools when I log in to that site manually:
General
--------------------------------------------------------
Request URL: https://grips-web.aboutyou.com/checkout.CheckoutV1/logInWithEmail
Request Method: POST
Status Code: 200
Remote Address: 104.18.9.228:443
Response Headers
--------------------------------------------------------
Referrer Policy: strict-origin-when-cross-origin
access-control-allow-credentials: true
access-control-allow-origin: https://www.aboutyou.cz
access-control-expose-headers: Content-Encoding, Vary, Access-Control-Allow-Origin, Access-Control-Allow-Credentials, Date, Content-Type, grpc-status, grpc-message
cf-cache-status: DYNAMIC
cf-ray: 67d009674f604a4d-SIN
content-encoding: gzip
content-type: application/grpc-web+proto
date: Wed, 11 Aug 2021 08:19:04 GMT
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
set-cookie: __cf_bm=a45185d4acac45725b46236884673503104a9473-1628669944-1800-Ab2Aos6ocz7q8B8v53oEsSK5QiImY/zqlTba/Y0FqpdsaQt2c10FJylcwTacmdovm6tjGd8hLdy/LidfFCtOj70=; path=/; expires=Wed, 11-Aug-21 08:49:04 GMT; domain=.aboutyou.com; HttpOnly; Secure; SameSite=None
vary: Origin
Request Headers
--------------------------------------------------------
:authority: grips-web.aboutyou.com
:method: POST
:path: /checkout.CheckoutV1/logInWithEmail
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: no-cache
content-length: 48
content-type: application/grpc-web+proto
origin: https://www.aboutyou.cz
pragma: no-cache
referer: https://www.aboutyou.cz/
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"
sec-ch-ua-mobile: ?0
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36
x-grpc-web: 1
Request Payload
--------------------------------------------------------
+åEMAIL"PASSWORD(0
This is what I've created so far (can't find any way to fill in the payload):
import requests
from bs4 import BeautifulSoup
start_url = 'https://www.aboutyou.cz/'
post_link = 'https://grips-web.aboutyou.com/checkout.CheckoutV1/logInWithEmail'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3',
'content-type': 'application/grpc-web+proto',
'origin': 'https://www.aboutyou.cz',
'referer': 'https://www.aboutyou.cz/',
'x-grpc-web': '1'
}
payload = {
}
with requests.Session() as s:
s.headers.update(headers)
r = s.post(post_link,data=payload)
print(r.status_code)
print(r.url)
Steps to log in to that site manually:
Go to this site
This is how to get the login form
Login form looks like this
How can I log in to that site using requests module?
I don't think that you'll be able to use Python Requests to login to your target site.
Your post_link url:
post_link = 'https://grips-web.aboutyou.com/checkout.CheckoutV1/logInWithEmail'
states that it is: gRPC requires HTTP/2 and Python Requests send HTTP/1.1 requests only.
Additionally, I noted that the target site also uses CloudFlare, which is difficult to bypass with Python, especially when using Python Requests
'Set-Cookie': '__cf_bm=11d867459fe0951da4157b475cf88eb3ab7658fb-1629229293-1800-AeFomlmROcmUYcRosxxcSnoJkGOW/WXjUe1WxK6SkM2eXIbnAqXRlpwOkpvOfONrbApJd4Qwj+a8+kOzLAfpHIE=; path=/; expires=Tue, 17-Aug-21 20:11:33 GMT; domain=.aboutyou.com; HttpOnly; Secure; SameSite=None', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '6805616b8facf1b2-ATL', 'Content-Encoding': 'gzip'}
Here are previous Stack Overflow questions on Python Requests with gRPC
Can't make gRPC work with python requests rest api call
Send plain JSON to a gRPC server using python
I looked through the GitHub repository for Python Requests and saw that HTTP/2 has been a requested feature for almost 7 years.
During my research, I discovered HTTPX, which is a HTTP client for Python 3, which provides sync and async APIs, and support for both HTTP/1.1 and HTTP/2. The documentation states that the package is stable, but is still considered a beta at this point. I would recommend trying HTTPX to see if it solves your issue with logging into your target site.

Editing dictionary in Python

I'm still very new to coding (been coding for a week) so I am struggling with a very basic function.
I am trying to log into a website using python however I am having a hard time changing the set-cookie header.
See my current code below:
import requests
targetURL = "http://hostip/v2_Website/aspx/login.aspx"
headers = {
"Host": "*host IP*",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"}
response = requests.get(url=targetURL,
proxies=proxies,
headers=headers,)
response_headers = response.headers
When I print the response.headers I get the following:
{'Cache-Control': 'no-cache, no-store', 'Pragma': 'no-cache,no-cache', 'Content-Length': '15671', 'Content-Type': 'text/html; charset=utf-8', 'Expires': '-1', 'Server': 'Microsoft-IIS/7.5', 'X-AspNet-Version': '2.0.50727', 'Set-Cookie': 'ASP.NET_SessionId=vq5q4lzlrqiiebbmxw341yic; path=/; HttpOnly, CookieLoginAttempts=5; expires=Tue, 14-Aug-2018 17:14:09 GMT; path=/', 'X-Powered-By': 'ASP.NET', 'Date': 'Tue, 14 Aug 2018 07:14:10 GMT', 'Connection': 'close'}
Obviously when I use these headers in my HTTP POST it fails due to the POST having a Set-Cookie header instead of Cookie value.
My objectives are as follows:
Update/change the Set-Cookie key to Cookie
Then I would like to remove values that are not needed in the Cookie key
Add other keys and values
Ultimately I would like to change the headers to the following so I can use it for my POST in order for me to pass login credentials:
POST /Test server/aspx/Login.aspx?function=Welcome HTTP/1.1
Host: *Host IP*
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://*HostIP*/v2_Website/aspx/main.aspx?function=Welcome
Cookie: ASP.NET_SessionId=3vy0fy55xsmffhbotikrwh55; CookieLoginAttempts=5; Admin=false
Connection: close
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
Content-Length: 220
Is my above objective even possible? If so how does one even achieve it as I don't understand the process of modifying a dictionary I can't see.
Again I would like to you to note I am still very green in the world of coding and trying to "think like a coder" thus keeping responses a little less technical would be highly appreciated, just so I can understand your response and advice. Any help would be great!
I was a able to find the answer with a quite a bit of research.
Instead of trying to edit it manually I did the following:
import requests
session_requests = requests.session()
login_url = "http://*host ip*/v2_Website/aspx/Login.aspx"
result = session_requests.get(login_url)
result = session_requests.post(login_url,
headers= dict(referer=login_url))
This pulls through the needed cookie and adds everything as need. My headers come back as follows:
POST /_v2_Website/aspx/Login.aspx HTTP/1.1
Host: *host IP*
User-Agent: python-requests/2.18.4
Accept-Encoding: gzip, deflate
Accept: */*
Connection: close
referer: http://*hostIP*/v2_Website/aspx/Login.aspx
Cookie: ASP.NET_SessionId=3crqoo45hnn21anuqulmmr55; CookieLoginAttempts=5
Content-Length: 79
Content-Type: application/x-www-form-urlencoded

Form Submission/POST using Requests in Python

I'm trying to submit a form to a website using the requests module in Python but my form is not submitting correctly. I can submit the form correctly manually on the site but I assume something is wrong with my code that is causing the Python submission to fail. I can successfully login to the website and visit pages/issue GET requests using Python. I can issue a GET request to the page that I submit the form to and it will successfully load, i.e. the requests login works. I have included all of my output below including the Python code, an invalid form submission in Python and a valid form submission from the browser. This may be overkill but I am inexperienced with this and am not sure what is necessary.
My code to login is:
s = requests.Session()
data = s.get(login_url)
authToken = re.search(('name="authenticity_token"[\s]'
'type="hidden"[\s]+value="(.+)"'), \
data.text).group(1)
data_dict = {
'utf8': '✓',
'authenticity_token': authToken,
'admin[email]': username,
'admin[password]': password,
'admin[remember_me]': '1',
'commit': 'Sign in'
}
s.post(login_url, data_dict)
This successfully logs me in and I can submit GET requests to any page and get valid results.
My code to submit the form:
payload = {
'utf8': '✓',
'authenticity_token': authToken,
'progress_course[name]': name,
'progress_course[description]': desc,
'commit': 'Create Course'
}
headers = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding':'gzip, deflate',
'Accept-Language':'en-US,en;q=0.8',
'Cache-Control':'max-age=0',
'Connection':'keep-alive',
'Content-Length':'176',
'Content-Type':'application/x-www-form-urlencoded',
'Host':'xxx.com',
'Origin':'https://xxx.com',
'Referer':'https://xxx.com/workshop/progress/courses/new',
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36'
}
response = s.post(path,data=payload,headers=headers)
My form submission does not work. Here is the python logging module output:
The POST:
send: 'POST /workshop/progress/courses HTTP/1.1
Origin: https://xxx.com\r\nContent-Length: 181
Accept-Language: en-US,en;q=0.8
Accept-Encoding: gzip, deflate
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Host: xxx.com
Referer: https://xxx.com/workshop/progress/courses/new
Cache-Control: max-age=0
Cookie: _brainfit_session=xxx; remember_admin_token=xxx
Content-Type: application/x-www-form-urlencoded
progress_course%5Bdescription%5D=pytest&utf8=%26%23x2713%3B&commit=Create+Course&progress_course%5Bname%5D=pytest&authenticity_token=xxx
The reply to the POST. You can see that this comes from the sign-in page rather than generating a new page as seen in the correct output further down.
reply: 'HTTP/1.1 302 Found\r\n'
header: Server: nginx
header: Date: Tue, 02 Dec 2014 01:57:46 GMT
header: Content-Type: text/html; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Status: 302 Found
header: Strict-Transport-Security: max-age=31536000
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1; mode=block
header: X-Content-Type-Options: nosniff
header: Location: https://xxx.com/admins/sign_in
header: Cache-Control: no-cache
header: X-Request-Id: 983472d4-8954-4106-904a-38ea3b6a76a1
header: X-Runtime: 0.039963
DEBUG:requests.packages.urllib3.connectionpool:"POST /workshop/progress/courses HTTP/1.1" 302 None
Redirect GETs and replies. I may be mistaken in what these actually are:
send: 'GET /admins/sign_in HTTP/1.1
Origin: https://xxx.com
Accept-Language: en-US,en;q=0.8
Accept-Encoding: gzip, deflate
Host: xxx.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Connection: keep-alive
Referer: https://xxx.com/workshop/progress/courses/new
Cache-Control: max-age=0
Cookie: _brainfit_session=xxx; remember_admin_token=xxx
Content-Type: application/x-www-form-urlencoded
reply: 'HTTP/1.1 302 Found\r\n'
header: Server: nginx
header: Date: Tue, 02 Dec 2014 01:57:46 GMT
header: Content-Type: text/html; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Status: 302 Found
header: Strict-Transport-Security: max-age=31536000
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1; mode=block
header: X-Content-Type-Options: nosniff
header: Location: https://xxx.com/workshop/progress/courses
header: Cache-Control: no-cache
header: Set-Cookie: _brainfit_session=xxx; path=/; secure; HttpOnly
header: X-Request-Id: 9e5535d6-143c-4e98-8bb4-35dd29ab045d
header: X-Runtime: 0.007505
DEBUG:requests.packages.urllib3.connectionpool:"GET /admins/sign_in HTTP/1.1" 302 None
send: 'GET /workshop/progress/courses HTTP/1.1
Origin: https://xxx.com
Accept-Language: en-US,en;q=0.8
Accept-Encoding: gzip, deflate
Host: xxx.com
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Connection: keep-alive
Referer: https://xxx.com/workshop/progress/courses/new
Cache-Control: max-age=0
Cookie: _brainfit_session=exxx; remember_admin_token=xxx
Content-Type: application/x-www-form-urlencoded
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
header: Date: Tue, 02 Dec 2014 01:57:46 GMT
header: Content-Type: text/html; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: Status: 200 OK
header: Strict-Transport-Security: max-age=31536000
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1; mode=block
header: X-Content-Type-Options: nosniff
header: Cache-Control: max-age=0, private, must-revalidate
header: Set-Cookie: _brainfit_session=xxx; path=/; secure; HttpOnly
header: X-Request-Id: 4e759b60-af1c-4f39-b033-71ff99b62df4
header: X-Runtime: 0.019001
header: Content-Encoding: gzip
DEBUG:requests.packages.urllib3.connectionpool:"GET /workshop/progress/courses HTTP/1.1" 200 None
The POST headers of a properly submitted form on the site:
Remote Address:xxx
Request URL:https://xxx.com/workshop/progress/courses
Request Method:POST
Status Code:302 Found
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:176
Content-Type:application/x-www-form-urlencoded
Cookie:__utma=xxx; __utmc=xxx; __utmz=xxx.utmcsr=google|utmccn=(organic)|utmcmd=organic|
utmctr=xxx; km_lv=x; kvcd=xxx; km_ai=xxx; km_ni=xxx; km_uq=xxx; has_logged_in=true; WT_FPC=id=xxx; _ga=xxx; _brainfit_session=xxx
Host:xxx.com
Origin:https://xxx.com
Referer:https://xxx.com/workshop/progress/courses
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Form Data
utf8:✓
authenticity_token:xxx
progress_course[name]:test03
progress_course[description]:test03
commit:Create Course
Response Headers
Cache-Control:no-cache
Connection:keep-alive
Content-Type:text/html; charset=utf-8
Date:Mon, 01 Dec 2014 14:01:37 GMT
Location:https://xxx.com/workshop/progress/courses/44
Server:nginx
Set-Cookie:_brainfit_session=xxx; path=/; secure; HttpOnly
Status:302 Found
Strict-Transport-Security:max-age=31536000
Transfer-Encoding:chunked
X-Content-Type-Options:nosniff
X-Frame-Options:SAMEORIGIN
X-Request-Id:8bb04242-bc5a-408f-99a5-9d8bf4eeb611
X-Runtime:0.028559
X-XSS-Protection:1; mode=block
Once the form is correctly submitted there is also a GET response from the page it redirects to. I've excluded additional output here but it can be added if it will help resolve the problem.
Remote Address:xxx
Request URL:https://xxx.com/workshop/progress/courses/44
Request Method:GET
Status Code:200 OK

yahoo finance options data using python requests library

I am trying to scrape the following url as it appears to me in chrome:
http://finance.yahoo.com/q/op?s=MSFT&date=1414713600
For chrome 37 and linux, it appears with the table headers:
Strike, Contract Name, Last, Bid, Ask, Change, %Change, Volume, Open Interest, Implied Volatility
When I grab it with python requests library, it returns a table with the headers.
"Strike Symbol Last Chg Bid Ask Vol Open Interest"
url = "http://finance.yahoo.com/q/op?s=MSFT&date=1414713600"
text = requests.get(url)
Is there any way to use requests to get the data in the same manner as it appears in chrome? I assume I have to send a change in headers, but I have tried a number of different things that have not worked.
my last try was the following:
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding': 'gzip,deflate,sdch',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36',
'content-type': 'application/json'
}
here are the headers chrome sends for me to get the website I want:
GET /q/op?s=MSFT&date=1413590400 HTTP/1.1
Host: finance.yahoo.com
Accept: */*
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8,es;q=0.6
Cookie: PREF=ID=a6c1e76064f94d39:U=827f4a5f7ace80ef:FF=0:LD=en:TM=1405040931:LM=1410201670:GM=1:S=btbAGy8zLyRsY3Qr; SID=DQAAACwCAADXppdnRkNOk3FSR5EzCt5fvxdXMAH87uz4j_4Xm2aM-zbhU5JEXVw93LhmIjRCf2B-nonvZv4hXuO_V2FYXsdOYCxigM7jpBxhd1RcK-zfzM1w5SrEa6Q5aFfBHvU6p5KkPQ8v4m8srb_k0cxAzf7x5KRlkZ5llB5oDLpiPPUhFORd7y2xobzHvIo9YJ92pv8kmCnPWnNfMUUMK6XEoov56B1SmfJ8XUTyRN3oBIe0MKAPodTznhLlB0_qzqG1ohjEIv_WoiauvWNzmLGuzmQqAuHlMswIBnF5S0ECJ7sb-6Nap9FITjCnJfrelTqTH782wJL9pC7SGlU0L1LTkBb_j1Zmy4HI2eItaNhFXy-LIWF2hX88ijAQkYas0SrBvLC7d-SC58mLuvqwW4ZHntTCkIvRv1ZUs8fQPFCLl-WbmHcHhgavmVeEGXuCNAmzcAkslJ_eBghQUlbPPOqEAzNMgWrJkAzVcQBTxhYi2P8mQ4_c91L7mFeLe_NspFLCoWLnlIyfxwfVma6fpOGpGgg3eWydiOWT-WcNzNnY6P1TrN0gIL4QZvaD7VdBkvMJVtUeFHhxjZE5535Gddg1g1N469MesuGcIEISjOtjoBMpYMjmbRvIV4UyKSTFzSiqxZxOI0uSn1w-acf5g6G4tBmbfU_kSz1ass8ZV0rmiSrnTniNGWEg635jKCcTFND4KYuUJ1T21RlRP3q3btvejbzx29uuJxwWkR5KX0QR7CvVNQ; HSID=A2inQ1hxY_Akxsjaz; SSID=A9udoa1_RoYRXqu3b; APISID=LiX-ftG4pPZq-K4a/ALRoyKU_0PfrlgD5X; SAPISID=iVLeNwxPVKei4Q0q/Ayrp0XHMIQ5iq33wu; S=quotestreamer=6kIdj_SK-1Tobo0ctY2M3Q; NID=67=uQYk0bBnV8JbGB3UkfzG7f_1_vzYbCxH5BYU7_LjzwvN7WwEqe38sg93Um1eUxMlKDfgvuG2jpjL9UyjBI4gdGZDnMTDD21th6S0ptMJRh7HVCqxqzVMZMWtzQqaFOcCmHfNMp3wJBNwoynu2bsrJtNy5Kfz0TvaMkm-T6-_7h9PCTds7yzia46XVioSQl-8JYBbePrm9iaYYArRxMPl9mYs_5sEv5Noo4KRtKxUM4VpMGRx3Fykp1PMvMIU2T9cl2w9JiOGgryDuK-3b41Ihx326qpAdlAkB897bCf2TPl1AezTWf1UQOzTzv8XdPsvzd2sa4Sx0hND_Pi7rwPLHgzVOg9Sh3cqtUkjXycsKYWrVFIKfQCyy4g8sjKqwgvPQA; S=quotestreamer=6kIdj_SK-1Tobo0ctY2M3Q:talkgadget=aHfB4TSaCOCYGPzGrmtTLA
DNT: 1
Referer: https://579.talkgadget.google.com/talkgadget/d?token=AHRlWrqRUYIuQjAi1pH4ZFQhJWe92QrASI-PuFy3th_zCVENP71Zh90Pzs9Ch2JvJAf2qFw21V5oopm3dZRnzTTLl8UBAJ4co-b4WztC8owNbJhlYoVy-dS2lnsdBlXzcVph10ii8uFf&xpc=%7B%22cn%22%3A%22dV1z5gMTwj%22%2C%22tp%22%3Anull%2C%22osh%22%3Anull%2C%22ppu%22%3A%22https%3A%2F%2Fmightytext.net%2F_ah%2Fchannel%2Fxpc_blank%22%2C%22lpu%22%3A%22https%3A%2F%2F579.talkgadget.google.com%2Ftalkgadget%2Fxpc_blank%22%7D
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36
X-Client-Data: CJa2yQEIprbJAQiptskBCN6WygE=
HTTP/1.1 200 OK
Age: 1
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html
Date: Wed, 15 Oct 2014 12:08:23 GMT
Server: ATS
Transfer-Encoding: chunked
Vary: X-Ssl
Via: http/1.1 media-border67.global.media.bf1.yahoo.com (ApacheTrafficServer [cMsSf ]), http/1.1 r18.ycpi.dcb.yahoo.net (ApacheTrafficServer [cMsSf ])
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

Categories

Resources