Python: request to website doesn't work in all cases

I want to write a function which collects data from the Yahoo Finance site. The request looks like this:
import requests

def yahoo_summary_stats(stock):
    response = requests.get(f"https://finance.yahoo.com/quote/{stock}")
    print(response.reason)
If I call the function with the parameter 'ALB':
yahoo_summary_stats('ALB')
everything works fine and the request is OK. It correctly leads me to:
https://finance.yahoo.com/quote/ALB
The call:
yahoo_summary_stats('AEE')
on the other hand should lead me to https://finance.yahoo.com/quote/AEE, which I can open without any problems in Firefox.
The program for some reason gives me a 'Not Found' error. What is the problem with my request to that website?

Try setting a User-Agent in the headers:
import requests

def yahoo_summary_stats(stock):
    response = requests.get(f"http://finance.yahoo.com/quote/{stock}", headers={'User-Agent': 'Custom user agent'})
    print(response.status_code)
    print(response.reason)
yahoo_summary_stats('ALB')
yahoo_summary_stats('AEE')
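If a generic custom string is still rejected, a browser-like User-Agent is the usual fallback. A minimal sketch, assuming any realistic browser UA string is accepted (the one below is just an example):
import requests

# assumption: any realistic browser User-Agent string should work here
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
response = requests.get('https://finance.yahoo.com/quote/AEE', headers=headers)
print(response.status_code, response.reason)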

Related

requests.Session() not working even though the post is working

I'm trying to get a bunch of PDFs off of a site that sits behind a login so I don't manually have to download each and every one. I figured this would be easy, but I'm getting a "Missing Key-Pair-Id query parameter" error back. Here's what I have:
import requests

payload = {'username': 'user', 'password': 'pass'}
with requests.Session() as session:
    post = session.post('https://website.com/login.do', data=payload)
    r = session.get('https://files.website.com/1.pdf')
    print(r.text)
I'm printing r.text just because that's where I'm getting the above message. My post variable is giving me a response of 200, and the contents of post.text are a redirect link with "code: success", too. If I click that link (or copy-paste it into a private browser), I'm logged in just fine. And browsing to the PDF link works just fine. What am I missing here? Thanks.
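One hedged guess: the "Missing Key-Pair-Id query parameter" error usually relates to signed file-access cookies or URLs, and those cookies may only be set when the redirect link returned by the login is actually visited. A minimal sketch of that idea, assuming the login response body contains the redirect URL (how to extract it is left as a placeholder):
import requests

payload = {'username': 'user', 'password': 'pass'}
with requests.Session() as session:
    post = session.post('https://website.com/login.do', data=payload)
    # hypothetical: pull the redirect link out of the login response body;
    # adjust this to whatever format the site actually returns
    redirect_url = post.json().get('redirect')
    session.get(redirect_url)  # visiting it lets the server set the remaining cookies on the session
    r = session.get('https://files.website.com/1.pdf')
    print(r.status_code)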

How to send cookie (header) using Python requests library

Hi, I am new to Python requests and would like some help.
When I try to use Python requests to get the session cookie, I use the following command:
import requests

session_req = requests.session()
result = session_req.get(get_url)
After executing the GET with requests, I use the '.cookies' property and the respective key I want to send in the POST header. I get the value successfully, but the POST action is not working.
session_req.cookies['IFCSHOPSESSID']
But when I make the request to the same API via Postman and inspect the cookie property (exporting the code as Python requests), I find some differences, and if I use this same cookie exported from Postman, it works.
POSTMAN EXAMPLE
'cookie': 'IFCSHOPSESSID=hrthhiqdeg0dvf4ecooc83lui3; nikega=GA1.4.831513767.1599354095; nikega_gid=GA1.4.1839484382.1599354095; _ga=GA1.3.831513767.1599354095; _gid=GA1.3.733956911.1599354099; chaordic_browserId=0-fv_3j6NdVlbNFFwPRzUGQVse7e1bbqga-3OS1599354098234702; chaordic_anonymousUserId=anon-0-fv_3j6NdVlbNFFwPRzUGQVse7e1bbqga-3OS1599354098234702; chaordic_testGroup=%7B%22experiment%22%3Anull%2C%22group%22%3Anull%2C%22testCode%22%3Anull%2C%22code%22%3Anull%2C%22session%22%3Anull%7D; user_unic_ac_id=bec863cf-4e06-0ab1-d881-b566595d3e8f; _gcl_au=1.1.1305519862.1599354100; _fbp=fb.2.1599354100232.504934336; smeventsclear_16df2784b41e46129645c2417f131191=true; smViewOnSite=true; __pr.cvh=4ftsyf8x16; _gaexp=GAX1.3.tupm6REJTMeD-piAakRDMA.18557.0; blueID=75a502b6-e7c2-4eb3-8442-75aea5d95fdc; _cm_ads_activation_retry=false; sback_client=5816989a58791059954e4c52; sback_partner=false; sb_days=1599356617672; sback_refresh_wp=no; smClickOnSite=true; smClickOnSite_652c0aaee02549a3a6ea89988778d3fc=true; _rtbhouse_source_=socialminer; RKT=false; dedup=socialminer; lmd_cj=socialminer; advcake_url=https%3A%2F%2Fwww.nike.com.br%2Flancamentos%3Futm_source%3Dsocialminer%26utm_medium%3Dsocialminer_onsitedesktop%26utm_campaign%3Dsocialminer_onsitedesktop_lancamentos_desk%26smid%3D3-17; advcake_trackid=dd7e2ef0-dd50-889a-aeea-559a0d8bcd22; advcake_utm_content=socialminer_onsitedesktop_lancamentos_desk; advcake_utm_campaign=socialminer; Campanha=; Parceiro=; Midia=; AMCVS_F0935E09512D2C270A490D4D%40AdobeOrg=1; s_cc=true; lmd_orig=direct; SIZEBAY_SESSION_ID=0AC1A70CB19F4f03610665d04bb088ef3b9af0942fc8; sback_customer_w=true; sback_browser=0-87718800-1599408894bff13e290b9fee5fc2b430382f639b87dd9cf25112334287575f550afed62983-14051381-17920887216,13017640152-1599408894; sback_access_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJhcGkuc2JhY2sudGVjaCIsImlhdCI6MTU5OTQwODg5NSwiZXhwIjoxNTk5NDk1Mjk1LCJhcGkiOiJ2MiIsImRhdGEiOnsiY2xpZW50X2lkIjoiNTgxNjk4OWE1ODc5MTA1OTk1NGU0YzUyIiwiY2xpZW50X2RvbWFpbiI6Im5pa2UuY29tLmJyIiwiY3VzdG9tZXJfaWQiOiI1ZjU0M2VjODA5ZjFkMDkzMmQzMjQ2OTUiLCJjdXN0b21lcl9hbm9ueW1vdXMiOmZhbHNlLCJjb25uZWN0aW9uX2lkIjoiNWY1NDNlYzgwOWYxZDA5MzJkMzI0Njk2IiwiYWNjZXNzX2xldmVsIjoiY3VzdG9tZXIifX0.K6FYVBasHjMg_PLbT1yZfrnIp97USqijoMObF4eUSms.WrWrDrHeHezRqBiYiYHeDr; sback_customer=$2gSxATWYdVYOVGMI10bUdkW2pWeoZERU1kc1YWWhd1SNR0aMJ0QUVzTHpHdJZERnpVS6FTSkRUTOBjMys2bUdnT2$12; sback_pageview=false; ak_bmsc=B6177778CB59637165F7EC43342C1559C9063147DA220000234E555F8D78F831~plACNrc4cNxoHZNcO7aF4o+U0KQNKjzPECGSfb42NdayPvdNkBWwUT9QOhGjuLJJ3vStuFIRkiI/35wsHEyUE3/h2guphhaEy71BnfekvDtb/6F84hS+fWhPxxVG5RAlph8WzGpYMn6NZESNVcgnZYfH4HoZ/IzBPR6AMG9UGn6W4xm/j/j9kOfef8v/fZf2pXw4mxJuiN5Cxc7g2sV4nCdoEW98Q4AgqplzxWZjpamZk=; bm_sz=6586256DDAFC895D740341E4214D0D40~YAAQRzEGybYT7yN0AQAAfDw5ZQnXjJtKI2SxkwQFV9vLZpF5mACXNUtUFDSkidKuYM2fac5sQgRozU9fA3+017dht/PUtH+wtibATtTmoVOlpKnW+V76+1rySk3HK6q83Q9rtQc/LaaQ8VYtK/tDi0VOc7/0wLyKy/+Z4OLtgUpySYZZcEX4k8/46no8rFD6OQ==; AMCV_F0935E09512D2C270A490D4D%40AdobeOrg=359503849%7CMCIDTS%7C18512%7CMCMID%7C56897587165425478193529762442649463163%7CMCAAMLH-1600030892%7C4%7CMCAAMB-1600030892%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1599433292s%7CNONE%7CMCSYNCSOP%7C411-18519%7CvVersion%7C5.0.1; sback_total_sessions=3; sback_session=5f554e3c73a63da56d739d87; lmd_traf=direct-1599402359608&direct-1599408890286&direct-1599414284313&direct-1599427194077; chaordic_realUserId=2962653; chaordic_session=1599429266491-0.4343169041143473; _st_ses=49222273669791505; _st_cart_script=helper_nike.js; 
_st_cart_url=/; _sptid=1592; _spcid=1592; _st_id=cnVkc29ucmFtb25AZ21haWwuY29t; _st_idb=cnVkc29ucmFtb25AZ21haWwuY29t; lx_sales_channel=%5B%222%22%5D; sback_cart=5f555ba24f507d767721c387; CSRFtoken=1ac8a198f88ac1ccc1f8555ab41c8a95; gpv_v70=nikecombr%3Echeckout%3Eaddress; pv_templateName=CHECKOUT; gptype_v60=checkout%3Aaddress; stc119288=env:1599429270%7C20201007215430%7C20200906222939%7C5%7C1088071:20210906215939|uid:1599354102799.1149977977.6705985.119288.1871143352:20210906215939|srchist:1088071%3A1599429270%3A20201007215430:20210906215939|tsa:1599429270805.1898407973.364911.7635034620790062.2:20200906222939; bm_sv=C9C3A8C6B2F6CB232317BB794ADC0497~ZnoksXquh4Yrh4uN87gycXdh+ixzU+xMFsb94sO9uE5JMLyZz9eJPp5odX7vx944KIXG1nvOxuq8pdrQUDjBrchRJLC4yiD1yWX0h4BjWhZwbfHPtnzaT3ASbIZnf2Ts1TRt+ZAescJJwrNPs4oV2If7vyiWi2AYILFvCstCTS8=; _uetsid=a9a0bfd4fe4e4db52bcd4ca66850a785; _uetvid=9ba47ed116a48f496f6b1a9844e21c95; __udf_j=f08aeb668454efbf6ddc83dd9d4b7a8385abde9f9fbd92526f1de0441da2126ec40330dfc36d0b9c3eae98557c94447d; _spl_pv=40; s_sq=lojanike-new-production%252Clojanike-nikebr%3D%2526c.%2526a.%2526activitymap.%2526page%253Dnikecombr%25253Echeckout%25253Eaddress%2526link%253DSeguir%252520para%252520pagamento%2526region%253Didentificacao-form%2526pageIDType%253D1%2526.activitymap%2526.a%2526.c%2526pid%253Dnikecombr%25253Echeckout%25253Eaddress%2526pidt%253D1%2526oid%253DSeguir%252520para%252520pagamento%2526oidt%253D3%2526ot%253DSUBMIT; RT="z=1&dm=nike.com.br&si=92b42534-25ee-4155-aa1a-e7d127581869&ss=kermvxyl&sl=9&tt=17e8&bcn=%2F%2F173e2544.akstat.io%2F"; _abck=F6E1C280C3F9D735A2B1AB62443DB479~-1~YAAQVjEGycno+iJ0AQAAmtRxZQT8kxLFalTup4dkYT5+cq/PavPcY4/0zAeJv4GoSQQwYVj4EWydkfxbJR3Rgaa4k6ma+5O72J/lsiajATrx0oaZJuB5b/FIP6RymanPRVGlb3kLJXpBQDkCmVv62kkxLKxySrlAYDCg0ORCpSXlTCbFBVEchC9ih5t094egSeVdM6VjfQSO9uDKISBoP4923qkJMTpbk9B1nOoiylKK+y+FGFu8pzEpQqZYj7tIMTJVpqe0OpXaQ8m8nPyp0K+PmBcAndIHcBMTZUEqma9/72Enx8yvGbKXrYbAzNDw6ZtKY9OAbNuVeqprza/Af0aUkinm0l3JqxjTH1LpglNxNN4=~-1~-1~-1; CSRFtoken=20a208bad599aa3ead0bbe944b27a368; bm_sv=C9C3A8C6B2F6CB232317BB794ADC0497~ZnoksXquh4Yrh4uN87gycXdh+ixzU+xMFsb94sO9uE5JMLyZz9eJPp5odX7vx944KIXG1nvOxuq8pdrQUDjBrchRJLC4yiD1yWX0h4BjWhbSXhHWWrgkUsOTt9033P5Wxu1qmo5M6w0VAWeAzBaCN7yZC2Ll7DiGq0CwpjxlOW4=; _abck=F6E1C280C3F9D735A2B1AB62443DB479~-1~YAAQVjEGyRKO+iJ0AQAA+4U9ZQSNIWTEz/60Uk5gz2tnzVtbMbX0hpaMbkbeJxSYSMD1xo7TTedXnJ0UuTLxxcHhLVrRRCrZfSjZ+yH00Ld6FLIajmYFefKPehzA6GgwjnLyucI1O6nDw2ZU1CV0WJLeWGgcmX7sinsLr3DVtmoGJyNR1Q9EWpvq71/W1Ys4Bqhq1628YKEz/0Z1Ic1bWMujcG03064ZZYYXTSTz9jrkxHKaEoJQNQgyUg9NXQhv4EFoMSESy/AIKRy+hVCULLJscbkpH8WakuvYQ1raghVfheks/Xra9AmiUoOqAbWAPXOij1nWQ9PSV2hxQZfkibD0+YP14pTXPoCAUA9jCQHRJIw=~0~-1~-1'
session_req.cookies['IFCSHOPSESSID'] EXAMPLE
qnabtagl4pu7gm2jg3sij03cu6
Another curious thing is that when I use the '.cookies' property, my POST call returns success even though it does not update the cart, where it should be inserting a new record.
As I am trying to develop a bot for this site, I would like to generate this same cookie via Python requests code. Can anyone help me with this?
This is an example with Python 3. You can customize it.
import requests

params = {"param_1": "value_1", "param_2": "value_2"}  # your request parameters
cookies = {"cookie_name": "xxxxxxxx"}  # cookies must be passed as a dict, not a string
url_endpoint = "https://........."  # your url endpoint

# send the cookies along with the request
resp = requests.get(url_endpoint, params=params, cookies=cookies)
if resp.status_code == 200:
    print("success")
else:
    print("error")

invalid response from proxy with python requests

I am using the Requests API with Python 2.7.
I am trying to download certain webpages through proxy servers. I have a list of available proxy servers, but not all of them work as desired. Some proxies require authentication, others redirect to advertisement pages, etc. In order to detect/verify incorrect responses, I have included two checks in my URL request code. It looks similar to this:
import requests

proxy = '37.228.111.137:80'
url = 'http://www.google.ca/'
response = requests.get(url, proxies={'http': 'http://%s' % proxy})
if response.url != url or response.status_code != 200:
    print 'incorrect response'
else:
    print 'response correct'
    print response.text
There are some proxy servers for which the requests.get call succeeds and both conditions pass, yet response.text still contains invalid HTML source. However, if I use the same proxy in my Firefox browser and open the same webpage, I am shown an invalid page, while my Python script says the response is valid.
Can someone point out what other checks I am missing to weed out incorrect HTML results?
or
How can I successfully verify if the webpage I intended to receive is correct?
Regards.
What is an "invalid webpage" when displayed by your browser? The server can return a HTTP status code of 200, but the content is an error message. You understand it to be an error message because you can comprehend it, a browser or code can not.
If you have any knowledge about the content of the target page, you could check whether the returned HTML contains that content and accept it on that basis.
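A minimal sketch of that check, assuming you know a text snippet that always appears on the genuine page (the marker string below is a made-up example):
import requests

proxy = '37.228.111.137:80'
url = 'http://www.google.ca/'
expected_snippet = '<title>Google</title>'  # assumption: text known to appear on the genuine page

response = requests.get(url, proxies={'http': 'http://%s' % proxy})
if response.status_code == 200 and response.url == url and expected_snippet in response.text:
    print('response looks correct')
else:
    print('incorrect response')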

Python: urllib2 get nothing which does exist

I'm trying to crawl my college website. I set the cookie and add headers, then:
homepage=opener.open("website")
content = homepage.read()
print content
I can get the source code sometimes, but other times I get nothing at all.
I can't figure out what happened.
Is my code wrong?
Or is it the website?
Can a single geturl() call be used to follow two or more redirects?
redirect = urllib2.urlopen(info_url)
redirect_url = redirect.geturl()
print redirect_url
It can return the final URL, but sometimes gives me an intermediate one.
Rather than working around redirects with urlopen, you're probably better off using the more robust requests library: http://docs.python-requests.org/en/latest/user/quickstart/#redirection-and-history
import requests

r = requests.get('website', allow_redirects=True)
print r.text
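If you also want to see every hop rather than just the final page, requests keeps the redirect chain in r.history (the URL below is just a placeholder):
import requests

r = requests.get('http://example.com/start', allow_redirects=True)  # placeholder URL
print(r.url)                              # the final URL after all redirects
print([resp.url for resp in r.history])   # the intermediate responses' URLs, in order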

In Python why does urllib.urlopen make Google give an http status "302 Moved"?

Using Python 2.6.6 on CentOS 6.4
import urllib
#url = 'http://www.google.com.hk' #ok
#url = 'http://clients1.google.com.hk' #ok
#url = 'http://clients1.google.com.hk/complete/search' #ok (blank)
url = 'http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc' #fails
print url
page = urllib.urlopen(url).read()
print page
Using the first 3 URLs, the code works. But with the 4th URL, Python gives the following 302:
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
The URL in my code is the same as the URL it tells me to use:
My URL: http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc
Its URL: http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc
Google says the URL has moved, but the URLs are the same. Any ideas why?
Update: The URLs all work fine in a browser. But in Python command line the 4th URL is giving a 302.
urllib ignores the cookies and sends the new request without them, which causes a redirect loop at that URL. To handle this, you can use urllib2 (which is more up to date) and add a cookie handler:
import urllib2
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
response = opener.open('http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc')
print response.read()
It most likely has to do with the headers and perhaps cookies. I did a quick test on the command line using curl, and it also gives me the 302 Moved. The Location header it provides is different, as is the URL in the document body. If I follow the body URL I get a 204 response (weird). If I follow the Location header I end up in a circular redirect like you indicate.
Perhaps important is the Set-Cookie header. It may keep redirecting until an appropriate cookie is set. It may also be inspecting the User-Agent and behaving differently based on that. Those are the big aspects that differentiate a browser from a tool like requests or urllib. The browser creates sessions, stores cookies, and sends different headers.
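To cover both of those differences at once, a cookie jar and a browser-like User-Agent can be combined on the same urllib2 opener. A minimal sketch, with the UA string as an assumption (any realistic browser string should do):
import urllib2

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
# assumption: any browser-like User-Agent string
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
response = opener.open('http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc')
print(response.read())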
I don't know why urllib fails (I get the same response); however, the requests lib works perfectly:
import requests
url = 'http://clients1.google.com.hk/complete/search?output=toolbar&hl=zh-CN&q=abc' # fails
print (requests.get(url).text)
If you use your favorite web debugger (Fiddler for me) and open up that URL in your browser, you'll see that you also get that initial 302 response. Your browser is just smart enough to redirect you automatically. So your code is returning the correct response. If you want your code to redirect to the new URL automatically, then you have to make your code smart enough to do so.
