I have a problem with a website called www.webhallen.com. I want to append arbitrary digits and characters after www.webhallen.com/, for example www.webhallen.com/3459gdfg, and then see whether that page exists. The problem is that the site redirects me to their own 404 page. How can I use requests to see whether they redirect me?
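You can check the response status code. If response.status_code is 404, you have ended up on their 404 page. To see whether a redirect happened on the way, look at response.history: it is non-empty when Requests followed at least one redirect.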
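A minimal sketch of that check, assuming the site answers unknown paths with a 404 status, possibly after a redirect. The helper name classify_response and the stand-in object are made up for illustration; the function only reads status_code, url and history, so a real requests.Response can be passed in instead.

```python
from types import SimpleNamespace


def classify_response(response):
    """Describe how a requests-style response was reached.

    Only the attributes status_code, url and history are used, so any
    object that provides them works (for example a requests.Response).
    """
    redirected = len(response.history) > 0
    if response.status_code == 404:
        return "missing (redirected)" if redirected else "missing"
    if redirected:
        return "redirected to " + response.url
    return "exists"


# Stand-in object (hypothetical values) so the sketch runs without network
# access. With Requests this would be: response = requests.get(url)
fake = SimpleNamespace(
    status_code=404,
    url="https://www.example.com/404",
    history=[SimpleNamespace(status_code=302)],
)
print(classify_response(fake))
```

If the site returns 200 for its error page instead of 404, the status code alone is not enough and the final URL (response.url) has to be compared with the requested one.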
Hi! I am new here, so let me describe my issue clearly. Please ignore any mistakes.
I am making a request to a page that relies entirely on JavaScript.
Actually, it is the Paytm payment response page for UPI.
Whenever I make the request, the response is {'POLL_STATUS':"STOP_POLLING"}
The problem is that my request gets this response, while the browser gets a different response with loaded HTML.
I tried everything, such as disabling redirects and printing the raw content, but nothing works.
I think a urllib POST request might work, but I do not know how to use it.
Can anyone please tell me how to get the exact HTML response that the browser gets?
Note[0]: Please don't suggest Selenium, because this issue occurs in the middle of my script.
Note[1]: A friendly answer is appreciated.
for i in range(0, 15):
    resp_check_transaction = self.s.post(
        "https://secure.website.in/theia/upi/transactionStatus?MID=" + str(Merchant_ID) + "&ORDER_ID=" + str(ORDER_ID),
        headers=check_transaction(str(ORDER_ID)),
        data=check_transaction_payload(Merchant_ID, ORDER_ID, TRANSID, CASHIERID),
    )
    print(resp_check_transaction.text)
    resp_check_transaction = resp_check_transaction.json()
    if resp_check_transaction['POLL_STATUS'] == "STOP_POLLING":
        print("Breaking looop")
        break
    time.sleep(4)
self.clear_header()
parrms = {
    "MID": str(Merchant_ID),
    "ORDER_ID": str(ORDER_ID)
}
resp_transaction_pass = requests.post(
    "https://secure.website.in/theia/upi/transactionStatus",
    headers=transaction_pass(str(ORDER_ID)),
    data=transaction_pass_payload(CASHIERID, UPISTATUSURL, Merchant_ID, ORDER_ID, TRANSID, TXN_AMOUNT),
    params=parrms,
    allow_redirects=True,
)
print("Printing response")
print(resp_transaction_pass.text)
print(resp_transaction_pass.content)
In the web browser, the bank response shows Status Code: 302 Moved Temporarily. :(
About the 302 status code
You mention that the web browser shows a 302 status code in response to the request. In the simplest terms, the 302 status code is just the web server's way of saying "Hey, I know what you're looking for, but it is actually located at this other URL."
Basically all modern browsers and HTTP request libraries like Python's Requests will automatically follow a 302 redirect and act as though you had sent the request to the new URL instead. (Your browser's developer tools may show that a 302 redirect has happened, but as far as the JavaScript is concerned it just got a normal 200 response.)
If you really want to see whether your Python script receives a 302 status, you can do so by setting the allow_redirects option to False, but this means you will have to fetch the content from the new URL yourself.
import requests
r1 = requests.get('https://httpstat.us/302', allow_redirects=False)
r2 = requests.get('https://httpstat.us/302', allow_redirects=True)
print('No redirects:', r1.status_code) # 302
print('Redirects on:', r2.status_code) # 200 (status code of page it redirects to)
Note that allow_redirects is already set to True by default for requests.get; I set it explicitly only to make the difference obvious.
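If you do turn redirects off, the target of the redirect is in the Location header of the response. A rough sketch of working out the next URL by hand follows; the helper name next_url and the example URLs are made up, and a plain dict stands in for the response headers (the headers object in Requests is case-insensitive, a plain dict is not).

```python
from urllib.parse import urljoin

# Status codes that ask the client to go to another URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(request_url, status_code, headers):
    """Return the URL a redirect response points to, or None."""
    if status_code not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be relative, so resolve it against the request URL.
    return urljoin(request_url, location)


print(next_url("https://example.com/a/b", 302, {"Location": "/login"}))
print(next_url("https://example.com/a/b", 200, {}))
```

With the r1 response from the example above, the call would look like next_url(r1.url, r1.status_code, r1.headers).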
So why is the response content different?
Even though the browser and the Requests library both follow the 302 redirect automatically, the responses they get still differ. You didn't share any screenshots of the browser's requests or responses, so I can only make a few educated guesses, but it boils down to this: the request made by your Python code is somehow different from the one made by the JavaScript running in the browser.
Some things to consider:
Are you sure you are using the correct HTTP method? Is the browser also making a POST request?
If so, are you sure the body of the request has the same content and format as the one sent by the web browser?
Perhaps the browser has a session cookie that it sends along with the request. (Note that this is usually not stated explicitly in the JS but happens automatically.)
Alternatively, the JS might include an API key or credentials in the HTTP auth header (this should be explicitly visible in the JS).
Although unlikely, it could be that whatever API you're trying to query is trying to block reverse engineering attempts by blocking the Requests library's user agent string.
Luckily all of these differences can be easily examined with some print statements and your browser's developer tools :p.
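One way to do that comparison is to put the two sets of headers side by side. Below is a small sketch; the function name diff_headers and all header values are hypothetical. The browser headers would be copied from the network tab of the developer tools, and the script's headers can be read from response.request.headers after the request has been made.

```python
def diff_headers(browser_headers, script_headers):
    """Return {name: (browser_value, script_value)} for headers that differ.

    Header names are compared case-insensitively; a missing header is None.
    """
    browser = {k.lower(): v for k, v in browser_headers.items()}
    script = {k.lower(): v for k, v in script_headers.items()}
    differences = {}
    for name in sorted(set(browser) | set(script)):
        if browser.get(name) != script.get(name):
            differences[name] = (browser.get(name), script.get(name))
    return differences


# Hypothetical values for illustration only.
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Cookie": "SESSION=abc123",
    "Content-Type": "application/x-www-form-urlencoded",
}
script_headers = {
    "user-agent": "python-requests/2.31.0",
    "content-type": "application/x-www-form-urlencoded",
}

for name, (in_browser, in_script) in diff_headers(browser_headers, script_headers).items():
    print(f"{name}: browser={in_browser!r} script={in_script!r}")
```

In this made-up case the script sends no cookie and a different user agent, which are two of the candidates listed above.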
Here is my scenario.
I have a lot of links. I want to know if any of them redirect to a different site (maybe a particular one) and to get only those redirect URLs. (I want to preserve them for further scraping.)
I don't want to get the contents of the webpage. I only want the link it redirects to. If there are multiple redirects, I may want to get the URLs up to, say, the 3rd redirect (so that I don't get stuck in a redirect loop).
How do I achieve this?
Can I do this in requests?
Requests seems to have r.status_code, but it is only available after fetching the page.
You can use requests.head(url, allow_redirects=True), which fetches only the headers. If a response is a redirect with a Location header, Requests follows it and sends a HEAD request to the next URL.
import requests
response = requests.head('http://httpbin.org/redirect/3', allow_redirects=True)
for redirect in response.history:
    print(redirect.url)
print(response.url)
Output:
http://httpbin.org/redirect/3
http://httpbin.org/relative-redirect/2
http://httpbin.org/relative-redirect/1
http://httpbin.org/get
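To keep only the redirects that leave the original site, and to look no further than the 3rd redirect, the list of visited URLs can be filtered afterwards. This is a sketch with a made-up helper name (external_redirects) and a made-up chain of URLs. Requests also has a Session.max_redirects attribute that caps how many redirects a session follows and raises TooManyRedirects beyond that, which protects against loops during the request itself.

```python
from urllib.parse import urlsplit


def external_redirects(chain, limit=3):
    """Return redirect targets whose host differs from the first URL's host.

    chain is the list of visited URLs in order, starting with the
    requested URL. Only the first `limit` redirects are considered.
    """
    origin_host = urlsplit(chain[0]).hostname
    targets = chain[1:1 + limit]
    return [url for url in targets if urlsplit(url).hostname != origin_host]


# Hypothetical chain; with Requests it could be built as
# [r.url for r in response.history] + [response.url]
chain = [
    "http://short.example/abc",
    "http://short.example/abc/",
    "https://tracker.example/click?id=1",
    "https://shop.example/item/42",
    "https://shop.example/item/42?ref=x",
]
print(external_redirects(chain))
```

Here the first hop stays on the same host and is dropped, and the fourth redirect is ignored because of the limit.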
I make a request to 'someurl' and get redirected. The redirect target doesn't exist, but the redirect URL contains the access_token and other important data. I would like to get this redirect URL without the program crashing.
a = opener.open('https://connect.ok.ru/oauth/authorize?client_id=1247511808&scope=VALUABLE_ACCESS&response_type=token&redirect_uri=https://smasu.ru')
This redirects to the site smasu.ru, but that site doesn't exist. The redirect URL looks like this:
https://smasu.ru/#access_token=8.7fd19d96afcfc687b92bd50e2df6011837b94753e09f315818c0328e9&session_secret_key=363b5dd2ab1a1a44c25e423e892732ce&permissions_granted=VALUABLE_ACCESS&expires_in=1800
There you can see access_token and session_secret_key, which I want to use in my program. How can I make the request, ignore the HTTP error, and get the redirect URL?
Traceback:
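Once the redirect URL is available as a string (for example from the Location header of the redirect response, assuming the server puts it there and the redirect is not followed), the values after the # can be pulled out with the standard library. This is only a sketch: the function name fragment_params is made up and the token values are shortened placeholders in the same shape as the URL above.

```python
from urllib.parse import parse_qs, urlsplit


def fragment_params(url):
    """Return the key/value pairs stored in the URL fragment (after '#')."""
    fragment = urlsplit(url).fragment
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(fragment).items()}


# Placeholder values, not real credentials.
redirect_url = (
    "https://smasu.ru/#access_token=TOKEN"
    "&session_secret_key=SECRET"
    "&permissions_granted=VALUABLE_ACCESS&expires_in=1800"
)
params = fragment_params(redirect_url)
print(params["access_token"], params["expires_in"])
```

Because the target site is never contacted in this approach, it does not matter that smasu.ru does not exist.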
I have a public folder in Google Drive, in which I store pictures.
In Python, I am trying to detect whether a picture with a particular name exists or not. I am using this code:
import urllib2
url = "http://googledrive.com/host/0B7K23HtYjKyBfnhYbkVyUld3YUVqSWgzWm1uMXdrMzQ0NlEwOXVUd3o0MWVYQ1ZVMlFSNms/0000.png"
resp = urllib2.urlopen(url)
print resp.getcode()
And even though there is no file with this name in this folder, this code is not throwing an exception and is printing "200" as the return code. I have checked in my browser and this URL (http://googledrive.com/host/0B7K23HtYjKyBfnhYbkVyUld3YUVqSWgzWm1uMXdrMzQ0NlEwOXVUd3o0MWVYQ1ZVMlFSNms/0000.png) does return a 404, after a few redirects.
Why doesn't urllib2 detect that this file actually doesn't exist?
When you make the request, it goes to Google's web servers and is processed there. You would see a 404 on your end if and only if Google's servers returned a 404; urllib2 simply encapsulates the underlying handshaking and data transfer logic.
In this particular case, Google's server-side code requires the request to be authenticated, and your request is simply unauthenticated. As such, the request is redirected to the login page, and since this is a valid existing page, urllib2 correctly reports the code 200. You get the same page if you open the link in a private window.
However, if you are authenticated (basically logged into your Gmail/Google Docs account) and then open the URL, you get the 404 error.
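A quick way to notice this situation is to compare the URL you asked for with the URL you ended up at, which urllib2 exposes as resp.geturl(). The sketch below uses a made-up helper name (landed_elsewhere) and hypothetical URLs, and it is written for Python 3, where urllib2 became urllib.request.

```python
from urllib.parse import urlsplit


def landed_elsewhere(requested_url, final_url):
    """Return True if the final URL is on a different host or path."""
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    return (requested.hostname, requested.path) != (final.hostname, final.path)


# Hypothetical URLs; final_url would come from resp.geturl().
requested_url = "http://files.example/host/folder/0000.png"
final_url = "https://accounts.example/login?continue=0000.png"
print(landed_elsewhere(requested_url, final_url))
print(landed_elsewhere(requested_url, requested_url))
```

A 200 response whose final URL is a login page says nothing about whether the original file exists.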
All,
I'm getting a 400 redirect_uri_mismatch error when attempting to authenticate through Google. I'm using python-social-auth in a Django application for this process.
Everything works smoothly, until I get to the final stages of the process where I hit a redirect_uri_mismatch issue.
On google, I receive this message.
"The redirect URI in the request: http://localhost:8000/something/complete/google-oauth2/ did not match a registered redirect URI"
`Request Details
from_login=1
scope=https://www.googleapis.com/auth/userinfo.email https://www.googleapis.com/auth/userinfo.profile
response_type=code
redirect_uri=http://localhost:8000/something/complete/google-oauth2/
state=qT1RLLMa72F8NxFFubHwCVe3GgLDNcgZ
as=-55f896f3314b21af
pli=1
client_id=160177117398
authuser=0
hl=en`
Included below is a screenshot of the client ID's redirect URI.
What am I doing wrong?
Thanks!
One thing to note here is that the redirect URL must match exactly, down to the trailing slash. In my case, it was like http://localhost:8000/something/complete/google-oauth2
This should have been
http://localhost:8000/something/complete/google-oauth2/
This resulted in redirect_uri_mismatch.
Also register both the http and https variants of the redirect URL in the console, because the redirect URL generated by social auth still uses http regardless of the SSL setting on your server.
Make sure you have added social-auth to urls.py as
url('', include('social.apps.django_app.urls', namespace='social')),
and on console.developers.google.com set the Authorized redirect URIs to http://localhost:8000/something/complete/google-oauth2/
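Since the comparison is an exact string match, it can help to diff the URI from the error message against the one registered in the console. A small sketch follows; the function name explain_mismatch is made up, and it only covers the two causes mentioned above (trailing slash and http vs https).

```python
def explain_mismatch(requested, registered):
    """Return a short reason why two redirect URIs differ, or None if equal."""
    if requested == registered:
        return None
    if requested.rstrip("/") == registered.rstrip("/"):
        return "trailing slash differs"
    # Compare everything after the scheme separator.
    if requested.split("://", 1)[-1] == registered.split("://", 1)[-1]:
        return "scheme differs (http vs https)"
    return "URIs differ"


requested = "http://localhost:8000/something/complete/google-oauth2/"
print(explain_mismatch(requested, "http://localhost:8000/something/complete/google-oauth2"))
print(explain_mismatch(requested, "https://localhost:8000/something/complete/google-oauth2/"))
print(explain_mismatch(requested, requested))
```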