Can't get the html of a page python - python

So I have been trying to solve this for the past 3 days and just can't know why.
I'm trying to access the html of this site that requires login first.
I tried everyway I could and all return with the same problem.
Here is what I tried:
response = requests.get('https://de-legalization.tlscontact.com/eg/CAI/myapp.php', headers=headers, params=params, cookies=cookies)
print(response.content)
payload = {
'_token': 'TOKEN HERE',
'email': 'EMAIL HERE',
'pwd': 'PASSWORDHERE',
'client_token': 'CLIENT_TOKEN HERE'
}
with requests.session() as s:
r = s.post(login_url, data=payload)
print(r.text)
I also tried using URLLIB but they all return this:
<script>window.location="https://de-legalization.tlscontact.com/eg/CAI/index.php";</script>
Anyone knows why this is happening.
Also here is the url of the page I want the html of:
https://de-legalization.tlscontact.com/eg/CAI/myapp.php

You see this particular output because it is in fact the content of the page you are downloading.
You can test it in chrome by opening the following url:
view-source:https://de-legalization.tlscontact.com/eg/CAI/myapp.php
This is how it looks like in Chrome:
This is happening because you are being redirected by the javascript code on the page.
Since the page you are trying to access requires login, you cannot access it just by sending http request to the internal page.
You either need to extract all the cookies and add them to the python script.
Or you need to use a tool like Selenium that allows you to control a browser from your Python code.
Here you can find how to extract all the cookies from the browser session:
How to copy cookies in Google Chrome?
Here you can find how to add cookies to the http request in Python:
import requests
cookies = {'enwiki_session': '17ab96bd8ffbe8ca58a78657a918558'}
r = requests.post('http://wikipedia.org', cookies=cookies)

Related

Using Python requests cannot access a private website while it works by using a browser

I'm trying to download some (.csv) files from a private website with Python requests method.
I can access the website by using a browser. After typing in the url, it pops up a window to fill in username and password.
After that, it starts to download a (.csv) file.
However, it failed when I used Python requests method.
Here is my code.
import requests
# username and pwd in base64
b64_IDpass = '******'
tics_headers = {
"Host": 'http://tics-sign.com',
"Authorization": 'Basic {}'.format(b64_IDpass)
}
# company internet proxy
proxy = {'http': '*****'}
# url
url_get = 'http://tics-sign.com/getlist'
r = requests.get(url_get,
headers=tics_headers,
proxies=proxy)
print(r)
# <Response [404]>
I've checked the headers in a browser, there is no problem.
But why it returns <Response [404]> when using Python?
You need to post your password and username before you get the link.
So you could try this:
request.post("http://tics-sign.com", tics_headers)
And then get the info:
request.get(url_get, proxies=proxy)
This has worked for me in all the previous sites have scraped which need authentication.
The problem is that each site has a different way for accepting authentication. So it may
not even work.
It also may be that python is not getting redirected to http://tics-sign.com/displaypanel/login.aspx. curl didn't for me.
Edit:
I looked at the HTML source of your website and I came up with this:
login_data = {"logName": your_id, "pwd": your_password}
request.post(http://tics-sign.com/displaypanel/login.aspx, login_data)
r = request.get(url_get, proxies=proxy)
You can look at my blog for more info.

Remote login to decoupled website with python and requests

I am trying to login to a website www.seek.com.au. I am trying to test the possibility to remote login using Python request module. The site is Front end is designed using React and hence I don't see any form action component in www.seek.com.au/sign-in
When I run the below code, I see the response code as 200 indicating success, but I doubt if it's actually successful. The main concern is which URL to use in case if there is no action element in the login submit form.
import requests
payload = {'email': <username>, 'password': <password>}
url = 'https://www.seek.com.au'
with requests.Session() as s:
response_op = s.post(url, data=payload)
# print the response status code
print(response_op.status_code)
print(response_op.text)
When i examine the output data (response_op.text), i see word 'Sign in' and 'Register' in output which indicate the login failed. If its successful, the users first name will be shown in the place. What am I doing wrong here ?
P.S: I am not trying to scrape data from this website but I am trying to login to a similar website.
Try this code:
import requests
payload={"email": "test#test.com", "password": "passwordtest", "rememberMe": True}
url = "https://www.seek.com.au:443/userapi/login"
with requests.Session() as s:
response_op = s.post(url, json=payload)
# print the response status code
print(response_op.status_code)
print(response_op.text)
You are sending the request to the wrong url.
Hope this helps

Can't login into a webpage using python

I am a python newbie and I've been trying to scrap my University's Website to get a list of my grades with no luck so far.
I try to log in making a POST request with the form-data that's shown in the code and then make a GET request for request_URL. However when I try to print the requested url's content login url's content comes up. So it is pretty clear that I can't get past login page.
import requests
from bs4 import BeautifulSoup
login_URL = 'http://gram-web.ionio.gr/unistudent/login.asp'
request_URL = 'http://gram-web.ionio.gr/unistudent/stud_CResults.asp?studPg=1&mnuid=mnu3&'
payload = {
'userName': 'username',
'pwd': 'password',
'submit1': 'Login',
'loginTrue': 'login'
}
with requests.Session() as session:
post = session.post(login_URL, data=payload)
r = session.get(request_URL)
root = BeautifulSoup(r.text, "html.parser")
print(root)
I assume there is some kind of token value involved in the process because that's what the POST request looks like when I try to log in manually. Has anybody seen this before ? Is this the cause why I cannot log in ?
These are also the request headers.

python requests cannot get html

I tried to get html code from a site name dcinside in Korea, i am using requests but cannot get html code
and this is my code
import requests
url = "http://gall.dcinside.com/board/lists/?id=bitcoins&page=1"
req = requests.get(url)
print (req)
print (req.content)
but the result was
Why I cannot get html codes even using requests??
Most likely they are detecting that you are trying to crawl data dynamically, and not giving any content as a response. Try pretending to be a browser and passing some User-Agent headers.
headers = {
'User-Agent': 'My User Agent 1.0',
'From': 'youremail#domain.com'
}
response = requests.get(url, headers=headers)
# use authentic mozilla or chrome user-agent strings if this doesn't work
Take a look at this:
Python Web Crawlers and "getting" html source code
Like the guy said in the aforementioned post, you should use urllib2 which will allow you to easily obtain web resources.

Can't emulate browser behavior with requests

I'm trying to send a post request to a website to get a json response. I can see the json response in Chrome Inspector when I click on a link, but I can get it using requests.
Firstly I tried to used requests Session to get the cookies first and use them in the post request, to no avail.
session = requests.Session()
session.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
response = session.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997')
print response.text
Secondly I used Selenium+PhantomJS to get the cookies and used them in requests, no results!
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
all_cookie = {}
for cookie in browser.get_cookies():
all_cookie[cookie['name']] = cookie['value']
rep = requests.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997', cookies=all_cookie)
It only works when I manually take the cookies from Chrome.
I can't see what's the problem!
session = requests.Session()
session.get('http://www.auchandrive.fr/drive/pagestatique.pagetemplate.popuphandler.popinchangementmagasin.changermag/537?t:ac=PAGE_STATIQUE_ENGAGEMENTS')
response = session.post('http://www.auchandrive.fr/drive/rayon.productlist.pagination_0.topage/1?t:ac=3686973/3686997')
print(response.json)
Using the json attribute will fetch the JSON response. You can also use requests to make a persistent session, so the cookies are provided.
response.cookies #The cookies attribute

Categories

Resources