Using Python, I'm trying to submit a form to a URL and get a response. This is what I'm doing:
import urllib, urllib2
data = {
    'date': '300186',
    'search_type': 'state',
    'search_state': 'NY',
}
req = urllib2.Request(
    url='https://services.aamc.org/20/mcat/findsite',
    data=urllib.urlencode(data),
    headers={"Content-type": "application/x-www-form-urlencoded"}
)
response = urllib2.urlopen(req)
print(response.read())
However, I'm getting this:
<script>location.replace('https://services.aamc.org/20/mcat');</script>
Which I guess simply means a redirect to the main page... Did I miss something, or is the AAMC website doing this on purpose?
Thanks
EDIT:
So I'm basically trying to connect to the URL "https://services.aamc.org/20/mcat/findsite/findexam?date=300186&search_type=state&search_state=NY",
and this works fine when I enter it in my browser, so I guess there's nothing wrong with the query.
I suppose that you have already logged into the site from your browser, so the site may have put a cookie in your browser to identify you on your next visit and log you in automatically (as stackoverflow.com does). But when you send the request from the Python script, there is nothing to identify you, and you are redirected to the login page (I tried the URL you show and was ...).
You will have to perform the login in your script to get past the login page. For that to work, you will have to add an HTTPCookieProcessor handler to an opener:
import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
then use
response = opener.open(req)
instead of
response = urllib2.urlopen(req)
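For anyone on Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, the same cookie-aware flow looks roughly like this (the URL and form fields are copied from the question; the final open call is left commented out since it needs network access):

```python
import http.cookiejar
import urllib.parse
import urllib.request

# A cookie jar plus an opener that stores and replays cookies automatically.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# In Python 3 the POST body must be bytes, hence the .encode().
data = urllib.parse.urlencode({
    'date': '300186',
    'search_type': 'state',
    'search_state': 'NY',
}).encode('ascii')

req = urllib.request.Request(
    url='https://services.aamc.org/20/mcat/findsite',
    data=data,
    headers={'Content-type': 'application/x-www-form-urlencoded'},
)
# response = opener.open(req)  # any cookies set by the server land in cj
```

Any follow-up requests made through the same opener will carry the cookies the login response sets.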
Related
So I have been trying to solve this for the past 3 days and still can't figure out why.
I'm trying to access the HTML of this site, which requires logging in first.
I tried every way I could think of, and they all fail with the same problem.
Here is what I tried:
response = requests.get('https://de-legalization.tlscontact.com/eg/CAI/myapp.php', headers=headers, params=params, cookies=cookies)
print(response.content)
payload = {
    '_token': 'TOKEN HERE',
    'email': 'EMAIL HERE',
    'pwd': 'PASSWORDHERE',
    'client_token': 'CLIENT_TOKEN HERE'
}
with requests.Session() as s:
    r = s.post(login_url, data=payload)
    print(r.text)
I also tried using urllib, but every attempt returns this:
<script>window.location="https://de-legalization.tlscontact.com/eg/CAI/index.php";</script>
Does anyone know why this is happening?
Also, here is the URL of the page I want the HTML of:
https://de-legalization.tlscontact.com/eg/CAI/myapp.php
You see this particular output because it is in fact the content of the page you are downloading.
You can test it in Chrome by opening the following URL:
view-source:https://de-legalization.tlscontact.com/eg/CAI/myapp.php
This is happening because you are being redirected by the javascript code on the page.
Since the page you are trying to access requires login, you cannot reach it just by sending an HTTP request to the internal page.
You either need to extract all the cookies from your logged-in browser session and add them to the Python script,
or you need to use a tool like Selenium that allows you to control a browser from your Python code.
Here you can find how to extract all the cookies from the browser session:
How to copy cookies in Google Chrome?
Here you can find how to add cookies to the http request in Python:
import requests
cookies = {'enwiki_session': '17ab96bd8ffbe8ca58a78657a918558'}
r = requests.post('http://wikipedia.org', cookies=cookies)
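If you copy the whole Cookie header out of Chrome's DevTools rather than individual values, a small helper can turn it into the dict that requests expects. This is just a sketch; the header string below is invented:

```python
def cookie_header_to_dict(header):
    """Split a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in header.split(';'):
        name, _, value = part.strip().partition('=')
        if name:
            cookies[name] = value
    return cookies

# Invented example header, as copied from DevTools -> Network -> Request Headers.
raw = 'PHPSESSID=abc123; remember_me=1'
print(cookie_header_to_dict(raw))  # {'PHPSESSID': 'abc123', 'remember_me': '1'}
```

The resulting dict can be passed straight to `requests.get(..., cookies=...)` as in the snippet above.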
I am trying to log in to the website www.seek.com.au to test whether a remote login is possible using Python's requests module. The site's front end is built with React, so I don't see any form action attribute on www.seek.com.au/sign-in.
When I run the code below, I see response code 200, indicating success, but I doubt the login actually succeeded. My main concern is which URL to use when there is no action element in the login form.
import requests
payload = {'email': <username>, 'password': <password>}
url = 'https://www.seek.com.au'
with requests.Session() as s:
    response_op = s.post(url, data=payload)
    # print the response status code
    print(response_op.status_code)
    print(response_op.text)
When I examine the output (response_op.text), I see the words 'Sign in' and 'Register', which indicates the login failed. If it were successful, the user's first name would be shown in their place. What am I doing wrong here?
P.S.: I am not trying to scrape data from this website; I am trying to log in to a similar website.
Try this code:
import requests
payload = {"email": "test@test.com", "password": "passwordtest", "rememberMe": True}
url = "https://www.seek.com.au:443/userapi/login"
with requests.Session() as s:
    response_op = s.post(url, json=payload)
    # print the response status code
    print(response_op.status_code)
    print(response_op.text)
You are sending the request to the wrong URL.
Hope this helps!
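One way to sanity-check a login call like this, without touching the server, is to prepare the request and inspect exactly what requests would put on the wire. The credentials below are the dummy values from the answer:

```python
import requests

# Build the request but don't send it; prepare() resolves the final
# URL, headers, and serialized body.
payload = {"email": "test@test.com", "password": "passwordtest", "rememberMe": True}
req = requests.Request("POST", "https://www.seek.com.au/userapi/login", json=payload)
prepared = req.prepare()

# json= sets the Content-Type and serializes the body for you.
print(prepared.method, prepared.url)
print(prepared.headers["Content-Type"])  # application/json
```

Comparing this output against the request your browser sends (DevTools, Network tab) is usually the quickest way to find the right endpoint and body format.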
I am a Python newbie and I've been trying to scrape my university's website to get a list of my grades, with no luck so far.
I try to log in by making a POST request with the form data shown in the code, and then make a GET request for request_URL. However, when I print the requested URL's content, the login URL's content comes up, so it is pretty clear that I can't get past the login page.
import requests
from bs4 import BeautifulSoup
login_URL = 'http://gram-web.ionio.gr/unistudent/login.asp'
request_URL = 'http://gram-web.ionio.gr/unistudent/stud_CResults.asp?studPg=1&mnuid=mnu3&'
payload = {
    'userName': 'username',
    'pwd': 'password',
    'submit1': 'Login',
    'loginTrue': 'login'
}
with requests.Session() as session:
    post = session.post(login_URL, data=payload)
    r = session.get(request_URL)
    root = BeautifulSoup(r.text, "html.parser")
    print(root)
I assume there is some kind of token value involved in the process, because that's what the POST request looks like when I try to log in manually. Has anybody seen this before? Is this why I cannot log in?
These are also the request headers.
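If a hidden token is indeed involved, the usual pattern is: GET the login page inside the same session, parse the hidden input out of the form, and include it in the POST payload. A sketch with BeautifulSoup; the HTML string stands in for `session.get(login_URL).text`, and the field name 'token' is made up (the real name has to be read from the actual login form):

```python
from bs4 import BeautifulSoup

# Stand-in for the login page HTML fetched with session.get(login_URL).text.
html = '<form><input type="hidden" name="token" value="abc123"></form>'

root = BeautifulSoup(html, 'html.parser')
token = root.find('input', {'name': 'token'})['value']

# Add the extracted token to the form fields from the question.
payload = {'userName': 'username', 'pwd': 'password', 'token': token}
print(payload['token'])  # abc123
```

The GET and the POST must go through the same `requests.Session()`, since the server typically ties the token to the session cookie it sets on the first request.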
I have a script for Python 2 that logs into a webpage and then moves around to reach a couple of files pointed to on the same site, but on different pages. Python 2 let me open the site with my credentials and then use opener.open() to keep the session available while navigating to the other pages.
Here's the code that worked in Python 2:
# Your admin login and password
import urllib
import urllib2
import cookielib

LOGIN = "*******"
PASSWORD = "********"
ROOT = "https:*********"

# The client has to take care of the cookies.
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# POST login query on '/login_handler' (post data are 'login' and 'password').
req = urllib2.Request(ROOT + "/login_handler",
                      urllib.urlencode({'login': LOGIN,
                                        'password': PASSWORD}))
opener.open(req)

# Set the right accountcode
for accountcode, queues in QUEUES.items():
    req = urllib2.Request(ROOT + "/switch_to" + accountcode)
    opener.open(req)
I need to do the same thing in Python 3. I have tried the requests module and urllib, and although I can perform the initial login, I don't know how to keep the opener around to navigate the site. I found OpenerDirector, but I haven't managed to make it work.
I have used some Python 3 code to get the desired result, but unfortunately I can't get the CSV file to print.
Question: I don't know how to keep the opener to navigate the site.
Python 3.6 documentation: urllib.request.build_opener
Use of Basic HTTP Authentication:
import urllib.request

# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)

f = urllib.request.urlopen('http://www.example.com/login.html')
csv_content = f.read()
Use the requests library for Python 3, with a Session:
http://docs.python-requests.org/en/master/user/advanced/#session-objects
Once you log in, your session is managed automatically; you don't need to create your own cookie jar. Following is sample code:
import requests

s = requests.Session()
auth = {"login": LOGIN, "pass": PASS}
url = ROOT + "/login_handler"
r = s.post(url, data=auth)
print(r.status_code)

for accountcode, queues in QUEUES.items():
    req = s.get(ROOT + "/switch_to" + accountcode)
    print(req.text)  # response text
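You can see the automatic cookie handling at work without any network access by planting a cookie on the session and preparing (not sending) a follow-up request. The cookie name, value, and URL below are made up:

```python
import requests

s = requests.Session()
# Normally the server sets this at login; we plant it by hand for the demo.
s.cookies.set('sessionid', 'abc123', domain='example.com', path='/')

# Preparing a later request shows the session attaches the cookie for us.
req = requests.Request('GET', 'http://example.com/switch_to/42')
prepared = s.prepare_request(req)
print(prepared.headers.get('Cookie'))  # sessionid=abc123
```

This is exactly what happens between the login POST and the later GETs in the sample above: the jar lives on the Session, so every request through it carries the cookies.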
I am trying to log in to a website and do automated clean-up jobs.
The site where I need to login is : http://site.com/Account/LogOn
I tried various code samples that I found on Stack Overflow, like Login to website using python (but I'm stuck on this line
session = requests.session(config={'verbose': sys.stderr})
where my JetBrains IDE doesn't like 'verbose', telling me that I need to change something but not explaining exactly what).
I also tried this: Browser simulation - Python, but no luck with that either.
Can anyone help me? All answers will be appreciated. Thanks in advance.
PS: I started learning Python 2 weeks ago, so please elaborate your answer for my "pro" level of understanding :)
-------------------------UPDATE:-----------------------------
I managed to log in, but when I try to move to another page and push a button, it says Please Log in!
I use this code:
import urllib
import urllib2
import cookielib

url = 'http://site.com/Account/LogOn'
values = {'UserName': 'user',
          'Password': 'pass'}
data = urllib.urlencode(values)

cookies = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(cookies))

response = opener.open(url, data)
the_page = response.read()
http_headers = response.info()
print response
After I log in I need to switch a menu value, which looks like this in the HTML:
<select id="menu_uid" name="menu_uid" onchange="swapTool()" style="font-size:8pt;width:120px;">
<option value="1" selected>MyProfile</option>
...
<option value="6" >DeleteTree</option>
but I can also do it directly if I form a URL like this:
http://site.com/Account/management.html?Category=6&deltreeid=6&do=Delete+Tree
So, how can I build this URL and submit it? Thanks again!
Save yourself a lot of headache and use requests:
import requests

url = 'http://site.com/Account/LogOn'
values = {'UserName': 'user',
          'Password': 'pass'}
r = requests.post(url, data=values)
# Now you have logged in
params = {'Category': 6, 'deltreeid': 6, 'do': 'Delete Tree'}
url = 'http://site.com/Account/management.html'
# send the query string, and the session cookies as well
result = requests.get(url, params=params, cookies=r.cookies)
Well, first things first:
it sends a POST request to /Account/LogOn.
The fields are called UserName and Password.
Then you can use Python's httplib to make the HTTP requests:
http://docs.python.org/2/library/httplib.html
(there is an example at the end of how to do a POST).
Then you will get a response, probably containing a session cookie in an HTTP header. You need to store that cookie in a variable and send it in all subsequent requests to stay authenticated.
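The manual cookie bookkeeping can be sketched with the standard library's cookie parser (Python 3 naming shown here; in Python 2 the module is called Cookie). The Set-Cookie value below is invented:

```python
from http.cookies import SimpleCookie

# Parse the Set-Cookie header from the login response...
set_cookie = 'ASP.NET_SessionId=abc123; path=/; HttpOnly'
jar = SimpleCookie()
jar.load(set_cookie)

# ...and rebuild the Cookie header to replay on every later request.
cookie_header = '; '.join('%s=%s' % (name, morsel.value)
                          for name, morsel in jar.items())
print(cookie_header)  # ASP.NET_SessionId=abc123
```

With httplib you would then pass `{'Cookie': cookie_header}` in the headers dict of each subsequent request.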