I want to download an Affymetrix annotation file, but the site requires logging in first.
The login page is https://www.affymetrix.com/estore/user/login.jsp
The file I want to download is:
http://www.affymetrix.com/Auth/analysis/downloads/na32/genotyping/GenomeWideSNP_6.na32.annot.db.zip
I have tried a few methods but cannot figure it out.
from requests import session

payload = {
    'action': 'login',
    'username': 'username',  # this part should be changed
    'password': 'password'   # this part should be changed
}

with session() as c:
    c.post('https://www.affymetrix.com/estore/user/login.jsp', data=payload)
    request = c.get('http://www.affymetrix.com/Auth/analysis/downloads/na32/genotyping/GenomeWideSNP_6.na32.annot.db.zip')
    print request.headers
    print request.text
I also tried urllib2:
import urllib, urllib2, cookielib
username = 'username'
password = 'password'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('https://www.affymetrix.com/estore/user/login.jsp', login_data)
resp = opener.open('http://www.affymetrix.com/Auth/analysis/downloads/na32/genotyping/GenomeWideSNP_6.na32.annot.db.zip')
resp.read()
Here's the URL that the information is getting posted to:
https://www.affymetrix.com/estore/user/login.jsp?_DARGS=/estore/user/login.jsp
and here is the information that is being posted:
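For what it's worth, the usual fix for logins like this is to POST to that _DARGS URL with every field the browser sends (hidden inputs included), then reuse the same session for the download. A minimal sketch, with placeholder field names you would replace with the real ones from your capture:

import requests

LOGIN_PAGE = 'https://www.affymetrix.com/estore/user/login.jsp'
POST_URL = LOGIN_PAGE + '?_DARGS=/estore/user/login.jsp'
FILE_URL = ('http://www.affymetrix.com/Auth/analysis/downloads/'
            'na32/genotyping/GenomeWideSNP_6.na32.annot.db.zip')

with requests.Session() as s:
    # Visit the login page first so any session cookies get set
    s.get(LOGIN_PAGE)
    # Placeholder field names: copy the exact names (including any
    # hidden inputs) from the POST your browser actually sends
    payload = {'username': 'username', 'password': 'password'}
    s.post(POST_URL, data=payload)
    # Stream the zip to disk instead of printing it
    r = s.get(FILE_URL, stream=True)
    r.raise_for_status()
    with open('GenomeWideSNP_6.na32.annot.db.zip', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)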
Hi, I am trying to log in to an Outlook Web App site using a Python web crawler, but I cannot get past the login page. From what I noticed, the site redirects on the GET request and sets a cookie, namely OutlookSession. The POST request then goes to the same URL carrying this cookie, which is why I am using requests.Session().
This is my code:
import requests
URL = "https://mail.guc.edu.eg/owa"
username = "username"
password = "password"
s = requests.Session()
s.get(URL)
login_data={"username":username, "password":password}
r = s.post("https://mail.guc.edu.eg/owa", data=login_data)
To expand on A Magoon's answer, there happen to be three additional form fields that OWA expects. This is what worked for me:
import requests

owa_login_form_url = 'https://mail.yourdomain.com/owa'
user_name = 'user'
pwd = 'pwd'
flags = '4'
forcedownlevel = '0'

sess = requests.Session()
payload = {
    'username': user_name,
    'password': pwd,
    'destination': owa_login_form_url,
    'flags': flags,
    'forcedownlevel': forcedownlevel,
}
resp = sess.post(owa_login_form_url + '/auth.owa', data=payload)
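If it helps, you can sanity-check the login by reusing the same session on a page behind the auth wall; a sketch continuing the snippet above (the success heuristic is an assumption and varies by OWA version):

# Reuse the authenticated session; if the response still looks like
# the login form (which posts to auth.owa), the login did not take.
check = sess.get(owa_login_form_url)
print(check.status_code)
print('auth.owa' not in check.text)  # heuristic: True suggests we are past the form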
It looks like the form posts to https://mail.guc.edu.eg/owa/auth.owa.
import requests
URL = "https://mail.guc.edu.eg/owa"
username = "username"
password = "password"
s = requests.Session()
s.get(URL)
login_data={"username":username, "password":password}
r = s.post("https://mail.guc.edu.eg/owa/auth.owa", data=login_data)
I am new to Python and web scraping, and I am trying to write a very basic script that will get data from a webpage that can only be accessed after logging in. I have looked at a bunch of different examples, but none fix the issue. This is what I have so far:
from bs4 import BeautifulSoup
import urllib, urllib2, cookielib
username = 'name'
password = 'pass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('WebpageWithLoginForm')
resp = opener.open('WebpageIWantToAccess')
soup = BeautifulSoup(resp, 'html.parser')
print soup.prettify()
As of right now, when I print the page it just prints the contents as if I were not logged in. I think the issue has something to do with the way I am setting the cookies, but I am really not sure, because I do not fully understand what is happening with the cookie processor and its libraries.
Thank you!
Current Code:
import requests

EMAIL = 'usr'
PASSWORD = 'pass'
URL = 'https://connect.lehigh.edu/app/login'

def main():
    # Start a session so we can have persistent cookies
    session = requests.Session()
    # This is the form data that the page sends when logging in
    login_data = {
        'username': EMAIL,
        'password': PASSWORD,
        'LOGIN': 'login',
    }
    # Authenticate
    r = session.post(URL, data=login_data)
    # Try accessing a page that requires you to be logged in
    r = session.get('https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl')

if __name__ == '__main__':
    main()
You can use the requests module. Take a look at the answer I've linked below:
https://stackoverflow.com/a/8316989/6464893
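One detail worth flagging in the urllib2 version above: login_data is built but never passed to opener.open(), so no credentials are ever sent. A sketch of the same flow with requests, keeping the question's placeholder URLs:

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    # Actually send the credentials; the session keeps whatever
    # cookies the server sets, so the next GET is authenticated
    s.post('WebpageWithLoginForm',
           data={'username': 'name', 'password': 'pass'})
    resp = s.get('WebpageIWantToAccess')

soup = BeautifulSoup(resp.text, 'html.parser')
print(soup.prettify())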
I am using the requests and cfscrape libraries to log in to https://kissanime.to/Login
import requests
import cfscrape

def login(self, usr, pw):
    '''Login to website'''
    login_url = 'https://kissanime.to/Login'
    sess = requests.Session()
    # login credentials
    payload = {
        'username': usr,
        'password': pw,
        'redirect': ''
    }
    # Create a cfscrape instance of the session
    scraper_sess = cfscrape.create_scraper(sess)
    a = scraper_sess.post(login_url, data=payload)
    print(a.text)
    print(a.status_code)
a.text gives me the same login page and a.status_code gives me 200.
That means my login is not working at all. Am I missing something? According to Chrome's network monitor, I should also get status code 302.
POST DATA image:
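As an aside, requests records redirect hops, so you can check whether that 302 ever happens; a sketch continuing the login() snippet above:

# POST requests follow redirects by default; each intermediate
# response (the expected 302) is kept in r.history.
r = scraper_sess.post(login_url, data=payload)
print(r.status_code)                 # final status after redirects
for hop in r.history:
    print(hop.status_code, hop.url)  # a 302 here suggests the login worked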
I solved it using mechanicalsoup
Code:
import cfscrape
import mechanicalsoup
from bs4 import BeautifulSoup

def login(self, usr, pw):
    '''Login to website'''
    login_url = 'https://kissanime.to/Login'
    # Create a cfscrape instance to get past Cloudflare
    self.r = cfscrape.create_scraper()
    login_page = self.r.get(login_url)
    # Create a mechanicalsoup browser backed by the cfscrape session
    browser = mechanicalsoup.Browser(self.r)
    soup = BeautifulSoup(login_page.text, 'html.parser')
    # grab the login form
    login_form = soup.find('form', {'id': 'formLogin'})
    # fill in the login and password inputs
    login_form.find('input', {'name': 'username'})['value'] = usr
    login_form.find('input', {'name': 'password'})['value'] = pw
    browser.submit(login_form, login_page.url)
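For what it's worth, newer mechanicalsoup releases wrap this find-fill-submit dance in a StatefulBrowser; a rough equivalent (still backed by the cfscrape session to get past Cloudflare):

import cfscrape
import mechanicalsoup

# StatefulBrowser accepts a session, so the Cloudflare-capable
# cfscrape session can back it
browser = mechanicalsoup.StatefulBrowser(session=cfscrape.create_scraper())
browser.open('https://kissanime.to/Login')
browser.select_form('#formLogin')  # CSS selector for the login form
browser['username'] = 'usr'
browser['password'] = 'pw'
response = browser.submit_selected()
print(response.status_code)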
This content is from the Requests documentation:
Many web services that require authentication accept HTTP Basic Auth. This is the simplest kind, and Requests supports it straight out of the box.
import requests
from requests.auth import HTTPBasicAuth

requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
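Requests also accepts a plain (user, pass) tuple as shorthand for basic auth, which the documentation recommends for the common case:

import requests

# A tuple is treated as HTTPBasicAuth(user, pass)
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
print(r.status_code)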
You have to send the payload as JSON:
import requests
import json
import cfscrape

def login(self, usr, pw):
    '''Login to website'''
    login_url = 'https://kissanime.to/Login'
    sess = requests.Session()
    # login credentials
    payload = {
        'username': usr,
        'password': pw,
        'redirect': ''
    }
    # Create a cfscrape instance of the session
    scraper_sess = cfscrape.create_scraper(sess)
    a = scraper_sess.post(login_url, data=json.dumps(payload))
    print(a.text)
    print(a.status_code)
Reference: http://docs.python-requests.org/en/master/user/authentication/
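If the endpoint really expects JSON, note that requests can serialize for you: the json= keyword sends the same body and also sets the Content-Type: application/json header, which data=json.dumps(payload) does not. Continuing the snippet above:

# Equivalent body, but with Content-Type: application/json set for you
a = scraper_sess.post(login_url, json=payload)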
I am trying to log in to this website with the following code:
import requests

def ReadNeopets():
    username = 'notarealone'
    LoginUrl = 'http://www.neopets.com/login/'
    with requests.Session() as s:
        s.get(LoginUrl)
        payload = {'destination': '', 'username': username, 'password': 'notrealeither'}
        print('Logging in to the site.')
        r = s.post(LoginUrl, data=payload)
        print(r.content)
        r = s.get('http://neopets.com/bank.phtml')
        Text = r.content.decode()
        print(r.cookies)
        if username in Text:
            print("Logged into my bank")
        elif "Sign up" in Text:
            print("Failed to log in.")
        return

def main():
    ReadNeopets()

if __name__ == "__main__":
    main()
Unfortunately it seems that the POST does not work: the check for the presence of the username fails, as if the login was not successful or was never attempted.
I am not certain what is happening here and would like to understand it better, as I am trying to move away from urllib and urllib2.
I added the cookie print and the text print to help spot any errors I could try to comprehend, but unfortunately nothing stood out.
The POST request goes to a different URL; once you change the data as below, your login will be successful:
login_url = 'http://www.neopets.com/login.phtml'

with requests.Session() as s:
    payload = {'destination': '', 'username': username, 'password': 'notrealeither'}
    r = s.post(login_url, data=payload)
    print(r.text)
If you want to see how a request works, open Firebug or your browser's developer tools; for this request you can see it under the Other tab:
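With that corrected URL, the bank-page check from the question then works against the same logged-in session; a short continuation:

# Still inside the same `with requests.Session() as s:` block
r = s.get('http://neopets.com/bank.phtml')
if username in r.text:
    print("Logged into my bank")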
I know my question may not be a very good one, but as a person who is new to Python I have a question:
I wrote some Python code that logs me in to my page:
import urllib, urllib2, cookielib
email = 'myuser'
password = 'mypass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'email' : email, 'password' : password})
opener.open('http://test.com/signin', login_data)
resp = opener.open('http://test.com/dashboard')
print resp.read()
and now I am connected to my page.
This is my Tamper Data capture from when I send a message to the site:
How can I send "hello" with Python now?
Could you possibly complete the code and tell me how it is done?
UPDATE
I changed my code like so:
import requests

url1 = 'http://test.com/signin'
data1 = {
    'email': 'user',
    'password': 'pass',
}
requests.post(url1, data=data1)

url2 = 'http://test.com/dashboard'
data2 = {
    'post_temp_id': '61jm5by188',
    'message': 'hello',
}
requests.post(url2, data=data2)
But there was no result.
Thank you
Although you could start off using urllib, you'll be happier using requests. How to use the POST method:
import requests
resp = requests.post('http://test.com/dashboard', data={'post_temp_id': '61jm5by188', 'message': 'hello'})
Pretty simple, right? Dictionaries can be used to define headers, cookies, and whatever else you'd want to include in your request. Most requests will only need a single line of code.
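For instance, headers and cookies ride along as keyword arguments, still in one call; a small sketch with made-up values:

import requests

resp = requests.post('http://test.com/dashboard',
                     data={'post_temp_id': '61jm5by188', 'message': 'hello'},
                     headers={'User-Agent': 'my-script/0.1'},
                     cookies={'sessionid': 'value-from-login'})
print(resp.status_code)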
EDIT1: I don't have a test.com account, but you may try using this script to test out the POST method. This website will echo what you submit in the form, and the script should get you the same response:
import requests
resp = requests.post('http://hroch486.icpf.cas.cz/cgi-bin/echo.pl',
data={'your_name': 'myname',
'fruit': ['Banana', 'Lemon', 'Plum']})
idx1 = resp.text.index('Parsed values')
idx2 = resp.text.index('No cookies')
print resp.text[idx1:idx2]
From the HTML you received, here's what you should see:
Parsed values</H2>
<UL>
<LI>fruit:
<UL compact type=square>
<LI>Banana
<LI>Lemon
<LI>Plum
</UL>
<LI>your_name = myname
</UL>
<H2>
EDIT2: How to use a session object:
from requests import Session
s = Session()
# Don't just copy this; set your data accordingly...
url1 = url2 = data1 = data2 = ...
resp1 = s.post(url1, data=data1)
resp2 = s.post(url2, data=data2)
The advantage of a session object is that it stores any cookies and headers from previous responses.
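A quick way to see that persistence in action, using httpbin (an echo service the requests docs themselves use):

import requests

s = requests.Session()
# The first request sets a cookie; the session stores it
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
# The second request sends it back automatically
r = s.get('https://httpbin.org/cookies')
print(r.text)  # {"cookies": {"sessioncookie": "123456789"}}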