I'm completely lost :)
The goal is to log in to a web site that uses OAuth2. However, the section I need has no API associated with it, so I need to log in with just the username and password, navigate to the page in question, and screen-scrape my data.
I'm sure the problem isn't the web site; it's sitting at this keyboard. I've searched for examples and tried a whole bunch of guesses, but nothing is working.
Help would be gratefully accepted.
import sys
import requests
import oauth2 as oauth
r = requests.get(logon_url)
consumer = oauth.Consumer(key=user, secret=password)
client = oauth.Client(consumer)
resp, content = client.request(r.url, "GET")
token_url = resp['content-location']
# At this point I'm lost; I'm just guessing on the rest.
# The next call doesn't give an error, but I'm sure it's wrong.
resp2, content2 = client.request(token_url, 'GET')
# Save the cookie. I do have a cookie, but I'm not sure what I have.
auth_token = resp['set-cookie']
Like so many things, it was just user error.
The code to get me to the page is simple, and the following does the trick. Thanks to Furas for the pointer.
with requests.Session() as s1:
    # get login form
    r = s1.get(logon_url)
    # post the username and password
    resp = s1.post(r.url, data=payload)
    # get the admin page
    resp2 = s1.get(page_url)
Related
I have this webpage: https://www.dsbmobile.de. I would like to build a bot that logs in and gets the newest file. I have an account and login credentials, so I tried this with Python:
import requests
url = "https://www.dsbmobile.de/Login.aspx?ReturnUrl=%2f"
payload = {'txtUser': 'username', 'txtPass': 'password'}
x = requests.post(url, data=payload)
print(x.text)
I get a result, but it's just the login page instead of the page I should be redirected to.
Looking at the source, I saw that there are hidden input fields such as "__EVENTVALIDATION".
Do I need to send them too? Or maybe I need to set something in the headers? It would be very nice if someone could tell me how to write the POST request just as the browser sends it, so that I get the right response.
I am new to this, but it would be extremely useful to me if I could automate that process.
Thank you very much for trying to help me.
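ASP.NET WebForms pages typically check hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION` on postback, so one thing worth trying is to send them back together with the credentials. Below is a minimal sketch that collects the hidden inputs using only the standard library. The sample HTML and the helper names `HiddenInputCollector` and `build_login_payload` are made up for illustration; the field names `txtUser` and `txtPass` come from the question.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_html, username, password):
    """Merge the page's hidden fields with the visible credential fields."""
    collector = HiddenInputCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    payload["txtUser"] = username
    payload["txtPass"] = password
    return payload


# Made-up login page fragment standing in for the real response body.
sample_html = """
<form method="post" action="Login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="txtUser" />
  <input type="password" name="txtPass" />
</form>
"""

payload = build_login_payload(sample_html, "username", "password")
print(payload)
```

With requests, the idea would be to fetch the login page with `session.get(url)`, pass `r.text` to `build_login_payload`, and then call `session.post(url, data=payload)` on the same session so the cookies carry over. The site may also expect the submit button's field; the browser's network tab shows exactly what is sent.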
I'm not quite sure how to explain my issue. I'm trying to scrape a schedule page (of my school) to make it easier to read. Unfortunately, I couldn't figure out how to pass the credentials to the login prompt with Python.
url = "https://www.diltheyschule.de/vertretungsplan/"
or rather this one, since it contains the actual data:
url = "https://www.diltheyschule.de/vertretungsplan/f1/subst_001.htm"
I do know the password and username.
The login prompt looks like this:
As you might have guessed, I want to pass the password and username to this prompt.
This code doesn't work for me; it returns an unauthorized error.
import requests
session = requests.Session()
r = session.post("https://www.diltheyschule.de/vertretungsplan/",data={"log":"xxx","pwd":"xxx"})
#or
r = session.post("https://www.diltheyschule.de/vertretungsplan/f1/subst_001.htm",data={"log":"xxx","pwd":"xxx"})
print(r.content)
output
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>401 Unauthorized</title>
</head><body>
<h1>Unauthorized</h1>
<p>This server could not verify that you
are authorized to access the document
requested. Either you supplied the wrong
credentials (e.g., bad password), or your
browser doesn't understand how to supply
the credentials required.</p>
<hr>
<address>Apache Server at www.diltheyschule.de Port 443</address>
</body></html>
Probably essential information:
The goal is to scrape 'https://www.diltheyschule.de/vertretungsplan/f1/subst_001.htm'
by passing pwd and log to the prompt (most likely without a GUI tool such as Selenium).
This directory is protected by HTTP Basic authentication. It is the simplest authentication method: you log in by sending the appropriate header with each request.
Also, are you sure you want to use the POST method just to see what is in an .html page?
Please try this:
import requests
session = requests.Session()
r = session.get("https://www.diltheyschule.de/vertretungsplan/f1/subst_001.htm",auth=requests.auth.HTTPBasicAuth('user', 'pass'))
print(r.content)
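For reference, Basic auth amounts to sending an `Authorization` header that contains the base64-encoded `user:password` pair, which is what `HTTPBasicAuth` takes care of. A minimal sketch of building that header by hand; the helper name `basic_auth_header` and the credentials are placeholders:

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic auth."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


headers = {"Authorization": basic_auth_header("user", "pass")}
print(headers)
```

Because base64 is an encoding and not encryption, the credentials are only protected if the request goes over HTTPS, as it does for the URL in the question.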
Hi, I have researched this but cannot find any answers to this question. I need to download a subdirectory of a web page into a string so I can search it. I know how to do this, but the only problem is that the site is encrypted and requires a login to access the directory. I know I need to send the cookies with the request for the download, but I am unsure how to do this. I am coding in Python. Feel free to ask for more info.
import urllib
import urllib2
import cookielib
import time
# All your cookie related things are done by this.
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
urllib2.install_opener(opener)
#POST Parameters for login page.
request_body_params = {'your_parameter_name': 'its_value', 'another_parameter_name': 'its_value'}
data_encoding = urllib.urlencode(request_body_params)
url_main = 'https://your_site.com/login'
main_request = urllib2.Request(url_main, data_encoding)
#Any headers required goes here.
main_request.add_header('Accept-encoding', 'gzip')
# This is the response of login. You don't want to read this.
main_response = urllib2.urlopen(main_request)
# You want data from this link.
url_results = 'https://your_site.com/sub_directory'
results_response = urllib2.urlopen(url_results)
print results_response.read()
To check the POST parameters, open the site in a browser, open the browser's developer tools, and go to the 'Network' tab. Then, as you log in, network log entries are generated; click the login request and check its POST parameters and headers.
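The code above is Python 2. On Python 3 the same pieces live in `urllib.request`, `urllib.parse`, and `http.cookiejar`. Here is a rough sketch of the equivalent setup, using the same placeholder URLs and parameter names as above. The helper `fetch_after_login` is a made-up name, and it is defined but not called because the URLs are not real.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# The cookie jar keeps whatever cookies the login response sets.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)

# POST bodies must be bytes on Python 3.
request_body_params = {
    "your_parameter_name": "its_value",
    "another_parameter_name": "its_value",
}
encoded_body = urllib.parse.urlencode(request_body_params).encode("ascii")


def fetch_after_login(login_url, results_url):
    """Log in, then fetch a page using the cookies set by the login."""
    login_request = urllib.request.Request(login_url, data=encoded_body)
    opener.open(login_request).close()
    with opener.open(results_url) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (placeholder URLs, so left commented out):
# html = fetch_after_login("https://your_site.com/login",
#                          "https://your_site.com/sub_directory")
```

Using the opener directly, instead of installing it globally, keeps the cookie handling local to this code.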
Environment - Python 2.7.3, webpy.
I'm trying a simple three-legged OAuth authentication for GitHub using Python web.py. Following the basic OAuth guide on GitHub, I'm doing something like this:
import web, requests
import oauth2, pymongo, json
from oauth2client.client import OAuth2WebServerFlow

urls = ('/', 'githublogin',
        '/session', 'session',
        '/githubcallback', 'githubCallback')

class githublogin:
    def GET(self):
        new_url = 'https://github.com/login/oauth/authorize'
        pay_load = {'client_id': '',
                    'client_secret': '',
                    'scope': 'gist'
                    }
        headers = {'content-type': 'application/json'}
        r = requests.get(new_url, params=pay_load, headers=headers)
        return r.content
This is sending me to the GH login page. Once I sign in - GH is not redirecting me to the callback. The redirect_uri parameter is configured in the github application. I've double checked to make sure that's correct.
class githubCallback:
    def POST(self):
        data = web.data()
        print data
    def GET(self):
        print "callback called"
Instead in the browser I see
http://<hostname>:8080/session
and a 404 message, because I haven't configured the session URL. That's problem no 1. Problem no 2 - If I configure the session URL and print out the post message
class session:
    def POST(self):
        data = web.data()
        print data
    def GET(self):
        print "callback called"
I can see some data posted to the URL, including something called 'authenticity_token'.
I've tried to use the python_oauth2 library but can't get past the authorization_url call, so I've tried the much simpler requests library. Can someone please point out what's going wrong here?
So here's how I solved this. Thanks to @Ivanzuzak for the requestb.in tip.
I'm using Python webpy.
import web, requests
import oauth2, json

urls = ('/', 'githublogin',
        '/githubcallback', 'githubCallback')
render = web.template.render('templates/')

class githublogin:
    def GET(self):
        client_id = ''
        url_string = "https://github.com/login/oauth/authorize?client_id=" + client_id
        return render.index(url_string)

class githubCallback:
    def GET(self):
        data = json.loads(json.dumps(web.input()))
        print data['code']
        headers = {'content-type': 'application/json'}
        pay_load = {'client_id': '',
                    'client_secret': '',
                    'code': data['code']}
        r = requests.post('https://github.com/login/oauth/access_token', data=json.dumps(pay_load), headers=headers)
        token_temp = r.text.split('&')
        token = token_temp[0].split('=')
        access_token = token[1]
        repo_url = 'https://api.github.com/user?access_token=' + access_token
        response = requests.get(repo_url)
        final_data = response.content
        print final_data

app = web.application(urls, globals())
if __name__ == "__main__":
    app.run()
Previously I was not using an HTML file but sending the request directly from the githublogin class, which didn't work. Here I use an HTML page to send the user to GitHub, where they log in. For this I added an HTML template and rendered it using web.py's templating system.
$def with (parameter)
<html>
<head>
</head>
<body>
<p>Well, hello there!</p>
<p>We're going to now talk to the GitHub API. Ready? <a href="$parameter">Click here</a> to begin!</p>
<p>If that link doesn't work, remember to provide your own Client ID!</p>
</body>
</html>
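This file is taken straight from the GitHub developer guide, with just the client_id parameter changed.
Another point to note: in the requests.post call, passing pay_load directly doesn't work here. It has to be serialized using json.dumps, presumably because the request declares a content-type of application/json and a plain dict would be sent form-encoded.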
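The token response in the code above is split by hand on `&` and `=`. Assuming the response body is form-encoded, as that splitting implies, the standard library can parse it more robustly. A small sketch follows; the sample body and the helper name `extract_access_token` are made up, and on Python 2 `parse_qs` lives in the `urlparse` module instead.

```python
from urllib.parse import parse_qs

# Made-up example of a form-encoded token response body.
sample_body = "access_token=abc123&scope=gist&token_type=bearer"


def extract_access_token(body):
    """Return the access_token field, or None if it is missing."""
    fields = parse_qs(body)
    values = fields.get("access_token")
    return values[0] if values else None


print(extract_access_token(sample_body))
print(extract_access_token("error=bad_verification_code"))
```

Returning None for a missing field lets the caller detect an error response, whereas the manual split would treat the error text as if it were the token.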
I'm not sure what the problem is at your end, but try reproducing this flow below, first manually using the browser, and then using your python library. It will help you debug the issue.
1. Create a request bin on http://requestb.in/. A request bin is basically a service that logs all HTTP requests sent to it. You will use this instead of the callback, to log what is being sent to the callback. Copy the URL of the request bin, which is something like http://requestb.in/123a546b
2. Go to your OAuth application setup on GitHub (https://github.com/settings/applications), enter the setup of your specific application, and set the Callback URL to the URL of the request bin you just created.
3. Make a request to the GitHub OAuth page, with the client_id defined. Just enter the URL below into your browser, but change YOUR_CLIENT_ID_HERE to the client id of your OAuth application:
   https://github.com/login/oauth/authorize?client_id=YOUR_CLIENT_ID_HERE
4. Enter your username and password and click Authorize. The GitHub app will then redirect you to the request bin service you created, and the URL in the browser should be something like (notice the code query parameter):
   http://requestb.in/YOUR_REQUEST_BIN_ID?code=GITHUB_CODE
   (for example, http://requestb.in/abc1def2?code=123a456b789cdef)
   Also, the content of the page in the browser should be "ok" (this is the content returned by the request bin service).
5. Go to the request bin page that you created and refresh it. You will now see a log entry for the HTTP GET request that the GitHub OAuth server sent you, together with all the HTTP headers. Basically, you will see there the same code parameter that is present in the URL that you were redirected to. If you get this parameter, you are now ready to make a POST request with this code and your client secret, as described in step 2 of the guide you are using: http://developer.github.com/v3/oauth/#web-application-flow
Let me know if any of these steps are causing problems for you.
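The two URL-handling parts of the flow above, building the authorize URL and reading the `code` parameter from the redirect, can be sketched with the standard library alone. The helper names `build_authorize_url` and `code_from_callback` are made up for illustration, and the example callback URL is the one quoted in the steps.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id, scope=None):
    """Build the URL the user opens in the browser to authorize the app."""
    params = {"client_id": client_id}
    if scope:
        params["scope"] = scope
    return AUTHORIZE_URL + "?" + urlencode(params)


def code_from_callback(callback_url):
    """Extract the 'code' query parameter from the redirect URL, if present."""
    query = parse_qs(urlparse(callback_url).query)
    codes = query.get("code")
    return codes[0] if codes else None


print(build_authorize_url("YOUR_CLIENT_ID_HERE"))
print(code_from_callback("http://requestb.in/abc1def2?code=123a456b789cdef"))
```

If `code_from_callback` returns None when applied to the URL you were redirected to, the problem lies before the token exchange, for example in the callback URL configured for the application.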
I'm trying to create a mirror of specific moderator pages (i.e. restricted ones) of a subreddit on my own server, for transparency purposes. Unfortunately my Python-fu is weak, and after struggling a bit with the reddit API, its Python wrapper, and even some answers on here, I'm no closer to having a working solution.
So what I need to do is log in to reddit as a specific user, access a moderator-only page, and copy its HTML to a file on my own server for others to access.
The problem I'm running into is that the API and its wrapper are not very well documented, so I haven't found out whether there's a way to retrieve a reddit page after logging in. If I can do that, then I could theoretically copy the result to a simple HTML page on my server.
When trying to do it outside the Python API, I can't figure out how to use Python's built-in modules to log in and then read a restricted page.
Any help appreciated.
I don't use PRAW so I'm not sure about that, but if I were to do what you wanted to do, I'd do something like: login, save the modhash, grab the HTML from the url of the place you want to go:
It also looks like it's missing some CSS or something when I save it, but it's recognizable enough as it is. You'll need the requests module, along with pprint and json
import requests, json
from pprint import pprint as pp2

#----------------------------------------------------------------------
def login(username, password):
    """logs into reddit, saves cookie"""
    print('begin log in')
    # username and password
    UP = {'user': username, 'passwd': password, 'api_type': 'json'}
    headers = {'user-agent': '/u/STACKOVERFLOW\'s API python bot'}
    # the session keeps the login cookie for later requests
    client = requests.session()
    # send the custom user-agent with every request made through the session
    client.headers.update(headers)
    # POST with user/pwd
    r = client.post('http://www.reddit.com/api/login', data=UP)
    # if you want to see what you've got so far
    #print(r.text)
    #print(r.cookies)
    # gets and saves the modhash
    j = json.loads(r.text)
    client.modhash = j['json']['data']['modhash']
    print('{USER}\'s modhash is: {mh}'.format(USER=username, mh=client.modhash))
    #pp2(j)
    return client

# USER and PASSWORD must be set to your reddit credentials beforehand
client = login(USER, PASSWORD)
# mod mail url
url = r'http://www.reddit.com/r/mod/about/message/inbox/'
r = client.get(url)
# here's the HTML of the page
pp2(r.text)