I'm trying to write a simple script to log into Wikipedia and perform some actions on my user page, using the MediaWiki API. However, I never seem to get past the first login request (from this page: https://en.wikipedia.org/wiki/Wikipedia:Creating_a_bot#Logging_in). I don't think the session cookie that I set is being sent. This is my code so far:
import Cookie, urllib, urllib2, xml.etree.ElementTree
url = 'https://en.wikipedia.org/w/api.php?action=login&format=xml'
username = 'user'
password = 'password'
user_data = [('lgname', username), ('lgpassword', password)]
#Login step 1
#Make the POST request
request = urllib2.Request(url)
data = urllib.urlencode(user_data)
login_raw_data1 = urllib2.urlopen(request, data).read()
#Parse the XML for the login information
login_data1 = xml.etree.ElementTree.fromstring(login_raw_data1)
login_tag = login_data1.find('login')
token = login_tag.attrib['token']
cookieprefix = login_tag.attrib['cookieprefix']
sessionid = login_tag.attrib['sessionid']
#Set the cookies
cookie = Cookie.SimpleCookie()
cookie[cookieprefix + '_session'] = sessionid
#Login step 2
request = urllib2.Request(url)
session_cookie_header = cookieprefix+'_session='+sessionid+'; path=/; domain=.wikipedia.org; HttpOnly'
request.add_header('Set-Cookie', session_cookie_header)
user_data.append(('lgtoken', token))
data = urllib.urlencode(user_data)
login_raw_data2 = urllib2.urlopen(request, data).read()
I think the problem is somewhere in the request.add_header('Set-Cookie', session_cookie_header) line, but I don't know for sure. How do I use these Python libraries to send cookies in the header with every request (which is necessary for a lot of API functions)?
The latest version of requests has support for sessions (as well as being really simple to use and generally great). Incidentally, Set-Cookie is a response header; requests carry cookies in a Cookie header, which is why setting Set-Cookie on the outgoing request doesn't work.
with requests.session() as s:
    s.post(url, data=user_data)
    r = s.get(url_2)
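Applied to your MediaWiki login, a minimal sketch (same action=login URL and lgname/lgpassword/lgtoken parameters as in your code; the session handles the cookies between the two steps):

import requests
import xml.etree.ElementTree

url = 'https://en.wikipedia.org/w/api.php?action=login&format=xml'
user_data = {'lgname': 'user', 'lgpassword': 'password'}

with requests.session() as s:
    # Step 1: post the credentials to get a login token;
    # the session stores the returned cookies automatically
    reply = s.post(url, data=user_data)
    login_tag = xml.etree.ElementTree.fromstring(reply.content).find('login')
    # Step 2: resend the credentials together with the token,
    # on the same session so the cookies go along
    user_data['lgtoken'] = login_tag.attrib['token']
    reply = s.post(url, data=user_data)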
So I have a username and password, and I have a Client-ID. With JSON I can just create a header dict to add these in and do requests.post(endpoint, json=payload, headers=my_headers), but for SOAP the header fields are {'content-type': 'application/soap+xml'} and auth= takes the username and password. So what happens to my Client-ID, and where do I initialize it?
First I will start off with what I tried using Zeep for Python.
from requests.auth import HTTPBasicAuth
from requests import Session
from zeep.transports import Transport
from zeep import Client, Settings
#Client-ID
client_id = '7777'
username = 'jj'
password = 'tt'
#this endpoint ends with .svc; not sure if that matters, but even if I add ?wsdl I get a 400 error
wsdl = 'endpoint_url_.svc'
settings = Settings(strict=False, xml_huge_tree=True)
session = Session()
session.auth = HTTPBasicAuth(username, password)
client = Client(wsdl, settings=settings,  # Client() takes the endpoint here, not the Client-ID
                transport=Transport(session=session))
So Zeep says to run python -m zeep wsdl to get the SOAP methods, but this is my output:
Prefixes:
xsd: http://www.w3.org/2001/XMLSchema
Global elements:
Global types:
xsd:anyType
xsd:ENTITIES
xsd:ENTITY
xsd:ID
xsd:IDREF
xsd:IDREFS
xsd:NCName
xsd:NMTOKEN
xsd:NMTOKENS
xsd:NOTATION
xsd:Name
xsd:QName
xsd:anySimpleType
xsd:anyURI
xsd:base64Binary
xsd:boolean
xsd:byte
xsd:date
xsd:dateTime
xsd:decimal
xsd:double
xsd:duration
xsd:float
xsd:gDay
xsd:gMonth
xsd:gMonthDay
xsd:gYear
xsd:gYearMonth
xsd:hexBinary
xsd:int
xsd:integer
xsd:language
xsd:long
xsd:negativeInteger
xsd:nonNegativeInteger
xsd:nonPositiveInteger
xsd:normalizedString
xsd:positiveInteger
xsd:short
xsd:string
xsd:time
xsd:token
xsd:unsignedByte
xsd:unsignedInt
xsd:unsignedLong
xsd:unsignedShort
Bindings:
So even if I do this, none of my methods show up under client.service.
result = client.service.(some method)
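If the WSDL had loaded its bindings, I assume I could try passing the Client-ID as a SOAP header via Zeep's _soapheaders argument. The header element name and namespace below are made up; the real ones would have to come from the service's WSDL:

from zeep import xsd

# Made-up header element; the real name/namespace would come from the WSDL
client_id_header = xsd.Element(
    '{http://example.com/ns}ClientID',
    xsd.ComplexType([
        xsd.Element('{http://example.com/ns}Value', xsd.String()),
    ]),
)
result = client.service.SomeMethod(_soapheaders=[client_id_header(Value=client_id)])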
Second attempt, without Zeep. Now for some reason the SOAP data example for the body looks like this:
import requests
from requests.auth import HTTPBasicAuth
soap_body = '''
1
D
0
ex
5
'''
From the examples and SOAP calls I've read and seen online, it is missing the envelope, body, and header, but let's assume this is the correct XML for making the POST request.
headers = {'content-type': 'application/soap+xml'}
url = 'endpoint_url.svc'
username = 'jj'
password= 'tt'
client_id = '7777'
response = requests.post(url, data=soap_body, headers=headers, auth=HTTPBasicAuth(username, password))
print(response)
print(response.content)
Output I get:
<Response [415]>
b''
requests.post doesn't seem to take a Client-ID field. How would I initialize this first? Any suggestions or recommendations are appreciated. Thank you for your time.
I couldn't find this documented anywhere, but the Client-ID can be placed inside the headers as well: headers = {'content-type': 'application/soap+xml', 'Client-ID': '123'}
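For example, a minimal sketch reusing the placeholder endpoint, payload, and credentials from the question:

import requests
from requests.auth import HTTPBasicAuth

url = 'endpoint_url.svc'                          # placeholder endpoint
soap_body = '<soap:Envelope>...</soap:Envelope>'  # your actual SOAP payload
# The Client-ID travels as just another request header alongside the content type
headers = {'content-type': 'application/soap+xml', 'Client-ID': '7777'}
response = requests.post(url, data=soap_body, headers=headers,
                         auth=HTTPBasicAuth('jj', 'tt'))
print(response.status_code)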
Trying to send a POST request with the cookies from a GET request on my PC.
#! /usr/bin/python
import re #regex
import urllib
import urllib2
#get request
x = urllib2.urlopen("http://www.example.com) #GET Request
cookies=x.headers['set-cookie'] #to get the cookies from get request
url = 'http://example'  # to find these form values, try logging in with any password and inspect the request
password = 'password'   # placeholder
values = {"username" : "admin",
          "passwd" : password,
          "lang" : "",
          "option" : "com_login",
          "task" : "login",
          "return" : "aW5kZXgucGhw"}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
result = response.read()
cookies=response.headers['set-cookie'] #to get the last cookies from post req in this variable
Then I searched Google for how to send cookies inside the same POST request and found:
opener = urllib2.build_opener() # send the cookies
opener.addheaders.append(('Cookie', cookies)) # send the cookies
f = opener.open("http://example")
but I don't know exactly where I should put it in my code.
What I need to do exactly is:
send a GET request, put the cookies from that request in a variable, then make a POST request with the cookies I got from the GET request.
If anyone knows the answer, I need an edit to my code.
Just create an HTTP opener with a cookie-jar handler; cookies will then be retrieved and passed along to the next request automatically. See:
import urllib2 as net
import cookielib
import urllib
cookiejar = cookielib.CookieJar()
cookiejar.clear_session_cookies()
opener = net.build_opener(net.HTTPCookieProcessor(cookiejar))
data = urllib.urlencode(values)
request = net.Request(url, data)  # data is already urlencoded above; don't encode it twice
response = opener.open(request)
Since the opener acts as a global handler, just make any request and the cookies set by previous responses will be included in the next request (POST/GET) automatically.
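For example, a second request through the same opener will carry whatever cookies the first response set (placeholder URL):

# Any later request through the same opener replays the stored cookies
follow_up = opener.open('http://example/another_page')
print follow_up.read()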
You should really look into the requests library Python has to offer. All you need to do is make a dictionary for your cookie key/value pairs and pass it in as an arg.
Your entire code could be replaced by
import requests
url = 'http://example'  # to find these form values, try logging in with any password and inspect the request
password = 'password'   # placeholder
values = {"username" : "admin",
          "passwd" : password,
          "lang" : "",
          "option" : "com_login",
          "task" : "login",
          "return" : "aW5kZXgucGhw"}
session = requests.Session()
response = session.get(url, data=values)
cookies = session.cookies.get_dict()
response = requests.post(url, data=values, cookies=cookies)
The second piece of code is probably what you want, but depends on the format of the response.
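That said, the explicit cookies= argument is only needed if you step outside the session; the Session object already stores the cookies from the GET and replays them on the POST, so a sketch using just the session would be:

import requests

session = requests.Session()
session.get(url)                           # cookies from the GET land in session.cookies
response = session.post(url, data=values)  # and are replayed automatically here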
I'm trying to access data saved by the user, and it keeps returning a 403 error. This is its API endpoint:
http://www.reddit.com/dev/api#GET_user_{username}_saved
I'm thoroughly confused about what to send in my headers to make this request work, and the reddit documentation makes no mention of it at all. Help?
I'm using Python-requests library to do this.
Referring to line 686 of reddit's code in listingcontroller.py (here):
if (where in ('saved', 'hidden') and not
((c.user_is_loggedin and c.user._id == vuser._id) or
c.user_is_admin)):
return self.abort403()
you can clearly see that you must be logged in as username or be an admin in order to get the saved or hidden data - otherwise you get a 403 error.
As @zenpoy already mentioned (and as you already know), you have to be logged in. Therefore, you should save the cookie which you get in response to a valid call to api/login. I've written some code which logs a user in and retrieves all saved things:
import urllib
import urllib2
import cookielib
import json
login_url = 'https://ssl.reddit.com/api/login/'
saved_url = 'https://ssl.reddit.com/user/<username>/saved.json'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
def login(username, passwd):
    values = {'user': username,
              'api_type': 'json',
              'passwd': passwd}
    data = urllib.urlencode(values)
    response = opener.open(login_url, data).read()
    print json.loads(response)
def retrieve_saved(username):
    url = saved_url.replace('<username>', username)
    response = opener.open(url).read()
    print json.loads(response)
login(<username>, <passwd>)
retrieve_saved(<username>)
I am trying to write a Python script to POST a multipart form to a site that requires authentication through CAS.
There are two approaches that both solve part of the problem:
The Python requests library works well for submitting multipart forms.
There is caslib, with a login function. It returns an OpenerDirector that can presumably be used for further requests.
Unfortunately, I can't figure out how to get a complete solution out of what I have so far.
These are just some ideas from a couple hours of research; I am open to just about any solution that works.
Thanks for the help.
I accepted J.F. Sebastian's answer because I think it was closest to what I'd asked, but I actually wound up getting it to work using mechanize, a Python library for web browser automation.
import argparse
import mechanize
import re
import sys
# (SENSITIVE!) Authentication info
username = r'username'
password = r'password'
# Command line arguments
parser = argparse.ArgumentParser(description='Submit lab to CS 235 site (Winter 2013)')
parser.add_argument('lab_num', help='Lab submission number')
parser.add_argument('file_name', help='Submission file (zip)')
args = parser.parse_args()
# Go to login site
br = mechanize.Browser()
br.open('https://cas.byu.edu/cas/login?service=https%3a%2f%2fbeta.cs.byu.edu%2f~sub235%2fsubmit.php')
# Login and forward to submission site
br.form = br.forms().next()
br['username'] = username
br['password'] = password
br.submit()
# Submit
br.form = br.forms().next()
br['labnum'] = list(args.lab_num)
br.add_file(open(args.file_name), 'application/zip', args.file_name)
r = br.submit()
for s in re.findall('<h4>(.+?)</?h4>', r.read()):
    print s
You could use poster to prepare the multipart/form-data. Try passing poster's opener to caslib and using caslib's opener to make requests (not tested):
import urllib2
import caslib
import poster.encode
import poster.streaminghttp
opener = poster.streaminghttp.register_openers()
r, opener = caslib.login_to_cas_service(login_url, username, password,
opener=opener)
params = {'file': open("test.txt", "rb"), 'name': 'upload test'}
datagen, headers = poster.encode.multipart_encode(params)
response = opener.open(urllib2.Request(upload_url, datagen, headers))
print response.read()
You could write an authentication handler for requests using caslib. Then you could do something like:
auth = CasAuthentication("url", "login", "password")
response = requests.get("http://example.com/cas_service", auth=auth)
Or if you're making tons of requests against the website:
s = requests.session()
s.auth = auth
s.post('http://casservice.com/endpoint', data={'key': 'value'}, files={'filename': '/path/to/file'})
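A rough sketch of what such a handler might look like; the CAS exchange itself is only stubbed out here, and a real implementation would delegate that part to caslib:

import requests
from requests.auth import AuthBase

class CasAuthentication(AuthBase):
    """Sketch of a custom auth handler; the CAS login itself is stubbed out."""
    def __init__(self, cas_url, login, password):
        self.cas_url = cas_url
        self.login = login
        self.password = password
        self._cookie = None

    def __call__(self, r):
        # Called by requests for every request; attach the CAS session cookie
        if self._cookie is None:
            self._cookie = self._do_login()
        r.headers['Cookie'] = self._cookie
        return r

    def _do_login(self):
        # Placeholder: a real implementation would run the CAS protocol
        # (e.g. via caslib) and return the resulting session cookie
        resp = requests.post(self.cas_url, data={'username': self.login,
                                                 'password': self.password})
        return resp.headers.get('Set-Cookie', '')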
After much reading here on Stack Overflow as well as around the web, I'm still struggling to get things to work.
My challenge: to get access to a restricted part of a website for which I'm a member using Python and urllib2.
From what I've read the code should be like this:
mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://www.domain.com'
mgr.add_password(None, url, 'username', 'password')
handler = urllib2.HTTPBasicAuthHandler(mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
try:
    response = urllib2.urlopen('http://www.domain.com/restrictedpage')
    page = response.read()
    print response.geturl()  # geturl() lives on the response object, not the page string
except IOError, e:
    print e
The print doesn't show "http://www.domain.com/restrictedpage" but "http://www.domain.com/login", so my credentials aren't stored/processed and I'm being redirected.
How can I get this to work? I've been trying for days and keep hitting the same dead ends. I've tried all the examples I could find to no avail.
My main question is: what's needed to authenticate to a website using Python and urllib2?
Quick question: what am I doing wrong?
Check first manually what really happens when you authenticate successfully (instructions for Chrome):
Open developer tools in Chrome (Ctrl + Shift + I)
Click the Network tab
Go and do the authentication manually (go to the page, type user + passwd + submit)
Check the POST method in the Network tab of the developer tools
Check the Request Headers, Query String Parameters and Form Data. There you'll find all the information you need to include in your own POST.
Then install the "Advanced Rest Client (ARC)" Chrome extension
Use ARC to construct a valid POST for authentication.
Now you know what to have in your headers and form data. Here's some sample code using requests that worked for me for one particular site:
import requests
USERNAME = 'user' # put correct usename here
PASSWORD = 'password' # put correct password here
LOGINURL = 'https://login.example.com/'
DATAURL = 'https://data.example.com/secure_data.html'
session = requests.session()
req_headers = {
    'Content-Type': 'application/x-www-form-urlencoded'
}
formdata = {
    'UserName': USERNAME,
    'Password': PASSWORD,
    'LoginButton': 'Login'
}
# Authenticate
r = session.post(LOGINURL, data=formdata, headers=req_headers, allow_redirects=False)
print r.headers
print r.status_code
print r.text
# Read data
r2 = session.get(DATAURL)
print "___________DATA____________"
print r2.headers
print r2.status_code
print r2.text
For HTTP Basic Auth you can refer to this: http://www.voidspace.org.uk/python/articles/authentication.shtml
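For example, a minimal sketch with requests (placeholder URL and credentials); note this only works if the site actually uses HTTP Basic Auth rather than a login form like the one above:

import requests

# auth= attaches the HTTP Basic Auth credentials to the request
r = requests.get('http://www.domain.com/restrictedpage',
                 auth=('username', 'password'))
print r.status_code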