Possible Duplicate:
How do you send a HEAD HTTP request in Python?
I am using Python's urllib and urllib2 to do an automated login. I am also using HTTPCookieProcessor to automate the handling of the cookies. The code is somewhat like this:
o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )
# assuming the site expects 'user' and 'pass' as query params
p = urllib.urlencode( { 'username': 'me', 'password': 'mypass' } )
# perform login with params
f = o.open( 'http://www.mysite.com/login/', p )
data = f.read()
f.close()
# second request
t = o.open( 'http://www.mysite.com/protected/area/' )
data = t.read()
t.close()
Now, the point is that I don't want to waste bandwidth downloading the contents of http://www.mysite.com/login/, since all I want is to receive the cookies (which arrive in the response headers). Also, the site redirects me to http://www.mysite.com/userprofile when I first log in (that is, f.geturl() returns http://www.mysite.com/userprofile).
So is there any way that I can avoid fetching the content in the first request?
P.S. Please don't ask me why I am avoiding the small network usage of transferring the content. Although the content is small, I still don't want to download it.
Just send a HEAD request instead of a GET request. You can use Python's httplib to do that.
Something like this:
import httplib, urllib

creds = urllib.urlencode({'username': 'me', 'password': 'mypass'})
connection = httplib.HTTPConnection("www.mysite.com")
connection.request("HEAD", "/login/", creds)
response = connection.getresponse()
print response.getheaders()
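If you'd rather keep the cookie-handling opener from your code, the same trick works in urllib2 by overriding the request method (this is the approach the linked duplicate describes). A sketch, assuming the login endpoint still sets its cookies in response to a HEAD:

import urllib, urllib2

class HeadRequest(urllib2.Request):
    # make urllib2 issue HEAD instead of GET/POST
    def get_method(self):
        return "HEAD"

o = urllib2.build_opener(urllib2.HTTPCookieProcessor())
p = urllib.urlencode({'username': 'me', 'password': 'mypass'})
f = o.open(HeadRequest('http://www.mysite.com/login/', p))
print f.info()  # headers only; the cookies now live in the opener's jar
f.close()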
I am trying to log in to a website using Python.
The login URL is :
https://login.flash.co.za/apex/f?p=pwfone:login
and the 'form action' URL is shown as:
https://login.flash.co.za/apex/wwv_flow.accept
When I use 'inspect element' in Chrome while logging in manually, these are the form fields that get posted (p_t02 = password; screenshot not reproduced here).
There are a few hidden items that I'm not sure how to add into the Python code below.
When I use this code, the login page is returned:
import requests

url = 'https://login.flash.co.za/apex/wwv_flow.accept'
values = {
    'p_flow_id': '1500',
    'p_flow_step_id': '101',
    'p_page_submission_id': '3169092211412',
    'p_request': 'LOGIN',
    'p_t01': 'solar',
    'p_t02': 'password',
    'p_checksum': ''
}
r = requests.post(url, data=values)
print r.content
How can I adjust this code to perform a login?
Chrome Network tab: screenshot not reproduced here.
This is more or less how your script should look. Use a Session to handle the cookies automatically. Fill in the username and password fields manually.
import requests
from bs4 import BeautifulSoup

logurl = "https://login.flash.co.za/apex/f?p=pwfone:login"
posturl = 'https://login.flash.co.za/apex/wwv_flow.accept'

with requests.Session() as s:
    s.headers = {"User-Agent": "Mozilla/5.0"}
    # fetch the login page first so the hidden fields and cookies are fresh
    res = s.get(logurl)
    soup = BeautifulSoup(res.text, "lxml")
    values = {
        'p_flow_id': soup.select_one("[name='p_flow_id']")['value'],
        'p_flow_step_id': soup.select_one("[name='p_flow_step_id']")['value'],
        'p_instance': soup.select_one("[name='p_instance']")['value'],
        'p_page_submission_id': soup.select_one("[name='p_page_submission_id']")['value'],
        'p_request': 'LOGIN',
        # a dict can hold p_arg_names only once; if the form posts it several
        # times, pass data as a list of (name, value) tuples instead
        'p_arg_names': soup.select_one("[name='p_arg_names']")['value'],
        'p_t01': 'username',
        'p_t02': 'password',
        'p_md5_checksum': soup.select_one("[name='p_md5_checksum']")['value'],
        'p_page_checksum': soup.select_one("[name='p_page_checksum']")['value']
    }
    r = s.post(posturl, data=values)
    print r.content
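Note that the initial GET is doing real work here: Oracle APEX generates p_instance, p_page_submission_id and the checksums per session/page render, so the hidden fields must be scraped fresh from the login page inside the same Session that submits them.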
Since I cannot recreate your case I can't tell you exactly what to change, but when I was doing such things I used Postman to intercept all the requests my browser sends. Install it along with the browser extension, then perform the login. You can then view both the request and the response it received in Postman; what's more, it generates the equivalent Python code for the request, so you can simply copy and use it.
In short: use Postman, perform the login, clone its request.
I see a few methods for downloading a file over HTTP/HTTPS in Python, but for all of them you need to know the exact URL. I'm trying to download from a web service where the file is fetched by sending a method name and POST arguments, so I can't figure out what URL to request. This is the code snippet:
url = 'https://www.example123.com'
params = { 'user' : 'username', 'pass' : 'password', 'method' : 'getproject', 'getPDF' : 'true' }
data = urllib.parse.urlencode(params)
data = data.encode('utf-8')
request = urllib.request.Request(url, data)
response = urllib.request.urlopen(request)
xdata = response.read()
print(xdata)
The print statement looks as though it's reading the PDF, but I want to save it somewhere and can't find any way to do that. Here is the beginning of the printed response:
b'%PDF-1.6\r%\xe2\xe3\xcf\xd3\r\n12 0 obj\r<</Lin
You have to open a file and write the data to it; right now you are just storing it in a variable. Since response.read() returns bytes, open the file in binary mode:
with open('yourfile.pdf', 'wb') as f:
    f.write(xdata)
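If the file is large, you can also stream the response straight to disk instead of holding all the bytes in a variable first. A sketch reusing the request from the question, with shutil.copyfileobj doing the copy:

import shutil
import urllib.parse
import urllib.request

url = 'https://www.example123.com'
params = {'user': 'username', 'pass': 'password', 'method': 'getproject', 'getPDF': 'true'}
data = urllib.parse.urlencode(params).encode('utf-8')

request = urllib.request.Request(url, data)
# copy the response stream to the file without loading it all into memory
with urllib.request.urlopen(request) as response, open('yourfile.pdf', 'wb') as f:
    shutil.copyfileobj(response, f)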
After much reading here on Stack Overflow as well as on the web, I'm still struggling to get things working.
My challenge: to get access to a restricted part of a website for which I'm a member using Python and urllib2.
From what I've read the code should be like this:
mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://www.domain.com'
mgr.add_password(None, url, 'username', 'password')
handler = urllib2.HTTPBasicAuthHandler(mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
try:
    response = urllib2.urlopen('http://www.domain.com/restrictedpage')
    page = response.read()
    print response.geturl()  # geturl() lives on the response, not the page string
except IOError, e:
    print e
The print doesn't show "http://www.domain.com/restrictedpage" but "http://www.domain.com/login", so my credentials aren't being used and I'm being redirected to the login page.
How can I get this to work? I've been trying for days and keep hitting the same dead ends. I've tried all the examples I could find to no avail.
My main question is: what's needed to authenticate to a website using Python and urllib2?
Quick question: what am I doing wrong?
Check first manually what really happens when you authenticate successfully (instructions for Chrome):
Open developer tools in Chrome (Ctrl + Shift + I)
Click the Network tab
Do the authentication manually (go to the page, type user + password + submit)
Check the POST request in the Network tab of the developer tools
Check the Request Headers, Query String Parameters and Form Data. There you'll find all the information you need to include in your own POST.
Then install the "Advanced Rest Client (ARC)" Chrome extension
Use ARC to construct a valid POST for authentication.
Now you know what to put in your headers and form data. Here's sample code, using Requests, that worked for me for one particular site:
import requests

USERNAME = 'user'      # put correct username here
PASSWORD = 'password'  # put correct password here

LOGINURL = 'https://login.example.com/'
DATAURL = 'https://data.example.com/secure_data.html'

session = requests.session()

req_headers = {
    'Content-Type': 'application/x-www-form-urlencoded'
}

formdata = {
    'UserName': USERNAME,
    'Password': PASSWORD,
    'LoginButton': 'Login'
}

# Authenticate
r = session.post(LOGINURL, data=formdata, headers=req_headers, allow_redirects=False)
print r.headers
print r.status_code
print r.text

# Read data
r2 = session.get(DATAURL)
print "___________DATA____________"
print r2.headers
print r2.status_code
print r2.text
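One design note: allow_redirects=False on the login POST means you get the login response itself (typically a 302 carrying the Set-Cookie headers) rather than whatever page it redirects to, which makes it much easier to see whether authentication actually succeeded.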
For HTTP Basic Auth you can refer to this: http://www.voidspace.org.uk/python/articles/authentication.shtml
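With Requests, Basic Auth is built in; a minimal sketch against the placeholder URL from the question:

import requests

# auth=(user, password) adds the Authorization: Basic header for you
r = requests.get('http://www.domain.com/restrictedpage', auth=('username', 'password'))
print r.status_code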
I'm trying to write a simple script to log into Wikipedia and perform some actions on my user page, using the Mediawiki api. However, I never seem to get past the first login request (from this page: https://en.wikipedia.org/wiki/Wikipedia:Creating_a_bot#Logging_in). I don't think the session cookie that I set is being sent. This is my code so far:
import Cookie, urllib, urllib2, xml.etree.ElementTree
url = 'https://en.wikipedia.org/w/api.php?action=login&format=xml'
username = 'user'
password = 'password'
user_data = [('lgname', username), ('lgpassword', password)]
#Login step 1
#Make the POST request
request = urllib2.Request(url)
data = urllib.urlencode(user_data)
login_raw_data1 = urllib2.urlopen(request, data).read()
#Parse the XML for the login information
login_data1 = xml.etree.ElementTree.fromstring(login_raw_data1)
login_tag = login_data1.find('login')
token = login_tag.attrib['token']
cookieprefix = login_tag.attrib['cookieprefix']
sessionid = login_tag.attrib['sessionid']
#Set the cookies
cookie = Cookie.SimpleCookie()
cookie[cookieprefix + '_session'] = sessionid
#Login step 2
request = urllib2.Request(url)
session_cookie_header = cookieprefix+'_session='+sessionid+'; path=/; domain=.wikipedia.org; HttpOnly'
request.add_header('Set-Cookie', session_cookie_header)
user_data.append(('lgtoken', token))
data = urllib.urlencode(user_data)
login_raw_data2 = urllib2.urlopen(request, data).read()
I think the problem is somewhere in the request.add_header('Set-Cookie', session_cookie_header) line, but I don't know for sure. How do I use these Python libraries to send cookies in the header with every request (which is necessary for a lot of API functions)?
The latest version of requests has support for sessions (as well as being really simple to use and generally great):
import requests

with requests.session() as s:
    s.post(url, data=user_data)
    r = s.get(url_2)
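Applied to the MediaWiki flow from the question, the whole two-step login becomes roughly this; a sketch using the same URL and field names as above, with error handling omitted:

import xml.etree.ElementTree
import requests

url = 'https://en.wikipedia.org/w/api.php?action=login&format=xml'
user_data = {'lgname': 'user', 'lgpassword': 'password'}

with requests.session() as s:
    # step 1: the session cookie is stored on `s` automatically
    reply = xml.etree.ElementTree.fromstring(s.post(url, data=user_data).content)
    user_data['lgtoken'] = reply.find('login').attrib['token']
    # step 2: resend the credentials plus the token; the cookie goes along for free
    print s.post(url, data=user_data).text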
I am struggling with Python. I want to write a script that creates a cookie and counts how many times the cookie is called during a session.
This is what I have tried so far:
import Cookie
import os

def getCookie(initialvalues):
    # reuse the cookie the browser sent back, or start a fresh one
    if 'HTTP_COOKIE' in os.environ:
        cookie = Cookie.SimpleCookie(os.environ['HTTP_COOKIE'])
    else:
        cookie = Cookie.SimpleCookie()
    # make sure every expected key exists
    for key in initialvalues.keys():
        if key not in cookie:
            cookie[key] = initialvalues[key]
    return cookie

if __name__ == '__main__':
    c = getCookie({'counter': 0})
    c['counter'] = int(c['counter'].value) + 1
    print c
But I know it is wrong. Can someone help me write the script?
Any help would be appreciated.
I'm confused by your question. What I believe you want to do is request some webpage and count how many times your cookie shows up? You can gather the cookies using a CookieJar:
import urllib, urllib2, cookielib

url = "http://example.com/cookies"
form_data = {'username': '', 'password': ''}

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
form_data = urllib.urlencode(form_data)

# data returned from this page contains redirection
resp = opener.open(url, form_data)
print resp.read()

for cookie in jar:
    # Look for your cookie
    print cookie.name, cookie.value
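If you are after one specific cookie, match on its name inside that loop; a short sketch (the cookie name here is hypothetical):

wanted = 'session_id'  # hypothetical cookie name
matches = [c for c in jar if c.name == wanted]
if matches:
    print 'found %s = %s' % (wanted, matches[0].value)
else:
    print '%s was not set by the server' % wanted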