I need to download a CSV file from the AdWords Billing page during a Celery task, and I have no idea what's wrong with my implementation, so I need your help.
Log in:
import mechanize

browser = mechanize.Browser()
browser.open('https://accounts.google.com/ServiceLogin')
browser.select_form(nr=0)
browser['Email'] = g_email
browser['Passwd'] = g_password
browser.submit()
browser.set_handle_robots(False)
billing_resp = browser.open('https://adwords.google.com/')
That works; I'm on the billing page now. Next, I parsed the result page for the token and ids, analyzed the request headers and action URL in the Chrome debugger, and now I want to make a POST request and receive my CSV file. The response headers (in Chrome) are:
content-disposition:attachment; filename="myclientcenter.csv.gz"
content-length:307479
content-type:application/x-gzip; charset=UTF-8
With mechanize:
data = {
    '__u': effectiveUserId,
    '__c': customerId,
    'token': token,
}
browser.addheaders = [
    ('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('content-type', 'application/x-www-form-urlencoded'),
    ('accept-encoding', 'gzip,deflate,sdch'),
    ('user-agent', 'Mozilla/5.0'),
    ('referer', 'https://adwords.google.com/mcm/Mcm?__u=8183865359&__c=3069937889'),
    ('origin', 'https://adwords.google.com'),
]
browser.set_handle_refresh(True)
browser.set_debug_responses(True)
browser.set_debug_redirects(True)
browser.set_handle_referer(True)
browser.set_debug_http(True)
browser.set_handle_equiv(True)
browser.set_handle_gzip(True)
response = browser.open(
    'https://adwords.google.com/mcm/file/ClientSummary/',
    # NB: hand-built body; the values here are not URL-encoded
    data='&'.join(['='.join(pair) for pair in data.items()]),
)
BUT! The Content-Length header is 0 in this response, and there is no Content-Disposition. Why? And what can I do to make it work?
I also tried Requests, but could not even get past the login stage...
I now have the answer to my own question (thanks to my team lead).
The main mistake is in this incorrect request data:
data = {
    '__u': effectiveUserId,
    '__c': customerId,
    'token': token,
}
Let's try again, with a proper solution.
# Open Google login page and log in.
import datetime, gzip, re, urllib, StringIO
import mechanize
from urllib2 import HTTPError

class AdWordsException(Exception):
    """Raised when any step of the download fails."""

browser = mechanize.Browser()
try:
    browser.open('https://accounts.google.com/ServiceLogin')
    browser.select_form(nr=0)
    browser['Email'] = 'email@adwords.login'
    browser['Passwd'] = 'password'
    browser.submit()
except HTTPError:
    raise AdWordsException("Can't find the Google login form")
We are logged in now and can go deeper.
try:
    browser.set_handle_robots(False)
    billing_resp = browser.open('https://adwords.google.com/')
except HTTPError:
    raise AdWordsException("Can't open AdWords dashboard page")

# Welcome to the AdWords billing dashboard. We can get a
# session-unique token from this page for the further POST request.
token_re = re.search(r"token:\'(.{41})\'", billing_resp.read())
if token_re is None:
    raise AdWordsException("Can't parse the token")
# It's time for some magic now. We have to construct a proper mcsSelector
# serialized data structure. This is GWT-RPC wire-protocol hell.
# Paste your specific version from the web debugger.
MCS_TEMPLATE = (
"7|0|49|https://adwords.google.com/mcm/gwt/|18FBB090A5C26E56AC16C9DF0689E720|"
"com.google.ads.api.services.common.selector.Selector/1054041135|"
"com.google.ads.api.services.common.date.DateRange/1118087507|"
"com.google.ads.api.services.common.date.Date/373224763|"
"java.util.ArrayList/4159755760|java.lang.String/2004016611|ClientName|"
"ExternalCustomerId|PrimaryUserLogin|PrimaryCompanyName|IsManager|"
"SalesChannel|Tier|AccountSettingTypes|Labels|Alerts|CostWithCurrency|"
"CostUsd|Clicks|Impressions|Ctr|Conversions|ConversionRate|SearchCtr|"
"ContentCtr|BudgetAmount|BudgetStartDate|BudgetEndDate|BudgetPercentSpent|"
"BudgetType|RemainingBudget|ClientDateTimeZoneId|"
"com.google.ads.api.services.common.selector.OrderBy/524388450|"
"SearchableData|"
"com.google.ads.api.services.common.sorting.SortOrder/2037387810|"
"com.google.ads.api.services.common.pagination.Paging/363399854|"
"com.google.ads.api.services.common.selector.Predicate/451365360|"
"SeedObfuscatedCustomerId|"
"com.google.ads.api.services.common.selector.Predicate$Operator/2293561107|"
"java.util.Arrays$ArrayList/2507071751|[Ljava.lang.String;/2600011424|"
"3069937889|ExcludeSeeds|true|ClientTraversal|DIRECT|"
"com.google.ads.api.services.common.selector.Summary/3224078220|included|1|"
"2|3|4|5|"
"{report_date}|5|{report_date}" # take a note of this
"|6|26|7|8|7|9|7|10|7|11|7|12|7|13|7|14|7|15|7|16|7|17|7|18|7|19|7|20|7|21|"
"7|22|7|23|7|24|7|25|7|26|7|27|7|28|7|29|7|30|7|31|7|32|7|33|6|0|0|0|6|2|34|"
"35|36|0|34|9|-35|37|100|0|6|0|6|3|38|39|40|2|41|42|1|43|38|44|40|0|41|42|1|"
"45|38|46|-45|41|42|1|47|0|0|6|0|6|1|48|6|0|49|6|0|0|"
)
# To take stats for today
report_date = datetime.date.today()
mcs_selector = MCS_TEMPLATE.format(
    report_date='%s|%s|%s' % (
        report_date.day,
        report_date.month,
        report_date.year,
    ),
)
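# For example, datetime.date(2014, 6, 5) fills each {report_date}
# placeholder with '5|6|2014' (day|month|year, no zero-padding).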
data = urllib.urlencode({
    'token': token_re.group(1),
    'mcsSelector': mcs_selector,
})
# And... it finally works! The token and a proper mcsSelector are all we need.
# A POST request with this data returns a gzipped CSV file with the current
# balance state and other info that's not available via the AdWords API.
zipped_csv = browser.open(
    'https://adwords.google.com/mcm/file/ClientSummary',
    data=data,
)
# Unpack it and use as you wish.
with gzip.GzipFile(mode='r', fileobj=zipped_csv) as csv_io:
    try:
        csv = StringIO.StringIO(csv_io.read())
    except IOError:
        raise AdWordsException("Can't get CSV file from response")
    finally:
        browser.close()
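From here the buffer can be parsed as usual; a minimal sketch with the stdlib csv module (imported under another name, since the variable above is already called csv):
import csv as csv_parser

for row in csv_parser.reader(csv):
    print row  # each row is a list of column strings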
I am writing a Python 3 script to trigger an Axis camera via their HTTP API. The string they need in the URL to trigger the digital input (with authorization) is:
"http://{ip}/axis-cgi/param.cgi?action=update&IOPort.I0.Input.Trig={open/closed}"
If I put this in a browser it works (response 200 OK). But when I run it on my Linux machine using urllib, it fails:
import urllib
import urllib2

# encode user, pass
values = {'username': username, 'password': password}
data = urllib.urlencode(values)
# Axis param.cgi action
action = "update"
trig = "closed"
cgi_args = {"action": action, "IOPort.I0.Input.Trig": trig}
cgi_data = urllib.urlencode(cgi_args)
url = "http://{ip}/axis-cgi/param.cgi?{data}".format(ip=ip, data=cgi_data)
req = urllib2.Request(url, data, headers={'Content-type': 'text/html'})
print(req.get_full_url())
response = urllib2.urlopen(req)
result = response.read()
print(result)
The output is:
http://192.168.50.191/axis-cgi/param.cgi?action=update&IOPort.I0.Input.Trig=closed
action must be specified
I know that I am authenticated; otherwise I get an Unauthorized response from the server.
As you can see, for debugging I print req.get_full_url() to make sure I've built the URL string correctly, yet the server responds with "action must be specified", which is what I get in the browser when I have only http://192.168.50.191/axis-cgi/param.cgi? in the address bar. So the action portion of the URL appears not to be making it to the server.
I've tried:
Using %3F as the ? character, but then I get a 404 error.
Embedding the query arguments in the data parameter, which does not work either.
Is there anything I am missing here?
I ended up using pycurl with digest authorization for the GET request:
import pycurl

def configCurl():
    username = "user"
    password = "pass"
    auth_mode = pycurl.HTTPAUTH_DIGEST
    curl = pycurl.Curl()
    curl.setopt(pycurl.HTTPAUTH, auth_mode)
    curl.setopt(pycurl.USERPWD, "{}:{}".format(username, password))
    curl.setopt(pycurl.WRITEFUNCTION, lambda x: None)  # discard the response body
    return curl

curl = configCurl()
curl.setopt(curl.URL, url)
curl.perform()
curl.close()
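The WRITEFUNCTION above simply throws the response body away. If you need the body, a small variation (a sketch; url is the same request URL as above) collects it in a buffer instead:
import pycurl
from io import BytesIO

buf = BytesIO()
curl = configCurl()
curl.setopt(pycurl.WRITEFUNCTION, buf.write)  # replaces the discarding callback
curl.setopt(curl.URL, url)
curl.perform()
curl.close()
print(buf.getvalue())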
I'm trying to list files on Dropbox for Business.
The Dropbox Python SDK does not support Dropbox for Business, so I'm using the Python requests module to send POST requests directly to https://api.dropbox.com/1/delta.
In the following function there are repeated calls to Dropbox /delta, each of which should get a list of files along with a cursor.
The new cursor is then sent with the next request to get the next list of files.
But it always gets the same list. It is as though Dropbox is ignoring the cursor that I am sending.
How can I get Dropbox to recognise the cursor?
import logging
import requests

def get_paths(headers, paths, member_id, response=None):
    """Add eligible file paths to the list of paths.

    paths is a Queue of files to download later
    member_id is the Dropbox member id
    response is an example response payload for unit testing
    """
    headers['X-Dropbox-Perform-As-Team-Member'] = member_id
    url = 'https://api.dropbox.com/1/delta'
    has_more = True
    post_data = {}
    while has_more:
        # If a ready-made response is not supplied, poll Dropbox
        if response is None:
            logging.debug('Requesting delta with {}'.format(post_data))
            r = requests.post(url, headers=headers, json=post_data)
            # Raise an exception if status is not OK
            r.raise_for_status()
            response = r.json()
        # Set cursor in the POST data for the next request
        # FIXME: fix cursor setting
        post_data['cursor'] = response['cursor']
        # Iterate items for possible adding to file list [removed from example]
        # Stop looping if no more items are available
        has_more = response['has_more']
        # Clear the response
        response = None
The full code is at https://github.com/blokeley/dfb/blob/master/dfb.py
My code seems very similar to the official Dropbox blog example, except that they use the SDK, which I can't use because I'm on Dropbox for Business and have to send additional headers.
Any help would be greatly appreciated.
It looks like you're sending a JSON-encoded body instead of a form-encoded body.
I think just change json to data in this line:
r = requests.post(url, headers=headers, data=post_data)
EDIT: Here's some complete working code:
import requests

access_token = '<REDACTED>'
member_id = '<REDACTED>'

has_more = True
params = {}

while has_more:
    response = requests.post('https://api.dropbox.com/1/delta', data=params, headers={
        'Authorization': 'Bearer ' + access_token,
        'X-Dropbox-Perform-As-Team-Member': member_id
    }).json()
    for entry in response['entries']:
        print entry[0]
    has_more = response['has_more']
    params['cursor'] = response['cursor']
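For reference, a quick illustration of the difference between the two keyword arguments (Requests sets the matching Content-Type header for you in both cases):
import requests

url = 'https://api.dropbox.com/1/delta'
# Form-encoded: body is 'cursor=abc', Content-Type: application/x-www-form-urlencoded
requests.post(url, data={'cursor': 'abc'})
# JSON-encoded: body is '{"cursor": "abc"}', Content-Type: application/json
requests.post(url, json={'cursor': 'abc'})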
After much reading here on Stack Overflow as well as on the web, I'm still struggling with getting things to work.
My challenge: to get access to a restricted part of a website of which I'm a member, using Python and urllib2.
From what I've read, the code should be something like this:
import urllib2

mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
url = 'http://www.domain.com'
mgr.add_password(None, url, 'username', 'password')
handler = urllib2.HTTPBasicAuthHandler(mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)

try:
    response = urllib2.urlopen('http://www.domain.com/restrictedpage')
    page = response.read()
    print response.geturl()
except IOError, e:
    print e
The print doesn't show "http://www.domain.com/restrictedpage" but "http://www.domain.com/login", so my credentials aren't being stored/processed and I'm being redirected.
How can I get this to work? I've been trying for days and keep hitting the same dead ends. I've tried all the examples I could find to no avail.
My main question is: what's needed to authenticate to a website using Python and urllib2?
Quick question: what am I doing wrong?
Check first manually what is really happening when you are successfully authenticated (instructions for Chrome):
Open developer tools in Chrome (Ctrl + Shift + I)
Click the Network tab
Do the authentication manually (go to the page, type user + password and submit)
Check the POST method in the Network tab of the developer tools
Check the Request Headers, Query String Parameters and Form Data; there you will find all the information you need for your own POST
Then install the "Advanced Rest Client (ARC)" Chrome extension
Use ARC to construct a valid POST for authentication
Now you know what to put in your headers and form data. Here's sample code, using Requests, that worked for me for one particular site:
import requests

USERNAME = 'user'      # put correct username here
PASSWORD = 'password'  # put correct password here

LOGINURL = 'https://login.example.com/'
DATAURL = 'https://data.example.com/secure_data.html'

session = requests.session()

req_headers = {
    'Content-Type': 'application/x-www-form-urlencoded'
}

formdata = {
    'UserName': USERNAME,
    'Password': PASSWORD,
    'LoginButton': 'Login'
}

# Authenticate
r = session.post(LOGINURL, data=formdata, headers=req_headers, allow_redirects=False)
print r.headers
print r.status_code
print r.text

# Read data
r2 = session.get(DATAURL)
print "___________DATA____________"
print r2.headers
print r2.status_code
print r2.text
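A note on allow_redirects=False: many login endpoints answer a successful POST with a 302, so suppressing the redirect lets you inspect the raw status code and Set-Cookie headers from the login endpoint itself. The session keeps the received cookies for the later GET either way.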
For HTTP Basic Auth you can refer to this: http://www.voidspace.org.uk/python/articles/authentication.shtml
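And if the site really does use HTTP Basic Auth rather than a login form, Requests reduces it to a one-liner (a sketch; substitute the real URL and credentials):
import requests

# Requests builds the Authorization header from the (username, password) tuple
r = requests.get('http://www.domain.com/restrictedpage', auth=('username', 'password'))
print r.status_code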
I'm trying to write a simple script to log into Wikipedia and perform some actions on my user page using the MediaWiki API. However, I never seem to get past the first login request (from this page: https://en.wikipedia.org/wiki/Wikipedia:Creating_a_bot#Logging_in). I don't think the session cookie that I set is being sent. This is my code so far:
import Cookie, urllib, urllib2, xml.etree.ElementTree
url = 'https://en.wikipedia.org/w/api.php?action=login&format=xml'
username = 'user'
password = 'password'
user_data = [('lgname', username), ('lgpassword', password)]
#Login step 1
#Make the POST request
request = urllib2.Request(url)
data = urllib.urlencode(user_data)
login_raw_data1 = urllib2.urlopen(request, data).read()
#Parse the XML for the login information
login_data1 = xml.etree.ElementTree.fromstring(login_raw_data1)
login_tag = login_data1.find('login')
token = login_tag.attrib['token']
cookieprefix = login_tag.attrib['cookieprefix']
sessionid = login_tag.attrib['sessionid']
#Set the cookies
cookie = Cookie.SimpleCookie()
cookie[cookieprefix + '_session'] = sessionid
#Login step 2
request = urllib2.Request(url)
session_cookie_header = cookieprefix+'_session='+sessionid+'; path=/; domain=.wikipedia.org; HttpOnly'
request.add_header('Set-Cookie', session_cookie_header)
user_data.append(('lgtoken', token))
data = urllib.urlencode(user_data)
login_raw_data2 = urllib2.urlopen(request, data).read()
I think the problem is somewhere in the request.add_header('Set-Cookie', session_cookie_header) line, but I don't know for sure. How do I use these Python libraries to send cookies in the header with every request (which is necessary for a lot of API functions)?
The latest version of Requests has support for sessions (as well as being really simple to use and generally great):
import requests

with requests.session() as s:
    s.post(url, data=user_data)
    r = s.get(url_2)
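Applied to the MediaWiki login above, a rough sketch of the same two-step token dance (the session carries the cookies between the two POSTs, which is exactly the part that was failing):
import requests
import xml.etree.ElementTree

url = 'https://en.wikipedia.org/w/api.php?action=login&format=xml'
user_data = {'lgname': 'user', 'lgpassword': 'password'}

with requests.session() as s:
    # Step 1: the first POST returns a token and sets the session cookie
    reply = xml.etree.ElementTree.fromstring(s.post(url, data=user_data).content)
    user_data['lgtoken'] = reply.find('login').attrib['token']
    # Step 2: repeat the POST with the token; the cookie is sent automatically
    s.post(url, data=user_data)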
I am trying to write a function to post form data and save the returned cookie info in a file so that the next time the page is visited, the cookie information is sent to the server (i.e. normal browser behavior).
I wrote this relatively easily in C++ using libcurl, but have spent almost an entire day trying to write it in Python using urllib2, still without success.
This is what I have so far:
import sys
import urllib, urllib2
import logging

# the path and filename to save your cookies in
COOKIEFILE = 'cookies.lwp'

cj = None
ClientCookie = None
cookielib = None

logger = logging.getLogger(__name__)

# Let's see if cookielib is available
try:
    import cookielib
except ImportError:
    logger.debug('importing cookielib failed. Trying ClientCookie')
    try:
        import ClientCookie
    except ImportError:
        logger.debug('ClientCookie isn\'t available either')
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        logger.debug('imported ClientCookie successfully')
        urlopen = ClientCookie.urlopen
        Request = ClientCookie.Request
        cj = ClientCookie.LWPCookieJar()
else:
    logger.debug('Successfully imported cookielib')
    urlopen = urllib2.urlopen
    Request = urllib2.Request
    # This is a subclass of FileCookieJar
    # that has useful load and save methods
    cj = cookielib.LWPCookieJar()
login_params = {'name': 'anon', 'password': 'pass'}

def login(theurl, login_params):
    init_cookies()
    data = urllib.urlencode(login_params)
    txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    try:
        # create a request object
        req = Request(theurl, data, txheaders)
        # and open it to return a handle on the url
        handle = urlopen(req)
    except IOError, e:
        logger.debug('Failed to open "%s".' % theurl)
        if hasattr(e, 'code'):
            logger.debug('Failed with error code - %s.' % e.code)
        elif hasattr(e, 'reason'):
            logger.debug("The error object has the following 'reason' attribute: " + e.reason)
        sys.exit()
    else:
        if cj is None:
            logger.debug('We don\'t have a cookie library available - sorry.')
        else:
            print 'These are the cookies we have received so far :'
            for index, cookie in enumerate(cj):
                print index, ' : ', cookie
            # save the cookies again
            cj.save(COOKIEFILE)
        # return the data
        return handle.read()
# FIXME: I need to fix this so that it takes into account any cookie data we may have stored
def get_page(*args, **query):
    if len(args) != 1:
        raise ValueError(
            "post_page() takes exactly 1 argument (%d given)" % len(args)
        )
    url = args[0]
    query = urllib.urlencode(list(query.iteritems()))
    if not url.endswith('/') and query:
        url += '/'
    if query:
        url += "?" + query
    resource = urllib.urlopen(url)
    logger.debug('GET url "%s" => "%s", code %d' % (url,
                                                    resource.url,
                                                    resource.code))
    return resource.read()
When I attempt to log in, I pass the correct username and password, yet the login fails and no cookie data is saved.
My two questions are:
Can anyone see what's wrong with the login() function, and how may I fix it?
How may I modify the get_page() function to make use of any cookie info I have saved?
There are quite a few problems with the code that you've posted. Typically you'll want to build a custom opener which can handle redirects, HTTPS, etc.; otherwise you'll run into trouble. As far as the cookies themselves go, you need to call the load and save methods on your cookie jar, and use one of its subclasses, such as MozillaCookieJar or LWPCookieJar.
Here's a class I wrote to log in to Facebook, back when I was playing silly web games. I just modified it to use a file-based cookiejar, rather than an in-memory one.
import cookielib
import os
import urllib
import urllib2

# set these to whatever your fb account is
fb_username = "your@facebook.login"
fb_password = "secretpassword"

cookie_filename = "facebook.cookies"


class WebGamePlayer(object):

    def __init__(self, login, password):
        """ Start up... """
        self.login = login
        self.password = password

        self.cj = cookielib.MozillaCookieJar(cookie_filename)
        if os.access(cookie_filename, os.F_OK):
            self.cj.load()
        self.opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel=0),
            urllib2.HTTPSHandler(debuglevel=0),
            urllib2.HTTPCookieProcessor(self.cj)
        )
        self.opener.addheaders = [
            ('User-agent', ('Mozilla/4.0 (compatible; MSIE 6.0; '
                            'Windows NT 5.2; .NET CLR 1.1.4322)'))
        ]

        # need this twice - once to set cookies, once to log in...
        self.loginToFacebook()
        self.loginToFacebook()
        self.cj.save()

    def loginToFacebook(self):
        """
        Handle login. This should populate our cookie jar.
        """
        login_data = urllib.urlencode({
            'email': self.login,
            'pass': self.password,
        })
        response = self.opener.open("https://login.facebook.com/login.php", login_data)
        return ''.join(response.readlines())


test = WebGamePlayer(fb_username, fb_password)
After you've set your username and password, you should see a file, facebook.cookies, with your cookies in it. In practice you'll probably want to modify the class to check whether you have an active cookie and use that, then log in again if access is denied.
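A rough sketch of that check, as an extra method on the class above (hypothetical: it assumes a protected URL that redirects to a login page when you're logged out):
    def ensureLoggedIn(self, protected_url="https://www.facebook.com/home.php"):
        """Reuse saved cookies if they still work; log in again otherwise."""
        response = self.opener.open(protected_url)
        if 'login' in response.geturl():
            self.loginToFacebook()
            self.cj.save()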
If you are having a hard time making your POST requests work (as I did with a login form), it definitely pays to quickly install the Live HTTP Headers extension for Firefox (http://livehttpheaders.mozdev.org/index.html). This small extension can, among other things, show you the exact POST data that is sent when you manually log in.
In my case, I had banged my head against the wall for hours because the site insisted on an extra field with 'action=login' (doh!).
Use ignore_discard and ignore_expires when saving cookies; in my case that made saving work:
self.cj.save(cookie_file, ignore_discard=True, ignore_expires=True)
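The same two flags also matter when loading the jar back in (otherwise cookielib skips session cookies that were saved this way):
self.cj.load(cookie_file, ignore_discard=True, ignore_expires=True)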