Although this is most likely a newbie question, I struggled to find any information online to help me with my problem.
My code is meant to scrape onion sites. I can connect to Tor, and the web scraper works fine as a stand-alone, but when I combined both code blocks I kept getting numerous errors about the keyword arguments in my code; even attempting to delete them presents me with bugs, and I'm a bit lost on what I'm supposed to do.
import socket
import socks
import requests
from pywebcopy import save_webpage

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

def get_tor_session():
    session = requests.session()
    # Tor uses the 9050 port as the default socks port
    session.proxies = {'http': 'socks5h://127.0.0.1:9050',
                       'https': 'socks5h://127.0.0.1:9050'}
    return session

session = get_tor_session()
print(session.get("http://httpbin.org/ip").text)

kwargs = {'project_name': 'site folder'}

save_webpage(
    # url of the website
    session.get(url="http://elfqv3zjfegus3bgg5d7pv62eqght4h6sl6yjjhe7kjpi2s56bzgk2yd.onion"),
    # folder where the copy will be saved
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)
In this case, I'm presented with the following error:
TypeError: Cannot mix str and non-str arguments
Attempting to replace

    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs

with

    kwargs,
    project_folder=r"C:\Users\admin\Desktop\WebScraping"
presents me with this error:
TypeError: save_webpage() got multiple values for argument
Traceback for the first error:

  File "C:\Users\admin\Desktop\WebScraping\tor.py", line 43, in <module>
    **kwargs
  File "C:\Users\admin\anaconda3\lib\site-packages\pywebcopy\api.py", line 58, in save_webpage
    config.setup_config(url, project_folder, project_name, **kwargs)
  File "C:\Users\admin\anaconda3\lib\site-packages\pywebcopy\configs.py", line 189, in setup_config
    SESSION.load_rules_from_url(urljoin(project_url, '/robots.txt'))
  File "C:\Users\admin\anaconda3\lib\urllib\parse.py", line 487, in urljoin
    base, url, _coerce_result = _coerce_args(base, url)
  File "C:\Users\admin\anaconda3\lib\urllib\parse.py", line 120, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")
I'd really appreciate an explanation of what causes such a bug and how to avoid it in the future.
Not sure why this hasn't been answered yet. As mentioned in my comment, simply change this:
save_webpage(
    # url of the website
    session.get(url=...),
    # folder where the copy will be saved
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)
To:
save_webpage(
    # url of the website
    url=...,
    # folder where the copy will be saved
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    **kwargs
)
save_webpage makes the request internally, so it expects url to be a string. By passing session.get(...) you were handing it a requests Response object instead; pywebcopy then passed that object to urljoin (see setup_config in your traceback), which is why you got "Cannot mix str and non-str arguments".
SOLVED
Adding the following code resolved the issue:

# Return a dummy resolution so hostnames are not resolved locally;
# the unresolved name is handed to the SOCKS proxy (Tor), which can
# resolve .onion addresses itself.
def getaddrinfo(*args):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]

socket.getaddrinfo = getaddrinfo
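For reference, a combined version of the working script would look roughly like this. It is a sketch assembled only from the snippets above (the paths, project name, and onion URL are the ones from the question), not a tested implementation:

import socket
import socks
from pywebcopy import save_webpage

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

# let the SOCKS proxy (Tor) resolve hostnames, including .onion addresses
def getaddrinfo(*args):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]

socket.getaddrinfo = getaddrinfo

save_webpage(
    # pass the URL itself; save_webpage performs the request internally
    url="http://elfqv3zjfegus3bgg5d7pv62eqght4h6sl6yjjhe7kjpi2s56bzgk2yd.onion",
    project_folder=r"C:\Users\admin\Desktop\WebScraping",
    project_name='site folder',
)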
Related
Trying to follow a simple tutorial for the OpenStack Python API that I found at http://docs.openstack.org/developer/python-novaclient/api.html, but it doesn't seem to be working. When I try to run
nova.servers.list()
or
nova.flavors.list()
from the tutorial in the Python interpreter, I get the following error:
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/novaclient/v2/servers.py", line 617, in list
return self._list("/servers%s%s" % (detail, query_string), "servers")
File "/usr/lib/python2.7/dist-packages/novaclient/base.py", line 64, in _list
_resp, body = self.api.client.get(url)
File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 440, in get
return self._cs_request(url, 'GET', **kwargs)
File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 399, in _cs_request
self.authenticate()
File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 569, in authenticate
self._v2_auth(auth_url)
File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 634, in _v2_auth
return self._authenticate(url, body)
File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 647, in _authenticate
**kwargs)
File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 392, in _time_request
resp, body = self.request(url, method, **kwargs)
File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 386, in request
raise exceptions.from_response(resp, body, url, method)
novaclient.exceptions.NotFound: The resource could not be found. (HTTP 404)
I'm using the same credentials as admin_openrc.sh, which works. Can't figure out what might be the problem.
You're using the python-novaclient as a library and it was never designed to be used that way. It's a CLI that people unfortunately use as a library.
Give the official Python OpenStack SDK a try.
pip install openstacksdk
Here is the code for listing servers or flavors:
import sys
from openstack import connection
from openstack import profile
from openstack import utils

utils.enable_logging(True, stream=sys.stdout)

prof = profile.Profile()
prof.set_region(prof.ALL, 'RegionOne')

conn = connection.Connection(
    auth_url='http://my.openstack.com:5000/v2.0',
    profile=prof,
    username='demo',
    project_name='demo',
    password='demo')

for server in conn.compute.servers():
    print(server)

for flavor in conn.compute.flavors():
    print(flavor)
More info that might be helpful too:
http://python-openstacksdk.readthedocs.org/en/latest/users/guides/compute.html
https://pypi.python.org/pypi/openstacksdk
From your description, the CLI works fine but the script/interpreter fails, so it is definitely because you are initializing novaclient.client.Client in the wrong way.
The usage of novaclient.client.Client depends on which version you are using, but your question doesn't provide that information, so I can't give an exact example; you can check it by running the command 'nova --version'.
You can get help from the developer documentation for python-novaclient: http://docs.openstack.org/developer/python-novaclient/api.html
Remember that it is good practice to use keyword arguments instead of positional arguments, which means

nc = client.Client(version=2, user='admin', password='password',
                   project_id='12345678', auth_url='http://127.0.0.1:5000')

is encouraged; it will expose the problem when you try to do something in the wrong way.
Solved the issue: not sure why, but OpenStack complained of a missing user domain in auth (I don't remember the exact error message). I couldn't find how to pass the user domain in nova, but I did find it in keystone!
from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client

auth_url = 'http://10.37.135.89:5000/v3/'
username = 'admin'
user_domain_name = 'Default'
project_name = 'admin'
project_domain_name = 'Default'
password = '123456'

auth = v3.Password(auth_url=auth_url,
                   username=username,
                   password=password,
                   project_id='d5eef1aae54742e787d0653eea57254b',
                   user_domain_name=user_domain_name)

sess = session.Session(auth=auth)
keystone = client.Client(session=sess)
keystone.projects.list()

and after that I used keystone for authenticating in nova:

from novaclient import client

nova = client.Client(2, session=keystone.session)
nova.flavors.list()
Some useful links that I used for this answer:
http://docs.openstack.org/developer/python-keystoneclient/authentication-plugins.html
http://docs.openstack.org/developer/python-keystoneclient/using-api-v3.html
http://docs.openstack.org/developer/python-novaclient/api.html
This is the script:
import requests
import json
import urlparse
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=1))

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        with open('urls.txt') as urls:
            for line in urls:
                url = line.rstrip()
                data = requests.get(url, proxies=proxy)
                data1 = data.content
                print data1
                print {'http': line}
As you can see, it's trying to access a list of URLs through a list of proxies. Here is the urls.txt file:
http://api.exip.org/?call=ip
here is the proxies.txt file:
{"http":"http://107.17.92.18:8080"}
I got this proxy at www.hidemyass.com. Could it be a bad proxy? I have tried several and this is the result. Note: if you are trying to replicate this, you may have to update the proxy to a recent one at hidemyass.com; they seem to stop working eventually.
Here is the full error and traceback:
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    data=requests.get(url, proxies=proxy)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 454, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 144, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 438, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 327, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=u'219.231.143.96', port=18186): Max retries exceeded with url: http://www.google.com/ (Caused by <class 'httplib.BadStatusLine'>: '')
Looking at the stack trace you've provided, your error is caused by the httplib.BadStatusLine exception, which, according to the docs, is:

Raised if a server responds with a HTTP status code that we don’t understand.

In other words, whatever is returned (if anything is returned at all) by the proxy server cannot be parsed by the httplib module that performs the actual request.
From my experience with (writing) HTTP proxies, I can say that some implementations may not follow the specs too strictly (the RFC specs on HTTP aren't easy reading, actually) or use hacks to fix old browsers that have flaws in their implementation.
So, answering this:

Could it be a bad proxy?

...I'd say that this is possible. The only real way to be sure is to see what is returned by the proxy server.
Try to debug it with a debugger, or grab a packet sniffer (something like Wireshark or Network Monitor) to analyze what happens on the network. Having info about what exactly is returned by the proxy server should give you a key to solving this issue.
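If you don't want to set up a sniffer, a quick way to see the raw reply is to talk to the proxy directly from Python. Here is a minimal sketch (assuming the proxy address from your proxies.txt and the URL from urls.txt); it just prints whatever the proxy sends back, status line included:

import socket

PROXY_HOST, PROXY_PORT = '107.17.92.18', 8080  # from proxies.txt

sock = socket.create_connection((PROXY_HOST, PROXY_PORT), timeout=10)
# ask the proxy to fetch the target URL on our behalf
sock.sendall(b'GET http://api.exip.org/?call=ip HTTP/1.1\r\n'
             b'Host: api.exip.org\r\n'
             b'Connection: close\r\n\r\n')

raw = b''
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    raw += chunk
sock.close()

# the very first line is the status line that httplib fails to parse
print(raw.decode('latin-1', 'replace'))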
Maybe you are overloading the proxy server by sending too many requests in a short period of time. You say that you got the proxy from a popular free proxy website, which means that you're not the only one using that server, and it's often under heavy load.
If you add some delay between your requests like this:
from time import sleep

[...]

data = requests.get(url, proxies=proxy)
data1 = data.content
print data1
print {'http': line}
sleep(1)
(note the sleep(1), which pauses the execution of the code for one second)
Does it work?
def hello(self):
    self.s = requests.Session()
    self.s.headers.update({'User-Agent': self.user_agent})
    return True

Try this, it worked for me :)
This happens when you send too many requests to the public IP address of https://anydomainname.example.com/. As you can see, the cause is something that blocks or does not allow access to the public IP address that https://anydomainname.example.com/ maps to. One workaround is the following Python script, which looks up the public IP address of the domain and writes that mapping to the /etc/hosts file.
import re
import socket
import subprocess
from typing import Tuple

ENDPOINT = 'https://anydomainname.example.com/'


def get_public_ip() -> Tuple[str, str, str]:
    """
    Command to get public_ip address of host machine and endpoint domain

    Returns
    -------
    my_public_ip : str
        Ip address string of host machine.
    end_point_ip_address : str
        Ip address of endpoint domain host.
    end_point_domain : str
        domain name of endpoint.
    """
    # bash_command = """host myip.opendns.com resolver1.opendns.com | \
    #     grep "myip.opendns.com has" | awk '{print $4}'"""
    # bash_command = """curl ifconfig.co"""
    # bash_command = """curl ifconfig.me"""
    bash_command = """curl icanhazip.com"""
    my_public_ip = subprocess.getoutput(bash_command)
    my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
    end_point_domain = (
        ENDPOINT.replace("https://", "")
        .replace("http://", "")
        .replace("/", "")
    )
    end_point_ip_address = socket.gethostbyname(end_point_domain)
    return my_public_ip, end_point_ip_address, end_point_domain


def set_etc_host(ip_address: str, domain: str) -> str:
    """
    A function to write mapping of ip_address and domain name in /etc/hosts.

    Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build

    Parameters
    ----------
    ip_address : str
        IP address of the domain.
    domain : str
        domain name of endpoint.

    Returns
    -------
    str
        Message to identify success or failure of the operation.
    """
    bash_command = """echo "{} {}" >> /etc/hosts""".format(ip_address, domain)
    output = subprocess.getoutput(bash_command)
    return output


if __name__ == "__main__":
    my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
    output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
    print("My public IP address:", my_public_ip)
    print("ENDPOINT public IP address:", end_point_ip_address)
    print("ENDPOINT Domain Name:", end_point_domain)
    print("Command output:", output)
You can call the above script before running your desired function :)
This happens when you overload the server with multiple requests. To get around this you can increase the time between requests, but the best thing in my case was to increase the number of retries for each request:
requests.adapters.DEFAULT_RETRIES = 5 # increase retries number
requests.get(url)
If this is still not helpful you can find more ways here.
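If simply bumping DEFAULT_RETRIES isn't enough, a mounted HTTPAdapter with a urllib3 Retry object also lets you add a back-off delay between attempts. A rough sketch (the url variable is assumed to be the one you are requesting):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=5,                 # up to 5 attempts
                backoff_factor=1,        # wait 1s, 2s, 4s, ... between retries
                status_forcelist=[429, 500, 502, 503, 504])
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

response = session.get(url)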
As it says in the title, I am trying to access a URL through several different proxies sequentially (using a for loop). Right now this is my code:
import requests
import json

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        with open('urls.txt') as urls:
            for line in urls:
                url = line.rstrip()
                data = requests.get(url, proxies={'http': line})
                data1 = data.text
                print data1
and my urls.txt file:
http://api.exip.org/?call=ip
and my proxies.txt file:
{"https": "84.22.41.1:3128"}
{"http":"194.126.181.47:81"}
{"http":"218.108.170.170:82"}
that I got at www.hidemyass.com
For some reason, the output is
68.6.34.253
68.6.34.253
68.6.34.253
as if it is accessing that website through my own router's IP address. In other words, it is not trying to access it through the proxies I give it; it is just looping through and using my own over and over again. What am I doing wrong?
According to this thread, you need to specify the proxies dictionary as {"protocol": "ip:port"}, so your proxies file should look like

{"https": "84.22.41.1:3128"}
{"http": "194.126.181.47:81"}
{"http": "218.108.170.170:82"}
EDIT:
You're reusing line for both URLs and proxies. It's fine to reuse line in the inner loop, but you should be using proxies=proxy; you've already parsed the JSON and don't need to build another dictionary. Also, as abanert says, you should be doing a check to ensure that the protocol you're requesting matches that of the proxy. The reason the proxies are specified as a dictionary is to allow lookup of the matching protocol.
There are two obvious problems right here:
data=requests.get(url, proxies={'http':line})
First, because you have a for line in urls: inside the for line in proxies:, line is going to be the current URL here, not the current proxy. And besides, even if you weren't reusing line, it would be the JSON string representation, not the dict you decoded from JSON.
Then, if you fix that to use proxy, instead of something like {'https': '83.22.41.1:3128'}, you're passing {'http': {'https': '83.22.41.1:3128'}}. And that obviously isn't a valid value.
To fix both of those problems, just do this:
data=requests.get(url, proxies=proxy)
Meanwhile, what happens when you have an HTTPS URL, but the current proxy is an HTTP proxy? You're not going to use the proxy. So you probably want to add something to skip over them, like this:
if urlparse.urlparse(url).scheme not in proxy:
    continue
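Putting both fixes together, the corrected loop would look roughly like this (a sketch based on the code from the question, untested):

import requests
import json
import urlparse

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        with open('urls.txt') as urls:
            for url_line in urls:
                url = url_line.rstrip()
                # skip proxies that don't speak this URL's protocol
                if urlparse.urlparse(url).scheme not in proxy:
                    continue
                data = requests.get(url, proxies=proxy)
                print data.text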
Directly copied from another answer of mine.
Well, actually you can, I've done this with a few lines of code and it works pretty well.
import requests


class Client:

    def __init__(self):
        self._session = requests.Session()
        self.proxies = None

    def set_proxy_pool(self, proxies, auth=None, https=True):
        """Randomly choose a proxy for every GET/POST request

        :param proxies: list of proxies, like ["ip1:port1", "ip2:port2"]
        :param auth: if proxy needs auth
        :param https: default is True, pass False if you don't need https proxy
        """
        from random import choice

        if https:
            self.proxies = [{'http': p, 'https': p} for p in proxies]
        else:
            self.proxies = [{'http': p} for p in proxies]

        def get_with_random_proxy(url, **kwargs):
            proxy = choice(self.proxies)
            kwargs['proxies'] = proxy
            if auth:
                kwargs['auth'] = auth
            return self._session.original_get(url, **kwargs)

        def post_with_random_proxy(url, *args, **kwargs):
            proxy = choice(self.proxies)
            kwargs['proxies'] = proxy
            if auth:
                kwargs['auth'] = auth
            return self._session.original_post(url, *args, **kwargs)

        self._session.original_get = self._session.get
        self._session.get = get_with_random_proxy
        self._session.original_post = self._session.post
        self._session.post = post_with_random_proxy

    def remove_proxy_pool(self):
        self.proxies = None
        self._session.get = self._session.original_get
        self._session.post = self._session.original_post
        del self._session.original_get
        del self._session.original_post

    # You can define whatever operations using self._session
I use it like this:
client = Client()
client.set_proxy_pool(['112.25.41.136', '180.97.29.57'])
It's simple, but actually works for me.
I am using Python's requests library in one method of my application. The body of the method looks like this:
def handle_remote_file(url, **kwargs):
    response = requests.get(url, ...)

    buff = StringIO.StringIO()
    buff.write(response.content)

    ...
    return True
I'd like to write some unit tests for that method; however, what I want to do is pass a fake local URL such as:
class RemoteTest(TestCase):

    def setUp(self):
        self.url = 'file:///tmp/dummy.txt'

    def test_handle_remote_file(self):
        self.assertTrue(handle_remote_file(self.url))
When I call requests.get with a local URL, I get the KeyError exception below:
requests.get('file:///tmp/dummy.txt')
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/packages/urllib3/poolmanager.pyc in connection_from_host(self, host, port, scheme)
76
77 # Make a fresh ConnectionPool of the desired type
78 pool_cls = pool_classes_by_scheme[scheme]
79 pool = pool_cls(host, port, **self.connection_pool_kw)
80
KeyError: 'file'
The question is: how can I pass a local URL to requests.get?
PS: I made up the above example. It possibly contains many errors.
As @WooParadog explained, the requests library doesn't know how to handle local files. However, the current version allows you to define transport adapters.
Therefore you can simply define your own adapter which will be able to handle local files, e.g.:
import os

import requests
from requests_testadapter import Resp


class LocalFileAdapter(requests.adapters.HTTPAdapter):
    def build_response_from_file(self, request):
        file_path = request.url[7:]
        with open(file_path, 'rb') as file:
            buff = bytearray(os.path.getsize(file_path))
            file.readinto(buff)
            resp = Resp(buff)
            r = self.build_response(request, resp)
            return r

    def send(self, request, stream=False, timeout=None,
             verify=True, cert=None, proxies=None):
        return self.build_response_from_file(request)


requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
requests_session.get('file://<some_local_path>')
I'm using the requests-testadapter module in the above example.
Here's a transport adapter I wrote which is more featureful than b1r3k's and has no additional dependencies beyond Requests itself. I haven't tested it exhaustively yet, but what I have tried seems to be bug-free.
import requests
import os, sys

if sys.version_info.major < 3:
    from urllib import url2pathname
else:
    from urllib.request import url2pathname


class LocalFileAdapter(requests.adapters.BaseAdapter):
    """Protocol Adapter to allow Requests to GET file:// URLs

    @todo: Properly handle non-empty hostname portions.
    """

    @staticmethod
    def _chkpath(method, path):
        """Return an HTTP status for the given filesystem path."""
        if method.lower() in ('put', 'delete'):
            return 501, "Not Implemented"  # TODO
        elif method.lower() not in ('get', 'head'):
            return 405, "Method Not Allowed"
        elif os.path.isdir(path):
            return 400, "Path Not A File"
        elif not os.path.isfile(path):
            return 404, "File Not Found"
        elif not os.access(path, os.R_OK):
            return 403, "Access Denied"
        else:
            return 200, "OK"

    def send(self, req, **kwargs):  # pylint: disable=unused-argument
        """Return the file specified by the given request

        @type req: C{PreparedRequest}
        @todo: Should I bother filling `response.headers` and processing
               If-Modified-Since and friends using `os.stat`?
        """
        path = os.path.normcase(os.path.normpath(url2pathname(req.path_url)))
        response = requests.Response()

        response.status_code, response.reason = self._chkpath(req.method, path)
        if response.status_code == 200 and req.method.lower() != 'head':
            try:
                response.raw = open(path, 'rb')
            except (OSError, IOError) as err:
                response.status_code = 500
                response.reason = str(err)

        if isinstance(req.url, bytes):
            response.url = req.url.decode('utf-8')
        else:
            response.url = req.url

        response.request = req
        response.connection = self

        return response

    def close(self):
        pass
(Despite the name, it was completely written before I thought to check Google, so it has nothing to do with b1r3k's.) As with the other answer, follow this with:
requests_session = requests.session()
requests_session.mount('file://', LocalFileAdapter())
r = requests_session.get('file:///path/to/your/file')
The easiest way seems to be using requests-file.
https://github.com/dashea/requests-file (available through PyPI too)
"Requests-File is a transport adapter for use with the Requests Python library to allow local filesystem access via file:// URLs."
This in combination with requests-html is pure magic :)
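Based on the project's README, usage would look something like this (FileAdapter is the adapter class requests-file provides; the path is just an example):

import requests
from requests_file import FileAdapter

s = requests.Session()
s.mount('file://', FileAdapter())

# fetch a local file through the mounted adapter
resp = s.get('file:///tmp/dummy.txt')
print(resp.text)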
packages/urllib3/poolmanager.py pretty much explains it. Requests doesn't support local URLs.
pool_classes_by_scheme = {
    'http': HTTPConnectionPool,
    'https': HTTPSConnectionPool,
}
In a recent project, I had the same issue. Since requests doesn't support the "file" scheme, I decided to patch our code to load the content locally. First, I define a function to replace requests.get:
import six

def local_get(self, url):
    "Fetch a stream from local files."
    p_url = six.moves.urllib.parse.urlparse(url)
    if p_url.scheme != 'file':
        raise ValueError("Expected file scheme")

    filename = six.moves.urllib.request.url2pathname(p_url.path)
    return open(filename, 'rb')
Then, somewhere in test setup or decorating the test function, I use mock.patch to patch the get function on requests:
@mock.patch('requests.get', local_get)
def test_handle_remote_file(self):
    ...
This technique is somewhat brittle -- it doesn't help if the underlying code calls requests.request or constructs a Session and calls that. There may be a way to patch requests at a lower level to support file: URLs, but in my initial investigation, there didn't seem to be an obvious hook point, so I went with this simpler approach.
To load a file from a local URL, e.g. an image file, you can do this:

import urllib.request
from PIL import Image

Image.open(urllib.request.urlopen('file:///path/to/your/file.png'))
I think a simple solution for this is to create a temporary HTTP server using Python and use it:

1. Put all your files in a temporary folder, e.g. tempFolder.
2. Go to that directory and create a temporary HTTP server in the terminal/cmd as per your OS using the command python -m http.server 8000 (note: 8000 is the port number).
3. This will give you a link to the HTTP server; you can access it from http://127.0.0.1:8000/
4. Open your desired file in the browser and copy the link to use as your URL.
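If you want to do the same thing from inside a test instead of a separate terminal, running http.server in a background thread is a possible sketch (tempFolder and port 8000 are the same assumptions as above; the directory argument needs Python 3.7+):

import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

# serve ./tempFolder at http://127.0.0.1:8000/ in the background
handler = partial(SimpleHTTPRequestHandler, directory='tempFolder')
server = HTTPServer(('127.0.0.1', 8000), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# the test can now fetch files over plain HTTP, e.g.:
# requests.get('http://127.0.0.1:8000/dummy.txt')

# when finished:
# server.shutdown()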
So I'm working with a block of code which communicates with the Flickr API.
I'm getting a 'syntax error' in xml.parsers.expat.ExpatError (below). Now I can't figure out how it'd be a syntax error in a Python module.
I saw another similar question on SO regarding the Wikipedia API which seemed to return HTML instead of XML. The Flickr API returns XML; and I'm also getting the same error when there shouldn't be a response from Flickr (such as flickr.galleries.addPhoto).
CODE:
def _dopost(method, auth=False, **params):
    #uncomment to check you aren't killing the flickr server
    #print "***** do post %s" % method

    params = _prepare_params(params)
    url = '%s%s/%s' % (HOST, API, _get_auth_url_suffix(method, auth, params))
    payload = 'api_key=%s&method=%s&%s' % \
              (API_KEY, method, urlencode(params))

    #another useful debug print statement
    #print url
    #print payload

    return _get_data(minidom.parse(urlopen(url, payload)))
TRACEBACK:
Traceback (most recent call last):
  File "TESTING.py", line 30, in <module>
    flickr.galleries_create('test_title', 'test_descriptionn goes here.')
  File "/home/vlad/Documents/Computers/Programming/LEARNING/curatr/flickr.py", line 1006, in galleries_create
    primary_photo_id=primary_photo_id)
  File "/home/vlad/Documents/Computers/Programming/LEARNING/curatr/flickr.py", line 1066, in _dopost
    return _get_data(minidom.parse(urlopen(url, payload)))
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 62
(Code from http://code.google.com/p/flickrpy/ under New BSD licence)
UPDATE:
print urlopen(url, payload) == <addinfourl at 43340936 whose fp = <socket._fileobject object at 0x29400d0>>
Doing a urlopen(url, payload).read() returns HTML which is hard to read in a terminal :P but I managed to make out a 'You are not signed in.'
The strange part is that Flickr shouldn't return anything here, or if permissions are a problem, it should return a 99: User not logged in / Insufficient permissions error as it does with the GET function (which I'd expect would be in valid XML).
I'm signed in to Flickr (in the browser) and the program is properly authenticated with delete permissions (dangerous, but I wanted to avoid permission problems.)
SyntaxError normally means an error in Python syntax, but I think here expatbuilder is overloading it to mean an XML syntax error. Put a try/except block around the parse call, and print out the contents of the response being parsed to work out what's wrong with the first line of it.
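A rough sketch of that debugging approach, reusing the url and payload variables from _dopost above (urlopen here is urllib's; untested):

import xml.parsers.expat
from urllib import urlopen
from xml.dom import minidom

raw = urlopen(url, payload).read()
try:
    doc = minidom.parseString(raw)
except xml.parsers.expat.ExpatError:
    # show what Flickr actually sent back; line 1, column 62 of this is the culprit
    print raw[:500]
    raise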
My guess would be that Flickr is rejecting your request for some reason and giving back a plain-text error message which has an invalid XML character at column 62, but it could be any number of things. You probably want to check the HTTP status code before parsing it.
Also, it's a bit strange that this method is called _dopost but you seem to actually be sending an HTTP GET. Perhaps that's why it's failing.
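A possible sketch of that status check inside _dopost (getcode() is assumed to be available on the object urlopen returns; untested):

response = urlopen(url, payload)
status = response.getcode()
if status != 200:
    # surface the real problem instead of letting minidom choke on an error page
    raise RuntimeError('Flickr returned HTTP %s: %s' % (status, response.read()[:200]))
return _get_data(minidom.parse(response))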
This seems to fix my problem:
url = '%s%s/?api_key=%s&method=%s&%s' % \
      (HOST, API, API_KEY, method, _get_auth_url_suffix(method, auth, params))
payload = '%s' % (urlencode(params))
It seems that the API key and method had to be in the URL, not in the payload. (Or maybe only one needed to be there, but anyway, it works :-)