GAE Python URL Fetch throws InvalidURLError [duplicate] - python

This question already has an answer here:
Python GAE urlfetch credentials
(1 answer)
Closed 7 years ago.
GAE Python URL Fetch throws InvalidURLError, while the same URL works perfectly with Postman (a Google Chrome app).
CODE
url = "https://abcdefgh:28dfd95928dfd95928dfd95928dfd95928dfd95928dfd959#twilix.exotel.in/v1/Accounts/abcdefgh/Sms/send"
form_fields = {
"From": "08039511111",
"To": "+919844100000",
"Body": "message for you"
}
form_data = urllib.urlencode (form_fields)
try:
result = urlfetch.fetch(url=url,
payload=form_data,
method=urlfetch.POST,
headers={'Content-Type': 'application/x-www-form-urlencoded' }
)
logging.info ("result = ["+repr (result)+"] ")
except Exception:
logging.error ("Exception. ["+traceback.format_exc ()+"] ")
OUTPUT LOGS
2016-01-21 15:48:23.368 +0530 E Exception. [
Traceback (most recent call last):
  File "main.py", line 27, in get
    method=urlfetch.POST,
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/urlfetch.py", line 271, in fetch
    return rpc.get_result()
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/urlfetch.py", line 389, in _get_fetch_result
    'Invalid request URL: ' + url + error_detail)
InvalidURLError: Invalid request URL: https://abcdefgh:28dfd95928dfd95928dfd95928dfd95928dfd95928dfd959#twilix.exotel.in/v1/Accounts/abcdefgh/Sms/send ]
For security purposes, I have replaced sensitive text in the URL with similar but different characters.

The code indicates an INVALID_URL RPC error code was received from the urlfetch service.
The most common occurrence seems to be the URL length limit (check whether your unedited URL hits it): Undocumented max length for urlfetch URL?
A long time ago it was also seen for very slow URLs (in Go land, but I suspect the same urlfetch service serves all language sandboxes). I'm unsure if this still stands; I also see a DEADLINE_EXCEEDED error code, which may have been introduced for that case in the meantime: Google App Engine Go HTTP request to a slow page
The failure might also be related to incorrect parsing of the rather unusual "host" portion of your URL, foo:blah#hostname. Check whether you get the same error after dropping the foo:blah# portion. If that is indeed the case, you might want to file an issue with Google - the URL seems valid and works with curl as well.
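A quick, hedged experiment to test these hypotheses: retry the fetch with the credential portion of the URL removed and a generous deadline, and log the URL length. If the InvalidURLError disappears (you would then expect a 401 from the endpoint instead), the userinfo parsing is the culprit. A minimal sketch, reusing form_data from the question; the stripped URL and the length threshold are assumptions, not documented values:

import logging

from google.appengine.api import urlfetch

# Hypothetical test URL: same endpoint, but the user:pass portion stripped out.
stripped_url = "https://twilix.exotel.in/v1/Accounts/abcdefgh/Sms/send"

# The urlfetch URL length limit is undocumented; ~2048 characters is a commonly cited guess.
logging.info("URL length: %d characters", len(stripped_url))

result = urlfetch.fetch(
    url=stripped_url,
    payload=form_data,
    method=urlfetch.POST,
    deadline=60,  # generous deadline to rule out the slow-URL / DEADLINE_EXCEEDED scenario
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
)
logging.info("status=%s", result.status_code)  # a 401 here (rather than InvalidURLError) points at the URL credentials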

I found the problem and the solution.
We need to specify the HTTP auth info using headers.
urlfetch.make_fetch_call(rpc,
                         url,
                         method=urlfetch.POST,
                         headers={"Authorization": "Basic %s" % base64.b64encode(URL_USERNAME + ":" + URL_PASSWORD)},
                         )
Courtesy
https://stackoverflow.com/a/8454580/1443563 by raugfer
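For completeness, a minimal end-to-end sketch of that approach. URL_USERNAME and URL_PASSWORD stand for the credentials that were previously embedded in the URL, form_data is as built in the question, and the deadline value is only an example:

import base64
import logging

from google.appengine.api import urlfetch

url = "https://twilix.exotel.in/v1/Accounts/abcdefgh/Sms/send"  # credentials removed from the URL

rpc = urlfetch.create_rpc(deadline=60)
urlfetch.make_fetch_call(
    rpc,
    url,
    payload=form_data,
    method=urlfetch.POST,
    headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Authorization": "Basic %s" % base64.b64encode(URL_USERNAME + ":" + URL_PASSWORD),
    },
)
result = rpc.get_result()  # blocks until the asynchronous fetch completes
logging.info("status=%s body=%s", result.status_code, result.content)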

Related

Status parameter not working when using python blogger api

I'm trying to use google-api-python-client 1.12.5 with service account auth under Python 3.8. It seems to me that when specifying the status parameter, Google responds with a 404 HTTP code. I can't figure out why. I also looked in the docs but I can't relate anything to this error.
I have pasted my code. The error is happening in the third call.
This is the code:
from google.oauth2 import service_account
from googleapiclient.discovery import build
SCOPES = ['https://www.googleapis.com/auth/blogger']
SERVICE_ACCOUNT_FILE = 'new_service_account.json'
BLOG_ID = '<your_blog_id>'
credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
service = build('blogger', 'v3', credentials=credentials)
p = service.posts()
# FIRST
promise = p.list(blogId=BLOG_ID)
result = promise.execute()
# SECOND
promise = p.list(blogId=BLOG_ID, orderBy='UPDATED')
result = promise.execute()
#THIRD
promise = p.list(blogId=BLOG_ID, orderBy='UPDATED', status='DRAFT')
result = promise.execute() # <===== ERROR HAPPENS HERE!!!!
service.close()
And this is the traceback:
Traceback (most recent call last):
File "/home/madtyn/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/202.7660.27/plugins/python/helpers/pydev/pydevd.py", line 1448, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/madtyn/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/202.7660.27/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/madtyn/PycharmProjects/blogger/main.py", line 24, in <module>
result = promise.execute()
File "/home/madtyn/venvs/blogger/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/madtyn/venvs/blogger/lib/python3.8/site-packages/googleapiclient/http.py", line 915, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 404 when requesting https://blogger.googleapis.com/v3/blogs/<blog_id>/posts?orderBy=UPDATED&status=DRAFT&alt=json returned "Not Found">
python-BaseException
I can reproduce this issue... Adding status=DRAFT returns 404, but any other filter works...
Tried with a service account and your code: 404
Tried with an API key like this: result = requests.get('https://blogger.googleapis.com/v3/blogs/<blog_id>/posts?status=DRAFT&orderBy=UPDATED&alt=json&key=<api_key>'): 404
Extracted "access_token" from the service account (credentials.token after a call): result = requests.get('https://blogger.googleapis.com/v3/blogs/<blog_id>/posts?status=DRAFT&orderBy=UPDATED&alt=json&access_token=<extracted_service_account_token>'): 404
But very strangely, if I use the access_token given by "Try this API" here: https://developers.google.com/blogger/docs/3.0/reference/posts/list?apix_params={"blogId"%3A"blog_id"%2C"orderBy"%3A"UPDATED"%2C"status"%3A["DRAFT"]%2C"alt"%3A"json"} it works!
Using that token with requests gives me my blog posts in draft status...
Just copy/paste the raw Authorization header into this script:
import requests

blog_id = '<blog_id>'
headers = {
    'Authorization': 'Bearer <replace_here>'
}

# Using only the Authorization header
result = requests.get(
    'https://blogger.googleapis.com/v3/blogs/%s/posts?status=DRAFT&orderBy=UPDATED&alt=json' % (blog_id),
    headers=headers
)
print(result)
# This should print DRAFT if you have at least one draft post
print(result.json()['items'][0]['status'])

# Using the "access_token" param built from the Authorization header, split to keep only the token
result = requests.get(
    'https://blogger.googleapis.com/v3/blogs/%s/posts?status=DRAFT&orderBy=UPDATED&alt=json&access_token=%s'
    % (blog_id, headers['Authorization'][len('Bearer '):])
)
print(result)
# This should print DRAFT if you have at least one draft post
print(result.json()['items'][0]['status'])
Results I have currently:
The bug doesn't seem to come from the library but rather from the token's rights... However, I also used the console normally to generate credentials, like you.
To conclude, I think it's either a bug or intentional on Google's side... I don't know how long the "Try this API" token stays valid, but it is currently the only way I found to get the draft posts... Maybe you can try to open a bug ticket, but I don't know exactly where that is possible.
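Since the "Try this API" widget uses an end-user OAuth token rather than a service-account token, one way to reproduce its behaviour in a script is to authorize as a regular user. This is only a hedged sketch: it assumes google-auth-oauthlib is installed and that a client_secrets.json OAuth client file exists, and it merely swaps the credential type rather than being a confirmed fix for the draft visibility issue:

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/blogger']
BLOG_ID = '<your_blog_id>'

# Runs a local browser consent flow and returns end-user credentials,
# i.e. the same kind of token the "Try this API" widget uses.
flow = InstalledAppFlow.from_client_secrets_file('client_secrets.json', scopes=SCOPES)
credentials = flow.run_local_server(port=0)

service = build('blogger', 'v3', credentials=credentials)
result = service.posts().list(blogId=BLOG_ID, orderBy='UPDATED', status='DRAFT').execute()
print([post['status'] for post in result.get('items', [])])
service.close()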

SharePlum error : "Can't get User Info List"

I'm trying to use SharePlum, which is a Python module for SharePoint, but when I try to connect to my SharePoint, SharePlum raises this error:
Traceback (most recent call last):
  File "C:/Users/me/Desktop/Sharpoint/sharpoint.py", line 13, in <module>
    site = Site(sharepoint_url, auth=auth)
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\shareplum\shareplum.py", line 46, in __init__
    self.users = self.GetUsers()
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\shareplum\shareplum.py", line 207, in GetUsers
    raise Exception("Can't get User Info List")
Exception: Can't get User Info List
Here is the very short code that I have written:
from requests_ntlm import HttpNtlmAuth
from shareplum import Site

auth = HttpNtlmAuth(username, password)
site = Site(sharepoint_url, auth=auth)
This error seems to indicate a bad username/password, but I'm pretty sure the ones I have are correct...
OK, it seems that I found the solution to my problem: it's about the SharePoint URL that I gave.
If we take this example: https://www.mysharepoint.com/Your/SharePoint/DocumentLibrary
You have to remove the last part: /DocumentLibrary.
Why remove this part precisely?
In fact, when you go deep enough into your SharePoint, your URL will look something like: https://www.mysharepoint.com/Your/SharePoint/DocumentLibrary/Forms/AllItems.aspx?RootFolder=%2FYour%2FSharePoint%2DocumentLibrary%2FmyPersonnalFolder&FolderCTID=0x0120008BBC54784D92004D1E23F557873CC707&View=%7BE149526D%2DFD1B%2D4BFA%2DAA46%2D90DE0770F287%7D
You can see that the right-hand part of the path now lives in RootFolder=%2FYour%2FSharePoint%2DocumentLibrary%2Fmy%20personnal%20folder and not in the "normal" URL anymore (if it were, it would look like https://www.mysharepoint.com/Your/SharePoint/DocumentLibrary/myPersonnalFolder/).
What you have to remove is the end of the "normal" URL, so in this case /DocumentLibrary.
So the correct SharePoint URL to give SharePlum is https://www.mysharepoint.com/Your/SharePoint/
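In code, the difference is just the URL passed to Site(); a minimal sketch (credentials and URLs are placeholders taken from the example above):

from requests_ntlm import HttpNtlmAuth
from shareplum import Site

auth = HttpNtlmAuth(username, password)

# Fails with "Can't get User Info List": the URL points at the document library
# site = Site('https://www.mysharepoint.com/Your/SharePoint/DocumentLibrary', auth=auth)

# Works: the URL stops at the site level
site = Site('https://www.mysharepoint.com/Your/SharePoint/', auth=auth)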
I'm pretty new to SharePoint, so I'm not really sure this is the right answer for everyone else; maybe someone who knows SharePoint better than me can confirm?
I know this is not an actual solution to your problem and I would have just added a comment, but it was too long, so I'm posting it as an answer.
I can't replicate your issue, but by looking into the source code of shareplum.py you can see why the program throws the error. On line 196 of shareplum.py there is an if clause (if response.status_code == 200:) which checks whether the request to your SharePoint URL succeeded (status code 200); if the request failed (any other status code), it raises the exception (Can't get User Info List). If you want to find out more about your problem, go to your shareplum.py file ("C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\shareplum\shareplum.py") and add this line before line 207 (raise Exception("Can't get User Info List")):
print('{} {} Error: {} for url: {}'.format(response.status_code, 'Client'*(400 <= response.status_code < 500) + 'Server'*(500 <= response.status_code < 600), response.reason, response.url))
Then your shareplum.py should look like this:
# Parse Response
if response.status_code == 200:
    envelope = etree.fromstring(response.text.encode('utf-8'))
    listitems = envelope[0][0][0][0][0]
    data = []
    for row in listitems:
        # Strip the 'ows_' from the beginning with key[4:]
        data.append({key[4:]: value for (key, value) in row.items() if key[4:]})
    return {'py': {i['ImnName']: i['ID']+';#'+i['ImnName'] for i in data},
            'sp': {i['ID']+';#'+i['ImnName']: i['ImnName'] for i in data}}
else:
    print('{} {} Error: {} for url: {}'.format(response.status_code, 'Client'*(400 <= response.status_code < 500) + 'Server'*(500 <= response.status_code < 600), response.reason, response.url))
    raise Exception("Can't get User Info List")
Now just run your program again and it should print out why it isn't working.
I know it is best not to change files in installed Python modules, but if you know what you're changing there is no problem; when you are finished, just delete the added line.
Also, once you know the status code you can look it up online; just type it into Google or check the List_of_HTTP_status_codes Wikipedia page.

Requests CookieJar empty even though the page has it

I'm on Python 3.5.1, using requests, the relevant part of the code is as follows:
req = requests.post(self.URL, data={"username": username, "password": password})
self.cookies = {"MOODLEID1_": req.cookies["MOODLEID1_"], "MoodleSession": req.cookies["MoodleSession"]}
self.URL holds the correct page, and the POST is working as intended; I added some prints to check that, and it passed.
My output:
Traceback (most recent call last):
File "D:/.../main.py", line 14, in <module>
m.login('first.last', 'pa$$w0rd!')
File "D:\...\moodle2.py", line 14, in login
self.cookies = {"MOODLEID1_": req.cookies["MOODLEID1_"], "MoodleSession": req.cookies["MoodleSession"]}
File "D:\...\venv\lib\site-packages\requests\cookies.py", line 287, in __getitem__
return self._find_no_duplicates(name)
File "D:\...\venv\lib\site-packages\requests\cookies.py", line 345, in _find_no_duplicates
raise KeyError('name=%r, domain=%r, path=%r' % (name, domain, path))
KeyError: "name='MOODLEID1_', domain=None, path=None"
I'm trying to debug at runtime to check what req.cookies holds, but what I get is surprising, at least to me. With a breakpoint on self.cookies = {...}, running [(c.name, c.value, c.domain) for c in req.cookies] gives an empty list, as if there weren't any cookies in there.
The site does send cookies; checking with a Chrome extension, I found two, "MOODLEID1_" and "MoodleSession", so why am I not getting them?
The response doesn't appear to contain any cookies. Look for one or more Set-Cookie headers in req.headers.
Cookies stored in a browser are there because a response included a Set-Cookie header for each of those cookies. You'll have to find what response the server sets those cookies with; apparently it is not this response.
If you need to retain those cookies (once set) across requests, do use a requests.Session() object; this'll retain any cookies returned by responses and send them out again as appropriate with new requests.
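A minimal sketch of that approach (LOGIN_URL and PROTECTED_URL are placeholders; the cookie names come from the question):

import requests

session = requests.Session()

# The login POST; any Set-Cookie headers in this response (or in the redirects
# it follows) are stored in session.cookies automatically.
login = session.post(LOGIN_URL, data={"username": username, "password": password})
print(login.headers.get("Set-Cookie"))   # raw header of the final response, if any
print(session.cookies.get_dict())        # e.g. {'MoodleSession': '...', 'MOODLEID1_': '...'}

# Subsequent requests on the same session re-send those cookies automatically.
page = session.get(PROTECTED_URL)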

Python: requests.exceptions.ConnectionError. Max retries exceeded with url

This is the script:
import requests
import json
import urlparse
from requests.adapters import HTTPAdapter
s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=1))
with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        with open('urls.txt') as urls:
            for line in urls:
                url = line.rstrip()
                data = requests.get(url, proxies=proxy)
                data1 = data.content
                print data1
                print {'http': line}
As you can see, it's trying to access a list of URLs through a list of proxies. Here is the urls.txt file:
http://api.exip.org/?call=ip
here is the proxies.txt file:
{"http":"http://107.17.92.18:8080"}
I got this proxy at www.hidemyass.com. Could it be a bad proxy? I have tried several and this is the result. Note: if you are trying to replicate this, you may have to update the proxy to a recent one at hidemyass.com. They seem to stop working eventually.
Here is the full error and traceback:
Traceback (most recent call last):
File "test.py", line 17, in <module>
data=requests.get(url, proxies=proxy)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 335, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 454, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 144, in resolve_redirects
allow_redirects=False,
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 438, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 327, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=u'219.231.143.96', port=18186): Max retries exceeded with url: http://www.google.com/ (Caused by <class 'httplib.BadStatusLine'>: '')
Looking at the stack trace you've provided, your error is caused by the httplib.BadStatusLine exception, which, according to the docs, is:
Raised if a server responds with a HTTP status code that we don’t understand.
In other words, whatever is returned (if anything is returned at all) by the proxy server cannot be parsed by httplib, which does the actual request.
From my experience with (writing) HTTP proxies I can say that some implementations may not follow the specs too strictly (the HTTP RFCs aren't easy reading, actually) or use hacks to work around old browsers that have flaws in their implementation.
So, answering this:
Could it be a bad proxy?
... I'd say that this is possible. The only real way to be sure is to see what is actually returned by the proxy server.
Try to debug it with a debugger, or grab a packet sniffer (something like Wireshark or Network Monitor) to analyze what happens on the network. Knowing what exactly is returned by the proxy server should give you the key to solving this issue.
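Short of a packet capture, a hedged first step is to catch the ConnectionError per proxy and print the underlying cause, so that a bad proxy is skipped instead of killing the whole run. A sketch based on the loop in the question (the timeout value is just an example):

import json
import requests

with open('proxies.txt') as proxies_file, open('urls.txt') as urls_file:
    urls = [u.rstrip() for u in urls_file]
    for line in proxies_file:
        proxy = json.loads(line)
        for url in urls:
            try:
                data = requests.get(url, proxies=proxy, timeout=10)
                print url, data.status_code
            except requests.exceptions.ConnectionError as e:
                # e.g. BadStatusLine: the proxy answered with something that is not valid HTTP
                print "proxy %r failed for %s: %r" % (proxy, url, e)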
Maybe you are overloading the proxy server by sending too many requests in a short period of time. You say that you got the proxy from a popular free proxy website, which means you're not the only one using that server and it's often under heavy load.
If you add some delay between your requests like this:
from time import sleep
[...]
data=requests.get(url, proxies=proxy)
data1=data.content
print data1
print {'http': line}
sleep(1)
(note the sleep(1), which pauses the execution of the code for one second)
Does it work?
def hello(self):
    self.s = requests.Session()
    self.s.headers.update({'User-Agent': self.user_agent})
    return True
Try this, it worked for me :)
This happens when you send too many requests to the public IP address of https://anydomainname.example.com/. As you can see, it is caused by something that blocks or disallows access to the public IP address mapped to https://anydomainname.example.com/. One workaround is the following Python script, which resolves the public IP address of the domain and adds that mapping to the /etc/hosts file.
import re
import socket
import subprocess
from typing import Tuple

ENDPOINT = 'https://anydomainname.example.com/'


def get_public_ip() -> Tuple[str, str, str]:
    """
    Command to get public_ip address of host machine and endpoint domain

    Returns
    -------
    my_public_ip : str
        Ip address string of host machine.
    end_point_ip_address : str
        Ip address of endpoint domain host.
    end_point_domain : str
        domain name of endpoint.
    """
    # bash_command = """host myip.opendns.com resolver1.opendns.com | \
    # grep "myip.opendns.com has" | awk '{print $4}'"""
    # bash_command = """curl ifconfig.co"""
    # bash_command = """curl ifconfig.me"""
    bash_command = """ curl icanhazip.com"""
    my_public_ip = subprocess.getoutput(bash_command)
    my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
    end_point_domain = (
        ENDPOINT.replace("https://", "")
        .replace("http://", "")
        .replace("/", "")
    )
    end_point_ip_address = socket.gethostbyname(end_point_domain)
    return my_public_ip, end_point_ip_address, end_point_domain


def set_etc_host(ip_address: str, domain: str) -> str:
    """
    A function to write mapping of ip_address and domain name in /etc/hosts.

    Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build

    Parameters
    ----------
    ip_address : str
        IP address of the domain.
    domain : str
        domain name of endpoint.

    Returns
    -------
    str
        Message to identify success or failure of the operation.
    """
    bash_command = """echo "{} {}" >> /etc/hosts""".format(ip_address, domain)
    output = subprocess.getoutput(bash_command)
    return output


if __name__ == "__main__":
    my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
    output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
    print("My public IP address:", my_public_ip)
    print("ENDPOINT public IP address:", end_point_ip_address)
    print("ENDPOINT Domain Name:", end_point_domain)
    print("Command output:", output)
You can call the above script before running your desired function :)
This happens when you overload the server with multiple requests. To get around it you can increase the time between requests, but the best fix in my case was to increase the number of retries for each request:
requests.adapters.DEFAULT_RETRIES = 5 # increase retries number
requests.get(url)
If this still doesn't help, you can find more approaches here.
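Note that DEFAULT_RETRIES is a module-level setting that affects every request. A more targeted pattern (not from the answer above, just a common alternative) is to mount an HTTPAdapter with a urllib3 Retry policy on a Session, so retries and back-off apply only to that session:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=5,                                # retry up to 5 times
    backoff_factor=0.5,                     # exponential back-off between attempts
    status_forcelist=[500, 502, 503, 504],  # also retry on these status codes
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get(url, timeout=10)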

XML parser syntax error

So I'm working with a block of code which communicates with the Flickr API.
I'm getting a 'syntax error' in xml.parsers.expat.ExpatError (below). Now I can't figure out how it'd be a syntax error in a Python module.
I saw another similar question on SO regarding the Wikipedia API, which seemed to return HTML instead of XML. The Flickr API returns XML, and I'm also getting the same error when there shouldn't be a response from Flickr at all (such as flickr.galleries.addPhoto).
CODE:
def _dopost(method, auth=False, **params):
    #uncomment to check you aren't killing the flickr server
    #print "***** do post %s" % method
    params = _prepare_params(params)
    url = '%s%s/%s' % (HOST, API, _get_auth_url_suffix(method, auth, params))
    payload = 'api_key=%s&method=%s&%s' % \
        (API_KEY, method, urlencode(params))

    #another useful debug print statement
    #print url
    #print payload

    return _get_data(minidom.parse(urlopen(url, payload)))
TRACEBACK:
Traceback (most recent call last):
File "TESTING.py", line 30, in <module>
flickr.galleries_create('test_title', 'test_descriptionn goes here.')
File "/home/vlad/Documents/Computers/Programming/LEARNING/curatr/flickr.py", line 1006, in galleries_create
primary_photo_id=primary_photo_id)
File "/home/vlad/Documents/Computers/Programming/LEARNING/curatr/flickr.py", line 1066, in _dopost
return _get_data(minidom.parse(urlopen(url, payload)))
File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
return expatbuilder.parse(file)
File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse
result = builder.parseFile(file)
File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 62
(Code from http://code.google.com/p/flickrpy/ under New BSD licence)
UPDATE:
print urlopen(url, payload) gives <addinfourl at 43340936 whose fp = <socket._fileobject object at 0x29400d0>>
Doing a urlopen(url, payload).read() returns HTML which is hard to read in a terminal :P but I managed to make out a 'You are not signed in.'
The strange part is that Flickr shouldn't return anything here, or if permissions are a problem, it should return a 99: User not logged in / Insufficient permissions error as it does with the GET function (which I'd expect would be in valid XML).
I'm signed in to Flickr (in the browser) and the program is properly authenticated with delete permissions (dangerous, but I wanted to avoid permission problems.)
SyntaxError normally means an error in Python syntax, but I think here expatbuilder is overloading it to mean an XML syntax error. Put a try/except block around the parse, and print out the contents of payload to work out what's wrong with the first line of it.
My guess would be that Flickr is rejecting your request for some reason and giving back a plain-text error message, which has an invalid XML character at column 62, but it could be any number of things. You probably want to check the HTTP status code before parsing it.
Also, it's a bit strange that this method is called _dopost but you actually seem to be sending an HTTP GET. Perhaps that's why it's failing.
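A hedged way to see what Flickr actually returns before handing it to minidom: read the status code and the raw body first, then try to parse. This sketch assumes the same urlopen the flickrpy module already uses, and reuses the url and payload built in _dopost:

from urllib import urlopen
from xml.dom import minidom
from xml.parsers.expat import ExpatError

response = urlopen(url, payload)   # url and payload as built in _dopost
print response.getcode()           # HTTP status code (available on Python 2.6+)
body = response.read()
print body[:200]                   # XML error document, or an HTML "You are not signed in." page?

try:
    doc = minidom.parseString(body)
except ExpatError as e:
    print "Not valid XML (%s); first line: %r" % (e, body.splitlines()[0])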
This seems to fix my problem:
url = '%s%s/?api_key=%s&method=%s&%s' % \
      (HOST, API, API_KEY, method, _get_auth_url_suffix(method, auth, params))
payload = '%s' % (urlencode(params))
It seems that the API key and method had to be in the URL, not in the payload. (Or maybe only one of them needed to be there, but anyway, it works :-)
