I am trying to create an http request to get some json data from a site online. When I set up the requests.get() function, it seems to be translating some of the special characters in the parameters to other values, causing the response to fail. Is there a way to control how the .get() is sent?
I'm trying to send this http request:
'https://registers.esma.europa.eu/solr/esma_registers_firds_files/select?q=*&fq=publication_date:%5B2020-05-10T00:00:00Z+TO+2020-05-10T23:59:59Z%5D&wt=json&indent=true&start=0&rows=100'
To do so, here is my code:
response = requests.get('https://registers.esma.europa.eu/solr/esma_registers_firds_files/select',
                        params={'q': '*',
                                'fq': 'publication_date:%5B2020-05-10T00:00:00Z+TO+2020-05-10T23:59:59Z%5D',
                                'wt': 'json',
                                'indent': 'true',
                                'start': 0,
                                'rows': 100})
However, this code seems to translate the '*' character and the ':' character into a different format, which means I'm getting a bad response code. Here is what I see when I inspect the .url attribute of the response:
response.url
https://registers.esma.europa.eu/solr/esma_registers_firds_files/select?q=%2A&fq=publication_date%3A%255B2020-05-10T00%3A00%3A00Z%2BTO%2B2020-05-10T23%3A59%3A59Z%255D&wt=json&indent=true&start=0&rows=100
You can see that the '*' in the 'q' param became '%2A', and the ':' in the 'fq' param became '%3A', etc.
I know the link works, because if I enter it directly into the requests.get(), I get the results I expect.
Is there a way to make it so that the special characters in the .get() don't change? I've been googling anything related to the requests module and character encoding, but haven't had any luck. I could just use the whole url each time I need it, but I think that using params is better practice. Any help would be much appreciated. Thanks!
That's not actually the problem. The conversion you're seeing is supposed to happen. It's called URL encoding.
The problem is in the publication_date value. See the %5B and %5D and the + signs?
'fq':'publication_date:%5B2020-05-10T00:00:00Z+TO+2020-05-10T23:59:59Z%5D'
                       ^^^                    ^  ^                    ^^^
I don't know where you got this string, but this string has already gone through URL encoding. The %5B, %5D, and + are encoded forms of [, ], and space. You need to provide unencoded values, like this:
'fq':'publication_date:[2020-05-10T00:00:00Z TO 2020-05-10T23:59:59Z]'
requests will handle the encoding.
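To make the double encoding visible, here is a minimal sketch. It uses the standard library's urllib.parse.urlencode as a stand-in for the encoding step (an assumption made so the example runs without a network call or the requests package; it reproduces the query string shown in the question). The variable names are illustrative only.

```python
from urllib.parse import urlencode, parse_qs

# Unencoded value, as recommended above.
raw = {"q": "*", "fq": "publication_date:[2020-05-10T00:00:00Z TO 2020-05-10T23:59:59Z]"}
# Value that has already been URL-encoded once, as in the question.
pre_encoded = {"q": "*", "fq": "publication_date:%5B2020-05-10T00:00:00Z+TO+2020-05-10T23:59:59Z%5D"}

good_query = urlencode(raw)          # encoded exactly once
bad_query = urlencode(pre_encoded)   # '%' becomes %25 and '+' becomes %2B

print(good_query)
print(bad_query)

# Decoding once only recovers the intended filter for the unencoded input.
print(parse_qs(good_query)["fq"][0])
print(parse_qs(bad_query)["fq"][0])
```

The second query carries %255B and %2B, so the server decodes it back to the literal text %5B and + rather than to [ and a space.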
I'm in the process of building a small (python) tool to retrieve orders from my SW5 store via API.
For this, I combined the username and the API key into a string (separated by ":") and then converted the whole thing as a bytestring. This bytestring was then base64 "encoded" and specified as a header as follows:
`
import requests

def get_order_shopware5():
    header = {"Authorization": "Basic NjE2NDZkNjk2ZTNhNTM2ZTY1NzA0OTZlNmI2YzRhNjQ2YzY0NTA1MTM1Mzg0NjdhN2E0ODRlMzk3OTZiNGU2NDZlNzA2ODM1Nzk2YzU0NWEzODM2NjQ1MDZkNTM"}
    print(header)
    res = requests.get("https://shopname.de/api/orders", headers=header)
    print(res.content)
`
But when I call the function, I always get a
"b'{"success":false, "message": "Invalid or missing auth"}'"
as a response.
When I manually access www.shopname.de/api/orders via the browser and enter the credentials, everything works fine. So I'm assuming that something is wrong with the syntax or the credential conversion. But I can't figure out what exactly.
I am grateful for any hints!
Greetz,
Lama
P.S.:
I've tried multiple versions of the header syntax as well as different ways of converting the original string to a bytestring (with/without whitespace, with/without using full bytes). But nothing has worked so far.
If anyone is interested:
The first thing is an error in the URL -> "https://shopname.de/api/orders/" instead of "https://shopname.de/api/orders". The devil is in the detail :D.
The second thing is a slight confusion by the Shopware documentation. It says:
1. Combine user and key in a string - separated by ":".
2. Convert this to an octet (byte) string
3. Convert the resulting string to a base64 string and prepend "Basic ".
4. Create a header like this -> Authorization : Basic [KEY_VALUE]
3/4 are correct. If you skip step 2 everything works fine.
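A minimal sketch of the working recipe, assuming that "skipping step 2" means not turning the text into a hex/octet representation before base64. In Python the string still has to be encoded to bytes, because base64.b64encode only accepts bytes. The function name and the demo credentials are hypothetical.

```python
import base64

def basic_auth_header(user: str, api_key: str) -> dict:
    """Build an HTTP Basic Authorization header from a user name and API key."""
    # Step 1: join user and key with ":".
    credentials = f"{user}:{api_key}"
    # Step 3: base64-encode the UTF-8 bytes directly (no hex conversion first).
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    # Step 4: prepend "Basic " and put it in the Authorization header.
    return {"Authorization": f"Basic {token}"}

header = basic_auth_header("demo", "secret")  # placeholder credentials
print(header)
```

If the requests library is used, passing auth=(user, api_key) to requests.get builds the same kind of header without manual encoding.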
Greetz
I have a URL whose trailing part needs to be percent-encoded (ASCII/UTF-8):
a='sAE3DSRAfv+HG='
I need to convert the above to this:
a='sAE3DSRAfv%2BHG%3D'
I searched but was not able to find out how to do it.
Please see the built-in function urllib.parse.quote().
A very important requirement for a URL is its safe transmission. Its meaning must not change between the moment you create it and the moment the intended receiver gets it. URL encoding was introduced to achieve that. See RFC 2396.
A URL might contain non-ASCII characters, as in cafés or López, or symbols that have a special meaning in the context of a URL, for example #, which introduces a fragment (bookmark). To transmit such characters safely, the standards require that you quote the URL at the point of origin, so that everyone else only ever sees it in quoted form.
Sample usage is shown below.
>>> import urllib.parse
>>> a='sAE3DSRAfv+HG='
>>> urllib.parse.quote(a)
'sAE3DSRAfv%2BHG%3D'
>>>
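As a small extension of the sample above, this sketch shows how the safe argument and the quote_plus variant change the result. The input strings other than the one from the question are made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote

token = "sAE3DSRAfv+HG="

encoded = quote(token)             # '+' and '=' are percent-encoded
path_like = quote("a/b c")         # '/' is left alone by default (safe="/")
strict = quote("a/b c", safe="")   # with an empty safe set, '/' is encoded too
form_style = quote_plus("a b+c")   # space becomes '+', a literal '+' becomes %2B

print(encoded, path_like, strict, form_style)
print(unquote(encoded))            # decoding gives back the original token
```

quote() is the usual choice for path segments, while quote_plus() follows the form-style convention used in query strings.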
I'm trying to get a message by its Message-ID. The Gmail API has no get() method to pass the Message-ID in, so I have to list() first passing the q parameter as given below:
q="rfc822msgid:%s" % message_id
The response brings a list with a single message, just as hoped. Then I use the get() method to retrieve the message by its Google style identifier. This works like a charm, unless the Message-ID contains a + character:
message_id="a+b@c"
In this case, the Google Api Client requests this URL:
url="https://www.googleapis.com/gmail/v1/users/me/messages?q=rfc822msgid%3Aa+b%40c&alt=json"
I think the client is doing a quote_plus() with safe="+" to avoid encoding the + character. But this causes a problem in the cases described above, because the server interprets the + character as a space, so the Message-ID is no longer valid:
message_id="a b@c"
I tried to replace the + character with its quoted representation (%2B), but when the client encodes the URL, the Message-ID becomes even worse because it is quoted twice, as in quote(quote()):
message_id="a%252Bb%40c"
So, is there a way to send the + character without the server decoding it as a space?
Thanks in advance.
EDIT: I was working on the solutions suggested in the comments with no positive result. But a few days ago, my original code started to work. I have not changed a single line, so I think Google has fixed something related to this. Thanks for the comments.
URLEncoder.encode("+", "UTF-8"); yields "%2B"
Replace "+" with your query parameter, i.e.
URLEncoder.encode("rfc822msgid:", "UTF-8");
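The same idea can be sketched in Python with urllib.parse. This only illustrates the encoding rules; it does not show what the Google API client actually does internally, and the safe="+" call merely reproduces the behaviour guessed at in the question. The Message-ID a+b@c is the one implied by the request URL in the question.

```python
from urllib.parse import quote_plus, unquote_plus

query = "rfc822msgid:a+b@c"

# Behaviour hypothesised in the question: '+' is treated as safe and left as-is.
leaky = quote_plus(query, safe="+")
# Encoding '+' as %2B lets it survive form-style decoding on the server.
strict = quote_plus(query)
# Replacing '+' by hand and then encoding again double-encodes the '%'.
double = quote_plus("rfc822msgid:a%2Bb@c", safe="+")

for label, value in (("leaky", leaky), ("strict", strict), ("double", double)):
    print(label, value, "->", unquote_plus(value))
```

Only the strict variant decodes back to the original query; the leaky one turns + into a space and the double-encoded one leaves a literal %2B in the Message-ID.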
I have to verify that a list of strings is present in a response to a SOAP request. I am using the Pylot testing tool. I know that if I use a plain string inside a <verify>abcd</verify> element it works fine. I have to use a regex though, and I am running into problems because I am not good with regex.
I have to verify if <TestName>Abcd Hijk</TestName> is present in my response for the request sent.
Following is my attempt to write the regex inside testcases.xml
<verify>[.TestName.][\w][./TestName.]</verify>
Is this the correct way to write a regex in testcases.xml file? I want to exactly verify the tagnames and its values mentioned above.
When I run the tool, it gives me no errors. But if I change the characters to <verify>[.TesttttName.][\w][./TestttttName.]</verify> and run the tool, it still runs without giving errors, although this should be a failed run since no such tags are present in the response!
Could someone please tell me what I am doing wrong in the regex here?
Any help would be appreciated. Thanks!
The regex used should be like the following.
<verify><TestName>[\w\s]+</TestName></verify>
The reason is that Pylot holds the response content as plain text, so the relevant part of the response looks like this:
.......<TestName>ABCd Hijk</TestName>.....
When Pylot parses the verify element in testcases.xml, it takes the element's value as text and then searches the response it received for that 'verify text'.
Hence, whenever we want to verify anything in Pylot using a regex, we need to write the regex in the above format to get the required results.
Note: Be careful about the format of the response received. To view the response, enable Log Messages in the tool or, if you want to see it on the console, edit the tool's engine.py module and insert print statements.
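To see why the original attempt never failed, here is a minimal sketch using Python's re module. It assumes Pylot applies the verify text with something like re.search, which is not confirmed here; the response string is a made-up fragment. Square brackets define a character class, so [.TestName.] matches a single character out of that set rather than the tag name.

```python
import re

response = "<Result><TestName>Abcd Hijk</TestName></Result>"  # hypothetical response fragment

attempt = r"[.TestName.][\w][./TestName.]"            # three character classes
typo_attempt = r"[.TesttttName.][\w][./TestttttName.]"  # same character sets, so same behaviour
fixed = r"<TestName>[\w\s]+</TestName>"               # literal tags around the value

for pattern in (attempt, typo_attempt, fixed):
    match = re.search(pattern, response)
    print(pattern, "->", match.group(0) if match else None)

# The fixed pattern rejects a response with a different tag name.
print(re.search(fixed, "<TestttName>Abcd</TestttName>"))
```

Both bracketed attempts match any run of three suitable characters, which is why changing the spelling inside the brackets made no difference.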
The raw regular expression (no XML escape). I assume you want to accept English alphabet a-zA-Z, digits 0-9, underscore _ and space characters (space, new line, carriage return, and a few others - check documentation for details).
<TestName>[\w\s]+</TestName>
You need to escape the < and > to use it inside the <verify> tag:
&lt;TestName&gt;[\w\s]+&lt;/TestName&gt;
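A short sketch of the escaping step using only the Python standard library. It shows that the escaped text, once placed inside a verify element and parsed as XML, comes back as the raw regular expression. The element is built by hand for illustration and is not an actual Pylot test case file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

pattern = r"<TestName>[\w\s]+</TestName>"

# escape() replaces &, < and > with their XML entities.
escaped = escape(pattern)
print(escaped)

# Parsing the element gives back the original, unescaped pattern.
element = ET.fromstring(f"<verify>{escaped}</verify>")
recovered = element.text
print(recovered)
```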
I have a URL in this form: http:\\/\\/en.wikipedia.org\\/wiki\\/The_Truman_Show. How can I turn it into a normal URL? I have tried using urllib.unquote without much success.
I can always use regular expressions or some simple string replacement, but I believe there is a better way to handle this...
urllib.unquote is for replacing %xx escape codes in URLs with the characters they represent. It won't be useful for this.
Your "simple string replace stuff" is probably the best solution.
Have you tried using json.loads from the json module?
>>> import json
>>> json.loads('"http:\\/\\/en.wikipedia.org\\/wiki\\/The_Truman_Show"')
'http://en.wikipedia.org/wiki/The_Truman_Show'
The input that I'm showing isn't exactly what you have. I've wrapped it in double quotes to make it valid json.
When you first get it from the json, how are you decoding it? That's probably where the problem is.
There is no need to look for a library function when you can transform the URL yourself.
Since there is no visible rule other than "/" being replaced by "\/", you can simply replace it back:
def unescape_this(url):
    # r"\\/" is two literal backslashes followed by a slash, as the URL is written in the question
    return url.replace(r"\\/", "/")
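Because it is not clear from the question whether the real string holds one or two backslashes before each slash, a slightly more tolerant variant can be sketched with re.sub. The function name is hypothetical, and this is just one way to do it under that assumption.

```python
import re

def unescape_slashes(url: str) -> str:
    """Remove any run of backslashes that directly precedes a forward slash."""
    return re.sub(r"\\+/", "/", url)

double_escaped = r"http:\\/\\/en.wikipedia.org\\/wiki\\/The_Truman_Show"
single_escaped = r"http:\/\/en.wikipedia.org\/wiki\/The_Truman_Show"

print(unescape_slashes(double_escaped))
print(unescape_slashes(single_escaped))
```

If the string originally came out of a JSON document, decoding it with json.loads at that point avoids the problem altogether.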