How to convert a MongoDB find() query to a PyMongo dictionary? - python

I am successfully using PyMongo to connect and query (or rather find()) documents in my MongoDB collection.
However, I'm running into an issue. Using the MongoDB interface, I can execute the command:
db.mycollection.find({ $and: [{filename: {$regex: '\\.part_of_my_name'} }, {bases: {$gt: 10000 } } ] } )
And it works great. However, when I try to execute this command in PyMongo, I see that the PyMongo declaration for find() requires a Python dict object only. It actually raises an exception if a string is provided.
So my question is: how can I convert the above JSON(?) string to a dictionary?
I tried building it manually, but it's overly complicated, and I'm wondering if there is a simple way to go from string to dictionary.

To go from a string to a dictionary, you can use json.loads:
>>> import json
>>> json.loads('{"key":"value"}')
{'key': 'value'}
However, you cannot always copy and paste a MongoDB shell command and expect it to be valid JSON. For example:
>>> json.loads('{$and: [{filename: {$regex: "\\.part_of_my_name"} }, {bases: {$gt: 10000 }}]}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
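For json.loads to accept the query, every key must be enclosed in double quotes, so $and needs to become "$and". Also, inside a regular (non-raw) Python string literal you have to double each backslash once more, because both Python and JSON treat the backslash as an escape character (kinda ugly, I know).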
The query in your example should look like this:
jsonString = '{"$and":[{"filename":{"$regex":"\\\\.part_of_my_name"}},{"bases":{"$gt":10000}}]}'
You can then use json.loads on it. However, with the keys quoted, the query is also a valid Python dict literal, so you could skip the string entirely and just use this:
jsonDict = {"$and":[{"filename":{"$regex":"\\.part_of_my_name"}},{"bases":{"$gt":10000}}]}
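As a quick check of the escaping rules above, the following sketch compares the two approaches. It assumes a raw string (r'...') is acceptable for holding the JSON text, which avoids the quadruple backslashes; the `collection` name in the final comment is hypothetical and no database connection is made.

```python
import json

# The Mongo shell query rewritten as strict JSON: keys are double-quoted and
# the backslash is doubled because JSON treats "\" as an escape character.
# The raw-string prefix stops Python from consuming the backslashes first.
json_string = r'{"$and": [{"filename": {"$regex": "\\.part_of_my_name"}}, {"bases": {"$gt": 10000}}]}'
query_from_json = json.loads(json_string)

# The same filter written directly as a Python dict literal.
query_literal = {
    "$and": [
        {"filename": {"$regex": r"\.part_of_my_name"}},
        {"bases": {"$gt": 10000}},
    ]
}

# Both routes should produce the same dictionary.
print(query_from_json == query_literal)

# Either dict could then be passed as the filter argument, for example
# collection.find(query_literal), where `collection` is a hypothetical
# pymongo Collection object.
```

Writing the dict literal directly is usually the simpler option; json.loads is mainly useful when the query really does arrive as text.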

Related

What's the most Pythonic way to parse out this value from a JSON-like blob?

See below. Given a well-known Google URL, I'm trying to retrieve data from that URL. That data will provide me another Google URL from which I can retrieve a list of JWKs.
>>> import requests, json
>>> open_id_config_url = 'https://ggp.sandbox.google.com/.well-known/openid-configuration'
>>> response = requests.get(open_id_config_url)
>>> response.status_code
200
>>> response.text
u'{\n "issuer": "https://www.stadia.com",\n "jwks_uri": "https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com",\n "claims_supported": [\n "iss",\n "aud",\n "sub",\n "iat",\n "exp",\n "s_env",\n "s_app_id",\n "s_gamer_tag",\n "s_purchase_country",\n "s_current_country",\n "s_session_id",\n "s_instance_ip",\n "s_restrict_text_chat",\n "s_restrict_voice_chat",\n "s_restrict_multiplayer",\n "s_restrict_stream_connect",\n ],\n "id_token_signing_alg_values_supported": [\n "RS256"\n ],\n}'
Above I have successfully retrieved the data from the first URL. I can see the entry jwks_uri contains the second URL I need. But when I try to convert that blob of text to a python dictionary, it fails.
>>> response.json()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/saqib.ali/saqib-env-99/lib/python2.7/site-packages/requests/models.py", line 889, in json
self.content.decode(encoding), **kwargs
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
>>> json.loads(response.text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The only way I can get out the JWKs URL is by doing this ugly regular expression parsing:
>>> re.compile('(?<="jwks_uri": ")[^"]+').findall(response.text)[0]
u'https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com'
Is there a cleaner, more Pythonic way to extract this string?
I really wish Google would send back a string that could be cleanly JSON-ified.
The returned JSON string is invalid because the last item of the dictionary ends with a trailing comma, which the json module cannot parse.
": [\n "RS256"\n ],\n}'
^^^
But ast.literal_eval can do that (Python syntax accepts lists and dicts that end with a comma). As long as the data contains no booleans or null values (true, false and null are not Python literals), this works and is Pythonic:
>>> ast.literal_eval(response.text)["jwks_uri"]
'https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com'
Your JSON is invalid because it has an extra comma after the last value in the claims_supported array.
I wouldn't necessarily recommend it, but you could use the similarity of JSON and Python syntax to parse this directly, since Python is much less picky:
ast.literal_eval(response.text)
As suggested in this answer, you can use a YAML parser to parse the JSON. It will tolerate the trailing comma as well as some other deviations from the JSON standard.
import yaml
d = yaml.safe_load(response.text)
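The trailing-comma workarounds discussed in these answers can be compared on a small sample. This is only a sketch: the sample text is a shortened stand-in for the real response, the jwks_uri value is a placeholder, and the regular-expression cleanup is a naive extra option (not from the answers) that would also alter any string value containing a comma followed by a closing bracket.

```python
import ast
import json
import re

# Shortened stand-in for the response body, with the same trailing commas.
sample = '''{
 "issuer": "https://www.stadia.com",
 "jwks_uri": "https://example.com/placeholder-jwks",
 "claims_supported": [
  "iss",
  "aud",
 ],
 "id_token_signing_alg_values_supported": [
  "RS256"
 ],
}'''

# Strict JSON parsing rejects the trailing commas.
try:
    json.loads(sample)
    strict_ok = True
except ValueError:
    strict_ok = False

# Option 1: parse as a Python literal (fails on true/false/null).
from_literal = ast.literal_eval(sample)

# Option 2 (naive): drop commas that directly precede a closing ] or },
# then parse as ordinary JSON.
cleaned = re.sub(r",\s*([\]}])", r"\1", sample)
from_cleaned = json.loads(cleaned)

print(strict_ok)
print(from_literal["jwks_uri"])
print(from_literal == from_cleaned)
```

Unlike ast.literal_eval, the cleanup route keeps working when the payload contains true, false or null, at the cost of a regular expression that does not understand string contents.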

json.loads(string) throws json.decoder.JSONDecodeError when trying to convert string to dictionary

I'm using the requests library to make an api call. The json response is then formatted as a string and sent as part of a result to my server as shown by the code snippet:
def get_and_send(url, method):
    resp = requests.request(url=url, method=method, **kwargs)
    result = f'{{ "status_code":{resp.status_code}, "content":{resp.json()} }}'
    send_to_server(result)
I intend to convert this result back to a dictionary object from the string result on the server.
The problem I have is that when I use json.loads(result) to convert the string to dictionary object, it throws the following error
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/adipster/PycharmProjects/ScriptBackbone/ts_server/agent_thread.py", line 39, in run
resp_data = self._task_formatter.format_response(response) # Formats the response
File "/home/adipster/PycharmProjects/ScriptBackbone/utils/task_formatter.py", line 26, in format_response
response = self.get_dict_response(response.decode().strip())
File "/home/adipster/PycharmProjects/ScriptBackbone/utils/task_formatter.py", line 36, in get_dict_response
raise exp
File "/home/adipster/PycharmProjects/ScriptBackbone/utils/task_formatter.py", line 34, in get_dict_response
return json.loads(response)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 32 (char 31)
I understand that the error occurs because all my keys have to be in double quotes, which is not the case when resp.json() is formatted into a string.
How can I ensure that all the keys of my dictionary object are in double quotes?
Any other alternative would also be welcome. Thanks.
The issue is, as you point out, that string interpolation (with an f-string, for example) inserts the Python representation of the dictionary, so its strings are written with single quotes, but the JSON format requires double quotes.
To fix this, use json.dumps, which takes a Python object (such as a dict) and converts it to a properly formatted JSON string. Edit thanks to Charles Duffy: you can avoid the f-string entirely by building the whole result_data object as a dictionary, then converting it to JSON all at once using json.dumps.
For example:
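import json
import requests

def get_and_send(url, method, **kwargs):
    # Forward any extra keyword arguments (headers, params, ...) to requests.
    resp = requests.request(url=url, method=method, **kwargs)
    # Build the whole payload as a dict, then serialize it once.
    result_data = {
        "status_code": resp.status_code,
        "content": resp.json(),
    }
    result = json.dumps(result_data)
    send_to_server(result)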
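The difference between the two approaches can be seen without making any HTTP request. In this sketch, `content` and `status_code` are made-up stand-ins for resp.json() and resp.status_code.

```python
import json

# Hypothetical stand-ins for resp.json() and resp.status_code.
content = {"message": "ok", "items": [1, 2], "active": True, "note": None}
status_code = 200

# f-string interpolation embeds the Python repr of the dict:
# single quotes, True and None, none of which are valid JSON.
interpolated = f'{{ "status_code":{status_code}, "content":{content} }}'
try:
    json.loads(interpolated)
    interpolated_ok = True
except json.JSONDecodeError:
    interpolated_ok = False

# json.dumps produces valid JSON that round-trips through json.loads.
serialized = json.dumps({"status_code": status_code, "content": content})
round_tripped = json.loads(serialized)

print(interpolated_ok)
print(round_tripped == {"status_code": status_code, "content": content})
```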

how to pass directly variable name as dict key with json.loads in python?

I'm writing a script that generates another script; the arguments are passed as JSON when calling the script from the terminal.
The script to be generated contains a dictionary.
One of the values in this dict is a variable name (not a string) called strategy.
My problem looks like this:
d = json.loads(sys.argv[2])
# d should look like this
d = {
"stopLossValue": 5,
"strategy": strategy,
"strategyTitle": "week5"
}
dic = """
parameterDict = {}
""".format(json.dumps(d, sort_keys=True, indent=4))
Running the script raises an error that disappears if I set the value of the strategy key to a string.
Error:
Traceback (most recent call last):
File "updateCandleStrategy.py", line 11, in <module>
d = json.loads(sys.argv[2])
File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 506 (char 505)
Is there a simple way to achieve my goal?
Thanks
When passing a JSON string on the command line, there's a good chance that some of the quotes or escape characters are interpreted by the underlying shell.
So that's not a reliable method for passing JSON strings. Pass a file containing the JSON data instead and read it:
with open(sys.argv[2]) as f:
    d = json.load(f)
Example from a Windows console, just printing the second argument:
S:\python>foo.py ff "d = {"s":12,"d":15}"
d = {s:12,d:15}
The quotes have been removed; you would need to double them.
On a Linux terminal, wrapping the argument in single quotes solves most cases, until you stumble on a value containing a single quote...
Instead of passing a dictionary, why not use getopt or argparse and build/parse a proper command line?
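One way to combine the two suggestions (a JSON file plus a proper command line) is sketched below. The `load_params` helper and the `params_file` argument name are hypothetical, and the temporary file only exists to make the example self-contained.

```python
import argparse
import json
import os
import tempfile

def load_params(argv):
    """Parse the command line and load the JSON file it points to."""
    parser = argparse.ArgumentParser(description="Generate a strategy script.")
    parser.add_argument("params_file", help="path to a JSON file with the parameters")
    args = parser.parse_args(argv)
    with open(args.params_file) as f:
        return json.load(f)

# Demo: write a small parameter file, then load it as the script would.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"stopLossValue": 5, "strategyTitle": "week5"}, tmp)
    path = tmp.name

params = load_params([path])
os.remove(path)
print(params)
```

In a real script, `load_params(sys.argv[1:])` would replace the demo call, so the shell never sees the JSON quotes at all.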

Unable to load json containing escape sequences

I'm being passed some JSON and am having trouble parsing it.
The object is currently simple, with a single key/value pair. The key works fine but the value \d causes issues.
This is coming from an HTML form, via JavaScript. All of the below are literals.
Html: \d
Javascript: {'Key': '\d'}
Json: {"Key": "\\d"}
json.loads() doesn't seem to like JSON in this format. A quick sanity check that I'm not doing anything silly works fine:
>>> import json
>>> json.loads('{"key":"value"}')
{'key': 'value'}
Since I'm declaring this string in Python, it should escape it down to a literal of va\\lue, which, when parsed as JSON, should be va\lue.
>>> json.loads('{"key":"va\\\\lue"}')
{'key': 'va\\lue'}
In case python wasn't escaping the string on the way in, I thought I'd check without the doubling...
>>> json.loads('{"key":"va\\lue"}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python33\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\Python33\lib\json\decoder.py", line 352, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Python33\lib\json\decoder.py", line 368, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 1 column 11 (char 10)
but it fails, as expected.
I can't see any way to parse a JSON field that should contain a single backslash after all the unescaping has taken place.
How can I get Python to deserialize this string literal {"a":"val\\ue"} (which is valid JSON) into the appropriate Python representation: {'a': 'val\ue'}?
As an aside, it doesn't help that PyDev is inconsistent with what representation of a string it uses. The watch window shows double backslashes, the tooltip of the variable shows quadruple backslashes. I assume that's the "If you were to type the string, this is what you'd have to use for it to escape to the original" representation, but it's by no means clear.
Edit to follow on from @twalberg's answer...
>>> input={'a':'val\ue'}
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 3-5: truncated \uXXXX escape
>>> input={'a':'val\\ue'}
>>> input
{'a': 'val\\ue'}
>>> json.dumps(input)
'{"a": "val\\\\ue"}'
>>> json.loads(json.dumps(input))
{'a': 'val\\ue'}
>>> json.loads(json.dumps(input))['a']
'val\\ue'
Using json.dumps() to see how json would represent your target string:
>>> orig = { 'a' : 'val\ue' }
>>> jstring = json.dumps(orig)
>>> print jstring
{"a": "val\\ue"}
>>> extracted = json.loads(jstring)
>>> print extracted
{u'a': u'val\\ue'}
>>> print extracted['a']
val\ue
>>>
This was in Python 2.7.3, though, so it may be only partially relevant to your Python 3.x environment. Still, I don't think JSON has changed that much...
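The same check can be sketched for Python 3. The point it tries to illustrate is that the doubled backslash seen in the interactive output is only the repr of the string; the parsed value itself contains a single backslash. A raw string is assumed here so that Python does not interpret the backslashes before the JSON parser sees them.

```python
import json

# Raw string: the JSON text really contains two backslash characters,
# which is how JSON encodes one literal backslash.
raw_json = r'{"a": "val\\ue"}'
value = json.loads(raw_json)["a"]

print(repr(value))   # repr escapes the backslash for display: 'val\\ue'
print(value)         # the actual text: val\ue
print(len(value))    # 6 characters: v, a, l, \, u, e
```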

json.JSONDecoder().decode() can not work well

The code is simple, but it does not work, and I can't see what the problem is:
import json
json_data = '{text: \"tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=\", key: \"MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW\"}'
my_data = json.JSONDecoder().decode(json_data)
print my_data
It throws the following exception:
Traceback (most recent call last):
File "D:\Python27\project\demo\digSeo.py", line 4, in <module>
my_data = json.JSONDecoder().decode(json_data)
File "D:\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\Python27\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
Your json_data is not valid JSON.
In JSON, property names need to be in double quotes ("). Also, the double quotes delimiting the string values don't need to be escaped, since you're already using single quotes (') for the Python string.
Example:
json_data = '{"text": "tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=", "key": "MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW"}'
The json module in the Python standard library works well; a lot of people use it in their applications.
However, these few lines of code have a small issue: your sample data is not valid JSON. The keys (text and key) should be quoted like this:
json_data = '{"text": \"tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=\", "key": \"MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW\"}'
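A short sketch of both points, using shortened made-up values in place of the original strings: inside a single-quoted Python literal, \" and " produce the same character, so the two corrected versions above are equivalent, and once the keys are quoted the decoder accepts the text.

```python
import json

# Shortened, hypothetical values standing in for the original data.
escaped = '{"text": \"abc|def=\", "key": \"ghi|jkl\"}'
plain = '{"text": "abc|def=", "key": "ghi|jkl"}'

# The backslashes are consumed by Python, so both literals are the same string.
print(escaped == plain)

# With quoted keys, JSONDecoder().decode() and json.loads() both succeed.
my_data = json.JSONDecoder().decode(plain)
print(my_data == json.loads(plain))
print(my_data["text"])
```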
