Different behavior of json.dump() on Windows and Linux - python

I wrote a python script to retrieve data from a website in json format using the requests library, and then I dump it into a json file. I have written a lot of code utilizing this data and have tested it in Windows only. Recently I shifted to a Linux system, and when the same python script is executed, the order of the keys in the json file is completely different.
This is the code I'm using:
import json
import requests

API_request = requests.get('https://www.abcd.com/datarequest')
alertJson_Data = API_request.json()  # parse the returned JSON into a Python object
json.dump(alertJson_Data, jsonDataFile)  # write the alert's JSON data to the file (opened elsewhere)
jsonDataFile.write('\n')
jsonDataFile.close()
A lot of my other scripts depend on the ordering of the keys in this JSON file, so is there any way to get the same ordering on Linux that I get on Windows?
For example, on Windows the order is "id":, "src":, "dest":, whereas on Linux it's completely different. If I open the link directly in my browser, the ordering is the same as the one saved on Windows. How do I retain this ordering?

Can you use collections.OrderedDict when loading json?
e.g.
from collections import OrderedDict
alertJson_Data = API_request.json(object_pairs_hook=OrderedDict)
should work, because the json() method implemented by requests takes the same optional arguments as json.loads:
json(**kwargs)
Returns the json-encoded content of a response, if any.
Parameters: **kwargs – Optional arguments that json.loads takes.
Raises: ValueError – If the response body does not contain valid json.
And the documentation of json.loads specifies:
object_hook, if specified, will be called with the result of every
JSON object decoded and its return value will be used in place of the
given dict. This can be used to provide custom deserializations (e.g.
to support JSON-RPC class hinting).
object_pairs_hook, if specified will be called with the result of
every JSON object decoded with an ordered list of pairs. The return
value of object_pairs_hook will be used instead of the dict. This
feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict() will remember the order of insertion). If
object_hook is also defined, the object_pairs_hook takes priority.
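Putting it together, a minimal sketch of the full round trip (the URL is the placeholder from the question; the output file name is made up for illustration). Note that on Python 3.7+ a plain dict preserves insertion order anyway, so json.loads/json.dump already keep the document's key order there:
import json
from collections import OrderedDict

import requests

API_request = requests.get('https://www.abcd.com/datarequest')
# object_pairs_hook is forwarded to json.loads, so the keys keep the
# order in which they appear in the response body
alertJson_Data = API_request.json(object_pairs_hook=OrderedDict)

with open('alerts.json', 'w') as jsonDataFile:  # hypothetical file name
    json.dump(alertJson_Data, jsonDataFile)  # dump writes the keys in that same order
    jsonDataFile.write('\n')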

Related

Is it safe to use the default load in ruamel.yaml

I can load and dump YAML files with tags using ruamel.yaml, and the tags are preserved.
If I let my customers edit the YAML document, will they be able to exploit YAML vulnerabilities through arbitrary Python module code execution? As I understand it, ruamel.yaml is derived from PyYAML, which has such vulnerabilities according to its documentation.
From your question I deduce you are using the .load() method on a YAML() instance, as in:
import ruamel.yaml
yaml = ruamel.yaml.YAML()
data = yaml.load(some_path)
and not the old PyYAML-compatible load function (which cannot handle unknown tags, and which can be unsafe). The short answer is: yes, that is safe. No interpretation of tags is done and no tag-dependent code is called; the tags just get assigned (as strings) to known types (CommentedMap, CommentedSeq, TaggedScalar for mapping, sequence and scalar respectively).
The vulnerability in PyYAML comes from its Constructor, which is part of the unsafe Loader. This was the default for PyYAML for the longest time, and most people used it even when they could have used the SafeLoader because they were not using any tags (or could have registered the tags they needed against the SafeLoader/Constructor). That unsafe Constructor is a subclass of SafeConstructor, and what makes it unsafe are the multi-methods registered for the interpretation of python/{name,module,object,apply,new}:... tags, essentially dynamically interpreting these tags to load modules and run code (i.e. execute functions/methods/instantiate classes).
The initial idea behind ruamel.yaml was its RoundTripConstructor, which is also a subclass of SafeConstructor. You used to get this via the now deprecated round_trip_load function and nowadays via the .load() method after using YAML(typ='rt'); this is also the default for a YAML() instance without a typ argument. That RoundTripConstructor does not register any tags or have any code that interprets tags beyond the normal tags that the SafeConstructor uses.
Only the old PyYAML load and ruamel.yaml using typ='unsafe' can execute arbitrary Python code on doctored YAML input by executing code via the !!python/.. tags.
When you use typ='rt' the tag on nodes is preserved, even when a tag is not registered. This is possible because while processing nodes (in round-trip mode), those tags will just be assigned as strings to an attribute on the constructed type of the tagged node (mapping, sequence, or scalar, with the exception of tagged null). And when dumping these types, if they have a tag, it gets re-inserted into the representation processing code. At no point is the tag evaluated, used to import code, etc.
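A minimal sketch of that round-trip behavior (the tag names here are made up and registered nowhere):
import sys
import ruamel.yaml

doc = """\
!some_unknown_tag
name: example
items: !another_tag [1, 2, 3]
"""

yaml = ruamel.yaml.YAML()  # round-trip ('rt') mode is the default
data = yaml.load(doc)
# the tags were stored as plain strings; no module was imported, no code executed
yaml.dump(data, sys.stdout)  # prints the document with both tags intact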

How to separate data in a Restful API?

I am working on a program that reads the content of a RESTful API from ImportIO. The connection works, and data is returned, but it's a jumbled mess. I'm trying to clean it up so it returns only ASINs.
I have tried using split() with a delimiter, to no success.
stuff = requests.get('https://data.import.io/extractor***')
stuff.content
I get the content, but I want to extract only the ASINs.
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. requests will do that for you when you access .text:
response.text
Because the decoding of bytes to str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:
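For example (a minimal sketch; 'utf-8' here is an assumption about the body's actual encoding):
response.encoding = 'utf-8'  # set the encoding before touching .text
text = response.text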
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
response.json()
The type of the return value of .json() is a dictionary, so you can access values in the object by key.
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
For More Info: https://realpython.com/python-requests/
What format is the returned information in? Typically RESTful APIs will return the data as JSON; you will likely have luck parsing it as a JSON object.
https://realpython.com/python-requests/#content
stuff_dictionary = stuff.json()
With that, the returned content is loaded as a dictionary and you will have a much easier time.
EDIT:
Since I don't have the full URL to test, I can't give an exact answer. Given the content type is CSV, using a pandas DataFrame is pretty easy. With a quick StackOverflow search, I found the following answer: https://stackoverflow.com/a/43312861/11530367
So I tried the following in the terminal and got a dataframe from it
from io import StringIO
import pandas as pd
pd.read_csv(StringIO("HI\r\ntest\r\n"))
So you should be able to perform the following
from io import StringIO
import pandas as pd

# .content is bytes, so decode it to a str before wrapping it in StringIO
df = pd.read_csv(StringIO(stuff.content.decode('utf-8')))
If that doesn't work, consider dropping the first three bytes of your response: b'\xef\xbb\xbf'. Check the answer from Mark Tolonen on how to parse this.
After that, selecting the ASIN (your second column) from your dataframe should be easy.
asins = df.loc[:, 'ASIN']
asins_arr = asins.array
The response is the byte string of CSV content encoded in UTF-8. The first three escaped byte codes are a UTF-8-encoded BOM signature. So stuff.content.decode('utf-8-sig') should decode it. stuff.text may also work if the encoding was returned correctly in the response headers.
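Putting the pieces together, a sketch under the assumptions above (the URL is the truncated placeholder from the question, and the 'ASIN' column name comes from the answer above):
from io import StringIO

import pandas as pd
import requests

stuff = requests.get('https://data.import.io/extractor***')  # placeholder URL
text = stuff.content.decode('utf-8-sig')  # 'utf-8-sig' also strips the BOM if present
df = pd.read_csv(StringIO(text))
asins = df.loc[:, 'ASIN'].array  # assumes the CSV has an 'ASIN' column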

Python Dictionary JSON Key, Val Extraction

I'm having a hard time understanding what is going on with this Walmart API, and I can't seem to iterate through key/value pairs as I wish. I get different errors depending on the way I attack the problem.
import requests
import json
import urllib
response=requests.get("https://grocery.walmart.com/v0.1/api/stores/4104/departments/1256653758154/aisles/1256653758260/products?count=60&start=0")
info = json.loads(response.text)
print(info)
I'm not sure if I'm playing with a dictionary or a JSON object.
I'm thrown off because the API itself has no quotes over key/val.
When I do a json.loads it comes in but only comes in with single quotes.
I've tried going at it with for-loops but can only traverse the top layer and nothing else. My overall goal is to retrieve the info from the API link, turn it into JSON, and be able to grab whichever key/value I need from it.
I'm not sure if I'm playing with a dictionary or a JSON object.
Python has no concept of a "JSON Object". It's a dictionary.
I'm thrown off because the API itself has no quotes over key/val.
Yes, it does:
{"aisleName":"Organic Dairy, Eggs & Meat","productCount":17,"products":[{"data":
When I do a json.loads it comes in but only comes in with single quotes
Because it's a Python dictionary, and the repr() of dict uses single quotes.
Try print(info['aisleName']), for example.
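To get below the top layer, iterate into the nested values. A sketch that reuses info from the question's code, based only on the fragment shown above (the real response may nest differently):
print(info['aisleName'], info['productCount'])
# each entry in the 'products' list appears to wrap its fields in 'data'
for product in info.get('products', []):
    for key, value in product.get('data', {}).items():
        print(key, value)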

JSON with Python "object must be str, not dict"

I'm trying to interpret data from the Twitch API with Python. This is my code:
from twitch.api import v3
import json
streams = v3.streams.all(limit=1)
list = json.loads(streams)
print(list)
Then, when running, I get:
TypeError, "the JSON object must be str, not 'dict'"
Any ideas? Also, is this a method in which I would actually want to use data from an API?
Per the documentation json.loads() will parse a string into a json hierarchy (which is often a dict). Therefore, if you don't pass a string to it, it will fail.
json.loads(s, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Deserialize s (a str instance containing a JSON document) to a Python object using this conversion table.
The other arguments have the same meaning as in load(), except encoding which is ignored and deprecated.
If the data being deserialized is not a valid JSON document, a JSONDecodeError will be raised.
From the Twitch API we see that the object being returned by all() is a V3Query. Looking at the source and documentation for that, we see it is meant to return a list. Thus, you should treat that as a list rather than a string that needs to be decoded.
Specifically, the V3Query is a subclass of ApiQuery, in turn a subclass of JsonQuery. That class explicitly runs the query and passes a function over the results, get_json. That source explicitly calls json.loads()... so you don't need to! Remember: never be afraid to dig through the source.
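In other words, the result can be used directly. A short sketch (printing the object to inspect its actual shape, since the exact keys depend on the Twitch response):
from twitch.api import v3

streams = v3.streams.all(limit=1)
# streams is already a parsed Python object; no json.loads() needed
print(type(streams))
print(streams)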
after streams = v3.streams.all(limit=1)
try using
streams = json.dumps(streams)
so that streams will be a JSON string of the form:
'{"key":value}'
instead of just dict form:
{"key":value}

json object as get parameter

I'm writing an API for a Mongo database. I need to pass a JSON object as a GET parameter:
example.com/api/obj/list/1/?find={"foo":"bar"}
How should I organize this better?
I thought about using JSON-like objects without quotes and spaces, for example:
{$or:[{a:foo+bar},{b:2}]}
So are there any tools to parse it in Python/Django?
It should be fine as long as the JSON objects aren't too big, they don't contain sensitive data (it sucks to see your password in your browser history) and you URL-escape them.
Unfortunately, you have to take shortcuts if you want a human-readable JSON parameter. Escaping all JSON brackets ({, }, [, ]) is recommended; you don't have to escape them, but you are taking a risk if you don't. More annoying is the colon (:), which is ubiquitous in JSON and must be escaped.
If you want human-readable query strings, then the sensible solution is to encode all query parameters explicitly. A compromise that might work quite well is to unpack the top-level JSON object into explicit query parameters, each of which remains JSON-encoded. Going a small step further, you could drop any top-level delimiters that remain, e.g.:
JSON: {"foo":"bar", "items":[1, 2, 3], "staff":{"id":432, "first":"John", "last":"Doe"}}
Query: foo=bar&items=1,2,3&staff="id"%3A432,"first"%3A"John","last"%3A"Doe"
Since you know that foo is a string, items is an array and staff is an object, you can rehydrate the JSON syntax correctly before sending the lot to a JSON parser.
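If you keep the plain JSON-in-a-parameter approach instead, the standard library already covers both sides; a minimal sketch (the endpoint path is taken from the question):
import json
from urllib.parse import urlencode

# client side: serialize the filter, then URL-escape the whole query string
find = {"foo": "bar"}
url = 'http://example.com/api/obj/list/1/?' + urlencode({'find': json.dumps(find)})

# server side, e.g. in a Django view:
#     find = json.loads(request.GET['find'])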
