How can I sort json data for dumping? - python

I am trying to dump JSON data in Python to a file.
I receive the data as an ImmutableMultiDict from a Flask post request.
It looks as follows: ImmutableMultiDict([('prefix', ''), ('key1', 'value1'), ('key2', 'value2')])
The data should look like this in the file:
{ "prefix": [
{"key1": "value1"},
{"key2" : "value2"}
]
}
The prefix, as well as all the other data, is part of the POST request. My question now is: how can I json.dump the ImmutableMultiDict so it appears like this in the file? Right now it looks like this:
{
    "prefix": "",
    "key1": "value1",
    "key2": "value2"
}
The reason I want it the other way is that I want to append data later on to the array under the "prefix" key. Can anyone show me a way to do this properly, please?
Thanks.
EDIT:
Ok. I fixed it so it looks the way it should now. The Python code:
def write_to_json(file, data, prefix):
    with open(file, "a", encoding="utf-8") as f:
        json.dump({prefix: list(data)}, f, indent=4)
Result:
{
    "prefix": [
        "key1",
        "key2"
    ]
}
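Note that this keeps only the keys. For reference, a minimal sketch of how you could produce the originally desired shape (a list of one-pair dicts under the prefix key), assuming data is the ImmutableMultiDict from the request and that the prefix entry itself should be excluded:

import json

def write_to_json(file, data, prefix):
    # Build a list of single-pair dicts from every (key, value) except the prefix entry
    entries = [{k: v} for k, v in data.items(multi=True) if k != "prefix"]
    with open(file, "w", encoding="utf-8") as f:
        json.dump({prefix: entries}, f, indent=4)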

As you did not mention anything about it, I'd recommend checking ImmutableMultiDict#to_dict() with the flat parameter; see the documentation.
Maybe I am understanding incorrectly, but are you trying to have multiple keys under "prefix"? That goes against the definition of JSON and will not work, as a dict (MultiDict) stores its keys in something like a set.
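For illustration, a minimal sketch of what the flat parameter changes (assuming werkzeug's ImmutableMultiDict, which is what Flask uses):

from werkzeug.datastructures import ImmutableMultiDict

data = ImmutableMultiDict([('prefix', ''), ('key1', 'value1'), ('key2', 'value2')])
print(data.to_dict(flat=True))   # {'prefix': '', 'key1': 'value1', 'key2': 'value2'}
print(data.to_dict(flat=False))  # {'prefix': [''], 'key1': ['value1'], 'key2': ['value2']}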

Related

Strange formatting on append - JSON

Recently I have been working on a project where I needed to append a list of dictionaries to my existing JSON file, but it behaves somewhat strangely.
Here is what I have:
def write_records_to_json(json_object):
    with open("tracker.json", "r+") as f:
        json_file = json.load(f)
        json_file.append(json_object)
        print(json_file)
This is the object I'm trying to append (the object is formatted this way):
[
    {
        "file": "dnc_complaint_numbers_2021-12-03.csv",
        "date": "2021-12-03"
    }
]
And this is what I get (pay attention to the end); apologies for it not being more readable:
[{'file': 'dnc_complaint_numbers_2021-12-01.csv', 'date': '2021-12-01'}, {'file': 'dnc_complaint_numbers_2021-12-02.csv', 'date': '2021-12-02'}, '[\n {\n "file": "dnc_complaint_numbers_2021-12-03.csv",\n "date": "2021-12-03"\n }\n]']
Can someone tell me why is that and how to fix it? Thanks a lot.
From your code and output, we can infer that json_object is a string containing JSON. json_file, however, is not JSON; it is a list deserialised from JSON.
If you want to add json_object to json_file you should first deserialise the former:
json_file.extend(json.loads(json_object))
You also want to use extend instead of append here, so it is on the same level as the rest of the data.
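Putting it together, a minimal sketch that also writes the merged list back to the file (assuming tracker.json already contains a JSON array):

import json

def write_records_to_json(json_object):
    with open("tracker.json", "r+") as f:
        records = json.load(f)                     # deserialise the existing array
        records.extend(json.loads(json_object))    # parse the new records and merge them in
        f.seek(0)                                  # rewind before rewriting the file
        json.dump(records, f, indent=4)
        f.truncate()                               # drop any leftover bytes from the old content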

JSON File multiple roots

I have a JSON file I'm trying to manipulate in Python, but it seems the JSON formatting is not correct:
{{"ticket":{"id":"123", "name":"bill"}},
{"ticket":{"id":"1234", "name":"james"}}}
When I try to format it using a JSON formatter, it gives me "Error: multiple root elements".
How can I fix it?
Update: with the suggestion from funqkey, I updated the script:
import json

with open('ticketData8242020-6152021.json', 'r') as f:
    data = f.read()

data = json.loads(data)
There is something wrong with the file. I will attempt to remove the ticket object references from the file to fix it. Thanks everyone.
The problems here include:
- ticket needs to be in quotes
- When you have multiple objects, you need a list, not a dict
- You can't have an object with multiple "ticket" keys
I SUSPECT what you want is a list of objects, like this:
[{"id":"123", "name":"bill"}, {"id":"1234", "name":"james"}]
Or maybe a list of objects with one entry each, as funqkey suggested:
[{"ticket":{"id":"123", "name":"bill"}}, {"ticket":{"id":"1234", "name":"james"}}]
# Should look like this:
# [{"ticket": {"id": "123", "name": "bill"}}, {"ticket": {"id": "1234", "name": "james"}}]
import json

with open('ticketData8242020-6152021.json', 'r') as f:
    data = f.read()

data = json.loads(data)
In JSON, keys must be quoted with ". Therefore
{{ticket:{"id":"123", "name":"bill"}}, {ticket:{"id":"1234", "name":"james"}}}
is not valid JSON. With the keys quoted it becomes
{{"ticket":{"id":"123", "name":"bill"}}, {"ticket":{"id":"1234", "name":"james"}}}
though note that the outer braces still make this invalid; as explained above, multiple objects need to go in a list.
You can validate your JSON online: JSON Online Validator and Formatter - JSON Lint

How to convert from TSV file to JSON file?

So I know this question might be a duplicate, but I just want to know and understand how you can convert from a TSV file to JSON. I've tried searching everywhere and I can't find a clue or understand the code.
This is not Python code; it's the TSV file that I want to convert to JSON:
title content difficulty
week01 python syntax very easy
week02 python data manipulation easy
week03 python files and requests intermediate
week04 python class and advanced concepts hard
And this is the JSON file that I want as an output.
[
    {
        "title": "week 01",
        "content": "python syntax",
        "difficulty": "very easy"
    },
    {
        "title": "week 02",
        "content": "python data manipulation",
        "difficulty": "easy"
    },
    {
        "title": "week 03",
        "content": "python files and requests",
        "difficulty": "intermediate"
    },
    {
        "title": "week 04",
        "content": "python class and advanced concepts",
        "difficulty": "hard"
    }
]
The built-in modules you need for this are csv and json.
To read tab-separated data with the csv module, use the delimiter="\t" parameter.
Even more conveniently, the CSV module has a DictReader that automatically reads the first row as column keys, and returns the remaining rows as dictionaries:
import csv
import json

with open('file.txt') as file:
    reader = csv.DictReader(file, delimiter="\t")
    data = list(reader)

print(json.dumps(data, indent=4))
The JSON module can also write directly to a file instead of a string.
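For example, a minimal sketch that dumps straight to a file with json.dump (the output filename is an assumption):

import csv
import json

with open('file.txt') as tsv_file:
    data = list(csv.DictReader(tsv_file, delimiter="\t"))

with open('file.json', 'w') as json_file:   # hypothetical output name
    json.dump(data, json_file, indent=4)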
If you are using pandas, you can use the to_json method with orient="records" to obtain the list of entries you want.
my_data_frame.to_json(orient="records")
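A fuller sketch of the pandas route, assuming the TSV above is stored in file.txt:

import pandas as pd

df = pd.read_csv('file.txt', sep='\t')                # read the tab-separated file
df.to_json('file.json', orient='records', indent=4)   # indent requires pandas >= 1.0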

JSON file isn't finished writing to by the time I load it, behave BDD

My program is writing to a JSON file, and then loading, reading, and POSTing it. The writing part is done by behave (the BDD framework).
# writing to the JSON file is done by behave
with open('results.json', 'r') as f:
    data = json.load(f)
r = requests.post(MyAPIEndpoint, json=data)
I'm running into an issue since the writing is not completed before I begin loading. (The file is missing the closing ] after the final }.)
HOOK-ERROR in after_all: ValueError: Expecting object: line 2 column 2501 (char 2502)
Is there a way to get past this, either by changing something in my call to behave's __main__ or by changing how or when I load the JSON file?
I think the problem here has a couple of parts. On one hand, you can wait for the file to finish being written, making sure it is closed when not in use; you can do that inside your code, or with something like this:
check if a file is open in Python
On the other hand, to the computer the data is just data; it does not analyse it. You know where the error is because you analysed the output yourself, so to you it is obvious, but to the computer it is not: how many errors are there, where are they, what structure should the data have, is all the needed data present? For a computer to know all of this is hard; you would need to write a program that can deduce and check all of it.
If your program works with multiple results, I think the better way is to use temp files: you can freely create one, write to it, check when it is ready, and then use it, without caring whether another similar process is using it.
Alternatively, check that the JSON is valid before loading it (see Python: validate and format JSON files), and only load it once it is valid.
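A minimal sketch of that last approach: poll until the file parses as valid JSON, then return it (the function name and retry parameters are made up for illustration):

import json
import time

def load_when_valid(path, retries=10, delay=0.5):
    for _ in range(retries):
        try:
            with open(path) as f:
                return json.load(f)   # succeeds only once the file is complete, valid JSON
        except (ValueError, OSError):
            time.sleep(delay)         # the writer probably isn't finished yet; wait and retry
    raise RuntimeError(path + " never became valid JSON")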
Hope this helps.
Cya.
One way to address this problem is to change your file format from being JSON at the top level to newline-delimited JSON (NDJSON), also called line-delimited JSON (LDJSON) or JSON lines (JSONL).
https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON
For example, this JSON file:
{
    "widgets": [
        {"name": "widget1", "color": "red"},
        {"name": "widget2", "color": "green"},
        {"name": "widget3", "color": "blue"}
    ]
}
Would become this NDJSON file:
{"name": "widget1", "color": "red"}
{"name": "widget2", "color": "green"}
{"name": "widget3", "color": "blue"}
It's especially useful in the context of streaming data, which kind of sounds like the use case you have where you might have one process writing to a file continuously while another is reading it.
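A minimal sketch of the writer side, with a hypothetical helper: append one complete JSON object per line, so the file is valid NDJSON after every single write:

import json

def append_record(path, record):
    with open(path, 'a') as f:
        f.write(json.dumps(record) + "\n")   # one self-contained JSON object per line

append_record('widgets.json', {"name": "widget4", "color": "yellow"})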
You could then read the NDJSON file like so:
import json
from pprint import pprint

with open('widgets.json') as f:
    all_lines = [json.loads(line) for line in f]

all_data = {'widgets': all_lines}
pprint(all_data)
Output:
{'widgets': [{'color': 'red', 'name': 'widget1'},
             {'color': 'green', 'name': 'widget2'},
             {'color': 'blue', 'name': 'widget3'}]}

Reading pretty print json files in Apache Spark

I have a lot of JSON files in my S3 bucket, and I want to be able to read and query those files. The problem is that they are pretty-printed. Each JSON file has just one massive dictionary, but it's not on one line. As per this thread, a dictionary in a JSON file should be on one line, which is a limitation of Apache Spark. I don't have it structured that way.
My JSON schema looks like this:
{
    "dataset": [
        {
            "key1": [
                {
                    "range": "range1",
                    "value": 0.0
                },
                {
                    "range": "range2",
                    "value": 0.23
                }
            ]
        }, {..}, {..}
    ],
    "last_refreshed_time": "2016/09/08 15:05:31"
}
Here are my questions:
1. Can I avoid converting these files to match the schema required by Apache Spark (one dictionary per line in a file) and still be able to read them?
2. If not, what's the best way to do it in Python? I have a bunch of these files in the bucket for each day. The bucket is partitioned by day.
3. Is there any other tool better suited to querying these files than Apache Spark? I'm on the AWS stack, so I can try out any suggested tool with a Zeppelin notebook.
You could use sc.wholeTextFiles(). Here is a related post.
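A minimal sketch of that approach, assuming a SparkContext named sc and a hypothetical S3 path (the field names follow the schema above):

import json

raw = sc.wholeTextFiles("s3://my-bucket/2016-09-08/*.json")   # RDD of (path, content) pairs
records = raw.map(lambda kv: json.loads(kv[1]))               # parse each file as a whole
datasets = records.flatMap(lambda d: d["dataset"])            # one row per "dataset" entry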
Alternatively, you could reformat your json using a simple function and load the generated file.
import json

def reformat_json(input_path, output_path):
    with open(input_path, 'r') as handle:
        jarr = json.load(handle)                 # expects the file to hold a JSON array
    with open(output_path, 'w') as out:
        for entry in jarr:
            out.write(json.dumps(entry) + "\n")  # one dictionary per line, as Spark expects
