I know this question might be a duplicate, but I just want to understand how you can convert a TSV file to JSON. I've searched everywhere and can't find a clue or understand the code.
This is not Python code; it's the TSV file that I want to convert to JSON:
title     content                              difficulty
week01    python syntax                        very easy
week02    python data manipulation             easy
week03    python files and requests            intermediate
week04    python class and advanced concepts   hard
And this is the JSON file that I want as an output.
[
  {
    "title": "week 01",
    "content": "python syntax",
    "difficulty": "very easy"
  },
  {
    "title": "week 02",
    "content": "python data manipulation",
    "difficulty": "easy"
  },
  {
    "title": "week 03",
    "content": "python files and requests",
    "difficulty": "intermediate"
  },
  {
    "title": "week 04",
    "content": "python class and advanced concepts",
    "difficulty": "hard"
  }
]
The built-in modules you need for this are csv and json.
To read tab-separated data with the CSV module, use the delimiter="\t" parameter:
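For example, with csv.reader (an inline string stands in for the actual file here):

```python
import csv
import io

# io.StringIO stands in for the TSV file from the question
tsv = "title\tcontent\tdifficulty\nweek01\tpython syntax\tvery easy\n"
reader = csv.reader(io.StringIO(tsv), delimiter="\t")
rows = list(reader)
# rows[0] is the header row: ['title', 'content', 'difficulty']
```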
Even more conveniently, the CSV module has a DictReader that automatically reads the first row as column keys, and returns the remaining rows as dictionaries:
import csv
import json

with open('file.txt') as file:
    reader = csv.DictReader(file, delimiter="\t")
    data = list(reader)

json_string = json.dumps(data, indent=2)
The JSON module can also write directly to a file instead of a string.
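For instance, a small sketch that dumps straight to disk (the file names are illustrative):

```python
import csv
import json

def tsv_to_json_file(tsv_path, json_path):
    # DictReader uses the first row as keys; delimiter="\t" handles the tabs
    with open(tsv_path) as tsv_file:
        rows = list(csv.DictReader(tsv_file, delimiter="\t"))
    # json.dump writes directly to the file object, no intermediate string
    with open(json_path, "w") as json_file:
        json.dump(rows, json_file, indent=2)
```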
If you are using pandas, you can use the to_json method with the option orient="records" to obtain the list of entries you want.
my_data_frame.to_json(orient="records")
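A sketch of the full round trip (the inline string stands in for the actual TSV file; sep="\t" tells read_csv the data is tab-separated):

```python
import io
import json

import pandas as pd

# io.StringIO stands in for the real TSV file from the question
tsv = "title\tcontent\tdifficulty\nweek01\tpython syntax\tvery easy\n"
df = pd.read_csv(io.StringIO(tsv), sep="\t")

json_string = df.to_json(orient="records")  # one dict per row
```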
Recently I have been working on a project, and I needed to append a list of dictionaries to my existing JSON file, but it behaves somewhat strangely.
Here is what I have:
import json

def write_records_to_json(json_object):
    with open("tracker.json", "r+") as f:
        json_file = json.load(f)
        json_file.append(json_object)
        print(json_file)
This is the object I'm trying to append (the object is formatted this way):
[
{
"file": "dnc_complaint_numbers_2021-12-03.csv",
"date": "2021-12-03"
}
]
And this is what I get (pay attention to the end). Excuse me, please, for not having it more readable:
[{'file': 'dnc_complaint_numbers_2021-12-01.csv', 'date': '2021-12-01'}, {'file': 'dnc_complaint_numbers_2021-12-02.csv', 'date': '2021-12-02'}, '[\n {\n "file": "dnc_complaint_numbers_2021-12-03.csv",\n "date": "2021-12-03"\n }\n]']
Can someone tell me why that is and how to fix it? Thanks a lot.
From your code and output, we can infer that json_object is a string containing JSON. json_file, however, is not JSON; it is a list that was deserialised from JSON.
If you want to add json_object to json_file, you should first deserialise the former:
json_file.extend(json.loads(json_object))
You also want to use extend instead of append here, so that the new records end up on the same level as the rest of the data.
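Putting it together, a sketch of the corrected function from the question (tracker.json as in the original; seek and truncate are needed because the file is opened in r+ mode and rewritten in place):

```python
import json

def write_records_to_json(json_object):
    with open("tracker.json", "r+") as f:
        records = json.load(f)
        records.extend(json.loads(json_object))  # deserialise, then merge at the same level
        f.seek(0)       # rewind before overwriting the old content
        json.dump(records, f, indent=2)
        f.truncate()    # drop leftover bytes if the new content is shorter
```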
I am a newbie to Python and am learning how Python works with JSON.
After writing this code in PyCharm, I am getting unresolved references at several locations:
"Import resolves to its containing file".
"Cannot find reference to dumps in json.py "
"Cannot find reference to loads in json.py "
I am getting these errors when importing json and calling the loads() and dumps() methods.
This is the video link from where I am learning to code python.
https://www.youtube.com/watch?v=9N6a-VLBa2I&list=PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU&index=44
Please help me in resolving this.
import json

# Decoding a JSON string to Python.
# This is a Python string that also happens to be valid JSON.
people_string = '''
{
    "people": [
        {
            "name": "Sumedha",
            "phone": "0987654312",
            "City": "Middletown"
        },
        {
            "name": "Ankit",
            "phone": "9999999999",
            "City": "Middletown2"
        },
        {
            "name": "Hemlata",
            "phone": "9865656475",
            "City": "Chandigarh"
        }
    ]
}
'''

# loads deserialises the string into a dict.
data = json.loads(people_string)

for person in data['people']:
    print(person['name'])
    del person['phone']

new_string = json.dumps(data, indent=2, sort_keys=True)
You named your test script json.py, so it's shadowing the built-in json module, preventing you from importing the built-in module, making import json try to import itself (that's what "Import resolves to its containing file" is trying to warn you about). Name your script something else (e.g. jsontest.py) and it will work.
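Once the script is renamed, a quick sanity check is to print where the import actually resolves:

```python
import json

# with no local json.py in the way, this points into the standard library
print(json.__file__)
```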
The error seems to appear when you have both the JSON and pandas packages in the same Python environment.
I found a cheap workaround: keeping a separate project for pandas and for JSON, so that the packages are installed in different virtual environments.
I have the following structure in a geoJSON file:
{"crs":
{"type": "name",
"properties":
{"name": "urn:ogc:def:crs:EPSG::4326"}
},
"type": "FeatureCollection",
"features": [
{"geometry":
{"type": "Polygon",
"coordinates": [[[10.914622377957983, 45.682007076150505],
[10.927456267537572, 45.68179119797432],
[10.927147329501077, 45.672795442796335],
[10.914315493899755, 45.67301125363092],
[10.914622377957983, 45.682007076150505]]]},
"type": "Feature",
"id": 0,
"properties": {"cellId": 38}
},
{"geometry":
{"type": "Polygon",
"coordinates":
... etc. ...
I want to read this geoJSON into Google Maps and have each cell colored based on a property I calculated in Python for each cell individually. So my main question is: how can I read the geoJSON with Python and add another property to these polygons (there are around 12,000 polygons, so adding them one by one is not an option), then write the new file?
I think what I'm looking for is a Python library that can handle geoJSON, so I don't have to add these features via string manipulation.
There is a way with the Python geojson package.
With it, you can read the geoJSON as an object:
import geojson

# geojson_string is the content of your geoJSON file;
# geojson.load(f) also accepts an open file object
loaded = geojson.loads(geojson_string)

for feature in loaded.features[:50]:  # only the first 50
    print(feature)
There are Feature, FeatureCollection, and custom classes to help you add your attributes.
The geoJSON is just a JSON document (a simplification, but it is all you need for this purpose), and Python reads that as a dict object.
Since dicts are updated in place, we don't need to store a new variable for the geo objects.
import json

# read in the geoJSON
with open("/path/to/files") as f:
    geo_objects = json.load(f)

# add the calculated property to every feature
# ("your_field" is a placeholder for the property name you want)
for feature in geo_objects["features"]:
    feature["properties"]["your_field"] = calculated_value

# write the result back out (note the "w" mode)
with open("/path/to/output/file", "w") as f:
    json.dump(geo_objects, f)
No string manipulation needed, and no new library to load!
I have a lot of JSON files in my S3 bucket and I want to be able to read and query them. The problem is that they are pretty-printed. One JSON file has just one massive dictionary, but it is not on a single line. As per this thread, a dictionary in a JSON file should be on one line, which is a limitation of Apache Spark. My files are not structured that way.
My JSON schema looks like this -
{
  "dataset": [
    {
      "key1": [
        {
          "range": "range1",
          "value": 0.0
        },
        {
          "range": "range2",
          "value": 0.23
        }
      ]
    }, {..}, {..}
  ],
  "last_refreshed_time": "2016/09/08 15:05:31"
}
Here are my questions -
Can I avoid converting these files to match the schema required by Apache Spark (one dictionary per line in a file) and still be able to read it?
If not, what's the best way to do it in Python? I have a bunch of these files for each day in the bucket. The bucket is partitioned by day.
Is there any other tool better suited to query these files other than Apache Spark? I'm on AWS stack so can try out any other suggested tool with Zeppelin notebook.
You could use sc.wholeTextFiles(). Here is a related post.
Alternatively, you could reformat your JSON using a simple function and load the generated file.
import json

def reformat_json(input_path, output_path):
    # load the whole pretty-printed JSON array
    with open(input_path, 'r') as handle:
        jarr = json.load(handle)
    # write one compact JSON object per line, as Spark expects
    with open(output_path, 'w') as out:
        for entry in jarr:
            out.write(json.dumps(entry) + "\n")
I'm storing hierarchical data in a format similar to JSON:
{
  "preferences": {
    "is_latest": true,
    "revision": 18,
    // ...
  },
  "updates": [
    { "id": 1, "content": "..." },
    // ...
  ]
}
I'm writing this data to disk and I'd like to store it efficiently. I assume that, towards this end, BSON would be more efficient as a storage format than raw JSON.
How can I read and write BSON trees to/from disk in Python?
I haven't used it, but it looks like there is a bson module on PyPI:
https://pypi.python.org/pypi/bson
The project is hosted in GitHub here:
https://github.com/martinkou/bson