I read the JSON file into a pandas DataFrame. Now I want to print the schema of the JSON objects. When I looked around, I mostly found links to online tools, but my file is too big for those (almost 11k objects/lines). I'm new at this, so I was wondering: is there a function or some code that can do this in Python?
What I have so far:
import json
import pandas as pd
df = pd.read_json('/Users/name/Downloads/file.json', lines=True)
print(df)
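pandas can describe the columns it inferred without any online tool. Below is a minimal sketch on a tiny in-memory stand-in for file.json (the records are made up): df.dtypes lists each column's pandas dtype, and build_table_schema from pandas.io.json produces a Table Schema style description. Note that this describes the DataFrame's top-level columns, not any nested structure inside a column.

```python
import io

import pandas as pd
from pandas.io.json import build_table_schema

# Hypothetical stand-in for file.json (JSON Lines: one object per line)
raw = io.StringIO(
    '{"id": 1, "name": "a", "score": 0.5}\n'
    '{"id": 2, "name": "b", "score": 1.5}\n'
)
df = pd.read_json(raw, lines=True)

# Column names and the dtypes pandas inferred
print(df.dtypes)

# Table Schema description of the columns (index left out)
schema = build_table_schema(df, index=False)
for field in schema["fields"]:
    print(field["name"], field["type"])
```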
I can't add a comment, but maybe try converting the df to JSON, storing it in a variable, and then printing the variable.
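A sketch of that suggestion, using orient="table" so that the JSON string stored in the variable carries a "schema" section alongside the data. The sample records are hypothetical; with the real file you would use the df returned by read_json.

```python
import io
import json

import pandas as pd

# Hypothetical sample in JSON Lines form
raw = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n')
df = pd.read_json(raw, lines=True)

js = df.to_json(orient="table", index=False)  # keep the JSON in a variable
print(js)

parsed = json.loads(js)
print(parsed["schema"]["fields"])  # just the schema part
```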
You can do this with the pydantic and datamodel-code-generator libraries.
Use datamodel-code-generator to generate a Python model from the JSON, and then use pydantic to print the schema from that model.
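If installing those libraries is not an option, a stdlib-only alternative (a different and much simpler technique than datamodel-code-generator/pydantic) is to scan the JSON Lines file once and record which value types appear under each top-level key. A minimal sketch, assuming one JSON object per line; infer_fields and describe_type are hypothetical helper names, and nested objects are only reported as "object", not expanded.

```python
import json
from collections import defaultdict


def describe_type(value):
    """Map a decoded JSON value to a JSON-style type name."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if value is None:
        return "null"
    return type(value).__name__  # e.g. "int", "float", "str", "bool"


def infer_fields(lines):
    """Map each top-level key to the set of value types seen across all records."""
    fields = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            fields[key].add(describe_type(value))
    return dict(fields)


sample = [
    '{"id": 1, "name": "a", "tags": ["x"]}',
    '{"id": 2, "name": null, "extra": {"k": 1}}',
]
schema = infer_fields(sample)
for key in sorted(schema):
    print(key, sorted(schema[key]))
```

With the real file you could pass the open file object directly (`with open(path) as f: schema = infer_fields(f)`), since iterating a file yields its lines, so all 11k records are scanned without loading them into a DataFrame. Keys that are missing from some records are not flagged by this sketch.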
Related
I am trying to read a CSV file in pandas, convert each row into a JSON object, append them to a dict, and then store them in MongoDB.
Here is my code:
data = pd.DataFrame(pd.read_csv('data/airports_test.csv'))
for i in data.index:
    json = data.apply(lambda x: x.to_json(), axis=1)
    json_dict = json.to_dict()
print(json_dict[5])
ins = collection.insert_many(json_dict)
# for i in json_dict:
#     ins = collection.insert_one(json_dict[i])
If I print elements of the dict, I get the correct output (I think). If I try to use collection.insert_many, I get the error 'documents must be a non empty list'. If I loop through the dict and insert one document at a time, I get the error:
document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
I have Googled and Googled but I can't seem to find a solution! Any help would be massively appreciated.
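You can skip processing the individual rows of the DataFrame. insert_many expects a list of dicts, but your json_dict is a dict that maps each row index to a JSON string, which is why both calls fail. DataFrame.to_dict(orient="records") builds the list of dicts directly: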
import pandas

data = pandas.read_csv('test2.csv')  # read_csv already returns a DataFrame
records = data.to_dict(orient="records")  # list of dicts, one per row
collection.insert_many(records)  # collection: your existing pymongo Collection
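To see why this works where the original code failed, compare the two shapes on a small made-up CSV (the airport rows are hypothetical, and no MongoDB connection is needed to run this):

```python
import io

import pandas as pd

csv_text = "code,city\nJFK,New York\nLHR,London\n"  # hypothetical airports sample
data = pd.read_csv(io.StringIO(csv_text))

records = data.to_dict(orient="records")
print(records)  # a list of dicts, the shape insert_many expects

# By contrast, the question's approach builds {row_index: json_string},
# a dict whose values are strings, which pymongo rejects
json_dict = data.apply(lambda row: row.to_json(), axis=1).to_dict()
print(type(json_dict), type(json_dict[0]))
```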
As an aside, I would personally use the csv module's DictReader rather than pandas here, but this way is fine.
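For comparison, a minimal sketch of the csv.DictReader route the aside mentions, on a hypothetical in-memory sample. Each row becomes a dict keyed by the header row, so the list can go straight to insert_many.

```python
import csv
import io

csv_text = "code,city\nJFK,New York\nLHR,London\n"  # hypothetical sample
reader = csv.DictReader(io.StringIO(csv_text))
docs = list(reader)  # one dict per row, keyed by the header row
print(docs)
# collection.insert_many(docs)  # with a real pymongo Collection
```

One difference worth knowing: DictReader leaves every value as a string, whereas pandas infers numeric types, so numbers would be stored in MongoDB as strings unless you convert them.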
My task is to take the output from a machine and convert that data to JSON. I am using Python, but the issue is the structure of the output.
From my research online, a CSV file usually has the keys in the first row and the values underneath in the same order. Example: https://e.nodegoat.net/CMS/upload/guide-import_person_csv_notepad.png
However, the output from my machine doesn't look like this.
Mine looks like:
Date:,10/10/2015
Name:,"Company name"
Location:,"Company location"
Serial num:,"Serial number"
So the machine I'm working with writes each result to a new .dat file instead of appending it to a single CSV with a header row of keys. Technically, yes, the data is comma-separated, but I'm not sure how to work with it.
How should I go about turning this kind of data into JSON? Should I look into restructuring the data into the usual CSV layout? Or is there a way to convert it as is, without any cleanup? Either way, any direction is appreciated.
You can try transposing the data with pandas:
import pandas as pd
from io import StringIO
data = '''\
Date:,10/10/2015
Name:,"Company name"
Location:,"Company location"
Serial num:,"Serial number"
'''
f = StringIO(data)
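# header=None because the file has no header row; index_col=0 makes the keys the index
df = pd.read_csv(f, header=None, index_col=0)
t = df.T  # transpose: one row, with the keys as columns
print(t['Location:'].iloc[0])
print(t['Serial num:'].iloc[0])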
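If you'd rather not go through a DataFrame, the standard csv module can read these key/value lines straight into a dict, which json.dumps then serializes. A minimal sketch, assuming each .dat file holds one record with the key in the first field and the value in the second; kv_csv_to_dict is a hypothetical helper name, and stripping the trailing colon from keys is a choice, not something the format requires.

```python
import csv
import io
import json


def kv_csv_to_dict(text):
    """Turn 'key:,value' lines into one dict, dropping the trailing colon from keys."""
    record = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        key, value = row[0], row[1]
        record[key.rstrip(":").strip()] = value
    return record


sample = (
    'Date:,10/10/2015\n'
    'Name:,"Company name"\n'
    'Location:,"Company location"\n'
    'Serial num:,"Serial number"\n'
)
record = kv_csv_to_dict(sample)
print(json.dumps(record))
```

For a folder of results, you could call it once per .dat file (for example over glob.glob("*.dat"), a hypothetical pattern) and json.dump the resulting list of dicts.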
I am looking for a way to convert CSV data into JSON data without having to save it to another JSON file. Is that possible?
This is the functionality I need to carry out:
Sample Code:
df= pd.read_csv("file_xyz.csv").to_json("another_file.json")
data_json = pd.read_json("another_file.json")
Now I want to do the same thing without saving my data to "another_file.json": I want data_json to hold the JSON data produced directly from the CSV data. Is that possible? How can I do it?
Call DataFrame.to_json without a file name; it then returns the JSON as a string:
j = pd.read_csv("file_xyz.csv").to_json()
Or, if you want to convert the output to a dictionary for further processing, use DataFrame.to_dict:
d = pd.read_csv("file_xyz.csv").to_dict()
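A small self-contained sketch of the whole round trip without writing a file, on a made-up two-row CSV: to_json returns a string, json.loads turns it into Python objects, and read_json can rebuild a DataFrame if you wrap the string in io.StringIO (recent pandas versions deprecate passing a literal JSON string to read_json).

```python
import io
import json

import pandas as pd

csv_text = "a,b\n1,x\n2,y\n"  # hypothetical stand-in for file_xyz.csv
df = pd.read_csv(io.StringIO(csv_text))

j = df.to_json(orient="records")  # no path given, so a str is returned
data = json.loads(j)  # plain Python list of dicts
data_json = pd.read_json(io.StringIO(j), orient="records")  # back to a DataFrame, no file written
print(j)
```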
I am trying to convert YAML data to a DataFrame through pandas with the yamltodb package, but it shows only a single row with the header, and only one record appears. I also tried converting the YAML file to a JSON file and then using the normalize function, but that does not work either. I attached a screenshot of the JSON function output. I need to categorize the data under batsman, bowler, runs, etc.
Just guessing, as I don’t know what your data actually looks like:
import pandas as pd
import yaml

with open('fName.yaml', 'r') as f:
    # safe_load parses the YAML into plain dicts and lists
    df = pd.json_normalize(yaml.safe_load(f))
print(df.head())
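Since the actual data isn't shown, here is a hedged sketch of how json_normalize can split nested records into batsman, bowler and runs columns. The structure below (a "deliveries" list under a match, with a nested "runs" mapping) is entirely hypothetical, and the Python dict stands in for what yaml.safe_load would return, so PyYAML isn't needed to run it.

```python
import pandas as pd

# Hypothetical stand-in for yaml.safe_load(f); the real file's keys may differ
data = {
    "match": "M1",
    "deliveries": [
        {"batsman": "A", "bowler": "X", "runs": {"batsman": 4, "extras": 0}},
        {"batsman": "B", "bowler": "Y", "runs": {"batsman": 1, "extras": 1}},
    ],
}

# record_path picks the list to turn into rows; meta copies top-level fields onto each row.
# Nested dicts such as "runs" are flattened into dotted column names.
df = pd.json_normalize(data, record_path="deliveries", meta=["match"])
print(df)
```

Adjust record_path and meta to whatever keys your YAML actually uses.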
I am trying to write my JSON-structured data to a JSON file. The js variable (the output of df.to_json) contains the JSON data like this:
[{"variable":"Latitude","min":26.845043,"Q1":31.1972475},{"variable":"Longitude","min":-122.315002,"Q1":-116.557795},{"variable":"Zip","min":20910.0,"Q1":32788.5}]
But when I write it to a file, the data gets stored differently, as below. Could you please help me store the result exactly as it is in js?
"[{\"variable\":\"Latitude\",\"min\":26.845043,\"Q1\":31.1972475},{\"variable\":\"Longitude\",\"min\":-122.315002,\"Q1\":-116.557795},{\"variable\":\"Zip\",\"min\":20910.0,\"Q1\":32788.5}]"
Code:
import csv
import json
import pandas as pd
df = pd.read_csv(r'C:\Users\spanda031\Downloads\DQ_RESUlT.csv')
js = df.to_json(orient="records")
print(js)
# Write JSON file
with open('C:\\Users\\spanda031\\Downloads\\data.json', 'w') as data_file:
    json.dump(js,data_file)
import pandas as pd
import json
df = pd.read_csv("temp.csv")
# it will dump json to file
df.to_json("filename.json", orient="records")
Output as filename.json:
[{"variable":"Latitude","min":26.84505,"Q1":31.19725},{"variable":"Longtitude","min":-122.315,"Q1":-116.558},{"variable":"Zip","min":20910.0,"Q1":32788.5}]
I think you're double-encoding your data. df.to_json converts the data to a JSON string. Then json.dump encodes that already-encoded string as JSON again, which wraps your existing JSON in quote marks and escapes all the inner quote marks with a backslash. You end up with JSON within JSON, which is neither necessary nor desirable.
You should use one or the other of these methods, but not both together. It's probably easiest to use df.to_json to serialise the dataframe data accurately, and then write the string directly to the file as text.
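A minimal sketch of that advice, with a temporary directory standing in for the real Downloads path (the two-row DataFrame is made up): the string from df.to_json is written as plain text, so the file contains a JSON array rather than one quoted string. The last lines show, for contrast, that json.dumps on the string produces a JSON string literal again.

```python
import json
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"variable": ["Latitude", "Zip"], "min": [26.845043, 20910.0]})
js = df.to_json(orient="records")  # already a JSON string

path = os.path.join(tempfile.mkdtemp(), "data.json")  # temporary stand-in location
with open(path, "w") as data_file:
    data_file.write(js)  # write the text as-is; no second encoding step

with open(path) as data_file:
    loaded = json.load(data_file)
print(loaded)

# For contrast: encoding the string again produces JSON-within-JSON
double = json.dumps(js)
print(type(json.loads(double)))  # decoding it once only gives back a str
```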
Talk is cheap, so let me show you the code:
import json
import pandas as pd

df = pd.read_csv(r'C:\Users\spanda031\Downloads\DQ_RESUlT.csv')
# where the magic happens: to_dict gives plain Python objects, not a JSON string
js = df.to_dict(orient="records")
print(js)
# Write JSON file (json.dump encodes the records exactly once)
with open('C:\\Users\\spanda031\\Downloads\\data.json', 'w') as data_file:
    json.dump(js, data_file)