Insert multiple JSON files into MongoDB using Python

The JSON files are named a.json, b.json, ..., z.json (26 JSON files in total).
The format of each file looks like this:
{
    "a cappella": {
        "word": "a cappella",
        "wordset_id": "5feb6f679a",
        "meanings": [
            {
                "id": "492099d426",
                "def": "without musical accompaniment",
                "example": "they performed a cappella",
                "speech_part": "adverb"
            },
            {
                "id": "0bf8d49e2e",
                "def": "sung without instrumental accompaniment",
                "example": "they sang an a cappella Mass",
                "speech_part": "adjective"
            }
        ]
    },
    "A.D.": {
        "word": "A.D.",
        "wordset_id": "b7e9d406a0",
        "meanings": [
            {
                "id": "a7482f3e30",
                "def": "in the Christian era",
                "speech_part": "adverb",
                "synonyms": [
                    "AD"
                ]
            }
        ]
    },
    .........
}
How could I store these in MongoDB so that, when queried with a word, the result shows its meanings and synonyms (if available)?
I have never used MongoDB and am not sure how to approach this, but I did the same thing for a single JSON file in MySQL, following SO suggestions:
# cursor holds the DB connection
with open('a.json') as f:
    d = json.load(f)

for word in d:
    word_obj = d[word]
    wordset_id = word_obj['wordset_id']
    sql = "INSERT INTO Word (word, wordset_id) VALUES (%s, %s)"
    values = (word, wordset_id)
    cursor.execute(sql, values)
conn.commit()
and similarly stored meanings and synonyms in separate tables. But, as suggested, I guess this would work better with MongoDB.

If you want to insert data from multiple .json files, do it in a loop:
file_names = ['a.json', 'b.json', ...]
for file_name in file_names:
    with open(file_name) as f:
        file_data = json.load(f)  # load data from JSON to dict
    for k, v in file_data.items():  # iterate over key-value pairs
        collection.insert_one(v)  # your collection object here
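
For completeness, here is a minimal end-to-end sketch using pymongo; the connection string and the words_db/words database and collection names are assumptions, so adjust them to your setup:

import json
import string
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')  # assumed local MongoDB
collection = client['words_db']['words']            # hypothetical db/collection names

for letter in string.ascii_lowercase:  # a.json .. z.json
    with open(f'{letter}.json') as f:
        file_data = json.load(f)
    # each value already carries 'word', 'wordset_id' and 'meanings'
    collection.insert_many(list(file_data.values()))

collection.create_index('word')  # makes lookups by word fast

# query by word: meanings (and synonyms, when present) come back with the document
doc = collection.find_one({'word': 'a cappella'})
if doc:
    for meaning in doc['meanings']:
        print(meaning['def'], meaning.get('synonyms', []))

Because each meaning keeps its optional synonyms list inside the same document, a single find_one() answers the "word to meanings plus synonyms" query without any joins.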

Related

Parse JSON document to records for PostgreSQL with Python

I am trying to parse a JSON document into records to store them in PostgreSQL using Python. I am new to this and am trying to put together two strings: a header string and a value string. The nested structure of the JSON document makes this difficult.
# import Python's JSON lib
import json
# import the new JSON method from psycopg2
from psycopg2.extras import Json

dict = {"results": [
    {
        "communication_type": "ChatSite",
        "conversation": [
            {
                "created_at": "2021-11-26 23:30:20",
                "id": "b29530e3-69ff-4798-abb1-abc17d4d44b5",
                "int_referer": "link1",
                "result": "failure",
                "visitor_id": "account:206867:site:167330:visitor:ybrr4e43f3hj8aor"
            }
        ],
        "duration": 53,
        "first_answer_time": None
    },
    {
        "communication_type": "ChatSite",
        "conversation": [
            {
                "created_at": "2021-11-26 23:34:00",
                "id": "e8f7e9bf-e836-4643-a30c-8bcbeffc397a",
                "int_referer": "link2",
                "result": "failure",
                "visitor_id": "account:206867:site:167330:visitor:iosbe9bfqbfswcdi"
            }
        ],
        "duration": 16,
        "first_answer_time": None
    },
]}

a = list(dict.values())
b = a[0]
# use JSON loads to create a list of records
record_list = json.loads(b)
# create a nested list of the records' values
values = [list(x.values()) for x in record_list]
# get the column names
columns = [list(x.keys()) for x in record_list][0]
#print(columns)

# value string for the SQL string
values_str = ""
# enumerate over the records' values
for i, record in enumerate(b):
    # declare empty list for values
    val_list = []
    # append each value to a new list of values
    for v, val in enumerate(record):
        if type(val) == str:
            val = str(Json(val)).replace('"', '')
        val_list += [str(val)]
    # put parentheses around each record string
    values_str += "(" + ', '.join(val_list) + "),\n"
# remove the last comma and end SQL with a semicolon
values_str = values_str[:-2] + ";"
#print(values_str)

# concatenate the SQL string
table_name = "json_data"
sql_string = "INSERT INTO %s (%s)\nVALUES %s" % (
    table_name,
    ', '.join(columns),
    values_str
)
print(sql_string)
Please help me fix it. The traceback is:
Traceback (most recent call last):
File "c:\Dev\livetex-master\Two.py", line 45, in <module>
record_list = json.loads(b)
File "C:\Users\ANISA4\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 339, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not list
I am trying to output a header string like:
"communication_type", "created_at", "id", "int_referer", "result", "visitor_id", "duration", "first_answer_time"
The dict is not a JSON string. json.loads() expects raw text (a str, bytes or bytearray), so to use it you would have to have the data as a string first:
{
    "communication_type": "ChatSite",
    "conversation": [
        {
            "created_at": "2021-11-26 23:30:20",
            "id": "b29530e3-69ff-4798-abb1-abc17d4d44b5",
            "int_referer": "link1",
            "result": "failure",
            "visitor_id": "account:206867:site:167330:visitor:ybrr4e43f3hj8aor"
        }
    ],
    "duration": 53,
    "first_answer_time": null
}
should be (note the triple quotes, which turn it into a Python string):
'''{
    "communication_type": "ChatSite",
    "conversation": [
        {
            "created_at": "2021-11-26 23:30:20",
            "id": "b29530e3-69ff-4798-abb1-abc17d4d44b5",
            "int_referer": "link1",
            "result": "failure",
            "visitor_id": "account:206867:site:167330:visitor:ybrr4e43f3hj8aor"
        }
    ],
    "duration": 53,
    "first_answer_time": null
}'''
Also,
a = list(dict.values())
returns a list inside a list, since your "results" value is itself a list:
[[...data...]]
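
Putting that together, here is a minimal sketch of a working version: since the data is already a Python dict, no json.loads() is needed at all. Flatten each record's nested "conversation" entry and let psycopg2 do the quoting with a parameterized query. The connection string below is a placeholder; the json_data table name comes from the question, and the data is trimmed to one record.

import psycopg2

data = {"results": [
    {"communication_type": "ChatSite",
     "conversation": [{"created_at": "2021-11-26 23:30:20",
                       "id": "b29530e3-69ff-4798-abb1-abc17d4d44b5",
                       "int_referer": "link1",
                       "result": "failure",
                       "visitor_id": "account:206867:site:167330:visitor:ybrr4e43f3hj8aor"}],
     "duration": 53,
     "first_answer_time": None},
]}

columns = ["communication_type", "created_at", "id", "int_referer",
           "result", "visitor_id", "duration", "first_answer_time"]

# flatten each record: hoist the single nested "conversation" dict to the top level
rows = []
for r in data["results"]:
    conv = r["conversation"][0]
    rows.append((r["communication_type"], conv["created_at"], conv["id"],
                 conv["int_referer"], conv["result"], conv["visitor_id"],
                 r["duration"], r["first_answer_time"]))

sql = "INSERT INTO json_data (%s) VALUES (%s)" % (
    ", ".join(columns), ", ".join(["%s"] * len(columns)))

conn = psycopg2.connect("dbname=test")  # placeholder connection string
with conn.cursor() as cur:
    cur.executemany(sql, rows)          # psycopg2 quotes the values safely
conn.commit()

Note that ", ".join(columns) is exactly the header string the question asks for.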

To retrieve specific data from multiple similar portions in a .json file

Part of the JSON file's content is shown below:
{"ID": "PK45", "People": "Kate", "Date": "2020-01-05"}, {"ID": "OI85", "People": "John", "Date": "2020-01-18" }, {"ID": "CN658", "People": "Pevo", "Date": "2020-02-01" }
It has multiple portions containing "ID", "People" and "Date".
What I want to do is retrieve John's ID (in this case "OI85").
If the key is unique, I can use:
data_content = json.loads(data)
ID = data_content['ID']
But there are multiple similar portions, so I can only locate "John" first:
with open("C:\\the_file.json") as data_file:
    data = data_file.read()

where = data.find('John')
where_1 = data[where - 20 : where]
ID = where_1[where_1.find('ID') + 3 : where_1.find('ID') + 7]
print(ID)
This looks clumsy.
What is the smart, JSON way to retrieve specific data from multiple similar portions of a .json file?
Thank you.
Iterate on the list of dicts until you find the right one:
import json

data = '[{"ID": "PK45", "People": "Kate", "Date": "2020-01-05"}, {"ID": "OI85", "People": "John", "Date": "2020-01-18" }, {"ID": "CN658", "People": "Pevo", "Date": "2020-02-01" }]'
data_content = json.loads(data)

def find_id_by_name(name, data):
    for d in data:
        if d['People'] == name:
            return d["ID"]
    else:
        raise ValueError('Name not found')

print(find_id_by_name('John', data_content))
# OI85
print(find_id_by_name('Jane', data_content))
# ... ValueError: Name not found
If you have to do many such searches, it may be worth creating another dict from your data to associate IDs to names:
ids_by_name = {d['People']: d['ID'] for d in data_content}
print(ids_by_name['John'])
# OI85
You probably should use the json module, which makes the task trivial:
import json

with open('data.json') as f:
    records = json.load(f)

for record in records:
    if record['People'] == 'John':
        print(record['ID'])
        break
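
For what it's worth, the same lookup also fits in one expression with next() and a generator; a small sketch, assuming the same data.json as above:

import json

with open('data.json') as f:
    records = json.load(f)

# first matching ID, or None when the name is absent
john_id = next((r['ID'] for r in records if r['People'] == 'John'), None)
print(john_id)  # OI85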

How to insert JSON file data into a table

I have a sample JSON file named a.json.
The JSON data in a.json is:
{
    "a cappella": {
        "word": "a cappella",
        "wordset_id": "5feb6f679a",
        "meanings": [
            {
                "id": "492099d426",
                "def": "without musical accompaniment",
                "example": "they performed a cappella",
                "speech_part": "adverb"
            },
            {
                "id": "0bf8d49e2e",
                "def": "sung without instrumental accompaniment",
                "example": "they sang an a cappella Mass",
                "speech_part": "adjective"
            }
        ]
    },
    "A.D.": {
        "word": "A.D.",
        "wordset_id": "b7e9d406a0",
        "meanings": [
            {
                "id": "a7482f3e30",
                "def": "in the Christian era",
                "speech_part": "adverb",
                "synonyms": [
                    "AD"
                ]
            }
        ]
    },
    .........
}
As suggested in my previous question, I am looking at how to insert this data into the tables:
Word: [word, wordset_id]
Meaning: [word, meaning_id, def, example, speech_part]
Synonym: [word, synonym_word]
I tried reading the file as:
import json

with open('a.json') as f:
    d = json.load(f)
and tried printing all the words as:
for word in d:
    print(word)
I got all the words, but failed to get the wordset_id for each one.
How can I insert the word and wordset_id into the Word table for the JSON format above?
The DB connection is set up as:
from flask import Flask
from flaskext.mysql import MySQL

app = Flask(__name__)
mysql = MySQL()
app.config['MYSQL_DATABASE_USER'] = 'root'
app.config['MYSQL_DATABASE_PASSWORD'] = 'root'
app.config['MYSQL_DATABASE_DB'] = 'wordstoday'
app.config['MYSQL_DATABASE_HOST'] = 'localhost'
mysql.init_app(app)

conn = mysql.connect()
cursor = conn.cursor()
When you execute:
for word in d:
    print(word)
it will only print the keys of the JSON object, not the complete values. Instead, you can try something like this:
for word in d:
    word_obj = d[word]
    wordset_id = word_obj['wordset_id']
    sql = "INSERT INTO Word (word, wordset_id) VALUES (%s, %s)"
    values = (word, wordset_id)
    cursor.execute(sql, values)

    meaning_obj_list = d[word]['meanings']
    for meaning_obj in meaning_obj_list:
        meaning_id = meaning_obj['id']
        definition = meaning_obj['def']
        # 'example' is not guaranteed to be present, so .get() is the safer way to read it
        example = meaning_obj.get('example', None)
        speech_part = meaning_obj['speech_part']
        sql = "INSERT INTO Meaning (word, meaning_id, def, example, speech_part) VALUES (%s, %s, %s, %s, %s)"
        values = (word, meaning_id, definition, example, speech_part)
        cursor.execute(sql, values)

conn.commit()
Also, refrain from using column names such as def, since def is a keyword in Python.
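
The Synonym table [word, synonym_word] from the question's schema can be filled the same way; a small sketch, to be placed inside the meanings loop above (note that the 'synonyms' key may be absent):

        for synonym in meaning_obj.get('synonyms', []):  # empty list when absent
            sql = "INSERT INTO Synonym (word, synonym_word) VALUES (%s, %s)"
            cursor.execute(sql, (word, synonym))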

Join array of objects to string value in Python

I want to join an array of objects into one string value in Python. Is there any way to do that?
url = "https://google.com"
search = "thai food"
results = [
    {
        "restaurant": "Siam Palace",
        "rating": "4.5"
    },
    {
        "restaurant": "Bangkok Palace",
        "rating": "3.5"
    }
]
I want to be able to join these all to form one value.
If I could make it look like:
data = {
    "url": "https://google.com",
    "search": "thai food",
    "results": [
        {
            "restaurant": "Siam Palace",
            "rating": "4.5"
        },
        {
            "restaurant": "Bangkok Palace",
            "rating": "3.5"
        }
    ]
}
I am receiving these results from MongoDB and want to join these three values together.
Use the json module:
import json

data = {}  # create an empty dict
# set the fields
data['url'] = 'https://google.com'
data['search'] = 'thai food'
# set the results (the list you got back from MongoDB)
data['results'] = results
# export as a string
print(json.dumps(data, indent=4))
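
Note that json.dumps() returns the string rather than printing it, so you can keep the combined value in a variable; a short self-contained sketch using the data from the question:

import json

results = [
    {"restaurant": "Siam Palace", "rating": "4.5"},
    {"restaurant": "Bangkok Palace", "rating": "3.5"},
]

data = {"url": "https://google.com", "search": "thai food", "results": results}
combined = json.dumps(data, indent=4)  # one string holding all three values
print(combined)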

Python 3 Get JSON value

I am using REST with a Python script to extract Name and StartTime from a response.
I can get the information, but I can't combine the data so that both values end up on the same line in a CSV; when I export them, they all go on new lines.
There is probably a much better way to extract data from a JSON list.
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        for key, value in data_item.items():
            #driver = {}
            #test = {}
            #startTime = {}
            if key == "Name":
                drivers.append(value)
            if key == "StartTime":
                drivers.append(value)
print(drivers)
Code to write to CSV:
with open(logFileName, 'a') as outcsv:
    # configure writer to write a standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar="'",
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n',
                        skipinitialspace=True)
    for driver in drivers:
        writer.writerow(driver)
Here is a sample of the response:
"Query": {
"Results": [
{
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1400"
},
{
"Name": " John Doe"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1401"
},
{
"Name": " Jane Smith"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
My output in CSV:
John Doe
2018-06-19T07:16:10Z
Jane Smith
2018-06-19T07:16:10Z
Desired Outcome:
John Doe, 2018-06-19T07:16:10Z
Jane Smith, 2018-06-19T07:16:10Z
Just use normal dictionary access to get the values:
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            drivers.append(data_item["Name"])
        if "StartTime" in data_item:
            drivers.append(data_item["StartTime"])
print(drivers)
If you know the items will already have the required fields, you won't even need the in tests.
writer.writerow() expects a sequence. You are calling it with a single string as the parameter, so it splits the string into individual characters. You probably want to keep the name and start time together, so extract them as a tuple:
for item in driverDetails['Query']['Results']:
    name, start_time = "", ""
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            name = data_item["Name"]
        if "StartTime" in data_item:
            start_time = data_item["StartTime"]
    drivers.append((name, start_time))
print(drivers)
Now, instead of being a list of strings, drivers is a list of tuples: one (name, start time) pair per item. If an input item has a name but no start time, that field will simply be empty. Your code to write the CSV file should now do the expected thing.
If you want to get all or most of the values, try gathering them into a single dictionary; then you can pull out the fields you want:
for item in driverDetails['Query']['Results']:
    fields = {}
    for data_item in item['XValues']:
        body.append(data_item)
        fields.update(data_item)
    drivers.append((fields["ID"], fields["Name"], fields["StartTime"]))
print(drivers)
Once you have the fields in a single dictionary, you could even build the tuple with a loop:
drivers.append(tuple(fields[f] for f in ("ID", "Name", "StartTime", "ReportScopeStartTime", "ReportScopeEndTime")))
I think you should list the fields you want explicitly just to ensure that new fields don't surprise you.
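
Putting the pieces together, a minimal runnable sketch; driverDetails is trimmed to the fields that matter here, and the drivers.csv file name plus the .strip() on the names are assumptions:

import csv

driverDetails = {"Query": {"Results": [
    {"XValues": [{"ID": "1400"}, {"Name": " John Doe"},
                 {"StartTime": "2018-06-19T07:16:10Z"}]},
    {"XValues": [{"ID": "1401"}, {"Name": " Jane Smith"},
                 {"StartTime": "2018-06-19T07:16:10Z"}]},
]}}

drivers = []
for item in driverDetails['Query']['Results']:
    fields = {}
    for data_item in item['XValues']:
        fields.update(data_item)  # merge the one-key dicts into one
    drivers.append((fields['Name'].strip(), fields['StartTime']))

with open('drivers.csv', 'a', newline='') as outcsv:
    writer = csv.writer(outcsv, delimiter=',', quotechar="'",
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n',
                        skipinitialspace=True)
    writer.writerows(drivers)  # each tuple becomes one CSV row

# drivers.csv now contains:
# John Doe,2018-06-19T07:16:10Z
# Jane Smith,2018-06-19T07:16:10Z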
