Creating a Data Structure from JSON Using Python

I'm new to Python and I have a JSON file that I'm trying to use to create a data structure.
Here is a sample of what one of the lines would look like:
[{'name': 'Nick', 'age': '35', 'city': 'New York'}...]
I read it into memory, but am unsure what additional steps I would need to take to use it to create a table.
Here is what I have tried so far:
import json
import csv
from pprint import pprint
with open("/desktop/customer_records.json") as customer_records:
data=json.load(customer_records)
for row in data:
print(row)
Ideally, I would like it in the following format:
Name Age City
Nick 35 New York
Any help would be appreciated. Thanks in advance.

Your problem is not specified very precisely, but if you want to generate SQL that can be run against MySQL, here's a little program that converts the JSON to a sequence of SQL statements:
#!/usr/bin/env python
import json

# Load data
with open("zz.json") as f:
    data = json.load(f)

# Find all keys
keys = []
for row in data:
    for key in row.keys():
        if key not in keys:
            keys.append(key)

# Print table definition
print("""CREATE TABLE MY_TABLE(
    {0}
);""".format(",\n    ".join(map(lambda key: "{0} VARCHAR".format(key), keys))))

# Now, for all rows, print values
for row in data:
    print("INSERT INTO MY_TABLE VALUES({0});".format(
        ",".join(map(lambda key: "'{0}'".format(row[key]) if key in row else "NULL", keys))))
For this JSON file:
[
{"name": "Nick", "age": "35", "city": "New York"},
{"name": "Joe", "age": "21", "city": "Boston"},
{"name": "Alice", "city": "Washington"},
{"name": "Bob", "age": "49"}
]
It generates
CREATE TABLE MY_TABLE(
    name VARCHAR,
    age VARCHAR,
    city VARCHAR
);
INSERT INTO MY_TABLE VALUES('Nick','35','New York');
INSERT INTO MY_TABLE VALUES('Joe','21','Boston');
INSERT INTO MY_TABLE VALUES('Alice',NULL,'Washington');
INSERT INTO MY_TABLE VALUES('Bob','49',NULL);
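If what you actually need is just the plain-text table from your question rather than SQL, here is a minimal sketch reusing the keys and data variables from the script above (the 12-character column width is an arbitrary choice):
# Print a simple aligned text table; missing fields come out as blanks.
print("".join("{0:<12}".format(key) for key in keys))
for row in data:
    print("".join("{0:<12}".format(row.get(key, "")) for key in keys))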
And for the future, please make your question WAY more specific :)

Related

Updating a JSONType column in a database table

I'm trying to update a JSON column in the User table with a Python script.
So, I have a list of UIDs (stored in the uid_list variable); by this list of UIDs I would like to update the corresponding rows in the database.
json_data = Column(JSONType) is the column that needs to be updated (the name and surname fields inside it).
The data stored in this column: {"view_data": {"active": false, "text": "", "link": "http://google.com/"}, "name": "John", "surname": "Black", "email": "john#gmail.com"}
def update_json_column_in_table_in_db_by_list_of_uid():
    uid_list = ['25a00f0e-58a5-4356-8b91-b18ea2eed71d', '68ccc759-97ae-48a2-bc42-5c2f1fa7a0ba', '9e2ee469-f777-4622-bca1-68d924caed0f']
    name = 'empty'
    surname = 'empty2'
    User.query.filter(User.uid.in_(uid_list)).update({User.json_data: name + surname})
You need to do two things:
use .where() instead of .filter()
use func.jsonb_set or func.json_set
from sqlalchemy import func, update

stmt = update(User).values(json_data=func.json_set(User.json_data, '{name}', name)).where(User.uid.in_(uid_list))
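A minimal sketch of executing it, assuming a Flask-SQLAlchemy style db.session (adjust to however your session is created) and that updating both keys means nesting two json_set calls:
# Sketch only: db.session is assumed here; use your own session object.
from sqlalchemy import update, func

stmt = (
    update(User)
    .values(
        json_data=func.json_set(
            func.json_set(User.json_data, '{name}', name),
            '{surname}', surname,
        )
    )
    .where(User.uid.in_(uid_list))
)
db.session.execute(stmt)
db.session.commit()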

How to effectively synchronize freshly fetched data with data stored in database?

Let's start with initialization of the database:
import sqlite3

entries = [
    {"name": "Persuasive", "location": "Bolivia"},
    {"name": "Crazy", "location": "Guyana"},
    {"name": "Humble", "location": "Mexico"},
    {"name": "Lucky", "location": "Uruguay"},
    {"name": "Jolly", "location": "Alaska"},
    {"name": "Mute", "location": "Uruguay"},
    {"name": "Happy", "location": "Chile"}
]

conn = sqlite3.connect('entries.db')
conn.execute('''DROP TABLE IF EXISTS ENTRIES''')
conn.execute('''CREATE TABLE ENTRIES
    (ID INT PRIMARY KEY NOT NULL,
    NAME TEXT NOT NULL,
    LOCATION TEXT NOT NULL,
    ACTIVE NUMERIC NULL);''')
# The entries above carry no "id" key, so assign sequential IDs here.
conn.executemany("""INSERT INTO ENTRIES (ID, NAME, LOCATION) VALUES (:id, :name, :location)""",
                 [dict(entry, id=i) for i, entry in enumerate(entries, start=1)])
conn.commit()
That was an initial run, just to populate the database with some data.
Then, every time the application runs, new data gets fetched from somewhere:
findings = [
    {"name": "Brave", "location": "Bolivia"},    # new
    {"name": "Crazy", "location": "Guyana"},
    {"name": "Humble", "location": "Mexico"},
    {"name": "Shy", "location": "Suriname"},     # new
    {"name": "Cautious", "location": "Brazil"},  # new
    {"name": "Mute", "location": "Uruguay"},
    {"name": "Happy", "location": "Chile"}
]
In this case, we have 3 new items in the list. I expect that all the items that are in the database now will remain there, and the 3 new items will be appended to the db. All items from the list above would get the active flag set to True; the remaining ones would get the flag set to False. Let's prepare a dump from the database:
conn = sqlite3.connect('entries.db')
cursor = conn.execute("SELECT * FROM ENTRIES ORDER BY ID")
db_entries = []
for row in cursor:
    entry = {"id": row[0], "name": row[1], "location": row[2], "active": row[3]}
    db_entries.append(entry)
OK, now we can compare what's in new findings, and what was there already in the database:
import random

for f in findings:
    n = next((d for d in db_entries if d["name"] == f["name"] and d["location"] == f["location"]), None)
    if n is None:
        id = int(random.random() * 10)
        conn.execute('''INSERT INTO ENTRIES(ID, NAME, LOCATION, ACTIVE) VALUES (?, ?, ?, ?)''',
                     (id, f["name"], f["location"], 1))
        conn.commit()
    else:
        active = next((d for d in db_entries if d['id'] == n['id']), None)
        active.update({"act": "yes"})
        conn.execute("UPDATE ENTRIES set ACTIVE = 1 where ID = ?", (n["id"],))
        conn.commit()
(I know you're probably upset with the random ID generator, but it's for prototyping purpose only)
As you saw, instances of db_entries that are common with findings instances were updated with a flag ({"act": "yes"}). They get processed now; besides that, the items that are no longer active get updated with a different flag and then queried for deactivation:
for d in db_entries:
    if "act" in d:
        conn.execute("UPDATE ENTRIES set ACTIVE = 1 where ID = ?", (d["id"],))
        conn.commit()
    else:
        if d["active"] == 1:
            d.update({"deact": "yes"})

for d in db_entries:
    if "deact" in d:
        conn.execute("UPDATE ENTRIES set ACTIVE = 0 where ID = ?", (d["id"],))
        conn.commit()

conn.close()
And this is it: items fetched on the fly were compared with those in the database and synchronized.
I have a feeling that this approach saves some data transfer between application and database, as it only updates items that require updating, but on the other hand it feels like the whole process could be rebuilt and made more effective.
What would you improve in this process?
Wouldn't a simpler approach be to just insert all of the new data, keeping track of duplicates with ON CONFLICT DO UPDATE SET?
You wouldn't even necessarily need the ID field, but you would need a unique key on NAME and LOCATION to identify duplicates. Then the following query would identify the duplicate and not insert it, but just update the NAME field with the same value again (so basically the same result as ignoring the row).
INSERT INTO ENTRIES (NAME, LOCATION)
VALUES ('Crazy', 'Guyana')
ON CONFLICT(NAME,LOCATION) DO UPDATE SET NAME = 'Crazy';
Then you can simply execute:
conn.execute('''INSERT INTO ENTRIES(NAME, LOCATION) VALUES (?, ?) ON CONFLICT(NAME,LOCATION) DO UPDATE SET NAME=?''',
             (f["name"], f["location"], f["name"]))
This would simplify your "insert only new entries" process. I reckon you could also combine this in such a way that the update you perform doesn't touch the NAME field, but instead applies your ACTIVE logic (see the sketch below).
Also, SQLite has supported UPSERT since version 3.24.0.
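A minimal sketch of what that combination could look like, assuming SQLite 3.24.0+, the ID column dropped as suggested above, and a UNIQUE constraint on (NAME, LOCATION):
# Sketch only: deactivate everything, then upsert the fresh findings,
# marking each one active whether it is new or already in the table.
conn.execute("UPDATE ENTRIES SET ACTIVE = 0")
conn.executemany(
    """INSERT INTO ENTRIES (NAME, LOCATION, ACTIVE) VALUES (:name, :location, 1)
       ON CONFLICT(NAME, LOCATION) DO UPDATE SET ACTIVE = 1""",
    findings,
)
conn.commit()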

How can you query an item in a list field in DynamoDB using Python?

I have a table that contains an item with the following attributes:
{
    "country": "USA",
    "names": [
        "josh",
        "freddy"
    ],
    "phoneNumber": "123",
    "userID": 0
}
I'm trying to query an item in DynamoDB by looking for a name using Python. So I would write in my code that the item I need has "freddy" in the field "names".
I saw many forums mentioning "contains" but none that show an example...
My current code is the following:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users_table')

data = table.query(
    FilterExpression='names = :name',
    ExpressionAttributeValues={
        ":name": "freddy"
    }
)
I obviously cannot use that because "names" is a list and not a string field.
How can I look for "freddy" in names?
Since the names field isn't part of the primary key, you can't use query. The only way to look up an item by names is to use scan.
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users_table')

data = table.scan(
    FilterExpression=Attr('names').contains('freddy')
)
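One thing to keep in mind: scan returns at most 1 MB of data per call, so for larger tables you have to follow LastEvaluatedKey to collect every match:
# Keep scanning until DynamoDB stops returning a LastEvaluatedKey.
items = data["Items"]
while "LastEvaluatedKey" in data:
    data = table.scan(
        FilterExpression=Attr('names').contains('freddy'),
        ExclusiveStartKey=data["LastEvaluatedKey"],
    )
    items.extend(data["Items"])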

Parsing json into Insert statements with Python

I have a file which contains several json records. I have to parse this file and load each of the jsons to a particular SQL-Server table. However, the table might not exist on the database, in which case I have to also create it first before loading. So, I have to parse the json file and figure out the fields/columns and create the table. Then I will have to de-serialize the jsons into records and insert them into the table created. However, the caveat is that some fields in the json are optional i.e. a field might be absent from one json record but could be present in another record. Below is an example file with 3 records :-
{ id : 1001,
name : "John",
age : 30
} ,
{ id : 1002,
name : "Peter",
age : 25
},
{ id : 1002,
name : "Kevin",
age : 35,
salary : 5000
},
Notice that the field salary appears only in the 3rd record. The results should be :-
CREATE TABLE tab ( id int, name varchar(100), age int, salary int );
INSERT INTO tab (id, name, age, salary) values (1001, 'John', 30, NULL)
INSERT INTO tab (id, name, age, salary) values (1002, 'Peter', 25, NULL)
INSERT INTO tab (id, name, age, salary) values (1003, 'Kevin', 35, 5000)
Can anyone please help me with some pointers as I am new to Python. Thanks.
You could try this:
import json

TABLE_NAME = "tab"
sqlstatement = ''

with open('data.json', 'r') as f:
    jsondata = json.loads(f.read())

for record in jsondata:
    keylist = "("
    valuelist = "("
    firstPair = True
    for key, value in record.items():
        if not firstPair:
            keylist += ", "
            valuelist += ", "
        firstPair = False
        keylist += key
        if isinstance(value, str):
            valuelist += "'" + value + "'"
        else:
            valuelist += str(value)
    keylist += ")"
    valuelist += ")"
    sqlstatement += "INSERT INTO " + TABLE_NAME + " " + keylist + " VALUES " + valuelist + "\n"

print(sqlstatement)
However for this to work, you'll need to change your JSON file to correct the syntax like this:
[{
    "id": 1001,
    "name": "John",
    "age": 30
},
{
    "id": 1002,
    "name": "Peter",
    "age": 25
},
{
    "id": 1003,
    "name": "Kevin",
    "age": 35,
    "salary": 5000
}]
Running this gives the following output:
INSERT INTO tab (id, name, age) VALUES (1001, 'John', 30)
INSERT INTO tab (id, name, age) VALUES (1002, 'Peter', 25)
INSERT INTO tab (id, name, age, salary) VALUES (1003, 'Kevin', 35, 5000)
Note that you don't need to specify NULLs. If you don't specify a column in the insert statement, it should automatically insert NULL into any columns you left out.
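If you also want the CREATE TABLE statement from your expected output, here is a minimal sketch along the same lines, reusing jsondata and TABLE_NAME from the snippet above; it just guesses each column's type from the first value seen for that key (int vs. everything else):
# Collect the union of keys across all records and guess a type per column.
columns = {}
for record in jsondata:
    for key, value in record.items():
        columns.setdefault(key, "int" if isinstance(value, int) else "varchar(100)")

column_defs = ", ".join("{0} {1}".format(k, t) for k, t in columns.items())
print("CREATE TABLE {0} ( {1} );".format(TABLE_NAME, column_defs))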
In Python, you can do something like this using sqlite3 and json, both from the standard library.
import json
import sqlite3

# The string representing the json.
# You will probably want to read this string in from
# a file rather than hardcoding it.
s = """[
    {
        "id": 1001,
        "name": "John",
        "age": 30
    },
    {
        "id": 1002,
        "name": "Peter",
        "age": 25
    },
    {
        "id": 1003,
        "name": "Kevin",
        "age": 35,
        "salary": 5000
    }
]"""

# Read the string representing json
# into a python list of dicts.
data = json.loads(s)

# Open the file containing the SQL database.
with sqlite3.connect("filename.db") as conn:
    # Create the table if it doesn't exist.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tab(
            id int,
            name varchar(100),
            age int,
            salary int
        );"""
    )
    # Insert each entry from json into the table.
    keys = ["id", "name", "age", "salary"]
    for entry in data:
        # This will make sure that each key will default to None
        # if the key doesn't exist in the json entry.
        values = [entry.get(key, None) for key in keys]
        # Execute the command and replace each '?' with the corresponding value
        # in 'values'. DO NOT build the string manually;
        # the sqlite3 library escapes unsafe strings when done this way.
        cmd = """INSERT INTO tab VALUES(
            ?,
            ?,
            ?,
            ?
        );"""
        conn.execute(cmd, values)
    conn.commit()
This will create a file named 'filename.db' in the current directory with the entries inserted.
To test the tables:
# Testing the table.
with sqlite3.connect("filename.db") as conn:
    cmd = """SELECT * FROM tab WHERE SALARY NOT NULL;"""
    cur = conn.execute(cmd)
    res = cur.fetchall()
    for r in res:
        print(r)

Python - JSON to CSV table?

I was wondering how I could import a JSON file, and then save that to an ordered CSV file, with header row and the applicable data below.
Here's what the JSON file looks like:
[
    {
        "firstName": "Nicolas Alexis Julio",
        "lastName": "N'Koulou N'Doubena",
        "nickname": "N. N'Koulou",
        "nationality": "Cameroon",
        "age": 24
    },
    {
        "firstName": "Alexandre Dimitri",
        "lastName": "Song-Billong",
        "nickname": "A. Song",
        "nationality": "Cameroon",
        "age": 26,
etc. etc. + } ]
Note there are multiple 'keys' (firstName, lastName, nickname, etc.). I would like to create a CSV file with those as the header, then the applicable info beneath in rows, with each row having a player's information.
Here's the script I have so far for Python:
import urllib2
import json
import csv
writefilerows = csv.writer(open('WCData_Rows.csv',"wb+"))
api_key = "xxxx"
url = "http://worldcup.kimonolabs.com/api/players?apikey=" + api_key + "&limit=1000"
json_obj = urllib2.urlopen(url)
readable_json = json.load(json_obj)
list_of_attributes = readable_json[0].keys()
print list_of_attributes
writefilerows.writerow(list_of_attributes)
for x in readable_json:
    writefilerows.writerow(x[list_of_attributes])
But when I run that, I get a "TypeError: unhashable type:'list'" error. I am still learning Python (obviously I suppose). I have looked around online (found this) and can't seem to figure out how to do it without explicitly stating what key I want to print...I don't want to have to list each one individually...
Thank you for any help/ideas! Please let me know if I can clarify or provide more information.
Your TypeError is occurring because you are trying to index a dictionary, x, with a list, list_of_attributes, via x[list_of_attributes]. This is not how Python works. In this case you are iterating readable_json, which will return a dictionary with each iteration. There is no need to pull values out of this data in order to write them out.
csv.DictWriter should give you what you're looking for.
import csv
[...]

def encode_dict(d, out_encoding="utf8"):
    '''Encode dictionary to desired encoding, assumes incoming data in unicode'''
    encoded_d = {}
    for k, v in d.iteritems():
        k = k.encode(out_encoding)
        v = unicode(v).encode(out_encoding)
        encoded_d[k] = v
    return encoded_d

list_of_attributes = readable_json[0].keys()
# sort fields in desired order
list_of_attributes.sort()

with open('WCData_Rows.csv', "wb+") as csv_out:
    writer = csv.DictWriter(csv_out, fieldnames=list_of_attributes)
    writer.writeheader()
    for data in readable_json:
        writer.writerow(encode_dict(data))
Note:
This assumes that each entry in readable_json has the same fields.
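If the entries don't all share the same fields, one hedged variation is to collect the union of keys across all entries first, and pass restval so DictWriter writes an empty cell for anything missing:
# Union of all keys, so no entry has an unexpected or missing field;
# restval="" fills the gaps with empty cells.
list_of_attributes = sorted({key for entry in readable_json for key in entry})

with open('WCData_Rows.csv', "wb+") as csv_out:
    writer = csv.DictWriter(csv_out, fieldnames=list_of_attributes, restval="")
    writer.writeheader()
    for data in readable_json:
        writer.writerow(encode_dict(data))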
Maybe pandas could do this - but I never tried reading JSON with it.
import pandas as pd
df = pd.read_json( ... )
df.to_csv( ... )
pandas.DataFrame.to_csv
pandas.io.json.read_json
EDIT:
data = '''[
    {
        "firstName": "Nicolas Alexis Julio",
        "lastName": "N'Koulou N'Doubena",
        "nickname": "N. N'Koulou",
        "nationality": "Cameroon",
        "age": 24
    },
    {
        "firstName": "Alexandre Dimitri",
        "lastName": "Song-Billong",
        "nickname": "A. Song",
        "nationality": "Cameroon",
        "age": 26
    }
]'''
import pandas as pd
df = pd.read_json(data)
print df
df.to_csv('results.csv')
result:
age firstName lastName nationality nickname
0 24 Nicolas Alexis Julio N'Koulou N'Doubena Cameroon N. N'Koulou
1 26 Alexandre Dimitri Song-Billong Cameroon A. Song
With pandas you can save it as CSV, Excel, etc. (and maybe even write it directly to a database).
And you can do some operations on the data in the table and show it as a graph.
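One small follow-up: if you don't want the leading index column (0, 1, ...) in the CSV, to_csv accepts index=False:
df.to_csv('results.csv', index=False)  # drop the DataFrame's integer index from the output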
