Getting data from nested JSON using Python

I'm trying to get some data from a government website and store it in two different tables. One will contain the filenames and the release date (let's call this filename), and the other will contain the actual data plus the key to join with the filename table (let's call this datasplit).
The data comes as a JSON file that I saved from the web page (I don't have an API for it). Here's a small example of what the JSON file looks like:
{
"filename": [
{
"id": 2,
"nome": "Societa' controllate di fatto dalla Presidenza del Consiglio dei Ministri e dai Ministeri",
"aggiornamento": "04-02-2020",
"datasplit": [
{
"cf": "00081070591",
"den": "SIOG SOCIETA'ITALIANA OLEODOTTI DI GAETA SPA IN AMM.NE STRAORDINARIA",
"dm": "1513641600"
},
{
"cf": "00103540829",
"den": "INDUSTRIA SICILIANA ACIDO FOSFORICO S.P.A.IN LIQUIDAZIONE",
"dm": "1513641600"
}
]
},
{
"id": 1,
"nome": "Enti o societa' controllate dalle Amministrazioni Centrali",
"aggiornamento": "30-10-2019",
"datasplit": [
{
"cf": "00049100522",
"den": "MPS TENIMENTI POGGIO BONELLI E CHIGI SARACINI - SOC. AGRICOLA SPA",
"dm": "1513641600"
},
{
"cf": "00051010528",
"den": "SOCIETA' AGRICOLA SUVIGNANO S.R.L.",
"dm": "1513641600"
}
]
},
{
"id": 4,
"nome": "Societa' quotate inserite nell'indice FTSE MIB della Borsa italiana",
"aggiornamento": "19-12-2017",
"datasplit": [
{
"cf": "00079760328",
"den": "ASSICURAZIONI GENERALI S.P.A.",
"dm": "1513641600"
},
{
"cf": "00222620163",
"den": "FRENI BREMBO - SPA",
"dm": "1513641600"
}
]
}
]
}
So what I'd like to get is the filename table with id, nome, aggiornamento fields and the datasplit table with id, aggiornamento, cf, den, dm
What I've done so far is to get the JSON file (saving it locally) from the webpage and read it in my python program.
# this works
import json
sqlstatement = ''
with open('splitdata.json', 'r') as f:  # this is where I saved the website content I want to import
    jsondata = json.loads(f.read())
I was trying to build something that would go through the json file and build some INSERT INTO table_name SQL queries to later execute them and finally have my data in the database.
So, my problem is how to read the nested JSON first and how to insert the data in my database second (if you have a better solution than creating and running the SQL script).
When trying to cycle inside the JSON it seems that it only finds one element.
for json in jsondata:
    keylist = "("
    valuelist = "("
    firstPair = True
    for key, value in jsondata.items():
        if not firstPair:
            keylist += ", "
            valuelist += ", "
        firstPair = False
        keylist += key
        if type(value) in (str, unicode):
            valuelist += "'" + value + "'"
        else:
            valuelist += str(value)
    keylist += ")"
    valuelist += ")"
    sqlstatement += "INSERT INTO " + TABLE_NAME + " " + keylist + " VALUES " + valuelist + "\n"
print(sqlstatement)
I know the code is incomplete and won't generate the correct SQL statements yet, but I need help getting to the nested part of the JSON, the datasplit field. Could it be that it's not being treated as a dictionary? If so, how can I fix it?

You can attempt to solve the problem using pandas, which has SQL capabilities:
import pandas as pd
for key, val in jsondata.items():
    df = pd.json_normalize(val)
    df = df.explode('datasplit')
    df[['cf', 'den', 'dm']] = df.datasplit.apply(pd.Series)
    df = df.drop('datasplit', axis=1)
    df.to_sql(name, con)  # name and con are placeholders: your table name and database connection
This is what each datasplit (df) looks like:
id nome aggiornamento \
0 2 Societa' controllate di fatto dalla Presidenza... 04-02-2020
0 2 Societa' controllate di fatto dalla Presidenza... 04-02-2020
1 1 Enti o societa' controllate dalle Amministrazi... 30-10-2019
1 1 Enti o societa' controllate dalle Amministrazi... 30-10-2019
2 4 Societa' quotate inserite nell'indice FTSE MIB... 19-12-2017
2 4 Societa' quotate inserite nell'indice FTSE MIB... 19-12-2017
cf den dm
0 00081070591 SIOG SOCIETA'ITALIANA OLEODOTTI DI GAETA SPA I... 1513641600
0 00103540829 INDUSTRIA SICILIANA ACIDO FOSFORICO S.P.A.IN L... 1513641600
1 00049100522 MPS TENIMENTI POGGIO BONELLI E CHIGI SARACINI ... 1513641600
1 00051010528 SOCIETA' AGRICOLA SUVIGNANO S.R.L. 1513641600
2 00079760328 ASSICURAZIONI GENERALI S.P.A. 1513641600
2 00222620163 FRENI BREMBO - SPA 1513641600

I am not sure where you are stuck, but perhaps you aren't familiar with how dictionaries work?
jsondata['filename'] # has everything you want: your array of data
# get the first element in the array
jsondata['filename'][0]
# result: {'id': 2, 'nome': "Societa' controllate di fatto dalla Presidenza del Consiglio dei Ministri e dai Ministeri", 'aggiornamento': '04-02-2020', 'datasplit': [{'cf': '00081070591', 'den': "SIOG SOCIETA'ITALIANA OLEODOTTI DI GAETA SPA IN AMM.NE STRAORDINARIA", 'dm': '1513641600'}, {'cf': '00103540829', 'den': 'INDUSTRIA SICILIANA ACIDO FOSFORICO S.P.A.IN LIQUIDAZIONE', 'dm': '1513641600'}]}
# second element
jsondata['filename'][1]
# result: {'id': 1, 'nome': "Enti o societa' controllate dalle Amministrazioni Centrali", 'aggiornamento': '30-10-2019', 'datasplit': [{'cf': '00049100522', 'den': 'MPS TENIMENTI POGGIO BONELLI E CHIGI SARACINI - SOC. AGRICOLA SPA', 'dm': '1513641600'}, {'cf': '00051010528', 'den': "SOCIETA' AGRICOLA SUVIGNANO S.R.L.", 'dm': '1513641600'}]}
# access `datasplit` directly like
jsondata['filename'][0]['datasplit']
# result: [{'cf': '00081070591', 'den': "SIOG SOCIETA'ITALIANA OLEODOTTI DI GAETA SPA IN AMM.NE STRAORDINARIA", 'dm': '1513641600'}, {'cf': '00103540829', 'den': 'INDUSTRIA SICILIANA ACIDO FOSFORICO S.P.A.IN LIQUIDAZIONE', 'dm': '1513641600'}]
# or in a for loop
for data in jsondata['filename']:
    data['datasplit']
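As a rough sketch of how the traversal above could feed the two tables from the question, the following uses the standard-library sqlite3 module with parameterized queries rather than concatenated SQL strings. The in-memory database, the table layouts, and the shortened sample values are assumptions made for illustration; the question does not say which database is in use.

```python
import json
import sqlite3

# A trimmed, shortened copy of the sample from the question (two outer entries).
raw = """
{"filename": [
  {"id": 2, "nome": "Societa' controllate di fatto", "aggiornamento": "04-02-2020",
   "datasplit": [
     {"cf": "00081070591", "den": "SIOG SPA", "dm": "1513641600"},
     {"cf": "00103540829", "den": "INDUSTRIA SICILIANA ACIDO FOSFORICO", "dm": "1513641600"}]},
  {"id": 1, "nome": "Enti o societa' controllate", "aggiornamento": "30-10-2019",
   "datasplit": [
     {"cf": "00049100522", "den": "MPS TENIMENTI", "dm": "1513641600"}]}
]}
"""
jsondata = json.loads(raw)

# One row per outer entry, and one row per nested datasplit item.
filename_rows = []
datasplit_rows = []
for entry in jsondata["filename"]:
    filename_rows.append((entry["id"], entry["nome"], entry["aggiornamento"]))
    for item in entry["datasplit"]:
        datasplit_rows.append(
            (entry["id"], entry["aggiornamento"], item["cf"], item["den"], item["dm"])
        )

# Assumption: an in-memory SQLite database with hypothetical table layouts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE filename (id INTEGER, nome TEXT, aggiornamento TEXT)")
con.execute(
    "CREATE TABLE datasplit (id INTEGER, aggiornamento TEXT, cf TEXT, den TEXT, dm TEXT)"
)
# Placeholders let the driver do the quoting, including apostrophes in names.
con.executemany("INSERT INTO filename VALUES (?, ?, ?)", filename_rows)
con.executemany("INSERT INTO datasplit VALUES (?, ?, ?, ?, ?)", datasplit_rows)
con.commit()

joined = con.execute(
    "SELECT f.nome, d.cf FROM datasplit d JOIN filename f ON f.id = d.id ORDER BY d.cf"
).fetchall()
print(joined)
```

Using placeholders also sidesteps the quoting problem that values such as "Societa' controllate" would cause in hand-built INSERT strings.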

Related

How can I explore a dictionary inside a dictionary when I'm consuming an API? Example with PokeAPI

I'm trying to read the value of a key in a nested dictionary while consuming an API. The code below iterates over each Pokémon in the list of all Pokémon and prints some of its attributes. I know how to print the value of a single key such as height, id, name, or base_experience, but how can I print the name of each ability? For example:
JSON CODE
"id": 1,
"height": 7,
"name": "bulbasur",
"abilities": [
{
"ability": {
"name": "overgrow",
"url": "https://pokeapi.co/api/v2/ability/65/"
},
"is_hidden": false,
"slot": 1
},
{
"ability": {
"name": "chlorophyll",
"url": "https://pokeapi.co/api/v2/ability/34/"
},
"is_hidden": true,
"slot": 3
}
]
PYTHON CODE
import requests
def print_restul(json_respnse):
    pokemones = json_respnse.get("results")
    for pokemon in pokemones:
        # print(pokemon)
        explore_pokemon(pokemon)
def explore_pokemon(pokemon):
    url_pokemon = pokemon.get("url")
    r = requests.get(url_pokemon)
    json_respnse = r.json()
    # print(json_respnse.keys())
    print("el id del pokemon {} es {}, y la altura es de {}".format(pokemon.get("name"),json_respnse.get("id"),json_respnse.get("height"),))
if __name__ == "__main__":
    url = 'http://pokeapi.co/api/v2/pokemon'
    r = requests.get(url)
    json_respnse = r.json()
    print_restul(json_respnse)
    for _ in range(10):
        print("== nuevo ciclo for === ")
        url_next = json_respnse.get("next")
        r = requests.get(url_next)
        json_respnse = r.json()
        print_restul(json_respnse)
Loop over the abilities list and get all the names.
def explore_pokemon(pokemon):
    url_pokemon = pokemon.get("url")
    r = requests.get(url_pokemon)
    json_respnse = r.json()
    # print(json_respnse.keys())
    print("el id del pokemon {} es {}, y la altura es de {}".format(pokemon.get("name"),json_respnse.get("id"),json_respnse.get("height"),))
    # each item in "abilities" wraps the actual ability in an "ability" dict
    abilities = ", ".join(a['ability']['name'] for a in json_respnse.get("abilities", []))
    print(f'abilities: {abilities}')
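The same lookup can be tried without any network access. The sketch below works on a hand-written dict that mirrors the JSON fragment from the question; the helper name ability_names and its include_hidden parameter are hypothetical, added only to show how the nested list can be filtered.

```python
# Assumption: a hand-written dict that mirrors the JSON fragment in the question.
pokemon_data = {
    "id": 1,
    "height": 7,
    "name": "bulbasur",
    "abilities": [
        {"ability": {"name": "overgrow", "url": "https://pokeapi.co/api/v2/ability/65/"},
         "is_hidden": False, "slot": 1},
        {"ability": {"name": "chlorophyll", "url": "https://pokeapi.co/api/v2/ability/34/"},
         "is_hidden": True, "slot": 3},
    ],
}


def ability_names(data, include_hidden=True):
    """Return the ability names; a missing "abilities" key yields an empty list."""
    return [
        entry["ability"]["name"]
        for entry in data.get("abilities", [])
        if include_hidden or not entry["is_hidden"]
    ]


print(ability_names(pokemon_data))
print(ability_names(pokemon_data, include_hidden=False))
```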
In the long run, it might be a better idea to model your data as dataclasses instead of plain Python dict objects. That also promotes code quality and code reuse, which I feel is a good idea in general.
To work with a nested dataclass model and generate it easily from JSON, I suggest the wiz command-line tool from dataclass-wizard. The library can be installed with pip install dataclass-wizard, after which the wiz utility is available from a terminal to generate a nested dataclass schema.
In this scenario, your code becomes something like the following, where Data is assumed to be the top-level generated dataclass:
def explore_pokemon(pokemon):
    url_pokemon = pokemon.get("url")
    r = requests.get(url_pokemon)
    json_respnse = Data.from_dict(r.json())
    # print(json_respnse.keys())
    print("el id del pokemon {} es {}, y la altura es de {}".format(pokemon.get("name"),json_respnse.id,json_respnse.height),)
    abilities = ", ".join(a.ability.name for a in json_respnse.abilities)
    print(f'abilities: {abilities}')
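For readers who would rather not add a dependency, a hand-written model using only the standard-library dataclasses module might look like the sketch below. The class names Ability, AbilitySlot and Pokemon, and the from_dict helper, are hypothetical; this is not the schema that wiz generates, only an illustration of attribute access on nested data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ability:
    name: str
    url: str


@dataclass
class AbilitySlot:
    ability: Ability
    is_hidden: bool
    slot: int


@dataclass
class Pokemon:
    id: int
    height: int
    name: str
    abilities: List[AbilitySlot] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw):
        """Build a Pokemon from a plain dict, reading only the keys modelled here."""
        slots = [
            AbilitySlot(
                ability=Ability(name=item["ability"]["name"], url=item["ability"]["url"]),
                is_hidden=item["is_hidden"],
                slot=item["slot"],
            )
            for item in raw.get("abilities", [])
        ]
        return cls(id=raw["id"], height=raw["height"], name=raw["name"], abilities=slots)


# Assumption: the same sample values as the JSON fragment in the question.
sample = {
    "id": 1,
    "height": 7,
    "name": "bulbasur",
    "abilities": [
        {"ability": {"name": "overgrow", "url": "https://pokeapi.co/api/v2/ability/65/"},
         "is_hidden": False, "slot": 1},
        {"ability": {"name": "chlorophyll", "url": "https://pokeapi.co/api/v2/ability/34/"},
         "is_hidden": True, "slot": 3},
    ],
}

pokemon = Pokemon.from_dict(sample)
print(", ".join(slot.ability.name for slot in pokemon.abilities))
```

The trade-off is that every field has to be declared by hand, which is the work the generator is meant to save.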

Conversion from nested json to csv with pandas

I am trying to convert a nested json into a csv file, but I am struggling with the logic needed for the structure of my file: it's a json with 2 objects and I would like to convert into csv only one of them, which is a list with nesting.
I've found very helpful "flattening" json info in this blog post. I have been basically adapting it to my problem, but it is still not working for me.
My json file looks like this:
{
"tickets":[
{
"Name": "Liam",
"Location": {
"City": "Los Angeles",
"State": "CA"
},
"hobbies": [
"Piano",
"Sports"
],
"year" : 1985,
"teamId" : "ATL",
"playerId" : "barkele01",
"salary" : 870000
},
{
"Name": "John",
"Location": {
"City": "Los Angeles",
"State": "CA"
},
"hobbies": [
"Music",
"Running"
],
"year" : 1985,
"teamId" : "ATL",
"playerId" : "bedrost01",
"salary" : 550000
}
],
"count": 2
}
My code, so far, looks like this:
import json
from pandas.io.json import json_normalize
import argparse
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Converting json files into csv for Tableau processing')
    parser.add_argument(
        "-j", "--json", dest="json_file", help="PATH/TO/json file to convert", metavar="FILE", required=True)
    args = parser.parse_args()
    with open(args.json_file, "r") as inputFile:  # open json file
        json_data = json.loads(inputFile.read())  # load json content
    flat_json = flatten_json(json_data)
    # normalizing flat json
    final_data = json_normalize(flat_json)
    with open(args.json_file.replace(".json", ".csv"), "w") as outputFile:  # open csv file
        # saving DataFrame to csv
        final_data.to_csv(outputFile, encoding='utf8', index=False)
What I would like to obtain is 1 line per ticket in the csv, with headings:
Name,Location_City,Location_State,Hobbies_0,Hobbies_1,Year,TeamId,PlayerId,Salary.
I would really appreciate anything that can do the trick!
Thank you!
I actually wrote a package called cherrypicker recently to deal with this exact sort of thing since I had to do it so often!
I think the following code would give you exactly what you're after:
from cherrypicker import CherryPicker
import json
import pandas as pd
with open('file.json') as file:
    data = json.load(file)
picker = CherryPicker(data)
flat = picker['tickets'].flatten().get()
df = pd.DataFrame(flat)
print(df)
This gave me the output:
Location_City Location_State Name hobbies_0 hobbies_1 playerId salary teamId year
0 Los Angeles CA Liam Piano Sports barkele01 870000 ATL 1985
1 Los Angeles CA John Music Running bedrost01 550000 ATL 1985
You can install the package with:
pip install cherrypicker
...and there's more docs and guidance at https://cherrypicker.readthedocs.io.
As you already have a function to flatten a JSON object, you just have to flatten each ticket:
...
with open(args.json_file, "r") as inputFile:  # open json file
    json_data = json.loads(inputFile.read())  # load json content
# one flattened dict per ticket; needs `import pandas as pd` at the top of the script
final_data = pd.DataFrame([flatten_json(elt) for elt in json_data['tickets']])
...
With your sample data, final_data is as expected:
Location_City Location_State Name hobbies_0 hobbies_1 playerId salary teamId year
0 Los Angeles CA Liam Piano Sports barkele01 870000 ATL 1985
1 Los Angeles CA John Music Running bedrost01 550000 ATL 1985
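The per-ticket flattening can also be sketched with the standard library alone, which may help when pandas is not available. The version below is an assumption-laden rewrite, not the poster's code: flatten_json returns a new dict instead of filling a closure variable, the sample data is inlined, and an io.StringIO buffer stands in for the output file.

```python
import csv
import io


def flatten_json(value, prefix=""):
    """Flatten nested dicts and lists into one dict with '_'-joined keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_json(child, prefix + key + "_"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_json(child, prefix + str(index) + "_"))
    else:
        flat[prefix[:-1]] = value  # drop the trailing '_'
    return flat


# The sample from the question, inlined so the sketch runs on its own.
data = {
    "tickets": [
        {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"},
         "hobbies": ["Piano", "Sports"], "year": 1985, "teamId": "ATL",
         "playerId": "barkele01", "salary": 870000},
        {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"},
         "hobbies": ["Music", "Running"], "year": 1985, "teamId": "ATL",
         "playerId": "bedrost01", "salary": 550000},
    ],
    "count": 2,
}

rows = [flatten_json(ticket) for ticket in data["tickets"]]

# Union of keys in first-seen order, so a ticket with extra hobbies still fits.
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Flattening each ticket separately, rather than the whole document, is what yields one CSV line per ticket.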
There may be a simpler solution for this, but this should work:
import json
import pandas as pd
with open('file.json') as file:
    data = json.load(file)
df = pd.DataFrame(data['tickets'])
# pull the nested Location fields out row by row
df['location_city'] = df['Location'].apply(lambda loc: loc['City'])
df['location_state'] = df['Location'].apply(lambda loc: loc['State'])
# one column per hobby position (hobbies_0, hobbies_1, ...)
hobbies = pd.DataFrame(df['hobbies'].tolist(), index=df.index)
for i in hobbies.columns:
    df['hobbies_{}'.format(i)] = hobbies[i]
df = df.drop(['Location', 'hobbies'], axis=1)
print(df)

xls to JSON using python3 xlrd

I have to directly convert an xls file to a JSON document using python3 and xlrd.
Table is here.
It's divided into three main categories (PUBLICATION, CONTENU, CONCLUSION) whose names are in column one (the first column is zero), and the number of rows per category can vary. Each row has three key values (INDICATEURS, EVALUATION, PROPOSITION) in columns 3, 5 and 7. There can be empty lines or missing values.
I have to convert that table to the following JSON data, which I have written by hand as a reference. It's valid.
{
"EVALUATION": {
"PUBLICATION": [
{
"INDICATEUR": "Page de garde",
"EVALUATION": "Inexistante ou non conforme",
"PROPOSITION D'AMELIORATION": "Consulter l'example sur CANVAS"
},
{
"INDICATEUR": "Page de garde",
"EVALUATION": "Titre du TFE non conforme",
"PROPOSITION D'AMELIORATION": "Utilisez le titre avalisé par le conseil des études"
},
{
"INDICATEUR": "Orthographe et grammaire",
"EVALUATION": "Nombreuses fautes",
"PROPOSITION D'AMELIORATION": "Faire relire le document"
},
{
"INDICATEUR": "Nombre de page",
"EVALUATION": "Nombre de pages grandement différent à la norme",
"PROPOSITION D'AMELIORATION": ""
}
],
"CONTENU": [
{
"INDICATEUR": "Développement du sujet",
"EVALUATION": "Présentation de l'entreprise",
"PROPOSITION D'AMELIORATION": ""
},
{
"INDICATEUR": "Développement du sujet",
"EVALUATION": "Plan de localisation inutile",
"PROPOSITION D'AMELIORATION": "Supprimer le plan de localisation"
},
{
"INDICATEUR": "Figures et capture d'écran",
"EVALUATION": "Captures d'écran excessives",
"PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
},
{
"INDICATEUR": "Figures et capture d'écran",
"EVALUATION": "Captures d'écran Inutiles",
"PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
},
{
"INDICATEUR": "Figures et capture d'écran",
"EVALUATION": "Captures d'écran illisibles",
"PROPOSITION D'AMELIORATION": "Pour chaque figure et capture d'écran se poser la question 'Qu'est-ce que cela apporte à mon sujet ?'"
},
{
"INDICATEUR": "Conclusion",
"EVALUATION": "Conclusion inexistante",
"PROPOSITION D'AMELIORATION": ""
},
{
"INDICATEUR": "Bibliographie",
"EVALUATION": "Inexistante",
"PROPOSITION D'AMELIORATION": ""
},
{
"INDICATEUR": "Bibliographie",
"EVALUATION": "Non normalisée",
"PROPOSITION D'AMELIORATION": "Ecrire la bibliographie selon la norme APA"
}
],
"CONCLUSION": [
{
"INDICATEUR": "",
"EVALUATION": "Grave manquement sur le plan de la présentation",
"PROPOSITION D'AMELIORATION": "Lire le document 'Conseil de publication' disponible sur CANVAS"
},
{
"INDICATEUR": "",
"EVALUATION": "Risque de refus du document par le conseil des études",
"PROPOSITION D'AMELIORATION": ""
}
]
}
}
My intention is to loop through lines, check rows[1] to identify the category, and sub-loop to add data as dictionary in a list by category.
Here is my code so far :
import xlrd
file = '/home/eh/Documents/Base de Programmation/Feedback/EvaluationEI.xls'
wb = xlrd.open_workbook(file)
sheet = wb.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
def readRows():
    for rownum in range(2,sheet.nrows):
        rows = sheet.row_values(rownum)
        indicateur = rows[3]
        evaluation = rows[5]
        amelioration = rows[7]
        publication = []
        contenu = []
        conclusion = []
        if rows[1] == "PUBLICATION":
            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                publication.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)
        if rows[1] == "CONTENU":
            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                contenu.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)
        if rows[1] == "CONCLUSION":
            if rows[3] == '' and rows[5] == '' and rows[7] == '':
                continue
            else:
                conclusion.append("INDICATEUR : " + indicateur , "EVALUATION : " + evaluation , "PROPOSITION D'AMELIORATION : " + amelioration)
    print (publication)
    print (contenu)
    print (conclusion)
readRows()
I am having a hard time figuring out how to sub-loop for the right number of rows to separate data by categories.
Any help would be welcome.
Thank you in advance
Using the json package and OrderedDict (to preserve key order), I think this gets to what you're expecting. I've modified it slightly so that we're not building a string literal but a dict containing the data, which we can then convert with json.dumps.
As Ron noted above, your previous attempt was skipping the lines where rows[1] was not equal to one of your three key values.
This should read every line, appending to the last non-empty key:
import json
from collections import OrderedDict
import xlrd
def readRows(file, s_index=0):
    """
    file: path to xls file
    s_index: sheet_index for the xls file
    returns a dict of OrderedDict of list of OrderedDict which can be parsed to JSON
    """
    d = {"EVALUATION" : OrderedDict()} # this will be the main dict for our JSON object
    wb = xlrd.open_workbook(file)
    sheet = wb.sheet_by_index(s_index)
    # getting the data from the worksheet
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
    categorie = None # no category seen yet
    # fill the dict with data:
    for _,row in enumerate(data[3:]):
        if row[1]: # if there's a value, then this is a new categorie element
            categorie = row[1]
            d["EVALUATION"][categorie] = []
        if categorie:
            i,e,a = row[3::2][:3] # columns 3, 5 and 7
            if i or e or a: # as long as there's any data in this row, we write the child element
                val = OrderedDict([("INDICATEUR", i),("EVALUATION", e),("PROPOSITION D'AMELIORATION", a)])
                d["EVALUATION"][categorie].append(val)
    return d
This returns a dict which can easily be serialized to JSON.
Write to file if needed:
import io # for python 2
d = readRows(file,0)
with io.open(r'c:\debug\output.json','w',encoding='utf8') as out: # raw string keeps the backslashes literal
    out.write(json.dumps(d,indent=2,ensure_ascii=False))
Note: in Python 3 the built-in open accepts an encoding argument, so io.open is not needed.
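The "append to the last non-empty key" idea can be tried without a spreadsheet. The sketch below runs the same grouping over hand-made rows that imitate what sheet.row_values() might return; the function name group_rows and the sample rows are assumptions, and it uses setdefault so that a category name appearing twice would extend its list rather than reset it, which differs slightly from the answer above.

```python
import json
from collections import OrderedDict


def group_rows(rows):
    """Group rows under the most recent non-empty category found in column 1."""
    grouped = OrderedDict()
    category = None
    for row in rows:
        if row[1]:
            category = row[1]
            grouped.setdefault(category, [])
        if category is None:
            continue  # data before the first category has nowhere to go
        indicateur, evaluation, amelioration = row[3], row[5], row[7]
        if indicateur or evaluation or amelioration:
            grouped[category].append(OrderedDict([
                ("INDICATEUR", indicateur),
                ("EVALUATION", evaluation),
                ("PROPOSITION D'AMELIORATION", amelioration),
            ]))
    return {"EVALUATION": grouped}


# Assumption: hand-made rows with 8 columns, imitating sheet.row_values().
rows = [
    ["", "PUBLICATION", "", "Page de garde", "", "Inexistante ou non conforme", "",
     "Consulter l'example sur CANVAS"],
    ["", "", "", "", "", "", "", ""],  # empty line, skipped
    ["", "", "", "Orthographe et grammaire", "", "Nombreuses fautes", "",
     "Faire relire le document"],
    ["", "CONTENU", "", "Conclusion", "", "Conclusion inexistante", "", ""],
]

result = group_rows(rows)
print(json.dumps(result, indent=2, ensure_ascii=False))
```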
Is pandas not an option? Would add as a comment but don't have the rep.
From the documentation:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
import pandas
df = pandas.read_excel('path_to_file.xls')
df.to_json(path_or_buf='output_path.json', orient='table')

Trying to access data in JSON data structure read from file

I have the following Python code to read a JSON file:
import json
from pprint import pprint
with open('traveladvisory.json') as json_data:
    print 'json data ',json_data
    d = json.load(json_data)
    json_data.close()
Below is a piece of the 'traveladvisory.json' file opened with this code. The variable 'd' does print out all the JSON data. But I can't seem to get the syntax correct to read all of the 'country-eng' and 'advisory-text' fields and their data and print it out. Can someone assist? Here's a piece of the json data (sorry, can't get it pretty printed):
{
"metadata":{
"generated":{
"timestamp":1475854624,
"date":"2016-10-07 11:37:04"
}
},
"data":{
"AF":{
"country-id":1000,
"country-iso":"AF",
"country-eng":"Afghanistan",
"country-fra":"Afghanistan",
"advisory-state":3,
"date-published":{
"timestamp":1473866215,
"date":"2016-09-14 11:16:55",
"asp":"2016-09-14T11:16:55.000000-04:00"
},
"has-advisory-warning":1,
"has-regional-advisory":0,
"has-content":1,
"recent-updates-type":"Editorial change",
"eng":{
"name":"Afghanistan",
"url-slug":"afghanistan",
"friendly-date":"September 14, 2016 11:16 EDT",
"advisory-text":"Avoid all travel",
"recent-updates":"The Health tab was updated - travel health notices (Public Health Agency of Canada)."
},
"fra":{
"name":"Afghanistan",
"url-slug":"afghanistan",
"friendly-date":"14 septembre 2016 11:16 HAE",
"advisory-text":"\u00c9viter tout voyage",
"recent-updates":"L'onglet Sant\u00e9 a \u00e9t\u00e9 mis \u00e0 jour - conseils de sant\u00e9 aux voyageurs (Agence de la sant\u00e9 publique du Canada)."
}
},
"AL":{
"country-id":4000,
"country-iso":"AL",
"country-eng":"Albania",
"country-fra":"Albanie",
"advisory-state":0,
"date-published":{
"timestamp":1473350931,
"date":"2016-09-08 12:08:51",
"asp":"2016-09-08T12:08:51.8301256-04:00"
},
"has-advisory-warning":0,
"has-regional-advisory":1,
"has-content":1,
"recent-updates-type":"Editorial change",
"eng":{
"name":"Albania",
"url-slug":"albania",
"friendly-date":"September 8, 2016 12:08 EDT",
"advisory-text":"Exercise normal security precautions (with regional advisories)",
"recent-updates":"An editorial change was made."
},
"fra":{
"name":"Albanie",
"url-slug":"albanie",
"friendly-date":"8 septembre 2016 12:08 HAE",
"advisory-text":"Prendre des mesures de s\u00e9curit\u00e9 normales (avec avertissements r\u00e9gionaux)",
"recent-updates":"Un changement mineur a \u00e9t\u00e9 apport\u00e9 au contenu."
}
},
"DZ":{
"country-id":5000,
"country-iso":"DZ",
"country-eng":"Algeria",
"country-fra":"Alg\u00e9rie",
"advisory-state":1,
"date-published":{
"timestamp":1475593497,
"date":"2016-10-04 11:04:57",
"asp":"2016-10-04T11:04:57.7727548-04:00"
},
"has-advisory-warning":0,
"has-regional-advisory":1,
"has-content":1,
"recent-updates-type":"Full TAA review",
"eng":{
"name":"Algeria",
"url-slug":"algeria",
"friendly-date":"October 4, 2016 11:04 EDT",
"advisory-text":"Exercise a high degree of caution (with regional advisories)",
"recent-updates":"This travel advice was thoroughly reviewed and updated."
},
"fra":{
"name":"Alg\u00e9rie",
"url-slug":"algerie",
"friendly-date":"4 octobre 2016 11:04 HAE",
"advisory-text":"Faire preuve d\u2019une grande prudence (avec avertissements r\u00e9gionaux)",
"recent-updates":"Les pr\u00e9sents Conseils aux voyageurs ont \u00e9t\u00e9 mis \u00e0 jour \u00e0 la suite d\u2019un examen minutieux."
}
},
}
}
Assuming d contains the JSON data (Python 2 syntax):
# d["data"] maps country codes to dicts, so iterate over the items
for code, country in d["data"].items():
    print "Country :",code
    # country.get() returns the value stored under the key; the second argument
    # is the value returned if the key is not present
    print "country-eng : ",country.get("country-eng",0)
    print "advisory-text(eng) :",country["eng"].get("advisory-text",0)
    print "advisory-text(fra) :",country["fra"].get("advisory-text",0)
This worked for me:
for item in d['data']:
    print d['data'][item]['country-eng'], d['data'][item]['eng']['advisory-text']
If I understand your question correctly, here's how to do it:
import json
with open('traveladvisory.json') as json_data:
    d = json.load(json_data)
# print(json.dumps(d, indent=4)) # pretty-print data read
for country in d['data']:
    print(country)
    print(' country-eng: {}'.format(d['data'][country]['country-eng']))
    print(' advisory-state: {}'.format(d['data'][country]['advisory-state']))
Output:
DZ
 country-eng: Algeria
 advisory-state: 1
AL
 country-eng: Albania
 advisory-state: 0
AF
 country-eng: Afghanistan
 advisory-state: 3
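As a small variation on the loops above, the sketch below iterates over the key/value pairs directly and uses chained .get() calls so that a country without an "eng" block does not raise a KeyError. The cut-down data, the "XX" entry, the "n/a" default and the function name advisories are all assumptions for illustration; the keys are sorted only to make the output order predictable.

```python
# Assumption: a cut-down copy of the "data" object from the question.
d = {
    "data": {
        "AF": {"country-eng": "Afghanistan", "advisory-state": 3,
               "eng": {"advisory-text": "Avoid all travel"}},
        "AL": {"country-eng": "Albania", "advisory-state": 0,
               "eng": {"advisory-text":
                       "Exercise normal security precautions (with regional advisories)"}},
        "XX": {"country-eng": "Placeholder"},  # hypothetical entry without an "eng" block
    }
}


def advisories(payload):
    """Yield (iso_code, english_name, english_advisory_text) for each country."""
    for iso_code, country in sorted(payload["data"].items()):
        text = country.get("eng", {}).get("advisory-text", "n/a")
        yield iso_code, country.get("country-eng", "n/a"), text


for iso_code, name, text in advisories(d):
    print("{}: {} -> {}".format(iso_code, name, text))
```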
The code below works for Python 2 (if you need the Python 3 version, please ask).
You have to use the load function from the json module:
#-*- coding: utf-8 -*-
import json # import the module we need
with open("traveladvisory.json") as f: # f for file
    d = json.load(f) # d is a dictionary
for key in d['data']:
    print d['data'][key]['country-eng']
    print d['data'][key]['eng']['advisory-text']
It is good practice to use the with statement when dealing with file objects.
This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way.
Also, the JSON is invalid: you have to remove the comma on line 98 (the one after the closing brace of the "DZ" entry):
{
"metadata":{
"generated":{
"timestamp":1475854624,
"date":"2016-10-07 11:37:04"
}
},
"data":{
"AF":{
"country-id":1000,
"country-iso":"AF",
"country-eng":"Afghanistan",
"country-fra":"Afghanistan",
"advisory-state":3,
"date-published":{
"timestamp":1473866215,
"date":"2016-09-14 11:16:55",
"asp":"2016-09-14T11:16:55.000000-04:00"
},
"has-advisory-warning":1,
"has-regional-advisory":0,
"has-content":1,
"recent-updates-type":"Editorial change",
"eng":{
"name":"Afghanistan",
"url-slug":"afghanistan",
"friendly-date":"September 14, 2016 11:16 EDT",
"advisory-text":"Avoid all travel",
"recent-updates":"The Health tab was updated - travel health notices (Public Health Agency of Canada)."
},
"fra":{
"name":"Afghanistan",
"url-slug":"afghanistan",
"friendly-date":"14 septembre 2016 11:16 HAE",
"advisory-text":"\u00c9viter tout voyage",
"recent-updates":"L'onglet Sant\u00e9 a \u00e9t\u00e9 mis \u00e0 jour - conseils de sant\u00e9 aux voyageurs (Agence de la sant\u00e9 publique du Canada)."
}
},
"AL":{
"country-id":4000,
"country-iso":"AL",
"country-eng":"Albania",
"country-fra":"Albanie",
"advisory-state":0,
"date-published":{
"timestamp":1473350931,
"date":"2016-09-08 12:08:51",
"asp":"2016-09-08T12:08:51.8301256-04:00"
},
"has-advisory-warning":0,
"has-regional-advisory":1,
"has-content":1,
"recent-updates-type":"Editorial change",
"eng":{
"name":"Albania",
"url-slug":"albania",
"friendly-date":"September 8, 2016 12:08 EDT",
"advisory-text":"Exercise normal security precautions (with regional advisories)",
"recent-updates":"An editorial change was made."
},
"fra":{
"name":"Albanie",
"url-slug":"albanie",
"friendly-date":"8 septembre 2016 12:08 HAE",
"advisory-text":"Prendre des mesures de s\u00e9curit\u00e9 normales (avec avertissements r\u00e9gionaux)",
"recent-updates":"Un changement mineur a \u00e9t\u00e9 apport\u00e9 au contenu."
}
},
"DZ":{
"country-id":5000,
"country-iso":"DZ",
"country-eng":"Algeria",
"country-fra":"Alg\u00e9rie",
"advisory-state":1,
"date-published":{
"timestamp":1475593497,
"date":"2016-10-04 11:04:57",
"asp":"2016-10-04T11:04:57.7727548-04:00"
},
"has-advisory-warning":0,
"has-regional-advisory":1,
"has-content":1,
"recent-updates-type":"Full TAA review",
"eng":{
"name":"Algeria",
"url-slug":"algeria",
"friendly-date":"October 4, 2016 11:04 EDT",
"advisory-text":"Exercise a high degree of caution (with regional advisories)",
"recent-updates":"This travel advice was thoroughly reviewed and updated."
},
"fra":{
"name":"Alg\u00e9rie",
"url-slug":"algerie",
"friendly-date":"4 octobre 2016 11:04 HAE",
"advisory-text":"Faire preuve d\u2019une grande prudence (avec avertissements r\u00e9gionaux)",
"recent-updates":"Les pr\u00e9sents Conseils aux voyageurs ont \u00e9t\u00e9 mis \u00e0 jour \u00e0 la suite d\u2019un examen minutieux."
}
}
}
}
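A trailing comma like the one removed above is something the parser can locate for you. The sketch below assumes Python 3.5 or later, where json.loads raises json.JSONDecodeError with lineno, colno and msg attributes; the tiny broken document and the helper name first_json_error are made up for illustration. The exact position and message vary between Python versions, so they are printed rather than relied upon.

```python
import json

# A tiny document with the same kind of mistake: a comma after the last member.
broken = """{
  "data": {
    "AF": {"country-eng": "Afghanistan"},
    "DZ": {"country-eng": "Algeria"},
  }
}"""


def first_json_error(text):
    """Return (line, column, message) for the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg
    return None


error = first_json_error(broken)
print(error)

# Removing the comma after the last member makes the document parse.
fixed = broken.replace('"Algeria"},', '"Algeria"}')
print(first_json_error(fixed))
```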

Python JSON parsing not storing values

I've looked over this many times, and can't seem to find the problem with it.
I am trying to pull 3 fields from a JSON response (engagements, shares, comments), sum them together, then print the sum.
It seems to be returning the fields correctly, but it returns zero in the final print.
I'm very new to this stuff, but would appreciate any help anyone can give me. I'm guessing there is something fundamental I am missing here.
import urllib2,time,csv,json,requests,urlparse,pdb
SEARCH_URL = urllib2.unquote("http://soyuz.elastic.tubularlabs.net:9200/intelligence/video/_search?q=channel_youtube_id:""%s""%20AND%20published:%3E20150715T000000Z%20AND%20published:%3C20150716T000000Z")
reader = csv.reader(open('input.csv', 'r+U'), delimiter=',', quoting=csv.QUOTE_NONE)
cookie = {"user": "2|1:0|10:1438908462|4:user|36:eyJhaWQiOiA1Njk3LCAiaWQiOiA2MzQ0fQ==|b5c4b3adbd96e54833bf8656625aedaf715d4905f39373b860c4b4bc98655e9e"}
# idsToProcess = []
# for row in reader:
#     if len(row)>0:
#         idsToProcess.append(row[0])
idsToProcess = ['qdGW_m8Rim4FeMM29keDEg']
for userID in idsToProcess:
    # print "fetching for %s.." % fbid
    url = SEARCH_URL % userID
    soyuzResponse = None
    response = requests.request("GET", url, cookies=cookie)
    ret = response.json()
    soyuzResponse = ret['hits']['hits'][0]['_source']['youtube']
    print soyuzResponse
    totalDelta = 0
    totalEngagementsVal = 0
    totalSharesVal = 0
    totalCommentsVal = 0
    valuesArr = []
    for entry in valuesArr:
        arrEngagements = entry['engagements']
        arrShares = entry['shares']
        arrComments = entry['comments']
        if len(arrEngagements)>0:
            totalEngagementsVal = arrEngagements
        elif len(arrShares)>0:
            totalSharesVal = arrShares
        elif len(arrComments)>0:
            totalCommentsVal = arrComments
    print "%s,%s" % (userID,totalEngagementsVal+totalSharesVal+totalCommentsVal)
    totalDelta += totalEngagementsVal+totalSharesVal+totalCommentsVal
    time.sleep(0)
    print "%s,%s" % (userID,totalDelta)
exit()
Here is the json I am parsing:
took: 371,
timed_out: false,
_shards: {
total: 128,
successful: 128,
failed: 0
},
hits: {
total: 1,
max_score: 9.335125,
hits: [
{
_index: "intelligence_v2",
_type: "video",
_id: "jW7mjVdzR_U",
_score: 9.335125,
_source: {
claim: [
"Blucollection%2Buser"
],
topics: [
{
title_non: "Toy",
topic_id: "/m/0138tl",
title: "Toy"
},
{
title_non: "Sofia the First",
topic_id: "/m/0ncq483",
title: "Sofia the First"
}
],
likes: 1045,
duration: 318,
channel_owner_type: "influencer",
category: "Entertainment",
imported: "20150809T230652Z",
title: "Princess Sofia Cash Register Toy Surprise - Play Doh Caja Registradora Disney Sofia the First",
audience_location: [
{
country: "US",
value: 100
}
],
comments: 10,
twitter: {
tweets: 6,
engagements: 6
},
description: "Disney Princess "Sofia Cash Register" toy unboxing review by DisneyCollector. This is the authentic Royal toy of Sofia fit for a little Princess of Enchantia. Young Girls learn early on how mathematics is important in our lives, and learn to do math, developing creativity with a super fun game! Thx 4 watching this "Disney Princess Sofia cash register" unboxing review. In this video i also used Disney Frozen Princess Anna, Nickelodeon Peppa Pig blind bag and plastilina Play-Doh. Revisión del juguete Princesita Sofía Caja Registradora Real para niños y niñas. Las niñas aprenden desde muy temprano cómo las matemáticas es importante en nuestras vidas, y aprenden a hacer matemáticas, el desarrollo de la creatividad con un juego súper divertido! Here's how to say Princess in other languages: printzesa, 公主, prinses, prenses, printsess, princesse, Prinzessin, puteri, banphrionsa, Principesse, principessa, プリンセス, princese, puteri, prinsessa,prinsesse, princesa, công chúa, tywysoges, Princesses Disney, Prinzessinen, 공주, Princesas Disney, Disney πριγκίπισσες, Дисней принцесс, 디즈니 공주, ディズニーのお姫様, Vorstin, koningsdochter, Fürstin, πριγκίπισσα, księżniczka, królewna, принцесса. Here's how register is called in other languages: Caja Registradora de Princesa Sofía, Caisse Enregistreuse Princesse Sofia, Kassa, Registrierkasse Sofia die Erste Auf einmal Prinzessin, Registratore di Cassa di La Principessa Sofia, Caixa Registadora da Princesa Sofia, ηλεκτρονική ταμειακή μηχανή Σοφία η Πριγκίπισσα, 電子式金銭登録機 ちいさなプリンセス ソフィア, София Прекрасная кассовый аппарат, 디즈니주니어 리틀 프린세스 소피아 전자 금전 등록기, máy tính tiền điện tử, daftar uang elektronik, elektronik yazarkasa, Sofia den första kassaapparat leksak, Jej Wysokość Zosia kasa zabawki, Sofia het prinsesje kassa speelgoed, София Първа касов апарат играчка, casa de marcat jucărie Sofia Întâi. Princess Sofia SLEEPOVER Slumber Party - Princesita Sofía Pijamada Real. 
https://www.youtube.com/watch?v=WSa-Tp7HfyQ Princesita Sofía Castillo Mágico Parlante juguete de niñas. https://www.youtube.com/watch?v=ALQm_3uhIyg Sofia the First Magical Talking Castle Royal Prep Academy. https://www.youtube.com/watch?v=gcUiY0Suzrc Play-Doh Meal Makin' Kitchen with Princess Sofia the First. https://www.youtube.com/watch?v=x_-OxnRXj6g Sofia the First Royal Prep Academy Dolls Character Collection. https://www.youtube.com/watch?v=_kNY6AkSp9g Peppa Pig Picnic Adventure Car With Princess Sofia the First. https://www.youtube.com/watch?v=KIPH3txlq1o Watch "Sofia the First Talking Magic Castle" talking Clover: https://www.youtube.com/watch?v=ALQm_3uhIyg Play-Doh Sofia the First Magic Talking Castle w/ Peppa Pig: https://www.youtube.com/watch?v=-slXqMiDrY0 Play-Doh Sofia the First Going to School Portable Classroom http://www.youtube.com/watch?v=0R-dkVAIUlA",
views: 941726,
channel_network: null,
channel_subscribers: 5054024,
youtube_id: "jW7mjVdzR_U",
facebook: {
engagements: 9,
likes: 2,
shares: 7,
comments: 0
},
location_demo_count: 1,
is_public: true,
engagements: 1070,
channel_country: "US",
demo_count: null,
monetizable: true,
youtube: {
engagements: 1055,
likes: 1045,
comments: 10
},
published: "20150715T100003Z",
channel_youtube_id: "qdGW_m8Rim4FeMM29keDEg"
}
}
]
}
}
Response from terminal after running script:
{u'engagements': 1055, u'likes': 1045, u'comments': 10}
qdGW_m8Rim4FeMM29keDEg,0
qdGW_m8Rim4FeMM29keDEg,0
Your problem is these two lines:
valuesArr = []
for entry in valuesArr:
Because valuesArr is empty, the for loop never iterates, and that's where your totals are being summed.
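One possible way to get a non-zero total, sketched under the assumption that the goal is simply to add up the three counters in the youtube dict that the script already prints: read each field with .get() and a default of 0, since "shares" is absent from that dict. The values are plain integers, so the len() checks from the question would not apply to them. The function name total_interactions is hypothetical.

```python
# The dict printed by the script in the question.
youtube_stats = {"engagements": 1055, "likes": 1045, "comments": 10}


def total_interactions(stats, fields=("engagements", "shares", "comments")):
    """Sum the requested counters, counting a missing field as 0."""
    return sum(stats.get(field, 0) for field in fields)


total = total_interactions(youtube_stats)
print(total)  # 1065 for the sample above ("shares" is absent)
```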
