Convert JSON to XML - JSON with array values - Python

I know there are a lot of questions similar to mine, but none of them worked for me.
My JSON file has arrays for actors, directors and genre. I'm having difficulty dealing with these arrays while building the XML.
This is the JSON file:
[
{
"title":"The Kissing Booth",
"year":"2018",
"actors":[
"Megan du Plessis",
"Lincoln Pearson",
"Caitlyn de Abrue",
"Jack Fokkens",
"Stephen Jennings",
"Chloe Williams",
"Michael Miccoli",
"Juliet Blacher",
"Jesse Rowan-Goldberg",
"Chase Dallas",
"Joey King",
"Joel Courtney",
"Jacob Elordi",
"Carson White",
"Hilton Pelser"
],
"genre":[
"Comedy",
"Romance"
],
"description":"A high school student is forced to confront her secret crush at a kissing booth.",
"directors":[
"Vince Marcello"
]
},
{
"title":"Dune",
"year":"2020",
"actors":[
"Rebecca Ferguson",
"Zendaya",
"Jason Momoa",
"Timoth\u00e9e Chalamet",
"Dave Bautista",
"Josh Brolin",
"Oscar Isaac",
"Stellan Skarsg\u00e5rd",
"Javier Bardem",
"Charlotte Rampling",
"David Dastmalchian",
"Stephen McKinley Henderson",
"Sharon Duncan-Brewster",
"Chen Chang",
"Babs Olusanmokun"
],
"genre":[
"Adventure",
"Drama",
"Sci-Fi"
],
"description":"Feature adaptation of Frank Herbert's science fiction novel, about the son of a noble family entrusted with the protection of the most valuable asset and most vital element in the galaxy.",
"directors":[
"Denis Villeneuve"
]
},
{
"title":"Parasite",
"year":"2019",
"actors":[
"Kang-ho Song",
"Sun-kyun Lee",
"Yeo-jeong Jo",
"Woo-sik Choi",
"So-dam Park",
"Jeong-eun Lee",
"Hye-jin Jang",
"Myeong-hoon Park",
"Ji-so Jung",
"Hyun-jun Jung",
"Keun-rok Park",
"Jeong Esuz",
"Jo Jae-Myeong",
"Ik-han Jung",
"Kim Gyu Baek"
],
"genre":[
"Comedy",
"Drama",
"Thriller"
],
"description":"Greed and class discrimination threaten the newly formed symbiotic relationship between the wealthy Park family and the destitute Kim clan.",
"directors":[
"Bong Joon Ho"
]
},
{
"title":"Money Heist",
"year":null,
"actors":[
"\u00darsula Corber\u00f3",
"\u00c1lvaro Morte",
"Itziar Itu\u00f1o",
"Pedro Alonso",
"Miguel Herr\u00e1n",
"Jaime Lorente",
"Esther Acebo",
"Enrique Arce",
"Darko Peric",
"Alba Flores",
"Fernando Soto",
"Mario de la Rosa",
"Juan Fern\u00e1ndez",
"Rocco Narva",
"Paco Tous",
"Kiti M\u00e1nver",
"Hovik Keuchkerian",
"Rodrigo De la Serna",
"Najwa Nimri",
"Luka Peros",
"Roberto Garcia",
"Mar\u00eda Pedraza",
"Fernando Cayo",
"Antonio Cuellar Rodriguez",
"Anna Gras",
"Aitana Rinab Perez",
"Olalla Hern\u00e1ndez",
"Carlos Su\u00e1rez",
"Mari Carmen S\u00e1nchez",
"Antonio Romero",
"Pep Munn\u00e9"
],
"genre":[
"Action",
"Crime",
"Mystery",
"Thriller"
],
"description":"An unusual group of robbers attempt to carry out the most perfect robbery in Spanish history - stealing 2.4 billion euros from the Royal Mint of Spain."
},
{
"title":"The Vampire Diaries",
"year":null,
"actors":[
"Paul Wesley",
"Ian Somerhalder",
"Kat Graham",
"Candice King",
"Zach Roerig",
"Michael Trevino",
"Nina Dobrev",
"Steven R. McQueen",
"Matthew Davis",
"Michael Malarkey"
],
"genre":[
"Drama",
"Fantasy",
"Horror",
"Mystery",
"Romance",
"Thriller"
],
"description":"The lives, loves, dangers and disasters in the town, Mystic Falls, Virginia. Creatures of unspeakable horror lurk beneath this town as a teenage girl is suddenly torn between two vampire brothers."
}
]
I want to convert my JSON file to XML, and this is my Python code:
import json as j
import xml.etree.ElementTree as ET

with open("imdb_movie_sample.json") as json_format_file:
    data = j.load(json_format_file)

root = ET.Element("movie")
ET.SubElement(root, "title").text = data["title"]
ET.SubElement(root, "year").text = str(data["year"])

actors = ET.SubElement(root, "actors")  # .text = data["actors"]
actors.text = ''
for i in jsondata[0]['movie'][0]['actors']:
    actors.text = actors.text + '\n\t\t' + i

genre = ET.SubElement(root, "genre")  # .text = data["genre"]
genre.text = ''
for i in jsondata[0]['movie'][0]['genre']:
    genre.text = genre.text + '\n\t\t' + i

ET.SubElement(root, "description").text = data["description"]

directors = ET.SubElement(root, "directors")  # .text = data["directors"]
directors.text = ''
for i in jsondata[0]['movie'][0]['directors']:
    directors.text = directors.text + '\n\t\t' + i

tree = ET.ElementTree(root)
tree.write("imdb_sample.xml")
Does anyone know how to help me do this? Thanks.
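For reference, here is a minimal ElementTree sketch of one way to handle the arrays. The JSON above is a list of movies, so the code loops over that list and creates one child element per actor, genre and director instead of indexing the top level directly. The root element "movies" and the child names "movie", "actor", "name" and "director" are my own choices, not something from the original post.

import json
import xml.etree.ElementTree as ET

with open("imdb_movie_sample.json") as f:
    data = json.load(f)

root = ET.Element("movies")
for entry in data:
    movie = ET.SubElement(root, "movie")
    ET.SubElement(movie, "title").text = entry["title"]
    # year is null for the series entries, so str() keeps it writable
    ET.SubElement(movie, "year").text = str(entry["year"])
    # One child element per array item instead of a single text blob
    actors = ET.SubElement(movie, "actors")
    for name in entry.get("actors", []):
        ET.SubElement(actors, "actor").text = name
    genres = ET.SubElement(movie, "genre")
    for g in entry.get("genre", []):
        ET.SubElement(genres, "name").text = g
    ET.SubElement(movie, "description").text = entry["description"]
    # "directors" is missing for some entries (e.g. Money Heist), so use .get()
    directors = ET.SubElement(movie, "directors")
    for d in entry.get("directors", []):
        ET.SubElement(directors, "director").text = d

ET.ElementTree(root).write("imdb_sample.xml", encoding="utf-8")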

I found this on PyPI. I would always try looking on PyPI to see what exists before asking others. It's an awesome resource with Python packages created by tons of developers.
https://pypi.org/project/json2xml/
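Usage is roughly like this. This is only a sketch based on the package's documented example; the readfromjson helper and the Json2xml class are taken from that documentation, not tested here, so check the project page for the current API.

from json2xml import json2xml
from json2xml.utils import readfromjson

# readfromjson loads the JSON file into a Python object
data = readfromjson("imdb_movie_sample.json")
print(json2xml.Json2xml(data).to_xml())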


Trying to sum Python list [closed]

This is my first post here; I am learning and practicing Python.
The problem is that anything I try to run after the for loop does not run, so at the end I can't get a total count. Maybe I should have created a function, but now I need to know why this is happening. What have I done wrong?
lst = ["32.225.012", "US", "574.280", "17.997.267", "India", "201.187", "14.521.289", "Brazil", "398.185", "5.626.942", "France", "104.077", "4.751.026", "Turkey", "39.398", "4.732.981", "Russia", "107.547", "4.427.433", "United Kingdom", "127.734", "3.994.894", "Italy", "120.256", "3.504.799", "Spain", "77.943", "3.351.014", "Germany", "82.395", "2.905.172", "Argentina", "62.599", "2.824.626", "Colombia", "72.725", "2.776.927", "Poland", "66.533", "2.459.906", "Iran", "70.966", "2.333.126", "Mexico", "215.547", "2.102.130", "Ukraine", "45.211", "1.775.062", "Peru", "60.416", "1.657.035", "Indonesia", "45.116", "1.626.033", "Czechia", "29.141", "1.578.450", "South Africa", "54.285", "1.506.455", "Netherlands", "17.339", "1.210.077", "Canada", "24.105", "1.184.271", "Chile", "26.073", "1.051.868", "Iraq", "15.392", "1.051.779", "Romania", "27.833", "1.020.495", "Philippines", "17.031", "979.034", "Belgium", "24.104", "960.520", "Sweden", "14.000", "838.323", "Israel", "6.361", "835.563", "Portugal", "16.973", "810.231", "Pakistan", "17.530", "774.399", "Hungary", "27.172", "754.614", "Bangladesh", "11.305", "708.265", "Jordan", "8.754", "685.937", "Serbia", "6.312", "656.077", "Switzerland", "10.617", "614.510", "Austria", "10.152", "580.666", "Japan", "10.052", "524.241", "Lebanon", "7.224", "516.301", "United Arab Emirates", "1.580", "510.465", "Morocco", "9.015", "415.281", "Saudi Arabia", "6.935", "402.491", "Bulgaria", "16.278", "401.593", "Malaysia", "1.477", "381.180", "Slovakia", "11.611", "377.662", "Ecuador", "18.470", "366.709", "Kazakhstan", "3.326", "363.533", "Panama", "6.216", "355.924", "Belarus", "2.522", "340.493", "Greece", "10.242", "327.737", "Croatia", "7.001", "316.521", "Azerbaijan", "4.461", "312.699", "Nepal", "3.211","307.401", "Georgia", "4.077", "305.313", "Tunisia", "10.563", "300.258", "Bolivia", "12.885", "294.550", "West Bank and Gaza", "3.206", "271.814", "Paraguay", "6.094", "271.145", "Kuwait", "1.546", "265.819", "Dominican Republic", "3.467", "255.288", "Ethiopia", "3.639", "250.479", "Denmark", "2.482", "250.138", "Moldova", "5.780", "247.857", "Ireland", "4.896", "244.555", "Lithuania", "3.900", "243.167", "Costa Rica", "3.186", "238.421", "Slovenia", "4.236", "224.621", "Guatemala", "7.478", "224.517", "Egypt", "13.168", "214.872", "Armenia", "4.071", "208.356", "Honduras", "5.212", "204.289", "Qatar", "445","197.378", "Bosnia and Herzegovina", "8.464", "193.721", "Venezuela", "2.082", "192.326", "Oman", "2.001","190.096", "Uruguay", "2.452", "176.701", "Libya", "3.019","174.659", "Bahrain", "632","164.912", "Nigeria", "2.063", "158.326", "Kenya", "2.688","151.569", "North Macedonia", "4.772", "142.790", "Burma", "3.209","130.859", "Albania", "2.386", "121.580", "Algeria", "3.234", "121.232", "Estonia", "1.148", "120.673", "Korea. 
South", "1.821", "117.099", "Latvia", "2.118", "111.915", "Norway", "753","104.953", "Sri Lanka", "661", "104.512", "Cuba", "614","103.638", "Kosovo", "2.134", "102.426", "China", "4.845","97.080", "Montenegro", "1.485", "94.599", "Kyrgyzstan", "1.592", "92.513", "Ghana", "779","91.484", "Zambia", "1.249","90.008", "Uzbekistan", "646", "86.405", "Finland", "908","69.804", "Mozambique", "814", "68.922", "El Salvador", "2.117", "66.826", "Luxembourg", "792", "65.998", "Cameroon", "991","63.720", "Cyprus", "303","61.699", "Thailand", "178","61.086", "Singapore", "30","59.370", "Afghanistan", "2.611", "48.177", "Namibia", "638","46.600", "Botswana", "702","45.885", "Cote d'Ivoire", "285", "45.292", "Jamaica", "770","41.766", "Uganda", "341","40.249", "Senegal", "1.107", "38.191", "Zimbabwe", "1.565", "36.510", "Madagascar", "631", "34.052", "Malawi", "1.147","33.944", "Sudan", "2.349","33.608", "Mongolia", "97","30.249", "Malta", "413","29.768", "Congo Kinshasa", "763", "29.749", "Australia", "910", "29.052", "Maldives", "72","25.942", "Angola", "587","24.888", "Rwanda", "332","23.181", "Cabo Verde", "213", "22.568", "Gabon", "138","22.513", "Syria", "1.572","22.087", "Guinea", "141","18.452", "Eswatini", "671","18.314", "Mauritania", "455", "13.915", "Somalia", "713","13.780", "Mali", "477","13.308", "Tajikistan", "90", "13.286", "Burkina Faso", "157", "13.148", "Andorra", "125","13.017", "Haiti", "254","12.963", "Guyana", "293","12.898", "Togo", "122","12.631", "Belize", "322","11.761", "Cambodia", "88","10.986", "Djibouti", "142","10.915", "Papua New Guinea", "107", "10.730", "Lesotho", "316","10.678", "Congo Brazzaville", "144", "10.553", "South Sudan", "114", "10.220", "Bahamas", "198","10.170", "Trinidad and Tobago", "163", "10.157", "Suriname", "201","7.821", "Benin", "99","7.559", "Equatorial Guinea", "107", "6.898", "Nicaragua", "182","6.456", "Iceland", "29","6.359", "Central African Republic", "87", "6.220", "Yemen", "1.207","5.882", "Gambia", "174","5.354", "Seychelles", "26","5.220", "Niger", "191","5.059", "San Marino", "90","4.789", "Chad", "170","4.508", "Saint Lucia", "74", "4.049", "Sierra Leone", "79", "3.941", "Burundi", "6","3.833", "Comoros", "146","3.831", "Barbados", "44","3.731", "Guinea-Bissau", "67", "3.659", "Eritrea", "10","2.908", "Liechtenstein", "57", "2.865", "Vietnam", "35","2.610", "New Zealand", "26", "2.447", "Monaco", "32","2.301", "Sao Tome and Principe", "35", "2.124", "Timor-Leste", "3","2.099", "Liberia", "85","1.850", "Saint Vincent and the Grenadines", "11", "1.232", "Antigua and Barbuda", "32", "1.207", "Mauritius", "17","1.116", "Taiwan", "12","1.059", "Bhutan", "1","712", "Diamond Princess", "13", "604", "Laos", "0","509", "Tanzania", "21","224", "Brunei", "3","173", "Dominica", "0","159", "Grenada", "1","111", "Fiji", "2","44", "Saint Kitts and Nevis", "0", "27", "Holy See", "0","20", "Solomon Islands", "0", "9", "MS Zaandam", "2","4", "Marshall Islands", "0", "4", "Vanuatu", "1","3", "Samoa", "0","1", "Micronesia", "0"]
countryIndex = 1
casesIndex = 0
deathsIndex = 2
countries = []
cases = []
deaths = []
for item in lst:
    print(f"Country: {lst[countryIndex]}")
    print(f"Cases: {lst[casesIndex]}")
    print(f"Deaths: {lst[deathsIndex]}")
    print("")
    countryToAppend = lst[countryIndex]
    casesToAppend = lst[casesIndex]
    deathsToAppend = lst[deathsIndex]
    countries.append(countryToAppend)
    cases.append(casesToAppend)
    deaths.append(deathsToAppend)
    countryIndex += 3
    casesIndex += 3
    deathsIndex += 3
total = sum(deaths)
print(f"Total deaths: {total}")
On top of the suggestion to rename the data set so it doesn't shadow the built-in name list, my recommendation would be to leverage the ability of the built-in range to skip, as in this example:
# Lists to store data
countries = []
total_cases = []
total_deaths = []
# Iterate over the range of the data, skipping 3 at a time: 0, 3, ...
for x in range(0, len(data), 3):
    # Parse the cases and deaths into ints
    cases = int(data[x].replace('.', ''))
    deaths = int(data[x+2].replace('.', ''))
    # We can just extract the country label
    country_label = data[x+1]
    countries.append(country_label)
    total_cases.append(cases)
    total_deaths.append(deaths)
# Get the desired sums
sum_cases = sum(total_cases)
sum_deaths = sum(total_deaths)
print(f"The total cases: {sum_cases}")
print(f"The total deaths: {sum_deaths}")
Above I renamed your dataset to be data and was able to sum up each list.
sum = 0
for i in range(2, len(l), 3):  # l is your list of data
    sum = sum + int(l[i].replace('.', ''))  # here I removed the point between numbers, ex: 574.280 --> 574280
print(sum)
# output: 3145239
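A slightly more compact variant of the same idea uses slice notation to grab every third element. This is just a sketch; it assumes the layout of the list from the question, where cases sit at positions 0, 3, 6, ... and deaths at positions 2, 5, 8, ...

data = lst  # the list from the question
total_cases = sum(int(c.replace('.', '')) for c in data[0::3])
total_deaths = sum(int(d.replace('.', '')) for d in data[2::3])
print(f"Total cases: {total_cases}")
print(f"Total deaths: {total_deaths}")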

Iterating through JSON Array in Python Match Array of Values

I'm working with the Songkick API (found here) and am very close to finishing a program that pulls info about a few specific artists' upcoming shows. I have defined an array of metro_areas that I want to track, and I want to output ONLY shows within THESE cities and their accompanying IDs. Other info I'm pulling is the date of the show, the artist name and the venue name. Right now my program pulls every show for the list of artist_ids I've defined, iterating through the IDs in the request URL for the range of dates in my parameters. My current output looks something like this:
['Date', 'Artist Name', 'Venue Name', 'City', 'metroArea ID']
['FEB - 7', 'Rosalia', 'NOTO', 'Philadelphia, PA, US', 5202]
['FEB - 8', 'Rosalia', 'Audio', 'San Francisco, CA, US', 26330]
['FEB - 8', 'Kid Cudi', 'Shady Park', 'Tempe, AZ, US', 23068]
['FEB - 8', 'Kid Cudi', 'Madison Square Garden', 'New York City, NY, US', 7644]
I want the output to be like this:
['FEB - 8', 'Rosalia', 'Audio', 'San Francisco, CA, US', 26330]
['FEB - 8', 'Kid Cudi', 'Madison Square Garden', 'New York City, NY, US', 7644]
I want to filter based on this array, which I've defined at the beginning of my program, by matching it against the metro areas in the Songkick response:
metro_areas = [
    ('Los Angeles', '17835'),
    ('San Francisco', '26330'),
    ('New York City', '7644'),
    ('Seattle', '2846'),
    ('Nashville', '11104')
]
Here is the json object array that I pull for each artist_id:
{
"resultsPage": {
"results": {
"event": [
{
"id":11129128,
"type":"Concert",
"uri":"http://www.songkick.com/concerts/11129128-wild-flag-at-fillmore?utm_source=PARTNER_ID&utm_medium=partner",
"displayName":"Wild Flag at The Fillmore (April 18, 2012)",
"start": {
"time":"20:00:00",
"date":"2012-04-18",
"datetime":"2012-04-18T20:00:00-0800"
},
"performance": [
{
"artist": {
"id":29835,
"uri":"http://www.songkick.com/artists/29835-wild-flag?utm_source=PARTNER_ID&utm_medium=partner",
"displayName":"Wild Flag",
"identifier": []
},
"id":21579303,
"displayName":"Wild Flag",
"billingIndex":1,
"billing":"headline"
}
],
"location": {
"city":"San Francisco, CA, US",
"lng":-122.4332937,
"lat":37.7842398
},
"venue": {
"id":6239,
"displayName":"The Fillmore",
"uri":"http://www.songkick.com/venues/6239-fillmore?utm_source=PARTNER_ID&utm_medium=partner",
"lng":-122.4332937,
"lat":37.7842398,
"metroArea": {
"id":26330,
"uri":"http://www.songkick.com/metro-areas/26330-us-sf-bay-area?utm_source=PARTNER_ID&utm_medium=partner",
"displayName":"SF Bay Area",
"country": { "displayName":"US" },
"state": { "displayName":"CA" }
}
},
"status":"ok",
"popularity":0.012763
}, ....
]
},
"totalEntries":24,
"perPage":50,
"page":1,
"status":"ok"
}
}
Here is some more code showing how I get from the JSON in the Songkick responses to my output:
metro_areas = [
    ('Los Angeles', '17835'),
    ('San Francisco', '26330'),
    ('New York City', '7644'),
    ('Seattle', '2846'),
    ('Nashville', '11104')
]
# artists we want to track
artist_ids = [
    ('Rosalia', '4610868'), ('EARTHGANG', '5720759'), ('Kid Cudi', '8630279'), ('Kanye West', '5566863'),
    ('Ludacris', '398291'), ('Hayley Williams', '10087966')
]
# Fetch existing events in each metro area
for artist_id in artist_ids:
    params = {
        'apikey': 'API_KEY',
        'min_date': '2020-02-01',
        'max_date': '2020-02-08',
        # 'type': 'Concert'
    }
    r = requests.get('https://api.songkick.com/api/3.0/artists/' + artist_id[1] + '/calendar.json', params=params)
    response = r.json()
    shows = response['resultsPage']['results']
    for show in shows:
        try:
            shows = shows['event']
            formatted_shows = [{
                'artistID': [perf['artist']['id'] for perf in s['performance']],
                'date': s['start']['date'],
                'name': [perf['artist']['displayName'] for perf in s['performance']],
                'metroArea': s['venue']['metroArea']['id'],
                'city': s['location']['city'],
                'venue': s['venue']['displayName']
            }
                for s in shows if len(s['performance']) > 0
            ]
            for sub in formatted_shows:
                if sub['artistID'] == artist_id[1]:
                    sub['name'] = artist_id[0]
                new_show = artist_id[1]
                new_show_name = artist_id[0]
                new_date = sub['date']
                new_date_time = new_date = datetime.strptime(new_date, '%Y-%m-%d')
                date_time_fin = new_date_time.strftime('%b - %-d').upper()
                formatted_show_final = [date_time_fin, new_show_name, sub['venue'], sub['city'], sub['metroArea']]
                print(formatted_show_final)
Long story short, I need a way to output only the shows whose 'metroArea': s['venue']['metroArea']['id'] matches one of my listed metro_areas IDs (LA, SF, NYC, Seattle, Nashville) on each request iteration.
If I understood the question correctly, add this inside the second for loop: if sub['metroArea'] in [area[1] for area in metro_areas]:
for show in shows:
    try:
        shows = shows['event']
        formatted_shows = [{
            'artistID': [perf['artist']['id'] for perf in s['performance']],
            'date': s['start']['date'],
            'name': [perf['artist']['displayName'] for perf in s['performance']],
            'metroArea': s['venue']['metroArea']['id'],
            'city': s['location']['city'],
            'venue': s['venue']['displayName']
        }
            for s in shows if len(s['performance']) > 0
        ]
        for sub in formatted_shows:
            # Modified here: apply str() to turn sub['metroArea'] (an int) into a string for the comparison
            if str(sub['metroArea']) in [area[1] for area in metro_areas]:
                if sub['artistID'] == artist_id[1]:
                    sub['name'] = artist_id[0]
                new_show = artist_id[1]
                new_show_name = artist_id[0]
                new_date = sub['date']
                new_date_time = new_date = datetime.strptime(new_date, '%Y-%m-%d')
                date_time_fin = new_date_time.strftime('%b - %-d').upper()
                formatted_show_final = [date_time_fin, new_show_name, sub['venue'], sub['city'], sub['metroArea']]
                print(formatted_show_final)
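A small refinement, purely as a sketch that plugs into the loop above: build a set of the wanted metro-area IDs once (as integers, since the API returns metroArea ids as numbers) so the membership test does not rebuild a list on every iteration. The name wanted_metro_ids is my own; metro_areas and formatted_shows are the variables from the code above.

# one-time lookup set of the metro-area IDs we care about
wanted_metro_ids = {int(area_id) for _, area_id in metro_areas}

for sub in formatted_shows:
    if sub['metroArea'] in wanted_metro_ids:
        # ... same formatting/printing logic as above ...
        print([sub['date'], sub['name'], sub['venue'], sub['city'], sub['metroArea']])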

How to extract data from asn1 data file and load it into a dataframe?

My ultimate goal is to load metadata received from PubMed into a PySpark dataframe.
So far, I have managed to download the data I want from the PubMed database using a shell script.
The downloaded data is in ASN.1 format. Here is an example of a data entry:
Pubmed-entry ::= {
pmid 31782536,
medent {
em std {
year 2019,
month 11,
day 30,
hour 6,
minute 0
},
cit {
title {
name "Impact of CYP2C19 genotype and drug interactions on voriconazole
plasma concentrations: a spain pharmacogenetic-pharmacokinetic prospective
multicenter study."
},
authors {
names std {
{
name ml "Blanco Dorado S",
affil str "Pharmacy Department, University Clinical Hospital
Santiago de Compostela (CHUS). Santiago de Compostela, Spain.; Clinical
Pharmacology Group, University Clinical Hospital, Health Research Institute
of Santiago de Compostela (IDIS). Santiago de Compostela, Spain.; Department
of Pharmacology, Pharmacy and Pharmaceutical Technology, Faculty of Pharmacy,
University of Santiago de Compostela (USC). Santiago de Compostela, Spain."
},
{
name ml "Maronas O",
affil str "Genomic Medicine Group, Centro Nacional de Genotipado
(CEGEN-PRB3), CIBERER, CIMUS, University of Santiago de Compostela (USC),
Santiago de Compostela, Spain."
},
{
name ml "Latorre-Pellicer A",
affil str "Genomic Medicine Group, Centro Nacional de Genotipado
(CEGEN-PRB3), CIBERER, CIMUS, University of Santiago de Compostela (USC),
Santiago de Compostela, Spain."
},
{
name ml "Rodriguez Jato T",
affil str "Pharmacy Department, University Clinical Hospital
Santiago de Compostela (CHUS). Santiago de Compostela, Spain."
},
{
name ml "Lopez-Vizcaino A",
affil str "Pharmacy Department, University Hospital Lucus Augusti
(HULA). Lugo, Spain."
},
{
name ml "Gomez Marquez A",
affil str "Pharmacy Department, University Hospital Ourense
(CHUO). Ourense, Spain."
},
{
name ml "Bardan Garcia B",
affil str "Pharmacy Department, University Hospital Ferrol (CHUF).
A Coruna, Spain."
},
{
name ml "Belles Medall D",
affil str "Pharmacy Department, General University Hospital
Castellon (GVA). Castellon, Spain."
},
{
name ml "Barbeito Castineiras G",
affil str "Microbiology Department, University Clinical Hospital
Santiago de Compostela (CHUS). Santiago de Compostela, Spain."
},
{
name ml "Perez Del Molino Bernal ML",
affil str "Microbiology Department, University Clinical Hospital
Santiago de Compostela (CHUS). Santiago de Compostela, Spain."
},
{
name ml "Campos-Toimil M",
affil str "Department of Pharmacology, Pharmacy and Pharmaceutical
Technology, Faculty of Pharmacy, University of Santiago de Compostela (USC).
Santiago de Compostela, Spain."
},
{
name ml "Otero Espinar F",
affil str "Department of Pharmacology, Pharmacy and Pharmaceutical
Technology, Faculty of Pharmacy, University of Santiago de Compostela (USC).
Santiago de Compostela, Spain."
},
{
name ml "Blanco Hortas A",
affil str "Epidemiology Unit. Fundacion Instituto de Investigacion
Sanitaria de Santiago de Compostela (FIDIS), University Hospital Lucus
Augusti (HULA), Spain."
},
{
name ml "Duran Pineiro G",
affil str "Clinical Pharmacology Group, University Clinical
Hospital, Health Research Institute of Santiago de Compostela (IDIS).
Santiago de Compostela, Spain."
},
{
name ml "Zarra Ferro I",
affil str "Pharmacy Department, University Clinical Hospital
Santiago de Compostela (CHUS). Santiago de Compostela, Spain.; Clinical
Pharmacology Group, University Clinical Hospital, Health Research Institute
of Santiago de Compostela (IDIS). Santiago de Compostela, Spain."
},
{
name ml "Carracedo A",
affil str "Genomic Medicine Group, Centro Nacional de Genotipado
(CEGEN-PRB3), CIBERER, CIMUS, University of Santiago de Compostela (USC),
Santiago de Compostela, Spain.; Galician Foundation of Genomic Medicine,
Health Research Institute of Santiago de Compostela (IDIS), SERGAS, Santiago
de Compostela, Spain."
},
{
name ml "Lamas MJ",
affil str "Clinical Pharmacology Group, University Clinical
Hospital, Health Research Institute of Santiago de Compostela (IDIS).
Santiago de Compostela, Spain."
},
{
name ml "Fernandez-Ferreiro A",
affil str "Pharmacy Department, University Clinical Hospital
Santiago de Compostela (CHUS). Santiago de Compostela, Spain.; Clinical
Pharmacology Group, University Clinical Hospital, Health Research Institute
of Santiago de Compostela (IDIS). Santiago de Compostela, Spain.; Department
of Pharmacology, Pharmacy and Pharmaceutical Technology, Faculty of Pharmacy,
University of Santiago de Compostela (USC). Santiago de Compostela, Spain."
}
}
},
from journal {
title {
iso-jta "Pharmacotherapy",
ml-jta "Pharmacotherapy",
issn "1875-9114",
name "Pharmacotherapy"
},
imp {
date std {
year 2019,
month 11,
day 29
},
language "eng",
pubstatus aheadofprint,
history {
{
pubstatus other,
date std {
year 2019,
month 11,
day 30,
hour 6,
minute 0
}
},
{
pubstatus pubmed,
date std {
year 2019,
month 11,
day 30,
hour 6,
minute 0
}
},
{
pubstatus medline,
date std {
year 2019,
month 11,
day 30,
hour 6,
minute 0
}
}
}
}
},
ids {
pubmed 31782536,
doi "10.1002/phar.2351",
other {
db "ELocationID doi",
tag str "10.1002/phar.2351"
}
}
},
abstract "BACKGROUND: Voriconazole, a first-line agent for the treatment
of invasive fungal infections, is mainly metabolized by cytochrome P450 (CYP)
2C19. A significant portion of patients fail to achieve therapeutic
voriconazole trough concentrations, with a consequently increased risk of
therapeutic failure. OBJECTIVE: To show the association between
subtherapeutic voriconazole concentrations and factors affecting voriconazole
pharmacokinetics: CYP2C19 genotype and drug-drug interactions. METHODS:
Adults receiving voriconazole for antifungal treatment or prophylaxis were
included in a multicenter prospective study conducted in Spain. The
prevalence of subtherapeutic voriconazole troughs were analyzed in the rapid
metabolizer and ultra-rapid metabolizer patients (RMs and UMs, respectively),
and compared with the rest of the patients. The relationship between
voriconazole concentration, CYP2C19 phenotype, adverse events (AEs), and
drug-drug interactions was also assessed. RESULTS: In this study 78 patients
were included with a wide variability in voriconazole plasma levels with only
44.8% of patients attaining trough concentrations within the therapeutic
range of 1 and 5.5 microg/ml. The allele frequency of *17 variant was found
to be 29.5%. Compared with patients with other phenotypes, RMs and UMs had a
lower voriconazole plasma concentration (RM/UM: 1.85+/-0.24 microg/ml versus
other phenotypes: 2.36+/-0.26 microg/ml, ). Adverse events were more common
in patients with higher voriconazole concentrations (p<0.05). No association
between voriconazole trough concentration and other factors (age, weight,
route of administration, and concomitant administration of enzyme inducer,
enzyme inhibitor, glucocorticoids, or proton pump inhibitors) was found.
CONCLUSION: These results suggest the potential clinical utility of using
CYP2C19 genotype-guided voriconazole dosing to achieve concentrations in the
therapeutic range in the early course of therapy. Larger studies are needed
to confirm the impact of pharmacogenetics on voriconazole pharmacokinetics.",
pmid 31782536,
pub-type {
"Journal Article"
},
status publisher
}
}
This is where I am stuck. I do not know how to extract the information from the ASN.1 data and get it into a PySpark dataframe. Could anyone suggest a way of doing this?
The above data is definitely in an "ASN.1 format". This format is called ASN.1 Value Notation and is used to represent ASN.1 values textually. (This format pre-dates the standardization of the JSON encoding rules. Today, one could use JSON for the same purpose, with some differences in the way the JSON would be processed compared to the ASN.1 value notation).
The ASN.1 schema that YaFred posted above contains a few errors, as YaFred himself noted. The notation you posted yourself also seems to contain a few errors. I have looked at the whole set of ASN.1 files of NCBI and noticed that they contain several errors. Because of this, they cannot be handled by a standard-conforming ASN.1 tool (such as the ASN.1 playground) unless they are fixed. Some of those errors are easy to fix, but fixing other errors require knowledge of the intent of the author of those files. This state of affairs is probably due to the fact that the NCBI project uses their own ASN.1 toolkit, which perhaps uses ASN.1 in some non-standard way.
I would imagine that in the NCBI toolkit there should be some means for you to decode the above value notation, so if I were you I would look into that toolkit. I am unable to give you a better suggestion because I don't know the NCBI toolkit.
Your problem may not be simple but it's worth experimenting.
Method 1:
As you have the specification, you can try looking for an ASN.1 tool (aka ASN.1 compiler) that will create a data model. In your case, because you downloaded a textual ASN.1 value, you need this tool to provide ASN.1 value decoders.
If the tool was generating Java code, it would go like this:
// decode a Pubmed-entry
// input is your data
Asn1ValueReader reader = new Asn1ValueReader(input);
PubmedEntry obj = PubmedEntry.readPdu(reader);
// access the data
obj.getPmid();
obj.getMedent();
A few caveats:
Tools that can do all that will not be free (if you find one at all). The problem here is that you have a textual ASN.1 value, while tools generally provide binary decoders (BER, DER, etc.).
You have a lot of glue code to write to create the records that go into your PySpark dataframe.
I wrote this some time ago but it does not have the textual ASN.1 value decoders.
Method 2:
If your data are simple enough, and since they are textual, you can try to write your own parser (using a tool like ANTLR). This method is not easy to evaluate if you are not familiar with parsers.
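To give a rough feel for Method 2: the sketch below is not a real ASN.1 value-notation parser, just a quick regex extraction of a few flat fields from an entry like the one above. The field names pmid, title and doi come from the sample; the file name pubmed_entry.asn1 and the assumption that every entry is laid out the same way are mine.

import re

def extract_basic_fields(asn1_text):
    """Very rough extraction of a few scalar fields from one Pubmed-entry
    written in ASN.1 value notation."""
    # normalize whitespace so the line-wrapped quoted strings become one line
    flat = re.sub(r"\s+", " ", asn1_text)
    pmid = re.search(r"pmid (\d+)", flat)
    title = re.search(r'title \{\s*name "([^"]*)"', flat)
    doi = re.search(r'doi "([^"]*)"', flat)
    return {
        "pmid": int(pmid.group(1)) if pmid else None,
        "title": title.group(1) if title else None,
        "doi": doi.group(1) if doi else None,
    }

# rows built this way could then be fed to spark.createDataFrame(...)
with open("pubmed_entry.asn1") as f:
    row = extract_basic_fields(f.read())
print(row)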
EDIT:
Unfortunately, the specification is not valid.

Trying to access data in JSON data structure read from file

I have the following Python code to read a JSON file:
import json
from pprint import pprint

with open('traveladvisory.json') as json_data:
    print 'json data ', json_data
    d = json.load(json_data)
    json_data.close()
Below is a piece of the 'traveladvisory.json' file opened with this code. The variable 'd' does print out all the JSON data. But I can't seem to get the syntax correct to read all of the 'country-eng' and 'advisory-text' fields and their data and print it out. Can someone assist? Here's a piece of the json data (sorry, can't get it pretty printed):
{
"metadata":{
"generated":{
"timestamp":1475854624,
"date":"2016-10-07 11:37:04"
}
},
"data":{
"AF":{
"country-id":1000,
"country-iso":"AF",
"country-eng":"Afghanistan",
"country-fra":"Afghanistan",
"advisory-state":3,
"date-published":{
"timestamp":1473866215,
"date":"2016-09-14 11:16:55",
"asp":"2016-09-14T11:16:55.000000-04:00"
},
"has-advisory-warning":1,
"has-regional-advisory":0,
"has-content":1,
"recent-updates-type":"Editorial change",
"eng":{
"name":"Afghanistan",
"url-slug":"afghanistan",
"friendly-date":"September 14, 2016 11:16 EDT",
"advisory-text":"Avoid all travel",
"recent-updates":"The Health tab was updated - travel health notices (Public Health Agency of Canada)."
},
"fra":{
"name":"Afghanistan",
"url-slug":"afghanistan",
"friendly-date":"14 septembre 2016 11:16 HAE",
"advisory-text":"\u00c9viter tout voyage",
"recent-updates":"L'onglet Sant\u00e9 a \u00e9t\u00e9 mis \u00e0 jour - conseils de sant\u00e9 aux voyageurs (Agence de la sant\u00e9 publique du Canada)."
}
},
"AL":{
"country-id":4000,
"country-iso":"AL",
"country-eng":"Albania",
"country-fra":"Albanie",
"advisory-state":0,
"date-published":{
"timestamp":1473350931,
"date":"2016-09-08 12:08:51",
"asp":"2016-09-08T12:08:51.8301256-04:00"
},
"has-advisory-warning":0,
"has-regional-advisory":1,
"has-content":1,
"recent-updates-type":"Editorial change",
"eng":{
"name":"Albania",
"url-slug":"albania",
"friendly-date":"September 8, 2016 12:08 EDT",
"advisory-text":"Exercise normal security precautions (with regional advisories)",
"recent-updates":"An editorial change was made."
},
"fra":{
"name":"Albanie",
"url-slug":"albanie",
"friendly-date":"8 septembre 2016 12:08 HAE",
"advisory-text":"Prendre des mesures de s\u00e9curit\u00e9 normales (avec avertissements r\u00e9gionaux)",
"recent-updates":"Un changement mineur a \u00e9t\u00e9 apport\u00e9 au contenu."
}
},
"DZ":{
"country-id":5000,
"country-iso":"DZ",
"country-eng":"Algeria",
"country-fra":"Alg\u00e9rie",
"advisory-state":1,
"date-published":{
"timestamp":1475593497,
"date":"2016-10-04 11:04:57",
"asp":"2016-10-04T11:04:57.7727548-04:00"
},
"has-advisory-warning":0,
"has-regional-advisory":1,
"has-content":1,
"recent-updates-type":"Full TAA review",
"eng":{
"name":"Algeria",
"url-slug":"algeria",
"friendly-date":"October 4, 2016 11:04 EDT",
"advisory-text":"Exercise a high degree of caution (with regional advisories)",
"recent-updates":"This travel advice was thoroughly reviewed and updated."
},
"fra":{
"name":"Alg\u00e9rie",
"url-slug":"algerie",
"friendly-date":"4 octobre 2016 11:04 HAE",
"advisory-text":"Faire preuve d\u2019une grande prudence (avec avertissements r\u00e9gionaux)",
"recent-updates":"Les pr\u00e9sents Conseils aux voyageurs ont \u00e9t\u00e9 mis \u00e0 jour \u00e0 la suite d\u2019un examen minutieux."
}
},
}
}
Assuming d contains the JSON data. Note that iterating over d["data"] directly gives you the country codes (strings), so iterate over the items to get each country's dictionary:
for code, country in d["data"].items():
    print "Country :", code
    # country.get() gets the value of the key; the second argument is
    # the value returned in case the key is not present
    print "country-eng : ", country.get("country-eng", 0)
    print "advisory-text(eng) :", country["eng"].get("advisory-text", 0)
    print "advisory-text(fra) :", country["fra"].get("advisory-text", 0)
This worked for me:
for item in d['data']:
    print d['data'][item]['country-eng'], d['data'][item]['eng']['advisory-text']
If I am understanding your question, here's how to do it:
import json

with open('traveladvisory.json') as json_data:
    d = json.load(json_data)
# print(json.dumps(d, indent=4))  # pretty-print data read
for country in d['data']:
    print(country)
    print(' country-eng: {}'.format(d['data'][country]['country-eng']))
    print(' advisory-state: {}'.format(d['data'][country]['advisory-state']))
Output:
DZ
country-eng: Algeria
advisory-state: 1
AL
country-eng: Albania
advisory-state: 0
AF
country-eng: Afghanistan
advisory-state: 3
You have to use the load function from the json module.
The code below works for Python 2 (a rough Python 3 equivalent is sketched after the corrected JSON below).
#-*- coding: utf-8 -*-
import json  # import the module we need

with open("traveladvisory.json") as f:  # f for file
    d = json.load(f)  # d is a dictionary

for key in d['data']:
    print d['data'][key]['country-eng']
    print d['data'][key]['eng']['advisory-text']
It is good practice to use the with statement when dealing with file objects.
This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way.
Also, the JSON is wrong; you have to remove the comma from line 98:
{
"metadata":{
"generated":{
"timestamp":1475854624,
"date":"2016-10-07 11:37:04"
}
},
"data":{
"AF":{
"country-id":1000,
"country-iso":"AF",
"country-eng":"Afghanistan",
"country-fra":"Afghanistan",
"advisory-state":3,
"date-published":{
"timestamp":1473866215,
"date":"2016-09-14 11:16:55",
"asp":"2016-09-14T11:16:55.000000-04:00"
},
"has-advisory-warning":1,
"has-regional-advisory":0,
"has-content":1,
"recent-updates-type":"Editorial change",
"eng":{
"name":"Afghanistan",
"url-slug":"afghanistan",
"friendly-date":"September 14, 2016 11:16 EDT",
"advisory-text":"Avoid all travel",
"recent-updates":"The Health tab was updated - travel health notices (Public Health Agency of Canada)."
},
"fra":{
"name":"Afghanistan",
"url-slug":"afghanistan",
"friendly-date":"14 septembre 2016 11:16 HAE",
"advisory-text":"\u00c9viter tout voyage",
"recent-updates":"L'onglet Sant\u00e9 a \u00e9t\u00e9 mis \u00e0 jour - conseils de sant\u00e9 aux voyageurs (Agence de la sant\u00e9 publique du Canada)."
}
},
"AL":{
"country-id":4000,
"country-iso":"AL",
"country-eng":"Albania",
"country-fra":"Albanie",
"advisory-state":0,
"date-published":{
"timestamp":1473350931,
"date":"2016-09-08 12:08:51",
"asp":"2016-09-08T12:08:51.8301256-04:00"
},
"has-advisory-warning":0,
"has-regional-advisory":1,
"has-content":1,
"recent-updates-type":"Editorial change",
"eng":{
"name":"Albania",
"url-slug":"albania",
"friendly-date":"September 8, 2016 12:08 EDT",
"advisory-text":"Exercise normal security precautions (with regional advisories)",
"recent-updates":"An editorial change was made."
},
"fra":{
"name":"Albanie",
"url-slug":"albanie",
"friendly-date":"8 septembre 2016 12:08 HAE",
"advisory-text":"Prendre des mesures de s\u00e9curit\u00e9 normales (avec avertissements r\u00e9gionaux)",
"recent-updates":"Un changement mineur a \u00e9t\u00e9 apport\u00e9 au contenu."
}
},
"DZ":{
"country-id":5000,
"country-iso":"DZ",
"country-eng":"Algeria",
"country-fra":"Alg\u00e9rie",
"advisory-state":1,
"date-published":{
"timestamp":1475593497,
"date":"2016-10-04 11:04:57",
"asp":"2016-10-04T11:04:57.7727548-04:00"
},
"has-advisory-warning":0,
"has-regional-advisory":1,
"has-content":1,
"recent-updates-type":"Full TAA review",
"eng":{
"name":"Algeria",
"url-slug":"algeria",
"friendly-date":"October 4, 2016 11:04 EDT",
"advisory-text":"Exercise a high degree of caution (with regional advisories)",
"recent-updates":"This travel advice was thoroughly reviewed and updated."
},
"fra":{
"name":"Alg\u00e9rie",
"url-slug":"algerie",
"friendly-date":"4 octobre 2016 11:04 HAE",
"advisory-text":"Faire preuve d\u2019une grande prudence (avec avertissements r\u00e9gionaux)",
"recent-updates":"Les pr\u00e9sents Conseils aux voyageurs ont \u00e9t\u00e9 mis \u00e0 jour \u00e0 la suite d\u2019un examen minutieux."
}
}
}
}
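For reference, a rough Python 3 equivalent of the loop above. The only real changes are print() as a function and iterating over items(); the file name is the one from the question.

import json

with open("traveladvisory.json") as f:
    d = json.load(f)

for code, country in d["data"].items():
    print(country["country-eng"])
    print(country["eng"]["advisory-text"])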

Python JSON parsing not storing values

I've looked over this many times and can't seem to find the problem with it.
I am trying to pull 3 fields from a JSON response (engagements, shares, comments), sum them together, then print the sum.
It seems to be returning the fields correctly, but it returns zero in the final print.
I'm very new to this stuff, but would appreciate any help anyone can give me. I'm guessing there is something fundamental I am missing here.
import urllib2, time, csv, json, requests, urlparse, pdb

SEARCH_URL = urllib2.unquote("http://soyuz.elastic.tubularlabs.net:9200/intelligence/video/_search?q=channel_youtube_id:""%s""%20AND%20published:%3E20150715T000000Z%20AND%20published:%3C20150716T000000Z")
reader = csv.reader(open('input.csv', 'r+U'), delimiter=',', quoting=csv.QUOTE_NONE)
cookie = {"user": "2|1:0|10:1438908462|4:user|36:eyJhaWQiOiA1Njk3LCAiaWQiOiA2MzQ0fQ==|b5c4b3adbd96e54833bf8656625aedaf715d4905f39373b860c4b4bc98655e9e"}

# idsToProcess = []
# for row in reader:
#     if len(row) > 0:
#         idsToProcess.append(row[0])
idsToProcess = ['qdGW_m8Rim4FeMM29keDEg']

for userID in idsToProcess:
    # print "fetching for %s.." % fbid
    url = SEARCH_URL % userID
    soyuzResponse = None
    response = requests.request("GET", url, cookies=cookie)
    ret = response.json()
    soyuzResponse = ret['hits']['hits'][0]['_source']['youtube']
    print soyuzResponse
    totalDelta = 0
    totalEngagementsVal = 0
    totalSharesVal = 0
    totalCommentsVal = 0
    valuesArr = []
    for entry in valuesArr:
        arrEngagements = entry['engagements']
        arrShares = entry['shares']
        arrComments = entry['comments']
        if len(arrEngagements) > 0:
            totalEngagementsVal = arrEngagements
        elif len(arrShares) > 0:
            totalSharesVal = arrShares
        elif len(arrComments) > 0:
            totalCommentsVal = arrComments
    print "%s,%s" % (userID, totalEngagementsVal + totalSharesVal + totalCommentsVal)
    totalDelta += totalEngagementsVal + totalSharesVal + totalCommentsVal
    time.sleep(0)
    print "%s,%s" % (userID, totalDelta)
exit()
Here is the json I am parsing:
took: 371,
timed_out: false,
_shards: {
total: 128,
successful: 128,
failed: 0
},
hits: {
total: 1,
max_score: 9.335125,
hits: [
{
_index: "intelligence_v2",
_type: "video",
_id: "jW7mjVdzR_U",
_score: 9.335125,
_source: {
claim: [
"Blucollection%2Buser"
],
topics: [
{
title_non: "Toy",
topic_id: "/m/0138tl",
title: "Toy"
},
{
title_non: "Sofia the First",
topic_id: "/m/0ncq483",
title: "Sofia the First"
}
],
likes: 1045,
duration: 318,
channel_owner_type: "influencer",
category: "Entertainment",
imported: "20150809T230652Z",
title: "Princess Sofia Cash Register Toy Surprise - Play Doh Caja Registradora Disney Sofia the First",
audience_location: [
{
country: "US",
value: 100
}
],
comments: 10,
twitter: {
tweets: 6,
engagements: 6
},
description: "Disney Princess "Sofia Cash Register" toy unboxing review by DisneyCollector. This is the authentic Royal toy of Sofia fit for a little Princess of Enchantia. Young Girls learn early on how mathematics is important in our lives, and learn to do math, developing creativity with a super fun game! Thx 4 watching this "Disney Princess Sofia cash register" unboxing review. In this video i also used Disney Frozen Princess Anna, Nickelodeon Peppa Pig blind bag and plastilina Play-Doh. Revisión del juguete Princesita Sofía Caja Registradora Real para niños y niñas. Las niñas aprenden desde muy temprano cómo las matemáticas es importante en nuestras vidas, y aprenden a hacer matemáticas, el desarrollo de la creatividad con un juego súper divertido! Here's how to say Princess in other languages: printzesa, 公主, prinses, prenses, printsess, princesse, Prinzessin, puteri, banphrionsa, Principesse, principessa, プリンセス, princese, puteri, prinsessa,prinsesse, princesa, công chúa, tywysoges, Princesses Disney, Prinzessinen, 공주, Princesas Disney, Disney πριγκίπισσες, Дисней принцесс, 디즈니 공주, ディズニーのお姫様, Vorstin, koningsdochter, Fürstin, πριγκίπισσα, księżniczka, królewna, принцесса. Here's how register is called in other languages: Caja Registradora de Princesa Sofía, Caisse Enregistreuse Princesse Sofia, Kassa, Registrierkasse Sofia die Erste Auf einmal Prinzessin, Registratore di Cassa di La Principessa Sofia, Caixa Registadora da Princesa Sofia, ηλεκτρονική ταμειακή μηχανή Σοφία η Πριγκίπισσα, 電子式金銭登録機 ちいさなプリンセス ソフィア, София Прекрасная кассовый аппарат, 디즈니주니어 리틀 프린세스 소피아 전자 금전 등록기, máy tính tiền điện tử, daftar uang elektronik, elektronik yazarkasa, Sofia den första kassaapparat leksak, Jej Wysokość Zosia kasa zabawki, Sofia het prinsesje kassa speelgoed, София Първа касов апарат играчка, casa de marcat jucărie Sofia Întâi. Princess Sofia SLEEPOVER Slumber Party - Princesita Sofía Pijamada Real. https://www.youtube.com/watch?v=WSa-Tp7HfyQ Princesita Sofía Castillo Mágico Parlante juguete de niñas. https://www.youtube.com/watch?v=ALQm_3uhIyg Sofia the First Magical Talking Castle Royal Prep Academy. https://www.youtube.com/watch?v=gcUiY0Suzrc Play-Doh Meal Makin' Kitchen with Princess Sofia the First. https://www.youtube.com/watch?v=x_-OxnRXj6g Sofia the First Royal Prep Academy Dolls Character Collection. https://www.youtube.com/watch?v=_kNY6AkSp9g Peppa Pig Picnic Adventure Car With Princess Sofia the First. https://www.youtube.com/watch?v=KIPH3txlq1o Watch "Sofia the First Talking Magic Castle" talking Clover: https://www.youtube.com/watch?v=ALQm_3uhIyg Play-Doh Sofia the First Magic Talking Castle w/ Peppa Pig: https://www.youtube.com/watch?v=-slXqMiDrY0 Play-Doh Sofia the First Going to School Portable Classroom http://www.youtube.com/watch?v=0R-dkVAIUlA",
views: 941726,
channel_network: null,
channel_subscribers: 5054024,
youtube_id: "jW7mjVdzR_U",
facebook: {
engagements: 9,
likes: 2,
shares: 7,
comments: 0
},
location_demo_count: 1,
is_public: true,
engagements: 1070,
channel_country: "US",
demo_count: null,
monetizable: true,
youtube: {
engagements: 1055,
likes: 1045,
comments: 10
},
published: "20150715T100003Z",
channel_youtube_id: "qdGW_m8Rim4FeMM29keDEg"
}
}
]
}
}
Response from terminal after running script:
{u'engagements': 1055, u'likes': 1045, u'comments': 10}
qdGW_m8Rim4FeMM29keDEg,0
qdGW_m8Rim4FeMM29keDEg,0
Your problem is these two lines:
valuesArr = []
for entry in valuesArr:
Because valuesArr is empty, the for loop never iterates, and that's where your totals are being summed.
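In other words, nothing ever fills valuesArr from the parsed response. As a minimal sketch of one way to fix it, the loop below pulls the youtube, facebook and twitter sub-objects out of the hit and sums whatever engagement/share/comment counts they contain (field names taken from the JSON above, missing keys treated as 0). It reuses SEARCH_URL, cookie and idsToProcess from the question and keeps the Python 2 print style of the surrounding code.

totalDelta = 0
for userID in idsToProcess:
    url = SEARCH_URL % userID
    response = requests.request("GET", url, cookies=cookie)
    ret = response.json()
    source = ret['hits']['hits'][0]['_source']

    # collect the per-network dicts that exist in this document
    valuesArr = [source.get(net, {}) for net in ('youtube', 'facebook', 'twitter')]

    total = 0
    for entry in valuesArr:
        # .get(..., 0) because e.g. twitter has no 'shares' or 'comments' fields
        total += entry.get('engagements', 0) + entry.get('shares', 0) + entry.get('comments', 0)

    print "%s,%s" % (userID, total)
    totalDelta += total
print "total,%s" % totalDelta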
