Reformatting json file - python

I'm trying to reformat a JSON file so I can convert it into a Python dictionary. The file contains line-separated JSON objects with different product info (looks like this):
{"asin": "7301113188", "category": ["Appliances", "Refrigerators, Freezers & Ice Makers"], "description": [], "fit": "", "title": "Tupperware Freezer Square Round Container Set of 6", "also_buy": [], "image": [], "tech2": "", "brand": "Tupperware", "feature": ["Each 3-pc. set includes two 7/8-cup/200 mL and one 1-3/4-cup/400 mL.", "Use them to keep sandwich fillings, salads or leftovers fresh in the refrigerator.", "Gently twist the container to \"pop\" out frozen foods for reheating.", "Dishwasher Safe.", "Set weights less than 13 oz!"], "rank": [">#39,745 in Appliances (See top 100)"], "also_view": [], "details": {}, "main_cat": "Appliances", "similar_item": "", "date": "November 19, 2008", "price": ""}
{"asin": "7861850250", "category": ["Appliances", "Refrigerators, Freezers & Ice Makers"], "tech2": "", "brand": "Tupperware", "feature": ["2 X Tupperware Pure & Fresh Unique Covered Cool Cubes Ice Tray in Purple With Opening Lid Contain 14 Cubes - HerbalStore_24*7", "Package Contain :- 2 Tray", "Each ice tray has a specially designed seal that allows you to fill from the faucet with no spills on the way to the freezer. While freezing, this seal helps keep flavor in and freezer odors out, ensuring you have pure ice every time. For something special, try freezing lemonade, tea or fruit juices in these Ice Tray to give your beverages an extra-flavorful kick. Or add a piece of fruit to each cube for a stylish touch of elegance.", "Sold By:- HerbalStore_24*7", "Free Shipping"], "rank": [">#6,118 in Appliances (See top 100)"], "also_view": ["B004RUGHJW"], "details": {}, "main_cat": "Appliances", "similar_item": "", "date": "June 5, 2016", "price": "$3.62"}
I want the dict to contain key-value pairs where each "asin" is a key and the rest of the product info is the value. What is the best way to do this?

You can parse JSON dictionaries using json.loads.
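import json
final = {}
# lines: any iterable of JSON strings, e.g. the open file object
for line in lines:
    d = json.loads(line)
    # pop() removes 'asin' from the record and returns it for use as the key
    final[d.pop('asin')] = d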
It's also possible to parse some JSON-like text using ast.literal_eval, provided the text contains no boolean or null values (true, false, null), as is the case with your example. Below are the changes needed:
from ast import literal_eval
...
    d = literal_eval(line)
...
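As a rough end-to-end sketch of the json.loads approach, the snippet below reads line-separated JSON and builds the dictionary keyed by "asin". The function name load_by_asin and the shortened two-record sample are illustrative assumptions, and io.StringIO stands in for the real file, whose name is not given in the question.

```python
import io
import json

def load_by_asin(lines):
    """Build {asin: remaining_fields} from an iterable of JSON lines."""
    products = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines between records
            continue
        record = json.loads(line)
        # pop() removes "asin" from the record and returns it as the key
        products[record.pop("asin")] = record
    return products

# Stand-in for the real file (hypothetical, shortened records)
sample = io.StringIO(
    '{"asin": "7301113188", "brand": "Tupperware", "price": ""}\n'
    '{"asin": "7861850250", "brand": "Tupperware", "price": "$3.62"}\n'
)
products = load_by_asin(sample)
print(products["7861850250"]["price"])
```

With a file on disk, the open file object can be passed in place of the StringIO buffer, since iterating over a file yields its lines one at a time.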

Related

How can I print the original title of the top 50 movies from themoviedatabase.org with python?

Hello all, I've recently started learning Python and I need to work with data from the themoviedb.org website. I want to go through the movies in the database and print the original title of the first 50 movies. This is a piece of the JSON that I receive as a response to my request:
{"page":1,"results":[{"adult":false,"backdrop_path":"/5gPQKfFJnl8d1edbkOzKONo4mnr.jpg","genre_ids":[878,12,28],"id":76600,"original_language":"en","original_title":"Avatar: The Way of Water","overview":"Set more than a decade after the events of the first film, learn the story of the Sully family (Jake, Neytiri, and their kids), the trouble that follows them, the lengths they go to keep each other safe, the battles they fight to stay alive, and the tragedies they endure.","popularity":5332.225,"poster_path":"/t6HIqrRAclMCA60NsSmeqe9RmNV.jpg","release_date":"2022-12-14","title":"Avatar: The Way of Water","video":false,"vote_average":7.7,"vote_count":3497},{......}],"total_pages":36589,"total_results":731777}
And this is my code:
import requests
response = requests.get("https://api.themoviedb.org/3/discover/movie?api_key=my_key&language=en-US&sort_by=popularity.desc&include_adult=false&include_video=false&page=1&with_watch_monetization_types=flatrate")
jsonresponse=response.json()
page=jsonresponse["page"]
results=jsonresponse["results"]
for i in range(50):
    for result in jsonresponse["original_title"][i]:
        print(result)
My code doesn't work. Error: "KeyError: 'original_title'". How can I print the original title of the top 50 movies?
When formatting the json you posted properly:
{
    "page": 1,
    "results": [
        {
            "adult": false,
            "backdrop_path": "/5gPQKfFJnl8d1edbkOzKONo4mnr.jpg",
            "genre_ids": [
                878,
                12,
                28
            ],
            "id": 76600,
            "original_language": "en",
            "original_title": "Avatar: The Way of Water",
            "overview": "Set more than a decade after the events of the first film, learn the story of the Sully family (Jake, Neytiri, and their kids), the trouble that follows them, the lengths they go to keep each other safe, the battles they fight to stay alive, and the tragedies they endure.",
            "popularity": 5332.225,
            "poster_path": "/t6HIqrRAclMCA60NsSmeqe9RmNV.jpg",
            "release_date": "2022-12-14",
            "title": "Avatar: The Way of Water",
            "video": false,
            "vote_average": 7.7,
            "vote_count": 3497
        },
        {
            ....
        }
    ],
    "total_pages": 36589,
    "total_results": 731777
}
one can easily see that original_title is part of each dictionary / map in results, not of the top-level response. So using
for result in results:
    print(result["original_title"])
should work.
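Note that a single page may hold fewer than 50 movies: the total_results and total_pages values in the posted response suggest roughly 20 per page. A minimal sketch of collecting titles across pages follows. The names iter_titles and fake_fetch are hypothetical, and fake_fetch only imitates the response shape so the example runs without a network call or API key.

```python
from itertools import islice

def iter_titles(fetch_page):
    """Yield original titles page by page until the pages run out."""
    page = 1
    while True:
        data = fetch_page(page)
        for movie in data["results"]:
            yield movie["original_title"]
        if page >= data["total_pages"]:
            return
        page += 1

# Hypothetical stand-in for the network request: 5 pages of 20 movies each
def fake_fetch(page):
    start = (page - 1) * 20
    return {
        "page": page,
        "results": [{"original_title": f"Movie {n}"} for n in range(start, start + 20)],
        "total_pages": 5,
    }

# islice stops after 50 titles, so only as many pages as needed are fetched
top_50 = list(islice(iter_titles(fake_fetch), 50))
print(len(top_50), top_50[0], top_50[-1])
```

Against the real API, fetch_page would presumably wrap the requests.get(...).json() call from the question with the page query parameter set to the requested page number.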

Write a list of dictionaries (with varying keys) to one .csv file?

Given this dictionary:
{
"last_id": "9095247150673486907",
"stories": [
{
"description": "The $68.7 billion deal would be Microsoft\u2019s biggest takeover ever and the biggest deal in video game history. The acquisition would make Microsoft the world\u2019s third-largest gaming company by revenue,\u2026 The post Following the takeover of Activision by Microsoft, Sony is already being shaken up appeared first on The Latest News.",
"favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
"id": "5310290716350155140",
"site": "gettotext.com",
"tags": [
"msft"
],
"time": 1642641278000,
"title": "Following the takeover of Activision by Microsoft, Sony is already being shaken up",
"url": "https://gettotext.com/following-the-takeover-of-activision-by-microsoft-sony-is-already-being-shaken-up/"
},
{
"description": "Also Read | Acquisition of Activision Blizzard by Microsoft: an opportunity born out of chaos An announcement of such a nature could only inspire a good number of analysts, whose\u2026 The post Microsoft\u2019s takeover of Activision Blizzard ignites analysts appeared first on The Latest News.",
"favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
"id": "-14419799692027457",
"site": "gettotext.com",
"tags": [
"msft"
],
"time": 1642641042000,
"title": "Microsoft\u2019s takeover of Activision Blizzard ignites analysts",
"url": "https://gettotext.com/microsofts-takeover-of-activision-blizzard-ignites-analysts/"
},
{
"description": "Practical in-ears, mini speakers with long battery life or powerful boom boxes \u2013 the manufacturer Anker offers a suitable product for almost every situation. On Ebay and Amazon you can\u2026 The post Anker on Ebay and Amazon on offer: Inexpensive Soundcore 3, Motion Boom & Co appeared first on The Latest News.",
"favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
"id": "5221754710166764872",
"site": "gettotext.com",
"tags": [
"amzn"
],
"time": 1642640469000,
"title": "Anker on Ebay and Amazon on offer: Inexpensive Soundcore 3, Motion Boom & Co",
"url": "https://gettotext.com/anker-on-ebay-and-amazon-on-offer-inexpensive-soundcore-3-motion-boom-co/"
},
{
"favicon_url": "https://static.tickertick.com/website_icons/trib.al.ico",
"id": "-3472956334378244458",
"site": "trib.al",
"tags": [
"goog"
],
"time": 1642640285000,
"title": "Google is forming a group dedicated to blockchain and related technologies under a newly appointed executive",
"url": "https://trib.al/nZz3omw"
},
{
"description": "Texas' attorney general on Wednesday sued Google, alleging the company asked local radio DJs to record personal endorsements for smartphones that they hadn't used or been provided.",
"favicon_url": "https://static.tickertick.com/website_icons/yahoo.com.ico",
"id": "9095247150673486907",
"site": "yahoo.com",
"tags": [
"goog"
],
"time": 1642639680000,
"title": "Texas sues Google over local radio ads for its smartphones",
"url": "https://finance.yahoo.com/m/b44151c6-7276-30d9-bc62-bfe18c6297be/texas-sues-google-over-local.html?.tsrc=rss"
}
]
}
...how can I write the 'stories' list of dictionaries to one csv file, such that the keys are the header row and the values make up the remaining rows? Note that some keys don't appear in ALL of the records (for example, some story dictionaries don't have a 'description' key, and some do).
Pseudocode might include:
Get all keys in the 'stories' list and assign those as the df's header
Iterate through each story in the 'stories' list and append the appropriate rows, leaving a nan if there isn't a matching key for every column
Looking for a pythonic way of doing this relatively quickly.
UPDATE
Trying this:
# Save to excel file
with open("newsheadlines.csv", "wt") as fp:
    writer = csv.writer(fp, delimiter=",")
    # writer.writerow(["your", "header", "foo"]) # write header
    writer.writerows(response['stories'])
...gives this output
Does that help?
The simplest "pythonic" way to do this is with the pandas package. Keys missing from a record become empty (NaN) cells:
import pandas as pd
pd.DataFrame(d["stories"]).to_csv('tmp.csv')
# To retrieve it
stories = pd.read_csv('tmp.csv', index_col=0)
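If pandas is not available, the standard library's csv.DictWriter can do the same job. The sketch below uses a shortened, made-up stand-in for the 'stories' list and writes to an in-memory buffer instead of a file. The header is the union of all keys in first-seen order, and restval fills in cells for missing keys.

```python
import csv
import io

# Shortened stand-in for response['stories'] (illustrative values)
stories = [
    {"id": "1", "site": "gettotext.com", "description": "First story", "tags": ["msft"]},
    {"id": "2", "site": "trib.al", "tags": ["goog"]},  # no "description" key
]

# Union of all keys, keeping first-seen order, for the header row
fieldnames = list(dict.fromkeys(key for story in stories for key in story))

buffer = io.StringIO()  # stand-in for open("newsheadlines.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(stories)
csv_text = buffer.getvalue()
print(csv_text)
```

List values such as 'tags' are written using their Python string representation, so they may need joining into a single string first if the CSV is meant for other tools.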

How can i get from format json this text?

I have a JSON file that contains several images and annotations. Each image has an id, and each annotation references a caption and the image_id of the image. There are thousands of images and multiple annotations refer to the same image. Here's a sample for only one image and its annotations (link to full data):
{
"images": [
{
"license": 5,
"url": "http://farm4.staticflickr.com/3153/2970773875_164f0c0b83_z.jpg",
"file_name": "COCO_train2014_000000057870.jpg",
"id": 57870,
"width": 640,
"date_captured": "2013-11-14 16:28:13",
"height": 480
}
],
"annotations": [
{
"image_id": 57870,
"id": 787980,
"caption": "A restaurant has modern wooden tables and chairs."
},
{
"image_id": 57870,
"id": 789366,
"caption": "A long restaurant table with rattan rounded back chairs."
},
{
"image_id": 57870,
"id": 789888,
"caption": "a long table with a plant on top of it surrounded with wooden chairs "
},
{
"image_id": 57870,
"id": 791316,
"caption": "A long table with a flower arrangement in the middle for meetings"
},
{
"image_id": 57870,
"id": 794853,
"caption": "A table is adorned with wooden chairs with blue accents."
}
]
}
I need to reconstruct the format of the text in this file to be like this:
COCO_train2014_000000057870.jpg#0 A restaurant has modern wooden tables and chairs.
COCO_train2014_000000057870.jpg#1 A long restaurant table with rattan rounded back chairs.
COCO_train2014_000000057870.jpg#2 a long table with a plant on top of it surrounded with wooden chairs
COCO_train2014_000000057870.jpg#3 A long table with a flower arrangement in the middle for meetings
COCO_train2014_000000057870.jpg#4 A table is adorned with wooden chairs with blue accents.
I know the idea but couldn't implement it well in Python. I first need to check which annotations share the same image_id, then number those annotations from 0 to 4 and get their captions.
After reading in the data, reorganizing the images into a dictionary indexed by ID makes it easy to access the correct image when iterating over the annotations. The code below does this, and also collects each image's captions in a list stored on that image:
import json
with open('captions_train2014.json') as f:
    data = json.load(f)
# Collect all images into a dictionary indexed by ID
images = {p['id']:p for p in data['images']}
# To each image, add a list of captions
for image in images.values():
    image['captions'] = []
# For each annotation, add its caption to its
# corresponding image's caption list.
for annotation in data['annotations']:
    image_id = annotation['image_id']
    annotation_id = annotation['id']
    images[image_id]['captions'].append(annotation['caption'])
# Iterate over images and print captions in the format requested.
for image in images.values():
    for i,caption in enumerate(image['captions']):
        print(f"{image['file_name']}#{i} {caption}")
Output:
COCO_train2014_000000057870.jpg#0 A restaurant has modern wooden tables and chairs.
COCO_train2014_000000057870.jpg#1 A long restaurant table with rattan rounded back chairs.
COCO_train2014_000000057870.jpg#2 a long table with a plant on top of it surrounded with wooden chairs
COCO_train2014_000000057870.jpg#3 A long table with a flower arrangement in the middle for meetings
COCO_train2014_000000057870.jpg#4 A table is adorned with wooden chairs with blue accents.
COCO_train2014_000000384029.jpg#0 A man preparing desserts in a kitchen covered in frosting.
COCO_train2014_000000384029.jpg#1 A chef is preparing and decorating many small pastries.
COCO_train2014_000000384029.jpg#2 A baker prepares various types of baked goods.
COCO_train2014_000000384029.jpg#3 a close up of a person grabbing a pastry in a container
COCO_train2014_000000384029.jpg#4 Close up of a hand touching various pastries.
COCO_train2014_000000222016.jpg#0 a big red telephone booth that a man is standing in
COCO_train2014_000000222016.jpg#1 a person standing inside of a phone booth
COCO_train2014_000000222016.jpg#2 this is an image of a man in a phone booth.
COCO_train2014_000000222016.jpg#3 A man is standing in a red phone booth.
COCO_train2014_000000222016.jpg#4 A man using a phone in a phone booth.
...
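A variant of the same idea, sketched here under the assumption that the image dictionaries should not be modified, groups the captions in a separate collections.defaultdict keyed by image_id. The data below is a shortened copy of the sample from the question.

```python
from collections import defaultdict

data = {
    "images": [{"id": 57870, "file_name": "COCO_train2014_000000057870.jpg"}],
    "annotations": [
        {"image_id": 57870, "id": 787980,
         "caption": "A restaurant has modern wooden tables and chairs."},
        {"image_id": 57870, "id": 789366,
         "caption": "A long restaurant table with rattan rounded back chairs."},
    ],
}

# Map image id -> file name, and image id -> list of captions
file_names = {image["id"]: image["file_name"] for image in data["images"]}
captions = defaultdict(list)
for annotation in data["annotations"]:
    captions[annotation["image_id"]].append(annotation["caption"].strip())

# Number each image's captions from 0 and build the requested lines
lines = [
    f"{file_names[image_id]}#{i} {caption}"
    for image_id, image_captions in captions.items()
    for i, caption in enumerate(image_captions)
]
print("\n".join(lines))
```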

How to denormalize JSON objects of objects? [duplicate]

I have a json object like
{
"id": 3590403096656,
"title": "Romania Special Zip Hoodie Blue - Version 02 A5",
"tags": [
"1ST THE WORLD FOR YOU <3",
"apparel",
],
"props": [
{
"id": 28310659235920,
"title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
"position": 1,
"product_id": 3590403096656,
"created_at": "2019-05-22T00:46:19+07:00",
"updated_at": "2019-05-22T01:03:29+07:00"
},
{
"id": 444444444444,
"title": "number 2",
"position": 1,
"product_id": 3590403096656,
"created_at": "2019-05-22T00:46:19+07:00",
"updated_at": "2019-05-22T01:03:29+07:00"
}
]
}
I want to flatten it so that the desired output looks like
{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 28310659235920,"props.title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00", "props.updated_at": "2019-05-22T01:03:29+07:00"}
{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 444444444444,"props.title": "number 2","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00","props.updated_at": "2019-05-22T01:03:29+07:00"}
So far I have tried:
from pandas.io.json import json_normalize
json_normalize(sample_object)
where sample_object contains the JSON object. I am looping through a large file of such objects, which I want to flatten into the desired format.
json_normalize is not giving me the desired output: I want to keep tags as it is, but flatten props and repeat the parent object's info for each entry.
You want some json_normalize behavior, but with a custom twist. So use json_normalize or similar on a portion of the data, then combine it with the remainder of data.
The code below prefers the "or similar" route, reaching deep into the pandas codebase to get the nested_to_record helper function, which flattens dictionaries. It's used to create individual rows that combine the base data (keys/values common across all properties) with the flattened data specific to each props entry. There is a commented-out line that does the equivalent thing without nested_to_record, but it somewhat inelegantly flattens into a DataFrame, then exports out to a dict.
from collections import OrderedDict
import json
import pandas as pd
# Private pandas helper; in newer pandas releases the module is named pandas.io.json._normalize
from pandas.io.json.normalize import nested_to_record
data = json.loads(rawjson)
props = data.pop('props')
rows = []
for prop in props:
    rowdict = OrderedDict(data)
    flattened_prop = nested_to_record({'props': prop})
    # flattened_prop = json_normalize({'props': prop}).to_dict(orient='records')[0]
    rowdict.update(flattened_prop)
    rows.append(rowdict)
df = pd.DataFrame(rows)
Resulting in:
Please try this:
import copy
obj = {
"id": 3590403096656,
"title": "Romania Special Zip Hoodie Blue - Version 02 A5",
"tags": [
"1ST THE WORLD FOR YOU <3",
"apparel",
],
"props": [
{
"id": 28310659235920,
"title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
"position": 1,
"product_id": 3590403096656,
"created_at": "2019-05-22T00:46:19+07:00",
"updated_at": "2019-05-22T01:03:29+07:00"
},
{
"id": 444444444444,
"title": "number 2",
"position": 1,
"product_id": 3590403096656,
"created_at": "2019-05-22T00:46:19+07:00",
"updated_at": "2019-05-22T01:03:29+07:00"
}
]
}
props = obj.pop("props")
for p in props:
    res = copy.deepcopy(obj)
    for k in p:
        res["props."+k] = p[k]
    print(res)
Basically, it uses pop("props") to get obj without "props" (the common part shared by all result objects).
Then we iterate through props and, for each entry, create a new object that copies the base object and adds a "props.key" field for every key in that entry.
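Since the question mentions looping over a large file of such objects, the same logic can be wrapped in a reusable function. This is only a sketch: the name explode and the shortened sample are illustrative assumptions. Unlike pop, it leaves the input object untouched.

```python
import copy

def explode(record, key):
    """Return one flat dict per entry of record[key], repeating the parent fields."""
    base = {k: v for k, v in record.items() if k != key}
    rows = []
    for child in record.get(key, []):
        row = copy.deepcopy(base)  # keep nested values such as "tags" independent
        row.update({f"{key}.{k}": v for k, v in child.items()})
        rows.append(row)
    return rows

# Shortened version of the object from the question
sample = {
    "id": 3590403096656,
    "tags": ["apparel"],
    "props": [
        {"id": 28310659235920, "position": 1},
        {"id": 444444444444, "position": 1},
    ],
}
rows = explode(sample, "props")
for row in rows:
    print(row)
```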

Storing dictionary variables in list after test

I have a json structured like this:
{ "status":"OK", "copyright":"Copyright (c) 2017 Pro Publica Inc. All Rights Reserved.","results":[
{
"member_id": "B001288",
"total_votes": "100",
"offset": "0",
"votes": [
{
"member_id": "B001288",
"chamber": "Senate",
"congress": "115",
"session": "1",
"roll_call": "84",
"bill": {
"number": "H.J.Res.57",
"bill_uri": "https://api.propublica.org/congress/v1/115/bills/hjres57.json",
"title": "Providing for congressional disapproval under chapter 8 of title 5, United States Code, of the rule submitted by the Department of Education relating to accountability and State plans under the Elementary and Secondary Education Act of 1965.",
"latest_action": "Message on Senate action sent to the House."
},
"description": "A joint resolution providing for congressional disapproval under chapter 8 of title 5, United States Code, of the rule submitted by the Department of Education relating to accountability and State ...",
"question": "On the Joint Resolution",
"date": "2017-03-09",
"time": "12:02:00",
"position": "No"
},
Sometimes the "bill" parameter is there, sometimes it is blank, like:
{
"member_id": "B001288",
"chamber": "Senate",
"congress": "115",
"session": "1",
"roll_call": "79",
"bill": {
},
"description": "James Richard Perry, of Texas, to be Secretary of Energy",
"question": "On the Nomination",
"date": "2017-03-02",
"time": "13:46:00",
"position": "No"
},
I want to access and store the "bill_uri" in a list, so I can access it later on. I've already performed .json() through the requests package to process it into python. print votes_json["results"][0]["votes"][0]["bill"]["bill_uri"] etc. works just fine, but when I do:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if votes_json["results"][0]["votes"][n]["bill"]["bill_uri"] in votes_json["results"][0]["votes"][n]:
        bill_urls_2.append(votes_json["results"][0]["votes"][n])["bill"]["bill_uri"]
print bill_urls_2
I get the error KeyError: 'bill_uri'. I think I have a problem with the structure of the if statement, specifically what key I'm looking for in the dictionary. Could someone provide an explanation/link to explanation about how to use in to find keys? Or pinpoint the error in how I'm using it?
Update: Aha! I got this to work:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if "bill" in votes_json["results"][0]["votes"][n]:
        if "bill_uri" in votes_json["results"][0]["votes"][n]["bill"]:
            bill_urls_2.append(votes_json["results"][0]["votes"][n]["bill"]["bill_uri"])
print bill_urls_2
Thank you to everyone who gave me advice.
The error here is caused by the fact that you are looking the key up (my_dict[key]) when you only mean to test whether it exists. Here's a small example:
my_dict = {'A': 1, 'B':2, 'C':3}
Now C may or may not exist in the dict every time. This is how I can check if C exists in the dict:
if 'C' in my_dict:
    print(True)
What you are doing is:
if my_dict['C'] in my_dict:
    print(True)
If C doesn't exist to begin with, my_dict['C'] raises a KeyError before the in test ever runs.
What you need to do is test for the key name in the dictionary that should contain it, which here is the nested "bill" dictionary:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if "bill_uri" in votes_json["results"][0]["votes"][n]["bill"]:
        bill_urls_2.append(votes_json["results"][0]["votes"][n]["bill"]["bill_uri"])
print(bill_urls_2)
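The index-based loop can also be written by iterating over the votes directly. The sketch below uses a trimmed, partly made-up stand-in for votes_json; the third vote, with no "bill" key at all, is a hypothetical case added to show that dict.get with a default of {} covers it as well.

```python
# Trimmed stand-in for the parsed API response
votes_json = {
    "results": [{
        "votes": [
            {"roll_call": "84",
             "bill": {"bill_uri": "https://api.propublica.org/congress/v1/115/bills/hjres57.json"}},
            {"roll_call": "79", "bill": {}},  # empty "bill", as in the question
            {"roll_call": "00"},              # hypothetical: no "bill" key at all
        ]
    }]
}

# Keep only votes whose "bill" dictionary actually contains "bill_uri"
bill_urls = [
    vote["bill"]["bill_uri"]
    for vote in votes_json["results"][0]["votes"]
    if "bill_uri" in vote.get("bill", {})
]
print(bill_urls)
```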
