Storing dictionary variables in list after test - python

I have a JSON structured like this:
{ "status":"OK", "copyright":"Copyright (c) 2017 Pro Publica Inc. All Rights Reserved.","results":[
{
"member_id": "B001288",
"total_votes": "100",
"offset": "0",
"votes": [
{
"member_id": "B001288",
"chamber": "Senate",
"congress": "115",
"session": "1",
"roll_call": "84",
"bill": {
"number": "H.J.Res.57",
"bill_uri": "https://api.propublica.org/congress/v1/115/bills/hjres57.json",
"title": "Providing for congressional disapproval under chapter 8 of title 5, United States Code, of the rule submitted by the Department of Education relating to accountability and State plans under the Elementary and Secondary Education Act of 1965.",
"latest_action": "Message on Senate action sent to the House."
},
"description": "A joint resolution providing for congressional disapproval under chapter 8 of title 5, United States Code, of the rule submitted by the Department of Education relating to accountability and State ...",
"question": "On the Joint Resolution",
"date": "2017-03-09",
"time": "12:02:00",
"position": "No"
},
Sometimes the "bill" parameter is there, sometimes it is blank, like:
{
  "member_id": "B001288",
  "chamber": "Senate",
  "congress": "115",
  "session": "1",
  "roll_call": "79",
  "bill": {
  },
  "description": "James Richard Perry, of Texas, to be Secretary of Energy",
  "question": "On the Nomination",
  "date": "2017-03-02",
  "time": "13:46:00",
  "position": "No"
},
I want to access and store the "bill_uri" values in a list, so I can use them later on. I've already called .json() from the requests package to parse the response into Python. print votes_json["results"][0]["votes"][0]["bill"]["bill_uri"] and the like works just fine, but when I do:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if votes_json["results"][0]["votes"][n]["bill"]["bill_uri"] in votes_json["results"][0]["votes"][n]:
        bill_urls_2.append(votes_json["results"][0]["votes"][n])["bill"]["bill_uri"]
print bill_urls_2
I get the error KeyError: 'bill_uri'. I think I have a problem with the structure of the if statement, specifically with which key I'm looking for in the dictionary. Could someone provide an explanation (or a link to one) of how to use in to find keys, or pinpoint the error in how I'm using it?
Update: Aha! I got this to work:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if "bill" in votes_json["results"][0]["votes"][n]:
        if "bill_uri" in votes_json["results"][0]["votes"][n]["bill"]:
            bill_urls_2.append(votes_json["results"][0]["votes"][n]["bill"]["bill_uri"])
print bill_urls_2
Thank you to everyone who gave me advice.

The error here is caused by the fact that you are looking up a key in the dictionary by calling that key itself. Here's a small example:
my_dict = {'A': 1, 'B':2, 'C':3}
Now C may or may not exist in the dict every time. This is how I can check if C exists in the dict:
if 'C' in my_dict:
    print(True)
What you are doing is:
if my_dict['C'] in my_dict:
    print(True)
If 'C' doesn't exist to begin with, my_dict['C'] raises a KeyError before the in check ever runs.
What you need to do is check for the key itself; since bill_uri lives inside the nested "bill" dict, check there:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if "bill_uri" in votes_json["results"][0]["votes"][n]["bill"]:
        bill_urls_2.append(votes_json["results"][0]["votes"][n]["bill"]["bill_uri"])
print bill_urls_2
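If the "bill" key itself can ever be missing (not just empty), a minimal sketch with chained dict.get calls, assuming the same votes_json structure as above, avoids both KeyErrors:
bill_urls_2 = []
for vote in votes_json["results"][0]["votes"]:
    bill_uri = vote.get("bill", {}).get("bill_uri")  # empty-dict default keeps the second .get safe
    if bill_uri is not None:
        bill_urls_2.append(bill_uri)
print(bill_urls_2)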

Related

Python Pandas json_normalize with multiple lists of dicts

I'm trying to flatten a JSON file that was originally converted from XML using xmltodict(). There are multiple fields that may hold a list of dictionaries. I've tried using record_path with meta data, but I have not been able to get it to work when there are multiple fields that may contain other nested fields. It's expected that some fields will be empty for any given record.
I have tried searching for another topic and couldn't find my specific problem with multiple nested fields. Can anyone point me in the right direction?
Thanks for any help that can be provided!
Sample base Python (without the record path)
import pandas as pd
import json

with open('./example.json', encoding="UTF-8") as json_file:
    json_dict = json.load(json_file)

df = pd.json_normalize(json_dict['WIDGET'])
print(df)
df.to_csv('./test.csv', index=False)
Sample JSON
{
  "WIDGET": [
    {
      "ID": "6",
      "PROBLEM": "Electrical",
      "SEVERITY_LEVEL": "1",
      "TITLE": "Battery's Missing",
      "CATEGORY": "User Error",
      "LAST_SERVICE": "2020-01-04T17:39:37Z",
      "NOTICE_DATE": "2022-01-01T08:00:00Z",
      "FIXABLE": "1",
      "COMPONENTS": {
        "WHATNOTS": {
          "WHATNOT1": "Battery Compartment",
          "WHATNOT2": "Whirlygig"
        }
      },
      "DIAGNOSIS": "Customer needs to put batteries in the battery compartment",
      "STATUS": "0",
      "CONTACT_TYPE": {
        "CALL": "1"
      }
    },
    {
      "ID": "1004",
      "PROBLEM": "Electrical",
      "SEVERITY_LEVEL": "4",
      "TITLE": "Flames emit from unit",
      "CATEGORY": "Dangerous",
      "LAST_SERVICE": "2015-06-04T21:40:12Z",
      "NOTICE_DATE": "2022-01-01T08:00:00Z",
      "FIXABLE": "0",
      "DIAGNOSIS": "A demon seems to have possessed the unit and his expelling flames from it",
      "CONSEQUENCE": "Could burn things",
      "SOLUTION": "Call an exorcist",
      "KNOWN_PROBLEMS": {
        "PROBLEM": [
          {
            "TYPE": "RECALL",
            "NAME": "Bad Servo",
            "DESCRIPTION": "Bad servo's shipped in initial product"
          },
          {
            "TYPE": "FAILURE",
            "NAME": "Operating outside normal conditions",
            "DESCRIPTION": "Device failed when customer threw into wood chipper"
          }
        ]
      },
      "STATUS": "1",
      "REPAIR_BULLETINS": {
        "BULLETIN": [
          {
            "#id": "4",
            "#text": "Known target of the occult"
          },
          {
            "#id": "5",
            "#text": "Not meant to be thrown into wood chippers"
          }
        ]
      },
      "CONTACT_TYPE": {
        "CALL": "1"
      }
    }
  ]
}
Sample CSV
ID,PROBLEM,SEVERITY_LEVEL,TITLE,CATEGORY,LAST_SERVICE,NOTICE_DATE,FIXABLE,DIAGNOSIS,STATUS,COMPONENTS.WHATNOTS.WHATNOT1,COMPONENTS.WHATNOTS.WHATNOT2,CONTACT_TYPE.CALL,CONSEQUENCE,SOLUTION,KNOWN_PROBLEMS.PROBLEM,REPAIR_BULLETINS.BULLETIN
6,Electrical,1,Battery's Missing,User Error,2020-01-04T17:39:37Z,2022-01-01T08:00:00Z,1,Customer needs to put batteries in the battery compartment,0,Battery Compartment,Whirlygig,1,,,,
1004,Electrical,4,Flames emit from unit,Dangerous,2015-06-04T21:40:12Z,2022-01-01T08:00:00Z,0,A demon seems to have possessed the unit and his expelling flames from it,1,,,1,Could burn things,Call an exorcist,"[{'TYPE': 'RECALL', 'NAME': 'Bad Servo', 'DESCRIPTION': ""Bad servo's shipped in initial product""}, {'TYPE': 'FAILURE', 'NAME': 'Operating outside normal conditions', 'DESCRIPTION': 'Device failed when customer threw into wood chipper'}]","[{'#id': '4', '#text': 'Known target of the occult'}, {'#id': '5', '#text': 'Not meant to be thrown into wood chippers'}]"
I have attempted to extract the data and turn it into a nested dictionary (instead of nesting with a list), so that pd.json_normalize() can work:
for row in range(len(json_dict['WIDGET'])):
    try:
        lis = json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
        del json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
        for i, item in enumerate(lis):
            json_dict['WIDGET'][row]['KNOWN_PROBLEMS'][str(i)] = item

        lis = json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
        del json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
        for i, item in enumerate(lis):
            json_dict['WIDGET'][row]['REPAIR_BULLETINS'][str(i)] = item
    except KeyError:
        continue

df = pd.json_normalize(json_dict['WIDGET']).T
print(df)
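(The .T above is only for compact display; for the CSV output the untransposed frame would be written out just like the base script, e.g.:)
df = pd.json_normalize(json_dict['WIDGET'])
df.to_csv('./test.csv', index=False)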
Rather than manually adding the varying keys from a larger dataset, here's a way to extract them automatically by identifying values of type list (provided they are nested only two levels deep):
linkage = []
for item in json_dict['WIDGET']:
    for k1 in item.keys():  # get keys from first level
        if isinstance(item[k1], str):
            continue
        #print(item[k1])
        for k2 in item[k1].keys():  # get keys from second level
            if isinstance(item[k1][k2], str):
                continue
            #print(item[k1][k2])
            if isinstance(item[k1][k2], list):
                linkage.append((k1, k2))
print(linkage)
# [('KNOWN_PROBLEMS', 'PROBLEM'), ('REPAIR_BULLETINS', 'BULLETIN')]

for row in range(len(json_dict['WIDGET'])):
    for link in linkage:
        try:
            lis = json_dict['WIDGET'][row][link[0]][link[1]]
            del json_dict['WIDGET'][row][link[0]][link[1]]  # delete original dict value (which is a list)
            for i, item in enumerate(lis):
                json_dict['WIDGET'][row][link[0]][str(i)] = item  # replace list with dict value (which is a dict)
        except KeyError:
            continue

df = pd.json_normalize(json_dict['WIDGET']).T
print(df)
Output:
0 1
ID 6 1004
PROBLEM Electrical Electrical
SEVERITY_LEVEL 1 4
TITLE Battery's Missing Flames emit from unit
CATEGORY User Error Dangerous
LAST_SERVICE 2020-01-04T17:39:37Z 2015-06-04T21:40:12Z
NOTICE_DATE 2022-01-01T08:00:00Z 2022-01-01T08:00:00Z
FIXABLE 1 0
DIAGNOSIS Customer needs to put batt... A demon seems to have poss...
STATUS 0 1
COMPONENTS.WHATNOTS.WHATNOT1 Battery Compartment NaN
COMPONENTS.WHATNOTS.WHATNOT2 Whirlygig NaN
CONTACT_TYPE.CALL 1 1
CONSEQUENCE NaN Could burn things
SOLUTION NaN Call an exorcist
KNOWN_PROBLEMS.0.TYPE NaN RECALL
KNOWN_PROBLEMS.0.NAME NaN Bad Servo
KNOWN_PROBLEMS.0.DESCRIPTION NaN Bad servo's shipped in ini...
KNOWN_PROBLEMS.1.TYPE NaN FAILURE
KNOWN_PROBLEMS.1.NAME NaN Operating outside normal c...
KNOWN_PROBLEMS.1.DESCRIPTION NaN Device failed when custome...
REPAIR_BULLETINS.0.#id NaN 4
REPAIR_BULLETINS.0.#text NaN Known target of the occult
REPAIR_BULLETINS.1.#id NaN 5
REPAIR_BULLETINS.1.#text NaN Not meant to be thrown int...
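For reference, record_path can still be used on data like this, but json_normalize accepts only one record_path per call, so each list field has to be normalized separately and merged back in. A rough sketch for the KNOWN_PROBLEMS list, starting from a freshly loaded json_dict (note it yields one row per nested PROBLEM entry rather than one per widget):
base = pd.json_normalize(json_dict['WIDGET'])  # dict fields flatten; list fields stay as raw lists
problems = pd.json_normalize(
    [w for w in json_dict['WIDGET'] if 'KNOWN_PROBLEMS' in w],  # record_path raises KeyError if the path is missing
    record_path=['KNOWN_PROBLEMS', 'PROBLEM'],
    meta=['ID'],
    record_prefix='KNOWN_PROBLEMS.',
)
merged = base.drop(columns=['KNOWN_PROBLEMS.PROBLEM'], errors='ignore').merge(problems, on='ID', how='left')
print(merged)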

Write a list of dictionaries (with varying keys) to one .csv file?

Given this dictionary:
{
  "last_id": "9095247150673486907",
  "stories": [
    {
      "description": "The $68.7 billion deal would be Microsoft\u2019s biggest takeover ever and the biggest deal in video game history. The acquisition would make Microsoft the world\u2019s third-largest gaming company by revenue,\u2026 The post Following the takeover of Activision by Microsoft, Sony is already being shaken up appeared first on The Latest News.",
      "favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
      "id": "5310290716350155140",
      "site": "gettotext.com",
      "tags": [
        "msft"
      ],
      "time": 1642641278000,
      "title": "Following the takeover of Activision by Microsoft, Sony is already being shaken up",
      "url": "https://gettotext.com/following-the-takeover-of-activision-by-microsoft-sony-is-already-being-shaken-up/"
    },
    {
      "description": "Also Read | Acquisition of Activision Blizzard by Microsoft: an opportunity born out of chaos An announcement of such a nature could only inspire a good number of analysts, whose\u2026 The post Microsoft\u2019s takeover of Activision Blizzard ignites analysts appeared first on The Latest News.",
      "favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
      "id": "-14419799692027457",
      "site": "gettotext.com",
      "tags": [
        "msft"
      ],
      "time": 1642641042000,
      "title": "Microsoft\u2019s takeover of Activision Blizzard ignites analysts",
      "url": "https://gettotext.com/microsofts-takeover-of-activision-blizzard-ignites-analysts/"
    },
    {
      "description": "Practical in-ears, mini speakers with long battery life or powerful boom boxes \u2013 the manufacturer Anker offers a suitable product for almost every situation. On Ebay and Amazon you can\u2026 The post Anker on Ebay and Amazon on offer: Inexpensive Soundcore 3, Motion Boom & Co appeared first on The Latest News.",
      "favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
      "id": "5221754710166764872",
      "site": "gettotext.com",
      "tags": [
        "amzn"
      ],
      "time": 1642640469000,
      "title": "Anker on Ebay and Amazon on offer: Inexpensive Soundcore 3, Motion Boom & Co",
      "url": "https://gettotext.com/anker-on-ebay-and-amazon-on-offer-inexpensive-soundcore-3-motion-boom-co/"
    },
    {
      "favicon_url": "https://static.tickertick.com/website_icons/trib.al.ico",
      "id": "-3472956334378244458",
      "site": "trib.al",
      "tags": [
        "goog"
      ],
      "time": 1642640285000,
      "title": "Google is forming a group dedicated to blockchain and related technologies under a newly appointed executive",
      "url": "https://trib.al/nZz3omw"
    },
    {
      "description": "Texas' attorney general on Wednesday sued Google, alleging the company asked local radio DJs to record personal endorsements for smartphones that they hadn't used or been provided.",
      "favicon_url": "https://static.tickertick.com/website_icons/yahoo.com.ico",
      "id": "9095247150673486907",
      "site": "yahoo.com",
      "tags": [
        "goog"
      ],
      "time": 1642639680000,
      "title": "Texas sues Google over local radio ads for its smartphones",
      "url": "https://finance.yahoo.com/m/b44151c6-7276-30d9-bc62-bfe18c6297be/texas-sues-google-over-local.html?.tsrc=rss"
    }
  ]
}
...how can I write the 'stories' list of dictionaries to one CSV file, such that the keys form the header row and the values make up the rest of the rows? Note that some keys don't appear in all of the records (for example, some story dictionaries don't have a 'description' key and some do).
Pseudocode might include:
Get all keys in the 'stories' list and assign those as the df's header
Iterate through each story in the 'stories' list and append the appropriate rows, leaving a nan if there isn't a matching key for every column
Looking for a pythonic way of doing this relatively quickly.
UPDATE
Trying this:
import csv

# Save to excel file
with open("newsheadlines.csv", "wt") as fp:
    writer = csv.writer(fp, delimiter=",")
    # writer.writerow(["your", "header", "foo"]) # write header
    writer.writerows(response['stories'])
...gives output that isn't what I'm after (screenshot omitted).
Does that help?
Simplest "pythonic" way to do so is by the pandas package.
import pandas as pd
pd.DataFrame(d["stories"]).to_csv('tmp.csv')
# To retrieve it
stories = pd.read_csv('tmp.csv', index_col=0)
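If you'd rather stay in the standard library, here's a sketch with csv.DictWriter, assuming the outer dict from the question is loaded as d (as in the pandas snippet above):
import csv

stories = d["stories"]
# Union of all keys across stories, preserving first-seen order,
# so records missing a key (e.g. 'description') just get a blank cell.
fieldnames = list(dict.fromkeys(key for story in stories for key in story))
with open("tmp.csv", "w", newline="", encoding="utf-8") as fp:
    writer = csv.DictWriter(fp, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(stories)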

Printing pair of a dict

I'm new to Python but always trying to learn.
Today I got this error while trying to select a key from a dictionary:
print(data['town'])
KeyError: 'town'
My code:
import requests
defworld = "Pacera"
defcity = 'Svargrond'
requisicao = requests.get(f"https://api.tibiadata.com/v2/houses/{defworld}/{defcity}.json")
data = requisicao.json()
print(data['town'])
The JSON/dict looks like this:
{
  "houses": {
    "town": "Venore",
    "world": "Antica",
    "type": "houses",
    "houses": [
      {
        "houseid": 35006,
        "name": "Dagger Alley 1",
        "size": 57,
        "rent": 2665,
        "status": "rented"
      }, {
        "houseid": 35009,
        "name": "Dream Street 1 (Shop)",
        "size": 94,
        "rent": 4330,
        "status": "rented"
      },
      ...
    ]
  },
  "information": {
    "api_version": 2,
    "execution_time": 0.0011,
    "last_updated": "2017-12-15 08:00:00",
    "timestamp": "2017-12-15 08:00:02"
  }
}
The question is, how to print the pairs?
Thanks
You have to access the town object by accessing the houses field first, since there is nesting.
You want print(data['houses']['town']).
To avoid your first error, do
print(data["houses"]["town"])
(since it's {"houses": {"town": ...}}, not {"town": ...}).
To e.g. print all of the names of the houses, do
for house in data["houses"]["houses"]:
    print(house["name"])
As answered, you must do data['houses']['town']. A better approach, so that you don't raise an error, is:
houses = data.get('houses', None)
if houses is not None:
    print(houses.get('town', None))
.get is a dict method that takes two parameters: the first is the key, and the second is the default value to return if the key isn't found.
So in your example, data.get('town', None) returns None because 'town' isn't a key in data.
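Since the question also asks how to print the pairs, a small sketch that iterates over the nested dict's items():
for key, value in data["houses"].items():
    print(key, value)  # e.g. town Venore, world Antica, ... (the inner "houses" value is the whole list)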

Flask If Statement - Range for list index

customer_data.json (loaded as customer_data)
{
  "customers": [
    {
      "username": "anonymous",
      "id": "1234",
      "password": "12341234",
      "email": "1234#gmail.com",
      "status": false,
      "books": [
        "Things Fall Apart",
        "Fairy Tales",
        "Divine Comedy"
      ]
    }
  ]
}
Example post in new_catalog.json. (loaded as posts)
{
  "books": [
    {
      "author": "Chinua Achebe",
      "country": "Nigeria",
      "language": "English",
      "link": "https://en.wikipedia.org/wiki/Things_Fall_Apart\n",
      "pages": 209,
      "title": "Things Fall Apart",
      "year": 1958,
      "hold": false
    }
  ]
}
Necessary code in Flask_practice.py
for customer in customer_data['customers']:
    if len(customer['books']) > 0:
        for book in customer['books']:
            holds.append(book)

for post in posts['books']:
    if post['title'] == holds[range(len(holds))]:
        matching_posts['books'].append(post)
holds[range(len(holds))] doesn't work.
I am trying to go through each of the books in holds using holds[0], holds[1] etc and test to see if the title is equal to a book title in new_catalog.json.
I'm still new to Flask, Stack Overflow and coding in general so there may be a really simple solution to this problem.
I am trying to go through each of the books in holds using holds[0], holds[1] etc and test to see if the title is equal to a book title
Translated almost literally to Python:
# For each post...
for post in posts['books']:
# ...go through each of the books in `holds`...
for hold in holds:
# ...and see if the title is equal to a book title
if post['title'] == hold:
matching_posts['books'].append(post)
Or, if you don't want to append(post) once per matching item in holds:
for post in posts['books']:
    if post['title'] in holds:
        matching_posts['books'].append(post)
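As a side note, if holds ever gets large, membership tests are cheaper against a set; a small sketch of the same idea:
held_titles = set(holds)  # average O(1) membership tests
for post in posts['books']:
    if post['title'] in held_titles:
        matching_posts['books'].append(post)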

How to find rows in a dataframe based on other rows and other dataframes

From the question I asked here I took a JSON response looking similar to this (please note: the ids in my sample data below are numeric strings, but some are alphanumeric):
data =
{
  "state": "active",
  "team_size": 20,
  "teams": {
    "id": "12345679",
    "name": "Good Guys",
    "level": 10,
    "attacks": 4,
    "destruction_percentage": 22.6,
    "members": [
      {
        "id": "1",
        "name": "John",
        "level": 12
      },
      {
        "id": "2",
        "name": "Tom",
        "level": 11,
        "attacks": [
          {
            "attackerTag": "2",
            "defenderTag": "4",
            "damage": 64,
            "order": 7
          }
        ]
      }
    ]
  },
  "opponent": {
    "id": "987654321",
    "name": "Bad Guys",
    "level": 17,
    "attacks": 5,
    "damage": 20.95,
    "members": [
      {
        "id": "3",
        "name": "Betty",
        "level": 17,
        "attacks": [
          {
            "attacker_id": "3",
            "defender_id": "1",
            "damage": 70,
            "order": 1
          },
          {
            "attacker_id": "3",
            "defender_id": "7",
            "damage": 100,
            "order": 11
          }
        ],
        "opponentAttacks": 0,
        "some_useless_data": "Want to ignore, this doesn't show in every record"
      },
      {
        "id": "4",
        "name": "Fred",
        "level": 9,
        "attacks": [
          {
            "attacker_id": "4",
            "defender_id": "9",
            "damage": 70,
            "order": 4
          }
        ],
        "opponentAttacks": 0
      }
    ]
  }
}
I loaded this using:
df = json_normalize([data['team'], data['opponent']],
                    'members',
                    ['id', 'name'],
                    meta_prefix='team.',
                    errors='ignore')
print(df.iloc[3])
attacks [{'damage': 70, 'order': 4, 'defender_id': '9'...
id 4
level 9
name Fred
opponentAttacks 0
some_useless_data NaN
team.name Bad Guys
team.id 987654321
Name: 3, dtype: object
I have a 3-part question, in essence.
How do I get a row like the one above using the member tag? I've tried:
member = df[df['id']=="1"].iloc[0]
#Now this works, but am I correctly doing this?
#It just feels weird is all.
How would I retrieve a member's defenses, given that only attacks are recorded and not defenses (even though defender_id is given)? I have tried:
df.where(df['tag']==df['attacks'].str.get('defender_id'), df['attacks'], axis=0)
# This is totally not working. Where am I going wrong?
Since I am retrieving new data from an API, I need to check against the old data in my database to see if there are any new attacks. I can then loop through the new attacks and display the attack info to the user.
This one I honestly cannot figure out. I've tried looking into this question and this one as well, which felt closest to what I needed, but I'm still having trouble wrapping my brain around the concept. Essentially my logic is as follows:
def get_new_attacks(old_data, new_data):
    '''params
    old_data: Dataframe loaded from JSON in database
    new_data: Dataframe loaded from JSON API response,
              hopefully having new attacks
    returns:
        iterator over the new attacks
    '''
    # calculate a dataframe with new attacks listed
    return df.iterrows()
I know the function above shows little to no effort other than the docs I gave (basically to show my desired input/output), but trust me, I've been racking my brain over this part the most. I've been looking into merging all attacks and then doing reset_index(), but that just raises an error because the attacks are lists. The map() function in the second question I linked above has me stumped.
Referring to your questions in order (code below):
It looks like id is a unique index of the data, so you can use df.set_index('id'), which allows you to access data by player id via df.loc['1'], for example.
As far as I understand your data, all the dictionaries listed in each of the attacks are self-contained, in the sense that the corresponding player id is not needed (attacker_id or defender_id seems to be enough to identify the data). So instead of dealing with rows that contain lists, I recommend splitting that data out into its own data frame, which makes it easily accessible.
Once you store attacks in their own data frame, you can simply compare indices in order to filter out the old data.
Here's some example code to illustrate the various points:
# Question 1.
df.set_index('id', inplace=True)
print(df.loc['1']) # For example player id 1.
# Question 2 & 3.
attacks = pd.concat(map(
lambda x: pd.DataFrame.from_dict(x).set_index('order'), # Is 'order' the right index?
df['attacks'].dropna()
))
# Question 2.
print(attacks[attacks['defender_id'] == '1']) # For example defender_id 1.
# Question 3.
old_attacks = attacks.iloc[:2] # For example.
new_attacks = attacks[~attacks.index.isin(old_attacks.index)]
print(new_attacks)
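Tying this back to the get_new_attacks() skeleton from the question, a rough sketch, assuming old and new attacks have each been extracted into their own frame as above and that order uniquely identifies an attack:
def get_new_attacks(old_attacks, new_attacks):
    '''Return an iterator over the attacks present in new_attacks but not in old_attacks.'''
    new_only = new_attacks[~new_attacks.index.isin(old_attacks.index)]
    return new_only.iterrows()

for order, attack in get_new_attacks(old_attacks, new_attacks):
    print(order, attack.dropna().to_dict())  # display the attack info to the user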
