Python Pandas json_normalize with multiple lists of dicts - python

I'm trying to flatten a JSON file that was originally converted from XML using xmltodict. Several fields may contain a list of dictionaries. I've tried record_path together with meta, but I can't get it to work when more than one field may hold nested lists. Some fields are expected to be empty or missing for any given record.
I searched for an existing question but couldn't find one that covers multiple nested fields. Can anyone point me in the right direction?
Thanks for any help that can be provided!
Sample base Python (without the record path)
import pandas as pd
import json
with open('./example.json', encoding="UTF-8") as json_file:
    json_dict = json.load(json_file)
df = pd.json_normalize(json_dict['WIDGET'])
print(df)
df.to_csv('./test.csv', index=False)
Sample JSON
{
"WIDGET": [
{
"ID": "6",
"PROBLEM": "Electrical",
"SEVERITY_LEVEL": "1",
"TITLE": "Battery's Missing",
"CATEGORY": "User Error",
"LAST_SERVICE": "2020-01-04T17:39:37Z",
"NOTICE_DATE": "2022-01-01T08:00:00Z",
"FIXABLE": "1",
"COMPONENTS": {
"WHATNOTS": {
"WHATNOT1": "Battery Compartment",
"WHATNOT2": "Whirlygig"
}
},
"DIAGNOSIS": "Customer needs to put batteries in the battery compartment",
"STATUS": "0",
"CONTACT_TYPE": {
"CALL": "1"
}
},
{
"ID": "1004",
"PROBLEM": "Electrical",
"SEVERITY_LEVEL": "4",
"TITLE": "Flames emit from unit",
"CATEGORY": "Dangerous",
"LAST_SERVICE": "2015-06-04T21:40:12Z",
"NOTICE_DATE": "2022-01-01T08:00:00Z",
"FIXABLE": "0",
"DIAGNOSIS": "A demon seems to have possessed the unit and his expelling flames from it",
"CONSEQUENCE": "Could burn things",
"SOLUTION": "Call an exorcist",
"KNOWN_PROBLEMS": {
"PROBLEM": [
{
"TYPE": "RECALL",
"NAME": "Bad Servo",
"DESCRIPTION": "Bad servo's shipped in initial product"
},
{
"TYPE": "FAILURE",
"NAME": "Operating outside normal conditions",
"DESCRIPTION": "Device failed when customer threw into wood chipper"
}
]
},
"STATUS": "1",
"REPAIR_BULLETINS": {
"BULLETIN": [
{
"#id": "4",
"#text": "Known target of the occult"
},
{
"#id": "5",
"#text": "Not meant to be thrown into wood chippers"
}
]
},
"CONTACT_TYPE": {
"CALL": "1"
}
}
]
}
Sample CSV
ID,PROBLEM,SEVERITY_LEVEL,TITLE,CATEGORY,LAST_SERVICE,NOTICE_DATE,FIXABLE,DIAGNOSIS,STATUS,COMPONENTS.WHATNOTS.WHATNOT1,COMPONENTS.WHATNOTS.WHATNOT2,CONTACT_TYPE.CALL,CONSEQUENCE,SOLUTION,KNOWN_PROBLEMS.PROBLEM,REPAIR_BULLETINS.BULLETIN
6,Electrical,1,Battery's Missing,User Error,2020-01-04T17:39:37Z,2022-01-01T08:00:00Z,1,Customer needs to put batteries in the battery compartment,0,Battery Compartment,Whirlygig,1,,,,
1004,Electrical,4,Flames emit from unit,Dangerous,2015-06-04T21:40:12Z,2022-01-01T08:00:00Z,0,A demon seems to have possessed the unit and his expelling flames from it,1,,,1,Could burn things,Call an exorcist,"[{'TYPE': 'RECALL', 'NAME': 'Bad Servo', 'DESCRIPTION': ""Bad servo's shipped in initial product""}, {'TYPE': 'FAILURE', 'NAME': 'Operating outside normal conditions', 'DESCRIPTION': 'Device failed when customer threw into wood chipper'}]","[{'#id': '4', '#text': 'Known target of the occult'}, {'#id': '5', '#text': 'Not meant to be thrown into wood chippers'}]"

I extracted the nested lists and turned each one into a nested dictionary keyed by list index (instead of a dictionary holding a list), so that pd.json_normalize() can flatten it:
for row in range(len(json_dict['WIDGET'])):
    try:
        # a KeyError on either field skips the rest of this record
        lis = json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
        del json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM']
        for i, item in enumerate(lis):
            json_dict['WIDGET'][row]['KNOWN_PROBLEMS'][str(i)] = item
        lis = json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
        del json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN']
        for i, item in enumerate(lis):
            json_dict['WIDGET'][row]['REPAIR_BULLETINS'][str(i)] = item
    except KeyError:
        continue
df = pd.json_normalize(json_dict['WIDGET']).T
print(df)
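As a hedged sketch of the same idea, the list-to-dict conversion can also be written as a small recursive helper so it does not depend on knowing the key names in advance. The function name lists_to_dicts and the trimmed record are illustrative assumptions, not part of the original answer. Note that, unlike the loop above, this keeps the PROBLEM level in the resulting column names.

```python
import pandas as pd


def lists_to_dicts(obj):
    """Recursively replace every list with a dict keyed by the item's index."""
    if isinstance(obj, dict):
        return {key: lists_to_dicts(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return {str(i): lists_to_dicts(value) for i, value in enumerate(obj)}
    return obj


# Trimmed, hypothetical record in the shape of the sample JSON.
record = {
    "ID": "1004",
    "KNOWN_PROBLEMS": {
        "PROBLEM": [
            {"TYPE": "RECALL", "NAME": "Bad Servo"},
            {"TYPE": "FAILURE", "NAME": "Operating outside normal conditions"},
        ]
    },
}

# With lists gone, json_normalize flattens everything into dotted column names.
flat = pd.json_normalize(lists_to_dicts(record))
print(flat.T)
```

Because the helper walks the whole structure, it should also cope with lists nested more than two levels deep, at the cost of one column per list item.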
If you would otherwise have to list the varying keys from the larger dataset by hand, here is a way to find them automatically by detecting values of type list (provided they are nested only two levels deep):
linkage = []
for item in json_dict['WIDGET']:
    for k1 in item.keys():  # get keys from first level
        if isinstance(item[k1], str):
            continue
        for k2 in item[k1].keys():  # get keys from second level
            if isinstance(item[k1][k2], str):
                continue
            # record each (outer, inner) pair once
            if isinstance(item[k1][k2], list) and (k1, k2) not in linkage:
                linkage.append((k1, k2))
print(linkage)
# [('KNOWN_PROBLEMS', 'PROBLEM'), ('REPAIR_BULLETINS', 'BULLETIN')]
for row in range(len(json_dict['WIDGET'])):
    for link in linkage:
        try:
            lis = json_dict['WIDGET'][row][link[0]][link[1]]
            del json_dict['WIDGET'][row][link[0]][link[1]]  # delete original dict value (which is a list)
            for i, item in enumerate(lis):
                json_dict['WIDGET'][row][link[0]][str(i)] = item  # replace list with dict value (which is a dict)
        except KeyError:
            continue
df = pd.json_normalize(json_dict['WIDGET']).T
print(df)
Output:
                              0                              1
ID                            6                              1004
PROBLEM                       Electrical                     Electrical
SEVERITY_LEVEL                1                              4
TITLE                         Battery's Missing              Flames emit from unit
CATEGORY                      User Error                     Dangerous
LAST_SERVICE                  2020-01-04T17:39:37Z           2015-06-04T21:40:12Z
NOTICE_DATE                   2022-01-01T08:00:00Z           2022-01-01T08:00:00Z
FIXABLE                       1                              0
DIAGNOSIS                     Customer needs to put batt...  A demon seems to have poss...
STATUS                        0                              1
COMPONENTS.WHATNOTS.WHATNOT1  Battery Compartment            NaN
COMPONENTS.WHATNOTS.WHATNOT2  Whirlygig                      NaN
CONTACT_TYPE.CALL             1                              1
CONSEQUENCE                   NaN                            Could burn things
SOLUTION                      NaN                            Call an exorcist
KNOWN_PROBLEMS.0.TYPE         NaN                            RECALL
KNOWN_PROBLEMS.0.NAME         NaN                            Bad Servo
KNOWN_PROBLEMS.0.DESCRIPTION  NaN                            Bad servo's shipped in ini...
KNOWN_PROBLEMS.1.TYPE         NaN                            FAILURE
KNOWN_PROBLEMS.1.NAME         NaN                            Operating outside normal c...
KNOWN_PROBLEMS.1.DESCRIPTION  NaN                            Device failed when custome...
REPAIR_BULLETINS.0.#id        NaN                            4
REPAIR_BULLETINS.0.#text      NaN                            Known target of the occult
REPAIR_BULLETINS.1.#id        NaN                            5
REPAIR_BULLETINS.1.#text      NaN                            Not meant to be thrown int...
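If one row per list item is preferable to one column per list item, a possible alternative is to build a separate child table for each nested list and join it back on ID when needed. The sketch below is an assumption-laden illustration: child_table is a hypothetical helper, the data is trimmed from the sample, and the single-dict branch is there because xmltodict yields a dict rather than a list when an element has only one child of a given name.

```python
import pandas as pd

# Trimmed, hypothetical records in the shape of the sample JSON.
widgets = [
    {"ID": "6", "TITLE": "Battery's Missing"},
    {
        "ID": "1004",
        "TITLE": "Flames emit from unit",
        "KNOWN_PROBLEMS": {
            "PROBLEM": [
                {"TYPE": "RECALL", "NAME": "Bad Servo"},
                {"TYPE": "FAILURE", "NAME": "Operating outside normal conditions"},
            ]
        },
    },
]


def child_table(records, outer, inner, id_key="ID"):
    """Return one row per item found under records[i][outer][inner]."""
    rows = []
    for rec in records:
        items = (rec.get(outer) or {}).get(inner) or []
        if isinstance(items, dict):  # a single child arrives as a dict, not a list
            items = [items]
        for item in items:
            rows.append({id_key: rec[id_key], **item})
    return pd.DataFrame(rows)


problems = child_table(widgets, "KNOWN_PROBLEMS", "PROBLEM")
print(problems)
```

Records without the field simply contribute no rows, so missing keys do not need a try/except.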

Related

How to flatten dict in a DataFrame & concatenate all resultant rows

I am using Github's GraphQL API to fetch some issue details.
I used Python Requests to fetch the data locally.
This is what output.json looks like:
{
"data": {
"viewer": {
"login": "some_user"
},
"repository": {
"issues": {
"edges": [
{
"node": {
"id": "I_kwDOHQ63-s5auKbD",
"title": "test issue 1",
"number": 146,
"createdAt": "2023-01-06T06:39:54Z",
"closedAt": null,
"state": "OPEN",
"updatedAt": "2023-01-06T06:42:00Z",
"comments": {
"edges": [
{
"node": {
"id": "IC_kwDOHQ63-s5R2XCV",
"body": "comment 01"
}
},
{
"node": {
"id": "IC_kwDOHQ63-s5R2XC9",
"body": "comment 02"
}
}
]
},
"labels": {
"edges": []
}
},
"cursor": "Y3Vyc29yOnYyOpHOWrimww=="
},
{
"node": {
"id": "I_kwDOHQ63-s5auKm8",
"title": "test issue 2",
"number": 147,
"createdAt": "2023-01-06T06:40:34Z",
"closedAt": null,
"state": "OPEN",
"updatedAt": "2023-01-06T06:40:34Z",
"comments": {
"edges": []
},
"labels": {
"edges": [
{
"node": {
"name": "food"
}
},
{
"node": {
"name": "healthy"
}
}
]
}
},
"cursor": "Y3Vyc29yOnYyOpHOWripvA=="
}
]
}
}
}
}
The json was put inside a list using
result = response.json()["data"]["repository"]["issues"]["edges"]
And then this list was put inside a DataFrame
import pandas as pd
df = pd.DataFrame (result, columns = ['node', 'cursor'])
df
These are the contents of the data frame:
| id | title | number | createdAt | closedAt | state | updatedAt | comments | labels |
|---|---|---|---|---|---|---|---|---|
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | None | OPEN | 2023-01-06T06:42:00Z | {'edges': [{'node': {'id': 'IC_kwDOHQ63-s5R2XCV', 'body': 'comment 01'}}, {'node': {'id': 'IC_kwDOHQ63-s5R2XC9', 'body': 'comment 02'}}]} | {'edges': []} |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | None | OPEN | 2023-01-06T06:40:34Z | {'edges': []} | {'edges': [{'node': {'name': 'food'}}, {'node': {'name': 'healthy'}}]} |
I would like to split/explode the comments and labels columns.
The values in these columns are nested dictionaries.
I would like a single issue to have as many rows as it has comments and labels.
I would like to flatten out the data frame.
So this involves split/explode and concat.
There are several Stack Overflow answers that delve into this topic, and I have tried the code from several of them.
I cannot paste the links to those questions, because Stack Overflow marks my question as spam when it contains many links.
But these are the steps I have tried:
df3 = df2['comments'].apply(pd.Series)
Drill down further
df4 = df3['edges'].apply(pd.Series)
df4
Drill down further
df5 = df4['node'].apply(pd.Series)
df5
The last statement above gives me KeyError: 'node'.
I understand this is because node is not a column in that DataFrame.
But how else can I split this dictionary and concatenate all columns back onto my issue rows?
This is how I would like the output to look:
| id | title | number | createdAt | closedAt | state | updatedAt | comments | labels |
|---|---|---|---|---|---|---|---|---|
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | None | OPEN | 2023-01-06T06:42:00Z | comment 01 | Null |
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | None | OPEN | 2023-01-06T06:42:00Z | comment 02 | Null |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | None | OPEN | 2023-01-06T06:40:34Z | Null | food |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | None | OPEN | 2023-01-06T06:40:34Z | Null | healthy |
If dct is your dictionary from the question you can try:
df = pd.DataFrame(d['node'] for d in dct['data']['repository']['issues']['edges'])
df['comments'] = df['comments'].str['edges']
df = df.explode('comments')
df['comments'] = df['comments'].str['node'].str['body']
df['labels'] = df['labels'].str['edges']
df = df.explode('labels')
df['labels'] = df['labels'].str['node'].str['name']
print(df.to_markdown(index=False))
Prints:
| id | title | number | createdAt | closedAt | state | updatedAt | comments | labels |
|---|---|---|---|---|---|---|---|---|
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | | OPEN | 2023-01-06T06:42:00Z | comment 01 | nan |
| I_kwDOHQ63-s5auKbD | test issue 1 | 146 | 2023-01-06T06:39:54Z | | OPEN | 2023-01-06T06:42:00Z | comment 02 | nan |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | | OPEN | 2023-01-06T06:40:34Z | nan | food |
| I_kwDOHQ63-s5auKm8 | test issue 2 | 147 | 2023-01-06T06:40:34Z | | OPEN | 2023-01-06T06:40:34Z | nan | healthy |
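One caveat worth checking: exploding comments and then labels yields one row per comment/label combination whenever a single issue has both, which the sample data never exercises. If one row per comment plus one row per label is wanted instead, a plain-Python sketch along these lines may help. The edges data below is a trimmed, made-up example, not the real API response.

```python
import pandas as pd

# Hypothetical, trimmed edges list; issue A has both comments and a label.
edges = [
    {"node": {"id": "A", "title": "issue 1",
              "comments": {"edges": [{"node": {"body": "c1"}},
                                     {"node": {"body": "c2"}}]},
              "labels": {"edges": [{"node": {"name": "food"}}]}}},
    {"node": {"id": "B", "title": "issue 2",
              "comments": {"edges": []},
              "labels": {"edges": []}}},
]

rows = []
for edge in edges:
    node = edge["node"]
    # Scalar issue fields that are repeated on every output row.
    base = {k: v for k, v in node.items() if k not in ("comments", "labels")}
    comments = [e["node"]["body"] for e in node["comments"]["edges"]]
    labels = [e["node"]["name"] for e in node["labels"]["edges"]]
    for body in comments:
        rows.append({**base, "comments": body, "labels": None})
    for name in labels:
        rows.append({**base, "comments": None, "labels": name})
    if not comments and not labels:
        # Keep issues that have neither comments nor labels.
        rows.append({**base, "comments": None, "labels": None})

df = pd.DataFrame(rows)
print(df)
```

Here issue A produces three rows (two comments, one label) rather than the two combined rows a chained explode would give.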
@andrej-kesely has answered my question.
I have selected his response as the accepted answer.
I am now posting a consolidated script that combines my own code with andrej's.
This script fetches details from GitHub's GraphQL API server
and puts them into pandas.
The primary source for this script is this gist.
A major chunk of the remaining code is the answer by @andrej-kesely.
Now on to the consolidated script.
First import the necessary packages and set headers
import requests
import json
import pandas as pd
headers = {"Authorization": "token <your_github_personal_access_token>"}
Now define the query that will fetch data from GitHub.
In my particular case, I am fetching issue details from a particular repo;
it can be something else for you.
query = """
{
viewer {
login
}
repository(name: "your_github_repo", owner: "your_github_user_name") {
issues(states: OPEN, last: 2) {
edges {
node {
id
title
number
createdAt
closedAt
state
updatedAt
comments(first: 10) {
edges {
node {
id
body
}
}
}
labels(orderBy: {field: NAME, direction: ASC}, first: 10) {
edges {
node {
name
}
}
}
}
cursor
}
}
}
}
"""
Execute the query and save the response
def run_query(query):
    request = requests.post('https://api.github.com/graphql', json={'query': query}, headers=headers)
    if request.status_code == 200:
        return request.json()
    else:
        raise Exception("Query failed to run by returning code of {}. {}".format(request.status_code, query))

result = run_query(query)
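A GraphQL server can answer with HTTP 200 and still report problems in an "errors" list inside the JSON body, so checking the status code alone may not be enough. The sketch below shows one way to guard against that before touching the data; extract_data is a hypothetical helper name and the payloads are made up for illustration.

```python
def extract_data(payload):
    """Return payload['data'], raising if the GraphQL response lists errors."""
    errors = payload.get("errors")
    if errors:
        # Each error is expected to be a dict with a "message" field.
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise RuntimeError(f"GraphQL query returned errors: {messages}")
    return payload["data"]


# Illustrative payloads, not real API output.
ok = extract_data({"data": {"viewer": {"login": "some_user"}}})
print(ok["viewer"]["login"])
```

In the script above this could be applied as extract_data(result) before building the DataFrame.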
Now comes the trickiest part.
In my query response, there are several nested dictionaries.
I would like to split them; more details are in my question above.
This code from @andrej-kesely does that for you.
df = pd.DataFrame(d['node'] for d in result['data']['repository']['issues']['edges'])
df['comments'] = df['comments'].str['edges']
df = df.explode('comments')
df['comments'] = df['comments'].str['node'].str['body']
df['labels'] = df['labels'].str['edges']
df = df.explode('labels')
df['labels'] = df['labels'].str['node'].str['name']
print(df)

Printing pair of a dict

I'm new to Python but always trying to learn.
Today I got this error while trying to select a key from a dictionary:
print(data['town'])
KeyError: 'town'
My code:
import requests
defworld = "Pacera"
defcity = 'Svargrond'
requisicao = requests.get(f"https://api.tibiadata.com/v2/houses/{defworld}/{defcity}.json")
data = requisicao.json()
print(data['town'])
The JSON/dict looks like this:
{
"houses": {
"town": "Venore",
"world": "Antica",
"type": "houses",
"houses": [
{
"houseid": 35006,
"name": "Dagger Alley 1",
"size": 57,
"rent": 2665,
"status": "rented"
}, {
"houseid": 35009,
"name": "Dream Street 1 (Shop)",
"size": 94,
"rent": 4330,
"status": "rented"
},
...
]
},
"information": {
"api_version": 2,
"execution_time": 0.0011,
"last_updated": "2017-12-15 08:00:00",
"timestamp": "2017-12-15 08:00:02"
}
}
The question is: how do I print the pairs?
Thanks
You have to access the town object by going through the houses field first, since it is nested.
You want print(data['houses']['town']).
To avoid your first error, do
print(data["houses"]["town"])
(since it's {"houses": {"town": ...}}, not {"town": ...}).
To print, for example, the names of all the houses, do
for house in data["houses"]["houses"]:
    print(house["name"])
As answered, you must use data['houses']['town']. A more defensive approach that does not raise an error is:
houses = data.get('houses', None)
if houses is not None:
    print(houses.get('town', None))
.get is a dict method that takes two parameters: the first is the key, and the second is the default value to return if the key isn't found.
So in your example, data.get('town', None) returns None because town isn't a key in data.
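When several levels of nesting need the same defensive treatment, the chained .get calls can be folded into a small helper. This is only a sketch: dig is a hypothetical name, and the data dict is a trimmed copy of the structure shown in the question.

```python
def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts, returning default if any step is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed copy of the structure from the question.
data = {"houses": {"town": "Venore", "world": "Antica"}}

print(dig(data, "houses", "town"))          # nested lookup succeeds
print(dig(data, "town"))                    # missing key falls back to None
print(dig(data, "houses", "size", default=0))
```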

How can I convert nested dictionary to pd.dataframe faster?

I have a json file which looks like this
{
"file": "name",
"main": [{
"question_no": "Q.1",
"question": "what is ?",
"answer": [{
"user": "John",
"comment": "It is defined as",
"value": [
{
"my_value": 5,
"value_2": 10
},
{
"my_value": 24,
"value_2": 30
}
]
},
{
"user": "Sam",
"comment": "as John said above it simply means",
"value": [
{
"my_value": 9,
"value_2": 10
},
{
"my_value": 54,
"value_2": 19
}
]
}
],
"closed": "no"
}]
}
Desired result:
Question_no  question   my_value_sum  value_2_sum  user  comment
Q.1          what is ?  29            40           John  It is defined as
Q.1          what is ?  63            29           Sam   as John said above it simply means
What I have tried is data = json_normalize(file_json, "main") followed by a for loop like
for ans, row in data.iterrows():
    ....
    ....
    df = df.append(the data)
But this takes so long that my client would reject the solution. There are around 1200 items in the main list and 450 JSON files like this to convert, so this intermediate conversion step would take almost an hour to complete.
EDIT:
Is it possible to get the sums of my_value and value_2 as columns? (I have also updated the desired result.)
Select the list under main and use the record_path and meta parameters:
data = pd.json_normalize(file_json["main"],
                         record_path='answer',
                         meta=['question_no', 'question'])
print (data)
   user                             comment  question_no   question
0  John                    It is defined as          Q.1  what is ?
1   Sam  as John said above it simply means          Q.1  what is ?
Then, if the order is important, move the last N columns to the front:
N = 2
data = data[data.columns[-N:].tolist() + data.columns[:-N].tolist()]
print (data)
   question_no   question  user                             comment
0          Q.1  what is ?  John                    It is defined as
1          Q.1  what is ?   Sam  as John said above it simply means
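For the sums asked about in the EDIT, one possible approach is to build all rows as plain dictionaries first and create the DataFrame once at the end, which avoids growing a DataFrame row by row (DataFrame.append was removed in pandas 2.0). The sketch below uses the sample JSON from the question; the column names follow the desired result, and everything else is an illustrative assumption.

```python
import pandas as pd

# Sample data from the question.
file_json = {
    "file": "name",
    "main": [{
        "question_no": "Q.1",
        "question": "what is ?",
        "answer": [
            {"user": "John", "comment": "It is defined as",
             "value": [{"my_value": 5, "value_2": 10},
                       {"my_value": 24, "value_2": 30}]},
            {"user": "Sam", "comment": "as John said above it simply means",
             "value": [{"my_value": 9, "value_2": 10},
                       {"my_value": 54, "value_2": 19}]},
        ],
        "closed": "no",
    }],
}

# One dict per answer, with the per-answer sums computed in plain Python.
rows = [
    {
        "question_no": question["question_no"],
        "question": question["question"],
        "my_value_sum": sum(v["my_value"] for v in answer["value"]),
        "value_2_sum": sum(v["value_2"] for v in answer["value"]),
        "user": answer["user"],
        "comment": answer["comment"],
    }
    for question in file_json["main"]
    for answer in question["answer"]
]

df = pd.DataFrame(rows)  # a single DataFrame construction
print(df)
```

With many files, the rows from every file can be collected into one list and converted once, or one DataFrame per file can be combined with a single pd.concat call.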

Using pandas and json_normalize to flatten nested JSON API response

I have a deeply nested JSON that I am trying to turn into a Pandas Dataframe using json_normalize.
A generic sample of the JSON data I'm working with looks like this (I've added context on what I'm trying to do at the bottom of the post):
{
"per_page": 2,
"total": 1,
"data": [{
"total_time": 0,
"collection_mode": "default",
"href": "https://api.surveymonkey.com/v3/responses/5007154325",
"custom_variables": {
"custvar_1": "one",
"custvar_2": "two"
},
"custom_value": "custom identifier for the response",
"edit_url": "https://www.surveymonkey.com/r/",
"analyze_url": "https://www.surveymonkey.com/analyze/browse/",
"ip_address": "",
"pages": [
{
"id": "103332310",
"questions": [{
"answers": [{
"choice_id": "3057839051"
}
],
"id": "319352786"
}
]
},
{
"id": "44783164",
"questions": [{
"id": "153745381",
"answers": [{
"text": "some_name"
}
]
}
]
},
{
"id": "44783183",
"questions": [{
"id": "153745436",
"answers": [{
"col_id": "1087201352",
"choice_id": "1087201369",
"row_id": "1087201362"
}, {
"col_id": "1087201353",
"choice_id": "1087201373",
"row_id": "1087201362"
}
]
}
]
}
],
"date_modified": "1970-01-17T19:07:34+00:00",
"response_status": "completed",
"id": "5007154325",
"collector_id": "50253586",
"recipient_id": "0",
"date_created": "1970-01-17T19:07:34+00:00",
"survey_id": "105723396"
}
],
"page": 1,
"links": {
"self": "https://api.surveymonkey.com/v3/surveys/123456/responses/bulk?page=1&per_page=2"
}
}
I'd like to end up with a dataframe that contains the question_id, page_id, response_id, and response data like this:
    choice_id      col_id      row_id       text  question_id    page_id  response_id
0  3057839051         NaN         NaN        NaN    319352786  103332310   5007154325
1         NaN         NaN         NaN  some_name    153745381   44783164   5007154325
2  1087201369  1087201352  1087201362        NaN    153745436   44783183   5007154325
3  1087201373  1087201353  1087201362        NaN    153745436   44783183   5007154325
I can get close by running the following code (Python 3.6):
df = json_normalize(data=so_survey_responses['data'], record_path=['pages', 'questions'], meta='id', record_prefix ='question_')
print(df)
Which returns:
                                    question_answers  question_id          id
0                      [{'choice_id': '3057839051'}]    319352786  5007154325
1                            [{'text': 'some_name'}]    153745381  5007154325
2  [{'col_id': '1087201352', 'choice_id': '108720...    153745436  5007154325
But if I try to run json_normalize at a deeper nest and keep the 'question_id' data from the above result, I can only get the page_id values to return, not true question_id values:
answers_df = json_normalize(data=so_survey_responses['data'], record_path=['pages', 'questions', 'answers'], meta=['id', ['questions', 'id'], ['pages', 'id']])
print(answers_df)
Returns:
    choice_id      col_id      row_id       text          id  questions.id   pages.id
0  3057839051         NaN         NaN        NaN  5007154325     103332310  103332310
1         NaN         NaN         NaN  some_name  5007154325      44783164   44783164
2  1087201369  1087201352  1087201362        NaN  5007154325      44783183   44783183
3  1087201373  1087201353  1087201362        NaN  5007154325      44783183   44783183
A complicating factor may be that all the above (question_id, page_id, response_id) are 'id:' in the JSON data.
I'm sure this is possible, but I can't get there. Any examples of how to do this?
Additional context:
I'm trying to create a dataframe of SurveyMonkey API response output.
My long term goal is to re-create the "all responses" excel sheet that their export service provides.
I plan to do this by getting the response dataframe set up (above), and then use .apply() to match responses with their survey structure API output.
I've found the SurveyMonkey API pretty lackluster at providing useful output, but I'm new to Pandas so it's probably on me.
You need to modify the meta parameter of your last attempt so that each meta path starts from the top of the record (['pages', 'questions', 'id'] rather than ['questions', 'id']). If you also want the columns named exactly as in your desired output, you can use rename:
answers_df = json_normalize(data=so_survey_responses['data'],
                            record_path=['pages', 'questions', 'answers'],
                            meta=['id', ['pages', 'questions', 'id'], ['pages', 'id']])\
    .rename(index=str,
            columns={'id': 'response_id', 'pages.questions.id': 'question_id', 'pages.id': 'page_id'})
There is no way to do this in a completely generic way using json_normalize(). You can use the record_path and meta arguments to indicate how you want the JSON to be processed.
However, you can use the flatten package to flatten your deeply nested JSON and then convert that to a Pandas dataframe. The page has example usage of how to flatten a deeply-nested JSON and convert to a Pandas dataframe.
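When the record_path and meta combination gets hard to reason about, another option is to walk the nesting explicitly and collect one dictionary per answer. The following is a minimal sketch on a trimmed copy of the sample response; the variable name responses stands in for so_survey_responses['data'], and the output column names are taken from the desired result in the question.

```python
import pandas as pd

# Trimmed copy of the sample; stands in for so_survey_responses['data'].
responses = [{
    "id": "5007154325",
    "pages": [
        {"id": "103332310",
         "questions": [{"id": "319352786",
                        "answers": [{"choice_id": "3057839051"}]}]},
        {"id": "44783164",
         "questions": [{"id": "153745381",
                        "answers": [{"text": "some_name"}]}]},
    ],
}]

# One row per answer, tagged with the ids of its enclosing levels.
rows = [
    {**answer,
     "question_id": question["id"],
     "page_id": page["id"],
     "response_id": response["id"]}
    for response in responses
    for page in response["pages"]
    for question in page["questions"]
    for answer in question["answers"]
]

answers_df = pd.DataFrame(rows)
print(answers_df)
```

Each level's id is read from the object it belongs to, so there is no ambiguity between the three different id fields.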

Storing dictionary variables in list after test

I have a json structured like this:
{ "status":"OK", "copyright":"Copyright (c) 2017 Pro Publica Inc. All Rights Reserved.","results":[
{
"member_id": "B001288",
"total_votes": "100",
"offset": "0",
"votes": [
{
"member_id": "B001288",
"chamber": "Senate",
"congress": "115",
"session": "1",
"roll_call": "84",
"bill": {
"number": "H.J.Res.57",
"bill_uri": "https://api.propublica.org/congress/v1/115/bills/hjres57.json",
"title": "Providing for congressional disapproval under chapter 8 of title 5, United States Code, of the rule submitted by the Department of Education relating to accountability and State plans under the Elementary and Secondary Education Act of 1965.",
"latest_action": "Message on Senate action sent to the House."
},
"description": "A joint resolution providing for congressional disapproval under chapter 8 of title 5, United States Code, of the rule submitted by the Department of Education relating to accountability and State ...",
"question": "On the Joint Resolution",
"date": "2017-03-09",
"time": "12:02:00",
"position": "No"
},
Sometimes the "bill" parameter is there, sometimes it is blank, like:
{
"member_id": "B001288",
"chamber": "Senate",
"congress": "115",
"session": "1",
"roll_call": "79",
"bill": {
},
"description": "James Richard Perry, of Texas, to be Secretary of Energy",
"question": "On the Nomination",
"date": "2017-03-02",
"time": "13:46:00",
"position": "No"
},
I want to access and store the "bill_uri" in a list, so I can access it later on. I've already performed .json() through the requests package to process it into python. print votes_json["results"][0]["votes"][0]["bill"]["bill_uri"] etc. works just fine, but when I do:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if votes_json["results"][0]["votes"][n]["bill"]["bill_uri"] in votes_json["results"][0]["votes"][n]:
        bill_urls_2.append(votes_json["results"][0]["votes"][n])["bill"]["bill_uri"]
print bill_urls_2
I get the error KeyError: 'bill_uri'. I think I have a problem with the structure of the if statement, specifically what key I'm looking for in the dictionary. Could someone provide an explanation/link to explanation about how to use in to find keys? Or pinpoint the error in how I'm using it?
Update: Aha! I got this to work:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if "bill" in votes_json["results"][0]["votes"][n]:
        if "bill_uri" in votes_json["results"][0]["votes"][n]["bill"]:
            bill_urls_2.append(votes_json["results"][0]["votes"][n]["bill"]["bill_uri"])
print bill_urls_2
Thank you to everyone who gave me advice.
The error here is caused by the fact that you look up the key itself in order to check whether it exists. Here's a small example:
my_dict = {'A': 1, 'B':2, 'C':3}
Now C may or may not exist in the dict every time. This is how I can check whether C exists in the dict:
if 'C' in my_dict:
    print(True)
What you are doing is:
if my_dict['C'] in my_dict:
    print(True)
If C doesn't exist to begin with, my_dict['C'] raises a KeyError before the in check even runs.
What you need to do is test for the key on the nested "bill" dict, which is where "bill_uri" lives:
bill_urls_2 = []
for n in range(0, len(votes_json["results"][0]["votes"])):
    if "bill_uri" in votes_json["results"][0]["votes"][n]["bill"]:
        bill_urls_2.append(votes_json["results"][0]["votes"][n]["bill"]["bill_uri"])
print(bill_urls_2)
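The same membership test can be written more compactly by iterating over the votes directly instead of indexing with range. This is a sketch on a made-up, trimmed votes_json; vote.get("bill", {}) is used as an assumption that some votes might lack the "bill" key entirely, not only have it empty.

```python
# Hypothetical, trimmed response: one vote with a bill, one with an empty bill.
votes_json = {"results": [{"votes": [
    {"roll_call": "84",
     "bill": {"number": "H.J.Res.57",
              "bill_uri": "https://api.propublica.org/congress/v1/115/bills/hjres57.json"}},
    {"roll_call": "79", "bill": {}},
]}]}

# Keep the URI only for votes whose "bill" dict actually contains "bill_uri".
bill_urls = [
    vote["bill"]["bill_uri"]
    for vote in votes_json["results"][0]["votes"]
    if "bill_uri" in vote.get("bill", {})
]
print(bill_urls)
```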
