How to access a JSON object within a JSON array in Python?

I am using a wrapper for Reddit's API to return information about comments. The way the information is returned is very confusing to me and I am having trouble getting the info I need.
So the API returns information in this format:
comment(all_awardings=[], associated_award=None, author='raidoctober', author_flair_background_color=None, author_flair_css_class=None, author_flair_richtext=[], author_flair_template_id=None, author_flair_text=None, author_flair_text_color=None, author_flair_type='text', author_fullname='t2_1ekqex92', author_patreon_flair=False, author_premium=False, awarders=[], body="Haha, yeah I thought about it. But it's probably not worth it cause of all the taxes, copart fees, cost of turning a Salvage title into a rebuilt and the insurance deductible.", collapsed_because_crowd_control=None, created_utc=1591296781, gildings={}, id='fsw0scp', is_submitter=True, link_id='t3_gwn3rw', locked=False, no_follow=True, parent_id='t1_fsvyhq1', permalink='/r/motorcycles/comments/gwn3rw/did_copart_steal_my_motorcycle/fsw0scp/', retrieved_on=1591301318, score=1, send_replies=True, stickied=False, subreddit='motorcycles', subreddit_id='t5_2qi6d', top_awarded_type=None, total_awards_received=0, treatment_tags=[], created=1591321981.0, d_={'all_awardings': [], 'associated_award': None, 'author': 'raidoctober', 'author_flair_background_color': None, 'author_flair_css_class': None, 'author_flair_richtext': [], 'author_flair_template_id': None, 'author_flair_text': None, 'author_flair_text_color': None, 'author_flair_type': 'text', 'author_fullname': 't2_1ekqex92', 'author_patreon_flair': False, 'author_premium': False, 'awarders': [], 'body': "Haha, yeah I thought about it. 
But it's probably not worth it cause of all the taxes, copart fees, cost of turning a Salvage title into a rebuilt and the insurance deductible.", 'collapsed_because_crowd_control': None, 'created_utc': 1591296781, 'gildings': {}, 'id': 'fsw0scp', 'is_submitter': True, 'link_id': 't3_gwn3rw', 'locked': False, 'no_follow': True, 'parent_id': 't1_fsvyhq1', 'permalink': '/r/motorcycles/comments/gwn3rw/did_copart_steal_my_motorcycle/fsw0scp/', 'retrieved_on': 1591301318, 'score': 1, 'send_replies': True, 'stickied': False, 'subreddit': 'motorcycles', 'subreddit_id': 't5_2qi6d', 'top_awarded_type': None, 'total_awards_received': 0, 'treatment_tags': [], 'created': 1591321981.0})
I tried to convert to JSON using
x = json.dumps(hit, sort_keys=True, indent=4)
# hit is the information returned (it is the comment before conversion)
which converts the comment into this JSON format:
[
[],
null,
"raidoctober",
null,
null,
[],
null,
null,
null,
"text",
"t2_1ekqex92",
false,
false,
[],
"Haha, yeah I thought about it. But it's probably not worth it cause of all the taxes, copart fees, cost of turning a Salvage title into a rebuilt and the insurance deductible.",
null,
1591296781,
{},
"fsw0scp",
true,
"t3_gwn3rw",
false,
true,
"t1_fsvyhq1",
"/r/motorcycles/comments/gwn3rw/did_copart_steal_my_motorcycle/fsw0scp/",
1591301318,
1,
true,
false,
"motorcycles",
"t5_2qi6d",
null,
0,
[],
1591321981.0,
{
"all_awardings": [],
"associated_award": null,
"author": "raidoctober",
"author_flair_background_color": null,
"author_flair_css_class": null,
"author_flair_richtext": [],
"author_flair_template_id": null,
"author_flair_text": null,
"author_flair_text_color": null,
"author_flair_type": "text",
"author_fullname": "t2_1ekqex92",
"author_patreon_flair": false,
"author_premium": false,
"awarders": [],
"body": "Haha, yeah I thought about it. But it's probably not worth it cause of all the taxes, copart fees, cost of turning a Salvage title into a rebuilt and the insurance deductible.",
"collapsed_because_crowd_control": null,
"created": 1591321981.0,
"created_utc": 1591296781,
"gildings": {},
"id": "fsw0scp",
"is_submitter": true,
"link_id": "t3_gwn3rw",
"locked": false,
"no_follow": true,
"parent_id": "t1_fsvyhq1",
"permalink": "/r/motorcycles/comments/gwn3rw/did_copart_steal_my_motorcycle/fsw0scp/",
"retrieved_on": 1591301318,
"score": 1,
"send_replies": true,
"stickied": false,
"subreddit": "motorcycles",
"subreddit_id": "t5_2qi6d",
"top_awarded_type": null,
"total_awards_received": 0,
"treatment_tags": []
}
]
I've tried indexing to access it but sometimes the size of the array is different so the results were inaccurate.
I need the "author", "body", and "permalink" tags.
I'm sorry if this is too vague! If you need more information/clarification please let me know.

Does this help?
hit = [
[],
None,
"raidoctober",
None,
None,
[],
None,
None,
None,
"text",
"t2_1ekqex92",
False,
False,
[],
"Haha, yeah I thought about it. But it's probably not worth it cause of all the taxes, copart fees, cost of turning a Salvage title into a rebuilt and the insurance deductible.",
None,
1591296781,
{},
"fsw0scp",
True,
"t3_gwn3rw",
False,
True,
"t1_fsvyhq1",
"/r/motorcycles/comments/gwn3rw/did_copart_steal_my_motorcycle/fsw0scp/",
1591301318,
1,
True,
False,
"motorcycles",
"t5_2qi6d",
None,
0,
[],
1591321981.0,
{
"all_awardings": [],
"associated_award": None,
"author": "raidoctober",
"author_flair_background_color": None,
"author_flair_css_class": None,
"author_flair_richtext": [],
"author_flair_template_id": None,
"author_flair_text": None,
"author_flair_text_color": None,
"author_flair_type": "text",
"author_fullname": "t2_1ekqex92",
"author_patreon_flair": False,
"author_premium": False,
"awarders": [],
"body": "Haha, yeah I thought about it. But it's probably not worth it cause of all the taxes, copart fees, cost of turning a Salvage title into a rebuilt and the insurance deductible.",
"collapsed_because_crowd_control": None,
"created": 1591321981.0,
"created_utc": 1591296781,
"gildings": {},
"id": "fsw0scp",
"is_submitter": True,
"link_id": "t3_gwn3rw",
"locked": False,
"no_follow": True,
"parent_id": "t1_fsvyhq1",
"permalink": "/r/motorcycles/comments/gwn3rw/did_copart_steal_my_motorcycle/fsw0scp/",
"retrieved_on": 1591301318,
"score": 1,
"send_replies": True,
"stickied": False,
"subreddit": "motorcycles",
"subreddit_id": "t5_2qi6d",
"top_awarded_type": None,
"total_awards_received": 0,
"treatment_tags": []
}
]
for item in hit:
    # only the trailing dict in the list carries the named fields
    if isinstance(item, dict):
        if "author" in item and "body" in item and "permalink" in item:
            reqd_dict = {"author": item['author'], "body": item['body'], "permalink": item['permalink']}
            print("Found it!!")
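The printed `comment(...)` value in the question looks like a namedtuple, and `json.dumps` serializes any tuple (namedtuples included) as a JSON array, which would explain why the field names disappear. If that assumption holds, the fields can be read by name without converting to JSON at all. The sketch below uses a hypothetical `Comment` namedtuple as a stand-in for the wrapper's object, reduced to four fields; the real object may behave differently.

```python
from collections import namedtuple

# Hypothetical stand-in for the wrapper's comment object (assumption:
# the real object is a namedtuple with a `d_` dict, as its repr suggests).
Comment = namedtuple("Comment", ["author", "body", "permalink", "d_"])

fields = {
    "author": "raidoctober",
    "body": "Haha, yeah I thought about it.",
    "permalink": "/r/motorcycles/comments/gwn3rw/did_copart_steal_my_motorcycle/fsw0scp/",
}
hit = Comment(d_=dict(fields), **fields)

wanted = ("author", "body", "permalink")

# Option 1: read the attributes by name.
by_attribute = {name: getattr(hit, name) for name in wanted}

# Option 2: read the same values from the embedded dict.
by_dict = {name: hit.d_[name] for name in wanted}

print(by_attribute)
```

Either option avoids positional indexing, so it does not matter if the number of fields changes between comments.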

Related

parse weird yaml file uploaded to server with python

I have a config server where we read the service config from.
In there we have a yaml file that I need to read but it has a weird format on the server looking like:
{
"document[0].Name": "os",
"document[0].Rules.Rule1": false,
"document[0].Rules.Rule2": true,
"document[0].MinScore": 100,
"document[0].MaxScore": 100,
"document[0].ClusterId": 22,
"document[0].Enabled": true,
"document[0].Module": "device",
"document[0].Description": "",
"document[0].Modified": 1577880000000,
"document[0].Created": 1577880000000,
"document[0].RequiredReview": false,
"document[0].Type": "NO_CODE",
"document[1].Name": "rule with params test",
"document[1].Rules.Rule": false,
"document[1].MinScore": 100,
"document[1].MaxScore": 100,
"document[1].ClusterId": 29,
"document[1].Enabled": true,
"document[1].Module": "device",
"document[1].Description": "rule with params test",
"document[1].Modified": 1577880000000,
"document[1].Created": 1577880000000,
"document[1].RequiredReview": false,
"document[1].Type": "NO_CODE",
"document[1].ParametersRules[0].Features.feature1.op": ">",
"document[1].ParametersRules[0].Features.feature1.value": 10,
"document[1].ParametersRules[0].Features.feature2.op": "==",
"document[1].ParametersRules[0].Features.feature2.value": true,
"document[1].ParametersRules[0].Features.feature3.op": "range",
"document[1].ParametersRules[0].Features.feature3.value[0]": 4,
"document[1].ParametersRules[0].Features.feature3.value[1]": 10,
"document[1].ParametersRules[0].Features.feature4.op": "!=",
"document[1].ParametersRules[0].Features.feature4.value": "None",
"document[1].ParametersRules[0].DecisionType": "all",
"document[1].ParametersRules[1].Features.feature5.op": "<",
"document[1].ParametersRules[1].Features.feature5.value": 1000,
"document[1].ParametersRules[1].DecisionType": "any"
}
and this is what the dict is supposed to look like (it might not be perfect, I wrote it by hand):
[
{
"Name": "os",
"Rules": { "Rule1": false, "Rule2": true },
"MinScore": 100,
"MaxScore": 100,
"ClusterId": 22,
"Enabled": true,
"Module": "device",
"Description": "",
"Modified": 1577880000000,
"Created": 1577880000000,
"RequiredReview": false,
"Type": "NO_CODE"
},
{
"Name": "rule with params test",
"Rules": { "Rule": false},
"MinScore": 100,
"MaxScore": 100,
"ClusterId": 29,
"Enabled": true,
"Module": "device",
"Description": "rule with params test",
"Modified": 1577880000000,
"Created": 1577880000000,
"RequiredReview": false,
"Type": "NO_CODE",
"ParametersRules":[
{"Features": {"feature1": {"op": ">", "value": 10},
"feature2": {"op": "==", "value": true},
"feature3": {"op": "range", "value": [4,10]},
"feature4": {"op": "!=", "value": "None"}} ,
"DecisionType": "all"},
{"Features": { "feature5": { "op": "<", "value": 1000 }},
"DecisionType": "any"}
]
}
]
I don't have a way to change how the file is uploaded to the server (it's a different team and quite the headache) so I need to parse it using python.
My thought is that someone probably encountered it before so there must be a package that solves it, and I hoped that someone here might know.
Thanks.
I have a sample; I hope it helps you:
import os
import yaml
file_dir = os.path.dirname(os.path.abspath(__file__))
# PyYAML can also read the JSON file
with open(f"{file_dir}/file.json") as json_file:
    config = yaml.full_load(json_file)
# this dumps the loaded JSON data to a YAML file
with open(f"{file_dir}/meta.yaml", "w") as yaml_file:
    yaml.dump(config, yaml_file, allow_unicode=True)
Your current output is:
- ClusterId: 22
Created: 1577880000000
Description: ''
Enabled: true
MaxScore: 100
MinScore: 100
Modified: 1577880000000
Module: device
Name: os
RequiredReview: false
Rules:
Rule1: false
Rule2: true
Type: NO_CODE
- ClusterId: 29
Created: 1577880000000
Description: rule with params test
Enabled: true
MaxScore: 100
MinScore: 100
Modified: 1577880000000
Module: device
Name: rule with params test
ParametersRules:
- DecisionType: all
Features:
feature1:
op: '>'
value: 10
feature2:
op: ==
value: true
feature3:
op: range
value:
- 4
- 10
feature4:
op: '!='
value: None
- DecisionType: any
Features:
feature5:
op: <
value: 1000
RequiredReview: false
Rules:
Rule: false
Type: NO_CODE
Here is my approach so far. It's far from perfect, but hope it gives you an idea of how to tackle it.
from __future__ import annotations  # can be removed in Python 3.10+


def clean_value(o: str | bool | int) -> str | bool | int | None:
    """handle int, None, or bool values encoded as a string"""
    if isinstance(o, str):
        lowercase = o.lower()
        if lowercase.isnumeric():
            return int(o)
        elif lowercase == 'none':
            return None
        elif lowercase in ('true', 'false'):
            return lowercase == 'true'
            # return eval(o.capitalize())
    return o


# noinspection PyUnboundLocalVariable
def process(o: dict):
    # final return list
    docs_list = []
    doc: dict[str, list | dict | str | bool | int | None]
    doc_idx: int

    def add_new_doc(new_idx: int):
        """Push new item to result list, and increment index."""
        nonlocal doc_idx, doc
        doc_idx = new_idx
        doc = {}
        docs_list.append(doc)

    # add initial `dict` object to return list
    add_new_doc(0)
    for k, v in o.items():
        doc_id, key, *parts = k.split('.')
        doc_id: str
        key: str
        parts: list[str]
        curr_doc_idx = int(doc_id.rsplit('[', 1)[1].rstrip(']'))
        if curr_doc_idx > doc_idx:
            add_new_doc(curr_doc_idx)
        if not parts:
            final_val = clean_value(v)
        elif key in doc:
            # For example, when we encounter `document[0].Rules.Rule2`, but we've already encountered
            # `document[0].Rules.Rule1` - so in this case, we add value to the existing dict.
            final_val = temp_dict = doc[key]
            temp_dict: dict
            for p in parts[:-1]:
                temp_dict = temp_dict.setdefault(p, {})
            temp_dict[parts[-1]] = clean_value(v)
        else:
            final_val = temp_dict = {}
            for p in parts[:-1]:
                temp_dict = temp_dict[p] = {}
            temp_dict[parts[-1]] = clean_value(v)
        doc[key] = final_val
    return docs_list


if __name__ == '__main__':
    import json
    from pprint import pprint
    j = """{
"document[0].Name": "os",
"document[0].Rules.Rule1": false,
"document[0].Rules.Rule2": "true",
"document[0].MinScore": 100,
"document[0].MaxScore": 100,
"document[0].ClusterId": 22,
"document[0].Enabled": true,
"document[0].Module": "device",
"document[0].Description": "",
"document[0].Modified": 1577880000000,
"document[0].Created": 1577880000000,
"document[0].RequiredReview": false,
"document[0].Type": "NO_CODE",
"document[1].Name": "rule with params test",
"document[1].Rules.Rule": false,
"document[1].MinScore": 100,
"document[1].MaxScore": 100,
"document[1].ClusterId": 29,
"document[1].Enabled": true,
"document[1].Module": "device",
"document[1].Description": "rule with params test",
"document[1].Modified": 1577880000000,
"document[1].Created": 1577880000000,
"document[1].RequiredReview": false,
"document[1].Type": "NO_CODE",
"document[1].ParametersRules[0].Features.feature1.op": ">",
"document[1].ParametersRules[0].Features.feature1.value": 10,
"document[1].ParametersRules[0].Features.feature2.op": "==",
"document[1].ParametersRules[0].Features.feature2.value": true,
"document[1].ParametersRules[0].Features.feature3.op": "range",
"document[1].ParametersRules[0].Features.feature3.value[0]": 4,
"document[1].ParametersRules[0].Features.feature3.value[1]": 10,
"document[1].ParametersRules[0].Features.feature4.op": "!=",
"document[1].ParametersRules[0].Features.feature4.value": "None",
"document[1].ParametersRules[0].DecisionType": "all",
"document[1].ParametersRules[1].Features.feature5.op": "<",
"document[1].ParametersRules[1].Features.feature5.value": 1000,
"document[1].ParametersRules[1].DecisionType": "any"
}"""
    d: dict[str, str | bool | int | None] = json.loads(j)
    result = process(d)
    pprint(result)
Result:
[{'ClusterId': 22,
'Created': 1577880000000,
'Description': '',
'Enabled': True,
'MaxScore': 100,
'MinScore': 100,
'Modified': 1577880000000,
'Module': 'device',
'Name': 'os',
'RequiredReview': False,
'Rules': {'Rule1': False, 'Rule2': True},
'Type': 'NO_CODE'},
{'ClusterId': 29,
'Created': 1577880000000,
'Description': 'rule with params test',
'Enabled': True,
'MaxScore': 100,
'MinScore': 100,
'Modified': 1577880000000,
'Module': 'device',
'Name': 'rule with params test',
'ParametersRules[0]': {'DecisionType': 'all',
'Features': {'feature1': {'value': 10},
'feature2': {'op': '==', 'value': True},
'feature3': {'op': 'range',
'value[0]': 4,
'value[1]': 10},
'feature4': {'op': '!=', 'value': None}}},
'ParametersRules[1]': {'DecisionType': 'any',
'Features': {'feature5': {'value': 1000}}},
'RequiredReview': False,
'Rules': {'Rule': False},
'Type': 'NO_CODE'}]
Of course, one of the problems is that it doesn't account for nested paths like document[1].ParametersRules[0].Features.feature1.op, which should ideally create a new sub-list to add values to.
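One possible way to handle those nested list paths is to split every key into name and index tokens and let the next token decide whether a dict or a list is created. The following is only a sketch under that assumption, not a tested package: `parse_path` and `unflatten` are hypothetical helper names, and it does not convert string values such as "None" the way `clean_value` does.

```python
import re

# One path token is either a name (e.g. "Features") or an index (e.g. "[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path):
    """Split 'a[0].b.c[1]' into ['a', 0, 'b', 'c', 1]."""
    return [name if name else int(index) for name, index in _TOKEN.findall(path)]


def _grow(lst, index):
    """Pad a list with None until `index` is a valid position."""
    while len(lst) <= index:
        lst.append(None)


def unflatten(flat):
    """Rebuild nested dicts/lists from a flat {path: value} mapping."""
    root = {}
    for path, value in flat.items():
        tokens = parse_path(path)
        node = root
        for token, nxt in zip(tokens, tokens[1:]):
            # The type of the *next* token decides which container to create.
            child_type = list if isinstance(nxt, int) else dict
            if isinstance(token, int):
                _grow(node, token)
                if node[token] is None:
                    node[token] = child_type()
                node = node[token]
            else:
                node = node.setdefault(token, child_type())
        last = tokens[-1]
        if isinstance(last, int):
            _grow(node, last)
        node[last] = value
    return root


flat = {
    "document[0].Name": "os",
    "document[0].Rules.Rule1": False,
    "document[1].ParametersRules[0].Features.feature3.op": "range",
    "document[1].ParametersRules[0].Features.feature3.value[0]": 4,
    "document[1].ParametersRules[0].Features.feature3.value[1]": 10,
    "document[1].ParametersRules[1].DecisionType": "any",
}
nested = unflatten(flat)
print(nested["document"])
```

The list of documents ends up under the top-level "document" key, so `unflatten(config)["document"]` would correspond to the hand-written target structure in the question.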

Is there a way to insert JSON into a Postgres database using psycopg2?

I'm trying to insert the following data into a postgres database
{
"id": 131739425477632000,
"user_name": "KithureKindiki",
"content": "#Fchurii You're right, Francis.",
"deleted": 1,
"created": "2011-11-02 14:28:21",
"modified": "2019-01-10 13:05:42",
"tweet": "{\"contributors\": null, \"truncated\": false, \"text\": \"#Fchurii You're right, Francis.\", \"is_quote_status\": false, \"in_reply_to_status_id\": 131738250736971778, \"id\": 131739425477632000, \"favorite_count\": 0, \"source\": \"Twitter Web Client\", \"retweeted\": false, \"coordinates\": null, \"entities\": {\"symbols\": [], \"user_mentions\": [{\"indices\": [0, 8], \"id_str\": \"284946979\", \"screen_name\": \"Fchurii\", \"name\": \"Francis Gachuri\", \"id\": 284946979}], \"hashtags\": [], \"urls\": []}, \"in_reply_to_screen_name\": \"Fchurii\", \"in_reply_to_user_id\": 284946979, \"retweet_count\": 0, \"id_str\": \"131739425477632000\", \"favorited\": false, \"user\": {\"follow_request_sent\": false, \"has_extended_profile\": false, \"profile_use_background_image\": true, \"contributors_enabled\": false, \"id\": 399935104, \"verified\": false, \"translator_type\": \"none\", \"profile_text_color\": \"333333\", \"profile_image_url_https\": \"https://pbs.twimg.com/profile_images/538310980468764672/xpJnlD_-_normal.jpeg\", \"profile_sidebar_fill_color\": \"DDEEF6\", \"entities\": {\"description\": {\"urls\": []}}, \"followers_count\": 23555, \"profile_sidebar_border_color\": \"C0DEED\", \"id_str\": \"399935104\", \"default_profile_image\": false, \"listed_count\": 17, \"is_translation_enabled\": false, \"utc_offset\": null, \"statuses_count\": 246, \"description\": \"Majority Leader, The Senate of the Republic of Kenya\", \"friends_count\": 244, \"location\": \"\", \"profile_link_color\": \"1DA1F2\", \"profile_image_url\": \"http://pbs.twimg.com/profile_images/538310980468764672/xpJnlD_-_normal.jpeg\", \"notifications\": false, \"geo_enabled\": false, \"profile_background_color\": \"C0DEED\", \"profile_background_image_url\": \"http://abs.twimg.com/images/themes/theme1/bg.png\", \"screen_name\": \"KithureKindiki\", \"lang\": \"en\", \"following\": false, \"profile_background_tile\": false, \"favourites_count\": 11, \"name\": \"Kithure Kindiki\", 
\"url\": null, \"created_at\": \"Fri Oct 28 08:09:57 +0000 2011\", \"profile_background_image_url_https\": \"https://abs.twimg.com/images/themes/theme1/bg.png\", \"time_zone\": null, \"protected\": false, \"default_profile\": true, \"is_translator\": false}, \"geo\": null, \"in_reply_to_user_id_str\": \"284946979\", \"lang\": \"en\", \"created_at\": \"Wed Nov 02 14:28:21 +0000 2011\", \"in_reply_to_status_id_str\": \"131738250736971778\", \"place\": null}",
"politician_id": 41,
"approved": 1,
"reviewed": 1,
"reviewed_at": "2019-01-10 13:05:42",
"review_message": null,
"retweeted_id": null,
"retweeted_content": null,
"retweeted_user_name": null
}
using the following code
qwery = f"INSERT INTO deleted_tweets(id,user_name,content,deleted,created,modified,tweet,politician_id,approved,reviewed,reviewed_at,review_message,retweeted_id,retweeted_content,retweeted_user_name) VALUES {row['id'], row['user_name'], row['content'], bool(row['deleted']), row['created'], row['modified'],row['tweet'],row['politician_id'],bool(row['approved']), bool(row['reviewed']),row['reviewed_at'],row['review_message'],row['retweeted_id'],row['retweeted_content'],row['retweeted_user_name']}"
qwery = qwery.replace('None', 'null')
cursor.execute(qwery)
However, I get the following error
*** psycopg2.errors.SyntaxError: syntax error at or near "re"
LINE 1: ... null, "truncated": false, "text": "#Fchurii You\'re right, ...
I know this is due to the single quote but I'm not sure how to overcome it. I've tried adding backslash to the string something like \"text\": \"#Fchurii You\\'re right, Francis.\",
but still getting the same error. Any ideas on how to bypass this?
Try passing the values as query parameters instead of formatting them into the SQL string. psycopg2 then takes care of the quoting, and it converts None to NULL on its own, so no replacement of None is needed. Note that the query needs one %s placeholder per column:
query = "INSERT INTO deleted_tweets (id,user_name,content,deleted,created,modified,tweet,politician_id,approved,reviewed,reviewed_at,review_message,retweeted_id,retweeted_content,retweeted_user_name) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
data = [row['id'], row['user_name'], row['content'], bool(row['deleted']), row['created'], row['modified'], row['tweet'], row['politician_id'], bool(row['approved']), bool(row['reviewed']), row['reviewed_at'], row['review_message'], row['retweeted_id'], row['retweeted_content'], row['retweeted_user_name']]
cursor.execute(query, data)
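Since the placeholder count has to match the column count, it may be easier to generate the statement from the column list. The helper below is a hypothetical sketch (`build_insert` is not part of psycopg2); it only builds the SQL text, and the final `cursor.execute` call is left as a comment because it needs an open connection.

```python
def build_insert(table, columns):
    """Return an INSERT statement with one %s placeholder per column.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


# Reduced example row; the real row has more columns.
row = {
    "id": 131739425477632000,
    "user_name": "KithureKindiki",
    "content": "#Fchurii You're right, Francis.",
    "review_message": None,
}
columns = list(row)
query = build_insert("deleted_tweets", columns)
params = [row[column] for column in columns]
print(query)

# With an open psycopg2 cursor (assumed to exist, not created here):
# cursor.execute(query, params)
```

The single quote in "You're" and the None value stay untouched in `params`; escaping them is left to the driver when the parameters are passed separately from the query text.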

Access nested dict key in following style: selected = {k: tweets[i]._json[k] for k in {'created_at', 'id', 'full_text'}

After hours of research I am kind of lost. No existing question seems to match mine.
The problem is the following:
I have a JSON object containing all sorts of information about a tweet. Much of it is nested, meaning a key's value is itself a JSON object. The keys inside the { } of the comprehension are the ones whose key-value pairs I want to retrieve. With 'first level' keys there is no problem whatsoever; it retrieves them just fine. But I don't know how to access the 'deeper level' keys. I know how to access a lower-level value, namely with dictObject['FirstLevelKey']['SecondLevelKey']. The problem, though, is that this returns the value of that key and not the key itself. I somehow need to tell the code where exactly to find the key inside the braces { }.
As an example: there is a 'first level' key inside my main JSON (tweets[i]._json) named 'user', whose value is a JSON object containing the key 'geo_enabled'. How could I tell my program to retrieve this key the same way as my 'first level' keys 'created_at', 'id', 'full_text'?
I hope I was able to express my problem in an understandable manner. Thanks in advance.
selected = {k: tweets[i]._json[k] for k in {'created_at', 'id', 'full_text', tweets[i]._json['user']['geo_enabled']} obviously doesn't work
{"created_at": "Thu Dec 10 14:12:18 +0000 2020",
"id": 1337804994,
"id_str": "1337037427630804994",
"full_text": "hello",
"user": {
"id": 25360913,
"id_str": "25360913",
"translator_type": "none"
},
"geo": False,
"coordinates": False,
"retweeted": False,
"lang": "de"
}
Here is some info about accessing nested dictionary values: https://www.geeksforgeeks.org/python-nested-dictionary/
You can do it this way very simply:
tweet_dict = {
"created_at": "Thu Dec 10 14:12:18 +0000 2020",
"id": 1337804994,
"id_str": "1337037427630804994",
"full_text": "hello",
"user": {
"id": 25360913,
"id_str": "25360913",
"translator_type": "none"
},
"geo": False,
"coordinates": False,
"retweeted": False,
"lang": "de"
}
new_dict = {
**tweet_dict, # unpack the tweet dict
'user_id': tweet_dict['user']['id'], # add the user_id key
}
# pretty print the output
from pprint import pprint
pprint(new_dict)
Or without creating a new dict:
tweet_dict['user_id'] = tweet_dict['user']['id']  # add the user_id key (no trailing comma, which would store a tuple)
Output:
{'coordinates': False,
'created_at': 'Thu Dec 10 14:12:18 +0000 2020',
'full_text': 'hello',
'geo': False,
'id': 1337804994,
'id_str': '1337037427630804994',
'lang': 'de',
'retweeted': False,
'user': {'id': 25360913, 'id_str': '25360913', 'translator_type': 'none'},
'user_id': 25360913}
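To stay closer to the comprehension from the question, the nested keys can be described as paths and resolved one level at a time. This is a minimal sketch; `pick` is a hypothetical helper, and naming the flattened key by joining the path with an underscore is just one possible convention.

```python
def pick(source, top_level_keys, nested_paths):
    """Select top-level keys plus nested values addressed by key paths."""
    selected = {key: source[key] for key in top_level_keys}
    for path in nested_paths:
        value = source
        for key in path:
            value = value[key]  # walk one level deeper per path element
        selected["_".join(path)] = value
    return selected


tweet = {
    "created_at": "Thu Dec 10 14:12:18 +0000 2020",
    "id": 1337804994,
    "full_text": "hello",
    "user": {"id": 25360913, "id_str": "25360913", "translator_type": "none"},
    "lang": "de",
}

selected = pick(tweet, ["created_at", "id", "full_text"], [("user", "id")])
print(selected)
```

With the tweepy objects from the question, `tweet` would be `tweets[i]._json` and the path could be `("user", "geo_enabled")`, assuming that key is present in the response.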

Schrodingers JSON - when working with a json doc it is erroring as both a list and a dict

I have a Python list from an API call:
[{'_id': '5f563c1bf8eaa9d98eca231f',
'allEnabledDs': None,
'allEnabledIdSor': None,
'correlationFilterEntitySource': True,
'correlation_completed': True,
'correlation_completed_dt': '2020-09-07T13:56:43.547Z',
'created_at': '2020-09-07T13:56:43.469Z',
'dsConnectionList': None,
'folderToLabelMapping': None,
'idConnectionList': None,
'identities_scanned': 0,
'identityResolutionScan': False,
'info': None,
'isCustomScanProfile': None,
'modelId': None,
'name': 'Identity Discovery Scan',
'origin': 'Identity Discovery Scan',
'piisummary_completed_dt': '2020-09-07T13:56:43.642Z',
'scan_progress_status': {'Started': '2020-09-07T13:56:43.469Z'},
'shouldCreateClassifiers': None,
'skipIdScan': None,
'state': 'Started',
'stopRequested': True,
'type': 'identityDiscoveryScan',
'updated_at': '2020-09-07T16:59:45.294Z'}]
And this is my code:
for i in live_scans:
    url = url
    payload = {}
    headers = {
        "Authorization": token
    }
    r = requests.get(url, headers=headers, data=payload)
    j_doc = r.json()
    d = {k:v for k,v in (x.split(':') for x in j_doc)}
    if j_doc['state'] == "Stopped":
        print("YAY!")
    if d['state'] == "Stopped":
        print("YAY!")
However when using this code:
if n_dict['state'] == "Stopped":
print("YAY!")
This error occurs:
TypeError: list indices must be integers or slices, not str
And when attempting to split the list into a dict with:
d = {k:v for k,v in (x.split(':') for x in j_doc)}
Can someone give me a pointer into why this is happening and how to fix it?
As @Hitobat mentioned in a comment, you have a list with a dictionary inside, so you have to use [0] to get this dictionary. Or use a for loop if the list has more than one element.
data = [{'_id': '5f563c1bf8eaa9d98eca231f', 'allEnabledDs': None, 'allEnabledIdSor': None, 'correlationFilterEntitySource': True, 'created_at': '2020-09-07T13:56:43.469Z', 'dsConnectionList': None, 'folderToLabelMapping': None, 'idConnectionList': None, 'identityResolutionScan': False, 'info': None, 'isCustomScanProfile': None, 'modelId': None, 'name': 'Identity Discovery Scan', 'origin': 'Identity Discovery Scan', 'scan_progress_status': {'Started': '2020-09-07T13:56:43.469Z'}, 'shouldCreateClassifiers': None, 'skipIdScan': None, 'state': 'Started', 'type': 'identityDiscoveryScan', 'updated_at': '2020-09-07T16:59:45.294Z', 'identities_scanned': 0, 'correlation_completed': True, 'correlation_completed_dt': '2020-09-07T13:56:43.547Z', 'piisummary_completed_dt': '2020-09-07T13:56:43.642Z', 'stopRequested': True}]
print( data[0]['state'] )
for item in data:
    print( item['state'] )
Next time you can use type() to check what you have:
print( type(data) )
If it is a list, you can check its length and/or the type of its first element:
print( len(data) )
print( type( data[0] ) )
If it is a dict, you can check which keys it has:
print( data[0].keys() )
This way you can work out how to get the expected element(s).
You can also use json to format it with indentation and see what it looks like:
import json
print( json.dumps(data, indent=2) )
Results:
[
{
"_id": "5f563c1bf8eaa9d98eca231f",
"allEnabledDs": null,
"allEnabledIdSor": null,
"correlationFilterEntitySource": true,
"created_at": "2020-09-07T13:56:43.469Z",
"dsConnectionList": null,
"folderToLabelMapping": null,
"idConnectionList": null,
"identityResolutionScan": false,
"info": null,
"isCustomScanProfile": null,
"modelId": null,
"name": "Identity Discovery Scan",
"origin": "Identity Discovery Scan",
"scan_progress_status": {
"Started": "2020-09-07T13:56:43.469Z"
},
"shouldCreateClassifiers": null,
"skipIdScan": null,
"state": "Started",
"type": "identityDiscoveryScan",
"updated_at": "2020-09-07T16:59:45.294Z",
"identities_scanned": 0,
"correlation_completed": true,
"correlation_completed_dt": "2020-09-07T13:56:43.547Z",
"piisummary_completed_dt": "2020-09-07T13:56:43.642Z",
"stopRequested": true
}
]
In a similar way you can use pprint (Pretty Print):
import pprint
pprint.pprint(data)
Result:
[{'_id': '5f563c1bf8eaa9d98eca231f',
'allEnabledDs': None,
'allEnabledIdSor': None,
'correlationFilterEntitySource': True,
'correlation_completed': True,
'correlation_completed_dt': '2020-09-07T13:56:43.547Z',
'created_at': '2020-09-07T13:56:43.469Z',
'dsConnectionList': None,
'folderToLabelMapping': None,
'idConnectionList': None,
'identities_scanned': 0,
'identityResolutionScan': False,
'info': None,
'isCustomScanProfile': None,
'modelId': None,
'name': 'Identity Discovery Scan',
'origin': 'Identity Discovery Scan',
'piisummary_completed_dt': '2020-09-07T13:56:43.642Z',
'scan_progress_status': {'Started': '2020-09-07T13:56:43.469Z'},
'shouldCreateClassifiers': None,
'skipIdScan': None,
'state': 'Started',
'stopRequested': True,
'type': 'identityDiscoveryScan',
'updated_at': '2020-09-07T16:59:45.294Z'}]
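If the endpoint can return either a single object or a list of objects, a small normalizing step keeps the rest of the code the same in both cases. The sketch below is one way to do that; `iter_records` is a hypothetical helper name and the payload is a shortened copy of the data above.

```python
def iter_records(payload):
    """Yield dict records whether the payload is one dict or a list of dicts."""
    if isinstance(payload, dict):
        yield payload
    elif isinstance(payload, list):
        for item in payload:
            if isinstance(item, dict):
                yield item


# Shortened version of the API response shown above.
payload = [{"name": "Identity Discovery Scan", "state": "Started"}]

states = [record["state"] for record in iter_records(payload)]
print(states)

for record in iter_records(payload):
    if record.get("state") == "Stopped":
        print("YAY!")
```

In the question's loop, `payload` would be the value returned by `r.json()`.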

Trouble parsing Twitter API JSON output for User profile data with Python

I thought this other SO thread would have answered my question (http://stackoverflow.com/questions/4883751/trouble-reading-json-object-in-python), as it is very similar to my problem, but the data there are a little different than the data in my case.
I have about 470 records pulled from the Twitter API for twitter user data, something like:
{
steve: {
follow_request_sent: false,
profile_use_background_image: true,
default_profile_image: false,
geo_enabled: true,
verified: false,
profile_image_url_https: "https://si0.twimg.com/profile_images/1416115378/profile_normal.jpg",
profile_sidebar_fill_color: "F8E846",
id: 1376271,
profile_text_color: "000000",
followers_count: 2042,
profile_sidebar_border_color: "FFFFFF",
location: "Dallas and 51°33′28″N 0°6′10″W",
profile_background_color: "7d0000",
listed_count: 110,
status: {
favorited: false,
contributors: null,
truncated: false,
text: "So Microsoft's cloud is down. Can't say I have noticed. To the cloud! (the Amazon one of course)",
created_at: "Wed Feb 29 15:51:44 +0000 2012",
retweeted: false,
in_reply_to_status_id: null,
coordinates: null,
id: 174884564718723070,
source: "TweetDeck",
in_reply_to_status_id_str: null,
in_reply_to_screen_name: null,
id_str: "174884564718723073",
place: null,
retweet_count: 0,
geo: null,
in_reply_to_user_id_str: null,
in_reply_to_user_id: null
},
utc_offset: -21600,
statuses_count: 11504,
description: "Network engineer. Cisco, Juniper, F5, HP, EMC, etc. If it is in the data center I deal with it. Arsenal and Mavericks supporter to the max over at #steverossen",
friends_count: 822,
profile_link_color: "0000ff",
profile_image_url: "http://a0.twimg.com/profile_images/1416115378/profile_normal.jpg",
is_translator: false,
show_all_inline_media: false,
profile_background_image_url_https: "https://si0.twimg.com/profile_background_images/192104695/stadium.jpg",
id_str: "1376271",
profile_background_image_url: "http://a2.twimg.com/profile_background_images/192104695/stadium.jpg",
screen_name: "steve",
lang: "en",
profile_background_tile: false,
favourites_count: 0,
name: "Steve Rossen",
notifications: false,
url: "http://steverossen.com",
created_at: "Sat Mar 17 21:36:32 +0000 2007",
contributors_enabled: false,
time_zone: "Central Time (US & Canada)",
protected: false,
default_profile: false,
following: false
},
}
The problem is that each record is keyed by the person's Twitter handle, so the key is different for each record. So far I've only been able to get as far as using:
import json
import csv
with open('my.json') as f:
    data = json.load(f)
for item in data:
    print(item)
to print out those handles, but I can't figure out how to get into each person's record without knowing the key.
What am I grossly overlooking here? I would at least like to get at the "description", which is nested under the user's name as a key.
Maybe I'm missing what exactly you are looking for, but couldn't you do this:
import json
with open('my.json') as f:
    data = json.load(f)
# each key is a Twitter handle; its value is that user's record
for key in data.keys():
    print(data[key]["description"])
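When both the handle and the record are needed, iterating over `items()` gives the pair directly, so the key never has to be known in advance. A minimal sketch with an inline JSON string standing in for the file; the record is cut down to two fields for illustration.

```python
import json

# Inline stand-in for the contents of the JSON file (reduced to two fields).
raw = '{"steve": {"description": "Network engineer.", "followers_count": 2042}}'
data = json.loads(raw)

# Map every handle to its description; .get() returns None if a record has none.
descriptions = {
    handle: profile.get("description")
    for handle, profile in data.items()
}

for handle, description in descriptions.items():
    print(handle, description)
```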
