Decoding a list containing dictionaries - Python

I need to get certain values out of a list of dictionaries, which looks like this and is assigned to the variable 'segment_values':
[{'distance': 114.6,
'duration': 20.6,
'instruction': 'Head north',
'name': '-',
'type': 11,
'way_points': [0, 5]},
{'distance': 288.1,
'duration': 28.5,
'instruction': 'Turn right onto Knaufstraße',
'name': 'Knaufstraße',
'type': 1,
'way_points': [5, 17]},
{'distance': 3626.0,
'duration': 273.0,
'instruction': 'Turn slight right onto B320',
'name': 'B320',
'type': 5,
'way_points': [17, 115]},
{'distance': 54983.9,
'duration': 2679.3,
'instruction': 'Keep right onto Ennstal-Bundesstraße, B320',
'name': 'Ennstal-Bundesstraße, B320',
'type': 13,
'way_points': [115, 675]},
{'distance': 11065.1,
'duration': 531.1,
'instruction': 'Keep left onto Pyhrn Autobahn, A9',
'name': 'Pyhrn Autobahn, A9',
'type': 12,
'way_points': [675, 780]},
{'distance': 800.7,
'duration': 64.1,
'instruction': 'Keep right',
'name': '-',
'type': 13,
'way_points': [780, 804]},
{'distance': 49.6,
'duration': 4.0,
'instruction': 'Keep left',
'name': '-',
'type': 12,
'way_points': [804, 807]},
{'distance': 102057.2,
'duration': 4915.0,
'instruction': 'Keep right',
'name': '-',
'type': 13,
'way_points': [807, 2104]},
{'distance': 56143.4,
'duration': 2784.5,
'instruction': 'Keep left onto S6',
'name': 'S6',
'type': 12,
'way_points': [2104, 2524]},
{'distance': 7580.6,
'duration': 389.8,
'instruction': 'Keep left',
'name': '-',
'type': 12,
'way_points': [2524, 2641]},
{'distance': 789.0,
'duration': 63.1,
'instruction': 'Keep right',
'name': '-',
'type': 13,
'way_points': [2641, 2663]},
{'distance': 815.9,
'duration': 65.3,
'instruction': 'Keep left',
'name': '-',
'type': 12,
'way_points': [2663, 2684]},
{'distance': 682.9,
'duration': 54.6,
'instruction': 'Turn left onto Heinrich-Drimmel-Platz',
'name': 'Heinrich-Drimmel-Platz',
'type': 0,
'way_points': [2684, 2711]},
{'distance': 988.1,
'duration': 79.0,
'instruction': 'Turn left onto Arsenalstraße',
'name': 'Arsenalstraße',
'type': 0,
'way_points': [2711, 2723]},
{'distance': 11.7,
'duration': 2.1,
'instruction': 'Turn left',
'name': '-',
'type': 0,
'way_points': [2723, 2725]},
{'distance': 0.0,
'duration': 0.0,
'instruction': 'Arrive at your destination, on the left',
'name': '-',
'type': 10,
'way_points': [2725, 2725]}]
I need to get the duration values and the way_points values out of that data.
For the duration I tried:
segment_values = data['features'][0]['properties']['segments'][0]['steps']  # gets me the list above
print(segment_values[0:]['duration'])
Shouldn't this print all the dictionaries, and the value at 'duration' in each of them?
I also tried this:
duration = data['features'][0]['properties']['segments'][0]['steps'][0:]['duration']
print(duration)
Both tries give me "TypeError: list indices must be integers or slices, not str".
Where am I going wrong?

Your data is a list of dictionaries.
For this reason you need to iterate over its contents in order to access the data.
Please try this print statement to look at the data more closely:
for item in data_list:
    print(item)
To access the duration of each item you can use similar code:
for item in data_list:
    print(item['duration'])
You can also use list comprehension to achieve the same result:
duration = [item['duration'] for item in data_list]
List comprehension is a Pythonic way to obtain the same result; you can read more about it in the Python tutorial's section on list comprehensions.
The same principle can be applied again if a key in your data contains a list or another iterable; here's another example:
for item in data:
    print("\nPrinting waypoints for name: " + item['name'])
    for way_point in item['way_points']:
        print(way_point)
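Applied to the data in the question, a quick sanity check might look like this (the expected values are taken straight from the list above):

durations = [item['duration'] for item in segment_values]
print(durations[:3])  # [20.6, 28.5, 273.0]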

duration = [x['duration'] for x in segment_values]
waypoints = [x['way_points'] for x in segment_values]
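If you need the durations and way_points paired up, a small follow-on sketch with zip:

pairs = list(zip(duration, waypoints))
print(pairs[0])  # (20.6, [0, 5]) for the data above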

You might be thinking of higher-level wrappers like pandas, which would let you do
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randn(3, 2), index=list('abc'), columns=list('xy'))
>>> df
          x         y
a -0.192041 -0.312067
b -0.595700  0.339085
c -0.524242  0.946350
>>> df.x
a -0.192041
b -0.595700
c -0.524242
Name: x, dtype: float64
>>> df[0:].x
a -0.192041
b -0.595700
c -0.524242
Name: x, dtype: float64
>>> df[1:].y
b 0.339085
c 0.946350
Name: y, dtype: float64
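Applied to the question's data, the same idea becomes a one-liner; a minimal sketch (the values shown come from the list above):

>>> seg_df = pd.DataFrame(segment_values)
>>> seg_df['duration'].head(3)
0     20.6
1     28.5
2    273.0
Name: duration, dtype: float64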

Another tool for this is glom, which provides helpers for logic like this (pip install glom).
>>> from glom import glom
>>> from pprint import pprint
>>> data = <your data>
>>> pprint(glom(data, [{'wp': 'way_points', 'dist': 'distance'}]))
[{'dist': 114.6, 'wp': [0, 5]},
{'dist': 288.1, 'wp': [5, 17]},
{'dist': 3626.0, 'wp': [17, 115]},
{'dist': 54983.9, 'wp': [115, 675]},
{'dist': 11065.1, 'wp': [675, 780]},
{'dist': 800.7, 'wp': [780, 804]},
{'dist': 49.6, 'wp': [804, 807]},
{'dist': 102057.2, 'wp': [807, 2104]},
{'dist': 56143.4, 'wp': [2104, 2524]},
{'dist': 7580.6, 'wp': [2524, 2641]},
{'dist': 789.0, 'wp': [2641, 2663]},
{'dist': 815.9, 'wp': [2663, 2684]},
{'dist': 682.9, 'wp': [2684, 2711]},
{'dist': 988.1, 'wp': [2711, 2723]},
{'dist': 11.7, 'wp': [2723, 2725]},
{'dist': 0.0, 'wp': [2725, 2725]}]
You can get a feel on how other cases might work from the documentation:
https://glom.readthedocs.io/en/latest/faq.html#how-does-glom-work
def glom(target, spec):
    # if the spec is a string or a Path, perform a deep-get on the target
    if isinstance(spec, (basestring, Path)):
        return _get_path(target, spec)
    # if the spec is callable, call it on the target
    elif callable(spec):
        return spec(target)
    # if the spec is a dict, assign the result of
    # the glom on the right to the field key on the left
    elif isinstance(spec, dict):
        ret = {}
        for field, subspec in spec.items():
            ret[field] = glom(target, subspec)
        return ret
    # if the spec is a list, run the spec inside the list on every
    # element in the list and return the new list
    elif isinstance(spec, list):
        subspec = spec[0]
        iterator = _get_iterator(target)
        return [glom(t, subspec) for t in iterator]
    # if the spec is a tuple of specs, chain the specs by running the
    # first spec on the target, then running the second spec on the
    # result of the first, and so on.
    elif isinstance(spec, tuple):
        res = target
        for subspec in spec:
            res = glom(res, subspec)
        return res
    else:
        raise TypeError('expected one of the above types')
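Applied to the original question, pulling out just the durations is a one-element list spec; a sketch based on the behaviour described above:

>>> glom(segment_values, ['duration'])
[20.6, 28.5, 273.0, ...]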

Related

Check if inputted key and value exists within a Python dictionary

I was tasked to create a CRUD program using Python dictionaries. I need to write code to check whether the inputted key and value already exist in the dictionary. Here is the dictionary, plus the input that prompts the user to search for an ID:
products = [
{'id': 1, 'name': 'Light bulb', 'price': 100, 'stock': 16},
{'id': 2, 'name': 'Measuring tape', 'price': 200, 'stock': 34},
{'id': 3, 'name': 'Fan', 'price': 120, 'stock': 79},
{'id': 4, 'name': 'Flat shoes', 'price': 260, 'stock': 47},
{'id': 5, 'name': 'Swiss Army knife', 'price': 80, 'stock': 12},
{'id': 6, 'name': 'Guitar', 'price': 193, 'stock': 25},
{'id': 7, 'name': 'Marble', 'price': 30, 'stock': 45},
{'id': 8, 'name': 'Stapler', 'price': 220, 'stock': 78},
{'id': 9, 'name': 'Wrench and hammer', 'price': 65, 'stock': 12}
]
id_search = int(input("Enter ID product you want to search: "))
I wanted to write an if-else statement to check whether the ID exists in products, and otherwise display a message that the ID was not found. I tried the following:
if id_search in products:
    print("Product ID found")
else:
    print("Product ID not found")
But the result is always "Product ID not found".
You have a list of dicts, not just a dict, so you have to search through the list of dicts:
found = False
for product in products:
    if product.get("id") == id_search:
        found = True
        break
print(found)
Note that if your dicts are sorted by id, you can probably use binary search.
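For a compact equivalent of the loop above, any() also works; a small sketch, assuming each product dict has an "id" key:

found = any(product.get("id") == id_search for product in products)
print("Product ID found" if found else "Product ID not found")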
if len([[*my_dict.values()][0] for my_dict in products if [*my_dict.values()][0] == id_search]) > 0:
    print("Product ID found")
else:
    print("Product ID not found")

Flattening deeply nested JSON into pandas data frame

I am trying to import a deeply nested JSON into pandas dataframe. Here is the structure of the JSON file (this is only the first record (retweets[:1]):
[{'lang': 'en',
'author_id': '1076979440372965377',
'reply_settings': 'everyone',
'entities': {'mentions': [{'start': 3,
'end': 17,
'username': 'Terry81987010',
'url': '',
'location': 'Florida',
'entities': {'description': {'hashtags': [{'start': 29,
'end': 32,
'tag': '2A'}]}},
'created_at': '2019-02-01T23:01:11.000Z',
'protected': False,
'public_metrics': {'followers_count': 520,
'following_count': 567,
'tweet_count': 34376,
'listed_count': 1},
'name': "Terry's Take",
'verified': False,
'id': '1091471553437593605',
'description': 'Less government more Freedom #2A is a constitutional right. Trump2020, common sense rules, God bless America! Vet 82nd Airborne F/A, proud Republican',
'profile_image_url': 'https://pbs.twimg.com/profile_images/1289626661911134208/WfztLkr1_normal.jpg'},
{'start': 19,
'end': 32,
'username': 'DineshDSouza',
'location': 'United States',
'entities': {'url': {'urls': [{'start': 0,
'end': 23,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]},
'description': {'urls': [{'start': 80,
'end': 103,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]}},
'created_at': '2009-11-22T22:32:41.000Z',
'protected': False,
'public_metrics': {'followers_count': 1748832,
'following_count': 5355,
'tweet_count': 65674,
'listed_count': 6966},
'name': "Dinesh D'Souza",
'verified': True,
'pinned_tweet_id': '1393309917239562241',
'id': '91882544',
'description': "I am an author, filmmaker, and host of the Dinesh D'Souza Podcast.\n\nSubscribe: ",
'profile_image_url': 'https://pbs.twimg.com/profile_images/890967538292711424/8puyFbiI_normal.jpg'}]},
'conversation_id': '1253462541881106433',
'created_at': '2020-04-23T23:15:32.000Z',
'id': '1253462541881106433',
'possibly_sensitive': False,
'referenced_tweets': [{'type': 'retweeted',
'id': '1253052684489437184',
'in_reply_to_user_id': '91882544',
'attachments': {'media_keys': ['3_1253052312144293888',
'3_1253052620937277442'],
'media': [{}, {}]},
'entities': {'annotations': [{'start': 126,
'end': 128,
'probability': 0.514,
'type': 'Organization',
'normalized_text': 'CDC'},
{'start': 145,
'end': 146,
'probability': 0.5139,
'type': 'Place',
'normalized_text': 'NY'}],
'mentions': [{'start': 0,
'end': 13,
'username': 'DineshDSouza',
'location': 'United States',
'entities': {'url': {'urls': [{'start': 0,
'end': 23,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]},
'description': {'urls': [{'start': 80,
'end': 103,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]}},
'created_at': '2009-11-22T22:32:41.000Z',
'protected': False,
'public_metrics': {'followers_count': 1748832,
'following_count': 5355,
'tweet_count': 65674,
'listed_count': 6966},
'name': "Dinesh D'Souza",
'verified': True,
'pinned_tweet_id': '1393309917239562241',
'id': '91882544',
'description': "I am an author, filmmaker, and host of the Dinesh D'Souza Podcast.\n\nSubscribe: ",
'profile_image_url': 'https://pbs.twimg.com/profile_images/890967538292711424/8puyFbiI_normal.jpg'}],
'urls': [{'start': 187,
'end': 210,
'expanded_url': 'https://twitter.com/Terry81987010/status/1253052684489437184/photo/1',
'display_url': 'pic.twitter.com/H4NpN5ZMkW'},
{'start': 187,
'end': 210,
'expanded_url': 'https://twitter.com/Terry81987010/status/1253052684489437184/photo/1',
'display_url': 'pic.twitter.com/H4NpN5ZMkW'}]},
'lang': 'en',
'author_id': '1091471553437593605',
'reply_settings': 'everyone',
'conversation_id': '1253050942716551168',
'created_at': '2020-04-22T20:06:55.000Z',
'possibly_sensitive': False,
'referenced_tweets': [{'type': 'replied_to', 'id': '1253050942716551168'}],
'public_metrics': {'retweet_count': 208,
'reply_count': 57,
'like_count': 402,
'quote_count': 38},
'source': 'Twitter Web App',
'text': "#DineshDSouza Here's some proof of artificially inflating the cv deaths. Noone is dying of pneumonia anymore according to the CDC. And of course NY getting paid for each cv death $60,000",
'context_annotations': [{'domain': {'id': '10',
'name': 'Person',
'description': 'Named people in the world like Nelson Mandela'},
'entity': {'id': '1138120064119369729', 'name': "Dinesh D'Souza"}},
{'domain': {'id': '35',
'name': 'Politician',
'description': 'Politicians in the world, like Joe Biden'},
'entity': {'id': '1138120064119369729', 'name': "Dinesh D'Souza"}}],
'author': {'url': '',
'username': 'Terry81987010',
'location': 'Florida',
'entities': {'description': {'hashtags': [{'start': 29,
'end': 32,
'tag': '2A'}]}},
'created_at': '2019-02-01T23:01:11.000Z',
'protected': False,
'public_metrics': {'followers_count': 520,
'following_count': 567,
'tweet_count': 34376,
'listed_count': 1},
'name': "Terry's Take",
'verified': False,
'id': '1091471553437593605',
'description': 'Less government more Freedom #2A is a constitutional right. Trump2020, common sense rules, God bless America! Vet 82nd Airborne F/A, proud Republican',
'profile_image_url': 'https://pbs.twimg.com/profile_images/1289626661911134208/WfztLkr1_normal.jpg'},
'in_reply_to_user': {'username': 'DineshDSouza',
'location': 'United States',
'entities': {'url': {'urls': [{'start': 0,
'end': 23,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]},
'description': {'urls': [{'start': 80,
'end': 103,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]}},
'created_at': '2009-11-22T22:32:41.000Z',
'protected': False,
'public_metrics': {'followers_count': 1748832,
'following_count': 5355,
'tweet_count': 65674,
'listed_count': 6966},
'name': "Dinesh D'Souza",
'verified': True,
'pinned_tweet_id': '1393309917239562241',
'id': '91882544',
'description': "I am an author, filmmaker, and host of the Dinesh D'Souza Podcast.\n\nSubscribe: ",
'profile_image_url': 'https://pbs.twimg.com/profile_images/890967538292711424/8puyFbiI_normal.jpg'}}],
'public_metrics': {'retweet_count': 208,
'reply_count': 0,
'like_count': 0,
'quote_count': 0},
'source': 'Twitter for iPhone',
'text': "RT #Terry81987010: #DineshDSouza Here's some proof of artificially inflating the cv deaths. Noone is dying of pneumonia anymore according t…",
'context_annotations': [{'domain': {'id': '10',
'name': 'Person',
'description': 'Named people in the world like Nelson Mandela'},
'entity': {'id': '1138120064119369729', 'name': "Dinesh D'Souza"}},
{'domain': {'id': '35',
'name': 'Politician',
'description': 'Politicians in the world, like Joe Biden'},
'entity': {'id': '1138120064119369729', 'name': "Dinesh D'Souza"}}],
'author': {'url': '',
'username': 'set1952',
'location': 'Etats-Unis',
'created_at': '2018-12-23T23:14:42.000Z',
'protected': False,
'public_metrics': {'followers_count': 103,
'following_count': 44,
'tweet_count': 44803,
'listed_count': 0},
'name': 'SunSet1952',
'verified': False,
'id': '1076979440372965377',
'description': '',
'profile_image_url': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png'},
'__twarc': {'url': 'https://api.twitter.com/2/tweets/search/all?expansions=author_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id%2Centities.mentions.username%2Cattachments.poll_ids%2Cattachments.media_keys%2Cgeo.place_id&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cwithheld&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Ctext%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Cwithheld&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&place.fields=contained_within%2Ccountry%2Ccountry_code%2Cfull_name%2Cgeo%2Cid%2Cname%2Cplace_type&max_results=500&query=retweets_of%3ATerry81987010&start_time=2020-03-09T00%3A00%3A00%2B00%3A00&end_time=2020-04-24T00%3A00%3A00%2B00%3A00',
'version': '2.0.8',
'retrieved_at': '2021-05-17T17:13:17+00:00'}},
Here is my code:
import json
from pandas import json_normalize

retweets = []
for line in open('Data/usersRetweetsFlatten_sample.json', 'r'):
    retweets.append(json.loads(line))

df = json_normalize(
    retweets, 'referenced_tweets', ['referenced_tweets', 'type'],
    meta_prefix=".",
    errors='ignore'
)
df[['author_id', 'type', '.type', 'id', 'in_reply_to_user_id', 'referenced_tweets']].head()
Here is the resulting dataframe:
As you can see, the column referenced_tweets is not flattened yet (please note that there are two different referenced_tweets arrays in my JSON file: one is at a deeper level inside the other referenced_tweets). For example, the one at the higher level returns this:
>>> retweets[0]["referenced_tweets"][0]["type"]
"retweeted"
and the one at the deeper level returns this:
>>> retweets[0]["referenced_tweets"][0]["referenced_tweets"][0]["type"]
'replied_to'
QUESTION: I was wondering how I can flatten the deeper referenced_tweets. I want to have two separate columns as referenced_tweets.type and referenced_tweets.id, where the value of the column referenced_tweets.type in the above example should be replied_to.
I think the issue here is that your data is double nested... there is a key referenced_tweets within referenced_tweets.
import json
from pandas import json_normalize

with open("flatten.json", "r") as file:
    data = json.load(file)

df = json_normalize(
    data,
    record_path=["referenced_tweets", "referenced_tweets"],
    meta=[
        "author_id",
        # ["author", "username"],  # not possible
        # "author",  # possible but not useful
        ["referenced_tweets", "id"],
        ["referenced_tweets", "type"],
        ["referenced_tweets", "in_reply_to_user_id"],
        ["referenced_tweets", "in_reply_to_user", "username"],
    ]
)
print(df)
See also: https://stackoverflow.com/a/37668569/42659
Note: the above code will fail if the second nested referenced_tweets is missing.
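One rough way to guard against that (a sketch, not part of the original answer; the meta list is trimmed to author_id for brevity) is to pre-filter the records so only tweets whose referenced tweets all carry a nested referenced_tweets list are normalized:

data_with_nested = [
    t for t in data
    if t.get("referenced_tweets")
    and all("referenced_tweets" in rt for rt in t["referenced_tweets"])
]
df = json_normalize(
    data_with_nested,
    record_path=["referenced_tweets", "referenced_tweets"],
    meta=["author_id"],
)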
Edit: Alternatively, you could further normalize the data from your question (which your code already partly normalizes) with an additional manual iteration. See the example below. Note: the code is not optimized and may be slow depending on the amount of data.
import pandas as pd
from pandas import json_normalize

# load your `data` with `json.load()` or `json.loads()`
df = json_normalize(
    data,
    record_path="referenced_tweets",
    meta=["referenced_tweets", "type"],
    meta_prefix=".",
    errors="ignore",
)

columns = [*df.columns, "_type", "_id"]
normalized_data = []

def append(row, type, id):
    normalized_row = [*row.to_list(), type, id]
    normalized_data.append(normalized_row)

for _, row in df.iterrows():
    # a list/array is expected
    if type(row["referenced_tweets"]) is list:
        if row["referenced_tweets"]:
            for tweet in row["referenced_tweets"]:
                append(row, tweet["type"], tweet["id"])
        # if empty list
        else:
            append(row, None, None)
    # if the key was omitted, the value is NaN instead of a list
    else:
        append(row, None, None)

enhanced_df = pd.DataFrame(data=normalized_data, columns=columns)
enhanced_df = enhanced_df.drop(columns="referenced_tweets")
print(enhanced_df)
Edit 2: referenced_tweets should be an array. However, if there is no referenced tweet, the Twitter API seems to omit referenced_tweets completely. In that case, the cell value is NaN (float) instead of an empty list. I updated the code above to take that into account.

Creating Pandas DataFrame from SmartSheet API (nested, awkward, JSON)

I'm trying to connect to my office's SmartSheet API via Python to create some performance tracking dashboards that utilize data outside of SmartSheet. All I want to do is create a simple DataFrame where fields reflect columnId and cell values reflect the displayValue key in the Smartsheet dictionary. I am doing this with a standard requests.get call rather than SmartSheet's own API documentation, because I've found the latter less easy to work with.
The table (sample) is set up as:
Number Letter Name
1 A Joe
2 B Jim
3 C Jon
The JSON syntax from the sheet GET request is:
{'id': 339338304219012,
'name': 'Sample Smartsheet',
'version': 1,
'totalRowCount': 3,
'accessLevel': 'OWNER',
'effectiveAttachmentOptions': ['GOOGLE_DRIVE',
'EVERNOTE',
'DROPBOX',
'ONEDRIVE',
'LINK',
'FILE',
'BOX_COM',
'EGNYTE'],
'ganttEnabled': False,
'dependenciesEnabled': False,
'resourceManagementEnabled': False,
'cellImageUploadEnabled': True,
'userSettings': {'criticalPathEnabled': False, 'displaySummaryTasks': True},
'userPermissions': {'summaryPermissions': 'ADMIN'},
'hasSummaryFields': False,
'permalink': 'https://app.smartsheet.com/sheets/5vxMCJQhMV7VFFPMVfJgg2hX79rj3fXgVGG8fp61',
'createdAt': '2020-02-13T16:32:02Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'isMultiPicklistEnabled': True,
'columns': [{'id': 6273865019090820,
'version': 0,
'index': 0,
'title': 'Number',
'type': 'TEXT_NUMBER',
'primary': True,
'validation': False,
'width': 150},
{'id': 4022065205405572,
'version': 0,
'index': 1,
'title': 'Letter',
'type': 'TEXT_NUMBER',
'validation': False,
'width': 150},
{'id': 8525664832776068,
'version': 0,
'index': 2,
'title': 'Name',
'type': 'TEXT_NUMBER',
'validation': False,
'width': 150}],
'rows': [{'id': 8660990817003396,
'rowNumber': 1,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 1.0, 'displayValue': '1'},
{'columnId': 4022065205405572, 'value': 'A', 'displayValue': 'A'},
{'columnId': 8525664832776068, 'value': 'Joe', 'displayValue': 'Joe'}]},
{'id': 498216492394372,
'rowNumber': 2,
'siblingId': 8660990817003396,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 2.0, 'displayValue': '2'},
{'columnId': 4022065205405572, 'value': 'B', 'displayValue': 'B'},
{'columnId': 8525664832776068, 'value': 'Jim', 'displayValue': 'Jim'}]},
{'id': 5001816119764868,
'rowNumber': 3,
'siblingId': 498216492394372,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 3.0, 'displayValue': '3'},
{'columnId': 4022065205405572, 'value': 'C', 'displayValue': 'C'},
{'columnId': 8525664832776068, 'value': 'Jon', 'displayValue': 'Jon'}]}]}
Here are the two ways I've approached the problem:
INPUT:
from pandas.io.json import json_normalize
samplej = sample.json()
s_rows = json_normalize(data=samplej['rows'], record_path='cells', meta=['id', 'rowNumber'])
s_rows
OUTPUT:
A DataFrame with columnId, value, displayValue, id, and rowNumber as their own fields.
If I could figure out how to transpose this data in the right way I could probably make it work, but that seems incredibly complicated.
INPUT:
import pandas as pd

samplej = sample.json()
cellist = []

def get_cells():
    srows = samplej['rows']
    for s_cells in srows:
        scells = s_cells['cells']
        cellist.append(scells)

get_cells()
celldf = pd.DataFrame(cellist)
celldf
OUTPUT:
This returns a DataFrame with the correct number of columns and rows, but each cell is populated with a dictionary that looks like
In [14]: celldf.loc[1, 1]
Out[14]:
{'columnId': 4022065205405572, 'value': 'B', 'displayValue': 'B'}
If there were a way to remove everything except the value corresponding to the displayValue key in every cell, this would probably solve my problem. Again, though, it seems weirdly complicated.
I'm fairly new to Python and working with API's, so there may be a simple way to address the problem I'm overlooking. Or, if you have a suggestion for approaching the possible solutions I outlined above I'm all ears. Thanks for your help!
You must make use of the columns field:
colnames = {x['id']: x['title'] for x in samplej['columns']}
columns = [x['title'] for x in samplej['columns']]
cellist = [{colnames[scells['columnId']]: scells['displayValue']
            for scells in s_cells['cells']} for s_cells in samplej['rows']]
celldf = pd.DataFrame(cellist, columns=columns)
This gives as expected:
  Number Letter Name
0      1      A  Joe
1      2      B  Jim
2      3      C  Jon
If some cells could contain only a columnId but no displayValue field, scells['displayValue'] should be replaced in the above code with scells.get('displayValue', defaultValue), where defaultValue could be None, np.nan or any other relevant default.
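For instance, the guarded version of the comprehension might read (a sketch using None as the default):

cellist = [{colnames[scells['columnId']]: scells.get('displayValue', None)
            for scells in s_cells['cells']} for s_cells in samplej['rows']]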

Write JSON response for each request into a file

I wrote code that makes requests to an API and receives output in JSON. My question is how to write the output of each request to a file; right now my code only keeps the result of the last request.
import requests
import json

with open("query4.txt", "rt") as file:
    data_file = file.read()

for line in data_file.split("\n"):
    drX, drY, fromX, fromY, dist = line.split(",")
    url = "https://api.openrouteservice.org/directions?"
    params = [
        ["api_key", "my_api_key"],
        ["coordinates", "%s,%s|%s,%s" % (drY, drX, fromY, fromX)],
        ["profile", "driving-car"]
    ]
    headers = {
        "Accept": "application/json, application/geo+json,"
                  "application/gpx+xml, img/png; charset=utf-8"}
    responce = requests.get(url=url, params=params, headers=headers)
    # print(responce.url)
    # print(responce.text)
    result = json.loads(responce.text)
    # print(result)
    with open("result.txt", "w+") as f_o:
        for rows in result["routes"]:
            f_o.writelines(json.dumps(rows["summary"]["distance"]))  # depending on how you want the result
    print(result["routes"])
I have an output like this:
{'routes': [{'warnings': [{'code': 1, 'message': 'There may be restrictions on some roads'}], 'summary': {'distance': 899.6, 'duration': 102.1}, 'geometry_format': 'encodedpolyline', 'geometry': 'u~uiHir{iFb#SXzADTlAk#JOJ]#_#CWW{AKo#k#eDEYKo#y#{EGc#G]GYCOa#gCc#iCoBsLNGlAm#VK^Sh#Un#tD', 'segments': [{'distance': 899.6, 'duration': 102.1, 'steps': [{'distance': 22.1, 'duration': 5.3, 'type': 11, 'instruction': 'Head south', 'name': '', 'way_points': [0, 1]}, {'distance': 45.4, 'duration': 10.9, 'type': 1, 'instruction': 'Turn right', 'name': '', 'way_points': [1, 3]}, {'distance': 645.5, 'duration': 52.3, 'type': 0, 'instruction': 'Turn left onto Партизанська вулиця', 'name': 'Партизанська вулиця', 'way_points': [3, 21]}, {'distance': 114.4, 'duration': 20.6, 'type': 1, 'instruction': 'Turn right', 'name': '', 'way_points': [21, 26]}, {'distance': 72.1, 'duration': 13, 'type': 1, 'instruction': 'Turn right', 'name': '', 'way_points': [26, 27]}, {'distance': 0, 'duration': 0, 'type': 10, 'instruction': 'Arrive at your destination, on the left', 'name': '', 'way_points': [27, 27]}]}], 'way_points': [0, 27], 'extras': {'roadaccessrestrictions': {'values': [[0, 1, 0], [1, 3, 2], [3, 27, 0]], 'summary': [{'value': 0, 'distance': 854.2, 'amount': 94.95}, {'value': 2, 'distance': 45.4, 'amount': 5.05}]}}, 'bbox': [38.484536, 48.941171, 38.492904, 48.943022]}], 'bbox': [38.484536, 48.941171, 38.492904, 48.943022], 'info': {'attribution': 'openrouteservice.org | OpenStreetMap contributors', 'engine': {'version': '5.0.1', 'build_date': '2019-05-29T14:22:56Z'}, 'service': 'routing', 'timestamp': 1568280549854, 'query': {'profile': 'driving-car', 'preference': 'fastest', 'coordinates': [[38.485115, 48.942059], [38.492073, 48.941676]], 'language': 'en', 'units': 'm', 'geometry': True, 'geometry_format': 'encodedpolyline', 'instructions_format': 'text', 'instructions': True, 'elevation': False}}}
{'routes': [{'summary': {'distance': 2298, 'duration': 365.6}, 'geometry_format': 'encodedpolyline', 'geometry': 'u~a{Gee`zDLIvBvDpClCtA|AXHXCp#m#bBsBvBmC`AmAtIoKNVLXHPb#c#`A_AFENGzAc#XKZCJ?PDLBH#F?T?PC~CcATOt#Sd#QLKBCBAb#]ZG|#OY_DQ}IE{DC_DAg#Eg#q#aFgBuH^GjBFj#
I tried NeverHopeless's answer, but I got the same result:
result = json.loads(responce.text)
i = 0
with open(f"result-{i}.txt", "w+") as f_o:
    i += 1
    for rows in result["routes"]:
        f_o.writelines(json.dumps(rows["summary"]["distance"]))  # depending on how you want the result
print(result["routes"])
The output now looks like this:
899.622982138.832633191.8
I'm expecting to get this:
2298
2138.8
3263
3191.8
Each value is a distance from a different request, so I need each one on a new line.
You need to open the output file before your loop and keep it open:
import requests
import json

with open("query4.txt", "rt") as file:
    data_file = file.read()

with open("result.txt", "w") as f_o:
    for line in data_file.split("\n"):
        drX, drY, fromX, fromY, dist = line.split(",")
        url = "https://api.openrouteservice.org/directions?"
        params = [
            ["api_key", "my_api_key"],
            ["coordinates", "%s,%s|%s,%s" % (drY, drX, fromY, fromX)],
            ["profile", "driving-car"]
        ]
        headers = {
            "Accept": "application/json, application/geo+json,"
                      "application/gpx+xml, img/png; charset=utf-8"}
        responce = requests.get(url=url, params=params, headers=headers)
        # print(responce.url)
        # print(responce.text)
        result = json.loads(responce.text)
        # print(result)
        for rows in result["routes"]:
            print(rows["summary"]["distance"], file=f_o)  # depending on how you want the result
        # print(result["routes"])
I think it's better to write the results to different files with a timestamp; that way you don't overwrite your older file, and you can also find them more easily.
import time

current_time = time.strftime("%m_%d_%y %H_%M_%S", time.localtime())
with open(current_time + ".txt", "w+") as f_o:
    for rows in result["routes"]:
        f_o.writelines(json.dumps(rows["summary"]["distance"]))  # depending on how you want the result
print(result["routes"])
You need to make this filename "result.txt" dynamic. Currently it is overwriting content.
Perhaps like this:
i = 0  # <----- keep this outside your for loop or it will always be reset to zero
with open(f"result-{i}.txt", "w+") as f_o:
    i += 1
Or, instead of integers, you might prefer a timestamp in the filename.
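For completeness, a per-request-file variant of the loop could look like this sketch with enumerate (the result-0.txt, result-1.txt, ... names are illustrative):

for i, line in enumerate(data_file.split("\n")):
    ...  # build params and make the request as in the code above
    result = json.loads(responce.text)
    with open(f"result-{i}.txt", "w") as f_o:
        for rows in result["routes"]:
            print(rows["summary"]["distance"], file=f_o)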

Using json_normalize to flatten nested json

I'm trying to flatten a JSON file using json_normalize in Python (Pandas), but being a noob at this I always seem to end up with a KeyError.
What I would like to achieve is a DataFrame with all the Plays in a game.
I've tried numerous variants of paths and prefixes, but no success. Googled a lot as well, but I'm still falling short.
What I would like to end up with is a DataFrame like:
period, time, type, player1, player2, xcord, ycord
import pandas as pd
import json

with open('PlayByPlay.json') as data_file:
    data = json.load(data_file)

from pandas.io.json import json_normalize

records = json_normalize(data)
plays = records['data.game.plays.play'][0]
plays
Would generate
{'aoi': [8470324, 8473449, 8475158, 8475215, 8477499, 8477933],
'apb': [],
'as': 0,
'asog': 0,
'desc': 'Zack Kassian hit Kyle Okposo',
'eventid': 7,
'formalEventId': 'EDM7',
'hoi': [8471678, 8475178, 8475660, 8476454, 8476457, 8476472],
'hpb': [],
'hs': 0,
'hsog': 0,
'localtime': '5:12 PM',
'p1name': 'Zack Kassian',
'p2name': 'Kyle Okposo',
'p3name': '',
'period': 1,
'pid': 8475178,
'pid1': 8475178,
'pid2': 8473449,
'pid3': '',
'playername': 'Zack Kassian',
'strength': 701,
'sweater': '44',
'teamid': 22,
'time': '00:28',
'type': 'Hit',
'xcoord': 22,
'ycoord': 38}
JSON:
{'data': {'game': {'awayteamid': 7,
'awayteamname': 'Buffalo Sabres',
'awayteamnick': 'Sabres',
'hometeamid': 22,
'hometeamname': 'Edmonton Oilers',
'hometeamnick': 'Oilers',
'plays': {'play': [{'aoi': [8470324,
8473449,
8475158,
8475215,
8477499,
8477933],
'apb': [],
'as': 0,
'asog': 0,
'desc': 'Zack Kassian hit Kyle Okposo',
'eventid': 7,
'formalEventId': 'EDM7',
'hoi': [8471678, 8475178, 8475660, 8476454, 8476457, 8476472],
'hpb': [],
'hs': 0,
'hsog': 0,
'localtime': '5:12 PM',
'p1name': 'Zack Kassian',
'p2name': 'Kyle Okposo',
'p3name': '',
'period': 1,
'pid': 8475178,
'pid1': 8475178,
'pid2': 8473449,
'pid3': '',
'playername': 'Zack Kassian',
'strength': 701,
'sweater': '44',
'teamid': 22,
'time': '00:28',
'type': 'Hit',
'xcoord': 22,
'ycoord': 38},
{'aoi': [8471742, 8475179, 8475215, 8475220, 8475235, 8475728],
'apb': [],
'as': 0,
'asog': 0,
'desc': 'Jesse Puljujarvi Tip-In saved by Robin Lehner',
'eventid': 59,
'formalEventId': 'EDM59',
'hoi': [8473468, 8474034, 8475660, 8477498, 8477934, 8479344],
'hpb': [],
'hs': 0,
'hsog': 1,
'localtime': '5:13 PM',
'p1name': 'Jesse Puljujarvi',
'p2name': 'Robin Lehner',
'p3name': '',
'period': 1,
'pid': 8479344,
'pid1': 8479344,
'pid2': 8475215,
'pid3': '',
'playername': 'Jesse Puljujarvi',
'strength': 701,
'sweater': '98',
'teamid': 22,
'time': '01:32',
'type': 'Shot',
'xcoord': 81,
'ycoord': 3}]}},
'refreshInterval': 0}}
If you have only one game, this will create the dataframe you want:
json_normalize(data['data']['game']['plays']['play'])
Then you just need to extract the columns you're interested in.
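For the columns listed in the question, that could look like this sketch (mapping the JSON's key names onto the requested column names):

plays_df = json_normalize(data['data']['game']['plays']['play'])
wanted = plays_df[['period', 'time', 'type', 'p1name', 'p2name', 'xcoord', 'ycoord']]
wanted = wanted.rename(columns={'p1name': 'player1', 'p2name': 'player2',
                                'xcoord': 'xcord', 'ycoord': 'ycord'})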
It might be unintuitive to use this API when the structure becomes complicated, but the key is: json_normalize extracts JSON fields into a table.
In my case, I have a table:
----------
|  fact  |   // each row is a json object {'a': a, 'b': b, ...}
----------
rrrrr = []
for index, row in data.iterrows():
    r1 = json_normalize(row['fact'])
    rrrrr.append(r1)
rr1 = pd.concat(rrrrr)
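A shorter equivalent, assuming the fact column holds plain dicts, hands the whole column to json_normalize at once (a sketch):

rr1 = json_normalize(data['fact'].tolist())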
