I'm iterating through a nested JSON tree with a Pandas DataFrame. The issue I'm having is probably simple to solve, but I'm out of ideas. When I traverse the nested JSON tree, I reach a point where I can't get back out and continue on another branch (e.g. when I reach Placeholder 1, I can't return and continue with Placeholder 2; see the JSON below). Here is my code so far:
import pandas as pd
def recursiveImport(df):
    for row,_ in enumerate(df):
        # Get ID, Name, Type
        id = df['ID'].values[row]
        name = df['Name'].values[row]
        type = df['Type'].values[row]
        # Iterate through Value
        if type == 'struct':
            for i in df.at[row, 'Value']:
                df = pd.json_normalize(i)
                recursiveImport(df)
        elif type != 'struct':
            value = df['Value'].values[row]
            print(f'Value: {value}')
    return
data = pd.read_json('work_gmt.json', orient='records')
print(data)
recursiveImport(data)
And here is the (minified) data I'm using (you can use an online JSON viewer to get a better look):
[{"ID":11,"Name":"Data","Type":"struct","Value":[[{"ID":0,"Name":"humidity","Type":"u32","Value":0},{"ID":0,"Name":"meta","Type":"struct","Value":[{"ID":0,"Name":"height","Type":"e32","Value":[0,0]},{"ID":0,"Name":"voltage","Type":"u16","Value":0},{"ID":0,"Name":"Placeholder 1","Type":"u16","Value":0}]},{"ID":0,"Name":"Placeholder 2","Type":"struct","Value":[{"ID":0,"Name":"volume","Type":"struct","Value":[{"ID":0,"Name":"volume profile","Type":"struct","Value":[{"ID":0,"Name":"upper","Type":"u8","Value":0},{"ID":0,"Name":"middle","Type":"u8","Value":0},{"ID":0,"Name":"down","Type":"u8","Value":0}]}]}]}]]}]
I tried an indexed approach that keeps track of each branch, but that didn't work for me. Perhaps I have to use a stack/queue to keep track? Thanks in advance!
Cheers!
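A minimal sketch of one way to do this, assuming the file is parsed into plain Python lists and dicts with `json.loads` instead of a DataFrame. Two details in the code above likely cause the symptom: `for row,_ in enumerate(df)` iterates over the column labels rather than the rows, and `df = pd.json_normalize(i)` rebinds the loop's own `df`, so after the recursive call returns, the outer loop keeps reading from the child's frame. In the sketch each call keeps its own `node`, so the call stack plays the role of the explicit stack mentioned in the question. The `walk` function is a hypothetical name.

```python
import json

# The minified sample data from the question.
RAW = """[{"ID":11,"Name":"Data","Type":"struct","Value":[[{"ID":0,"Name":"humidity","Type":"u32","Value":0},{"ID":0,"Name":"meta","Type":"struct","Value":[{"ID":0,"Name":"height","Type":"e32","Value":[0,0]},{"ID":0,"Name":"voltage","Type":"u16","Value":0},{"ID":0,"Name":"Placeholder 1","Type":"u16","Value":0}]},{"ID":0,"Name":"Placeholder 2","Type":"struct","Value":[{"ID":0,"Name":"volume","Type":"struct","Value":[{"ID":0,"Name":"volume profile","Type":"struct","Value":[{"ID":0,"Name":"upper","Type":"u8","Value":0},{"ID":0,"Name":"middle","Type":"u8","Value":0},{"ID":0,"Name":"down","Type":"u8","Value":0}]}]}]}]]}]"""


def walk(node, path=()):
    """Yield (path, type, value) for every non-struct leaf, depth first."""
    if isinstance(node, list):
        # A list of nodes (or a list of lists, as under "Data"): visit each in order.
        for child in node:
            yield from walk(child, path)
    elif node["Type"] == "struct":
        # Descend into the struct. When this call finishes, control returns to the
        # caller's loop, which moves on to the next sibling (e.g. "meta" -> "Placeholder 2").
        yield from walk(node["Value"], path + (node["Name"],))
    else:
        # Leaf: report its full path, type and value.
        yield path + (node["Name"],), node["Type"], node["Value"]


if __name__ == "__main__":
    for path, typ, value in walk(json.loads(RAW)):
        print(" / ".join(path), f"({typ}) =", value)
```

To read from the file instead, `with open('work_gmt.json') as f: tree = json.load(f)` and then `walk(tree)` should work the same way.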
I'm writing a Python script that searches for competitors using a Google search API.
Just for you to see how it works:
First I make a request and store the JSON response:
import requests
# Make the HTTP GET request to Scale SERP (params holds the query parameters)
api_result = requests.get('https://api.scaleserp.com/search', params)
# Parse the JSON response into a dict
dados = api_result.json()
Then I create some lists for position, title, domain and so on, and loop over the results to append each competitor's data to those lists:
# Create the lists
sPositions = []
sDomains = []
sUrls = []
sTitles = []
sDescription = []
sType = []
# Loop over the results to collect information about competitors
for sCompetitors in dados['organic_results']:
    sPositions.append(sCompetitors['position'])
    sDomains.append(sCompetitors['domain'])
    sUrls.append(sCompetitors['link'])
    sTitles.append(sCompetitors['title'])
    sDescription.append(sCompetitors['snippet'])
    sType.append(sCompetitors['type'])
The problem is that not every object in my JSON has the same keys; some of them have no "domain" key. So I need something like: when there is no 'domain' value, append 'no domain' to the sDomains list.
I'd be glad if anyone could help.
Thanks!!
You should use the dict get method, which lets you set a default value in case the key doesn't exist:
for sCompetitors in dados['organic_results']:
    sPositions.append(sCompetitors.get('position', 'no position'))
    sDomains.append(sCompetitors.get('domain', 'no domain'))
    sUrls.append(sCompetitors.get('link', 'no link'))
    sTitles.append(sCompetitors.get('title', 'no title'))
    sDescription.append(sCompetitors.get('snippet', 'no snippet'))
    sType.append(sCompetitors.get('type', 'no type'))
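The same dict.get idea can also be written without six separate append calls. This is a minimal sketch, assuming the results are a list of dicts like `dados['organic_results']`; the sample data and the `FIELDS` mapping are hypothetical, and `columns["domain"]` plays the role of `sDomains`.

```python
# Hypothetical sample: the second result has no "domain" key.
organic_results = [
    {"position": 1, "domain": "example.com", "link": "https://example.com/",
     "title": "Example", "snippet": "An example page", "type": "organic"},
    {"position": 2, "link": "https://example.org/page",
     "title": "Another", "snippet": "No domain field here", "type": "organic"},
]

# Maps each key in a result to the placeholder used when that key is missing.
FIELDS = {
    "position": "no position",
    "domain": "no domain",
    "link": "no link",
    "title": "no title",
    "snippet": "no snippet",
    "type": "no type",
}

# One list per key, built with dict.get so a missing key falls back to its placeholder.
columns = {
    key: [result.get(key, default) for result in organic_results]
    for key, default in FIELDS.items()
}

if __name__ == "__main__":
    print(columns["domain"])
```

Adding a new field then only means adding one entry to `FIELDS`.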
I followed a tutorial to write a Flask REST API and have a question about part of the Python code.
The code offered is the following:
# data_list is where my objects are stored
def put_one(name):
    list_by_id = [list for list in data_list if list['name'] == name]
    list_by_id[0]['name'] = [new_name]
    print({'list_by_id' : list_by_id[0]})
It works, which is nice, and even though I understand what the list comprehension is doing, I would like to rewrite it in a way that makes clear how the function iterates over the different lists. I already have an approach, but it raises KeyError: 0.
def put(name):
    list_by_id = []
    list = []
    for list in data_list:
        if(list['name'] == name):
            list_by_id = list
    list_by_id[0]['name'] = request.json['name']
    return jsonify({'list_by_id' : list_by_id[0]})
My goal is also to be able to update other fields, not just the 'name' key. If I can rewrite the function in another way, it will be easier to adapt it to my needs.
Before asking here, I looked for tools that convert one style of code into the other and searched forums for answers, but couldn't find anything.
It may not be beautiful code, but it gets the job done:
from flask import request

def put(value):
    for i in range(len(data_list)):
        # Compare against the value stored under each dict's first key
        key_list = list(data_list[i].keys())
        if data_list[i][key_list[0]] == value:
            print(f"old value: {key_list[0], data_list[i][key_list[0]]}")
            # test_key is not defined in this snippet
            data_list[i][key_list[0]] = request.json[test_key]
            print(f"new value: {key_list[0], data_list[i][key_list[0]]}")
            break
Now it doesn't matter what the key is: the method only changes the value when it finds a match in data_list. Before, the code broke on every iteration because the keys differed between records and the lookup depended on them.
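For comparison, the list comprehension in the question unrolls into an explicit loop like the one below. The KeyError: 0 in the question's rewrite comes from `list_by_id = list`, which stores the matching dict itself, so `list_by_id[0]` then looks up a key `0` in that dict. This is a sketch on plain dicts (no Flask `request`); `update_by_key` and the sample `data_list` are hypothetical, and the key to match and the field to update are parameters, which covers updating fields other than 'name'.

```python
# Hypothetical sample records, shaped like the question's data_list.
data_list = [
    {"name": "groceries", "items": 3},
    {"name": "chores", "items": 5},
]


def update_by_key(records, match_key, match_value, field, new_value):
    """Explicit-loop version of the comprehension-based lookup.

    Finds the first record whose `match_key` equals `match_value`, sets
    record[field] = new_value, and returns that record (or None if none match).
    """
    for record in records:              # each iteration sees one dict
        if record.get(match_key) == match_value:
            record[field] = new_value   # record is already the dict: no [0] needed
            return record
    return None


if __name__ == "__main__":
    print(update_by_key(data_list, "name", "chores", "items", 7))
```

Inside the Flask view, `new_value` would presumably come from `request.json`.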
import requests
import json
r = requests.get("https://api.investing.com/api/search/?t=Equities&q=amd")  # I get JSON text from this API
data = json.loads(r.text)
if data['articles'][0]['exchange'] == 'Sydney':  # the error is here: KeyError: 'exchange'
    print('success')
else:
    print('fail')
I want to get the url '/equities/segue-resources-ltd' by checking whether 'exchange' is 'Sydney'. That value is stored in this part of the JSON text: {"id":948190,"url":"/equities/segue-resources-ltd","description":"Segue Resources Ltd","symbol":"AMD","exchange":"Sydney","flag":"AU","type":"Equities"}
If I'm understanding this correctly, the exchange key only appears in part of the JSON response. So, to get your result from the same data variable as in your question, we can do this:
result = [val["url"] for val in data["quotes"] if val["exchange"] == "Sydney"]
We are using a list comprehension here: the loop only goes through data["quotes"] instead of the whole JSON response, and for each item in that subset we return the value of the "url" key wherever exchange == "Sydney". Running the line above should give you:
['/equities/segue-resources-ltd']
As expected. If you aren't comfortable with list comprehensions, the more conventional loop version looks like this:
result = []
for val in data["quotes"]:
    if val["exchange"] == "Sydney":
        result.append(val["url"])
print(result)
KeyError: 'exchange' means that the dictionary data['articles'][0] did not have a key 'exchange'.
Depending on your use case, you may want to iterate over the whole list of articles:
for article in data['articles']:
    if 'exchange' in article and article['exchange'] == 'Sydney':
        ...  # Your code here
If you only want to check the first article, use data['articles'][0].get('exchange'). The dict.get() method returns None if the key is not present instead of raising a KeyError.
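A self-contained sketch combining both answers, run against a hypothetical payload rather than the live API: the first quote is the object quoted in the question, while the second quote and the article are made up. Using `.get()` for `'exchange'` means entries without that key are skipped instead of raising KeyError. `urls_on_exchange` is a hypothetical helper name.

```python
# Hypothetical payload shaped like the search response discussed above.
# The first quote is the object quoted in the question; the rest is made up.
data = {
    "quotes": [
        {"id": 948190, "url": "/equities/segue-resources-ltd",
         "description": "Segue Resources Ltd", "symbol": "AMD",
         "exchange": "Sydney", "flag": "AU", "type": "Equities"},
        {"id": 1, "url": "/equities/made-up-example", "symbol": "AMD",
         "exchange": "NASDAQ"},
    ],
    "articles": [{"title": "made-up article without an exchange key"}],
}


def urls_on_exchange(payload, exchange, section="quotes"):
    """Return the 'url' of every entry in payload[section] listed on `exchange`.

    .get() is used for both the section and the 'exchange' key, so a missing
    section or an entry without 'exchange' is skipped instead of raising KeyError.
    """
    return [
        item.get("url")
        for item in payload.get(section, [])
        if item.get("exchange") == exchange
    ]


if __name__ == "__main__":
    print(urls_on_exchange(data, "Sydney"))
```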
I have a dataframe with a column of URLs, and I would like to parse the value of a specified query parameter, when it is present in the URL, into a new column. I am using a function that loops through each row of the column and parses the specified URL parameter, but when I try to select the new column after the loop has finished, I get a KeyError. Should I be assigning the value to the new column in a different way? Is there a more effective approach than looping through the values in my table and running this process?
Error:
KeyError: 'utm_source'
Example URLs (df['landing_page_url']):
https://lp.example.com/test/lp
https://lp.example.com/test/ny/?utm_source=facebook&ref=test&utm_campaign=ny-newyork_test&utm_term=nice
https://lp.example.com/test/ny/?utm_source=facebook
NaN
https://lp.example.com/test/la/?utm_term=lp-test&utm_source=facebook
Code:
import pandas as pd
import numpy as np
import math
from urllib.parse import parse_qs, urlparse
def get_query_field(url, field):
    if isinstance(url, str):
        try:
            return parse_qs(urlparse(url).query)[field][0]
        except KeyError:
            return ''
    else:
        return ''
for i in df['landing_page_url']:
    print(i)  # prints the URL
    print(get_query_field(i, 'utm_source'))  # prints the proper values
    df['utm_source'] == get_query_field(i, 'utm_source')
    df['utm_campaign'] == get_query_field(i, 'utm_campaign')
    df['utm_term'] == get_query_field(i, 'utm_term')
I don't think your for loop will work. It looks like each iteration would overwrite the entire column you are trying to set. I wanted to test the speed against my method, but I'm nearly certain this will be faster than iterating.
# Simplify the function here, as recommended by Nick
def get_query_field(url, field):
    if isinstance(url, str):
        return parse_qs(urlparse(url).query).get(field, [''])[0]
    return ''
# Use apply to create new columns based on the URL
df['utm_source'] = df['landing_page_url'].apply(get_query_field, args=('utm_source',))
df['utm_campaign'] = df['landing_page_url'].apply(get_query_field, args=('utm_campaign',))
df['utm_term'] = df['landing_page_url'].apply(get_query_field, args=('utm_term',))
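A runnable version of the apply approach on the example URLs from the question, with the missing value stored as `np.nan`. It loops over the three parameter names instead of repeating one line per column; this is a sketch, and the column and parameter names are taken from the question.

```python
from urllib.parse import parse_qs, urlparse

import numpy as np
import pandas as pd


def get_query_field(url, field):
    """First value of `field` in the URL's query string; '' if absent or url isn't a str."""
    if isinstance(url, str):
        return parse_qs(urlparse(url).query).get(field, [""])[0]
    return ""


# Example URLs from the question; the missing value is stored as np.nan.
df = pd.DataFrame({
    "landing_page_url": [
        "https://lp.example.com/test/lp",
        "https://lp.example.com/test/ny/?utm_source=facebook&ref=test&utm_campaign=ny-newyork_test&utm_term=nice",
        "https://lp.example.com/test/ny/?utm_source=facebook",
        np.nan,
        "https://lp.example.com/test/la/?utm_term=lp-test&utm_source=facebook",
    ]
})

# One apply per parameter; args is forwarded to get_query_field after the URL.
for field in ("utm_source", "utm_campaign", "utm_term"):
    df[field] = df["landing_page_url"].apply(get_query_field, args=(field,))

if __name__ == "__main__":
    print(df[["utm_source", "utm_campaign", "utm_term"]])
```

Because each assignment uses `=` on a whole column, every row gets its own value, rather than one value being compared against (or overwriting) the entire column.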
Instead of
try:
    return parse_qs(urlparse(url).query)[field][0]
except KeyError:
    return ''
You can just do:
return parse_qs(urlparse(url).query).get(field, [''])[0]
The trick here is my_dict.get(key, default) instead of my_dict[key]: the default is returned if the key doesn't exist.
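To see why the default is `['']` and why the code indexes `[0]`, here is a short stdlib-only sketch using one of the example URLs from the question: `parse_qs` maps every parameter to a list of values. The URL with a repeated `tag` parameter is made up for illustration.

```python
from urllib.parse import parse_qs, urlparse

url = ("https://lp.example.com/test/ny/?utm_source=facebook&ref=test"
       "&utm_campaign=ny-newyork_test&utm_term=nice")

# parse_qs maps every parameter name to a *list* of values.
query = parse_qs(urlparse(url).query)

# Present key: take the first (here the only) value.
source = query.get("utm_source", [""])[0]
# Missing key: .get returns the default [""], so [0] yields "" instead of raising KeyError.
medium = query.get("utm_medium", [""])[0]

# Hypothetical URL with a repeated parameter: this is why the values are lists.
repeated = parse_qs(urlparse("https://lp.example.com/?tag=a&tag=b").query)

if __name__ == "__main__":
    print(query, source, repr(medium), repeated)
```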
Is there a more effective approach than looping through the values in my table and running this process?
Not really. You have to loop through each URL one way or another. Right now, though, you are overwriting the whole column for every URL, so if two URLs have different sources in their query strings, the last one in the list wins. I don't know whether this is intentional.
Also note: this line
df['utm_source'] == get_query_field(i, 'utm_source')
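is not actually assigning anything. == is a comparison operator ("does the left side match the right side?"), so the result is computed and then discarded; and because the column doesn't exist yet, evaluating df['utm_source'] on the left side is also what raises KeyError: 'utm_source'. You probably meant to use =. (df.append({'utm_source': get_query_field(..)}) would add rows rather than set a column, and DataFrame.append was removed in pandas 2.0.)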
I'm using the Temboo Twitter API for Python to download tweets. I want to interpret them but am having trouble pulling out certain values. It returns each tweet in JSON. I want to take certain items out of the JSON and pass them on for further use (favorite_count in the example below). print(json.loads(array)) works fine, but the following line, print(data['favorite_count']), does not: it raises the error "list indices must be integers, not str". Using an integer index instead just raises an "index out of range" error.
I would really appreciate a way to extract a certain field from the JSON list.
homeTimelineResults = homeTimelineChoreo.execute_with_results(homeTimelineInputs)
if __name__ == "__main__":
    array = homeTimelineResults.get_Response()
    data = json.loads(array)
    print(json.loads(array))
    print(data['favorite_count'])
From the error you are getting, I would guess that data is a list, not a dictionary. What you could do then is something along these lines:
import collections.abc
import json
homeTimelineResults = homeTimelineChoreo.execute_with_results(homeTimelineInputs)
if __name__ == "__main__":
    array = homeTimelineResults.get_Response()
    data = json.loads(array)
    # collections.Iterable was removed in Python 3.10; use collections.abc.Iterable
    if data and isinstance(data, collections.abc.Iterable) and not isinstance(data, (str, bytes)):
        result = data.pop(0)
        print(result['favorite_count'])
Basically we are checking that data is indeed a list or tuple or something you can iterate over (but not a string or a sequence of bytes) and that it is not empty. This is the meaning of the if statement after data = json.loads(array).
If that is indeed the case, we pop the first element and, assuming that it is a dictionary, access its 'favorite_count' key. Of course, this assumption is pretty dangerous; one should be a bit more careful and check first :-)
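A slightly more defensive sketch along the lines the answer suggests, run on a hypothetical response string (the JSON shape is an assumption, not Temboo's documented format). It accepts either a single object or a list, skips anything that isn't a dict, and uses `.get()` so tweets without `'favorite_count'` yield None. `favorite_counts` is a hypothetical helper name.

```python
import json

# Hypothetical response text: a JSON array with one object per tweet (shape assumed).
response_text = '[{"id": 1, "favorite_count": 3}, {"id": 2, "favorite_count": 0}, {"id": 3}]'


def favorite_counts(raw):
    """Return favorite_count for every tweet object in the JSON text (None if missing)."""
    data = json.loads(raw)
    if isinstance(data, dict):
        # A single tweet object: treat it as a one-element list.
        data = [data]
    elif not isinstance(data, list):
        # Neither an object nor an array: nothing to extract.
        return []
    return [tweet.get("favorite_count") for tweet in data if isinstance(tweet, dict)]


if __name__ == "__main__":
    print(favorite_counts(response_text))
```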