I have dict1 {subdict1, subdict2}, dict2 {subdict1, subdict2} and dict3 (which has no subdicts) in a list called 'insights'. I need to create one gsheet file for each dict in the 'insights' list, with one sheet for each subdict. This is what is inside 'insights':
[{'city': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'city',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'gender': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'gender',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'country': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'country',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'age': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'age',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''}},
{'city': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'city',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'gender': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'gender',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'country': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'country',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'age': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'age',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''}},
{'name': 'follower_count',
'period': 'day',
'title': 'Follower Count',
'description': 'Total number of unique accounts following this profile',
'df': ''}]
As you can see, in summary the list is the following:
insights = [
follower_demographics,
reached_demographics,
followers_count
]
And this is what each dictionary in the list contains. In the case of 'follower_demographics', it breaks down into a dictionary with the keys ['city', 'gender', 'country', 'age'], and each of those holds this:
demographics = {
'name': '',
'period': '',
'title': '',
'type': '',
'description': '',
'df': ''
}
So I wrote the function below to create a file for each of the 3 dictionaries in 'insights'. The problem is that it creates 4 files for 'follower_demographics', each one holding a single dataframe.
def create_gsheet(insights, folder_id):
    try:
        # create a list to store the created files
        files = []
        # iterate over the items in the insights dictionary
        for idx, (key, value) in enumerate(insights.items()):
            # check if the value is a dictionary
            if isinstance(value, dict):
                # Create a new file with the name taken from the 'title' key
                file = gc.create(value['title'], folder=folder_id)
                print(f"Creating {value['title']} - {idx}/{len(insights)}")
                # add the file to the list
                files.append(file)
                # Create a new sheet within the file with the name taken from the 'name' key
                sheet = file.add_worksheet(value['type'] + '_' + value['name'])
                # Set the sheet data to the df provided in the dictionary
                sheet.set_dataframe(value['df'], (1,1), encoding='utf-8', fit=True)
                sheet.frozen_rows = 1
        # delete the default sheet1 from all the created files
        for file in files:
            file.del_worksheet(file.sheet1)
    except Exception as error:
        print(F'An error occurred: {error}')
        sheet = None
And the result I want is, for example, a single 'follower_demographics' file with the sub-sheets 'city_follower_demographics', 'gender_follower_demographics' and so on, each with its respective dataframe.
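A likely cause of the four files is that gc.create is called inside the loop over the subdicts, so every subdict gets its own file. A minimal sketch of one way around it is below: create the file once per item of 'insights', then add one worksheet per subdict. It assumes `gc` is an authorized client exposing the same calls the question already uses (they look like pygsheets), that 'df' holds a DataFrame by the time the sheets are written, and that an item without subdicts is recognisable by having a 'df' key of its own. The helper names `plan_workbooks` and `create_gsheets` are made up for this example.

```python
def plan_workbooks(insights):
    """Return [(file_title, [(sheet_name, df), ...]), ...], one entry per insight."""
    plan = []
    for insight in insights:
        # An insight with its own 'df' has no subdicts (assumption for this sketch);
        # otherwise every value of the insight is one subdict.
        leaves = [insight] if 'df' in insight else list(insight.values())
        if not leaves:
            continue
        title = leaves[0]['title']  # all subdicts of one insight share the title
        sheets = []
        for leaf in leaves:
            prefix = leaf['type'] + '_' if 'type' in leaf else ''
            sheets.append((prefix + leaf['name'], leaf['df']))
        plan.append((title, sheets))
    return plan


def create_gsheets(gc, insights, folder_id):
    """Create one file per insight and one worksheet per subdict."""
    files = []
    for title, sheets in plan_workbooks(insights):
        file = gc.create(title, folder=folder_id)  # once per insight, not per subdict
        for sheet_name, df in sheets:
            sheet = file.add_worksheet(sheet_name)
            sheet.set_dataframe(df, (1, 1), fit=True)
            sheet.frozen_rows = 1
        file.del_worksheet(file.sheet1)  # drop the default sheet
        files.append(file)
    return files


# Cut-down stand-in for the list in the question ('df' left empty here).
insights = [
    {'city': {'name': 'follower_demographics', 'title': 'Follower demographics',
              'type': 'city', 'df': ''},
     'gender': {'name': 'follower_demographics', 'title': 'Follower demographics',
                'type': 'gender', 'df': ''}},
    {'name': 'follower_count', 'title': 'Follower Count', 'df': ''},
]

for title, sheets in plan_workbooks(insights):
    print(title, [name for name, _ in sheets])
```

Keeping the grouping in `plan_workbooks` separate from the API calls makes it possible to print and check the planned file and sheet names before anything is created in Drive.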
Related
I have JSON data that I loaded, and its structure is a bit messy: the nested dictionaries are wrapped in single quotes and treated as strings rather than as dictionaries I can loop through. What is the best way to drop the single quotes from the 'value' property?
Provided below is an example of the structure:
for val in json_data:
    print(val)
{'id': 'status6',
'title': 'Estimation',
'text': '> 2 days',
'type': 'color',
'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}',
'name': 'Internal: online course'},
{'id': 'date',
'title': 'Deadline',
'text': '2020-06-26',
'type': 'date',
'value': '{"date":"2020-06-26","changed_at":"2020-06-12T11:33:37.195Z"}',
'name': 'Internal: online course'},
{'id': 'tags',
'title': 'Tags',
'text': 'Internal',
'type': 'tag',
'value': '{"tag_ids":[3223513]}',
'name': 'Internal: online course'},
If I add a nested loop targeting ['value'], it loops character by character rather than over the key-value pairs in the dictionary.
Using json.loads to convert string to dict
import json
json_data = [{'id': 'status6',
'title': 'Estimation',
'text': '> 2 days',
'type': 'color',
'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}',
'name': 'Internal: online course'},
{'id': 'date',
'title': 'Deadline',
'text': '2020-06-26',
'type': 'date',
'value': '{"date":"2020-06-26","changed_at":"2020-06-12T11:33:37.195Z"}',
'name': 'Internal: online course'},
{'id': 'tags',
'title': 'Tags',
'text': 'Internal',
'type': 'tag',
'value': '{"tag_ids":[3223513]}',
'name': 'Internal: online course'}]
# the result is a Python dictionary:
for val in json_data:
    print(json.loads(val['value']))
This should work.
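If the parsed values are needed later rather than only printed, a small sketch like the following replaces each 'value' string with the dictionary it encodes, after which a nested loop sees key-value pairs instead of characters. The `parse_values` helper is a made-up name, and the two rows are shortened copies of the ones in the question.

```python
import json

rows = [
    {'id': 'status6',
     'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}'},
    {'id': 'tags',
     'value': '{"tag_ids":[3223513]}'},
]


def parse_values(rows):
    """Return copies of the rows with the JSON string under 'value' decoded."""
    parsed = []
    for row in rows:
        row = dict(row)  # shallow copy so the input list is left untouched
        if isinstance(row.get('value'), str):
            row['value'] = json.loads(row['value'])
        parsed.append(row)
    return parsed


for row in parse_values(rows):
    # row['value'] is now a dict, so .items() yields key-value pairs
    for key, val in row['value'].items():
        print(row['id'], key, val)
```

Note that JSON null comes back as Python None, so 'post_id' in the first row is None after parsing.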
OK, I'm a newbie and I think I'm doing everything I should be, but I am still getting a KeyError: 'venues'. (I also tried using "venue" instead, and I am not at my maximum daily quota at Foursquare.) I am using a Jupyter Notebook to do this.
Using this code:
VERSION = '20200418'
RADIUS = 1000
LIMIT = 2
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, RADIUS, LIMIT)
url
results = requests.get(url).json()
I get 2 results (shown at end of this post)
When I try to take those results and put them into a dataframe, I get "KeyError: venues"
# assign relevant part of JSON to venues
venues = results['response']['venues']
# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-29-5acf500bf9ad> in <module>
1 # assign relevant part of JSON to venues
----> 2 venues = results['response']['venues']
3
4 # tranform venues into a dataframe
5 dataframe = json_normalize(venues)
KeyError: 'venues'
I'm not really sure where I am going wrong... This has worked for me with other locations... But then again, like I said, I'm new at this... (I haven't maxed out my queries, and I've tried using "venue" instead)... Thank you
FourSquareResults:
{'meta': {'code': 200, 'requestId': '5ec42de01a4b0a001baa10ff'},
'response': {'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'warning': {'text': "There aren't a lot of results near you. Try something more general, reset your filters, or expand the search area."},
'headerLocation': 'Cranford',
'headerFullLocation': 'Cranford',
'headerLocationGranularity': 'city',
'totalResults': 20,
'suggestedBounds': {'ne': {'lat': 40.67401708586377,
'lng': -74.29300815204098},
'sw': {'lat': 40.65601706786374, 'lng': -74.31669390523408}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '4c13c8d2b7b9c928d127aa37',
'name': 'Cranford Canoe Club',
'location': {'address': '250 Springfield Ave',
'crossStreet': 'Orange Avenue',
'lat': 40.66022488705574,
'lng': -74.3061084180977,
'labeledLatLngs': [{'label': 'display',
'lat': 40.66022488705574,
'lng': -74.3061084180977},
{'label': 'entrance', 'lat': 40.660264, 'lng': -74.306191}],
'distance': 543,
'postalCode': '07016',
'cc': 'US',
'city': 'Cranford',
'state': 'NJ',
'country': 'United States',
'formattedAddress': ['250 Springfield Ave (Orange Avenue)',
'Cranford, NJ 07016',
'United States']},
'categories': [{'id': '4f4528bc4b90abdf24c9de85',
'name': 'Athletics & Sports',
'pluralName': 'Athletics & Sports',
'shortName': 'Athletics & Sports',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/sports_outdoors_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []},
'venuePage': {'id': '60380091'}},
'referralId': 'e-0-4c13c8d2b7b9c928d127aa37-0'},
{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '4d965995e07ea35d07e2bd02',
'name': 'Mizu Sushi',
'location': {'address': '103 Union Ave.',
'lat': 40.65664427772896,
'lng': -74.30343966195308,
'labeledLatLngs': [{'label': 'display',
'lat': 40.65664427772896,
'lng': -74.30343966195308}],
'distance': 939,
'postalCode': '07016',
'cc': 'US',
'city': 'Cranford',
'state': 'NJ',
'country': 'United States',
'formattedAddress': ['103 Union Ave.',
'Cranford, NJ 07016',
'United States']},
'categories': [{'id': '4bf58dd8d48988d1d2941735',
'name': 'Sushi Restaurant',
'pluralName': 'Sushi Restaurants',
'shortName': 'Sushi',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/sushi_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []}},
'referralId': 'e-0-4d965995e07ea35d07e2bd02-1'}]}]}}
Look more closely at the response that you're getting: there is no "venues" key in it. The closest thing I see is the "groups" list; each group has an "items" list, and the individual items have a "venue" key in them.
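Following that path, a minimal sketch for collecting the venue dictionaries could look like this. The `results` literal is a heavily trimmed stand-in for the response shown in the question, keeping only the keys on the path.

```python
# Trimmed stand-in for the response in the question.
results = {
    'meta': {'code': 200},
    'response': {
        'groups': [{
            'type': 'Recommended Places',
            'items': [
                {'venue': {'id': '4c13c8d2b7b9c928d127aa37',
                           'name': 'Cranford Canoe Club'}},
                {'venue': {'id': '4d965995e07ea35d07e2bd02',
                           'name': 'Mizu Sushi'}},
            ],
        }],
    },
}

# Walk response -> groups -> items and keep each item's 'venue' dict.
venues = [
    item['venue']
    for group in results['response']['groups']
    for item in group['items']
]

print([venue['name'] for venue in venues])
```

The resulting `venues` list is a plain list of dictionaries, which is the kind of input the `json_normalize(venues)` call in the question expects.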
I have retrieved a JSON object from an API. The JSON object looks like this:
{'copyright': 'Copyright (c) 2020 The New York Times Company. All Rights '
'Reserved.',
'response': {'docs': [{'_id': 'nyt://article/e3e5e5e5-1b32-5e2b-aea7-cf20c558dbd3',
'abstract': 'LEAD: RESEARCHERS at the Brookhaven '
'National Laboratory are employing a novel '
'model to study skin cancer in humans: '
'they are exposing tiny tropical fish to '
'ultraviolet radiation.',
'byline': {'organization': None,
'original': 'By Eric Schmitt',
'person': [{'firstname': 'Eric',
'lastname': 'Schmitt',
'middlename': None,
'organization': '',
'qualifier': None,
'rank': 1,
'role': 'reported',
'title': None}]},
'document_type': 'article',
'headline': {'content_kicker': None,
'kicker': None,
'main': 'Tiny Fish Help Solve Cancer '
'Riddle',
'name': None,
'print_headline': 'Tiny Fish Help Solve '
'Cancer Riddle',
'seo': None,
'sub': None},
'keywords': [{'major': 'N',
'name': 'organizations',
'rank': 1,
'value': 'Brookhaven National '
'Laboratory'},
{'major': 'N',
'name': 'subject',
'rank': 2,
'value': 'Ozone'},
{'major': 'N',
'name': 'subject',
'rank': 3,
'value': 'Radiation'},
{'major': 'N',
'name': 'subject',
'rank': 4,
'value': 'Cancer'},
{'major': 'N',
'name': 'subject',
'rank': 5,
'value': 'Research'},
{'major': 'N',
'name': 'subject',
'rank': 6,
'value': 'Fish and Other Marine Life'}],
'lead_paragraph': 'RESEARCHERS at the Brookhaven '
'National Laboratory are employing a '
'novel model to study skin cancer in '
'humans: they are exposing tiny '
'tropical fish to ultraviolet '
'radiation.',
'multimedia': [],
'news_desk': 'Science Desk',
'print_page': '3',
'print_section': 'C',
'pub_date': '1989-12-26T05:00:00+0000',
'section_name': 'Science',
'snippet': '',
'source': 'The New York Times',
'type_of_material': 'News',
'uri': 'nyt://article/e3e5e5e5-1b32-5e2b-aea7-cf20c558dbd3',
'web_url': 'https://www.nytimes.com/1989/12/26/science/tiny-fish-help-solve-cancer-riddle.html',
'word_count': 870},
{'_id': 'nyt://article/32a2431d-623a-525b-a21d-d401be865818',
'abstract': 'LEAD: Clouds, even the ones formed by '
...and continues like that, too long to show all of it here.
Now, when I want to list just one headline, I use:
pprint(articles['response']['docs'][0]['headline']['print_headline'])
And I get the output
'Tiny Fish Help Solve Cancer Riddle'
The problem is when I want to pick out all of the headlines from this JSON object, and make a list of them. I tried:
index = 0
for headline in articles:
    headlineslist = ['response']['docs'][index]['headline']['print_headline'].split("''")
    index = index + 1
headlineslist
But I get the error TypeError: list indices must be integers or slices, not str
In other words, it worked when I "listed" just one headline, at index [0], but not when I try to repeat the process over each index. How do I iterate through each index to get a list of outputs like the first one?
To iterate over the document list you can just do the following:
for doc in (articles['response']['docs']):
    print(doc['headline']['print_headline'])
This would print all headlines.
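As for the TypeError in the question: the expression `['response']['docs']` does not touch `articles` at all. It builds a one-element list `['response']` and then tries to index that list with the string 'docs'. To end up with a list rather than printed output, a sketch along these lines should do it. The `articles` literal is a tiny stand-in for the API response, and its second headline is invented for illustration.

```python
# Tiny stand-in for the API response; the second headline is made up.
articles = {
    'response': {
        'docs': [
            {'headline': {'print_headline': 'Tiny Fish Help Solve Cancer Riddle'}},
            {'headline': {'print_headline': 'Placeholder Second Headline'}},
        ]
    }
}

# One entry per document; no manual index counter is needed.
headlineslist = [
    doc['headline']['print_headline']
    for doc in articles['response']['docs']
]

print(headlineslist)
```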
I have a dataset (that pulls its data from a dict) that I am attempting to clean and republish. Within this dataset, there is a field with a nested dictionary that I would like to extract specific data from.
Here's the data:
[{'id': 'oH58h122Jpv47pqXhL9p_Q', 'alias': 'original-pizza-brooklyn-4', 'name': 'Original Pizza', 'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/HVT0Vr_Vh52R_niODyPzCQ/o.jpg', 'is_closed': False, 'url': 'https://www.yelp.com/biz/original-pizza-brooklyn-4?adjust_creative=IelPnWlrTpzPtN2YRie19A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=IelPnWlrTpzPtN2YRie19A', 'review_count': 102, 'categories': [{'alias': 'pizza', 'title': 'Pizza'}], 'rating': 4.0, 'coordinates': {'latitude': 40.63781, 'longitude': -73.8963799}, 'transactions': [], 'price': '$', 'location': {'address1': '9514 Ave L', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11236', 'country': 'US', 'state': 'NY', 'display_address': ['9514 Ave L', 'Brooklyn, NY 11236']}, 'phone': '+17185313559', 'display_phone': '(718) 531-3559', 'distance': 319.98144420799355},
Here's how the data is presented within the csv/spreadsheet:
location
{'address1': '9514 Ave L', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11236', 'country': 'US', 'state': 'NY', 'display_address': ['9514 Ave L', 'Brooklyn, NY 11236']}
Is there a way to pull location.city for example?
The below code simply adds a few fields and exports it to a csv.
def data_set(data):
    df = pd.DataFrame(data)
    df['zip'] = get_zip()
    df['region'] = get_region()
    newdf = df.filter(['name', 'phone', 'location', 'zip', 'region', 'coordinates', 'rating', 'review_count',
                       'categories', 'url'], axis=1)
    if not os.path.isfile('yelp_data.csv'):
        newdf.to_csv('data.csv', header='column_names')
    else:  # else it exists so append without writing the header
        newdf.to_csv('data.csv', mode='a', header=False)
If that doesn't make sense, please let me know. Thanks in advance!
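One possible approach, sketched here in plain Python so it runs without pandas, is to flatten the 'location' dictionary before the DataFrame is built. The `flatten_location` helper and the 'location_city' style column names are made up for this example, and the record is a shortened copy of the one in the question.

```python
# Shortened copy of the record shown in the question.
data = [
    {'name': 'Original Pizza',
     'phone': '+17185313559',
     'location': {'address1': '9514 Ave L', 'city': 'Brooklyn',
                  'zip_code': '11236', 'state': 'NY'}},
]


def flatten_location(business, fields=('city', 'state', 'zip_code')):
    """Copy a business dict, replacing 'location' with one key per chosen field."""
    flat = {k: v for k, v in business.items() if k != 'location'}
    location = business.get('location') or {}
    for field in fields:
        # .get() leaves None where a business lacks the field
        flat[f'location_{field}'] = location.get(field)
    return flat


rows = [flatten_location(business) for business in data]
print(rows[0]['location_city'])
```

The `rows` list can then go to `pd.DataFrame(rows)` in place of `data`. If the DataFrame already exists and the column still holds dictionaries, `df['location'].apply(lambda loc: loc['city'])` pulls out the city, and `pd.json_normalize(data)` is another option that produces dotted column names such as 'location.city'.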
I'm trying to split a dictionary with a list within a pandas column but it isn't working for me...
The column looks like so when called;
df.topics[3]
Output
"[{'urlkey': 'webdesign', 'name': 'Web Design', 'id': 659}, {'urlkey': 'productdesign', 'name': 'Product Design', 'id': 2993}, {'urlkey': 'internetpro', 'name': 'Internet Professionals', 'id': 10102}, {'urlkey': 'web', 'name': 'Web Technology', 'id': 10209}, {'urlkey': 'software-product-management', 'name': 'Software Product Management', 'id': 42278}, {'urlkey': 'new-product-development-software-tech', 'name': 'New Product Development: Software & Tech', 'id': 62946}, {'urlkey': 'product-management', 'name': 'Product Management', 'id': 93740}, {'urlkey': 'internet-startups', 'name': 'Internet Startups', 'id': 128595}]"
I want to only be left with the 'name' and 'id' to put into separate columns of topic_1, topic_2, and so forth.
Appreciate any help.
You can give this a try.
import json
df.topics.apply(lambda x : {x['id']:x['name'] for x in json.loads(x.replace("'",'"'))} )
Your output for the row you gave is :
{659: 'Web Design',
2993: 'Product Design',
10102: 'Internet Professionals',
10209: 'Web Technology',
42278: 'Software Product Management',
62946: 'New Product Development: Software & Tech',
93740: 'Product Management',
128595: 'Internet Startups'}
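One caveat with swapping the quote characters before json.loads: it breaks as soon as a topic name contains an apostrophe. Since the cell holds the text of a Python literal, ast.literal_eval can parse it directly. The sketch below also spreads the result into numbered keys, which is one reading of the topic_1, topic_2 layout asked for. The `topics_to_columns` helper and the 'topic_1_id' / 'topic_1_name' naming are assumptions, and `raw` is a shortened copy of the cell in the question.

```python
import ast

# Shortened copy of the string stored in the column.
raw = ("[{'urlkey': 'webdesign', 'name': 'Web Design', 'id': 659}, "
       "{'urlkey': 'productdesign', 'name': 'Product Design', 'id': 2993}]")


def topics_to_columns(raw):
    """Parse the stringified list and spread id/name into numbered keys."""
    topics = ast.literal_eval(raw)  # safe parse of a Python literal, no quote swapping
    row = {}
    for n, topic in enumerate(topics, start=1):
        row[f'topic_{n}_id'] = topic['id']
        row[f'topic_{n}_name'] = topic['name']
    return row


print(topics_to_columns(raw))
```

Applied to the whole column, something like `pd.DataFrame(df.topics.apply(topics_to_columns).tolist())` would give one column per key, with missing values where a row has fewer topics.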
You could also try a simple loop:
dt = df.topics[3]  # assumes this value is already a list of dicts, not a string
li = []
for x in range(len(dt)):
    t = {dt[x]['id']: dt[x]['name']}
    li.append(t)
print(li)
Output:
[{659: 'Web Design'},
{2993: 'Product Design'},
{10102: 'Internet Professionals'},
{10209: 'Web Technology'},
{42278: 'Software Product Management'},
{62946: 'New Product Development: Software & Tech'},
{93740: 'Product Management'},
{128595: 'Internet Startups'}]
First we take the value of df.topics[3] into dt, which is a list with dictionaries inside it. Then we create a temporary list li to collect our values. We loop over the length of dt; on each pass, dt[x]['id'] and dt[x]['name'] give the id and name of one topic (659 and 'Web Design' when x is 0), the { : } syntax turns that pair into a dictionary t, and we append t to the temporary list li.