I have dict1 {subdict1, subdict2}, dict2 {subdict1, subdict2} and dict3 (which doesn't have subdicts) in a list called 'insights'. I need to create a Google Sheets file for each dict in the 'insights' list, with one sheet for each subdict. This is what's inside 'insights':
[{'city': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'city',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'gender': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'gender',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'country': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'country',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'age': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'age',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''}},
{'city': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'city',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'gender': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'gender',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'country': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'country',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'age': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'age',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''}},
{'name': 'follower_count',
'period': 'day',
'title': 'Follower Count',
'description': 'Total number of unique accounts following this profile',
'df': ''}]
As you can see, in summary the list is the following:
insights = [
follower_demographics,
reached_audience_demographics,
follower_count
]
And this is what each dictionary in the list contains. In the case of 'follower_demographics', it breaks down into a dictionary of ['city', 'gender', 'country', 'age'], where each one contains this:
demographics = {
'name': '',
'period': '',
'title': '',
'type': '',
'description': '',
'df': ''
}
So I wrote the function below to create a file for each of the 3 dictionaries in 'insights'. The problem is that it creates 4 'follower_demographics' files, each one with one respective dataframe.
def create_gsheet(insights, folder_id):
    try:
        # create a list to store the created files
        files = []
        # iterate over the items in the insights dictionary
        for idx, (key, value) in enumerate(insights.items()):
            # check if the value is a dictionary
            if isinstance(value, dict):
                # Create a new file with the name taken from the 'title' key
                file = gc.create(value['title'], folder=folder_id)
                print(f"Creating {value['title']} - {idx}/{len(insights)}")
                # add the file to the list
                files.append(file)
                # Create a new sheet within the file, named from the 'type' and 'name' keys
                sheet = file.add_worksheet(value['type'] + '_' + value['name'])
                # Set the sheet data to the df provided in the dictionary
                sheet.set_dataframe(value['df'], (1, 1), encoding='utf-8', fit=True)
                sheet.frozen_rows = 1
        # delete the default sheet1 from all the created files
        for file in files:
            file.del_worksheet(file.sheet1)
    except Exception as error:
        print(f'An error occurred: {error}')
        sheet = None
The result I want is, for example, a 'follower_demographics' file with sub-sheets 'city_follower_demographics', 'gender_follower_demographics', and so on, each with its respective dataframe.
I looked up "nested dict" and "nested list", but neither approach worked.
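A minimal sketch of the two-level loop the question is after, assuming `gc` is the authorized pygsheets client from the question and each sub-dict's 'df' holds a pandas DataFrame. The outer loop runs over the entries of the `insights` list (one file each); the inner loop runs over that entry's sub-dicts (one worksheet each). A flat entry such as 'follower_count' is treated as a single sheet. The helper names `plan_workbooks` and `create_gsheets` are hypothetical, and the file is named after the 'name' key to match the desired 'follower_demographics' file:

```python
def plan_workbooks(insights):
    """Group each entry of `insights` into (file_title, [(sheet_title, df), ...])."""
    plan = []
    for entry in insights:
        # A flat metric (e.g. follower_count) has its fields directly in the dict;
        # a breakdown (e.g. follower_demographics) holds one sub-dict per type.
        subdicts = [entry] if 'name' in entry else list(entry.values())
        file_title = subdicts[0]['name']
        sheets = []
        for sub in subdicts:
            # Prefix with the breakdown type when present, e.g. 'city_follower_demographics'
            sheet_title = f"{sub['type']}_{sub['name']}" if 'type' in sub else sub['name']
            sheets.append((sheet_title, sub['df']))
        plan.append((file_title, sheets))
    return plan


def create_gsheets(gc, insights, folder_id):
    """Create one spreadsheet per insights entry and one worksheet per sub-dict."""
    files = []
    for file_title, sheets in plan_workbooks(insights):
        file = gc.create(file_title, folder=folder_id)
        for sheet_title, df in sheets:
            sheet = file.add_worksheet(sheet_title)
            sheet.set_dataframe(df, (1, 1), fit=True)
            sheet.frozen_rows = 1
        # Remove the default first sheet only after the real sheets exist
        file.del_worksheet(file.sheet1)
        files.append(file)
    return files
```

Keeping the grouping in a separate pure function makes it easy to print `plan_workbooks(insights)` and confirm the file/sheet layout before touching Google Drive.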
I have a python object with the following structure:
[{
'id': 'productID1', 'name': 'productname A',
'option': {
'size': {
'type': 'list',
'name': 'size',
'choices': [
{'value': 'M'},
]}},
'variant': [{
'id': 'variantID1',
'choices':
{'size': 'M'},
'attributes':
{'currency': 'USD', 'price': 1}}]
}]
What I need to output is a CSV file with the following flattened structure:
id, productname, variantid, size, currency, price
productID1, productname A, variantID1, M, USD, 1
productID1, productname A, variantID2, L, USD, 2
productID2, productname A, variantID3, XL, USD, 3
I tried this solution: Python: Writing Nested Dictionary to CSV
and this one: From Nested Dictionary to CSV File
I got rid of the [] around and within the data and, for example, used the code snippet from the second link and adapted it to my needs. In real life I can't get rid of the [] because that's simply the format I get when calling the API.
import csv

with open('productdata.csv', 'w', newline='', encoding='utf-8') as output:
    writer = csv.writer(output, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    for key in sorted(data):
        value = data[key]
        if len(value) > 0:
            writer.writerow([key, value])
        else:
            for i in value:
                writer.writerow([key, i, value])
but the output is like this:
"id";"productID1"
"name";"productname A"
"option";"{'size': {'type': 'list', 'name': 'size', 'choices': {'value': 'M'}}}"
"variant";"{'id': 'variantID1', 'choices': {'size': 'M'}, 'attributes': {'currency': 'USD', 'price': 1}}"
Can anyone help me out, please?
Thanks in advance.
list indices must be integers, not strings
The following is a visual example of a Python list:
0 carrot
1 broccoli
2 asparagus
3 cauliflower
4 corn
5 cucumber
6 eggplant
7 bell pepper
0, 1, 2 are all "indices".
"carrot", "broccoli", etc. are all said to be "values".
Essentially, a Python list is a machine which takes integer inputs and produces arbitrary outputs.
Think of a Python list as a black box:
A number, such as 5, goes into the box.
You turn a crank handle attached to the box.
The string "cucumber" comes out of the box.
You got an error: TypeError: list indices must be integers or slices, not str
There are various solutions.
Convert Strings into Integers
Convert the string into an integer.
listy_the_list = ["carrot", "broccoli", "asparagus", "cauliflower"]
string_index = "2"
integer_index = int(string_index)
element = listy_the_list[integer_index]
So yeah, that works as long as your string indices look like numbers (e.g. "456" or "7").
The integer class constructor, int(), is fairly strict.
For example, x = int("3.0") and x = int("3rd") will both raise a ValueError.
Leading and trailing whitespace is not a problem, though: int(" 3 ") returns 3, so calling .strip() first is unnecessary.
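A small sketch that wraps the conversion so a malformed or out-of-range string index returns a default instead of raising. The helper name `get_by_string_index` is hypothetical:

```python
def get_by_string_index(items, string_index, default=None):
    """Look up items[int(string_index)], returning `default` if the string
    is not a valid integer or the index is out of range."""
    try:
        return items[int(string_index)]
    except (ValueError, IndexError):
        return default


vegetables = ["carrot", "broccoli", "asparagus", "cauliflower"]
print(get_by_string_index(vegetables, "2"))    # asparagus
print(get_by_string_index(vegetables, " 1 "))  # broccoli (int() ignores surrounding whitespace)
print(get_by_string_index(vegetables, "3.0"))  # None
```

Note that negative strings such as "-1" are still valid list indices and return elements from the end of the list.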
Use a Container which Allows Keys to be Strings
Long ago, before electronic computers existed, there were various types of containers in the world:
cookie jars
muffin tins
cardboard boxes
glass jars
steel cans
backpacks
duffel bags
closets/wardrobes
briefcases
In computer programming there are also various types of "containers".
You do not have to use a list as your container if you do not want to.
There are containers where the keys (AKA indices) are allowed to be strings instead of integers.
In Python, the standard container that works like a list but allows the keys to be strings is the dictionary (dict):
thisdict = {
    "make": "Ford",
    "model": "Mustang",
    "year": 1964
}
thisdict["make"] == "Ford"
If you want to index into a container using strings instead of integers, use a dict instead of a list.
The following is an example of a Python dict which has state names as input and state abbreviations as output:
us_state_abbrev = {
'Alabama': 'AL',
'Alaska': 'AK',
'American Samoa': 'AS',
'Arizona': 'AZ',
'Arkansas': 'AR',
'California': 'CA',
'Colorado': 'CO',
'Connecticut': 'CT',
'Delaware': 'DE',
'District of Columbia': 'DC',
'Florida': 'FL',
'Georgia': 'GA',
'Guam': 'GU',
'Hawaii': 'HI',
'Idaho': 'ID',
'Illinois': 'IL',
'Indiana': 'IN',
'Iowa': 'IA',
'Kansas': 'KS',
'Kentucky': 'KY',
'Louisiana': 'LA',
'Maine': 'ME',
'Maryland': 'MD',
'Massachusetts': 'MA',
'Michigan': 'MI',
'Minnesota': 'MN',
'Mississippi': 'MS',
'Missouri': 'MO',
'Montana': 'MT',
'Nebraska': 'NE',
'Nevada': 'NV',
'New Hampshire': 'NH',
'New Jersey': 'NJ',
'New Mexico': 'NM',
'New York': 'NY',
'North Carolina': 'NC',
'North Dakota': 'ND',
'Northern Mariana Islands':'MP',
'Ohio': 'OH',
'Oklahoma': 'OK',
'Oregon': 'OR',
'Pennsylvania': 'PA',
'Puerto Rico': 'PR',
'Rhode Island': 'RI',
'South Carolina': 'SC',
'South Dakota': 'SD',
'Tennessee': 'TN',
'Texas': 'TX',
'Utah': 'UT',
'Vermont': 'VT',
'Virgin Islands': 'VI',
'Virginia': 'VA',
'Washington': 'WA',
'West Virginia': 'WV',
'Wisconsin': 'WI',
'Wyoming': 'WY'
}
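A short sketch of using such a dict, shown on a three-entry excerpt of the mapping above: string keys index it directly, `.get()` returns a default instead of raising `KeyError` for a missing key, and a dict comprehension inverts the mapping to go from abbreviation back to state name:

```python
# Small excerpt of the us_state_abbrev dict above
us_state_abbrev = {'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ'}

# String keys work directly with a dict
print(us_state_abbrev['Alaska'])  # AK

# .get() returns a default instead of raising KeyError for a missing key
print(us_state_abbrev.get('Atlantis', 'unknown'))  # unknown

# Invert the mapping: abbreviation -> state name
abbrev_to_state = {abbr: state for state, abbr in us_state_abbrev.items()}
print(abbrev_to_state['AZ'])  # Arizona
```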
I could actually iterate over this list and create my own sublist, e.g. a list of variants:
data = [{
'id': 'productID1', 'name': 'productname A',
'option': {
'size': {
'type': 'list',
'name': 'size',
'choices': [
{'value': 'M'},
]}},
'variant': [{
'id': 'variantID1',
'choices':
{'size': 'M'},
'attributes':
{'currency': 'USD', 'price': 1}}]
},
{'id': 'productID2', 'name': 'productname B',
'option': {
'size': {
'type': 'list',
'name': 'size',
'choices': [
{'value': 'XL', 'salue':'XXL'},
]}},
'variant': [{
'id': 'variantID2',
'choices':
{'size': 'XL', 'size2':'XXL'},
'attributes':
{'currency': 'USD', 'price': 2}}]
}
]
new_list = {}
for item in data:
    new_list.update(id=item['id'])
    new_list.update(name=item['name'])
    for variant in item['variant']:
        new_list.update(varid=variant['id'])
        for vchoice in variant['choices']:
            new_list.update(vsize=variant['choices'][vchoice])
        for attribute in variant['attributes']:
            new_list.update(vprice=variant['attributes'][attribute])
    for option in item['option']['size']['choices']:
        new_list.update(osize=option['value'])
print(new_list)
But the output is always the last item of the iteration, because I keep overwriting the same new_list dict with update():
{'id': 'productID2', 'name': 'productname B', 'varid': 'variantID2', 'vsize': 'XXL', 'vprice': 2, 'osize': 'XL'}
Here's the final solution that worked for me:
data = [{
'id': 'productID1', 'name': 'productname A',
'variant': [{
'id': 'variantID1',
'choices':
{'size': 'M'},
'attributes':
{'currency': 'USD', 'price': 1}},
{'id':'variantID2',
'choices':
{'size': 'L'},
'attributes':
{'currency':'USD', 'price':2}}
]
},
{
'id': 'productID2', 'name': 'productname B',
'variant': [{
'id': 'variantID3',
'choices':
{'size': 'XL'},
'attributes':
{'currency': 'USD', 'price': 3}},
{'id':'variantID4',
'choices':
{'size': 'XXL'},
'attributes':
{'currency':'USD', 'price':4}}
]
}
]
import csv

products = []
for item in data:
    for variant in item['variant']:
        # build a fresh dict per variant so earlier rows are not overwritten
        dic = {}
        dic.update(ProductID=item['id'])
        dic.update(Name=item['name'].title())
        dic.update(ID=variant['id'])
        dic.update(size=variant['choices']['size'])
        dic.update(Price=variant['attributes']['price'])
        products.append(dic)

keys = products[0].keys()
with open('productdata.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, keys, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    dict_writer.writeheader()
    dict_writer.writerows(products)
with the following output:
"ProductID";"Name";"ID";"size";"Price"
"productID1";"Productname A";"variantID1";"M";1
"productID1";"Productname A";"variantID2";"L";2
"productID2";"Productname B";"variantID3";"XL";3
"productID2";"Productname B";"variantID4";"XXL";4
which is exactly what i wanted.
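A more generic variant of the same idea, using only the standard library: a hypothetical `flatten()` helper turns each variant's nested dicts into dotted keys (e.g. 'attributes.price'), and the CSV header is built from the union of all row keys, so variants with extra fields (like 'size2' in the earlier data) still fit. This swaps recursive flattening in for the hand-picked fields above, so the column names differ from the output shown; `variant_rows` and `write_rows` are also hypothetical names:

```python
import csv


def flatten(d, parent_key='', sep='.'):
    """Recursively flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def variant_rows(data):
    """Yield one flat row per variant, carrying the parent product's id and name."""
    for product in data:
        for variant in product['variant']:
            row = {'ProductID': product['id'], 'Name': product['name']}
            row.update(flatten(variant))
            yield row


def write_rows(rows, output):
    """Write rows with a header covering every key seen in any row."""
    rows = list(rows)
    # Union of keys in first-seen order; missing values are written as ''
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    writer = csv.DictWriter(output, fieldnames=fieldnames, delimiter=';',
                            quoting=csv.QUOTE_NONNUMERIC)
    writer.writeheader()
    writer.writerows(rows)
```

Usage, assuming `data` is the list returned by the API: `with open('productdata.csv', 'w', newline='', encoding='utf-8') as f: write_rows(variant_rows(data), f)`.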
OK, I'm a newbie and I think I'm doing everything I should, but I am still getting a KeyError: venues. (I also tried using "venue" instead, and I am not at my maximum daily quota at FourSquare.) I am using a Jupyter Notebook to do this.
Using this code:
VERSION = '20200418'
RADIUS = 1000
LIMIT = 2
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, RADIUS, LIMIT)
url
results = requests.get(url).json()
I get 2 results (shown at end of this post)
When I try to take those results and put them into a dataframe, I get "KeyError: venues":
# assign relevant part of JSON to venues
venues = results['response']['venues']
# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-29-5acf500bf9ad> in <module>
1 # assign relevant part of JSON to venues
----> 2 venues = results['response']['venues']
3
4 # tranform venues into a dataframe
5 dataframe = json_normalize(venues)
KeyError: 'venues'
I'm not really sure where I am going wrong... This has worked for me with other locations... But then again, like I said, I'm new at this... (I haven't maxed out my queries, and I've tried using "venue" instead)... Thank you
FourSquare results:
{'meta': {'code': 200, 'requestId': '5ec42de01a4b0a001baa10ff'},
'response': {'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'warning': {'text': "There aren't a lot of results near you. Try something more general, reset your filters, or expand the search area."},
'headerLocation': 'Cranford',
'headerFullLocation': 'Cranford',
'headerLocationGranularity': 'city',
'totalResults': 20,
'suggestedBounds': {'ne': {'lat': 40.67401708586377,
'lng': -74.29300815204098},
'sw': {'lat': 40.65601706786374, 'lng': -74.31669390523408}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '4c13c8d2b7b9c928d127aa37',
'name': 'Cranford Canoe Club',
'location': {'address': '250 Springfield Ave',
'crossStreet': 'Orange Avenue',
'lat': 40.66022488705574,
'lng': -74.3061084180977,
'labeledLatLngs': [{'label': 'display',
'lat': 40.66022488705574,
'lng': -74.3061084180977},
{'label': 'entrance', 'lat': 40.660264, 'lng': -74.306191}],
'distance': 543,
'postalCode': '07016',
'cc': 'US',
'city': 'Cranford',
'state': 'NJ',
'country': 'United States',
'formattedAddress': ['250 Springfield Ave (Orange Avenue)',
'Cranford, NJ 07016',
'United States']},
'categories': [{'id': '4f4528bc4b90abdf24c9de85',
'name': 'Athletics & Sports',
'pluralName': 'Athletics & Sports',
'shortName': 'Athletics & Sports',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/sports_outdoors_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []},
'venuePage': {'id': '60380091'}},
'referralId': 'e-0-4c13c8d2b7b9c928d127aa37-0'},
{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '4d965995e07ea35d07e2bd02',
'name': 'Mizu Sushi',
'location': {'address': '103 Union Ave.',
'lat': 40.65664427772896,
'lng': -74.30343966195308,
'labeledLatLngs': [{'label': 'display',
'lat': 40.65664427772896,
'lng': -74.30343966195308}],
'distance': 939,
'postalCode': '07016',
'cc': 'US',
'city': 'Cranford',
'state': 'NJ',
'country': 'United States',
'formattedAddress': ['103 Union Ave.',
'Cranford, NJ 07016',
'United States']},
'categories': [{'id': '4bf58dd8d48988d1d2941735',
'name': 'Sushi Restaurant',
'pluralName': 'Sushi Restaurants',
'shortName': 'Sushi',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/sushi_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []}},
'referralId': 'e-0-4d965995e07ea35d07e2bd02-1'}]}]}}
Look more closely at the response you're getting: there's no "venues" key there. The closest one I see is the "groups" list, which has an "items" list in it, and the individual items have a "venue" key.
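A minimal sketch of that path, based on the response structure shown above (response → groups → items → venue). The helper name `extract_venues` is hypothetical; it collects the venue dicts from every group so they can be passed on to `pd.json_normalize(venues)` (available as `pandas.json_normalize` since pandas 1.0):

```python
def extract_venues(results):
    """Return the 'venue' dict of every item in every group of an explore response."""
    venues = []
    # .get() keeps this from raising if a response has no groups or items
    for group in results['response'].get('groups', []):
        for item in group.get('items', []):
            venues.append(item['venue'])
    return venues
```

With the results above, `extract_venues(results)` would return the two venue dicts for 'Cranford Canoe Club' and 'Mizu Sushi'.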
{'meta': {'code': 200, 'requestId': '5e7c703bb9a389001b7d1e8c'},
'response': {'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'headerLocation': 'Lagos',
'headerFullLocation': 'Lagos',
'headerLocationGranularity': 'city',
'totalResults': 39,
'suggestedBounds': {'ne': {'lat': 6.655478745000045,
'lng': 3.355524537252914},
'sw': {'lat': 6.565478654999954, 'lng': 3.2650912627470863}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '502806dce4b0f23b021f3b77',
'name': 'KFC',
'location': {'lat': 6.604589745106469,
'lng': 3.3089358809010045,
'labeledLatLngs': [{'label': 'display',
'lat': 6.604589745106469,
'lng': 3.3089358809010045}],
'distance': 672,
'cc': 'NG',
'city': 'Egbeda',
'state': 'Lagos',
'country': 'Nigeria',
'formattedAddress': ['Egbeda', 'Lagos', 'Nigeria']},
'categories': [{'id': '4bf58dd8d48988d16e941735',
'name': 'Fast Food Restaurant',
'pluralName': 'Fast Food Restaurants',
'shortName': 'Fast Food',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/fastfood_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []}},
'referralId': 'e-0-502806dce4b0f23b021f3b77-0'},
That is part of my file, called 'results'.
I then run:
import pandas as pd

def getCAT(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
nearby_venues = pd.json_normalize(venues)
filtered_cols = ['venue.name', 'venue.catergories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_cols]
nearby_venues['venue.categories'] = nearby_venues.apply(getCAT, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()
I get KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported' on the JSON data.
If I comment out that part, it runs fine but with limited results. What am I doing wrong?
pandas.DataFrame.loc
property DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.
Try removing the venue. prefix from the line filtered_cols=['venue.name', 'venue.catergories', 'venue.location.lat', 'venue.location.lng']
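It is also worth checking exactly which labels are missing before the `.loc` call: the list spells 'venue.catergories', while `json_normalize` on these items produces a column named 'venue.categories' (the JSON above has a 'categories' key under 'venue'), so that one misspelled label is enough to raise this KeyError. A small standard-library sketch that reports each missing label together with the closest existing column, using `difflib.get_close_matches`; `check_columns` is a hypothetical helper:

```python
import difflib


def check_columns(available, wanted):
    """Map each label in `wanted` that is not in `available` to the closest
    existing column name (or None if nothing is similar enough)."""
    report = {}
    for col in wanted:
        if col not in available:
            matches = difflib.get_close_matches(col, available, n=1)
            report[col] = matches[0] if matches else None
    return report
```

In the notebook this could be run as `check_columns(list(nearby_venues.columns), filtered_cols)` right before the `.loc` line; an empty dict means every label exists.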
I have a CSV file which contains a couple of columns. For example:
FName,LName,Address1,City,Country,Phone,Email
Matt,Shew,"503, Avenue Park",Auckland,NZ,19809224478,matt#xxx.com
Patt,Smith,"503, Baker Street
Mickey Park
Suite 510",Austraila,AZ,19807824478,patt#xxx.com
Doug,Stew,"12, Main St.
21st Lane
Suit 290",Chicago,US,19809224478,doug#xxx.com
Henry,Mark,"88, Washington Park",NY,US,19809224478,matt#xxx.com
In Excel it looks something like this:
It's a common human tendency to copy-paste an address this way; sometimes people copy their signature and paste it into the Address column, which creates this situation.
I have tried reading this using the Python csv module, and it looks like Python doesn't distinguish between a '\n' newline inside a field value and the end of a line.
My code :
import csv

with open(file_path, 'r') as f_obj:
    input_data = []
    reader = csv.DictReader(f_obj)
    for row in reader:
        print(row)
The output looks something like this:
{'City': 'Auckland', 'Address1': '503, Avenue Park', 'LName': 'Shew', 'Phone': '19809224478', 'FName': 'Matt', 'Country': 'NZ', 'Email': 'matt#xxx.com'}
{'City': 'Austraila', 'Address1': '503, Baker Street\nMickey Park\nSuite 510', 'LName': 'Smith', 'Phone': '19807824478', 'FName': 'Patt', 'Country': 'AZ', 'Email': 'patt#xxx.com'}
{'City': 'Chicago', 'Address1': '12, Main St. \n21st Lane \nSuit 290', 'LName': 'Stew', 'Phone': '19809224478', 'FName': 'Doug', 'Country': 'US', 'Email': 'doug#xxx.com'}
{'City': 'NY', 'Address1': '88, Washington Park', 'LName': 'Mark', 'Phone': '19809224478', 'FName': 'Henry', 'Country': 'US', 'Email': 'matt#xxx.com'}
I just want to write the same content to a file where none of the Address1 values contain a '\n' character, so that it looks like this:
{'City': 'Auckland', 'Address1': '503, Avenue Park', 'LName': 'Shew', 'Phone': '19809224478', 'FName': 'Matt', 'Country': 'NZ', 'Email': 'matt#xxx.com'}
{'City': 'Austraila', 'Address1': '503, Baker Street Mickey Park Suite 510', 'LName': 'Smith', 'Phone': '19807824478', 'FName': 'Patt', 'Country': 'AZ', 'Email': 'patt#xxx.com'}
{'City': 'Chicago', 'Address1': '12, Main St. 21st Lane Suit 290', 'LName': 'Stew', 'Phone': '19809224478', 'FName': 'Doug', 'Country': 'US', 'Email': 'doug#xxx.com'}
{'City': 'NY', 'Address1': '88, Washington Park', 'LName': 'Mark', 'Phone': '19809224478', 'FName': 'Henry', 'Country': 'US', 'Email': 'matt#xxx.com'}
Any suggestions?
PS:
I have more than 100K such records in my CSV file!
You can replace the print statement with a dict comprehension that replaces newlines in the values:
row = {k: v.replace('\n', ' ') for k, v in row.items()}
print(row)
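Replacing '\n' with ' ' leaves double spaces wherever a line ended with a trailing space (as in '12, Main St. \n21st Lane'). A Python 3 sketch that instead collapses every run of whitespace with `' '.join(value.split())` and streams rows straight into a new file, so the 100K records never have to be held in memory at once. The helper names are hypothetical, and note that this also collapses repeated spaces elsewhere in a value:

```python
import csv


def clean_value(value):
    """Collapse every run of whitespace (including embedded newlines) into one space."""
    return ' '.join(value.split())


def clean_csv(src, dst):
    """Copy rows from file object src to dst, removing newlines inside field values."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Short rows get None for missing fields; leave those untouched
        writer.writerow({k: clean_value(v) if isinstance(v, str) else v
                         for k, v in row.items()})
```

Usage, with hypothetical file names: `with open('input.csv', newline='', encoding='utf-8') as src, open('output.csv', 'w', newline='', encoding='utf-8') as dst: clean_csv(src, dst)`. Opening both files with `newline=''` is what the csv module documentation recommends so that quoted fields with embedded newlines are handled correctly.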