Nested Python Object to CSV - python

I looked up "nested dict" and "nested list", but neither approach works for me.
I have a python object with the following structure:
[{
    'id': 'productID1', 'name': 'productname A',
    'option': {
        'size': {
            'type': 'list',
            'name': 'size',
            'choices': [
                {'value': 'M'},
            ]}},
    'variant': [{
        'id': 'variantID1',
        'choices': {'size': 'M'},
        'attributes': {'currency': 'USD', 'price': 1}}]
}]
What I need to output is a CSV file with the following flattened structure:
id, productname, variantid, size, currency, price
productID1, productname A, variantID1, M, USD, 1
productID1, productname A, variantID2, L, USD, 2
productID2, productname A, variantID3, XL, USD, 3
I tried this solution: Python: Writing Nested Dictionary to CSV
or this one: From Nested Dictionary to CSV File
For testing, I removed the [] around and within the data and adapted the code snippet from the second link to my needs. In real life I can't remove the [], because that is simply the format I get back from the API.
import csv

with open('productdata.csv', 'w', newline='', encoding='utf-8') as output:
    writer = csv.writer(output, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    for key in sorted(data):
        value = data[key]
        if len(value) > 0:
            writer.writerow([key, value])
        else:
            for i in value:
                writer.writerow([key, i, value])
But the output looks like this:
"id";"productID1"
"name";"productname A"
"option";"{'size': {'type': 'list', 'name': 'size', 'choices': {'value': 'M'}}}"
"variant";"{'id': 'variantID1', 'choices': {'size': 'M'}, 'attributes': {'currency': 'USD', 'price': 1}}"
Can anyone help me out, please?
Thanks in advance!

list indices must be integers not strings
The following presents a visual example of a python list:
0 carrot.
1 broccoli.
2 asparagus.
3 cauliflower.
4 corn.
5 cucumber.
6 eggplant.
7 bell pepper
0, 1, 2, etc. are all "indices".
"carrot", "broccoli", etc. are all "values".
Essentially, a Python list is a machine that takes integer inputs and produces arbitrary outputs.
Think of a Python list as a black box:
A number, such as 5, goes into the box.
You turn a crank handle attached to the box.
The string "cucumber" comes out of the box.
You got an error: TypeError: list indices must be integers or slices, not str
There are various solutions.
Convert Strings into Integers
Convert the string into an integer.
listy_the_list = ["carrot", "broccoli", "asparagus", "cauliflower"]
string_index = "2"
integer_index = int(string_index)
element = listy_the_list[integer_index]
This works as long as your string indices look like integers (e.g. "456" or "7").
The int() constructor is fairly strict, though.
It tolerates surrounding whitespace, so int(" 3 ") returns 3, but int("3.0") or int("three") raises a ValueError.
If your strings might not be valid integers, catch that ValueError instead of letting it crash the program.
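Since int() raises ValueError for strings that don't look like integers, a defensive conversion can help. This is a minimal sketch, assuming you want a fallback value rather than an exception; the helper name `to_index` is hypothetical:

```python
def to_index(text, default=None):
    """Convert a string such as "2" or " 7 " to an int, or return `default`."""
    try:
        # int() already ignores leading/trailing whitespace
        return int(text)
    except (TypeError, ValueError):
        # e.g. "3.0", "three" (ValueError) or None (TypeError)
        return default


veggies = ["carrot", "broccoli", "asparagus", "cauliflower"]
i = to_index(" 2 ")
if i is not None and 0 <= i < len(veggies):
    print(veggies[i])  # asparagus
```

Returning a default keeps the calling code simple; if a bad index should be treated as a real error, let the ValueError propagate instead.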
Use a Container which Allows Keys to be Strings
Long ago, before electronic computers existed, there were various types of containers in the world:
cookie jars
muffin tins
cardboard boxes
glass jars
steel cans
back-packs
duffel bags
closets/wardrobes
brief-cases
In computer programming there are also various types of "containers".
You do not have to use a list as your container if you do not want to.
There are containers where the keys (AKA indices) are allowed to be strings instead of integers.
In Python, the standard container that works like a list, but whose keys/indices can be strings, is a dictionary:
thisdict = {
"make": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["make"] == "Ford"
If you want to index into a container using strings instead of integers, use a dict instead of a list.
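To see the difference side by side, here is a small sketch that reproduces the TypeError with a list and then does a string-keyed lookup with a dict. The variable names are illustrative only:

```python
veggies_list = ["carrot", "broccoli", "asparagus"]
veggies_dict = {"first": "carrot", "second": "broccoli", "third": "asparagus"}

try:
    veggies_list["second"]  # a string index on a list
except TypeError as err:
    print(err)  # list indices must be integers or slices, not str

print(veggies_dict["second"])  # broccoli
# dict.get() returns a fallback instead of raising KeyError for a missing key
print(veggies_dict.get("tenth", "not found"))  # not found
```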
The following is an example of a Python dict which has state names as input and state abbreviations as output:
us_state_abbrev = {
'Alabama': 'AL',
'Alaska': 'AK',
'American Samoa': 'AS',
'Arizona': 'AZ',
'Arkansas': 'AR',
'California': 'CA',
'Colorado': 'CO',
'Connecticut': 'CT',
'Delaware': 'DE',
'District of Columbia': 'DC',
'Florida': 'FL',
'Georgia': 'GA',
'Guam': 'GU',
'Hawaii': 'HI',
'Idaho': 'ID',
'Illinois': 'IL',
'Indiana': 'IN',
'Iowa': 'IA',
'Kansas': 'KS',
'Kentucky': 'KY',
'Louisiana': 'LA',
'Maine': 'ME',
'Maryland': 'MD',
'Massachusetts': 'MA',
'Michigan': 'MI',
'Minnesota': 'MN',
'Mississippi': 'MS',
'Missouri': 'MO',
'Montana': 'MT',
'Nebraska': 'NE',
'Nevada': 'NV',
'New Hampshire': 'NH',
'New Jersey': 'NJ',
'New Mexico': 'NM',
'New York': 'NY',
'North Carolina': 'NC',
'North Dakota': 'ND',
'Northern Mariana Islands':'MP',
'Ohio': 'OH',
'Oklahoma': 'OK',
'Oregon': 'OR',
'Pennsylvania': 'PA',
'Puerto Rico': 'PR',
'Rhode Island': 'RI',
'South Carolina': 'SC',
'South Dakota': 'SD',
'Tennessee': 'TN',
'Texas': 'TX',
'Utah': 'UT',
'Vermont': 'VT',
'Virgin Islands': 'VI',
'Virginia': 'VA',
'Washington': 'WA',
'West Virginia': 'WV',
'Wisconsin': 'WI',
'Wyoming': 'WY'
}

I could iterate over this list and build my own sub-list, e.g. a list of variants:
data = [
    {
        'id': 'productID1', 'name': 'productname A',
        'option': {
            'size': {
                'type': 'list',
                'name': 'size',
                'choices': [
                    {'value': 'M'},
                ]}},
        'variant': [{
            'id': 'variantID1',
            'choices': {'size': 'M'},
            'attributes': {'currency': 'USD', 'price': 1}}]
    },
    {
        'id': 'productID2', 'name': 'productname B',
        'option': {
            'size': {
                'type': 'list',
                'name': 'size',
                'choices': [
                    {'value': 'XL', 'salue': 'XXL'},
                ]}},
        'variant': [{
            'id': 'variantID2',
            'choices': {'size': 'XL', 'size2': 'XXL'},
            'attributes': {'currency': 'USD', 'price': 2}}]
    }
]
new_list = {}
for item in data:
    new_list.update(id=item['id'])
    new_list.update(name=item['name'])
    for variant in item['variant']:
        new_list.update(varid=variant['id'])
        for vchoice in variant['choices']:
            new_list.update(vsize=variant['choices'][vchoice])
        for attribute in variant['attributes']:
            new_list.update(vprice=variant['attributes'][attribute])
    for option in item['option']['size']['choices']:
        new_list.update(osize=option['value'])
print(new_list)
But the output is always just the last item of the iteration, because I keep overwriting new_list with update():
{'id': 'productID2', 'name': 'productname B', 'varid': 'variantID2', 'vsize': 'XXL', 'vprice': 2, 'osize': 'XL'}

Here's the final solution that worked for me:
import csv

data = [
    {
        'id': 'productID1', 'name': 'productname A',
        'variant': [
            {'id': 'variantID1',
             'choices': {'size': 'M'},
             'attributes': {'currency': 'USD', 'price': 1}},
            {'id': 'variantID2',
             'choices': {'size': 'L'},
             'attributes': {'currency': 'USD', 'price': 2}},
        ]
    },
    {
        'id': 'productID2', 'name': 'productname B',
        'variant': [
            {'id': 'variantID3',
             'choices': {'size': 'XL'},
             'attributes': {'currency': 'USD', 'price': 3}},
            {'id': 'variantID4',
             'choices': {'size': 'XXL'},
             'attributes': {'currency': 'USD', 'price': 4}},
        ]
    }
]

products = []  # one flat dict per variant
for item in data:
    for variant in item['variant']:
        dic = {}  # a fresh dict per row, so earlier rows are not overwritten
        dic.update(ProductID=item['id'])
        dic.update(Name=item['name'].title())
        dic.update(ID=variant['id'])
        dic.update(size=variant['choices']['size'])
        dic.update(Price=variant['attributes']['price'])
        products.append(dic)
keys = products[0].keys()
with open('productdata.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, keys, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    dict_writer.writeheader()
    dict_writer.writerows(products)
with the following output:
"ProductID";"Name";"ID";"size";"Price"
"productID1";"Productname A";"variantID1";"M";1
"productID1";"Productname A";"variantID2";"L";2
"productID2";"Productname B";"variantID3";"XL";3
"productID2";"Productname B";"variantID4";"XXL";4
That is exactly what I wanted.
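If the variants can carry more keys than size and price (the desired output in the question also has a currency column), a variation on the same idea is to merge each variant's `choices` and `attributes` dicts into the row instead of naming every field by hand. This is a sketch, assuming every variant has `id`, `choices`, and `attributes`; the helper name `flatten_products` is hypothetical, and `io.StringIO` stands in for the real output file:

```python
import csv
import io


def flatten_products(products):
    """Yield one flat dict per variant, repeating the product-level fields."""
    for product in products:
        for variant in product['variant']:
            row = {'id': product['id'], 'productname': product['name'],
                   'variantid': variant['id']}
            row.update(variant.get('choices', {}))     # e.g. size
            row.update(variant.get('attributes', {}))  # e.g. currency, price
            yield row


data = [
    {'id': 'productID1', 'name': 'productname A',
     'variant': [
         {'id': 'variantID1', 'choices': {'size': 'M'},
          'attributes': {'currency': 'USD', 'price': 1}},
         {'id': 'variantID2', 'choices': {'size': 'L'},
          'attributes': {'currency': 'USD', 'price': 2}},
     ]},
]

rows = list(flatten_products(data))
# Every column name in first-seen order, so variants with extra keys still fit
fieldnames = list(dict.fromkeys(key for row in rows for key in row))

out = io.StringIO()  # swap for open('productdata.csv', 'w', newline='', encoding='utf-8')
writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=';', restval='')
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

One caveat of merging: if a `choices` or `attributes` dict ever contained a key such as `id`, it would overwrite the product-level value, so prefixing those keys may be safer for unknown data.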

Related

How do I iterate a nested dictionary with string formatting?

I checked a few other posts and either they didn't contain the information I need or I didn't understand them. I want to make this program print the sentence for every entry in the nested dictionary, and maybe also make a function to do this as well (not familiar with these yet).
I know it will use a for loop but what I can't figure out is how to configure the keys(?).
people = {
1: {
'name': 'David Wallace',
'age': 50,
'occupation': 'CFO',
'ethnicity': 'American',
'location': 'New York'
},
2: {
'name': 'Michael',
'age': 42,
'occupation': 'Regional Manager',
'ethnicity': 'American',
'location': 'Scranton, Pennsylvania'
},
3: {
'name': 'Jim',
'age': 27,
'occupation': 'Sales Rep',
'ethnicity': 'American',
'location': 'Scranton, Pennsylvania'
}
}
print('{name} is a {age} year-old {ethnicity} {occupation} from {location}.'.format(**people))
You're really treating the top-level dict more like a list, so you can write a for loop traversing the top-level like so:
people = {
1: {
'name': 'David Wallace',
'age': 50,
'occupation': 'CFO',
'ethnicity': 'American',
'location': 'New York'
},
2: {
'name': 'Michael',
'age': 42,
'occupation': 'Regional Manager',
'ethnicity': 'American',
'location': 'Scranton, Pennsylvania'
},
3: {
'name': 'Jim',
'age': 27,
'occupation': 'Sales Rep',
'ethnicity': 'American',
'location': 'Scranton, Pennsylvania'
}
}
for person in people.values():
    print('{name} is a {age} year-old {ethnicity} {occupation} from {location}.'.format(**person))
The full reference for Python dictionaries is here: https://docs.python.org/3/library/stdtypes.html#dict.items
Edit: Thanks to user Chris Charley for the suggestion to use people.values() instead of people.items()
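The question also asks about wrapping this in a function. A minimal sketch (the name `describe_people` is hypothetical) that returns the sentences instead of printing them, so the caller decides what to do with them. `str.format_map()` takes the inner dict directly, which is equivalent to `format(**person)` here:

```python
TEMPLATE = '{name} is a {age} year-old {ethnicity} {occupation} from {location}.'


def describe_people(people, template=TEMPLATE):
    """Return one formatted sentence per inner dict, in insertion order."""
    return [template.format_map(person) for person in people.values()]


people = {
    1: {'name': 'Jim', 'age': 27, 'occupation': 'Sales Rep',
        'ethnicity': 'American', 'location': 'Scranton, Pennsylvania'},
}
for sentence in describe_people(people):
    print(sentence)
```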

Getting a KeyError: venues error in FourSquare/Python call

OK, I'm a newbie and I think I'm doing everything I should be, but I am still getting a KeyError: venues. (I also tried using "venue" instead, and I have not reached my daily quota at FourSquare.) I am using a Jupyter Notebook to do this.
Using this code:
import requests

VERSION = '20200418'
RADIUS = 1000
LIMIT = 2
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, RADIUS, LIMIT)
url
results = requests.get(url).json()
I get 2 results (shown at end of this post)
When I try to take those results and put them into a DataFrame, I get "KeyError: venues":
# assign relevant part of JSON to venues
venues = results['response']['venues']
# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-29-5acf500bf9ad> in <module>
1 # assign relevant part of JSON to venues
----> 2 venues = results['response']['venues']
3
4 # tranform venues into a dataframe
5 dataframe = json_normalize(venues)
KeyError: 'venues'
I'm not really sure where I am going wrong... This has worked for me with other locations... But then again, like I said, I'm new at this... (I haven't maxed out my queries, and I've tried using "venue" instead)... Thank you
FourSquareResults:
{'meta': {'code': 200, 'requestId': '5ec42de01a4b0a001baa10ff'},
'response': {'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'warning': {'text': "There aren't a lot of results near you. Try something more general, reset your filters, or expand the search area."},
'headerLocation': 'Cranford',
'headerFullLocation': 'Cranford',
'headerLocationGranularity': 'city',
'totalResults': 20,
'suggestedBounds': {'ne': {'lat': 40.67401708586377,
'lng': -74.29300815204098},
'sw': {'lat': 40.65601706786374, 'lng': -74.31669390523408}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '4c13c8d2b7b9c928d127aa37',
'name': 'Cranford Canoe Club',
'location': {'address': '250 Springfield Ave',
'crossStreet': 'Orange Avenue',
'lat': 40.66022488705574,
'lng': -74.3061084180977,
'labeledLatLngs': [{'label': 'display',
'lat': 40.66022488705574,
'lng': -74.3061084180977},
{'label': 'entrance', 'lat': 40.660264, 'lng': -74.306191}],
'distance': 543,
'postalCode': '07016',
'cc': 'US',
'city': 'Cranford',
'state': 'NJ',
'country': 'United States',
'formattedAddress': ['250 Springfield Ave (Orange Avenue)',
'Cranford, NJ 07016',
'United States']},
'categories': [{'id': '4f4528bc4b90abdf24c9de85',
'name': 'Athletics & Sports',
'pluralName': 'Athletics & Sports',
'shortName': 'Athletics & Sports',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/sports_outdoors_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []},
'venuePage': {'id': '60380091'}},
'referralId': 'e-0-4c13c8d2b7b9c928d127aa37-0'},
{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '4d965995e07ea35d07e2bd02',
'name': 'Mizu Sushi',
'location': {'address': '103 Union Ave.',
'lat': 40.65664427772896,
'lng': -74.30343966195308,
'labeledLatLngs': [{'label': 'display',
'lat': 40.65664427772896,
'lng': -74.30343966195308}],
'distance': 939,
'postalCode': '07016',
'cc': 'US',
'city': 'Cranford',
'state': 'NJ',
'country': 'United States',
'formattedAddress': ['103 Union Ave.',
'Cranford, NJ 07016',
'United States']},
'categories': [{'id': '4bf58dd8d48988d1d2941735',
'name': 'Sushi Restaurant',
'pluralName': 'Sushi Restaurants',
'shortName': 'Sushi',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/sushi_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []}},
'referralId': 'e-0-4d965995e07ea35d07e2bd02-1'}]}]}}
Look more closely at the response you're getting: there's no "venues" key in it. The closest one I see is the "groups" list, which contains an "items" list, and the individual items each have a "venue" key.
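Following that path, here is a sketch that collects the venue dicts. It assumes the response has the shape shown above (trimmed to a stub here); `extract_venues` is a hypothetical helper name, and in the notebook `results` would come from `requests.get(url).json()`. The resulting list could then be passed to `json_normalize` as in the question:

```python
def extract_venues(results):
    """Collect results['response']['groups'][*]['items'][*]['venue']."""
    venues = []
    for group in results['response'].get('groups', []):
        for item in group.get('items', []):
            if 'venue' in item:
                venues.append(item['venue'])
    return venues


# Trimmed stand-in for the explore response shown in the question
results = {'response': {'groups': [{'items': [
    {'venue': {'id': '4c13c8d2b7b9c928d127aa37', 'name': 'Cranford Canoe Club'}},
    {'venue': {'id': '4d965995e07ea35d07e2bd02', 'name': 'Mizu Sushi'}},
]}]}}

venues = extract_venues(results)
print([v['name'] for v in venues])  # ['Cranford Canoe Club', 'Mizu Sushi']
```

Using `.get()` with empty defaults means a response without groups (for example, one with only a warning) yields an empty list instead of another KeyError.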

Get value from data-set field sublist

I have a dataset (which pulls its data from a dict) that I am attempting to clean and republish. Within this dataset, there is a field containing nested data that I would like to extract specific values from.
Here's the data:
[{'id': 'oH58h122Jpv47pqXhL9p_Q', 'alias': 'original-pizza-brooklyn-4', 'name': 'Original Pizza', 'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/HVT0Vr_Vh52R_niODyPzCQ/o.jpg', 'is_closed': False, 'url': 'https://www.yelp.com/biz/original-pizza-brooklyn-4?adjust_creative=IelPnWlrTpzPtN2YRie19A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=IelPnWlrTpzPtN2YRie19A', 'review_count': 102, 'categories': [{'alias': 'pizza', 'title': 'Pizza'}], 'rating': 4.0, 'coordinates': {'latitude': 40.63781, 'longitude': -73.8963799}, 'transactions': [], 'price': '$', 'location': {'address1': '9514 Ave L', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11236', 'country': 'US', 'state': 'NY', 'display_address': ['9514 Ave L', 'Brooklyn, NY 11236']}, 'phone': '+17185313559', 'display_phone': '(718) 531-3559', 'distance': 319.98144420799355},
Here's how the data is presented within the csv/spreadsheet:
location
{'address1': '9514 Ave L', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11236', 'country': 'US', 'state': 'NY', 'display_address': ['9514 Ave L', 'Brooklyn, NY 11236']}
Is there a way to pull location.city for example?
The below code simply adds a few fields and exports it to a csv.
import os

import pandas as pd


def data_set(data):
    df = pd.DataFrame(data)
    # get_zip() and get_region() are the asker's own helpers (not shown)
    df['zip'] = get_zip()
    df['region'] = get_region()
    newdf = df.filter(['name', 'phone', 'location', 'zip', 'region', 'coordinates', 'rating', 'review_count',
                       'categories', 'url'], axis=1)
    # check the same file that is written below
    if not os.path.isfile('data.csv'):
        newdf.to_csv('data.csv', header=True)
    else:  # else it exists so append without writing the header
        newdf.to_csv('data.csv', mode='a', header=False)
If that doesn't make sense, please let me know. Thanks in advance!
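One way to get a nested field such as `location.city` is to read it out of each record's `location` dict before building the DataFrame. This is a stdlib sketch, assuming `data` is the list of business dicts shown above (trimmed here); the helper name `add_city` is hypothetical. In pandas, `pd.json_normalize(data)` would instead flatten the nested dicts into columns named like `location.city`.

```python
def add_city(businesses):
    """Return copies of the records with a top-level 'city' taken from 'location'."""
    rows = []
    for business in businesses:
        row = dict(business)  # shallow copy, so the input list is untouched
        # Fall back to None when 'location' or 'city' is missing
        row['city'] = (business.get('location') or {}).get('city')
        rows.append(row)
    return rows


data = [{'name': 'Original Pizza',
         'location': {'address1': '9514 Ave L', 'city': 'Brooklyn',
                      'zip_code': '11236', 'state': 'NY'}}]

for row in add_city(data):
    print(row['name'], row['city'])  # Original Pizza Brooklyn
```

The returned rows can then go into `pd.DataFrame(...)`, with `'city'` added to the `filter` list.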

Add two dictionaries into a json

I am trying to write two dictionaries into a JSON one after another in Python.
I have made two dictionaries which look like this:
dictionary_quant =
{'dmin': [0.003163, 14.325], 'magNst': [0.0, 414.0], 'horizontalError': [0.12, 12.9], 'nst': [3.0, 96.0], 'depth': [-3.09, 581.37], 'latitude': [-43.3468, 67.1524], 'rms': [0.0, 1.49], 'depthError': [0.0, 32.0], 'magError': [0.0, 1.34], 'mag': [-0.57, 6.9], 'gap': [18.0, 342.0], 'longitude': [-179.8024, 179.3064]}
dictionary_categorical =
{'magType': ['ml', 'md', 'mb', 'mb_lg', 'mwr', 'Md', 'mwb', nan, 'mww'], 'net': ['ci', 'nc', 'us', 'ak', 'mb', 'uw', 'nn', 'pr', 'se', 'nm', 'ismpkansas', 'hv', 'uu'], 'type': ['earthquake', 'explosion'], 'status': ['reviewed', 'automatic'], 'locationSource': ['ci', 'nc', 'us', 'ak', 'mb', 'uw', 'nn', 'pr', 'se', 'nm', 'ismp', 'hv', 'uu', 'ott', 'guc'], 'magSource': ['ci', 'nc', 'us', 'ak', 'mb', 'uw', 'nn', 'pr', 'se', 'nm', 'ismp', 'hv', 'uu', 'ott', 'guc']}
I am trying to write JSON that looks like this:
data = [
{
'name' : 'dmin',
'type' : 'quant',
'minmax' : [0.003163, 14.325]
},
{
'name' : 'magNst',
'type' : 'quant',
'minmax' : [0.0, 414.0]
},
{....},
{....},
{
'name' : 'magType',
'type' : 'categor',
'categories' : ['ml', 'md', 'mb', 'mb_lg', 'mwr', 'Md', 'mwb', nan, 'mww']
},
{
'name' : 'net',
'type' : 'categor',
'categories' : ['ci', 'nc', 'us', 'ak', 'mb', 'uw', 'nn', 'pr', 'se', 'nm', 'ismpkansas', 'hv', 'uu']
}
]
Assuming that the exact output format can be flexible (see the comment in the code), this can be done as follows.
import json
dictionary_quant = {'dmin': [0.003163, 14.325], 'magNst': [0.0, 414.0], 'horizontalError': [0.12, 12.9], 'nst': [3.0, 96.0], 'depth': [-3.09, 581.37], 'latitude': [-43.3468, 67.1524], 'rms': [0.0, 1.49], 'depthError': [0.0, 32.0], 'magError': [0.0, 1.34], 'mag': [-0.57, 6.9], 'gap': [18.0, 342.0], 'longitude': [-179.8024, 179.3064]}
# Replaced the undefined keyword / variable "nan" with None
dictionary_categorical = {'magType': ['ml', 'md', 'mb', 'mb_lg', 'mwr', 'Md', 'mwb', None, 'mww'], 'net': ['ci', 'nc', 'us', 'ak', 'mb', 'uw', 'nn', 'pr', 'se', 'nm', 'ismpkansas', 'hv', 'uu'], 'type': ['earthquake', 'explosion'], 'status': ['reviewed', 'automatic'], 'locationSource': ['ci', 'nc', 'us', 'ak', 'mb', 'uw', 'nn', 'pr', 'se', 'nm', 'ismp', 'hv', 'uu', 'ott', 'guc'], 'magSource': ['ci', 'nc', 'us', 'ak', 'mb', 'uw', 'nn', 'pr', 'se', 'nm', 'ismp', 'hv', 'uu', 'ott', 'guc']}
# Start with an empty data list
data = []
# Add each item in dictionary_quant with type set to "quant" and the
# value on key "minmax"
for k, v in dictionary_quant.items():
    data.append({'type': 'quant',
                 'name': k,
                 'minmax': v})
# Add each item in dictionary_categorical with type set to "categor"
# and the value on key "categories"
for k, v in dictionary_categorical.items():
    data.append({'type': 'categor',
                 'name': k,
                 'categories': v})
# Note: The json.dumps() function will output list attribute elements
# one-per-line when using indented output.
print(json.dumps(data, indent=4))
Assuming you know beforehand the type of each subsequent dictionary, you could do the following:
def format_data(data, data_type, value_name):
    return [{'name': key, 'type': data_type, value_name: val} for key, val in data.items()]
where data is your dict, data_type is either quant or categor, and value_name is either minmax or categories.
Combining both dictionaries then looks like this:
combined = format_data(dictionary_quant, 'quant', 'minmax') + format_data(dictionary_categorical, 'categor', 'categories')
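One detail neither answer covers: the `nan` in the question's data is most likely a float NaN (for example from pandas' `unique()`), and `json.dumps` writes it as the bare token `NaN`, which is not valid JSON. A sketch, assuming that is where the value comes from, that maps NaN to `None` (written as `null`) before dumping; `clean_nan` is a hypothetical helper name:

```python
import json
import math


def clean_nan(values):
    """Replace float NaN entries in a list with None so they serialize as null."""
    return [None if isinstance(v, float) and math.isnan(v) else v for v in values]


dictionary_categorical = {'magType': ['ml', 'md', float('nan'), 'mww']}
data = [{'name': k, 'type': 'categor', 'categories': clean_nan(v)}
        for k, v in dictionary_categorical.items()]

# allow_nan=False makes json.dumps raise ValueError if a NaN slipped through
text = json.dumps(data, allow_nan=False)
print(text)
# [{"name": "magType", "type": "categor", "categories": ["ml", "md", null, "mww"]}]
```

To write a file instead of a string, `json.dump(data, f, indent=4, allow_nan=False)` takes the same keyword arguments.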

Python Dictionary comprehension for a list of dictionaries

I want to create a dictionary from the following list:
[{'fips': '01001', 'state': 'AL', 'name': 'Autauga County'}, {'fips': '20005', 'state': 'KS', 'name': 'Atchison County'}, {'fips': '47145', 'state': 'TN', 'name': 'Roane County'}]
The result should have the name as the key and 'United States' as the value.
e.g.:
{'Autauga County': 'United States', 'Atchison County' : 'United States', 'Roane County' : 'United States'}
I can do this with a couple of for loops, but I want to learn how to do it using dictionary comprehensions.
in_list = [{'fips': '01001', 'state': 'AL', 'name': 'Autauga County'},
{'fips': '20005', 'state': 'KS', 'name': 'Atchison County'},
{'fips': '47145', 'state': 'TN', 'name': 'Roane County'}]
out_dict = {x['name']: 'United States' for x in in_list if 'name' in x}
Some notes for learning:
Dictionary comprehensions require Python 2.7 or later (list comprehensions are older).
Dictionary comprehensions are very similar to list comprehensions, except that they use curly braces {} and key: value pairs.
In case you didn't know, you can also add a filter condition after the for clause in a comprehension, such as [x for x in some_list if cond].
For completeness, if you can't use comprehensions, try this:
out_dict = {}
for dict_item in in_list:
    if not isinstance(dict_item, dict):
        continue
    if 'name' in dict_item:
        in_name = dict_item['name']
        out_dict[in_name] = 'United States'
As mentioned in the comments, for Python 2.6 you can replace the {k: v for k,v in iterator} with:
dict((k,v) for k,v in iterator)
You can read more about this in this question
Happy Coding!
Here's a small solution that works in both Python 2.7.x and Python 3.x:
data = [
{'fips': '01001', 'state': 'AL', 'name': 'Autauga County'},
{'fips': '20005', 'state': 'KS', 'name': 'Atchison County'},
{'fips': '47145', 'state': 'TN', 'name': 'Roane County'},
{'fips': 'xxx', 'state': 'yyy'}
]
output = {item['name']: 'United States' for item in data if 'name' in item}
print(output)
The dictionary comprehension version is:
location_list = [{'fips': '01001', 'state': 'AL', 'name': 'Autauga County'},
{'fips': '20005', 'state': 'KS', 'name': 'Atchison County'},
{'fips': '47145', 'state': 'TN', 'name': 'Roane County'}]
location_dict = {location['name']:'United States' for location in location_list}
Output (Python 3.7+ dicts keep insertion order):
{'Autauga County': 'United States', 'Atchison County': 'United States',
'Roane County': 'United States'}
If you search Stack Overflow for "dictionary comprehension", solutions using the {key: value for ...} comprehension syntax show up: Python Dictionary Comprehension
That should do the trick for you
states = [{'fips': '01001', 'state': 'AL', 'name': 'Autauga County'}, {'fips': '20005', 'state': 'KS', 'name': 'Atchison County'}, {'fips': '47145', 'state': 'TN', 'name': 'Roane County'}]
{elem['name']: 'United States' for elem in states}
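Since every key maps to the same value here, `dict.fromkeys()` is another option worth knowing: it takes an iterable of keys and a single value. A short sketch using the question's records, plus one record without a name to show the filter:

```python
in_list = [{'fips': '01001', 'state': 'AL', 'name': 'Autauga County'},
           {'fips': '20005', 'state': 'KS', 'name': 'Atchison County'},
           {'fips': 'xxx', 'state': 'yyy'}]

# Only records that actually have a 'name' contribute a key
out_dict = dict.fromkeys((x['name'] for x in in_list if 'name' in x), 'United States')
print(out_dict)
# {'Autauga County': 'United States', 'Atchison County': 'United States'}
```

Every key shares the same value object, which is harmless for a string but means a mutable value such as `[]` would be shared across all keys; use a comprehension in that case.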
