I'm building an API for my app in Python and Flask. I'm trying to output some data in JSON format; however, I get weirdly formatted data from my SQL query. I would like to see numbers (e.g. the initial_price field) as plain numbers, not as Decimal('3.99'), and similarly for the id and timestamp formats.
Also, is this the right way to produce JSON?
This is output from my API:
$ curl 127.0.0.1:5000/1/product?code="9571%2F702"
[{'update_time': datetime.datetime(2013, 1, 7, 22, 25, 50), 'code': '9571/702', 'description': '', 'gender': '', 'brand': 'Zara', 'initial_price': Decimal('3.99'), 'image_link': 'http://static.zara.net/photos//2012/I/0/3/p/9571/702/401/9571702401_1_1_3.jpg', 'currency': 'GBP', 'colors': 'NAVY', 'link': 'http://www.zara.com/webapp/wcs/stores/servlet/product/uk/en/zara-neu-W2012-s/341501/883021/', 'current_price': Decimal('990.00'), 'original_category': 'Girl (2-14 years)', 'id': 1623L, 'name': '"I LOVE ..." T-SHIRT'},
{'update_time': datetime.datetime(2013, 1, 7, 22, 25, 50), 'code': '9571/702', 'description': '', 'gender': '', 'brand': 'Zara', 'initial_price': Decimal('3.99'), 'image_link': 'http://static.zara.net/photos//2012/I/0/3/p/9571/702/401/9571702401_1_1_3.jpg', 'currency': 'GBP', 'colors': 'LIGHT', 'link': 'http://www.zara.com/webapp/wcs/stores/servlet/product/uk/en/zara-neu-W2012-s/341501/883021/', 'current_price': Decimal('990.00'), 'original_category': 'Girl (2-14 years)', 'id': 1624L, 'name': '"I LOVE ..." T-SHIRT'},
{'update_time': datetime.datetime(2013, 1, 7, 22, 25, 50), 'code': '9571/702', 'description': '', 'gender': '', 'brand': 'Zara', 'initial_price': Decimal('3.99'), 'image_link': 'http://static.zara.net/photos//2012/I/0/3/p/9571/702/401/9571702401_1_1_3.jpg', 'currency': 'GBP', 'colors': 'ECRU', 'link': 'http://www.zara.com/webapp/wcs/stores/servlet/product/uk/en/zara-neu-W2012-s/341501/883021/', 'current_price': Decimal('990.00'), 'original_category': 'Girl (2-14 years)', 'id': 1625L, 'name': '"I LOVE ..." T-SHIRT'}]
My code is as follows:
from flask import Flask, url_for, session, redirect, escape, request
from subprocess import Popen, PIPE
import socket
import MySQLdb
import urllib
@app.route('/1/product')
def product_search():
    [some not important stuff here...]
    # creating list of codes with lowest Damerau-Levenshtein numbers
    best_matching_codes = []
    for k, v in lv:
        if v == min:
            best_matching_codes.append(k)
    # returning JSON with best matching products info
    products_json = []
    for code in best_matching_codes:
        cur = db.cursor()
        query = "SELECT * FROM %s WHERE code LIKE '%s'" % (PRODUCTS_TABLE_NAME, product_code)
        cur.execute(query)
        columns = [desc[0] for desc in cur.description]
        rows = cur.fetchall()
        for row in rows:
            products_json.append(dict((k, v) for k, v in zip(columns, row)))
    return str(products_json)
Use the json module. It will output proper json from a dict:
import json
return json.dumps(products_json)
Using str does not produce valid json!
You need to use a json library to produce json.
Add to top:
import json
Change last line to:
return json.dumps(products_json)
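Note that json.dumps will still raise a TypeError on the Decimal and datetime values in these rows, so you will probably also need to tell it how to serialize them. A minimal sketch using a custom encoder (the ApiEncoder name is just an illustration):

import datetime
import decimal
import json

class ApiEncoder(json.JSONEncoder):
    """Serialize the Decimal and datetime values json can't handle natively."""
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return float(o)   # or str(o) if you want to avoid float rounding
        if isinstance(o, datetime.datetime):
            return o.isoformat()
        return super().default(o)

# in the view, instead of return str(products_json):
# return json.dumps(products_json, cls=ApiEncoder)

In a Flask view you could also reach for flask.jsonify, which sets the application/json Content-Type header for you, though depending on your Flask version it may need the same kind of help with Decimal values.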
I have 70k files all of which look similar to this:
{'id': 24, 'name': None, 'city': 'City', 'region_id': 19,
'story_id': 1, 'description': 'text', 'uik': None, 'ustatus': 'status',
'wuiki_tik_name': '', 'reaction': None, 'reaction_official': '',
'created_at': '2011-09-07T07:24:44.420Z', 'lat': 54.7, 'lng': 20.5,
'regions': {'id': 19, 'name': 'name'}, 'stories': {'id': 1, 'name': '2011-12-04'}, 'assets': [], 'taggings': [{'tags': {'id': 6, 'name': 'name',
'tag_groups': {'id': 3, 'name': 'Violation'}}},
{'tags': {'id': 8, 'name': 'name', 'tag_groups': {'id': 5, 'name': 'resource'}}},
{'tags': {'id': 1, 'name': '01. Federal', 'tag_groups': {'id': 1, 'name': 'Level'}}},
{'tags': {'id': 3, 'name': '03. Local', 'tag_groups': {'id': 1, 'name': 'stuff'}}},
{'tags': {'id': 2, 'name': '02. Regional', 'tag_groups':
{'id': 1, 'name': 'Level'}}}], 'message_id': None, '_count': {'assets': 0, 'other_messages': 0, 'similars': 0, 'taggings': 5}}
The ultimate goal is to export everything into a single CSV file. That can be done successfully without flattening, but since the data has a lot of nested values I would like to flatten it, and this is where I began running into problems related to data types. Here's the code:
import json
from pandas.io.json import json_normalize
import glob
path = glob.glob("all_messages/*.json")
for file in path:
    with open(file, "r") as filer:
        content = json.loads(json.dumps(filer.read()))
    if content != 404:
        df_main = json_normalize(content)
        df_regions = json_normalize(content, record_path=['regions'], record_prefix='regions.', meta=['id'])
        df_stories = json_normalize(content, record_path=['stories'], record_prefix='stories.', meta=['id'])
        # ... More code related to normalization

df_out.to_csv('combined_json.csv')
This code occasionally throws AttributeError: 'str' object has no attribute 'values' or ValueError: DataFrame constructor not properly called!. I realise that this is caused by the JSON-string output of json.dumps(). However, I have failed to turn it into anything usable.
Any possible solutions to this?
If you only need to change ' to ":
...
for file in path:
    with open(file, "r") as filer:
        # read the text and swap the single quotes for double quotes
        content = filer.read().replace("'", '"')
...
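A plain quote swap breaks on values like None and on apostrophes inside strings, though. Since these files are Python dict literals rather than JSON, a more robust sketch (assuming each file parses as a Python literal) uses ast.literal_eval:

import ast
import glob

from pandas import json_normalize  # modern import path for json_normalize

for file in glob.glob("all_messages/*.json"):
    with open(file, "r") as filer:
        # literal_eval parses Python dict syntax, including None and nested quotes
        content = ast.literal_eval(filer.read())
    df_main = json_normalize(content)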
Making copies and using grep would be easier
While it is not the solution I was initially expecting, this approach worked as well. I kept getting error messages about the structure of the dict literals, which were reluctant to become JSON, so I took the CSV file that I wanted to normalise and worked with each column one by one:
df = pd.read_csv("combined_json.csv")
df['regions'] = df['regions'].apply(lambda x: x.replace("'", '"'))
regions = pd.json_normalize(df['regions'].apply(json.loads).tolist()).rename(
    columns=lambda x: x.replace('regions.', ''))
df['regions'] = regions['name']
Or, if it had more nested levels:
df['taggings'] = df['taggings'].apply(lambda x: x.replace("'", '"'))
taggings = pd.concat([pd.json_normalize(json.loads(j)) for j in df['taggings']])
df = df.reset_index(drop=True)
taggings = taggings.reset_index(drop=True)
df[['tags_id', 'nametag', 'group_tag', 'group_tag_name']] = taggings[['tags.id', 'tags.name', 'tags.tag_groups.id', 'tags.tag_groups.name']]
The result was eventually written out with df.to_csv().
I'm making a call to an API which returns a JSON response, from which I am then trying to retrieve certain data.
{'data': {'9674': {'category': 'token',
'contract_address': [{'contract_address': '0x2a3bff78b79a009976eea096a51a948a3dc00e34',
'platform': {'coin': {'id': '1027',
'name': 'Ethereum',
'slug': 'ethereum',
'symbol': 'ETH'},
'name': 'Ethereum'}}],
'date_added': '2021-05-10T00:00:00.000Z',
'date_launched': '2021-05-10T00:00:00.000Z',
'description': 'Wilder World (WILD) is a cryptocurrency '
'launched in 2021and operates on the '
'Ethereum platform. Wilder World has a '
'current supply of 500,000,000 with '
'83,683,300.17 in circulation. The last '
'known price of Wilder World is 2.28165159 '
'USD and is down -6.79 over the last 24 '
'hours. It is currently trading on 21 active '
'market(s) with $2,851,332.76 traded over '
'the last 24 hours. More information can be '
'found at https://www.wilderworld.com/.',
'id': 9674,
'is_hidden': 0,
'logo': 'https://s2.coinmarketcap.com/static/img/coins/64x64/9674.png',
'name': 'Wilder World',
'notice': '',
'platform': {'id': 1027,
'name': 'Ethereum',
'slug': 'ethereum',
'symbol': 'ETH',
'token_address': '0x2a3bff78b79a009976eea096a51a948a3dc00e34'},
'self_reported_circulating_supply': 19000000,
'self_reported_tags': None,
'slug': 'wilder-world',
'subreddit': '',
'symbol': 'WILD',
'tag-groups': ['INDUSTRY',
'CATEGORY',
'INDUSTRY',
'CATEGORY',
'CATEGORY',
'CATEGORY',
'CATEGORY'],
'tag-names': ['VR/AR',
'Collectibles & NFTs',
'Gaming',
'Metaverse',
'Polkastarter',
'Animoca Brands Portfolio',
'SkyVision Capital Portfolio'],
'tags': ['vr-ar',
'collectibles-nfts',
'gaming',
'metaverse',
'polkastarter',
'animoca-brands-portfolio',
'skyvision-capital-portfolio'],
'twitter_username': 'WilderWorld',
'urls': {'announcement': [],
'chat': [],
'explorer': ['https://etherscan.io/token/0x2a3bff78b79a009976eea096a51a948a3dc00e34'],
'facebook': [],
'message_board': ['https://medium.com/@WilderWorld'],
'reddit': [],
'source_code': [],
'technical_doc': [],
'twitter': ['https://twitter.com/WilderWorld'],
'website': ['https://www.wilderworld.com/']}}},
'status': {'credit_count': 1,
'elapsed': 7,
'error_code': 0,
'error_message': None,
'notice': None,
'timestamp': '2022-01-20T21:33:04.832Z'}}
The data I am trying to get is 'logo': 'https://s2.coinmarketcap.com/static/img/coins/64x64/9674.png', but this sits within [data][9674][logo].
But as this script runs in the background for other objects, I won't know what the number [9674] is for other requests.
So is there a way to get that number automatically?
[data] will always be consistent.
I'm using this to get the data back:
import json
import pprint

from requests import Session

session = Session()
session.headers.update(headers)
response = session.get(url, params=parameters)
pprint.pprint(json.loads(response.text)['data']['9674']['logo'])
You can try this:
session = Session()
session.headers.update(headers)
response = session.get(url, params=parameters)
resp = json.loads(response.text)
pprint.pprint(resp['data'][next(iter(resp['data']))]['logo'])
where next(iter(resp['data'])) returns the first key in the resp['data'] dict; in your example that is '9674'.
With .keys() you get a view of all keys in a dictionary.
So you can use keys = list(json.loads(response.text)['data'].keys()) to get the keys in the data dict (wrapping it in list() because a dict view cannot be indexed directly).
If you know there is always only one entry in 'data', you could use json.loads(response.text)['data'][keys[0]]['logo']. Otherwise you would need to iterate over all keys in the list and check which one you need, as in the sketch below.
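A minimal sketch of that iteration, assuming resp is the parsed response from the earlier answer:

resp = json.loads(response.text)

# walk every entry under 'data', whatever its numeric key happens to be
for coin_id, coin in resp['data'].items():
    print(coin_id, coin['logo'])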
I would like some direction on how I can access the data and make some modifications, for example accessing and listing only the emails.
import requests,json
api = "https://reqres.in/api/users?page=2"
test = requests.get(api)
x = test.json()
data_structure = []
data_structure.append(x)
print(data_structure)
Output
[{'page': 2, 'per_page': 6, 'total': 12, 'total_pages': 2, 'data': [{'id': 7, 'email': 'michael.lawson@reqres.in', 'first_name': 'Michael', 'last_name': 'Lawson', 'avatar': 'https://reqres.in/img/faces/7-image.jpg'}, {'id': 8, 'email': 'lindsay.ferguson@reqres.in', 'first_name': 'Lindsay', 'last_name': 'Ferguson', 'avatar': 'https://reqres.in/img/faces/8-image.jpg'}, {'id': 9, 'email': 'tobias.funke@reqres.in', 'first_name': 'Tobias', 'last_name': 'Funke', 'avatar': 'https://reqres.in/img/faces/9-image.jpg'}, {'id': 10, 'email': 'byron.fields@reqres.in', 'first_name': 'Byron', 'last_name': 'Fields', 'avatar': 'https://reqres.in/img/faces/10-image.jpg'}, {'id': 11, 'email': 'george.edwards@reqres.in', 'first_name': 'George', 'last_name': 'Edwards', 'avatar': 'https://reqres.in/img/faces/11-image.jpg'}, {'id': 12, 'email': 'rachel.howell@reqres.in', 'first_name': 'Rachel', 'last_name': 'Howell', 'avatar': 'https://reqres.in/img/faces/12-image.jpg'}], 'support': {'url': 'https://reqres.in/#support-heading', 'text': 'To keep ReqRes free, contributions towards server costs are appreciated!'}}]
First, I highly recommend installing the JSON Viewer extension, which will help you a lot in seeing what's going on in your API.
https://chrome.google.com/webstore/detail/json-viewer/gbmdgpbipfallnflgajpaliibnhdgobh?hl=es
Then, you don't need to create a new list, since x = test.json() already gives you the same dictionary you brought from the API.
So your first chunk of code should look like this:
import requests,json
api = "https://reqres.in/api/users?page=2"
test = requests.get(api)
x = test.json()
Then you can access all the data inside that dictionary; for example, let's get all the emails.
To make it easier, you should open the API link in JSON Viewer to see the structure of your dictionary.
To access the emails, we first need to access the "data" key in your dictionary:
import requests,json
api = "https://reqres.in/api/users?page=2"
test = requests.get(api)
x = test.json()
data_list = x["data"]
After that you can see that data_list is a new list of dictionaries with all the data from each element of your API (in your case, each id).
So finally, to access the emails, we need to iterate through that list and get the "email" key from each dictionary in the list.
import requests,json
api = "https://reqres.in/api/users?page=2"
test = requests.get(api)
x = test.json()
data_list = x["data"]
for i in data_list:
    print(i["email"])
And that my friend, is how you get info from an API, the same way you manipulate lists and dictionaries.
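For reference, the same extraction fits in a one-line list comprehension (a sketch using the same x as above):

# collect every user's email from the 'data' list in one pass
emails = [user["email"] for user in x["data"]]
print(emails)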
This is from an R guy.
I have this mess in a Pandas column: data['crew'].
array(["[{'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production', 'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor', 'profile_path': None}, {'credit_id': '56407fa89251417055000b58', 'department': 'Sound', 'gender': 0, 'id': 6745, 'job': 'Music Editor', 'name': 'Richard Henderson', 'profile_path': None}, {'credit_id': '5789212392514135d60025fd', 'department': 'Production', 'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production', 'name': 'Jeffrey Stott', 'profile_path': None}, {'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up', 'gender': 0, 'id': 23783, 'job': 'Makeup Artist', 'name': 'Heather Plott', 'profile_path': None}
It goes on for quite some time. Each new dict starts with a credit_id field. One cell can hold several dicts in an array.
Assume I want the names of all Casting directors, as shown in the first entry. I need to check the job entry in every dict and, if it's Casting, grab what's in the name field and store it in my data frame in data['crew'].
I tried several strategies, then backed off and went for something simple.
Running the following shut me down, so I can't even access a simple field. How can I get this done in Pandas?
for row in data.head().iterrows():
    if row['crew'].job == 'Casting':
        print(row['crew'])
EDIT: Error Message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-138-aa6183fdf7ac> in <module>()
1 for row in data.head().iterrows():
----> 2 if row['crew'].job == 'Casting':
3 print(row['crew'])
TypeError: tuple indices must be integers or slices, not str
EDIT: Code used to get the array of dicts (strings?) in the first place.
import ast

def convert_JSON(data_as_string):
    try:
        dict_representation = ast.literal_eval(data_as_string)
        return dict_representation
    except ValueError:
        return []

data["crew"] = data["crew"].map(
    lambda x: sorted([d['name'] if d['job'] == 'Casting' else '' for d in convert_JSON(x)])
).map(lambda x: ','.join(map(str, x)))
To create a DataFrame from your sample data, write:
import pandas as pd

df = pd.DataFrame(data=[
    {'credit_id': '54d5356ec3a3683ba0000039', 'department': 'Production',
     'gender': 1, 'id': 494, 'job': 'Casting', 'name': 'Terri Taylor',
     'profile_path': None},
    {'credit_id': '56407fa89251417055000b58', 'department': 'Sound',
     'gender': 0, 'id': 6745, 'job': 'Music Editor',
     'name': 'Richard Henderson', 'profile_path': None},
    {'credit_id': '5789212392514135d60025fd', 'department': 'Production',
     'gender': 2, 'id': 9250, 'job': 'Executive In Charge Of Production',
     'name': 'Jeffrey Stott', 'profile_path': None},
    {'credit_id': '57892074c3a36835fa002886', 'department': 'Costume & Make-Up',
     'gender': 0, 'id': 23783, 'job': 'Makeup Artist',
     'name': 'Heather Plott', 'profile_path': None}])
Then you can get your data with a single instruction:
df[df.job == 'Casting'].name
The result is:
0    Terri Taylor
Name: name, dtype: object
The above result is a Pandas Series object with the names found.
In this case, 0 is the index value of the record found and
Terri Taylor is the name of the (only, in your data) Casting director.
Edit
If you want just a list (not Series), write:
df[df.job == 'Casting'].name.tolist()
The result is ['Terri Taylor'] - just a list.
I think both my solutions should be quicker than an "ordinary" loop
based on iterrows().
When checking execution times, you may also try yet another solution:
df.query("job == 'Casting'").name.tolist()
==========
And as far as your code is concerned:
iterrows() returns, on each iteration, a pair containing:
the key (index) of the current row,
a Series with the content of this row.
So your loop should look something like:
for row in df.iterrows():
    if row[1].job == 'Casting':
        print(row[1]['name'])
You cannot write row[1].name because for a Series that refers to the row's index label
(here we have a collision with the built-in name attribute of the Series).
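Tying this back to the original data['crew'] column of stringified lists, a short sketch (assuming each cell is a complete Python list literal, as the convert_JSON edit above implies) that keeps only the Casting names per row:

import ast

def casting_names(cell):
    # parse the stringified list of dicts, keep only the Casting entries
    crew = ast.literal_eval(cell)
    return ','.join(d['name'] for d in crew if d['job'] == 'Casting')

data['crew'] = data['crew'].map(casting_names)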
I am trying to pull the 'created' field from the Monzo data I'm retrieving.
I have made a call to the Monzo api with the following code:
from monzo.monzo import Monzo
client = Monzo(INSERT API KEY)
data = client.get_transactions("INSERT ACCOUNT NUMBER")
print (data)
and I can't quite get to the data I need, which looks like this:
d': 'merch_000094MPASVBf7xCdrZOz3', 'created': '2016-01-20T21: 26: 33.985Z', 'name': 'DelicedeFrance', 'logo': 'https: //mondo-logo-cache.appspot.com/twitter/deliceuk/?size=large', 'emoji': '🇫🇷', 'category': 'eating_out', 'online': False, 'atm': False, 'address': {'short_formatted': 'LiverpoolStreetStation,
LondonEC2M7PY', 'formatted': 'LiverpoolStreetStation,
LondonEC2M7PY,
UnitedKingdom', 'address': 'LiverpoolStreetStation', 'city': 'London', 'region': 'GreaterLondon', 'country': 'GBR', 'postcode': 'EC2M7PY', 'latitude': 51.518159172221615, 'longitude': -0.08210659649555102, 'zoom_level': 17, 'approximate': False}, 'updated': '2016-02-02T14: 10: 48.664Z', 'metadata': {'foursquare_category': 'Restaurant', 'foursquare_category_icon': 'https: //ss3.4sqi.net/img/categories_v2/food/default_88.png','foursquare_website': '', 'google_places_icon': 'https: //maps.gstatic.com/mapfiles/place_api/icons/restaurant-71.png', 'google_places_name': 'DelicedeFrance', 'suggested_name': 'DelicedeFrance', 'suggested_tags': '#food', 'twitter_id': ''}, 'disable_feedback': False}, 'notes': '', 'metadata': {}, 'account_balance': 3112, 'attachments': [], 'category': 'eating_out', 'is_load': False, 'settled': '2017-04-28T04: 54: 18.167Z', 'local_amount': -199, 'local_currency': 'GBP', 'updated': '2017-04-28T06: 15: 06.095Z', 'counterparty': {}, 'originator': False, 'include_in_spending': True}, {'created': '2017-04-28T08: 54: 10.917Z','amount': -130, 'currency': 'GBP', 'merchant': {'created': '2016-04-21T08: 02: 13.537Z','logo': 'https: //mondo-logo-cache.appspot.com/twitter/MCSaatchiLondon/?size=large', 'emoji': '🍲', 'category': 'eating_out', 'online': False, 'atm': False...
How do I pull the 'created' date?
Try this:
#!/usr/bin/env python
import csv

from pymonzo import MonzoAPI

if __name__ == '__main__':
    monzo_api = MonzoAPI()
    monzo_transactions = monzo_api.transactions()

    with open('monzo_transactions.csv', 'w') as csvfile:
        writer = csv.writer(csvfile)
        for transaction in monzo_transactions:
            writer.writerow([
                transaction.amount, transaction.description,
                transaction.created,
            ])

    print('All done!')
If this is actually valid JSON and you just have paste errors, then you can use the Python library json:
import json
data = json.loads(datastring)
If this is not JSON, you probably have to write a parser of your own.
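Once parsed, pulling every 'created' value is just a dict walk. A minimal sketch, assuming the parsed response has the usual Monzo shape with a top-level 'transactions' list (the snippet above is too garbled to confirm that):

# 'data' as returned by json.loads above; the 'transactions' key is an assumption
for tx in data.get('transactions', []):
    print(tx['created'])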