I have been trying to normalize a deeply nested JSON file that I will later analyze. What I am struggling with is how to go more than one level deep in the normalization.
I went through the pandas.io.json.json_normalize documentation, since it does exactly what I want.
I have been able to normalize part of the file and now understand how the dictionaries work, but I am still not there.
With the code below I can only get the first level:
import json
import pandas as pd
from pandas.io.json import json_normalize

with open('authors_sample.json') as f:
    d = json.load(f)

raw = json_normalize(d['hits']['hits'])

authors = json_normalize(data=d['hits']['hits'],
                         record_path='_source',
                         meta=['_id', ['_source', 'journal'], ['_source', 'title'],
                               ['_source', 'normalized_venue_name']])
I am trying to 'dig' into the 'authors' dictionary with the code below, but record_path = ['_source', 'authors'] throws TypeError: string indices must be integers. As far as I understand json_normalize, the logic should be sound, but I still don't quite understand how to dive into JSON that mixes dicts and lists.
I even went through this simple example.
authors = json_normalize(data=d['hits']['hits'],
                         record_path=['_source', 'authors'],
                         meta=['_id', ['_source', 'journal'], ['_source', 'title'],
                               ['_source', 'normalized_venue_name']])
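In case it helps others reproduce this, here is a minimal, self-contained version of what I am attempting (the sample hits below are made up to mirror the real file's structure; filtering out hits whose authors value is None at least avoids the error on my pandas version):

```python
import pandas as pd
from pandas import json_normalize  # pd.json_normalize on pandas >= 1.0

# made-up sample mirroring the structure of authors_sample.json
hits = [
    {'_id': 'A', '_source': {'authors': None, 'journal': 'J1', 'title': 'T1'}},
    {'_id': 'B', '_source': {'authors': [{'author_id': '1', 'author_name': 'x'}],
                             'journal': 'J2', 'title': 'T2'}},
]

# keep only hits that actually have an authors list
with_authors = [h for h in hits if h['_source'].get('authors')]

authors = json_normalize(with_authors,
                         record_path=['_source', 'authors'],
                         meta=['_id', ['_source', 'journal'], ['_source', 'title']])
print(authors)
```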
Below is a chunk of the json file (5 records).
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
u'hits': {u'hits': [{u'_id': u'7CB3F2AD',
u'_index': u'scibase_listings',
u'_score': 1.0,
u'_source': {u'authors': None,
u'deleted': 0,
u'description': None,
u'doi': u'',
u'is_valid': 1,
u'issue': None,
u'journal': u'Physical Review Letters',
u'link': None,
u'meta_description': None,
u'meta_keywords': None,
u'normalized_venue_name': u'phys rev lett',
u'pages': None,
u'parent_keywords': [u'Chromatography',
u'Quantum mechanics',
u'Particle physics',
u'Quantum field theory',
u'Analytical chemistry',
u'Quantum chromodynamics',
u'Physics',
u'Mass spectrometry',
u'Chemistry'],
u'pub_date': u'1987-03-02 00:00:00',
u'pubtype': None,
u'rating_avg_weighted': 0,
u'rating_clarity': 0.0,
u'rating_clarity_weighted': 0.0,
u'rating_innovation': 0.0,
u'rating_innovation_weighted': 0.0,
u'rating_num_weighted': 0,
u'rating_reproducability': 0,
u'rating_reproducibility_weighted': 0.0,
u'rating_versatility': 0.0,
u'rating_versatility_weighted': 0.0,
u'review_count': 0,
u'tag': [u'mass spectra', u'elementary particles', u'bound states'],
u'title': u'Evidence for a new meson: A quasinuclear NN-bar bound state',
u'userAvg': 0.0,
u'user_id': None,
u'venue_name': u'Physical Review Letters',
u'views_count': 0,
u'volume': None},
u'_type': u'listing'},
{u'_id': u'7AF8EBC3',
u'_index': u'scibase_listings',
u'_score': 1.0,
u'_source': {u'authors': [{u'affiliations': [u'Punjabi University'],
u'author_id': u'780E3459',
u'author_name': u'munish puri'},
{u'affiliations': [u'Punjabi University'],
u'author_id': u'48D92C79',
u'author_name': u'rajesh dhaliwal'},
{u'affiliations': [u'Punjabi University'],
u'author_id': u'7D9BD37C',
u'author_name': u'r s singh'}],
u'deleted': 0,
u'description': None,
u'doi': u'',
u'is_valid': 1,
u'issue': None,
u'journal': u'Journal of Industrial Microbiology & Biotechnology',
u'link': None,
u'meta_description': None,
u'meta_keywords': None,
u'normalized_venue_name': u'j ind microbiol biotechnol',
u'pages': None,
u'parent_keywords': [u'Nuclear medicine',
u'Psychology',
u'Hydrology',
u'Chromatography',
u'X-ray crystallography',
u'Nuclear fusion',
u'Medicine',
u'Fluid dynamics',
u'Thermodynamics',
u'Physics',
u'Gas chromatography',
u'Radiobiology',
u'Engineering',
u'Organic chemistry',
u'High-performance liquid chromatography',
u'Chemistry',
u'Organic synthesis',
u'Psychotherapist'],
u'pub_date': u'2008-04-04 00:00:00',
u'pubtype': None,
u'rating_avg_weighted': 0,
u'rating_clarity': 0.0,
u'rating_clarity_weighted': 0.0,
u'rating_innovation': 0.0,
u'rating_innovation_weighted': 0.0,
u'rating_num_weighted': 0,
u'rating_reproducability': 0,
u'rating_reproducibility_weighted': 0.0,
u'rating_versatility': 0.0,
u'rating_versatility_weighted': 0.0,
u'review_count': 0,
u'tag': [u'flow rate',
u'operant conditioning',
u'packed bed reactor',
u'immobilized enzyme',
u'specific activity'],
u'title': u'Development of a stable continuous flow immobilized enzyme reactor for the hydrolysis of inulin',
u'userAvg': 0.0,
u'user_id': None,
u'venue_name': u'Journal of Industrial Microbiology & Biotechnology',
u'views_count': 0,
u'volume': None},
u'_type': u'listing'},
{u'_id': u'7521A721',
u'_index': u'scibase_listings',
u'_score': 1.0,
u'_source': {u'authors': [{u'author_id': u'7FF872BC',
u'author_name': u'barbara eileen ryan'}],
u'deleted': 0,
u'description': None,
u'doi': u'',
u'is_valid': 1,
u'issue': None,
u'journal': u'The American Historical Review',
u'link': None,
u'meta_description': None,
u'meta_keywords': None,
u'normalized_venue_name': u'american historical review',
u'pages': None,
u'parent_keywords': [u'Social science',
u'Politics',
u'Sociology',
u'Law'],
u'pub_date': u'1992-01-01 00:00:00',
u'pubtype': None,
u'rating_avg_weighted': 0,
u'rating_clarity': 0.0,
u'rating_clarity_weighted': 0.0,
u'rating_innovation': 0.0,
u'rating_innovation_weighted': 0.0,
u'rating_num_weighted': 0,
u'rating_reproducability': 0,
u'rating_reproducibility_weighted': 0.0,
u'rating_versatility': 0.0,
u'rating_versatility_weighted': 0.0,
u'review_count': 0,
u'tag': [u'social movements'],
u'title': u"Feminism and the women's movement : dynamics of change in social movement ideology, and activism",
u'userAvg': 0.0,
u'user_id': None,
u'venue_name': u'The American Historical Review',
u'views_count': 0,
u'volume': None},
u'_type': u'listing'},
{u'_id': u'7DAEB9A4',
u'_index': u'scibase_listings',
u'_score': 1.0,
u'_source': {u'authors': [{u'author_id': u'0299B8E9',
u'author_name': u'fraser j harbutt'}],
u'deleted': 0,
u'description': None,
u'doi': u'',
u'is_valid': 1,
u'issue': None,
u'journal': u'The American Historical Review',
u'link': None,
u'meta_description': None,
u'meta_keywords': None,
u'normalized_venue_name': u'american historical review',
u'pages': None,
u'parent_keywords': [u'Superconductivity',
u'Nuclear fusion',
u'Geology',
u'Chemistry',
u'Metallurgy'],
u'pub_date': u'1988-01-01 00:00:00',
u'pubtype': None,
u'rating_avg_weighted': 0,
u'rating_clarity': 0.0,
u'rating_clarity_weighted': 0.0,
u'rating_innovation': 0.0,
u'rating_innovation_weighted': 0.0,
u'rating_num_weighted': 0,
u'rating_reproducability': 0,
u'rating_reproducibility_weighted': 0.0,
u'rating_versatility': 0.0,
u'rating_versatility_weighted': 0.0,
u'review_count': 0,
u'tag': [u'iron'],
u'title': u'The iron curtain : Churchill, America, and the origins of the Cold War',
u'userAvg': 0.0,
u'user_id': None,
u'venue_name': u'The American Historical Review',
u'views_count': 0,
u'volume': None},
u'_type': u'listing'},
{u'_id': u'7B3236C5',
u'_index': u'scibase_listings',
u'_score': 1.0,
u'_source': {u'authors': [{u'author_id': u'7DAB7B72',
u'author_name': u'richard m freeland'}],
u'deleted': 0,
u'description': None,
u'doi': u'',
u'is_valid': 1,
u'issue': None,
u'journal': u'The American Historical Review',
u'link': None,
u'meta_description': None,
u'meta_keywords': None,
u'normalized_venue_name': u'american historical review',
u'pages': None,
u'parent_keywords': [u'Political Science', u'Economics'],
u'pub_date': u'1985-01-01 00:00:00',
u'pubtype': None,
u'rating_avg_weighted': 0,
u'rating_clarity': 0.0,
u'rating_clarity_weighted': 0.0,
u'rating_innovation': 0.0,
u'rating_innovation_weighted': 0.0,
u'rating_num_weighted': 0,
u'rating_reproducability': 0,
u'rating_reproducibility_weighted': 0.0,
u'rating_versatility': 0.0,
u'rating_versatility_weighted': 0.0,
u'review_count': 0,
u'tag': [u'foreign policy'],
u'title': u'The Truman Doctrine and the origins of McCarthyism : foreign policy, domestic politics, and internal security, 1946-1948',
u'userAvg': 0.0,
u'user_id': None,
u'venue_name': u'The American Historical Review',
u'views_count': 0,
u'volume': None},
u'_type': u'listing'}],
u'max_score': 1.0,
u'total': 36429433},
u'timed_out': False,
u'took': 170}
In the pandas example (below), what do the brackets mean? Is there a logic to be followed to go deeper with the []? [...]
result = json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
Each string or list of strings in the ['state', 'shortname', ['info', 'governor']] value is a path to an element to include, in addition to the selected rows. The second argument to json_normalize() (record_path, set to 'counties' in the documentation example) tells the function how to select the elements from the input data structure that make up the rows in the output, and the meta paths add further metadata that will be included with each of those rows. Think of these as table joins in a database, if you will.
The input for the US States documentation example has two dictionaries in a list, and both of these dictionaries have a counties key that references another list of dicts:
>>> data = [{'state': 'Florida',
... 'shortname': 'FL',
... 'info': {'governor': 'Rick Scott'},
... 'counties': [{'name': 'Dade', 'population': 12345},
... {'name': 'Broward', 'population': 40000},
... {'name': 'Palm Beach', 'population': 60000}]},
... {'state': 'Ohio',
... 'shortname': 'OH',
... 'info': {'governor': 'John Kasich'},
... 'counties': [{'name': 'Summit', 'population': 1234},
... {'name': 'Cuyahoga', 'population': 1337}]}]
>>> pprint(data[0]['counties'])
[{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}]
>>> pprint(data[1]['counties'])
[{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]
Between them there are 5 rows of data to use in the output:
>>> json_normalize(data, 'counties')
name population
0 Dade 12345
1 Broward 40000
2 Palm Beach 60000
3 Summit 1234
4 Cuyahoga 1337
The meta argument then names some elements that live next to those counties lists, and those are then merged in separately. The values from the first data[0] dictionary for those meta elements are ('Florida', 'FL', 'Rick Scott'), respectively, and for data[1] the values are ('Ohio', 'OH', 'John Kasich'), so you see those values attached to the counties rows that came from the same top-level dictionary, repeated 3 and 2 times respectively:
>>> data[0]['state'], data[0]['shortname'], data[0]['info']['governor']
('Florida', 'FL', 'Rick Scott')
>>> data[1]['state'], data[1]['shortname'], data[1]['info']['governor']
('Ohio', 'OH', 'John Kasich')
>>> json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']])
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
So, if you pass in a list for the meta argument, then each element in the list is a separate path, and each of those separate paths identifies data to add to the rows in the output.
In your example JSON, there is only one nested list to elevate with the first argument the way 'counties' did in the documentation example: the nested 'authors' key. You'd extract each ['_source', 'authors'] path, after which you can add other keys from the parent object to augment those rows.
The meta argument then pulls in the _id key from the outermost objects, followed by the nested ['_source', 'title'] and ['_source', 'journal'] paths.
The record_path argument takes the authors lists as the starting point; these look like:
>>> d['hits']['hits'][0]['_source']['authors'] # this value is None, and is skipped
>>> d['hits']['hits'][1]['_source']['authors']
[{'affiliations': ['Punjabi University'],
'author_id': '780E3459',
'author_name': 'munish puri'},
{'affiliations': ['Punjabi University'],
'author_id': '48D92C79',
'author_name': 'rajesh dhaliwal'},
{'affiliations': ['Punjabi University'],
'author_id': '7D9BD37C',
'author_name': 'r s singh'}]
>>> d['hits']['hits'][2]['_source']['authors']
[{'author_id': '7FF872BC',
'author_name': 'barbara eileen ryan'}]
>>> # etc.
and so gives you the following rows:
>>> json_normalize(d['hits']['hits'], ['_source', 'authors'])
affiliations author_id author_name
0 [Punjabi University] 780E3459 munish puri
1 [Punjabi University] 48D92C79 rajesh dhaliwal
2 [Punjabi University] 7D9BD37C r s singh
3 NaN 7FF872BC barbara eileen ryan
4 NaN 0299B8E9 fraser j harbutt
5 NaN 7DAB7B72 richard m freeland
and then we can use the third argument, meta, to add more columns like _id, _source.title and _source.journal, using ['_id', ['_source', 'journal'], ['_source', 'title']]:
>>> json_normalize(
...     d['hits']['hits'],
...     ['_source', 'authors'],
...     ['_id', ['_source', 'journal'], ['_source', 'title']]
... )
affiliations author_id author_name _id \
0 [Punjabi University] 780E3459 munish puri 7AF8EBC3
1 [Punjabi University] 48D92C79 rajesh dhaliwal 7AF8EBC3
2 [Punjabi University] 7D9BD37C r s singh 7AF8EBC3
3 NaN 7FF872BC barbara eileen ryan 7521A721
4 NaN 0299B8E9 fraser j harbutt 7DAEB9A4
5 NaN 7DAB7B72 richard m freeland 7B3236C5
_source.journal
0 Journal of Industrial Microbiology & Biotechno...
1 Journal of Industrial Microbiology & Biotechno...
2 Journal of Industrial Microbiology & Biotechno...
3 The American Historical Review
4 The American Historical Review
5 The American Historical Review
_source.title \
0 Development of a stable continuous flow immobi...
1 Development of a stable continuous flow immobi...
2 Development of a stable continuous flow immobi...
3 Feminism and the women's movement : dynamics o...
4 The iron curtain : Churchill, America, and the...
5 The Truman Doctrine and the origins of McCarth...
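As a side note: on pandas 1.0 and later, json_normalize is available at the top level, so the same approach can be written with pd.json_normalize. A self-contained sketch (the sample d below is trimmed to two hits with the same shape, standing in for the loaded file):

```python
import pandas as pd

# stand-in for d = json.load(f); trimmed to two hits with the same shape
d = {'hits': {'hits': [
    {'_id': '7AF8EBC3',
     '_source': {'authors': [{'author_id': '780E3459', 'author_name': 'munish puri'}],
                 'journal': 'Journal of Industrial Microbiology & Biotechnology',
                 'title': 'Development of a stable continuous flow immobilized enzyme reactor'}},
    {'_id': '7521A721',
     '_source': {'authors': [{'author_id': '7FF872BC', 'author_name': 'barbara eileen ryan'}],
                 'journal': 'The American Historical Review',
                 'title': "Feminism and the women's movement"}},
]}}

# one row per author, with id / journal / title merged in from the parent objects
df = pd.json_normalize(d['hits']['hits'],
                       record_path=['_source', 'authors'],
                       meta=['_id', ['_source', 'journal'], ['_source', 'title']])
print(df[['author_name', '_id', '_source.journal']])
```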
You can also have a look at the flatten_json library, which does not require you to spell out column hierarchies the way json_normalize does:
from flatten_json import flatten
import pandas as pd

data = d['hits']['hits']
dict_flattened = (flatten(record, '.') for record in data)
df = pd.DataFrame(dict_flattened)
print(df)
See https://github.com/amirziai/flatten.
Adding to Sanders' comment, more context can be found here, as the creator of this function has a Medium blog:
https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
It is worth keeping in mind that pandas' json_normalize can handle most JSON objects, including arrays. The flatten_json library, by contrast, requires the input to be a nested dict. However, you can work around this requirement by wrapping the array in a dict, like so:
flatten({'response':data}, '.')
In this case, the flatten_json library will use a counter in the dot notation to distinguish duplicates. For example:
flatten({
'response': [
{'metrics': {'clicks': '0', 'cost_micros': '0', 'impressions': '3'},
'segments': {'date': '2022-12-01'}},
{'metrics': {'clicks': '1', 'cost_micros': '609240', 'impressions': '358'},
'segments': {'date': '2022-12-01'}},
{'metrics': {'clicks': '0', 'cost_micros': '0', 'impressions': '3'},
'segments': {'date': '2022-12-02'}},
{'metrics': {'clicks': '2', 'cost_micros': '40000', 'impressions': '291'},
'segments': {'date': '2022-12-02'}},
{'metrics': {'clicks': '0', 'cost_micros': '0', 'impressions': '2'},
'segments': {'date': '2022-12-03'}},
{'metrics': {'clicks': '2', 'cost_micros': '337754', 'impressions': '241'},
'segments': {'date': '2022-12-03'}},
{'metrics': {'clicks': '0', 'cost_micros': '0', 'impressions': '4'},
'segments': {'date': '2022-12-04'}},
{'metrics': {'clicks': '2', 'cost_micros': '757299', 'impressions': '197'},
'segments': {'date': '2022-12-04'}}
]
}, '.')
Produces:
{'response.0.metrics.clicks': '0',
'response.0.metrics.cost_micros': '0',
'response.0.metrics.impressions': '3',
'response.0.segments.date': '2022-12-01',
'response.1.metrics.clicks': '1',
'response.1.metrics.cost_micros': '609240',
'response.1.metrics.impressions': '358',
'response.1.segments.date': '2022-12-01',
'response.2.metrics.clicks': '0',
'response.2.metrics.cost_micros': '0',
'response.2.metrics.impressions': '3',
'response.2.segments.date': '2022-12-02',
'response.3.metrics.clicks': '2',
'response.3.metrics.cost_micros': '40000',
'response.3.metrics.impressions': '291',
'response.3.segments.date': '2022-12-02',
'response.4.metrics.clicks': '0',
'response.4.metrics.cost_micros': '0',
'response.4.metrics.impressions': '2',
'response.4.segments.date': '2022-12-03',
'response.5.metrics.clicks': '2',
'response.5.metrics.cost_micros': '337754',
'response.5.metrics.impressions': '241',
'response.5.segments.date': '2022-12-03',
'response.6.metrics.clicks': '0',
'response.6.metrics.cost_micros': '0',
'response.6.metrics.impressions': '4',
'response.6.segments.date': '2022-12-04',
'response.7.metrics.clicks': '2',
'response.7.metrics.cost_micros': '757299',
'response.7.metrics.impressions': '197',
'response.7.segments.date': '2022-12-04'}
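For comparison, pandas' json_normalize accepts that same list directly and produces one row per element, joining nested keys with dots instead of numbering them; a minimal sketch on a trimmed copy of the data above:

```python
import pandas as pd

# two elements of the same shape as the 'response' list above
response = [
    {'metrics': {'clicks': '0', 'cost_micros': '0', 'impressions': '3'},
     'segments': {'date': '2022-12-01'}},
    {'metrics': {'clicks': '1', 'cost_micros': '609240', 'impressions': '358'},
     'segments': {'date': '2022-12-01'}},
]

# one row per list element; nested keys become dotted column names
df = pd.json_normalize(response)
print(df)
```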
Related
I'm having trouble completely unnesting this JSON from an API.
[{'id': 1,
'name': 'Buzz',
'tagline': 'A Real Bitter Experience.',
'first_brewed': '09/2007',
'description': 'A light, crisp and bitter IPA brewed with English and American hops. A small batch brewed only once.',
'image_url': 'https://images.punkapi.com/v2/keg.png',
'abv': 4.5,
'ibu': 60,
'target_fg': 1010,
'target_og': 1044,
'ebc': 20,
'srm': 10,
'ph': 4.4,
'attenuation_level': 75,
'volume': {'value': 20, 'unit': 'litres'},
'boil_volume': {'value': 25, 'unit': 'litres'},
'method': {'mash_temp': [{'temp': {'value': 64, 'unit': 'celsius'},
'duration': 75}],
'fermentation': {'temp': {'value': 19, 'unit': 'celsius'}},
'twist': None},
'ingredients': {'malt': [{'name': 'Maris Otter Extra Pale',
'amount': {'value': 3.3, 'unit': 'kilograms'}},
{'name': 'Caramalt', 'amount': {'value': 0.2, 'unit': 'kilograms'}},
{'name': 'Munich', 'amount': {'value': 0.4, 'unit': 'kilograms'}}],
'hops': [{'name': 'Fuggles',
'amount': {'value': 25, 'unit': 'grams'},
'add': 'start',
'attribute': 'bitter'},
{'name': 'First Gold',
'amount': {'value': 25, 'unit': 'grams'},
'add': 'start',
'attribute': 'bitter'},
{'name': 'Fuggles',
'amount': {'value': 37.5, 'unit': 'grams'},
'add': 'middle',
'attribute': 'flavour'},
{'name': 'First Gold',
'amount': {'value': 37.5, 'unit': 'grams'},
'add': 'middle',
'attribute': 'flavour'},
{'name': 'Cascade',
'amount': {'value': 37.5, 'unit': 'grams'},
'add': 'end',
'attribute': 'flavour'}],
'yeast': 'Wyeast 1056 - American Ale™'},
'food_pairing': ['Spicy chicken tikka masala',
'Grilled chicken quesadilla',
'Caramel toffee cake'],
'brewers_tips': 'The earthy and floral aromas from the hops can be overpowering. Drop a little Cascade in at the end of the boil to lift the profile with a bit of citrus.',
'contributed_by': 'Sam Mason <samjbmason>'},
{'id': 2,
'name': 'Trashy Blonde',
'tagline': "You Know You Shouldn't",
'first_brewed': '04/2008',
'description': 'A titillating, neurotic, peroxide punk of a Pale Ale. Combining attitude, style, substance, and a little bit of low self esteem for good measure; what would your mother say? The seductive lure of the sassy passion fruit hop proves too much to resist. All that is even before we get onto the fact that there are no additives, preservatives, pasteurization or strings attached. All wrapped up with the customary BrewDog bite and imaginative twist.',
'image_url': 'https://images.punkapi.com/v2/2.png',
'abv': 4.1,
'ibu': 41.5,
'target_fg': 1010,
'target_og': 1041.7,
'ebc': 15,
'srm': 15,
'ph': 4.4,
'attenuation_level': 76,
'volume': {'value': 20, 'unit': 'litres'},
'boil_volume': {'value': 25, 'unit': 'litres'},
'method': {'mash_temp': [{'temp': {'value': 69, 'unit': 'celsius'},
'duration': None}],
'fermentation': {'temp': {'value': 18, 'unit': 'celsius'}},
'twist': None},
'ingredients': {'malt': [{'name': 'Maris Otter Extra Pale',
'amount': {'value': 3.25, 'unit': 'kilograms'}},
{'name': 'Caramalt', 'amount': {'value': 0.2, 'unit': 'kilograms'}},
{'name': 'Munich', 'amount': {'value': 0.4, 'unit': 'kilograms'}}],
'hops': [{'name': 'Amarillo',
'amount': {'value': 13.8, 'unit': 'grams'},
'add': 'start',
'attribute': 'bitter'},
{'name': 'Simcoe',
'amount': {'value': 13.8, 'unit': 'grams'},
'add': 'start',
'attribute': 'bitter'},
{'name': 'Amarillo',
'amount': {'value': 26.3, 'unit': 'grams'},
'add': 'end',
'attribute': 'flavour'},
{'name': 'Motueka',
'amount': {'value': 18.8, 'unit': 'grams'},
'add': 'end',
'attribute': 'flavour'}],
'yeast': 'Wyeast 1056 - American Ale™'},
'food_pairing': ['Fresh crab with lemon',
'Garlic butter dipping sauce',
'Goats cheese salad',
'Creamy lemon bar doused in powdered sugar'],
'brewers_tips': 'Be careful not to collect too much wort from the mash. Once the sugars are all washed out there are some very unpleasant grainy tasting compounds that can be extracted into the wort.',
'contributed_by': 'Sam Mason <samjbmason>'}]
I was able to unnest it one level using json_normalize:
import requests
import pandas as pd

url = "https://api.punkapi.com/v2/beers"
data = requests.get(url).json()
pd.json_normalize(data)
This is an image of the output after using json_normalize.
Now, to unnest the column 'method.mash_temp', I included record_path:
pd.json_normalize(
    data,
    record_path=['method', 'mash_temp'],
    meta=['id', 'name']
)
but I am having trouble adding the other columns ('ingredients.malt', 'ingredients.hops'), whose values are lists of dictionaries, to the record_path argument.
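For reference, here is the kind of approach I have been experimenting with: normalizing each nested list in its own call and merging the results on the beer id (the sample data is trimmed from the response above, and the record_prefix values are my own choice):

```python
import pandas as pd

# trimmed sample shaped like the API response above
beers = [{'id': 1, 'name': 'Buzz',
          'ingredients': {'malt': [{'name': 'Caramalt',
                                    'amount': {'value': 0.2, 'unit': 'kilograms'}}],
                          'hops': [{'name': 'Fuggles',
                                    'amount': {'value': 25, 'unit': 'grams'},
                                    'add': 'start', 'attribute': 'bitter'}]}}]

# one json_normalize call per nested list, each keeping the parent id as meta
malt = pd.json_normalize(beers, record_path=['ingredients', 'malt'],
                         meta=['id'], record_prefix='malt.')
hops = pd.json_normalize(beers, record_path=['ingredients', 'hops'],
                         meta=['id'], record_prefix='hops.')

# join the two flat tables back together on the beer id
combined = malt.merge(hops, on='id')
print(combined)
```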
I’m trying to use Python to print specific values from a JSON file that I pulled from an API. From what I understand, I am pulling it as a JSON file that has a list of dictionaries of players, with a nested dictionary for each player containing their data (i.e. name, team, etc.).
I’m running into issues printing the values within the JSON file, as each character prints on a separate line.
The end result I am trying to get to is a Pandas DataFrame containing all the values from the JSON file, but I can’t even seem to iterate through the JSON file correctly.
Here is my code:
url = "https://api-football-v1.p.rapidapi.com/v3/players"
querystring = {"league":"39","season":"2020", "page":"2"}
headers = {
    "X-RapidAPI-Host": "api-football-v1.p.rapidapi.com",
    "X-RapidAPI-Key": "xxxxxkeyxxxxx"
}

response = requests.request("GET", url, headers=headers, params=querystring).json()
response_dump = json.dumps(response)

for item in response_dump:
    for player_item in item:
        print(player_item)
This is the output when I print the JSON response (first two items):
{'get': 'players', 'parameters': {'league': '39', 'page': '2', 'season': '2020'}, 'errors': [], 'results': 20, 'paging': {'current': 2, 'total': 37}, 'response': [{'player': {'id': 301, 'name': 'Benjamin Luke Woodburn', 'firstname': 'Benjamin Luke', 'lastname': 'Woodburn', 'age': 23, 'birth': {'date': '1999-10-15', 'place': 'Nottingham', 'country': 'England'}, 'nationality': 'Wales', 'height': '174 cm', 'weight': '72 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/301.png'}, 'statistics': [{'team': {'id': 40, 'name': 'Liverpool', 'logo': 'https://media.api-sports.io/football/teams/40.png'}, 'league': {'id': 39, 'name': 'Premier League', 'country': 'England', 'logo': 'https://media.api-sports.io/football/leagues/39.png', 'flag': 'https://media.api-sports.io/flags/gb.svg', 'season': 2020}, 'games': {'appearences': 0, 'lineups': 0, 'minutes': 0, 'number': None, 'position': 'Attacker', 'rating': None, 'captain': False}, 'substitutes': {'in': 0, 'out': 0, 'bench': 3}, 'shots': {'total': None, 'on': None}, 'goals': {'total': 0, 'conceded': 0, 'assists': None, 'saves': None}, 'passes': {'total': None, 'key': None, 'accuracy': None}, 'tackles': {'total': None, 'blocks': None, 'interceptions': None}, 'duels': {'total': None, 'won': None}, 'dribbles': {'attempts': None, 'success': None, 'past': None}, 'fouls': {'drawn': None, 'committed': None}, 'cards': {'yellow': 0, 'yellowred': 0, 'red': 0}, 'penalty': {'won': None, 'commited': None, 'scored': 0, 'missed': 0, 'saved': None}}]}, {'player': {'id': 518, 'name': 'Meritan Shabani', 'firstname': 'Meritan', 'lastname': 'Shabani', 'age': 23, 'birth': {'date': '1999-03-15', 'place': 'München', 'country': 'Germany'}, 'nationality': 'Germany', 'height': '185 cm', 'weight': '78 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/518.png'}, 'statistics': [{'team': {'id': 39, 'name': 'Wolves', 'logo': 'https://media.api-sports.io/football/teams/39.png'}, 'league': {'id': 39, 
'name': 'Premier League', 'country': 'England', 'logo': 'https://media.api-sports.io/football/leagues/39.png', 'flag': 'https://media.api-sports.io/flags/gb.svg', 'season': 2020}, 'games': {'appearences': 0, 'lineups': 0, 'minutes': 0, 'number': None, 'position': 'Midfielder', 'rating': None, 'captain': False}, 'substitutes': {'in': 0, 'out': 0, 'bench': 3}, 'shots': {'total': None, 'on': None}, 'goals': {'total': 0, 'conceded': 0, 'assists': None, 'saves': None}, 'passes': {'total': None, 'key': None, 'accuracy': None}, 'tackles': {'total': None, 'blocks': None, 'interceptions': None}, 'duels': {'total': None, 'won': None}, 'dribbles': {'attempts': None, 'success': None, 'past': None}, 'fouls': {'drawn': None, 'committed': None}, 'cards': {'yellow': 0, 'yellowred': 0, 'red': 0}, 'penalty': {'won': None, 'commited': None, 'scored': 0, 'missed': 0, 'saved': None}}]},
This is the data type of each layer of the JSON file, from when I iterated through it with a For loop:
print(type(response)) <class 'dict'>
print(type(response_dump)) <class 'str'>
print(type(item)) <class 'str'>
print(type(player_item)) <class 'str'>
You do not need json.dumps() here, in my opinion; just iterate over the JSON from the response:
for player in response['response']:
    print(player)
{'player': {'id': 301, 'name': 'Benjamin Luke Woodburn', 'firstname': 'Benjamin Luke', 'lastname': 'Woodburn', 'age': 23, 'birth': {'date': '1999-10-15', 'place': 'Nottingham', 'country': 'England'}, 'nationality': 'Wales', 'height': '174 cm', 'weight': '72 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/301.png'}, 'statistics': [{'team': {'id': 40, 'name': 'Liverpool', 'logo': 'https://media.api-sports.io/football/teams/40.png'}, 'league': {'id': 39, 'name': 'Premier League', 'country': 'England', 'logo': 'https://media.api-sports.io/football/leagues/39.png', 'flag': 'https://media.api-sports.io/flags/gb.svg', 'season': 2020}, 'games': {'appearences': 0, 'lineups': 0, 'minutes': 0, 'number': None, 'position': 'Attacker', 'rating': None, 'captain': False}, 'substitutes': {'in': 0, 'out': 0, 'bench': 3}, 'shots': {'total': None, 'on': None}, 'goals': {'total': 0, 'conceded': 0, 'assists': None, 'saves': None}, 'passes': {'total': None, 'key': None, 'accuracy': None}, 'tackles': {'total': None, 'blocks': None, 'interceptions': None}, 'duels': {'total': None, 'won': None}, 'dribbles': {'attempts': None, 'success': None, 'past': None}, 'fouls': {'drawn': None, 'committed': None}, 'cards': {'yellow': 0, 'yellowred': 0, 'red': 0}, 'penalty': {'won': None, 'commited': None, 'scored': 0, 'missed': 0, 'saved': None}}]}
{'player': {'id': 518, 'name': 'Meritan Shabani', 'firstname': 'Meritan', 'lastname': 'Shabani', 'age': 23, 'birth': {'date': '1999-03-15', 'place': 'München', 'country': 'Germany'}, 'nationality': 'Germany', 'height': '185 cm', 'weight': '78 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/518.png'}, 'statistics': [{'team': {'id': 39, 'name': 'Wolves', 'logo': 'https://media.api-sports.io/football/teams/39.png'}, 'league': {'id': 39, 'name': 'Premier League', 'country': 'England', 'logo': 'https://media.api-sports.io/football/leagues/39.png', 'flag': 'https://media.api-sports.io/flags/gb.svg', 'season': 2020}, 'games': {'appearences': 0, 'lineups': 0, 'minutes': 0, 'number': None, 'position': 'Midfielder', 'rating': None, 'captain': False}, 'substitutes': {'in': 0, 'out': 0, 'bench': 3}, 'shots': {'total': None, 'on': None}, 'goals': {'total': 0, 'conceded': 0, 'assists': None, 'saves': None}, 'passes': {'total': None, 'key': None, 'accuracy': None}, 'tackles': {'total': None, 'blocks': None, 'interceptions': None}, 'duels': {'total': None, 'won': None}, 'dribbles': {'attempts': None, 'success': None, 'past': None}, 'fouls': {'drawn': None, 'committed': None}, 'cards': {'yellow': 0, 'yellowred': 0, 'red': 0}, 'penalty': {'won': None, 'commited': None, 'scored': 0, 'missed': 0, 'saved': None}}]}
or
for player in response['response']:
    print(player['player'])
{'id': 301, 'name': 'Benjamin Luke Woodburn', 'firstname': 'Benjamin Luke', 'lastname': 'Woodburn', 'age': 23, 'birth': {'date': '1999-10-15', 'place': 'Nottingham', 'country': 'England'}, 'nationality': 'Wales', 'height': '174 cm', 'weight': '72 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/301.png'}
{'id': 518, 'name': 'Meritan Shabani', 'firstname': 'Meritan', 'lastname': 'Shabani', 'age': 23, 'birth': {'date': '1999-03-15', 'place': 'München', 'country': 'Germany'}, 'nationality': 'Germany', 'height': '185 cm', 'weight': '78 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/518.png'}
To get a DataFrame, simply call pd.json_normalize(). Because your question is not that clear, I am not sure which information is needed or how it should be displayed; that would be best asked as a new question with exactly that focus:
pd.json_normalize(response['response'])
EDIT
Based on your comment and improvement:
pd.concat([pd.json_normalize(response, ['response']),
           pd.json_normalize(response, ['response', 'statistics'])], axis=1)\
  .drop(['statistics'], axis=1)
Output (transposed here for readability: DataFrame columns as rows, rows 0 and 1 as columns):
                       0                                                     1
player.id              301                                                   518
player.name            Benjamin Luke Woodburn                                Meritan Shabani
player.firstname       Benjamin Luke                                         Meritan
player.lastname        Woodburn                                              Shabani
player.age             23                                                    23
player.birth.date      1999-10-15                                            1999-03-15
player.birth.place     Nottingham                                            München
player.birth.country   England                                               Germany
player.nationality     Wales                                                 Germany
player.height          174 cm                                                185 cm
player.weight          72 kg                                                 78 kg
player.injured         False                                                 False
player.photo           https://media.api-sports.io/football/players/301.png  https://media.api-sports.io/football/players/518.png
team.id                40                                                    39
team.name              Liverpool                                             Wolves
team.logo              https://media.api-sports.io/football/teams/40.png     https://media.api-sports.io/football/teams/39.png
league.id              39                                                    39
league.name            Premier League                                        Premier League
league.country         England                                               England
league.logo            https://media.api-sports.io/football/leagues/39.png   https://media.api-sports.io/football/leagues/39.png
league.flag            https://media.api-sports.io/flags/gb.svg              https://media.api-sports.io/flags/gb.svg
league.season          2020                                                  2020
games.appearences      0                                                     0
games.lineups          0                                                     0
games.minutes          0                                                     0
games.number           NaN                                                   NaN
games.position         Attacker                                              Midfielder
games.rating           NaN                                                   NaN
games.captain          False                                                 False
substitutes.in         0                                                     0
substitutes.out        0                                                     0
substitutes.bench      3                                                     3
shots.total            NaN                                                   NaN
shots.on               NaN                                                   NaN
goals.total            0                                                     0
goals.conceded         0                                                     0
goals.assists          NaN                                                   NaN
goals.saves            NaN                                                   NaN
passes.total           NaN                                                   NaN
passes.key             NaN                                                   NaN
passes.accuracy        NaN                                                   NaN
tackles.total          NaN                                                   NaN
tackles.blocks         NaN                                                   NaN
tackles.interceptions  NaN                                                   NaN
duels.total            NaN                                                   NaN
duels.won              NaN                                                   NaN
dribbles.attempts      NaN                                                   NaN
dribbles.success       NaN                                                   NaN
dribbles.past          NaN                                                   NaN
fouls.drawn            NaN                                                   NaN
fouls.committed        NaN                                                   NaN
cards.yellow           0                                                     0
cards.yellowred        0                                                     0
cards.red              0                                                     0
penalty.won            NaN                                                   NaN
penalty.commited       NaN                                                   NaN
penalty.scored         0                                                     0
penalty.missed         0                                                     0
penalty.saved          NaN                                                   NaN
I have two datasets - one as a dataframe and the other as an array of JSON files.
Each line in the df has a string (folio number) that identifies a piece of land (Ex: '0101000000030'), and a date (in datetime) a permit was applied for.
Every JSON file in the array has a corresponding number identifying that land. It also has dates the property was sold, to whom it was sold, and the seller.
I need to take the folio number and the date the permit was applied for and run it through the array of JSON files until it finds the matching folio.
Then, it needs to extract the property's owner information by finding who owned the property when the permit was applied for and append it to the corresponding row in the df.
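For context, here is a rough, made-up sketch of the matching logic I have in mind (the Sales/DateOfSale/Buyer key names are hypothetical stand-ins, since the ownership part of the JSON isn't shown below):

```python
import pandas as pd

# hypothetical shapes -- the real files use different keys
df = pd.DataFrame({'FirstSubmissionDate': pd.to_datetime(['05/17/2018']),
                   'Folio': ['0101000000030']})

json_files = [{'Folio': '0101000000030',
               'Sales': [{'DateOfSale': '2016-01-01', 'Buyer': 'blahblah'},
                         {'DateOfSale': '2019-06-01', 'Buyer': 'someone else'}]}]

def owner_at(folio, date):
    """Return the buyer of the most recent sale at or before `date`."""
    for record in json_files:
        if record['Folio'] != folio:
            continue
        # walk the sales in chronological order, keeping the last one <= date
        owner = None
        for sale in sorted(record['Sales'], key=lambda s: s['DateOfSale']):
            if pd.to_datetime(sale['DateOfSale']) <= date:
                owner = sale['Buyer']
        return owner
    return None

df['PropertyOwner'] = [owner_at(f, d) for f, d
                       in zip(df['Folio'], df['FirstSubmissionDate'])]
print(df)
```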
Desired Output
FirstSubmissionDate Folio PropertyOwner
05/17/2018 '0101000000030' blahblah
Input DF
FirstSubmissionDate Folio
05/17/2018 '0101000000030'
Input JSON
{'Additionals': {'AddtionalInfo': [{'Key': 'LAND USE AND RESTRICTIONS',
'Value': [{'InfoName': 'Community Development District',
'InfoValue': 'COUNTYGIS',
'Message': ''},
{'InfoName': 'Community Redevelopment Area',
'InfoValue': 'COUNTYGIS',
'Message': ''},
{'InfoName': 'Empowerment Zone', 'InfoValue': 'COUNTYGIS', 'Message': ''},
{'InfoName': 'Enterprise Zone', 'InfoValue': 'COUNTYGIS', 'Message': ''},
{'InfoName': 'Urban Development',
'InfoValue': 'COUNTYGIS',
'Message': ''},
{'InfoName': 'Zoning Code', 'InfoValue': 'COUNTYGIS', 'Message': ''},
{'InfoName': 'Existing Land Use',
'InfoValue': 'COUNTYGIS',
'Message': ''},
{'InfoName': 'Government Agencies and Community Services',
'InfoValue': 'http://gisweb.miamidade.gov/communityservices/CommunityServicesAll.html?x=&y=&bufferDistance=5&address=60 SE 2 ST',
'Message': ''}]},
{'Key': 'OTHER GOVERNMENTAL JURISDICTIONS',
'Value': [{'InfoName': 'Business Incentives',
'InfoValue': 'https://gisweb.miamidade.gov/businessincentive/default.aspx?searchtype=address&paramvalue=',
'Message': ''},
{'InfoName': 'Childrens Trust',
'InfoValue': 'https://www.thechildrenstrust.org/',
'Message': ''},
{'InfoName': 'City of Miami',
'InfoValue': 'http://www.miamigov.com/home/',
'Message': ''},
{'InfoName': 'Environmental Considerations',
'InfoValue': 'https://gisweb.miamidade.gov/environmentalconsiderations/default.aspx?searchtype=address&paramvalue=60 SE 2 ST',
'Message': ''},
{'InfoName': 'Florida Inland Navigation District',
'InfoValue': 'http://www.aicw.org',
'Message': ''},
{'InfoName': 'PA Bulletin Board',
'InfoValue': 'http://bbs.miamidade.gov/',
'Message': ''},
{'InfoName': 'Special Taxing District and Other Non-Ad valorem Assessment',
'InfoValue': 'http://www.miamidade.gov/Apps/PA/PAOnlineTools/Taxes/NonAdvalorem.aspx?folio=0101000000030',
'Message': ''},
{'InfoName': 'School Board',
'InfoValue': 'http://www.dadeschools.net/',
'Message': ''},
{'InfoName': 'South Florida Water Mgmt District',
'InfoValue': 'http://www.sfwmd.gov/portal/page/portal/sfwmdmain/home%20page',
'Message': ''},
{'InfoName': 'Tax Collector',
'InfoValue': 'http://www.miamidade.gov/taxcollector/',
'Message': ''}]}],
'FooterMessage': '',
'HeaderMessage': "* The information listed below is not derived from the Property Appraiser's Office records. It is provided for convenience and is derived from other government agencies."},
'Assessment': {'AssessmentInfos': [{'AssessedValue': 5587359,
'BuildingOnlyValue': 0,
'ExtraFeatureValue': 0,
'LandValue': 7618560,
'Message': None,
'TotalValue': 7618560,
'Year': 2021},
{'AssessedValue': 5079418,
'BuildingOnlyValue': 0,
'ExtraFeatureValue': 0,
'LandValue': 6963200,
'Message': None,
'TotalValue': 6963200,
'Year': 2020},
{'AssessedValue': 4617653,
'BuildingOnlyValue': 0,
'ExtraFeatureValue': 0,
'LandValue': 6963200,
'Message': None,
'TotalValue': 6963200,
'Year': 2019}],
'Messages': [{'Message': '', 'Year': 2021},
{'Message': '', 'Year': 2020},
{'Message': '', 'Year': 2019}]},
'Benefit': {'BenefitInfos': [{'Description': 'Non-Homestead Cap',
'Message': None,
'Seq': '5',
'TaxYear': 2021,
'Type': 'Assessment Reduction',
'Url': 'http://www.miamidade.gov/pa/property_value_cap.asp',
'Value': 2031201},
{'Description': 'Non-Homestead Cap',
'Message': None,
'Seq': '5',
'TaxYear': 2020,
'Type': 'Assessment Reduction',
'Url': 'http://www.miamidade.gov/pa/property_value_cap.asp',
'Value': 1883782},
{'Description': 'Non-Homestead Cap',
'Message': None,
'Seq': '5',
'TaxYear': 2019,
'Type': 'Assessment Reduction',
'Url': 'http://www.miamidade.gov/pa/property_value_cap.asp',
'Value': 2345547}],
'Messages': []},
'Building': {'BuildingInfos': [], 'Messages': []},
'ClassifiedAgInfo': {'Acreage': 0,
'CalculatedValue': 0,
'LandCode': None,
'LandUse': None,
'Message': None,
'UnitPrice': 0},
'Completed': True,
'District': 6,
'ExtraFeature': {'ExtraFeatureInfos': [], 'Messages': []},
'GeoParcel': None,
'Land': {'Landlines': [{'AdjustedUnitPrice': 465,
'CalculatedValue': 7618560,
'Depth': 0,
'FrontFeet': 0,
'LandUse': 'GENERAL',
'LandlineType': 'C',
'Message': None,
'MuniZone': 'T6-80-O',
'MuniZoneDescription': None,
'PAZoneDescription': 'COMMERCIAL',
'PercentCondition': 1,
'RollYear': 2021,
'TotalAdjustments': 1,
'UnitType': 'Square Ft.',
'Units': 16384,
'UseCode': '00',
'Zone': '6401'},
{'AdjustedUnitPrice': -1,
'CalculatedValue': -1,
'Depth': 0,
'FrontFeet': 0,
'LandUse': 'GENERAL',
'LandlineType': 'C',
'Message': None,
'MuniZone': 'T6-80-O',
'MuniZoneDescription': None,
'PAZoneDescription': 'COMMERCIAL',
'PercentCondition': 1,
'RollYear': 2020,
'TotalAdjustments': 1,
'UnitType': 'Square Ft.',
'Units': 16384,
'UseCode': '00',
'Zone': '6401'},
{'AdjustedUnitPrice': -1,
'CalculatedValue': -1,
'Depth': 0,
'FrontFeet': 0,
'LandUse': 'GENERAL',
'LandlineType': 'C',
'Message': None,
'MuniZone': 'T6-80-O',
'MuniZoneDescription': None,
'PAZoneDescription': 'COMMERCIAL',
'PercentCondition': 1,
'RollYear': 2019,
'TotalAdjustments': 1,
'UnitType': 'Square Ft.',
'Units': 16384,
'UseCode': '00',
'Zone': '6401'}],
'Messages': [{'Message': '', 'Year': 2021},
{'Message': 'The calculated values for this property have been overridden. Please refer to the Land, Building, and XF Values in the Assessment Section, in order to obtain the most accurate values.',
'Year': 2020},
{'Message': 'The calculated values for this property have been overridden. Please refer to the Land, Building, and XF Values in the Assessment Section, in order to obtain the most accurate values.',
'Year': 2019}]},
'LegalDescription': {'Description': 'MIAMI NORTH PB B-41|BEG 12.2FT W OF X OF S/L OF SE 2|ST & W/L OF SE 1 AVE TH S11.85FT|SWLY A/D 72.55FT S52.71FT|W108.69FT N10FT W4.6FT N123.52FT|E137.4FT TO POB|LOT SIZE 16384 SQ FT|COC 25843-0025 26307-3840 0707 6',
'Message': None,
'Number': None},
'MailingAddress': {'Address1': '1000 BRICKELL AVE STE 400',
'Address2': '',
'Address3': '',
'City': 'MIAMI',
'Country': 'USA',
'Message': None,
'State': 'FL',
'ZipCode': '33131'},
'Message': '',
'OwnerInfos': [{'Description': 'Sole Owner',
'MarriedFlag': '0',
'Message': None,
'Name': '16 SE 2ND STREET DOWNTOWN',
'PercentageOwn': 1,
'Role': None,
'ShortDescription': 'Sole Owner',
'TenancyCd': 'S'},
{'Description': 'Sole Owner',
'MarriedFlag': '0',
'Message': None,
'Name': 'INVESTMENT LLC',
'PercentageOwn': 1,
'Role': None,
'ShortDescription': 'Sole Owner',
'TenancyCd': 'S'}],
'PropertyInfo': {'BathroomCount': 0,
'BedroomCount': 0,
'BuildingActualArea': 0,
'BuildingBaseArea': 0,
'BuildingEffectiveArea': 0,
'BuildingGrossArea': 0,
'BuildingHeatedArea': 0,
'DORCode': '1081',
'DORDescription': 'VACANT LAND - COMMERCIAL : VACANT LAND',
'DORDescriptionCurrent': None,
'EncodedFolioAndTaxYear': 'J1COeydnmm%2fHHVEoyromqjt3GPqH8da%2fsulgVBOgI7w%3d',
'FloorCount': 0,
'FolioNumber': '01-0100-000-0030',
'HalfBathroomCount': 0,
'HxBaseYear': 0,
'LotSize': 16384,
'Message': None,
'Municipality': 'Miami',
'Neighborhood': 69010,
'NeighborhoodDescription': 'Miami CBD',
'ParentFolio': '',
'PercentHomesteadCapped': 0,
'PlatBook': 'B',
'PlatPage': '41',
'PrimaryZone': '6401',
'PrimaryZoneDescription': 'COMMERCIAL',
'ShowCurrentValuesFlag': 'N',
'Status': 'AC Active',
'Subdivision': '010100000',
'SubdivisionDescription': '353017046',
'UnitCount': 0,
'YearBuilt': '0'},
'RollYear1': 2021,
'SalesInfos': [{'DateOfSale': '6/23/2021',
'DocumentStamps': 276000,
'EncodedRecordBookAndPage': 'lHVlhHQhIZoJRUYKiXnhi4goVgjenckUAcgPekALEZ8LlG%2bmH%2bycTA%3d%3d',
'GranteeName1': '16 SE 2ND STREET DOWNTOWN',
'GranteeName2': 'INVESTMENT LLC',
'GrantorName1': '16 SE 2ND STREET LLC',
'GrantorName2': '',
'Message': None,
'OfficialRecordBook': '32602',
'OfficialRecordPage': '3521',
'QualificationDescription': 'Qual on DOS, multi-parcel sale',
'QualifiedFlag': 'Q',
'QualifiedSYear': None,
'QualifiedSourceCode': '',
'ReasonCode': '05',
'ReviewCode': None,
'SaleId': 5,
'SaleInstrument': 'WDE',
'SalePrice': 46000000,
'VacantFlag': '\x00',
'ValidCode': None,
'VerifyCode': None},
{'DateOfSale': '5/24/2013',
'DocumentStamps': 0,
'EncodedRecordBookAndPage': 'lHVlhHQhIZoJRUYKiXnhiyo2fiU6Ad2Yj6ROwqxBp26vA0B1JkALuQ%3d%3d',
'GranteeName1': '16 SE 2ND STREET LLC',
'GranteeName2': '',
'GrantorName1': 'BURDINES 1225 LLC',
'GrantorName2': '',
'Message': None,
'OfficialRecordBook': '28688',
'OfficialRecordPage': '1169',
'QualificationDescription': 'Financial inst or "In Lieu of Forclosure" stated',
'QualifiedFlag': 'U',
'QualifiedSYear': None,
'QualifiedSourceCode': '',
'ReasonCode': '12',
'ReviewCode': None,
'SaleId': 4,
'SaleInstrument': 'DEE',
'SalePrice': 32620638,
'VacantFlag': '\x00',
'ValidCode': None,
'VerifyCode': None},
{'DateOfSale': '8/1/1989',
'DocumentStamps': 0,
'EncodedRecordBookAndPage': 'lHVlhHQhIZoJRUYKiXnhi9bvfovmAqmTIZ5uJf3HEgtQChvRqiPQDw%3d%3d',
'GranteeName1': '',
'GranteeName2': '',
'GrantorName1': '',
'GrantorName2': '',
'Message': None,
'OfficialRecordBook': '14202',
'OfficialRecordPage': '2339',
'QualificationDescription': 'Deeds that include more than one parcel',
'QualifiedFlag': 'Q',
'QualifiedSYear': None,
'QualifiedSourceCode': '',
'ReasonCode': '02',
'ReviewCode': None,
'SaleId': 3,
'SaleInstrument': '',
'SalePrice': 6200000,
'VacantFlag': '\x00',
'ValidCode': None,
'VerifyCode': None},
{'DateOfSale': '9/1/2003',
'DocumentStamps': 0,
'EncodedRecordBookAndPage': 'lHVlhHQhIZoJRUYKiXnhi5pTmn2bXcBBM42%2bwPcIyhry9UhcpSwX4g%3d%3d',
'GranteeName1': '',
'GranteeName2': '',
'GrantorName1': '',
'GrantorName2': '',
'Message': None,
'OfficialRecordBook': '21695',
'OfficialRecordPage': '3500',
'QualificationDescription': 'Deeds that include more than one parcel',
'QualifiedFlag': 'Q',
'QualifiedSYear': None,
'QualifiedSourceCode': '',
'ReasonCode': '02',
'ReviewCode': None,
'SaleId': 2,
'SaleInstrument': '',
'SalePrice': 8800000,
'VacantFlag': '\x00',
'ValidCode': None,
'VerifyCode': None},
{'DateOfSale': '7/1/2007',
'DocumentStamps': 0,
'EncodedRecordBookAndPage': 'lHVlhHQhIZoJRUYKiXnhi5bVa6yEIUa%2bSDngqM2N5YUM89ag%2fj8HOA%3d%3d',
'GranteeName1': '',
'GranteeName2': '',
'GrantorName1': '',
'GrantorName2': '',
'Message': None,
'OfficialRecordBook': '25843',
'OfficialRecordPage': '0025',
'QualificationDescription': 'Other disqualified',
'QualifiedFlag': 'U',
'QualifiedSYear': None,
'QualifiedSourceCode': '',
'ReasonCode': '03',
'ReviewCode': None,
'SaleId': 1,
'SaleInstrument': '',
'SalePrice': 21500000,
'VacantFlag': '\x00',
'ValidCode': None,
'VerifyCode': None}],
'SiteAddress': [{'Address': '60 SE 2 ST, Miami, FL 33131-2103',
'BuildingNumber': 1,
'City': 'Miami',
'HouseNumberSuffix': '',
'Message': None,
'StreetName': '2',
'StreetNumber': 60,
'StreetPrefix': 'SE',
'StreetSuffix': 'ST',
'StreetSuffixDirection': '',
'Unit': '',
'Zip': '33131-2103'}],
'Taxable': {'Messages': [],
'TaxableInfos': [{'CityExemptionValue': 0,
'CityTaxableValue': 5587359,
'CountyExemptionValue': 0,
'CountyTaxableValue': 5587359,
'Message': None,
'RegionalExemptionValue': 0,
'RegionalTaxableValue': 5587359,
'SchoolExemptionValue': 0,
'SchoolTaxableValue': 7618560,
'Year': 2021},
{'CityExemptionValue': 0,
'CityTaxableValue': 5079418,
'CountyExemptionValue': 0,
'CountyTaxableValue': 5079418,
'Message': None,
'RegionalExemptionValue': 0,
'RegionalTaxableValue': 5079418,
'SchoolExemptionValue': 0,
'SchoolTaxableValue': 6963200,
'Year': 2020},
{'CityExemptionValue': 0,
'CityTaxableValue': 4617653,
'CountyExemptionValue': 0,
'CountyTaxableValue': 4617653,
'Message': None,
'RegionalExemptionValue': 0,
'RegionalTaxableValue': 4617653,
'SchoolExemptionValue': 0,
'SchoolTaxableValue': 6963200,
'Year': 2019}]}}
I can distill down to the sales info:
from datetime import datetime

for sale in variable_name[1]['SalesInfos']:
    # parse 'DateOfSale' strings like '6/23/2021' into datetime objects
    sale_date = datetime.strptime(sale['DateOfSale'], '%m/%d/%Y')
    print(sale_date)
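Putting the pieces together, the lookup described in the question could be sketched as follows. The key names (SalesInfos, DateOfSale, GranteeName1/2, PropertyInfo.FolioNumber) come from the sample JSON above; `records`, `normalize_folio`, `owner_at_date`, and `find_owner` are hypothetical names for this sketch. Note that the folio is dashed in the JSON ('01-0100-000-0030') but not in the df, so dashes are stripped before comparing.

```python
from datetime import datetime

def normalize_folio(folio):
    # '01-0100-000-0030' and '0101000000030' should compare equal
    return folio.replace('-', '').strip("'")

def owner_at_date(record, permit_date):
    """Return the grantee of the most recent sale on or before the permit date."""
    past_sales = [
        (datetime.strptime(s['DateOfSale'], '%m/%d/%Y'), s)
        for s in record['SalesInfos']
        if datetime.strptime(s['DateOfSale'], '%m/%d/%Y') <= permit_date
    ]
    if not past_sales:
        return None
    _, sale = max(past_sales, key=lambda pair: pair[0])
    # combine the two grantee-name fields, skipping empty strings
    return ' '.join(n for n in (sale['GranteeName1'], sale['GranteeName2']) if n)

def find_owner(records, folio, permit_date):
    # scan the parsed JSON files for the matching folio
    target = normalize_folio(folio)
    for record in records:
        if normalize_folio(record['PropertyInfo']['FolioNumber']) == target:
            return owner_at_date(record, permit_date)
    return None
```

The df column could then be filled with something like df.apply(lambda r: find_owner(records, r['Folio'], r['FirstSubmissionDate']), axis=1).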
Edit: I made changes to the code where I'm receiving the JSON data and converting it into a dictionary.
(Sorry for the inconvenience and confusion caused.)
I need to fetch the values of product_name, which is a key in each of the dictionaries inside a list of dictionaries.
products = ([{'product_id': 'WVXNR',
'product_name': 'BASUNDI 200 ML',
'product_description': 'A traditional taste of sweetened condensed milk with rich creamy flavor, can be served warm or chilled.',
'product_images': '/images/productImages/Basundi_lH9o5wD.png',
'product_price': 80.0,
'gst': 0,
'product_status': None,
'discount': None,
'rating': None,
'product_quantity': 1,
'get_product_total': 80.0,
'get_product_total_gst': 0.0},
{'product_id': 'MEADN',
'product_name': 'MASALA MILK',
'product_description': 'Blended with dry fruit and saffron is rich in vitamins and minerals, this healthy and nutritious milk is an all-time favorite!!',
'product_images': '/images/productImages/masala_milk_BlypKDx.png',
'product_price': 190.0,
'gst': 0,
'product_status': None,
'discount': None,
'rating': None,
'product_quantity': 1,
'get_product_total': 190.0,
'get_product_total_gst': 0.0
}],)
What I've tried
Assuming products is a variable containing all the data
product_name = [x for x in products['product_name']]
Doing this gives me
TypeError: tuple indices must be integers or slices, not str
What I need
product_name = ['BASUNDI 200 ML', 'MASALA MILK']
Assuming that
products = product_data['products']
results in
[{'product_id': 'NNPTA', 'product_name': 'CHASS 200 ML', 'product_description': 'The secret recipe of butter milk from Punjab Sind Foods is a refresher any time! an excellent source of probiotics and a must have with every meal for better digestion.', 'product_images': '/images/productImages/Masala_Chass_yGg9K92.png', 'product_price': 28.0, 'gst': 0, 'product_status': None, 'discount': None, 'rating': None, 'product_quantity': 2, 'get_product_total': 56.0, 'get_product_total_gst': 0.0}, {'product_id': 'HZCNM', 'product_name': 'FRESH MILK 1 LTR', 'product_description': 'Our milk is free of chemical, pesticides and preservatives. We are committed to provide hygienic and healthy milk every time you order from us.', 'product_images': '/images/productImages/Fresh_milk_IL.png', 'product_price': 62.0, 'gst': 0, 'product_status': None, 'discount': None, 'rating': None, 'product_quantity': 1, 'get_product_total': 62.0, 'get_product_total_gst': 0.0}]
Then what you are looking for can be obtained with:
product_names = [x['product_name'] for x in products]
Since product_data is a list, which is iterable, you can use the map function with a lambda that extracts the value from each dict:
product_names = list(map(lambda product: product["product_name"], product_data))
Thanks for the edit. Watch the stray comma at the end of your data: it wraps your list in a one-element tuple rather than leaving it as a plain list.
products = ([{'product_id': 'WVXNR',
'product_name': 'BASUNDI 200 ML',
'product_description': 'A traditional taste of sweetened condensed milk with rich creamy flavor, can be served warm or chilled.',
'product_images': '/images/productImages/Basundi_lH9o5wD.png',
'product_price': 80.0,
'gst': 0,
'product_status': None,
'discount': None,
'rating': None,
'product_quantity': 1,
'get_product_total': 80.0,
'get_product_total_gst': 0.0},
{'product_id': 'MEADN',
'product_name': 'MASALA MILK',
'product_description': 'Blended with dry fruit and saffron is rich in vitamins and minerals, this healthy and nutritious milk is an all-time favorite!!',
'product_images': '/images/productImages/masala_milk_BlypKDx.png',
'product_price': 190.0,
'gst': 0,
'product_status': None,
'discount': None,
'rating': None,
'product_quantity': 1,
'get_product_total': 190.0,
'get_product_total_gst': 0.0
}])
The following list comprehension should return what you need now.
product_names = [d['product_name'] for d in products]
print(product_names)
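Alternatively, if the trailing comma stays and products remains a one-element tuple, indexing into the tuple first also works (a minimal sketch with the data abbreviated):

```python
# the trailing comma wraps the list in a tuple
products = ([{'product_name': 'BASUNDI 200 ML'},
             {'product_name': 'MASALA MILK'}],)

# products[0] is the inner list of dicts
product_names = [d['product_name'] for d in products[0]]
print(product_names)  # ['BASUNDI 200 ML', 'MASALA MILK']
```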
Stack Overflow, please do your magic.
I have a pandas dataframe like this
Column_one \
{{'name': 'Marfon ', 'email': '', 'phone': '123454333', 'address': 'San Jose', 'estimated_date': 2019-10-01 00:00:00, 'estimated_time': {'minimum': 1000, 'maximum': 1200, 'min': 0, 'max': 0}}
{{'name': 'Joe Doe ', 'email': 'joe#gmail.com', 'phone': '987655444', 'address': 'Carolina', 'estimated_date': 2019-10-01 00:00:00, 'estimated_time': {'minimum': 1000, 'maximum': 1200, 'min': 0, 'max': 0}}
Column_two
[{'status': False, 'item_code': 'JSK', 'price': 15000, 'note': [], 'sub_total_price': 50}]
[{'status': False, 'item_code': 'HSO', 'price': 15000, 'note': [], 'sub_total_price': 100}]
How do I create a new dataframe like this?
name email phone address item_code
Marfon 123454333 San Jose JSK
Joe Doe joe#gmail.com 987655444 Carolina HSO
Solved:
# expand the dicts in Column_one into their own columns
column_one = pd.DataFrame(main_df['Column_one'].values.tolist(), index=main_df.index)
# pull the item_code value(s) out of each list in Column_two
column_two = main_df['Column_two'].apply(lambda x: ', '.join(y['item_code'] for y in x))
data_con = pd.concat([column_one, column_two], axis=1)
print(data_con)
You have some mess in your input data. But if what you meant was this, then:
Column_one =\
[{'name': 'Marfon ', 'email': '', 'phone': '123454333', 'address': 'San Jose', 'estimated_date': '2019-10-01 00:00:00'},
{'name': 'Joe Doe ', 'email': 'joe#gmail.com', 'phone': '987655444', 'address': 'Carolina', 'estimated_date': '2019-10-01 00:00:00'}]
Column_two=\
[{'status': False, 'item_code': 'JSK', 'price': 15000, 'note': [], 'sub_total_price': 50},
{'status': False, 'item_code': 'HSO', 'price': 15000, 'note': [], 'sub_total_price': 100}]
pd.concat([pd.DataFrame(Column_one), pd.DataFrame(Column_two)], axis=1)
output:
name email phone address estimated_date status item_code price note sub_total_price
Marfon 123454333 San Jose 2019-10-01 00:00:00 False JSK 15000 [] 50
Joe Doe joe#gmail.com 987655444 Carolina 2019-10-01 00:00:00 False HSO 15000 [] 100
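To match the exact column set in the question's desired output, the concatenated frame can be trimmed after the concat. This is a sketch using the cleaned-up lists from the answer above, abbreviated to the relevant keys:

```python
import pandas as pd

Column_one = [{'name': 'Marfon ', 'email': '', 'phone': '123454333', 'address': 'San Jose'},
              {'name': 'Joe Doe ', 'email': 'joe#gmail.com', 'phone': '987655444', 'address': 'Carolina'}]
Column_two = [{'status': False, 'item_code': 'JSK'},
              {'status': False, 'item_code': 'HSO'}]

result = pd.concat([pd.DataFrame(Column_one), pd.DataFrame(Column_two)], axis=1)
# keep only the columns from the desired output
result = result[['name', 'email', 'phone', 'address', 'item_code']]
print(result)
```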