I have the following data. Using json_flatten I was able to extract most of it, except for deliveryMethod.items and languages.items.
I also tried pd.json_normalize(a, record_path='deliveryMethod.items'), but that does not work.
a = {'ID': '1', 'Name': 'ABC', 'Center': 'Center For Education', 'providerNameAr': 'ABC', 'city': {'id': 1, 'cityEn': 'LA', 'regionId': 0, 'region': None}, 'cityName': None, 'LevelNumber': 'ABCD', 'activityStartDate': '09/01/2020', 'activityEndDate': '09/02/2020', 'activityType': {'lookUpId': 2, 'lookUpEn': 'Course', 'code': None, 'parent': None, 'hasParent': False}, 'deliveryMethod': {'items': [{'lookUpId': 2, 'lookUpEn': 'online', 'code': None, 'parent': None, 'hasParent': False}]}, 'languages': {'items': [{'lookUpId': 1, 'lookUpEn': 'English', 'code': None, 'parent': None, 'hasParent': False}]}, 'activityCategory': {'lookUpId': 1, 'lookUpEn': 'Regular', 'code': None, 'parent': None, 'hasParent': False}, 'address': 'LA', 'phoneNumber': '-11111', 'emailAddress': 'ABCS#Gmail.com', 'isAllSpeciality': True, 'requestId': 23, 'parentActivityId': None, 'sppData': None}
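One likely reason the record_path attempt fails is that a dotted string is treated as a single key, not as a path. The pandas documentation describes record_path as "str or list of str", so the nested location is normally written as a list such as ['deliveryMethod', 'items']. The sketch below shows the same idea in plain Python on a trimmed-down copy of the record; get_path and explode are hypothetical helper names, not part of pandas or json_flatten.

```python
from functools import reduce

# Trimmed-down copy of the record above (only the fields needed here).
a = {
    'ID': '1',
    'Name': 'ABC',
    'deliveryMethod': {'items': [{'lookUpId': 2, 'lookUpEn': 'online', 'code': None}]},
    'languages': {'items': [{'lookUpId': 1, 'lookUpEn': 'English', 'code': None}]},
}

def get_path(record, path):
    """Follow a list of keys into nested dicts and return what is found there."""
    return reduce(lambda node, key: node[key], path, record)

def explode(record, path, id_key='ID'):
    """Hypothetical helper: one row per nested item, tagged with the parent ID."""
    return [{id_key: record[id_key], **item} for item in get_path(record, path)]

delivery_rows = explode(a, ['deliveryMethod', 'items'])
language_rows = explode(a, ['languages', 'items'])
print(delivery_rows)
print(language_rows)
```

Each resulting list of flat dicts can be passed to pd.DataFrame if a table per nested list is wanted.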
I’m trying to use Python to print specific values from JSON data that I pulled from an API. From what I understand, the response contains a list of dictionaries, one per player, each with nested dictionaries holding that player's data (i.e. name, team, etc.).
I’m running into issues printing the values, as each character is printed on a separate line.
The end result I am trying to get to is a Pandas DataFrame containing all the values from the response, but I can’t even seem to iterate through it correctly.
Here is my code:
import json
import requests

url = "https://api-football-v1.p.rapidapi.com/v3/players"
querystring = {"league": "39", "season": "2020", "page": "2"}
headers = {
    "X-RapidAPI-Host": "api-football-v1.p.rapidapi.com",
    "X-RapidAPI-Key": "xxxxxkeyxxxxx"
}
response = requests.request("GET", url, headers=headers, params=querystring).json()
response_dump = json.dumps(response)
for item in response_dump:
    for player_item in item:
        print(player_item)
This is the output when I print the JSON response (first two items):
{'get': 'players', 'parameters': {'league': '39', 'page': '2', 'season': '2020'}, 'errors': [], 'results': 20, 'paging': {'current': 2, 'total': 37}, 'response': [{'player': {'id': 301, 'name': 'Benjamin Luke Woodburn', 'firstname': 'Benjamin Luke', 'lastname': 'Woodburn', 'age': 23, 'birth': {'date': '1999-10-15', 'place': 'Nottingham', 'country': 'England'}, 'nationality': 'Wales', 'height': '174 cm', 'weight': '72 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/301.png'}, 'statistics': [{'team': {'id': 40, 'name': 'Liverpool', 'logo': 'https://media.api-sports.io/football/teams/40.png'}, 'league': {'id': 39, 'name': 'Premier League', 'country': 'England', 'logo': 'https://media.api-sports.io/football/leagues/39.png', 'flag': 'https://media.api-sports.io/flags/gb.svg', 'season': 2020}, 'games': {'appearences': 0, 'lineups': 0, 'minutes': 0, 'number': None, 'position': 'Attacker', 'rating': None, 'captain': False}, 'substitutes': {'in': 0, 'out': 0, 'bench': 3}, 'shots': {'total': None, 'on': None}, 'goals': {'total': 0, 'conceded': 0, 'assists': None, 'saves': None}, 'passes': {'total': None, 'key': None, 'accuracy': None}, 'tackles': {'total': None, 'blocks': None, 'interceptions': None}, 'duels': {'total': None, 'won': None}, 'dribbles': {'attempts': None, 'success': None, 'past': None}, 'fouls': {'drawn': None, 'committed': None}, 'cards': {'yellow': 0, 'yellowred': 0, 'red': 0}, 'penalty': {'won': None, 'commited': None, 'scored': 0, 'missed': 0, 'saved': None}}]}, {'player': {'id': 518, 'name': 'Meritan Shabani', 'firstname': 'Meritan', 'lastname': 'Shabani', 'age': 23, 'birth': {'date': '1999-03-15', 'place': 'München', 'country': 'Germany'}, 'nationality': 'Germany', 'height': '185 cm', 'weight': '78 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/518.png'}, 'statistics': [{'team': {'id': 39, 'name': 'Wolves', 'logo': 'https://media.api-sports.io/football/teams/39.png'}, 'league': {'id': 39, 
'name': 'Premier League', 'country': 'England', 'logo': 'https://media.api-sports.io/football/leagues/39.png', 'flag': 'https://media.api-sports.io/flags/gb.svg', 'season': 2020}, 'games': {'appearences': 0, 'lineups': 0, 'minutes': 0, 'number': None, 'position': 'Midfielder', 'rating': None, 'captain': False}, 'substitutes': {'in': 0, 'out': 0, 'bench': 3}, 'shots': {'total': None, 'on': None}, 'goals': {'total': 0, 'conceded': 0, 'assists': None, 'saves': None}, 'passes': {'total': None, 'key': None, 'accuracy': None}, 'tackles': {'total': None, 'blocks': None, 'interceptions': None}, 'duels': {'total': None, 'won': None}, 'dribbles': {'attempts': None, 'success': None, 'past': None}, 'fouls': {'drawn': None, 'committed': None}, 'cards': {'yellow': 0, 'yellowred': 0, 'red': 0}, 'penalty': {'won': None, 'commited': None, 'scored': 0, 'missed': 0, 'saved': None}}]},
This is the data type of each layer of the JSON file, from when I iterated through it with a For loop:
print(type(response)) <class 'dict'>
print(type(response_dump)) <class 'str'>
print(type(item)) <class 'str'>
print(type(player_item)) <class 'str'>
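You do not need json.dumps() here. response is already a dict, and json.dumps() turns it back into a string, so your loop iterates over single characters. Iterate over the list under the 'response' key instead:
for player in response['response']:
    print(player)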
{'player': {'id': 301, 'name': 'Benjamin Luke Woodburn', 'firstname': 'Benjamin Luke', 'lastname': 'Woodburn', 'age': 23, 'birth': {'date': '1999-10-15', 'place': 'Nottingham', 'country': 'England'}, 'nationality': 'Wales', 'height': '174 cm', 'weight': '72 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/301.png'}, 'statistics': [{'team': {'id': 40, 'name': 'Liverpool', 'logo': 'https://media.api-sports.io/football/teams/40.png'}, 'league': {'id': 39, 'name': 'Premier League', 'country': 'England', 'logo': 'https://media.api-sports.io/football/leagues/39.png', 'flag': 'https://media.api-sports.io/flags/gb.svg', 'season': 2020}, 'games': {'appearences': 0, 'lineups': 0, 'minutes': 0, 'number': None, 'position': 'Attacker', 'rating': None, 'captain': False}, 'substitutes': {'in': 0, 'out': 0, 'bench': 3}, 'shots': {'total': None, 'on': None}, 'goals': {'total': 0, 'conceded': 0, 'assists': None, 'saves': None}, 'passes': {'total': None, 'key': None, 'accuracy': None}, 'tackles': {'total': None, 'blocks': None, 'interceptions': None}, 'duels': {'total': None, 'won': None}, 'dribbles': {'attempts': None, 'success': None, 'past': None}, 'fouls': {'drawn': None, 'committed': None}, 'cards': {'yellow': 0, 'yellowred': 0, 'red': 0}, 'penalty': {'won': None, 'commited': None, 'scored': 0, 'missed': 0, 'saved': None}}]}
{'player': {'id': 518, 'name': 'Meritan Shabani', 'firstname': 'Meritan', 'lastname': 'Shabani', 'age': 23, 'birth': {'date': '1999-03-15', 'place': 'München', 'country': 'Germany'}, 'nationality': 'Germany', 'height': '185 cm', 'weight': '78 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/518.png'}, 'statistics': [{'team': {'id': 39, 'name': 'Wolves', 'logo': 'https://media.api-sports.io/football/teams/39.png'}, 'league': {'id': 39, 'name': 'Premier League', 'country': 'England', 'logo': 'https://media.api-sports.io/football/leagues/39.png', 'flag': 'https://media.api-sports.io/flags/gb.svg', 'season': 2020}, 'games': {'appearences': 0, 'lineups': 0, 'minutes': 0, 'number': None, 'position': 'Midfielder', 'rating': None, 'captain': False}, 'substitutes': {'in': 0, 'out': 0, 'bench': 3}, 'shots': {'total': None, 'on': None}, 'goals': {'total': 0, 'conceded': 0, 'assists': None, 'saves': None}, 'passes': {'total': None, 'key': None, 'accuracy': None}, 'tackles': {'total': None, 'blocks': None, 'interceptions': None}, 'duels': {'total': None, 'won': None}, 'dribbles': {'attempts': None, 'success': None, 'past': None}, 'fouls': {'drawn': None, 'committed': None}, 'cards': {'yellow': 0, 'yellowred': 0, 'red': 0}, 'penalty': {'won': None, 'commited': None, 'scored': 0, 'missed': 0, 'saved': None}}]}
or
for player in response['response']:
    print(player['player'])
{'id': 301, 'name': 'Benjamin Luke Woodburn', 'firstname': 'Benjamin Luke', 'lastname': 'Woodburn', 'age': 23, 'birth': {'date': '1999-10-15', 'place': 'Nottingham', 'country': 'England'}, 'nationality': 'Wales', 'height': '174 cm', 'weight': '72 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/301.png'}
{'id': 518, 'name': 'Meritan Shabani', 'firstname': 'Meritan', 'lastname': 'Shabani', 'age': 23, 'birth': {'date': '1999-03-15', 'place': 'München', 'country': 'Germany'}, 'nationality': 'Germany', 'height': '185 cm', 'weight': '78 kg', 'injured': False, 'photo': 'https://media.api-sports.io/football/players/518.png'}
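To see why the original loop printed one character per line, here is a minimal sketch using a made-up stand-in for the API payload (the real response has the same outer shape but far more fields).

```python
import json

# Made-up stand-in for the API payload; only the outer shape matches the real one.
response = {'results': 2, 'response': [{'player': {'id': 301}}, {'player': {'id': 518}}]}

# json.dumps() serialises the dict to a str, so iterating yields characters.
response_dump = json.dumps(response)
first_chars = [ch for ch in response_dump][:3]
print(type(response_dump).__name__, first_chars)

# Iterating the parsed dict's 'response' list yields one dict per player.
player_ids = [entry['player']['id'] for entry in response['response']]
print(player_ids)
```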
To get a DataFrame, simply call pd.json_normalize(). Your question does not make clear which information you need or how it should be displayed, so that part would be better asked as a new question with exactly that focus:
pd.json_normalize(response['response'])
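As a rough picture of what this flattening does to nested dictionaries, the sketch below builds dotted column names in plain Python. It is an illustration under simplifying assumptions, not pandas' actual implementation; flatten is a hypothetical helper, and the entry is a shortened version of one player record.

```python
def flatten(record, parent='', sep='.'):
    """Flatten nested dicts into one dict with dotted keys; lists are kept as-is."""
    flat = {}
    for key, value in record.items():
        name = f'{parent}{sep}{key}' if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

# Shortened version of one entry from response['response'].
entry = {
    'player': {'id': 301, 'name': 'Benjamin Luke Woodburn',
               'birth': {'date': '1999-10-15', 'country': 'England'}},
    'statistics': [{'team': {'id': 40, 'name': 'Liverpool'}}],
}
row = flatten(entry)
print(sorted(row))
```

Note that 'statistics' stays a single column holding a list, which is why list-valued fields need a separate pass that names them as the record path.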
EDIT
Based on your comment and the improved question:
pd.concat([pd.json_normalize(response,['response'])\
,pd.json_normalize(response,['response','statistics'])], axis=1)\
.drop(['statistics'], axis=1)
Output (shown transposed, one line per column; None marks cells that are null in the API response):
column | 0 | 1
player.id | 301 | 518
player.name | Benjamin Luke Woodburn | Meritan Shabani
player.firstname | Benjamin Luke | Meritan
player.lastname | Woodburn | Shabani
player.age | 23 | 23
player.birth.date | 1999-10-15 | 1999-03-15
player.birth.place | Nottingham | München
player.birth.country | England | Germany
player.nationality | Wales | Germany
player.height | 174 cm | 185 cm
player.weight | 72 kg | 78 kg
player.injured | False | False
player.photo | https://media.api-sports.io/football/players/301.png | https://media.api-sports.io/football/players/518.png
team.id | 40 | 39
team.name | Liverpool | Wolves
team.logo | https://media.api-sports.io/football/teams/40.png | https://media.api-sports.io/football/teams/39.png
league.id | 39 | 39
league.name | Premier League | Premier League
league.country | England | England
league.logo | https://media.api-sports.io/football/leagues/39.png | https://media.api-sports.io/football/leagues/39.png
league.flag | https://media.api-sports.io/flags/gb.svg | https://media.api-sports.io/flags/gb.svg
league.season | 2020 | 2020
games.appearences | 0 | 0
games.lineups | 0 | 0
games.minutes | 0 | 0
games.number | None | None
games.position | Attacker | Midfielder
games.rating | None | None
games.captain | False | False
substitutes.in | 0 | 0
substitutes.out | 0 | 0
substitutes.bench | 3 | 3
shots.total | None | None
shots.on | None | None
goals.total | 0 | 0
goals.conceded | 0 | 0
goals.assists | None | None
goals.saves | None | None
passes.total | None | None
passes.key | None | None
passes.accuracy | None | None
tackles.total | None | None
tackles.blocks | None | None
tackles.interceptions | None | None
duels.total | None | None
duels.won | None | None
dribbles.attempts | None | None
dribbles.success | None | None
dribbles.past | None | None
fouls.drawn | None | None
fouls.committed | None | None
cards.yellow | 0 | 0
cards.yellowred | 0 | 0
cards.red | 0 | 0
penalty.won | None | None
penalty.commited | None | None
penalty.scored | 0 | 0
penalty.missed | 0 | 0
penalty.saved | None | None
I have a JSON file in which the annotation is stored as follows:
{'licenses': [{'name': '', 'id': 0, 'url': ''}], 'info': {'contributor': '', 'date_created': '', 'description': '', 'url': '', 'version': '', 'year': ''}, 'categories': [{'id': 1, 'name': 'book', 'supercategory': ''}, {'id': 2, 'name': 'ceiling', 'supercategory': ''}, {'id': 3, 'name': 'chair', 'supercategory': ''}, {'id': 4, 'name': 'floor', 'supercategory': ''}, {'id': 5, 'name': 'object', 'supercategory': ''}, {'id': 6, 'name': 'person', 'supercategory': ''}, {'id': 7, 'name': 'screen', 'supercategory': ''}, {'id': 8, 'name': 'table', 'supercategory': ''}, {'id': 9, 'name': 'wall', 'supercategory': ''}, {'id': 10, 'name': 'window', 'supercategory': ''}, {'id': 11, 'name': '__background__', 'supercategory': ''}], 'images': [{'id': 1, 'width': 848, 'height': 480, 'file_name': '153058384000.png', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}], 'annotations': [{'id': 1, 'image_id': 1, 'category_id': 7, 'segmentation': [[591.81, 146.75, 848.0, 119.83, 848.0, 289.18, 606.39, 288.06]], 'area': 38747.0, 'bbox': [591.81, 119.83, 256.19, 169.35], 'iscrowd': 0, 'attributes': {'occluded': False}}]}
I want to select a specific region of the image using the 'segmentation' field ([[591.81, 146.75, 848.0, 119.83, 848.0, 289.18, 606.39, 288.06]]) of the annotation in the JSON file above.
The image I am using is below
I tried with OpenCV and PIL, but I did not get a usable result.
Note: segmentation may have more than 8 coordinates
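A possible first step, sketched under the assumption that the flat segmentation list alternates x and y values (x1, y1, x2, y2, ...), is to pair the numbers into polygon vertices. That assumption is at least consistent with the annotation itself: the bounding box computed from the pairs matches the stored 'bbox'. The pairing works for any even number of coordinates, not only 8.

```python
# Values copied from the annotation above.
segmentation = [591.81, 146.75, 848.0, 119.83, 848.0, 289.18, 606.39, 288.06]

# Pair up the flat list: even positions are assumed to be x, odd positions y.
points = list(zip(segmentation[0::2], segmentation[1::2]))

xs = [x for x, _ in points]
ys = [y for _, y in points]
# Bounding box as [x, y, width, height], rounded to the annotation's precision.
bbox = [round(v, 2) for v in (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))]
print(points)
print(bbox)
```

The resulting vertex list is the kind of input that polygon-drawing routines such as PIL's ImageDraw polygon method or OpenCV's cv2.fillPoly expect (the latter after conversion to an integer array), which can then be used to build a mask for the region.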
I have a list as follows:
data = [
{'items': [
{'key': u'3', 'id': 1, 'name': u'Typeplaatje'},
{'key': u'2', 'id': 2, 'name': u'Aanduiding van het chassisnummer '},
{'key': u'1', 'id': 3, 'name': u'Kilometerteller: Kilometerstand '},
{'key': u'5', 'id': 4, 'name': u'Inschrijvingsbewijs '},
{'key': u'4', 'id': 5, 'name': u'COC of gelijkvormigheidsattest '}
], 'id': 2, 'key': u'B', 'name': u'Onderdelen'},
{'items': [
{'key': u'10', 'id': 10, 'name': u'Koppeling'},
{'key': u'7', 'id': 11, 'name': u'Differentieel '},
{'key': u'9', 'id': 12, 'name': u'Cardanhoezen '},
{'key': u'8', 'id': 13, 'name': u'Uitlaat '},
{'key': u'6', 'id': 15, 'name': u'Batterij'}
], 'id': 2, 'key': u'B', 'name': u'Onderdelen'}
]
And I want to sort items by key.
Thus the wanted result is as follows:
res = [
{'items': [
{'key': u'1', 'id': 3, 'name': u'Kilometerteller: Kilometerstand '},
{'key': u'2', 'id': 2, 'name': u'Aanduiding van het chassisnummer '},
{'key': u'3', 'id': 1, 'name': u'Typeplaatje'},
{'key': u'4', 'id': 5, 'name': u'COC of gelijkvormigheidsattest '},
{'key': u'5', 'id': 4, 'name': u'Inschrijvingsbewijs '},
], 'id': 2, 'key': u'B', 'name': u'Onderdelen'},
{'items': [
{'key': u'6', 'id': 15, 'name': u'Batterij'},
{'key': u'7', 'id': 11, 'name': u'Differentieel '},
{'key': u'8', 'id': 13, 'name': u'Uitlaat '},
{'key': u'9', 'id': 12, 'name': u'Cardanhoezen '},
{'key': u'10', 'id': 10, 'name': u'Koppeling'}
], 'id': 2, 'key': u'B', 'name': u'Onderdelen'}
]
I've tried as follows:
res = []
for item in data:
    new_data = {
        'id': item['id'],
        'key': item['key'],
        'name': item['name'],
        'items': sorted(item['items'], key=lambda k : k['key'])
    }
    res.append(new_data)
print(res)
The first group is sorted correctly, but the second one is not.
What am I doing wrong, and is there a better way of doing it?
Your sort is wrong in the second case because the keys are strings, and strings are compared character by character: '10' starts with '1', so it sorts before '6'. A slight modification to your sorting function does the trick:
'items': sorted(item['items'], key=lambda k : int(k['key']))
I convert each key with int() because you want them sorted as numbers. Here it is in your code:
res = []
for item in data:
    new_data = {
        'id': item['id'],
        'key': item['key'],
        'name': item['name'],
        'items': sorted(item['items'], key=lambda k : int(k['key']))
    }
    res.append(new_data)
print(res)
And here's the result:
[{'id': 2,
'items': [{'id': 3, 'key': '1', 'name': 'Kilometerteller: Kilometerstand '},
{'id': 2, 'key': '2', 'name': 'Aanduiding van het chassisnummer '},
{'id': 1, 'key': '3', 'name': 'Typeplaatje'},
{'id': 5, 'key': '4', 'name': 'COC of gelijkvormigheidsattest '},
{'id': 4, 'key': '5', 'name': 'Inschrijvingsbewijs '}],
'key': 'B',
'name': 'Onderdelen'},
{'id': 2,
'items': [{'id': 15, 'key': '6', 'name': 'Batterij'},
{'id': 11, 'key': '7', 'name': 'Differentieel '},
{'id': 13, 'key': '8', 'name': 'Uitlaat '},
{'id': 12, 'key': '9', 'name': 'Cardanhoezen '},
{'id': 10, 'key': '10', 'name': 'Koppeling'}],
'key': 'B',
'name': 'Onderdelen'}]
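The difference between the two orderings can be seen on the bare key values alone. A small sketch using the 'key' strings of the second group:

```python
keys = ['10', '7', '9', '8', '6']   # the 'key' values of the second group

as_text = sorted(keys)              # compares character by character
as_numbers = sorted(keys, key=int)  # compares the numeric values

print(as_text)     # ['10', '6', '7', '8', '9']
print(as_numbers)  # ['6', '7', '8', '9', '10']
```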
You need to replace the old items in data with the items sorted numerically by key rather than as strings. So use int(x['key']) as the sort key, like this:
>>> data
[{'items': [{'key': '1', 'id': 3, 'name': 'Kilometerteller: Kilometerstand '}, {'key': '2', 'id': 2, 'name': 'Aanduiding van het chassisnummer '}, {'key': '3', 'id': 1, 'name': 'Typeplaatje'}, {'key': '4', 'id': 5, 'name': 'COC of gelijkvormigheidsattest '}, {'key': '5', 'id': 4, 'name': 'Inschrijvingsbewijs '}], 'id': 2, 'key': 'B', 'name': 'Onderdelen'}, {'items': [{'key': '6', 'id': 15, 'name': 'Batterij'}, {'key': '7', 'id': 11, 'name': 'Differentieel '}, {'key': '8', 'id': 13, 'name': 'Uitlaat '}, {'key': '9', 'id': 12, 'name': 'Cardanhoezen '}, {'key': '10', 'id': 10, 'name': 'Koppeling'}], 'id': 2, 'key': 'B', 'name': 'Onderdelen'}]
>>>
>>> for item in data:
...     item['items'] = sorted(item['items'], key=lambda x: int(x['key']))
...
>>> import pprint
>>> pprint.pprint(data)
[{'id': 2,
'items': [{'id': 3, 'key': '1', 'name': 'Kilometerteller: Kilometerstand '},
{'id': 2, 'key': '2', 'name': 'Aanduiding van het chassisnummer '},
{'id': 1, 'key': '3', 'name': 'Typeplaatje'},
{'id': 5, 'key': '4', 'name': 'COC of gelijkvormigheidsattest '},
{'id': 4, 'key': '5', 'name': 'Inschrijvingsbewijs '}],
'key': 'B',
'name': 'Onderdelen'},
{'id': 2,
'items': [{'id': 15, 'key': '6', 'name': 'Batterij'},
{'id': 11, 'key': '7', 'name': 'Differentieel '},
{'id': 13, 'key': '8', 'name': 'Uitlaat '},
{'id': 12, 'key': '9', 'name': 'Cardanhoezen '},
{'id': 10, 'key': '10', 'name': 'Koppeling'}],
'key': 'B',
'name': 'Onderdelen'}]
A list has a handy method called sort(), which sorts the list in place. I'd use that to your advantage:
for d in data:
    d['items'].sort(key=lambda x: int(x['key']))
Results:
[{'id': 2,
'items': [{'id': 3, 'key': '1', 'name': 'Kilometerteller: Kilometerstand '},
{'id': 2, 'key': '2', 'name': 'Aanduiding van het chassisnummer '},
{'id': 1, 'key': '3', 'name': 'Typeplaatje'},
{'id': 5, 'key': '4', 'name': 'COC of gelijkvormigheidsattest '},
{'id': 4, 'key': '5', 'name': 'Inschrijvingsbewijs '}],
'key': 'B',
'name': 'Onderdelen'},
{'id': 2,
'items': [{'id': 15, 'key': '6', 'name': 'Batterij'},
{'id': 11, 'key': '7', 'name': 'Differentieel '},
{'id': 13, 'key': '8', 'name': 'Uitlaat '},
{'id': 12, 'key': '9', 'name': 'Cardanhoezen '},
{'id': 10, 'key': '10', 'name': 'Koppeling'}],
'key': 'B',
'name': 'Onderdelen'}]
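For anyone choosing between the two approaches, this sketch contrasts sorted(), which returns a new list, with list.sort(), which reorders the existing list and returns None. The three-item list is a made-up stand-in for one 'items' list.

```python
# Made-up stand-in for one 'items' list.
items = [{'key': '10'}, {'key': '7'}, {'key': '6'}]

new_list = sorted(items, key=lambda x: int(x['key']))  # returns a new list
untouched = [d['key'] for d in items]                   # original order is kept

result = items.sort(key=lambda x: int(x['key']))       # sorts in place, returns None
in_place = [d['key'] for d in items]

print(untouched, in_place, result)
```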
In Python, I am trying to turn a list of separate JSON data:
[[{'id': 1, 'name': 'pencil', 'description': '2b or not 2b, that is the question'}], [{'id': 2, 'name': 'oil pastel', 'description': None}], [{'id': 3, 'name': 'gouache', 'description': None}], [{'id': 4, 'name': 'paper', 'description': None}]]
into one piece of JSON data:
{'id': 1, 'name': 'pencil', 'description': '2b or not 2b, that is the question'}, {'id': 2, 'name': 'oil pastel', 'description': None}, {'id': 3, 'name': 'gouache', 'description': None}, {'id': 4, 'name': 'paper', 'description': None}, {'id': 5, 'name': 'coloured pencil', 'description': None}
Been struggling with it for a few hours. Does anyone have any ideas?
Use a simple list comprehension:
[y for x in list_of_lists for y in x]
Output:
[{'description': '2b or not 2b, that is the question', 'id': 1, 'name': 'pencil'}, {'description': None, 'id': 2, 'name': 'oil pastel'}, {'description': None, 'id': 3, 'name': 'gouache'}, {'description': None, 'id': 4, 'name': 'paper'}]
Use itertools.chain:
>>> import itertools
>>> list(itertools.chain.from_iterable(j))
Or a list comprehension
>>> [x[0] for x in j] # Assuming there is only one item in each list
Both outputs
[{'id': 1,
'name': 'pencil',
'description': '2b or not 2b, that is the question'},
{'id': 2, 'name': 'oil pastel', 'description': None},
{'id': 3, 'name': 'gouache', 'description': None},
{'id': 4, 'name': 'paper', 'description': None}]
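The two approaches only agree while every inner list holds exactly one record. The sketch below uses a hypothetical variant of the input, in which one inner list holds two records, to show where they diverge.

```python
import itertools

# Hypothetical variant of the input: the second inner list holds two records.
j = [[{'id': 1}], [{'id': 2}, {'id': 3}], [{'id': 4}]]

flattened = list(itertools.chain.from_iterable(j))  # keeps every record
first_only = [x[0] for x in j]                      # keeps only the first of each list

print([d['id'] for d in flattened])   # [1, 2, 3, 4]
print([d['id'] for d in first_only])  # [1, 2, 4]
```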
Using functools with operator
j = [[{'id': 1, 'name': 'pencil', 'description': '2b or not 2b, that is the question'}], [{'id': 2, 'name': 'oil pastel', 'description': None}], [{'id': 3, 'name': 'gouache', 'description': None}], [{'id': 4, 'name': 'paper', 'description': None}]]
import functools
import operator
functools.reduce(operator.iadd,j,[])
Output:
[{'id': 1,
'name': 'pencil',
'description': '2b or not 2b, that is the question'},
{'id': 2, 'name': 'oil pastel', 'description': None},
{'id': 3, 'name': 'gouache', 'description': None},
{'id': 4, 'name': 'paper', 'description': None}]
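One detail worth knowing about this variant: operator.iadd extends its left operand in place, so the empty-list initializer matters. A small sketch with made-up integer data shows that leaving it out makes reduce accumulate into the first inner list and modify the input.

```python
import functools
import operator

j = [[1], [2], [3]]  # made-up data standing in for the list of lists

# With the [] initializer, results accumulate into that fresh list.
safe = functools.reduce(operator.iadd, j, [])
j_after_safe = [list(x) for x in j]   # snapshot: input is still unchanged

# Without it, j[0] becomes the accumulator and is extended in place.
merged = functools.reduce(operator.iadd, j)

print(safe, j_after_safe, j)
```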