OSM Data to Dataframe - python

I have a dictionary of OSM data that I queried using overpy. I would like to get this data into a DataFrame with each tag as its own column. I am having trouble with the tags because they are nested within the dictionary, and I cannot reference them by name because they change from feature to feature.
Example of dictionary:
{'version': 0.6,
'generator': 'Overpass API 0.7.57 93a4d346',
'osm3s': {'timestamp_osm_base': '2022-05-18T13:41:10Z',
'timestamp_areas_base': '2022-05-18T13:27:04Z',
'copyright': 'The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.'},
'elements': [{'type': 'node',
'id': 33827805,
'lat': 46.2519258,
'lon': -60.5281258,
'tags': {'ele': '240', 'name': "Kelly's Mountain", 'natural': 'peak'}},
{'type': 'node',
'id': 244245796,
'lat': 56.008075,
'lon': -130.003485,
'tags': {'alt_name': 'Monument 8',
'ele': '1570',
'historic': 'boundary_stone',
'man_made': 'survey_point',
'name': 'Mount Welker',
'name:fr': 'Borne frontière 8',
'natural': 'peak',
'operator': 'International Boundary Commission',
'ref': 'MON 8',
'source': 'CA-US International Boundary Commission;NRCan-CanVec-10.0',
'start_date': '1905'}},
{'type': 'node',
'id': 244245903,
'lat': 56.2650369,
'lon': -130.615449,
'tags': {'alt_name': 'Monument 27',
'ele': '1667',
'historic': 'boundary_stone',
'man_made': 'survey_point',
'name': 'Mount Middleton',
'name:fr': 'Borne frontière 27',
'natural': 'peak',
'operator': 'International Boundary Commission',
'ref': 'MON 27',
'source': 'CA-US International Boundary Commission',
'start_date': '1920'}},
{'type': 'node',
'id': 244245911,
'lat': 56.3668442,
'lon': -130.78198,
'tags': {'alt_name': 'Monument 40',
'ele': '1828',
'historic': 'boundary_stone',
'man_made': 'survey_point',
'name': 'Mount Stoeckl',
'name:fr': 'Borne frontière 40',
'natural': 'peak',
'operator': 'International Boundary Commission',
'ref': 'MON 40',
'source': 'CA-US International Boundary Commission',
'start_date': '1905'}},
{'type': 'node',
'id': 244245929,
'lat': 56.4060047,
'lon': -131.087348,
'tags': {'alt_name': 'Unmarked Boundary Point 47',
'ele': '1840',
'man_made': 'survey_point',
'name': 'Mount Lewis Cass',
'name:fr': 'Point frontalier non-marqué 47',
'natural': 'peak',
'operator': 'International Boundary Commission',
'ref': 'BP 47',
'source': 'CA-US International Boundary Commission',
'start_date': '1907'}},
...]}
My desired end result is a DataFrame with one row per element and a column for every tag type, i.e. a column titled 'ele' whose first row holds 240 and whose second row holds 1570.
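A minimal sketch of one way to get there, assuming pandas 1.0 or newer (where json_normalize is available as pd.json_normalize) and a dictionary shaped like the one above. The names osm and df are illustrative, and the sample is trimmed to two elements with fewer tags.

```python
import pandas as pd

# Trimmed copy of the dictionary from the question (two elements only).
osm = {
    "version": 0.6,
    "elements": [
        {"type": "node", "id": 33827805, "lat": 46.2519258, "lon": -60.5281258,
         "tags": {"ele": "240", "name": "Kelly's Mountain", "natural": "peak"}},
        {"type": "node", "id": 244245796, "lat": 56.008075, "lon": -130.003485,
         "tags": {"alt_name": "Monument 8", "ele": "1570", "name": "Mount Welker",
                  "natural": "peak", "ref": "MON 8"}},
    ],
}

# json_normalize flattens the nested "tags" dict into "tags.<key>" columns;
# a tag that an element does not have becomes NaN in that row.
df = pd.json_normalize(osm["elements"])

# Drop the "tags." prefix so each tag is a plain column name such as "ele".
df.columns = [c[len("tags."):] if c.startswith("tags.") else c for c in df.columns]

print(df[["id", "lat", "lon", "ele", "name", "alt_name"]])
```

The tag values stay strings ('240', not 240), so something like pd.to_numeric(df['ele']) would be needed for numeric work. Stripping the prefix also assumes no tag shares a name with a top-level key such as 'type' or 'id'; if one does, keeping the 'tags.' prefix avoids duplicate column names.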

Related

Creating a Python dictionary from other nested list containing dictionary in python

I have this list that contains dictionaries as its elements:
dict_1 = [{'id': '0eb7df70-f319-4562-ab2a-9e641e978b3b', 'first_name': 'Rahx', 'surname': 'Smith ', 'devices': {'os': 'Apple iPhone', 'mac_address': 'f4:af:e7:b7:ab:22', 'manufacturer': 'Apple'}, 'lat': 54.33166199876629, 'lng': -6.277842272724769, 'seenTime': 1582814754000},
{'id': 'a0bb8d38-0d27-4d7f-acc0-1e850a706b6c', 'first_name': 'Lucy', 'surname': 'Pye', 'devices': {'os': 'Apple iPhone', 'mac_address': 'f8:87:f1:72:4c:4d', 'manufacturer': 'Apple'}, 'lat': 54.33166199876629, 'lng': -6.277842272724769, 'seenTime': 1582814754000},
{'id': '0eb7df70-f319-4562-ab2a-9e641e978b3b', 'first_name': 'xyx', 'surname': 'dcsdd', 'devices': {'os': 'NOKIA Phone', 'mac_address': '78:28:ca:a8:56:b9', 'manufacturer': 'NOKIA'}, 'lat': 54.33166199876629, 'lng': -6.277842272724769, 'seenTime': 1582814754000},
{'id': 'a0bb8d38-0d27-4d7f-acc0-1e850a706b6c', 'first_name': 'ddwdw', 'surname': 'sdsds', 'devices': {'os': 'MI Phone', 'mac_address': 'dc:08:0f:3f:57:0c', 'manufacturer': 'MI'}, 'lat': 54.33218267030654, 'lng': -6.27796001203896, 'seenTime': 1582814693000}]
and I want output like this from dict_1 variable
{
"f77df8c2-b19d-4341-9021-7beab4b9ebcd":{
"first_name":"anonymous",
"surname":"anonymous",
"lat":57.14913102,
"lng":-2.09987143,
"devices": {'os': 'MI Phone', 'mac_address': 'dc:08:0f:3f:57:0c', 'manufacturer': 'MI'},
"seenTime": 1582814693000
},
"7beab4b9ebcd-b19d-9021-f77df8c2-4341":{
etc.
},
etc.
}
Please help me understand what I should do in this case.
Try this.
dict_1 = {x.pop('id'): x for x in dict_1}
I think this could do the job:
dict_2 = {}
for d in dict_1:
    # pop() removes 'id' from the original dict and returns its value
    key = d.pop('id')
    dict_2[key] = d
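Both answers above call pop(), which removes 'id' from the dictionaries in the original list. The sketch below is a non-mutating variant; the names records and by_id and the shortened ids are made up for illustration. It also shows what happens with repeated ids, which the sample data in the question contains: the later record replaces the earlier one.

```python
# Shortened, hypothetical stand-in for dict_1; the first and third
# records share an id, as they do in the question's data.
records = [
    {"id": "0eb7df70", "first_name": "Rahx", "lat": 54.33},
    {"id": "a0bb8d38", "first_name": "Lucy", "lat": 54.33},
    {"id": "0eb7df70", "first_name": "xyx", "lat": 54.33},
]

# Build a new dict keyed by id, copying every field except "id".
# The original records are left untouched.
by_id = {
    rec["id"]: {k: v for k, v in rec.items() if k != "id"}
    for rec in records
}

print(by_id)
```

If every record has to be kept, mapping each id to a list of records (for example with collections.defaultdict(list)) would be one alternative.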

Regardless of CSS class element input to find_all() function of BeautifulSoup, I receive an empty list as output

import requests
from bs4 import BeautifulSoup
gene_list = {"Ccl2", "CXCR4"}
for seq in gene_list:
    text = requests.get("https://uswest.ensembl.org/Multi/Search/Results?q=" + seq + ";site=ensembl").text
    soup = BeautifulSoup(text, "lxml")
    soup.find_all("div", {"class": "table_result"})
I am trying to search the Ensembl website (https://uswest.ensembl.org/index.html) for genes that we have found in our sequencing data that are un-annotated and then do some subsequent searching and data processing on them.
I just can't get my scraper to return anything but an empty list from my find_all(). I have tried every parser (html.parser, lxml, html5lib), both syntaxes of class ({'class': 'name_of_class'} and class_='name_of_class'), and multiple different CSS class elements. If I just define an HTML5 element like "div" then it returns the expected output. I have no idea why it won't return the specified div/class stated above.
I have tried many different CSS classes; the one in the code above is just the last one I tried.
Lastly here is my session info:
-----
bs4 4.9.1
requests 2.24.0
sinfo 0.3.1
-----
Python 3.7.3 (default, Apr 24 2020, 18:51:23) [Clang 11.0.3 (clang-1103.0.32.62)]
Darwin-19.5.0-x86_64-i386-64bit
8 logical CPU cores, i386
-----
Session information updated at 2020-07-13 16:18
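After going through the network tab of the browser's developer tools, I finally found how the website works. The page makes four different API calls behind the scenes and builds the results from their responses, which would explain why the class you are looking for is not in the HTML that requests receives. The calls I captured are as follows: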
https://asia.ensembl.org/Multi/Ajax/search?q=name%3A%22CXCR4%22&rows=200&fq=feature_type%3AGene+AND+database_type%3Acore&facet.field=species&facet.mincount=1&facet=true
https://asia.ensembl.org/Multi/Ajax/search?q=(+NOT+species%3Axxx+)+AND+(+CXCR4+)+AND+(+NOT+species%3Ayyy+)&fq=&rows=1&facet.field=species&facet.field=feature_type&facet.field=strain&facet.mincount=1&facet=true&facet.limit=-1
https://asia.ensembl.org/Multi/Ajax/search?q=(+CXCR4%5E316+AND+species%3A%22CrossSpecies%22+)+OR+(+CXCR4%5E190+AND+species%3A%22Human%22+)+OR+(+CXCR4%5E80+AND+species%3A%22Mouse%22+)+OR+(+CXCR4+AND+species%3A%22Zebrafish%22+)&fq=(++(++species%3A%22CrossSpecies%22+AND+(+reference_strain%3A1+)++)++OR++(++species%3A%22Human%22+AND+(+reference_strain%3A1+)++)++OR++(++species%3A%22Mouse%22+AND+(+reference_strain%3A1+)++)++OR++(++species%3A%22Zebrafish%22+AND+(+reference_strain%3A1+)++)++)&hl=true&hl.fl=_hr&hl.fl=content&hl.fl=description&hl.fragsize=500&rows=10&start=0
Hence, the results you see are the combined results of these API calls, and they appear on different pages of the website.
import requests
res = requests.get("https://asia.ensembl.org/Multi/Ajax/search?q=(+CXCR4%5E316+AND+species%3A%22CrossSpecies%22+)+OR+(+CXCR4%5E190+AND+species%3A%22Human%22+)+OR+(+CXCR4%5E80+AND+species%3A%22Mouse%22+)+OR+(+CXCR4+AND+species%3A%22Zebrafish%22+)&fq=(++(++species%3A%22CrossSpecies%22+AND+(+reference_strain%3A1+)++)++OR++(++species%3A%22Human%22+AND+(+reference_strain%3A1+)++)++OR++(++species%3A%22Mouse%22+AND+(+reference_strain%3A1+)++)++OR++(++species%3A%22Zebrafish%22+AND+(+reference_strain%3A1+)++)++)&hl=true&hl.fl=_hr&hl.fl=content&hl.fl=description&hl.fragsize=500&rows=10&start=0", verify=False)
result = res.json()
print(result)
Note: I had to pass verify=False in the requests call, otherwise it raised an SSL error (requests.exceptions.SSLError). Keep in mind that verify=False disables certificate verification.
Output:
{'error': '',
'result': {'highlighting': {'1d3be01c-f969-40de-a1f8-bfd5bbf40fc1': {},
'5b2accd3-cfef-4e2a-9d9c-2e70752e4a68': {'_hr': ['<strong><em>Cxcr4</em></strong>-001 (Vega transcript) is an external reference matched to Transcript ENSMUST00000052172']},
'd2f9e02b-f3f3-4823-9e39-3f727a265acb': {'_hr': ['GO:0031723 (GO record; description: <strong><em>CXCR4</em></strong> chemokine receptor binding,) is an external reference matched to Transcript ENST00000291526']},
'b66c389f-ade7-4bc6-bcd6-b7011e7bc10e': {'_hr': ['LRG_51t1 (LRG display in Ensembl transcript record; description: Locus Reference Genomic record for <strong><em>CXCR4</em></strong>) is an external reference matched to Transcript ENST00000409817']},
'dc70ef4d-7627-49d3-bfe8-f7e0c5fde994': {'_hr': ['<strong><em>Cxcr4</em></strong>-002 (Vega transcript) is an external reference matched to Transcript ENSMUST00000142893']},
'e7d394ec-fd37-4cc2-8a5c-81482299c695': {},
'8a02b397-ad39-420e-a4ed-89b709d4a3f5': {},
'2d5880cc-d9f6-4fec-a154-ce9b7ba3c590': {'_hr': ['LRG_51t1 (LRG display in Ensembl transcript record; description: Locus Reference Genomic record for <strong><em>CXCR4</em></strong>) is an external reference matched to Transcript ENST00000241393']},
'7f406926-0470-4c70-b8c7-f2bd8228be08': {'_hr': ['<strong><em>Cxcr4</em></strong>-001 (Vega transcript) is an external reference matched to Transcript ENSMUST00000052172']},
'cf47bc6b-6bd0-4690-a0ac-8feed5a5a112': {'_hr': ['LRG_51 (LRG display in Ensembl gene record; description: Locus Reference Genomic record for <strong><em>CXCR4</em></strong>,) is an external reference matched to Gene ENSG00000121966']}},
'responseHeader': {'QTime': 37,
'params': {'fq': '( ( species:"CrossSpecies" AND ( reference_strain:1 ) ) OR ( species:"Human" AND ( reference_strain:1 ) ) OR ( species:"Mouse" AND ( reference_strain:1 ) ) OR ( species:"Zebrafish" AND ( reference_strain:1 ) ) )',
'hl.fragsize': '500',
'hl.fl': ['_hr', 'content', 'description'],
'q': '( CXCR4^316 AND species:"CrossSpecies" ) OR ( CXCR4^190 AND species:"Human" ) OR ( CXCR4^80 AND species:"Mouse" ) OR ( CXCR4 AND species:"Zebrafish" )',
'hl': 'true',
'wt': 'json',
'start': ['0', '0'],
'rows': '10'},
'status': 0},
'response': {'numFound': 24,
'docs': [{'domain_url': 'homo_sapiens/Gene/Summary?g=ENSG00000121966&db=core',
'name': 'CXCR4',
'species': 'Human',
'ref_boost': 10,
'location': '2:136114349-136118149:-1',
'quick_links': ['orthologues:1'],
'db_boost': 40,
'website': 'http://www.ensembl.org',
'reference_strain': 1,
'id': 'ENSG00000121966',
'domain': 'http://www.ensembl.org',
'uid': 'cf47bc6b-6bd0-4690-a0ac-8feed5a5a112',
'feature_type': 'Gene',
'description': 'C-X-C motif chemokine receptor 4 [Source:HGNC Symbol;Acc:HGNC:2561]',
'score': 3.3581953,
'database_type': 'core'},
{'feature_type': 'Transcript',
'score': 2.238805,
'database_type': 'core',
'description': 'C-X-C motif chemokine receptor 4 [Source:HGNC Symbol;Acc:HGNC:2561]',
'reference_strain': 1,
'website': 'http://www.ensembl.org',
'db_boost': 40,
'uid': '2d5880cc-d9f6-4fec-a154-ce9b7ba3c590',
'domain': 'http://www.ensembl.org',
'id': 'ENST00000241393',
'name': 'CXCR4-201',
'location': '2:136114349-136118149:-1',
'quick_links': ['protein:1'],
'ref_boost': 10,
'species': 'Human',
'domain_url': 'homo_sapiens/Transcript/Summary?t=ENST00000241393&db=core'},
{'feature_type': 'Transcript',
'description': 'C-X-C motif chemokine receptor 4 [Source:HGNC Symbol;Acc:HGNC:2561]',
'database_type': 'core',
'score': 2.238805,
'website': 'http://www.ensembl.org',
'db_boost': 40,
'reference_strain': 1,
'domain': 'http://www.ensembl.org',
'id': 'ENST00000409817',
'uid': 'b66c389f-ade7-4bc6-bcd6-b7011e7bc10e',
'name': 'CXCR4-202',
'location': '2:136114349-136116243:-1',
'quick_links': ['protein:1'],
'species': 'Human',
'ref_boost': 10,
'domain_url': 'homo_sapiens/Transcript/Summary?t=ENST00000409817&db=core'},
{'name': 'CXCR4-203',
'quick_links': ['protein:0'],
'location': '2:136114637-136117737:-1',
'species': 'Human',
'ref_boost': 10,
'domain_url': 'homo_sapiens/Transcript/Summary?t=ENST00000466288&db=core',
'feature_type': 'Transcript',
'description': 'C-X-C motif chemokine receptor 4 [Source:HGNC Symbol;Acc:HGNC:2561]',
'database_type': 'core',
'score': 2.238805,
'website': 'http://www.ensembl.org',
'db_boost': 40,
'reference_strain': 1,
'domain': 'http://www.ensembl.org',
'id': 'ENST00000466288',
'uid': '1d3be01c-f969-40de-a1f8-bfd5bbf40fc1'},
{'domain_url': 'mus_musculus/Gene/Summary?g=ENSMUSG00000045382&db=core',
'strain': 'Mouse reference (CL57BL6)',
'name': 'Cxcr4',
'ref_boost': 10,
'species': 'Mouse',
'quick_links': ['orthologues:1'],
'location': '1:128588199-128592293:-1',
'db_boost': 40,
'website': 'http://www.ensembl.org',
'reference_strain': 1,
'id': 'ENSMUSG00000045382',
'domain': 'http://www.ensembl.org',
'uid': '5b2accd3-cfef-4e2a-9d9c-2e70752e4a68',
'feature_type': 'Gene',
'description': 'chemokine (C-X-C motif) receptor 4 [Source:MGI Symbol;Acc:MGI:109563]',
'score': 1.4139885,
'database_type': 'core'},
{'location': '1:128588199-128592290:-1',
'quick_links': ['protein:1'],
'species': 'Mouse',
'ref_boost': 10,
'name': 'Cxcr4-201',
'strain': 'Mouse reference (CL57BL6)',
'domain_url': 'mus_musculus/Transcript/Summary?t=ENSMUST00000052172&db=core',
'score': 0.9426663,
'database_type': 'core',
'description': 'chemokine (C-X-C motif) receptor 4 [Source:MGI Symbol;Acc:MGI:109563]',
'feature_type': 'Transcript',
'uid': '7f406926-0470-4c70-b8c7-f2bd8228be08',
'domain': 'http://www.ensembl.org',
'id': 'ENSMUST00000052172',
'reference_strain': 1,
'website': 'http://www.ensembl.org',
'db_boost': 40},
{'reference_strain': 1,
'website': 'http://www.ensembl.org',
'db_boost': 40,
'uid': 'dc70ef4d-7627-49d3-bfe8-f7e0c5fde994',
'domain': 'http://www.ensembl.org',
'id': 'ENSMUST00000142893',
'feature_type': 'Transcript',
'score': 0.9426663,
'database_type': 'core',
'description': 'chemokine (C-X-C motif) receptor 4 [Source:MGI Symbol;Acc:MGI:109563]',
'domain_url': 'mus_musculus/Transcript/Summary?t=ENSMUST00000142893&db=core',
'strain': 'Mouse reference (CL57BL6)',
'name': 'Cxcr4-202',
'location': '1:128589099-128592293:-1',
'quick_links': ['protein:1'],
'species': 'Mouse',
'ref_boost': 10},
{'reference_strain': 1,
'website': 'http://www.ensembl.org',
'uid': 'e7d394ec-fd37-4cc2-8a5c-81482299c695',
'id': 'Cxcr4',
'domain': 'http://www.ensembl.org',
'feature_type': 'Marker',
'database_type': 'core',
'score': 0.01179975,
'domain_url': 'mus_musculus/Marker/Details?m=Cxcr4',
'strain': 'Mouse reference (CL57BL6)',
'species': 'Mouse'},
{'domain_url': 'homo_sapiens/Gene/Summary?g=ENSG00000160181&db=core',
'ref_boost': 10,
'species': 'Human',
'quick_links': ['orthologues:1'],
'location': '21:42346357-42350997:-1',
'name': 'TFF2',
'id': 'ENSG00000160181',
'domain': 'http://www.ensembl.org',
'uid': 'd2f9e02b-f3f3-4823-9e39-3f727a265acb',
'db_boost': 40,
'website': 'http://www.ensembl.org',
'reference_strain': 1,
'description': 'trefoil factor 2 [Source:HGNC Symbol;Acc:HGNC:11756]',
'database_type': 'core',
'score': 0.0072125974,
'feature_type': 'Gene'},
{'feature_type': 'Protein Family',
'description': 'Ensembl protein family PTHR24227 [C C CHEMOKINE RECEPTOR TYPE C C CKR CC CKR CCR ANTIGEN]: 27 genes / 77 proteins in homo sapiens',
'score': 0.004816582,
'database_type': 'core',
'website': 'http://www.ensembl.org',
'reference_strain': 1,
'domain': 'http://www.ensembl.org',
'id': 'PTHR24227',
'uid': '8a02b397-ad39-420e-a4ed-89b709d4a3f5',
'name': 'PTHR24227',
'species': 'Human',
'domain_url': 'homo_sapiens/Gene/Family?family=PTHR24227;g=ENSG00000163464'}],
'start': 0,
'maxScore': 3.3581953}}}
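Once the JSON is available, the search hits sit under result -> response -> docs. The following is a small sketch of pulling out just the gene entries; it uses a trimmed stand-in for the output above rather than a live request, and the variable names docs and genes are illustrative.

```python
# Trimmed stand-in for the dictionary returned by res.json() above.
result = {
    "error": "",
    "result": {
        "response": {
            "numFound": 24,
            "docs": [
                {"id": "ENSG00000121966", "name": "CXCR4",
                 "species": "Human", "feature_type": "Gene"},
                {"id": "ENSMUSG00000045382", "name": "Cxcr4",
                 "species": "Mouse", "feature_type": "Gene"},
                # Marker entries in the output above carry no "name" key.
                {"id": "Cxcr4", "species": "Mouse", "feature_type": "Marker"},
            ],
        }
    },
}

docs = result["result"]["response"]["docs"]

# Keep (id, name, species) for hits whose feature_type is "Gene";
# .get() is used because not every doc has every key.
genes = [
    (d["id"], d.get("name"), d.get("species"))
    for d in docs
    if d.get("feature_type") == "Gene"
]

for gene in genes:
    print(gene)
```

Note that numFound (24) is larger than the number of docs returned, because the request asks for rows=10 starting at start=0.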

Getting a KeyError: venues error in FourSquare/Python call

OK, I'm a newbie and I think I'm doing everything I should be, but I am still getting a KeyError: venues. (I also tried using "venue" instead and I am not at my maximum quota for the day at FourSquare)... I am using a Jupyter Notebook to do this
Using this code:
VERSION = '20200418'
RADIUS = 1000
LIMIT = 2
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, RADIUS, LIMIT)
url
results = requests.get(url).json()
I get 2 results (shown at end of this post)
When I try to take those results and put them into a dataframe, I get "KeyError: venues":
# assign relevant part of JSON to venues
venues = results['response']['venues']
# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-29-5acf500bf9ad> in <module>
1 # assign relevant part of JSON to venues
----> 2 venues = results['response']['venues']
3
4 # tranform venues into a dataframe
5 dataframe = json_normalize(venues)
KeyError: 'venues'
I'm not really sure where I am going wrong... This has worked for me with other locations... But then again, like I said, I'm new at this... (I haven't maxed out my queries, and I've tried using "venue" instead)... Thank you
FourSquareResults:
{'meta': {'code': 200, 'requestId': '5ec42de01a4b0a001baa10ff'},
'response': {'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'warning': {'text': "There aren't a lot of results near you. Try something more general, reset your filters, or expand the search area."},
'headerLocation': 'Cranford',
'headerFullLocation': 'Cranford',
'headerLocationGranularity': 'city',
'totalResults': 20,
'suggestedBounds': {'ne': {'lat': 40.67401708586377,
'lng': -74.29300815204098},
'sw': {'lat': 40.65601706786374, 'lng': -74.31669390523408}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '4c13c8d2b7b9c928d127aa37',
'name': 'Cranford Canoe Club',
'location': {'address': '250 Springfield Ave',
'crossStreet': 'Orange Avenue',
'lat': 40.66022488705574,
'lng': -74.3061084180977,
'labeledLatLngs': [{'label': 'display',
'lat': 40.66022488705574,
'lng': -74.3061084180977},
{'label': 'entrance', 'lat': 40.660264, 'lng': -74.306191}],
'distance': 543,
'postalCode': '07016',
'cc': 'US',
'city': 'Cranford',
'state': 'NJ',
'country': 'United States',
'formattedAddress': ['250 Springfield Ave (Orange Avenue)',
'Cranford, NJ 07016',
'United States']},
'categories': [{'id': '4f4528bc4b90abdf24c9de85',
'name': 'Athletics & Sports',
'pluralName': 'Athletics & Sports',
'shortName': 'Athletics & Sports',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/sports_outdoors_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []},
'venuePage': {'id': '60380091'}},
'referralId': 'e-0-4c13c8d2b7b9c928d127aa37-0'},
{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '4d965995e07ea35d07e2bd02',
'name': 'Mizu Sushi',
'location': {'address': '103 Union Ave.',
'lat': 40.65664427772896,
'lng': -74.30343966195308,
'labeledLatLngs': [{'label': 'display',
'lat': 40.65664427772896,
'lng': -74.30343966195308}],
'distance': 939,
'postalCode': '07016',
'cc': 'US',
'city': 'Cranford',
'state': 'NJ',
'country': 'United States',
'formattedAddress': ['103 Union Ave.',
'Cranford, NJ 07016',
'United States']},
'categories': [{'id': '4bf58dd8d48988d1d2941735',
'name': 'Sushi Restaurant',
'pluralName': 'Sushi Restaurants',
'shortName': 'Sushi',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/sushi_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []}},
'referralId': 'e-0-4d965995e07ea35d07e2bd02-1'}]}]}}
Look more closely at the response you're getting: there is no "venues" key in it. The closest thing I see is the "groups" list, whose entries each have an "items" list, and each individual item has a "venue" key.
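A minimal sketch of following that path, assuming pandas 1.0 or newer for pd.json_normalize and using a trimmed copy of the response from the question (most fields are left out for brevity):

```python
import pandas as pd

# Trimmed copy of the response shown in the question.
results = {
    "meta": {"code": 200},
    "response": {
        "totalResults": 20,
        "groups": [{
            "type": "Recommended Places",
            "name": "recommended",
            "items": [
                {"venue": {"id": "4c13c8d2b7b9c928d127aa37",
                           "name": "Cranford Canoe Club",
                           "location": {"lat": 40.66022488705574,
                                        "lng": -74.3061084180977,
                                        "city": "Cranford"}},
                 "referralId": "e-0-4c13c8d2b7b9c928d127aa37-0"},
                {"venue": {"id": "4d965995e07ea35d07e2bd02",
                           "name": "Mizu Sushi",
                           "location": {"lat": 40.65664427772896,
                                        "lng": -74.30343966195308,
                                        "city": "Cranford"}},
                 "referralId": "e-0-4d965995e07ea35d07e2bd02-1"},
            ],
        }],
    },
}

# Collect the "venue" dict of every item in every group.
venues = [
    item["venue"]
    for group in results["response"].get("groups", [])
    for item in group.get("items", [])
]

# Nested dicts such as "location" become dotted columns ("location.lat").
dataframe = pd.json_normalize(venues)
print(dataframe[["name", "location.lat", "location.lng"]])
```

Using .get() with an empty-list default means a response without "groups" gives an empty DataFrame instead of another KeyError.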

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported on json file

{'meta': {'code': 200, 'requestId': '5e7c703bb9a389001b7d1e8c'},
'response': {'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'headerLocation': 'Lagos',
'headerFullLocation': 'Lagos',
'headerLocationGranularity': 'city',
'totalResults': 39,
'suggestedBounds': {'ne': {'lat': 6.655478745000045,
'lng': 3.355524537252914},
'sw': {'lat': 6.565478654999954, 'lng': 3.2650912627470863}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '502806dce4b0f23b021f3b77',
'name': 'KFC',
'location': {'lat': 6.604589745106469,
'lng': 3.3089358809010045,
'labeledLatLngs': [{'label': 'display',
'lat': 6.604589745106469,
'lng': 3.3089358809010045}],
'distance': 672,
'cc': 'NG',
'city': 'Egbeda',
'state': 'Lagos',
'country': 'Nigeria',
'formattedAddress': ['Egbeda', 'Lagos', 'Nigeria']},
'categories': [{'id': '4bf58dd8d48988d16e941735',
'name': 'Fast Food Restaurant',
'pluralName': 'Fast Food Restaurants',
'shortName': 'Fast Food',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/fastfood_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []}},
'referralId': 'e-0-502806dce4b0f23b021f3b77-0'},
That is part of my file, called 'results'.
I then run:
def getCAT(row):
    try:
        categories_list=row['categories']
    except:
        categories_list=row['venue.categories']
    if len(categories_list)==0:
        return None
    else:
        return categories_list[0]['name']
venues=results['response']['groups'][0]['items']
nearby_venues=pd.json_normalize(venues)
filtered_cols=['venue.name', 'venue.catergories', 'venue.location.lat', 'venue.location.lng']
nearby_venues= nearby_venues.loc[: , filtered_cols]
nearby_venues['venue.categories']=nearby_venues.apply(getCAT, axis=1)
nearby_venues.columns=[col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()
I get KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported' on the json file.
If I comment out that part, it runs, but with a limited result. What am I doing wrong?
From the pandas documentation for DataFrame.loc: "Access a group of rows and columns by label(s) or a boolean array."
Try removing the venue. prefix from the line filtered_cols=['venue.name', 'venue.catergories', 'venue.location.lat', 'venue.location.lng']
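It may also be worth checking the spelling of the labels. With data shaped like the excerpt in the question, json_normalize produces a column named 'venue.categories', while the list asks for 'venue.catergories', and .loc raises this KeyError as soon as any requested label is missing. The sketch below uses one trimmed item from the question; the names wanted, missing and cols are illustrative.

```python
import pandas as pd

# One trimmed item from results['response']['groups'][0]['items'].
items = [
    {"reasons": {"count": 0},
     "venue": {"id": "502806dce4b0f23b021f3b77",
               "name": "KFC",
               "location": {"lat": 6.604589745106469,
                            "lng": 3.3089358809010045,
                            "city": "Egbeda"},
               "categories": [{"id": "4bf58dd8d48988d16e941735",
                               "name": "Fast Food Restaurant",
                               "primary": True}]},
     "referralId": "e-0-502806dce4b0f23b021f3b77-0"},
]

all_columns = pd.json_normalize(items)

# Compare the requested labels with the columns that actually exist.
wanted = ["venue.name", "venue.catergories", "venue.location.lat", "venue.location.lng"]
missing = [c for c in wanted if c not in all_columns.columns]
print(missing)  # the misspelled label is the only one reported

# With the spelling corrected, the selection succeeds.
cols = ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]
nearby_venues = all_columns.loc[:, cols].copy()

# Replace each category list with the name of its first entry (None if empty).
nearby_venues["venue.categories"] = nearby_venues["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Keep only the last part of each dotted column name.
nearby_venues.columns = [c.split(".")[-1] for c in nearby_venues.columns]
print(nearby_venues)
```

Because the items are normalized with the 'venue' key still in place, the 'venue.' prefix is part of the column names at the point where .loc is called, so it has to stay in the list until the final renaming step.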

Get value from data-set field sublist

I have a dataset (that pulls its data from a dict) that I am attempting to clean and republish. Within this dataset, there is a field with a sublist (a nested dict) that I would like to extract specific data from.
Here's the data:
[{'id': 'oH58h122Jpv47pqXhL9p_Q', 'alias': 'original-pizza-brooklyn-4', 'name': 'Original Pizza', 'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/HVT0Vr_Vh52R_niODyPzCQ/o.jpg', 'is_closed': False, 'url': 'https://www.yelp.com/biz/original-pizza-brooklyn-4?adjust_creative=IelPnWlrTpzPtN2YRie19A&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=IelPnWlrTpzPtN2YRie19A', 'review_count': 102, 'categories': [{'alias': 'pizza', 'title': 'Pizza'}], 'rating': 4.0, 'coordinates': {'latitude': 40.63781, 'longitude': -73.8963799}, 'transactions': [], 'price': '$', 'location': {'address1': '9514 Ave L', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11236', 'country': 'US', 'state': 'NY', 'display_address': ['9514 Ave L', 'Brooklyn, NY 11236']}, 'phone': '+17185313559', 'display_phone': '(718) 531-3559', 'distance': 319.98144420799355},
Here's how the data is presented within the csv/spreadsheet:
location
{'address1': '9514 Ave L', 'address2': '', 'address3': '', 'city': 'Brooklyn', 'zip_code': '11236', 'country': 'US', 'state': 'NY', 'display_address': ['9514 Ave L', 'Brooklyn, NY 11236']}
Is there a way to pull location.city for example?
The below code simply adds a few fields and exports it to a csv.
def data_set(data):
    df = pd.DataFrame(data)
    df['zip'] = get_zip()
    df['region'] = get_region()
    newdf = df.filter(['name', 'phone', 'location', 'zip', 'region', 'coordinates', 'rating', 'review_count',
                       'categories', 'url'], axis=1)
    if not os.path.isfile('yelp_data.csv'):
        newdf.to_csv('data.csv', header='column_names')
    else: # else it exists so append without writing the header
        newdf.to_csv('data.csv', mode='a', header=False)
If that doesn't make sense, please let me know. Thanks in advance!
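One possible approach, sketched under the assumption that the 'location' values are still Python dicts at the point where the DataFrame is built (that is, before they are written to the CSV). The sample record is trimmed from the data above, and the column name 'city' and the variable flat are illustrative choices.

```python
import pandas as pd

# One trimmed record from the data shown above.
data = [
    {"name": "Original Pizza",
     "phone": "+17185313559",
     "rating": 4.0,
     "coordinates": {"latitude": 40.63781, "longitude": -73.8963799},
     "location": {"address1": "9514 Ave L", "city": "Brooklyn",
                  "zip_code": "11236", "country": "US", "state": "NY"}},
]

# Option 1: keep the "location" column and add one derived column.
df = pd.DataFrame(data)
df["city"] = df["location"].apply(lambda loc: loc.get("city"))
print(df[["name", "city"]])

# Option 2: flatten every nested dict into dotted columns at once.
flat = pd.json_normalize(data)
print(flat[["name", "location.city", "coordinates.latitude"]])
```

Option 1 fits the existing data_set function with a single extra line, while option 2 avoids writing whole dicts into the CSV in the first place.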
