I want to iterate through this dictionary and find any 'id' that has a leading zero, like the one below, and replace it with the zero stripped off. So 'id': '01001' would become 'id': '1001'.
Here is how to get the data I'm working with:
from urllib.request import urlopen
import json
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)
So far I've been able to get one ID at a time, but I'm not sure how to loop through to get all of the IDs.
My code so far: counties['features'][0]['id']
{ 'type': 'FeatureCollection',
'features': [{'type': 'Feature',
'properties': {'GEO_ID': '0500000US01001',
'STATE': '01',
'COUNTY': '001',
'NAME': 'Autauga',
'LSAD': 'County',
'CENSUSAREA': 594.436},
'geometry': {'type': 'Polygon',
'coordinates': [[[-86.496774, 32.344437],
[-86.717897, 32.402814],
[-86.814912, 32.340803],
[-86.890581, 32.502974],
[-86.917595, 32.664169],
[-86.71339, 32.661732],
[-86.714219, 32.705694],
[-86.413116, 32.707386],
[-86.411172, 32.409937],
[-86.496774, 32.344437]]]},
'id': '01001'}
]
}
from urllib.request import urlopen
import json
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)
Then iterate over the list of ids in your JSON structure and update each id as
counties['features'][0]['id'] = counties['features'][0]['id'].lstrip("0")
lstrip will remove leading zeroes from the string.
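Applied to the whole list, a minimal loop (the same lstrip idea, just wrapped around every feature) might look like this:
for feature in counties['features']:
    # strip the leading zeroes from each feature's id in place
    feature['id'] = feature['id'].lstrip('0')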
Suppose your dictionary counties has the following data. You can use the following code:
counties={'type': 'FeatureCollection',
'features': [ {'type': 'Feature','properties': {'GEO_ID': '0500000US01001','STATE': '01','COUNTY': '001','NAME': 'Autauga', 'LSAD': 'County','CENSUSAREA': 594.436},
'geometry': {'type': 'Polygon','coordinates': [[[-86.496774, 32.344437],[-86.717897, 32.402814],[-86.814912, 32.340803],
[-86.890581, 32.502974],
[-86.917595, 32.664169],
[-86.71339, 32.661732],
[-86.714219, 32.705694],
[-86.413116, 32.707386],
[-86.411172, 32.409937],
[-86.496774, 32.344437] ]] } ,'id': '01001'}, {'type': 'Feature','properties': {'GEO_ID': '0500000US01001','STATE': '01','COUNTY': '001','NAME': 'Autauga', 'LSAD': 'County','CENSUSAREA': 594.436},
'geometry': {'type': 'Polygon','coordinates': [[[-86.496774, 32.344437],[-86.717897, 32.402814],[-86.814912, 32.340803],
[-86.890581, 32.502974],
[-86.917595, 32.664169],
[-86.71339, 32.661732],
[-86.714219, 32.705694],
[-86.413116, 32.707386],
[-86.411172, 32.409937],
[-86.496774, 32.344437] ]] } ,'id': '000000000001001'} ]}
for feature in counties['features']:
    feature['id'] = feature['id'].lstrip("0")
print(counties)
Here is a shorter and faster way of doing this using a json object_hook:
def stripZeroes(d):
    if 'id' in d:
        d['id'] = d['id'].lstrip('0')
        return d
    return d
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response, object_hook=stripZeroes)
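A quick check (assuming the imports and download from the question): the first feature's id should now have its leading zero removed.
print(counties['features'][0]['id'])  # expected '1001' rather than '01001'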
Please help: I cannot seem to get the JSON data into a DataFrame.
I loaded the data:
data = json.load(open(r'path'))  # this works fine and displays the json data below
{'type': 'FeatureCollection', 'name': 'Altstadt Nord', 'crs': {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}}, 'features': [{'type': 'Feature', 'properties': {'Name': 'City-Martinsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9595637, 50.9418396], [6.956624, 50.9417382], [6.9543173, 50.941603], [6.9529869, 50.9413664], [6.953062, 50.9408593], [6.9532873, 50.9396289], [6.9533624, 50.9388176], [6.9529333, 50.9378373], [6.9527509, 50.9371815], [6.9528367, 50.9360659], [6.9532122, 50.9352884], [6.9540705, 50.9350653], [6.9553258, 50.9350044], [6.9568815, 50.9351667], [6.9602074, 50.9355047], [6.9608189, 50.9349165], [6.9633939, 50.9348827], [6.9629433, 50.9410622], [6.9616236, 50.9412176], [6.9603898, 50.9414881], [6.9595637, 50.9418396]]]}}, {'type': 'Feature', 'properties': {'Name': 'Gereonsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9629433, 50.9410622], [6.9629433, 50.9431646], [6.9611408, 50.9433539], [6.9601752, 50.9436649], [6.9588234, 50.9443409], [6.9579651, 50.9449763], [6.9573213, 50.945801], [6.9563128, 50.9451926], [6.9551756, 50.9448546], [6.9535663, 50.9446518], [6.9523432, 50.9449763], [6.9494464, 50.9452602], [6.9473435, 50.9454495], [6.9466998, 50.9456928], [6.9458415, 50.946531], [6.9434168, 50.9453954], [6.9424726, 50.9451926], [6.9404342, 50.9429888], [6.9404771, 50.9425156], [6.9403269, 50.9415016], [6.9400479, 50.9405281], [6.9426228, 50.9399872], [6.9439103, 50.9400143], [6.9453051, 50.9404875], [6.9461634, 50.9408931], [6.9467427, 50.941096], [6.9475581, 50.9410013], [6.9504227, 50.9413191], [6.9529869, 50.9413664], [6.9547464, 50.9416368], [6.9595637, 50.9418396], [6.9603898, 50.9414881], [6.9616236, 50.9412176], [6.9629433, 50.9410622]]]}}, {'type': 'Feature', 'properties': {'Name': 'Kunibertsviertel', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9629433, 50.9431646], [6.9637129, 50.9454917], [6.9651506, 50.9479252], [6.9666097, 50.9499124], [6.9667599, 50.9500882], [6.9587777, 50.9502504], [6.9573213, 50.945801], [6.9579651, 50.9449763], [6.9588234, 50.9443409], [6.9601752, 50.9436649], [6.9611408, 50.9433539], [6.9629433, 50.9431646]]]}}, {'type': 'Feature', 'properties': {'Name': 'Nördlich Neumarkt', 'description': None}, 'geometry': {'type': 'Polygon', 'coordinates': [[[6.9390331, 50.9364418], [6.9417153, 50.9358738], [6.9462214, 50.9358062], [6.9490109, 50.9355628], [6.9505129, 50.9353329], [6.9523798, 50.9352924], [6.9532122, 50.9352884], [6.9528367, 50.9360659], [6.9527509, 50.9371815], [6.9529333, 50.9378373], [6.9533624, 50.9388176], [6.9532381, 50.9398222], [6.9529869, 50.9413664], [6.9504227, 50.9413191], [6.9475581, 50.9410013], [6.9467427, 50.941096], [6.9453051, 50.9404875], [6.9439103, 50.9400143], [6.9424663, 50.9399574], [6.9400479, 50.9405281], [6.9390331, 50.9364418]]]}}]}
Now I cannot seem to fit it into a DataFrame:
pd.DataFrame(data) --> ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
I tried to flatten it with json_flatten, but I get ModuleNotFoundError: No module named 'flatten_json' even though I installed json-flatten via pip install.
I also tried df = pd.DataFrame.from_dict(data, orient='index')
df
Out[22]:
0
type FeatureCollection
name Altstadt Nord
crs {'type': 'name', 'properties': {'name': 'urn:o...
features [{'type': 'Feature', 'properties': {'Name': 'C...
I think you can use json_normalize to load them into pandas.
path_to_json.json in this case is your full json file (with double quotes).
import json
from pandas import json_normalize  # on older pandas: from pandas.io.json import json_normalize

with open('path_to_json.json') as f:
    data = json.load(f)
df = json_normalize(data, record_path=['features'], meta=['name'])
print(df)
This results in a DataFrame with one row per feature and columns such as properties.Name and geometry.coordinates.
You can further expand the polygon coordinates into their own rows if you need them for analysis, as in the sketch below.
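One way to do that (a sketch using pandas' explode rather than another record_path, assuming the df built above):
# one row per ring, then one row per [lon, lat] pair
coords = (
    df[['properties.Name', 'geometry.coordinates']]
    .explode('geometry.coordinates')
    .explode('geometry.coordinates')
    .rename(columns={'geometry.coordinates': 'lon_lat'})
)
print(coords.head())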
You can find more documentation at https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html
Hope that helps.
The JSON data contains elements with different datatypes, and these cannot all be loaded into one single DataFrame.
View datatypes in the json:
[type(data[k]) for k in data.keys()]
# Out: [str, str, dict, list]
data.keys()
# Out: dict_keys(['type', 'name', 'crs', 'features'])
You can load each single chunk of data in a separate dataframe like this:
df_crs = pd.DataFrame(data['crs'])
df_features = pd.DataFrame(data['features'])
data['type'] and data['name'] are strings
data['type']
# Out 'FeatureCollection'
data['name']
# Out 'Altstadt Nord'
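If you still want those two strings alongside the features, one option (a sketch; the column names are just illustrative) is to attach them to the df_features built above as constant columns:
df_features['collection_type'] = data['type']
df_features['collection_name'] = data['name']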
I'm trying to seed a database in a Django app. I have a CSV file that I converted to JSON, and now I need to reformat it to match the serialization format Django requires (described in its documentation).
This is what the JSON needs to look like to be acceptable to Django (which looks an awful lot like a list of dictionaries with 3 keys, the third having a value that is itself a dictionary):
[
{
"pk": "4b678b301dfd8a4e0dad910de3ae245b",
"model": "sessions.session",
"fields": {
"expire_date": "2013-01-16T08:16:59.844Z",
...
}
}
]
My json data looks like this after converting it from csv with pandas:
[{'model': 'homepage.territorymanager', 'pk': 1, 'Name': 'Aaron ##', 'Distributor': 'National Energy', 'State': 'BC', 'Brand': 'Trane', 'Cell': '778-###-####', 'email address': None, 'Notes': None, 'Unnamed: 9': None}, {'model': 'homepage.territorymanager', 'pk': 2, 'Name': 'Aaron Martin ', 'Distributor': 'Pierce ###', 'State': 'PA', 'Brand': 'Bryant/Carrier', 'Cell': '267-###-####', 'email address': None, 'Notes': None, 'Unnamed: 9': None},...]
I am using this function to try to reformat it:
def re_serialize_reg_json(d, jsonFilePath):
    for i in d:
        d2 = {'Name': d[i]['Name'], 'Distributor': d[i]['Distributor'], 'State': d[i]['State'], 'Brand': d[i]['Brand'], 'Cell': d[i]['Cell'], 'EmailAddress': d[i]['email address'], 'Notes': d[i]['Notes']}
        d[i] = {'pk': d[i]['pk'], 'model': d[i]['model'], 'fields': d2}
    print(d)
and it returns this error, which doesn't make sense to me because the format Django requires has a dictionary as the value of the third key:
d2 = {'Name': d[i]['Name'], 'Distributor' : d[i]['Distributor'], 'State' : d[i]['State'], 'Brand' : d[i]['Brand'], 'Cell' : d[i]['Cell'], 'EmailAddress' : d[i]['email address'], 'Notes' : d[i]['Notes']}
TypeError: list indices must be integers or slices, not dict
Any help appreciated!
Here is what I did to get d:
import json
import pandas
df = pandas.read_csv('/Users/justinbenfit/territorymanagerpython/territory managers - Sheet1.csv')
df.to_json('/Users/justinbenfit/territorymanagerpython/territorymanagers.json', orient='records')
jsonFilePath = '/Users/justinbenfit/territorymanagerpython/territorymanagers.json'
def load_file(file_path):
    with open(file_path) as f:
        d = json.load(f)
    return d
d = load_file(jsonFilePath)
print(d)
d is actually a list containing multiple dictionaries, so in for i in d each i is already a dictionary rather than an index, which is why indexing with d[i] raises that TypeError. To make it work, change the for i in d part to for i in range(len(d)).
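A sketch of the corrected function under that suggestion (same field mapping as in the question; writing the result back out to jsonFilePath is left as is):
def re_serialize_reg_json(d, jsonFilePath):
    # d is a list, so iterate over positions and index into it
    for i in range(len(d)):
        d2 = {'Name': d[i]['Name'], 'Distributor': d[i]['Distributor'],
              'State': d[i]['State'], 'Brand': d[i]['Brand'],
              'Cell': d[i]['Cell'], 'EmailAddress': d[i]['email address'],
              'Notes': d[i]['Notes']}
        d[i] = {'pk': d[i]['pk'], 'model': d[i]['model'], 'fields': d2}
    print(d)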
I require a little help. I need to flatten this JSON so that I can use it for analysis.
A sample of the JSON is:
{'data': [{'tag': 'U128_CRC_2', 'timestamp': 1575234889002, 'value': 0.0}],
'metadata': {'event': 'alarm.reset',
'idx': '1372',
'timestamp': 1575234889.002701},
'productID': '41ae4b41-56be-4bf2-a6a8-7aee4d15bf54',
'timestamp': 1575234889008,
'topicIdx': '1'}
I ran the following code:
import json
from pandas import json_normalize

with open('NewJson.json') as f:
    d1 = json.load(f)
works_data = json_normalize(data=d1, record_path='data',
                            meta=['tag', 'value', 'timestamp'])
I get the below error for the same:
KeyError: "Try running with errors='ignore' as key 'tag' is not always present"
Can anybody please help?
Your problem is that the 'data' key holds a list wrapping a single dictionary, and the fields you asked for ('tag', 'value', 'timestamp') live inside that inner dictionary rather than at the top level. You have to unwrap the list manually:
Example:
from pandas.io.json import json_normalize
d1 = {'data': [{'tag': 'U128_CRC_2', 'timestamp': 1575234889002, 'value': 0.0}],
'metadata': {'event': 'alarm.reset',
'idx': '1372',
'timestamp': 1575234889.002701},
'productID': '41ae4b41-56be-4bf2-a6a8-7aee4d15bf54',
'timestamp': 1575234889008,
'topicIdx': '1'}
d1['data'] = d1.get('data')[0]
works_data = json_normalize(data=d1)
works_data
Output:
productID timestamp topicIdx data.tag data.timestamp data.value metadata.event metadata.idx metadata.timestamp
0 41ae4b41-56be-4bf2-a6a8-7aee4d15bf54 1575234889008 1 U128_CRC_2 1575234889002 0.0 alarm.reset 1372 1.575235e+09
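Another option (a sketch using the original d1, before the 'data' list is unwrapped) is to spell out record_path and meta; record_prefix keeps the inner timestamp from clashing with the top-level one:
works_data = json_normalize(
    d1,
    record_path='data',
    meta=['productID', 'timestamp', 'topicIdx',
          ['metadata', 'event'], ['metadata', 'idx'], ['metadata', 'timestamp']],
    record_prefix='data.',
)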
I seem to be stuck on a very simple task. I'm still dipping my toes into Python.
I'm trying to download Sentinel-2 images with the Sentinel Hub API.
The data that my code returns looks like this:
{'geometry': {'coordinates': [[[[35.895906644, 31.602691754],
[36.264307655, 31.593801516],
[36.230618703, 30.604681346],
[35.642363693, 30.617971909],
[35.678587829, 30.757888786],
[35.715700562, 30.905919341],
[35.754290061, 31.053632806],
[35.793289298, 31.206946419],
[35.895906644, 31.602691754]]]],
'type': 'MultiPolygon'},
'id': 'ee923fac-0097-58a8-b861-b07d89b99310',
'properties': {'productType': 'S2MSI1C',
'centroid': {'coordinates': [18.1321538275, 31.10368655], 'type': 'Point'},
'cloudCover': 10.68,
'collection': 'Sentinel2',
'completionDate': '2017-06-07T08:15:54Z',
'description': None,
'instrument': 'MSI',
'keywords': [],
'license': {'description': {'shortName': 'No license'},
'grantedCountries': None,
'grantedFlags': None,
'grantedOrganizationCountries': None,
'hasToBeSigned': 'never',
'licenseId': 'unlicensed',
'signatureQuota': -1,
'viewService': 'public'},
'links': [{'href': 'http://opensearch.sentinel-hub.com/resto/collections/Sentinel2/ee923fac-0097-58a8-b861-b07d89b99310.json?&lang=en',
'rel': 'self',
'title': 'GeoJSON link for ee923fac-0097-58a8-b861-b07d89b99310',
'type': 'application/json'}],
'orbitNumber': 10228,
'organisationName': None,
'parentIdentifier': None,
'platform': 'Sentinel-2',
'processingLevel': '1C',
'productIdentifier': 'S2A_OPER_MSI_L1C_TL_SGS__20170607T120016_A010228_T36RYV_N02.05',
'published': '2017-07-26T13:09:17.405352Z',
'quicklook': None,
'resolution': 10,
's3Path': 'tiles/36/R/YV/2017/6/7/0',
's3URI': 's3://sentinel-s2-l1c/tiles/36/R/YV/2017/6/7/0/',
'sensorMode': None,
'services': {'download': {'mimeType': 'text/html',
'url': 'http://sentinel-s2-l1c.s3-website.eu-central-1.amazonaws.com#tiles/36/R/YV/2017/6/7/0/'}},
'sgsId': 2168915,
'snowCover': 0,
'spacecraft': 'S2A',
'startDate': '2017-06-07T08:15:54Z',
'thumbnail': None,
'title': 'S2A_OPER_MSI_L1C_TL_SGS__20170607T120016_A010228_T36RYV_N02.05',
'updated': '2017-07-26T13:09:17.405352Z'},
'type': 'Feature'}
Can you explain how I can iterate through this data and extract only 'productType'? For example, if there are several similar data sets, it would return only the distinct product types.
My code is:
import matplotlib.pyplot as plt
import numpy as np
import sentinelhub  # needed for sentinelhub.opensearch below
from sentinelhub import AwsProductRequest, AwsTileRequest, AwsTile, BBox, CRS

betsiboka_coords_wgs84 = [31.245117, 33.897777, 34.936523, 36.129002]
bbox = BBox(bbox=betsiboka_coords_wgs84, crs=CRS.WGS84)
date = ('2017-06-05', '2017-06-08')
data = sentinelhub.opensearch.get_area_info(bbox, date_interval=date, maxcc=None)
for i in data:
    print(i)
Based on what you have provided, replace your bottom for loop:
for i in data:
    print(i)
with the following:
for i in data:
    print(i['properties']['productType'])
If you want to access only the productType you can use i['properties']['productType'] in your for loop. If you want to access it any time without writing out those keys each time, you can define a generator like this:
def property_types(data_array):
    for data in data_array:
        yield data['properties']['productType']
So you can use it like this in a loop (your data_array is data, as returned by the sentinelhub API):
for property_type in property_types(data):
    # do stuff with property_type
# collect the distinct productType values found in a single result dict d
keys = []
for key in d.keys():
    if key == 'properties':
        for k in d[key].keys():
            if k == 'productType' and d[key][k] not in keys:
                keys.append(d[key][k])
print(keys)
Getting only specific (nested) values: since the key you want is nested inside the parent 'properties' object, you need to access that first, preferably using the get method. This can be done as follows (note the {} default in the first get, which returns an empty dictionary if the 'properties' key is not present):
data_dictionary = json.loads(data_string)  # or start from the dict you already have
product_type = data_dictionary.get('properties', {}).get('productType')
You can then aggregate the different product_type values in a set, which automatically guarantees that no two entries are the same:
product_type_set = set()
product_type_set.add(product_type)
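Putting it together for several results, a minimal sketch that assumes data is the list returned by get_area_info in the question:
product_type_set = set()
for feature in data:
    # each element is a GeoJSON-like dict; missing keys are skipped
    product_type = feature.get('properties', {}).get('productType')
    if product_type is not None:
        product_type_set.add(product_type)
print(product_type_set)  # e.g. {'S2MSI1C'}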
Python 5.6
Here is the result from a call using the geocoder module
import geocoder
anaddress = 'State Street, Hood River, OR'
g = geocoder.arcgis(anaddress)
d = g.geojson
print(d)
{'geometry': {'type': 'Point', 'coordinates': [-121.52181774656506, 45.707876183969184]},
 'type': 'Feature',
 'properties': {'provider': 'arcgis', 'ok': True, 'location': '1037 State St, Hood River, OR',
                'lat': 45.707876183969184, 'lng': -121.52181774656506,
                'bbox': [-121.52281774656507, 45.706876183969186, -121.52081774656506, 45.70887618396918],
                'encoding': 'utf-8', 'status': 'OK',
                'address': '1037 State St, Hood River, Oregon, 97031',
                'status_code': 200, 'confidence': 9},
 'bbox': [-121.52281774656507, 45.706876183969186, -121.52081774656506, 45.70887618396918]}
How can I iterate through this structure and print it out nicely?
Is your goal only to print the structure, or to parse it as well?
In case you just want to print your output nicely, try this:
from pprint import pprint
pprint(d)
This shall provide you with a nicely printed structure.
In order to parse it, you can treat it like any dictionary object and work with its keys and values, for example as sketched below.
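A minimal sketch, assuming d is the geojson dict returned by the geocoder call above:
for key, value in d['properties'].items():
    print(f"{key:>12}: {value}")
This prints each property on its own line with the key names right-aligned.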