Converting to a list of dictionaries - Python

I have a text file filled with place data provided by the Twitter API. Here is a sample of two lines:
{'country': 'United Kingdom', 'full_name': 'Dorridge, England', 'id': '31fe56e2e7d5792a', 'country_code': 'GB', 'name': 'Dorridge', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json'}
{'country': 'India', 'full_name': 'New Delhi, India', 'id': '317fcc4b21a604d5', 'country_code': 'IN', 'name': 'New Delhi', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[76.84252, 28.397657], [77.347652, 28.397657], [77.347652, 28.879322], [76.84252, 28.879322]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/317fcc4b21a604d5.json'}
I want the 'country', 'name' and 'coordinates' fields of each line. In order to do this I need to iterate over the entire file line by line, so I append each line to a list:
data = []
with open('place.txt', 'r') as f:
    for line in f:
        data.append(line)
When I checked the data type, it shows as 'str' instead of 'dict':
>>> type(data[0])
str
>>> data[0].keys()
AttributeError: 'str' object has no attribute 'keys'
How can I fix this so that the data is saved as a list of dictionaries?
Originally, the tweets were encoded and decoded with the following code:
f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n') #encoded and saved to a .txt file
tweets.append(jsonpickle.decode(line)) # decoding
And the place data file is saved with the following code:
fName = "place.txt"
newLine = "\n"
with open(fName, 'a', encoding='utf-8') as f:
for i in range(len(tweets)):
f.write('{}'.format(tweets[i]['place']) +'\n')

In your case you should use json to do the data parsing. But if you have a problem with json (which is almost impossible, since we are talking about an API), then in general you can convert from string to dictionary like this:
>>> import ast
>>> x = "{'country': 'United Kingdom', 'full_name': 'Dorridge, England', 'id': '31fe56e2e7d5792a', 'country_code': 'GB', 'name': 'Dorridge', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json'}
"
>>> d = ast.literal_eval(x)
>>> d
d is now a dictionary instead of a string.
But again, if your data is in JSON format, Python has a built-in json library to handle it, and it is better and safer to use json than ast.
For example, if you get a response, let's say resp, you could simply do:
response = json.loads(resp)
and now you could parse response as a dictionary.
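Applied to the file from the question, a minimal sketch (assuming each line of place.txt holds one str()-formatted place dict, as written by the code above) could look like this:
import ast

places = []
with open('place.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line:
            places.append(ast.literal_eval(line))  # parse the dict repr back into a dict

for place in places:
    print(place['country'], place['name'], place['bounding_box']['coordinates'])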

Note: Single quotes are not valid JSON.
I have never tried the Twitter API, but it looks like your data is not valid JSON. Here is a simple preprocessing step that replaces ' (single quote) with " (double quote); note that this naive replacement will break if any value itself contains an apostrophe:
data = "{'country': 'United Kingdom', ... }"
json_data = data.replace('\'', '\"')
dict_data = json.loads(json_data)
dict_data.keys()
# [u'full_name', u'url', u'country', ... ]

You should use the Python json library for parsing and getting the values.
In Python it's quite easy:
import json
x = '{"country": "United Kingdom", "full_name": "Dorridge, England", "id": "31fe56e2e7d5792a", "country_code": "GB", "name": "Dorridg", "attributes": {}, "contained_within": [], "place_type": "city", "bounding_box": {"coordinates": [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], "type": "Polygon"}, "url": "https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json"}'
y = json.loads(x)
print(y["country"],y["name"],y["bounding_box"]["coordinates"])

You can collect the keys into a list like this (ndata being the parsed dictionary):
mlist = list()
for i in ndata.keys():
    mlist.append(i)
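A shorter equivalent, assuming ndata is the parsed dictionary from json.loads or ast.literal_eval, is:
mlist = list(ndata.keys())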

Related

How can I get attribute values of a JSON using JMESPath

I need to get the value of _ISSUE_CURRENCY.
I have a JSON which is as below:
{
  '#value': 'VR-GROUP PLC',
  '_ISSUE_CURRENCY': 'EUR',
  '_PRICING_MULTIPLIER': 1,
  '_TYPE': 'Debt',
  '_SETTLEMENT_CALENDAR_ID': 'Tgt',
  '_SUBTYPE': 'Bond',
  '_IS_UNIT_TRADED': 'N',
  '_ISSUE_STATUS': 'Active',
  '_OWNERSHIP_TYPE': 'Unknown',
  '_ISSUE_METHOD': 'Unknown',
  '_DENOMINATION_CURRENCY': 'EUR'
}
My code so far:
f_asset = open(f"{tempdir}\\cdwassets_all.csv").read().replace("\n", "")
json_obj_asset = json.loads(f_asset, strict=False)
try:
    issue_cur = jmespath.search("validatedAsset.assetName", doc)
except:
    issue_cur = ''
# currency.append(issue_cur)
print(issue_cur)
# output:
{'#value': 'VR-GROUP PLC', '_ISSUE_CURRENCY': 'EUR', '_PRICING_MULTIPLIER': 1, '_TYPE': 'Debt', '_SETTLEMENT_CALENDAR_ID': 'Tgt', '_SUBTYPE': 'Bond', '_IS_UNIT_TRADED': 'N', '_ISSUE_STATUS': 'Active', '_OWNERSHIP_TYPE': 'Unknown', '_ISSUE_METHOD': 'Unknown', '_DENOMINATION_CURRENCY': 'EUR'}
I tried to do it this way, but without success.
issue_cur = jmespath.search("validatedAsset.assetName", doc)["_ISSUE_CURRENCY"]
print(issue_cur)
# output
{'#value': 'VR-GROUP PLC', '_ISSUE_CURRENCY': 'EUR', '_PRICING_MULTIPLIER': 1, '_TYPE': 'Debt', '_SETTLEMENT_CALENDAR_ID': 'Tgt', '_SUBTYPE': 'Bond', '_IS_UNIT_TRADED': 'N', '_ISSUE_STATUS': 'Active', '_OWNERSHIP_TYPE': 'Unknown', '_ISSUE_METHOD': 'Unknown', '_DENOMINATION_CURRENCY': 'EUR'}
I need to verify that _ISSUE_CURRENCY attribute exists.
You are stating:
I have a JSON which is as below
This is not JSON. As described in RFC 7159, which defines valid JavaScript Object Notation (JSON), the quotation mark that delimits strings is the character %x22, i.e. a double quote (").
The json.loads call in your code should actually already raise an error for this; the script:
import json
json_data = """
{
  'validatedAsset': {
    'assetName': {
      '#value': 'VR-GROUP PLC',
      '_ISSUE_CURRENCY': 'EUR',
      '_PRICING_MULTIPLIER': 1,
      '_TYPE': 'Debt',
      '_SETTLEMENT_CALENDAR_ID': 'Tgt',
      '_SUBTYPE': 'Bond',
      '_IS_UNIT_TRADED': 'N',
      '_ISSUE_STATUS': 'Active',
      '_OWNERSHIP_TYPE': 'Unknown',
      '_ISSUE_METHOD': 'Unknown',
      '_DENOMINATION_CURRENCY': 'EUR'
    }
  }
}
"""
json.loads(json_data, strict=False)
Would raise:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 3 column 3 (char 5)
So, I am going to assume that what you are calling valid JSON is actually the printed result of json.loads, which is then a valid Python dictionary.
You then just have to modify your JMESPath query to get the _ISSUE_CURRENCY, making it this query:
validatedAsset.assetName._ISSUE_CURRENCY
Which would give you EUR as a result.
Given the code
import json
import jmespath
json_data = """
{
"validatedAsset": {
"assetName": {
"#value": "VR-GROUP PLC",
"_ISSUE_CURRENCY": "EUR",
"_PRICING_MULTIPLIER": 1,
"_TYPE": "Debt",
"_SETTLEMENT_CALENDAR_ID": "Tgt",
"_SUBTYPE": "Bond",
"_IS_UNIT_TRADED": "N",
"_ISSUE_STATUS": "Active",
"_OWNERSHIP_TYPE": "Unknown",
"_ISSUE_METHOD": "Unknown",
"_DENOMINATION_CURRENCY": "EUR"
}
}
}
"""
print(jmespath.search(
"validatedAsset.assetName._ISSUE_CURRENCY",
json.loads(json_data, strict=False)
))
This yields:
EUR
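Since the question also asks to verify that the _ISSUE_CURRENCY attribute exists: jmespath.search returns None when a path does not resolve, so a minimal check (a sketch reusing json_data from above) could be:
value = jmespath.search("validatedAsset.assetName._ISSUE_CURRENCY",
                        json.loads(json_data, strict=False))
if value is not None:
    issue_cur = value   # 'EUR'
else:
    issue_cur = ''      # attribute is missing
print(issue_cur)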

Trying to follow Django docs to create serialized JSON

I am trying to seed a database in a Django app. I have a CSV file that I converted to JSON, and now I need to reformat it to match the serialization format Django requires, found here.
This is what the JSON needs to look like to be acceptable to Django (which looks an awful lot like a list of dictionaries, each with 3 keys, the third having a value which is a dictionary itself):
[
  {
    "pk": "4b678b301dfd8a4e0dad910de3ae245b",
    "model": "sessions.session",
    "fields": {
      "expire_date": "2013-01-16T08:16:59.844Z",
      ...
    }
  }
]
My json data looks like this after converting it from csv with pandas:
[{'model': 'homepage.territorymanager', 'pk': 1, 'Name': 'Aaron ##', 'Distributor': 'National Energy', 'State': 'BC', 'Brand': 'Trane', 'Cell': '778-###-####', 'email address': None, 'Notes': None, 'Unnamed: 9': None}, {'model': 'homepage.territorymanager', 'pk': 2, 'Name': 'Aaron Martin ', 'Distributor': 'Pierce ###', 'State': 'PA', 'Brand': 'Bryant/Carrier', 'Cell': '267-###-####', 'email address': None, 'Notes': None, 'Unnamed: 9': None},...]
I am using this function to try to reformat it:
def re_serialize_reg_json(d, jsonFilePath):
    for i in d:
        d2 = {'Name': d[i]['Name'], 'Distributor' : d[i]['Distributor'], 'State' : d[i]['State'], 'Brand' : d[i]['Brand'], 'Cell' : d[i]['Cell'], 'EmailAddress' : d[i]['email address'], 'Notes' : d[i]['Notes']}
        d[i] = {'pk': d[i]['pk'],'model' : d[i]['model'], 'fields' : d2}
    print(d)
and it returns this error, which doesn't make sense to me because the format that Django requires has a dictionary as the value of the third key:
d2 = {'Name': d[i]['Name'], 'Distributor' : d[i]['Distributor'], 'State' : d[i]['State'], 'Brand' : d[i]['Brand'], 'Cell' : d[i]['Cell'], 'EmailAddress' : d[i]['email address'], 'Notes' : d[i]['Notes']}
TypeError: list indices must be integers or slices, not dict
Any help appreciated!
Here is what I did to get d:
df = pandas.read_csv('/Users/justinbenfit/territorymanagerpython/territory managers - Sheet1.csv')
df.to_json('/Users/justinbenfit/territorymanagerpython/territorymanagers.json', orient='records')
jsonFilePath = '/Users/justinbenfit/territorymanagerpython/territorymanagers.json'

def load_file(file_path):
    with open(file_path) as f:
        d = json.load(f)
    return d

d = load_file(jsonFilePath)
print(d)
d is actually a list containing multiple dictionaries, so in order to make it work you want to change the for i in d part to for i in range(len(d)).
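For illustration, a corrected version of the question's function under that change (a sketch that keeps the original field names) might look like:
def re_serialize_reg_json(d, jsonFilePath):
    for i in range(len(d)):  # iterate over indices, since d is a list of dicts
        d2 = {'Name': d[i]['Name'], 'Distributor': d[i]['Distributor'],
              'State': d[i]['State'], 'Brand': d[i]['Brand'],
              'Cell': d[i]['Cell'], 'EmailAddress': d[i]['email address'],
              'Notes': d[i]['Notes']}
        d[i] = {'pk': d[i]['pk'], 'model': d[i]['model'], 'fields': d2}
    print(d)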

Unable to load dicts strings into JSON

import requests
import re
import json
def parser(code):
    params = {
        'template': 'professional',
        'level': 'search',
        'search': code
    }
    r = requests.get("https://maps.locations.husqvarna.com/api/getAsyncLocations",
                     params=params).json()
    goal = re.search(r'({.+})', r['maplist'], re.M | re.DOTALL).group(1)
    print(goal)

parser("35801")
The code returns a string of dicts which is not wrapped. I tried dump/loads and wrapped it within [ ], but for some weird reason it's still a string.
You need to convert goal into a list manually, to receive Python objects:
import requests
import re
import json
def parser(code):
    params = {
        'template': 'professional',
        'level': 'search',
        'search': code
    }
    r = requests.get("https://maps.locations.husqvarna.com/api/getAsyncLocations",
                     params=params).json()
    goal = re.search(r'({.+})', r['maplist'], re.M | re.DOTALL).group(1)
    jsonList = '[%s]' % goal  # Make a proper JSON list!
    items = json.loads(jsonList)
    for item in items:
        print(item)

parser("35801")
Out:
{'fid': 'USF221344-2115METROCIRCLE', 'lid': '56063', 'lat': '34.7004049', 'lng': '-86.5924508', 'url': 'https://locations.husqvarna.com/al/huntsville/product-manufacturer-usf221344-2115metrocircle.html', 'country': 'US', 'url_slug': 'product-manufacturer-usf221344-2115metrocircle.html', 'location_name': 'HEDDEN LAWN & GARDEN', 'address_1': '2115 METRO CIRCLE', 'address_2': '', 'city': 'HUNTSVILLE', 'city_clean': 'huntsville', 'region': 'AL', 'region_lc': 'al', 'post_code': '35801', 'local_phone': '(256) 885-1750', 'local_phone_pn_dashes': '256-885-1750', 'local_fax': '', 'local_fax_pn_dashes': '', 'from_email': '', 'hours_timezone': '', 'hours_dst': '', 'distance': '2.2', 'hours_sets:primary': '{"label":"Primary Hours","name":"primary","type":"0","timezone":"-6","dst":"1"}', 'Store Type_CS': 'Buy,Service', 'Location Type_CS': 'Authorized Dealers,Servicing Locations'}
...
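As a follow-up, if only a few fields are needed from each location, the print loop could be replaced with something like this (a sketch; items is the list built above, and the field names are taken from the sample record):
locations = [{'name': item['location_name'],
              'city': item['city'],
              'phone': item['local_phone']}
             for item in items]
print(locations)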

How do I use Pandas to convert an Excel file to a nested JSON?

I am a rookie programmer and I'm trying to convert an Excel file into a nested JSON using Pandas.
I am posting my code and the expected output, which I am not able to achieve so far. The problem is that the Excel columns which I transform into nested info should actually fall under the name "addresses", and I can't figure out how to do that. I will be grateful for any advice.
This is what the Excel file looks like:
import pandas as pd
import json

df = pd.read_excel("...", encoding = "utf-8-sig")
df.fillna('', inplace = True)

def get_nested_entry(key, grp):
    entry = {}
    entry['Forename'] = key[0]
    entry['Middle Name'] = key[1]
    entry['Surname'] = key[2]
    for field in ['Address - Country']:
        entry[field] = list(grp[field].unique())
    return entry

entries = []
for key, grp in df.groupby(['Forename', 'Middle Name', 'Surname']):
    entry = get_nested_entry(key, grp)
    entries.append(entry)
print(entries)

with open("excel_to_json_output.json", "w", encoding = "utf-8-sig") as f:
    json.dump(entries, f, indent = 4)
This is the expected outcome
[
  {
    "firstName": "Angela",
    "lastName": "L.",
    "middleName": "Johnson",
    "addresses": [
      {
        "postcode": "32807",
        "city": "Orlando",
        "state": "FL",
        "country": "United States of America"
      }
    ],
What I get is this
[
  {
    "Forename": "Angela",
    "Middle Name": "L.",
    "Surname": "Johnson",
    "Address - Country": [
      "United States of America"
    ]
  },
Try this
b = {'First_Name': ["Angela", "Peter", "John"],
     'Middle_Name': ["L", "J", "A"],
     'Last_Name': ["Johnson", "Roth", "Williams"],
     'City': ["chicago", "seattle", "st.loius"],
     'state': ["IL", "WA", "MO"],
     'zip': [60007, 98105, 63115],
     'country': ["USA", "USA", "USA"]}
df = pd.DataFrame(b)

predict = df.iloc[:, :3].to_dict(orient='records')
postdict = df.iloc[:, 3:].to_dict(orient='records')

entities = []
for i in range(df.shape[0]):
    tm = predict[i]
    tm["addresses"] = [postdict[i]]
    entities.append(tm)
output
[{'First_Name': 'Angela',
  'Middle_Name': 'L',
  'Last_Name': 'Johnson',
  'addresses': [{'City': 'chicago',
                 'state': 'IL',
                 'zip': 60007,
                 'country': 'USA'}]},
 {'First_Name': 'Peter',
  'Middle_Name': 'J',
  'Last_Name': 'Roth',
  'addresses': [{'City': 'seattle',
                 'state': 'WA',
                 'zip': 98105,
                 'country': 'USA'}]},
 {'First_Name': 'John',
  'Middle_Name': 'A',
  'Last_Name': 'Williams',
  'addresses': [{'City': 'st.loius',
                 'state': 'MO',
                 'zip': 63115,
                 'country': 'USA'}]}]
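If the goal is the exact key names from the expected outcome, the same split-and-nest idea can be combined with a renaming step. This is only a sketch: the toy data and the name/address mappings below are assumptions, not the question's real spreadsheet.
import pandas as pd

# Toy frame standing in for the pd.read_excel(...) result.
b = {'First_Name': ["Angela"], 'Middle_Name': ["L."], 'Last_Name': ["Johnson"],
     'City': ["Orlando"], 'state': ["FL"], 'zip': ["32807"],
     'country': ["United States of America"]}
df = pd.DataFrame(b)

# Map the source columns onto the key names used in the expected output.
name_map = {'First_Name': 'firstName', 'Middle_Name': 'middleName', 'Last_Name': 'lastName'}
addr_map = {'zip': 'postcode', 'City': 'city', 'state': 'state', 'country': 'country'}

entities = []
for _, row in df.iterrows():
    entry = {new: row[old] for old, new in name_map.items()}
    entry['addresses'] = [{new: row[old] for old, new in addr_map.items()}]
    entities.append(entry)

print(entities)
# roughly: [{'firstName': 'Angela', 'middleName': 'L.', 'lastName': 'Johnson',
#            'addresses': [{'postcode': '32807', 'city': 'Orlando', 'state': 'FL',
#                           'country': 'United States of America'}]}]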

How to compare json file with expected result in Python 3?

I need to prepare a test that compares the content of a .json file with an expected result (we want to check whether the values in the .json are correctly generated by our dev tool).
For the test I will use Robot Framework or unittest, but I don't know yet how to correctly parse the JSON file.
JSON example:
{
  "Customer": [{
    "Information": [{
      "Country": "",
      "Form": ""
    }],
    "Id": "110",
    "Res": "",
    "Role": "Test",
    "Limit": ["100"]
  }]
}
So after I execute this:
with open('test_json.json') as f:
    hd = json.load(f)
I get the dict hd, whose keys are:
dict_keys(['Customer'])
and values:
dict_values([[{'Information': [{'Form': '', 'Country': ''}], 'Role': 'Test', 'Id': '110', 'Res': '', 'Limit': ['100']}]])
My problem is that I don't know how to get just one value from the dict (e.g. Role: Test), because I can only extract the whole value. I could prepare a long string to compare against, but that is not the best solution for tests.
Any ideas how I can get just one field from the .json file?
Your JSON has a single key, 'Customer', and its value is of list type, so when you access hd['Customer'] you get a list:
>>> hd['Customer']
[{'Id': '110', 'Role': 'Test', 'Res': '', 'Information': [{'Form': '', 'Country': ''}], 'Limit': ['100']}]
First element in list:
>>> hd['Customer'][0]
{'Id': '110', 'Role': 'Test', 'Res': '', 'Information': [{'Form': '', 'Country': ''}], 'Limit': ['100']}
Now access inside dict structure using:
>>> hd['Customer'][0]['Role']
'Test'
You can compare the dict that you loaded (say hd) to the expected results dict (say expected_dict) by running
hd.items() == expected_dict.items()
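In a unittest-based test (one of the options mentioned in the question) the comparison could look roughly like this; the expected dict is just the sample from the question:
import json
import unittest

class TestGeneratedJson(unittest.TestCase):
    def test_generated_values(self):
        with open('test_json.json') as f:
            hd = json.load(f)
        # Check a single field ...
        self.assertEqual(hd['Customer'][0]['Role'], 'Test')
        # ... or compare the whole structure at once.
        expected_dict = {
            'Customer': [{
                'Information': [{'Country': '', 'Form': ''}],
                'Id': '110',
                'Res': '',
                'Role': 'Test',
                'Limit': ['100']
            }]
        }
        self.assertEqual(hd, expected_dict)

if __name__ == '__main__':
    unittest.main()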
