Normalize mixed list of dicts/lists in Python

I have a big dataset of addresses in the following mixed formats:
1) Simple variant with houses and flats:
"Big District, Main Street, House 1, flat 1"
"Big District, Main Street, House 1, flat 2"
district
  street
    house
      flat
      flat
2) Complex variant of houses, flats and buildings:
"Big District, Main Street, House 1, flat 1"
"Big District, Main Street, House 1, flat 2"
"Big District, Main Street, House 1, Building 1, flat 1"
"Big District, Main Street, House 1, Building 1, flat 2"
(So House 1 has flats of its own, and House 1, Building 1 also has flats.)
district
  street
    house
      flat
      flat
      building
        flat
        flat
3) Variant of a house that only has buildings:
"Big District, Main Street, House 1, Building 1, flat 1"
"Big District, Main Street, House 1, Building 1, flat 2"
"Big District, Main Street, House 1, Building 2, flat 1"
"Big District, Main Street, House 1, Building 2, flat 2"
(In this case there is no House 1 without a building, only House 1, Building 1 and House 1, Building 2.)
district
  street
    house
      building
        flat
        flat
      building
        flat
        flat
The data is structured as follows:
[
  {"text": "street 1",
   "level": 7,
   "children": [
      {"text": "house 1",
       "level": 8,
       "children": [
          {"text": "flat 1", "level": 11},
          {"text": "flat 2", "level": 11}
       ]
      },
      {"text": "house 2",
       "level": 8,
       "children": [
          {"text": "building 1",
           "level": 9,
           "children": [
              {"text": "flat 1", "level": 11}
           ]
          },
          {"text": "flat 1", "level": 11}
       ]
      },
      {"text": "house 3",
       "children": []
      }
   ]
  }
]
What I need is a list of dicts:
[
{"level 7": "Street 1", "level 8": "house 1", "level 9": NaN, "level 11":"flat 1"},
{"level 7": "Street 1", "level 8": "house 1", "level 9": NaN, "level 11":"flat 2"},
{"level 7": "Street 1", "level 8": "house 2", "level 9": "building 1", "level 11":"flat 1"},
{"level 7": "Street 1", "level 8": "house 2", "level 9": NaN, "level 11":"flat 1"},
{"level 7": "Street 1", "level 8": "house 3", "level 9": NaN, "level 11":NaN}
]
I'm really stuck on how to write this algorithm.
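One way to approach this is a depth-first walk: carry a {level: text} mapping down each branch, emit one row whenever a node has no (or empty) children, then fill every level a row never visited with NaN. This is a minimal sketch, not a tested solution for the full dataset. iter_paths and normalize are hypothetical names, math.nan stands in for the NaN in the expected output, and it assumes that a node without a "level" key (like "house 3") sits one level below its parent.

```python
import math

def iter_paths(nodes, prefix=None, parent_level=0):
    """Depth-first walk that yields one {level: text} dict per leaf path."""
    prefix = prefix or {}
    for node in nodes:
        # Assumption: a node without a "level" key sits one level below its parent.
        level = node.get("level", parent_level + 1)
        path = {**prefix, level: node["text"]}
        children = node.get("children") or []
        if children:
            yield from iter_paths(children, path, level)
        else:
            # Missing or empty "children": this node ends a row.
            yield path

def normalize(tree):
    """Flatten the tree into rows with one "level N" key per level seen anywhere."""
    paths = list(iter_paths(tree))
    levels = sorted({level for path in paths for level in path})
    # Levels a path never visits are filled with NaN.
    return [{f"level {lvl}": path.get(lvl, math.nan) for lvl in levels} for path in paths]

# The sample tree from the question:
tree = [
    {"text": "street 1", "level": 7, "children": [
        {"text": "house 1", "level": 8, "children": [
            {"text": "flat 1", "level": 11},
            {"text": "flat 2", "level": 11},
        ]},
        {"text": "house 2", "level": 8, "children": [
            {"text": "building 1", "level": 9, "children": [
                {"text": "flat 1", "level": 11},
            ]},
            {"text": "flat 1", "level": 11},
        ]},
        {"text": "house 3", "children": []},
    ]},
]

rows = normalize(tree)
for row in rows:
    print(row)
```

Because the walk is recursive, it does not care how deep a branch goes or which levels it skips, which covers all three address variants. If you use pandas, pd.DataFrame(rows) turns these dicts into a table with one column per level.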

Related

Modify existing JSON to create a new custom one in Python

I'm trying to trim unused data from a JSON response to create a new one with only two fields: title and description. The title works, but I can't figure out how to get the description field. The JSON is public; you can fetch it from the URL in the code below, and a sample is at the end of the post.
My code that extracts the title field:
import requests
import json
def trim_json(d):
    newd = {}
    for name in ['title']:
        newd[name] = d[name]
    return newd
def clean():
    books = requests.get('https://openlibrary.org/authors/OL23919A/works.json')
    books_parsed = books.json()
    book_data = books_parsed['entries']
    book_data = [trim_json(d) for d in book_data]
    print(book_data)
    return book_data
Update:
The clean function returns a list of dicts in this format:
[{'title': 'Harry Potter House Gryffindor Edition Series 1-5 Books Collection Set By J.K. Rowling'}]
What I want to get is:
[{'title': 'Harry Potter House Gryffindor Edition Series 1-5 Books Collection Set By J.K. Rowling', 'description': 'lorem ipsum'}]
And if there is no description:
[{'title': 'Harry Potter House Gryffindor Edition Series 1-5 Books Collection Set By J.K. Rowling', 'description': 'undefined'}]
How can I get JSON that contains both the title and the description?
{
"type": {
"key": "/type/work"
},
"title": "Journey to Hogwarts",
"authors": [
{
"type": {
"key": "/type/author_role"
},
"author": {
"key": "/authors/OL23919A"
}
}
],
"covers": [
2520429
],
"key": "/works/OL28602152W",
"latest_revision": 1,
"revision": 1,
"created": {
"type": "/type/datetime",
"value": "2022-08-05T00:16:59.602176"
},
"last_modified": {
"type": "/type/datetime",
"value": "2022-08-05T00:16:59.602176"
}
},
{
"description": "Harry Potter #2\r\n\r\nThroughout the summer holidays after his first year at Hogwarts School of Witchcraft and Wizardry, Harry Potter has been receiving sinister warnings from a house-elf called Dobby.\r\n\r\nNow, back at school to start his second year, Harry hears unintelligible whispers echoing through the corridors.\r\n\r\nBefore long the attacks begin: students are found as if turned to stone.\r\n\r\nDobby’s predictions seem to be coming true.\r\n\r\n[Source][1]\r\n\r\n\r\n [1]: https://www.jkrowling.com/book/harry-potter-chamber-secrets/",
"links": [
{
"title": "Author's book page",
"url": "https://www.jkrowling.com/book/harry-potter-chamber-secrets/",
"type": {
"key": "/type/link"
}
},
{
"url": "https://en.wikipedia.org/wiki/Harry_Potter_and_the_Chamber_of_Secrets",
"title": "Wikipedia entry",
"type": {
"key": "/type/link"
}
},
{
"title": "Harry Potter and the Chamber of Secrets by J.K. Rowling - review | Children's books | The Guardian",
"url": "https://www.theguardian.com/childrens-books-site/2015/mar/02/review-j-k-rowling-harry-potter-chamber-secrets",
"type": {
"key": "/type/link"
}
},
{
"url": "https://www.theguardian.com/childrens-books-site/2016/may/26/harry-potter-and-the-chamber-of-secrets-jk-rowling-review",
"title": "Harry Potter and the Chamber of Secrets by J.K. Rowling - review 2 | Children's books | The Guardian",
"type": {
"key": "/type/link"
}
}
],
"title": "Harry Potter and the Chamber of Secrets",
"covers": [
8234423,
8237628,
8237644,
8392798,
8995302,
8762432,
8081272,
8353396,
10301720,
8938317,
10471286,
10413455,
10487260,
-1,
10535729,
10722535,
10722534,
11522289,
12347254,
12581306,
12606939,
10536577,
11540339,
12023623
],
"subject_places": [
"England",
"London",
"Hogwarts School of Witchcraft and Wizardry",
"Inglaterra",
"Privet Drive"
],
"subjects": [
"Fantasy fiction",
"school stories",
"Fiction",
"Fantasy",
"Nestlé Smarties Book Prize winner",
"Juvenile fiction",
"Wizards",
"Magic",
"Schools",
"Spanish language materials",
"Magia",
"Escuelas",
"Ficción juvenil",
"Novela fantástica",
"Hogwarts School of Witchcraft and Wizardry (Imaginary place)",
"Harry Potter (Fictitious character)",
"Wizards -- Juvenile fiction",
"Witches",
"Hogwarts School of Witchcraft and Wizardry (Imaginary organization)",
"Magos",
"Translations from English",
"Chinese fiction",
"Orphans",
"Aunts",
"Uncles",
"Cousins",
"Determination (Personality trait) in children",
"Friendship",
"Potter, Harry (Fictitious character)",
"Witches Fiction",
"Wizards Fiction",
"Schools Fiction",
"England Fiction",
"Magic -- Juvenile fiction",
"Hogwarts School of Witchcraft and Wizardry (Imaginary place) -- Juvenile fiction",
"Schools -- Juvenile fiction",
"Wizards -- Fiction",
"Magic -- Fiction",
"Schools -- Fiction",
"England -- Juvenile fiction",
"England -- Fiction",
"Fantasy & Magic",
"Action & Adventure",
"Witchcraft",
"Harry Potter (Fictional character)",
"Engels",
"Social Themes",
"Reading Level-Grade 11",
"Reading Level-Grade 12",
"Schools, fiction",
"England, fiction",
"Potter, harry (fictitious character), fiction",
"Hogwarts school of witchcraft and wizardry (imaginary organization), fiction",
"Wizards, fiction",
"Magic, fiction",
"Children's fiction",
"Adventure and adventurers, fiction",
"English literature",
"Fiction, fantasy, general",
"Large type books",
"Hermione Granger (Fictitious character)",
"Ron Weasley (Fictitious character)",
"Latin language materials",
"Children's stories",
"Magiciens",
"Romans, nouvelles, etc. pour la jeunesse",
"Nécromancie",
"Écoles",
"Potter, Harry (Personnage fictif)",
"Romans, nouvelles",
"Magie",
"Family",
"Orphans & Foster Homes",
"Magía",
"Novela juvenil",
"Juvenile",
"Children's stories, English",
"Sieg",
"Basilisk",
"Das Böse",
"Das Gute",
"Internat",
"Lebensgefahr",
"Lebensrettung",
"List",
"Magier",
"Jugendbuch",
"Kampf",
"Schule",
"Basilisk (Fabeltier)",
"Junge",
"Phönix",
"Deutschland Grenzschutzkommando Mitte Schule",
"Deutschland",
"Friendship, fiction",
"Hogwartes School of Witchcraft and Wizardry (Imaginary place)",
"General",
"Social Issues",
"Witches, fiction"
],
"subject_people": [
"Harry Potter",
"Hermione Granger",
"Ron Weasley",
"Albus Dumbledore",
"Hagrid",
"The Dursleys",
"Gilderoy Lockhart",
"Dobby",
"Moaning Myrtle",
"Ginny Weasley",
"Draco Malfoy",
"Hermine Granger",
"Ron Weasly",
"Harry Potter (Fictitious character)"
],
"key": "/works/OL82537W",
"authors": [
{
"author": {
"key": "/authors/OL23919A"
},
"type": {
"key": "/type/author_role"
}
}
],
"excerpts": [
{
"excerpt": "Not for the first time, an argument had broken out over breakfast at number four, Privet Drive.",
"comment": "first sentence",
"author": {
"key": "/people/seabelis"
}
}
],
"type": {
"key": "/type/work"
},
"latest_revision": 80,
"revision": 80,
"created": {
"type": "/type/datetime",
"value": "2009-10-17T07:07:29.461716"
},
"last_modified": {
"type": "/type/datetime",
"value": "2022-06-22T07:57:49.863271"
}
},
Not all entries have both a title and a description field, so you have to handle missing keys. One way is to wrap each lookup in a try...except clause that catches the KeyError:
def trim_json(d):
    newd = {}
    try:
        newd["title"] = d["title"]
    except KeyError:
        pass
    try:
        newd["description"] = d["description"]
    except KeyError:
        pass
    return newd
Or, more elegantly, you could use a filter in a dictionary comprehension:
key_filter = ['title', 'description']
cleaned_data = [{k:d[k] for k in key_filter if k in d} for d in book_data]
And since the first element in the entries list is not book data (it has neither a title nor a description key), you should start the list comprehension after the first element:
import requests
def clean():
    books = requests.get('https://openlibrary.org/authors/OL23919A/works.json')
    books_parsed = books.json()
    book_data = books_parsed['entries']
    cleaned_data = [trim_json(d) for d in book_data[1:]]
    return cleaned_data
This avoids producing an empty dictionary that corresponds to no book.
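The answer above drops missing keys, but the question asks for a placeholder ('undefined') instead. dict.get with a default does that in one step. This is a minimal sketch: trim_entry is a hypothetical helper, the sample entries are illustrative, and unwrapping a description stored as an object with a "value" key is an assumption about some records that the sample in the question does not show.

```python
def trim_entry(entry, default="undefined"):
    """Keep only title and description, using a placeholder for missing keys."""
    description = entry.get("description", default)
    # Assumption: some records may wrap the text in an object such as
    # {"type": "/type/text", "value": "..."}; unwrap it defensively.
    if isinstance(description, dict):
        description = description.get("value", default)
    return {"title": entry.get("title", default), "description": description}

# Illustrative entries (no network call):
entries = [
    {"title": "Journey to Hogwarts"},
    {"title": "Harry Potter and the Chamber of Secrets", "description": "Harry Potter #2"},
]
cleaned = [trim_entry(e) for e in entries]
print(cleaned)
```

Inside the original clean() function, this would replace the comprehension with cleaned_data = [trim_entry(d) for d in book_data].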
Use the json library. It is part of Python's standard library.
Assuming your JSON string is stored in a variable called json_str, you can run:
import json
info = json.loads(json_str)
title = info['title']

How to convert a JSON response to Excel using Python

This is the response I am getting:
{
"value": [
{
"id": "/providers/Microsoft.Billing/Departments/1234/providers/Microsoft.Billing/billingPeriods/201903/providers/Microsoft.Consumption/usageDetails/usageDetails_Id1",
"name": "usageDetails_Id1",
"type": "Microsoft.Consumption/usageDetails",
"kind": "legacy",
"tags": {
"env": "newcrp",
"dev": "tools"
},
"properties": {
"billingAccountId": "xxxxxxxx",
"billingAccountName": "Account Name 1",
"billingPeriodStartDate": "2019-03-01T00:00:00.0000000Z",
"billingPeriodEndDate": "2019-03-31T00:00:00.0000000Z",
"billingProfileId": "xxxxxxxx",
"billingProfileName": "Account Name 1",
"accountName": "Account Name 1",
"subscriptionId": "00000000-0000-0000-0000-000000000000",
"subscriptionName": "Subscription Name 1",
"date": "2019-03-30T00:00:00.0000000Z",
"product": "Product Name 1",
"partNumber": "Part Number 1",
"meterId": "00000000-0000-0000-0000-000000000000",
"meterDetails": null,
"quantity": 0.7329,
"effectivePrice": 0.000402776395232,
"cost": 0.000295194820065,
"unitPrice": 4.38,
"billingCurrency": "CAD",
"resourceLocation": "USEast",
"consumedService": "Microsoft.Storage",
"resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Resource Group 1/providers/Microsoft.Storage/storageAccounts/Resource Name 1",
"resourceName": "Resource Name 1",
"invoiceSection": "Invoice Section 1",
"costCenter": "DEV",
"resourceGroup": "Resource Group 1",
"offerId": "Offer Id 1",
"isAzureCreditEligible": false,
"chargeType": "Usage",
"benefitId": "00000000-0000-0000-0000-000000000000",
"benefitName": "Reservation_purchase_03-09-2018_10-59"
}
},
{
"id": "/providers/Microsoft.Billing/Departments/1234/providers/Microsoft.Billing/billingPeriods/201903/providers/Microsoft.Consumption/usageDetails/usageDetails_Id1",
"name": "usageDetails_Id1",
"type": "Microsoft.Consumption/usageDetails",
"kind": "legacy",
"tags": {
"env": "newcrp",
"dev": "tools"
},
"properties": {
"billingAccountId": "xxxxxxxx",
"billingAccountName": "Account Name 1",
"billingPeriodStartDate": "2019-03-01T00:00:00.0000000Z",
"billingPeriodEndDate": "2019-03-31T00:00:00.0000000Z",
"billingProfileId": "xxxxxxxx",
"billingProfileName": "Account Name 1",
"accountName": "Account Name 1",
"subscriptionId": "00000000-0000-0000-0000-000000000000",
"subscriptionName": "Subscription Name 1",
"date": "2019-03-30T00:00:00.0000000Z",
"product": "Product Name 1",
"partNumber": "Part Number 1",
"meterId": "00000000-0000-0000-0000-000000000000",
"meterDetails": null,
"quantity": 0.7329,
"effectivePrice": 0.000402776395232,
"cost": 0.000295194820065,
"unitPrice": 4.38,
"billingCurrency": "CAD",
"resourceLocation": "USEast",
"consumedService": "Microsoft.Storage",
"resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Resource Group 1/providers/Microsoft.Storage/storageAccounts/Resource Name 1",
"resourceName": "Resource Name 1",
"invoiceSection": "Invoice Section 1",
"costCenter": "DEV",
"resourceGroup": "Resource Group 1",
"offerId": "Offer Id 1",
"isAzureCreditEligible": false,
"chargeType": "Usage",
"benefitId": "00000000-0000-0000-0000-000000000000",
"benefitName": "Reservation_purchase_03-09-2018_10-59"
}
}
]
}
Code:
import pandas as pd
frame=pd.DataFrame()
for i in range (len(json_output['value'])):
    df1= pd.DataFrame(data={'kind':json_output['value'][i]['kind'],
                            'id': json_output['value'][i]['id'],
                            'tags': json_output['value'][i]['tags'],
                            'name':json_output['value'][i]['name'],
                            'type':json_output['value'][i]['type'],
                            'billingAccountid':json_output['value'][i]['properties']['billingAccountId']},index=[i])
    print(df1)
    frame=frame.append(df1)
frame.to_csv('datt.csv')
Can you please help me convert this data into a CSV?
I am looking for id, name, type, kind, tags, billingAccountId, resourceName, etc. as separate columns.
I tried converting it into a DataFrame, but it didn't work.
Finally, I tried the Python code above, but it gives null for tags.
Note: I want to keep tags in dict format (for now).
I tried your code, storing the JSON response in json_output first:
- tags is a dictionary, and you pass it as a column value without selecting any key. pandas then treats the dict's keys ('env', 'dev') as index labels; none of them matches index=[i], so the tags cell comes out empty (NaN).
If you would rather not split tags into separate columns, you can concatenate the values:
'tags':json_output['value'][i]['tags']['env']+json_output['value'][i]['tags']['dev']
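Since the question also asks for every property as its own column while keeping tags as a dict, another option is to build one flat row per item and let the standard csv module write it. This is a sketch under stated assumptions: json_output is the parsed response shown above, flatten_usage and usage_to_csv are hypothetical names, and tags is stored as JSON text so the whole dict fits in a single cell. As an aside, DataFrame.append, used in the question's loop, was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import csv
import io
import json

def flatten_usage(item):
    """Merge the top-level fields with the nested "properties" into one flat row."""
    row = {k: v for k, v in item.items() if k != "properties"}
    # Keep tags as a dict, serialized as JSON text so it fits in one CSV cell.
    row["tags"] = json.dumps(item.get("tags", {}))
    row.update(item.get("properties", {}))
    return row

def usage_to_csv(json_output, out):
    """Write one CSV row per entry in json_output["value"] to the file-like out."""
    rows = [flatten_usage(item) for item in json_output["value"]]
    # Union of all keys in first-seen order, so every property gets a column.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Demo with a trimmed-down item; for a real file use
# with open("datt.csv", "w", newline="") as f: usage_to_csv(json_output, f)
sample = {"value": [{"id": "usageDetails_Id1", "name": "usageDetails_Id1", "kind": "legacy",
                     "tags": {"env": "newcrp", "dev": "tools"},
                     "properties": {"billingAccountId": "xxxxxxxx", "meterDetails": None}}]}
buffer = io.StringIO()
usage_to_csv(sample, buffer)
print(buffer.getvalue())
```

JSON null values (such as meterDetails) come out as empty cells. For an actual .xlsx file, pandas can write the same rows with pd.DataFrame([flatten_usage(i) for i in json_output["value"]]).to_excel("datt.xlsx"), which requires an Excel writer engine such as openpyxl to be installed.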

Convert JSON table to JSON tree

I have the results of an SQL query in JSON format:
value = [
{"Machine": "Mach 1", "Device": "Dev a", "Identifier": "HMI 1"},
{"Machine": "Mach 1", "Device": "Dev a", "Identifier": "HMI 2"},
{"Machine": "Mach 1", "Device": "Dev b", "Identifier": "HMI 3"},
{"Machine": "Mach 1", "Device": "Dev c", "Identifier": "HMI 5"},
{"Machine": "Mach 2", "Device": "Dev c", "Identifier": "HMI 6"},
{"Machine": "Mach 2", "Device": "Dev d", "Identifier": "HMI 7"},
{"Machine": "Mach 3", "Device": "Dev e", "Identifier": "HMI 8"}
]
I'm trying to generate a tree of the form:
Tree to be generated
[ ]- Mach 1
 +[ ]- Dev a
 |  +-- HMI 1
 |  +-- HMI 2
 +[ ]- Dev b
 |  +-- HMI 3
 +[ ]- Dev c
    +-- HMI 5
[ ]- Mach 2
 +[ ]- Dev c
 |  +-- HMI 6
 +[ ]- Dev d
    +-- HMI 7
[ ]- Mach 3
 +[ ]- Dev e
    +-- HMI 8
The output of the function is to be used by Inductive Automation's Perspective Tree component, which expects it in this format:
items = [
{
"label": "Mach 1",
"expanded": true,
"data": "",
"items": [
{
"label": "Dev a",
"expanded": true,
"data": "",
"items": [
{
"label": "HMI 1",
"expanded": true,
"data": {
"Identifier": "HMI1",
"Device": "Dev a",
"Machine": "Mach 1"
},
"items": []
},
{
"label": "HMI 2",
"expanded": true,
"data": {
"Identifier": "HMI2",
"Device": "Dev a",
"Machine": "Mach 1"
},
"items": []
}
]
},
{
"label": "Dev b",
"expanded": true,
"data": "",
"items": [
{
"label": "HMI 3",
"expanded": true,
"data": {
"Identifier": "HMI3",
"Device": "Dev b",
"Machine": "Mach 1"
},
"items": []
}
]
}
]
},
…
I have written some linear Python code for a tree depth of three, but I'd like to modify it to work automatically with tree depths from 1 to 6 (or so), as returned by the SQL query. (The sample input and output above are three levels deep.) Unfortunately, I can't figure out how to rewrite it recursively for a variable number of columns.
Figure 1. The results of my lazy code (available on request).
Can anyone suggest an approach using Python, the scripting language of the Ignition application I'm using?
Many thanks.
You would need to provide the order in which the keys should be used to drill down in the hierarchy. This is good practice, as the order of the keys in a dictionary might not represent the desired order.
Once you have these keys as a list, you could use it to iteratively dig deeper into the hierarchy.
def makeForest(values, levels):
    items = []  # The top level result array
    paths = {}  # Objects keyed by path
    root = { "items": items }  # Dummy: super root of the forest
    for data in values:
        parent = root
        path = ""
        for key in levels:
            label = data[key]
            path += repr([label])
            node = paths.get(path, None)
            if not node:
                node = {
                    "label": data[key],
                    "expanded": True,
                    "data": "",
                    "items": []
                }
                paths[path] = node
                parent["items"].append(node)
            parent = node
        parent["data"] = data
    return items
# Example use:
value = [{"Machine": "Mach 1", "Device": "Dev a", "Identifier": "HMI 1"},{"Machine": "Mach 1", "Device": "Dev a", "Identifier": "HMI 2"},{"Machine": "Mach 1", "Device": "Dev b", "Identifier": "HMI 3"},{"Machine": "Mach 1", "Device": "Dev c", "Identifier": "HMI 5"},{"Machine": "Mach 2", "Device": "Dev c", "Identifier": "HMI 6"},{"Machine": "Mach 2", "Device": "Dev d", "Identifier": "HMI 7"},{"Machine": "Mach 3", "Device": "Dev e", "Identifier": "HMI 8"}]
forest = makeForest(value, ["Machine", "Device", "Identifier"])
print(forest)
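The question asks specifically about recursion, so for comparison here is a recursive sketch of the same idea (a different technique from the iterative answer above): group the rows by the first key in levels, then recurse on each group with the remaining keys. make_tree is a hypothetical name. It uses collections.OrderedDict so groups keep first-seen order even on interpreters where plain dicts are unordered (such as Python 2), and it sticks to constructs that also exist in Python 2; it has not been tested inside Ignition.

```python
from collections import OrderedDict

def make_tree(rows, levels):
    """Recursively group rows by levels[0], then by the remaining levels."""
    if not levels:
        return []
    key, rest = levels[0], levels[1:]
    groups = OrderedDict()  # label -> rows sharing that label, in first-seen order
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    nodes = []
    for label, group in groups.items():
        nodes.append({
            "label": label,
            "expanded": True,
            # Leaves carry the source row; branch nodes get "" as in the expected output.
            "data": group[-1] if not rest else "",
            "items": make_tree(group, rest),
        })
    return nodes

# Example use with the same value list and key order as above:
# forest = make_tree(value, ["Machine", "Device", "Identifier"])
```

The depth is controlled entirely by the length of levels, so a query returning one to six columns only needs a matching list of column names.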

How can I declare a list of user-defined types in Cassandra?

I want to declare a list of objects in Cassandra, and I have already created the type:
CREATE TYPE profiles.educations (
major text,
end text,
name text,
degree text,
start text,
desce text
);
How can I declare a list of the educations type? I ask because I have a JSON file in this format:
{
...
"educations": [
{
"start": "2009",
"major": "Business Administration and Management, General",
"end": "2010",
"name": "Gordon Institute of Business Science - University of Pretoria",
"degree": "PDBA"
},
{
"start": "2002",
"major": "Marketing Management",
"end": "2006",
"name": "University of Pretoria/Universiteit van Pretoria",
"degree": "B. com with specialization in Marketing Management"
},
{
"major": "Finanzas",
"end": "2013",
"name": "Universidad de Los Andes",
"degree": "Maestr\u00eda en Finanzas",
"start": "2011",
"desce": ""
}]
...
}

Removing duplicate output from a list of dictionaries in JSON (Python)

I have the following JSON file that I am trying to parse, and I am running into some issues.
[
{
"ballot_name": "LAPP, David",
"office": "MAYOR",
"votes": "7",
"voting_station": "3",
"voting_station_id": "703",
"voting_station_name": "Branton JR High School",
"voting_station_type": "Regular",
"ward": "7"
},
{
"ballot_name": "SMITH, Bill",
"office": "MAYOR",
"votes": "683",
"voting_station": "1",
"voting_station_id": "1101",
"voting_station_name": "St. Mary's Parish Hall",
"voting_station_type": "Regular",
"ward": "11"
},
{
"ballot_name": "HEATHER, Larry R",
"office": "MAYOR",
"votes": "1",
"voting_station": "37",
"voting_station_id": "737",
"voting_station_name": "Clover Living",
"voting_station_type": "Special",
"ward": "7"
},
{
"ballot_name": "OLSON, Curtis",
"office": "MAYOR",
"votes": "0",
"voting_station": "32",
"voting_station_id": "1432",
"voting_station_name": "Lake Bonavista Village",
"voting_station_type": "Special",
"ward": "14"
},
{
"ballot_name": "LIN, Jun",
"office": "COUNCILLOR",
"votes": "2",
"voting_station": "66",
"voting_station_id": "366",
"voting_station_name": "Memorial Park Library",
"voting_station_type": "Advance",
"ward": "3"
},
{
"ballot_name": "HEJDUK, Marek",
"office": "COUNCILLOR",
"votes": "0",
"voting_station": "67",
"voting_station_id": "767",
"voting_station_name": "Saddletowne Library",
"voting_station_type": "Advance",
"ward": "7"
},
My objective so far is the following:
1) Print the list of voting_station_name values with all duplicates removed. I can print the names, but I can't remove the duplicates.
Below is the code I have tried so far.
import json
import urllib
print "This is Json Data Parser Program \nThis program will download the Election Results from 2017 file from OpenData Portal"
_url_= "https://data.cityname.ca/resource/kqmd-3dsq.json"
_response_ = urllib.urlopen(_url_)
_data_= json.loads(_response_.read())
#with open('data.json', 'w') as outfile:
#    json.dump(_data_,outfile,indent=4,sort_keys=True)
def _ward_(_no_):
    print "Your choosen ward number is" , _no_
    for _i_ in _data_:
        result = []
        if (_i_["ward"] == _no_ and _i_["voting_station_name"] not in result):
            result.append(_i_["voting_station_name"])
            print result
_ward_("12")
I get the following output, but as you can see it contains duplicate voting_station_name values.
How can I remove the duplicates in my output?
This is Json Data Parser Program
This program will download the CoC Election Results from 2017 file from OpenData Portal
Your choosen ward number is 12
Cranston School
McKenzie Towne Care Centre
Millican/Ogden Community Association
Age Care - Seton Seniors Community
Auburn Heights Retirement Residence
University of Calgary Taylor Family Digital Librar
McKenzie Towne Church
Age Care - Seton Seniors Community
Christ the King Catholic School
Auburn Heights Retirement Residence
You are reinitializing the list on each iteration, so it is always empty when you perform the check. Create it once, before the loop:
def _ward_(_no_):
    print "Your choosen ward number is" , _no_
    result = []
    for _i_ in _data_:
        if (_i_["ward"] == _no_ and _i_["voting_station_name"] not in result):
            result.append(_i_["voting_station_name"])
    print result
EDIT:
You asked me for improvements to the code structure. I'm not sure it is an improvement (you should benchmark it), but I would have written something like:
def _ward_(_no_):
    print "Your choosen ward number is" , _no_
    print set([e["voting_station_name"] for e in _data_ if e["ward"]==_no_])
This code uses a list comprehension to extract the "voting_station_name" of every element of _data_ whose "ward" equals _no_, converts that list to a set to remove the duplicates, and prints the result.
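One caveat with the set version: a set does not keep the original order of the stations. If order matters, dict.fromkeys removes duplicates while keeping first-seen order, because dicts preserve insertion order from Python 3.7 on. Here is a Python 3 sketch; stations_in_ward is a hypothetical name and the sample rows are illustrative. In Python 3 the download step would also change, since urllib.urlopen became urllib.request.urlopen and print became a function.

```python
def stations_in_ward(data, ward_no):
    """Unique voting_station_name values for one ward, in first-seen order."""
    names = (e["voting_station_name"] for e in data if e["ward"] == ward_no)
    # dict keys are unique and, in Python 3.7+, keep insertion order (a set does not).
    return list(dict.fromkeys(names))

# Small illustrative rows shaped like the JSON in the question:
sample = [
    {"ward": "12", "voting_station_name": "Cranston School"},
    {"ward": "12", "voting_station_name": "Age Care - Seton Seniors Community"},
    {"ward": "7", "voting_station_name": "Clover Living"},
    {"ward": "12", "voting_station_name": "Cranston School"},
]
print(stations_in_ward(sample, "12"))
```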
