Create nested dictionary with predefined keys but without values in python

Aim
For an assignment I need to put some building information (geometries and their properties) into a GeoJSON data structure.
Approach
My general approach is as follows:
create an empty dictionary with the necessary GeoJSON data structure and .append the data to this structure
(see: below)
output_buildings = {
'type': "FeatureCollection",
'features': [
{
'type': "Feature",
'geometry': {
'type': ,
'coordinates':
},
'properties': {
'building_id': ,
'building_year': ,
'building_status': ,
'building_occurance':
}
}
]
}
Issue
I know how to create a simple empty dictionary, but my problem is that I don't know how to create a key structure without values (as I want to append those values later on). I receive an error at type (the second one within features) at the ,.
Previous efforts
I found this topic on StackOverflow:
Method for Creating a Nested Dictionary from a List of Keys;
but it doesn't fit my purpose.
I searched in the python docs with terms like "create empty nested dictionary", "create dictionary with empty values", but didn't find what I was looking for.
I tried to use placeholders (%s / %d), but they are not suitable for this purpose
Finally I tried to use pass (didn't work).
Question
I haven't found a solution yet. Can you please provide me with some suggestions?
Thanks in advance!

Your current dictionary literal is invalid: every key in a dict literal must be followed by a value.
You can get around this by using None as a placeholder for keys that don't have a value yet. The coordinates key can be set to an empty list instead, since it will hold one or more coordinates.
Valid Python version of the structure (the placeholders must be filled in before it is valid GeoJSON):
output_buildings = {
    'type': "FeatureCollection",
    'features': [
        {
            'type': "Feature",
            'geometry': {
                'type': None,
                'coordinates': []
            },
            'properties': {
                'building_id': None,
                'building_year': None,
                'building_status': None,
                'building_occurance': None
            }
        }
    ]
}
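Since the values are to be appended later, one possible way to use the placeholder structure is to treat a single feature as a template and copy it for each building. The following is only a sketch: the helper name make_feature, the template constant and the sample values (a Point with made-up coordinates, id and year) are assumptions for illustration, not part of the original question.

```python
import copy
import json

# Template for one feature, using the None / empty-list placeholders from above.
FEATURE_TEMPLATE = {
    'type': "Feature",
    'geometry': {'type': None, 'coordinates': []},
    'properties': {
        'building_id': None,
        'building_year': None,
        'building_status': None,
        'building_occurance': None,
    },
}


def make_feature(geometry_type, coordinates, **properties):
    """Return a filled-in copy of the template (hypothetical helper)."""
    feature = copy.deepcopy(FEATURE_TEMPLATE)  # deep copy so the template stays untouched
    feature['geometry']['type'] = geometry_type
    feature['geometry']['coordinates'] = coordinates
    feature['properties'].update(properties)
    return feature


# Start with an empty feature list and append one feature per building.
output_buildings = {'type': "FeatureCollection", 'features': []}
output_buildings['features'].append(
    make_feature("Point", [4.36, 52.01], building_id=1, building_year=1990)  # made-up values
)

print(json.dumps(output_buildings, indent=2))
```

Properties that are not passed in stay None, which json.dumps writes as null.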

Related

Extracting data from nested JSON using python

I am hoping someone can help me solve this problem I am having with a nested JSON response. I have been trying to crack this for a few weeks now with no success.
Using a site's API, I am trying to create a dictionary which can hold three pieces of information for each user, extracted from the JSON responses. The first JSON response holds each user's uid and crmid, which I require.
The API comes back with a large JSON response, with an object for each account. An extract of this for a single account can be seen below:
{
'uid': 10,
'key':
'[
N#839374',
'customerUid': 11,
'selfPaid': True,
'billCycleAllocationMethodUid': 1,
'stmtPaidForAccount': False,
'accountInvoiceDeliveryMethodUid': 1,
'payerAccountUid': 0,
'countryCode': None,
'currencyCode': 'GBP',
'languageCode': 'en',
'customFields':
{
'field':
[{
'name': 'CRMID',
'value': '11001'
}
]
},
'consentDetails': [],
'href': '/accounts/10'}
I have made a loop which extracts each UID for each account:
import requests

# auth_key is assumed to be defined earlier (the API access token)
get_accounts = requests.get('https://website.com/api/v1/accounts?access_token=' + auth_key)
all_account_info = get_accounts.json()
account_info = all_account_info['resource']
account_information = {}
for account in account_info:
    account_uid = account['uid']
I am now trying to extract the CRMID value, in this case '11001': {'name': 'CRMID', 'value': '11001'}.
I have been struggling all week to make this work, I have two problems:
I would like to extract the UID (which I have done) and the CRMID from the deeply nested 'customFields' dictionary in the JSON response. I manage to get as far as ['key'][0], but I am not sure how to access the next dictionary that is nested in the list.
I would like to store this information in a dictionary in the format below:
{'accounts': [{'uid': 10, 'crmid': 11001, 'amount': ['bill': 4027]}{'uid': 11, 'crmid': 11002, 'amount': ['bill': 1054]}]}
(The 'bill' information is going to come from a separate JSON response.)
My problem is that with every loop I design, the dictionary seems to hold only one account: the last one it loops over. I can't figure out a way to append to the dictionary instead of overwriting it inside the loop. If anyone has a useful link on how to do this it would be much appreciated.
My end goal is to have a single dictionary which holds the three pieces of information for each account (uid, crmid, bill). I'm then going to export this into a CSV document.
Any help, guidance, useful links etc would be much appreciated.
Regarding question 1, it may be helpful to print each level as you go down, then work out how to access the object you get back at that level. If it is an array (a list), you index it with a number, like [0]; if it is a dictionary, you index it with a key, like ['key'].
Regarding question 2, your dictionary needs unique keys. You are probably looping over and replacing the whole thing each time.
The final structure you suggest is a bit off, imagine it as:
accounts = {
    '10': {
        'uid': '10',
        'crmid': 11001,
        'amount': {
            'bill': 4027
        }
    },
    '11': {
        'uid': '11',
        'crmid': 11011,
        'amount': {
            'bill': 4028
        }
    }
}
etc.
So you can access accounts['10']['crmid'] or accounts['10']['amount']['bill'] for example.
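Putting both points together, a minimal sketch could look like the following. It assumes each account has the shape shown in the question's extract (trimmed here to the relevant keys), takes the second account's CRMID from the question's desired output, and uses a hypothetical helper get_custom_field. The bill is left as None because it comes from a separate response.

```python
# Sample data shaped like the extract in the question (only the relevant keys).
account_info = [
    {'uid': 10, 'customFields': {'field': [{'name': 'CRMID', 'value': '11001'}]}},
    {'uid': 11, 'customFields': {'field': [{'name': 'CRMID', 'value': '11002'}]}},
]


def get_custom_field(account, name):
    """Return the value of the named custom field, or None if it is absent."""
    # customFields is a dict, 'field' is a list of {'name': ..., 'value': ...} dicts.
    for field in account.get('customFields', {}).get('field', []):
        if field.get('name') == name:
            return field.get('value')
    return None


accounts = {}
for account in account_info:
    uid = account['uid']
    # Each uid is a unique key, so earlier accounts are not overwritten.
    accounts[uid] = {
        'uid': uid,
        'crmid': get_custom_field(account, 'CRMID'),
        'amount': {'bill': None},  # to be filled in from the billing response
    }

print(accounts[10]['crmid'])
```

For the CSV export, list(accounts.values()) gives one dictionary per account.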

How to get the size of the intersection of two arrays in MongoDB using Python

I have several documents stored in MongoDB which each have an array of twitter account id's. I need to have mongoDB return the size of the intersection of two of these arrays.
The documents look like this:
{
    'screen_name': 'BBC',
    'name': 'British Broadcasting Company',
    'twitter_id': '2139471834',
    'followers': []  # This array will likely have millions of elements
}
{
    'screen_name': 'TheEconomist',
    'name': 'The Economist',
    'twitter_id': '213945674',
    'followers': []  # This array will likely have millions of elements
}
I would like the aggregation pipelines to return the size of the intersection of the two arrays. I have tried different methods of returning the size of the intersection and none have worked. The closest I have gotten is this query:
db.<collection>.aggregate(
[
{ $match :
{ $or :
[
{'screen_name': 'TheEconomist' },
{'screen_name': 'BBC'}
]
}
},
{ $project:
{'num_common_followers':
{$size:
{ $setIntersection:
['$followers']
}
}
}
}
])
However, this does not reference the 'followers' field from each of the two documents which pass the $match stage. Therefore, it seems like I either need to find a way to reference the 'followers' field from each document, or pass a single document with the two arrays to the $project stage. I would love an answer, but also any suggestions on better ways to do this function. Thanks in advance!
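One way to get both arrays into a single document, as suggested at the end of the question, might be to add a $group stage that pushes each followers array into a list and then intersect the two entries. The sketch below only builds the pipeline as a Python list; it has not been run against a live server or against arrays with millions of elements, where memory or document-size limits may apply. The function name and the follower_sets field name are assumptions for illustration.

```python
def common_followers_pipeline(screen_name_a, screen_name_b):
    """Build an aggregation pipeline counting followers shared by two accounts."""
    return [
        # Keep only the two documents of interest.
        {'$match': {'screen_name': {'$in': [screen_name_a, screen_name_b]}}},
        # Collapse them into one document holding both follower arrays.
        {'$group': {'_id': None, 'follower_sets': {'$push': '$followers'}}},
        # Intersect the two arrays and report the size of the result.
        {'$project': {
            '_id': 0,
            'num_common_followers': {
                '$size': {
                    '$setIntersection': [
                        {'$arrayElemAt': ['$follower_sets', 0]},
                        {'$arrayElemAt': ['$follower_sets', 1]},
                    ]
                }
            },
        }},
    ]


pipeline = common_followers_pipeline('BBC', 'TheEconomist')
# With pymongo this would be passed to the collection, e.g.
# results = list(db.my_collection.aggregate(pipeline))  # my_collection is hypothetical
```

This assumes both documents exist; if only one matches, the second $arrayElemAt has nothing to return and the $size expression is expected to fail.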

parse a key:value style file and get a value

I have a text file which looks like:
{
"content":
[
{
"id": "myid1",
"path": "/x/y"
},
{
"id": "myid2",
"path": "/a/b"
}
]
}
Is there a way to get the value corresponding to "id" when I pass the
"path" value to my method? For example when I pass /a/b I should get "myid2" in
return. Should I create a dictionary?
Maybe explain briefly what it is you need to actually do as I get a hunch that there might be an easier way to do what you're trying to do.
If I understand the question correctly, you want to find the id by passing a value such as "/x/y". In that case, why not structure the dictionary as
{
"content":
{
"/x/y": "myid1"
},
...(more of the same)
}
This would give you direct access to the value you want as otherwise you need to iterate through arrays.
This looks very much like JSON, so you can use the json module to parse the file. Then, just iterate over the dictionaries in the "content" list and get the one with the matching "path".
import json
with open("data.json") as f:
    data = json.load(f)
print(data)
path = "/a/b"
for d in data["content"]:
    if d["path"] == path:
        print(d["id"])
Output:
{'content': [{'path': '/x/y', 'id': 'myid1'}, {'path': '/a/b', 'id': 'myid2'}]}
myid2
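If many lookups are needed, the two answers can be combined: parse the JSON once, then build a path-to-id dictionary so that each lookup no longer loops over the list. A minimal sketch, using an inline string in place of the file and hypothetical names id_by_path and find_id:

```python
import json

# Inline copy of the file contents from the question.
raw = """
{
  "content": [
    {"id": "myid1", "path": "/x/y"},
    {"id": "myid2", "path": "/a/b"}
  ]
}
"""

data = json.loads(raw)

# Map each path to its id once; later lookups are plain dictionary accesses.
id_by_path = {entry["path"]: entry["id"] for entry in data["content"]}


def find_id(path):
    """Return the id for the given path, or None if the path is unknown."""
    return id_by_path.get(path)


print(find_id("/a/b"))  # prints myid2
```

This assumes paths are unique; if two entries share a path, the later one wins.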

Creating a nested lists of lists in python ... (really csv -> json conversion)

I've just been pounding at this problem which should be easy -- I'm just very new to Python which is required in this case.
I'm reading in a .csv file and trying to create a nested structure so that json.dumps gives me a pretty nice nested .json file.
The result json is actually six levels deep but I thought if I could get the bottom two working the rest would be the same. The input is working just great as I've ended up with job['fieldname'] for building the structure. The problem is getting the result to nest.
Ultimately I want:
"PAYLOAD": {
"TEST": [
{
"JOB_ONE": {
"details": {
"customerInformation": {
"lastName": "Chun",
"projectName": "N Pacific Recovery",
"firstName": "Wally",
"secondaryPhoneNumber": ""
},
"description": "N Pacific Garbage Sweep",
"productType": "Service Generation",
"address": {
"city": "Bristol",
"zipCodePlusSix": "",
"stateName": "",
"zipCode": "53104",
"line1": "12709 789441th Ave",
"county": "",
"stateCode": "WI",
"usage": "NA",
"zipCodePlusFour": "",
"territory": "",
}
}
}
},
{
"JOB_TWO": {
"details": {
.... similar to JOB_ONE ....
}
}
}
}],
"environment": "N. Pacific",
"requestorName": "Waldo P Rossem",
"requestorEmail": "waldo# no where.com",
However, with the code below, which only deals with the "details section", I end up with a stack of all addresses, followed by all of the customer information. So, the loop is processing all the csv records and appending the addresses, and then looping csv records and appending the info.
for job in csv.DictReader(csv_file):
    if not job['Cancelled']:
        # actually have no idea how to get these two to work
        details['description']: job['DESCRIBE']
        details['projectType']: job['ProjectType']
        # the following cycle through the customerInformation and then
        # appends the addresses. So I end up with a large block of customer
        # records and then a second block of their addresses
        details['customerInformation'].append({
            'lastName': "job[Lastname]",
            'firstName': job['FirstName'],
            'projectName':"N Pacific Prototype",
        })
        details['address'].append({
            'city': job['City'],
            'zipCode': job['Zip'],
            'line1': job['Address'],
            'stateCode': job['State'],
            'market': job['Market']
        })
What I am trying to understand is how to fix this loop and get the description and project type to appear in the right place AND set up the data structure so that the bottom flags are also properly structured for the final json dump.
This is largely due to my lack of experience with Python but unfortunately, it's a requirement -- otherwise, I could have had it done hours ago using gawk!
Requested CSV follows:
Sure... took me a while to dummy it up as the above is an abbreviated snippet.
JobNumber,FirstName,Lastname,secondaryPhoneNumber,Market,Address,City,State,Zip,requestorName,requestorEmail,environment
22056,Wally,Fruitvale,,N. Pacific,81 Stone Church Rd,Little Compton,RI,17007,Waldo P Rossem,waldo# no where.com,N. Pacific
22057,William,Stevens,,Southwest,355 Vt Route 8a,Jacksonville,VT,18928,Waldo P Rossem,waldo# no where.com,N. Pacific
22058,Wallace,Chen,,Northeast,1385 Jepson Rd,Stamford,VT,19403,Waldo P Rossem,waldo# no where.com,N.
You can build the details dict as a single literal inside the loop, rather than creating it first and assigning keys one at a time:
data = []
for job in csv.DictReader(csv_file):
    if job['Cancelled']:
        continue
    details = {
        'description': job['DESCRIBE'],
        'projectType': job['ProjectType'],
        'customerInformation' : {
            'lastName': job['Lastname'],
            'firstName': job['FirstName'],
            ...
        },
        ...
    }
    data.append(details)
json_str = json.dumps(data)
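As a runnable illustration of the same idea, the sketch below reads two of the sample rows (trimmed to a few columns) from an in-memory string and builds one nested dict per job, so each customer stays next to their own address. The 'JOB_' + JobNumber key and the reduced column set are assumptions for illustration; the sample CSV has no Cancelled column, so job.get is used to tolerate its absence.

```python
import csv
import io
import json

# Two rows from the sample CSV, reduced to the columns used below.
csv_text = """JobNumber,FirstName,Lastname,Address,City,State,Zip
22056,Wally,Fruitvale,81 Stone Church Rd,Little Compton,RI,17007
22057,William,Stevens,355 Vt Route 8a,Jacksonville,VT,18928
"""

jobs = []
for job in csv.DictReader(io.StringIO(csv_text)):
    if job.get('Cancelled'):  # skip cancelled jobs when the column exists
        continue
    # Build the whole nested record for this row in one literal.
    details = {
        'customerInformation': {
            'lastName': job['Lastname'],
            'firstName': job['FirstName'],
        },
        'address': {
            'line1': job['Address'],
            'city': job['City'],
            'stateCode': job['State'],
            'zipCode': job['Zip'],
        },
    }
    # Hypothetical naming scheme for the per-job key.
    jobs.append({'JOB_' + job['JobNumber']: {'details': details}})

payload = {'PAYLOAD': {'TEST': jobs}}
print(json.dumps(payload, indent=2))
```

The outer levels (environment, requestorName and so on) can be added to the PAYLOAD dict in the same way once the per-job records nest correctly.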
I think all you need for your puzzle is to know a few basic things about dictionaries:
Initial assignment:
my_dict = {
"key1": "value1",
"key2": "value2",
...
}
Writing key/value pairs to an already initialized dict:
my_dict["key2"] = "new value"
Reading:
my_dict["key2"]
prints> "new value"
Looping keys:
for key in my_dict:
    print(key)
prints> "key1"
prints> "key2"
Looping both key and value:
for key, value in my_dict.items():
    ...
Looping values only:
for value in my_dict.values():
    ...
If all you want is a JSON compatible dict, then you won't need much else than this, without me going into defaultdicts, tuple keys and so on - just know that it's worth reading up on that once you've figured out basic dicts, lists, tuples and sets.
Edit: One more thing: even as a beginner, I think it's worth trying Jupyter notebook to explore your ideas in Python. I find it much faster to try things out and get the results back immediately, since you don't have to switch between editor and console.
You're not far off.
You first need to initialise details as a dict:
details = {}
Then add the elements you want:
details['description'] = job['DESCRIBE']
details['projectType'] = job['ProjectType']
Then for the nested ones:
details['customerInformation'] = {
'lastName': job['Lastname'],
'firstName': job['FirstName'],
'projectName':"N Pacific Prototype",
}
For more details on how to use dict: https://docs.python.org/3/library/stdtypes.html?highlight=dict#dict.
Then you can get the JSON with json.dumps(details) after import json (documentation here: https://docs.python.org/3/library/json.html?highlight=json#json.dumps).
Or you can first gather all the details in a list, and then turn the list into a JSON string:
all_details = []
for job in ...:
    # (build details dict)
    all_details.append(details)
output = json.dumps(all_details)

Dealing with JSON with duplicate keys [duplicate]

This question already has answers here:
json.loads allows duplicate keys in a dictionary, overwriting the first value
If I have JSON with duplicate keys and different values in each of the duplicate keys, how can I extract both in python?
ex:
{
'posting': {
'content': 'stuff',
'timestamp': '123456789'
},
'posting': {
'content': 'weird stuff',
'timestamp': '93828492'
}
}
If I wanted to grab both timestamps, how would I do so?
I tried a = json.loads(json_str) and then a['posting']['timestamp'], but that only returns one of the values.
You can't have duplicate keys. You can change the object to an array instead.
[
{
'content': 'stuff',
'timestamp': '123456789'
},
{
'content': 'weird stuff',
'timestamp': '93828492'
}
]
Duplicate keys actually overwrite the previous entry. Instead, keep an array under that key. Example JSON:
{
'posting' : [
{
'content': 'stuff',
'timestamp': '123456789'
},
{
'content': 'weird stuff',
'timestamp': '93828492'
}
]
}
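You can now access the individual elements under the posting key like this (JavaScript-style notation; in Python, after parsing into data, use data['posting'][0] and data['posting'][1]):
json.posting[0] , json.posting[1]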
As has already been covered: it is against the standard, and the outcome across systems is undefined, so avoid duplicate keys.
Yet, if a third-party software component forces this upon you, note the section about this topic in the standard library documentation: https://docs.python.org/3/library/json.html#repeated-names-within-an-object
By default, this module does not raise an exception; instead, it ignores all but the last name-value pair for a given name [...] The object_pairs_hook parameter can be used to alter this behavior.
So let's do it!
import itertools, json
def duplicate_object_pairs_hook(pairs):
    def _key(pair):
        (k, v) = pair
        return k
    def gpairs():
        for (k, group) in itertools.groupby(pairs, _key):
            ll = [v for (_, v) in group]
            (v, *extra) = ll
            yield (k, ll if extra else v)
    return dict(gpairs())
badj = """{
"posting": {"content": "stuff", "timestamp": "123456789"},
"posting": {"content": "weird stuff", "timestamp": "93828492"}
}"""
data = json.loads(badj, object_pairs_hook=duplicate_object_pairs_hook)
Now data evals to
{
'posting': [
{'content': 'stuff', 'timestamp': '123456789'},
{'content': 'weird stuff', 'timestamp': '93828492'},
],
}
Remember that this hook will be called for every json node parsed, with the list of tuples of key-value pairs parsed. The default behavior should be equivalent to the dict constructor given a key-value tuple iterable.
Also, I assumed duplicate keys are adjacent, as that's my use-case, but you might have to sort the pairs before grouping them.
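If the duplicate keys are not guaranteed to be adjacent, one alternative to sorting is to collect values in a plain dictionary while walking the pairs. The following is a sketch of that variant, not the approach used above; the function name collect_duplicates_hook and the sample input are assumptions for illustration.

```python
import json


def collect_duplicates_hook(pairs):
    """Merge repeated keys into a list, whether or not they are adjacent."""
    result = {}
    counts = {}  # how many times each key has been seen so far
    for key, value in pairs:
        if key not in result:
            result[key] = value
            counts[key] = 1
        elif counts[key] == 1:
            # Second occurrence: turn the single value into a list.
            result[key] = [result[key], value]
            counts[key] = 2
        else:
            result[key].append(value)
            counts[key] += 1
    return result


# Duplicate "a" keys separated by another key.
text = '{"a": 1, "b": 2, "a": 3}'
data = json.loads(text, object_pairs_hook=collect_duplicates_hook)
print(data)  # prints {'a': [1, 3], 'b': 2}
```

Tracking the count separately avoids mistaking a key whose single value is already a list for a key that was repeated.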
