I'm struggling to build a multi-nested dictionary that I'll be using to update my collection in MongoDB. I'm questioning both the approach and my attempt at the solution.
Here is the problem:
I built a function that identifies the deltas between the local collection data and the updates I'm getting from my golden source.
The function produces a dictionary of all the deltas, with the tag as the key and the new delta update as the value.
I then pass the delta dictionary and the current data dictionary to another function, which is responsible for:
identifying the old value and new value for each key in the delta
building a new dictionary that contains the full path to the nested dictionary, which holds only two values: newValue and oldValue.
What I'm struggling with is that each iteration of the for loop seems to overwrite the previous record instead of appending to it. If the path already exists, the update should only add to the deltas; the only case where overwriting makes sense is when that exact value already exists.
For example:
Same date -> different tag: should append the new tag with its oldValue and newValue.
Same date -> same tag: should update the existing tag's values.
The reason I am trying to do this in this manner is so that I can avoid multiple calls and updates to the collection. Ideally stick to one update.
But my concerns are the following:
Is this the best approach when working with nested dictionaries and MongoDB?
What issues will this cause when I go to update MongoDB using pymongo? I'm worried it's going to override existing records on update; I want the records to be appended, not overwritten.
Is there a different approach that would make more sense?
Attempt 1:
def update_record(_collection, _key, _data, _delta):
    today = date.today()
    today_formatted = today.strftime("%Y-%m-%d")
    _query_criteria = {_key: _data[_key]}
    _update_values = {}
    _append_delta = {}
    x = 0
    for delta_key in _delta.keys():
        _update_values = {delta_key: _delta[delta_key]}
        _append_delta["delta"]["byType"][delta_key][today_formatted] = {"oldValue": _data[delta_key],
                                                                        "newValue": _delta[delta_key]}
        _append_delta["delta"]["byDate"][today_formatted][delta_key] = {"oldValue": _data[delta_key],
                                                                        "newValue": _delta[delta_key]}
Attempt 2:
def update_record(_collection, _key, _data, _delta):
    today = date.today()
    today_formatted = today.strftime("%Y-%m-%d")
    _query_criteria = {_key: _data[_key]}
    _update_values = {}
    _append_delta = {}
    x = 0
    for delta_key in _delta.keys():
        _update_values = {delta_key: _delta[delta_key]}
        x_dict = {}
        y_dict = {}
        if x == 0:
            _append_delta["delta"]["byType"] = {delta_key: {today_formatted: {}}}
            _append_delta["delta"]["byDate"][today_formatted] = {delta_key: {}}
            x += 1
            _append_delta["delta"]["byType"][delta_key][today_formatted] = {"oldValue": _data[delta_key],
                                                                            "newValue": _delta[delta_key]}
            _append_delta["delta"]["byDate"][today_formatted][delta_key] = {"oldValue": _data[delta_key],
                                                                            "newValue": _delta[delta_key]}
        else:
            _append_delta.update(
                {"delta":
                    {"byType": {
                        delta_key: {today_formatted: {"oldValue": _data[delta_key], "newValue": _delta[delta_key]}}},
                     "byDate": {
                        today_formatted: {delta_key: {"oldValue": _data[delta_key], "newValue": _delta[delta_key]}}}
                     }
                 }
            )
Example of what I want the collection to look like in MongoDB:
[{
  name: "Apple",
  ticker: "appl",
  description: "Apple Computers",
  currency: "usd",
  delta: {
    byTag: {
      name: {
        "2021-06-01": {
          oldValue: "appl",
          newValue: "Apple"
        }
      },
      description: {
        "2021-06-06": {
          oldValue: "Apple",
          newValue: "Apple Computers"
        }
      }
    },
    byDate: {
      "2021-06-01": {
        name: {
          oldValue: "appl",
          newValue: "Apple"
        }
      },
      "2021-06-06": {
        description: {
          oldValue: "Apple",
          newValue: "Apple Computers"
        }
      }
    }
  }
}]
You have a lot of questions here. You may get a better response if you break them down into bite-size issues.
In terms of dealing with changes to your data, you might want to take a look at dictdiffer. Like a lot of things in python there's usually a good library to achieve what you are looking to do. It won't give you the format you are looking for but will give you a format that the community has determined is best practice for this sort of problem. You get extra great stuff as well, like being able to patch old records with the delta.
Separately, with nested dicts, I think it's easier to create them based on the object structure rather than relying on building from keys. It's more verbose but clearer, in my opinion. The code below is a sample using classes to give you an idea of this concept:
from pymongo import MongoClient
from datetime import date
from bson.json_util import dumps

db = MongoClient()['mydatabase']

class UpdateRecord:
    def __init__(self, name, ticker, description, currency, delta):
        self.name = name
        self.ticker = ticker
        self.description = description
        self.currency = currency
        self.date = date.today().strftime("%Y-%m-%d")
        self.delta = delta

    # Need code here to work out the deltas
    def by_tags(self):
        tags = dict()
        for tag in ['name', 'description']:
            tags.update({
                tag: {
                    self.date: {
                        'oldValue': "appl",
                        'newValue': "Apple"
                    }
                }
            })
        return tags

    def by_date(self):
        dates = dict()
        for dt in ['2021-06-01', '2021-06-06']:
            dates.update({
                dt: {
                    self.date: {
                        'oldValue': "appl",
                        'newValue': "Apple"
                    }
                }
            })
        return dates

    def to_dict(self):
        return {
            'name': self.name,
            'ticker': self.ticker,
            'description': self.description,
            'currency': self.currency,
            'delta': {
                'byTag': self.by_tags(),
                'byDate': self.by_date()
            }
        }

    def update(self, _id):
        db.mycollection.update_one({'_id': _id}, {'$push': {'Updates': self.to_dict()}})

delta = {
    'oldValue': "appl",
    'newValue': "Apple"
}

#
# Test it out
#
dummy_record = {'a': 1}
db.mycollection.insert_one(dummy_record)
record = db.mycollection.find_one()
update_record = UpdateRecord(name='Apple', ticker='appl', description='Apple Computer', currency='usd', delta=delta)
update_record.update(record.get('_id'))
print(dumps(db.mycollection.find_one({}, {'_id': 0}), indent=4))
prints:
{
    "a": 1,
    "Updates": [
        {
            "name": "Apple",
            "ticker": "appl",
            "description": "Apple Computer",
            "currency": "usd",
            "delta": {
                "byTag": {
                    "name": {
                        "2021-08-14": {
                            "oldValue": "appl",
                            "newValue": "Apple"
                        }
                    },
                    "description": {
                        "2021-08-14": {
                            "oldValue": "appl",
                            "newValue": "Apple"
                        }
                    }
                },
                "byDate": {
                    "2021-06-01": {
                        "2021-08-14": {
                            "oldValue": "appl",
                            "newValue": "Apple"
                        }
                    },
                    "2021-06-06": {
                        "2021-08-14": {
                            "oldValue": "appl",
                            "newValue": "Apple"
                        }
                    }
                }
            }
        }
    ]
}
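Separately, on the overwrite worry from the question: one way to keep a single update_one call without replacing existing delta entries is to build flat dotted-path keys and let $set merge them, since MongoDB treats each dotted path as its own leaf. This is only a sketch; build_delta_set and the byTag/byDate layout mirror the question's desired document, not a tested schema:

```python
from datetime import date

def build_delta_set(data, delta):
    """Build a flat {dotted.path: value} dict for a single $set.

    Dotted paths make MongoDB create/merge the intermediate nested
    levels instead of replacing the whole 'delta' subtree.
    """
    today = date.today().strftime("%Y-%m-%d")
    update = {}
    for key, new_value in delta.items():
        entry = {"oldValue": data[key], "newValue": new_value}
        update[f"delta.byTag.{key}.{today}"] = entry
        update[f"delta.byDate.{today}.{key}"] = entry
    return update

# usage sketch, assuming `collection` is a pymongo collection:
# collection.update_one({"ticker": data["ticker"]},
#                       {"$set": build_delta_set(data, delta)})
```

Because each key is a full path, same-date/different-tag updates land next to each other, and only a same-date/same-tag pair overwrites, which matches the behaviour the question asks for.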
I'm new to Python but always trying to learn.
Today I got this error while trying to select a key from a dictionary:
print(data['town'])
KeyError: 'town'
My code:
import requests
defworld = "Pacera"
defcity = 'Svargrond'
requisicao = requests.get(f"https://api.tibiadata.com/v2/houses/{defworld}/{defcity}.json")
data = requisicao.json()
print(data['town'])
The json/dict looks this:
{
  "houses": {
    "town": "Venore",
    "world": "Antica",
    "type": "houses",
    "houses": [
      {
        "houseid": 35006,
        "name": "Dagger Alley 1",
        "size": 57,
        "rent": 2665,
        "status": "rented"
      }, {
        "houseid": 35009,
        "name": "Dream Street 1 (Shop)",
        "size": 94,
        "rent": 4330,
        "status": "rented"
      },
      ...
    ]
  },
  "information": {
    "api_version": 2,
    "execution_time": 0.0011,
    "last_updated": "2017-12-15 08:00:00",
    "timestamp": "2017-12-15 08:00:02"
  }
}
The question is: how do I print the pairs?
Thanks
You have to access the town object by accessing the houses field first, since there is nesting.
You want print(data['houses']['town']).
To avoid your first error, do
print(data["houses"]["town"])
(since it's {"houses": {"town": ...}}, not {"town": ...}).
To e.g. print all of the names of the houses, do
for house in data["houses"]["houses"]:
    print(house["name"])
As answered, you must do data['houses']['town']. A better approach, so that you don't raise an error, is:
houses = data.get('houses', None)
if houses is not None:
    print(houses.get('town', None))
.get is a method on a dict that takes two parameters: the first is the key, and the second is the default value to return if the key isn't found.
So if you do data.get('town', None) in your example, it returns None, because 'town' isn't a key in data.
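The two lookups can also be chained, with an empty dict as the outer default, which avoids the explicit None check:

```python
data = {"houses": {"town": "Venore", "world": "Antica"}}

# if "houses" is missing, .get returns {} and the second .get returns None
print(data.get("houses", {}).get("town"))    # Venore
print(data.get("houses", {}).get("street"))  # None instead of a KeyError
print(data.get("nope", {}).get("town"))      # None, outer key missing
```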
I'm crawling a site and getting data from some ajax dropdowns, and the data is related.
So basically, for simplicity, let's say I crawl the 1st dropdown and it gives me, name and value, and I use the values to run a loop and for the next dropdown and get its name, value, etc. Let's say the data is for countries, then regions, then districts, etc.
So I can get the names and values; now I want each country to be filled with its related regions, and each region with its related districts.
Sample Code:
import requests
from bs4 import BeautifulSoup
URL = "https://somesite.com/"
COUNTRIES = {
    "NAME": 1,
    "ANOTHER": 2
}
REGIONS = {}
DISTRICTS = {}
def fetch(s, url, value, store):
    data = {
        'id': str(value)
    }
    res = s.post(url, data=data)
    soup = BeautifulSoup(res.content, 'html5lib')
    options = soup.find_all('option')[1:]
    for option in options:
        name = option.text
        value = option.get('value')
        #value = option.attrs['value']
        store[name] = value

for name, val in COUNTRIES.items():
    fetch(requests, URL+"getregions", val, REGIONS)

for name, val in REGIONS.items():
    fetch(requests, URL+"getdistricts", val, DISTRICTS)
I want to combine all this in the end to have one nested json/dict of the form:
DATA = {
  "COUNTRY1": {
    "REGION1": {
      "DISTRICT1": { "WARDS": ..... },
      "DISTRICT2": { "WARDS": ..... },
    },
    "REGION2": {
      "DISTRICT1": { "WARDS": ..... },
      "DISTRICT2": { "WARDS": ..... },
    },
  },
  "COUNTRY2": {
    "REGION1": {
      "DISTRICT1": { "WARDS": ..... },
      "DISTRICT2": { "WARDS": ..... },
    },
    "REGION2": {
      "DISTRICT1": { "WARDS": ..... },
      "DISTRICT2": { "WARDS": ..... },
    },
  },
}
If possible also in this form:
[{
  country: "NAME",
  region: "RNAME",
  district: "DNAME",
  ward: "WNAME"
},
{
  country: "NAME",
  region: "RNAME",
  district: "DNAME",
  ward: "WNAME"
}]
For both SQL and NoSQL.
I've thought of closures and such but I just can't seem to find the logic to implement it.
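The closest I can picture is something like this untested sketch (attach and the toy values are made-up names standing in for the scraped name -> value pairs): walk a path of parent names and nest with setdefault.

```python
DATA = {}

def attach(parent_path, children):
    """Attach child names under the nested dict addressed by parent_path."""
    node = DATA
    for part in parent_path:
        # descend, creating intermediate dicts as needed
        node = node.setdefault(part, {})
    for name in children:
        node.setdefault(name, {})

# toy values standing in for fetched dropdown data
attach([], {"COUNTRY1": 1})
attach(["COUNTRY1"], {"REGION1": 11, "REGION2": 12})
attach(["COUNTRY1", "REGION1"], {"DISTRICT1": 111})
print(DATA)
```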
I'll be really thankful to anyone who can help; an answer in Python is preferred, please.
I'm new to asking questions here and it took me a while to compose this question. I apologize if it's not concise; please ask if anything is unclear so I can explain more.
Is there any way to pull the key from JSON if the only thing I know is the value? (In Groovy or Python.)
An example:
I know the "_number" value and I need a key.
So let's say the known _number is 2; as the output, I should get dsf34f43f34f34f.
{
  "id": "8e37ecadf4908f79d58080e6ddbc",
  "project": "some_project",
  "branch": "master",
  "current_revision": "3rtgfgdfg2fdsf",
  "revisions": {
    "43g5g534534rf34f43f": {
      "_number": 3,
      "created": "2019-04-16 09:03:07.459000000",
      "uploader": {
        "_account_id": 4
      },
      "description": "Rebase"
    },
    "dsf34f43f34f34f": {
      "_number": 2,
      "created": "2019-04-02 10:54:14.682000000",
      "uploader": {
        "_account_id": 2
      },
      "description": "Rebase"
    }
  }
}
With Groovy:
def json = new groovy.json.JsonSlurper().parse("x.json" as File)
println(json.revisions.findResult{ it.value._number==2 ? it.key : null })
// => dsf34f43f34f34f
Python 3 (assuming the data is saved in data.json):
import json

with open('data.json') as f:
    json_data = json.load(f)

for rev, revdata in json_data['revisions'].items():
    if revdata['_number'] == 2:
        print(rev)
Prints all revs where _number equals 2.
Using a set comprehension:
print({k for k,v in d['revisions'].items() if v.get('_number') == 2})
OUTPUT:
{'dsf34f43f34f34f'}
I'm comparing json files between two different API endpoints to see which json records need an update, which need a create and what needs a delete. So, by comparing the two json files, I want to end up with three json files, one for each operation.
The json at both endpoints is structured like this (but they use different keys for same sets of values; different problem):
{
  "records": [{
    "id": "id-value-here",
    "c": {
      "d": "eee"
    },
    "f": {
      "l": "last",
      "f": "first"
    },
    "g": ["100", "89", "9831", "09112", "800"]
  }, {
    …
  }]
}
So the json is represented as a list of dictionaries (with further nested lists and dictionaries).
If a given json endpoint (j1) id value ("id":) exists in the other endpoint json (j2), then that record should be added to j_update.
So far I have something like this, but I can see that .values() doesn't work because it's trying to operate on the list instead of on all the listed dictionaries(?):
j_update = {r for r in j1['records'] if r['id'] in j2.values()}
This doesn't return an error, but it creates an empty set using test json files.
Seems like this should be simple, but I think I'm tripping over the nesting of dictionaries inside a list that represents the json. Do I need to flatten j2, or does Python have a simpler dictionary method to achieve this?
====edit j1 and j2====
have same structure, use different keys; toy data
j1
{
  "records": [{
    "field_5": 2329309841,
    "field_12": {
      "email": "cmix#etest.com"
    },
    "field_20": {
      "last": "Mixalona",
      "first": "Clara"
    },
    "field_28": ["9002329309999", "9002329309112"],
    "field_44": ["1002329309832"]
  }, {
    "field_5": 2329309831,
    "field_12": {
      "email": "mherbitz345#test.com"
    },
    "field_20": {
      "last": "Herbitz",
      "first": "Michael"
    },
    "field_28": ["9002329309831", "9002329309112", "8002329309999"],
    "field_44": ["1002329309832"]
  }, {
    "field_5": 2329309855,
    "field_12": {
      "email": "nkatamaran#test.com"
    },
    "field_20": {
      "first": "Noriss",
      "last": "Katamaran"
    },
    "field_28": ["9002329309111", "8002329309112"],
    "field_44": ["1002329309877"]
  }]
}
j2
{
  "records": [{
    "id": 2329309831,
    "email": {
      "email": "mherbitz345#test.com"
    },
    "name_primary": {
      "last": "Herbitz",
      "first": "Michael"
    },
    "assign": ["8003329309831", "8007329309789"],
    "hr_id": ["1002329309877"]
  }, {
    "id": 2329309884,
    "email": {
      "email": "yinleeshu#test.com"
    },
    "name_primary": {
      "last": "Lee Shu",
      "first": "Yin"
    },
    "assign": ["8002329309111", "9003329309831", "9002329309111", "8002329309999", "8002329309112"],
    "hr_id": ["1002329309832"]
  }, {
    "id": 23293098338,
    "email": {
      "email": "amlouis#test.com"
    },
    "name_primary": {
      "last": "Maxwell Louis",
      "first": "Albert"
    },
    "assign": ["8002329309111", "8007329309789", "9003329309831", "8002329309999", "8002329309112"],
    "hr_id": ["1002329309877"]
  }]
}
Reading the json gives you a dict. You are looking for a particular key inside a list of values.
if 'records' in j2:
    r = j2['records'][0].get('id', [])  # defaults if id does not exist
It is prettier to do a recursive search, but I don't know enough about how your data is organized to quickly come up with a solution.
To give an idea of recursive search, consider this example:
def recursive_search(dictionary, target):
    if target in dictionary:
        return dictionary[target]
    for key, value in dictionary.items():
        if isinstance(value, dict):
            # use a separate name so `target` is not clobbered between iterations
            result = recursive_search(value, target)
            if result is not None:
                return result
    return None

a = {'test': 'b', 'test1': dict(x=dict(z=3), y=2)}
print(recursive_search(a, 'z'))
You tried:
j_update = {r for r in j1['records'] if r['id'] in j2.values()}
Aside from the r['id'] / r['field_5'] problem, you have:
>>> list(j2.values())
[[{'id': 2329309831, ...}, ...]]
The ids are buried inside a list and a dict, so the test r['id'] in j2.values() always returns False.
The basic solution is fairly simple.
First, create a set of j2 ids:
>>> present_in_j2 = set(record["id"] for record in j2["records"])
Then, rebuild the json structure of j1 but without the j1 field_5 that are not present in j2:
>>> {"records":[record for record in j1["records"] if record["field_5"] in present_in_j2]}
{'records': [{'field_5': 2329309831, 'field_12': {'email': 'mherbitz345#test.com'}, 'field_20': {'last': 'Herbitz', 'first': 'Michael'}, 'field_28': ['9002329309831', '9002329309112', '8002329309999'], 'field_44': ['1002329309832']}]}
It works, but it's not totally satisfying because of the weird keys of j1. Let's try to convert j1 to a more friendly format:
def map_keys(json_value, conversion_table):
    """Map the keys of a json value

    This is a recursive DFS"""
    def map_keys_aux(json_value):
        """Capture the conversion table"""
        if isinstance(json_value, list):
            return [map_keys_aux(v) for v in json_value]
        elif isinstance(json_value, dict):
            return {conversion_table.get(k, k): map_keys_aux(v) for k, v in json_value.items()}
        else:
            return json_value
    return map_keys_aux(json_value)
The function focuses on dictionary keys: conversion_table.get(k, k) is conversion_table[k] if the key is present in the conversion table, or the key itself otherwise.
>>> j1toj2 = {"field_5":"id", "field_12":"email", "field_20":"name_primary", "field_28":"assign", "field_44":"hr_id"}
>>> mapped_j1 = map_keys(j1, j1toj2)
Now, the code is cleaner and the output may be more useful for a PUT:
>>> d1 = {record["id"]:record for record in mapped_j1["records"]}
>>> present_in_j2 = set(record["id"] for record in j2["records"])
>>> {"records":[record for record in mapped_j1["records"] if record["id"] in present_in_j2]}
{'records': [{'id': 2329309831, 'email': {'email': 'mherbitz345#test.com'}, 'name_primary': {'last': 'Herbitz', 'first': 'Michael'}, 'assign': ['9002329309831', '9002329309112', '8002329309999'], 'hr_id': ['1002329309832']}]}
I am trying to specify a name for my Google spreadsheet via the API. This is done in the 'title' key's value. I have tried the code below, but it adds a new key to the existing json. Is there a way to get to "title" and update that value with the new_date item?
prev_date = datetime.date.today()-datetime.timedelta(1)
new_date = str(prev_date.isoformat())
res = {
    "requests": [
        {
            "addSheet": {
                "properties": {
                    "title": ""
                }
            }
        }
    ]
}
res['title'] = new_date
print (res)
This is the output:
{'requests': [{'addSheet': {'properties': {'title': ''}}}], 'title': '2016-12-29'}
This is what I would like it to be:
{'requests': [{'addSheet': {'properties': {'title': '2016-12-29'}}}]}
From the structure you mentioned, the title key you need to modify is nested more deeply than the path you are providing.
You need to make the following change:
prev_date = datetime.date.today()-datetime.timedelta(1)
new_date = str(prev_date.isoformat())
res = {
    "requests": [
        {
            "addSheet": {
                "properties": {
                    "title": ""
                }
            }
        }
    ]
}
res['requests'][0]['addSheet']['properties']['title'] = new_date
print (res)
Where:
'requests' maps to a list
0 is the first item in the list (and the only item)
'addSheet' is the key in the dictionary that is the item at index 0 of the list
'properties' is the key in the above dictionary
'title' is the key in the above dictionary, and the one whose value you need to set
You are incorrectly indexing your JSON object: you are adding a new key named 'title' at the root of the object, while you actually want to update the value nested inside the array. In your case, you should be assigning res['requests'][0]['addSheet']['properties']['title'] = new_date
I now realize I can pass my variables directly in the json.
prev_date = datetime.date.today()-datetime.timedelta(1)
new_date = str(prev_date.isoformat())
req = {
    "requests": [
        {
            "addSheet": {
                "properties": {
                    "title": new_date
                }
            }
        }
    ]
}