We run an app that is highly dependent on location. So, we have six models: Country, Province, District, Sector, Cell, and Village.
What I want is to generate a JSON document that represents them. What I tried already is quite long, but since the structure is the same, one chunk of the module shows my problem.
So, each cell can have multiple villages inside it:
cells = database.bring('SELECT id,name FROM batteries_cell WHERE sector_id=' + str(sectorID))
if cells:
    for cell in cells:
        cellID = cell[0]
        cellName = cell[1]
        cell_pro = {'id': cellID, 'name': cellName, 'villages': {}}
        villages = database.bring('SELECT id,name FROM batteries_village WHERE cell_id=' + str(cellID))
        if villages:
            for village in villages:
                villageID = village[0]
                villageName = village[1]
                village_pro = {'id': villageID, 'name': villageName}
                cell_pro['villages'].update(village_pro)
However, the update only stores the last village for each cell. Any idea what I am doing wrong? I have been trying and deleting different approaches, only to end up with the same result.
UPDATE: the needed output is:
[
    {
        "id": 1,
        "name": "Ethiopia",
        "villages": [
            {
                "vid": 1,
                "vname": "one"
            },
            {
                "vid": 2,
                "vname": "village two"
            }
        ]
    },
    {
        "id": 2,
        "name": "Sene",
        "villages": [
            {
                "vid": 3,
                "vname": "third"
            },
            {
                "vid": 4,
                "vname": "fourth"
            }
        ]
    }
]
The update keeps overwriting the same keys in the cell_pro villages dict. For example, if village_pro is {'id': '1', 'name': 'A'}, then cell_pro['villages'].update(village_pro) sets cell_pro['villages']['id'] = '1' and cell_pro['villages']['name'] = 'A'. The next village in the loop then overwrites id and name with its own values.
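A quick standalone demonstration of that overwriting (with made-up values):

villages = {}
villages.update({'id': 1, 'name': 'one'})
villages.update({'id': 2, 'name': 'village two'})
# Both updates wrote to the same 'id' and 'name' keys,
# so only the last village survives:
print(villages)  # {'id': 2, 'name': 'village two'}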
You probably either want to make cell_pro['villages'] into a list or keep it as a dict and add the villages keyed by id:
cell_pro['villages'][villageID] = village_pro
What format do you want the resulting JSON to be? Maybe you just want:
cell_pro['villages'][villageID] = villageName
EDIT FOR THE DESIRED JSON ADDED TO THE QUESTION:
In the JSON, the villages are in an array. For that we use a list in Python. Note that cell_pro['villages'] is now a list and we use append() to add to it.
cells = database.bring('SELECT id,name FROM batteries_cell WHERE sector_id=' + str(sectorID))
if cells:
    for cell in cells:
        cellID = cell[0]
        cellName = cell[1]
        cell_pro = {'id': cellID, 'name': cellName, 'villages': []}
        villages = database.bring('SELECT id,name FROM batteries_village WHERE cell_id=' + str(cellID))
        if villages:
            for village in villages:
                villageID = village[0]
                villageName = village[1]
                village_pro = {'vid': villageID, 'vname': villageName}
                cell_pro['villages'].append(village_pro)
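To get from that structure to the JSON text in the question, you still need a top-level list collecting each cell_pro, plus a json.dumps() call at the end. A minimal sketch (cell_list is a name I'm introducing here, not from the original code):

import json

cell_list = []  # add this before the outer loop
# inside the outer loop, after the villages have been appended:
#     cell_list.append(cell_pro)

json_output = json.dumps(cell_list, indent=4)  # yields the array shown in the question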
TIP: I don't know what database access module you're using, but it's generally bad practice to build SQL queries like that, because the parameters may not be escaped properly, which can lead to SQL injection attacks or errors. Most modules can build query strings with bound parameters that automatically and safely escape variables.
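For illustration, here is how bound parameters look with a DB-API style driver such as sqlite3; since I don't know what database.bring() does internally, treat the placeholder syntax as an assumption (some drivers use %s instead of ?):

import sqlite3

sectorID = 1  # example value

conn = sqlite3.connect('batteries.db')  # hypothetical database file
cur = conn.cursor()
# The driver substitutes and escapes sectorID itself; no string concatenation needed
cur.execute('SELECT id, name FROM batteries_cell WHERE sector_id = ?', (sectorID,))
cells = cur.fetchall()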
I have a script set up to pull JSON from an API, and I need to convert objects into different columns in a single-row layout for a SQL server. See the example below for the raw body layout of an example object:
"answers": {
"agent_star_rating": {
"question_id": 145,
"question_text": "How satisfied are you with the service you received from {{ employee.first_name }} today?",
"comment": "John was exceptionally friendly and knowledgeable.",
"selected_options": {
"1072": {
"option_id": 1072,
"option_text": "5",
"integer_value": 5
}
}
},
In said example I need the output for all parts of agent_star_rating to be individual columns, so all the data comes out as one row per survey on our SQL server. I have tried mapping several keys like so:
agent_star_rating = [list(response['answers']['agent_star_rating']['selected_options'].values())[0]['integer_value']]
agent_question = (response['answers']['agent_star_rating']['question_text'])
agent_comment = (response['answers']['agent_star_rating']['comment'])
response['agent_question'] = agent_question
response['agent_comment'] = agent_comment
response['agent_star_rating'] = agent_star_rating
I get the expected result until we reach a point where some surveys have skipped a field like ['question_text'], and we get a KeyError for the missing key. This happens across other objects too, and I am failing to come up with a solution for these missing keys. If there is a better way to format the output as I've described, beyond the keys method I've used, I'd also love to hear ideas! I'm fresh to learning python/pandas, so pardon any improper terminology!
I would do something like this:
# values that you always capture
row = ['value1', 'value2', ...]

gottem_attrs = {'question_id': '',
                'question_text': '',
                'comment': '',
                'selected_options': ''}

# find and save the values that the response has
for attr in response['answers']['agent_star_rating']:
    gottem_attrs[attr] = response['answers']['agent_star_rating'][attr]

# then you have your final row
final_row = row + list(gottem_attrs.values())
If the response has a value for an attribute, this code saves it; otherwise, it keeps an empty string for that value.
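Alternatively, chained dict.get() calls with defaults avoid the KeyError entirely. A minimal sketch against the question's structure (the empty-string defaults are my assumption):

# response is the parsed survey JSON from the question
rating = response.get('answers', {}).get('agent_star_rating', {})

agent_question = rating.get('question_text', '')  # '' when the survey skipped it
agent_comment = rating.get('comment', '')
options = rating.get('selected_options', {})
# take the first selected option's integer value, if any
star_value = next(iter(options.values()), {}).get('integer_value', None)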
I'm trying to iteratively capture data on a fixed list of items, but the data fields/attributes are not fixed at the start. As I iterate over each item, new attributes may crop up which need to be added to the database. What is a good way to do this?
For an easy example, suppose the list has three items (people) and these are their attributes:
Person 1: Height=168cm, Eyes=Brown
Person 2: Height=155cm, Occupation=Teacher
Person 3: Age=43, Country=Spain, Occupation=Writer
For Person 1 two variables need to be captured: height and eye color. On the next iteration for Person 2 there is one existing attribute (height) and one new attribute (occupation). Person 3 has three new attributes that need to be added.
The list of items is currently stored in a pandas dataframe. My only idea so far is to create one field that stores all the attributes in a single dictionary item for each person, e.g.:
[
    {"Height": "168cm", "Eyes": "Brown"},
    {"Height": "155cm", "Occupation": "Teacher"},
    {"Age": "43", "Country": "Spain", "Occupation": "Writer"}
]
Is there a better way to store the data that will be easier to query later on? I'm very new to Python. Thanks!
Try using a multi-dimensional dictionary. These are formatted as
dict2d = {
    "Person1": {"Height": "168cm", "Eyes": "Brown"},
    "Person2": {"Height": "155cm", "Occupation": "Teacher"},
    "Person3": {"Age": "43", "Country": "Spain", "Occupation": "Writer"}
}
so in your code when you encounter a new element you can run something along the lines of
# code following the encounter
if person_name in dict2d:
    dict2d[person_name][new_data_name] = new_data_value
else:
    dict2d[person_name] = {new_data_name: new_data_value}
# this is general pseudocode, there may be a few syntax errors
Using this, you will be able to access each attribute of each person.
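Since the question mentions that the list lives in a pandas dataframe, here is a small sketch (my suggestion, not part of the original answer) showing how such a nested dict maps onto a DataFrame, with missing attributes becoming NaN:

import pandas as pd

dict2d = {
    "Person1": {"Height": "168cm", "Eyes": "Brown"},
    "Person2": {"Height": "155cm", "Occupation": "Teacher"},
    "Person3": {"Age": "43", "Country": "Spain", "Occupation": "Writer"},
}

# One row per person, one column per attribute; gaps are filled with NaN
df = pd.DataFrame.from_dict(dict2d, orient='index')
print(df.loc['Person2', 'Occupation'])  # Teacher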
I have dozens of lines that update values in a nested dictionary like this:
dictionary["parent-key"]["child-key"] = [whatever]
And that goes with a different parent-key on each line, but always the same child-key.
Also, the [whatever] part is written in a unique manner for each line, so simple recursion isn't an option here. (Although one might suggest making a separate list of values to be assigned, and assigning them to each dictionary entry later on.)
Is there a way to do the same thing in an even shorter manner, to avoid the duplicated part of the code?
I'd be happy if it could be written something like this:
update_child_val("parent-key") = [whatever]
By the way, the [whatever] part that I'm assigning will be long and complicated code, therefore I don't wish to use a function such as this:
def update_child_val(parent_key, child_val):
    dictionary[parent_key]["child-key"] = child_val

update_child_val("parent-key", [whatever])
Specific Use Case:
I'm making an ETL to convert a database table into CSV, and this is part of the process. I wrote some example bits below.
single_item_template = {
    # Unique values will be assigned in place of `None` later
    "name": {
        "id": "name",
        "name": "Product Name",
        "val": None
    },
    "price": {
        "id": "price",
        "name": "Product Price (pre-tax)",
        "val": None
    },
    "tax": {
        "id": "tax",
        "name": "Sales Tax",
        "val": 10
    },
    "another column id": {
        "id": "another column id",
        "name": "another 'name' for this column",
        "val": "another 'val' for this column"
    },
    ..
}
And I have a separate area that assigns values to a copy of the dictionary single_item_template for each row of the source database table.
for table_row in table:
    item = Item(table_row)
The Item class here returns a copy of the dictionary single_item_template with updated values assigned to item[column]["val"]. Each of the vals involves a unique value-changing process inside a setter function within the class, such as
self._item["name"]["val"] = table_row["prod_name"].replace('_', ' ')
self._item["price"]["val"] = int(table_row["price_0"].replace(',', ''))
..
etcetera, etcetera.
In the above example, self._item can easily be shortened by assigning it to a variable, but I was wondering if I could also save typing the trailing ["val"] part.
(..or putting the last logic part into a string and eval-ing it later, which I really, really do not want to do.)
(So basically all I'm saying is that I'm too lazy to type out ["val"], though it doesn't really bother me either. I was just curious whether such a thing exists in programming at all, as I'm not even sure it does..)
While you can't get away from doing the work, you can abstract it away in a couple of different ways.
Let's say you have a mapping of parent IDs to intended value:
values = {
    'name': None,
    'price': None,
    'tax': 10,
    '[another column id]': "[another 'val' for this column]"
}
Setting all of these at once is only two lines of code:
for parent, val in values.items():
    dictionary[parent]['val'] = val
Unfortunately there isn't an easy or legible way to transform this into a dict comprehension. You can easily put this into a utility function that will turn it into a one-line call:
def set_children(d, parents, values, child='val'):
    for parent, value in zip(parents, values):
        d[parent][child] = value
set_children(dictionary, values.keys(), values.values())
In this case, your values mapping will encode the transformations you want to perform:
values = {
    'name': table_row["prod_name"].replace('_', ' '),
    'price': int(table_row["price_0"].replace(',', '')),
    ...
}
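Putting it together per table row might then look like this (a sketch; item_dict stands in for the question's self._item, and the deepcopy mirrors the "copy of single_item_template" described above):

import copy

for table_row in table:
    item_dict = copy.deepcopy(single_item_template)
    values = {
        'name': table_row["prod_name"].replace('_', ' '),
        'price': int(table_row["price_0"].replace(',', '')),
    }
    set_children(item_dict, values.keys(), values.values())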
The block of code below works; however, I'm not satisfied that it is very optimal, due to my limited understanding of JSON, and I can't seem to figure out a more efficient method.
The steam_game_db is like this:
{
    "applist": {
        "apps": [
            {
                "appid": 5,
                "name": "Dedicated Server"
            },
            {
                "appid": 7,
                "name": "Steam Client"
            },
            {
                "appid": 8,
                "name": "winui2"
            },
            {
                "appid": 10,
                "name": "Counter-Strike"
            }
        ]
    }
}
and my Python code so far is
i = 0
x = 570
req_name_from_id = requests.get(steam_game_db)
j = req_name_from_id.json()
while j["applist"]["apps"][i]["appid"] != x:
    i += 1
returned_game = j["applist"]["apps"][i]["name"]
print(returned_game)
Instead of looping through the entire app list, is there a smarter way to search for it? Ideally, the elements in the data structure holding 'appid' and 'name' would be numbered the same as their corresponding 'appid',
i.e.
appid 570 in the list is Dota2.
However, element 570 in the data structure is appid 5069 and Red Faction.
Also, what type of data structure is this? Perhaps it has already limited my searching for this answer. (It seems like a dictionary of 'appid' and 'name' for each element?)
EDIT: Changed to a for loop as suggested
# returned_id string for appid from another query
req_name_from_id = requests.get(steam_game_db)
j_2 = req_name_from_id.json()
for app in j_2["applist"]["apps"]:
    if app["appid"] == int(returned_id):
        returned_game = app["name"]
print(returned_game)
The most convenient way to access things by a key (like the app ID here) is to use a dictionary.
You pay a little extra performance cost up-front to fill the dictionary, but after that pulling out values by ID is basically free.
However, it's a trade-off. If you only want to do a single look-up during the life-time of your Python program, then paying that extra performance cost to build the dictionary won't be beneficial, compared to a simple loop like you already did. But if you want to do multiple look-ups, it will be beneficial.
# build dictionary
app_by_id = {}
for app in j["applist"]["apps"]:
    app_by_id[app["appid"]] = app["name"]

# use it (the appids are ints in the JSON, so look up with an int key)
print(app_by_id[570])
Also think about caching the JSON file on disk. This will save time during your program's startup.
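A minimal caching sketch (the file name apps_cache.json is a hypothetical choice; steam_game_db is the URL variable from the question):

import json
import os

import requests

CACHE_FILE = 'apps_cache.json'  # hypothetical cache location

if os.path.exists(CACHE_FILE):
    # reuse the copy saved on a previous run
    with open(CACHE_FILE, encoding='utf-8') as handle:
        j = json.load(handle)
else:
    # first run: fetch once and save to disk
    j = requests.get(steam_game_db).json()
    with open(CACHE_FILE, 'w', encoding='utf-8') as handle:
        json.dump(j, handle)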
It's better to have the JSON file on disk: you can dump it directly into a dictionary and start building up your lookup table. As an example, I've tried to maintain your logic while using the dict for lookups. Don't forget to encode the JSON, as it has special characters in it.
Setup:
import json

apps = {}
with open('bigJson.json', encoding="utf-8") as handle:
    dictdump = json.loads(handle.read())

for item in dictdump['applist']['apps']:
    apps.setdefault(item['appid'], item['name'])
Usage 1: that's the way you have used it:
for appid in range(0, 570):
    if appid in apps:
        print(appid, apps[appid].encode("utf-8"))
Usage 2: that's how you can query a key. Using get instead of [] will prevent a KeyError exception if the appid isn't recorded:
print(apps.get(570, 0))
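For contrast (with a hypothetical missing id):

apps.get(99999, 0)  # returns the default 0 if 99999 isn't a key
apps[99999]         # raises KeyError if 99999 isn't a key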
I am attempting to create a search in pymongo using a REGEX. After the match, I want the data to be appended to a list in the module. I thought I had everything set, but no matter what I set the REGEX to, it returns 0 results. The code is below:
REGEX = '.*\.com'

def myModule(self, data):
    # after importing everything and setting up the collection in the DB, I call the following:
    cursor = collection.find({'multiple.layers.of.data': REGEX})
    matches = []
    for x in cursor:
        matches.append(x)
    return matches
This is just one module of three that I am using to filter through a huge number of JSON files stored in a mongodb. However, no matter how many times I change the formatting, such as using /.*\.com/ to declare it in the operation, or using $regex in mongo... it never finds my data and appends it to the list.
EDIT: Adding in the full code along with what I am trying to identify:
RegEx = '.*\.com'  # Or RegEx = re.compile('.*\.com')

def filterData(self, data):
    db = self.client[self.dbName]
    collection = db[self.collectionName]
    cursor = collection.find({'data.item11.sub.level3': {'$regex': RegEx}})
    data = []
    for x in cursor:
        data.append(x)
    return data
I am attempting to parse through JSON data in a mongodb. The data is structured like so:
"data": {
"0": {
"item1": "something",
"item2": 0,
"item3": 000,
"item4": 000000000,
"item5": 000000000,
"item6": "0000",
"item7": 00,
"item8": "0000",
"item9": 00,
"item10": "useful",
"item11": {
"0000": {
"sub": {
"level": "letter",
"level1": 0000,
"level2": 0000000000,
"level3": "domain.com"
},
"more_data": "words"
}
}
}
UPDATE: After further testing it appears as though I need to include all of the layers in the search. Thus, it should look like
collection.find({'data.0.item11.0000.sub.level3': {'$regex': RegEx}}).
However, the "0" can be 1 - 50 and the "0000" is randomly generated. Is there a way to set these to index's as variables so that it will step into it no matter what the value? It will always be a number value.
Well, you need to tell mongodb the string should be treated as a regular expression, using the $regex operator:
cursor = collection.find({'multiple.layers.of.data' : {'$regex': REGEX}})
I think simply replacing REGEX = '.*\.com' with import re; REGEX = re.compile('.*\.com') might also work, but I'm not sure (it would rely on specific handling in the pymongo driver).
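A sketch of both variants (the client, database, and collection names here are placeholders, not from the question):

import re

from pymongo import MongoClient

client = MongoClient()                       # assumes a local mongod
collection = client['mydb']['mycollection']  # placeholder names

# Variant 1: the $regex operator with a plain string
cursor = collection.find({'multiple.layers.of.data': {'$regex': '.*\\.com'}})

# Variant 2: a compiled Python regex as the query value
cursor = collection.find({'multiple.layers.of.data': re.compile(r'.*\.com')})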
EDIT:
Regarding the wildcard part of the question: the answer is no.
In a nutshell, values that are unknown should never be assigned as keys, because it makes querying very inefficient. There are no 'wild card' queries. It is better to restructure the database such that unknown values are not keys.
See:
MongoDB wildcard in the key of a query
http://groups.google.com/group/mongodb-user/browse_thread/thread/32b00d38d50bd858
https://groups.google.com/forum/#!topic/mongodb-user/TnAQMe-5ZGs
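For illustration, a sketch of that restructuring idea: store the numbered levels as arrays of subdocuments, so the unknown keys ("0", "0000") become ordinary values, and MongoDB's dot notation can descend into the arrays without wildcards (the field name key is my invention, not from the original data):

restructured = {
    "data": [
        {
            "item10": "useful",
            "item11": [
                {
                    "key": "0000",  # the formerly random key, now a value
                    "sub": {"level3": "domain.com"},
                    "more_data": "words"
                }
            ]
        }
    ]
}
collection.insert_one(restructured)

# Dot notation steps into arrays automatically, no wildcard needed:
cursor = collection.find({'data.item11.sub.level3': {'$regex': '.*\\.com'}})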