Join nested list to ID value

I retrieve data from my DB for a Python app and it comes in the following format (as a list, tbl):
[
{
"id": "rec2fiwnTQewTv9HC",
"createdTime": "2022-06-27T08:25:47.000Z",
"fields": {
"Num": 19,
"latitude": 31.101405,
"longitude": 36.391831,
"State": 2,
"Label": "xyz",
"Red": 0,
"Green": 255,
"Blue": 0
}
},
{
"id": "rec4y7vhgZVDHrhrQ",
"createdTime": "2022-06-27T08:25:47.000Z",
"fields": {
"Num": 30,
"latitude": 31.101405,
"longitude": 36.391831,
"State": 2,
"Label": "abc",
"Red": 0,
"Green": 255,
"Blue": 0
}
}
]
I can retrieve the values in the nested fields dictionary like this:
pd.DataFrame([d['fields'] for d in tbl])
I would like to add the id value to each row of the DataFrame, but I can't figure out how.

Try:
import pandas as pd
data = [
{
"id": "rec2fiwnTQewTv9HC",
"createdTime": "2022-06-27T08:25:47.000Z",
"fields": {
"Num": 19,
"latitude": 31.101405,
"longitude": 36.391831,
"State": 2,
"Label": "xyz",
"Red": 0,
"Green": 255,
"Blue": 0,
},
},
{
"id": "rec4y7vhgZVDHrhrQ",
"createdTime": "2022-06-27T08:25:47.000Z",
"fields": {
"Num": 30,
"latitude": 31.101405,
"longitude": 36.391831,
"State": 2,
"Label": "abc",
"Red": 0,
"Green": 255,
"Blue": 0,
},
},
]
df = pd.DataFrame([{"id": d["id"], **d["fields"]} for d in data])
print(df)
Prints:
id Num latitude longitude State Label Red Green Blue
0 rec2fiwnTQewTv9HC 19 31.101405 36.391831 2 xyz 0 255 0
1 rec4y7vhgZVDHrhrQ 30 31.101405 36.391831 2 abc 0 255 0
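The key step in the answer is {"id": d["id"], **d["fields"]}, which builds one flat dict per record: it starts with the id and then unpacks every key of the nested fields dict into it. Below is a minimal pandas-free sketch of just that flattening step, assuming the same list shape as above with the fields trimmed for brevity; the helper name flatten_records is hypothetical.

```python
def flatten_records(records):
    """Merge each record's top-level id with its nested fields dict."""
    # "id" is inserted first, so it becomes the first key (and first column)
    return [{"id": rec["id"], **rec["fields"]} for rec in records]


# Trimmed copy of the question's data (only two of the fields kept)
tbl = [
    {"id": "rec2fiwnTQewTv9HC", "createdTime": "2022-06-27T08:25:47.000Z",
     "fields": {"Num": 19, "Label": "xyz"}},
    {"id": "rec4y7vhgZVDHrhrQ", "createdTime": "2022-06-27T08:25:47.000Z",
     "fields": {"Num": 30, "Label": "abc"}},
]

rows = flatten_records(tbl)
print(rows[0])  # {'id': 'rec2fiwnTQewTv9HC', 'Num': 19, 'Label': 'xyz'}
```

Passing rows to pd.DataFrame gives the table shown above. Because later keys win in a dict display, a field that happened to be named id inside fields would overwrite the record id. If you want every nested key as a column without writing the comprehension, pd.json_normalize(tbl) also flattens the records, but it names the nested columns with a prefix such as fields.Num.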

Related

How to return the last 5 values that exist before the last and penultimate in a JSON?

At the moment I do this by getting the last 7 values and then building a list from the first 5:
last_seven = response['graphPoints'][-7:]
only_five = [last_seven[0],last_seven[1],last_seven[2],last_seven[3],last_seven[4]]
As I'm still learning, I did it this clumsy way because I couldn't work out how to get these 5 directly from the JSON, i.e. the last 7 ([-7:]) minus the last and penultimate entries. I would like some help doing it properly.
My expected result for this example is:
{
"minute": 33,
"value": 42
},
{
"minute": 34,
"value": 28
},
{
"minute": 35,
"value": 16
},
{
"minute": 36,
"value": -30
},
{
"minute": 37,
"value": -22
}
To make it easier, here is an example JSON in case you want to test it yourself:
{
"graphPoints": [
{
"minute": 1,
"value": 0
},
{
"minute": 2,
"value": 0
},
{
"minute": 3,
"value": 5
},
{
"minute": 4,
"value": 8
},
{
"minute": 5,
"value": 25
},
{
"minute": 6,
"value": 65
},
{
"minute": 7,
"value": 39
},
{
"minute": 8,
"value": 23
},
{
"minute": 9,
"value": -25
},
{
"minute": 10,
"value": -9
},
{
"minute": 11,
"value": -39
},
{
"minute": 12,
"value": -24
},
{
"minute": 13,
"value": -14
},
{
"minute": 14,
"value": -7
},
{
"minute": 15,
"value": 60
},
{
"minute": 16,
"value": 36
},
{
"minute": 17,
"value": 22
},
{
"minute": 18,
"value": 8
},
{
"minute": 19,
"value": 10
},
{
"minute": 20,
"value": 7
},
{
"minute": 21,
"value": 4
},
{
"minute": 22,
"value": 8
},
{
"minute": 23,
"value": 5
},
{
"minute": 24,
"value": 3
},
{
"minute": 25,
"value": 2
},
{
"minute": 26,
"value": 61
},
{
"minute": 27,
"value": 41
},
{
"minute": 28,
"value": 35
},
{
"minute": 29,
"value": 51
},
{
"minute": 30,
"value": 40
},
{
"minute": 31,
"value": 20
},
{
"minute": 32,
"value": 72
},
{
"minute": 33,
"value": 42
},
{
"minute": 34,
"value": 28
},
{
"minute": 35,
"value": 16
},
{
"minute": 36,
"value": -30
},
{
"minute": 37,
"value": -22
},
{
"minute": 38,
"value": -43
},
{
"minute": 39,
"value": -26
}
],
"periodTime": null,
"periodCount": 2
}

You want the first 5 values of the last 7 values.
This can be done in two ways:
response['graphPoints'][-7:][:5]
Explaining the code above: first you get the last 7 values as a list. Then [:5] takes the first 5 values of that result. (Note that [5:] would do the opposite and return only the last 2.)
Better way
There is a better way, though. You can do it with a single slice:
response['graphPoints'][-7:-2]
This tells Python to give you the values at indexes -7, -6, -5, -4 and -3. Index -2 is not included because the stop value after the : is excluded from a slice, so the result ends at index -3.
Both versions return the expected five entries (minutes 33 to 37) for your sample data.
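A quick way to check that both slices agree is to run them side by side. This sketch rebuilds the sample graphPoints list from the minute/value pairs in the question rather than fetching a real response:

```python
# value for each minute 1..39, copied from the question's sample JSON
values = [0, 0, 5, 8, 25, 65, 39, 23, -25, -9, -39, -24, -14, -7, 60, 36, 22,
          8, 10, 7, 4, 8, 5, 3, 2, 61, 41, 35, 51, 40, 20, 72, 42, 28, 16,
          -30, -22, -43, -26]
response = {"graphPoints": [{"minute": m, "value": v}
                            for m, v in enumerate(values, start=1)]}

points = response["graphPoints"]
two_step = points[-7:][:5]   # last 7, then the first 5 of those
one_step = points[-7:-2]     # indexes -7 .. -3; the stop index -2 is excluded
wrong = points[-7:][5:]      # only the last 2 of the 7

print([p["minute"] for p in one_step])  # [33, 34, 35, 36, 37]
```

The single slice avoids building the intermediate 7-element list, and it states the intent (drop the last two) directly.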

How to get specific data from JSON object in Python

I have a dict stored under the variable parsed:
{
"8119300029": {
"store": 4,
"total": 4,
"web": 4
},
"8119300030": {
"store": 2,
"total": 2,
"web": 2
},
"8119300031": {
"store": 0,
"total": 0,
"web": 0
},
"8119300032": {
"store": 1,
"total": 1,
"web": 1
},
"8119300033": {
"store": 0,
"total": 0,
"web": 0
},
"8119300034": {
"store": 2,
"total": 2,
"web": 2
},
"8119300036": {
"store": 0,
"total": 0,
"web": 0
},
"8119300037": {
"store": 0,
"total": 0,
"web": 0
},
"8119300038": {
"store": 2,
"total": 2,
"web": 2
},
"8119300039": {
"store": 3,
"total": 3,
"web": 3
},
"8119300040": {
"store": 3,
"total": 3,
"web": 3
},
"8119300041": {
"store": 0,
"total": 0,
"web": 0
}
}
I am trying to get the "web" value from each JSON entry but can only get the key values.
for x in parsed:
    print(x["web"])
I tried doing this ^ but kept getting this error: "string indices must be integers". Can somebody explain why this is wrong?

Because x is a key of the dict (a string such as "8119300029"), not the nested dict. Look the entry up by its key:
for x in parsed:
    print(parsed[x]['web'])

A little information on your parsed data: it is basically a dictionary of dictionaries. I won't go into the nitty-gritty, but it is worth reading up a bit on JSON: https://www.w3schools.com/python/python_json.asp
In your example, for x in parsed iterates over the keys of the parsed dictionary, e.g. 8119300029, 8119300030, etc. So x is a key (in this case, a string), not a dictionary. You get the error about string indices because x["web"] tries to index a string with another string, and string indices must be integers. For example, x[0] would give you the first character, 8, of the key 8119300029.
If you need to get each web value, then you need to access that key in the parsed[x] dictionary:
for x in parsed:
    print(parsed[x]["web"])
Output:
4
2
0
...
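Since the loop only needs the inner dicts, you can also iterate over them directly with .values(), or over key/value pairs with .items() when you need the id as well. A small sketch using a trimmed copy of the data:

```python
parsed = {
    "8119300029": {"store": 4, "total": 4, "web": 4},
    "8119300030": {"store": 2, "total": 2, "web": 2},
    "8119300031": {"store": 0, "total": 0, "web": 0},
}

# .values() yields the inner dicts, so entry["web"] works directly
web_values = [entry["web"] for entry in parsed.values()]

# .items() yields (key, inner dict) pairs when you also need the id
web_by_id = {key: entry["web"] for key, entry in parsed.items()}

print(web_values)  # [4, 2, 0]
```

Dicts keep insertion order in Python 3.7+, so the values come out in the same order as the keys appear in the JSON.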

Take the first n dictionaries of a specific key in a sorted list

I'm writing a script which calculates the distance in miles between an order's shipping address and each store location for a specific chain of stores. So far, I have created a list of dictionaries sorted by order_id and then by distance. It looks like this:
[
{
"order_id": 1,
"distance": 10,
"storeID": 1112
},
{
"order_id": 1,
"distance": 20,
"storeID": 1116
},
{
"order_id": 1,
"distance": 30,
"storeID": 1134
},
{
"order_id": 1,
"distance": 40,
"storeID": 1133
},
{
"order_id": 2,
"distance": 6,
"storeID": 1112
},
{
"order_id": 2,
"distance": 12,
"storeID": 1116
},
{
"order_id": 2,
"distance": 18,
"storeID": 1134
},
{
"order_id": 2,
"distance": 24,
"storeID": 1133
}
]
From here, I would like to find the two closest stores for each order_id, as well as their distances.
What I'd ultimately want to end up with is a list that looks like this:
[
{
"order_id": 1,
"closet_store_distance": 10,
"closest_store_id": 1112,
"second_closet_store_distance": 20,
"second_closest_store_id": 1116
},
{
"order_id": 2,
"closet_store_distance": 6,
"closest_store_id": 1112,
"second_closet_store_distance": 12,
"second_closest_store_id": 1116
}
]
I am unsure of how to loop through each order_id in this list and select the two closest stores. Any help is appreciated.

Try something like this. I assumed that the initial data is in a file called sample.txt.
import json
from operator import itemgetter
def make_order(stores, id):
    # stores is a list of (storeID, distance) pairs sorted by distance
    return {
        "order_id": id,
        "closet_store_distance": stores[0][1],
        "closest_store_id": stores[0][0],
        "second_closet_store_distance": stores[1][1],
        "second_closest_store_id": stores[1][0]
    }
def main():
    with open('sample.txt', 'r') as data_file:
        data = json.load(data_file)
    # Note: this only separates order_id 1 from every other order_id
    id1 = {}
    id2 = {}
    for i in data:
        if i["order_id"] == 1:
            id1[i["storeID"]] = i["distance"]
        else:
            id2[i["storeID"]] = i["distance"]
    top1 = sorted(id1.items(), key=itemgetter(1))
    top2 = sorted(id2.items(), key=itemgetter(1))
    with open('results.json', 'w') as result_file:
        order1 = make_order(top1, 1)
        order2 = make_order(top2, 2)
        json.dump([order1, order2], result_file, indent=3, separators=(',', ': '))
if __name__ == '__main__':
    main()
The resulting file looks like:
[
{
"second_closest_store_id": 1116,
"closet_store_distance": 10,
"closest_store_id": 1112,
"order_id": 1,
"second_closet_store_distance": 20
},
{
"second_closest_store_id": 1116,
"closet_store_distance": 6,
"closest_store_id": 1112,
"order_id": 2,
"second_closet_store_distance": 12
}
]
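The code above only separates order_id 1 from everything else, so a third order would be merged into id2. A more general sketch, assuming the input is already sorted by order_id and then by distance as the question describes, groups consecutive records with itertools.groupby and takes the first two of each group. The function name two_closest is hypothetical, and the output keys copy the question's spelling (including closet):

```python
from itertools import groupby
from operator import itemgetter


def two_closest(records):
    """Return the two nearest stores per order_id.

    Assumes records are sorted by order_id, then by distance, and that
    every order has at least two stores (otherwise unpacking fails).
    """
    results = []
    for order_id, group in groupby(records, key=itemgetter("order_id")):
        first, second = list(group)[:2]
        results.append({
            "order_id": order_id,
            "closet_store_distance": first["distance"],
            "closest_store_id": first["storeID"],
            "second_closet_store_distance": second["distance"],
            "second_closest_store_id": second["storeID"],
        })
    return results


data = [
    {"order_id": 1, "distance": 10, "storeID": 1112},
    {"order_id": 1, "distance": 20, "storeID": 1116},
    {"order_id": 1, "distance": 30, "storeID": 1134},
    {"order_id": 2, "distance": 6, "storeID": 1112},
    {"order_id": 2, "distance": 12, "storeID": 1116},
    {"order_id": 2, "distance": 18, "storeID": 1134},
]
result = two_closest(data)
```

groupby only merges adjacent items, so if the list were not already sorted you would call sorted(records, key=itemgetter("order_id", "distance")) first.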

A nice, readable answer (though it uses one of my free libraries):
from PLOD import PLOD
order_store_list = [
{
"order_id": 1,
"distance": 10,
"storeID": 1112
},
{
"order_id": 1,
"distance": 20,
"storeID": 1116
},
{
"order_id": 1,
"distance": 30,
"storeID": 1134
},
{
"order_id": 1,
"distance": 40,
"storeID": 1133
},
{
"order_id": 2,
"distance": 6,
"storeID": 1112
},
{
"order_id": 2,
"distance": 12,
"storeID": 1116
},
{
"order_id": 2,
"distance": 18,
"storeID": 1134
},
{
"order_id": 2,
"distance": 24,
"storeID": 1133
}
]
#
# first, get the order_ids (place in a dictionary to ensure uniqueness)
#
order_id_keys = {}
for entry in order_store_list:
    order_id_keys[entry["order_id"]] = True
#
# next, get the two closest stores per order_id
#
closest_stores = []
for order_id in order_id_keys:
    top_two = PLOD(order_store_list).eq("order_id", order_id).sort("distance").returnList(limit=2)
    closest_stores.append({
        "order_id": order_id,
        "closet_store_distance": top_two[0]["distance"],
        "closest_store_id": top_two[0]["storeID"],
        "second_closet_store_distance": top_two[1]["distance"],
        "second_closest_store_id": top_two[1]["storeID"]
    })
#
# sort by order_id again (if that is important)
#
closest_stores = PLOD(closest_stores).sort("order_id").returnList()
This example assumes the production order_store_list will fit in memory. If you are using a larger dataset, I strongly recommend using a database and a Python library for that database.
My PLOD library is free and open source (MIT), but requires Python 2.7. I'm about two weeks away from a Python 3.5 release. See https://pypi.python.org/pypi/PLOD/0.1.7

Create hierarchical json dump from list of dictionary in python

The table:
categories = Table("categories", metadata,
Column("id", Integer, primary_key=True),
Column("name", String),
Column("parent_id", Integer, ForeignKey("categories.id"),
CheckConstraint('id!=parent_id'), nullable=True),
)
A category can have many children, but only one parent. Using a CTE, I have got the following list of dictionaries. For example, id 14 has parent 13, and its path from the root is 8 -> 10 -> 12 -> 13 -> 14, where 8 has no parent id.
[
{
"id": 14,
"name": "cat14",
"parent_id": 13,
"path_info": [
8,
10,
12,
13,
14
]
},
{
"id": 15,
"name": "cat15",
"parent_id": 13,
"path_info": [
8,
10,
12,
13,
15
]
}
]
I would like to get the attributes of the parents embedded as subcategories in the list, like this:
[
{
"id": 14,
"name": "cat14",
"parent_id": 13,
"subcats": [
{
"id": 8,
"name": "cat8",
"parent_id":null
},
{
"id": 10,
"name": "cat10",
"parent_id":8
},
{
"id": 12,
"name": "cat12",
"parent_id":10
},
and similarly for ids 13 and 14.....
]
},
{
"id": 15,
"name": "cat15",
"parent_id": 13,
"subcats": [
{
"id": 8,
"name": "cat8",
"parent_id":null
},
{
"id": 10,
"name": "cat10",
"parent_id":8
},
{
"id": 12,
"name": "cat12",
"parent_id":10
},
and similarly for ids 13 and 15.....
]
}
]
Notice that 'path_info' has been removed from each dictionary and every id on the path is shown with its details. I want a JSON dump in the indented format above. How do I go about it? I'm using Flask 0.10 and Python 2.7.

There is a tolerable way to do this with a few list/dict comprehensions.
lst = [{"id": 14, "name": "cat14", "parent_id": 13, "path_info": [8, 10, 12, 13, 14]}, {"id": 15, "name": "cat15", "parent_id": 13, "path_info": [8, 10, 12, 13, 15]}]
master_dct = {d['id']: d for d in lst}
for d in lst:
    d['subcats'] = [{field: master_dct[i][field] for field in ['id', 'name', 'parent_id']}
                    for i in d['path_info'] if i in master_dct]
import json
with open('out.json', 'w') as f:
    json.dump(lst, f)
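Note that master_dct can only look up nodes that are present in lst, so with just the two records from the question each subcats list would contain only the record itself. Given the full set of rows (all ancestors included, as in the six-node example in the next answer), a sketch that also leaves out path_info, as the question asks, might look like this. It builds new dicts instead of mutating the input, so the original rows stay intact:

```python
import json

rows = [
    {"id": 14, "name": "cat14", "parent_id": 13, "path_info": [8, 10, 12, 13, 14]},
    {"id": 15, "name": "cat15", "parent_id": 13, "path_info": [8, 10, 12, 13, 15]},
    {"id": 13, "name": "cat13", "parent_id": 12, "path_info": [8, 10, 12, 13]},
    {"id": 12, "name": "cat12", "parent_id": 10, "path_info": [8, 10, 12]},
    {"id": 10, "name": "cat10", "parent_id": 8, "path_info": [8, 10]},
    {"id": 8, "name": "cat8", "parent_id": None, "path_info": [8]},
]

FIELDS = ("id", "name", "parent_id")

# index every node by id so each entry of path_info can be looked up
by_id = {row["id"]: row for row in rows}

result = []
for row in rows:
    out = {key: row[key] for key in FIELDS}
    out["subcats"] = [
        {key: by_id[node_id][key] for key in FIELDS}
        for node_id in row["path_info"]
    ]
    result.append(out)  # path_info is simply not copied into out

print(json.dumps(result[0], indent=4))
```

json.dumps turns None into null, which matches the desired output for the root node.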

You can do it in Python code.
Suppose we have the following JSON object. I've modified it slightly: I added the missing nodes and wrapped the array in an object:
{
"array": [
{
"id": 14,
"name": "cat14",
"parent_id": 13,
"path_info": [
8,
10,
12,
13,
14
]
},
{
"id": 15,
"name": "cat15",
"parent_id": 13,
"path_info": [
8,
10,
12,
13,
15
]
},
{
"id": 13,
"name": "cat13",
"parent_id": 12,
"path_info": [
8,
10,
12,
13
]
},
{
"id": 12,
"name": "cat12",
"parent_id": 10,
"path_info": [
8,
10,
12
]
},
{
"id": 10,
"name": "cat10",
"parent_id": 8,
"path_info": [
8,
10
]
},
{
"id": 8,
"name": "cat8",
"parent_id": null,
"path_info": [
8
]
}
]
}
Then you can use the following code:
import json
# load data above from file
j = json.load(open('json_file_above.json'))
# the array with the real data we need
a = j['array']
# auxiliary dict with node ids as keys and nodes as values
d = {x['id']: x for x in a}
# here the magic begins :)
for x in a:
    # add a new key with a list to each element
    x['subcats'] = [
        # compose a dict element for each subcat
        dict(id=i, name=d[i]['name'], parent_id=d[i]['parent_id'])
        # take the path_info id list and skip its first element
        # (the root, id 8); the node itself is the last element and stays
        for i in x['path_info'][1:]
    ]
    del x['path_info']
To check that you are getting what you need:
>>> print(json.dumps(a, indent=True))
[
{
"name": "cat14",
"subcats": [
{
"name": "cat10",
"id": 10,
"parent_id": 8
},
{
"name": "cat12",
"id": 12,
"parent_id": 10
},
{
"name": "cat13",
"id": 13,
"parent_id": 12
},
{
"name": "cat14",
"id": 14,
"parent_id": 13
}
],
"id": 14,
"parent_id": 13
},
{
"name": "cat15",
"subcats": [
{
"name": "cat10",
"id": 10,
"parent_id": 8
},
{
"name": "cat12",
"id": 12,
"parent_id": 10
},
{
"name": "cat13",
"id": 13,
"parent_id": 12
},
{
"name": "cat15",
"id": 15,
"parent_id": 13
}
],
"id": 15,
"parent_id": 13
},
{
"name": "cat13",
"subcats": [
{
"name": "cat10",
"id": 10,
"parent_id": 8
},
{
"name": "cat12",
"id": 12,
"parent_id": 10
},
{
"name": "cat13",
"id": 13,
"parent_id": 12
}
],
"id": 13,
"parent_id": 12
},
{
"name": "cat12",
"subcats": [
{
"name": "cat10",
"id": 10,
"parent_id": 8
},
{
"name": "cat12",
"id": 12,
"parent_id": 10
}
],
"id": 12,
"parent_id": 10
},
{
"name": "cat10",
"subcats": [
{
"name": "cat10",
"id": 10,
"parent_id": 8
}
],
"id": 10,
"parent_id": 8
},
{
"name": "cat8",
"subcats": [],
"id": 8,
"parent_id": null
}
]
>>>

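A simple and straightforward approach:
import json
categories = []  # input: the list of dicts returned by the CTE query
def transform(category, child_node_id, parent_id):
    # Assumes names follow the 'cat<id>' pattern of the sample data
    category['subcats'].append({
        'id': child_node_id,
        'name': 'cat%s' % child_node_id,
        'parent_id': parent_id
    })
for category in categories:
    category['subcats'] = []
    path = category.pop('path_info', [])
    # Within path_info each node's parent is the node before it; the root has none
    for index, child_node_id in enumerate(path):
        parent_id = path[index - 1] if index > 0 else None
        transform(category, child_node_id, parent_id)
print(json.dumps(categories, indent=4))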

Count usage of foreign key in two tables in Flask-SqlAlchemy

I have three tables. One defines groups, with a groupid column, and the other two define users and events, which can each belong to a group.
class InterestGroup(db.Model):
    __tablename__ = "interest_groups"
    groupid = db.Column(db.Integer, primary_key=True)
    groupcode = db.Column(db.String(10))
    groupname = db.Column(db.String(200), unique=True)
class InterestGroupEvent(db.Model):
    __tablename__ = "interest_group_events"
    eventid = db.Column(db.Integer, db.ForeignKey("social_events.eventid"), primary_key=True)
    groupid = db.Column(db.Integer, db.ForeignKey("interest_groups.groupid"), primary_key=True)
class InterestGroupUser(db.Model):
    __tablename__ = "interest_group_users"
    userid = db.Column(db.Integer, db.ForeignKey("users.userid"), primary_key=True)
    groupid = db.Column(db.Integer, db.ForeignKey("interest_groups.groupid"), primary_key=True)
I want to list all the groups, with a count of how many users and how many events belong to each one, even if there aren't any.
I can get all the data I need, as follows:
IGs = InterestGroup.query.all()
IGEs = (
db.session.query(
InterestGroupEvent.groupid,
func.count(InterestGroupEvent.groupid)
)
.group_by(InterestGroupEvent.groupid)
.all()
)
IGUs = (
db.session.query(
InterestGroupUser.groupid,
func.count(InterestGroupUser.groupid)
)
.group_by(InterestGroupUser.groupid)
.all()
)
return json.dumps({
"groups": [{"groupid": IG.groupid} for IG in IGs],
"events": [{"groupid": IGE.groupid, "count": IGE[1]} for IGE in IGEs],
"users": [{"groupid": IGU.groupid, "count": IGU[1]} for IGU in IGUs]
})
which returns the following:
{
"events": [
{
"count": 2,
"groupid": 1
},
{
"count": 1,
"groupid": 2
}
],
"groups": [
{
"groupid": 1
},
{
"groupid": 2
},
{
"groupid": 3
}
],
"users": [
{
"count": 2,
"groupid": 1
},
{
"count": 1,
"groupid": 3
}
]
}
but what I want is the following:
[
{
"groupid": 1,
"eventcount": 2,
"usercount": 2
},
{
"groupid": 2,
"eventcount": 1,
"usercount": 0
},
{
"groupid": 3,
"eventcount": 0,
"usercount": 1
}
]
Obviously I could merge it manually, but I'm sure there's a way to get it direct from the database in a single query. I've tried the following:
IGs = (
db.session.query(InterestGroup.groupid,
func.count(InterestGroupEvent.groupid),
func.count(InterestGroupUser.groupid)
)
.group_by(InterestGroup.groupid)
.all()
)
return json.dumps([{"groupid": IG[0], "eventcount": IG[1], "usercount": IG[2]} for IG in IGs])
but that returns this:
[
{
"usercount": 9,
"groupid": 1,
"eventcount": 9
},
{
"usercount": 9,
"groupid": 2,
"eventcount": 9
},
{
"usercount": 9,
"groupid": 3,
"eventcount": 9
}
]
Hmm, this is close:
IGs = (
db.session.query(InterestGroup.groupid,
func.count(InterestGroupEvent.groupid),
func.count(InterestGroupUser.groupid)
)
.outerjoin(InterestGroupEvent)
.outerjoin(InterestGroupUser)
.group_by(InterestGroup.groupid)
.all()
)
return json.dumps([
{"groupid": IG[0], "eventcount": IG[1], "usercount": IG[2]} for IG in IGs
])
It returns:
[
{
"usercount": 4,
"groupid": 1,
"eventcount": 4
},
{
"usercount": 0,
"groupid": 2,
"eventcount": 1
},
{
"usercount": 1,
"groupid": 3,
"eventcount": 0
}
]
but the counts for group 1 are wrong. How should I go about it, please?
EDIT
In the absence of a better solution, this seems to work and will do for now:
IGEs = (
InterestGroup.query
.outerjoin(InterestGroupEvent)
.add_columns(func.count(InterestGroupEvent.groupid))
.group_by(InterestGroup.groupid)
.order_by(InterestGroup.groupid)
.all()
)
IGUs = (
InterestGroup.query
.outerjoin(InterestGroupUser)
.add_columns(func.count(InterestGroupUser.groupid))
.group_by(InterestGroup.groupid)
.order_by(InterestGroup.groupid)
.all()
)
results = []
for i in range(len(IGEs)):
    results.append({"groupid": IGEs[i][0].groupid, "eventcount": IGEs[i][1], "usercount": IGUs[i][1]})
return json.dumps(results)
But I'd still like to know how to do it with a single query.
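The wrong counts come from joining two one-to-many tables in the same query: each group row is repeated once per event and once per user, so group 1 (2 events, 2 users) produces 2 × 2 = 4 joined rows, and count() counts all of them. The first attempt, with no join at all, most likely produced a cross product of all three tables, hence the 9s. One common single-query fix is to count distinct ids. The sketch below demonstrates this in plain SQL with the standard library's sqlite3; the event and user ids are made up to reproduce the counts above. In SQLAlchemy the corresponding expression would be func.count(distinct(InterestGroupEvent.eventid)), with distinct imported from sqlalchemy, but that translation is untested here against your models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: group 1 has 2 events and 2 users,
# group 2 has 1 event, group 3 has 1 user
conn.executescript("""
CREATE TABLE interest_groups (groupid INTEGER PRIMARY KEY);
CREATE TABLE interest_group_events (eventid INTEGER, groupid INTEGER,
                                    PRIMARY KEY (eventid, groupid));
CREATE TABLE interest_group_users (userid INTEGER, groupid INTEGER,
                                   PRIMARY KEY (userid, groupid));
INSERT INTO interest_groups VALUES (1), (2), (3);
INSERT INTO interest_group_events VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO interest_group_users VALUES (100, 1), (101, 1), (102, 3);
""")

JOINS = """
FROM interest_groups g
LEFT OUTER JOIN interest_group_events e ON e.groupid = g.groupid
LEFT OUTER JOIN interest_group_users u ON u.groupid = g.groupid
GROUP BY g.groupid
ORDER BY g.groupid
"""

# Plain COUNT() counts every joined row, so the two joins multiply
naive = conn.execute(
    "SELECT g.groupid, COUNT(e.eventid), COUNT(u.userid)" + JOINS
).fetchall()

# COUNT(DISTINCT ...) counts each event/user once; NULLs from the outer joins are ignored
fixed = conn.execute(
    "SELECT g.groupid, COUNT(DISTINCT e.eventid), COUNT(DISTINCT u.userid)" + JOINS
).fetchall()

result = [{"groupid": g, "eventcount": ev, "usercount": us} for g, ev, us in fixed]
print(naive)  # [(1, 4, 4), (2, 1, 0), (3, 0, 1)]
print(result)
```

COUNT(DISTINCT ...) still builds the multiplied rows before collapsing them. For groups with many events and users, an alternative is to aggregate each child table in its own subquery and outer join those two results to interest_groups.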
