Execute user-defined query on list of dictionaries - python

I have a set of data that a user needs to query using their own query string. The current solution creates a temporary in-memory sqlite database that the query is run against.
The dataset is a list of "flat" dictionaries, i.e. there is no nested data. The query string does not need to be SQL, but it should be simple to define using an existing query framework.
It needs to support ordering (ascending, descending, custom) and filtering.
The purpose of this question is to get a range of different solutions that might work for this use case.
import sqlite3

items = [
    {'id': 1},
    {'id': 2, 'description': 'This is a description'},
    {'id': 3, 'comment': 'This is a comment'},
    {'id': 4, 'height': 1.78}
]

# Assemble temporary sqlite database
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

knownTypes = {"id": "real", "height": "real", "comment": "text"}
allKeys = list(set().union(*(d.keys() for d in items)))
allTypes = [knownTypes.get(k, "text") for k in allKeys]
createTable_query = "CREATE TABLE data ({});".format(
    ", ".join("{} {}".format(k, t) for k, t in zip(allKeys, allTypes)))
cur.execute(createTable_query)
conn.commit()

qs = ["?"] * len(allKeys)
insertRow_query = "INSERT INTO data VALUES ({});".format(", ".join(qs))
for p in items:
    vals = [p.get(k, None) for k in allKeys]
    cur.execute(insertRow_query, vals)
conn.commit()

# modify user query here
theUserQuery = "SELECT * FROM data"

# Get data from query
data = [row for row in cur.execute(theUserQuery)]

YAQL is what I'm looking for.
It doesn't do SQL, but it does execute a query string - which is a simple way to do complex user-defined sorting and filtering.

There's a library called litebox that does what you want. It is backed by SQLite.
from litebox import LiteBox

items = [
    {'id': 1},
    {'id': 2, 'description': 'This is a description'},
    {'id': 3, 'comment': 'This is a comment'},
    {'id': 4, 'height': 1.78}
]
types = {"id": int, "height": float, "comment": str}
lb = LiteBox(items, types)
lb.find("height > 1.5")
Result: [{'id': 4, 'height': 1.78}]


Multi Level JSON Data into SQL Using Python

I've been working on taking JSON data and dumping it into a SQL database. I've run across some data that is "multi level" and I'm stuck on finding the best approach to handle the data and create the correct table structure in SQL.
Here's the portion of the JSON data that has a multi level structure:
"LimitedTaxonomy": {
"Children": [
{
"Children": [
{
"Children": [],
"NewPartCount": 0,
"Parameter": "Categories",
"ParameterId": -8,
"PartCount": 1,
"Value": "Logic - Flip Flops",
"ValueId": "706"
}
],
"NewPartCount": 0,
"Parameter": "Categories",
"ParameterId": -8,
"PartCount": 1,
"Value": "Integrated Circuits (ICs)",
"ValueId": "32"
}
],
"NewPartCount": 0,
"Parameter": "Categories",
"ParameterId": -8,
"PartCount": 1,
"Value": "Out of Bounds",
"ValueId": "0"
}
I have a function that I call when I'm parsing the JSON data that takes the structure and puts data in the SQL tables. Using the JSON data above I'd be passing:
thetable = 'LimitedTaxonomy'
thevalue = {'Children': [{'Children': [{'Children': [], 'NewPartCount': 0, 'Parameter': 'Categories', 'ParameterId': -8, 'PartCount': 1, 'Value': 'Logic - Flip Flops', 'ValueId': '706'}], 'NewPartCount': 0, 'Parameter': 'Categories', 'ParameterId': -8, 'PartCount': 1, 'Value': 'Integrated Circuits (ICs)', 'ValueId': '32'}], 'NewPartCount': 0, 'Parameter': 'Categories', 'ParameterId': -8, 'PartCount': 1, 'Value': 'Out of Bounds', 'ValueId': '0'}
def create_sql(thetable, thevalue):
    if len(thevalue) > 0:
        #print(thevalue[0], len(thevalue))
        if type(thevalue) is list:
            x = tuple(thevalue[0])
        elif type(thevalue) is dict:
            x = tuple(thevalue)
        a = str(x).replace("'", "").replace("(", "")
        query = "INSERT INTO " + thetable + " (PartDetailsId, " + a + " VALUES(" + str(data['PartDetails']['PartId'])
        b = ""
        for i in range(len(x)):
            b += ", ?"  # I've also seen %s used as a placeholder
        b += ")"
        query += b
        print(query)
        print(thevalue)
        print()
        #TODO Need to check before entering data, list has many records, dict has 1
        if type(thevalue) is list:
            cursor.executemany(query, [tuple(d.values()) for d in thevalue])
        elif type(thevalue) is dict:
            cursor.execute(query, tuple(thevalue.values()))
        cursor.commit()
The above function seems to work well with "single level" JSON but this data has basically a table/array (Children) as a column.
In SQL I have the 2 tables as defined like this, which might not be the way to handle this:
SELECT TOP (1000) [LimitedTaxonomyId]
,[PartDetailsId]
,[NewPartCount]
,[Parameter]
,[ParameterId]
,[PartCount]
,[Value]
,[ValueId]
FROM [Components].[dbo].[LimitedTaxonomy]
SELECT TOP (1000) [ChildrenId]
,[LimitedTaxonomyId]
,[NewPartCount]
,[Parameter]
,[ParameterId]
,[PartCount]
,[Value]
,[ValueId]
FROM [Components].[dbo].[Children]
I think I need to check thevalue and, if it contains a list, first pull that list out and feed the remaining data into my create_sql function. After that I would put the extracted list back through create_sql, but first query the SQL database to grab the [LimitedTaxonomyId] value that was just entered.
This sounds like a big mess, and maybe it's the only way, but I'd like some second opinions on how I'm going about this and whether there's a better way.
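For a concrete second opinion: one common alternative is a single self-referencing table plus a recursive insert, so arbitrarily deep Children nesting needs no second table and no round trip to fetch the parent id. A sketch using sqlite3 purely for illustration (the question targets SQL Server via pyodbc; the ParentId column is my assumption, not part of the original schema):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
# One self-referencing table: ParentId points at the enclosing taxonomy row.
cur.execute("""
    CREATE TABLE LimitedTaxonomy (
        LimitedTaxonomyId INTEGER PRIMARY KEY,
        ParentId INTEGER REFERENCES LimitedTaxonomy(LimitedTaxonomyId),
        Parameter TEXT, ParameterId INTEGER,
        PartCount INTEGER, NewPartCount INTEGER,
        Value TEXT, ValueId TEXT
    )
""")

def insert_taxonomy(node, parent_id=None):
    """Insert one node's scalar columns, then recurse into Children
    carrying this row's generated id as the parent id."""
    cur.execute(
        "INSERT INTO LimitedTaxonomy (ParentId, Parameter, ParameterId,"
        " PartCount, NewPartCount, Value, ValueId) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (parent_id, node['Parameter'], node['ParameterId'], node['PartCount'],
         node['NewPartCount'], node['Value'], node['ValueId']),
    )
    row_id = cur.lastrowid  # id of the row just inserted
    for child in node.get('Children', []):
        insert_taxonomy(child, row_id)
    return row_id

thevalue = {'Children': [{'Children': [{'Children': [], 'NewPartCount': 0,
    'Parameter': 'Categories', 'ParameterId': -8, 'PartCount': 1,
    'Value': 'Logic - Flip Flops', 'ValueId': '706'}], 'NewPartCount': 0,
    'Parameter': 'Categories', 'ParameterId': -8, 'PartCount': 1,
    'Value': 'Integrated Circuits (ICs)', 'ValueId': '32'}], 'NewPartCount': 0,
    'Parameter': 'Categories', 'ParameterId': -8, 'PartCount': 1,
    'Value': 'Out of Bounds', 'ValueId': '0'}

root_id = insert_taxonomy(thevalue)
rows = cur.execute("SELECT Value, ParentId FROM LimitedTaxonomy"
                   " ORDER BY LimitedTaxonomyId").fetchall()
```

On SQL Server the equivalent of cursor.lastrowid would be SCOPE_IDENTITY() (or an OUTPUT clause), but the recursion pattern is the same.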

How to concatenate structs in a loop in python

I am trying to search for all users in an sql database whose first names are "blah" and return that data to my html through an ajax call. I have this functioning with a single user like this:
user = db.execute(
    'SELECT * FROM user WHERE genres LIKE ?', (str,)
).fetchone()
user_details = {
    'first': user['first'],
    'last': user['last'],
    'email': user['email']
}
y = json.dumps(user_details)
return jsonify(y)
Now for multiple users I want the struct to look something like this:
users{
    user1_details = {
        'first': user['first'],
        'last': user['last'],
        'email': user['email']
    }
    user2_details = {
        'first': user2['first'],
        'last': user2['last'],
        'email': user2['email']
    }
    user3_details = {
        'first': user3['first'],
        'last': user3['last'],
        'email': user3['email']
    }
}
generating each user_details in a loop. I know I can use fetchall() to find all the users, but how do I concatenate the details?
Fetch all the rows after the query, then structure the results as you'd like.
Example:
db = mysql.connection.cursor()
# query
db.execute('SELECT * FROM user')
# returned columns
header = [x[0] for x in db.description]
# returned rows
results = db.fetchall()
# data to be returned
users_object = {}
# structure results
for result in results:
    row = dict(zip(header, result))
    users_object[row["user_id"]] = row
return jsonify(users_object)
As you can see under "# structure results", you just loop through the rows, build a dict for each one by zipping the column names with the row values, and insert it into users_object keyed by, for example, "user_id". (Each raw row is a tuple, so it has to be converted to a dict before you can index it by column name.)
If you want the results in an array instead, make users_object a list (e.g. users_array) and append each dict to it within the loop.
The keys in the desired users dictionary do not seem particularly useful so you could instead build a list of user dicts. It's easy to go directly from fetchall() to such a list:
result = db.execute('SELECT first, last, email FROM user WHERE genres LIKE ?', (str,))
users = [{'first': first, 'last': last, 'email': email} for first, last, email in result.fetchall()]
return jsonify(users)
To return a dict containing the user list:
return jsonify({'users': users})
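If the database behind db.execute is SQLite, it's also worth noting that sqlite3's row_factory gives dict-like rows directly, which removes the manual column bookkeeping entirely. A self-contained sketch (the table and sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows now support access by column name
cur = conn.cursor()
cur.execute("CREATE TABLE user (first TEXT, last TEXT, email TEXT)")
cur.executemany("INSERT INTO user VALUES (?, ?, ?)", [
    ('blah', 'smith', 'blah.smith@example.com'),
    ('blah', 'jones', 'blah.jones@example.com'),
])

rows = cur.execute(
    "SELECT first, last, email FROM user WHERE first = ?", ('blah',)
).fetchall()
# Each sqlite3.Row converts straight into a plain dict for jsonify
users = [dict(row) for row in rows]
```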

Printing the response of a RethinkDB query in a reasonable way

I am participating in the Yelp Dataset Challenge and I'm using RethinkDB to store the JSON documents for each of the different datasets.
I have the following script:
import rethinkdb as r
import json, os
RDB_HOST = os.environ.get('RDB_HOST') or 'localhost'
RDB_PORT = os.environ.get('RDB_PORT') or 28015
DB = 'test'
connection = r.connect(host=RDB_HOST, port=RDB_PORT, db=DB)
query = r.table('yelp_user').filter({"name":"Arthur"}).run(connection)
print(query)
But when I run it at the terminal in a virtualenv I get this as an example response:
<rethinkdb.net.DefaultCursor object at 0x102c22250> (streaming):
[{'yelping_since': '2014-03', 'votes': {'cool': 1, 'useful': 2, 'funny': 1}, 'review_count': 5, 'id': '08eb0b0d-2633-4ec4-93fe-817a496d4b52', 'user_id': 'ZuDUSyT4bE6sx-1MzYd2Kg', 'compliments': {}, 'friends': [], 'average_stars': 5, 'type': 'user', 'elite': [], 'name': 'Arthur', 'fans': 0}, ...]
I know I can use pprint to pretty print outputs but a bigger issue that I don't understand how to resolve is just printing them in an intelligent manner, like not just showing "..." as the end of the output.
Any suggestions?
run returns an iterable cursor. Iterate over it to get all the rows:
query = r.table('yelp_user').filter({"name": "Arthur"})
for row in query.run(connection):
    print(row)
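If the goal is to see the whole result set without the cursor's "..." truncation, materialize the cursor into a list and pretty-print that. A sketch with sample rows standing in for the RethinkDB result (in the real script you would build rows with list(query.run(connection))):

```python
from pprint import pformat

# Sample rows in place of: rows = list(query.run(connection))
rows = [
    {'name': 'Arthur', 'review_count': 5, 'average_stars': 5},
    {'name': 'Arthur', 'review_count': 12, 'average_stars': 3.5},
]
# pformat renders every row in full, wrapped at the given width
text = pformat(rows, width=60)
print(text)
```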

sqlalchemy: previous row and next row by id

I have a table Images with id and name. I want to query its previous image and next image in the database using sqlalchemy. How to do it in only one query?
sel = select([images.c.id, images.c.name]).where(images.c.id == id)
res = engine.connect().execute(sel)
#How to obtain its previous and next row?
...
Suppose it is possible that some rows have been deleted, i.e., the ids may not be continuous. For example,
Table: Images
------------
id | name
------------
1 | 'a.jpg'
2 | 'b.jpg'
4 | 'd.jpg'
------------
prev_image = your_session.query(Images).order_by(Images.id.desc()).filter(Images.id < id).first()
next_image = your_session.query(Images).order_by(Images.id.asc()).filter(Images.id > id).first()
# previous
prv = select([images.c.id, images.c.name]).where(images.c.id < id).order_by(images.c.id.desc()).limit(1)
res = engine.connect().execute(prv)
for row in res:
    print(row.id, row.name)
# next
nxt = select([images.c.id, images.c.name]).where(images.c.id > id).order_by(images.c.id).limit(1)
res = engine.connect().execute(nxt)
for row in res:
    print(row.id, row.name)
This can be accomplished in a "single" query by taking the UNION of two queries, one to select the previous and target records and one to select the next record (unless the backend is SQLite, which does not permit an ORDER BY before the final statement in a UNION):
import sqlalchemy as sa
...
with engine.connect() as conn:
    target = 3
    query1 = sa.select(tbl).where(tbl.c.id <= target).order_by(tbl.c.id.desc()).limit(2)
    query2 = sa.select(tbl).where(tbl.c.id > target).order_by(tbl.c.id.asc()).limit(1)
    res = conn.execute(query1.union(query2))
    for row in res:
        print(row)
producing
(2, 'b.jpg')
(3, 'c.jpg')
(4, 'd.jpg')
Note that we could make the second query the same as the first, apart from reversing the inequality
query2 = sa.select(tbl).where(tbl.c.id >= target).order_by(tbl.c.id.asc()).limit(2)
and we would get the same result as the union would remove the duplicate target row.
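For reference, the SQL this union compiles to looks roughly like the following; on SQLite the per-branch ORDER BY/LIMIT has to live in subselects, which is the caveat about SQLite unions. A runnable sketch against an in-memory database (sample data mirrors the question's Images table, with a row for id 3 added):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE images (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO images (id, name) VALUES
        (1, 'a.jpg'), (2, 'b.jpg'), (3, 'c.jpg'), (4, 'd.jpg');
""")

target = 3
# Branch 1: target row and the previous row; branch 2: the next row.
# Each branch is wrapped in a subselect so its ORDER BY/LIMIT is legal
# inside a UNION on SQLite.
rows = conn.execute("""
    SELECT id, name FROM
        (SELECT id, name FROM images WHERE id <= ? ORDER BY id DESC LIMIT 2)
    UNION
    SELECT id, name FROM
        (SELECT id, name FROM images WHERE id > ? ORDER BY id ASC LIMIT 1)
    ORDER BY id
""", (target, target)).fetchall()
```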
If the requirement were to find the surrounding rows for a selection of rows we could use the lag and lead window functions, if they are supported.
# Works in PostgreSQL, MariaDB and SQLite, at least.
with engine.connect() as conn:
    query = sa.select(
        tbl.c.id,
        tbl.c.name,
        sa.func.lag(tbl.c.name).over(order_by=tbl.c.id).label('prev'),
        sa.func.lead(tbl.c.name).over(order_by=tbl.c.id).label('next'),
    )
    res = conn.execute(query)
    for row in res:
        print(row._mapping)
Output:
{'id': 1, 'name': 'a.jpg', 'prev': None, 'next': 'b.jpg'}
{'id': 2, 'name': 'b.jpg', 'prev': 'a.jpg', 'next': 'c.jpg'}
{'id': 3, 'name': 'c.jpg', 'prev': 'b.jpg', 'next': 'd.jpg'}
{'id': 4, 'name': 'd.jpg', 'prev': 'c.jpg', 'next': 'e.jpg'}
{'id': 5, 'name': 'e.jpg', 'prev': 'd.jpg', 'next': 'f.jpg'}
{'id': 6, 'name': 'f.jpg', 'prev': 'e.jpg', 'next': None}
To iterate through your records, I think this is what you're looking for:
for row in res:
    print(row.id)
    print(row.name)

Unable to access dict values in django view

I want to save an array of objects passed from javascript through ajax to me database. This is my view code:
data2 = json.loads(request.raw_get_data)
for i in data2:
    print(key)
    obj = ShoppingCart(quantity=i.quantity, user_id=3, datetime=datetime.now(), product_id=i.pk)
    obj.save()
return render_to_response("HTML.html", RequestContext(request))
After the first line, i get this in my dictionary:
[{'model': 'Phase_2.product', 'fields': {'name': 'Bata', 'category': 2, 'quantity': 1, 'subcategory': 1, 'count': 2, 'price': 50}, 'imageSource': None, 'pk': 1}]
(Only one object in the array right now)
I want to be able access individual fields like quantity, id, etc in order to save the data to my database. When i debug this code, it gives a name error on 'i'. I also tried accessing the fields like this: data2[0].quantity but it gives this error: {AttributeError}dict object has no attribute quantity.
Edited code:
for i in data2:
    name = i["fields"]["name"]
    obj = ShoppingCart(quantity=i["fields"]["quantity"], user_id=3, datetime=datetime.now(), product_id=i["fields"]["pk"])
    obj.save()
It might help you to visualise the returned dict with proper formatting:
[
    {
        'model': 'Phase_2.product',
        'fields': {
            'name': 'Bata',
            'category': 2,
            'quantity': 1,
            'subcategory': 1,
            'count': 2,
            'price': 50
        },
        'imageSource': None,
        'pk': 1
    }
]
The most likely reason for your error is that you are trying to access values of the inner 'fields' dictionary as if they belong to the outer i dictionary.
i.e.
# Incorrect
i["quantity"]
# Gives KeyError
# Correct
i["fields"]["quantity"]
Edit
You have the same problem in your update:
# Incorrect
i["fields"]["pk"]
# Correct
i["pk"]
The "pk" field is in the outer dictionary, not the inner "fields" dictionary.
You may try:
i['fields']['quantity']
json.loads() returns a list of dictionaries here, and each dictionary should be accessed by key.
