Refactorable database queries - python

Say I have category="foo" and a NoSQL query={"category"=category}. Whenever I refactor my variable name of category, I need to manually change it inside the query if I want to adopt it.
In Python 3.8+ I'm able to get the variable name as a string via the variable itself.
Now I could use query={f"{category=}".split("=")[0]=category}. Now refactoring changes the query too. This applies to any database queries or statements (SQL etc.).
Would this be bad practice? Not just concerning Python but any language where this is possible.

Would this be bad practice?
Yes, the names of local variables do not need to correlate with the fields in data stores.
You should be able to retrieve a record and filter on its fields with any python variable, no matter its name or if its nested in a larger data structure.
In pseudocode:
connection = datastore.connect(...)
# passing a string directly
connection.fetch({"category": "fruit"})
# passing a string variable
category_to_fetch = "vegetable"
connection.fetch({"category": category_to_fetch})
# something more exotic like a previous list of records
r = [("fish",)]
connection.fetch({"category": r[0][0]})
# or even a premade filter dictionary
filter = {"category": "meat"}
connection.fetch(filter)

Related

Create SQL database from dict with different features in python

I have the following dict:
base = {}
base['id1'] = {'apple':2, 'banana':4,'coconut':1}
base['id2'] = {'apple':4, 'pear':8}
base['id3'] = {'banana':1, 'tomato':2}
....
base['idN'] = {'pineapple':1}
I want to create a SQL database to store it. I normally use sqlite but here the number of variables (features in the dict) is not the same for all ids and I do not know all of them thus I cannot use the standard procedure.
Does someone know an easy way to do it ?
ids will get duplicated if you use the sql i would suggest use postgres as it has a jsonfield ypu can put your data there corresponding to each key. Assuming you are not constrained to use SQL.

Should I use python classes for a MySQL database insert program?

I have created a database to store NGS sequencing results. It consists of 17 tables to store all of the information. The results are stored in spreadsheets which I parse data from and store in variables using python (2.7), and then use the python package mysqldb to insert data into the database. I mainly use functions to obtain the information i need in variables, then write a loop in which I call this function followed by a 'try:' statement to insert. Here is a simple example:
def sample_processer(file):
my_file = open(file, 'r+')
samples = []
for line in my_file:
...get info...
samples.append(line[0])
return(samples)
samples = sample_processor('path/to/file')
for sample in samples:
try:
sql = "samsql = "INSERT IGNORE INTO sample(sample_id, diagnosis, screening) VALUES ("
samsql = samsql + "'"+sample+"'," +sam_screen_dict.get(sample)+"')"
except e:
db.rollback()
print("Something went wrong inserting data into the sample table: %s" %(e))
*sam_screen_dict is a dictionary i made from another function.
This is a simple table that I upload into but many of them call of different dictionaries to make sure the correct results are uploaded. However I was wondering whether there would be a more robust way in which to do this using a class.
For example, my sample_id has an associated screening attribute in the sample table, so this is easy to do with one dictionary. I have more complex junction tables, such as the table in which the sample_id, experiment_id and found mutation are stored, alongside other data, would it be a good idea to create a class for this table, calling on a simple 'sample' class to inherit from? That way I would always know that the results being inserted will be for the correct sample/experiment etc.
Also, using classes could I write rules for each attribute so that if the source spreadsheet is for some reason incorrect, it will not insert into the database?
I.e: sample_id is in the format A123/16. Therefore using a class it will check that the first character is 'A' and that sample_id[-3] should always == '/'. I know I could write these into functions, but I feel like it would take up so much space and time writing so many 'if' statements, that if it is stored once in a class then this would be alot better.
Has anybody done anything similar using classes to pass through their variables to test that they are correct before it gets to the insert stage and an error is created?
I am new to python classes and understand the basics, still trying to get to grips with them so a point in the right direction would be great - as would any help on how to go about actually writing the code for a python class that would be used to make a more robust database insertion program.
17tables it means you may use about 17 classes.
Use a simple script. webpy.db
https://github.com/webpy/webpy/blob/master/web/db.py just modify few code.
Then you can use webpy api: http://webpy.org/docs/0.3/api#web.db to finish your job.
Hope it's useful for you

SQLite: Why can't parameters be used to set an identifier?

I'm refactoring a little side project to use SQLite instead of a python data structure so that I can learn SQLite. The data structure I've been using is a list of dicts, where each dict's keys represent a menu item's properties. Ultimately, these keys should become columns in an SQLite table.
I first thought that I could create the table programmatically by creating a single-column table, iterating over the list of dictionary keys, and executing an ALTER TABLE, ADD COLUMN command like so:
# Various import statements and initializations
conn = sqlite3.connect(database_filename)
cursor = conn.cursor()
cursor.execute("CREATE TABLE menu_items (item_id text)")
# Here's the problem:
cursor.executemany("ALTER TABLE menu_items ADD COLUMN ? ?", [(key, type(value)) for key, value in menu_data[0].iteritems()])
After some more reading, I realized parameters cannot be used for identifiers, only for literal values. The PyMOTW on sqlite3 says
Query parameters can be used with select, insert, and update statements. They can appear in any part of the query where a literal value is legal.
Kreibich says on p. 135 of Using SQLite (ISBN 9780596521189):
Note, however, that parameters can only be used to replace literal
values, such as quoted strings or numeric values. Parameters
cannot be used in place of identifiers, such as table names or
column names. The following bit of SQL is invalid:
SELECT * FROM ?; -- INCORRECT: Cannot use a parameter as an identifier
I accept that positional or named parameters cannot be used in this way. Why can't they? Is there some general principle I'm missing?
Similar SO question:
Python sqlite3 string formatting
Identifiers are syntactically significant while variable values are not.
Identifiers need to be known at SQL compilation phase so that the compiled internal bytecode representation knows about the relevant tables, columns, indices and so on. Just changing one identifier in the SQL could result in a syntax error, or at least a completely different kind of bytecode program.
Literal values can be bound at runtime. Variables behave essentially the same in a compiled SQL program regardless of the values bound in them.
I don't know why, but every database I ever used has the same limitation.
I think it would be analogous to use a variable to hold the name of another variable. Most languages do not allow that, PHP being the only exception I know of.
Regardless of the technical reasons, dynamically choosing table/column names in SQL queries is a design smell, which is why most databases do not support it.
Think about it; if you were coding a menu in Python, would you dynamically create a class for each combination of menu items? Of course not; you'd have one Menu class that contains a list of menu items. It's similar in SQL too.
Most of the time, when people ask about dynamically choosing table names, it's because they've split up their data into different tables, like collection1, collection2, ... and use the name to select which collection to query from. This isn't a very good design; it requires the service to repeat the schema for each table, including indexes, constraints, permissions, etc, and also makes altering the schema harder (Need to add a field? Now you need to do it across hundreds of tables instead of one).
The correct way of designing the database would be to have a single collection table and add a collection_id column; instead of querying collection4, you'd add a WHERE collection_id = 4 constraint to your SELECT queries. Note that the 4 is now a value, and can be replaced with a query parameter.
For your case, I would use this schema:
CREATE TABLE menu_items (
item_id TEXT,
key TEXT,
value NONE,
PRIMARY KEY(item_id, key)
);
Use executemany to insert a row for each entry in the dictionary. When you need to load the dictionary, run a SELECT filtering on item_id and recreate the dictionary one row/entry at a time.
(Of course, as with everything in Software Engineering, there are exception. Tools that operate on schemas generically, such as ORMs, will need to specify table/column names dynamically.)

GAE python ndb - How to get_by_id with projection?

I'd like to do this.
Content.get_by_id(content_id, projection=['title'])
However, I got an error.
TypeError: Unknown configuration option ('projection')
I should do like this. How?
Content.query(key=Key('Content', content_id)).get(projection=['title'])
Why bother projection for getting an entity? Because Content.body could be large so that I want to reduce db read time and instance hours.
If you are using ndb, the below query should work
Content.query(key=Key('Content', content_id)).get(projection=[Content.title])
Note: It gets this data from the query index. So, make sure that index is enabled for the column. Reference https://developers.google.com/appengine/docs/python/ndb/queries#projection
I figured out that following code.
Content.query(Content.key == ndb.Key('Content', content_id)).get(projection=['etag'])
I found a hint from https://developers.google.com/appengine/docs/python/ndb/properties
Don't name a property "key." This name is reserved for a special
property used to store the Model key. Though it may work locally, a
property named "key" will prevent deployment to App Engine.
There is a simpler method than the currently posted answers.
As previous answers have mentioned, projections are only for ndb.Queries.
Previous answers suggest to use the entity returned by get_by_id to perform a projection query in the form of:
<Model>.query(<Model>.key == ndb.Key('<Model>', model_id).get(projection=['property_1', 'property_2', ...])
However, you can just manipulate the model's _properties directly. (See: https://cloud.google.com/appengine/docs/standard/python/ndb/modelclass#intro_properties)
For example:
desired_properties = ['title', 'tags']
content = Content.get_by_id(content_id)
content._properties = {k: v for k, v in content._properties.iteritems()
if k in desired_properties}
print content
This would update the entity properties and only return those properties whose keys are in the desired_properties list.
Not sure if this is the intended functionality behind _properties but it works, and it also prevents the need of generating/maintaining additional indexes for the projection queries.
The only down-side is that this retrieves the entire entity in-memory first. If the entity has arbitrarily large metadata properties that will affect performance, it would be a better idea to use the projection query instead.
Projection is only for query, not get by id. You can put the content.body in a different db model and store only the ndb.Key of it in the Content.

How to check a document includes a list of fields

I have a mongodb collection that has documents that include both required and non-required data. I know how to create a query using the $exists operator to check if a field exists, however I do not want to define required field within the query, as the list is both long and subject to change (and is define elsewhere).
The following is great for checking a known field:
db.collectionofstuff.find({fieldIneed:{$exists:False}})
However I want something that function like this:
Using this Config file:
datadescriptorjson = {"thing1": {"count": 2,"range": 3},"thing2":{"pace": 12.5, "consistency": "angry"}}
create a query find/aggregation that looks something like this:
db.collectionofstuff.find({<list of fields from datadescriptorjson>:{$exists:Falze}})
I am not aware of anyway to do it directly with either the aggregation framework or using a simple find.
There is no such function, you will have to test each field manually. You can of course loop over your config data and recreate a query out of this. However, this should be something you do in your application.

Categories

Resources