More general way of generating PyODBC queries as a dict? - python

Here are my averagely general class methods for creating a dictionary from the result of database queries:
def make_schema_dict(self):
    schema = [i[2] for i in self.cursor.tables()
              if i[2].startswith('tbl_') or i[2].startswith('vw_')]
    self.schema = {table: {'scheme': [row.column_name for row
                                      in self.cursor.columns(table)]}
                   for table in schema}

def last_table_query_as_dict(self, table):
    return {'data': [{col: row.__getattribute__(col) for col in self.schema[table]['scheme']
                      if col != 'RowNum'} for row in self.cursor.fetchall()]}
Unfortunately, as you can see, there are many complications.
For example, when multiple tables are queried, some hackish lambdas are required to generate the resulting dictionary.
Can you think of some more general methods?

You should be able to use row.cursor_description to make this a lot simpler. This should get you a list of dictionaries for the results:
[{c[0]: v for (c, v) in zip(row.cursor_description, row)} for row in self.cursor.fetchall()]
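If you want this as a reusable helper rather than an inline comprehension, a minimal sketch along the same lines could be the following (the method name query_as_dicts and its placement on your class are my own assumptions, not part of pyodbc):
def query_as_dicts(self, sql, *params):
    # hypothetical helper: relies only on the DB-API cursor.description,
    # so no per-table schema bookkeeping is needed
    self.cursor.execute(sql, *params)
    columns = [column[0] for column in self.cursor.description]
    return [dict(zip(columns, row)) for row in self.cursor.fetchall()]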

A neat solution can be found in this thread: https://groups.google.com/forum/?fromgroups#!topic/pyodbc/BVIZBYGXNsk
The root of the idea is to subclass Connection to use a custom Cursor class, and have the Cursor class automatically construct dicts for you. I'd call this a fancy Pythonic solution. You could also just add an additional fetchonedict() method and extend the Cursor class rather than override it, so you retain the default behavior (a sketch of that variant follows the wrapper code below).
class ConnectionWrapper(object):
    def __init__(self, cnxn):
        self.cnxn = cnxn

    def __getattr__(self, attr):
        return getattr(self.cnxn, attr)

    def cursor(self):
        return CursorWrapper(self.cnxn.cursor())

class CursorWrapper(object):
    def __init__(self, cursor):
        self.cursor = cursor

    def __getattr__(self, attr):
        return getattr(self.cursor, attr)

    def fetchone(self):
        row = self.cursor.fetchone()
        if not row:
            return None
        return dict((t[0], value) for t, value in zip(self.cursor.description, row))
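If you prefer the "extend rather than override" variant mentioned above, a minimal sketch could look like this (the class and method names are my own suggestions):
class CursorWrapperExtended(object):
    def __init__(self, cursor):
        self.cursor = cursor

    def __getattr__(self, attr):
        # fetchone(), fetchall() etc. keep their default pyodbc behaviour
        return getattr(self.cursor, attr)

    def fetchonedict(self):
        # extra method: same as fetchone(), but returns a dict keyed by column name
        row = self.cursor.fetchone()
        if row is None:
            return None
        return {column[0]: value for column, value in zip(self.cursor.description, row)}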
Additionally, while not for PyODBC, check out this stackoverflow answer for links to DictCursor classes for MySQL and OurSQL if you need some inspiration for design.

Related

Not able handle "on_conflict_do_nothing" in Sqlalchemy in Mysql [duplicate]

Is there an elegant way to do an INSERT ... ON DUPLICATE KEY UPDATE in SQLAlchemy? I mean something with a syntax similar to inserter.insert().execute(list_of_dictionaries) ?
ON DUPLICATE KEY UPDATE post version-1.2 for MySQL
This functionality is now built into SQLAlchemy for MySQL only. somada141's answer below has the best solution:
https://stackoverflow.com/a/48373874/319066
ON DUPLICATE KEY UPDATE in the SQL statement
If you want the generated SQL to actually include ON DUPLICATE KEY UPDATE, the simplest way involves using a @compiles decorator.
The code (linked from a good thread on the subject on Reddit) for an example can be found on GitHub:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if 'append_string' in insert.kwargs:
        return s + " " + insert.kwargs['append_string']
    return s

my_connection.execute(my_table.insert(append_string='ON DUPLICATE KEY UPDATE foo=foo'), my_values)
But note that in this approach, you have to manually create the append_string. You could probably change the append_string function so that it automatically changes the insert string into an insert with 'ON DUPLICATE KEY UPDATE' string, but I'm not going to do that here due to laziness.
ON DUPLICATE KEY UPDATE functionality within the ORM
SQLAlchemy does not provide an interface to ON DUPLICATE KEY UPDATE or MERGE or any other similar functionality in its ORM layer. Nevertheless, it has the session.merge() function that can replicate the functionality only if the key in question is a primary key.
session.merge(ModelObject) first checks if a row with the same primary key value exists by sending a SELECT query (or by looking it up locally). If it does, it sets a flag somewhere indicating that ModelObject is in the database already, and that SQLAlchemy should use an UPDATE query. Note that merge is quite a bit more complicated than this, but it replicates the functionality well with primary keys.
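As a minimal illustration of that behaviour (the model, column, and session names here are placeholders, not from the question):
# Hedged sketch: merge() looks the object up by primary key (emitting a SELECT
# if needed) and then either INSERTs a new row or UPDATEs the existing one.
obj = ModelObject(id=123, name='updated value')
merged = session.merge(obj)   # returns the instance that is actually in the session
session.commit()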
But what if you want ON DUPLICATE KEY UPDATE functionality with a non-primary key (for example, another unique key)? Unfortunately, SQLAlchemy doesn't have any such function. Instead, you have to create something that resembles Django's get_or_create(). Another StackOverflow answer covers it, and I'll just paste a modified, working version of it here for convenience.
from sqlalchemy.sql.expression import ClauseElement

def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        # filter out SQL expressions so only plain values are used to build the model
        params = {k: v for k, v in kwargs.items() if not isinstance(v, ClauseElement)}
        if defaults:
            params.update(defaults)
        instance = model(**params)
        return instance
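Usage would then look roughly like this (the User model and its columns are made up for illustration):
# Hypothetical usage of the helper above; User, email and name are placeholders.
user = get_or_create(session, User, defaults={'name': 'Alice'}, email='alice@example.com')
session.add(user)
session.commit()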
I should mention that since the v1.2 release, SQLAlchemy 'core' has a built-in solution to the above, which can be seen here (snippet copied below):
from sqlalchemy.dialects.mysql import insert

insert_stmt = insert(my_table).values(
    id='some_existing_id',
    data='inserted value')

on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
    data=insert_stmt.inserted.data,
    status='U'
)
conn.execute(on_duplicate_key_stmt)
Based on phsource's answer, and for the specific use-case of using MySQL and completely overriding the data for the same key without performing a DELETE statement, one can use the following @compiles-decorated insert expression:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if insert.kwargs.get('on_duplicate_key_update'):
        fields = s[s.find("(") + 1:s.find(")")].replace(" ", "").split(",")
        generated_directive = ["{0}=VALUES({0})".format(field) for field in fields]
        return s + " ON DUPLICATE KEY UPDATE " + ",".join(generated_directive)
    return s
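Usage then follows the same calling pattern as phsource's append_string example above, with the usual caveat that passing arbitrary keyword arguments to insert() like this depends on the SQLAlchemy version in use:
# Hypothetical usage of the @compiles hook above; my_table and my_values are
# placeholders, mirroring the earlier append_string example.
my_connection.execute(
    my_table.insert(on_duplicate_key_update=True),
    my_values
)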
It depends on what you want. If you want to replace existing rows, pass OR REPLACE in prefixes:
def bulk_insert(self, objects, table):
    # table: your table class; objects is a list of dictionaries [{col1: val1, col2: val2}]
    for counter, row in enumerate(objects):
        inserter = table.__table__.insert(prefixes=['OR IGNORE'], values=row)
        try:
            self.db.execute(inserter)
        except Exception as E:
            print(E)
        if counter % 100 == 0:
            self.db.commit()
    self.db.commit()
The commit interval here can be adjusted to tune performance.
My way
import typing
from datetime import datetime

from sqlalchemy.dialects import mysql

class MyRepository:
    def model(self):
        return MySqlAlchemyModel

    def upsert(self, data: typing.List[typing.Dict]):
        if not data:
            return

        model = self.model()
        if hasattr(model, 'created_at'):
            for item in data:
                item['created_at'] = datetime.now()

        stmt = mysql.insert(getattr(model, '__table__')).values(data)

        for_update = []
        for k, v in data[0].items():
            for_update.append(k)

        dup = {k: getattr(stmt.inserted, k) for k in for_update}
        stmt = stmt.on_duplicate_key_update(**dup)

        self.db.session.execute(stmt)
        self.db.session.commit()
Usage:
myrepo.upsert([
    {
        "field11": "value11",
        "field21": "value21",
        "field31": "value31",
    },
    {
        "field12": "value12",
        "field22": "value22",
        "field32": "value32",
    },
])
The other answers have this covered, but I figured I'd reference another good example for MySQL that I found in this gist. It also uses LAST_INSERT_ID, which may be useful depending on your InnoDB auto-increment settings and whether your table has a unique key. I'm lifting the code here for easy reference, but please give the author a star if you find it useful.
from app import db
from sqlalchemy import func
from sqlalchemy.dialects.mysql import insert

def upsert(model, insert_dict):
    """model can be a db.Model or a table(); insert_dict should contain a primary or unique key."""
    inserted = insert(model).values(**insert_dict)
    upserted = inserted.on_duplicate_key_update(
        id=func.LAST_INSERT_ID(model.id),
        **{k: inserted.inserted[k] for k, v in insert_dict.items()})
    res = db.engine.execute(upserted)
    return res.lastrowid
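Usage would be along these lines (the model and column names are made up, not part of the gist):
# Hypothetical call to the upsert() helper above.
row_id = upsert(User, {'email': 'alice@example.com', 'name': 'Alice'})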
ORM
An upsert method based on on_duplicate_key_update:
# assumed imports for this snippet; engine, ORM_Base and idWorker come from
# elsewhere in the author's project
from sqlalchemy.dialects.mysql import insert
from sqlalchemy.orm import Session

class Model():
    __input_data__ = dict()

    def __init__(self, **kwargs) -> None:
        self.__input_data__ = kwargs
        self.session = Session(engine)

    def save(self):
        self.session.add(self)
        self.session.commit()

    def upsert(self, *, ignore_keys=[]):
        column_keys = self.__table__.columns.keys()
        update_data = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys:
                continue
            else:
                update_data[key] = self.__input_data__[key]

        insert_stmt = insert(self.__table__).values(**update_data)

        all_ignore_keys = ['id']
        if isinstance(ignore_keys, list):
            all_ignore_keys = [*all_ignore_keys, *ignore_keys]
        else:
            all_ignore_keys.append(ignore_keys)

        update_columns = dict()
        for key in self.__input_data__.keys():
            if key not in column_keys or key in all_ignore_keys:
                continue
            else:
                update_columns[key] = insert_stmt.inserted[key]

        on_duplicate_key_stmt = insert_stmt.on_duplicate_key_update(
            **update_columns
        )
        # self.session.add(self)
        self.session.execute(on_duplicate_key_stmt)
        self.session.commit()

class ManagerAssoc(ORM_Base, Model):
    def __init__(self, **kwargs):
        self.id = idWorker.get_id()

        column_keys = self.__table__.columns.keys()
        update_data = dict()
        for key in kwargs.keys():
            if key not in column_keys:
                continue
            else:
                update_data[key] = kwargs[key]

        ORM_Base.__init__(self, **update_data)
        Model.__init__(self, **kwargs, id=self.id)
....
# you can call it as follows:
manager_assoc.upsert()
manager.upsert(ignore_keys=['manager_id'])
Got a simpler solution:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def replace_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    s = s.replace("INSERT INTO", "REPLACE INTO")
    return s

my_connection.execute(my_table.insert(replace_string=""), my_values)
I just used plain SQL:
insert_stmt = "REPLACE INTO tablename (column1, column2) VALUES (:column_1_bind, :column_2_bind)"
session.execute(insert_stmt, data)
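The data argument then carries the bind parameters by name. A sketch of what that might look like (the values are placeholders; note that recent SQLAlchemy versions also require wrapping raw SQL strings in text()):
from sqlalchemy import text

# assumed shape of the bind-parameter mapping for the statement above
data = {"column_1_bind": "value1", "column_2_bind": "value2"}
session.execute(text(insert_stmt), data)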
None of these solutions seem all that elegant. A brute-force way is to query to see if the row exists: if it does, delete the row and then insert, otherwise just insert. There is obviously some overhead involved, but it does not rely on modifying the raw SQL and it works on non-ORM stuff.
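A rough ORM sketch of that approach (the model, key column, and values are all placeholders):
# Hedged sketch of the delete-then-insert approach described above.
existing = session.query(MyModel).filter_by(unique_col=value).one_or_none()
if existing is not None:
    session.delete(existing)
    session.flush()  # make sure the DELETE is issued before the new INSERT
session.add(MyModel(unique_col=value, data=new_data))
session.commit()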

Design a connection to multiple databases in Python

I have a Python application which uses both SQLite and Postgresql. It has a connector class for each database:
class PostgresqlDatabase(Database):
    ...

class SQLite(Database):
    ...
Both classes share the same methods and logic; the only thing that differentiates them is the parametrization of the SQL queries. Most of the SQL queries are even identical, e.g. both have a method called _get_tag:
# postgresql method with %s
def _get_tag(self, tagcipher):
    sql_search = "SELECT ID FROM TAG WHERE DATA = %s"
    self._cur.execute(sql_search, ([tagcipher]))
    rv = self._cur.fetchone()
    return rv

# sqlite method with ?
def _get_tag(self, tagcipher):
    sql_search = "SELECT ID FROM TAG WHERE DATA = ?"
    self._cur.execute(sql_search, ([tagcipher]))
    rv = self._cur.fetchone()
    return rv
To really make it clear, the classes have exact identical method names. The SQL queries differ in each method. So what is my problem?
I find maintaining both classes annoying, and I feel a common class would benefit the code in the long run.
However, creating a common class would create more complex code. The __init__ would probably have to initialize the correct underlying cursor. There would be a small startup overhead, and a small performance penalty if, for example, I looked up the correct string every time, e.g.
@property
def sql_search(self):
    return "SELECT ID FROM TAG WHERE DATA = {}".format(
        '?' if self.db == 'SQLite' else '%s')

def _get_tag(self, tagcipher):
    self._cur.execute(self.sql_search, ([tagcipher]))
    rv = self._cur.fetchone()
    return rv
I am also afraid this approach would be also harder to understand when first looking at it.
Leaving my personal example, I would like to know what is the most acceptable way here.
Should I keep maintaining both classes or write one more complicated class that does it all?
Is there a general rule of thumb?
It seems that inheritance is what you're looking for. It is a key feature of OOP (the Java tutorial covers it too; yes Java, but I like their docs).
As thefourtheye said in the comments, I believe you should move the identical methods into one class (in other words, delete one set of the identical methods).
Here is a very quick example:
class Connector(Database):
    """This is a super class; common elements go here."""
    def __init__(self):
        self.sql_search = "SELECT ID FROM TAG WHERE DATA = %s"
        self.common_variable = None  # placeholder
        Database.__init__(self)  # add necessary arguments

    def _get_tag(self, tagcipher, wildcard):
        # replace the sql search string with the wildcard.
        self._cur.execute(self.sql_search % (wildcard), ([tagcipher]))
        rv = self._cur.fetchone()
        return rv

    def some_common_method(self, uncommon_value):
        self.common_variable = uncommon_value

class Postgresql(Connector):
    """postgresql subclass using %s.
    unique postgresql elements go here"""
    def __init__(self):
        # initialise the superclass
        Connector.__init__(self)
        self.wildcard = '%s'
        self.uncommon_value = 'py hole'
        # other unique values go here

class Sqlite(Connector):
    """etc"""
    def __init__(self):
        # initialise the superclass
        Connector.__init__(self)
        self.wildcard = '?'
        # other unique values go here

    # other methods
Even from this example you can see some redundancy, but it was included to show how things could be split up if necessary. With this class, I can:
>>> import connector
>>> sqlite = connector.Sqlite()
>>> sqlite.wildcard
'?'
>>> sqlite.sql_search
'SELECT ID FROM TAG WHERE DATA = %s'
>>> sqlite.sql_search % sqlite.wildcard
'SELECT ID FROM TAG WHERE DATA = ?'
If they truly differ only by strings, only one subclass is needed. You can use dicts to store the unique bits:
class Connector(Database):
    def __init__(self, type):
        # describe all types in this dict
        types = {"sqlite": "?",
                 "postgre": "%s"}
        # Database.__init__(self) as necessary
        self.sql_search = "SELECT ID FROM TAG WHERE DATA = %s" % types[type]

    def _get_tag(self, tagcipher):
        # the sql search string already contains the right placeholder
        self._cur.execute(self.sql_search, ([tagcipher]))
        rv = self._cur.fetchone()
        return rv
So with this class:
>>> c = connector.Connector('sqlite')
>>> c.sql_search
'SELECT ID FROM TAG WHERE DATA = ?'
As long as they properly inherit from the Database superclass, the subclasses will share its cursor once Database.__init__(*args) is called.

Using overloaded methods for hashing

I have a list of JSON objects (around 30,000) and would like to remove duplicates from them. I consider two objects duplicates as long as their ModuleCode is the same. Below is an example of one object.
[{"AveragePoints": "4207",
"ModuleTitle": "Tool Engineering",
"Semester": "2",
"ModuleCode": "ME4261",
"StudentAcctType": "P",
"AcadYear": "2013/2014"}]
I'm planning to do so by hashing, following the example given here. After some experimentation I'm still unsure how to correctly use the overloaded methods __eq__ and __hash__. Do I create a new class and put the two methods inside it?
Below is my attempt at a solution. It returns NameError: name 'obj' is not defined, which I suspect is due to my incorrect usage of the class.
import json

json_data = open('small.json')
data = json.load(json_data)

class Module(obj):
    def __eq__(self, other):
        return self.ModuleCode == other.ModuleCode
    def __hash__(self):
        return hash(('ModuleCode', self.ModuleCode))

hashtable = {}  # python's dict is implemented as a hashtable

for item in data:
    cur = Module(item)
    if hashtable[hash(cur)] == item.ModuleCode:
        print "duplicate" + item.ModuleCode
    else:
        hashtable[hash(cur)] = item.ModuleCode

json_data.close()
The problem is that you are referring to obj, which doesn't exist, instead of object. Also, you don't actually define Module.__init__, so never initialise the ModuleCode attribute. Here is one way you could do it:
class Module(object):
    def __init__(self, ModuleCode, **data):
        self.ModuleCode = ModuleCode
        self.data = data

    def __eq__(self, other):
        return self.ModuleCode == other.ModuleCode

    def __hash__(self):
        return hash(('ModuleCode', self.ModuleCode))
Then when you create the instance:
cur = Module(**item)
(If the syntax is unfamiliar, see e.g. What does ** (double star) and * (star) do for parameters?)
Also, note that you can use a set rather than a dict for removing duplicates; storing the ModuleCode as the value is duplicating information (as that's the whole point of implementing __hash__ and __eq__):
unique = set()
for item in data:
    cur = Module(**item)
    if cur in unique:
        print "duplicate" + cur.ModuleCode
    else:
        unique.add(cur)

python generator

I have homework that I am stuck on. I have gone as far as I can, but I am stuck; can someone point me in the right direction? I am getting stuck on making each data row a new object. Normally I would think I could just iterate over the rows, but that will only return the last row.
Question:
Modify the classFactory.py source code so that the DataRow class returned by the build_row function has another method:
retrieve(self, curs, condition=None)
self is (as usual) the instance whose method is being called, curs is a database cursor on an existing database connection, and condition (if present) is a string of condition(s) which must be true of all received rows.
The retrieve method should be a generator, yielding successive rows of the result set until it is completely exhausted. Each row should be a new object of type DataRow.
This is what I have.
The test:
import unittest
from classFactory import build_row

class DBTest(unittest.TestCase):

    def setUp(self):
        C = build_row("user", "id name email")
        self.c = C([1, "Steve Holden", "steve@holdenweb.com"])

    def test_attributes(self):
        self.assertEqual(self.c.id, 1)
        self.assertEqual(self.c.name, "Steve Holden")
        self.assertEqual(self.c.email, "steve@holdenweb.com")

    def test_repr(self):
        self.assertEqual(repr(self.c),
                         "user_record(1, 'Steve Holden', 'steve@holdenweb.com')")

if __name__ == "__main__":
    unittest.main()
The script I am testing:
def build_row(table, cols):
    """Build a class that creates instances of specific rows"""
    class DataRow:
        """Generic data row class, specialized by surrounding function"""
        def __init__(self, data):
            """Uses data and column names to inject attributes"""
            assert len(data) == len(self.cols)
            for colname, dat in zip(self.cols, data):
                setattr(self, colname, dat)

        def __repr__(self):
            return "{0}_record({1})".format(self.table, ", ".join(["{0!r}".format(getattr(self, c)) for c in self.cols]))

    DataRow.table = table
    DataRow.cols = cols.split()
    return DataRow
It should roughly be something like the following:
def retrieve(self, curs, condition=None):
    query_ = "SELECT * FROM rows"
    if condition is not None:
        query_ += " %s" % condition
    curs.execute(query_)
    for row in curs.fetchall():  # iterate over the retrieved results
        yield row                # and yield each row in turn
Iterate over the rows as normal, but use yield instead of return.
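Since the assignment also asks for each yielded row to be a new object of type DataRow, a hedged variant of the sketch above would wrap each fetched row before yielding it (the exact query construction here is an assumption; the key point is the type(self)(row) wrapping):
def retrieve(self, curs, condition=None):
    # build the query from the attributes set by build_row, so the fetched
    # columns line up with self.cols
    query_ = "SELECT {0} FROM {1}".format(", ".join(self.cols), self.table)
    if condition is not None:
        query_ += " WHERE {0}".format(condition)
    curs.execute(query_)
    for row in curs.fetchall():
        yield type(self)(row)  # each result row becomes a new DataRow instance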

Implementing sub-table (view into a table): designing class relationship

I'm using Python 3, but the question isn't really tied to the specific language.
I have class Table that implements a table with a primary key. An instance of that class contains the actual data (which is very large).
I want to allow users to create a sub-table by providing a filter for the rows of the Table. I don't want to copy the table, so I was planning to keep in the sub-table just the subset of the primary keys from the parent table.
Obviously, the sub-table is just a view into the parent table; it will change if the parent table changes, will become invalid if the parent table is destroyed, and will lose some of its rows if they are deleted from the parent table. [EDIT: to clarify, if parent table is changed, I don't care what happens to the sub-table; any behavior is fine.]
How should I connect the two classes? I was thinking of:
class Subtable(Table):
    def __init__(self, table, filter_function):
        # ...
My assumption was that Subtable keeps the interface of Table, but slightly overrides the inherited methods just to check whether a row is included. Is this a good implementation?
The problem is, I'm not sure how to initialize the Subtable instance given that I don't want to copy the table object passed to it. Is it even possible?
Also, I was thinking of giving class Table an instance method that returns a Subtable instance; but that creates a dependency of Table on Subtable, and I guess it's better to avoid that?
I'm going to use the following (I omitted many methods such as sort, which work quite well in this arrangement; also omitted error handling):
class Table:
    def __init__(self, *columns, pkey=None):
        self.pkey = pkey
        self.__columns = columns
        self.__data = {}

    def __contains__(self, key):
        return key in self.__data

    def __iter__(self):
        for key in self.__order:
            yield key

    def __len__(self):
        return len(self.__data)

    def items(self):
        for key in self.__order:
            yield key, self.__data[key]

    def insert(self, *unnamed, **named):
        if len(unnamed) > 0:
            row_dict = {}
            for column_id, column in enumerate(self.__columns):
                row_dict[column] = unnamed[column_id]
        else:
            row_dict = named
        key = row_dict[self.pkey]
        self.__data[key] = row_dict
class Subtable(Table):
    def __init__(self, table, row_filter):
        self.__order = []
        self.__data = {}
        for key, row in table.items():
            if row_filter(row):
                self.__data[key] = row
Essentially, I'm copying the primary keys only, and creating references to the data tied to them. If a row in the parent table is destroyed, it will still exist in the sub-table. If a row is modified in the parent table, it is also modified in the sub-table. This is fine, since my requirement was "anything goes when the parent table is modified".
If you see any issues with this design, let me know please.
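If the Table-to-Subtable dependency mentioned in the question turns out to be acceptable, the factory-method idea could be as small as the following sketch (the method name and the filter in the usage line are placeholders of mine):
# Hypothetical factory method to be added to Table, implementing the
# "instance method that returns a Subtable" idea from the question; the
# trade-off is that Table now knows about Subtable.
def subtable(self, row_filter):
    return Subtable(self, row_filter)

# usage: active = my_table.subtable(lambda row: row['status'] == 'active')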
