Assume we have a table consisting of column_1, column_2, ..., column_n, all of which are string fields. The conditions for the case-insensitive query are stored in a dictionary d, e.g. d[column_1] = "Hello", which may or may not contain all columns. How can we build the query?
I checked the question Case Insensitive Flask-SQLAlchemy Query. It contains a lot of awesome answers, but none of them works when the conditions are not known until runtime.
You would need to build the query looping through each key of the dictionary.
As you didn't give any code sample, I'm going to call the table model class TableModel and each column will be column_1, column_2, etc.
Something like this should work:
d = {'column_1': 'some_string', 'column_3': 'another_string'}
# 'column_2' is skipped to illustrate that every column is optional in the dictionary
my_query = TableModel.query
for k in d:
    my_query = my_query.filter(getattr(TableModel, k).ilike(d[k]))
And that's about it. Afterwards you can use my_query as any other query, e.g., my_query.count() or my_query.all()
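A runnable sketch of this approach, using an in-memory SQLite database and a hypothetical two-column model (plain SQLAlchemy's session.query stands in for Flask-SQLAlchemy's TableModel.query):

```python
# Hypothetical model and data; only the dictionary keys present in d
# become filters, and ilike() makes each match case-insensitive.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class TableModel(Base):
    __tablename__ = 'table_model'
    id = Column(Integer, primary_key=True)
    column_1 = Column(String)
    column_2 = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    TableModel(column_1='Hello', column_2='World'),
    TableModel(column_1='goodbye', column_2='world'),
])
session.commit()

d = {'column_1': 'hello'}  # 'column_2' intentionally absent
my_query = session.query(TableModel)
for k in d:
    my_query = my_query.filter(getattr(TableModel, k).ilike(d[k]))

print(my_query.count())  # 1 (only the 'Hello' row matches)
```

The same my_query object can then be finished with .all(), .first(), etc., exactly as with a statically written filter chain.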
SUMMARY:
How to query against values from different dataframe columns with table.column_name combinations in SQLAlchemy using the or_() statement.
I'm working on a SQLAlchemy project where I pull down the valid columns of a dataframe and enter them all into SQLAlchemy's filter. I've successfully got it running where it enters all entries of a column, using the head of the column, like this:
qry = qry.filter(or_(*[getattr(Query_Tbl, column_head).like(x)
                       for x in df[column_head].dropna().values]))
This produced the pattern I was looking for of (tbl.column1 like a OR tbl.column1 like b...) AND- etc.
However, some groups of the dataframe have different columns that still need to be placed together within the same or_() clause,
i.e. (The desired result)
(tbl1.col1 like a OR tbl.col1 like b OR tbl.col2 like c OR tbl.col2 like d OR tbl.col3 like e...) etc.
My latest attempt was to sub-group the columns I needed grouped together, then repeat the previous style inside those groups like:
qry = qry.filter(or_(
    (*[getattr(Query_Tbl, set_id[0]).like(x)
       for x in df[set_id[0]].dropna().values]),
    (*[getattr(Query_Tbl, set_id[1]).like(y)
       for y in df[set_id[1]].dropna().values]),
    (*[getattr(Query_Tbl, set_id[2]).like(z)
       for z in df[set_id[2]].dropna().values])
))
where set_id is a list of three strings corresponding to column1, column2, and column3, so I should get the designated results. However, this simply produces:
(What I'm actually getting)
(tbl.col1 like a OR tbl.col1 like b..) AND (tbl.col2 like c OR tbl.col2 like d...) AND (tbl.col3 like e OR...)
Is there a better way to go about this in SQLAlchemy to get the result I want, or would it be better to find a way of feeding column values from Pandas directly into getattr() to work into my existing code?
Thank you for reading and in advance for your help!
It appears I was having issues with how the dataframe was formatted, and I was reading column names into groups differently than intended. This pattern works for anyone who wants to process multiple dataframe columns into the same OR statement.
I apologize for the confusion; if anyone has comments or questions on the subject, I'm happy to help others with this type of issue.
Alternatively, I found a much cleaner answer. Since SQLAlchemy's or_() can be used with a variable column if you use Python's built-in getattr() function, you only need to create (column, value) pairs, which you can then unpack in a loop.
for group in [group_2, group_3]:
    set_id = list(set(df.columns.values) & set(group))
    if len(set_id) > 1:
        set_tuple = list()
        for column in set_id:
            for value in df[column].dropna().values:
                set_tuple.append((column, value))
        print(set_tuple)
        qry = qry.filter(or_(*[getattr(Query_Tbl, id).like(x)
                               for id, x in set_tuple]))
        df = df.drop(group, axis=1)
If you know which columns need to be grouped in the or_() statement, you can put them into lists and iterate through them. Inside each group, you create a list of (column, value) tuples containing the pairs you need. Then, within or_(), you unpack the columns and values in a loop and assign them accordingly. The code is much easier to read and much more compact. I found this to be a more robust solution than explicitly writing out cases for the group sizes.
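The flattening idea can be sketched end-to-end with an in-memory SQLite database; the model, columns, and data below are hypothetical stand-ins for Query_Tbl and the dataframe-derived pairs:

```python
# Collect (column, value) pairs spanning several columns, then unpack
# them into one or_() so all conditions land in a single OR clause.
from sqlalchemy import create_engine, Column, Integer, String, or_
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class QueryTbl(Base):
    __tablename__ = 'query_tbl'
    id = Column(Integer, primary_key=True)
    col1 = Column(String)
    col2 = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([QueryTbl(col1='a', col2='x'),
                 QueryTbl(col1='z', col2='c')])
session.commit()

# One flat list of pairs across two different columns...
pairs = [('col1', 'a'), ('col2', 'c')]
# ...becomes: col1 LIKE 'a' OR col2 LIKE 'c'
qry = session.query(QueryTbl).filter(
    or_(*[getattr(QueryTbl, col).like(val) for col, val in pairs]))

print(qry.count())  # 2 (each row matches via a different column)
```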
I'm trying to set the order of columns when building a table with SQLAlchemy; as of right now, the columns appear in alphabetical order. I currently have:
import pandas as pd
from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

def data_frame(query, columns):
    def make_row(x):
        return dict([(c, getattr(x, c)) for c in columns])
    return pd.DataFrame([make_row(x) for x in query])

PackL = create_engine('mssql+pyodbc://u:pass@Server/db1?driver=SQL Server', echo=False)
Fabr = create_engine('mssql+pyodbc://u:pass@Server/db2?driver=SQL Server', echo=False)

Session = sessionmaker(bind=PackL)
session = Session()
Base = declarative_base()
metadata = MetaData()

class Tranv(Base):
    __tablename__ = "Transactions"
    __table_args__ = {'autoload': True, 'autoload_with': PackL}

newvarv = session.query(Tranv).filter_by(status='SCRAP').filter(
    Tranv.time_stamp.between('2015-10-01', '2015-10-09'))
session.close()

dfx = data_frame(newvarv, ['action', 'employee_number', 'time_stamp', 'qty',
                           'part_number', 'card_number'])
Currently dfx has the columns in alphabetical order, but I want them ordered as I define them when creating the data frame dfx: action, employee_number, time_stamp, qty, part_number, card_number. I can easily do this with Pandas, but that seems like extra (and unnecessary) steps.
I've searched the documentation, Google, and Stack Overflow, but nothing really seems to fit my needs. As I'm still new to SQLAlchemy, I appreciate any help. Am I right in thinking that because I'm autoloading the table, I cannot easily define the order of my columns (I'm sure there is a workaround, but I don't have a clue where in the documentation that might be found)?
The reason your columns are not in the order you specify has nothing to do with the SQL query or SQLAlchemy. It is caused by the fact that you convert the query output to a dictionary, which you then feed to DataFrame.
As a dictionary has no guaranteed order in Python (before 3.7), pandas sorts the columns alphabetically to give a predictable output.
Using the current approach of the dict, you can always change the order of the columns afterwards by doing dfx.reindex(columns=['action', ..., 'card_number'])
Apart from the explanation why it is not ordered in your case, there are maybe better approaches to tackle this:
Use the builtin pd.read_sql_query. When working with sessions and Query objects, you can pass the selectable attribute to read_sql_query to convert it to a DataFrame:
query = session.query(Table)...
df = pd.read_sql_query(query.selectable, engine)
Do not convert to a dictionary, but keep the output as tuples which you feed to DataFrame: this will keep the order of the query output.
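A minimal sketch of the read_sql_query route, with an illustrative model and an in-memory SQLite engine standing in for the MSSQL setup above:

```python
# Passing a Query's .selectable to pd.read_sql_query lets pandas build
# the DataFrame directly from the SQL result, preserving column order.
import pandas as pd
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Item(name='widget'))
session.commit()

query = session.query(Item)
df = pd.read_sql_query(query.selectable, engine)
print(len(df))  # 1
```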
I have a database table with multiple fields which I am querying and pulling out all data which meets certain parameters. I am using psycopg2 for python with the following syntax:
# pass parameters separately so psycopg2 quotes them safely,
# instead of interpolating them into the SQL string with %
cur.execute("SELECT * FROM failed_inserts WHERE insertid = %s AND site_failure = TRUE",
            (import_id,))
failed_sites = cur.fetchall()
This returns the correct values as a list, with the data's integrity and order maintained. However, I want to use the returned list elsewhere in my application, and I only have this list of values; it is not a dictionary with the field names as keys. Rather than having to do
desiredValue = failed_sites[13]  # where 13 is the arbitrary index of desiredValue
I want to be able to query by the field name like:
desiredValue = failed_sites[fieldName]  # where fieldName is the name of the field I am looking for
Is there a simple way and efficient way to do this?
Thank you!
cursor.description will give you the column information (http://www.python.org/dev/peps/pep-0249/#cursor-objects). You can get the column names from it and use them to build a dictionary.
cursor.execute('SELECT ...')
columns = [column[0].lower() for column in cursor.description]

failed_sites = []  # one dict per row, instead of overwriting a single dict
for row in cursor:
    row_dict = {}
    for name, value in zip(columns, row):
        if isinstance(value, basestring):  # use str on Python 3
            value = value.strip()  # trim padded string values
        row_dict[name] = value
    failed_sites.append(row_dict)
The "Dictionary-like cursor", part of psycopg2.extras, seems what you're looking for.
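Because cursor.description is part of the DB-API 2.0 spec, the dictionary-building technique works with any compliant driver; here is a runnable sketch using the standard library's sqlite3 (table and data invented for illustration):

```python
# Build one dict per row from cursor.description; the same code applies
# to psycopg2, which exposes the same DB-API cursor interface.
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE failed_inserts (insertid TEXT, site TEXT)")
cur.execute("INSERT INTO failed_inserts VALUES ('42', ' example.com ')")
cur.execute("SELECT * FROM failed_inserts WHERE insertid = ?", ('42',))

columns = [desc[0].lower() for desc in cur.description]
failed_sites = []
for row in cur:
    # strip string values, mirroring the original answer
    failed_sites.append({
        col: (val.strip() if isinstance(val, str) else val)
        for col, val in zip(columns, row)
    })

print(failed_sites[0]['site'])  # example.com
```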
All I want is the count from TableA grouped by a column from TableB, but of course I need the item from TableB each count is associated with. Better explained with code:
TableA and B are Model objects.
I'm trying to follow this syntax as best I can.
Trying to run this query:
sq = session.query(TableA).join(TableB).\
group_by(TableB.attrB).subquery()
countA = func.count(sq.c.attrA)
groupB = func.first(sq.c.attrB)
print session.query(countA, groupB).all()
But it gives me an AttributeError (sq does not have attrB)
I'm new to SA and I find it difficult to learn. (links to recommended educational resources welcome!)
When you make a subquery out of a select statement, the columns that can be accessed from it must be in the columns clause. Take for example a statement like:
select x, y from mytable where z=5
If we then wanted to make a subquery of it and GROUP BY 'z', this would not be legal SQL:
select * from (select x, y from mytable where z=5) as mysubquery group by mysubquery.z
Because 'z' is not in the columns clause of "mysubquery" (it's also illegal since 'x' and 'y' should be in the GROUP BY as well, but that's a different issue).
SQLAlchemy works the same exact way. When you say query(..).subquery(), or use the alias() function on a core selectable construct, it means you're wrapping your SELECT statement in parenthesis, giving it a (usually generated) name, and giving it a new .c. collection that has only those columns that are in the "columns" clause, just like real SQL.
So here you'd need to ensure that TableB, at least the column you're dealing with externally, is available. You can also limit the columns clause to just those columns you need:
sq = session.query(TableA.attrA, TableB.attrB).join(TableB).\
group_by(TableB.attrB).subquery()
countA = func.count(sq.c.attrA)
groupB = func.first(sq.c.attrB)
print session.query(countA, groupB).all()
Note that the above query probably only works on MySQL, as in general SQL it's illegal to reference any columns that aren't part of an aggregate function, or part of the GROUP BY, when grouping is used. MySQL has a more relaxed (and sloppy) system in this regard.
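A portable variant that works beyond MySQL is to put the grouping column itself in the SELECT list, instead of pulling it back out with func.first; a sketch with an illustrative model and in-memory SQLite database:

```python
# Count rows per group with the grouping column in the SELECT list,
# which is legal SQL everywhere (unlike the func.first approach).
from sqlalchemy import create_engine, Column, Integer, String, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class MyClass(Base):
    __tablename__ = 'my_class'
    id = Column(Integer, primary_key=True)
    attr = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([MyClass(attr='A'), MyClass(attr='A'), MyClass(attr='B')])
session.commit()

rows = (session.query(MyClass.attr, func.count(MyClass.id))
        .group_by(MyClass.attr)
        .order_by(MyClass.attr)
        .all())
print(rows)  # [('A', 2), ('B', 1)]
```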
edit: if you want the results without the zeros:
import collections

letter_count = collections.defaultdict(int)
for count, letter in session.query(func.count(MyClass.id), MyClass.attr).group_by(MyClass.attr):
    letter_count[letter] = count

for letter in ["A", "B", "C", "D", "E", ...]:
    print "Letter %s has %d elements" % (letter, letter_count[letter])
note letter_count[someletter] defaults to zero if otherwise not populated.
Is it possible to do SELECT * in SQLAlchemy?
Specifically, SELECT * WHERE foo=1?
Is no one feeling the ORM love of SQLAlchemy today? The presented answers correctly describe the lower-level interface that SQLAlchemy provides. Just for completeness, this is the more-likely (for me) real-world situation where you have a session instance and a User class that is ORM mapped to the users table.
for user in session.query(User).filter_by(name='jack'):
    print(user)
    # ...
And this does an explicit select on all columns.
The following selection works for me in the core expression language (returning a RowProxy object):
foo_col = sqlalchemy.sql.column('foo')
s = sqlalchemy.sql.select(['*']).where(foo_col == 1)
If you don't list any columns, you get all of them.
query = users.select()
query = query.where(users.c.name == 'jack')
result = conn.execute(query)
for row in result:
    print row
Should work.
You can always use raw SQL too:
str_sql = sql.text("YOUR STRING SQL")
# if you have some args:
args = {
    'myarg1': yourarg1,
    'myarg2': yourarg2}
# then call the execute method on your connection
results = conn.execute(str_sql, args).fetchall()
Where Bar is the class mapped to your table and session is your sa session:
bars = session.query(Bar).filter(Bar.foo == 1)
Turns out you can do:
sa.select('*', ...)
I had the same issue: I was trying to get all columns from a table as a list instead of getting ORM objects back, so that I could convert that list to a pandas dataframe and display it.
What works is to use .c on a subquery or cte as follows:
U = select(User).cte('U')
stmt = select(*U.c)
rows = session.execute(stmt)
Then you get a list of tuples with each column.
Another option is to use __table__.columns in the same way:
stmt = select(*User.__table__.columns)
rows = session.execute(stmt)
In case you want to convert the results to dataframe here is the one liner:
pd.DataFrame.from_records(rows, columns=rows.keys())
For joins, if columns are not defined manually, only the columns of the target table are returned. To get all columns for a join (the User table joined with the Group table):
sql = User.select(from_obj=User.join(Group, User.c.group_id == Group.c.id))
# Add all columns of the Group table to the select
sql = sql.column(Group)
session.connection().execute(sql)
If you're using the ORM, you can build a query using the normal ORM constructs and then execute it directly to get raw column values:
query = session.query(User).filter_by(name='jack')
for cols in session.connection().execute(query):
    print cols
every_column = User.__table__.columns
records = session.query(*every_column).filter(User.foo==1).all()
When an ORM class is passed to the query function, e.g. query(User), the result will be composed of ORM instances. In the majority of cases, this is what the dev wants and is the easiest to deal with, as demonstrated by the popularity of the answer above that corresponds to this approach.
In some cases, devs may instead want an iterable sequence of values. In these cases, one can pass the list of desired column objects to query(). This answer shows how to pass the entire list of columns without hardcoding them, while still working with SQLAlchemy at the ORM layer.
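A runnable sketch of that column-list approach, with an illustrative User model; the rows come back as named tuples rather than ORM instances:

```python
# Passing User.__table__.columns to query() returns plain rows whose
# fields can be accessed by column name, not mapped User objects.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    foo = Column(Integer)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([User(name='jack', foo=1), User(name='jill', foo=2)])
session.commit()

every_column = User.__table__.columns
records = session.query(*every_column).filter(User.foo == 1).all()
print(records[0].name)  # jack
```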