Is it possible to do SELECT * in SQLAlchemy?
Specifically, SELECT * WHERE foo=1?
Is no one feeling the ORM love of SQLAlchemy today? The answers presented correctly describe the lower-level interface that SQLAlchemy provides. Just for completeness, this is the (for me) more likely real-world situation, where you have a session instance and a User class that is ORM-mapped to the users table.
for user in session.query(User).filter_by(name='jack'):
    print(user)
    # ...
And this does an explicit select on all columns.
The following selection works for me in the core expression language (returning a RowProxy object):
foo_col = sqlalchemy.sql.column('foo')
s = sqlalchemy.sql.select(['*']).where(foo_col == 1)
If you don't list any columns, you get all of them.
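For completeness, here is a fuller 1.x-style sketch of executing such a statement (the engine URL and the table name bar are placeholders, not from the question; text('*') avoids the bare-string coercion that newer versions reject):
import sqlalchemy
from sqlalchemy.sql import column, select, table, text

engine = sqlalchemy.create_engine('sqlite:///example.db')  # placeholder URL

foo_col = column('foo')
# renders as: SELECT * FROM bar WHERE foo = :foo_1
s = select([text('*')]).select_from(table('bar')).where(foo_col == 1)

with engine.connect() as conn:
    for row in conn.execute(s):
        print(row)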
query = users.select()
query = query.where(users.c.name=='jack')
result = conn.execute(query)
for row in result:
    print(row)
Should work.
You can always use raw SQL too:
from sqlalchemy import sql

str_sql = sql.text("YOUR STRING SQL")
# if you have some args:
args = {
    'myarg1': yourarg1,
    'myarg2': yourarg2}
# then call the execute method from your connection
results = conn.execute(str_sql, args).fetchall()
Where Bar is the class mapped to your table and session is your sa session:
bars = session.query(Bar).filter(Bar.foo == 1)
Turns out you can do:
sa.select('*', ...)
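In context, that might look like the following sketch (hedged: this relies on the older 1.x positional signature select(columns, whereclause, from_obj); the users table here is a hypothetical example, and 1.4+ deprecates both the bare string and the positional form):
import sqlalchemy as sa

metadata = sa.MetaData()
users = sa.Table('users', metadata, sa.Column('foo', sa.Integer))  # hypothetical table

# 1.x-era positional signature: select(columns, whereclause, from_obj)
stmt = sa.select(['*'], users.c.foo == 1, from_obj=[users])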
I had the same issue: I was trying to get all the columns from a table as a list instead of getting ORM objects back, so that I could convert that list to a pandas DataFrame and display it.
What works is to use .c on a subquery or cte as follows:
U = select(User).cte('U')
stmt = select(*U.c)
rows = session.execute(stmt)
Then you get a list of tuples with each column.
Another option is to use __table__.columns in the same way:
stmt = select(*User.__table__.columns)
rows = session.execute(stmt)
In case you want to convert the results to a DataFrame, here is the one-liner:
pd.DataFrame.from_records(rows, columns=rows.keys())
For joins, if columns are not defined manually, only the columns of the target table are returned. To get all columns for joins (User table joined with Group table):
sql = User.select().select_from(User.join(Group, User.c.group_id == Group.c.id))
# Add all columns of Group table to the select
sql = sql.column(Group)
session.connection().execute(sql)
If you're using the ORM, you can build a query using the normal ORM constructs and then execute its underlying statement directly to get raw column values:
query = session.query(User).filter_by(name='jack')
for cols in session.connection().execute(query.statement):
    print(cols)
every_column = User.__table__.columns
records = session.query(*every_column).filter(User.foo==1).all()
When an ORM class is passed to the query function, e.g. query(User), the result will be composed of ORM instances. In the majority of cases, this is what the dev wants and will be easiest to deal with, as demonstrated by the popularity of the answer above that corresponds to this approach.
In some cases, devs may instead want an iterable sequence of values. In these cases, one can pass the list of desired column objects to query(). This answer shows how to pass the entire list of columns without hardcoding them, while still working with SQLAlchemy at the ORM layer.
Related
I'm trying to use the group_by() function of SQLAlchemy with the mysql+mysqlconnector engine:
rows = session.query(MyModel) \
    .order_by(MyModel.published_date.desc()) \
    .group_by(MyModel.category_id) \
    .all()
It works fine with SQLite, but for MySQL I get this error:
[42000][1055] Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column '...' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I know how to solve it in plain SQL, but I'd like to use the advantages of SQLAlchemy.
What's the proper solution with SQLAlchemy?
Thanks in advance
One way to form the greatest-n-per-group query with well-defined behaviour would be to use a LEFT JOIN, looking for MyModel rows per category_id that have no matching row with a greater published_date:
my_model_alias = aliased(MyModel)
rows = session.query(MyModel).\
    outerjoin(my_model_alias,
              and_(my_model_alias.category_id == MyModel.category_id,
                   my_model_alias.published_date > MyModel.published_date)).\
    filter(my_model_alias.id == None).\
    all()
This will work in just about any SQL DBMS. In SQLite 3.25.0 and MySQL 8 (and many others) you could use window functions to achieve the same:
sq = session.query(
        MyModel,
        func.row_number().over(
            partition_by=MyModel.category_id,
            order_by=MyModel.published_date.desc()).label('rn')).\
    subquery()

my_model_alias = aliased(MyModel, sq)

rows = session.query(my_model_alias).\
    filter(sq.c.rn == 1).\
    all()
Of course you could use GROUP BY as well, if you then use the results in a join:
max_pub_dates = session.query(
        MyModel.category_id,
        func.max(MyModel.published_date).label('published_date')).\
    group_by(MyModel.category_id).\
    subquery()
rows = session.query(MyModel).\
    join(max_pub_dates,
         and_(max_pub_dates.c.category_id == MyModel.category_id,
              max_pub_dates.c.published_date == MyModel.published_date)).\
    all()
I have the following SQLAlchemy db query
test_query = Unit.query.filter(Unit.id_1.in_(('3D0U|1|A|G|1', '3D0U|1|A|C|160')))
I would like the result of this query to be a list of lists, with the rows corresponding to each element in the IN clause as a separate list. Currently I'm getting all the rows in a single list.
This is what I've tried:
result = []
for row in test_query:
    result.append(row.id_2)
When I print out the results, this is what I get
["3D0X|1|A|C|160", "4ERJ|1|A|C|160", "4ERL|1|A|C|160", "3D0X|1|A|G|1", "4ERJ|1|A|G|1", "4ERL|1|A|G|1"]
The desired output is:
[["3D0X|1|A|C|160", "4ERJ|1|A|C|160", "4ERL|1|A|C|160"], ["3D0X|1|A|G|1", "4ERJ|1|A|G|1", "4ERL|1|A|G|1"]]
Sample data from the Unit table is shown below
"id_1","chain_1","pdb_id_1","id_2","chain_2","pdb_id_2"
"3D0U|1|A|G|1","A","3D0U","3D0X|1|A|G|1","A","3D0X"
"3D0U|1|A|G|1","A","3D0U","4ERJ|1|A|G|1","A","4ERJ"
"3D0U|1|A|G|1","A","3D0U","4ERL|1|A|G|1","A","4ERL"
"3D0U|1|A|C|160","A","3D0U","3D0X|1|A|C|160","A","3D0X"
"3D0U|1|A|C|160","A","3D0U","4ERJ|1|A|C|160","A","4ERJ"
"3D0U|1|A|C|160","A","3D0U","4ERL|1|A|C|160","A","4ERL"
Any help is appreciated. Thanks
This could be solved in the following way: first order by Unit.id_1 in the query, and then
result = [[unit.id_2 for unit in units] for id_1, units in itertools.groupby(test_query, lambda x: x.id_1)]
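Spelled out as a minimal sketch (assuming the Unit model from the question; the order_by matters because itertools.groupby only groups consecutive rows):
import itertools

test_query = (Unit.query
              .filter(Unit.id_1.in_(('3D0U|1|A|G|1', '3D0U|1|A|C|160')))
              .order_by(Unit.id_1))

result = [[unit.id_2 for unit in units]
          for _, units in itertools.groupby(test_query, lambda u: u.id_1)]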
In case of postgresql there is also sqlalchemy.func.array_agg that could be used to construct an array of id_2 grouped by id_1.
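A hedged sketch of that PostgreSQL-only alternative (assumes the same Unit model and an available session):
from sqlalchemy import func

rows = (session.query(Unit.id_1, func.array_agg(Unit.id_2))
        .filter(Unit.id_1.in_(('3D0U|1|A|G|1', '3D0U|1|A|C|160')))
        .group_by(Unit.id_1)
        .all())

result = [id_2_list for _, id_2_list in rows]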
For reference, the Python documentation for itertools describes what itertools.groupby does.
I'm loading some data from a SQL database into Python, but I need to apply some criteria from a Python DataFrame. To simplify, see the example below:
some_sql = """
select column1,columns2
from table
where a between '{}' and '{}'
or a between '{}' and '{}'
or a between '{}' and '{}'
""".format(date1,date2,date3,date4,date5,date6)
date1, date2, date3, date4, date5, date6 are sourced from a Python DataFrame. I can manually specify all 6 parameters, but in fact I have over 20...
df = DataFrame({'col1':['date1','date3','date5'],
'col2':['date2','date4','date6']})
Is there a way to do this with a loop, to be more efficient?
Setup
# Create a dummy dataframe
df = pd.DataFrame({'col1':['date1','date3','date5'],
'col2':['date2','date4','date6']})
# Prepare the SQL (conditions will be added later)
some_sql = """
select column1,columns2
from table
where """
First approach
conditions = []
for row in df.iterrows():
    # Ignore the index
    data = row[1]
    conditions.append(f"a between '{data['col1']}' and '{data['col2']}'")
some_sql += '\nor '.join(conditions)
By using iterrows() we can iterate through the dataframe, rows by row.
Alternative
some_sql += '\nor '.join(df.apply(lambda x: f"a between '{x['col1']}' and '{x['col2']}'", axis=1).tolist())
Using apply() should be faster than iterrows():
Although apply() also inherently loops through rows, it does so much
more efficiently than iterrows() by taking advantage of a number of
internal optimizations, such as using iterators in Cython.
source
Another alternative
some_sql += '\nor '.join([f"a between '{row['col1']}' and '{row['col2']}'" for row in df.to_dict('records')])
This converts the dataframe to a list of dicts, and then applies a list comprehension to create the conditions.
Result
select column1,columns2
from table
where a between 'date1' and 'date2'
or a between 'date3' and 'date4'
or a between 'date5' and 'date6'
As a secondary note to Kristof's answer above: even as an analyst, one should be careful about things like SQL injection, so inlining data is something to avoid.
If possible you should define your query once with placeholders and then create a param list to go with the placeholders. This also saves on the formatting.
So in your case your query looks like:
some_sql = """
select column1,columns2
from table
where a between ? and ?
or a between ? and ?
or a between ? and ?
And our param list generation is going to look like:
conditions = []
for row in df.iterrows():
    # Ignore the index
    data = row[1]
    conditions.append(data['col1'])
    conditions.append(data['col2'])
Then execute your SQL with the placeholder syntax, passing the params list alongside it.
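For example, letting pandas do the execution (a sketch; conn is an assumed DB-API connection for a driver that uses ? placeholders, e.g. pyodbc):
import pandas as pd

# conditions is the flat list of dates built above
result_df = pd.read_sql(some_sql, conn, params=conditions)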
I have:
res = db.engine.execute('select count(id) from sometable')
The returned object is sqlalchemy.engine.result.ResultProxy.
How do I get count value from res?
res is not accessed by index, but I have figured this out as:
count = None
for i in res:
    count = i[0]
    break
There must be an easier way right? What is it? I didn't discover it yet.
Note: The db is a postgres db.
While the other answers work, SQLAlchemy provides a shortcut for scalar queries as ResultProxy.scalar():
count = db.engine.execute('select count(id) from sometable').scalar()
scalar() fetches the first column of the first row and closes the result set, or returns None if no row is present. There's also Query.scalar(), if using the Query API.
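On the ORM side, the equivalent with Query.scalar() might look like this (a sketch; SomeTable is a hypothetical mapped class):
from sqlalchemy import func

count = session.query(func.count(SomeTable.id)).scalar()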
What you are asking for is called unpacking. ResultProxy is an iterable, so we can do:
# there will be single record
record, = db.engine.execute('select count(id) from sometable')
# this record consist of single value
count, = record
The ResultProxy in SQLAlchemy (as documented here http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=execute#sqlalchemy.engine.ResultProxy) is an iterable of the rows returned from the database. For a count() query, fetch the first row, then index into it to get the first (and only) column:
result = db.engine.execute('select count(id) from sometable')
count = result.fetchone()[0]
If you happened to be using the ORM of SQLAlchemy, I would suggest using the Query.count() method on the appropriate model as shown here: http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=count#sqlalchemy.orm.query.Query.count
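For instance (a sketch; SomeModel is a hypothetical mapped class):
count = session.query(SomeModel).count()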
I'm trying to set the order of columns when building a table with SQLAlchemy; as of right now the columns are appearing in alphabetical order. I currently have:
def data_frame(query, columns):
    def make_row(x):
        return dict([(c, getattr(x, c)) for c in columns])
    return pd.DataFrame([make_row(x) for x in query])

PackL = create_engine('mssql+pyodbc://u:pass@Server/db1?driver=SQL Server', echo=False)
Fabr = create_engine('mssql+pyodbc://u:pass@Server/db2?driver=SQL Server', echo=False)

Session = sessionmaker(bind=PackL)
session = Session()
Base = declarative_base()
metadata = MetaData()

class Tranv(Base):
    __tablename__ = "Transactions"
    __table_args__ = {'autoload': True, 'autoload_with': PackL}

newvarv = session.query(Tranv).filter_by(status='SCRAP').filter(
    Tranv.time_stamp.between('2015-10-01', '2015-10-09'))
session.close()

dfx = data_frame(newvarv, ['action', 'employee_number', 'time_stamp', 'qty',
                           'part_number', 'card_number'])
Current dfx has the columns in alphabetical order, but I want it to order them by the order in which I define the columns when I create the data frame dfx. Therefore the order would be action, employee_number, time_stamp, qty, part_number, card_number. I can easily do this with Pandas, but that seems like extra (and unnecessary) steps.
I've searched the documentation, Google, & Stack Overflow but nothing really seems to fit my needs. As I'm still new to SQLAlchemy, I appreciate any help. Am I right in thinking that because I'm autoloading the table, I cannot easily define the order of my columns (I'm sure there is a workaround, but I don't have a clue where in the documentation that might be found)?
The reason your columns are not in the order you specify has nothing to do with the SQL query or SQLAlchemy. This is caused by the fact that you convert the query output to a dictionary, which you then feed to DataFrame.
As a dictionary has no guaranteed order in Python (before 3.7), pandas sorts the keys alphabetically to produce a predictable output.
Using the current approach of the dict, you can always change the order of the columns afterwards by doing dfx.reindex(columns=['action', ..., 'card_number'])
Apart from the explanation of why it is not ordered in your case, there are perhaps better approaches to tackle this:
Use the builtin pd.read_sql_query. When working with sessions and Query objects, you can pass the selectable attribute to read_sql_query to convert it to a DataFrame:
query = session.query(Table)...
df = pd.read_sql_query(query.selectable, engine)
Do not convert to a dictionary, but keep the output as tuples, which you feed to DataFrame: this will keep the order of the query output.
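A sketch of that variant of the question's helper (same query and column list as above; tuples preserve the declared column order):
def data_frame(query, columns):
    # Build each row as a tuple in the declared column order
    rows = [tuple(getattr(x, c) for c in columns) for x in query]
    return pd.DataFrame(rows, columns=columns)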