SQLAlchemy: Suffix table name to output columns - python

I have a query where I join multiple tables with similar column names. To disambiguate them, I want to suffix the table name to the column name like: <column_name>_<table_name>. There are hundreds of columns in each table, so I would like to do it programmatically.
Is there a way to do something like this?
sa.select([
    table1.c.suffix('_1'),
    table2.c.suffix('_2')
]).select_from(table1.join(table2, table1.c.id == table2.c.id))

You want to use the label keyword:
sa.select([
    table1.c.column_name.label('column_name_1'),
    table2.c.column_name.label('column_name_2')
]).select_from(table1.join(table2, table1.c.id == table2.c.id))
This will allow you to select the same column name from different tables without one overwriting the other.
If you have a table that is dynamic, or one with tons of columns, your best bet is to do something like this:
pseudo code:
select * from information_schema.columns where table_name = 'my_table'
get the results from a query
return_columns = []
counter = 0
for r in results:
    counter += 1
    return_columns.append("`my_table`.`" + r.column_name + "` as col_{}".format(counter))
Creating dynamic SQL will require you to do a bit of building out. I do this in my application all the time, except I don't use the information schema; I have a table which holds my column names.
This should lead you in the right direction.
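For the original question's goal of suffixing every column programmatically, the same idea can be expressed at the SQLAlchemy level rather than in raw SQL. Below is a minimal sketch, assuming table1 and table2 are sa.Table objects sharing an id column, and SQLAlchemy 1.x's list-style select():
import sqlalchemy as sa

def suffixed_columns(table, suffix):
    # Label each column as <column_name><suffix>, e.g. id -> id_1.
    return [column.label(column.name + suffix) for column in table.columns]

query = sa.select(
    suffixed_columns(table1, '_1') + suffixed_columns(table2, '_2')
).select_from(table1.join(table2, table1.c.id == table2.c.id))
To suffix with the table name instead of a number, label with column.name + '_' + table.name.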

Related

python/mysql: SELECTing from multiple tables overwriting duplicate column in resulting dictionary

I am coding with Python 3.8.5, and mysql Ver 15.1 Distrib 10.4.11-MariaDB.
I have three tables, customer, partner and customer_partner,
customer has columns customer_id, customer_name, address;
partner has columns partner_id, partner_name, address; (note the address column appears in both tables, but obviously different content)
customer_partner has columns customer_id, partner_id, describing the partnership between one customer and one partner;
I am trying to fetch the joined columns of customer and partner for customers whose customer_id is in a list, using the following Python code and SQL statement:
db = connect(...)
cur = db.cursor(dictionary=True)
customer_id_tuple = (1, 2, 3)
sql = f"""SELECT *
FROM customer, partner, customer_partner
WHERE
customer.customer_id in ({','.join(['%s' for _ in range(len(customer_id_tuple))])})
AND customer.customer_id=customer_partner.customer_id
AND customer_partner.partner_id=partner.partner_id
"""
cur.execute(sql, customer_id_tuple)
data = cur.fetchall()
In the resulting dictionaries in data, I only see one address column. Obviously, address from the partner table overwrites the one from the customer table.
Besides modifying the column names, do I have a more decent way to avoid such overwriting behavior? Like automatically inserting the table name in front of the column name, like customer.address and partner.address?
SELECT * ... may lead to ambiguities when there are conflicting column names.
You should set aliases for the conflicting column names.
Also set short aliases for the table names; they shorten the code, make it more readable, and can be used to qualify all the column names.
The implicit join syntax that you use was superseded years ago by explicit JOIN syntax.
Your code should be written like this:
sql = f"""
SELECT c.customer_id, c.customer_name, c.address customer_address,
p.partner_id, p.partner_name, p.address partner_address
FROM customer c
INNER JOIN customer_partner cp ON c.customer_id = cp.customer_id
INNER JOIN partner p ON cp.partner_id = p.partner_id
WHERE c.customer_id IN ({','.join(['%s' for _ in range(len(customer_id_tuple))])})
"""
I left out all the columns of customer_partner from the SELECT list because they are not needed.
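A minimal usage sketch of the rewritten query, reusing the dictionary cursor from the question:
cur.execute(sql, customer_id_tuple)
data = cur.fetchall()
# Each row dict now carries distinct 'customer_address' and 'partner_address' keys.
print(data[0]['customer_address'], data[0]['partner_address'])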

How to concatenate ORDER BY in sql

I am working on a GUI grid and SQL. I have two GUI buttons that can be clicked depending on which order the user wants the information in. The order can be by employee Last_name or First_name, but not both. I am not sure how to do that. I am supposed to use concatenation, but am not sure how.
Below is what I tried to do:
def sort_employees(self, column):
    try:
        cursor = db.cursor()
        display = "SELECT * FROM company ORDER BY '%" + column + "%' "
        cursor.execute(display)
        entry = cursor.fetchall()
        self.display_rows(entry)
Also, the code works fine if I only have one entry:
display="SELECT * FROM company ORDER BY Last_name"
Not sure why you have % in your query string; it's possible you're confusing it with the %s syntax for string formatting.
display = "SELECT * FROM company ORDER BY '%" + column + "%' "
It seems what you want is more like this:
display = "SELECT * FROM company ORDER BY " + column
Or, as I prefer:
display = 'SELECT * FROM company ORDER BY {column}'.format(column=column)
Of course, be careful creating queries like this: you're exposed to SQL injection vulnerabilities.
It's better to use a parametrised query instead of string interpolation/concatenation where you can, but note that most database interfaces won't let you bind a column name as a parameter, so for ORDER BY you'll need to validate the column name against a whitelist of allowed columns. Check the docs for whichever database interface you're using.
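A minimal sketch of that idea; the whitelist below is hypothetical and should list your real sortable columns:
ALLOWED_SORT_COLUMNS = {'First_name', 'Last_name'}  # hypothetical whitelist

def sort_employees(self, column):
    # Identifiers cannot be bound as query parameters, so validate the
    # column name before interpolating it into the SQL string.
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError('unsupported sort column: {}'.format(column))
    cursor = db.cursor()
    cursor.execute('SELECT * FROM company ORDER BY ' + column)
    self.display_rows(cursor.fetchall())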
In SQL, the ORDER BY clause takes, as arguments, a list of column names:
--correct
ORDER BY firstname, lastname, age
It can also take function outputs:
--correct, sort names beginning with a Z first
ORDER BY CASE WHEN firstname LIKE 'Z%' THEN 1 ELSE 2 END, firstname
In some databases, an integer ordinal sorts by the column at that position, counted from the left and starting at 1:
--correct, sort by 3rd column then first
ORDER BY 3,1
It does not take a list of strings that happen to contain column names:
--incorrect - not necessarily a syntax error but will not sort by any named column
ORDER BY 'firstname', 'lastname', 'age'
Nor does it take a string of csv column names:
--incorrect - again not necessarily a syntax error but won't sort on any of the named columns
ORDER BY 'firstname, lastname, age'
Your code falls into the latter category: you're turning the column name into a string literal, which is wrong. The "not working" SQL and the "working" SQL are very different. Print the result of the concatenation to screen and compare the two if you're having a hard time seeing it from the code.
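For instance, printing both versions makes the difference obvious:
column = 'Last_name'
bad = "SELECT * FROM company ORDER BY '%" + column + "%' "
good = 'SELECT * FROM company ORDER BY ' + column
print(bad)   # SELECT * FROM company ORDER BY '%Last_name%'  -- sorts by a constant string
print(good)  # SELECT * FROM company ORDER BY Last_name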

Running SQL in Python and apply parameters from Python Dataframe

I'm loading some data from SQL database to Python, but I need to apply some criteria from Python Dataframe, to be simplified, see example below:
some_sql = """
select column1,columns2
from table
where a between '{}' and '{}'
or a between '{}' and '{}'
or a between '{}' and '{}'
""".format(date1,date2,date3,date4,date5,date6)
date1,date2,date3,date4,date5,date6 are sourced from Python Dataframe. I can manually specify all 6 parameters, but I do have over 20 in fact...
df = pd.DataFrame({'col1':['date1','date3','date5'],
                   'col2':['date2','date4','date6']})
Is there a way to do this with a loop, to be more efficient?
Setup
# Create a dummy dataframe
df = pd.DataFrame({'col1':['date1','date3','date5'],
'col2':['date2','date4','date6']})
# Prepare the SQL (conditions will be added later)
some_sql = """
select column1,columns2
from table
where """
First approach
conditions = []
for row in df.iterrows():
    # Ignore the index
    data = row[1]
    conditions.append(f"a between '{data['col1']}' and '{data['col2']}'")
some_sql += '\nor '.join(conditions)
By using iterrows() we can iterate through the dataframe, row by row.
Alternative
some_sql += '\nor '.join(df.apply(lambda x: f"a between '{x['col1']}' and '{x['col2']}'", axis=1).tolist())
Using apply() should be faster than iterrows():
Although apply() also inherently loops through rows, it does so much more efficiently than iterrows() by taking advantage of a number of internal optimizations, such as using iterators in Cython. (source)
Another alternative
some_sql += '\nor '.join([f"a between '{row['col1']}' and '{row['col2']}'" for row in df.to_dict('records')])
This converts the dataframe to a list of dicts, and then applies a list comprehension to create the conditions.
Result
select column1,columns2
from table
where a between 'date1' and 'date2'
or a between 'date3' and 'date4'
or a between 'date5' and 'date6'
As a secondary note to Kristof's answer above: even as an analyst one should be careful about things like SQL injection, so inlining data into the SQL string is something to avoid.
If possible, you should define your query once with placeholders and then build a parameter list to go with the placeholders. This also saves on the formatting.
So in your case your query looks like:
some_sql = """
select column1,columns2
from table
where a between ? and ?
or a between ? and ?
or a between ? and ?
"""
And our param list generation is going to look like:
conditions = []
for row in df.iterrows():
    # Ignore the index
    data = row[1]
    conditions.append(data['col1'])
    conditions.append(data['col2'])
Then execute your SQL, passing the placeholder query together with the parameter list.
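A minimal usage sketch, assuming a DB-API connection conn whose driver uses qmark-style (?) placeholders, such as sqlite3 or pyodbc:
cursor = conn.cursor()
cursor.execute(some_sql, conditions)
rows = cursor.fetchall()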

Obtaining data from PostgreSQL as Dictionary

I have a database table with multiple fields which I am querying and pulling out all data which meets certain parameters. I am using psycopg2 for python with the following syntax:
cur.execute("SELECT * FROM failed_inserts where insertid='%s' AND site_failure=True"%import_id)
failed_sites= cur.fetchall()
This returns the correct values as a list, with the data's integrity and order maintained. However, I want to query the returned list elsewhere in my application, and I only have this list of values; it is not a dictionary with the fields as keys for these values. Rather than having to do
desiredValue = failed_sites[13] //where 13 is an arbitrary number of the index for desiredValue
I want to be able to query by the field name like:
desiredValue = failed_sites[fieldName] //where fieldName is the name of the field I am looking for
Is there a simple way and efficient way to do this?
Thank you!
cursor.description will give you the column information (http://www.python.org/dev/peps/pep-0249/#cursor-objects). You can get the column names from it and use them to create a dictionary.
cursor.execute('SELECT ...')
columns = [column[0].lower() for column in cursor.description]

failed_sites = []
for row in cursor:
    site = dict(zip(columns, row))
    for key, value in site.items():
        if isinstance(value, str):  # use basestring on Python 2
            site[key] = value.strip()
    failed_sites.append(site)
The "Dictionary-like cursor", part of psycopg2.extras, seems what you're looking for.

SELECT * in SQLAlchemy?

Is it possible to do SELECT * in SQLAlchemy?
Specifically, SELECT * WHERE foo=1?
Is no one feeling the ORM love of SQLAlchemy today? The presented answers correctly describe the lower-level interface that SQLAlchemy provides. Just for completeness, this is the more-likely (for me) real-world situation where you have a session instance and a User class that is ORM mapped to the users table.
for user in session.query(User).filter_by(name='jack'):
    print(user)
    # ...
And this does an explicit select on all columns.
The following selection works for me in the core expression language (returning a RowProxy object):
foo_col = sqlalchemy.sql.column('foo')
s = sqlalchemy.sql.select(['*']).where(foo_col == 1)
If you don't list any columns, you get all of them.
query = users.select()
query = query.where(users.c.name == 'jack')
result = conn.execute(query)
for row in result:
    print(row)
Should work.
You can always use raw SQL too:
str_sql = sql.text("YOUR STRING SQL")
# if you have some args:
args = {
    'myarg1': yourarg1,
    'myarg2': yourarg2}
# then call the execute method from your connection
results = conn.execute(str_sql, args).fetchall()
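For instance, a hypothetical concrete version using SQLAlchemy's named bind-parameter syntax (:name) in the text string; the users table and its columns are illustrative:
from sqlalchemy import text

str_sql = text('SELECT * FROM users WHERE name = :name AND age > :min_age')
results = conn.execute(str_sql, {'name': 'jack', 'min_age': 30}).fetchall()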
Where Bar is the class mapped to your table and session is your sa session:
bars = session.query(Bar).filter(Bar.foo == 1)
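Iterating that query then yields mapped Bar instances, for example:
for bar in bars:
    print(bar.foo)  # each bar is a Bar instance, so columns are attributes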
Turns out you can do:
sa.select('*', ...)
I had the same issue: I was trying to get all columns from a table as a list instead of getting ORM objects back, so that I could convert that list to a pandas dataframe and display it.
What works is to use .c on a subquery or cte as follows:
U = select(User).cte('U')
stmt = select(*U.c)
rows = session.execute(stmt)
Then you get a list of tuples with each column.
Another option is to use __table__.columns in the same way:
stmt = select(*User.__table__.columns)
rows = session.execute(stmt)
In case you want to convert the results to a dataframe, here is the one-liner:
pd.DataFrame.from_records(rows, columns=rows.keys())
For joins, if columns are not defined manually, only the columns of the target table are returned. To get all columns for a join (User table joined with Group table):
sql = User.select().select_from(User.join(Group, User.c.group_id == Group.c.id))
# Add all columns of the Group table to the select
sql = sql.column(Group)
session.connection().execute(sql)
If you're using the ORM, you can build a query using the normal ORM constructs and then execute it directly to get raw column values:
query = session.query(User).filter_by(name='jack')
for cols in session.connection().execute(query.statement):
    print(cols)
every_column = User.__table__.columns
records = session.query(*every_column).filter(User.foo==1).all()
When an ORM class is passed to the query function, e.g. query(User), the result will be composed of ORM instances. In the majority of cases, this is what the dev wants and will be easiest to deal with, as demonstrated by the popularity of the answer above that corresponds to this approach.
In some cases, devs may instead want an iterable sequence of plain values. In those cases, one can pass the list of desired column objects to query(). This answer shows how to pass the entire list of columns without hardcoding them, while still working with SQLAlchemy at the ORM layer.
