python/mysql: SELECTing from multiple tables overwrites duplicate columns in the resulting dictionary

I am coding with Python 3.8.5, and mysql Ver 15.1 Distrib 10.4.11-MariaDB.
I have three tables, customer, partner and customer_partner,
customer has columns customer_id, customer_name, address;
partner has columns partner_id, partner_name, address; (note the address column appears in both tables, but obviously different content)
customer_partner has columns customer_id, partner_id, describing the partnership between one customer and one partner;
I am trying to fetch joined columns of customer and partner for customers whose customer_id is in a list with following python code and SQL statement:
db = connect(...)
cur = db.cursor(dictionary=True)
customer_id_tuple = (1, 2, 3)
sql = f"""SELECT *
FROM customer, partner, customer_partner
WHERE
customer.customer_id IN ({','.join(['%s' for _ in range(len(customer_id_tuple))])})
AND customer.customer_id=customer_partner.customer_id
AND customer_partner.partner_id=partner.partner_id
"""
cur.execute(sql, customer_id_tuple)
data = cur.fetchall()
In the resulting dictionaries in data, I only see one address key. Obviously, the address from the partner table overwrites the one from the customer table.
Besides renaming the columns, is there a cleaner way to avoid this overwriting behavior, such as automatically prefixing each column name with its table name, like customer.address and partner.address?

SELECT * ... may lead to ambiguities when there are conflicting column names.
You should set aliases for conflicting column names.
Also set short aliases for the table names; they shorten the code and make it more readable. Use them to qualify all the column names.
The implicit join syntax that you use was superseded many years ago by explicit join syntax.
Your code should be written like this:
sql = f"""
SELECT c.customer_id, c.customer_name, c.address customer_address,
p.partner_id, p.partner_name, p.address partner_address
FROM customer c
INNER JOIN customer_partner cp ON c.customer_id = cp.customer_id
INNER JOIN partner p ON cp.partner_id = p.partner_id
WHERE c.customer_id IN ({','.join(['%s' for _ in range(len(customer_id_tuple))])})
"""
I left out all the columns of customer_partner from the SELECT list because they are not needed.
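For completeness, here is a minimal usage sketch (assuming the same connector and dictionary=True cursor as in the question); it shows that with the aliases the two addresses no longer collide in the result dictionaries:
cur.execute(sql, customer_id_tuple)
for row in cur.fetchall():
    # each row is a dict keyed by the SELECT aliases, so both addresses survive
    print(row["customer_name"], row["customer_address"],
          row["partner_name"], row["partner_address"])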

Creating a table in MariaDB using a list of column names in Python

I am trying to create a table in mariadb using python. I have all the column names stored in a list as shown below.
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
This is just a sample list; the actual list has 200 items. I am trying to create a table using the above collist elements as columns, with VARCHAR as the datatype for all columns.
This is the code I am using to create a table
for p in collist:
    cur.execute('CREATE TABLE IF NOT EXISTS table1 ({} VARCHAR(45))'.format(p))
The above code executes, but only the first element of the list is added as a column in the table and I cannot see the remaining elements. I'd really appreciate some help with this.
You can build the string in 3 parts and then .join() them together. The middle portion is the column definitions, joining each item in the original list. This doesn't seem particularly healthy, both in the number of columns and in the fact that everything is VARCHAR(45), but that's your decision:
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
query = ''.join(["CREATE TABLE IF NOT EXISTS table1 (",
                 ' VARCHAR(45), '.join(collist),
                 ' VARCHAR(45))'])
Because we used join, you need to specify the last column type separately (the third item in the list) to correctly close the query.
NOTE: If the input data comes from user input then this would be susceptible to SQL injection since you are just formatting unknown strings in, to be executed. I am assuming the list of column names is internal to your program.
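As a quick sanity check, a sketch (assuming the cursor from the question is called cur) that prints the generated statement before executing it:
# With the sample collist the built string expands to:
# CREATE TABLE IF NOT EXISTS table1 (RR VARCHAR(45), ABPm VARCHAR(45),
#   ABPs VARCHAR(45), ABPd VARCHAR(45), HR VARCHAR(45), SPO VARCHAR(45))
print(query)
cur.execute(query)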

How to unpack result of sub-query into list-type field to result of original query in peewee?

How can I make peewee put the ids of related table rows into an additional list-like field in the resulting query?
I want to make a duplicate-detecting manager for media files. For each file on my PC I have a record in the database with fields like
File name, Size, Path, SHA3-512, Perceptual hash, Tags, Comment, Date added, Date changed, etc...
Depending on the situation, I want to use different patterns to decide which records in the table count as duplicates.
In the simplest case I just want to see all records having the same hash, so I write
subq = Record.select(Record.SHA).group_by(Record.SHA).having(peewee.fn.Count() > 1)
subq = subq.alias('jq')
q = Record.select().join(subq, on=(Record.SHA == subq.c.SHA)).order_by(Record.SHA)
for r in q:
    process_record_in_some_way(r)
and everything is fine.
But there are a lot of cases where I want to use different sets of table columns as grouping patterns. In the worst case I use all of them except the id and "Date added" columns to detect exactly duplicated rows in the database (when I have just re-added the same file a few times), which leads to a monster like
subq = Record.select(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).group_by(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).having(peewee.fn.Count() > 1)
subq = subq.alias('jq')
q = Record.select().join(subq, on=((Record.SHA == subq.c.SHA) & (Record.Name == subq.c.Name) & (Record.Date == subq.c.Date) & (Record.Size == subq.c.Size) & (Record.Tags == subq.c.Tags))).order_by(Record.SHA)
for r in q:
    process_record_in_some_way(r)
and this is not the full list of my fields, just an example.
I have to do the same thing for other patterns of field sets, i.e. duplicate the list three times: in the select clause, in the grouping clause of the subquery, and then again in the joining clause.
I wish I could just group the records with the appropriate pattern and have peewee list the ids of all the members of each group in a new list field, like
q=Record.select(Record, SOME_MAJIC.alias('duplicates')).group_by(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).having(peewee.fn.Count() > 1).SOME_ANOTHER_MAJIC
for r in q:
    process_group_of_records(r) # r.duplicates == [23, 44, 45, 56, 100], for example
How can I do this? Listing the same parameters three times makes me feel like I'm doing something wrong.
You can use GROUP_CONCAT (or for postgres, array_agg) to group and concatenate a list of ids/filenames, whatever.
So for files with the same hash:
query = (Record
         .select(Record.sha, fn.GROUP_CONCAT(Record.id).alias('id_list'))
         .group_by(Record.sha)
         .having(fn.COUNT(Record.id) > 1))
This is a relational database. So you're dealing all the time, everywhere, with tables consisting of rows and columns. There's no "nesting". GROUP_CONCAT is about as close as you can get.
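If you then want the actual Python list of ids the question asks for, a small post-processing step is enough; a minimal sketch, assuming the query above and an integer primary key named id:
for row in query:
    # GROUP_CONCAT returns a comma-separated string, so split it back into ints
    duplicates = [int(pk) for pk in row.id_list.split(',')]
    process_group_of_records(duplicates)  # e.g. duplicates == [23, 44, 45, 56, 100]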

SQLAlchemy: Suffix table name to output columns

I have a query where I join multiple tables with similar column names. To disambiguate them, I want to suffix the table name to the column name like: <column_name>_<table_name>. There are hundreds of columns in each table, so I would like to do it programmatically.
Is there a way to do something like?
sa.select([
    table1.c.suffix('_1'),
    table2.c.suffix('_2')
]).select_from(table1.join(table2, table1.c.id == table2.c.id))
You want to use the label keyword:
sa.select([
    table1.c.column_name.label('_1'),
    table2.c.column_name.label('_2')
]).select_from(table1.join(table2, table1.c.id == table2.c.id))
This will allow you to have the same column name from different tables.
If you have a table that is dynamic, or has tons of columns, your best bet will be to do something like this:
pseudo code:
select * from information_schema.columns where table_name = 'my_table'
get the results from a query
return_columns = []
counter = 0
for r in results:
    counter += 1
    return_columns.append("`table_name`.`" + r.column_name + "` as col_{}".format(counter))
Creating dynamic SQL will require you to do a bit of building out. I do this in my application all the time, except I don't use information_schema; I have a table which holds my column names.
This should lead you in the right direction.
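For the programmatic case the question asks about, a minimal sketch (assuming plain SQLAlchemy Core Table objects, and keeping the 1.x select([...]) style used above) that suffixes every column with its table name:
# label every column as <column_name>_<table_name>, then select them all
labeled = [
    col.label("{}_{}".format(col.name, col.table.name))
    for col in list(table1.c) + list(table2.c)
]
query = (
    sa.select(labeled)
    .select_from(table1.join(table2, table1.c.id == table2.c.id))
)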

Python Sqlite, not able to print first line

Sqlite table structure:
id sno
1 100
2 200
3 300
4 400
import sqlite3

conn=sqlite3.connect('test.sqlite')
c=conn.cursor()
c.execute("select * from abc")
mysel=c.execute("select * from abc where [id] = 1 ")
The output is:
1 100
It's not printing id and sno, i.e. the first line of the table.
How can I print the first line of the table along with any kind of selection?
Please help.
ID and sno are not data, they are part of your table structure (the column names).
If you want to get the names of the columns you need to do something like
connection = sqlite3.connect('test.sqlite')
cursor = connection.execute('select * from abc')
names = list(map(lambda x: x[0], cursor.description))
There isn't really a 'first line' containing the column names, that's just something the command line client prints out by default to help you read the returned records.
A dbapi2 conforming cursor has an attribute description, which is a list of tuples containing information about the data returned by the last query. The first element of each tuple will be the name of the column, so to print the column names, you can do something similar to:
c.execute("select * from abc")
print(tuple(d[0] for d in c.description))
for row in c:
    print(row)
This will just print a tuple representation of the names and the records.
If you want to obtain details on the table you can use the following statement
PRAGMA table_info('[your table name]')
This will return a list of tuples, each tuple containing information about a column.
You will still have to combine it with the data collected using the SELECT statement.
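For instance, a minimal sketch, assuming the conn connection from the question:
cur = conn.execute("PRAGMA table_info('abc')")
# each row is (cid, name, type, notnull, dflt_value, pk)
column_names = [row[1] for row in cur.fetchall()]
print(column_names)  # ['id', 'sno']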
When you write ... WHERE id = 1, you get only that particular record.
If you want to also get the first record, you have to tell SQLite that you want it:
SELECT id, sno FROM abc WHERE id = 'id'
UNION ALL
SELECT id, sno FROM abc WHERE id = 1
And when you already know what this particular subquery returns, you do not even need to bother with searching the table (and thus do not need to actually store the column names in the table):
SELECT 'id', 'sno'
UNION ALL
SELECT id, sno FROM abc WHERE id = 1

How to find duplicates in MySQL

Suppose I have a table with many columns. If the values in two particular columns match exactly across rows, then those rows are duplicates.
ID | title | link | size | author
Suppose that if link and size are the same for 2 or more rows, then those rows are duplicates.
How do I get those duplicates into a list and process them?
This will return all records that have dups:
SELECT theTable.*
FROM theTable
INNER JOIN (
SELECT link, size
FROM theTable
GROUP BY link, size
HAVING count(ID) > 1
) dups ON theTable.link = dups.link AND theTable.size = dups.size
I like the subquery b/c I can do things like select all but the first or last. (very easy to turn into a delete query then).
Example: select all duplicate records EXCEPT the one with the max ID:
SELECT theTable.*
FROM theTable
INNER JOIN (
SELECT link, size, max(ID) as maxID
FROM theTable
GROUP BY link, size
HAVING count(ID) > 1
) dups ON theTable.link = dups.link
AND theTable.size = dups.size
AND theTable.ID <> dups.maxID
Assuming that none of id, link or size can be NULL, and that the id field is the primary key, this gives you the ids of duplicate rows. Beware that the same id can appear in the results several times if there are three or more rows with identical link and size values.
select a.id, b.id
from tbl a, tbl b
where a.id < b.id
and a.link = b.link
and a.size = b.size
After you remove the duplicates from the MySQL table, you can add a unique index
to the table so no more duplicates can be inserted:
create unique index theTable_index on theTable (link,size);
If you want to do it exclusively in SQL, some kind of self-join of the table (on equality of link and size) is required, and can be accompanied by different kinds of elaboration. Since you mention Python as well, I assume you want to do the processing in Python; in that case, the simplest approach is to build an iterator on a SELECT * FROM thetable ORDER BY link, size query and process it with itertools.groupby, using operator.itemgetter for those two fields as the key; this will present natural groupings of each bunch of 1+ rows with identical values for the fields in question.
I can elaborate on either option if you clarify where you want to do your processing and ideally provide an example of the kind of processing you DO want to perform!
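As a rough illustration of the groupby route (a sketch only, assuming a DB-API cursor named cur and rows laid out as in the question, with link and size at positions 2 and 3):
import itertools
import operator

cur.execute("SELECT * FROM thetable ORDER BY link, size")
rows = cur.fetchall()

key = operator.itemgetter(2, 3)  # (link, size) positions in each row tuple
for (link, size), group in itertools.groupby(rows, key=key):
    dups = list(group)
    if len(dups) > 1:
        # process_duplicates is a hypothetical handler for each group of duplicates
        process_duplicates(dups)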
