I would like to calculate the average of a specific column in my table, grouped by another column, but it doesn't work; I have a problem with the GROUP BY.
My code:
import sqlite3
conn = sqlite3.connect("ma_base.db")
cur = conn.cursor()
cur.execute("UPDATE test_centrale set avg_price = avg(prix) group by test_centrale.version ")
conn.commit()
print('done')
cur.close()
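If I understood your question correctly, use a correlated subquery to compute the average for each version and update with the following query. Without the WHERE clause inside the subquery, it would return one average per group and only a single value would be used for every row.
cur.execute("UPDATE test_centrale SET avg_price = (SELECT avg(t2.prix) FROM test_centrale AS t2 WHERE t2.version = test_centrale.version)")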
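A minimal sketch of a per-group average update, run against an in-memory SQLite database so it can be tried without touching ma_base.db. The table layout and the sample rows are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test_centrale (version TEXT, prix REAL, avg_price REAL)")
cur.executemany(
    "INSERT INTO test_centrale (version, prix) VALUES (?, ?)",
    [("a", 10.0), ("a", 20.0), ("b", 30.0)],
)

# Correlated subquery: for each row, average prix over rows sharing its version.
cur.execute("""
    UPDATE test_centrale
    SET avg_price = (
        SELECT AVG(t2.prix)
        FROM test_centrale AS t2
        WHERE t2.version = test_centrale.version
    )
""")
conn.commit()

rows = cur.execute(
    "SELECT version, prix, avg_price FROM test_centrale ORDER BY version, prix"
).fetchall()
print(rows)
```

Each row ends up with the average of its own version group rather than one value shared by the whole table.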
I just started learning Python and am trying out projects. I'm having a problem with my Python code. What I want to do is go through the rows in my table one after another and perform a specific task on each row.
For example, I have rows with multiple columns, and three of the columns are con_access, exam and total. I want to add the values in the con_access and exam columns and put the sum in the total column. This calculation should be done row by row.
The problem I'm having is that the program goes to the last row, takes its total, and populates every other row with it.
Below is my code:
def result_total():
    mdb = mysql.connector.connect(
        host="localhost",
        user="root",
        passwd="**************",
        database="majesty"
    )
    mycursor = mdb.cursor()
    # mycursor.execute("SELECT con_access, exam FROM students")
    mycursor.execute("SELECT CAST(con_access AS UNSIGNED) as con_access, CAST(exam AS UNSIGNED) as exam FROM students")
    rows = mycursor.fetchall()
    for row in rows:
        if row:
            total = row['con_access'] + row['exam']
            sql = "UPDATE students SET total = {}".format(total)
            mycursor.execute(sql)
    mdb.commit()
Compute the total like this:
total = row['con_access'] + row['exam']
If the data type of con_access and exam in the MySQL table is VARCHAR, you should cast them. Change the SELECT query as follows:
SELECT CAST(con_access AS UNSIGNED) as con_access, CAST(exam AS UNSIGNED) as exam FROM students
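The symptom described in the question (every row ending up with the last total) is what an UPDATE without a WHERE clause produces, since such a statement rewrites all rows on each iteration. One way around it is to let the database compute the sum per row in a single statement. The sketch below uses the standard-library sqlite3 module as a stand-in for MySQL, so the `id` column, the sample rows, and `CAST(... AS INTEGER)` (the SQLite counterpart of `CAST(... AS UNSIGNED)`) are assumptions for illustration.

```python
import sqlite3

# sqlite3 stands in for the MySQL connection; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, con_access TEXT, exam TEXT, total INTEGER)"
)
cur.executemany(
    "INSERT INTO students (con_access, exam) VALUES (?, ?)",
    [("10", "50"), ("20", "60")],
)

# One set-based UPDATE: the expression is evaluated separately for every row,
# so no Python loop and no WHERE clause are needed.
cur.execute(
    "UPDATE students "
    "SET total = CAST(con_access AS INTEGER) + CAST(exam AS INTEGER)"
)
conn.commit()

totals = cur.execute("SELECT id, total FROM students ORDER BY id").fetchall()
print(totals)
```

If the loop has to stay, each UPDATE would need a WHERE clause on a unique key so that it only touches the row it was computed from.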
Why does my code only insert one row?
thewholeenchilada = ("SELECT SUBSTR(email, (SELECT INSTR(email,'#'))) AS org, SUM(count) as count FROM Em GROUP BY org ORDER BY count DESC")
for salida in cur.execute(thewholeenchilada):
    cur.execute('''INSERT INTO Counts (org, count)
        VALUES (?, ?)''', (salida[0],row[1]))
    print((str(salida[0]), salida[1]))
conn.commit()
Avoid the loop and run one INSERT INTO ... SELECT query. Right now you reuse the same cursor outside and inside the loop, which causes issues with processing: executing the INSERT on that cursor discards the pending SELECT results. Either use two different cursors or, more efficiently, combine the two statements and have the database engine run the action query:
sql = '''INSERT INTO Counts (org, [count])
SELECT SUBSTR(email, INSTR(email, '#')+1) AS org,
SUM(count) as [count]
FROM Em
GROUP BY org
ORDER BY count DESC
'''
cur.execute(sql)
conn.commit()
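For comparison, the two-cursor alternative mentioned above could look roughly like this. It is only a sketch: the in-memory database, the sample addresses, the `'@'` separator, and the names `read_cur`, `write_cur` and `total` are assumptions, not part of the original code.

```python
import sqlite3

# Hypothetical schema and data, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Em (email TEXT, count INTEGER)")
conn.execute("CREATE TABLE Counts (org TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO Em VALUES (?, ?)",
    [("ann@example.org", 2), ("bob@example.org", 3), ("cat@sample.net", 1)],
)

# One cursor iterates over the SELECT, a second one performs the INSERTs,
# so running the INSERT does not reset the result set being read.
read_cur = conn.cursor()
write_cur = conn.cursor()
select_sql = """
    SELECT SUBSTR(email, INSTR(email, '@') + 1) AS org, SUM(count) AS total
    FROM Em
    GROUP BY org
    ORDER BY total DESC
"""
for org, total in read_cur.execute(select_sql):
    write_cur.execute("INSERT INTO Counts (org, count) VALUES (?, ?)", (org, total))
conn.commit()

counts = conn.execute("SELECT org, count FROM Counts ORDER BY count DESC").fetchall()
print(counts)
```

The single INSERT INTO ... SELECT statement remains the simpler choice, since it avoids moving every row through Python.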
I have the following code to calculate a value in specific rows of my table:
cursor.execute("SELECT * FROM restaurants WHERE license_type_code='20' ORDER BY general_score DESC;")
group_size = cursor.rowcount
for record in cursor:
    index = cursor.rownumber
    percentile = 100*(index - 0.5)/group_size
    print(percentile)
What I need to do is to add the percentile result to the respective column score_percentile of each record I got with the SELECT query.
I thought about an UPDATE query like this:
cursor.execute("UPDATE restaurants SET score_percentile="+str(percentile)+" WHERE license_type_code IN (SELECT * FROM restaurants WHERE license_type_code='20' ORDER BY general_score DESC)")
But I don't know if that query is correct or if there's a more efficient and less silly way to do that (I'm sure there has to be).
Could you help me, please?
I'm new with SQL so any help or advice is highly appreciated.
Thanks!
You don't need the loop at all; a single UPDATE query is enough:
cursor.execute("UPDATE restaurants SET score_percentile = 100*(rownumber - 0.5)/group_size FROM (SELECT COUNT (*) as group_size FROM restaurants WHERE license_type_code='20') as t WHERE restaurants.license_type_code='20'")
As Thomas said, I just needed an update query with the following syntax:
cursor.execute("UPDATE restaurants f SET score_percentile = ROUND(100*(f2.rownumber - 0.5)/"+str(group_size)+",3) FROM (SELECT f2.*,row_number() OVER (ORDER BY general_score DESC) as rownumber FROM restaurants f2 WHERE license_type_code='20') f2 WHERE f.license_type_code='20' AND f2.license_number=f.license_number;")
And I got the group_size by:
cursor.execute("SELECT COUNT(*) FROM restaurants WHERE license_type_code='20'")
group_size = cursor.fetchone()
group_size = group_size[0]
That worked perfectly for my case.
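The row number and the group size can also be computed in the same statement with window functions, which removes the separate COUNT(*) query and the string concatenation. The following is a sketch in SQLite via the standard-library sqlite3 module, not the database used in the thread; it assumes SQLite 3.25 or newer (window functions), that license_number uniquely identifies a row, and a handful of made-up rows.

```python
import sqlite3

# Hypothetical table contents; only the column names come from the thread.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE restaurants ("
    "license_number TEXT PRIMARY KEY, license_type_code TEXT, "
    "general_score REAL, score_percentile REAL)"
)
cur.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?, NULL)",
    [("A", "20", 90.0), ("B", "20", 80.0), ("C", "20", 70.0),
     ("D", "20", 60.0), ("E", "10", 50.0)],
)

# ROW_NUMBER() ranks rows by score; COUNT(*) OVER () is the size of the group.
cur.execute("""
    WITH ranked AS (
        SELECT license_number,
               ROW_NUMBER() OVER (ORDER BY general_score DESC) AS rn,
               COUNT(*) OVER () AS group_size
        FROM restaurants
        WHERE license_type_code = ?
    )
    UPDATE restaurants
    SET score_percentile = (
        SELECT ROUND(100.0 * (rn - 0.5) / group_size, 3)
        FROM ranked
        WHERE ranked.license_number = restaurants.license_number
    )
    WHERE license_type_code = ?
""", ("20", "20"))
conn.commit()

percentiles = dict(
    cur.execute("SELECT license_number, score_percentile FROM restaurants")
)
print(percentiles)
```

Passing the license type as a bound parameter also avoids building the SQL text by hand.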
I am trying to take a dataframe and write it to SQL. I create the table first so that I can set a unique index, which allows for a rolling update without duplicates if there happen to be two A. Rods over time. However, I can't seem to shake this table column error and I don't know why.
import pandas as pd
import sqlite3 as sq
conn = sq.connect('test.db')
c = conn.cursor()
def set_table():
    c.execute("""CREATE TABLE IF NOT EXISTS players(
        "#" INTEGER,
        " " REAL,
        "Named" TEXT,
        "B/T" TEXT,
        "Ht" TEXT,
        "Wt" TEXT,
        "DOB" TEXT);""")
    conn.commit()
def set_index_table():
    c.execute(""" CREATE UNIQUE INDEX index_unique
        ON players (Named, DOB)""")
    conn.commit()
set_table()
set_index_table()
roster_active = pd.read_html('http://m.yankees.mlb.com/roster',index_col=0)
df = roster_active[0]
df = df.rename(columns={'Name': 'Named'})
df.to_sql('players', conn, if_exists='append')
conn.commit()
conn.close()
sqlite3.OperationalError: table players has no column named
Thank you for your time.
So I am not completely sure why this doesn't work but I found how I could get it to work. I believe it had something to do with the dataframe index. So I defined what columns I wanted to select for the dataframe and that worked.
df = df[['Named','B/T', 'Ht','Wt','DOB']]
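One way to diagnose a "has no column named" error is to compare the columns the table actually has with the names about to be inserted. The sketch below uses only the standard-library sqlite3 module; the `incoming` list is a hypothetical stand-in for the dataframe's column names plus an index column, and the reduced table definition is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE players ("Named" TEXT, "B/T" TEXT, "Ht" TEXT, "Wt" TEXT, "DOB" TEXT)'
)

# PRAGMA table_info returns one row per column; position 1 holds the column name.
table_columns = [row[1] for row in conn.execute("PRAGMA table_info(players)")]

# Hypothetical incoming names, e.g. a dataframe's columns plus its index.
incoming = ["index", "Named", "B/T", "Ht", "Wt", "DOB"]

# Any name listed here would trigger the "no column named" error on insert.
missing = [name for name in incoming if name not in table_columns]
print(missing)
```

Selecting only the matching columns, as in the line above, makes the inserted names line up with the table.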
I have 6 tables in my SQLite database, each table with 6 columns(Date, user, NormalA, specialA, contact, remarks) and 1000+ rows.
How can I use sqlalchemy to sort through the Date column to look for duplicate dates, and delete that row?
Assuming this is your model:
class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    user = Column(String)
    # do not really care of columns other than `id` and `date`
    # important here is the fact that `id` is a PK
The following are two ways to delete your data:
Find duplicates, mark them for deletion and commit the transaction
Create a single SQL query which will perform deletion on the database directly.
For both of them a helper sub-query will be used:
# helper subquery: find first row (by primary key) for each unique date
subq = (
    session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
    .group_by(MyTable.date)
).subquery('date_min_id')
Option-1: Find duplicates, mark them for deletion and commit the transaction
# query to find all duplicates
q_duplicates = (
    session
    .query(MyTable)
    .join(subq, and_(
        MyTable.date == subq.c.date,
        MyTable.id != subq.c.min_id)
    )
)
for x in q_duplicates:
    print("Will delete %s" % x)
    session.delete(x)
session.commit()
Option-2: Create a single SQL query which will perform deletion on the database directly
sq = (
    session
    .query(MyTable.id)
    .join(subq, and_(
        MyTable.date == subq.c.date,
        MyTable.id != subq.c.min_id)
    )
).subquery("subq")
dq = (
    session
    .query(MyTable)
    .filter(MyTable.id.in_(sq))
).delete(synchronize_session=False)
Inspired by the question "Find duplicate values in SQL table", this might help you select duplicate dates:
query = session.query(
    MyTable
).\
    having(func.count(MyTable.date) > 1).\
    group_by(MyTable.date).all()
If you only want to show unique dates, DISTINCT ON might be what you need.
While I like the whole object-oriented approach of SQLAlchemy, sometimes I find it easier to use SQL directly.
And since the records don't have a key, we need the row number (_ROWID_) to delete the targeted records and I don't think the API provides it.
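To illustrate what the implicit row identifier offers, here is a small sketch using the standard-library sqlite3 module rather than SQLAlchemy. The two-column table and its rows are made up for the example; `rowid` is another name SQLite accepts for `_ROWID_`.

```python
import sqlite3

# Hypothetical table without an explicit primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Date TEXT, user TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [("2016-12-18", "first"), ("2016-12-18", "second"), ("2015-12-18", "only")],
)

# SQLite assigns an implicit rowid to every row of an ordinary table.
rows_with_id = conn.execute(
    "SELECT rowid, Date, user FROM TableA ORDER BY rowid"
).fetchall()

# The smallest rowid per date identifies the first inserted row of each group.
first_per_date = conn.execute(
    "SELECT MIN(rowid) FROM TableA GROUP BY Date ORDER BY 1"
).fetchall()
print(rows_with_id, first_per_date)
```

Rows whose rowid is not in that second result are the later duplicates.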
So first we connect to the database:
from sqlalchemy import create_engine
db = create_engine(r'sqlite:///C:\temp\example.db')
eng = db.engine
Then to list all the records:
for row in eng.execute("SELECT * FROM TableA;"):
    print(row)
And to display all the duplicated records where the dates are identical:
for row in eng.execute("""
    SELECT * FROM {table}
    WHERE {field} IN (SELECT {field} FROM {table} GROUP BY {field} HAVING COUNT(*) > 1)
    ORDER BY {field};
    """.format(table="TableA", field="Date")):
    print(row)
Now that we identified all the duplicates, they probably need to be fixed if the other fields are different:
eng.execute("UPDATE TableA SET NormalA=18, specialA=20 WHERE Date = '2016-18-12' ;");
eng.execute("UPDATE TableA SET NormalA=4, specialA=8 WHERE Date = '2015-18-12' ;");
And finally, to keep the first inserted record and delete the more recent duplicates:
print(eng.execute("""
    DELETE FROM {table}
    WHERE _ROWID_ NOT IN (SELECT MIN(_ROWID_) FROM {table} GROUP BY {field});
    """.format(table="TableA", field="Date")).rowcount)
Or, to keep the last inserted record and delete the other duplicates:
print(eng.execute("""
    DELETE FROM {table}
    WHERE _ROWID_ NOT IN (SELECT MAX(_ROWID_) FROM {table} GROUP BY {field});
    """.format(table="TableA", field="Date")).rowcount)