I am quite new to SQL and am trying to fix a bug in the output of an SQL query. However, this question is not about the bug, but about why SQLite3 does not raise an error when it should.
I have a query string that looks like:
QueryString = ("SELECT e.event_id, "
"count(e.event_id), "
"e.state, "
"MIN(e.boot_time) AS boot_time, "
"e.time_occurred, "
"COALESCE(e.info, 0) AS info "
"FROM events AS e "
"JOIN leg ON leg.id = e.leg_id "
"GROUP BY e.event_id "
"ORDER BY leg.num_leg DESC, "
"e.event_id ASC;\n"
)
This runs and yields output with no errors.
What I don't understand is why there is no error when I GROUP BY e.event_id, even though e.state and e.time_occurred are neither wrapped in aggregate functions nor part of the GROUP BY clause.
e.state is a string column. e.time_occurred is an integer column.
I am using the QueryString in Python.
In a misguided attempt to be compatible with MySQL, this is allowed. (The non-aggregated column values come from some random row in the group.)
Since SQLite 3.7.11, using min() or max() guarantees that the values in the non-aggregated columns come from the row that has the minimum/maximum value in the group.
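As a minimal sketch of that guarantee, the following uses Python's built-in sqlite3 module. The small events table and its rows are invented for illustration and are not the asker's real schema; it also assumes the SQLite library bundled with Python is 3.7.11 or newer.

```python
import sqlite3

# Hypothetical miniature of the events table; the rows are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event_id INTEGER, state TEXT, boot_time INTEGER)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "late", 30), (1, "early", 10), (1, "middle", 20), (2, "only", 5)],
)

# state is a bare column: it is neither aggregated nor listed in GROUP BY.
# Because the query uses min(), SQLite takes state from the row that holds
# the minimum boot_time within each group.
rows = con.execute(
    "SELECT event_id, state, MIN(boot_time) FROM events "
    "GROUP BY event_id ORDER BY event_id"
).fetchall()
con.close()

print(rows)  # [(1, 'early', 10), (2, 'only', 5)]
```

Without the MIN() call, the value of state for event_id 1 could come from any of its three rows.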
SQLite and MySQL allow bare columns in an aggregation query. This is explained in the documentation:
In the query above, the "a" column is part of the GROUP BY clause and
so each row of the output contains one of the distinct values for "a".
The "c" column is contained within the sum() aggregate function and so
that output column is the sum of all "c" values in rows that have the
same value for "a". But what is the result of the bare column "b"? The
answer is that the "b" result will be the value for "b" in one of the
input rows that form the aggregate. The problem is that you usually do
not know which input row is used to compute "b", and so in many cases
the value for "b" is undefined.
Your particular query is:
SELECT e.event_id, count(e.event_id), e.state, MIN(e.boot_time) AS boot_time,
e.time_occurred, COALESCE(e.info, 0) AS info
FROM events AS e JOIN
leg
ON leg.id = e.leg_id
GROUP BY e.event_id
ORDER BY leg.num_leg DESC, e.event_id ASC;
If e.event_id is the primary key in events, then this syntax is even supported by the ANSI standard, because event_id is sufficient to uniquely define the other columns in a row in events.
If e.event_id is a PRIMARY or UNIQUE key of the table then e.time_occurred is called "functionally dependent" and would not even throw an error in other SQL compliant DBMSs.
However, SQLite has not implemented functional dependency. In the case of SQLite (and MySQL) no error is thrown even for columns that are not functionally dependent on the GROUP BY columns.
SQLite (and MySQL) simply pick an arbitrary row from each group to fill the (in SQLite lingo) "bare column".
I am using SQLAlchemy with PostgreSQL.
Tables:
Shape[id, name, timestamp, user_id]  # user_id refers to the id column in the User table
User[id, name]
This query:
query1 = self.session.query(Shape.timestamp, Shape.name, User.id,
extract('day', Shape.timestamp).label('day'),
extract('hour', Shape.timestamp).label('hour'),
func.count(Shape.id).label("total"),
)\
.join(User, User.id==Shape.user_id)\
.filter(Shape.timestamp.between(from_datetime, to_datetime))\
.group_by(Shape.user_id)\
.group_by('hour')\
.all()
This works well with sqlite3 + SQLAlchemy, but it does not work with PostgreSQL + SQLAlchemy.
I get this error: (psycopg2.errors.GroupingError) column "Shape.timestamp" must appear in the GROUP BY clause or be used in an aggregate function
I need to group only by user_id and by the hour of the timestamp, where Shape.timestamp is a Python datetime object,
but the error tells me to add Shape.timestamp to group_by as well.
If I add Shape.timestamp to group_by, the query returns every record.
If I have to apply some function to the other columns, how do I get their actual data? Is there a way to get a column's value as it is, without adding it to group_by or wrapping it in a function?
How can I solve this?
This is a basic SQL issue: what if a group contains several timestamp values?
You need to either wrap the column in an aggregate function (COUNT, MIN, MAX, AVG) or list it in your GROUP BY.
NB: SQLite allows columns that are neither grouped nor aggregated in the SELECT list of a GROUP BY query, in which case "it is evaluated against a single arbitrarily chosen row from within the group." (SQLite doc section 2.4)
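A rough sketch of a query that follows this rule, written with Python's built-in sqlite3 module rather than SQLAlchemy so that it runs on its own. The shape table and its rows are invented, and strftime('%H', ...) is SQLite's way of taking the hour; in PostgreSQL the same idea would use extract(hour from ...).

```python
import sqlite3

# Hypothetical stand-in for the Shape table; the rows are invented.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE shape (id INTEGER PRIMARY KEY, user_id INTEGER, timestamp TEXT)"
)
con.executemany(
    "INSERT INTO shape (user_id, timestamp) VALUES (?, ?)",
    [
        (1, "2020-01-01 10:05:00"),
        (1, "2020-01-01 10:40:00"),
        (1, "2020-01-01 11:15:00"),
        (2, "2020-01-01 10:30:00"),
    ],
)

# Every selected column is either a GROUP BY expression or an aggregate,
# so no value has to be taken from an arbitrarily chosen row.
rows = con.execute(
    "SELECT user_id, "
    "       strftime('%H', timestamp) AS hour, "
    "       COUNT(id) AS total, "
    "       MIN(timestamp) AS first_seen "
    "FROM shape "
    "GROUP BY user_id, strftime('%H', timestamp) "
    "ORDER BY user_id, hour"
).fetchall()
con.close()

for row in rows:
    print(row)
```

Here the raw timestamp is only reachable through an aggregate such as MIN(); to see every individual timestamp, the query would have to drop the grouping instead.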
Try changing the ninth line of the query so that it groups by the extract expression itself rather than by the label:
query1 = self.session.query(Shape.timestamp, Shape.name, User.id,
extract('day', Shape.timestamp).label('day'),
extract('hour', Shape.timestamp).label('hour'),
func.count(Shape.id).label("total"),
)\
.join(User, User.id==Shape.user_id)\
.filter(Shape.timestamp.between(from_datetime, to_datetime))\
.group_by(Shape.user_id)\
.group_by(extract('hour', Shape.timestamp))\
.all()
I have the following code in python to update db where the first column is "id" INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE:
import sqlite3 as lite

con = lite.connect('test_score.db')
with con:
    cur = con.cursor()
    cur.execute("INSERT INTO scores VALUES (NULL,?,?,?)", (first, last, score))
    item = cur.fetchone()  # always None after an INSERT
    con.commit()
cur.close()
con.close()
I get table "scores" with following data:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
Two different users (userA and userB) copy test_score.db and code to their computer and use it separately.
I get back two copies of test_score.db, now with different content:
user A test_score.db :
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
user B test_score.db :
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Chris,Prat,99
5,Tom,Hanks,09
6,Tom,Hanks,15
I was trying to use
insert into AuditRecords select * from toMerge.AuditRecords;
to combine the two databases into one, but it failed because the first column is a unique id. The two databases now contain the same ids with different (or sometimes the same) data, so the merge fails.
I would like to find unique rows in both db (all values different ignoring id) and merge results to one full db.
Result should be something like this:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
6,Chris,Prat,99
7,Tom,Hanks,09
I can extract each value one by one and compare but want to avoid it as I might have longer rows in the future with more columns.
Sorry if it is obvious and easy questions, I'm still learning. I tried to find the answer but failed, please point me to answer if it already exists somewhere else. Thank you very much for your help.
You need to define how duplicated rows should be resolved. Will you keep the max score? The min? The first one?
Considering the table AuditRecords has all the lines of both User A and B, you can use GROUP BY to deduplicate rows and use an aggregation function to resolve the score:
insert into
AuditRecords
select
id,
first_name,
last_name,
max(score) as score
from
toMerge.AuditRecords
group by
id,
first_name,
last_name;
For this requirement you should have defined a UNIQUE constraint for the combination of the columns first, last and score:
CREATE TABLE AuditRecords(
id INTEGER PRIMARY KEY AUTOINCREMENT,
first TEXT,
last TEXT,
score INTEGER,
UNIQUE(first, last, score)
);
Now you can use INSERT OR IGNORE to merge the tables:
INSERT OR IGNORE INTO AuditRecords(first, last, score)
SELECT first, last, score
FROM toMerge.AuditRecords;
Note that you must explicitly define the list of the columns that will receive the values and in this list the id is missing because its value will be autoincremented by each insertion.
Another way to do it without defining the UNIQUE constraint is to use EXCEPT:
INSERT INTO AuditRecords(first, last, score)
SELECT first, last, score FROM toMerge.AuditRecords
EXCEPT
SELECT first, last, score FROM AuditRecords
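For illustration, here is one way the INSERT OR IGNORE approach could be driven from Python. This is a sketch under assumptions: both databases are in-memory stand-ins filled with the sample rows from the question, whereas with real files the ATTACH statement would name the other user's copy of the database.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE {}AuditRecords ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "first TEXT, last TEXT, score INTEGER, "
    "UNIQUE(first, last, score))"
)
INSERT = "INSERT INTO {}AuditRecords (first, last, score) VALUES (?, ?, ?)"

con = sqlite3.connect(":memory:")
# A second in-memory database stands in for the other user's file.
con.execute("ATTACH DATABASE ':memory:' AS toMerge")
con.execute(SCHEMA.format(""))
con.execute(SCHEMA.format("toMerge."))

user_a = [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88),
          ("Jim", "Green", 91), ("Tom", "Hanks", 15)]
user_b = [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88),
          ("Chris", "Prat", 99), ("Tom", "Hanks", 9), ("Tom", "Hanks", 15)]
con.executemany(INSERT.format(""), user_a)
con.executemany(INSERT.format("toMerge."), user_b)

# Rows that would violate UNIQUE(first, last, score) are skipped silently.
con.execute(
    "INSERT OR IGNORE INTO AuditRecords (first, last, score) "
    "SELECT first, last, score FROM toMerge.AuditRecords"
)
con.commit()

merged = con.execute(
    "SELECT first, last, score FROM AuditRecords ORDER BY first, last, score"
).fetchall()
con.close()

print(len(merged))  # 7
```

The ids of the newly inserted rows are assigned by SQLite, so they should not be expected to match the ids from either source file.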
I am trying to create a table in mariadb using python. I have all the column names stored in a list as shown below.
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
This is just the sample list. Actual list has 200 items in the list. I am trying to create a table using the above collist elements as columns and the datatype for the columns is VARCHAR.
This is the code I am using to create a table
for p in collist:
    cur.execute('CREATE TABLE IF NOT EXISTS table1 ({} VARCHAR(45))'.format(p))
The above code is executing but only the first element of the list is being added as a column in the table and I cannot see the remaining elements. I'd really appreciate if I can get a help with this.
You can build the string in 3 parts and then .join() those together. The middle portion is the column definitions, joining each of the item in the original list. This doesn't seem particularly healthy; both in the number of columns and the fact that everything is VARCHAR(45) but that's your decision:
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
query = ''.join(["CREATE TABLE IF NOT EXISTS table1 (",
                 ' VARCHAR(45), '.join(collist),
                 ' VARCHAR(45))'])
Because we used join, you need to specify the last column type separately (the third item in the list) to correctly close the query.
NOTE: If the input data comes from user input then this would be susceptible to SQL injection since you are just formatting unknown strings in, to be executed. I am assuming the list of column names is internal to your program.
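If the column names might ever come from outside the program, one possible safeguard is to check each name against a conservative pattern before formatting it into the statement. The helper build_create_table and its identifier pattern below are hypothetical, not part of any database driver, and the pattern is stricter than what MariaDB actually permits.

```python
import re

collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']

# Assumption: plain letters, digits and underscores are enough for these names.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_create_table(table, columns, coltype="VARCHAR(45)"):
    """Return a single CREATE TABLE statement covering every column name."""
    for name in [table] + list(columns):
        if not IDENTIFIER.fullmatch(name):
            raise ValueError("unsafe identifier: %r" % name)
    coldefs = ", ".join("{} {}".format(col, coltype) for col in columns)
    return "CREATE TABLE IF NOT EXISTS {} ({})".format(table, coldefs)


query = build_create_table("table1", collist)
print(query)
```

The resulting string can then be passed to cur.execute(query) once, instead of issuing one CREATE TABLE per column.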
I'm using Python to query a SQL database. I'm fairly new to databases. I've tried looking up this question, but I can't find a similar enough question to get the right answer.
I have a table with multiple columns/rows. I want to find the MAX of a single column, I want ALL columns returned (the entire ROW), and I want only one instance of the MAX. Right now I'm getting ten ROWS returned, because the MAX is repeated ten times. I only want one ROW returned.
The query strings I've tried so far:
sql = 'select max(f) from cbar'
# this returns one ROW, but only a single COLUMN (a single value)
sql = 'select * from cbar where f = (select max(f) from cbar)'
# this returns all COLUMNS, but it also returns multiple ROWS
I've tried a bunch more, but they returned nothing. They weren't right somehow. That's the problem, I'm too new to find the middle ground between my two working query statements.
In SQLite 3.7.11 or later, you can just retrieve all columns together with the maximum value:
SELECT *, max(f) FROM cbar;
But the SQLite library bundled with your Python might be too old. In the general case, you can sort the table by that column, and then just read the first row:
SELECT * FROM cbar ORDER BY f DESC LIMIT 1;
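A short sketch of the ORDER BY ... LIMIT 1 variant from Python, which also prints the SQLite version that the sqlite3 module is linked against. The columns and rows of cbar here are invented; only the table name and the column f come from the question.

```python
import sqlite3

# Version of the SQLite library itself, not of the Python module.
print(sqlite3.sqlite_version)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cbar (name TEXT, f REAL)")  # hypothetical columns
con.executemany(
    "INSERT INTO cbar VALUES (?, ?)",
    [("a", 1.5), ("b", 7.0), ("c", 7.0), ("d", 3.2)],
)

# The maximum 7.0 occurs twice, but LIMIT 1 returns a single full row.
rows = con.execute("SELECT * FROM cbar ORDER BY f DESC LIMIT 1").fetchall()
con.close()

print(rows)
```

Which of the tied rows comes back is not specified; adding a second sort key to the ORDER BY would make the choice deterministic.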
All I want is the count from TableA grouped by a column from TableB, but of course I need the item from TableB each count is associated with. Better explained with code:
TableA and B are Model objects.
I'm trying to follow this syntax as best I can.
Trying to run this query:
sq = session.query(TableA).join(TableB).\
group_by(TableB.attrB).subquery()
countA = func.count(sq.c.attrA)
groupB = func.first(sq.c.attrB)
print(session.query(countA, groupB).all())
But it gives me an AttributeError (sq does not have attrB)
I'm new to SA and I find it difficult to learn. (links to recommended educational resources welcome!)
When you make a subquery out of a select statement, the columns that can be accessed from it must be in the columns clause. Take for example a statement like:
select x, y from mytable where z=5
If we wanted to make a subquery, then GROUP BY 'z', this would not be legal SQL:
select * from (select x, y from mytable where z=5) as mysubquery group by mysubquery.z
Because 'z' is not in the columns clause of "mysubquery" (it's also illegal since 'x' and 'y' should be in the GROUP BY as well, but that's a different issue).
SQLAlchemy works the exact same way. When you say query(..).subquery(), or use the alias() function on a core selectable construct, it means you're wrapping your SELECT statement in parentheses, giving it a (usually generated) name, and giving it a new .c. collection that has only those columns that are in the "columns" clause, just like real SQL.
So here you'd need to ensure that TableB, at least the column you're dealing with externally, is available. You can also limit the columns clause to just those columns you need:
sq = session.query(TableA.attrA, TableB.attrB).join(TableB).\
group_by(TableB.attrB).subquery()
countA = func.count(sq.c.attrA)
groupB = func.first(sq.c.attrB)
print(session.query(countA, groupB).all())
Note that the above query probably only works on MySQL, as in general SQL it's illegal to reference any columns that aren't part of an aggregate function, or part of the GROUP BY, when grouping is used. MySQL has a more relaxed (and sloppy) system in this regard.
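For comparison, a version that should be acceptable to stricter databases selects only the grouped column and the aggregate, with no subquery at all. The sketch below shows the plain SQL through Python's sqlite3 module; the table names table_a and table_b, the b_id foreign key and the rows are all assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the TableA and TableB models.
con.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, attrB TEXT)")
con.execute(
    "CREATE TABLE table_a ("
    "id INTEGER PRIMARY KEY, attrA TEXT, b_id INTEGER REFERENCES table_b(id))"
)
con.executemany("INSERT INTO table_b VALUES (?, ?)", [(1, "x"), (2, "y")])
con.executemany(
    "INSERT INTO table_a (attrA, b_id) VALUES (?, ?)",
    [("p", 1), ("q", 1), ("r", 2)],
)

# attrB is the GROUP BY column and attrA only appears inside COUNT(),
# so no bare column is involved.
rows = con.execute(
    "SELECT COUNT(a.attrA), b.attrB "
    "FROM table_a AS a JOIN table_b AS b ON b.id = a.b_id "
    "GROUP BY b.attrB ORDER BY b.attrB"
).fetchall()
con.close()

print(rows)  # [(2, 'x'), (1, 'y')]
```

In SQLAlchemy terms this roughly corresponds to querying func.count(TableA.attrA) and TableB.attrB directly, joined and grouped by TableB.attrB.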
edit: if you want the results without the zeros:
import collections

letter_count = collections.defaultdict(int)
for count, letter in session.query(func.count(MyClass.id), MyClass.attr).group_by(MyClass.attr):
    letter_count[letter] = count
for letter in ["A", "B", "C", "D", "E", ...]:
    print("Letter %s has %d elements" % (letter, letter_count[letter]))
Note that letter_count[someletter] defaults to zero if it was never populated.