I am querying a MySQL table (fintech_16) through Python (pymysql) to get the unique values of a column (Trend). When I ran the following query:
cursor.execute ("SELECT DISTINCT `Trend` FROM `fintech_16` ")
cursor.fetchall()
I got the following result:
((u'Investments',),
(u'Expansion',),
(u'New Products',),
(u'Collaboration',),
(u'New Products,Investments',),
(u'New Products,Expansion',),
(u'Expansion,Investments',),
(u'New Products,Collaboration',),
(u'Regulations',),
(u'Investments,New Products',),
(u'Investments,Expansion',),
(u'Collaboration,Investments',),
(u'Expansion,New Products',),
(u'Collaboration,New Products',))
Since some rows have more than one trend, the DB counts each combination as a separate trend.
How should I tweak my query to get only the 5 trends (Investments, Expansion, New Products, Collaboration, Regulations) along with their counts?
Since there are only 5, I could use LIKE '%Investments%' to get each count manually, but I want the code/query to do it.
TIA
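One approach is to use the SET data type to define the exact values; you have 5 - ['New Products', 'Investments', 'Expansion'...].
Then you can use the FIND_IN_SET function to count the rows that contain each value (FIND_IN_SET also works on a plain VARCHAR column, as long as the values are separated by commas without surrounding spaces, as in your data):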
SELECT COUNT(*) FROM `fintech_16` WHERE FIND_IN_SET('New Products', `Trend`) > 0;
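If you would rather not hardcode the five trend names, a minimal sketch is to let MySQL group the distinct strings with their row counts and split them in Python. It assumes, as in your output, that trends are separated by commas; count_trends and fetch_trend_counts are hypothetical helper names, and cursor is your existing pymysql cursor. A trend listed in a row counts once for that row, which matches the FIND_IN_SET count above.

```python
from collections import Counter


def count_trends(rows):
    """Split comma-separated Trend strings and add up their row counts.

    rows: iterable of (trend_string, row_count) pairs.
    """
    counts = Counter()
    for trend_string, row_count in rows:
        if not trend_string:  # skip NULL or empty Trend values
            continue
        # A set, so a trend repeated inside one string counts once per row
        for trend in {part.strip() for part in trend_string.split(",")}:
            if trend:
                counts[trend] += row_count
    return dict(counts)


def fetch_trend_counts(cursor):
    """Hypothetical helper: run the grouped query on a pymysql cursor."""
    cursor.execute("SELECT `Trend`, COUNT(*) FROM `fintech_16` GROUP BY `Trend`")
    return count_trends(cursor.fetchall())
```

Usage: print(fetch_trend_counts(cursor)) returns a dict such as {'Investments': ..., 'Expansion': ...} with one entry per individual trend.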
Related
I am using SQLAlchemy with PostgreSQL.
Tables:
Shape[id, name, timestamp, user_id] # user_id references the id column of the User table
User[id, name]
This query:
query1 = self.session.query(Shape.timestamp, Shape.name, User.id,
extract('day', Shape.timestamp).label('day'),
extract('hour', Shape.timestamp).label('hour'),
func.count(Shape.id).label("total"),
)\
.join(User, User.id==Shape.user_id)\
.filter(Shape.timestamp.between(from_datetime, to_datetime))\
.group_by(Shape.user_id)\
.group_by('hour')\
.all()
This works with SQLite + SQLAlchemy, but not with PostgreSQL + SQLAlchemy.
I get this error: (psycopg2.errors.GroupingError) column "Shape.timestamp" must appear in the GROUP BY clause or be used in an aggregate function
I need to group only by user_id and by the hour of the timestamp, where Shape.timestamp is a Python datetime object.
But the error says I must also add Shape.timestamp to group_by, and if I do, every record is returned as its own group.
If I have to apply a function to the other columns, how do I get their actual data? Is there any way to get the column data as-is without adding it to group_by or wrapping it in a function?
How can I solve this?
This is a basic SQL issue: if a group contains several timestamp values, which one should be returned?
You either need to wrap the column in an aggregate function (COUNT, MIN, MAX, AVG) or list it in your GROUP BY.
NB. SQLite allows selecting non-aggregated columns that are not listed in the GROUP BY clause, in which case "it is evaluated against a single arbitrarily chosen row from within the group." (SQLite doc section 2.4)
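Change the 9th line (.group_by('hour')) to group by the extracted hour expression, and also stop selecting columns that are neither grouped nor aggregated (Shape.timestamp, Shape.name, User.id and the day expression); otherwise PostgreSQL keeps raising the same GroupingError:
from sqlalchemy import extract, func

query1 = self.session.query(Shape.user_id,
                            extract('hour', Shape.timestamp).label('hour'),
                            func.count(Shape.id).label("total"),
                            )\
    .join(User, User.id == Shape.user_id)\
    .filter(Shape.timestamp.between(from_datetime, to_datetime))\
    .group_by(Shape.user_id)\
    .group_by(extract('hour', Shape.timestamp))\
    .all()
# To keep the day as well, add extract('day', Shape.timestamp) to both the
# selected columns and the GROUP BY.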
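To see the rule in isolation, here is a small, self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, with a hypothetical shape table. Every selected column is either grouped or aggregated, so the same query shape is valid in PostgreSQL, where you would write EXTRACT(hour FROM timestamp) instead of SQLite's strftime('%H', ...). MIN and MAX answer the "how do I still see the timestamps" part of the question; in SQLAlchemy they would be func.min(Shape.timestamp) and func.max(Shape.timestamp), and on PostgreSQL func.array_agg(Shape.name) would collect all names of a group.

```python
import sqlite3


def hourly_counts(rows):
    """Group hypothetical shape rows by user and hour of the timestamp.

    rows: iterable of (name, timestamp_text, user_id) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE shape(id INTEGER PRIMARY KEY, name TEXT, "
                "timestamp TEXT, user_id INTEGER)")
    con.executemany(
        "INSERT INTO shape(name, timestamp, user_id) VALUES (?, ?, ?)", rows)
    # Every output column is either in GROUP BY or wrapped in an aggregate.
    result = con.execute("""
        SELECT user_id,
               CAST(strftime('%H', timestamp) AS INTEGER) AS hour,
               COUNT(id) AS total,
               MIN(timestamp) AS first_ts,
               MAX(timestamp) AS last_ts
        FROM shape
        GROUP BY user_id, hour
        ORDER BY user_id, hour""").fetchall()
    con.close()
    return result
```

Each result row is (user_id, hour, total, first_ts, last_ts), so the actual timestamps stay visible without grouping by them.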
I have the following Python code to insert rows into a SQLite db, where the first column is "id" INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE:
import sqlite3 as lite

con = lite.connect('test_score.db')
with con:  # the connection context manager commits on success
    cur = con.cursor()
    cur.execute("INSERT INTO scores VALUES (NULL,?,?,?)", (first, last, score))
    cur.close()
con.close()
I get a table "scores" with the following data:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
Two different users (userA and userB) copy test_score.db and the code to their computers and use them separately.
I get back two test_score.db files, but now with different content:
User A's test_score.db:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
User B's test_score.db:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Chris,Prat,99
5,Tom,Hanks,09
6,Tom,Hanks,15
I was trying to use
insert into AuditRecords select * from toMerge.AuditRecords;
to combine the two dbs into one, but it failed because the first column is a unique id. The two dbs now have the same ids with different or identical data, so the merge fails.
I would like to find the unique rows across both dbs (rows whose values differ, ignoring id) and merge them into one complete db.
The result should look something like this:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
6,Chris,Prat,99
7,Tom,Hanks,09
I could extract and compare each value one by one, but I want to avoid that, since my rows may have more columns in the future.
Sorry if this is an obvious or easy question; I'm still learning. I tried to find the answer but failed, so please point me to it if it already exists somewhere else. Thank you very much for your help.
You need to decide how to resolve duplicate rows. Should the max score win? The min? The first one?
Assuming toMerge.AuditRecords holds all the rows of both user A and user B, you can use GROUP BY to deduplicate rows and an aggregate function to resolve the score. Leave id out of both the GROUP BY and the inserted columns, since the same id means different people in the two files; new ids are then assigned automatically:
insert into
  AuditRecords (first_name, last_name, score)
select
  first_name,
  last_name,
  max(score) as score
from
  toMerge.AuditRecords
group by
  first_name,
  last_name;
For this requirement, you should define a UNIQUE constraint on the combination of the columns first, last and score:
CREATE TABLE AuditRecords(
id INTEGER PRIMARY KEY AUTOINCREMENT,
first TEXT,
last TEXT,
score INTEGER,
UNIQUE(first, last, score)
);
Now you can use INSERT OR IGNORE to merge the tables:
INSERT OR IGNORE INTO AuditRecords(first, last, score)
SELECT first, last, score
FROM toMerge.AuditRecords;
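A minimal sketch of this approach from Python, using an in-memory SQLite database and the sample rows from the question (user B's 09 stored as the integer 9). Both users' rows go through the same INSERT OR IGNORE statement, and the UNIQUE(first, last, score) constraint silently drops the ones already present. In practice rows_b would be read from the other file rather than written as a literal list; merge_with_unique is a hypothetical helper name.

```python
import sqlite3

USER_A = [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88),
          ("Jim", "Green", 91), ("Tom", "Hanks", 15)]
USER_B = [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88),
          ("Chris", "Prat", 99), ("Tom", "Hanks", 9), ("Tom", "Hanks", 15)]


def merge_with_unique(rows_a, rows_b):
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE AuditRecords(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        first TEXT,
        last TEXT,
        score INTEGER,
        UNIQUE(first, last, score))""")
    # id is omitted so SQLite assigns it; rows that repeat an existing
    # (first, last, score) combination are skipped instead of raising.
    sql = "INSERT OR IGNORE INTO AuditRecords(first, last, score) VALUES (?, ?, ?)"
    con.executemany(sql, rows_a)
    con.executemany(sql, rows_b)
    con.commit()
    merged = con.execute(
        "SELECT first, last, score FROM AuditRecords ORDER BY id").fetchall()
    con.close()
    return merged
```

merge_with_unique(USER_A, USER_B) yields user A's five rows followed by Chris Prat 99 and Tom Hanks 9.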
Note that you must explicitly list the columns that receive the values, and id is left out of that list because SQLite assigns it automatically on each insertion.
Another way to do it, without defining the UNIQUE constraint, is to use EXCEPT:
INSERT INTO AuditRecords(first, last, score)
SELECT first, last, score FROM toMerge.AuditRecords
EXCEPT
SELECT first, last, score FROM AuditRecords;
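The EXCEPT query assumes the second database is attached as toMerge. A self-contained sketch of the whole flow with Python's sqlite3 module, where both databases are in memory: user A's rows live in main and user B's in the attached toMerge. With real files you would attach the other file instead, e.g. ATTACH DATABASE 'userB_test_score.db' AS toMerge (a hypothetical path). ATTACH cannot run inside a transaction, which is why it happens before any INSERT here.

```python
import sqlite3

USER_A = [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88),
          ("Jim", "Green", 91), ("Tom", "Hanks", 15)]
USER_B = [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88),
          ("Chris", "Prat", 99), ("Tom", "Hanks", 9), ("Tom", "Hanks", 15)]

SCHEMA = """CREATE TABLE {db}.AuditRecords(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    first TEXT, last TEXT, score INTEGER)"""


def merge_with_except():
    con = sqlite3.connect(":memory:")
    # Attach first: ATTACH is not allowed inside an open transaction.
    con.execute("ATTACH DATABASE ':memory:' AS toMerge")
    for db, rows in (("main", USER_A), ("toMerge", USER_B)):
        con.execute(SCHEMA.format(db=db))
        con.executemany(
            f"INSERT INTO {db}.AuditRecords(first, last, score) VALUES (?, ?, ?)",
            rows)
    # Copy only the rows of toMerge that main does not already contain;
    # id is left out so SQLite assigns fresh values.
    con.execute("""
        INSERT INTO AuditRecords(first, last, score)
        SELECT first, last, score FROM toMerge.AuditRecords
        EXCEPT
        SELECT first, last, score FROM AuditRecords""")
    con.commit()
    merged = con.execute(
        "SELECT id, first, last, score FROM AuditRecords ORDER BY id").fetchall()
    con.close()
    return merged
```

Since EXCEPT does not promise an output order, the two new rows get ids 6 and 7 in whichever order SQLite returns them; add an ORDER BY if the order matters.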
I want to insert values into SQL Server from Python. Here's my code:
for value in rows:
    cursor.execute("""INSERT INTO Table ([ColumnOne]) VALUES (?)""", value)
cnxn.commit()
rows contains lists of values (one list per iteration), something like this:
row 1 contains a list of float numbers:
1.0
2.0
1.5
1.75
..... (1000 values in total in a row/column).
The same goes for row2, row3, and so on.
But when I run the code, I get this error:
pyodbc.ProgrammingError: ('The SQL contains 1 parameter markers, but 1000 parameters were supplied', 'HY000')
Is there any way to keep the float values from being treated individually, or another way to fix this problem?
I think maybe I should use ','.join to turn them into a string?
Since I am not good at explaining and new to Python, please correct me if I have made any mistakes. Thank you.
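When you attempt to insert multiple table rows in one query, you need to supply a parenthesized value list for each row. Your loop passes a whole row (1000 floats) as the parameters of a statement that has a single ? marker, which is exactly what the error message reports.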
For example, the following query would insert two rows:
INSERT INTO Table (ColumnOne) VALUES (1.0), (2.0);
So your Python code needs to build the matching VALUES part of the query:
for row in rows:
    # One "(?)" group per value, e.g. VALUES (?), (?), (?)
    values = ", ".join(("(?)",) * len(row))
    cursor.execute(f"INSERT INTO Table (ColumnOne) VALUES {values}", row)
cnxn.commit()
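One caveat: SQL Server accepts at most 1000 row value expressions in a single INSERT ... VALUES list and at most 2100 parameters per statement, so 1000 values per row sits right at the limit. A minimal sketch that splits each row into chunks before building the statement; build_insert_batches, [MyTable] and [ColumnOne] are hypothetical names, and only the SQL text and parameters are built here, so nothing talks to a database.

```python
def build_insert_batches(values, table="[MyTable]", column="[ColumnOne]",
                         max_rows=1000):
    """Yield (sql, params) pairs that insert each value as its own row.

    table and column are hypothetical, already-bracketed identifiers that
    must come from your own code, never from user input.
    """
    values = list(values)
    for start in range(0, len(values), max_rows):
        chunk = values[start:start + max_rows]
        # One "(?)" row constructor per value: VALUES (?), (?), ...
        placeholders = ", ".join(["(?)"] * len(chunk))
        yield f"INSERT INTO {table} ({column}) VALUES {placeholders}", chunk


# Usage with pyodbc (assumes an open cursor and connection):
# for row in rows:
#     for sql, params in build_insert_batches(row):
#         cursor.execute(sql, params)
# cnxn.commit()
```

Alternatively, pyodbc's cursor.executemany with a single (?) marker and one 1-tuple per value, optionally after setting cursor.fast_executemany = True, sends the same rows without building long VALUES lists.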
I am trying to create a table in MariaDB using Python. I have all the column names stored in a list, as shown below.
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
This is just a sample list; the actual list has 200 items. I am trying to create a table using the collist elements as columns, with VARCHAR as the data type for every column.
This is the code I am using to create the table:
for p in collist:
    cur.execute('CREATE TABLE IF NOT EXISTS table1 ({} VARCHAR(45))'.format(p))
The above code runs, but only the first element of the list is added as a column in the table, and I cannot see the remaining elements. I'd really appreciate some help with this.
Your loop creates table1 with only the first column; every later iteration does nothing because of IF NOT EXISTS. Instead, build a single statement in 3 parts and .join() them together. The middle portion is the column definitions, joining each item of the original list. This doesn't seem particularly healthy, both in the number of columns and in the fact that everything is VARCHAR(45), but that's your decision:
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
query = ''.join(["CREATE TABLE IF NOT EXISTS table1 (",
                 ' VARCHAR(45), '.join(collist),
                 ' VARCHAR(45))'])
cur.execute(query)
Because join only places the separator between items, you need to add the type of the last column separately (the third item in the list) to close the query correctly.
NOTE: If the input data comes from user input, this would be susceptible to SQL injection, since you are just formatting unknown strings into a statement that gets executed. I am assuming the list of column names is internal to your program.
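A slightly more defensive variant, as a sketch: quoting each name with backticks, and doubling any backtick inside it (which is how MariaDB escapes identifiers), keeps unusual column names from breaking the statement. quote_identifier and build_create_table are hypothetical helpers, and col_type is still inserted verbatim, so it must come from your own code.

```python
def quote_identifier(name):
    """Quote a MariaDB/MySQL identifier with backticks, doubling any inside."""
    return "`" + name.replace("`", "``") + "`"


def build_create_table(table, columns, col_type="VARCHAR(45)"):
    """Build one CREATE TABLE statement with a column per name in columns.

    col_type is inserted verbatim, so it must be a trusted literal.
    """
    if not columns:
        raise ValueError("a table needs at least one column")
    column_defs = ", ".join(f"{quote_identifier(c)} {col_type}" for c in columns)
    return f"CREATE TABLE IF NOT EXISTS {quote_identifier(table)} ({column_defs})"


# Usage (assumes an open cursor `cur`):
# cur.execute(build_create_table("table1", collist))
```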
I'm using Python to query a SQL database, and I'm fairly new to databases. I've tried looking up this question, but I can't find one similar enough to get the right answer.
I have a table with multiple columns/rows. I want to find the MAX of a single column, I want ALL columns returned (the entire ROW), and I want only one instance of the MAX. Right now I'm getting ten ROWS returned, because the MAX is repeated ten times. I only want one ROW returned.
The query strings I've tried so far:
sql = 'select max(f) from cbar'
# this returns one ROW, but only a single COLUMN (a single value)
sql = 'select * from cbar where f = (select max(f) from cbar)'
# this returns all COLUMNS, but it also returns multiple ROWS
I've tried a bunch more, but they returned nothing; they weren't right somehow. That's the problem: I'm too new to find the middle ground between my two working queries.
In SQLite 3.7.11 or later, you can just retrieve all columns together with the maximum value; the other columns then come from a row that holds the maximum:
SELECT *, max(f) FROM cbar;
But the SQLite library bundled with your Python might be too old for that. In the general case, you can sort the table by that column and then read just the first row:
SELECT * FROM cbar ORDER BY f DESC LIMIT 1;
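A self-contained sketch with Python's sqlite3 module and a made-up cbar table (the columns id, name and f are assumptions) in which the maximum appears twice, like the repeated MAX in the question. Adding a tie-breaker column to ORDER BY makes the returned row deterministic, which plain ORDER BY f DESC does not guarantee when several rows share the maximum.

```python
import sqlite3


def make_demo_db():
    """Hypothetical cbar table in which the maximum of f occurs twice."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cbar(id INTEGER PRIMARY KEY, name TEXT, f REAL)")
    con.executemany("INSERT INTO cbar(name, f) VALUES (?, ?)",
                    [("a", 5.0), ("b", 7.0), ("c", 7.0), ("d", 1.0)])
    return con


def row_with_max_f(con):
    # Sort descending and keep one full row; id breaks ties deterministically.
    return con.execute("SELECT * FROM cbar ORDER BY f DESC, id LIMIT 1").fetchone()
```

With the demo data, row_with_max_f(make_demo_db()) returns the single row (2, 'b', 7.0), even though two rows hold the maximum.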