I have two tables in MySQL that store amounts, say amounts0 and amounts1. Each table stores a name and an amount. In amounts0 we have 'James', 50 and 'Paul', 75. In amounts1 we have 'James', 25, 'Paul', 50 and 'James', 10. I have a text file and I want a result like this:
What this is doing: for each name, it takes the amount from the first table and subtracts the sum of that name's amounts in the second table. For example, for James in the picture above: 50 - sum(25, 10) = 50 - 35 = 15.
Can someone help me to create a query that does that with the 2 tables and then write them to the text file?
I think you can use this:
SELECT NAME,
       Abs(Sum(t1.amount) - t2.amount) AS amount
FROM table1 AS t1
JOIN (SELECT Sum(`amount`) AS amount,
             `NAME` AS t2name
      FROM table2
      GROUP BY `NAME`) AS t2
  ON t2.t2name = t1.NAME
GROUP BY NAME
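The question also asks how to write the result to a text file, which the query alone does not cover. Below is a minimal sketch of both steps. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own. The table names amounts0 and amounts1 come from the question; the output path and the "name: amount" line format are assumptions, since the expected file layout is not shown.

```python
import os
import sqlite3
import tempfile

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE amounts0 (name TEXT, amount INTEGER);
    CREATE TABLE amounts1 (name TEXT, amount INTEGER);
    INSERT INTO amounts0 VALUES ('James', 50), ('Paul', 75);
    INSERT INTO amounts1 VALUES ('James', 25), ('Paul', 50), ('James', 10);
""")

# Sum the second table per name first, then subtract that total from the
# first table's amount. COALESCE covers names missing from amounts1.
rows = conn.execute("""
    SELECT a0.name, a0.amount - COALESCE(s.total, 0) AS amount
    FROM amounts0 AS a0
    LEFT JOIN (SELECT name, SUM(amount) AS total
               FROM amounts1
               GROUP BY name) AS s
           ON s.name = a0.name
    ORDER BY a0.name
""").fetchall()

# Hypothetical output location and line format.
out_path = os.path.join(tempfile.gettempdir(), "amounts.txt")
with open(out_path, "w") as fh:
    for name, amount in rows:
        fh.write(f"{name}: {amount}\n")
```

Aggregating amounts1 in a subquery before joining keeps each name to one row, so the amount from amounts0 is never counted more than once.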
I am trying to do a simple select-all query in Python using the cx_Oracle module. When I select the first ten rows in a table I am able to print out the output. However, when I select the first ten rows for a specific date in the table, all that gets printed out is an empty list like this: [].
Here is the select-all query that prints out all the results:
sql_query = "select * from table_name fetch first 10 rows only"
cur = db_eng.OpenCursor()
db_eng.ExecuteQuery(cur, sql_query)
result = db_eng.FetchResults(cur)
print(result)
The above query works and is able to print out the results.
Here is the query that I am having trouble with. The same query works in SQL Developer:
sql_query = "select * from table_name where requested_time = '01-jul-2021' fetch first 10 rows only"
cur = db_eng.OpenCursor()
db_eng.ExecuteQuery(cur, sql_query)
result = db_eng.FetchResults(cur)
print(result)
I also tried this way where I define the date outside of the query.
specific_date = '01-jul-2021'
sql_query = "select * from table_name where requested_time = '{0}' fetch first 10 rows only".format(specific_date)
cur = db_eng.OpenCursor()
db_eng.ExecuteQuery(cur, sql_query)
result = db_eng.FetchResults(cur)
print(result)
Oracle dates have a time portion. The query
select * from table_name where requested_time = '01-jul-2021' fetch first 10 rows only
will only give you the rows for which the value of the column requested_time is 01-jul-2021 00:00. Chances are that your rows for that day have a non-midnight time portion, so they do not match.
To cut off the time portion there are several options. Note that I explicitly added a TO_DATE function to the date: you're assuming that the database expects a dd-mon-yyyy format and will successfully do the implicit conversion, but it's safer to let the database know.
Use TRUNC to truncate the column; this removes the time portion:
SELECT *
FROM table_name
WHERE TRUNC(requested_time) = TO_DATE('01-jul-2021','DD-mon-YYYY')
FETCH FIRST 10 ROWS ONLY
Format the column date to the same format as the date you supplied and compare the resulting string:
SELECT *
FROM table_name
WHERE TO_CHAR(requested_time,'DD-mon-YYYY') = '01-jul-2021'
FETCH FIRST 10 ROWS ONLY
Example:
pdb1--KOEN>create table test_tab(requested_time DATE);
Table TEST_TAB created.
pdb1--KOEN>BEGIN
2 INSERT INTO test_tab(requested_time) VALUES (TO_DATE('08-AUG-2021 00:00','DD-MON-YYYY HH24:MI'));
3 INSERT INTO test_tab(requested_time) VALUES (TO_DATE('08-AUG-2021 01:00','DD-MON-YYYY HH24:MI'));
4 INSERT INTO test_tab(requested_time) VALUES (TO_DATE('08-AUG-2021 02:10','DD-MON-YYYY HH24:MI'));
5 END;
6 /
PL/SQL procedure successfully completed.
pdb1--KOEN>SELECT COUNT(*) FROM test_tab WHERE requested_time = TO_DATE('08-AUG-2021','DD-MON-YYYY');
COUNT(*)
----------
1
--only 1 row: the row with time 00:00. The other rows are ignored
pdb1--KOEN>SELECT COUNT(*) FROM test_tab WHERE TRUNC(requested_time) = TO_DATE('08-AUG-2021','DD-MON-YYYY');
-- all rows
COUNT(*)
----------
3
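On the Python side, a third option is to pass real datetime values as bind variables and compare against a half-open range covering the whole day. This is only a sketch: the helper name day_range_query is hypothetical, and the question's db_eng wrapper is not shown, so the execute call is written in a comment for a plain cx_Oracle cursor.

```python
from datetime import date, datetime, timedelta


def day_range_query(day, table="table_name", column="requested_time"):
    """Build a query matching every row whose DATE column falls on `day`.

    Hypothetical helper: returns the SQL text and a dict of named binds.
    """
    start = datetime(day.year, day.month, day.day)  # midnight of that day
    end = start + timedelta(days=1)                 # midnight of the next day
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {column} >= :start_ts AND {column} < :end_ts "
        "FETCH FIRST 10 ROWS ONLY"
    )
    return sql, {"start_ts": start, "end_ts": end}


sql, binds = day_range_query(date(2021, 7, 1))
# With a plain cx_Oracle cursor this would be executed as:
#     cursor.execute(sql, binds)
```

Because the column itself is not wrapped in TRUNC or TO_CHAR, an ordinary index on requested_time can still be used, and no date-format string has to match the session's NLS settings.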
I have the following table in my database, which represents the shifts of a working day.
When a new product is added to another table, 'Products', I want to assign a shift to it based on its start_timestamp.
So when I insert into Products, it should take the start_timestamp, look in table ProductionPlan, and find the row (ProductionPlan.name) whose start and end timestamps enclose it.
That way I can assign a shift to the product.
I hope somebody can help me out with this!
Table ProductionPlan
name | start_timestamp | end_timestamp
shift 1 | 2021-05-10T07:00:00 | 2021-05-10T11:00:00
shift 2 | 2021-05-10T11:00:00 | 2021-05-10T15:00:00
shift 3 | 2021-05-10T15:00:00 | 2021-05-10T19:00:00
shift 1 | 2021-05-11T07:00:00 | 2021-05-11T11:00:00
shift 2 | 2021-05-11T11:00:00 | 2021-05-11T15:00:00
shift 3 | 2021-05-11T15:00:00 | 2021-05-11T19:00:00
Table Products
id | name | start_timestamp | end_timestamp | shift
1 | Schroef | 2021-05-10T08:09:05 | 2021-05-10T08:19:05 |
2 | Bout | 2021-05-10T08:20:08 | 2021-04-28T08:30:11 |
3 | Schroef | 2021-05-10T12:09:12 | 2021-04-28T12:30:15 |
I have the following code to insert into Products:
def insertNewProduct(self, log):
    """
    This function is used to insert a new product into the database.
    #param log : a object to log
    #return None.
    """
    debug("Class: SQLite, function: insertNewProduct")
    self.__openDB()
    timestampStart = datetime.fromtimestamp(int(log.startTime)).isoformat()
    queryToExecute = "INSERT INTO Products (name, start_timestamp) VALUES('{0}','{1}')".format(log.summary,
                                                                                                timestampStart)
    self.cur.execute(queryToExecute)
    self.__closeDB()
    return self.cur.lastrowid
It's just a simple INSERT INTO but I want to add a query or even extend this query to fill in the column shift.
You can use a SELECT inside an INSERT.
queryToExecute = """INSERT INTO Products (name, start_timestamp, shift)
    SELECT :1, :2, name FROM ProductionPlan pp
    WHERE :2 BETWEEN pp.start_timestamp and pp.end_timestamp"""
self.cur.execute(queryToExecute, (log.summary, timestampStart))
In the above code I have used a parameterized query because I hate inserting parameters as strings inside a query. That practice has been the cause of too many SQL injection attacks...
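One detail worth checking: BETWEEN is inclusive at both ends, and in the sample data each shift ends exactly when the next one starts, so a product starting at 11:00:00 would match two shifts and be inserted twice. The sketch below, which assumes SQLite (as the question's class name suggests) and uses an in-memory database with the question's first three shifts, swaps BETWEEN for a half-open comparison and uses named parameters. The function name insert_product is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductionPlan (name TEXT, start_timestamp TEXT, end_timestamp TEXT);
    CREATE TABLE Products (
        id INTEGER PRIMARY KEY, name TEXT,
        start_timestamp TEXT, end_timestamp TEXT, shift TEXT
    );
    INSERT INTO ProductionPlan VALUES
        ('shift 1', '2021-05-10T07:00:00', '2021-05-10T11:00:00'),
        ('shift 2', '2021-05-10T11:00:00', '2021-05-10T15:00:00'),
        ('shift 3', '2021-05-10T15:00:00', '2021-05-10T19:00:00');
""")


def insert_product(conn, name, start_iso):
    """Insert a product and look up its shift in the same statement."""
    conn.execute(
        """INSERT INTO Products (name, start_timestamp, shift)
           SELECT :name, :start, pp.name
           FROM ProductionPlan pp
           WHERE :start >= pp.start_timestamp
             AND :start <  pp.end_timestamp""",
        {"name": name, "start": start_iso},
    )


insert_product(conn, "Schroef", "2021-05-10T08:09:05")
insert_product(conn, "Bout", "2021-05-10T11:00:00")  # exactly on a shift boundary
rows = conn.execute("SELECT name, shift FROM Products ORDER BY id").fetchall()
```

ISO-8601 strings in this fixed format sort in chronological order, which is why plain string comparison works here. Note that if no shift covers the timestamp, the SELECT returns nothing and no product row is inserted at all.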
I have a Hive table wherein data looks like this -
Each customer has corresponding accounts and the objective is to make intra-customer pairs.
Pairs are based on whether the accounts have the same year of birth or the same first 3 characters of the name.
E.g. Sam and Samuel.
The output looks like this -
Ideally, same-account pairs like AA, XX etc. should not get created.
Also, the pairs AC and CA are the same, hence only one entry of such pairs is needed. A pair can be formed on both the Name key and the Year of Birth key, but here also only one entry is required (it can be either one).
How should I approach this problem?
Test data for check -
create table customer_account(
    customer INT NOT NULL,
    accounts VARCHAR(100) NOT NULL,
    name VARCHAR(40) NOT NULL,
    yob INT
);
INSERT INTO
    customer_account(customer,accounts,name,yob)
VALUES
    (1,"A","John",2001),
    (1,"X","Tom",1996),
    (1,"C","Harry",2001),
    (2,"D","Sam",1994),
    (2,"F","Samuel",1995),
    (3,"Z","Jake",1994),
    (3,"G","Drake",1998),
    (3,"H","Arnold",1993),
    (3,"K","Yang",1990)
;
You should be able to use substrings for your join in Hive. The logic should be sound, though you may need to tune it a bit for your needs.
What you're trying to do is a unary (or self) join. Below is an example of the type of query that can be used. You're essentially joining with an OR condition and testing that condition with a CASE statement to get the "Pair_Key". I used an inner join, assuming you want only instances where matches occur.
SELECT
    t1.customer as Customer1,
    t2.customer as Customer2,
    t1.Accounts as Accounts1,
    t2.Accounts as Accounts2,
    CONCAT(t1.Accounts, t2.Accounts) as Pair_No,
    t1.Name as Name1,
    t2.Name as Name2,
    t1.YOB as YOB1,
    t2.YOB as YOB2,
    CASE
        WHEN t1.YOB = t2.YOB THEN 'YOB'
        WHEN SUBSTR(t1.Name, 1, 3) = SUBSTR(t2.Name, 1, 3) THEN 'Name'
        else 'Issue'
    END as Pair_Key
FROM (SELECT * FROM Table1) as t1
inner join (SELECT * FROM Table1) as t2 --instance 2 of the same table
    on (SUBSTR(t1.Name, 1, 3) = SUBSTR(t2.Name, 1, 3) OR t1.YOB = t2.YOB)
Without test data or more details of how far along you are, this is a start.
If the customer number needs to be the same, simply adjust to:
    on (t1.Customer = t2.Customer) and (SUBSTR(t1.Name, 1, 3) = SUBSTR(t2.Name, 1, 3) OR t1.YOB = t2.YOB)
This does what you describe:
select t1.*, t2.name, t2.yob
from t t1 join
     t t2
     on t2.customer = t1.customer and
        (t2.yob = t1.yob or
         substr(t2.name, 1, 3) = substr(t1.name, 1, 3)
        ) and
        t2.accounts > t1.accounts;
There is no need to fetch customer twice. If you want "identical" pairs, then change the last condition to >=.
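To check the self-join against the question's test data, here is a small sketch that runs the same join conditions in SQLite through Python's sqlite3 module. SQLite is only a stand-in for Hive here; substr(name, 1, 3) takes the first three characters in both. The pair_key column is an illustrative addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_account (customer INT, accounts TEXT, name TEXT, yob INT)"
)
conn.executemany(
    "INSERT INTO customer_account VALUES (?, ?, ?, ?)",
    [
        (1, "A", "John", 2001), (1, "X", "Tom", 1996), (1, "C", "Harry", 2001),
        (2, "D", "Sam", 1994), (2, "F", "Samuel", 1995),
        (3, "Z", "Jake", 1994), (3, "G", "Drake", 1998),
        (3, "H", "Arnold", 1993), (3, "K", "Yang", 1990),
    ],
)

# t2.accounts > t1.accounts drops self-pairs (AA) and keeps only one of AC / CA.
pairs = conn.execute("""
    SELECT t1.customer, t1.accounts, t2.accounts,
           CASE WHEN t1.yob = t2.yob THEN 'YOB' ELSE 'Name' END AS pair_key
    FROM customer_account t1
    JOIN customer_account t2
      ON t2.customer = t1.customer
     AND t2.accounts > t1.accounts
     AND (t2.yob = t1.yob
          OR substr(t2.name, 1, 3) = substr(t1.name, 1, 3))
    ORDER BY t1.customer, t1.accounts, t2.accounts
""").fetchall()

for customer, acc1, acc2, key in pairs:
    print(customer, acc1 + acc2, key)
```

With this data only two pairs qualify: A and C for customer 1 (same year of birth) and D and F for customer 2 (names both start with "Sam"). Customer 3 has no matching accounts.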
I have 2 tables in MySQL.
One holds transactions, where each row has a debit account ID and a credit account ID. The second table contains the account name and a special number associated with each account ID. I want a SQL query that takes the data from the transactions table and attaches the account name and account number from the second table.
I tried doing everything with two queries: one gets the transactions and the second gets the account details. I then iterate over the dataframe and assign everything one by one, which doesn't seem to be a good idea.
query = "SELECT tr_id, tr_date, description, dr_acc, cr_acc, amount, currency, currency_rate, document, comment FROM transactions WHERE " \
        "company_id = {} {} and deleted = 0 {} LIMIT {}, {}".format(
            company_id, filter, sort, sn, en)
df = ncon.getDF(query)
df.insert(4, 'dr_name', '')
df.insert(6, 'cr_name', '')
data = tuple(list(set(df['dr_acc'].values.tolist() + df['cr_acc'].values.tolist())))
query = "SELECT account_number, acc_id, account_name FROM tb_accounts WHERE company_id = {} and deleted = 0 and acc_id in {}".format(
    company_id, data)
df_accs = ncon.getDF(query)
for index, row in df_accs.iterrows():
    acc = str(row['acc_id'])
    ac = row['account_number']
    nm = row['account_name']
    indx = df.index[df['dr_acc'] == acc].tolist()
    df.at[indx, 'dr_acc'] = ac
    df.at[indx, 'dr_name'] = nm
    indx = df.index[df['cr_acc'] == acc].tolist()
    df.at[indx, 'cr_acc'] = ac
    df.at[indx, 'cr_name'] = nm
What you're looking for, I think, is a SQL JOIN statement.
Taking a crack at writing a query that might work based on your code:
query = '''
SELECT transactions.tr_id,
       transactions.tr_date,
       transactions.description,
       transactions.dr_acc,
       transactions.cr_acc,
       transactions.amount,
       transactions.currency,
       transactions.currency_rate,
       transactions.document,
       transactions.comment
FROM transactions INNER JOIN tb_accounts ON tb_accounts.acc_id = transactions.acc_id
WHERE
    transactions.company_id = {} AND
    tb_accounts.company_id = {} AND
    transactions.deleted = 0 AND
    tb_accounts.deleted = 0
ORDER BY transactions.tr_id
LIMIT 10;'''
The above query will, roughly, present query results with all the fields listed from the two tables for each pair of rows where the acc_id is the same.
NOTE, the query above will probably not have very good performance. SQL JOIN statements must be written with care, but I wrote it above in a way that's easy to understand, so as to illustrate the power of the JOIN.
You should as a matter of habit NEVER try to program something when you could use a join instead. As long as you take care to write a join properly so that it can be efficient, the MySQL engine will beat your python code for performance almost every time.
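Going by the question's code, the transactions table seems to reference accounts through two columns, dr_acc and cr_acc, rather than a single acc_id. If that is right, the accounts table has to be joined twice, once per column. The sketch below shows that shape using Python's sqlite3 module as a stand-in for MySQL; the sample rows and the reduced column list are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        tr_id INTEGER, company_id INTEGER,
        dr_acc INTEGER, cr_acc INTEGER, amount REAL, deleted INTEGER
    );
    CREATE TABLE tb_accounts (
        acc_id INTEGER, company_id INTEGER,
        account_number TEXT, account_name TEXT, deleted INTEGER
    );
    -- Made-up sample rows.
    INSERT INTO tb_accounts VALUES (1, 7, '1000', 'Cash', 0), (2, 7, '4000', 'Sales', 0);
    INSERT INTO transactions VALUES (10, 7, 1, 2, 99.5, 0);
""")

# Join tb_accounts once as "dr" (debit side) and once as "cr" (credit side).
query = """
    SELECT t.tr_id,
           dr.account_number AS dr_acc, dr.account_name AS dr_name,
           cr.account_number AS cr_acc, cr.account_name AS cr_name,
           t.amount
    FROM transactions AS t
    LEFT JOIN tb_accounts AS dr
           ON dr.acc_id = t.dr_acc AND dr.company_id = t.company_id AND dr.deleted = 0
    LEFT JOIN tb_accounts AS cr
           ON cr.acc_id = t.cr_acc AND cr.company_id = t.company_id AND cr.deleted = 0
    WHERE t.company_id = ? AND t.deleted = 0
    ORDER BY t.tr_id
"""
rows = conn.execute(query, (7,)).fetchall()
```

LEFT JOIN keeps a transaction even when one of its accounts is missing or deleted; use INNER JOIN if such rows should be dropped instead. The company_id is passed as a bound parameter rather than formatted into the string (MySQL drivers typically use %s as the placeholder where sqlite3 uses ?).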
Sort the two DataFrames and use merge to combine them:
df1 = df1.sort_values(['dr_acc'], ascending=True)
df2 = df2.sort_values(['acc_id'], ascending=True)
merge2df = pd.merge(df1, df2, how='outer',
                    left_on=['dr_acc'], right_on=['acc_id'])
I assumed df1 is the data set from the 1st query and df2 is the data set from the 2nd query.
SQL query:
'''SELECT tr_id, tr_date,
description,
dr_acc, cr_acc,
amount, currency,
currency_rate,
document,
account_number, acc_id, account_name,
comment FROM transactions left join
tb_accounts on transactions.dr_acc=tb_accounts.acc_id'''
I have the following code to calculate a value in specific rows of my table:
cursor.execute("SELECT * FROM restaurants WHERE license_type_code='20' ORDER BY general_score DESC;")
group_size = cursor.rowcount
for record in cursor:
    index = cursor.rownumber
    percentile = 100*(index - 0.5)/group_size
    print(percentile)
What I need to do is to add the percentile result to the respective column score_percentile of each record I got with the SELECT query.
I thought about an UPDATE query like this:
cursor.execute("UPDATE restaurants SET score_percentile="+str(percentile)+" WHERE license_type_code IN (SELECT * FROM restaurants WHERE license_type_code='20' ORDER BY general_score DESC)")
But I don't know if that query is correct or if there's a more efficient and less silly way to do that (I'm sure there has to be).
Could you help me, please?
I'm new with SQL so any help or advice is highly appreciated.
Thanks!
You don't need the loop at all. Just one update query
cursor.execute("UPDATE restaurants SET score_percentile = 100*(rownumber - 0.5)/group_size FROM (SELECT COUNT (*) as group_size FROM restaurants WHERE license_type_code='20') as t WHERE restaurants.license_type_code='20'")
As Thomas said, I just needed an update query with the following syntax:
cursor.execute("UPDATE restaurants f SET score_percentile = ROUND(100*(f2.rownumber - 0.5)/"+str(group_size)+",3) FROM (SELECT f2.*,row_number() OVER (ORDER BY general_score DESC) as rownumber FROM restaurants f2 WHERE license_type_code='20') f2 WHERE f.license_type_code='20' AND f2.license_number=f.license_number;")
And I got the group_size by:
cursor.execute("SELECT COUNT(*) FROM restaurants WHERE license_type_code='20'")
group_size = cursor.fetchone()
group_size = group_size[0]
That worked perfectly for my case.
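The separate COUNT(*) query can also be folded into the update by using COUNT(*) OVER () next to ROW_NUMBER(). The sketch below is a variant of the statement above, not the exact one: it uses a correlated subquery instead of UPDATE ... FROM and runs against an in-memory SQLite database (window functions need SQLite 3.25 or later) so that it is self-contained. The sample rows are made up; the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurants (
        license_number TEXT, license_type_code TEXT,
        general_score REAL, score_percentile REAL
    );
    -- Made-up sample rows: four with type '20', one with another type.
    INSERT INTO restaurants VALUES
        ('A', '20', 90, NULL), ('B', '20', 80, NULL),
        ('C', '20', 70, NULL), ('D', '20', 60, NULL),
        ('E', '10', 95, NULL);
""")

# The inner query ranks the type-'20' rows and counts them in one pass;
# the outer UPDATE picks up each row's own percentile by license_number.
conn.execute("""
    UPDATE restaurants
    SET score_percentile = (
        SELECT r.pct
        FROM (SELECT license_number,
                     ROUND(100.0 * (ROW_NUMBER() OVER (ORDER BY general_score DESC) - 0.5)
                           / COUNT(*) OVER (), 3) AS pct
              FROM restaurants
              WHERE license_type_code = '20') AS r
        WHERE r.license_number = restaurants.license_number
    )
    WHERE license_type_code = '20'
""")

rows = conn.execute(
    "SELECT license_number, score_percentile FROM restaurants ORDER BY license_number"
).fetchall()
```

This assumes license_number is unique per restaurant, as the join in the statement above also does. Rows with another license_type_code are left untouched.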