SQL multiple table conditional retrieving statement - python

I'm trying to retrieve data from an SQLite table conditional on another SQLite table in the same database. The tables have the following formats:
master table
--------------------------------------------------
| id | TypeA | TypeB | ...
--------------------------------------------------
| 2020/01/01 | ID_0-0 | ID_0-1 | ...
--------------------------------------------------
child table
--------------------------------------------------
| id | Attr1 | Attr2 | ...
--------------------------------------------------
| ID_0-0 | 112.04 | -3.45 | ...
--------------------------------------------------
I want to write an sqlite3 query that takes:
A date D present in master.id
A list of types present in master's columns
A list of attributes present in child's columns
and returns a dataframe whose rows are the types and whose columns are the attributes. Of course, I could just read the tables into pandas and do the work there, but I think that would be more computationally intensive, and I want to learn more SQL syntax!
UPDATE
So far I've tried:
"""SELECT *
FROM child
WHERE EXISTS
(SELECT *
FROM master
WHERE master.TypeX = Type_input
AND master.dateX = date_input)"""
and then concatenate the strings over all required TypeX and dateX and execute with:
cur.executescript(script)
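Rather than concatenating scripts, the lookup can be done as one parameterized query per type. Below is a minimal sketch assuming the table and column names shown above; since column names cannot be bound as `?` parameters, they are checked against a whitelist before being interpolated into the SQL. It returns a plain dict of dicts to keep the example dependency-free; `pandas.DataFrame(result).T` would turn it into the requested frame.

```python
import sqlite3

# Toy database mirroring the question's master/child layout.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE master (id TEXT PRIMARY KEY, TypeA TEXT, TypeB TEXT);
CREATE TABLE child  (id TEXT PRIMARY KEY, Attr1 REAL, Attr2 REAL);
INSERT INTO master VALUES ('2020/01/01', 'ID_0-0', 'ID_0-1');
INSERT INTO child  VALUES ('ID_0-0', 112.04, -3.45);
INSERT INTO child  VALUES ('ID_0-1', 7.0, 8.0);
""")

def fetch(date, types, attrs):
    # Identifiers cannot be bound as ? parameters, so they are validated
    # against a whitelist before interpolation; the date is bound normally.
    assert set(types) <= {"TypeA", "TypeB"} and set(attrs) <= {"Attr1", "Attr2"}
    out = {}
    for t in types:
        row = con.execute(
            f"SELECT {', '.join(attrs)} FROM child "
            f"WHERE id = (SELECT {t} FROM master WHERE id = ?)",
            (date,),
        ).fetchone()
        out[t] = dict(zip(attrs, row))
    return out

result = fetch("2020/01/01", ["TypeA", "TypeB"], ["Attr1", "Attr2"])
```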

Related

Python using LOAD DATA LOCAL INFILE IGNORE 1 LINES to update existing table

I have successfully loaded large CSV files (with 1 header row) into a MySQL table from Python with the command:
LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE 'table' FIELDS TERMINATED BY ';' IGNORE 1 LINES (@vone, @vtwo, @vthree) SET DatumTijd = @vone, Debiet = NULLIF(@vtwo,''), Boven = NULLIF(@vthree,'')
The file contains historic data back to 1970. Every month I get an update with roughly 4320 rows that need to be added to the existing table. Sometimes there is overlap with the existing data, so I would like to use REPLACE. But this does not seem to work in combination with IGNORE 1 LINES. The primary key is DatumTijd, which follows the MySQL datetime format.
I tried several combinations of REPLACE and IGNORE in different orders, both before the INTO TABLE "table" part and after the FIELDS TERMINATED part.
Any suggestions how to solve this?
Apart from the possible typo of enclosing the table name in single quotes rather than backticks, the load statement works fine on my Windows device given the following data:
one;two;three
2023-01-01;1;1
2023-01-02;2;2
2023-01-01;3;3
2022-01-04;;;
Note that I prefer COALESCE to NULLIF, and I have included an auto_increment id to demonstrate what REPLACE actually does, i.e. delete and insert.
drop table if exists t;
create table t(
id int auto_increment primary key,
DatumTijd date,
Debiet varchar(1),
boven varchar(1),
unique index key1(DatumTijd)
);
LOAD DATA INFILE 'C:\\Program Files\\MariaDB 10.1\\data\\sandbox\\data.txt'
replace INTO TABLE t
FIELDS TERMINATED BY ';'
IGNORE 1 LINES (@vone, @vtwo, @vthree)
SET DatumTijd = @vone,
Debiet = coalesce(@vtwo,''),
Boven = coalesce(@vthree,'')
;
select * from t;
+----+------------+--------+-------+
| id | DatumTijd  | Debiet | boven |
+----+------------+--------+-------+
|  2 | 2023-01-02 | 2      | 2     |
|  3 | 2023-01-01 | 3      | 3     |
|  4 | 2022-01-04 |        |       |
+----+------------+--------+-------+
3 rows in set (0.001 sec)
It should not matter that REPLACE in effect creates a new record, but if it does matter to you, consider loading into a staging table and then using INSERT ... ON DUPLICATE KEY UPDATE into the target.
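For readers on SQLite rather than MySQL, the same staging-then-upsert pattern can be sketched with INSERT ... ON CONFLICT DO UPDATE (requires SQLite 3.24+); table and column names below mirror the question, and the upsert updates the conflicting row in place instead of deleting and re-inserting it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (DatumTijd TEXT PRIMARY KEY, Debiet TEXT, Boven TEXT);
CREATE TABLE staging (DatumTijd TEXT, Debiet TEXT, Boven TEXT);
INSERT INTO t VALUES ('2023-01-01', '1', '1');
INSERT INTO staging VALUES ('2023-01-01', '3', '3'), ('2023-01-02', '2', '2');
""")

# "WHERE true" is required by SQLite to disambiguate an INSERT..SELECT
# that is followed by an ON CONFLICT (upsert) clause.
con.execute("""
INSERT INTO t (DatumTijd, Debiet, Boven)
SELECT DatumTijd, Debiet, Boven FROM staging WHERE true
ON CONFLICT(DatumTijd) DO UPDATE SET
  Debiet = excluded.Debiet,
  Boven  = excluded.Boven
""")
rows = con.execute("SELECT * FROM t ORDER BY DatumTijd").fetchall()
```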

Splitting a CSV table into SQL Tables with Foreign Keys

Say I have the following CSV file
Purchases.csv
+--------+----------+
| Client | Item |
+--------+----------+
| Mark | Computer |
| Mark | Lamp |
| John | Computer |
+--------+----------+
What is the best practice, in Python, to split this table into two separate tables and join them in a bridge table using foreign key, i.e.
Client table
+----------+--------+
| ClientID | Client |
+----------+--------+
| 1 | Mark |
| 2 | John |
+----------+--------+
Item table
+--------+----------+
| ItemID | Item |
+--------+----------+
| 1 | Computer |
| 2 | Lamp |
+--------+----------+
Item Client Bridge Table
+----------+--------+
| ClientID | ItemID |
+----------+--------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
+----------+--------+
I should mention here that it is possible for records to already exist in the tables; i.e., if a Client name in the CSV already has an assigned ID in the Client table, that ID should be used in the Bridge table. This is because I have to do a one-time batch upload of a million lines of data, and then insert a few thousand lines of data daily.
I have also already created the tables; they are in the database, just empty at the moment.
You would do this in the database (or via database commands in Python). The data never needs to be loaded into Python.
Load the purchases.csv table into a staging table in the database. Then be sure you have your tables defined:
create table clients (
clientId int generated always as identity primary key,
client varchar(255)
);
create table items (
itemId int generated always as identity primary key,
item varchar(255)
);
create table clientItems (
clientItemId int generated always as identity primary key,
clientId int references clients(clientId),
itemId int references items(itemId)
);
Note that the exact syntax for these depends on the database. Then load the tables:
insert into clients (client)
select distinct s.client
from staging s
where not exists (select 1 from clients c where c.client = s.client);
insert into items (item)
select distinct s.item
from staging s
where not exists (select 1 from items i where i.item = s.item);
I'm not sure if you need to take duplicates into account for ClientItems:
insert into ClientItems (clientId, itemId)
select c.clientId, i.itemId
from staging s join
clients c
on s.client = c.client join
items i
on s.item = i.item;
If you need to prevent duplicates here, then:
where not exists (select 1
from clientitems ci join
clients c
on c.clientid = ci.clientid join
items i
on i.itemid = ci.itemid
where c.client = s.client and i.item = s.item
);
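The staging-table flow above can be sketched end to end in Python. This demo uses sqlite3 so it is self-contained (the real database and its identity-column syntax may differ), with the CSV from the question inlined:

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE clients (clientId INTEGER PRIMARY KEY, client TEXT UNIQUE);
CREATE TABLE items   (itemId   INTEGER PRIMARY KEY, item   TEXT UNIQUE);
CREATE TABLE clientItems (clientId INTEGER, itemId INTEGER);
CREATE TABLE staging (client TEXT, item TEXT);
""")

# Purchases.csv from the question, inlined for the demo.
csv_text = "Client,Item\nMark,Computer\nMark,Lamp\nJohn,Computer\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
con.executemany("INSERT INTO staging VALUES (:Client, :Item)", rows)

# Same three inserts as the answer: fill dimension tables, then the bridge.
con.executescript("""
INSERT INTO clients (client)
  SELECT DISTINCT s.client FROM staging s
  WHERE NOT EXISTS (SELECT 1 FROM clients c WHERE c.client = s.client);
INSERT INTO items (item)
  SELECT DISTINCT s.item FROM staging s
  WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.item = s.item);
INSERT INTO clientItems (clientId, itemId)
  SELECT c.clientId, i.itemId
  FROM staging s
  JOIN clients c ON s.client = c.client
  JOIN items i   ON s.item   = i.item;
""")

# Resolve IDs back to names so the check is independent of ID assignment order.
bridge = con.execute("""
  SELECT c.client, i.item
  FROM clientItems ci
  JOIN clients c ON c.clientId = ci.clientId
  JOIN items i   ON i.itemId   = ci.itemId
  ORDER BY c.client, i.item
""").fetchall()
```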

How do I use SQL like I use Python by duplicating rows while changing one value in that row where that value is derived from a different column?

I'm at an impasse (being no better than a novice at SQL). I want to build a POC in SQL that will do something I have already, very easily, done in Python. Here's the meat:
TEXT | PID | PATCHES
------------------------------------------------------
'abc' | 123456 | '839264, 129361, 927301'
'def' | 234567 | '736492\n 928537'
'ghi' | 284554 | 'Text here #490238Text here #110299'
'jkl' | 776493 | '290389'
What I need is the following-
TEXT | PID
--------------
'abc' | 123456
'abc' | 839264
'abc' | 129361
'abc' | 927301
'def' | 234567
'def' | 736492
'def' | 928537
'ghi' | 284554
'ghi' | 490238
'ghi' | 110299
'jkl' | 776493
'jkl' | 290389
As you can see, I've copied the values in TEXT for each value that pops up in the PATCHES column while maintaining that the PID values are still in the new table.
In Python you can use split() and find(), append to a list, etc., but this is wayyy over my head right now.
Any help is appreciated. Thank you! By the way, feel free to tell me that it's best done in Python and I will gladly forward that information to my team.
You can do the following:
1. Create a temporary table.
2. Put all the values from the main table into the temporary table (with Identity(1,1)).
3. Use a loop up to the length of the temp table.
4. Pass every value/row into a stored procedure, and perform your operations.
4.1 You can use String_Split in SQL Server by changing the database compatibility level to 130.
4.2 You can return the result after performing the operations.
Here is a sample demonstration:
Create Table #Temp
(
Entity Int Identity(1,1),
Text VarChar(3),
Strings VarChar(100)
)
Insert #Temp
Select Text, Cast(Pid As VarChar(10)) + '-' + Patches From dbo.Errors
--Select * From dbo.#Temp;
Declare #Count As Int = (Select Count(*) From dbo.Errors)
Declare #i as Int = 1
While #i <= #Count
Begin
Declare #Text As VarChar(3), #Strings As VarChar(100)
--select value one by one, and pass it into a stored procedure
Select #Text = Text, #Strings = Strings From dbo.#Temp Where Entity = #i;
Exec dbo.ErrorsProc #Text, #Strings
Set #i = #i +1
End
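Since the asker invited a Python answer: one client-side sketch is to split PATCHES with a regular expression, under the assumption that every patch ID is simply a run of digits (the sample rows below are copied from the question; adjust the pattern if real IDs differ):

```python
import re

# Sample rows from the question; note how inconsistent PATCHES formats are.
rows = [
    ("abc", 123456, "839264, 129361, 927301"),
    ("def", 234567, "736492\n 928537"),
    ("ghi", 284554, "Text here #490238Text here #110299"),
    ("jkl", 776493, "290389"),
]

out = []
for text, pid, patches in rows:
    out.append((text, pid))                # keep the original TEXT | PID row
    for p in re.findall(r"\d+", patches):  # every run of digits in PATCHES
        out.append((text, int(p)))
```

The flat `out` list matches the desired two-column table and could be written back with `executemany`.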

Trouble with SQL join, where, having clause

I'm having trouble understanding how to make a query that will show me 'the three most popular articles' in terms of views ('Status: 200 OK').
There are 2 tables I'm currently dealing with.
A Log table
An Articles table
The columns in these tables:
Table "public.log"
Column | Type | Modifiers
--------+--------------------------+--------------------------------------------------
path | text |
ip | inet |
method | text |
status | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('log_id_seq'::regclass)
Indexes:
and
Table "public.articles"
Column | Type | Modifiers
--------+--------------------------+-------------------------------------------------------
author | integer | not null
title | text | not null
slug | text | not null
lead | text |
body | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('articles_id_seq'::regclass)
Indexes:
So far, I've written this query based on my level and current understanding of SQL...
SELECT articles.title, log.status
FROM articles join log
WHERE articles.title = log.path
HAVING status = '200 OK'
GROUP BY title, status
Obviously, this is incorrect. I want to be able to pull the three most popular articles from the database, and I know that matching the 200 OK entries against the article title will count one "view" or hit for me. My thought process is: I need a query that determines how many times article.title = log.path (1 unique) shows up in the log table with a status of 200 OK. My assignment is actually to write a program that will print the results with "[my code getting] the database to do the heavy lifting by using joins, aggregations, and the where clause.. doing minimal "post-processing" in the Python code itself."
Any explanation, idea, or tip is appreciated, all of StackOverflow...
Perhaps the following is what you have in mind:
SELECT
a.title,
COUNT(*) AS cnt
FROM articles a
INNER JOIN log l
ON a.title = l.path
WHERE
l.status = '200 OK'
GROUP BY
a.title
ORDER BY
COUNT(*) DESC
LIMIT 3;
This would return the three article titles having the highest status-200 hit counts. This syntax works in both PostgreSQL (which your table listings suggest) and MySQL.
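The shape of that query can be checked with a small self-contained sqlite3 run (the table contents here are made up; SQLite accepts the same join/group/limit syntax):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE articles (title TEXT);
CREATE TABLE log (path TEXT, status TEXT);
INSERT INTO articles VALUES ('alpha'),('beta'),('gamma'),('delta');
INSERT INTO log VALUES
  ('gamma','200 OK'),('gamma','200 OK'),('gamma','200 OK'),
  ('alpha','200 OK'),('alpha','200 OK'),
  ('beta','200 OK'),
  ('delta','404 NOT FOUND');
""")

# Count 200-OK hits per title, most-viewed first, top three only.
top3 = con.execute("""
  SELECT a.title, COUNT(*) AS cnt
  FROM articles a
  JOIN log l ON a.title = l.path
  WHERE l.status = '200 OK'
  GROUP BY a.title
  ORDER BY cnt DESC
  LIMIT 3
""").fetchall()
```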

SQLite: Transposing results of a GROUP BY and filling in IDs with names

My question is rather specific; if you have a better title, please suggest one. Also, formatting is bad - I didn't know how to combine lists and code blocks.
I have an SQLite3 database with the following (relevant parts of the) .schema:
CREATE TABLE users (id INTEGER PRIMARY KEY NOT NULL, user TEXT UNIQUE);
CREATE TABLE locations (id INTEGER PRIMARY KEY NOT NULL, name TEXT UNIQUE);
CREATE TABLE purchases (location_id INTEGER, user_id INTEGER);
CREATE TABLE sales (location_id integer, user_id INTEGER);
purchases has about 4.5mil entries, users about 300k, sales about 100k, and locations about 250 - just to gauge the data volume.
My desired use would be to generate a JSON object to be handed off to another application, very much condensed in volume by doing the following:
-GROUPing both purchases and sales into one common table BY location_id,user_id - IOW, getting the number of "actions" per user per location. That I can do, result is something like
loc | usid | loccount
-----------------------
1 | 1246 | 123
1 | 2345 | 1
13 | 1246 | 46
13 | 8732 | 4
27 | 2345 | 41
(At least it looks good, always hard to tell with such volumes; query:
select location_id,user_id,count(location_id) from
(select location_id,user_id from purchases
union all
select location_id,user_id from sales)
group by location_id,user_id order by user_id
)
-Then, transposing that giant table such that I would get:
usid | loc1 | loc13 | loc27
---------------------------
1246 | 123 | 46 | 0
2345 | 1 | 0 | 41
8732 | 0 | 4 | 0
That I cannot do, and it's my absolutely crucial point for this question. I tried some things I found online, especially here, but I just started SQLite a little while ago and don't understand many queries.
-Lastly, translate the table into plain text in order to write it to JSON:
user | AAAA | BBBBB | CCCCC
---------------------------
zeta | 123 | 46 | 0
beta | 1 | 0 | 41
iota | 0 | 4 | 0
That I probably could do with quite a bit of experimentation and inner join, although I'm always very unsure what way is the best approach to handle such data volumes, hence I wouldn't mind a pointer.
The whole thing is written in Python's sqlite3 interface, if it matters. In the end, I'd love to have something I could just do a "for" loop per user over in order to generate the JSON, which would then of course be very simple. It doesn't matter if the query takes a long time (<10min would be nice), it's only run twice per day as a sort of backup. I've only got a tiny VPS available, but being limited to a single core the performance is as good as on my reasonably powerful desktop. (i5-3570k running Debian.)
The table headers are just examples because I wasn't quite sure if I could use integers for them (didn't discover the syntax if so), as long as I'm somehow able to look up the numeric part in the locations table I'm fine. Same for translating the user IDs into names. The number of columns is known beforehand - they're after all just INTEGER PRIMARY KEYs and I have a list() of them from some other operation. The number of rows can be determined reasonably quickly, ~3s, if need be.
Consider using subqueries to achieve your desired transposed output:
SELECT DISTINCT m.usid,
IFNULL((SELECT t1.loccount FROM tablename t1
WHERE t1.usid = m.usid AND t1.loc=1),0) AS Loc1,
IFNULL((SELECT t2.loccount FROM tablename t2
WHERE t2.usid = m.usid AND t2.loc=13),0) AS Loc13,
IFNULL((SELECT t3.loccount FROM tablename t3
WHERE t3.usid = m.usid AND t3.loc=27),0) AS Loc27
FROM tablename As m
Alternatively, you can use conditional aggregation (SQLite has no IF, so use CASE/WHEN) in a derived table:
SELECT temp.usid, Max(temp.Loc1) As Loc1,
Max(temp.Loc13) As Loc13, Max(temp.Loc27) As Loc27
FROM
(SELECT tablename.usid,
CASE WHEN loc=1 THEN loccount ELSE 0 END As Loc1,
CASE WHEN loc=13 THEN loccount ELSE 0 END As Loc13,
CASE WHEN loc=27 THEN loccount ELSE 0 END As Loc27
FROM tablename) AS temp
GROUP BY temp.usid
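Since you already hold the location IDs in a Python list, the CASE columns can be generated rather than hand-written. A small sqlite3 sketch using the sample counts from the question (table and column names are placeholders):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE counts (loc INTEGER, usid INTEGER, loccount INTEGER);
INSERT INTO counts VALUES
  (1,1246,123),(1,2345,1),(13,1246,46),(13,8732,4),(27,2345,41);
""")

locs = [1, 13, 27]  # known location IDs, e.g. from the locations table
# Build one SUM(CASE ...) column per location; the IDs are trusted ints
# from our own list, so interpolating them into the SQL is safe here.
cols = ", ".join(
    f"SUM(CASE WHEN loc={n} THEN loccount ELSE 0 END) AS Loc{n}" for n in locs
)
pivot = con.execute(
    f"SELECT usid, {cols} FROM counts GROUP BY usid ORDER BY usid"
).fetchall()
```

Swapping the numeric usid for the user name is then one extra join against the users table.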
