Python Regex applying split to a capturing group - python

I'm trying to get create statement from ctas statement by adding limit 0 to all tables as I don't want to load the table. Using python for the same
Input :
create table new_table as select
mytable.column1, mytable2.column2
from schema.mytable left join mytable2 ;
Expected Output:
create table new_table IF NOT EXISTS as select
mytable.column1, mytable2.column2
from (select * from mytable limit 0) mytable join (select * from mytable2 limit 0) mytable2
I have to replace all the tables in from and join clause to (select * from tablename limit 0) and alias.
However I'm only able to generate below, not able to get the table name and add it as alias. Also not able to change the last name in the join clause. If the input has alias explicitly mentioned, I'm able to generate it. I'm very new to using regex and feel very overwhelmed. Appreciate support from the experts here.
Output obtained:
create table new_table as select
mytable.column1, mytable2.column2
from (select * from schema.mytable limit 0) join mytable2 ;
Code I tried: (First tried to capture if there's already alias and put it in capture group 4. I would like to generate alias when tables do not have alias explicitly mentioned. Capture group 2 would get the schema_name.table_name. If I can use python function split to the capturing group. Also the last table in the sql I'm not able to translate
import re
sql = """
create table new_table as select
mytable.column1, mytable2.column2
from schema.mytable left join mytable2 ;"""
rgxsubtable = re.compile(r"\b((?:from|join)\s+)([\w.\"]+)([\)\s]+)(\bleft|on|cross|join|inner\b)",re.MULTILINE|re.IGNORECASE) # look for table names in from and join clauses
rgxalias = re.compile(r"\b((?:from|join)\s+)([\w.\"]+)(\s+)(\b(?!left|on|cross|join|inner\b)\w+)",re.MULTILINE|re.IGNORECASE) # look for table names in from and join clauses but with aliases
sql = rgxalias.sub(r"\1 (select * from \2 limit 0) \4 ", sql)
sql = rgxsubtable.sub(r"\1 (select * from \2 limit 0) ", sql)

Related

How to update a column based on counts from another table?

I need to update the table actor, column numCharacters, depending on how many times each actor's actorID shows up on the characters table.
I have the following code:
cursor = connection.cursor()
statement = 'UPDATE actor SET numCharacters = (SELECT count(*) FROM characters GROUP BY actorID)';
cursor.execute(statement);
connection.commit()
Does anyone know how I could complete it?
I think your problem comes from your sub query will return multiple row, so the update statement won't know which row to update. Try updating your query to this:
UPDATE actor a
SET a.numCharacter = (
SELECT count(*)
from characters c
WHERE actorId = a.id
Group by actorId
);
db fiddle link

Pyspark SQL WHERE NOT IN?

I'm trying to find everything that is in nodes2 but not nodes1.
spark.sql("""
SELECT COUNT(*) FROM
(SELECT * FROM nodes2 WHERE NOT IN
(SELECT * FROM nodes1))
""").show()
Getting the following error:
"cannot resolve 'NOT' given input columns: [nodes2.~id, nodes2.~label];
Is it possible to do this sort of set difference operation in Pyspark?
Matching single column with NOT IN:
Do you need to define some columns with where? which you trying to match for NOT operator?
If that is the case, then, for example, you want to check id
spark.sql("""
SELECT COUNT(*) FROM
(SELECT * FROM nodes2 WHERE id NOT IN
(SELECT id FROM nodes1))
""").show()
Matching multiple columns (or complete row) with NOT IN:
Or if you really want to match complete row (all columns), use something like concat on all columns to match
spark.sql("""
SELECT COUNT(*) FROM
(SELECT * FROM nodes2 as WHERE CONCAT(id,label) NOT IN (SELECT CONCAT(id,label) FROM nodes1))
""").show()
or with alias
spark.sql("""
SELECT COUNT(*) FROM
(SELECT * FROM nodes2 n2 as WHERE CONCAT(n2.id,n2.label) NOT IN (SELECT CONCAT(n1.id,n1.label) FROM nodes1 n1))
""").show()

How to use an SQL query with join on the result of another SQL query executed before?

I am trying to use an SQL query on the result of a previous SQL query but I'm not able to.
I am creating a python script and using postgresql.
I have 3 tables from which I need to match different columns and join the data but using only 2 tables at a time.
For example:
I have table1 where I have a codecolumn and there is a same column of codes in table2
Now I am matching the values of both the columns and joining a column 'area' from table 2 which corresponds to codes and a column 'pincode' from table1.
For this I used the following query which is working:
'''
select
table1.code,table2.code,table2.area,table1.pincode
from
table1 left join table2
ON
table1.code=table2.code
order by table1.row_num '''
I am getting the result but in this data there is some data in which the area value is returned as None
Wherever I am getting the area as None when matching code columns, I need to use the pincode column in table1 and pincode column in table3 to again find the corresponding area from table3.area.
So I used the following Query:
'''
select
table1.code,table3.area,table1.pincode
from
table1 left join table3
ON
table1.pincode=table3.pincode
IN (
select
table1.code,table2.code,table2.area,table1.pincode
from
table1 left join table2
ON
table1.code=table2.code
where table2.area is NULL
order by table1.row_num '''
and I got the following error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) subquery has too many columns
My python code is as follows:
import psycopg2
from sqlalchemy import create_engine
engine=create_engine('postgresql+psycopg2://credentials')
conn=engine.connect()
query = '''
select
table1.code,table2.code,table2.area,table1.pincode
from
table1 left join table2
ON
table1.code=table2.code
order by table1.row_num '''
area=conn.execute(query)
area_x=area.fetchall()
for i in area_x:
print(i)
query2 = select
table1.code,table3.area,table1.pincode
from
table1 left join table3
ON
table1.pincode=table3.pincode
IN (
select
table1.code,table2.code,table2.area,table1.pincode
from
table1 left join table2
ON
table1.code=table2.code
where table2.area is NULL
order by table1.row_num '''
area=conn.execute(query2)
area_x=area.fetchall()
for i in area_x:
print(i)
This is how my first query is returning the data:
Wherever I am not able to match the code columns I get None value in area column from table 2 and whenever the area value is None I have to apply another query to find this data
Now i have to match data in table1.pincode with data in table3.pincode to find table3.area and replace the None value with table3.area
These are the 2 ways to find the area
The desired result should be:
What could be the correct solution??
Thank You
it looks that your query2 needs a where clause and the subquery, as per error message should be reduced to the column you are trying to pass to the outer query. query2 should be something like this:
query2 =
select
table1.code,table3.area,table1.pincode
from
table1 left join table3
ON
table1.pincode=table3.pincode
WHERE table1.code
IN (
select
table1.code
from
table1 left join table2
ON
table1.code=table2.code
where table2.area is NULL

Declare a variable in oracle in a python SQL query

I am porting code from SQL Server to Oracle.
DECLARE #DispatchCount int,
#HeadCount int;
SELECT #DispatchCount = (
select 200 -- business query would replace sample
);
select #HeadCount = (
select 50 -- business query would replace sample
);
select #DispatchCount / #HeadCount;
I tried oracle declare syntax.
DECLARE
head_count INTEGER;
BEGIN
select 100 as DUMMY into head_count from dual;
dbms_output.put_line(head_count);
END;
I am querying a read only oracle 11g database in python from cx_oracle:
cursor.execute(sql)
rows = cursor.fetchall()
This throws the error: 'not a query' on the fetchall statement.
Is there a way to declare variables in oracle that will work in a SQL query?
In oracle, One way is to use the CTE as follows:
with cte as
(select 200 as dispatchcount, 50 as headcount from dual)
select dispatchcount/headcount from cte
If you want both variables in different table then use the multiple cte as follows:
with cte1 as
(select 200 as dispatchcount from dual),
cte2 as (select 50 as headcount from dual)
select dispatchcount/headcount from cte1 , cte2
Yes, you can use prepared SQL statement along with dummy dual table, including placeholders for parameters those are stated within a tuple, through use of cursor.fetchone() considering your case needs to return only one row such as
DispatchCount=200
HeadCount=50
sql = """
SELECT :1 / :2
FROM dual
"""
cursor.execute(sql,(DispatchCount,HeadCount,))
rows = cursor.fetchone()
print(rows[0])
For a table with multiple rows you can replace fetchone() with fetchall() too.

Using an Alias for 2 Queries - SQLite [duplicate]

I am trying to INSERT INTO a table using the input from another table. Although this is entirely feasible for many database engines, I always seem to struggle to remember the correct syntax for the SQL engine of the day (MySQL, Oracle, SQL Server, Informix, and DB2).
Is there a silver-bullet syntax coming from an SQL standard (for example, SQL-92) that would allow me to insert the values without worrying about the underlying database?
Try:
INSERT INTO table1 ( column1 )
SELECT col1
FROM table2
This is standard ANSI SQL and should work on any DBMS
It definitely works for:
Oracle
MS SQL Server
MySQL
Postgres
SQLite v3
Teradata
DB2
Sybase
Vertica
HSQLDB
H2
AWS RedShift
SAP HANA
Google Spanner
Claude Houle's answer: should work fine, and you can also have multiple columns and other data as well:
INSERT INTO table1 ( column1, column2, someInt, someVarChar )
SELECT table2.column1, table2.column2, 8, 'some string etc.'
FROM table2
WHERE table2.ID = 7;
I've only used this syntax with Access, SQL 2000/2005/Express, MySQL, and PostgreSQL, so those should be covered. It should also work with SQLite3.
To get only one value in a multi value INSERT from another table I did the following in SQLite3:
INSERT INTO column_1 ( val_1, val_from_other_table )
VALUES('val_1', (SELECT val_2 FROM table_2 WHERE val_2 = something))
Both the answers I see work fine in Informix specifically, and are basically standard SQL. That is, the notation:
INSERT INTO target_table[(<column-list>)] SELECT ... FROM ...;
works fine with Informix and, I would expect, all the DBMS. (Once upon 5 or more years ago, this is the sort of thing that MySQL did not always support; it now has decent support for this sort of standard SQL syntax and, AFAIK, it would work OK on this notation.) The column list is optional but indicates the target columns in sequence, so the first column of the result of the SELECT will go into the first listed column, etc. In the absence of the column list, the first column of the result of the SELECT goes into the first column of the target table.
What can be different between systems is the notation used to identify tables in different databases - the standard has nothing to say about inter-database (let alone inter-DBMS) operations. With Informix, you can use the following notation to identify a table:
[dbase[#server]:][owner.]table
That is, you may specify a database, optionally identifying the server that hosts that database if it is not in the current server, followed by an optional owner, dot, and finally the actual table name. The SQL standard uses the term schema for what Informix calls the owner. Thus, in Informix, any of the following notations could identify a table:
table
"owner".table
dbase:table
dbase:owner.table
dbase#server:table
dbase#server:owner.table
The owner in general does not need to be quoted; however, if you do use quotes, you need to get the owner name spelled correctly - it becomes case-sensitive. That is:
someone.table
"someone".table
SOMEONE.table
all identify the same table. With Informix, there's a mild complication with MODE ANSI databases, where owner names are generally converted to upper-case (informix is the exception). That is, in a MODE ANSI database (not commonly used), you could write:
CREATE TABLE someone.table ( ... )
and the owner name in the system catalog would be "SOMEONE", rather than 'someone'. If you enclose the owner name in double quotes, it acts like a delimited identifier. With standard SQL, delimited identifiers can be used many places. With Informix, you can use them only around owner names -- in other contexts, Informix treats both single-quoted and double-quoted strings as strings, rather than separating single-quoted strings as strings and double-quoted strings as delimited identifiers. (Of course, just for completeness, there is an environment variable, DELIMIDENT, that can be set - to any value, but Y is safest - to indicate that double quotes always surround delimited identifiers and single quotes always surround strings.)
Note that MS SQL Server manages to use [delimited identifiers] enclosed in square brackets. It looks weird to me, and is certainly not part of the SQL standard.
Two approaches for insert into with select sub-query.
With SELECT subquery returning results with One row.
With SELECT subquery returning results with Multiple rows.
1. Approach for With SELECT subquery returning results with one row.
INSERT INTO <table_name> (<field1>, <field2>, <field3>)
VALUES ('DUMMY1', (SELECT <field> FROM <table_name> ),'DUMMY2');
In this case, it assumes SELECT Sub-query returns only one row of result based on WHERE condition or SQL aggregate functions like SUM, MAX, AVG etc. Otherwise it will throw error
2. Approach for With SELECT subquery returning results with multiple rows.
INSERT INTO <table_name> (<field1>, <field2>, <field3>)
SELECT 'DUMMY1', <field>, 'DUMMY2' FROM <table_name>;
The second approach will work for both the cases.
To add something in the first answer, when we want only few records from another table (in this example only one):
INSERT INTO TABLE1
(COLUMN1, COLUMN2, COLUMN3, COLUMN4)
VALUES (value1, value2,
(SELECT COLUMN_TABLE2
FROM TABLE2
WHERE COLUMN_TABLE2 like "blabla"),
value4);
Instead of VALUES part of INSERT query, just use SELECT query as below.
INSERT INTO table1 ( column1 , 2, 3... )
SELECT col1, 2, 3... FROM table2
Most of the databases follow the basic syntax,
INSERT INTO TABLE_NAME
SELECT COL1, COL2 ...
FROM TABLE_YOU_NEED_TO_TAKE_FROM
;
Every database I have used follow this syntax namely, DB2, SQL Server, MY SQL, PostgresQL
This can be done without specifying the columns in the INSERT INTO part if you are supplying values for all columns in the SELECT part.
Let's say table1 has two columns. This query should work:
INSERT INTO table1
SELECT col1, col2
FROM table2
This WOULD NOT work (value for col2 is not specified):
INSERT INTO table1
SELECT col1
FROM table2
I'm using MS SQL Server. I don't know how other RDMS work.
This is another example using values with select:
INSERT INTO table1(desc, id, email)
SELECT "Hello World", 3, email FROM table2 WHERE ...
Just use parenthesis for SELECT clause into INSERT. For example like this :
INSERT INTO Table1 (col1, col2, your_desired_value_from_select_clause, col3)
VALUES (
'col1_value',
'col2_value',
(SELECT col_Table2 FROM Table2 WHERE IdTable2 = 'your_satisfied_value_for_col_Table2_selected'),
'col3_value'
);
Simple insertion when table column sequence is known:
Insert into Table1
values(1,2,...)
Simple insertion mentioning column:
Insert into Table1(col2,col4)
values(1,2)
Bulk insertion when number of selected columns of a table(#table2) are equal to insertion table(Table1)
Insert into Table1 {Column sequence}
Select * -- column sequence should be same.
from #table2
Bulk insertion when you want to insert only into desired column of a table(table1):
Insert into Table1 (Column1,Column2 ....Desired Column from Table1)
Select Column1,Column2..desired column from #table2
from #table2
Here is another example where source is taken using more than one table:
INSERT INTO cesc_pf_stmt_ext_wrk(
PF_EMP_CODE ,
PF_DEPT_CODE ,
PF_SEC_CODE ,
PF_PROL_NO ,
PF_FM_SEQ ,
PF_SEQ_NO ,
PF_SEP_TAG ,
PF_SOURCE)
SELECT
PFl_EMP_CODE ,
PFl_DEPT_CODE ,
PFl_SEC ,
PFl_PROL_NO ,
PF_FM_SEQ ,
PF_SEQ_NO ,
PFl_SEP_TAG ,
PF_SOURCE
FROM cesc_pf_stmt_ext,
cesc_pfl_emp_master
WHERE pfl_sep_tag LIKE '0'
AND pfl_emp_code=pf_emp_code(+);
COMMIT;
Here's how to insert from multiple tables. This particular example is where you have a mapping table in a many to many scenario:
insert into StudentCourseMap (StudentId, CourseId)
SELECT Student.Id, Course.Id FROM Student, Course
WHERE Student.Name = 'Paddy Murphy' AND Course.Name = 'Basket weaving for beginners'
(I realise matching on the student name might return more than one value but you get the idea. Matching on something other than an Id is necessary when the Id is an Identity column and is unknown.)
You could try this if you want to insert all column using SELECT * INTO table.
SELECT *
INTO Table2
FROM Table1;
I actually prefer the following in SQL Server 2008:
SELECT Table1.Column1, Table1.Column2, Table2.Column1, Table2.Column2, 'Some String' AS SomeString, 8 AS SomeInt
INTO Table3
FROM Table1 INNER JOIN Table2 ON Table1.Column1 = Table2.Column3
It eliminates the step of adding the Insert () set, and you just select which values go in the table.
This worked for me:
insert into table1 select * from table2
The sentence is a bit different from Oracle's.
INSERT INTO yourtable
SELECT fielda, fieldb, fieldc
FROM donortable;
This works on all DBMS
For Microsoft SQL Server, I will recommend learning to interpret the SYNTAX provided on MSDN. With Google it's easier than ever, to look for syntax.
For this particular case, try
Google: insert site:microsoft.com
The first result will be http://msdn.microsoft.com/en-us/library/ms174335.aspx
scroll down to the example ("Using the SELECT and EXECUTE options to insert data from other tables") if you find it difficult to interpret the syntax given at the top of the page.
[ WITH <common_table_expression> [ ,...n ] ]
INSERT
{
[ TOP ( expression ) [ PERCENT ] ]
[ INTO ]
{ <object> | rowset_function_limited
[ WITH ( <Table_Hint_Limited> [ ...n ] ) ]
}
{
[ ( column_list ) ]
[ <OUTPUT Clause> ]
{ VALUES ( { DEFAULT | NULL | expression } [ ,...n ] ) [ ,...n ]
| derived_table <<<<------- Look here ------------------------
| execute_statement <<<<------- Look here ------------------------
| <dml_table_source> <<<<------- Look here ------------------------
| DEFAULT VALUES
}
}
}
[;]
This should be applicable for any other RDBMS available there. There is no point in remembering all the syntax for all products IMO.
INSERT INTO FIRST_TABLE_NAME (COLUMN_NAME)
SELECT COLUMN_NAME
FROM ANOTHER_TABLE_NAME
WHERE CONDITION;
Best way to insert multiple records from any other tables.
INSERT INTO dbo.Users
( UserID ,
Full_Name ,
Login_Name ,
Password
)
SELECT UserID ,
Full_Name ,
Login_Name ,
Password
FROM Users_Table
(INNER JOIN / LEFT JOIN ...)
(WHERE CONDITION...)
(OTHER CLAUSE)
select *
into tmp
from orders
Looks nice, but works only if tmp doesn't exists (creates it and fills). (SQL sever)
To insert into existing tmp table:
set identity_insert tmp on
insert tmp
([OrderID]
,[CustomerID]
,[EmployeeID]
,[OrderDate]
,[RequiredDate]
,[ShippedDate]
,[ShipVia]
,[Freight]
,[ShipName]
,[ShipAddress]
,[ShipCity]
,[ShipRegion]
,[ShipPostalCode]
,[ShipCountry] )
select * from orders
set identity_insert tmp off
IF you want to insert some data into a table without want to write column name.
INSERT INTO CUSTOMER_INFO
(SELECT CUSTOMER_NAME,
MOBILE_NO,
ADDRESS
FROM OWNER_INFO cm)
Where the tables are:
CUSTOMER_INFO || OWNER_INFO
----------------------------------------||-------------------------------------
CUSTOMER_NAME | MOBILE_NO | ADDRESS || CUSTOMER_NAME | MOBILE_NO | ADDRESS
--------------|-----------|--------- || --------------|-----------|---------
A | +1 | DC || B | +55 | RR
Result:
CUSTOMER_INFO || OWNER_INFO
----------------------------------------||-------------------------------------
CUSTOMER_NAME | MOBILE_NO | ADDRESS || CUSTOMER_NAME | MOBILE_NO | ADDRESS
--------------|-----------|--------- || --------------|-----------|---------
A | +1 | DC || B | +55 | RR
B | +55 | RR ||
If you go the INSERT VALUES route to insert multiple rows, make sure to delimit the VALUES into sets using parentheses, so:
INSERT INTO `receiving_table`
(id,
first_name,
last_name)
VALUES
(1002,'Charles','Babbage'),
(1003,'George', 'Boole'),
(1001,'Donald','Chamberlin'),
(1004,'Alan','Turing'),
(1005,'My','Widenius');
Otherwise MySQL objects that "Column count doesn't match value count at row 1", and you end up writing a trivial post when you finally figure out what to do about it.
If you create table firstly you can use like this;
select * INTO TableYedek From Table
This metot insert values but differently with creating new copy table.
In informix it works as Claude said:
INSERT INTO table (column1, column2)
VALUES (value1, value2);
Postgres supports next:
create table company.monitor2 as select * from company.monitor;

Categories

Resources