I need to partition a table on 2 columns, and insert records into an already existing partition of a Postgres table using Python (psycopg2).
I am very new to Python and Postgres, hence struggling a bit with a challenging requirement. I searched the internet and found that Postgres does not support Partitioning by List on multiple columns.
I have 2 tables - "cust_details_curr" & "cust_details_hist". Both the tables will have the same structure. However the "_hist" table needs to be partitioned on 2 columns - 'area_code' and 'eff_date'
CREATE TABLE cust_details_curr
(
cust_id int,
area_code varchar(5),
cust_name varchar(20),
cust_age int,
eff_date date
);
CREATE TABLE cust_details_hist
(
cust_id int,
area_code varchar(5),
cust_name varchar(20),
cust_age int,
eff_date date
); -- Needs to be partitioned on area_code and eff_date
The "area_code" is passed as an argument to the process.
The column "eff_date" is supposed to contain the current process run date.
There are multiple "area_codes" to be passed as an argument to the program - (there are 5 values - A501, A502, A503, A504, X101) all of which will run sequentially on the same day (i.e eff_date will be the same for all the runs).
The requirement is that whenever the "curr" table is being loaded for a specific "area_code", the program must first copy the data already existing in the "curr" table (for that specific area_code) into a partition of "eff_date" and that specific "area_code" of the "_hist" table. Next, the data pertaining to the same area_code in "curr" table must be deleted, and new data for that area_code will be loaded with the current process date in the eff_date column.
However, the process should run for 1 area_code at a time and hence the process will run for multiple area_codes on the same day. (which means they will all have eff_date = same current date)
So my questions are:
How do I partition the _hist table by 2 columns, area_code and eff_date?
Also, once a partition for an eff_date is created (assume 2022-08-01) and loaded in the _hist table for one of the area_codes (assume A501), the next job in the sequence will need to load the data for another area_code (say A502) into the same eff_date partition (since eff_date is the same for both process instances, as they are executed on the same day). How can I insert data into the existing partition?
I devised the following (crude) way to handle the requirement when it was only for a single partition column - "eff_date". For that I would execute the sql queries below in order to somewhat implement the initial requirements for a single eff_date and area_code value.
However, I am struggling to figure out how to implement the same with area_code as a second partition column in the _hist table, and how to insert data into an already existing date partition (eff_date) that was loaded by a previous area_code instance.
CREATE TABLE cust_details_curr
(
cust_id int,
area_code varchar(5),
cust_name varchar(20),
cust_age int,
eff_date date
);
CREATE TABLE cust_details_hist
(
cust_id int,
area_code varchar(5),
cust_name varchar(20),
cust_age int,
eff_date date
) PARTITION BY LIST (eff_date); -- Partitioned by List
from datetime import datetime

table_name = "cust_details_curr"
table_name_hist = table_name + '_hist'
e = datetime.now()
eff_date = e.strftime("%Y-%m-%d")
dttime = e.strftime("%Y%m%d_%H%M%S")
# New partition table name: the _hist table name plus a timestamp suffix
table_name_curr_part = table_name_hist + '_' + dttime
query_count = f"SELECT count(*) as cnt from {table_name} where area_code = '{area_code}'; "
query_date = f"SELECT distinct eff_date as eff_dt from {table_name} where area_code = '{area_code}';"
# Read the eff_date currently stored in the curr table for this area_code
cur.execute(query_date)
eff_date = cur.fetchone()[0]
query_crt = f"CREATE TABLE {table_name_curr_part} (LIKE {table_name_hist} INCLUDING DEFAULTS);"
query_ins_part = f"INSERT INTO {table_name_curr_part} SELECT * FROM {table_name} where area_code = '{area_code}' AND eff_date = '{eff_date}';"
query_add_part = f"ALTER TABLE {table_name_hist} ATTACH PARTITION {table_name_curr_part} FOR VALUES IN (DATE '{eff_date}') ;"
query_del = f"DELETE FROM {table_name} WHERE area_code = '{area_code}';"
query_ins_curr = f"INSERT INTO {table_name} (cust_id, area_code, cust_name, cust_age, eff_date) VALUES %s"
cur.execute(....)
# Program trimmed in the interest of space
Can anyone please help me how to implement a workaround for the above requirements with multiple partition columns. How can I load data to an already existing partition ?
Happy to provide additional information. Any help is appreciated.
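One possible workaround is sub-partitioning: keep the _hist table partitioned by LIST (eff_date) and declare each date partition itself as PARTITION BY LIST (area_code). Below is a minimal sketch that only builds the SQL strings. The helper name hist_partition_ddl and the partition naming scheme are assumptions made for illustration, and it assumes cust_details_hist was created with PARTITION BY LIST (eff_date).

```python
from datetime import date
from typing import List

# Statements meant for psycopg2's cur.execute(sql, (area_code,)); %s is its placeholder.
ARCHIVE_SQL = "INSERT INTO cust_details_hist SELECT * FROM cust_details_curr WHERE area_code = %s;"
DELETE_SQL = "DELETE FROM cust_details_curr WHERE area_code = %s;"


def hist_partition_ddl(parent: str, eff_date: date, area_code: str) -> List[str]:
    """Return DDL for one eff_date partition and one area_code sub-partition.

    The naming scheme (<parent>_<yyyymmdd>_<area_code>) is an assumption.
    """
    if not area_code.isalnum():
        # area_code ends up in an identifier and a literal, so keep it strict
        raise ValueError(f"unexpected area_code: {area_code!r}")
    date_part = f"{parent}_{eff_date:%Y%m%d}"
    area_part = f"{date_part}_{area_code.lower()}"
    return [
        # Date-level partition; IF NOT EXISTS makes the second run of the day a no-op
        f"CREATE TABLE IF NOT EXISTS {date_part} PARTITION OF {parent} "
        f"FOR VALUES IN ('{eff_date.isoformat()}') PARTITION BY LIST (area_code);",
        # Leaf partition that actually stores the rows for this area_code
        f"CREATE TABLE IF NOT EXISTS {area_part} PARTITION OF {date_part} "
        f"FOR VALUES IN ('{area_code}');",
    ]


statements = hist_partition_ddl("cust_details_hist", date(2022, 8, 1), "A502")
```

Each run would execute the two DDL statements, then ARCHIVE_SQL and DELETE_SQL with (area_code,) as parameters, ideally in one transaction. Rows inserted through the parent table are routed by Postgres to the matching leaf partition, so the job for A502 does not need to address the partition created earlier for A501 by name.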
I'd like to insert an Order_ID to make each row unique using python and pyodbc to SQL Server.
Currently, my code is:
name = input("Your name")

def connectiontoSQL(order_id,name):
    query = f'''\
    insert into order (Order_ID, Name)
    values('{order_id}','{name}')'''
    return (execute_query_commit(conn,query))
My table in the SQL database is empty, and I'd like the order_ID to increase by 1 every time I execute the code.
How should I generate order_id in Python so that the first order_ID is OD001 and, when I execute the code again, the next one is OD002?
You can create an INT IDENTITY column as your primary key and add a computed column that holds the order number you display in your application.
create table Orders
(
[OrderId] [int] IDENTITY(1,1) NOT NULL,
[OrderNumber] as 'OD'+ right( '00000' + cast(OrderId as varchar(6)) , 6) ,
[OrderDate] date,
PRIMARY KEY CLUSTERED
(
[OrderId] ASC
)
)
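If the display value is also needed on the Python side, the same zero-padding can be reproduced there. This is only a sketch: the helper name format_order_number and its width parameter are made up for illustration, and the numeric id is assumed to come from the database's identity column.

```python
def format_order_number(order_id: int, width: int = 3) -> str:
    """Format a numeric id as 'OD' plus a zero-padded number, e.g. 1 -> 'OD001'."""
    if order_id < 0:
        raise ValueError("order_id must not be negative")
    # Nested format spec: pad with zeros up to `width` digits
    return f"OD{order_id:0{width}d}"


first = format_order_number(1)
second = format_order_number(2)
```

Ids with more digits than the width are not truncated, so the padding only affects small numbers.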
I am learning Python and trying to replicate what online tutorials do. I am trying to create a Python desktop app where data is stored in PostgreSQL. The code is added below:
`cur.execute("CREATE TABLE IF NOT EXISTS book (id INTEGER PRIMARY KEY, title text, author text, year integer, isbn integer)")`
The problem is with (id INTEGER PRIMARY KEY): when I execute the code, it shows None in place of the first column. I want it to show numbers.
Please help.
This is for Python 3.7.3, psycopg2==2.8.3.
import sqlite3

def connect():
    conn=sqlite3.connect("books.db")
    cur=conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS book (id INTEGER PRIMARY KEY, title text, author text, year integer, isbn integer)")
    conn.commit()
    conn.close()
The result I am expecting is an auto-incrementing number in the first column, whereas at present it shows None.
Below are the present and the expected result again:
none title author year isbn
01 title author year isbn
Trying to use a Cursor execute will not work for a CREATE statement and hence the NONE. See below for an example.
Re Indexes :-
There will be no specific index as column_name INTEGER PRIMARY KEY is special in that it defines the column as an alias of the rowid column which is a special intrinsic index using the underlying B-tree storage engine.
When a row is inserted and no value is specified for the column (e.g. INSERT INTO book (title,author, year, isbn) VALUES ('book1','The Author','1999','1234567890')), then id will be 1, and typically (but not certainly) the next row inserted will have an id of 2, and so on.
If after adding some rows you use SELECT * FROM book, then the rows will be ordered according to the id as no other index is specified/used.
Perhaps have a look at Rowid Tables.
Example
Perhaps consider the following example :-
DROP TABLE IF EXISTS book;
CREATE TABLE IF NOT EXISTS book (id INTEGER PRIMARY KEY, title text, author text, year integer, isbn integer);
INSERT INTO book (title,author, year, isbn) VALUES
('book1','The Author','1999','1234567890'),
('book2','Author 2','1899','2234567890'),
('book3','Author 3','1799','3234567890')
;
INSERT INTO book VALUES (100,'book10','Author 10','1999','4234567890'); --<<<<<<<<<< specific ID
INSERT INTO book (title,author, year, isbn) VALUES
('book11','Author 11','1999','1234567890'),
('book12','Author 12','1899','2234567890'),
('book13','Author 13','1799','3234567890')
;
INSERT INTO book VALUES (10,'book10','Author 10','1999','4234567890'); --<<<<<<<<<< specific ID
SELECT * FROM book;
This :-
DROPs the book table (to make it easily re-runable)
CREATEs the book table.
INSERTs 3 books with the id not specified (typical)
INSERTs a fourth book but with a specific id of 100
INSERTs another 3 books (note that these will be 101-103 as 100 is the highest id before the inserts)
INSERTs a last row BUT with a specific id of 10.
SELECTs all rows with all columns from the book table ordered, as no ORDER BY has been specified, according to the hidden index based upon the id. NOTE although id 10 was the last inserted it is the 4th row.
Result
In Python :-
import sqlite3

conn = sqlite3.connect("books.db")
conn.execute("DROP TABLE IF EXISTS book")
conn.execute("CREATE TABLE IF NOT EXISTS book (id INTEGER PRIMARY KEY,title text, author text, year integer, isbn integer)")
conn.execute("INSERT INTO book (title,author, year, isbn) "
"VALUES('book1','The Author','1999','1234567890'), "
"('book2','Author 2','1899','2234567890'), "
"('book3','Author 3','1799','3234567890');")
conn.execute("INSERT INTO book VALUES (100,'book10','Author 10','1999','4234567890'); --<<<<<<<<<< specific ID")
conn.execute("INSERT INTO book (title,author, year, isbn) VALUES ('book11','Author 11','1999','1234567890'),('book12','Author 12','1899','2234567890'),('book13','Author 13','1799','3234567890');")
conn.execute("INSERT INTO book VALUES (10,'book10','Author 10','1999','4234567890'); --<<<<<<<<<< specific ID")
cur = conn.cursor()
cur.execute("SELECT * FROM book")
for each in cur:
    print("{0:<20} {1:<20} {2:<20} {3:<20} {4:<20}".format(each[0],each[1],each[2],each[3],each[4]))
conn.commit()
conn.close()
This results in :-
1 book1 The Author 1999 1234567890
2 book2 Author 2 1899 2234567890
3 book3 Author 3 1799 3234567890
10 book10 Author 10 1999 4234567890
100 book10 Author 10 1999 4234567890
101 book11 Author 11 1999 1234567890
102 book12 Author 12 1899 2234567890
103 book13 Author 13 1799 3234567890
Since you say you are inserting the data manually, then instead of
def connect():
    conn=sqlite3.connect("books.db")
    cur=conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS book (id INTEGER PRIMARY KEY, title text, author text, year integer, isbn integer)")
    conn.commit()
    conn.close()
Try to use
def connect():
    conn=sqlite3.connect("books.db")
    cur=conn.cursor()
    # The table created above is named "book"
    cur.execute("SELECT * FROM book")
    for row in cur:
        print(row[0],row[1],row[2],row[3],row[4])
    conn.commit()
    conn.close()
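To see the automatically assigned id right after an insert, the cursor's lastrowid attribute can be read. The following is a small sketch that uses an in-memory database instead of books.db so it can be run without touching existing data. The sample titles are placeholders.

```python
import sqlite3

# In-memory database, used here only so the sketch leaves no file behind
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS book "
    "(id INTEGER PRIMARY KEY, title text, author text, year integer, isbn integer)"
)

insert_sql = "INSERT INTO book (title, author, year, isbn) VALUES (?, ?, ?, ?)"

# The id column is left out, so SQLite assigns it
cur.execute(insert_sql, ("book1", "The Author", 1999, 1234567890))
first_id = cur.lastrowid
cur.execute(insert_sql, ("book2", "Author 2", 1899, 2234567890))
second_id = cur.lastrowid

rows = cur.execute("SELECT id, title FROM book ORDER BY id").fetchall()
conn.commit()
conn.close()
```

Leaving id out of the column list is what lets SQLite fill it in. The ? placeholders also avoid building the SQL by string concatenation.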
I have data in a tab-separated format:
State:ca city:california population:1M
I want to create a DB. When I insert, I want to ignore the "state:", "city:" and "population:" prefixes, and insert the state with its population into a state table and the city with its population into a city table.
So there will be 2 tables: one with state and population, the other with city and population.
CREATE EXTERNAL TABLE IF NOT EXISTS CSP.original
(
st STRING COMMENT 'State',
ct STRING COMMENT 'City',
po STRING COMMENT 'Population'
)
COMMENT 'Original Table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
This didn't work. It added the comments, but it didn't ignore the prefixes.
I also want to create 2 tables, for state and city. Can anyone please help me?
You would have to create external table first.
Step1:
CREATE EXTERNAL TABLE all_info (state STRING, population INT) PARTITIONED BY (date STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Step2:
CREATE TABLE IF NOT EXISTS state (state string, population INT) PARTITIONED BY (date string);
CREATE TABLE IF NOT EXISTS city (city string, population INT) PARTITIONED BY (date string);
Step3:
INSERT OVERWRITE TABLE state
PARTITION (date = '20170706')
SELECT *
FROM all_info
WHERE date = '20170706' AND
instr(state, 'state:') = 1;
INSERT OVERWRITE TABLE city
PARTITION (date = '20170706')
SELECT *
FROM all_info
WHERE date = '20170706' AND
instr(state, 'city:') = 1;
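If the prefixes are easier to strip before the data reaches Hive, a small preprocessing step is another option. This is only a sketch of that idea, not part of the Hive answer above. The function name parse_record is made up, and it assumes every field has the form key:value and that fields are separated by tabs.

```python
from typing import Dict


def parse_record(line: str) -> Dict[str, str]:
    """Split a tab-separated line of key:value fields into a dict of values."""
    fields = {}
    for field in line.rstrip("\n").split("\t"):
        # partition splits on the first ':' only, so values may contain ':'
        key, _, value = field.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields


record = parse_record("State:ca\tcity:california\tpopulation:1M")
# One row per target table, without the "state:" / "city:" prefixes
state_row = (record["state"], record["population"])
city_row = (record["city"], record["population"])
```

The two tuples could then be written to separate tab-separated files, one per table, and loaded without any prefix handling on the Hive side.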
I am trying to add values to a 'pending application table'. This is what I have so far:
appdata = (ID, UNIQUE_IDENTIFIER, time.strftime("%d/%m/%Y"), self.amount_input.get(), self.why_input.get())
self.c.execute('INSERT into Pending VALUES (?,?,?,?,?)', appdata)
self.conn.commit()
I need to set a value for 'UNIQUE_IDENTIFIER', which is a primary key in a sqlite database.
How can I generate a unique number for this value?
CREATE TABLE Pending (
ID STRING REFERENCES StaffTable (ID),
PendindID STRING PRIMARY KEY,
RequestDate STRING,
Amount TEXT,
Reason TEXT
);
There are two ways to do that:
1- First
In Python you can use the uuid module, for example:
>>> import uuid
>>> str(uuid.uuid4()).replace('-','')
'5f202bf198e24242b6a11a569fd7f028'
Note: uuid.uuid4() returns a new random value on each call. There is a very small chance of getting the same string twice, so check whether a row with the same primary key already exists in the table before saving.
For example:
>>> ID = str(uuid.uuid4()).replace('-','')
>>> cursor.execute("SELECT * FROM Pending WHERE PendindID = ?", (ID,))
>>> data = cursor.fetchall()
>>> if len(data) == 0:
...     pass  # save the new object, as there is no row with the same id
... else:
...     pass  # create a new ID
2- Second
In sqlite3, make a composite primary key, as described in the SQLite docs:
CREATE TABLE Pending (
column1,
column2,
column3,
PRIMARY KEY (column1, column2)
);
The composite primary key then guarantees that each (column1, column2) pair is unique.
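For the first approach, an alternative to checking for an existing row is to let the PRIMARY KEY constraint reject a duplicate and retry. Below is a minimal sketch of that variation. The function name insert_pending, the retry count and the simplified Pending table (no foreign key, in-memory database) are assumptions for illustration.

```python
import sqlite3
import time
import uuid


def insert_pending(conn, staff_id, amount, reason, max_tries=3):
    """Insert a row with a fresh UUID key; retry if the key already exists."""
    for _ in range(max_tries):
        pending_id = uuid.uuid4().hex  # 32 hex characters, no dashes
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute(
                    "INSERT INTO Pending VALUES (?, ?, ?, ?, ?)",
                    (staff_id, pending_id, time.strftime("%d/%m/%Y"), amount, reason),
                )
            return pending_id
        except sqlite3.IntegrityError:
            continue  # key collision (extremely unlikely): try another UUID
    raise RuntimeError("could not generate a unique PendindID")


# Simplified copy of the Pending table, kept in memory for the sketch
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Pending (ID TEXT, PendindID TEXT PRIMARY KEY, "
    "RequestDate TEXT, Amount TEXT, Reason TEXT)"
)
first = insert_pending(conn, "S1", "100", "travel")
second = insert_pending(conn, "S1", "250", "training")
row_count = conn.execute("SELECT count(*) FROM Pending").fetchone()[0]
```

This avoids the gap between the SELECT and the INSERT, because the database itself decides whether the key is already taken.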