I'd like to insert an Order_ID to make each row unique, using Python and pyodbc with SQL Server.
Currently, my code is:
name = input("Your name: ")

def connectiontoSQL(order_id, name):
    # note: "order" is a reserved word in SQL Server, so the table name must
    # be bracketed; in real code, prefer parameterized queries over f-strings
    # to avoid SQL injection
    query = f'''\
insert into [order] (Order_ID, Name)
values ('{order_id}', '{name}')'''
    return execute_query_commit(conn, query)
If my table in the SQL database is empty and I'd like it to increment the Order_ID by 1 every time I execute,
how should I code order_id in Python so that it automatically creates the first Order_ID as OD001, and the next execution creates OD002?
You can create an INT IDENTITY column as your primary key and add a computed column that holds the order number you display in your application.
create table Orders
(
    [OrderId] [int] IDENTITY(1,1) NOT NULL,
    [OrderNumber] as 'OD' + right('00000' + cast(OrderId as varchar(6)), 6),
    [OrderDate] date,
    PRIMARY KEY CLUSTERED ([OrderId] ASC)
)
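On the Python side you then insert only the real data columns and let SQL Server generate both values. A minimal sketch with pyodbc, assuming the Orders table above (the connection string and date value are placeholders):
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes")
cursor = conn.cursor()

# Insert only OrderDate; OrderId and OrderNumber are generated by the server.
# SET NOCOUNT ON suppresses the rowcount result so the SELECT is the first
# result set pyodbc sees; SCOPE_IDENTITY() works because both statements run
# in the same batch.
cursor.execute(
    "SET NOCOUNT ON;"
    "INSERT INTO Orders (OrderDate) VALUES (?);"
    "SELECT OrderNumber FROM Orders WHERE OrderId = SCOPE_IDENTITY();",
    "2023-01-15",
)
print(cursor.fetchone()[0])  # e.g. 'OD000001' on the first insert
conn.commit()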
I need to partition a table on 2 columns, and insert records into an already existing partition of a Postgres table using Python (psycopg2).
I am very new to Python and Postgres, hence struggling a bit with a challenging requirement. I searched the internet and found that Postgres does not support partitioning by list on multiple columns.
I have 2 tables, "cust_details_curr" and "cust_details_hist". Both tables have the same structure, but the "_hist" table needs to be partitioned on 2 columns, 'area_code' and 'eff_date'.
CREATE TABLE cust_details_curr
(
    cust_id int,
    area_code varchar(5),
    cust_name varchar(20),
    cust_age int,
    eff_date date
);
CREATE TABLE cust_details_hist
(
    cust_id int,
    area_code varchar(5),
    cust_name varchar(20),
    cust_age int,
    eff_date date
); -- Needs to be partitioned on area_code and eff_date
The "area_code" is passed as an argument to the process.
The column "eff_date" is supposed to contain the current process run
date.
There are multiple "area_codes" to be passed as an argument to the program - (there are 5 values - A501, A502, A503, A504, X101) all of which will run sequentially on the same day (i.e eff_date will be the same for all the runs).
The requirement is that whenever the "curr" table is being loaded for a specific "area_code", the program must first copy the data already existing in the "curr" table for that area_code into the partition of the "_hist" table for that "eff_date" and "area_code". Next, the data for the same area_code in the "curr" table must be deleted, and the new data for that area_code is loaded with the current process date in the eff_date column.
However, the process runs for 1 area_code at a time, so it will run for multiple area_codes on the same day (which means they will all have the same eff_date).
So my questions are:
How do I partition the _hist table by 2 columns, area_code and eff_date?
Also, once a partition for an eff_date (assume 2022-08-01) is created and loaded in the _hist table for one of the area_codes (assume A501), the next job in the sequence will need to load the data for another area_code (say A502) into the same eff_date partition (since eff_date is the same for both process instances, as they are executed on the same day). How can I insert data into the existing partition?
I devised the following (crude) way to handle the requirement when it was only for a single partition column, "eff_date". For that I would execute the SQL queries below in order to roughly implement the initial requirements for a single eff_date and area_code value.
However, I am struggling to figure out how to implement the same with multiple area_codes as a second partition column in the _hist table, and how to insert data into an already existing date partition (eff_date) loaded by a previous area_code instance.
CREATE TABLE cust_details_curr
(
    cust_id int,
    area_code varchar(5),
    cust_name varchar(20),
    cust_age int,
    eff_date date
);
CREATE TABLE cust_details_hist
(
    cust_id int,
    area_code varchar(5),
    cust_name varchar(20),
    cust_age int,
    eff_date date
) PARTITION BY LIST (eff_date); -- Partitioned by list
table_name = "cust_details_curr"
table_name_hist = table_name + '_hist'
e = datetime.now()
eff_date = e.strftime("%Y-%m-%d")
dttime = e.strftime("%Y%m%d_%H%M%S")
table_name_curr_part = table_name_part + '_' + str(dttime)
query_count = f"SELECT count(*) as cnt from {table_name} where area_code = '{area_code}'; "
query_date = f"SELECT distinct eff_date as eff_dt from {table_name} where area_code = '{area_code}';"
cur.execute(quey_date)
eff_date = cur.fetchone()[0]
query_crt = f"CREATE TABLE {table_name_curr_part} LIKE {table_name_part} INCLUDING DEFAULTS);"
query_ins_part = f"INSERT INTO {table_name_curr_part} SELECT * FROM {table_name} where area_code = '{area_code}' AND eff_dt = '{eff_date}';"
query_add_part = f"ALTER TABLE {table_name_part} ATTACH PARTITION {table_name_curr_part} FOR VALUES IN (DATE '{eff_date}') ;"
query_del = f"DELETE FROM {table_name} WHERE area_code = '{area_code}';"
query_ins_curr = f"INSERT INTO {table_name} (cust_id, area_code, cust_name, cust_age, eff_dt) VALUES %s"
cur.execute(....)
# Program trimmed in the interest of space
Can anyone please help me implement a workaround for the above requirements with multiple partition columns? How can I load data into an already existing partition?
Happy to provide additional information. Any help is appreciated.
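For what it's worth, one way to get the effect of two LIST partition keys is sub-partitioning: partition the _hist table by LIST (eff_date), and partition each date partition by LIST (area_code). A minimal sketch, assuming PostgreSQL 11+ and the tables above; the connection string and partition names are hypothetical:
from datetime import datetime
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
cur = conn.cursor()

eff_date = datetime.now().strftime("%Y-%m-%d")
area_code = "A501"
date_part = "cust_details_hist_" + eff_date.replace("-", "_")
area_part = date_part + "_" + area_code.lower()

# Level 1: one partition per eff_date, itself partitioned by area_code.
# IF NOT EXISTS makes this safe to repeat for every area_code run on the
# same day; only the first run actually creates the partition.
cur.execute(f"""
    CREATE TABLE IF NOT EXISTS {date_part}
    PARTITION OF cust_details_hist
    FOR VALUES IN ('{eff_date}')
    PARTITION BY LIST (area_code);
""")

# Level 2: one sub-partition per area_code inside that date partition.
cur.execute(f"""
    CREATE TABLE IF NOT EXISTS {area_part}
    PARTITION OF {date_part}
    FOR VALUES IN ('{area_code}');
""")

# With the partitions in place, "loading into an existing partition" is
# just an INSERT into the parent; rows are routed automatically.
cur.execute(f"""
    INSERT INTO cust_details_hist
    SELECT * FROM cust_details_curr WHERE area_code = '{area_code}';
""")
conn.commit()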
I need to execute a SQL query that deletes the duplicated rows based on one column and keeps the last record. Note that it's a large table, so the Django ORM takes a very long time; I need a SQL query instead. The column name is customer_number and the table name is pages_dataupload. I'm using SQLite.
Update: I tried this, but it gives me no such column: row_num:
cursor = connection.cursor()
cursor.execute(
'''WITH cte AS (
SELECT
id,
customer_number ,
ROW_NUMBER() OVER (
PARTITION BY
id,
customer_number
ORDER BY
id,
customer_number
) row_num
FROM
pages_dataupload
)
DELETE FROM pages_dataupload
WHERE row_num > 1;
'''
)
You can work with an Exists subquery [Django-doc] to efficiently determine whether there is a later DataUpload:
from django.db.models import Exists, OuterRef
DataUpload.objects.filter(Exists(
DataUpload.objects.filter(
pk__gt=OuterRef('pk'), customer_number=OuterRef('customer_number')
)
)).delete()
This will thus check, for each DataUpload, whether there exists a DataUpload with a larger primary key and the same customer_number. If that is the case, we remove that DataUpload.
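If you prefer to stay with raw SQL, as in the question, the same keep-the-last-record rule can be written with a correlated EXISTS instead of a window function, which also sidesteps the row_num error; a sketch:
cursor = connection.cursor()
cursor.execute(
    '''DELETE FROM pages_dataupload
       WHERE EXISTS (
           SELECT 1
           FROM pages_dataupload AS later
           WHERE later.customer_number = pages_dataupload.customer_number
             AND later.id > pages_dataupload.id
       );'''
)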
I have solved the problem with the query below. Is there any way to reset the id field after removing the duplicates?
cursor = connection.cursor()
cursor.execute(
'''
DELETE FROM pages_dataupload WHERE id NOT IN (
    SELECT MAX(id) FROM pages_dataupload GROUP BY customer_number
)
'''
)
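On resetting the id field: you generally should not renumber existing primary keys, but you can pull the autoincrement counter back so new rows continue right after the current maximum. A sketch, assuming Django's default AutoField on SQLite (which is AUTOINCREMENT, with its counter kept in the internal sqlite_sequence table):
cursor = connection.cursor()
# Lower the counter to the current maximum id so new rows continue from there.
cursor.execute(
    '''UPDATE sqlite_sequence
       SET seq = (SELECT MAX(id) FROM pages_dataupload)
       WHERE name = 'pages_dataupload';'''
)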
How do I create a variable number of columns in a table according to user input?
In other words, I have a table that contains the ID of students, but I need to create a variable number of week columns according to the user's choice. For example, if the number of weeks chosen by the user is 2, then we create a table like this:
cur.execute("""CREATE TABLE Attendance
(
Week1 int,
Week2 int,
ID int primary key ,
)""")
You can just build the column defs as a string in Python:
num_weeks = 4
week_column_defs = ', '.join("Week{} int".format(week_num) for week_num in range(1, num_weeks+1))
command = """CREATE TABLE Attendance
(
{weeks} ,
ID int primary key ,
)""".format(weeks=week_column_defs)
cur.execute(command)
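For num_weeks = 4, print(command) shows the statement that gets executed:
CREATE TABLE Attendance
(
Week1 int, Week2 int, Week3 int, Week4 int ,
ID int primary key
)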
I am trying to add values to a 'pending application table'. This is what I have so far:
appdata = (ID, UNIQUE_IDENTIFIER, time.strftime("%d/%m/%Y"),
           self.amount_input.get(), self.why_input.get())
self.c.execute('INSERT INTO Pending VALUES (?,?,?,?,?)', appdata)
self.conn.commit()
I need to set a value for 'UNIQUE_IDENTIFIER', which is the primary key in a SQLite database.
How can I generate a unique value for it?
CREATE TABLE Pending (
    ID STRING REFERENCES StaffTable (ID),
    PendingID STRING PRIMARY KEY,
    RequestDate STRING,
    Amount TEXT,
    Reason TEXT
);
There are two ways to do that:
1. First
In Python you can use the uuid module, for example:
>>> import uuid
>>> str(uuid.uuid4()).replace('-','')
'5f202bf198e24242b6a11a569fd7f028'
note: there is a small chance of getting the same string, so check whether a row with the same primary key already exists in the table before saving
uuid.uuid4() returns a new random UUID on each call
for example:
>>> ID = str(uuid.uuid4()).replace('-', '')
>>> cursor.execute("SELECT * FROM Pending WHERE PendingID = ?", (ID,))
>>> data = cursor.fetchall()
>>> if len(data) == 0:
...     ...  # no row with the same id exists, so save the new object
... else:
...     ...  # collision: create a new ID and check again
2. Second
In SQLite, make a composite primary key; according to the SQLite docs:
CREATE TABLE Pending (
column1,
column2,
column3,
PRIMARY KEY (column1, column2)
);
The composite primary key enforces uniqueness of the (column1, column2) pair; alternatively, you can use a UNIQUE(column1, column2) constraint.
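Putting the first method together with the original INSERT, a minimal sketch against the Pending table above (the database file, staff ID, amount, and reason are hypothetical placeholders):
import sqlite3
import time
import uuid

conn = sqlite3.connect("app.db")  # hypothetical database file
c = conn.cursor()

# Generate an ID; regenerate on the (vanishingly unlikely) collision.
pending_id = str(uuid.uuid4()).replace('-', '')
while c.execute("SELECT 1 FROM Pending WHERE PendingID = ?",
                (pending_id,)).fetchone() is not None:
    pending_id = str(uuid.uuid4()).replace('-', '')

appdata = ("ST001", pending_id, time.strftime("%d/%m/%Y"), "100", "stationery")
c.execute("INSERT INTO Pending VALUES (?,?,?,?,?)", appdata)
conn.commit()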
I have 6 tables in my SQLite database, each with 6 columns (Date, user, NormalA, specialA, contact, remarks) and 1000+ rows.
How can I use SQLAlchemy to sort through the Date column, look for duplicate dates, and delete those rows?
Assuming this is your model:
class MyTable(Base):
__tablename__ = 'my_table'
id = Column(Integer, primary_key=True)
date = Column(DateTime)
user = Column(String)
# do not really care of columns other than `id` and `date`
# important here is the fact that `id` is a PK
The following are two ways to delete your data:
Find the duplicates, mark them for deletion and commit the transaction
Create a single SQL query which will perform the deletion on the database directly
For both of them a helper sub-query will be used:
from sqlalchemy import and_, func

# helper subquery: find the first row (by primary key) for each unique date
subq = (
    session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
    .group_by(MyTable.date)
    .subquery("date_min_id")
)
Option-1: Find duplicates, mark them for deletion and commit the transaction
# query to find all duplicates
q_duplicates = (
session
.query(MyTable)
.join(subq, and_(
MyTable.date == subq.c.date,
MyTable.id != subq.c.min_id)
)
)
for x in q_duplicates:
print("Will delete %s" % x)
session.delete(x)
session.commit()
Option-2: Create a single SQL query which will perform deletion on the database directly
sq = (
session
.query(MyTable.id)
.join(subq, and_(
MyTable.date == subq.c.date,
MyTable.id != subq.c.min_id)
)
).subquery("subq")
dq = (
session
.query(MyTable)
.filter(MyTable.id.in_(sq))
).delete(synchronize_session=False)
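Note that with synchronize_session=False the DELETE is emitted to the database immediately, but it still runs inside the session's transaction, so finish with a commit:
session.commit()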
Inspired by "Find duplicate values in SQL table", this might help you to select the duplicate dates:
query = session.query(
    MyTable
).\
    group_by(MyTable.date).\
    having(func.count(MyTable.date) > 1).all()
If you only want to show the unique dates, DISTINCT is what you might need.
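For example, a quick sketch:
unique_dates = session.query(MyTable.date).distinct().all()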
While I like the whole object-oriented approach with SQLAlchemy, sometimes I find it easier to use some SQL directly.
And since the records don't have a key, we need the row number (_ROWID_) to delete the targeted records, and I don't think the ORM API exposes it.
So first we connect to the database:
from sqlalchemy import create_engine

eng = create_engine(r'sqlite:///C:\temp\example.db')
Then to list all the records:
for row in eng.execute("SELECT * FROM TableA;"):
    print(row)
And to display all the duplicated records where the dates are identical:
for row in eng.execute("""
        SELECT * FROM {table}
        WHERE {field} IN (SELECT {field} FROM {table} GROUP BY {field} HAVING COUNT(*) > 1)
        ORDER BY {field};
        """.format(table="TableA", field="Date")):
    print(row)
Now that we have identified all the duplicates, they probably need to be fixed if the other fields differ:
eng.execute("UPDATE TableA SET NormalA=18, specialA=20 WHERE Date = '2016-12-18';")
eng.execute("UPDATE TableA SET NormalA=4, specialA=8 WHERE Date = '2015-12-18';")
And finally, to keep the first inserted record and delete the more recently inserted duplicates:
print(eng.execute("""
    DELETE FROM {table}
    WHERE _ROWID_ NOT IN (SELECT MIN(_ROWID_) FROM {table} GROUP BY {field});
    """.format(table="TableA", field="Date")).rowcount)
Or to keep the last inserted record and delete the other duplicates:
print(eng.execute("""
    DELETE FROM {table}
    WHERE _ROWID_ NOT IN (SELECT MAX(_ROWID_) FROM {table} GROUP BY {field});
    """.format(table="TableA", field="Date")).rowcount)