I am creating one table per user in my database and later storing data specific to that user. Since I have 100+ users, I was looking to automate the table creation process in my Python code.
Much like how I can automate a row insertion in a table, I tried to automate table creation.
Row insertion code:
PAYLOAD_TEMPLATE = (
"INSERT INTO metadata "
"(to_date, customer_name, subdomain, internal_users)"
"VALUES (%s, %s, %s, %s)"
)
How I use it:
connection = mysql.connector.connect(**config)
cursor = connection.cursor()
# Opening csv table to feed data
with open('/csv-table-path', 'r') as weeklyInsight:
    reader = csv.DictReader(weeklyInsight)
    for dataDict in reader:
        # Changing date to %m/%d/%Y format
        to_date = dataDict['To'][:5] + "20" + dataDict['To'][5:]
        payload_data = (
            datetime.strptime(to_date, '%m/%d/%Y'),
            dataDict['CustomerName'],
            dataDict['Subdomain'],
            dataDict['InternalUsers']
        )
        cursor.execute(PAYLOAD_TEMPLATE, payload_data)
How can I create a 'TABLE_TEMPLATE' that can be executed in a similar way to create a table?
I wish to create it such that I can execute the template code from my cursor after replacing certain fields with others.
TABLE_TEMPLATE = (
" CREATE TABLE '{customer_name}' (" # Change customer_name for new table
"'To' DATE NOT NULL,"
"'Users' INT(11) NOT NULL,"
"'Valid' VARCHAR(3) NOT NULL"
") ENGINE=InnoDB"
)
There is no technical¹ need to create a separate table for each client. It is simpler and cleaner to have a single table, e.g.
-- A simple users table; you probably already have something like this
create table users (
  id integer not null auto_increment,
  name varchar(50),
  primary key (id)
);
create table weekly_numbers (
  id integer not null auto_increment,
  -- By referring to the id column of our users table we link each
  -- row with a user
  user_id integer,
  `date` date not null,
  user_count integer(11) not null,
  primary key (id),
  -- MySQL parses but ignores an inline "references" clause on a column,
  -- so the foreign key is declared as a separate constraint
  foreign key (user_id) references users(id)
);
Let's add some sample data:
insert into users (id, name)
values (1, 'Kirk'),
(2, 'Picard');
insert into weekly_numbers (user_id, `date`, user_count)
values (1, '2017-06-13', 5),
(1, '2017-06-20', 7),
(2, '2017-06-13', 3),
(1, '2017-06-27', 10),
(2, '2017-06-27', 9),
(2, '2017-06-20', 12);
Now let's look at Captain Kirk's numbers:
select `date`, user_count
from weekly_numbers
-- By filtering on user_id we can see one user's numbers
where user_id = 1
order by `date` asc;
¹There may be business reasons to keep your users' data separate. A common use case would be isolating your clients' data, but in that case a separate database per client seems like a better fit.
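As a minimal sketch of how the single-table layout could be used from Python, the snippet below recreates the two tables and fetches one user's rows by passing the user id as a query parameter. It assumes an in-memory SQLite database purely so that it runs without a MySQL server (with mysql.connector the placeholder would be %s rather than ?), and the helper name numbers_for_user is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (
        id integer primary key,
        name varchar(50)
    );
    create table weekly_numbers (
        id integer primary key,
        user_id integer references users(id),
        `date` date not null,
        user_count integer not null
    );
""")
conn.executemany("insert into users (id, name) values (?, ?)",
                 [(1, "Kirk"), (2, "Picard")])
conn.executemany(
    "insert into weekly_numbers (user_id, `date`, user_count) values (?, ?, ?)",
    [(1, "2017-06-13", 5), (1, "2017-06-20", 7), (2, "2017-06-13", 3),
     (1, "2017-06-27", 10), (2, "2017-06-27", 9), (2, "2017-06-20", 12)],
)


def numbers_for_user(conn, user_id):
    """Return (date, user_count) rows for one user, oldest first."""
    cur = conn.execute(
        "select `date`, user_count from weekly_numbers "
        "where user_id = ? order by `date` asc",
        (user_id,),
    )
    return cur.fetchall()


kirk_rows = numbers_for_user(conn, 1)
print(kirk_rows)
```

Adding a new user then only requires inserting one more row into users; no new table has to be created.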
I'm comparing images between thousands of users, each of whom can have between 1 and 12 photos. The comparison between two photos returns a score that I need to store so that I don't make the same comparison twice.
What is the best way of storing it?
I thought about storing it in a table with one photo per row/column, but this can quickly get out of hand.
Multi-indexed pandas if you want to work in memory, like so:
import pandas as pd
df = pd.DataFrame(index=[['Alice/foo.png'], ['Bob/bar.png']], columns=['user1', 'user2', 'score'], data=[['Alice', 'Bob', 42.0]])
df.index.names = ['photo1', 'photo2']
df
                           user1 user2  score
photo1        photo2
Alice/foo.png Bob/bar.png  Alice   Bob   42.0
SQLite if you want to work on disk, like so:
import sqlite3
# The important part: defining the table
conn = sqlite3.connect('photos.sqlite')
c = conn.cursor()
c.execute('CREATE TABLE IF NOT EXISTS photos (photoid INTEGER PRIMARY KEY, photopath TEXT UNIQUE, userid INT)')
c.execute('CREATE TABLE IF NOT EXISTS photoscores (photoid1 INTEGER, photoid2 INTEGER, userid1 INT, userid2 INT, score REAL)')
c.execute('CREATE UNIQUE INDEX IF NOT EXISTS photopair ON photoscores (photoid1, photoid2)')
conn.commit()
# Example of populating the table
sql = ("INSERT OR IGNORE INTO photos(photopath, userid) VALUES ('foo.png', 1)",
       "INSERT OR IGNORE INTO photos(photopath, userid) VALUES ('bar.png', 2)")
for statement in sql:
    c.execute(statement)
conn.commit()
sql = 'SELECT photoid FROM photos'
c.execute(sql)
# Each fetched row is a 1-tuple, so take its first element
values = [row[0] for row in c.fetchall()]
# This is more complicated than it needs to be, especially since
# values will always be sorted if this code is run, but I'm just
# emphasizing the need to keep the photo ids and user ids aligned
sql = 'INSERT OR REPLACE INTO photoscores VALUES (?, ?, ?, ?, 42.0)'
params = tuple(sorted(values)) + ((1, 2) if values == sorted(values) else (2, 1))
c.execute(sql, params)
conn.commit()
import pandas as pd
pd.read_sql('SELECT * FROM photoscores', conn)
   photoid1  photoid2  userid1  userid2  score
0         1         2        1        2   42.0
If you use the SQL code, I'd suggest always sorting the pair of photo IDs you compare.
The key thing is that you want something with an underlying hash map that will quickly tell you whether an existing pair of photos has already been compared.
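To make the "sort the pair, then look it up" idea concrete, here is a minimal sketch, assuming an in-memory SQLite table with a composite primary key on the photo pair. The helper get_or_compute_score and the stand-in comparison function fake_compare are hypothetical names, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photoscores ("
    "photoid1 INTEGER, photoid2 INTEGER, score REAL, "
    "PRIMARY KEY (photoid1, photoid2))"
)


def get_or_compute_score(conn, id_a, id_b, compare):
    """Return the stored score for a photo pair, computing it at most once."""
    # Canonical order: (1, 2) and (2, 1) map to the same row.
    lo, hi = sorted((id_a, id_b))
    row = conn.execute(
        "SELECT score FROM photoscores WHERE photoid1 = ? AND photoid2 = ?",
        (lo, hi),
    ).fetchone()
    if row is not None:
        return row[0]
    score = compare(lo, hi)
    conn.execute("INSERT INTO photoscores VALUES (?, ?, ?)", (lo, hi, score))
    conn.commit()
    return score


calls = []


def fake_compare(a, b):
    # Hypothetical stand-in for the real (expensive) image comparison.
    calls.append((a, b))
    return 42.0


first = get_or_compute_score(conn, 2, 1, fake_compare)
second = get_or_compute_score(conn, 1, 2, fake_compare)
print(first, second, calls)
```

The second call finds the stored row through the primary-key index, so the comparison function runs only once for the pair.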
I'd like to insert an Order_ID to make each row unique using python and pyodbc to SQL Server.
Currently, my code is:
name = input("Your name")
def connectiontoSQL(order_id,name):
    query = f'''\
    insert into order (Order_ID, Name)
    values('{order_id}','{name}')'''
    return (execute_query_commit(conn,query))
If my table in the SQL database is empty and I'd like the Order_ID to increase by 1 every time I execute,
how should I generate order_id in Python so that the first Order_ID is OD001 and, if I execute another time, the next one is OD002?
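You can create an INT IDENTITY column as your primary key and add a computed column that holds the order number you display in your application. Seeding the identity at 1 makes the first order end in 1; the expression below pads to six digits (OD000001), so shorten the padding if you want OD001.
create table Orders
(
[OrderId] [int] IDENTITY(1,1) NOT NULL,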
[OrderNumber] as 'OD'+ right( '00000' + cast(OrderId as varchar(6)) , 6) ,
[OrderDate] date,
PRIMARY KEY CLUSTERED
(
[OrderId] ASC
)
)
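For illustration only, the formatting that the computed column performs can be mirrored in Python as below. This is a sketch of the string formatting, not of the pyodbc call; the function name order_number and its width parameter are hypothetical, and in the approach above the database itself generates the id. Unlike the SQL right(...) expression, zfill does not truncate ids longer than the width.

```python
def order_number(order_id, width=6, prefix="OD"):
    """Build a display order number: prefix followed by the zero-padded id."""
    return prefix + str(order_id).zfill(width)


print(order_number(1))           # six-digit padding, as in the computed column
print(order_number(2, width=3))  # three-digit padding, as asked in the question
```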
Given the schema
CREATE TABLE `test` (
`name` VARCHAR(255) NOT NULL,
`text` TEXT NOT NULL,
PRIMARY KEY(`name`)
)
I would like to insert new data in such a way that if a given name exists, the name I am trying to insert is changed. I've checked the SQLite docs, and all I could find is INSERT OR REPLACE, which would change the text of the existing name instead of creating a new element.
The only solution I can think of is
def merge_or_edit(curr, *data_tuples):
    SELECT = """SELECT COUNT(1) FROM `test` WHERE `name`=?"""
    INSERT = """INSERT INTO `test` (`name`, `text`) VALUES (?, ?)"""
    to_insert = []
    for t in data_tuples:
        while curr.execute(SELECT, (t[0],)).fetchone()[0] == 1:
            t = (t[0] + "_", t[1])
        to_insert.append(t)
    curr.executemany(INSERT, to_insert)
But this solution is extremely slow for large sets of data (and will crash if the rename takes its name to more than 255 chars.)
What I would like to know is if this functionality is even possible using raw SQLite code.
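This does not settle whether pure SQL can do it, but one way to reduce the per-row SELECT cost on the Python side might be to load the existing names into a set once and resolve collisions in memory. The sketch below assumes the names fit in memory and that no other writer touches the table meanwhile; merge_or_rename and max_len are hypothetical names.

```python
import sqlite3


def merge_or_rename(conn, data_tuples, max_len=255):
    """Insert (name, text) pairs, appending '_' to names that already exist."""
    # Read the existing names once instead of one SELECT per rename attempt.
    taken = {row[0] for row in conn.execute("SELECT `name` FROM `test`")}
    to_insert = []
    for name, text in data_tuples:
        while name in taken:
            name += "_"
        if len(name) > max_len:
            raise ValueError("renamed value exceeds %d characters" % max_len)
        taken.add(name)  # also guards against duplicates within the batch
        to_insert.append((name, text))
    conn.executemany(
        "INSERT INTO `test` (`name`, `text`) VALUES (?, ?)", to_insert
    )
    conn.commit()
    return to_insert


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE `test` (`name` VARCHAR(255) NOT NULL, "
    "`text` TEXT NOT NULL, PRIMARY KEY(`name`))"
)
inserted = merge_or_rename(conn, [("a", "one"), ("a", "two"), ("b", "three")])
print(inserted)
```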
I have 6 tables in my SQLite database, each table with 6 columns(Date, user, NormalA, specialA, contact, remarks) and 1000+ rows.
How can I use sqlalchemy to sort through the Date column to look for duplicate dates, and delete that row?
Assuming this is your model:
from sqlalchemy import Column, DateTime, Integer, String

class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    user = Column(String)
    # columns other than `id` and `date` do not matter here;
    # what is important is that `id` is a PK
The following are two ways to delete your data:
1. Find duplicates, mark them for deletion and commit the transaction.
2. Create a single SQL query which will perform the deletion on the database directly.
For both of them a helper sub-query will be used:
from sqlalchemy import and_, func

# helper subquery: find first row (by primary key) for each unique date
subq = (
    session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
    .group_by(MyTable.date)
    .subquery('date_min_id')
)
Option-1: Find duplicates, mark them for deletion and commit the transaction
# query to find all duplicates
q_duplicates = (
    session
    .query(MyTable)
    .join(subq, and_(
        MyTable.date == subq.c.date,
        MyTable.id != subq.c.min_id)
    )
)
for x in q_duplicates:
    print("Will delete %s" % x)
    session.delete(x)
session.commit()
Option-2: Create a single SQL query which will perform deletion on the database directly
sq = (
    session
    .query(MyTable.id)
    .join(subq, and_(
        MyTable.date == subq.c.date,
        MyTable.id != subq.c.min_id)
    )
).subquery("subq")
dq = (
    session
    .query(MyTable)
    .filter(MyTable.id.in_(sq))
).delete(synchronize_session=False)
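The idea behind Option-2 (keep the row with the smallest id for each date and delete the rest in one statement) can also be sketched in plain SQL. The snippet below uses the standard-library sqlite3 module rather than SQLAlchemy so that it runs without extra dependencies; the table contents are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, date TEXT, user TEXT)"
)
# Made-up sample rows: the first two share a date.
conn.executemany(
    "INSERT INTO my_table (date, user) VALUES (?, ?)",
    [("2016-12-18", "first"), ("2016-12-18", "second"), ("2015-12-18", "only")],
)

# Keep the row with the smallest id per date; delete every other row.
cur = conn.execute(
    "DELETE FROM my_table "
    "WHERE id NOT IN (SELECT MIN(id) FROM my_table GROUP BY date)"
)
deleted = cur.rowcount
conn.commit()

remaining = conn.execute(
    "SELECT id, date, user FROM my_table ORDER BY id"
).fetchall()
print(deleted, remaining)
```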
Inspired by "Find duplicate values in SQL table", this might help you to select duplicate dates:
query = session.query(
    MyTable
).\
    group_by(MyTable.date).\
    having(func.count(MyTable.date) > 1).all()
If you only want to show unique dates, distinct() is what you might need.
While I like the whole object-oriented approach with SQLAlchemy, sometimes I find it easier to use some SQL directly.
And since the records don't have a key, we need the row number (_ROWID_) to delete the targeted records, and I don't think the API provides it.
So first we connect to the database:
from sqlalchemy import create_engine
db = create_engine(r'sqlite:///C:\temp\example.db')
eng = db.engine
Then to list all the records:
for row in eng.execute("SELECT * FROM TableA;"):
    print(row)
And to display all the duplicated records where the dates are identical:
for row in eng.execute("""
    SELECT * FROM {table}
    WHERE {field} IN (SELECT {field} FROM {table} GROUP BY {field} HAVING COUNT(*) > 1)
    ORDER BY {field};
""".format(table="TableA", field="Date")):
    print(row)
Now that we identified all the duplicates, they probably need to be fixed if the other fields are different:
eng.execute("UPDATE TableA SET NormalA=18, specialA=20 WHERE Date = '2016-18-12' ;")
eng.execute("UPDATE TableA SET NormalA=4, specialA=8 WHERE Date = '2015-18-12' ;")
And finally, to keep the first inserted record and delete the more recent duplicated records:
print(eng.execute("""
    DELETE FROM {table}
    WHERE _ROWID_ NOT IN (SELECT MIN(_ROWID_) FROM {table} GROUP BY {field});
""".format(table="TableA", field="Date")).rowcount)
Or, to keep the last inserted record and delete the other duplicated records:
print(eng.execute("""
    DELETE FROM {table}
    WHERE _ROWID_ NOT IN (SELECT MAX(_ROWID_) FROM {table} GROUP BY {field});
""".format(table="TableA", field="Date")).rowcount)
I am creating a database using SQLite3, and my Python version is 2.7.5.
Before creating the real database, I created a sample database to test whether it would work.
It failed: the id was not incremented even though I declared it as a serial type.
I created simple database:
import sqlite3
con = sqlite3.connect('sample.db')
cur = con.cursor()
cur.execute("""CREATE TABLE sample(id serial,test real)""")
cur.execute("""INSERT INTO sample(test) VALUES(?)""",(3,))
cur.execute("""INSERT INTO sample(test) VALUES(?)""",(6,))
cur.execute("""INSERT INTO sample(test) VALUES(?)""",(8,))
con.commit()
Then I fetched all data:
data = cur.execute("""SELECT * from sample""")
t = data.fetchall()
In [33]: t
Out[33]: [(None, 3.0), (None, 6.0), (None, 8.0)]
I expected this: [(1, 3.0), (2, 6.0), (3, 8.0)]
However, as you can see, every id is None.
How can I solve this problem?
I know I could just increment a variable and insert it myself, like this:
id = 0
id += 1
cur.execute("""INSERT INTO sample(id, test) VALUES(?, ?)""",(id, 3))
id += 1
cur.execute("""INSERT INTO sample(id, test) VALUES(?, ?)""",(id, 6))
However, isn't this awful? I don't want to do it.
I'd like to keep my code clear and smart.
SQLite has no serial data type. One way to get an auto-incrementing id is to declare the column with AUTOINCREMENT:
CREATE TABLE sample ( id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
test REAL);
You'd need to mark a column as INTEGER PRIMARY KEY or INTEGER PRIMARY KEY AUTOINCREMENT to get auto-incrementation behaviour.
The difference between those two types is subtle; the latter form will never reuse IDs. See the ROWID and the INTEGER PRIMARY KEY documentation.
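As a quick check of the behaviour described above, the following sketch recreates the sample table with an INTEGER PRIMARY KEY column and repeats the three inserts. It assumes an in-memory database so that it leaves no file behind.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# INTEGER PRIMARY KEY makes id an alias for the rowid, so SQLite assigns it.
cur.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, test REAL)")
for value in (3, 6, 8):
    cur.execute("INSERT INTO sample (test) VALUES (?)", (value,))
con.commit()

rows = cur.execute("SELECT * FROM sample").fetchall()
print(rows)  # [(1, 3.0), (2, 6.0), (3, 8.0)]
```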