MariaDB duplicates being inserted - python

I have the following Python code that checks whether a MariaDB record already exists and inserts it if not. However, duplicates are still being inserted. Is there something wrong with the code, or is there a better way to do it? I'm new to using Python with MariaDB.
import mysql.connector as mariadb
from hashlib import sha1
mariadb_connection = mariadb.connect(user='root', password='', database='tweets_db')
# The values below are retrieved from Twitter API using Tweepy
# For simplicity, I've provided some sample values
id = '1a23bas'
tweet = 'Clear skies'
longitude = -84.361549
latitude = 34.022003
created_at = '2017-09-27'
collected_at = '2017-09-27'
collection_type = 'stream'
lang = 'us-en'
place_name = 'Roswell'
country_code = 'USA'
cronjob_tag = 'None'
user_id = '23abask'
user_name = 'tsoukalos'
user_geoenabled = 0
user_lang = 'us-en'
user_location = 'Roswell'
user_timezone = 'American/Eastern'
user_verified = 1
tweet_hash = sha1(tweet.encode('utf-8')).hexdigest()  # sha1() needs bytes, not str
cursor = mariadb_connection.cursor(buffered=True)
cursor.execute("SELECT Count(id) FROM tweets WHERE tweet_hash = %s", (tweet_hash,))
if cursor.fetchone()[0] == 0:
cursor.execute("INSERT INTO tweets(id,tweet,tweet_hash,longitude,latitude,created_at,collected_at,collection_type,lang,place_name,country_code,cronjob_tag,user_id,user_name,user_geoenabled,user_lang,user_location,user_timezone,user_verified) VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)", (id,tweet,tweet_hash,longitude,latitude,created_at,collected_at,collection_type,lang,place_name,country_code,cronjob_tag,user_id,user_name,user_geoenabled,user_lang,user_location,user_timezone,user_verified))
mariadb_connection.commit()
cursor.close()
else:
cursor.close()
return
Below is the code for the table.
CREATE TABLE tweets (
    id VARCHAR(255) NOT NULL,
    tweet VARCHAR(255) NOT NULL,
    tweet_hash VARCHAR(255) DEFAULT NULL,
    longitude FLOAT DEFAULT NULL,
    latitude FLOAT DEFAULT NULL,
    created_at DATETIME DEFAULT NULL,
    collected_at DATETIME DEFAULT NULL,
    collection_type ENUM('stream','search') DEFAULT NULL,
    lang VARCHAR(10) DEFAULT NULL,
    place_name VARCHAR(255) DEFAULT NULL,
    country_code VARCHAR(5) DEFAULT NULL,
    cronjob_tag VARCHAR(255) DEFAULT NULL,
    user_id VARCHAR(255) DEFAULT NULL,
    user_name VARCHAR(20) DEFAULT NULL,
    user_geoenabled TINYINT(1) DEFAULT NULL,
    user_lang VARCHAR(10) DEFAULT NULL,
    user_location VARCHAR(255) DEFAULT NULL,
    user_timezone VARCHAR(100) DEFAULT NULL,
    user_verified TINYINT(1) DEFAULT NULL
);

Add a UNIQUE constraint to the tweet_hash field:
ALTER TABLE tweets MODIFY tweet_hash VARCHAR(255) UNIQUE;
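With the unique index in place, the SELECT-then-INSERT race in the original code can be dropped entirely: INSERT IGNORE skips any row whose tweet_hash already exists. A minimal sketch, trimmed to three columns (the full column list from the question applies):

cursor = mariadb_connection.cursor()
# The database now enforces uniqueness, so no prior existence check is needed.
cursor.execute(
    "INSERT IGNORE INTO tweets (id, tweet, tweet_hash) VALUES (%s, %s, %s)",
    (id, tweet, tweet_hash),
)
mariadb_connection.commit()
cursor.close()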

Every table should have a PRIMARY KEY. Is id supposed to be that? (The CREATE TABLE is not saying so.) A PK is, by definition, UNIQUE, so that would cause an error on inserting a duplicate.
Meanwhile:
Why have a tweet_hash? Simply index tweet.
Don't say 255 when there are specific limits smaller than that.
user_id and user_name should be in another "lookup" table, not both in this table.
Does user_verified belong with the user? Or with each tweet?
If you are expecting millions of tweets, this table needs to be made smaller and indexed -- else you will run into performance problems. A sketch of the suggested split follows.
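Along those lines, a minimal sketch of the lookup-table split (the column sizes here are guesses, not Twitter's real limits):

# Hypothetical lookup table: user attributes are stored once per user,
# keyed by user_id, instead of being repeated on every tweet row.
cursor.execute("""
    CREATE TABLE users (
        user_id       VARCHAR(32) NOT NULL,
        user_name     VARCHAR(20) DEFAULT NULL,
        user_verified TINYINT(1)  DEFAULT NULL,
        PRIMARY KEY (user_id)
    )
""")
# Give tweets a real PRIMARY KEY and point user_id at the lookup table.
cursor.execute("ALTER TABLE tweets ADD PRIMARY KEY (id)")
cursor.execute("ALTER TABLE tweets ADD FOREIGN KEY (user_id) REFERENCES users (user_id)")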

Related

PyMySql Column Truncated and Duplicate Index Error

Here is my table creation code:
CREATE TABLE `crypto_historical_price2` (
    `Ticker` varchar(255) COLLATE latin1_bin NOT NULL,
    `Timestamp` varchar(255) COLLATE latin1_bin NOT NULL,
    `PerpetualPrice` double DEFAULT NULL,
    `SpotPrice` double DEFAULT NULL,
    `Source` varchar(255) COLLATE latin1_bin NOT NULL,
    PRIMARY KEY (`Ticker`,`Timestamp`,`Source`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin
I'm inserting data in batches with SQL statements like the following:
sql = "INSERT INTO crypto."+TABLE+"(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s;" % batchdata
where batchdata is just a string of rows such as ('SOL', '2022-11-03 02:01:00', '31.2725', '31.2875', 'FTX'),('SOL', '2022-11-03 02:02:00', '31.3075', '31.305', 'FTX').
Now my script runs for a while, successfully inserting data into the table, but then it barfs with the following errors:
error 1265 data truncated for column PerpetualPrice
and
Duplicate entry 'SOL-2022-11-02 11:00:00-FTX' for key 'primary'
I've tried to solve the second error with
sql = "INSERT INTO crypto.crypto_historical_price2(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s ON DUPLICATE KEY UPDATE Ticker = VALUES(Ticker), Timestamp = VALUES(Timestamp), PerpetualPrice = VALUES(PerpetualPrice), SpotPrice = VALUES(SpotPrice), Source = VALUES(Source);" % batchdata
and
sql = "INSERT INTO crypto.crypto_historical_price2(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s ON DUPLICATE KEY UPDATE Ticker = VALUES(Ticker),Timestamp = VALUES(Timestamp),Source = VALUES(Source);" % batchdata
Both attempted remedies run without throwing a duplicate-entry error, but they don't update the table at all.
If I pause my script for a couple of minutes and re-run it, the duplicate error goes away and the table updates, which confuses me EVEN more lol.
Any ideas?
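For what it's worth, two things stand out: the prices are being sent as quoted strings inside a hand-built VALUES string (a plausible source of the truncation warning on a double column), and the ON DUPLICATE KEY UPDATE clauses rewrite key columns to themselves, which is a no-op. A sketch of a parameterized version that updates only the non-key columns, assuming PyMySQL and made-up connection details:

import pymysql

rows = [
    ('SOL', '2022-11-03 02:01:00', 31.2725, 31.2875, 'FTX'),
    ('SOL', '2022-11-03 02:02:00', 31.3075, 31.3050, 'FTX'),
]
sql = (
    "INSERT INTO crypto.crypto_historical_price2 "
    "(Ticker, Timestamp, PerpetualPrice, SpotPrice, Source) "
    "VALUES (%s, %s, %s, %s, %s) "
    # Only the non-key columns need updating on a duplicate key.
    "ON DUPLICATE KEY UPDATE "
    "PerpetualPrice = VALUES(PerpetualPrice), SpotPrice = VALUES(SpotPrice)"
)
connection = pymysql.connect(host='localhost', user='user',
                             password='secret', database='crypto')
with connection:
    with connection.cursor() as cursor:
        cursor.executemany(sql, rows)  # the driver quotes each value correctly
    connection.commit()  # PyMySQL does not autocommit by default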

SQLAlchemy - Data too long for column email

I use a simple MySQL database with an SQLAlchemy model:
from sqlalchemy import Column, Integer, String
from .database import Base

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True, index=True)
    email = Column(String(256), unique=True, index=True)
    hashed_password = Column(String(256))
It works fine, but as soon as the email exceeds a length well below 256 characters, I get the following error:
fastapi | sqlalchemy.exc.DataError: (MySQLdb._exceptions.DataError) (1406, "Data too long for column 'email' at row 1")
fastapi | [SQL: INSERT INTO users (email, hashed_password) VALUES (%s, %s)]
fastapi | [parameters: ('sdasdasdasdasdasda', 'sdasdasdasdasdasdsadadasdasd')]
I know there is a "strict mode" in MySQL, but I would rather understand the error and fix it in the Python code, since I run the database in a container.
SHOW CREATE TABLE users:
| users | CREATE TABLE `users` (
    `id` int NOT NULL AUTO_INCREMENT,
    `email` varchar(256) DEFAULT NULL,
    `hashed_password` varchar(256) DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `ix_users_email` (`email`),
    KEY `ix_users_id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
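One thing worth ruling out: SQLAlchemy's create_all() does not alter a table that already exists, so the live column can be narrower than the model claims even if SHOW CREATE TABLE in one environment says varchar(256). A sketch for checking what the server the app actually connects to reports (the connection URL is a placeholder):

from sqlalchemy import create_engine, inspect

engine = create_engine("mysql+mysqldb://user:password@db-container/appdb")

# get_columns() reports the live schema, which can drift from the model
# if the table was created before the model was widened.
for col in inspect(engine).get_columns("users"):
    print(col["name"], col["type"])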

sqlalchemy define calculated column

How can I define a calculated column in SQLAlchemy?
The date column should be calculated from the timestamp column (which has a default, but can also be set by the client).
Here is my table definition (mysql):
Create Table: CREATE TABLE `events` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(100) DEFAULT NULL,
    `timestamp` datetime DEFAULT CURRENT_TIMESTAMP,
    `date` date GENERATED ALWAYS AS (cast(`timestamp` as date)) STORED,
    PRIMARY KEY (`id`)
)
Here is my model; what should go in the server_default for date?
class MyModel(db.Model):
    __tablename__ = "my_table"
    timestamp = db.Column(DateTime(), server_default=func.now())
    date = db.Column(Date(), server_default=<??>)
I tried:
date = db.Column(Date(), server_default = func.date(timestamp))
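Since the column is GENERATED ALWAYS AS (...) STORED on the MySQL side, SQLAlchemy's Computed construct (available since 1.3.11) maps to it more directly than server_default. A sketch, with an id column added so the fragment maps cleanly:

from sqlalchemy import Computed

class MyModel(db.Model):
    __tablename__ = "my_table"
    id = db.Column(db.Integer, primary_key=True)  # assumed; not shown in the question
    timestamp = db.Column(DateTime(), server_default=func.now())
    # Computed(...) emits GENERATED ALWAYS AS (...); persisted=True adds STORED.
    date = db.Column(Date(), Computed("cast(`timestamp` as date)", persisted=True))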

stored procedure call from sqlalchemy does not commit

I have tested my SP in MySQL and it works fine; I was able to insert a new entry with it. When I call it from Flask with SQLAlchemy it does run, but the insert is not made into the table, although it appears to execute the right commands.
My SP checks whether an entry already exists: if yes, it returns 0; if not, it inserts the entry and returns 1. When I send a new query from the backend, I get 1 as the return value but no insert is made in the table. When I send the same query again, the return value is still 1. When I send an existing query that the table holds, the return value is 0.
I have other routes using the same db.connect() and they do fetch information. I read other posts about calling an SP with the same execute function used for raw SQL; from the docs it seems execute doesn't require an extra commit command to confirm the transaction.
So why can't I insert from the Flask server?
This is the backend function
def add_book(info):
    try:
        connection = db.connect()
        title = info['bookTitle']
        url = info['bookUrl']
        isbn = info['isbn']
        author = info['author']
        #print("title: " + title + " url: " + url + " isbn: " + str(isbn) + " author: " + str(author))
        query = 'CALL add_book("{}", "{}", {}, {});'.format(title, url, isbn, author)
        #print(query)
        query_results = connection.execute(query)
        connection.close()
        query_results = [x for x in query_results]
        result = query_results[0][0]
    except Exception as err:
        print(type(err))
        print(err.args)
    return result
This is the table to insert into:
CREATE TABLE `book` (
    `isbn` int(11) DEFAULT NULL,
    `review_count` int(11) DEFAULT NULL,
    `language_code` varchar(10) DEFAULT NULL,
    `avg_rating` int(11) DEFAULT NULL,
    `description_text` text,
    `formt` varchar(30) DEFAULT NULL,
    `link` varchar(200) DEFAULT NULL,
    `authors` int(11) DEFAULT NULL,
    `publisher` varchar(30) DEFAULT NULL,
    `num_pages` int(11) DEFAULT NULL,
    `publication_month` int(11) DEFAULT NULL,
    `publication_year` int(11) DEFAULT NULL,
    `url` varchar(200) DEFAULT NULL,
    `image_url` varchar(200) DEFAULT NULL,
    `book_id` int(11) NOT NULL AUTO_INCREMENT,
    `ratings_count` int(11) DEFAULT NULL,
    `work_id` int(11) DEFAULT NULL,
    `title` varchar(200) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    PRIMARY KEY (`book_id`),
    KEY `authors` (`authors`),
    CONSTRAINT `book_ibfk_2` FOREIGN KEY (`authors`) REFERENCES `author` (`author_id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=36485537 DEFAULT CHARSET=utf8;
This is the SP
DELIMITER $$
CREATE DEFINER=`root`@`%` PROCEDURE `add_book`(
    IN titleIn VARCHAR(200), urlIn VARCHAR(200), isbnIn INT, authorIn INT)
BEGIN
    DECLARE addSucess INT;
    DECLARE EXIT HANDLER FOR sqlexception
    BEGIN
        GET DIAGNOSTICS CONDITION 1
            @p1 = RETURNED_SQLSTATE, @p2 = MESSAGE_TEXT;
        SELECT @p1, @p2;
        ROLLBACK;
    END;
    DECLARE EXIT HANDLER FOR sqlwarning
    BEGIN
        GET DIAGNOSTICS CONDITION 1
            @p1 = RETURNED_SQLSTATE, @p2 = MESSAGE_TEXT;
        SELECT @p1 AS RETURNED_SQLSTATE, @p2 AS MESSAGE_TEXT;
        ROLLBACK;
    END;
    IF EXISTS (SELECT 1 FROM book WHERE title = titleIn) THEN
        SET addSucess = 0;
    ELSE
        INSERT INTO book (authors, title, url, book_id)
        VALUES (authorIn, titleIn, urlIn, null);
        SET addSucess = 1;
    END IF;
    SELECT addSucess;
END$$
DELIMITER ;
My user permissions, from SHOW GRANTS FOR CURRENT_USER:
[('GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOC ... (73 characters truncated) ... OW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `root`@`%` WITH GRANT OPTION',), ('GRANT APPLICATION_PASSWORD_ADMIN,CONNECTION_ADMIN,ROLE_ADMIN,SET_USER_ID,XA_RECOVER_ADMIN ON *.* TO `root`@`%` WITH GRANT OPTION',), ('REVOKE INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE TEMPORARY TABLES, LOCK TABLES, CREATE VIEW, CREATE ROUTINE, ALTER ROUTINE ON `mysql`.* FROM `root`@`%`',), ('REVOKE INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE TEMPORARY TABLES, LOCK TABLES, CREATE VIEW, CREATE ROUTINE, ALTER ROUTINE ON `sys`.* FROM `root`@`%`',), ('GRANT INSERT ON `mysql`.`general_log` TO `root`@`%`',), ('GRANT INSERT ON `mysql`.`slow_log` TO `root`@`%`',), ('GRANT `cloudsqlsuperuser`@`%` TO `root`@`%`',)]
I solved it with the Session API instead. If someone is reading this, please tell me a better way of passing the params and parsing the return result:
def add_book(info):
    title = info['bookTitle']
    url = info['bookUrl']
    isbn = info['isbn']
    author = info['author']
    with Session(db) as session:
        session.begin()
        try:
            query = 'CALL insert_book("{}", "{}", {}, {});'.format(title, url, isbn, author)
            result = session.execute(text(query)).all()
        except:
            session.rollback()
            raise
        else:
            session.commit()
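On the last point: text() supports bound parameters, which avoids the str.format() quoting problems (a title containing a double quote would break the CALL) and lets the driver do the escaping. A sketch, assuming SQLAlchemy 1.4+:

from sqlalchemy import text
from sqlalchemy.orm import Session

def add_book(info):
    query = text("CALL insert_book(:title, :url, :isbn, :author)")
    params = {
        "title": info["bookTitle"],
        "url": info["bookUrl"],
        "isbn": info["isbn"],
        "author": info["author"],
    }
    # session.begin() commits on success and rolls back on an exception.
    with Session(db) as session, session.begin():
        # scalar() returns the single value from the SP's SELECT addSucess.
        return session.execute(query, params).scalar()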

MySQL datetime column WHERE col IS NULL fails

I cannot get my very basic SQL query to work: it returns 0 rows despite the fact that there are clearly NULLs.
query
SELECT
    *
FROM
    leads AS l
    JOIN closes c ON l.id = c.lead_id
WHERE
    c.close_date IS NULL
DDL
CREATE TABLE closes
(
    id INT AUTO_INCREMENT
        PRIMARY KEY,
    lead_id INT NOT NULL,
    close_date DATETIME NULL,
    close_type VARCHAR(255) NULL,
    primary_agent VARCHAR(255) NULL,
    price FLOAT NULL,
    gross_commission FLOAT NULL,
    company_dollar FLOAT NULL,
    address VARCHAR(255) NULL,
    city VARCHAR(255) NULL,
    state VARCHAR(10) NULL,
    zip VARCHAR(10) NULL,
    CONSTRAINT closes_ibfk_1
        FOREIGN KEY (lead_id) REFERENCES leads (id)
)
    ENGINE = InnoDB;

CREATE INDEX lead_id
    ON closes (lead_id);
I should mention that I am inserting the data with a python web scraper and SQLAlchemy. If the data is not scraped it will be None on insert.
Here is a screenshot of datagrip showing a null value in the row
EDIT
Alright, so I went ahead and ran the following on some of the entries in the table where the value was already <null>:
UPDATE closes
SET close_date = NULL
WHERE
lead_id = <INTEGERVAL>
;
What is interesting now is that when running the original query I do actually get back the 2 records I ran the update query for (the expected outcome). This leads me to believe that the issue is with how my SQLAlchemy model maps the values on insert.
models.py
class Close(db.Model, ItemMixin):
    __tablename__ = 'closes'
    id = db.Column(db.Integer, primary_key=True)
    lead_id = db.Column(db.Integer, db.ForeignKey('leads.id'), nullable=False)
    close_date = db.Column(db.DateTime)
    close_type = db.Column(db.String(255))
    primary_agent = db.Column(db.String(255))
    price = db.Column(db.Float)
    gross_commission = db.Column(db.Float)
    company_dollar = db.Column(db.Float)
    address = db.Column(db.String(255))
    city = db.Column(db.String(255))
    state = db.Column(db.String(10))
    zip = db.Column(db.String(10))

    def __init__(self, item):
        self.build_from_item(item)

    def build_from_item(self, item):
        for k, v in item.items():
            setattr(self, k, v)
But I am fairly confident the value is a Python None whenever nothing is scraped from the website. My understanding is that SQLAlchemy maps None to NULL on insert, and since nullable=True is the default (as can be seen in the generated DDL), I am still at a loss as to why the value displays as NULL but does not behave that way.
EDIT 2
The only place I think the issue could arise is where my spider actually scrapes the data and assigns it to the Item, shown below.
closes.py
# item['close_date'] = None at this point
try:
    item['close_date'] = arrow.get(item['close_date'], 'MMM D, YYYY').format('YYYY-MM-DD')
except ParserError as e:
    # Maybe item['close_date'] = None here?
    spider.logger.error(f'Parse error: {item["close_date"]} - {e}')
This looks like the place in my Python code where the issue would arise. But if arrow.get throws an exception, the value of item['close_date'] should still be None; and even if it isn't, that does not explain why the record value appears to be NULL yet does not behave like it is.
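If the scraped value can be a non-empty string that arrow fails to parse, one way to make the failure path explicit is to reset the field inside the handler. A sketch; catching TypeError as well (which arrow.get raises for a None input) is an assumption:

from arrow.parser import ParserError

try:
    item['close_date'] = arrow.get(item['close_date'], 'MMM D, YYYY').format('YYYY-MM-DD')
except (ParserError, TypeError) as e:
    spider.logger.error(f'Parse error: {item["close_date"]} - {e}')
    # Force the field back to None so SQLAlchemy writes a real SQL NULL
    # instead of whatever unparsed string was scraped.
    item['close_date'] = None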
I'm guessing that you're having an issue with the join, not the NULL value. The query below returns 1 result for me. More info about your data, the software used for querying (I tested with SQL Yog), and applicable versions might help.
EDIT
It could be that you're having issues with MySQL's 'zero date'.
https://dev.mysql.com/doc/refman/5.7/en/date-and-time-types.html
MySQL permits you to store a “zero” value of '0000-00-00' as a “dummy date.” This is in some cases more convenient than using NULL values, and uses less data and index space. To disallow '0000-00-00', enable the NO_ZERO_DATE mode.
I've updated the SQL data below to include a zero date in the INSERT and SELECT's WHERE.
DROP TABLE IF EXISTS closes;
DROP TABLE IF EXISTS leads;

CREATE TABLE leads (
    id INT(11) NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (id)
) ENGINE=INNODB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

INSERT INTO leads(id) VALUES (1),(2),(3);

CREATE TABLE closes (
    id INT(11) NOT NULL AUTO_INCREMENT,
    lead_id INT(11) NOT NULL,
    close_date DATETIME DEFAULT NULL,
    close_type VARCHAR(255) DEFAULT NULL,
    primary_agent VARCHAR(255) DEFAULT NULL,
    price FLOAT DEFAULT NULL,
    gross_commission FLOAT DEFAULT NULL,
    company_dollar FLOAT DEFAULT NULL,
    address VARCHAR(255) DEFAULT NULL,
    city VARCHAR(255) DEFAULT NULL,
    state VARCHAR(10) DEFAULT NULL,
    zip VARCHAR(10) DEFAULT NULL,
    PRIMARY KEY (id),
    KEY lead_id (lead_id),
    CONSTRAINT closes_ibfk_1 FOREIGN KEY (lead_id) REFERENCES leads (id)
) ENGINE=INNODB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;

INSERT INTO closes(id,lead_id,close_date,close_type,primary_agent,price,gross_commission,company_dollar,address,city,state,zip)
VALUES
    (1,3,'0000-00-00 00:00:00',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
    (2,1,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
    (3,2,'2018-01-09 17:01:44',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);

SELECT
    *
FROM
    leads AS l
    JOIN closes c ON l.id = c.lead_id
WHERE
    c.close_date IS NULL OR c.close_date = '0000-00-00';
