I have a DB table which looks like
CREATE TABLE `localquotes` (
`id` bigint NOT NULL AUTO_INCREMENT,
`createTime` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
`tag` varchar(8) NOT NULL,
`monthNum` int NOT NULL,
`flag` float NOT NULL DEFAULT '0',
`optionType` varchar(1) NOT NULL,
`symbol` varchar(30) NOT NULL,
`bid` float DEFAULT NULL,
`ask` float DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
for which I have created a trigger
CREATE DEFINER=`user`@`localhost` TRIGGER `localquotes_BEFORE_INSERT` BEFORE INSERT ON `localquotes` FOR EACH ROW BEGIN
SET new.tag=left(symbol,3);
SET new.monthNum=right(left(symbol,5),1);
SET new.optionType=left(right(symbol,11),1);
SET new.flag=right(left(symbol,11),4);
END
which causes pymysql.err.OperationalError: (1054, "Unknown column 'symbol' in 'field list'") on a simple INSERT like
insertQuery = "INSERT INTO localquotes (tag,monthNum,flag,optionType,symbol,bid) VALUES (%s,%s,%s,%s,%s,%s)"
insertValues = ('UNKNOWN', d.strftime("%m"), 0, 'X', symbol, bid)
cursor.execute(insertQuery, insertValues)
db.commit()
When I remove that trigger, the insert works fine.
Any clue why the code complains about the symbol column, which exists, when there is a trigger that sets column values from the insert request?
You must reference all the columns of the row that spawned the trigger with the NEW.* prefix.
SET new.tag=left(new.symbol,3);
And so on.
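For reference, the whole trigger with every reference to the inserted row qualified looks like this; it is the same logic as above, only with the NEW. prefix added:

CREATE DEFINER=`user`@`localhost` TRIGGER `localquotes_BEFORE_INSERT` BEFORE INSERT ON `localquotes` FOR EACH ROW BEGIN
SET new.tag = left(new.symbol, 3);
SET new.monthNum = right(left(new.symbol, 5), 1);
SET new.optionType = left(right(new.symbol, 11), 1);
SET new.flag = right(left(new.symbol, 11), 4);
END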
Here is my table creation code:
CREATE TABLE `crypto_historical_price2` (
`Ticker` varchar(255) COLLATE latin1_bin NOT NULL,
`Timestamp` varchar(255) COLLATE latin1_bin NOT NULL,
`PerpetualPrice` double DEFAULT NULL,
`SpotPrice` double DEFAULT NULL,
`Source` varchar(255) COLLATE latin1_bin NOT NULL,
PRIMARY KEY (`Ticker`,`Timestamp`,`Source`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin
I'm updating stuff in batch with sql statements like the following
sql = "INSERT INTO crypto."+TABLE+"(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s;" % batchdata
where batchdata is just a string of value tuples such as ('SOL', '2022-11-03 02:01:00', '31.2725', '31.2875', 'FTX'),('SOL', '2022-11-03 02:02:00', '31.3075', '31.305', 'FTX')
Now my script runs for a bit of time, successfully inserting data into the table, but then it barfs with the following errors:
error 1265 data truncated for column PerpetualPrice
and
Duplicate entry 'SOL-2022-11-02 11:00:00-FTX' for key 'primary'
I've tried to solve the second error with
sql = "INSERT INTO crypto.crypto_historical_price2(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s ON DUPLICATE KEY UPDATE Ticker = VALUES(Ticker), Timestamp = VALUES(Timestamp), PerpetualPrice = VALUES(PerpetualPrice), SpotPrice = VALUES(SpotPrice), Source = VALUES(Source);" % batchdata
and
sql = "INSERT INTO crypto.crypto_historical_price2(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s ON DUPLICATE KEY UPDATE Ticker = VALUES(Ticker),Timestamp = VALUES(Timestamp),Source = VALUES(Source);" % batchdata
The above two attempted remedies run and don't throw a duplicate entry error, but they don't update the table at all.
If I pause my script for a couple of minutes and re-run it, the duplicate error goes away and the table updates, which confuses me even more.
Any ideas?
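For reference, here is a sketch of the same upsert written with placeholders and executemany, committing once at the end; it assumes a DB-API driver such as pymysql, and only updates the non-key price columns on a duplicate:

# batchdata as a list of tuples instead of a pre-formatted string
rows = [
    ('SOL', '2022-11-03 02:01:00', 31.2725, 31.2875, 'FTX'),
    ('SOL', '2022-11-03 02:02:00', 31.3075, 31.3050, 'FTX'),
]
sql = ("INSERT INTO crypto.crypto_historical_price2 "
       "(Ticker, Timestamp, PerpetualPrice, SpotPrice, Source) "
       "VALUES (%s, %s, %s, %s, %s) "
       "ON DUPLICATE KEY UPDATE PerpetualPrice = VALUES(PerpetualPrice), "
       "SpotPrice = VALUES(SpotPrice)")
cursor.executemany(sql, rows)
db.commit()  # without a commit (or autocommit) the upsert never becomes visible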
In the SQLite command line, the command .schema can be used to export a database schema in SQL syntax, and that export can be used to rebuild a database of the same structure:
.output folderpath/schema.sql
.schema
Saves the following to a file named "schema.sql":
CREATE TABLE mytable (id INTEGER NOT NULL, name TEXT NOT NULL, date DATETIME, PRIMARY KEY (id), FOREIGN KEY (name) REFERENCES mytable2 (na ...
Can the same output .sql file be achieved using Python's sqlite3 library without a custom function?
There are several questions on Stack Overflow with similar titles, but I didn't find any that are actually trying to get the full schema (they are actually looking for PRAGMA table_info which does not have the CREATE TABLE, etc. statements in the output).
Well, I have rewritten the answer above. Is that exactly what you need?
import sqlite3

dbname = 'chinook.db'
with sqlite3.connect(dbname) as con:
    cursor = con.cursor()
    # sqlite_master stores the original CREATE statement of every table, index, view and trigger
    cursor.execute('select sql from sqlite_master')
    for r in cursor.fetchall():
        # auto-created objects (e.g. implicit indexes) have no SQL and print as None
        print(r[0])
    cursor.close()
With the test sqlite3 database I received the following:
CREATE TABLE "albums"
(
[AlbumId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[Title] NVARCHAR(160) NOT NULL,
[ArtistId] INTEGER NOT NULL,
FOREIGN KEY ([ArtistId]) REFERENCES "artists" ([ArtistId])
ON DELETE NO ACTION ON UPDATE NO ACTION
)
CREATE TABLE sqlite_sequence(name,seq)
CREATE TABLE "artists"
(
[ArtistId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[Name] NVARCHAR(120)
)
CREATE TABLE "customers"
(
[CustomerId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[FirstName] NVARCHAR(40) NOT NULL,
[LastName] NVARCHAR(20) NOT NULL,
[Company] NVARCHAR(80),
[Address] NVARCHAR(70),
[City] NVARCHAR(40),
[State] NVARCHAR(40),
[Country] NVARCHAR(40),
[PostalCode] NVARCHAR(10),
[Phone] NVARCHAR(24),
[Fax] NVARCHAR(24),
[Email] NVARCHAR(60) NOT NULL,
[SupportRepId] INTEGER,
FOREIGN KEY ([SupportRepId]) REFERENCES "employees" ([EmployeeId])
ON DELETE NO ACTION ON UPDATE NO ACTION
)
CREATE TABLE "employees"
(
[EmployeeId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[LastName] NVARCHAR(20) NOT NULL,
[FirstName] NVARCHAR(20) NOT NULL,
[Title] NVARCHAR(30),
[ReportsTo] INTEGER,
[BirthDate] DATETIME,
[HireDate] DATETIME,
[Address] NVARCHAR(70),
[City] NVARCHAR(40),
[State] NVARCHAR(40),
[Country] NVARCHAR(40),
[PostalCode] NVARCHAR(10),
[Phone] NVARCHAR(24),
[Fax] NVARCHAR(24),
[Email] NVARCHAR(60),
FOREIGN KEY ([ReportsTo]) REFERENCES "employees" ([EmployeeId])
ON DELETE NO ACTION ON UPDATE NO ACTION
)
CREATE TABLE "genres"
(
[GenreId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[Name] NVARCHAR(120)
)
CREATE TABLE "invoices"
(
[InvoiceId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[CustomerId] INTEGER NOT NULL,
[InvoiceDate] DATETIME NOT NULL,
[BillingAddress] NVARCHAR(70),
[BillingCity] NVARCHAR(40),
[BillingState] NVARCHAR(40),
[BillingCountry] NVARCHAR(40),
[BillingPostalCode] NVARCHAR(10),
[Total] NUMERIC(10,2) NOT NULL,
FOREIGN KEY ([CustomerId]) REFERENCES "customers" ([CustomerId])
ON DELETE NO ACTION ON UPDATE NO ACTION
)
CREATE TABLE "invoice_items"
(
[InvoiceLineId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[InvoiceId] INTEGER NOT NULL,
[TrackId] INTEGER NOT NULL,
[UnitPrice] NUMERIC(10,2) NOT NULL,
[Quantity] INTEGER NOT NULL,
FOREIGN KEY ([InvoiceId]) REFERENCES "invoices" ([InvoiceId])
ON DELETE NO ACTION ON UPDATE NO ACTION,
FOREIGN KEY ([TrackId]) REFERENCES "tracks" ([TrackId])
ON DELETE NO ACTION ON UPDATE NO ACTION
)
CREATE TABLE "media_types"
(
[MediaTypeId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[Name] NVARCHAR(120)
)
CREATE TABLE "playlists"
(
[PlaylistId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[Name] NVARCHAR(120)
)
CREATE TABLE "playlist_track"
(
[PlaylistId] INTEGER NOT NULL,
[TrackId] INTEGER NOT NULL,
CONSTRAINT [PK_PlaylistTrack] PRIMARY KEY ([PlaylistId], [TrackId]),
FOREIGN KEY ([PlaylistId]) REFERENCES "playlists" ([PlaylistId])
ON DELETE NO ACTION ON UPDATE NO ACTION,
FOREIGN KEY ([TrackId]) REFERENCES "tracks" ([TrackId])
ON DELETE NO ACTION ON UPDATE NO ACTION
)
None
CREATE TABLE "tracks"
(
[TrackId] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
[Name] NVARCHAR(200) NOT NULL,
[AlbumId] INTEGER,
[MediaTypeId] INTEGER NOT NULL,
[GenreId] INTEGER,
[Composer] NVARCHAR(220),
[Milliseconds] INTEGER NOT NULL,
[Bytes] INTEGER,
[UnitPrice] NUMERIC(10,2) NOT NULL,
FOREIGN KEY ([AlbumId]) REFERENCES "albums" ([AlbumId])
ON DELETE NO ACTION ON UPDATE NO ACTION,
FOREIGN KEY ([GenreId]) REFERENCES "genres" ([GenreId])
ON DELETE NO ACTION ON UPDATE NO ACTION,
FOREIGN KEY ([MediaTypeId]) REFERENCES "media_types" ([MediaTypeId])
ON DELETE NO ACTION ON UPDATE NO ACTION
)
CREATE INDEX [IFK_AlbumArtistId] ON "albums" ([ArtistId])
CREATE INDEX [IFK_CustomerSupportRepId] ON "customers" ([SupportRepId])
CREATE INDEX [IFK_EmployeeReportsTo] ON "employees" ([ReportsTo])
CREATE INDEX [IFK_InvoiceCustomerId] ON "invoices" ([CustomerId])
CREATE INDEX [IFK_InvoiceLineInvoiceId] ON "invoice_items" ([InvoiceId])
CREATE INDEX [IFK_InvoiceLineTrackId] ON "invoice_items" ([TrackId])
CREATE INDEX [IFK_PlaylistTrackTrackId] ON "playlist_track" ([TrackId])
CREATE INDEX [IFK_TrackAlbumId] ON "tracks" ([AlbumId])
CREATE INDEX [IFK_TrackGenreId] ON "tracks" ([GenreId])
CREATE INDEX [IFK_TrackMediaTypeId] ON "tracks" ([MediaTypeId])
CREATE TABLE sqlite_stat1(tbl,idx,stat)
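To get an actual schema.sql file like the .output + .schema combination produces, the same query result can simply be written to a file. A minimal sketch (rows with no SQL, i.e. auto-created objects, are skipped here):

import sqlite3

dbname = 'chinook.db'
with sqlite3.connect(dbname) as con:
    cursor = con.cursor()
    # sqlite_master keeps the original CREATE statements for tables, indexes, views and triggers
    cursor.execute("select sql from sqlite_master where sql is not null")
    with open('schema.sql', 'w') as f:
        for (sql,) in cursor.fetchall():
            f.write(sql + ';\n')
    cursor.close()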
I'm currently working on writing a web scraper for a website called Mountain Project and have come across an issue while inserting items into a MySQL (MariaDB) database.
The basic flow of my crawler is this:
Get response from link extractor (I'm subclassing CrawlSpider)
Extract data from response and send extracted items down the item pipeline
Items get picked up by the SqlPipeline and inserted into the database
An important note about step 2 is that the crawler is sending multiple items down the pipeline. The first item will be the main resource (either a route or an area), and then any items after that will be other important data about that resource. For areas, those items will be data on different weather conditions, for routes those items will be the different grades assigned to those routes.
My area and route tables look like this:
CREATE TABLE `area` (
`area_id` INT(10) UNSIGNED NOT NULL,
`parent_id` INT(10) UNSIGNED NULL DEFAULT NULL,
`name` VARCHAR(200) NOT NULL DEFAULT '',
`latitude` FLOAT(12) NULL DEFAULT -1,
`longitude` FLOAT(12) NULL DEFAULT -1,
`elevation` INT(11) NULL DEFAULT '0',
`link` VARCHAR(300) NOT NULL,
PRIMARY KEY (`area_id`) USING BTREE
)
CREATE TABLE `route` (
`route_id` INT(10) UNSIGNED NOT NULL,
`parent_id` INT(10) UNSIGNED NOT NULL,
`name` VARCHAR(200) NOT NULL DEFAULT '' COLLATE 'utf8_general_ci',
`link` VARCHAR(300) NOT NULL COLLATE 'utf8_general_ci',
`rating` FLOAT(12) NULL DEFAULT '0',
`types` VARCHAR(50) NULL DEFAULT NULL COLLATE 'utf8_general_ci',
`pitches` INT(3) NULL DEFAULT '0',
`height` INT(5) NULL DEFAULT '0',
`length` VARCHAR(5) NULL DEFAULT NULL COLLATE 'utf8_general_ci',
PRIMARY KEY (`route_id`) USING BTREE,
INDEX `fk_parent_id` (`parent_id`) USING BTREE,
CONSTRAINT `fk_parent_id` FOREIGN KEY (`parent_id`) REFERENCES `mountainproject`.`area` (`area_id`) ON UPDATE CASCADE ON DELETE CASCADE
)
And here's an example of one of my condition tables:
CREATE TABLE `temp_avg` (
`month` INT(2) UNSIGNED NOT NULL,
`area_id` INT(10) UNSIGNED NOT NULL,
`avg_high` INT(3) NOT NULL,
`avg_low` INT(3) NOT NULL,
PRIMARY KEY (`month`, `area_id`) USING BTREE,
INDEX `fk_area_id` (`area_id`) USING BTREE,
CONSTRAINT `fk_area_id` FOREIGN KEY (`area_id`) REFERENCES `mountainproject`.`area` (`area_id`) ON UPDATE CASCADE ON DELETE CASCADE
)
Here's where things start to get troublesome. If I run my crawler and just extract areas, everything works fine: the area is inserted into the database, and all the conditions data gets inserted without a problem. However, when I try to extract areas and routes, I get foreign key constraint failures when inserting routes, because the area that the route belongs to (parent_id) doesn't exist. Currently, to work around this, I've been running my crawler twice: once to extract area data, and once to extract route data. If I do that, everything goes smoothly.
My best guess as to why this doesn't work currently is that the areas that are being inserted haven't been committed and so when I attempt to add a route that belongs to an uncommitted area, it can't find the parent area. This theory quickly falls apart though because I'm able to insert condition data in the same run that I insert the area that the data belongs to.
My insertion code looks like this:
def insert_item(self, table_name, item):
    encoded_vals = [self.sql_encode(val) for val in item.values()]
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table_name,
        ", ".join(item.keys()),
        ", ".join(encoded_vals)
    )
    logging.debug(sql)
    self.cursor.execute(sql)

# EDIT: As suggested by @tadman I have moved to using the built-in SQL value
# encoding. I'm leaving this here because it doesn't affect the issue
def sql_encode(self, value):
    """Encode provided value and return a valid SQL value

    Arguments:
        value {Any} -- Value to encode

    Returns:
        str -- SQL encoded value as a str
    """
    encoded_val = None
    is_empty = False
    if isinstance(value, str):
        is_empty = len(value) == 0
    encoded_val = "NULL" if is_empty or value is None else value
    if isinstance(encoded_val, str) and encoded_val != "NULL":
        encoded_val = encoded_val.replace("\"", "\\\"")
        encoded_val = "\"%s\"" % encoded_val
    return str(encoded_val)
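For what it's worth, the parameterized version the edit refers to would look roughly like this (a sketch; it assumes a DB-API cursor such as pymysql's, and that table and column names come from trusted code, since identifiers cannot be bound as parameters):

def insert_item(self, table_name, item):
    columns = ", ".join(item.keys())
    placeholders = ", ".join(["%s"] * len(item))
    # Only the identifiers are interpolated; the values are bound by the driver,
    # which also turns Python None into SQL NULL
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table_name, columns, placeholders)
    logging.debug(sql)
    self.cursor.execute(sql, list(item.values()))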
The rest of the project lives in a GitHub repo if any more code/context is needed
I'm having an issue with my application causing a MySQL table to be locked due to inserts that take a long time. After reviewing online articles, it seems to be related to auto increment. Info below:
Python that inserts the data (a row at a time, unfortunately, as I need the auto-incremented id for reference in future inserts):
for i, flightobj in stats[ucid]['flight'].items():
    flight_fk = None
    # Insert flights
    try:
        with mysqlconnection.cursor() as cursor:
            sql = "insert into cb_flights(ucid,takeoff_time,end_time,end_event,side,kills,type,map_fk,era_fk) values(%s,%s,%s,%s,%s,%s,%s,%s,%s);"
            cursor.execute(sql, (
                ucid, flightobj['start_time'], flightobj['end_time'], flightobj['end_event'], flightobj['side'],
                flightobj['killnum'], flightobj['type'], map_fk, era_fk))
            mysqlconnection.commit()
            if cursor.lastrowid:
                flight_fk = cursor.lastrowid
            else:
                flight_fk = 0
    except pymysql.err.ProgrammingError as e:
        logging.exception("Error: {}".format(e))
    except pymysql.err.IntegrityError as e:
        logging.exception("Error: {}".format(e))
    except TypeError as e:
        logging.exception("Error: {}".format(e))
    except:
        logging.exception("Unexpected error:", sys.exc_info()[0])
The above runs every 2 minutes on the same data and is supposed to insert only non-duplicates, as MySQL denies duplicates thanks to the unique ucid_takeofftime index.
MySQL info, cb_flights table:
CREATE TABLE `cb_flights` (
`pk` int(11) NOT NULL AUTO_INCREMENT,
`ucid` varchar(50) NOT NULL,
`takeoff_time` datetime DEFAULT NULL,
`end_time` datetime DEFAULT NULL,
`end_event` varchar(45) DEFAULT NULL,
`side` varchar(45) DEFAULT NULL,
`kills` int(11) DEFAULT NULL,
`type` varchar(45) DEFAULT NULL,
`map_fk` int(11) DEFAULT NULL,
`era_fk` int(11) DEFAULT NULL,
`round_fk` int(11) DEFAULT NULL,
PRIMARY KEY (`pk`),
UNIQUE KEY `ucid_takeofftime` (`ucid`,`takeoff_time`),
KEY `ucid_idx` (`ucid`) /*!80000 INVISIBLE */,
KEY `end_event` (`end_event`) /*!80000 INVISIBLE */,
KEY `side` (`side`)
) ENGINE=InnoDB AUTO_INCREMENT=76023132 DEFAULT CHARSET=utf8;
Now, inserts into the table from the Python code can sometimes take over 60 seconds.
I believe it might be related to the auto increment creating the lock on the table; if so, I'm looking for a workaround.
innodb info -
innodb_autoinc_lock_mode 2
innodb_lock_wait_timeout 50
The buffer pool is roughly 70% used.
Appreciate any assistance with this, either from application side or MySQL side.
EDIT
Adding the CREATE statement for the cb_kills table, which is also used for inserts but without an issue as far as I can see. This is in response to the comment on the first answer.
CREATE TABLE `cb_kills` (
`pk` int(11) NOT NULL AUTO_INCREMENT,
`time` datetime DEFAULT NULL,
`killer_ucid` varchar(50) NOT NULL,
`killer_side` varchar(10) DEFAULT NULL,
`killer_unit` varchar(45) DEFAULT NULL,
`victim_ucid` varchar(50) DEFAULT NULL,
`victim_side` varchar(10) DEFAULT NULL,
`victim_unit` varchar(45) DEFAULT NULL,
`weapon` varchar(45) DEFAULT NULL,
`flight_fk` int(11) NOT NULL,
`kill_id` int(11) NOT NULL,
PRIMARY KEY (`pk`),
UNIQUE KEY `ucid_killid_flightfk_uniq` (`killer_ucid`,`flight_fk`,`kill_id`),
KEY `flight_kills_fk_idx` (`flight_fk`),
KEY `killer_ucid_fk_idx` (`killer_ucid`),
KEY `victim_ucid_fk_idx` (`victim_ucid`),
KEY `time_ucid_killid_uniq` (`time`,`killer_ucid`,`kill_id`),
CONSTRAINT `flight_kills_fk` FOREIGN KEY (`flight_fk`) REFERENCES `cb_flights` (`pk`)
) ENGINE=InnoDB AUTO_INCREMENT=52698582 DEFAULT CHARSET=utf8;
You can check whether autocommit is set to 1; this forces a commit for every row, and disabling it makes inserts somewhat faster.
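A quick way to check and change that from Python, assuming a pymysql connection like the one in the question (a sketch):

with mysqlconnection.cursor() as cursor:
    cursor.execute("SELECT @@autocommit;")
    print(cursor.fetchone())  # (1,) means every statement is committed immediately

mysqlconnection.autocommit(False)  # then batch the inserts and call commit() once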
Instead of committing every insert, try a bulk insert.
For that you should check
https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-bulk-data-loading.html
and do something like
data = [
('city 1', 'MAC', 'district 1', 16822),
('city 2', 'PSE', 'district 2', 15642),
('city 3', 'ZWE', 'district 3', 11642),
('city 4', 'USA', 'district 4', 14612),
('city 5', 'USA', 'district 5', 17672),
]
sql = "insert into city(name, countrycode, district, population)
VALUES(%s, %s, %s, %s)"
number_of_rows = cursor.executemany(sql, data)
db.commit()
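With pymysql, executemany rewrites a plain INSERT ... VALUES statement into a single multi-row INSERT, so the whole batch is sent and committed in one go instead of row by row.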
I want to put down here some of the ways I worked towards a solution to this problem. I'm not an expert in MySQL, but I think these steps can help anyone trying to find out why they are getting lock wait timeouts.
So the troubleshooting steps I took are as follows -
1- Check whether the relevant query that is locking my table shows up in the MySQL slow log. Usually it's possible to spot long-running queries and locks from info like the below, with the offending query right after it:
# Time: 2020-01-28T17:31:48.634308Z
# User@Host: @ localhost [::1] Id: 980397
# Query_time: 250.474040 Lock_time: 0.000000 Rows_sent: 10 Rows_examined: 195738
2- The above should give some clue on what's going on in the server and what might be waiting for a long time. Next I ran the following 3 queries to identify what is in use:
Check the process list to see which processes are running:
show full processlist;
Check which tables are currently in use:
show open tables where in_use>0;
Check running transactions:
SELECT * FROM `information_schema`.`innodb_trx` ORDER BY `trx_started`;
3- The above two steps should give enough information about which query is locking the tables. In my case I had a stored procedure that ran an INSERT INTO <different table> SELECT FROM <my locked table>; although it was inserting into a totally different table, this query was locking my table because of the SELECT operation, which took a long time.
To work around it, I changed the SP to work with temporary tables, and now, although the query is still not completely optimized, there are no locks on my table.
Adding here how I run the SP on temporary tables for async aggregated updates.
CREATE DEFINER=`username`@`%` PROCEDURE `procedureName`()
BEGIN
    drop temporary table if exists scheme.temp1;
    drop temporary table if exists scheme.temp2;
    drop temporary table if exists scheme.temp3;

    create temporary table scheme.temp1 AS select * from scheme.live1;
    create temporary table scheme.temp2 AS select * from scheme.live2;
    create temporary table scheme.temp3 AS select * from scheme.live3;

    create temporary table scheme.emptytemp (
        `cName1` int(11) NOT NULL,
        `cName2` varchar(45) NOT NULL,
        `cName3` int(11) NOT NULL,
        `cName4` datetime NOT NULL,
        `cName5` datetime NOT NULL,
        KEY `cName1` (`cName1`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

    INSERT into scheme.emptytemp
    select t1.x, t2.y, t3.z
    from scheme.temp1 t1
    JOIN scheme.temp2 t2
        ON t1.x = t2.x
    JOIN scheme.temp3 t3
        ON t2.y = t3.y;

    truncate table scheme.liveTable;

    INSERT into scheme.liveTable
    select * from scheme.emptytemp;
END
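The procedure then just needs to be run periodically. How it is scheduled isn't shown above; one option (an assumption on my part, not part of the original setup) is the MySQL event scheduler:

SET GLOBAL event_scheduler = ON;

CREATE EVENT IF NOT EXISTS `run_procedureName`
ON SCHEDULE EVERY 5 MINUTE
DO CALL `procedureName`();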
Hope this helps anyone who encounters this issue.
I cannot get my very basic SQL query to work: it returns 0 rows despite the fact that there are clearly NULLs.
query
SELECT
*
FROM
leads AS l
JOIN closes c ON l.id = c.lead_id
WHERE
c.close_date IS NULL
DDL
CREATE TABLE closes
(
id INT AUTO_INCREMENT
PRIMARY KEY,
lead_id INT NOT NULL,
close_date DATETIME NULL,
close_type VARCHAR(255) NULL,
primary_agent VARCHAR(255) NULL,
price FLOAT NULL,
gross_commission FLOAT NULL,
company_dollar FLOAT NULL,
address VARCHAR(255) NULL,
city VARCHAR(255) NULL,
state VARCHAR(10) NULL,
zip VARCHAR(10) NULL,
CONSTRAINT closes_ibfk_1
FOREIGN KEY (lead_id) REFERENCES leads (id)
)
ENGINE = InnoDB;
CREATE INDEX lead_id
ON closes (lead_id);
I should mention that I am inserting the data with a Python web scraper and SQLAlchemy. If a value is not scraped, it will be None on insert.
Here is a screenshot of DataGrip showing a null value in the row:
EDIT
Alright, so I went ahead and ran the following on some of the entries in the table where the value was already showing as <null>:
UPDATE closes
SET close_date = NULL
WHERE
lead_id = <INTEGERVAL>
;
What is interesting now is that when running the original query, I actually do get back the 2 records that I ran the update query for (the expected outcome). This would lead me to believe that the issue is with how my SQLAlchemy model is mapping the values on insert.
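A quick way to see what those rows actually contain, next to how MySQL evaluates them (a sketch, purely for diagnosis):

SELECT
    lead_id,
    close_date,
    close_date IS NULL       AS is_sql_null,
    CAST(close_date AS CHAR) AS raw_value
FROM closes;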
models.py
class Close(db.Model, ItemMixin):
    __tablename__ = 'closes'

    id = db.Column(db.Integer, primary_key=True)
    lead_id = db.Column(db.Integer, db.ForeignKey('leads.id'), nullable=False)
    close_date = db.Column(db.DateTime)
    close_type = db.Column(db.String(255))
    primary_agent = db.Column(db.String(255))
    price = db.Column(db.Float)
    gross_commission = db.Column(db.Float)
    company_dollar = db.Column(db.Float)
    address = db.Column(db.String(255))
    city = db.Column(db.String(255))
    state = db.Column(db.String(10))
    zip = db.Column(db.String(10))

    def __init__(self, item):
        self.build_from_item(item)

    def build_from_item(self, item):
        for k, v in item.items():
            setattr(self, k, v)
But I am fairly confident the value is a Python None when no value is scraped from the website. My understanding is that SQLAlchemy maps None to NULL on insert, and given that nullable=True is the default setting (which can be seen in the generated DDL), I am still at a loss as to why the value appears to be NULL when in reality it is not behaving that way.
EDIT 2
The only place where I think the issue could be happening is where my spider actually scrapes the data and assigns it to the Item, which is shown below.
closes.py
# item['close_date'] = None at this point
try:
    item['close_date'] = arrow.get(item['close_date'], 'MMM D, YYYY').format('YYYY-MM-DD')
except ParserError as e:
    # Maybe item['close_date'] = None here?
    spider.logger.error(f'Parse error: {item["close_date"]} - {e}')
In the Python code I've written, this would appear to be the place where the issue could arise. But if arrow.get throws an exception, the value of item['close_date'] should still be None. If that is not the case (and even if it is), it does not explain why the record value appears to be NULL even though it does not behave like it is.
I'm guessing that you're having an issue with the join, not the NULL value. The query below returns 1 result for me. More info about your data, the software used for querying (I tested with SQL Yog), and applicable versions might help.
EDIT
It could be that you're having issues with MySQL's 'zero date'.
https://dev.mysql.com/doc/refman/5.7/en/date-and-time-types.html
MySQL permits you to store a “zero” value of '0000-00-00' as a “dummy
date.” This is in some cases more convenient than using NULL values,
and uses less data and index space. To disallow '0000-00-00', enable
the NO_ZERO_DATE mode.
I've updated the SQL data below to include a zero date in the INSERT and SELECT's WHERE.
DROP TABLE IF EXISTS closes;
DROP TABLE IF EXISTS leads;
CREATE TABLE leads (
id INT(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id)
) ENGINE=INNODB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;
INSERT INTO leads(id) VALUES (1),(2),(3);
CREATE TABLE closes (
id INT(11) NOT NULL AUTO_INCREMENT,
lead_id INT(11) NOT NULL,
close_date DATETIME DEFAULT NULL,
close_type VARCHAR(255) DEFAULT NULL,
primary_agent VARCHAR(255) DEFAULT NULL,
price FLOAT DEFAULT NULL,
gross_commission FLOAT DEFAULT NULL,
company_dollar FLOAT DEFAULT NULL,
address VARCHAR(255) DEFAULT NULL,
city VARCHAR(255) DEFAULT NULL,
state VARCHAR(10) DEFAULT NULL,
zip VARCHAR(10) DEFAULT NULL,
PRIMARY KEY (id),
KEY lead_id (lead_id),
CONSTRAINT closes_ibfk_1 FOREIGN KEY (lead_id) REFERENCES leads (id)
) ENGINE=INNODB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;
INSERT INTO closes(id,lead_id,close_date,close_type,primary_agent,price,gross_commission,company_dollar,address,city,state,zip)
VALUES
(1,3,'0000-00-00',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
(2,1,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
(3,2,'2018-01-09 17:01:44',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);
SELECT
*
FROM
leads AS l
JOIN closes c ON l.id = c.lead_id
WHERE
c.close_date IS NULL OR c.close_date = '0000-00-00';
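If zero dates turn out to be the cause, it is also worth checking the session's sql_mode; NO_ZERO_DATE is what disallows them (a sketch, and the exact mode list below is only an example):

SELECT @@sql_mode;

SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';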