MySQL insert statement does not work in Python

I am trying to fetch values from a csv file called 'items.csv' and store them in a database table named 'articles2'. The insert statement triggers the following error:
pymysql.err.InternalError: (1292, "Incorrect datetime value: 'row[3]' for column 'date_added' at row 1")
This is my code:
import csv
import re
import pymysql
import sys
import os
import requests
from PIL import Image

def insert_articles2(rows):
    rowcount = 0
    for row in rows:
        if rowcount != 0:
            sql = "INSERT INTO articles2 (country, event_name, md5, date_added, profile_image, banner, sDate, eDate, address_line1, address_line2, pincode, state, city, locality, full_address, latitude, longitude, start_time, end_time, description, website, fb_page, fb_event_page, event_hashtag, source_name, source_url, email_id_organizer, ticket_url) VALUES ('row[0]', 'row[1]', 'row[2]', 'row[3]', 'row[4]', 'row[5]', 'row[6]', 'row[7]', 'row[8]', 'row[9]', 'row[10]', 'row[11]', 'row[12]', 'row[13]', 'row[14]', 'row[15]', 'row[16]', 'row[17]', 'row[18]', 'row[19]', 'row[20]', 'row[21]', 'row[22]', 'row[23]', 'row[24]', 'row[25]', 'row[26]', 'row[27]')"
            cursor.execute(sql)
            connection.commit()
        rowcount += 1

rows = csv.reader(open("items.csv", "r"))
insert_articles2(rows)
Here's the structure of the table 'articles2'; please note the datatypes of the fields. What change should I make in my Python script to make this work?
CREATE TABLE IF NOT EXISTS `articles2` (
  `id` int(6) NOT NULL AUTO_INCREMENT,
  `country` varchar(45) NOT NULL,
  `event_name` varchar(200) NOT NULL,
  `md5` varchar(35) NOT NULL,
  `date_added` timestamp NULL DEFAULT NULL,
  `profile_image` varchar(350) NOT NULL,
  `banner` varchar(350) NOT NULL,
  `sDate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `eDate` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `address_line1` mediumtext,
  `address_line2` mediumtext,
  `pincode` int(7) NOT NULL,
  `state` varchar(30) NOT NULL,
  `city` text NOT NULL,
  `locality` varchar(50) NOT NULL,
  `full_address` varchar(350) NOT NULL,
  `latitude` varchar(15) NOT NULL,
  `longitude` varchar(15) NOT NULL,
  `start_time` time NOT NULL,
  `end_time` time NOT NULL,
  `description` longtext CHARACTER SET utf16 NOT NULL,
  `website` varchar(50) DEFAULT NULL,
  `fb_page` varchar(200) DEFAULT NULL,
  `fb_event_page` varchar(200) DEFAULT NULL,
  `event_hashtag` varchar(30) DEFAULT NULL,
  `source_name` varchar(30) NOT NULL,
  `source_url` varchar(350) NOT NULL,
  `email_id_organizer` varchar(100) NOT NULL,
  `ticket_url` mediumtext NOT NULL,
  PRIMARY KEY (`id`),
  KEY `full_address` (`full_address`),
  KEY `full_address_2` (`full_address`),
  KEY `id` (`id`),
  KEY `event_name` (`event_name`),
  KEY `sDate` (`sDate`),
  KEY `eDate` (`eDate`),
  KEY `id_2` (`id`),
  KEY `country` (`country`),
  KEY `event_name_2` (`event_name`),
  KEY `sDate_2` (`sDate`),
  KEY `eDate_2` (`eDate`),
  KEY `state` (`state`),
  KEY `locality` (`locality`),
  KEY `start_time` (`start_time`),
  KEY `start_time_2` (`start_time`),
  KEY `end_time` (`end_time`),
  KEY `id_3` (`id`),
  KEY `id_4` (`id`),
  KEY `event_name_3` (`event_name`),
  KEY `md5` (`md5`),
  KEY `sDate_3` (`sDate`),
  KEY `eDate_3` (`eDate`),
  KEY `latitude` (`latitude`),
  KEY `longitude` (`longitude`),
  KEY `start_time_3` (`start_time`),
  KEY `end_time_2` (`end_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=4182;
A sample row of the csv:
country event_name md5 date_added profile_image banner sDate eDate address_line1 address_line2 pincode state city locality full_address latitude longitude start_time end_time description website fb_page fb_event_page event_hashtag source_name source_url email_id_organizer ticket_url
India India's largest 10K challenge, ProIndiaRun, Hyderabad on April 29th 6fa7ab214c279b765748b28362e9020b 2018-04-10 04:10:45 ../images/events/India-s-largest-10K-challenge-ProIndiaRun-Hyderabad-on-April-29th-Hyderabad-4-banner.png 2018-04-29 00:00:00 2018-04-29 00:00:00 500041 Telangana Hyderabad TBA, Hyderabad, Hyderabad, Telangana, 500041 05:00:00 10:00:00 Event Description,,ProIndiaRun, Hyderabad,,Welcome to Pro Run India, India's largest 10K challenge happening at Pan India Level in different cities. Come along with them to make India better, to raise the child in their choice of sports supporting them financially.,,,,Pro- Run India is coming to Hyderabad on 29th April 2018. The Run lets you choose from 5k and 10K Run. Hurry, Register today!,,,,5KM RUN : INR 650,,AGE: 10 to 50 Years(Male/Female),,AGE: 51 to 70 Years(Male/Female) VETERUN CATEGORY,,,Finisher Medals,,BIB with Timing Chip,,Electronic Timing Certificate,,Refreshment,,,10KM CHALLENGE : INR 1000,,AGE: 10 to 70 Years(Male/Female),,,Finisher Medals,,BIB with Timing Chip,,Electronic Timing Certificate,,Refreshment,,,PRIZES:-,,5KM (TROPHIES FOR 1ST THREE RUNNER UP'S MALE & FEMALE),,10KM CHALLENGE,,FEMALE,,1ST PRIZE INR 5000/- 2ND PRIZE INR 3000/- 3RD PRIZE INR 2000/-,,MALE,,1ST PRIZE INR 5000/- 2ND PRIZE INR 3000/- 3RD PRIZE INR 2000/-,,, https://www.eventsnow.com/events/9232-proindiarun-hyderabad proindiarun#gmail.com

With that statement you are inserting the literal strings 'row[0]', 'row[1]', 'row[2]', ... into the columns, not the csv values.
From the documentation, an example of correct usage is:
sql = "INSERT INTO `users` (`email`, `password`) VALUES (%s, %s)"
cursor.execute(sql, ('webmaster@python.org', 'very-secret'))
So in your case it should be:
sql = """
INSERT INTO articles2 (country, event_name, md5, ..., ticket_url)
VALUES (%s, %s, %s, ..., %s)
"""
cursor.execute(sql, row)
Btw, if you are inserting all of the columns and the column order in the table matches the csv, you can avoid specifying (country, event_name, md5, ..., ticket_url).
Using executemany instead lets you drop the for loop and insert the whole batch of rows more efficiently in one call:
cursor.executemany(sql, rows)
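Putting the pieces together for this particular csv, a minimal sketch might look like this (the connection parameters are placeholders, and the first csv row is assumed to be a header, as in your rowcount check):

import csv
import pymysql

# Placeholder credentials; adjust to your environment.
connection = pymysql.connect(host='localhost', user='user',
                             password='secret', database='events')

placeholders = ", ".join(["%s"] * 28)  # one %s per csv column
sql = ("INSERT INTO articles2 (country, event_name, md5, date_added, "
       "profile_image, banner, sDate, eDate, address_line1, address_line2, "
       "pincode, state, city, locality, full_address, latitude, longitude, "
       "start_time, end_time, description, website, fb_page, fb_event_page, "
       "event_hashtag, source_name, source_url, email_id_organizer, ticket_url) "
       "VALUES (" + placeholders + ")")

with open("items.csv", "r") as f:
    rows = csv.reader(f)
    next(rows)  # skip the header row
    with connection.cursor() as cursor:
        # MySQL parses the date strings (e.g. row[3]) into the timestamp columns.
        cursor.executemany(sql, rows)
    connection.commit()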

Related

Update on Duplicate key with two columns to check mysql

I am trying to get my head around the 'On Duplicate Key' mysql statement. I have the following table:
id (primary key autoincr) / server id (INT) / member id (INT UNIQUE KEY) / basket (VARCHAR) / shop (VARCHAR UNIQUE KEY)
In this table each member can have two rows, one for each of the shops (shopA and shopB). I want to INSERT if there is no match for both the member id and the shop. If there is a match, I want it to update the basket, concatenating the current basket with the additional information.
I am trying to use:
"INSERT INTO table_name (server_id, member_id, basket, shop) VALUES (%s, %s, %s, %s) ON DUPLICATE KEY UPDATE basket = CONCAT (basket,%s)"
Currently if there is an entry for the member for shopA when this runs with basket for shopB it adds the basket info to the shopA row instead of creating a new one.
Hope all this makes sense! Thanks in advance!
UPDATE: As requested here is the create table sql statement:
CREATE TABLE table_name (
  member_id bigint(20) NOT NULL,
  server_id bigint(11) NOT NULL,
  basket varchar(10000) NOT NULL,
  shop varchar(30) NOT NULL,
  notes varchar(1000) DEFAULT NULL,
  PRIMARY KEY (member_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
In this table each member can have two rows, one for each of the shops (shopA and shopB)
This means that member_id should not be the primary key of the table because it is not unique.
You need a composite primary key for the columns member_id and shop:
CREATE TABLE table_name (
  member_id bigint(20) NOT NULL,
  server_id bigint(11) NOT NULL,
  basket varchar(10000) NOT NULL,
  shop varchar(30) NOT NULL,
  notes varchar(1000) DEFAULT NULL,
  PRIMARY KEY (member_id, shop)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
See a simplified demo.
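With the composite key in place, your original statement behaves as intended. A minimal pymysql-style sketch (the connection parameters and sample values are placeholders, and the question does not name its driver):

import pymysql

connection = pymysql.connect(host='localhost', user='user',
                             password='secret', database='shop_db')

sql = ("INSERT INTO table_name (server_id, member_id, basket, shop) "
       "VALUES (%s, %s, %s, %s) "
       "ON DUPLICATE KEY UPDATE basket = CONCAT(basket, %s)")

with connection.cursor() as cursor:
    # Different shop for the same member: inserts a new row now,
    # because the duplicate check is on (member_id, shop).
    cursor.execute(sql, (1, 42, 'apples;', 'shopA', 'apples;'))
    cursor.execute(sql, (1, 42, 'bread;', 'shopB', 'bread;'))
    # Same member and shop again: concatenates onto the existing basket.
    cursor.execute(sql, (1, 42, 'pears;', 'shopA', 'pears;'))
connection.commit()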

MySQL datetime column WHERE col IS NULL fails

I cannot get my very basic SQL query to work: it returns 0 rows despite the fact that there are clearly NULLs.
query
SELECT *
FROM leads AS l
JOIN closes c ON l.id = c.lead_id
WHERE c.close_date IS NULL
DDL
CREATE TABLE closes
(
  id INT AUTO_INCREMENT PRIMARY KEY,
  lead_id INT NOT NULL,
  close_date DATETIME NULL,
  close_type VARCHAR(255) NULL,
  primary_agent VARCHAR(255) NULL,
  price FLOAT NULL,
  gross_commission FLOAT NULL,
  company_dollar FLOAT NULL,
  address VARCHAR(255) NULL,
  city VARCHAR(255) NULL,
  state VARCHAR(10) NULL,
  zip VARCHAR(10) NULL,
  CONSTRAINT closes_ibfk_1
    FOREIGN KEY (lead_id) REFERENCES leads (id)
) ENGINE = InnoDB;

CREATE INDEX lead_id
  ON closes (lead_id);
I should mention that I am inserting the data with a python web scraper and SQLAlchemy. If the data is not scraped it will be None on insert.
Here is a screenshot of datagrip showing a null value in the row
EDIT
Alright so I went ahead and ran the following on some of the entries in the table where the value was already <null>
UPDATE closes
SET close_date = NULL
WHERE lead_id = <INTEGERVAL>;
What is interesting now is that when I run the original query I do actually get back the 2 records that I ran the update query for (the expected outcome). This leads me to believe that the issue is with how my SQLAlchemy model is mapping the values on insert.
models.py
class Close(db.Model, ItemMixin):
    __tablename__ = 'closes'

    id = db.Column(db.Integer, primary_key=True)
    lead_id = db.Column(db.Integer, db.ForeignKey('leads.id'), nullable=False)
    close_date = db.Column(db.DateTime)
    close_type = db.Column(db.String(255))
    primary_agent = db.Column(db.String(255))
    price = db.Column(db.Float)
    gross_commission = db.Column(db.Float)
    company_dollar = db.Column(db.Float)
    address = db.Column(db.String(255))
    city = db.Column(db.String(255))
    state = db.Column(db.String(10))
    zip = db.Column(db.String(10))

    def __init__(self, item):
        self.build_from_item(item)

    def build_from_item(self, item):
        for k, v in item.items():
            setattr(self, k, v)
But I am fairly confident the value is a Python None when no value is scraped from the website. My understanding is that SQLAlchemy maps None to NULL on insert, and nullable=True is the default setting, as can be seen in the generated DDL, so I am still at a loss as to why the value appears to be NULL when in reality it does not behave that way.
EDIT 2
The only place where I think the issue could be happening is where my spider actually scrapes the data and assigns it to the Item, shown below:
closes.py
# item['close_date'] = None at this point
try:
    item['close_date'] = arrow.get(item['close_date'], 'MMM D, YYYY').format('YYYY-MM-DD')
except ParserError as e:
    # Maybe item['close_date'] = None here?
    spider.logger.error(f'Parse error: {item["close_date"]} - {e}')
This would appear to be the place where the issue arises. But if arrow.get throws an exception, the value of item['close_date'] should still be None; and even if it were not, that would not explain why the record value appears to be NULL even though it does not behave like it is.
I'm guessing that you're having an issue with the join, not the NULL value. The query below returns 1 result for me. More info about your data, the software used for querying (I tested with SQLyog), and the applicable versions might help.
EDIT
It could be that you're having issues with MySQL's 'zero date'.
https://dev.mysql.com/doc/refman/5.7/en/date-and-time-types.html
MySQL permits you to store a “zero” value of '0000-00-00' as a “dummy date.” This is in some cases more convenient than using NULL values, and uses less data and index space. To disallow '0000-00-00', enable the NO_ZERO_DATE mode.
I've updated the SQL data below to include a zero date in the INSERT and SELECT's WHERE.
DROP TABLE IF EXISTS closes;
DROP TABLE IF EXISTS leads;

CREATE TABLE leads (
  id INT(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
) ENGINE=INNODB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

INSERT INTO leads(id) VALUES (1),(2),(3);

CREATE TABLE closes (
  id INT(11) NOT NULL AUTO_INCREMENT,
  lead_id INT(11) NOT NULL,
  close_date DATETIME DEFAULT NULL,
  close_type VARCHAR(255) DEFAULT NULL,
  primary_agent VARCHAR(255) DEFAULT NULL,
  price FLOAT DEFAULT NULL,
  gross_commission FLOAT DEFAULT NULL,
  company_dollar FLOAT DEFAULT NULL,
  address VARCHAR(255) DEFAULT NULL,
  city VARCHAR(255) DEFAULT NULL,
  state VARCHAR(10) DEFAULT NULL,
  zip VARCHAR(10) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY lead_id (lead_id),
  CONSTRAINT closes_ibfk_1 FOREIGN KEY (lead_id) REFERENCES leads (id)
) ENGINE=INNODB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;

INSERT INTO closes(id,lead_id,close_date,close_type,primary_agent,price,gross_commission,company_dollar,address,city,state,zip)
VALUES
  (1,3,'0000-00-00 00:00:00',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
  (2,1,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL),
  (3,2,'2018-01-09 17:01:44',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);
SELECT *
FROM leads AS l
JOIN closes c ON l.id = c.lead_id
WHERE c.close_date IS NULL OR c.close_date = '0000-00-00';
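If the zero date is the cause, one hedged fix on the scraper side is to make the failure path return an explicit None, which SQLAlchemy maps to SQL NULL. A sketch (the helper name is mine; arrow's ParserError lives in arrow.parser):

import arrow
from arrow.parser import ParserError

def parse_close_date(raw):
    # Return an ISO date string, or None when the value is missing or unparseable.
    if not raw:
        return None
    try:
        return arrow.get(raw, 'MMM D, YYYY').format('YYYY-MM-DD')
    except ParserError:
        return None  # explicit None -> SQL NULL on insert

print(parse_close_date('Jan 9, 2018'))  # 2018-01-09
print(parse_close_date('no date'))      # None

Rows already written with a zero date can then be normalized with UPDATE closes SET close_date = NULL WHERE close_date = '0000-00-00 00:00:00';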

MariaDB duplicates being inserted

I have the following Python code to check whether a MariaDB record already exists before inserting. However, duplicates are still being inserted. Is there something wrong with the code, or is there a better way to do it? I'm new to using Python with MariaDB.
import mysql.connector as mariadb
from hashlib import sha1

mariadb_connection = mariadb.connect(user='root', password='', database='tweets_db')

# The values below are retrieved from the Twitter API using Tweepy.
# For simplicity, I've provided some sample values:
id = '1a23bas'
tweet = 'Clear skies'
longitude = -84.361549
latitude = 34.022003
created_at = '2017-09-27'
collected_at = '2017-09-27'
collection_type = 'stream'
lang = 'us-en'
place_name = 'Roswell'
country_code = 'USA'
cronjob_tag = 'None'
user_id = '23abask'
user_name = 'tsoukalos'
user_geoenabled = 0
user_lang = 'us-en'
user_location = 'Roswell'
user_timezone = 'American/Eastern'
user_verified = 1

tweet_hash = sha1(tweet).hexdigest()

cursor = mariadb_connection.cursor(buffered=True)
cursor.execute("SELECT Count(id) FROM tweets WHERE tweet_hash = %s", (tweet_hash,))
if cursor.fetchone()[0] == 0:
    cursor.execute("INSERT INTO tweets(id,tweet,tweet_hash,longitude,latitude,created_at,collected_at,collection_type,lang,place_name,country_code,cronjob_tag,user_id,user_name,user_geoenabled,user_lang,user_location,user_timezone,user_verified) VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)", (id,tweet,tweet_hash,longitude,latitude,created_at,collected_at,collection_type,lang,place_name,country_code,cronjob_tag,user_id,user_name,user_geoenabled,user_lang,user_location,user_timezone,user_verified))
    mariadb_connection.commit()
    cursor.close()
else:
    cursor.close()
    return
Below is the code for the table.
CREATE TABLE tweets (
  id VARCHAR(255) NOT NULL,
  tweet VARCHAR(255) NOT NULL,
  tweet_hash VARCHAR(255) DEFAULT NULL,
  longitude FLOAT DEFAULT NULL,
  latitude FLOAT DEFAULT NULL,
  created_at DATETIME DEFAULT NULL,
  collected_at DATETIME DEFAULT NULL,
  collection_type enum('stream','search') DEFAULT NULL,
  lang VARCHAR(10) DEFAULT NULL,
  place_name VARCHAR(255) DEFAULT NULL,
  country_code VARCHAR(5) DEFAULT NULL,
  cronjob_tag VARCHAR(255) DEFAULT NULL,
  user_id VARCHAR(255) DEFAULT NULL,
  user_name VARCHAR(20) DEFAULT NULL,
  user_geoenabled TINYINT(1) DEFAULT NULL,
  user_lang VARCHAR(10) DEFAULT NULL,
  user_location VARCHAR(255) DEFAULT NULL,
  user_timezone VARCHAR(100) DEFAULT NULL,
  user_verified TINYINT(1) DEFAULT NULL
);
Add a unique constraint to the tweet_hash field:
alter table tweets modify tweet_hash varchar(255) UNIQUE;
Every table should have a PRIMARY KEY. Is id supposed to be that? (The CREATE TABLE is not saying so.) A PK is, by definition, UNIQUE, so that would cause an error on inserting a duplicate.
Meanwhile:
Why have a tweet_hash? Simply index tweet.
Don't say 255 when there are specific limits smaller than that.
user_id and user_name should be in another "lookup" table, not both in this table.
Does user_verified belong with the user? Or with each tweet?
If you are expecting millions of tweets, this table needs to be made smaller and indexed -- else you will run into performance problems.
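As a sketch of the race-free alternative (assuming the UNIQUE index on tweet_hash suggested above, and showing only a few of the columns):

import mysql.connector as mariadb
from hashlib import sha1

mariadb_connection = mariadb.connect(user='root', password='', database='tweets_db')
cursor = mariadb_connection.cursor()

tweet = 'Clear skies'
# In Python 3, sha1() needs bytes, so encode the tweet first.
tweet_hash = sha1(tweet.encode('utf-8')).hexdigest()

# With a UNIQUE index on tweet_hash, the SELECT-then-INSERT pair collapses into
# one atomic statement: rows whose hash already exists are silently skipped.
cursor.execute(
    "INSERT IGNORE INTO tweets (id, tweet, tweet_hash) VALUES (%s, %s, %s)",
    ('1a23bas', tweet, tweet_hash),
)
mariadb_connection.commit()
cursor.close()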

SQLite3 DatabaseError: malformed database schema

Trying to run sydent on Debian Jessie using SQLite version 3.18.0, but receiving an error.
(sydent)gooseberry@servername:/opt/sydent# python -m sydent.sydent
INFO:sydent.db.sqlitedb:Using DB file sydent.db
WARNING:sydent.http.httpcommon:No HTTPS private key / cert found: not starting replication server or doing replication pushes
INFO:sydent.http.httpserver:Starting Client API HTTP server on port 8090
INFO:twisted:Site starting on 8090
INFO:twisted:Starting factory <twisted.web.server.Site instance at 0x7fda3b6c2950>
Unhandled error in Deferred:
CRITICAL:twisted:Unhandled error in Deferred:
CRITICAL:twisted:
Traceback (most recent call last):
  File "/opt/sydent/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/opt/sydent/local/lib/python2.7/site-packages/sydent/replication/pusher.py", line 76, in scheduledPush
    peers = self.peerStore.getAllPeers()
  File "/opt/sydent/local/lib/python2.7/site-packages/sydent/db/peers.py", line 52, in getAllPeers
    res = cur.execute("select p.name, p.port, p.lastSentVersion, pk.alg, pk.key from peers p, peer_pubkeys pk "
DatabaseError: malformed database schema (medium_lower_address) - near "(": syntax error
^CINFO:twisted:Received SIGINT, shutting down.
Below is the output from select * from sqlite_master;
table|invite_tokens|invite_tokens|2|CREATE TABLE invite_tokens (
    id integer primary key,
    medium varchar(16) not null,
    address varchar(256) not null,
    room_id varchar(256) not null,
    sender varchar(256) not null,
    token varchar(256) not null,
    received_ts bigint, -- When the invite was received by us from the homeserver
    sent_ts bigint -- When the token was sent by us to the user
)
index|invite_token_medium_address|invite_tokens|3|CREATE INDEX invite_token_medium_address on invite_tokens(medium, address)
index|invite_token_token|invite_tokens|4|CREATE INDEX invite_token_token on invite_tokens(token)
table|ephemeral_public_keys|ephemeral_public_keys|5|CREATE TABLE ephemeral_public_keys(
    id integer primary key,
    public_key varchar(256) not null,
    verify_count bigint default 0,
    persistence_ts bigint
)
index|ephemeral_public_keys_index|ephemeral_public_keys|6|CREATE UNIQUE INDEX ephemeral_public_keys_index on ephemeral_public_keys(public_key)
table|peers|peers|7|CREATE TABLE peers (id integer primary key, name varchar(255) not null, port integer default null, lastSentVersion integer, lastPokeSucceededAt integer, active integer not null default 0)
index|name|peers|8|CREATE UNIQUE INDEX name on peers(name)
table|peer_pubkeys|peer_pubkeys|9|CREATE TABLE peer_pubkeys (id integer primary key, peername varchar(255) not null, alg varchar(16) not null, key text not null, foreign key (peername) references peers (peername))
index|peername_alg|peer_pubkeys|10|CREATE UNIQUE INDEX peername_alg on peer_pubkeys(peername, alg)
table|threepid_validation_sessions|threepid_validation_sessions|11|CREATE TABLE threepid_validation_sessions (id integer primary key, medium varchar(16) not null, address varchar(256) not null, clientSecret varchar(32) not null, validated int default 0, mtime bigint not null)
table|threepid_token_auths|threepid_token_auths|12|CREATE TABLE threepid_token_auths (id integer primary key, validationSession integer not null, token varchar(32) not null, sendAttemptNumber integer not null, foreign key (validationSession) references threepid_validations(id))
table|local_threepid_associations|local_threepid_associations|13|CREATE TABLE local_threepid_associations (id integer primary key, medium varchar(16) not null, address varchar(256) not null, mxid varchar(256) not null, ts integer not null, notBefore bigint not null, notAfter bigint not null)
index|medium_address|local_threepid_associations|14|CREATE UNIQUE INDEX medium_address on local_threepid_associations(medium, address)
table|global_threepid_associations|global_threepid_associations|15|CREATE TABLE global_threepid_associations (id integer primary key, medium varchar(16) not null, address varchar(256) not null, mxid varchar(256) not null, ts integer not null, notBefore bigint not null, notAfter integer not null, originServer varchar(255) not null, originId integer not null, sgAssoc text not null)
index|medium_lower_address|global_threepid_associations|16|CREATE INDEX medium_lower_address on global_threepid_associations (medium, lower(address))
index|originServer_originId|global_threepid_associations|17|CREATE UNIQUE INDEX originServer_originId on global_threepid_associations (originServer, originId)
How do I resolve this error? I can't see any issues with the schema.
Expression indexes were added in SQLite 3.9.0.
Apparently, the SQLite library sydent is actually running against is older than that.
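To check which SQLite library the Python process is actually linked against (a quick diagnostic; expression indexes need at least 3.9.0):

import sqlite3

# Version of the SQLite C library the sqlite3 module is linked against,
# which may differ from the sqlite3 command-line binary on the system.
print(sqlite3.sqlite_version)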

How to return the last primary key after INSERT in pymysql (python3.5)?

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `email` varchar(255) COLLATE utf8_bin NOT NULL,
  `password` varchar(255) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1;
For example, how do I get the primary key id of the last record that I insert into the table with cursor.execute("insert into ...", ...)?
After inserting, you can get it with:
cursor.execute('SELECT LAST_INSERT_ID()')
or simply use cursor.lastrowid.
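A minimal pymysql sketch (the connection parameters and sample values are placeholders):

import pymysql

connection = pymysql.connect(host='localhost', user='user',
                             password='secret', database='test')
with connection.cursor() as cursor:
    cursor.execute(
        "INSERT INTO `users` (`email`, `password`) VALUES (%s, %s)",
        ('alice@example.com', 'very-secret'),
    )
    connection.commit()
    print(cursor.lastrowid)  # AUTO_INCREMENT id of the row just inserted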
