PostgreSQL query totally baffling me - Python

I'm dealing with a django-silk issue, trying to figure out why it won't migrate. It says all migrations are complete, and then when I run my code it warns me that I still have 8 unapplied migrations, despite my double checking with python manage.py migrate --plan. I'm pretty stumped at this point, so I began setting up queries to just populate the db with the info directly.
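For reference, a minimal sketch of listing what Django itself still considers unapplied (run inside python manage.py shell; MigrationExecutor is an internal API, so treat this purely as a debugging aid, not a fix):
from django.db import connection
from django.db.migrations.executor import MigrationExecutor

executor = MigrationExecutor(connection)
# Each entry in the plan is a migration Django still thinks needs to be applied.
plan = executor.migration_plan(executor.loader.graph.leaf_nodes())
for migration, _backwards in plan:
    print(migration.app_label, migration.name)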
And now the query is giving me a syntax error that for the life of me I can't understand! Hoping there are some Postgres masters here who can tell me what I'm missing. Thanks!
Here's the query:
CREATE TABLE IF NOT EXISTS public.silk_request (
    id character varying(36) COLLATE pg_catalog."default" NOT NULL,
    path character varying(190) COLLATE pg_catalog."default" NOT NULL,
    query_params text COLLATE pg_catalog."default" NOT NULL,
    raw_body text COLLATE pg_catalog."default" NOT NULL,
    body text COLLATE pg_catalog."default" NOT NULL,
    method character varying(10) COLLATE pg_catalog."default" NOT NULL,
    start_time timestamp with time zone NOT NULL,
    view_name character varying(190) COLLATE pg_catalog."default",
    end_time timestamp with time zone,
    time_taken double precision,
    encoded_headers text COLLATE pg_catalog."default" NOT NULL,
    meta_time double precision,
    meta_num_queries integer,
    meta_time_spent_queries double precision,
    pyprofile text COLLATE pg_catalog."default" NOT NULL,
    num_sql_queries integer NOT NULL,
    prof_file character varying(300) COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT silk_request_pkey PRIMARY KEY (id)
) TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.silk_request OWNER TO tapappdbuser;

CREATE INDEX IF NOT EXISTS silk_request_id_5a356c4f_like
    ON public.silk_request USING btree (id COLLATE pg_catalog."default" varchar_pattern_ops ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS silk_request_path_9f3d798e
    ON public.silk_request USING btree (path COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS silk_request_path_9f3d798e_like
    ON public.silk_request USING btree (path COLLATE pg_catalog."default" varchar_pattern_ops ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS silk_request_start_time_1300bc58
    ON public.silk_request USING btree (start_time ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS silk_request_view_name_68559f7b
    ON public.silk_request USING btree (view_name COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS silk_request_view_name_68559f7b_like
    ON public.silk_request USING btree (view_name COLLATE pg_catalog."default" varchar_pattern_ops ASC NULLS LAST)
    TABLESPACE pg_default;
Thanks!
Update:
Here's the error message. Sorry, I should have included it originally.
ERROR: syntax error at or near "("
LINE 5: ...t" NOT NULL,
CONSTRAINT silk_request_pkey PRIMARY KEY (id))
^
SQL state: 42601
Character: 1015
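One way to narrow down which statement the error at character 1015 comes from is to run the script one statement at a time, e.g. with psycopg2. A rough debugging sketch (the DSN and file path are placeholders, and the naive split on ";" only works because this DDL has no string literals containing semicolons):
import psycopg2

ddl_script = open("silk_request.sql").read()  # the script shown above, saved to a file

conn = psycopg2.connect("dbname=mydb user=tapappdbuser")  # placeholder DSN
conn.autocommit = True
with conn.cursor() as cur:
    for stmt in ddl_script.split(";"):
        if stmt.strip():
            print("Running:", stmt.strip().splitlines()[0])
            cur.execute(stmt)
conn.close()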

Related

PyMySql Column Truncated and Duplicate Index Error

Here is my table creation code:
CREATE TABLE `crypto_historical_price2` (
    `Ticker` varchar(255) COLLATE latin1_bin NOT NULL,
    `Timestamp` varchar(255) COLLATE latin1_bin NOT NULL,
    `PerpetualPrice` double DEFAULT NULL,
    `SpotPrice` double DEFAULT NULL,
    `Source` varchar(255) COLLATE latin1_bin NOT NULL,
    PRIMARY KEY (`Ticker`,`Timestamp`,`Source`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin
I'm updating stuff in batch with sql statements like the following
sql = "INSERT INTO crypto."+TABLE+"(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s;" % batchdata
where batchdata is just a string of data like "('SOL', '2022-11-03 02:01:00', '31.2725', '31.2875', 'FTX'),('SOL', '2022-11-03 02:02:00', '31.3075', '31.305', 'FTX')".
Now my script runs for a while, successfully inserting data into the table, but then it barfs with the following errors:
error 1265 data truncated for column PerpetualPrice
and
Duplicate entry 'SOL-2022-11-02 11:00:00-FTX' for key 'primary'
I've tried to solve the second error with
sql = "INSERT INTO crypto.crypto_historical_price2(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s ON DUPLICATE KEY UPDATE Ticker = VALUES(Ticker), Timestamp = VALUES(Timestamp), PerpetualPrice = VALUES(PerpetualPrice), SpotPrice = VALUES(SpotPrice), Source = VALUES(Source);" % batchdata
and
sql = "INSERT INTO crypto.crypto_historical_price2(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s ON DUPLICATE KEY UPDATE Ticker = VALUES(Ticker),Timestamp = VALUES(Timestamp),Source = VALUES(Source);" % batchdata
Both of the above attempted remedies run without throwing a duplicate entry error, but they don't update the table at all.
If I pause my script for a couple of minutes and re-run it, the duplicate error goes away and the table updates, which confuses me EVEN more lol.
Any ideas?
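For comparison, here's a minimal parameterized sketch of the same batch insert, letting the driver escape the values via executemany instead of % string formatting (connection arguments are placeholders; table and column names are from the question):
import pymysql

rows = [
    ('SOL', '2022-11-03 02:01:00', 31.2725, 31.2875, 'FTX'),
    ('SOL', '2022-11-03 02:02:00', 31.3075, 31.305, 'FTX'),
]

sql = (
    "INSERT INTO crypto.crypto_historical_price2 "
    "(Ticker, Timestamp, PerpetualPrice, SpotPrice, Source) "
    "VALUES (%s, %s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE "
    "PerpetualPrice = VALUES(PerpetualPrice), SpotPrice = VALUES(SpotPrice)"
)

conn = pymysql.connect(host="localhost", user="user", password="pw", database="crypto")
with conn.cursor() as cur:
    cur.executemany(sql, rows)
conn.commit()  # pymysql does not autocommit by default
conn.close()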

Scrapy MySQL foreign key constraint failures

I'm currently working on writing a web scraper for a website called Mountain Project and have come across an issue while inserting items into a MySQL (MariaDB) database.
The basic flow of my crawler is this:
Get response from link extractor (I'm subclassing CrawlSpider)
Extract data from response and send extracted items down the item pipeline
Items get picked up by the SqlPipeline and inserted into the database
An important note about step 2 is that the crawler is sending multiple items down the pipeline. The first item will be the main resource (either a route or an area), and any items after that will be other important data about that resource. For areas, those items will be data on different weather conditions; for routes, those items will be the different grades assigned to those routes.
My area and route tables look like this:
CREATE TABLE `area` (
    `area_id` INT(10) UNSIGNED NOT NULL,
    `parent_id` INT(10) UNSIGNED NULL DEFAULT NULL,
    `name` VARCHAR(200) NOT NULL DEFAULT '',
    `latitude` FLOAT(12) NULL DEFAULT -1,
    `longitude` FLOAT(12) NULL DEFAULT -1,
    `elevation` INT(11) NULL DEFAULT '0',
    `link` VARCHAR(300) NOT NULL,
    PRIMARY KEY (`area_id`) USING BTREE
)

CREATE TABLE `route` (
    `route_id` INT(10) UNSIGNED NOT NULL,
    `parent_id` INT(10) UNSIGNED NOT NULL,
    `name` VARCHAR(200) NOT NULL DEFAULT '' COLLATE 'utf8_general_ci',
    `link` VARCHAR(300) NOT NULL COLLATE 'utf8_general_ci',
    `rating` FLOAT(12) NULL DEFAULT '0',
    `types` VARCHAR(50) NULL DEFAULT NULL COLLATE 'utf8_general_ci',
    `pitches` INT(3) NULL DEFAULT '0',
    `height` INT(5) NULL DEFAULT '0',
    `length` VARCHAR(5) NULL DEFAULT NULL COLLATE 'utf8_general_ci',
    PRIMARY KEY (`route_id`) USING BTREE,
    INDEX `fk_parent_id` (`parent_id`) USING BTREE,
    CONSTRAINT `fk_parent_id` FOREIGN KEY (`parent_id`) REFERENCES `mountainproject`.`area` (`area_id`) ON UPDATE CASCADE ON DELETE CASCADE
)
And here's an example of one of my condition tables:
CREATE TABLE `temp_avg` (
    `month` INT(2) UNSIGNED NOT NULL,
    `area_id` INT(10) UNSIGNED NOT NULL,
    `avg_high` INT(3) NOT NULL,
    `avg_low` INT(3) NOT NULL,
    PRIMARY KEY (`month`, `area_id`) USING BTREE,
    INDEX `fk_area_id` (`area_id`) USING BTREE,
    CONSTRAINT `fk_area_id` FOREIGN KEY (`area_id`) REFERENCES `mountainproject`.`area` (`area_id`) ON UPDATE CASCADE ON DELETE CASCADE
)
Here's where things start to get troublesome. If I run my crawler and just extract areas, everything works fine. The area is inserted into the database, and all the conditions data gets inserted without a problem. However, when I try to extract areas and routes, I get foreign key constraint failures when trying to insert routes, because the area that the route belongs to (parent_id) doesn't exist. Currently, to work around this, I've been running my crawler twice: once to extract area data, and once to extract route data. If I do that, everything goes smoothly.
My best guess as to why this doesn't work currently is that the areas that are being inserted haven't been committed and so when I attempt to add a route that belongs to an uncommitted area, it can't find the parent area. This theory quickly falls apart though because I'm able to insert condition data in the same run that I insert the area that the data belongs to.
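For what it's worth, a quick way to test that theory from the same connection would be a check like this right before inserting a route (the attribute name is my guess, not necessarily what the repo uses; the area table is from the question):
def parent_area_exists(self, area_id):
    # Hypothetical helper: is the parent area row visible on this connection yet?
    self.cursor.execute("SELECT 1 FROM area WHERE area_id = %s", (area_id,))
    return self.cursor.fetchone() is not None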
My insertion code looks like this:
def insert_item(self, table_name, item):
    encoded_vals = [self.sql_encode(val) for val in item.values()]
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table_name,
        ", ".join(item.keys()),
        ", ".join(encoded_vals)
    )
    logging.debug(sql)
    self.cursor.execute(sql)

# EDIT: As suggested by @tadman I have moved to using the built-in SQL value
# encoding. I'm leaving this here because it doesn't affect the issue
def sql_encode(self, value):
    """Encode provided value and return a valid SQL value

    Arguments:
        value {Any} -- Value to encode

    Returns:
        str -- SQL encoded value as a str
    """
    encoded_val = None
    is_empty = False
    if isinstance(value, str):
        is_empty = len(value) == 0
    encoded_val = "NULL" if is_empty or value is None else value
    if isinstance(encoded_val, str) and encoded_val != "NULL":
        encoded_val = encoded_val.replace("\"", "\\\"")
        encoded_val = "\"%s\"" % encoded_val
    return str(encoded_val)
The rest of the project lives in a GitHub repo if any more code/context is needed
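For comparison, a minimal sketch of the parameterized form the EDIT above refers to (the driver escapes the values; only the table and column names are formatted into the SQL string, and self.cursor is assumed to be a pymysql/mysqlclient cursor):
import logging

def insert_item(self, table_name, item):
    columns = ", ".join(item.keys())
    placeholders = ", ".join(["%s"] * len(item))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table_name, columns, placeholders)
    logging.debug(sql)
    # Pass the values separately so the driver handles quoting/escaping.
    self.cursor.execute(sql, list(item.values()))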

PostgreSQL (AWS Redshift) error 1204: String length exceeds DDL length

I am trying to import a CSV into AWS Redshift (PostgreSQL 8.x).
The data flow is:
mysql -> parquet files on s3 -> csv files on s3 -> redshift.
Table structure
The MySQL table SQL:
create table orderitems
(
    id char(36) collate utf8_bin not null primary key,
    store_id char(36) collate utf8_bin not null,
    ref_type int not null,
    ref_id char(36) collate utf8_bin not null,
    store_product_id char(36) collate utf8_bin not null,
    product_id char(36) collate utf8_bin not null,
    product_name varchar(50) null,
    main_image varchar(200) null,
    price int not null,
    count int not null,
    logistics_type int not null,
    time_create bigint not null,
    time_update bigint not null,
    ...
);
I used the same SQL to create the table in Redshift, but I got an error while importing the CSV.
My code to import the CSV into Redshift (Python):
import pandas as pd
import psycopg2
import smart_open

# parquet is dumped by sqoop
p2 = 'xxx'
df = pd.read_parquet(path)
with smart_open.smart_open(p2, 'w') as f:
    df.to_csv(f, index=False)  # python3 default encoding is utf-8

conn = psycopg2.connect(CONN_STRING)
sql = """COPY %s FROM '%s' credentials 'aws_iam_role=%s' region 'cn-north-1'
delimiter ',' FORMAT AS CSV IGNOREHEADER 1 ; commit ;""" % (to_table, p2, AWS_IAM_ROLE)
print(sql)
cur = conn.cursor()
cur.execute(sql)
conn.close()
Got error:
Checking STL_LOAD_ERRORS, I found the error is on the product_name column:
row_field_value : .............................................215g/...
err_code: 1204
err_reason: String length exceeds DDL length
The real_value is 伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶 (Chinese).
So it looks like some encoding problem. Since MySQL is utf-8 and the CSV is utf-8 too, I don't know what is wrong.
Your column is a varchar data type, with length 50. That's 50 bytes, not 50 characters. The example string you've given is about 16 Chinese characters, which are 3 bytes each in UTF-8, plus five ASCII characters (one byte each), so roughly 53 bytes. That's longer than the byte length of the column, so the import fails.
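You can check the byte count directly in Python 3 (using the failing value from STL_LOAD_ERRORS purely as an illustration):
# Character count vs. UTF-8 byte count for the failing value.
s = "伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶"
print(len(s))                  # 21 characters
print(len(s.encode("utf-8")))  # 53 bytes, which exceeds varchar(50) in Redshift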

sqlalchemy: Insert html table into mysql db

I'm new to Python (3) and would like to know the following:
I'm trying to collect data via pandas from a website and would like to store the results in a MySQL database, like this:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql://python:"+'pw'+"@localhost/test?charset=utf8")
url = r'http://www.boerse-frankfurt.de/devisen'
dfs = pd.read_html(url,header=0,index_col=0,encoding="UTF-8")
devisen = dfs[9] #Select the right table
devisen.to_sql(name='table_fx', con=engine, if_exists='append', index=False)
I'm receiving the following error:
....
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column '\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tBezeichnung\n\t\t\t\t\t\t\t\n\t\t\t\t' in 'field list'") [SQL: 'INSERT INTO tbl_fx (\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tBezeichnung\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tzum Vortag\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tLetzter Stand\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tTageshoch\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tTagestief\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t52-Wochenhoch\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t52-Wochentief\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tDatum\n\t\t\t\t\t\t\t\n\t\t\t\t, \nAktionen\t\t\t\t) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)'] [parameters: (('VAE Dirham', '-0,5421%', 45321.0, 45512.0, 45306.0, 46080.0, 38550.0, '20.02.2018 14:29:00', None), ('Armenischer Dram', '-0,0403%', 5965339.0, 5970149.0, 5961011.0, 6043443.0, 5108265.0, '20.02.2018 01:12:00', None), ....
How can SQLAlchemy INSERT the respective data into table_fx? The problem is the header with the multiple \n and \t characters.
The MySQL table has the following structure:
(
    name varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
    bezeichnung varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
    diff_vortag varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
    last double DEFAULT NULL,
    day_high double DEFAULT NULL,
    day_low double DEFAULT NULL,
    52_week_high double DEFAULT NULL,
    52_week_low double DEFAULT NULL,
    date_time varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
    unnamed varchar(200) COLLATE utf8_unicode_ci DEFAULT NULL
)
Any help is highly welcome.
Thank you very much in advance
Andreas
This should do it. If you convert to a DataFrame you can rename the columns first. The dfs object you were creating is actually a list of DataFrames.
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql://python:"+'pw'+"@localhost/test?charset=utf8")
url = r'http://www.boerse-frankfurt.de/devisen'
dfs = pd.read_html(url,header=0,index_col=0,encoding="UTF-8")
devisen = dfs[9].dropna(axis=0, thresh=4) # Select right table and make a DF
devisen.columns = devisen.columns.str.strip() # Strip extraneous characters
devisen.to_sql(name='table_fx', con=engine, if_exists='append', index=False)
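To see what the .str.strip() call is doing, here's a tiny illustration using one of the scraped headers from the error message (pandas only, no database needed):
import pandas as pd

# One of the raw column names pandas extracted from the HTML table.
raw = "\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tBezeichnung\n\t\t\t\t\t\t\t\n\t\t\t\t"
print(pd.Index([raw]).str.strip())  # Index(['Bezeichnung'], dtype='object')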

How to return the last primary key after INSERT in pymysql (python3.5)?

CREATE TABLE `users` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `email` varchar(255) COLLATE utf8_bin NOT NULL,
    `password` varchar(255) COLLATE utf8_bin NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1;
For example, how do I get the primary key id of the last record that I insert into the table with cursor.execute("insert into ...", ...)?
After inserting, you can get it as:
cursor.execute('select LAST_INSERT_ID()') or use cursor.lastrowid
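A short, self-contained example of both approaches (connection parameters are placeholders; the users table is the one from the question):
import pymysql

conn = pymysql.connect(host="localhost", user="user", password="pw", database="test")
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO users (email, password) VALUES (%s, %s)",
        ("a@example.com", "secret"),
    )
    print(cur.lastrowid)             # id generated by this INSERT
    cur.execute("SELECT LAST_INSERT_ID()")
    print(cur.fetchone()[0])         # same value, fetched back from MySQL
conn.commit()  # pymysql does not autocommit by default
conn.close()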
