sqlalchemy: Insert html table into mysql db - python

I'm new to Python (3) and would like to know the following:
I'm trying to collect data from a website via pandas and would like to store the results in a mysql database like this:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql://python:"+'pw'+"@localhost/test?charset=utf8")
url = r'http://www.boerse-frankfurt.de/devisen'
dfs = pd.read_html(url,header=0,index_col=0,encoding="UTF-8")
devisen = dfs[9] #Select the right table
devisen.to_sql(name='table_fx', con=engine, if_exists='append', index=False)
I'm receiving the following error:
....
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column '\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tBezeichnung\n\t\t\t\t\t\t\t\n\t\t\t\t' in 'field list'") [SQL: 'INSERT INTO tbl_fx (\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tBezeichnung\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tzum Vortag\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tLetzter Stand\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tTageshoch\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tTagestief\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t52-Wochenhoch\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t52-Wochentief\n\t\t\t\t\t\t\t\n\t\t\t\t, \n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\tDatum\n\t\t\t\t\t\t\t\n\t\t\t\t, \nAktionen\t\t\t\t) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)'] [parameters: (('VAE Dirham', '-0,5421%', 45321.0, 45512.0, 45306.0, 46080.0, 38550.0, '20.02.2018 14:29:00', None), ('Armenischer Dram', '-0,0403%', 5965339.0, 5970149.0, 5961011.0, 6043443.0, 5108265.0, '20.02.2018 01:12:00', None), ....
How can sqlalchemy INSERT the respective data into table_fx? The problem is the header names with the multiple \n and \t.
The mysql table has the following structure:
(
name varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
bezeichnung varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
diff_vortag varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
last double DEFAULT NULL,
day_high double DEFAULT NULL,
day_low double DEFAULT NULL,
52_week_high double DEFAULT NULL,
52_week_low double DEFAULT NULL,
date_time varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
unnamed varchar(200) COLLATE utf8_unicode_ci DEFAULT NULL
)
Any help is highly welcome.
Thank you very much in advance
Andreas

This should do it. Once you have the dataframe you can clean up its column names first. The "dfs" object you were creating is actually a list of dataframes, one per table that read_html finds.
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql://python:"+'pw'+"@localhost/test?charset=utf8")
url = r'http://www.boerse-frankfurt.de/devisen'
dfs = pd.read_html(url,header=0,index_col=0,encoding="UTF-8")
devisen = dfs[9].dropna(axis=0, thresh=4) # Select right table and make a DF
devisen.columns = devisen.columns.str.strip() # Strip extraneous characters
devisen.to_sql(name='table_fx', con=engine, if_exists='append', index=False)
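If the stripped headers still don't line up with the MySQL column names (the table uses bezeichnung, diff_vortag, ... while the page headers are the German labels visible in the error message), a rename before to_sql may also be needed. The mapping below is only an assumption based on the column order shown in the question, so adjust it to the headers pandas actually returns:
column_map = {  # hypothetical header -> MySQL column mapping
    'Bezeichnung': 'bezeichnung',
    'zum Vortag': 'diff_vortag',
    'Letzter Stand': 'last',
    'Tageshoch': 'day_high',
    'Tagestief': 'day_low',
    '52-Wochenhoch': '52_week_high',
    '52-Wochentief': '52_week_low',
    'Datum': 'date_time',
    'Aktionen': 'unnamed',
}
devisen = devisen.rename(columns=column_map)
devisen.to_sql(name='table_fx', con=engine, if_exists='append', index=False)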

Related

PyMySql Column Truncated and Duplicate Index Error

Here is my table creation code:
CREATE TABLE `crypto_historical_price2` (
`Ticker` varchar(255) COLLATE latin1_bin NOT NULL,
`Timestamp` varchar(255) COLLATE latin1_bin NOT NULL,
`PerpetualPrice` double DEFAULT NULL,
`SpotPrice` double DEFAULT NULL,
`Source` varchar(255) COLLATE latin1_bin NOT NULL,
PRIMARY KEY (`Ticker`,`Timestamp`,`Source`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin
I'm updating stuff in batch with sql statements like the following
sql = "INSERT INTO crypto."+TABLE+"(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s;" % batchdata
where batchdata is just a string of data like "('SOL', '2022-11-03 02:01:00', '31.2725', '31.2875', 'FTX'),('SOL', '2022-11-03 02:02:00', '31.3075', '31.305', 'FTX')".
Now my script runs for a bit of time successfully inserting data in to the table but then it barfs with the following errors:
error 1265 data truncated for column PerpetualPrice
and
Duplicate entry 'SOL-2022-11-02 11:00:00-FTX' for key 'primary'
I've tried to solve the second error with
sql = "INSERT INTO crypto.crypto_historical_price2(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s ON DUPLICATE KEY UPDATE Ticker = VALUES(Ticker), Timestamp = VALUES(Timestamp), PerpetualPrice = VALUES(PerpetualPrice), SpotPrice = VALUES(SpotPrice), Source = VALUES(Source);" % batchdata
and
sql = "INSERT INTO crypto.crypto_historical_price2(Ticker,Timestamp,PerpetualPrice,SpotPrice,Source) VALUES %s ON DUPLICATE KEY UPDATE Ticker = VALUES(Ticker),Timestamp = VALUES(Timestamp),Source = VALUES(Source);" % batchdata
The above two attempted remedies run and don't throw a duplicate entry error, but they don't update the table at all.
If I pause my script for a couple of minutes and re-run, the duplicate error goes away and it updates, which confuses me even more lol.
Any ideas?
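As an aside, here is a minimal sketch of the same batch upsert written with parameterized queries instead of % string interpolation, assuming a PyMySQL-style cursor and connection named cursor and connection (table and column names are taken from the question; nothing here is from a posted answer). With most MySQL drivers an explicit commit() is needed before the changes become visible, unless autocommit is enabled:
rows = [
    ('SOL', '2022-11-03 02:01:00', 31.2725, 31.2875, 'FTX'),
    ('SOL', '2022-11-03 02:02:00', 31.3075, 31.305, 'FTX'),
]
sql = ("INSERT INTO crypto.crypto_historical_price2 "
       "(Ticker, Timestamp, PerpetualPrice, SpotPrice, Source) "
       "VALUES (%s, %s, %s, %s, %s) "
       "ON DUPLICATE KEY UPDATE "
       "PerpetualPrice = VALUES(PerpetualPrice), SpotPrice = VALUES(SpotPrice)")
cursor.executemany(sql, rows)  # the driver escapes each value; no manual string building
connection.commit()            # needed unless autocommit is enabled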

mysql.connector.errors.DataError: 1406 (22001): Data too long for column

I have a problem storing a PDF file made with the ReportLab library in a MySQL db. Here's my code:
import mysql.connector

def insertIntoDb(pdfFullPath, name, surname, gravity):
    print('PRIMA DEL MYSQL')
    print('pdf full path' + pdfFullPath)
    mydb = mysql.connector.connect(host="localhost", user="root", passwd="", database="deepface")
    with open(pdfFullPath, 'rb') as pdfvar:
        blob = pdfvar.read()
    print(blob)
    sqlQuery = "INSERT INTO diagnosi(name,surname,pdf,gravity) VALUES (%s,%s,%s,%s)"
    mycursor = mydb.cursor()
    val = (name, surname, blob, gravity,)
    mycursor.execute(sqlQuery, val)
    mydb.commit()
    mycursor.close()
    mydb.close()
The console says:
mysql.connector.errors.DataError: 1406 (22001): Data too long for column 'pdf' at row 1
I have already set max_allowed_packet in the mysql configuration file, but the problem is that when I try to print the PDF (I know that I can't) I get this:
^\\7:[,1qq,N_Sd$dm-:XU2/Pga=O1f/`hY7X1nrca).:_\'-4,n*"L5r,CHFpGo:"E,MDLu7EW%CFF0$Rl?jT\'6%k%,?AF%UK6ojt/c$<^Xh=;VarY:L8cQYTgj/:CfA/j1=dbU#a<:%D;rDV[)WDu)5*"98A5kkfYAqs0FFVZk[*Mb(Rs?hIk
And another thing: how can I store it in my db?
I tried to decode it with base64, but it doesn't work.
> SHOW CREATE TABLE diagnosi;
diagnosi | CREATE TABLE `diagnosi` (
`tempId` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
`surname` varchar(30) NOT NULL,
`pdf` blob NOT NULL,
`gravity` varchar(50) NOT NULL,
PRIMARY KEY (`tempId`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
The MySQL BLOB datatype is limited to 2^16 bytes (64 KB) in size. The LONGBLOB datatype can hold up to 2^32 bytes (4 GB), so change the column type from BLOB to LONGBLOB.
See the storage requirements for string types in the docs.
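A minimal sketch of that change, reusing the connection details from the question (run once against the deepface database):
import mysql.connector

mydb = mysql.connector.connect(host="localhost", user="root", passwd="", database="deepface")
mycursor = mydb.cursor()
# LONGBLOB raises the per-value limit from 64 KB to 4 GB
mycursor.execute("ALTER TABLE diagnosi MODIFY pdf LONGBLOB NOT NULL")
mycursor.close()
mydb.close()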

postgresql(aws redshift) error 1204 String length exceeds DDL length

I am trying to import a csv into AWS Redshift (postgresql 8.x).
The data flow is:
mysql -> parquet files on s3 -> csv files on s3 -> redshift.
Table structure
The mysql table sql:
create table orderitems
(
id char(36) collate utf8_bin not null
primary key,
store_id char(36) collate utf8_bin not null,
ref_type int not null,
ref_id char(36) collate utf8_bin not null,
store_product_id char(36) collate utf8_bin not null,
product_id char(36) collate utf8_bin not null,
product_name varchar(50) null,
main_image varchar(200) null,
price int not null,
count int not null,
logistics_type int not null,
time_create bigint not null,
time_update bigint not null,
...
);
I used the same sql to create the table in redshift, but I got an error while importing the csv.
My code to import the csv into redshift (python):
# parquet is dumped by sqoop
p2 = 'xxx'
df = pd.read_parquet(path)
with smart_open.smart_open(p2, 'w') as f:
    df.to_csv(f, index=False)  # python3 default encoding is utf-8
conn = psycopg2.connect(CONN_STRING)
sql = """COPY %s FROM '%s' credentials 'aws_iam_role=%s' region 'cn-north-1'
delimiter ',' FORMAT AS CSV IGNOREHEADER 1 ; commit ;""" % (to_table, p2, AWS_IAM_ROLE)
print(sql)
cur = conn.cursor()
cur.execute(sql)
conn.close()
I got an error.
By checking STL_LOAD_ERRORS I found the error is on the product_name column:
row_field_value : .............................................215g/...
err_code: 1204
err_reason: String length exceeds DDL length
The real value is 伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶 (Chinese).
So it looks like some encoding problem. Since mysql is utf-8 and the csv is utf-8 too, I don't know what is wrong.
Your column is a varchar data type, with length 50. In Redshift that's 50 bytes, not 50 characters. The string example you've given looks to be about 16 Chinese characters, which are probably 3 bytes each in UTF-8, plus four ASCII characters (one byte each), so about 52 bytes. That's longer than the byte length of the column, so the import fails.
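To see the difference concretely, here is a small check of the failing value (the byte count assumes UTF-8, which is how the csv is written):
value = "伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶"  # value from STL_LOAD_ERRORS above
print(len(value))                   # 21 characters
print(len(value.encode("utf-8")))   # 53 bytes -- more than varchar(50) allows in Redshift
So the Redshift DDL needs to allow more bytes than the MySQL varchar(50) suggests, for example by declaring product_name with a larger length.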

Mysql Python: Not all parameters were used in MySQL statement

So I am trying to insert data into my table as variables, but I keep getting the error mysql.connector.errors.ProgrammingError: Not all parameters were used in the SQL statement. I have used %s instead of the row names, but I keep getting the same error. I suspect it is my syntax, but I cannot figure it out. Btw, this is my first time using python and MySQL together.
import mysql.connector
Sensor_ID = "test"
Location = "room"
Sensor_IP = "192.168.1.1"
Sensor_1 = "10"
Sensor_1_Unit = "*C"
Sensor_2 =""
Sensor_2_Unit = ""
Sensor_3 = ""
Sensor_3_Unit = ""
conn = mysql.connector.connect(user='******', password='********', host='******', database='*****') #blanked my user n pass
mycursor = conn.cursor()
mycursor.execute('SHOW TABLES')
print(mycursor.fetchall())
print ""
mycursor.execute("SHOW VARIABLES LIKE '%version%'")
print "Version:",(mycursor.fetchall())
#works up to here
mycursor.execute("INSERT INTO iot_sensors VALUES (ID, Sensor_ID, Location, Sensor_IP, Sensor_1, Sensor_1_Unit, Sensor_2,Sensor_2_Unit, Sensor_3, Sensor_3_Unit)",(Sensor_ID,Location,Sensor_IP,Sensor_1,Sensor_1_Unit,Sensor_2,Sensor_2_Unit,Sens or_3,Sensor_3_Unit))
conn.commit()
# Sensor_ID,Location,Sensor_IP,Sensor_1,Sensor_1_Unit,Sensor_2,Sensor_2_Unit,Sensor_3,Sensor_3_Unit
CREATE TABLE IoT_Sensors(
ID INT NOT NULL AUTO_INCREMENT,
Sensor_ID VARCHAR (15) NOT NULL,
Location VARCHAR (20) NOT NULL,
Sensor_IP VARCHAR (15) NOT NULL,
Sensor_1 VARCHAR (15) NOT NULL,
Sensor_1_Unit VARCHAR (15) NOT NULL,
Sensor_2 VARCHAR (15),
Sensor_2_Unit VARCHAR (15),
Sensor_3 VARCHAR (15),
Sensor_3_Unit VARCHAR (15),
Time_Stamp TIMESTAMP NOT NULL,
PRIMARY KEY (ID));
You need to put %s into your SQL statement.
mycursor.execute("INSERT INTO iot_sensors VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",(Sensor_ID,Location,Sensor_IP,Sensor_1,Sensor_1_Unit,Sensor_2,Sensor_2_Unit,Sens or_3,Sensor_3_Unit))
See the example in the docs.
Looks like you are missing formatting your actual variables into the insert statement. Try formatting them using one of the known methods, %s or the .format method. You are also not using the timestamp (last column) value of your table when inserting; if you just reference the table without an explicit column list you will get a column mismatch, so you have to be explicit about which columns you are populating. You could use CURRENT_TIMESTAMP for that column, since its description says NOT NULL.
query = ("""Insert into iot_sensors (Sensor_ID, Location, Sensor_IP, Sensor_1, Sensor_1_Unit, Sensor_2, Sensor_2_Unit, Sensor_3, Sensor_3_Unit) values
('{0}','{1}','{2}','{3}','{4}','{5}','{6}','{7}','{8}');""".format(Sensor_ID,Location,Sensor_IP,Sensor_1,Sensor_1_Unit,Sensor_2,Sensor_2_Unit,Sensor_3,Sensor_3_Unit))
mycursor.execute(query)
or
query = ("""Insert into iot_sensors (Sensor_ID, Location, Sensor_IP, Sensor_1, Sensor_1_Unit, Sensor_2, Sensor_2_Unit, Sensor_3, Sensor_3_Unit) values
(%s,%s,%s,%s,%s,%s,%s,%s,%s);""" % (Sensor_ID,Location,Sensor_IP,Sensor_1,Sensor_1_Unit,Sensor_2,Sensor_2_Unit,Sensor_3,Sensor_3_Unit))
mycursor.execute(query)
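For completeness, a small sketch that combines both answers: an explicit column list plus %s placeholders, so the connector does the quoting (variable names are the ones from the question; ID is left out for AUTO_INCREMENT, and whether Time_Stamp fills itself in depends on the server's TIMESTAMP defaults, which is an assumption here):
query = ("INSERT INTO iot_sensors "
         "(Sensor_ID, Location, Sensor_IP, Sensor_1, Sensor_1_Unit, "
         "Sensor_2, Sensor_2_Unit, Sensor_3, Sensor_3_Unit) "
         "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)")
values = (Sensor_ID, Location, Sensor_IP, Sensor_1, Sensor_1_Unit,
          Sensor_2, Sensor_2_Unit, Sensor_3, Sensor_3_Unit)
mycursor.execute(query, values)  # the connector quotes/escapes each value
conn.commit()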

pymysql not inserting data; but "autoincrement" increases

This is a follow-up from https://stackoverflow.com/questions/33336963/use-a-python-dictionary-to-insert-into-mysql/33337128#33337128.
import pymysql
conn = pymysql.connect(server, user , password, "db")
cur = conn.cursor()
ORFs={'E7': '562', 'E6': '83', 'E1': '865', 'E2': '2756 '}
table="genome"
cols = ORFs.keys()
vals = ORFs.values()
sql = "INSERT INTO %s (%s) VALUES(%s)" % (
table, ",".join(cols), ",".join(vals))
print sql
print ORFs.values()
cur.execute(sql)
cur.close()
conn.close()
Thanks to Xiaohen, my program works (i.e. it does not throw any errors), but when I go and check the mysql database, the data is not inserted. I noticed that the autoincrement ID column does increase with every failed attempt. So this suggests that I am at least making contact with the database?
As always, any help is much appreciated
EDIT: I included the output from mysql> show create table genome;
| genome | CREATE TABLE `genome` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`state` char(255) DEFAULT NULL,
`CG` text,
`E1` char(25) DEFAULT NULL,
`E2` char(25) DEFAULT NULL,
`E6` char(25) DEFAULT NULL,
`E7` char(25) DEFAULT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=latin1 |
1 row in set (0.00 sec)
Think I figured it out.
I will add the info here in case someone else comes across this question:
I need to add conn.commit() to the script
You can use
try:
    cur.execute(sql)
except Exception, e:
    print e
If your code is wrong, the exception will tell you.
There is also another problem: the cols and vals do not match up. The values should be
vals = [ORFs[col] for col in cols]
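Putting the two findings together, here is a minimal sketch of the dictionary insert with placeholders and an explicit commit, assuming the same genome table, ORFs dict, and server/user/password variables as in the question (column names still have to be interpolated, since placeholders only work for values):
import pymysql

conn = pymysql.connect(host=server, user=user, password=password, database="db")
cur = conn.cursor()

ORFs = {'E7': '562', 'E6': '83', 'E1': '865', 'E2': '2756 '}
cols = list(ORFs.keys())
placeholders = ", ".join(["%s"] * len(cols))
sql = "INSERT INTO genome (%s) VALUES (%s)" % (", ".join(cols), placeholders)
cur.execute(sql, [ORFs[col] for col in cols])  # values passed separately, quoted by the driver
conn.commit()  # without this the INSERT is discarded when the connection closes
cur.close()
conn.close()
This also fits the observation about the ID column: the row is inserted and then rolled back, but InnoDB does not reuse the AUTO_INCREMENT values it handed out, so the counter keeps climbing.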
