I have a SQLite database created with SQLAlchemy which has the format below
Code Names Amt1 Amt2 Amt3 Amt4
502 Real Management 11.4 1.3 - -
5TG 80Commercial 85.8 152 4.845 4.12%
AZG Equipment Love 11.6 117.1 - -
But when I tried to read this into a pandas dataframe by using
pandas.read_sql('sqlite_table', con=engine)
It returns the error ValueError: could not convert string to float: '-'
I understand that pandas can't convert - to a number, but how can I work around this? Is it possible to read - as 0 or something similar?
Updating all rows where Amt3 contains - (assuming you have set up your connection and defined a cursor) would look something like this:
cur.execute("UPDATE sqlite_table SET Amt3 = 0 WHERE Amt3 = '-'")
This seems to work fine for me, even with the - values. What is the declared type of your Atm3 column?
import pandas as pd
import sqlite3

con = sqlite3.connect(r"/Users/hugohonorem/Desktop/mytable.db")  # create/connect to the database
c = con.cursor()  # set up a cursor
c.execute('''CREATE TABLE mytable
             (Code text, Names text, Atm1 integer, Atm2 integer, Atm3 integer, Atm4 integer)''')
c.execute("INSERT INTO mytable VALUES ('502', 'Real Management', '11.4', '1.3', '-', '-')")
con.commit()
Now that we have replicated your table, we can remove the -:
c.execute("UPDATE mytable SET Atm3 = NULL WHERE Atm3 = '-'") #set it to null
df = pd.read_sql("SELECT * from mytable", con)
print(df)
This will give us the output:
Code Names Atm1 Atm2 Atm3 Atm4
502 Real Management 11.4 1.3 None -
However, as you can see, the table can now be read even though Atm4 still contains -.
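A pandas-side alternative (not part of the original answer, just a minimal sketch of what the asker wanted, reading - as 0) is to leave the table untouched and coerce the amount columns after reading:

import pandas as pd
import sqlite3

con = sqlite3.connect("mytable.db")  # path is an assumption
df = pd.read_sql("SELECT * FROM mytable", con)

# Anything non-numeric (such as '-' or '4.12%') becomes NaN and is then filled
# with 0; strip characters like '%' first if those values should be kept.
for col in ["Atm1", "Atm2", "Atm3", "Atm4"]:
    df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)

print(df)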
I know this question is old, but I ran into the same problem recently and this was the first result on Google. What solved my problem was simply to change pandas.read_sql('sqlite_table', con=engine) to the full query pandas.read_sql('SELECT * FROM sqlite_table', con=engine), as that seems to bypass the conversion attempt, and no edits to the table were needed.
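This is presumably because pd.read_sql dispatches on its first argument: a bare table name goes through read_sql_table, which coerces each column to its declared SQL type, while a full query goes through read_sql_query, which keeps whatever the driver returns. A minimal sketch (the engine URL is an assumption):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///mytable.db")  # hypothetical path

# Table name -> read_sql_table: columns are coerced to their declared types,
# which is where the '-' values raise ValueError.
# df = pd.read_sql('sqlite_table', con=engine)

# Full query -> read_sql_query: values are left as the driver returns them,
# so '-' simply comes back as a string.
df = pd.read_sql('SELECT * FROM sqlite_table', con=engine)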
I'm executing a SQL command which returns a CREATE TABLE statement generated by my database, PostgreSQL.
To execute the SQL command I use:
import io
import json
import pandas as pd
import pandas.io.sql as psql
import psycopg2 as pg
import boto3
from datetime import datetime
conn = pg.connect(
    host=pgparams['url'],
    dbname=pgparams['db'],
    user=pgparams['usr'],
    password=pgparams['pwd'])
createTable_sql = "postgresql select which returns the create table statement"
df_create_table_script = pd.read_sql_query(createTable_sql, con=conn)
My goal is to create a table. The table creation script is returned by PostgreSQL after executing the pd.read_sql_query command via pandas/Python.
If I execute createTable_sql in a PostgreSQL client (e.g. pgAdmin), it works fine: the result is a single column containing the expected CREATE TABLE statement, a plain string 512 characters long.
The content of the createTable_sql variable is:
createTable_sql= "SELECT cast ('CREATE TABLE dbo.table1 (" ...
createTable_sql= createTable_sql + "|| string_agg(pa.attname || ' ' || pg_catalog.format_type(pa.atttypid, pa.atttypmod)|| coalesce(' DEFAULT ' || (select pg_catalog.pg_get_expr(d.adbin, d.adrelid) from pg_catalog.pg_attrdef d where d.adrelid = pa.attrelid and d.adnum = pa.attnum and pa.atthasdef), '') || ' ' || case pa.attnotnull when true then 'NOT NULL' else 'NULL' end, ',')"
createTable_sql= createTable_sql + " as column_from_script from pg_catalog.pg_attribute pa join pg_catalog.pg_class pc on pc.oid = pa.attrelid and pc.relname = 'tabl1_source' join pg_catalog.pg_namespace pn on pn.oid = pc.relnamespaceand pn.nspname = 'dbo' where pa.attnum > 0 and not pa.attisdropped group by pn.nspname, pc.relname, pa.attrelid;"
The result of executing this SQL command should be:
CREATE TABLE dbo.table1 (col1 datatype, col2 datatype, ...etc), with the total number of characters in the script being 512.
My issue is that pandas' read_sql_query (or read_sql) seems to have a limitation (or at least that is what I think) on the returned data set.
I was expecting the returned data set to have 512 characters, but the read_sql method is truncating it.
The result I get when I try to access the value returned by PostgreSQL (the db engine) is:
' CREATE TABLE dbo.tabl1 (col1...'
So instead of the full text of the table creation script, I only get something that is truncated after the first few characters.
Initially I assumed it was only truncated when I used the print() function to display the result, but the value itself appears truncated as well.
I even tried another approach, such as:
conn = pg.connect(
    host=pgparams['url'],
    dbname=pgparams['db'],
    user=pgparams['usr'],
    password=pgparams['pwd'])
sql = createTable_sql
copy_func_csv = "COPY ({sql_cmd}) TO STDOUT WITH CSV {head}".format(sql_cmd=sql, head="HEADER")
cur = conn.cursor()
store = io.StringIO()
cur.copy_expert(copy_func_csv, store)
store.seek(0)
df_new = pd.read_csv(store, engine='python', true_values=[True, 't'], false_values=[False, 'f'])
table_script = df_new.column_from_script.to_string(header=False, index=False)
But the table_script content was still truncated, looking like:
' CREATE TABLE dbo.tabl1 (col1...'
Is there any way I can retrieve a result set, meaning a single column (e.g. Col1), that can hold a long value such as Varchar(1000) or a plain Python string?
Regards,
If running the same query from pgAdmin works fine, perhaps try running the query directly with psycopg2. Try the following code chunk in place of pandas:
cur = conn.cursor()
cur.execute(createTable_sql)
result = cur.fetchall()
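As a follow-up check (an assumption on my part, not something confirmed in the thread): if the value fetched this way has the full 512 characters, the earlier truncation may only have been pandas' display behaviour, since string columns are cut at display.max_colwidth (50 characters by default) when a frame or series is printed or rendered with to_string:

# Length of the value fetched directly through psycopg2.
full_script = result[0][0]
print(len(full_script))  # 512 expected if nothing was lost

# Accessing the cell directly bypasses display truncation entirely.
print(df_create_table_script.iloc[0, 0])

# Or widen the display cap so printing the frame shows the full string.
import pandas as pd
pd.set_option('display.max_colwidth', None)  # use -1 on older pandas versions
print(df_create_table_script)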
How can I transfer data from one MySQL database to another? The other database may have different field names, except id, which will act as the primary key.
I have tried using SQLAlchemy, but the only data that gets mapped is for the field names that are the same in both databases.
import pandas as pd
import sqlalchemy
db1 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/mydatabase1")
db2 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/nava")
print('Writing...')
query = ''' (SELECT * FROM customers1)'''
df = pd.read_sql(query, db1)
print(df)
#query1 = ''UPDATE 'leap' SET `leap`value '''
df.to_sql('nap', db2, index=False, if_exists='append')
I get an error that the other database doesn't have the same field names, but what I want is that even if the field names change, the data still gets mapped with reference to the primary key id.
This is the program that I asked about in the question above, but there was a formatting error so the code didn't appear correctly.
import pandas as pd
import sqlalchemy
db1 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/mydatabase1")
db2 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/nava")
print('Writing...')
query = ''' (SELECT * FROM customers1)'''
df = pd.read_sql(query, db1)
df.to_sql('nap', db2, index=False, if_exists='append')
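One way to handle the differing column names, sketched here with hypothetical mappings since the target schema is not shown, is to rename the dataframe columns to match the target table before calling to_sql, keeping id as the common key:

import pandas as pd
import sqlalchemy

db1 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/mydatabase1")
db2 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/nava")

df = pd.read_sql("SELECT * FROM customers1", db1)

# Hypothetical mapping: column name in customers1 -> column name in nap.
column_map = {"cust_name": "name", "cust_phone": "phone"}
df = df.rename(columns=column_map)

# Keep id plus the renamed columns so the frame matches the target table, then append.
df = df[["id"] + list(column_map.values())]
df.to_sql("nap", db2, index=False, if_exists="append")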
Experts,
I am struggling to find an efficient way to work with pandas and sqlite.
I am building a tool that lets users
extract part of a SQL database (sub_table) based on some filters
change part of sub_table
upload the changed sub_table back to the overall SQL table, replacing the old values
Users will only see Excel data (so I need to write back and forth to Excel, which is not part of my example as it is out of scope).
Users can
replace existing rows (entries) with new data
delete existing rows
add new rows
Question: how can I most efficiently do this "replace/delete/add" using Pandas / sqlite3?
Here is my example code. If I use df_sub.to_sql("MyTable", con = conn, index = False, if_exists="replace") at the bottom, then obviously the entire table is replaced... so there must be another way that I cannot think of.
import pandas as pd
import sqlite3
import numpy as np
#### SETTING EXAMPLE UP
### Create DataFrame
data = dict({"City": ["London","Frankfurt","Berlin","Paris","Brondby"],
"Population":[8,2,4,9,0.5]})
df = pd.DataFrame(data,index = pd.Index(np.arange(5)))
### Create SQL DataBase
conn = sqlite3.connect("MyDB.db")
### Upload DataFrame as Table into SQL Database
df.to_sql("MyTable", con = conn, index = False, if_exists="replace")
### Read DataFrame from SQL DB
query = "SELECT * from MyTable"
pd.read_sql_query(query, con = conn)
#### CREATE SUB_TABLE AND AMEND
#### EXTRACT sub_table FROM SQL TABLE
query = "SELECT * from MyTable WHERE Population > 2"
df_sub = pd.read_sql_query(query, con = conn)
df_sub
#### Amend Sub DF
df_sub[df_sub["City"] == "London"] = ["Brussel",4]
df_sub
#### Replace new data in SQL DB
df_sub.to_sql("MyTable", con = conn, index = False, if_exists="replace")
query = "SELECT * from MyTable"
pd.read_sql_query(query, con = conn)
Thanks for your help!
Note: I did try to achieve this via pure SQL queries but gave up. As I am not an expert on SQL, I would rather go with pandas if a solution exists. If not, a hint on how to achieve this via SQL would be great!
I think there is no way around using SQL queries for this task.
With pandas it is only possible to read a query to a DataFrame and to write a DataFrame to a Database (replace or append).
If you want to update specific values/ rows or want to delete rows, you have to use SQL queries.
Commands you should look into are for example:
UPDATE, REPLACE, INSERT, DELETE
# Update the database, change City to 'Brussel' and Population to 4, for the first row
# (Attention! python indices start at 0, SQL indices at 1)
cur = conn.cursor()
cur.execute('UPDATE MyTable SET City=?, Population=? WHERE ROWID=?', ('Brussel', 4, 1))
conn.commit()
conn.close()
# Display the changes
conn = sqlite3.connect("MyDB.db")
query = "SELECT * from MyTable"
pd.read_sql_query(query, con=conn)
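For completeness, deleting and adding rows follows the same pattern; a minimal sketch against the same MyTable from the question (the values are just examples):

import sqlite3
import pandas as pd

conn = sqlite3.connect("MyDB.db")
cur = conn.cursor()

# Delete a row by a column value instead of rewriting the whole table.
cur.execute("DELETE FROM MyTable WHERE City = ?", ("Paris",))

# Add a new row.
cur.execute("INSERT INTO MyTable (City, Population) VALUES (?, ?)", ("Madrid", 3.2))

conn.commit()

# Check the result.
print(pd.read_sql_query("SELECT * from MyTable", con=conn))
conn.close()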
For more examples on sql and pandas you can look at
https://www.dataquest.io/blog/python-pandas-databases/
I am trying to take a dataframe and write it to SQL. I am creating the table first to set up a unique index, to allow for rolling updates without duplicates if there happen to be two A. Rods over time. However, I can't seem to shake this table-column error and I don't know why.
import pandas as pd
import sqlite3 as sq

conn = sq.connect('test.db')
c = conn.cursor()

def set_table():
    c.execute("""CREATE TABLE IF NOT EXISTS players(
                     "#" INTEGER,
                     " " REAL,
                     "Named" TEXT,
                     "B/T" TEXT,
                     "Ht" TEXT,
                     "Wt" TEXT,
                     "DOB" TEXT);""")
    conn.commit()

def set_index_table():
    c.execute("""CREATE UNIQUE INDEX index_unique
                 ON players (Named, DOB)""")
    conn.commit()

set_table()
set_index_table()

roster_active = pd.read_html('http://m.yankees.mlb.com/roster', index_col=0)
df = roster_active[0]
df = df.rename(columns={'Name': 'Named'})
df.to_sql('players', conn, if_exists='append')
conn.commit()
conn.close()
sqlite3.OperationalError: table players has no column named
Thank you for your time.
So I am not completely sure why this didn't work, but I found how to get it to work. I believe it had something to do with the dataframe index, so I defined which columns I wanted to select from the dataframe, and that worked.
df = df[['Named','B/T', 'Ht','Wt','DOB']]
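On a related note (an assumption on my part, not part of the fix above): to_sql writes the dataframe index as a column by default, and passing index=False is another knob for keeping the written columns aligned with the table definition:

# Keep only the columns that exist in the players table and skip the
# dataframe index when writing (to_sql writes it by default).
df = df[['Named', 'B/T', 'Ht', 'Wt', 'DOB']]
df.to_sql('players', conn, if_exists='append', index=False)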
Python Version - 2.7.6
Pandas Version - 0.17.1
MySQLdb Version - 1.2.5
In my database (PRODUCT), I have a table (XML_FEED). The table XML_FEED is huge (millions of records).
I have a pandas.DataFrame() (PROCESSED_DF). The dataframe has thousands of rows.
Now I need to run this
REPLACE INTO TABLE PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5),
VALUES (PROCESSED_DF.values)
Question:-
Is there a way to run REPLACE INTO TABLE in pandas? I already checked pandas.DataFrame.to_sql(), but that is not what I need. I would prefer not to read the XML_FEED table into pandas because it is very huge.
With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.
I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Insert

    @compiles(Insert)
    def replace_string(insert, compiler, **kw):
        s = compiler.visit_insert(insert, **kw)
        s = s.replace("INSERT INTO", "REPLACE INTO")
        return s

    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(table.table.insert(replace_string=""), data)
You would pass it like so:
df.to_sql('XML_FEED', con=engine, if_exists='append', method=mysql_replace_into)
Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert

    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(data)
    update_stmt = stmt.on_duplicate_key_update(**dict(zip(stmt.inserted.keys(),
                                                          stmt.inserted.values())))
    conn.execute(update_stmt)
Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.
As of this version (0.17.1) I am unable to find any direct way to do this in pandas, so I reported a feature request for it.
I did this in my project by executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append').
Suppose
1) product_id is my primary key in table PRODUCT
2) feed_id is my primary key in table XML_FEED.
SIMPLE VERSION
import MySQLdb
import sqlalchemy
import pandas
con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)  # you can use flavor='mysql' if you do not want to create a sqlalchemy engine, but it is deprecated
Please note:-
The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
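A minimal illustration of those semantics through the MySQLdb connection from the snippet above (feed_id is the primary key named earlier; col2 is a hypothetical column):

# Hypothetical illustration: if a row with the same PRIMARY KEY / UNIQUE value
# (feed_id here) already exists, it is deleted and the new row takes its place.
cur = con.cursor()
cur.execute("REPLACE INTO XML_FEED (feed_id, col2) VALUES (%s, %s)", (42, 'new value'))
con.commit()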
I needed a generic solution to this problem, so I built on shiva's answer--maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update/add some rows, and want to perform a REPLACE INTO statement with df.to_sql().
It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.
def to_sql_update(df, engine, schema, table):
    df.reset_index(inplace=True)
    sql = ''' SELECT column_name from information_schema.columns
              WHERE table_schema = '{schema}' AND table_name = '{table}' AND
                    COLUMN_KEY = 'PRI';
          '''.format(schema=schema, table=table)
    id_cols = [x[0] for x in engine.execute(sql).fetchall()]
    id_vals = [df[col_name].tolist() for col_name in id_cols]
    sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema=schema, table=table)
    for row in zip(*id_vals):
        sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
        sql += ' OR ({}) '.format(sql_row)
    engine.execute(sql)
    df.to_sql(table, engine, schema=schema, if_exists='append', index=False)
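A usage sketch with the names from the question (schema PRODUCT, table XML_FEED; engine is the SQLAlchemy engine from the earlier answer):

# PROCESSED_DF holds the updated/added rows; existing rows with the same
# primary key are deleted first, then everything is appended.
to_sql_update(PROCESSED_DF, engine, 'PRODUCT', 'XML_FEED')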
If you use to_sql you should be able to define it so that you replace values if they exist. For a table named 'mydb' and a dataframe named 'df', you'd use:
df.to_sql('mydb', con=engine, if_exists='replace')
That should replace values if they already exist, but I am not 100% sure if that's what you're looking for.
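For context, and not stated in this answer: pandas documents if_exists='replace' as dropping the table and recreating it before inserting, so it is a whole-table swap rather than the row-level REPLACE INTO the question asks about; a one-line reminder:

# if_exists='replace' drops and recreates the whole table, it does not upsert
# individual rows; for row-level behaviour see the method= callable or the
# DELETE-then-append approaches in the other answers.
df.to_sql('XML_FEED', con=engine, if_exists='replace', index=False)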