I am using pyodbc to connect to my SQL Server instance. I have a table from which I need to delete a column.
I can read this table; the code I used to read it is as follows:
import pandas as pd
import pyodbc
cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0}; Server=xyz; database=db; Trusted_Connection=yes;")
cursor = cnxn.cursor()
df = pd.read_sql("select * from [db].[username].[mytable]", cnxn)
df.shape
The above code works as expected. But when I try to drop a column from this table, it says it cannot find the object.
Here is my attempt:
query = 'ALTER TABLE [db].[username].[mytable] DROP COLUMN [TEMP CELCIUS]'
cursor.execute(query)
My question is how to drop this column. To add, this column has a white space in its name.
Try:
query = 'ALTER TABLE [db].[username].[mytable] DROP COLUMN "TEMP CELCIUS"'
OR:
query = 'ALTER TABLE [db].[username].[mytable] DROP COLUMN `TEMP CELCIUS`'
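Also note that pyodbc connections are not in autocommit mode by default, so whichever quoting you use, the ALTER TABLE only takes effect after an explicit commit. A minimal sketch reusing the connection from the question:
query = 'ALTER TABLE [db].[username].[mytable] DROP COLUMN [TEMP CELCIUS]'
cursor.execute(query)
cnxn.commit()  # DDL run through cursor.execute() is not persisted until committed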
I am trying to use a registered virtual table as a table in a SQL statement run over a connection to another database. I can't just turn the column into a string and use that; I need the table/dataframe itself to work in the statement and join with the other tables in the SQL statement. I'm trying this out on an Access database to start. This is what I have so far:
import os
import pyodbc
import pandas as pd
import duckdb
conn = duckdb.connect()
starterset = pd.read_excel(r'e:\Data Analytics\Python_Projects\Applications\DB_Test.xlsx')
conn.register("test_starter", starterset)
IDS = conn.execute("SELECT * FROM test_starter WHERE ProjectID > 1").fetchdf()
StartDate = '1/1/2015'
EndDate = '12/1/2021'
# establish the connection
connt = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=E:\Databases\Offline.accdb;')
cursor = connt.cursor()
# Run the query
query = ("Select ProjectID, Revenue, ClosedDate from Projects INNER JOIN " + IDS + " Z on Z.ProjectID = Projects.ProjectID "
"where ClosedDate between #" + StartDate + "# and #" + EndDate + "# AND Revenue > 0 order by ClosedDate")
df = pd.read_sql(query, connt)
df.to_excel(r'TEMP.xlsx', index=False)
os.system("start EXCEL.EXE TEMP.xlsx")
# Close the connection
cursor.close()
connt.close()
I have a list of IDs in the excel sheet that I'm trying to use as a filter from the database query. Ultimately, this will form into several criteria from the same table: dates, revenue, and IDs among others.
Honestly, I'm surprised I'm having so much trouble doing this. In SAS, with PROC SQL, it's so easy, but I can't get a dataframe to interface within the SQL parameters how I need it to. Am I making a syntax mistake?
Most common error so far is "UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U55'), dtype('<U55')) -> dtype('<U55')", but the types are the same.
It looks like you are trying to push the contents of a DataFrame into an Access database query. I don't think there is a native way to do this in pandas. The technique I use is database-vendor specific, but I just build up a text string as either a CTE/WITH clause or a temporary table.
Ex:
"""WITH my_data as (
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
...
)
[Your original query here]
"""
I want to update a column in a SQL Server table from Python.
I have a CSV file with some of the A values of the relevant table, as below:
CSV file: (a.csv)
ART-B-C-ART0015-D-E01
ADC-B-C-ADC00112-V-E01
Python Code: (create Name Value)
ff = pd.read_csv("C:\\a.csv", encoding='cp1252')
ff["Name"] = ff["A"].str.extract(r'([a-zA-Z]{3}\d{4,5})') + "-A"
Result of python Code:
ART0015-A
ADC00112-A
Table :
A                        Name   FamilyName
ART-B-C-ART0015-D-E01    NULL   ART
ADC-B-C-ADC00112-V-E01   NULL   ADC00112
A is also a column in my table, and based on the A value (for some of the records, not all of them) I want to update the Name column.
My database is SQL Server and I don't know how to update the Name column where the A value in the CSV file is equal to A in the table.
Code in Python:
conn = pyodbc.connect('Driver={SQL Server}; Server=ipaddress; Database=dbname; UID=username; PWD= {password};')
cursor = conn.cursor()
conn.commit()
for row in ff.itertuples():
    cursor.execute('''UPDATE database.dbo.tablename SET Name where ?''')
conn.commit()
Expected result in table
A                        Name         FamilyName
ART-B-C-ART0015-D-E01    ART0015-A    ART
ADC-B-C-ADC00112-V-E01   ADC00112-A   ADC00112
I would use a SQL temp table and an inner join to update the values. This works for updating only a subset of records in your SQL table, and it can also be efficient at updating many records.
SQL Cursor
# reduce number of calls to server on inserts
cursor.fast_executemany = True
Create Temporary Table
statement = "CREATE TABLE #temp_tablename(A VARCHAR(200), Name VARCHAR(200))"
cursor.execute(statement)
Insert Values into a Temporary Table
# insert only the key and the updated values
subset = ff[['A','Name']]
# form SQL insert statement
columns = ", ".join(subset.columns)
values = '('+', '.join(['?']*len(subset.columns))+')'
# insert
statement = "INSERT INTO #temp_tablename ("+columns+") VALUES "+values
insert = [tuple(x) for x in subset.values]
cursor.executemany(statement, insert)
Update Values in Main Table from Temporary Table
statement = '''
UPDATE
    t
SET
    t.Name = u.Name
FROM
    tablename AS t
INNER JOIN
    #temp_tablename AS u
ON
    u.A = t.A;
'''
cursor.execute(statement)
Drop Temporary Table
cursor.execute("DROP TABLE #temp_tablename")
How can I transfer data from one MySQL database to another? The other database may have different field names, except id, which will act as the primary key.
I have tried using sqlalchemy, but the only data that gets mapped is for the field names that are the same in both databases.
import pandas as pd
import sqlalchemy
db1 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/mydatabase1")
db2 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/nava")
print('Writing...')
query = ''' (SELECT * FROM customers1)'''
df = pd.read_sql(query, db1)
print(df)
#query1 = ''UPDATE 'leap' SET `leap`value '''
df.to_sql('nap', db2, index=False, if_exists='append')
I get an error that the other database doesn't have the same field names, but what I want is that even if the field names change, the data still gets mapped with reference to the primary key id.
This is the program that I asked about in the question above; there was an error, so the code didn't appear the right way.
import pandas as pd
import sqlalchemy
db1 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/mydatabase1")
db2 = sqlalchemy.create_engine("mysql+pymysql://root:@localhost:3306/nava")
print('Writing...')
query = ''' (SELECT * FROM customers1)'''
df = pd.read_sql(query, db1)
df.to_sql('nap', db2, index=False, if_exists='append')
Experts,
I am struggling to find an efficient way to work with pandas and sqlite.
I am building a tool that lets users:
- extract part of a SQL database (sub_table) based on some filters
- change part of sub_table
- upload the changed sub_table back to the overall SQL table, replacing old values
Users will only see Excel data (so I need to write back and forth to Excel, which is not part of my example as it is out of scope).
Users can:
- replace existing rows (entries) with new data
- delete existing rows
- add new rows
Question: how can I most efficiently do this "replace/delete/add" using Pandas / sqlite3?
Here is my example code. If I use df_sub.to_sql("MyTable", con = conn, index = False, if_exists="replace") at the bottom then obviously the entire table is replaced... so there must be another way I cannot think of.
import pandas as pd
import sqlite3
import numpy as np
#### SETTING EXAMPLE UP
### Create DataFrame
data = dict({"City": ["London","Frankfurt","Berlin","Paris","Brondby"],
"Population":[8,2,4,9,0.5]})
df = pd.DataFrame(data,index = pd.Index(np.arange(5)))
### Create SQL DataBase
conn = sqlite3.connect("MyDB.db")
### Upload DataFrame as Table into SQL Database
df.to_sql("MyTable", con = conn, index = False, if_exists="replace")
### Read DataFrame from SQL DB
query = "SELECT * from MyTable"
pd.read_sql_query(query, con = conn)
#### CREATE SUB_TABLE AND AMEND
#### EXTRACT sub_table FROM SQL TABLE
query = "SELECT * from MyTable WHERE Population > 2"
df_sub = pd.read_sql_query(query, con = conn)
df_sub
#### Amend Sub DF
df_sub[df_sub["City"] == "London"] = ["Brussel",4]
df_sub
#### Replace new data in SQL DB
df_sub.to_sql("MyTable", con = conn, index = False, if_exists="replace")
query = "SELECT * from MyTable"
pd.read_sql_query(query, con = conn)
Thanks for your help!
Note: I did try to achieve this via pure SQL queries but gave up. As I am not an expert on SQL, I would prefer to go with pandas if a solution exists. If not, a hint on how to achieve this via SQL would be great!
I think there is no way around using SQL queries for this task.
With pandas it is only possible to read a query into a DataFrame and to write a DataFrame to a database (replace or append).
If you want to update specific values/ rows or want to delete rows, you have to use SQL queries.
Commands you should look into are for example:
UPDATE, REPLACE, INSERT, DELETE
# Update the database, change City to 'Brussel' and Population to 4, for the first row
# (Attention! python indices start at 0, SQL indices at 1)
cur = conn.cursor()
cur.execute('UPDATE MyTable SET City=?, Population=? WHERE ROWID=?', ('Brussel', 4, 1))
conn.commit()
conn.close()
# Display the changes
conn = sqlite3.connect("MyDB.db")
query = "SELECT * from MyTable"
pd.read_sql_query(query, con=conn)
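The "delete existing rows" and "add new rows" cases follow the same pattern; here is a minimal sketch (the new city values are illustrative only, not from the example data):
cur = conn.cursor()
# delete an existing row by a column value
cur.execute('DELETE FROM MyTable WHERE City = ?', ('Paris',))
# add a new row
cur.execute('INSERT INTO MyTable (City, Population) VALUES (?, ?)', ('Madrid', 3.2))
conn.commit()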
For more examples on sql and pandas you can look at
https://www.dataquest.io/blog/python-pandas-databases/
Python Version - 2.7.6
Pandas Version - 0.17.1
MySQLdb Version - 1.2.5
In my database (PRODUCT), I have a table (XML_FEED). The table XML_FEED is huge (millions of records).
I have a pandas.DataFrame() (PROCESSED_DF). The dataframe has thousands of rows.
Now I need to run this
REPLACE INTO TABLE PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5),
VALUES (PROCESSED_DF.values)
Question:-
Is there a way to run REPLACE INTO TABLE in pandas? I already checked pandas.DataFrame.to_sql() but that is not what I need. I prefer not to read the XML_FEED table into pandas because it is very huge.
With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.
I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Insert

    @compiles(Insert)
    def replace_string(insert, compiler, **kw):
        s = compiler.visit_insert(insert, **kw)
        s = s.replace("INSERT INTO", "REPLACE INTO")
        return s

    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(table.table.insert(replace_string=""), data)
You would pass it like so:
df.to_sql(db, engine, if_exists='append', method=mysql_replace_into)  # db is the target table name, engine the SQLAlchemy engine
Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:
def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert

    data = [dict(zip(keys, row)) for row in data_iter]
    stmt = insert(table.table).values(data)
    update_stmt = stmt.on_duplicate_key_update(**dict(zip(stmt.inserted.keys(),
                                                          stmt.inserted.values())))
    conn.execute(update_stmt)
Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.
Till this version (0.17.1) I am unable to find any direct way to do this in pandas. I reported a feature request for the same.
I did this in my project by executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append').
Suppose
1) product_id is my primary key in table PRODUCT
2) feed_id is my primary key in table XML_FEED.
SIMPLE VERSION
import MySQLdb
import sqlalchemy
import pandas
con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)  # you could pass flavor='mysql' instead of creating a sqlalchemy engine, but it is deprecated
Please note:-
The REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
I needed a generic solution to this problem, so I built on shiva's answer--maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update/add some rows, and want to perform a REPLACE INTO statement with df.to_sql().
It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.
def to_sql_update(df, engine, schema, table):
    df.reset_index(inplace=True)
    sql = ''' SELECT column_name from information_schema.columns
              WHERE table_schema = '{schema}' AND table_name = '{table}' AND
              COLUMN_KEY = 'PRI';
          '''.format(schema=schema, table=table)
    id_cols = [x[0] for x in engine.execute(sql).fetchall()]
    id_vals = [df[col_name].tolist() for col_name in id_cols]

    sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema=schema, table=table)
    for row in zip(*id_vals):
        sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
        sql += ' OR ({}) '.format(sql_row)
    engine.execute(sql)

    df.to_sql(table, engine, schema=schema, if_exists='append', index=False)
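A quick usage note, with the schema and table names borrowed from the question purely for illustration (engine is a SQLAlchemy engine for the target MySQL database):
# assuming PROCESSED_DF holds the updated rows destined for PRODUCT.XML_FEED
to_sql_update(PROCESSED_DF, engine, 'PRODUCT', 'XML_FEED')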
If you use to_sql you should be able to define it so that you replace values if they exist, so for a table named 'mydb' and a dataframe named 'df', you'd use:
df.to_sql('mydb', engine, if_exists='replace')  # engine: your SQLAlchemy engine or connection
That should replace values if they already exist, but I am not 100% sure if that's what you're looking for.