I have a pandas DataFrame with unique user numbers:
data_frame = pd.DataFrame({'uniq_num' :['1qw3','2wed','3das','4frr','533ew','612w']})
I want to pass this column to a SQL query where I use the IN operator:
SELECT users FROM database
where users IN ("here I want to pass my dataframe, so it would search in all rows of my dataframe")
I have tried this:
data_frame = ','.join([str(x) for x in data_frame.iloc[:, 0].tolist()])
which returns '1qw3,2wed,3das,4frr,533ew,612w'
and then something like WHERE users IN STRING_SPLIT(data_frame, ','), but that obviously doesn't work...
You can convert the list into a tuple; this will give you the correct format:
import pandas as pd
data_frame = pd.DataFrame({'uniq_num' :['1qw3','2wed','3das','4frr','533ew','612w']})
in_statement = tuple(data_frame.iloc[:, 0].tolist())
sql = f"""SELECT users FROM database
where users IN {in_statement}"""
Output of the sql variable:
SELECT users FROM database
where users IN ('1qw3', '2wed', '3das', '4frr', '533ew', '612w')
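Note that this works because the Python repr of a tuple happens to look like a SQL list; with a single value you would get a trailing comma, as in ('1qw3',), which is invalid SQL. A safer alternative is a parameterized query, sketched below assuming a DB-API connection object named conn (placeholder style is %s for MySQL/Postgres drivers, ? for sqlite3):

import pandas as pd

data_frame = pd.DataFrame({'uniq_num': ['1qw3', '2wed', '3das', '4frr', '533ew', '612w']})
values = data_frame['uniq_num'].tolist()

# One placeholder per value; the driver quotes and escapes each value safely
placeholders = ', '.join(['%s'] * len(values))
sql = f"SELECT users FROM database WHERE users IN ({placeholders})"

# cursor = conn.cursor()        # assumes an existing DB-API connection `conn`
# cursor.execute(sql, values)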
If you are building the SQL query in Python, then try this:
user_ids = "'"+"','".join(data_frame.uniq_num.values)+"'"
query = "SELECT users FROM database "+\
"WHERE users IN ("+user_ids+")"
I want to get data from a DB into a pandas df in Python. I use the following code:
self.cursor = self.connection.cursor()
query = """
SELECT * FROM `an_visit` AS `visit`
JOIN `an_ip` AS `ip` ON (`visit`.`ip_id` = `ip`.`ip_id`)
JOIN `an_useragent` AS `useragent` ON (`visit`.`useragent_id` = `useragent`.`useragent_id`)
JOIN `an_pageview` AS `pageview` ON (`visit`.`visit_id` = `pageview`.`visit_id`)
WHERE `visit`.`visit_id` BETWEEN %s AND %s
"""
self.cursor.execute(query, (start_id, end_id))
df = pd.DataFrame(self.cursor.fetchall())
This code works, but I want to get the column names as well. I tried the approach from this question, MySQL: Get column name or alias from query,
but this did not work:
fields = map(lambda x: x[0], self.cursor.description)
result = [dict(zip(fields, row)) for row in self.cursor.fetchall()]
How can I get the column names from the DB into the df? Thanks
The easy way to include column names in the record set is to set dictionary=True, as follows:
self.cursor = self.connection.cursor(dictionary=True)
Then fetchone(), fetchmany(), and fetchall() all return rows as dictionaries keyed by column name.
Check out these links:
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursordict.html
https://mariadb-corporation.github.io/mariadb-connector-python/connection.html
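For example, a minimal sketch assuming mysql-connector-python and the query from the question (connection details and ID bounds are placeholders):

import mysql.connector
import pandas as pd

connection = mysql.connector.connect(host="localhost", user="user",
                                     password="password", database="mydb")
cursor = connection.cursor(dictionary=True)   # rows come back as dicts

start_id, end_id = 1, 100   # example bounds
cursor.execute("SELECT * FROM an_visit WHERE visit_id BETWEEN %s AND %s",
               (start_id, end_id))

# Each row is a dict, so the DataFrame picks up the column names automatically
df = pd.DataFrame(cursor.fetchall())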
What worked for me is:
field_names = [i[0] for i in self.cursor.description ]
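The names can then be passed to the DataFrame constructor (a minimal sketch, reusing self.cursor from the question; cursor.description is only populated after execute()):

field_names = [i[0] for i in self.cursor.description]
df = pd.DataFrame(self.cursor.fetchall(), columns=field_names)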
The best practice to list all the columns in the database is to execute this query from the connection cursor:
SELECT TABLE_CATALOG,TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME,DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA='<schema>' AND TABLE_NAME = '<table_name>'
There is a column_names property on the MySQL cursor that you can use:
row = dict(zip(self.cursor.column_names, self.cursor.fetchone()))
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-column-names.html
I am trying to use a registered virtual table as a table in a SQL statement using a connection to another database. I can't just turn the column into a string and use that; I need the table/DataFrame itself to work in the statement and join with the other tables in the SQL statement. I'm trying this out on an Access database to start. This is what I have so far:
import os
import pyodbc
import pandas as pd
import duckdb
conn = duckdb.connect()
starterset = pd.read_excel (r'e:\Data Analytics\Python_Projects\Applications\DB_Test.xlsx')
conn.register("test_starter", starterset)
IDS = conn.execute("SELECT * FROM test_starter WHERE ProjectID > 1").fetchdf()
StartDate = '1/1/2015'
EndDate = '12/1/2021'
# establish the connection
connt = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=E:\Databases\Offline.accdb;')
cursor = conn.cursor()
# Run the query
query = ("Select ProjectID, Revenue, ClosedDate from Projects INNER JOIN " + IDS + " Z on Z.ProjectID = Projects.ProjectID "
"where ClosedDate between #" + StartDate + "# and #" + EndDate + "# AND Revenue > 0 order by ClosedDate")
df = pd.read_sql(query, connt)
df.to_excel(r'TEMP.xlsx', index=False)
os.system("start EXCEL.EXE TEMP.xlsx")
# Close the connection
cursor.close()
connt.close()
I have a list of IDs in the excel sheet that I'm trying to use as a filter from the database query. Ultimately, this will form into several criteria from the same table: dates, revenue, and IDs among others.
Honestly, I'm surprised I'm having so much trouble doing this. In SAS, with PROC SQL, it's so easy, but I can't get a dataframe to interface within the SQL parameters how I need it to. Am I making a syntax mistake?
Most common error so far is "UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U55'), dtype('<U55')) -> dtype('<U55')", but the types are the same.
It looks like you are pushing the contents of a DataFrame into an Access database query. I don't think there is a native way to do this in Pandas. The technique I use is database vendor specific, but I just build up a text string as either a CTE/WITH Clause or a temporary table.
Ex:
"""WITH my_data as (
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
...
)
[Your original query here]
"""
I have a GUI interacting with my database, and the MySQL database has around 50 tables. I need to search each table for a value and return the field and key of the item in each table if it is found. I would like to search for partial matches, e.g. with a search value of "test", both "Protest" and "Test123" would be matches. Here is my attempt.
def searchdatabase(self, event):
    print('Searching...')
    self.connect_mysql()  # Function to connect to database
    d_tables = []
    results_list = []  # I will store results here
    s_string = "test"  # Value I am searching
    self.cursor.execute("USE db")  # select the database
    self.cursor.execute("SHOW TABLES")
    for (table_name,) in self.cursor:
        d_tables.append(table_name)
    # Loop through tables list, get column names, and check if value is in the column
    for table in d_tables:
        # Get the columns
        self.cursor.execute(f"SELECT * FROM `{table}` WHERE 1=0")
        field_names = [i[0] for i in self.cursor.description]
        # Find value
        for f_name in field_names:
            print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE {f_name} LIKE {s_string}"))
            print(table)
I get an error on print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE {f_name} LIKE {s_string}"))
Exception: (1054, "Unknown column 'test' in 'where clause'")
I use a similar insert query that works fine, so I am not understanding what the issue is.
ex. insert_query = (f"INSERT INTO `{source_tbl}` ({query_columns}) VALUES ({query_placeholders})")
Maybe it is because of the quoting: the search value needs single quotes, and the column name should be wrapped in backticks rather than quotes; % wildcards give the partial matching you describe.
TRY:
print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE `{f_name}` LIKE '%{s_string}%'"))
Have a look -> here
Don’t insert user-provided data into SQL queries like this. It is begging for SQL injection attacks. Your database library will have a way of sending parameters to queries. Use that.
The whole design is fishy. Normally, there should be no need to look for a string across several columns of 50 different tables. Admittedly, sometimes you end up in these situations because of reasons outside your control.
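For example, a minimal sketch of the inner query with the value sent as a parameter (assuming a MySQL driver with %s placeholders; table and column names cannot be parameterized, but here they come from SHOW TABLES and cursor.description rather than from user input):

# The search value travels as a parameter, so the driver escapes it safely
pattern = f"%{s_string}%"
self.cursor.execute(f"SELECT * FROM `{table}` WHERE `{f_name}` LIKE %s", (pattern,))
print("RESULTS:", self.cursor.fetchall())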
I have two tables:
Customer
Account
I created these tables in SQLite and imported them into Python through the following code:
# Python pandas code
import sqlite3
import pandas as pd
# Creating connection
cnx = sqlite3.connect('testdatabase.db')
Customer = pd.read_sql_query("SELECT * FROM Customer", cnx)
Account = pd.read_sql_query("SELECT * FROM Account", cnx)
I want to fetch all state names and the total account balance for the users belonging to those states.
My expected output would be a table with one row per state and the summed account balance for that state.
One way I could solve this problem is:
data = cnx.execute("""SELECT JSON_VALUE(c.address, '$.state') AS State,
                             SUM(a.account_balance) AS Account_balance
                      FROM Customer c
                      INNER JOIN Account a ON c.id = a.user_id
                      WHERE JSON_VALUE(c.address, '$.state') IS NOT NULL
                      GROUP BY JSON_VALUE(c.address, '$.state')""")

for row in data:
    print("State = ", row[0])
    print("Account_balance = ", row[1], "\n")

cnx.close()
But I don't want to use this approach; instead, I want to solve it using the pandas library and the DataFrames I created above.
Kindly suggest the concept and a solution for performing this operation.
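A minimal sketch of what the pandas equivalent might look like, assuming address is stored as a JSON string containing a state key, and that Account has user_id and account_balance columns as in the SQL above:

import json

# Pull the state out of the JSON address column
Customer["State"] = Customer["address"].apply(
    lambda a: json.loads(a).get("state") if a else None
)

# Join the two DataFrames and aggregate, mirroring the SQL GROUP BY
merged = Customer.merge(Account, left_on="id", right_on="user_id", how="inner")
result = (merged.dropna(subset=["State"])
                .groupby("State", as_index=False)["account_balance"]
                .sum()
                .rename(columns={"account_balance": "Account_balance"}))
print(result)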
Experts,
I am struggling to find an efficient way to work with pandas and sqlite.
I am building a tool that lets users:
extract part of a SQL database (sub_table) based on some filters
change part of sub_table
upload the changed sub_table back to the overall SQL table, replacing the old values
Users will only see Excel data (so I need to write back and forth to Excel, which is not part of my example as it is out of scope).
Users can
replace existing rows (entries) with new data
delete existing rows
add new rows
Question: how can I most efficiently do this "replace/delete/add" using Pandas / sqlite3?
Here is my example code. If I use df_sub.to_sql("MyTable", con = conn, index = False, if_exists="replace") at the bottom, then obviously the entire table is replaced... so there must be another way I cannot think of.
import pandas as pd
import sqlite3
import numpy as np
#### SETTING EXAMPLE UP
### Create DataFrame
data = dict({"City": ["London","Frankfurt","Berlin","Paris","Brondby"],
"Population":[8,2,4,9,0.5]})
df = pd.DataFrame(data,index = pd.Index(np.arange(5)))
### Create SQL DataBase
conn = sqlite3.connect("MyDB.db")
### Upload DataFrame as Table into SQL Database
df.to_sql("MyTable", con = conn, index = False, if_exists="replace")
### Read DataFrame from SQL DB
query = "SELECT * from MyTable"
pd.read_sql_query(query, con = conn)
#### CREATE SUB_TABLE AND AMEND
#### EXTRACT sub_table FROM SQL TABLE
query = "SELECT * from MyTable WHERE Population > 2"
df_sub = pd.read_sql_query(query, con = conn)
df_sub
#### Amend Sub DF
df_sub[df_sub["City"] == "London"] = ["Brussel",4]
df_sub
#### Replace new data in SQL DB
df_sub.to_sql("MyTable", con = conn, index = False, if_exists="replace")
query = "SELECT * from MyTable"
pd.read_sql_query(query, con = conn)
Thanks for your help!
Note: I did try to achieve this via pure SQL queries but gave up. As I am not an expert on SQL, I would want to go with pandas if a solution exists. If not, a hint on how to achieve this via SQL would be great!
I think there is no way around using SQL queries for this task.
With pandas it is only possible to read a query into a DataFrame and to write a DataFrame to a database (replace or append).
If you want to update specific values/rows or want to delete rows, you have to use SQL queries.
Commands you should look into are for example:
UPDATE, REPLACE, INSERT, DELETE
# Update the database, change City to 'Brussel' and Population to 4, for the first row
# (Attention! python indices start at 0, SQL indices at 1)
cur = conn.cursor()
cur.execute('UPDATE MyTable SET City=?, Population=? WHERE ROWID=?', ('Brussel', 4, 1))
conn.commit()
conn.close()
# Display the changes
conn = sqlite3.connect("MyDB.db")
query = "SELECT * from MyTable"
pd.read_sql_query(query, con=conn)
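A minimal sketch of the delete and insert cases, following the same pattern (the city names here are purely illustrative values):

cur = conn.cursor()

# Delete a row the user removed from the sub_table
cur.execute('DELETE FROM MyTable WHERE City=?', ('Paris',))

# Insert a row the user added to the sub_table
cur.execute('INSERT INTO MyTable (City, Population) VALUES (?, ?)', ('Madrid', 3.2))

conn.commit()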
For more examples on SQL and pandas you can look at:
https://www.dataquest.io/blog/python-pandas-databases/