Python MySQL search entire database for value

Python MySQL search entire database for value - python

I have a GUI interacting with my database, and MySQL database has around 50 tables. I need to search each table for a value and return the field and key of the item in each table if it is found. I would like to search for partial matches. ex.( Search Value = "test", "Protest", "Test123" would be matches. Here is my attempt.
def searchdatabase(self, event):
print('Searching...')
self.connect_mysql() #Function to connect to database
d_tables = []
results_list = [] # I will store results here
s_string = "test" #Value I am searching
self.cursor.execute("USE db") # select the database
self.cursor.execute("SHOW TABLES")
for (table_name,) in self.cursor:
d_tables.append(table_name)
#Loop through tables list, get column name, and check if value is in the column
for table in d_tables:
#Get the columns
self.cursor.execute(f"SELECT * FROM `{table}` WHERE 1=0")
field_names = [i[0] for i in self.cursor.description]
#Find Value
for f_name in field_names:
print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE {f_name} LIKE {s_string}"))
print(table)
I get an error on print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE {f_name} LIKE {s_string}"))
Exception: (1054, "Unknown column 'test' in 'where clause'")
I use a similar insert query that works fine so I am not understanding what the issue is.
ex. insert_query = (f"INSERT INTO `{source_tbl}` ({query_columns}) VALUES ({query_placeholders})")

May be because of single quote you have missed while checking for some columns.
TRY :
print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE '{f_name}' LIKE '{s_string}'"))
Have a look -> here

Don’t insert user-provided data into SQL queries like this. It is begging for SQL injection attacks. Your database library will have a way of sending parameters to queries. Use that.
The whole design is fishy. Normally, there should be no need to look for a string across several columns of 50 different tables. Admittedly, sometimes you end up in these situations because of reasons outside your control.

Related

How to the store a column from a SQL request into a variable?

I want to put the result of each column of the result of my request and store them into separate variables, so I can exploit its results.
I precise this is with a SELECt * and not separate requests.
So, If I do for example:
with connection.cursor() as cursor:
# Read a single record
sql = 'SELECT * FROM table'
cursor.execute(sql)
result = cursor.fetchall()
print(result)
I want to do :
a = [results from column1]
b = [results from column2]
The results should be turned into a row and not be left as a column, to make it a dictionary.
It's probably very simple but I'm new with Python / SQL, thank you.

Trying to dynamically get column names of select query

I'd like to be able to get the column names of my query back from the database so I can then use it to dynamically set the column length and name of my gui table.
So far after establishing a connection I've tried:
mycursor = mydb.cursor()
mycursor.callproc('db1.load_table', args)
for r in mycursor.stored_results():
result = r.fetchall()
column_names = mycursor.column_names
I've also tried using mycursor.description
It didn't work.
What's the right approach here?

Something like this you can :
column_names = [desc[0] for desc in mycursor.description]

Alter query according to user selection in sqlite python

I have a sqlite database named StudentDB which has 3 columns Roll number, Name, Marks. Now I want to fetch only the columns that user selects in the IDE. User can select one column or two or all the three. How can I alter the query accordingly using Python?
I tried:
import sqlite3
sel={"Roll Number":12}
query = 'select * from StudentDB Where({seq})'.format(seq=','.join(['?']*len(sel))),[i for k,i in sel.items()]
con = sqlite3.connect(database)
cur = con.cursor()
cur.execute(query)
all_data = cur.fetchall()
all_data
I am getting:
operation parameter must be str

You should control the text of the query. The where clause shall allways be in the form WHERE colname=value [AND colname2=...] or (better) WHERE colname=? [AND ...] if you want to build a parameterized query.
So you want:
query = 'select * from StudentDB Where ' + ' AND '.join('"{}"=?'.format(col)
for col in sel.keys())
...
cur.execute(query, tuple(sel.values()))

In your code, the query is now a tuple instead of str and that is why the error.
I assume you want to execute a query like below -
select * from StudentDB Where "Roll number"=?
Then you can change the sql query like this (assuming you want and and not or) -
query = "select * from StudentDB Where {seq}".format(seq=" and ".join('"{}"=?'.format(k) for k in sel.keys()))
and execute the query like -
cur.execute(query, tuple(sel.values()))
Please make sure in your code the provided database is defined and contains the database name and studentDB is indeed the table name and not database name.

Python and Pandas to Query API's and update DB

I've been querying a few API's with Python to individually create CSV's for a table.
I would like to try and instead of recreating the table each time, update the existing table with any new API data.
At the moment the way the Query is working, I have a table that looks like this,
From this I am taking the suburbs of each state and copying them into a csv for each different state.
Then using this script I am cleaning them into a list (the api needs the %20 for any spaces),
"%20"
#suburbs = ["want this", "want this (meh)", "this as well (nope)"]
suburb_cleaned = []
#dont_want = frozenset( ["(meh)", "(nope)"] )
for urb in suburbs:
cleaned_name = []
name_parts = urb.split()
for part in name_parts:
if part in dont_want:
continue
cleaned_name.append(part)
suburb_cleaned.append('%20'.join(cleaned_name))
Then taking the suburbs for each state and putting them into this API to return a csv,
timestr = time.strftime("%Y%m%d-%H%M%S")
Name = "price_data_NT"+timestr+".csv"
url_price = "http://mwap.com/api"
string = 'gxg&state='
api_results = {}
n = 0
y = 2
for urbs in suburb_cleaned:
url = url_price + urbs + string + "NT"
print(url)
print(urbs)
request = requests.get(url)
api_results[urbs] = pd.DataFrame(request.json())
n = n+1
if n == y:
dfs = pd.concat(api_results).reset_index(level=1, drop=True).rename_axis(
'key').reset_index().set_index(['key'])
dfs.to_csv(Name, sep='\t', encoding='utf-8')
y = y+2
continue
print("made it through"+urbs)
# print(request.json())
# print(api_results)
dfs = pd.concat(api_results).reset_index(level=1, drop=True).rename_axis(
'key').reset_index().set_index(['key'])
dfs.to_csv(Name, sep='\t', encoding='utf-8')
Then adding the states manually in excel, and combining and cleaning the suburb names.
# use pd.concat
df = pd.concat([act, vic,nsw,SA,QLD,WA]).reset_index().set_index(['key']).rename_axis('suburb').reset_index().set_index(['state'])
# apply lambda to clean the %20
f = lambda s: s.replace('%20', ' ')
df['suburb'] = df['suburb'].apply(f)
and then finally inserting it into a db
engine = create_engine('mysql://username:password#localhost/dbname')
with engine.connect() as conn, conn.begin():
df.to_sql('Price_historic', conn, if_exists='replace',index=False)
Leading this this sort of output
Now, this is a hek of a process. I would love to simplify it and make the database only update the values that are needed from the API, and not have this much complexity in getting the data.
Would love some helpful tips on achieving this goal - I'm thinking I could do an update on the mysql database instead of insert or something? and with the querying of the API, I feel like I'm overcomplicating it.
Thanks!

I don't see any reason why you would be creating CSV files in this process. It sounds like you can just query the data and then load it into a MySql table directly. You say that you are adding the states manually in excel? Is that data not available through your prior api calls? If not, could you find that information and save it to a CSV, so you can automate that step by loading it into a table and having python look up the values for you?
Generally, you wouldn't want to overwrite the mysql table every time. When you have a table, you can identify the column or columns that uniquely identify a specific record, then create a UNIQUE INDEX for them. For example if your street and price values designate a unique entry, then in mysql you could run:
ALTER TABLE `Price_historic` ADD UNIQUE INDEX(street, price);
After this, your table will not allow duplicate records based on those values. Then, instead of creating a new table every time, you can insert your data into the existing table, with instructions to either update or ignore when you encounter a duplicate. For example:
final_str = "INSERT INTO Price_historic (state, suburb, property_price_id, type, street, price, date) " \
"VALUES (%s, %s, %s, %s, %s, %s, %s, %s) " \
"ON DUPLICATE KEY UPDATE " \
"state = VALUES(state), date = VALUES(date)"
con = pdb.connect(db_host, db_user, db_pass, db_name)
with con:
try:
cur = con.cursor()
cur.executemany(final_str, insert_list)

If the setup you are trying is something for longer term , I would suggest running 2 diff processes in parallel-
Process 1:
Query API 1, obtain required data and insert into DB table, with binary / bit flag that would specify only API 1 has been called.
Process 2:
Run query on DB to obtain all records needed for API call 2 based on binary/bit flag that we set in process 1--> For corresponding data run call 2 and update data back to DB table based on primary Key
Database : I would suggest adding Primary Key as well as [Bit Flag][1] that gives status of different API call statuses. Bit Flag also helps you
- in case you want to double confirm if specific API call has been made for specific record not.
- Expand your project to additional API calls and can still track status of each API call at record level
[1]: Bit Flags: https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions014.htm#SQLRF00612

Python Script for SQL Server - Update values with MERGE

I have this python function which inserts to a SQL database. The script is such that every time it is rerun it will have to insert the same row over again in addition to new rows. Eventually I will be changing this so that it only inserts new rows but for now I have to work with some sort of update statement.
I'm aware that I can use MERGE in SQL Server to achieve something similar to MySQL's ON DUPLICATE KEY UPDATE, but I'm not exactly sure how it should be used. Any advice is welcome. Thanks!
def sqlInsrt(headers, values):
#create string input of mylisth
strheaders = ','.join(str(i) for i in headers)
#create string ? param's for INSERT clause
placestr = ','.join(i for i in ["?" for i in headers])
#create string ? param's for UPDATE clause
replacestr = ', '.join(['{}=?'.format(h) for h in headers])
#Setup and execute SQL query
insert = ("INSERT INTO "+part+" ("+strheaders+") VALUES ("+placestr+")")
cursor.execute(insert, values)
cnx.commit()

You should read the docs for Merge.
Basically MERGE INTO TargetTable
USING SourceTable
ON TargetTable.id = SourceTable.id
....
Then you can read the docs about using When not marched by Target etc.
So your Python would maybe swap out the table names and joins using params

I wrote a script that solves the simplest case of merging two identically structured tables, one containing new/updated data. This is useful in incremental data imports. You can expand it depending on your needs (eg. if you need a type 2 SCD):
def create_merge_query(
stg_schema: str,
stg_table: str,
schema: str,
table: str,
primary_key: str,
con: pyodbc.Connection,
) -> str:
"""
Create a merge query for the simplest possible upsert scenario:
- updating and inserting all fields
- merging on a single column, which has the same name in both tables
Args:
stg_schema (str): The schema where the staging table is located.
stg_table (str): The table with new/updated data.
schema (str): The schema where the table is located.
table (str): The table to merge into.
primary_key (str): The column on which to merge.
"""
columns_query = f"""
SELECT
col.name
FROM sys.tables AS tab
INNER JOIN sys.columns AS col
ON tab.object_id = col.object_id
WHERE tab.name = '{table}'
AND schema_name(tab.schema_id) = '{schema}'
ORDER BY column_id;
"""
columns_query_result = con.execute(columns_query)
columns = [tup[0] for tup in columns_query_result]
columns_stg_fqn = [f"stg.{col}" for col in columns]
update_pairs = [f"existing.{col} = stg.{col}" for col in columns]
merge_query = f"""
MERGE INTO {schema}.{table} existing
USING {stg_schema}.{stg_table} stg
ON stg.{primary_key} = existing.{primary_key}
WHEN MATCHED
THEN UPDATE SET {", ".join(update_pairs)}
WHEN NOT MATCHED
THEN INSERT({", ".join(columns)})
VALUES({", ".join(columns_stg_fqn)});
"""
return merge_query

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python MySQL search entire database for value - python

May be because of single quote you have missed while checking for some columns. TRY : print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE '{f_name}' LIKE '{s_string}'")) Have a look -> here

Related

How to the store a column from a SQL request into a variable?

Trying to dynamically get column names of select query

Alter query according to user selection in sqlite python

Python and Pandas to Query API's and update DB

Python Script for SQL Server - Update values with MERGE

Categories

Resources