Python: iterate over rows and update a SQL Server table

My Python code works to this point and returns several rows. I need to take each row and process it in a loop in Python. The first row works fine and does its trick, but the second row never runs. Clearly, I am not looping correctly. I believe I am not iterating over each row in the results. Here is the code:
for row in results:
    print(row[0])

This prints:

F:\FinancialResearch\SEC\myEdgar\sec-edgar-filings\A\10-K\0000014693-21-000091\full-submission.txt
F:\FinancialResearch\SEC\myEdgar\sec-edgar-filings\A\10-K\0000894189-21-001890\full-submission.txt
F:\FinancialResearch\SEC\myEdgar\sec-edgar-filings\A\10-K\0000894189-21-001895\full-submission.txt
for row in results:
    with open(row[0], 'r') as f:
        contents = f.read()
    bill = row
    for x in range(0, 3):
        VanHalen = 'Hello'
        cnxn1 = pyodbc.connect('Driver={SQL Server};'
                               'Server=XXX;'
                               'Database=00010KData;'
                               'Trusted_Connection=yes;')
        curs1 = cnxn1.cursor()
        curs1.execute('''
            Update EdgarComments SET Comments7 = ? WHERE FullPath = ?
        ''', (VanHalen, bill))
        curs1.commit()
        curs1.close()
        cnxn1.close()
        print(x)
Error: ('HY004', '[HY004] [Microsoft][ODBC SQL Server Driver]Invalid SQL data type (0) (SQLBindParameter)')

The bill variable that you are storing in the FullPath column contains the entire row object, not just the path - is that what you want?
Given the column name FullPath, I would expect the file path (row[0]) to be stored there.
Since this is an invalid-type error on the bound parameters, check the type of the bill variable before executing the update and make sure it is a type the SQL driver accepts - you usually want to convert unusual types to strings before using them as binding parameters.
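For example, a minimal sketch of that fix, binding the path string instead of the whole row and reusing one connection rather than reconnecting inside the loop (server, database, and column names as in the question):

import pyodbc

cnxn1 = pyodbc.connect('Driver={SQL Server};'
                       'Server=XXX;'
                       'Database=00010KData;'
                       'Trusted_Connection=yes;')
curs1 = cnxn1.cursor()
VanHalen = 'Hello'
for row in results:
    bill = str(row[0])  # bind the path string, not the pyodbc Row object
    curs1.execute('Update EdgarComments SET Comments7 = ? WHERE FullPath = ?',
                  (VanHalen, bill))
cnxn1.commit()
curs1.close()
cnxn1.close()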

Python MySQL search entire database for value

I have a GUI interacting with my database, and the MySQL database has around 50 tables. I need to search each table for a value and, if it is found, return the field and key of the item in each table. I would like partial matches: e.g., for the search value "test", both "Protest" and "Test123" would match. Here is my attempt.
def searchdatabase(self, event):
    print('Searching...')
    self.connect_mysql()  # function to connect to the database
    d_tables = []
    results_list = []  # I will store results here
    s_string = "test"  # value I am searching for
    self.cursor.execute("USE db")  # select the database
    self.cursor.execute("SHOW TABLES")
    for (table_name,) in self.cursor:
        d_tables.append(table_name)
    # Loop through the tables list, get the column names, and check if the value is in a column
    for table in d_tables:
        # Get the columns
        self.cursor.execute(f"SELECT * FROM `{table}` WHERE 1=0")
        field_names = [i[0] for i in self.cursor.description]
        # Find the value
        for f_name in field_names:
            print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE {f_name} LIKE {s_string}"))
        print(table)
I get an error on print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE {f_name} LIKE {s_string}"))
Exception: (1054, "Unknown column 'test' in 'where clause'")
I use a similar insert query that works fine, so I do not understand what the issue is.
ex. insert_query = (f"INSERT INTO `{source_tbl}` ({query_columns}) VALUES ({query_placeholders})")
This may be because of a missing single quote around the search value: without quotes, MySQL parses test as a column name, which is exactly what the error says.
Try:
print("RESULTS:", self.cursor.execute(f"SELECT * FROM `{table}` WHERE `{f_name}` LIKE '{s_string}'"))
Don’t insert user-provided data into SQL queries like this. It is begging for SQL injection attacks. Your database library will have a way of sending parameters to queries. Use that.
The whole design is fishy. Normally, there should be no need to look for a string across several columns of 50 different tables. Admittedly, sometimes you end up in these situations because of reasons outside your control.
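For the search value itself, a minimal sketch of the parameterized version, assuming a MySQL driver that uses %s placeholders (e.g. mysql-connector-python); table and column names cannot be bound as parameters, so they still come from SHOW TABLES and cursor.description as in the question:

search = f"%{s_string}%"  # wildcards give the partial matches the question asks for
for table in d_tables:
    self.cursor.execute(f"SELECT * FROM `{table}` WHERE 1=0")
    field_names = [i[0] for i in self.cursor.description]
    for f_name in field_names:
        self.cursor.execute(
            f"SELECT * FROM `{table}` WHERE `{f_name}` LIKE %s", (search,)
        )
        for match in self.cursor.fetchall():
            results_list.append((table, f_name, match))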

Not a duplicate - How to break apart/iterate a Python SQL statement WHERE IN with a HUGE list [duplicate]

I am having trouble executing an SQL statement where I have a huge list of values for my IN statement. I'm trying to come up with a way to iterate through the list and get a combined result set.
The error I am currently getting is: [SQL Server]Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them. (8632) (SQLExecDirectW)
This is because my IN list is 75k items long.
I'm assuming this needs to chunked up in some way, and looped through?
Pseudocode:
List = 75k items
Chunk_Size = 10000
for each Chunk_Size-sized slice of List:
    df.append(SQL results WHERE Part IN the 10k-item slice)
This would then loop 8 times, appending the results to a dataframe. I want to be able to define the chunk size for testing purposes.
Pieces of real code:
import pyodbc
import pandas as pd

conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=server;'
                      'DATABASE=db;'
                      'UID=Me;'
                      'PWD=Secret;'
                      'Trusted_Connection=yes;')
list_of_ids = ['1', 'Hello', '3.4', 'xyz5000']  # etc. to 75k
params = "', '".join(list_of_ids)
sql = "SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ('{}');".format(params)
sql_query = pd.read_sql_query(sql, conn)
I know that using format is bad, but I couldn't get the execute statement to work right.
When I do this:
sql_query2 = cursor.execute("SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ?;",list_of_ids)
I get errors like this: pyodbc.ProgrammingError: ('The SQL contains 1 parameter markers, but 4 parameters were supplied', 'HY000')
Thoughts? Thanks in advance!
SOLUTION EDIT: using Tim's solution below, I was able to accomplish this by doing:
Chunk_Size = 10000
list2 = []
cursor = conn.cursor()
for i in range(0, len(list_of_ids), Chunk_Size):
    params = "', '".join(list_of_ids[i:i+Chunk_Size])
    sql = "SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ('{}');".format(params)
    cursor.execute(sql)
    list1 = cursor.fetchall()
    list2.append(list1)
from pandas.core.common import flatten
list3 = list(flatten(list2))
match_df = pd.DataFrame.from_records(list3, columns=['Item Number', 'Name', 'Part'])
Since pyodbc doesn't support array parameters, there are few alternatives to formatting the request yourself.
frames = []
for i in range(0, len(list_of_ids), Chunk_Size):
    params = "', '".join(list_of_ids[i:i+Chunk_Size])
    sql = "SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ('{}');".format(params)
    frames.append(pd.read_sql_query(sql, conn))  # collect each chunk's results instead of overwriting
sql_query = pd.concat(frames, ignore_index=True)
You could still use bound parameters by building the placeholder list on the fly, like this:
frames = []
for i in range(0, len(list_of_ids), Chunk_Size):
    sublist = list_of_ids[i:i+Chunk_Size]
    sql = "SELECT [ItemNumber],[Name],[Part] FROM table1 WHERE Part IN ({});".format(','.join(['?'] * len(sublist)))
    frames.append(pd.read_sql_query(sql, conn, params=sublist))  # params must be passed by keyword
sql_query = pd.concat(frames, ignore_index=True)
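Note that SQL Server caps a single request at 2100 bound parameters, so with this placeholder variant the chunk size has to stay safely below that limit (e.g. 2000).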
How many parts are there? Would it be quicker to fetch the entire table and filter it in Python?
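A rough sketch of that alternative, assuming the table fits in memory (table and column names as in the question):

all_parts = pd.read_sql_query("SELECT [ItemNumber],[Name],[Part] FROM table1;", conn)
match_df = all_parts[all_parts['Part'].isin(list_of_ids)]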

How to get a value from the SQLAlchemy .execute() method? Python + SQLAlchemy + MS SQL Server

I wrote a method to get the status of a csv file in a SQL Server table. The table has a column named CSV_STATUS, and for a particular csv, I'd like my method to give me the value of its CSV status. I wrote the following function:
def return_csv_status_db(db_instance, name_of_db_instance_tabledict, csvfile_path):
    table_dict = db_instance[name_of_db_instance_tabledict]
    csvfile_name = csvfile_path.name
    sql = db.select([table_dict['table'].c.CSV_STATUS]).where(table_dict['table'].c.CSV_FILENAME == csvfile_name)
    result = table_dict['engine'].execute(sql)
    print(result)
Whenever I print result, it returns: <sqlalchemy.engine.result.ResultProxy object at 0x0000005E642256C8>
How can I extract the value of the select statement?
Take a look at [1].
As I understand it, you need to do the following:
for row in result:
    # do what you need to do for each row
[1] - https://docs.sqlalchemy.org/en/13/core/connections.html
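For a single status value, a minimal sketch (fetchone() and scalar() are part of the SQLAlchemy 1.3 Core API linked above):

row = result.fetchone()      # first matching row, or None if there is no match
if row is not None:
    csv_status = row[0]      # the CSV_STATUS value
# or, for a one-row, one-column result:
# csv_status = result.scalar()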

How to get table column names/headers for a SQL query in Python

I have data in a pandas dataframe which I am storing in an SQLite database using Python. When I query the tables inside it, I get the results but without the column names. Can someone please guide me?
sql_query = """Select date(report_date), insertion_order_id, sum(impressions), sum(clicks), (sum(clicks)+0.0)/sum(impressions)*100 as CTR
from RawDailySummaries
Group By report_date, insertion_order_id
Having report_date like '2014-08-12%' """
cursor.execute(sql_query)
query1 = cursor.fetchall()
for i in query1:
print i
Below is the output that I get:
(u'2014-08-12', 10187, 2024, 8, 0.3952569169960474)
(u'2014-08-12', 12419, 15054, 176, 1.1691244851866613)
What do I need to do to display the results in tabular form with the column names?
In DB-API 2.0 compliant clients, cursor.description is a sequence of 7-item sequences of the form (<name>, <type_code>, <display_size>, <internal_size>, <precision>, <scale>, <null_ok>), one for each column, as described here. Note description will be None if the result of the execute statement is empty.
If you want to create a list of the column names, you can use list comprehension like this: column_names = [i[0] for i in cursor.description] then do with them whatever you'd like.
Alternatively, you can set the row_factory parameter of the connection object to something that provides column names with the results. An example of a dictionary-based row factory for SQLite is found here, and you can see a discussion of the sqlite3.Row type below that.
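A minimal sketch of that sqlite3.Row approach (the database file name is hypothetical; the table name is from the question):

import sqlite3

conn = sqlite3.connect('mydb.sqlite')  # hypothetical file name
conn.row_factory = sqlite3.Row         # rows now support access by column name
cursor = conn.cursor()
cursor.execute("SELECT * FROM RawDailySummaries LIMIT 1")
row = cursor.fetchone()
print(row.keys())                  # column names
print(row['insertion_order_id'])   # value by column name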
Step 1: Select your driver, e.g. pyodbc, SQLAlchemy, etc.
Step 2: Establish a connection and create a cursor:
    cursor = connection.cursor()
Step 3: Execute the SQL statement:
    cursor.execute("Select * from db.table where condition=1")
Step 4: Extract the headers from the cursor's description attribute:
    headers = [i[0] for i in cursor.description]
    print(headers)
Try pandas .read_sql(); I can't check it right now, but it should be something like:
    pd.read_sql(sql_query, connection)
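Since read_sql returns a DataFrame, the column names come with the result; assuming connection is the open SQLite connection from the question's setup:

df = pd.read_sql(sql_query, connection)
print(df.columns.tolist())  # column names for the query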
Here is a sample using cx_Oracle that should do what is expected:
import cx_Oracle

def test_oracle():
    connection = cx_Oracle.connect('user', 'password', 'tns')
    try:
        cursor = connection.cursor()
        cursor.execute('SELECT day_no, area_code, start_date FROM dic.b_td_m_area WHERE rownum < 10')
        # print only the header
        title = [i[0] for i in cursor.description]
        print(title)
        # full column info
        for x in cursor.description:
            print(x)
    finally:
        cursor.close()

if __name__ == "__main__":
    test_oracle()

Column names missing from generator with pyodbc

I'm using pyodbc to connect to SQL Server to fetch a few rows. The select query I execute fetches almost 200,000 rows, causing a memory issue.
To resolve this I'm using a generator object to fetch 5000 rows at a time.
The problem with this kind of execution is the generator object: I lose the column names.
For example, if my table1 has a column NAME, through normal execution I can access the result set as result.NAME, but I can't do the same with the generator object; it doesn't allow access through column names.
Any inputs would be useful.
Using Cursor.fetchmany() to process a query result in batches returns a list of pyodbc.Row objects, which allow reference by column name.
Take these examples of a SQL Server query that returns database names in batches of 5:
Without a generator:

connection = pyodbc.connect(driver='{SQL Server Native Client 11.0}',
                            server='localhost', database='master',
                            trusted_connection='yes')
sql = 'select name from sys.databases'
cursor = connection.cursor().execute(sql)
while True:
    rows = cursor.fetchmany(5)
    if not rows:
        break
    for row in rows:
        print(row.name)
With a generator (modified from the sample here):

def rows(cursor, size=5):
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        for row in batch:
            yield row

connection = pyodbc.connect(driver='{SQL Server Native Client 11.0}',
                            server='localhost', database='master',
                            trusted_connection='yes')
sql = 'select name from sys.databases'
cursor = connection.cursor().execute(sql)
for row in rows(cursor):
    print(row.name)
