Python to explore SQL Server - python

When I use Visual Studio I can connect to a SQL Server instance and explore the databases that the server holds.
Is there a way I can do this with Python?
I have created a script that allows me to connect to the server:
import adodbapi
conn = adodbapi.connect("PROVIDER=SQLOLEDB;Data Source=<location>;Database=<databaseName>; \
trusted_connection=yes;")
cursor = conn.cursor()
<missing code here>
cursor.close()
conn.close()
This script runs fine, so I assume the connection is being made correctly.
I am hoping to create something like this:
for table in sqlserver:
    for row in table:
        print row["name"]
or is it possible to explore the tables as a dictionary?
I am not asking anyone to write this code for me, but any help to allow me to do this would be appreciated, cheers
Thank you for the responses - I have found a solution to the question I asked.
To get a list of tables, replacing <missing code here> with the following code worked well.
tables = conn.get_table_names()
#prints all table names
for table in tables:
    print table
Once I pick a table (in this case called "Parts") I can then view the data in each column. I have used the .fetchone() function to pull just one row.
sql = r'SELECT * FROM Parts'
cursor.execute(sql)
row = cursor.fetchone()
colnames = cursor.columnNames
for colname in colnames:  # iterate through the columns of the fetched row
    print str(colname) + " " + str(row[colname])
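For the nested loop originally described in the question, those two pieces can be combined. Below is a rough sketch that is not from the original post: it assumes every table has a "name" column (which will not be true in general) and reuses the same adodbapi calls shown above (get_table_names(), fetchall(), and indexing a row by column name).
tables = conn.get_table_names()
for table in tables:
    # table names with spaces or special characters would need bracket-quoting
    cursor.execute('SELECT * FROM ' + table)
    for row in cursor.fetchall():
        # adodbapi rows can be indexed by column name
        print(row['name'])  # assumes a 'name' column exists in this table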

It sounds like you want to do something like this:
sql = 'SELECT * FROM table'
crsr.execute(sql)
rows = crsr.fetchall()
for row in rows:  # iterate through all rows
    name = row[0]  # replace 0 with the index of the column containing names
See the documentation here for more info: http://adodbapi.sourceforge.net/quick_reference.pdf

Related

Upload an entire CSV into SQL Server [duplicate]

Below is my code that I'd like some help with.
I am having to run it over 1,300,000 rows, which means it takes up to 40 minutes to insert ~300,000 rows.
I figure bulk insert is the route to go to speed it up?
Or is it because I'm iterating over the rows via the for data in reader: portion?
# Opens the prepped csv file
with open(os.path.join(newpath, outfile), 'r') as f:
    # hooks csv reader to file
    reader = csv.reader(f)
    # pulls out the columns (which match the SQL table)
    columns = next(reader)
    # trims any extra spaces
    columns = [x.strip(' ') for x in columns]
    # starts SQL statement
    query = 'bulk insert into SpikeData123({0}) values ({1})'
    # puts column names in SQL query 'query'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    print 'Query is: %s' % query
    # starts cursor from cnxn (which works)
    cursor = cnxn.cursor()
    # uploads everything by row
    for data in reader:
        cursor.execute(query, data)
        cursor.commit()
I am dynamically picking my column headers on purpose (as I would like to create the most pythonic code possible).
SpikeData123 is the table name.
As noted in a comment to another answer, the T-SQL BULK INSERT command will only work if the file to be imported is on the same machine as the SQL Server instance or is in an SMB/CIFS network location that the SQL Server instance can read. Thus it may not be applicable in the case where the source file is on a remote client.
pyodbc 4.0.19 added a Cursor#fast_executemany feature which may be helpful in that case. fast_executemany is "off" by default, and the following test code ...
cnxn = pyodbc.connect(conn_str, autocommit=True)
crsr = cnxn.cursor()
crsr.execute("TRUNCATE TABLE fast_executemany_test")
sql = "INSERT INTO fast_executemany_test (txtcol) VALUES (?)"
params = [(f'txt{i:06d}',) for i in range(1000)]
t0 = time.time()
crsr.executemany(sql, params)
print(f'{time.time() - t0:.1f} seconds')
... took approximately 22 seconds to execute on my test machine. Simply adding crsr.fast_executemany = True ...
cnxn = pyodbc.connect(conn_str, autocommit=True)
crsr = cnxn.cursor()
crsr.execute("TRUNCATE TABLE fast_executemany_test")
crsr.fast_executemany = True # new in pyodbc 4.0.19
sql = "INSERT INTO fast_executemany_test (txtcol) VALUES (?)"
params = [(f'txt{i:06d}',) for i in range(1000)]
t0 = time.time()
crsr.executemany(sql, params)
print(f'{time.time() - t0:.1f} seconds')
... reduced the execution time to just over 1 second.
Update - May 2022: bcpandas and bcpyaz are wrappers for Microsoft's bcp utility.
Update - April 2019: As noted in the comment from #SimonLang, BULK INSERT under SQL Server 2017 and later apparently does support text qualifiers in CSV files (ref: here).
BULK INSERT will almost certainly be much faster than reading the source file row-by-row and doing a regular INSERT for each row. However, both BULK INSERT and BCP have a significant limitation regarding CSV files in that they cannot handle text qualifiers (ref: here). That is, if your CSV file does not have qualified text strings in it ...
1,Gord Thompson,2015-04-15
2,Bob Loblaw,2015-04-07
... then you can BULK INSERT it, but if it contains text qualifiers (because some text values contain commas) ...
1,"Thompson, Gord",2015-04-15
2,"Loblaw, Bob",2015-04-07
... then BULK INSERT cannot handle it. Still, it might be faster overall to pre-process such a CSV file into a pipe-delimited file ...
1|Thompson, Gord|2015-04-15
2|Loblaw, Bob|2015-04-07
... or a tab-delimited file (where → represents the tab character) ...
1→Thompson, Gord→2015-04-15
2→Loblaw, Bob→2015-04-07
... and then BULK INSERT that file. For the latter (tab-delimited) file the BULK INSERT code would look something like this:
import pypyodbc
conn_str = "DSN=myDb_SQLEXPRESS;"
cnxn = pypyodbc.connect(conn_str)
crsr = cnxn.cursor()
sql = """
BULK INSERT myDb.dbo.SpikeData123
FROM 'C:\\__tmp\\biTest.txt' WITH (
FIELDTERMINATOR='\\t',
ROWTERMINATOR='\\n'
);
"""
crsr.execute(sql)
cnxn.commit()
crsr.close()
cnxn.close()
Note: As mentioned in a comment, executing a BULK INSERT statement is only applicable if the SQL Server instance can directly read the source file. For cases where the source file is on a remote client, see this answer.
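As for the pre-processing step mentioned above, here is a minimal sketch (not from the original answer) that rewrites a quoted, comma-delimited CSV as a tab-delimited file; the file paths are hypothetical.
import csv

# Rewrite a quoted CSV as a tab-delimited file for BULK INSERT.
# On Python 3, open both files with newline='' as well.
with open('C:\\__tmp\\source.csv', 'r') as fin, \
        open('C:\\__tmp\\biTest.txt', 'w') as fout:
    reader = csv.reader(fin)  # handles quoted values that contain commas
    writer = csv.writer(fout, delimiter='\t', lineterminator='\n')
    for row in reader:
        writer.writerow(row)  # will still break if a value itself contains a tab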
Yes, bulk insert is the right path for loading large files into a DB. At a glance I would say the reason it takes so long is that, as you mentioned, you are looping over each row of data from the file, which effectively removes the benefit of using a bulk insert and makes it like a normal insert. Just remember that, as its name implies, it is used to insert chunks of data.
I would remove the loop and try again.
Also, I'd double-check your syntax for bulk insert, as it doesn't look correct to me. Check the SQL that is generated by pyodbc, as I have a feeling it might only be executing a normal insert.
Alternatively, if it is still slow, I would try using bulk insert directly from SQL: either load the whole file into a temp table with bulk insert and then insert the relevant columns into the right tables, or use a mix of bulk insert and bcp to get the specific columns inserted, or use OPENROWSET.
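A rough sketch of that staging-table idea, not from the original answer: the table, column, and file names are hypothetical, and the file must be readable by the SQL Server instance itself.
import pypyodbc

cnxn = pypyodbc.connect("DSN=myDb_SQLEXPRESS;")
crsr = cnxn.cursor()
# load the whole file into a staging table first
crsr.execute("""
    BULK INSERT myDb.dbo.SpikeData123_staging
    FROM 'C:\\__tmp\\biTest.txt'
    WITH (FIELDTERMINATOR='\\t', ROWTERMINATOR='\\n');
""")
# then copy only the relevant columns into the real table
crsr.execute("""
    INSERT INTO myDb.dbo.SpikeData123 (Col1, Col2, Col3)
    SELECT Col1, Col2, Col3 FROM myDb.dbo.SpikeData123_staging;
""")
cnxn.commit()
crsr.close()
cnxn.close()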
This problem was frustrating me, and I didn't see much improvement using fast_executemany until I found this post on SO, specifically Bryan Bailliache's comment regarding max varchar. I had been using SQLAlchemy, and even ensuring better datatype parameters did not fix the issue for me; however, switching to pyodbc did. I also took Michael Moura's advice of using a temp table and found it shaved off even more time.
I wrote a function in case anyone might find it useful. It takes either a list or a list of lists for the insert. It took my insert of the same data, which using SQLAlchemy and Pandas to_sql sometimes took upwards of 40 minutes, down to just under 4 seconds. I may have been misusing my former method though.
Connection
def mssql_conn():
    conn = pyodbc.connect(driver='{ODBC Driver 17 for SQL Server}',
                          server=os.environ.get('MS_SQL_SERVER'),
                          database='EHT',
                          uid=os.environ.get('MS_SQL_UN'),
                          pwd=os.environ.get('MS_SQL_PW'),
                          autocommit=True)
    return conn
Insert function
def mssql_insert(table, val_lst, truncate=False, temp_table=False):
    '''Use as direct connection to database to insert data, especially for
    large inserts. Takes either a single list (for one row),
    or list of lists (for multiple rows). Can either append to table
    (default) or if truncate=True, replace existing.'''
    conn = mssql_conn()
    cursor = conn.cursor()
    cursor.fast_executemany = True
    tt = False
    qm = '?,'
    if isinstance(val_lst[0], list):
        rows = len(val_lst)
        params = qm * len(val_lst[0])
    else:
        rows = 1
        params = qm * len(val_lst)
        val_lst = [val_lst]
    params = params[:-1]
    if truncate:
        cursor.execute(f"TRUNCATE TABLE {table}")
    if temp_table:
        # create a temp table with the same schema
        start_time = time.time()
        cursor.execute(f"SELECT * INTO ##{table} FROM {table} WHERE 1=0")
        table = f"##{table}"
        # set flag to indicate a temp table was used
        tt = True
    else:
        start_time = time.time()
    # insert into either the existing table or the newly created temp table
    stmt = f"INSERT INTO {table} VALUES ({params})"
    cursor.executemany(stmt, val_lst)
    if tt:
        # remove temp moniker and insert from the temp table
        dest_table = table[2:]
        cursor.execute(f"INSERT INTO {dest_table} SELECT * FROM {table}")
        print('Temp table used!')
        print(f'{rows} rows inserted into the {dest_table} table in '
              f'{time.time() - start_time} seconds')
    else:
        print('No temp table used!')
        print(f'{rows} rows inserted into the {table} table in '
              f'{time.time() - start_time} seconds')
    cursor.close()
    conn.close()
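A hypothetical call might look like this (the table name and values are placeholders, not from the original post):
new_rows = [[1, 'alpha'], [2, 'beta']]
mssql_insert('SomeTable', new_rows, truncate=False, temp_table=True)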
And my console results, first without using a temp table and then using one (in both cases the table contained data at the time of execution and truncate=True):
No temp table used!
18204 rows inserted into the CUCMDeviceScrape_WithForwards table in 10.595500707626343 seconds
Temp table used!
18204 rows inserted into the CUCMDeviceScrape_WithForwards table in 3.810380458831787 seconds
FWIW, I gave a few methods of inserting to SQL Server some testing of my own. I was actually able to get the fastest results by using SQL Server batches and pyodbc Cursor.execute statements. I did not test the save-to-CSV-plus-BULK-INSERT approach; I wonder how it compares.
Here's my blog on the testing I did:
http://jonmorisissqlblog.blogspot.com/2021/05/python-pyodbc-and-batch-inserts-to-sql.html
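For illustration, a rough sketch of one way to batch multi-row INSERTs with plain Cursor.execute; this is not taken from the blog post, and the connection string, table, and columns are hypothetical. Note that SQL Server caps a statement at 2100 parameters, so batch_size times the column count must stay below that.
import pyodbc

cnxn = pyodbc.connect("DSN=myDb_SQLEXPRESS;", autocommit=True)
crsr = cnxn.cursor()

rows = [(i, 'txt%d' % i) for i in range(10000)]  # sample data
batch_size = 1000  # SQL Server also allows at most 1000 rows per VALUES list

for start in range(0, len(rows), batch_size):
    batch = rows[start:start + batch_size]
    placeholders = ", ".join("(?, ?)" for _ in batch)
    sql = "INSERT INTO dbo.BatchTest (id, txtcol) VALUES " + placeholders
    params = [value for row in batch for value in row]
    crsr.execute(sql, params)

crsr.close()
cnxn.close()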
Adding to Gord Thompson's answer:
# add the below line for controlling batch size of insert
cursor.fast_executemany_rows = batch_size # by default it is 1000

Insert values from list into table (Python, sqlite3)

I have this simple code which I can't make work.
import sqlite3
conn = sqlite3.connect('db\\books')
c = conn.cursor()
col = []
title = input('title')
text = input('text')
tags = input('tags')
col.append(title)
col.append(text)
col.append(tags)
c.executescript("INSERT INTO books (title,text,tags) "
"VALUES (col[0],col[1],col[2])")
The db code (connection and normal insert) works, but the problem arises when I want to do what you see above.
The goal I would like to achieve is to let the user insert the data into the db (all strings). I don't know if this is the right way to do this...
How can I do this ?
Thanks
One option is to change your last line to:
c.execute("INSERT INTO books (title,text,tags) VALUES (?,?,?)", (col[0], col[1], col[2]))
And then commit and close the connection if you're done making changes:
conn.commit()
conn.close()
This line:
c.executescript("INSERT INTO books (title,text,tags) "
"VALUES (col[0],col[1],col[2])")
is not valid SQL. Also, since c is defined as a cursor, you can call execute on it directly instead of executescript; the latter is meant for running a multi-statement SQL script and does not support parameter substitution. Just replace that line with the following, which should work:
c.execute("INSERT INTO books (title,text,tags) "
"VALUES (?,?,?)", col)
The SQL above uses "qmark style" placeholders, which take the actual values from the parameter sequence that follows the statement.
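Putting it together, a minimal end-to-end sketch (not from the original answer; it assumes the books table already exists with these three columns):
import sqlite3

conn = sqlite3.connect('db\\books')
c = conn.cursor()

col = [input('title'), input('text'), input('tags')]

# qmark-style placeholders; the list supplies the values
c.execute("INSERT INTO books (title, text, tags) VALUES (?,?,?)", col)
conn.commit()
conn.close()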

Python SQLite3 How to extract specific tables only

I have a sqlite database that I want to extract specific tables from.
The database has thousands of table names.
I'm interested only in tables that start with "contact_".
However, there are many others like contactOLD, contact####, you name it.
I then need to extract the row data from each "contact_########" table and create a CSV or spreadsheet-type document. In total there are 1600 or so with unique names.
I had initially thought I could do this with a sqlite query but could not.
I then tried to write a small script to do this, but I could not figure out how to set up conditionals for the cursor.execute to only grab the data from the tables of interest to me.
Any ideas?
Update
import sqlite3
fname = raw_input("Enter your filename: ")
con = sqlite3.connect(fname)
cursor = con.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
mydata = cursor.fetchall()
for lines in mydata:
    print lines
I have been able to get the tables to list. I still need to add a condition to my WHERE clause for "contact_". When I add it, I get only [] from my print. I think I'm messing something up here.
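For reference, the filtering can also be done in SQL with LIKE rather than in Python. Two things to watch: SQLite string literals should use single quotes, and the underscore is a single-character wildcard in LIKE, so matching a literal "_" needs an ESCAPE clause. A rough sketch, not from the original post (the 'contact' prefix here is illustrative; the tables in Update 2 actually start with 'Contacts_'):
cursor.execute("SELECT name FROM sqlite_master "
               "WHERE type='table' AND name LIKE 'contact\\_%' ESCAPE '\\'")
for (tablename,) in cursor.fetchall():
    print(tablename)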
Update 2
Thanks to @Olver W. below, who got me on the right track with this.
fname = raw_input("Enter your filename: ")
con = sqlite3.connect(fname)
cursor = con.cursor()
for tablename in cursor.execute("SELECT name FROM sqlite_master WHERE type='table';"):
    if tablename[0].startswith('Contacts_'):
        tablename = str(tablename[0])
        query = "Select * FROM " + tablename
        query = str(query)
        data = cursor.execute(query)
        for items in data:
            print items
I'm going to output this to a spreadsheet in some cases, or to another SQLite database, but in my test database it is selecting the appropriate criteria and outputting the rows correctly. I can and will condense it a bit to make it cleaner, but it does the trick. Thanks
As mentioned in the comments, you could first query all the table names, then go through them and check if the name condition is fulfilled, which can be done in Python:
for tablename in cur.execute('SELECT name FROM sqlite_master WHERE type="table";'):
    if tablename[0].startswith('conn_'):
        execute_some_query_using_this_table()
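To go one step further and write each matching table out to its own CSV file, as the question describes, here is a rough sketch that is not from the original answer; the database file name and the 'Contacts_' prefix are assumptions. Table names are collected up front so that reusing the same cursor for the per-table queries does not clobber the iteration.
import csv
import sqlite3

con = sqlite3.connect('contacts.db')  # hypothetical database file
cursor = con.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = [row[0] for row in cursor.fetchall()]

for tablename in tables:
    if not tablename.startswith('Contacts_'):
        continue
    rows = cursor.execute('SELECT * FROM "%s"' % tablename).fetchall()
    header = [col[0] for col in cursor.description]
    # on Python 2 open with 'wb'; on Python 3 use 'w' with newline=''
    with open(tablename + '.csv', 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)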

psycopg2: error in display many columns of data in Python connecting to PostgreSQL

I am using psycopg2 to connect to a Postgres database and return query results on the screen using Python. I can only return one column of data, not many columns like psql does. Please see my code. Where did I go wrong?
Your kind response would be greatly appreciated.
#!/usr/bin/python
import psycopg2
CONNSTR = """
host=localhost
dbname=wa
user=super
password=test
port=5432"""
cxn = psycopg2.connect(CONNSTR)
cur = cxn.cursor()
cur.execute("""SELECT procpid,usename,current_query FROM pg_stat_activity;""")
rows = cur.fetchall()
print "\nShow me the query results:\n"
for row in rows:
    print " ", row[1]
I found the answer on initd.org/psycopg/docs/cursor.html. Here is the correct code. Please see the last two lines of code with the changes.
For clarification: the first version of the code only prints one column of data. The second version listed below will display/print all the columns I selected.
#!/usr/bin/python
import psycopg2
CONNSTR = """
host=localhost
dbname=wa
user=super
password=test
port=5432"""
cxn = psycopg2.connect(CONNSTR)
cur = cxn.cursor()
cur.execute("""SELECT procpid,usename,current_query FROM pg_stat_activity;""")
for rows in cur:
    print rows
It is not quite clear to me what you are trying to achieve.
If you only need one column of data, then you can filter that out in the SQL query. For instance:
cur.execute("""SELECT procpid FROM pg_stat_activity;""")
Still, psycopg2 will return a list of tuples, where each tuple contains the columns you requested. If you do not want the tuples and just want a list of values for the column data, you can convert them using:
column_data = [r[0] for r in rows]
If this is not what you are asking for, please rephrase your question.
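If, on the other hand, the goal is to display every selected column, here is a small sketch (not from the original answer) that prints a header row taken from cursor.description, followed by all rows:
cur.execute("SELECT procpid, usename, current_query FROM pg_stat_activity;")
colnames = [desc[0] for desc in cur.description]
print("\t".join(colnames))
for row in cur.fetchall():
    print("\t".join(str(value) for value in row))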

How to write a python program to print out a column of records in database

I need to write a Python program to print out the data in a column of a table within a database. The database I'm using is PostgreSQL. The path of the table would be: server-datastation-data. Now in the table named "data", I have a column named, say, "column". What should I do to access the database, get all the way down to the column records, and print them out?
I appreciate any help.
import psycopg
# fill in < > with relevant values
pgconnection = psycopg.connect('dbname=datastation user=<user>')
curs = pgconnection.cursor()
statement = 'SELECT column FROM data'
curs.execute(statement)
records = curs.fetchall()
for record in records:
    print record
