I have a C++ command-line executable (not written by me) that takes some parameters, connects to a server, downloads data from a database, and prints it line by line to the command line. It can also be passed a -o myfile.txt parameter to write the lines to a file instead.
One particular usage returns ~13 million lines of data.
Instead of calling the tool and writing to a file, I call it with Python's subprocess module so I can process the output lines and insert them into a local sqlite3 table that I create, since that suits my needs much better than a text file.
When I call the command-line tool outside of Python with -o to output to a file, it takes 37 minutes, because of the large volume of data that has to be downloaded over the network from the server.
When I call it from Python and process the lines into SQL, it takes ~90 minutes, well over twice as long.
I imagine my code has a bottleneck, then, and can't process the incoming stdout lines as fast as they arrive; otherwise there wouldn't be such a delay.
Is there an obvious issue in my code, and how can I troubleshoot to find where the bottleneck is?
My code is here: https://pastebin.com/9DgmPdzj
The InitialiseTable() function and the InitialConfigDone variable are used to create the table that stores the data, based on the content of the first line of output. I check whether InitialiseTable() has already run; if it has, each line is simply inserted. So InitialiseTable() runs for the first line and is never run again.
def CaptureIntoSQL(host, table, InitialConfigDone=False):
    """Starts AdminDataCapture and redirects output to a table in a local SQL database"""
    command = "./MyCommandLineTool_x86-64_rhel6_gcc48_cxx11-vstring_mds -u {0} -p {1} -h {2} -t {3} --static -c".format(username, password, host, table)
    table_name = "{0}_Table{1}_{2}".format(host, table, GetDate())
    process = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE)
    while True:
        cmdoutput = process.stdout.readline().decode("utf-8")
        if cmdoutput == '' and process.poll() is not None:
            break
        if cmdoutput:
            data = cmdoutput.split("|")  # split the output line into a list
            try:
                Symbol = data[5]
                if Symbol != "<Symbol>" and not InitialConfigDone:
                    # Run InitialiseTable() to create the sqlite3 table if it hasn't
                    # been done; once it's done once we don't want to do this again.
                    try:
                        InitialiseTable(table_name, data)
                        InitialConfigDone = True
                        logger.info("Table created for {0}".format(table_name))
                    except:
                        logger.info("InitialiseTable(table_name,data) failed:\n table_name: {0} \n data: {1}".format(table_name, data))
                elif Symbol != "<Symbol>" and InitialConfigDone:
                    # If we've already run InitialiseTable(), begin inserting data.
                    Permission = data[3]
                    exchangeCode = data[5].split(".")[-1]
                    if exchangeCode == "":
                        exchangeCode = "."
                    valuelist = data[7::2]
                    valuelist.insert(0, Permission)
                    valuelist.insert(0, exchangeCode)
                    valuelist.insert(0, Symbol)
                    vtuple = "({0})".format(",".join("?" * len(valuelist)))  # one "?" placeholder per value
                    InsertQuery = "INSERT INTO '{0}' VALUES{1}".format(table_name, vtuple)  # my query to insert into my sqlite table
                    try:
                        c.execute(InsertQuery, valuelist)
                    except:
                        pass
            except:
                pass
    conn.commit()
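A minimal sketch of a leaner version of this loop, assuming the same pipe-delimited output and that every data row has the same number of fields; it iterates the pipe directly instead of calling readline(), batches rows into executemany(), and commits once at the end:
import shlex
import subprocess

def capture_into_sql_batched(command, table_name, conn, batch_size=10000):
    """Sketch: stream the tool's stdout and insert rows in batches."""
    c = conn.cursor()
    process = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE)
    batch = []
    insert_query = None
    for raw_line in process.stdout:              # iterating the pipe avoids per-line readline() calls
        data = raw_line.decode("utf-8").split("|")
        if len(data) < 8 or data[5] == "<Symbol>":
            continue                             # skip headers and short lines
        symbol = data[5]
        exchange_code = symbol.split(".")[-1] or "."
        row = [symbol, exchange_code, data[3]] + data[7::2]
        if insert_query is None:                 # build the placeholder list from the first row
            placeholders = ",".join("?" * len(row))
            insert_query = "INSERT INTO '{0}' VALUES({1})".format(table_name, placeholders)
        batch.append(row)
        if len(batch) >= batch_size:
            c.executemany(insert_query, batch)   # one call per batch instead of one per row
            batch = []
    if batch:
        c.executemany(insert_query, batch)
    conn.commit()                                # single commit at the end
    process.wait()
If batching alone doesn't close the gap, running the script under python -m cProfile will show whether the remaining time goes to decoding, splitting, or sqlite itself.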
Related
I have the following piece of code where a C++ executable (run.out) prints a bunch of info at runtime using std::cout. This code redirects the output of run.out into storage.txt.
storage = open("storage.txt", "w")
shell_cmd = "run.out"
proc = subprocess.Popen([shell_cmd], stdout=storage, stderr=storage)
Once the subprocess starts, I need to frequently check the contents of storage.txt and make decisions based on what has just been stored there. How can I do that?
You could use the Popen object's poll() method, which returns immediately and indicates whether the subprocess is still running:
while proc.poll() is None:
    time.sleep(0.25)  # read the contents 4 times a second
    data = open("storage.txt").read()
    if 'error' in data:
        print("failed ...")
        # do something ...
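Re-reading the whole file each pass gets slower as storage.txt grows. A small variation on the same poll() loop, sketched here, keeps the file open so each pass sees only what was just appended (note the writer's buffering may delay when data appears):
import subprocess
import time

storage = open("storage.txt", "w")
proc = subprocess.Popen(["run.out"], stdout=storage, stderr=storage)

with open("storage.txt") as f:
    while proc.poll() is None:
        time.sleep(0.25)
        new_data = f.read()   # the file object keeps its position, so this
                              # returns only bytes appended since the last read
        if 'error' in new_data:
            print("failed ...")
            break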
I have a piece of Python (2.7) code which connects to a MySQL DB and loads content from a CSV file (DBCommit()). I want to clear the CSV file (ClearCache()) ONLY if the MySQL commit (DBCommit) completes successfully. If it doesn't (i.e. there was a problem with the connection, or accepting the data, etc.), then the CSV is retained for processing later. Should I use try, subprocess, threading or multi-threading?
BTW I'm a complete novice at coding! Any pointers would be great!
Code:
*************************************
import subprocess, MySQLdb, os, sys, select, shutil

#***********************************
#Define Functions
#***********************************

#Cache file location - used as a transaction for the DB
CachePath = '/home/pi/GistCache.csv'

#Used as the initial log file. Another process writes to this file.
LogPath = '/home/pi/GistLog.csv'

#Connects to the DB and commits the CachePath file to the MySQL DB
def DBCommit():
    cnx = MySQLdb.connect(user='user', passwd='password', host='ip', db='dbname')
    cur = cnx.cursor()
    DataImport = ("""LOAD DATA INFILE "/home/pi/GistCache.csv" INTO TABLE PI_GIST_LOG.PI_Gist_Log_Tbl COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (RFID,TimeStamp,MACAddr,IPAddr)""")
    cur.execute(DataImport)
    cnx.commit()
    cnx.close()

#Opens the LogPath and CachePath files, copies the content from LogPath to CachePath
def Append():
    with open(LogPath, "r") as logopen:
        next(logopen)
        for line in logopen:
            with open(CachePath, "a") as cacheopen:
                cacheopen.write(line)

#Creates a new LogPath file
def NewLogPath():
    with open(LogPath, 'w') as Log:
        Log.write("ID,RFID,TimeStamp,MAC,IP,\n")

#Deletes the LogPath file and creates a new version
def ClearLogPath():
    os.remove(LogPath)
    NewLogPath()

#Deletes CachePath
def ClearCache():
    os.remove(CachePath)

#Reboots the Pi
def Restart():
    command = "/usr/bin/sudo /sbin/shutdown -r now"
    process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
    output = process.communicate()[0]
    print output

#*********************************
#Start Process
#*********************************

#Checks if the CachePath file already exists - meaning the previous run failed to complete.
#If it does, check LogPath. If that exists, append LogPath to CachePath, clear the log and upload to the DB.
#If LogPath doesn't exist, create it and reboot.
if os.path.lexists(CachePath):
    if os.path.lexists(LogPath):
        Append()
        ClearLogPath()
        #*********************
        #This is what I want to run first: DBCommit.
        DBCommit()
        #Only if DBCommit is successful should it run ClearCache
        ClearCache()
        #*********************
    else:
        NewLogPath()
        Restart()
#If Cache doesn't exist, check if LogPath exists. If it does, rename LogPath to CachePath,
#recreate LogPath and commit to the DB. If LogPath doesn't exist either, create it and reboot the Pi.
else:
    if os.path.lexists(LogPath):
        shutil.move(LogPath, CachePath)
        NewLogPath()
        DBCommit()
        ClearCache()
    #If Cache and Log don't exist, recreate log and reboot
    else:
        NewLogPath()
        Restart()
You could just use
if DBCommit():
    ## do stuff
If DBCommit returns anything truthy (i.e. not 0, False, None, etc.), the condition is considered True and the block runs.
And the negation: if not DBCommit()
Your code:
def DBCommit():
    cnx = MySQLdb.connect(user='user', passwd='password', host='ip', db='dbname')
    cur = cnx.cursor()
    DataImport = ("""LOAD DATA INFILE "/home/pi/GistCache.csv" INTO TABLE PI_GIST_LOG.PI_Gist_Log_Tbl COLUMNS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (RFID,TimeStamp,MACAddr,IPAddr)""")
    cur.execute(DataImport)
    cnx.commit()
    cnx.close()
    return True

....

if os.path.lexists(CachePath):
    if os.path.lexists(LogPath):
        Append()
        ClearLogPath()
        #*********************
        #Run DBCommit first; only if it is successful should ClearCache run.
        if DBCommit():
            ClearCache()
        #*********************
    else:
        NewLogPath()
        Restart()
What happens with your current code if anything fails in the DBCommit() function? I'm not going to try it myself, but chances are an exception will be raised, which will interrupt program execution at that point and thus prevent ClearCache() from being called. If you find (after testing all the error conditions you can think of) that some possible errors in DBCommit do not raise an exception, then you just have to handle those cases in DBCommit and make sure they DO raise one.
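A sketch of that exception-based approach, reusing DBCommit()/ClearCache() from the question (the error message wording is just illustrative):
def commit_and_clear():
    """Clear the cache only if the DB load raised no exception."""
    try:
        DBCommit()                 # MySQLdb raises MySQLdb.Error on connection/load problems
    except MySQLdb.Error as e:
        print "DB commit failed, keeping cache for later: %s" % e
        return False
    ClearCache()                   # reached only when DBCommit() succeeded
    return True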
I'm in need of some assistance. I'm attempting to run SQL*Loader (sqlldr) from within Python; the best method I found was subprocess.call. I copied the parameters from another, working invocation into this code.
When I run it, the command prints with the appropriate fields, as expected.
But the process returns 1, which indicates failure.
I have no additional information and can't locate the problem.
I have verified that data.csv loads into my table from Bash; from Python it doesn't.
def load_raw():
    DATA_FILE = 'data.csv'
    CONTROL_FILE = 'raw_table.ctl'
    LOG_FILE = 'logfile.log'
    BAD_FILE = 'badfile.log'
    DISCARD_FILE = 'discard.log'
    connect_string = os.environ['CONNECT_STRING']
    sqlldr_parms = 'rows=1000 readsize=50000 direct=true columnarrayrows=100 bindsize=500000 streamsize=500000 silent=(HEADER,FEEDBACK)'
    parms = {}
    parms['userid'] = connect_string
    parms['sqlldr'] = sqlldr_parms
    parms['data'] = DATA_FILE
    parms['control'] = CONTROL_FILE
    parms['log'] = LOG_FILE
    parms['bad'] = BAD_FILE
    parms['discard'] = DISCARD_FILE
    cmd = "sqlldr userid=%(userid)s %(sqlldr)s data=%(data)s control=%(control)s log=%(log)s bad=%(bad)s discard=%(discard)s" % parms
    print "cmd is: %s" % cmd
    with open('/opt/app/workload/bfapi/bin/stdout.txt', 'wb') as out:
        process = call(cmd, shell=True, stdout=out, stderr=out)
    print process
cmd is: sqlldr userid=usr/pass rows=1000 readsize=50000 direct=true columnarrayrows=100 bindsize=500000
streamsize=500000 silent=(HEADER,FEEDBACK) data=data.csv control=raw_table.ctl
log=logfile.log bad=badfile.log discard=discard.log
process returns 1
The log files for log, bad and discard are not created
stdout.txt contains
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `sqlldr userid=usr/pass rows=1000 readsize=50000 direct=true columnarrayrows=100
bindsize=500000 streamsize=500000 silent=(HEADER,FEEDBACK) data=data.csv control=raw_table.ctl
log=logfile.log bad=badfile.log discard=discard.log'
data.csv contains
id~name~createdby~createddate~modifiedby~modifieddate
6~mark~margaret~"19-OCT-16 01.03.23.966000 PM"~kyle~"21-OCT-16 03.11.22.256000 PM"
8~jill~margaret~"27-AUG-16 12.10.12.214000 PM"~kyle~"21-OCT-16 04.16.01.171000 PM"
raw_table.ctl
OPTIONS ( SKIP=1)
LOAD DATA
CHARACTERSET UTF8
INTO TABLE RAW_TABLE
FIELDS TERMINATED BY '~' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS
(ID,
NAME,
CREATED_BY,
CREATED_DATETIME TIMESTAMP,
MODIFIED_BY,
MODIFIED_DATETIME TIMESTAMP)
The error was caused by the silent param: the parentheses in (HEADER,FEEDBACK) are special characters to the shell, which is what produced the "unexpected token" syntax error. Wrapping the value in single quotes allowed the code to work, as here: silent='(HEADER,FEEDBACK)'
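An alternative that sidesteps shell quoting entirely is to pass subprocess an argument list and drop shell=True, so the parentheses never reach /bin/sh. A sketch with the same parameters (paths and options copied from the question; adjust as needed):
from subprocess import call

def load_raw_noshell(connect_string):
    # Each argument goes to sqlldr directly; no shell parsing, no quoting issues.
    args = [
        'sqlldr',
        'userid=%s' % connect_string,
        'rows=1000', 'readsize=50000', 'direct=true',
        'columnarrayrows=100', 'bindsize=500000', 'streamsize=500000',
        'silent=(HEADER,FEEDBACK)',
        'data=data.csv', 'control=raw_table.ctl',
        'log=logfile.log', 'bad=badfile.log', 'discard=discard.log',
    ]
    with open('stdout.txt', 'wb') as out:
        return call(args, stdout=out, stderr=out)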
I'm using Python/pexpect to spawn SSH sessions to multiple routers. The code works for one router, but with some routers the output of session.before gets out of sync, so it returns the output from a previous sendline. This seems to happen particularly when sending a blank line (sendline()). Does anyone have any ideas? Any insight would be really appreciated.
Below is a sample of what I'm seeing:
ssh_session.sendline('sh version')
iresult = 2
while (iresult == 2):
    iresult = ssh_session.expect(['>', '#', '--More--'], timeout=SESSION_TIMEOUT)
    debug_print("execute_1 " + str(iresult))
    debug_print("execute_bef " + ssh_session.before)
    debug_print("execute_af " + ssh_session.after)
    thisoutput = ssh_session.before
    output += thisoutput
    if (iresult == 2):
        debug_print("exec MORE")
        ssh_session.send(" ")
    else:
        debug_print("exec: end loop")

for cmd in config_commands:
    debug_print("------------------------------------------------\n")
    debug_print("running command " + cmd.strip() + "\n")
    iresult = 2
    ssh_session.sendline(cmd.strip())
    while (iresult == 2):
        iresult = ssh_session.expect([prompt + ">", prompt + "#", " --More-- "], timeout=SESSION_TIMEOUT)
        thisoutput = ssh_session.before
        debug_print("execute_1 " + str(iresult))
        debug_print("execute_af " + ssh_session.after)
        debug_print("execute_bef " + thisoutput)
        thisoutput = ssh_session.before
        output += thisoutput
        if (iresult == 2):
            debug_print("exec MORE")
            ssh_session.send(" ")
        else:
            debug_print("exec: end loop")
I get this:
logged in
exec: sh version
execute_1 1
execute_bef
R9
execute_af #
exec: end loop
------------------------------------------------
running command config t
execute_1 1
execute_af #
execute_bef sh version
Cisco IOS Software, 1841 Software (C1841-IPBASEK9-M), Version 15.1(4)M4, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport...
I've run into this before with pexpect (and I'm trying to remember how I worked around it).
You can re-synchronize with the terminal session by sending a return and then expecting the prompt in a loop. When the expect finally times out, you know that you are synchronized.
The root cause is probably that you are either:
Calling send without a matching expect (because you don't care about the output), or
Running a command that produces output, expecting a pattern in the middle of that output, and never expecting up to the next prompt at the end of it. One way to deal with this is to change your expect pattern to "(.+)PROMPT": this expects up to the next prompt and captures all the output of the command sent (which you can parse in the next step).
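A minimal sketch of that resynchronization loop (the prompt pattern and timeouts are assumptions; adjust for your devices):
import pexpect

def resync(session, prompt, max_tries=10):
    """Drain any stale output: keep reading until the prompt stops appearing."""
    session.sendline("")
    for _ in range(max_tries):
        i = session.expect([prompt, pexpect.TIMEOUT], timeout=2)
        if i == 1:
            return True   # timed out: nothing left in the buffer, we are synchronized
    return False          # still matching prompts after max_tries; something is wrong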
I faced a similar problem. I tried waiting for the command to be echoed on the screen and then sending enter.
If you want to execute, say, command cmd, then you do:
session.send(cmd)
index = session.expect([cmd, pexpect.TIMEOUT], 1)
session.send('\n')
index = session.expect([whatever you expect])
Worked for me.
I'm not sure this is the root of your problem, but it may be worth a try.
Something I've run into is that when you spawn a session that starts with or lands you in a shell, you have to deal with quirks of the TERM type (vt220, color-xterm, etc.). You will see characters used to move the cursor or change colors. The problem is almost guaranteed to show up with the prompt; the string you are looking for to identify the prompt appears twice because of how color changes are handled (the prompt is sent, then codes to backspace, change the color, then the prompt is sent again... but expect sees both instances of the prompt).
Here's something that handles this, guaranteed to be ugly, hacky, not very Pythonic, and functional:
import sys
import pexpect

# wait_for_prompt: handle terminal prompt craziness
# returns either the pexpect.before contents that occurred before the
# first sighting of the prompt, or returns False if we had a timeout
#
def wait_for_prompt(session, wait_for_this, wait_timeout=30):
    status = session.expect([wait_for_this, pexpect.TIMEOUT, pexpect.EOF], timeout=wait_timeout)
    if status != 0:
        print 'ERROR : timeout waiting for "' + wait_for_this + '"'
        return False
    before = session.before  # this is what we will want to return
    # now look for and handle any additional sightings of the prompt
    while True:
        try:
            session.expect(wait_for_this, timeout=0.1)
        except:
            # we expect a timeout here. All is normal. Move along, Citizen.
            break  # get out of the while loop
    return before

s = pexpect.spawn('ssh me@myserver.local')
s.expect('password')  # yes, we assume that the SSH key is already there
                      # and that we will successfully connect. I'm bad.
s.sendline('mypasswordisverysecure')  # Also assuming the right password
prompt = 'me$'
wait_for_prompt(s, prompt)
s.sendline('df -h')  # how full are my disks?
results = wait_for_prompt(s, prompt)
if results:
    print results
    sys.exit(0)
else:
    print 'Misery. You lose.'
    sys.exit(1)
I know this is an old thread, but I didn't find much about this online and I just got through making my own quick-and-dirty workaround for this. I'm also using pexpect to run through a list of network devices and record statistics and so forth, and my pexpect.spawn.before will also get out of sync sometimes. This happens very often on the faster, more modern devices for some reason.
My solution was to write an empty carriage return between each command, and check the len() of the .before variable. If it's too small, it means it only captured the prompt, which means it must be at least one command behind the actual ssh session. If that's the case, the program sends another empty line to move the actual data that I want into the .before variable:
def new_line(this, iteration):
    if iteration > 4:
        return this.before  # give up after 5 tries and return whatever we have
    else:
        iteration += 1
        this.expect(":")
        this.sendline(" \r")
        data = this.before
        if len(data) < 50:
            # The number 50 was chosen because it should be longer than just the
            # hostname and prompt of the device, but shorter than any actual output
            data = new_line(this, iteration)
        return data
def login(hostname):
    this = pexpect.spawn("ssh %s" % hostname)
    stop = this.expect([pexpect.TIMEOUT, pexpect.EOF, ":"], timeout=20)
    if stop == 2:
        try:
            this.sendline("\r")
            this.expect(":")
            this.sendline("show version\r")
            version = new_line(this, 0)
            this.expect(":")
            this.sendline("quit\r")
            return version
        except:
            print 'failed to execute commands'
            this.kill(0)
    else:
        print 'failed to login'
        this.kill(0)
I accomplish this with a recursive function that calls itself until the .before variable finally captures the command's output, or until it has called itself 5 times, at which point it simply gives up.
Is there a way to execute a SQL script file using cx_Oracle in Python?
I need to execute my CREATE TABLE scripts, which live in .sql files.
PEP 249, which cx_Oracle tries to be compliant with, doesn't really have a method like that.
However, the process should be pretty straightforward: pull the contents of the file into a string, split it on the ";" character, and then call .execute on each member of the resulting list. I'm assuming that the ";" character is only used to delimit the Oracle SQL statements within the file.
f = open('tabledefinition.sql')
full_sql = f.read()
sql_commands = full_sql.split(';')
for sql_command in sql_commands:
    curs.execute(sql_command)
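One wrinkle with the naive split: it leaves an empty (or whitespace-only) chunk after the final ";", and executing an empty statement raises an error. A slightly more defensive sketch, still assuming an open cursor named curs:
with open('tabledefinition.sql') as f:
    full_sql = f.read()

for sql_command in full_sql.split(';'):
    statement = sql_command.strip()
    if statement:                 # skip the empty chunk after the final ";"
        curs.execute(statement)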
Another option is to use SQL*Plus (Oracle's command line tool) to run the script. You can call this from Python using the subprocess module - there's a good walkthrough here: http://moizmuhammad.wordpress.com/2012/01/31/run-oracle-commands-from-python-via-sql-plus/.
For a script like tables.sql (note the deliberate error):
CREATE TABLE foo ( x INT );
CREATE TABLER bar ( y INT );
You can use a function like the following:
from subprocess import Popen, PIPE

def run_sql_script(connstr, filename):
    sqlplus = Popen(['sqlplus', '-S', connstr], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    sqlplus.stdin.write('@' + filename)
    return sqlplus.communicate()
connstr is the same connection string used for cx_Oracle. filename is the full path to the script (e.g. 'C:\temp\tables.sql'). The function opens a SQL*Plus session (with '-S' to silence its welcome message), then queues "@filename" to send to it - this tells SQL*Plus to run the script.
sqlplus.communicate sends the command to stdin, waits for the SQL*Plus session to terminate, then returns (stdout, stderr) as a tuple. Calling this function with tables.sql above will give the following output:
>>> output, error = run_sql_script(connstr, r'C:\temp\tables.sql')
>>> print output
Table created.
CREATE TABLER bar (
*
ERROR at line 1:
ORA-00901: invalid CREATE command
>>> print error
This will take a little parsing, depending on what you want to return to the rest of your program - you could show the whole output to the user if it's interactive, or scan for the word "ERROR" if you just want to check whether it ran OK.
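For the non-interactive case, a check along the lines suggested above might look like this (treating "ERROR" or an ORA- code in the output as failure is an assumption, not a complete list of SQL*Plus error formats):
output, error = run_sql_script(connstr, r'C:\temp\tables.sql')
if 'ERROR' in output or 'ORA-' in output:
    print 'script failed:'
    print output
else:
    print 'script ran OK'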
In the cx_Oracle library's source you can find a method its tests use to load scripts: run_sql_script.
I modified this method in my project like this:
def run_sql_script(self, connection, script_path):
    cursor = connection.cursor()
    statement_parts = []
    for line in open(script_path):
        if line.strip() == "/":
            statement = "".join(statement_parts).strip()
            if not statement.upper().startswith('CREATE PACKAGE'):
                statement = statement[:-1]  # drop the trailing ";" for plain SQL statements
            if statement:
                try:
                    cursor.execute(statement)
                except Exception as e:
                    print("Failed to execute SQL:", statement)
                    print("Error:", str(e))
            statement_parts = []
        else:
            statement_parts.append(line)
The statements in the script file must be separated by a "/" on its own line.
I hope this helps.
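A hypothetical usage sketch, assuming the method has been lifted out of its class into a module-level function (drop the self parameter); the credentials and DSN below are placeholders:
import cx_Oracle

connection = cx_Oracle.connect("user", "password", "dbhost:1521/orclpdb")
run_sql_script(connection, "create_tables.sql")  # executes each "/"-terminated statement
connection.commit()
connection.close()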