I'm in need of some assistance. I'm attempting to run SQL*Loader (sqlldr) from within Python. The best method I found was subprocess.call, so I duplicated the parameters from a working invocation used elsewhere into this code.
When I run it, I get the appropriate fields, as expected.
But the process returns 1, which indicates failure.
I have no additional information and can't locate the problem.
I have verified that data.csv loads into my table when the same command is run from Bash; from Python, however, it doesn't.
import os
from subprocess import call

def load_raw():
    DATA_FILE = 'data.csv'
    CONTROL_FILE = 'raw_table.ctl'
    LOG_FILE = 'logfile.log'
    BAD_FILE = 'badfile.log'
    DISCARD_FILE = 'discard.log'
    connect_string = os.environ['CONNECT_STRING']
    sqlldr_parms = 'rows=1000 readsize=50000 direct=true columnarrayrows=100 bindsize=500000 streamsize=500000 silent=(HEADER,FEEDBACK)'
    parms = {}
    parms['userid'] = connect_string
    parms['sqlldr'] = sqlldr_parms
    parms['data'] = DATA_FILE
    parms['control'] = CONTROL_FILE
    parms['log'] = LOG_FILE
    parms['bad'] = BAD_FILE
    parms['discard'] = DISCARD_FILE
    cmd = "sqlldr userid=%(userid)s %(sqlldr)s data=%(data)s control=%(control)s log=%(log)s bad=%(bad)s discard=%(discard)s" % parms
    print "cmd is: %s" % cmd
    with open('/opt/app/workload/bfapi/bin/stdout.txt', 'wb') as out:
        process = call(cmd, shell=True, stdout=out, stderr=out)
    print process
cmd is: sqlldr userid=usr/pass rows=1000 readsize=50000 direct=true columnarrayrows=100 bindsize=500000
streamsize=500000 silent=(HEADER,FEEDBACK) data=data.csv control=raw_table.ctl
log=logfile.log bad=badfile.log discard=discard.log
The process returns 1, and the log, bad, and discard files are not created.
stdout.txt contains
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `sqlldr userid=usr/pass rows=1000 readsize=50000 direct=true columnarrayrows=100
bindsize=500000 streamsize=500000 silent=(HEADER,FEEDBACK) data=data.csv control=raw_table.ctl
log=logfile.log bad=badfile.log discard=discard.log'
data.csv contains
id~name~createdby~createddate~modifiedby~modifieddate
6~mark~margaret~"19-OCT-16 01.03.23.966000 PM"~kyle~"21-OCT-16 03.11.22.256000 PM"
8~jill~margaret~"27-AUG-16 12.10.12.214000 PM"~kyle~"21-OCT-16 04.16.01.171000 PM"
raw_table.ctl
OPTIONS ( SKIP=1)
LOAD DATA
CHARACTERSET UTF8
INTO TABLE RAW_TABLE
FIELDS TERMINATED BY '~' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS
(ID,
NAME,
CREATED_BY,
CREATED_DATETIME TIMESTAMP,
MODIFIED_BY,
MODIFIED_DATETIME TIMESTAMP)
The error was caused by the silent parameter: with shell=True, the unquoted parentheses in silent=(HEADER,FEEDBACK) are parsed by the shell as special tokens, which produced the syntax error above. Wrapping the value in single quotes allowed the code to work: silent='(HEADER,FEEDBACK)'
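For reference, here is a minimal sketch of two ways around that, using a trimmed-down set of the parameters from the question: either quote the value so the shell stops parsing the parentheses, or drop shell=True and pass an argument list so there is no shell parsing at all.

from subprocess import call

# Option 1: keep shell=True but single-quote the value so the shell
# doesn't treat the parentheses as a subshell token
cmd = "sqlldr userid=usr/pass direct=true silent='(HEADER,FEEDBACK)' data=data.csv control=raw_table.ctl"
call(cmd, shell=True)

# Option 2: skip the shell entirely; each list item reaches sqlldr verbatim,
# so the parentheses need no quoting at all
args = ['sqlldr', 'userid=usr/pass', 'direct=true', 'silent=(HEADER,FEEDBACK)',
        'data=data.csv', 'control=raw_table.ctl']
call(args)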
Related
Let's say I have this snippet
list_command = 'mongo --host {host} --port {port} ' \
'--username {username} --password {password} --authenticationDatabase {database} < {path}'
def shell_exec(cmd: str):
    import subprocess
    p = subprocess.call(cmd, shell=True)
    return p
Let's say these are the commands I'm trying to run on mongo
use users
show collections
db.base.find().pretty()
If I format the string list_command with the appropriate values and pass it to the function with shell=True, it works fine. But I'm trying to avoid shell=True for security purposes.
If I call it with shell=False, I get the following error:
2020-08-31T14:08:49.291+0100 E QUERY [thread1] SyntaxError: missing ; before statement #./mongo/user-01-09-2020:1:4
failed to load: ./mongo/user-01-09-2020
The call returns 253.
Your list_command is a shell command: in particular, it includes input redirection (via < {path}), which is a syntactic feature of the shell. To use it you need shell=True.
If you don’t want to use shell=True, you need to change the way you construct the argument (separate arguments need to be passed as separate items of a list rather than as a single string), and you need to pass the script into the standard input via an explicit pipe, by setting its input parameter:
cmd = ['mongo', '--host', '{host}', '--port', …]
subprocess.run(cmd, input=mongodb_script)
Using input raised the following error: TypeError: __init__() got an unexpected keyword argument 'input'.
I ended up doing the following:
import subprocess
def shell_exec(cmd: str, stdin=None):
    with open(stdin, 'rb') as f:
        return subprocess.call(cmd.split(), stdin=f)
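For comparison, a minimal sketch of the answer's suggestion with subprocess.run, reusing the script path from the error output above; the host, port, and credential values are hypothetical placeholders.

import subprocess

# Hypothetical connection values, for illustration only
args = ['mongo', '--host', 'localhost', '--port', '27017',
        '--username', 'user', '--password', 'secret',
        '--authenticationDatabase', 'users']

# Variant 1: read the script and pipe it in via input (Python 3.5+)
with open('./mongo/user-01-09-2020') as f:
    script = f.read()
subprocess.run(args, input=script, universal_newlines=True)  # str input needs text mode

# Variant 2: attach the file directly as the child's stdin
with open('./mongo/user-01-09-2020', 'rb') as f:
    subprocess.run(args, stdin=f)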
I have a C++ command line executable (not made by me) that takes some parameters, connects to a server, downloads data from a database, and outputs it line by line to the command line. It can also be passed a -o myfile.txt parameter to output the lines to a file instead.
One particular usage returns ~13million lines of data.
Instead of calling this and outputting to a file I call it with the python subprocess module in order to process the output lines and insert into a local sqlite3 table I make, as this suits my needs much better than a text file.
When I call the command line tool outside of Python with -o to output to a file, it takes 37 minutes, due to the large volume of data and the fact that it has to be downloaded over the network from the server.
When I call it in Python to process the lines into SQL, it takes ~90 minutes, almost 2x as long.
I imagine my code has a bottleneck, then: it can't process the incoming stdout lines as fast as they arrive, or there wouldn't be such a delay.
Is there an obvious issue in my code or how can I troubleshoot to find where my bottleneck is?
My code is here: https://pastebin.com/9DgmPdzj
The InitialiseTable() function and InitialConfigDone variable are used to create the table I need for storing the data, based on the first line of content. I check whether InitialiseTable() has already been run; if it has, the loop just inserts the data. So InitialiseTable() runs for the first line and is never run again.
import shlex
import subprocess

def CaptureIntoSQL(host, table, InitialConfigDone=False):
    """Starts AdminDataCapture and redirects output to a table in a local SQL database"""
    command = "./MyCommandLineTool_x86-64_rhel6_gcc48_cxx11-vstring_mds -u {0} -p {1} -h {2} -t {3} --static -c".format(username, password, host, table)
    table_name = "{0}_Table{1}_{2}".format(host, table, GetDate())
    process = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE)
    while True:
        cmdoutput = process.stdout.readline().decode("utf-8")
        if cmdoutput == '' and process.poll() is not None:
            break
        if cmdoutput:
            data = cmdoutput.split("|")  # split the output line into a list
            try:
                Symbol = data[5]
                if Symbol != "<Symbol>" and InitialConfigDone == False:
                    # Run InitialiseTable() to create the sqlite3 table if it hasn't
                    # been done; once it's done once we don't want to do this again
                    try:
                        InitialiseTable(table_name, data)
                        InitialConfigDone = True
                        logger.info("Table created for {0}".format(table_name))
                    except:
                        logger.info("InitialiseTable(table_name,data) failed:\n table_name: {0} \n data: {1}".format(table_name, data))
                elif Symbol != "<Symbol>" and InitialConfigDone == True:
                    # If we've already run InitialiseTable(), begin inserting data
                    Permission = data[3]
                    exchangeCode = data[5].split(".")[-1]
                    if exchangeCode == "":
                        exchangeCode = "."
                    valuelist = data[7::2]
                    valuelist.insert(0, Permission)
                    valuelist.insert(0, exchangeCode)
                    valuelist.insert(0, Symbol)
                    # vtuple is a "(?,?,...)" placeholder string sized to valuelist
                    vtuple = "({0})".format(",".join("?" * len(valuelist)))
                    InsertQuery = "INSERT INTO '{0}' VALUES{1}".format(table_name, vtuple)  # my query to insert into my sqlite table
                    try:
                        c.execute(InsertQuery, valuelist)
                    except:
                        pass
            except:
                pass
    conn.commit()
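Given the reasoning above, one likely culprit is the per-line INSERT: sqlite3 can take batches via cursor.executemany, which cuts per-row Python overhead. A rough sketch of that pattern, where build_valuelist is a hypothetical helper standing in for the Symbol/Permission/exchangeCode handling shown above:

BATCH_SIZE = 10000  # rows per executemany() call; tune as needed

def capture_batched(process, conn, table_name, n_cols):
    insert_sql = "INSERT INTO '{0}' VALUES ({1})".format(table_name, ",".join("?" * n_cols))
    cur = conn.cursor()
    batch = []
    for raw in process.stdout:  # iterating the pipe avoids the readline()/poll() loop
        data = raw.decode("utf-8").split("|")
        batch.append(build_valuelist(data))  # hypothetical: same slicing as above
        if len(batch) >= BATCH_SIZE:
            cur.executemany(insert_sql, batch)  # one statement for many rows
            batch = []
    if batch:
        cur.executemany(insert_sql, batch)
    conn.commit()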
I've been trying to run a Java program and capture its STDOUT output to a file from a Python script. The idea is to run test files through my program and check whether the output matches the answers.
Per this and this SO question, using subprocess.call is the way to go. In the code below, I am doing subprocess.call(command, stdout=f) where f is the file I opened.
The resulting file is empty and I can't quite understand why.
import glob
import subprocess

test_path = '/path/to/my/testfiles/'
class_path = '/path/to/classfiles/'
jar_path = '/path/to/external_jar/'
test_pattern = 'test_case*'
temp_file = 'res'

tests = glob.glob(test_path + test_pattern)  # find all test files
for i, tc in enumerate(tests):
    with open(test_path + temp_file, 'w') as f:
        # cd into directory where the class files are and run the program
        command = 'cd {p} ; java -cp {cp} package.MyProgram {tc_p}' \
            .format(p=class_path,
                    cp=jar_path,
                    tc_p=test_path + tc)
        # execute the command and direct all STDOUT to file
        subprocess.call(command.split(), stdout=f, stderr=subprocess.STDOUT)
        # diff is just a lambda func that uses os.system('diff')
        exec_code = diff(answers[i], test_path + temp_file)
        if exec_code == BAD:
            scream(':(')
I checked the docs for subprocess and they recommend using subprocess.run (added in Python 3.5). The run method returns an instance of CompletedProcess, which has a stdout field. I inspected it, and stdout was an empty string. This explained why the file f I tried to create was empty.
Even though the exit code from subprocess.call was 0 (success), it didn't mean that my Java program had actually been executed. I ended up fixing this bug by breaking command down into two parts.
If you notice, I initially tried to cd into the correct directory and then execute the Java program, all in one command. I ended up removing cd from command and calling os.chdir(class_path) instead. The command now contained only the string to run the Java program. This did the trick.
So, the code looked like this:
import os
import subprocess

good_code = 0
# Assume the same variables defined as in the original question
os.chdir(class_path)  # get into the class files directory first
for i, tc in enumerate(tests):
    with open(test_path + temp_file, 'w') as f:
        # run the program
        command = 'java -cp {cp} package.MyProgram {tc_p}' \
            .format(cp=jar_path,
                    tc_p=test_path + tc)
        # runs the command and redirects it into the file f;
        # stores the instance of CompletedProcess
        out = subprocess.run(command.split(), stdout=f)
        # you can access useful info now
        assert out.returncode == good_code
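A side note on the design choice: subprocess.run also accepts a cwd argument, so the directory change can be scoped to the child process instead of the whole script. A minimal sketch, assuming the same variables as above:

import subprocess

# cwd= changes the working directory for the child only;
# the Python script's own working directory is untouched
out = subprocess.run(
    ['java', '-cp', jar_path, 'package.MyProgram', test_path + tc],
    stdout=f,
    cwd=class_path,
)
assert out.returncode == good_code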
I'm trying to write a simplejson.dumps string to a text file using the Linux Bash shell's echo. My problem is that it removes all the double quotes.
Example code:
d = {"key": "value"}
"echo %s > /home/user/%s" % (simplejson.dumps(d), 'textfile')
Output in textfile
{key: value}
It removes all the double quotes, so I can't load it as JSON because it isn't valid JSON anymore.
Thanks
You need to escape quotes for Bash usage:
("echo %s > /home/user/%s" % (simplejson.dumps(d), 'textfile')).replace('"', '\\"')
Since you said you're using paramiko, writing to the file directly is perfect. Edited code to reflect paramiko:
You can write to the file directly after logging onto the server, no need to pass in the bash command (which is hackish as is).
You will need two try/except blocks: one to catch any error opening the file, the other to catch any error writing to it. If you'd prefer the exception to propagate in either case, remove the corresponding try/except.
import paramiko

# ...do your ssh stuff to establish an SSH session to the server...

sftp = ssh.open_sftp()
try:
    file = sftp.file('/home/user/textfile', 'a+')
    try:
        file.write(simplejson.dumps(d))
    except IOError:
        ...  # do some error handling for the write here
except IOError:
    ...  # do some error handling for being unable to open the file here
else:
    file.close()
sftp.close()
Is there a way to execute a SQL script file using cx_Oracle in Python?
I need to execute my CREATE TABLE scripts in .sql files.
PEP 249, which cx_Oracle aims to be compliant with, doesn't really define a method like that.
However, the process should be pretty straightforward. Pull the contents of the file into a string, split it on the ";" character, and then call .execute on each member of the resulting list. I'm assuming that the ";" character is only used to delimit the Oracle SQL statements within the file.
f = open('tabledefinition.sql')
full_sql = f.read()
sql_commands = full_sql.split(';')
for sql_command in sql_commands:
    if sql_command.strip():  # skip the empty string left after the final ';'
        curs.execute(sql_command)
Another option is to use SQL*Plus (Oracle's command line tool) to run the script. You can call this from Python using the subprocess module - there's a good walkthrough here: http://moizmuhammad.wordpress.com/2012/01/31/run-oracle-commands-from-python-via-sql-plus/.
For a script like tables.sql (note the deliberate error):
CREATE TABLE foo ( x INT );
CREATE TABLER bar ( y INT );
You can use a function like the following:
from subprocess import Popen, PIPE

def run_sql_script(connstr, filename):
    sqlplus = Popen(['sqlplus', '-S', connstr], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    sqlplus.stdin.write('@' + filename)
    return sqlplus.communicate()

connstr is the same connection string used for cx_Oracle. filename is the full path to the script (e.g. 'C:\temp\tables.sql'). The function opens a SQL*Plus session (with '-S' to suppress its welcome message), then queues "@filename" to send to it; this tells SQL*Plus to run the script.
sqlplus.communicate sends the command to stdin, waits for the SQL*Plus session to terminate, then returns (stdout, stderr) as a tuple. Calling this function with tables.sql above will give the following output:
>>> output, error = run_sql_script(connstr, r'C:\temp\tables.sql')
>>> print output
Table created.
CREATE TABLER bar (
*
ERROR at line 1:
ORA-00901: invalid CREATE command
>>> print error
This will take a little parsing, depending on what you want to return to the rest of your program - you could show the whole output to the user if it's interactive, or scan for the word "ERROR" if you just want to check whether it ran OK.
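For instance, a quick check along those lines, reusing the run_sql_script function above:

output, error = run_sql_script(connstr, r'C:\temp\tables.sql')
if 'ERROR' in output:
    # surface the SQL*Plus diagnostics to the caller
    raise RuntimeError('SQL script failed:\n' + output)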
In the cx_Oracle library's test suite you can find a method used to load scripts: run_sql_script.
I modified this method in my project like this:
def run_sql_script(self, connection, script_path):
    cursor = connection.cursor()
    statement_parts = []
    for line in open(script_path):
        if line.strip() == "/":
            statement = "".join(statement_parts).strip()
            if not statement.upper().startswith('CREATE PACKAGE'):
                statement = statement[:-1]  # drop the trailing ';' for plain SQL statements
            if statement:
                try:
                    cursor.execute(statement)
                except Exception as e:
                    print("Failed to execute SQL:", statement)
                    print("Error:", str(e))
            statement_parts = []
        else:
            statement_parts.append(line)
The statements in the script file must be separated by "/".
I hope it can be of help.