`Argument list too long: '/bin/sh'` - python

I'm trying to invoke the tar command via a subprocess call from Python. The challenge is that a lot of files get passed to tar, which causes the command to fail with the error Argument list too long: '/bin/sh'
The command I'm running is below:
subprocess.call(f"ulimit -s 99999999; tar -cz -f {output_file} {file_list}", cwd=source_dir, shell=True)
To try to overcome the error, I added ulimit, which doesn't seem to help. The OS I am running this on is Ubuntu 20.04 and the Python version is 3.8.
Could I please get help solving this problem?

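ulimit does nothing to lift the limit you are bumping into here. With shell=True, the whole command is handed to /bin/sh as a single argument, and the error is raised when that process is started, before the ulimit inside the command ever runs. The kernel's limits on argument size (ARG_MAX, plus a fixed cap on the length of any single argument on Linux) are what matter, and the per-argument cap can typically only be increased by recompiling your kernel.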
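To get a feel for the numbers, the following sketch queries the total argument budget and checks whether a command string would fit into a single argument. It assumes Linux; the per-argument cap of 131072 bytes (32 pages of 4 KiB) is a property of typical Linux kernels rather than something Python reports, and the helper name fits_in_one_argument is hypothetical.

```python
import os

# Assumption: typical Linux kernels cap a single argument at 32 pages
# (MAX_ARG_STRLEN); with 4 KiB pages that is 131072 bytes.
PER_ARGUMENT_CAP = 32 * 4096


def fits_in_one_argument(command: str) -> bool:
    """Return True if `command` is small enough to be passed as the
    single string that shell=True hands to /bin/sh -c."""
    # The terminating NUL byte counts towards the cap as well.
    return len(os.fsencode(command)) + 1 <= PER_ARGUMENT_CAP


if __name__ == "__main__":
    # Total budget for all arguments plus the environment.
    print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))
    print(fits_in_one_argument("tar -cz -f out.tgz " + "file.txt " * 50_000))
```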
If your tar supports --files-from -, use that.
subprocess.run(
    ['tar', '-cz', '-f', output_file, '--files-from', '-'],
    input='\n'.join(file_list), text=True, check=True, cwd=source_dir)
I obviously made assumptions about the contents of file_list: it needs to be a list of file names, and this will break if any file name contains a newline character. Notice also how I avoid shell=True by passing in the command as a list of strings.
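If file names may contain newlines, GNU tar can read a NUL-separated list instead, provided --null comes before --files-from. This is a minimal sketch, assuming GNU tar and that the names are relative to source_dir; the function names nul_separated and create_archive are hypothetical.

```python
import os
import subprocess


def nul_separated(names):
    """Encode file names as a NUL-terminated byte string for `tar --null`."""
    return b"".join(os.fsencode(name) + b"\0" for name in names)


def create_archive(output_file, names, source_dir):
    """Create a gzipped tar archive, sending the names on tar's stdin."""
    subprocess.run(
        # --null must precede --files-from to change how that list is read.
        ["tar", "-cz", "-f", output_file, "--null", "--files-from", "-"],
        input=nul_separated(names),
        cwd=source_dir,
        check=True,
    )
```

Because the list travels over standard input, its size is not subject to the argument-length limit.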
Of course, a much better solution for this use case is to use the Python tarfile module to create the tar file; this entirely avoids the need to transmit the list of file names across a process boundary.
import tarfile
with tarfile.open(output_file, "x:gz") as tar:
    for name in file_list:
        tar.add(name)
The "x:gz" mode of creation triggers an exception if the file already exists (use "w:gz" to simply overwrite).
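The subprocess version ran tar with cwd=source_dir, so the names were resolved relative to that directory. A sketch of the same behavior with tarfile could look like this, assuming file_list is a list of relative paths; the helper name make_tarball is hypothetical.

```python
import os
import tarfile


def make_tarball(output_file, file_list, source_dir):
    """Write a gzipped tar of `file_list`, resolving names in `source_dir`."""
    with tarfile.open(output_file, "w:gz") as tar:
        for name in file_list:
            # Read from source_dir, but store the relative name in the archive.
            tar.add(os.path.join(source_dir, name), arcname=name)
```

Unlike the tar command with cwd=source_dir, a relative output_file here is resolved against the current working directory of the Python process.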

Related

Python os.system or subprocess calls for command line automation

I would like to be able to call some executables that take in parameters and then dump the output to a file. I've attempted to use both os.system and subprocess calls to no avail. Here is a sample of what I'd like python to execute for me...
c:\directory\executable_program.exe -f w:\directory\input_file.txt > Z\directory\output_file.txt
Notice the absolute paths as I will be traversing hundreds of various directories to act on files etc..
Many thanks ahead of time!
Some examples that I've tried:
subprocess.run(['c:\directory\executable_program.exe -f w:\directory\input_file.txt > Z\directory\output_file.txt']
subprocess.call(r'"c:\directory\executable_program.exe -f w:\directory\input_file.txt > Z\directory\output_file.txt"']
subprocess.call(r'"c:\directory\executable_program.exe" -f "w:\directory\input_file.txt > Z\directory\output_file.txt"']
Your attempts contain various quoting errors.
subprocess.run(r'c:\directory\executable_program.exe -f w:\directory\input_file.txt > Z\directory\output_file.txt', shell=True)
should work, where the r prefix protects the backslashes from being interpreted and removed by Python before the subprocess runs, and the absence of [...] around the value passes it verbatim to the shell (hence, shell=True).
On Windows, you can in some circumstances get away with wrapping the whole command string in square brackets (a one-element list rather than a proper list of arguments) and omitting shell=True.
If you wanted to avoid the shell, try
with open(r'Z\directory\output_file.txt', 'wb') as dest:
    subprocess.run(
        [r'c:\directory\executable_program.exe', '-f', r'w:\directory\input_file.txt'],
        stdout=dest)
which also illustrates how to properly pass a list of strings in square brackets as the first argument to subprocess.run.

Linux bash executables behave differently depending on whether I am activating from the command line or with os.system('')

I've been attempting to execute a certain CLI from within python and store the output for later use within the same script. I suspect this question has a simple answer, but if one wishes to go through the entire pipeline, here is the tool in question.
wget http://rna.urmc.rochester.edu/Releases/current/RNAstructureForLinux.tgz
tar xvf as usual, go inside the resulting directory and execute 'make all', the executables I use in the bash script are within the 'exe' directory.
I attempted to execute the commands with os.system(), but with little luck. The CLI I am using does, however, seem to be running. The function which executes the os.system() commands contains the following block.
txt = open('home/spectre/tools/RNAstructure/exe/RNAStructure_nucleic_acid.txt',"w")
txt.write('AAGGCTGTCCAGGCGCAATGTGGTGGCTGCTTCTCTGGGGAGTCCTCCAGGCTTGCCCAACCCGGGGCTCCGTCCTCTTGGCCCAAGAGCTACCCCAGCAGCTGACATCCCCCGGGTACCCAGAGCCGTATGGCAAAGGCCAAGAGAGCAGCACGGACATCAAGGCTCCAGAGGGCTTTGCTGTGAGGCTCGTCTTCCAGGACTTCGACCTGGAGCCGTCCCAGGACTGTGCAGGGGACTCTGTCACAGTGAGCTGGGGATGGGGGGGGTCCCGCCAGGACTGTGGCCAGGGAGATTCCCGGGGTTGTGGGAAGTGGCGGTGCCCTGAATCCCCCATCTGGAGGAGGGATGAAT')
os.system(' cd ~/tools/RNAstructure/exe ; ./python_RNA_structure.sh')
nucleotides, structure, MFE = RNAStructure_from_file('home/spectre/tools/RNAstructure/exe/RNAStructure_bracket_output.txt')
The executable *.sh file contains this.
#!/bin/bash
cd ~/tools/RNAstructure/exe
./Fold RNAStructure_nucleic_acid.txt RNAStructure_nucleic_acid_output.txt
./ct2dot RNAStructure_nucleic_acid_output.txt -1 RNAStructure_bracket_output.txt
If I execute the bash script from the command line the output should look a little like this
Initializing nucleic acids...
Using auto-detected DATAPATH: "../data_tables" (set DATAPATH to avoid this warning).
done.
98% [==================================================] \ done.
Writing output ct file...done.
Single strand folding complete.
Converting CT file...
Using auto-detected DATAPATH: "../data_tables" (set DATAPATH to avoid this warning).
CT file conversion complete.
If I execute the bash script from the Python file, I get this instead:
Initializing nucleic acids...
Using auto-detected DATAPATH: "../data_tables" (set DATAPATH to avoid this warning).
Error reading sequence. The file did not contain any nucleotides.
Single strand folding complete with errors.
Converting CT file...
Using auto-detected DATAPATH: "../data_tables" (set DATAPATH to avoid this warning).
CT file conversion complete.
It looks an awful lot like my CLI can find the files it needs inside the terminal, but not outside of it. I haven't experimented with things like absolute paths yet. My understanding was that os.system() lets me execute a bash script, so it is not clear to me why it changes how that script behaves.
What I've done to resolve the problem:
Reopening the file within the Python script seems to resolve the problem, but I am still working out why:
txt = open('home/spectre/tools/RNAstructure/exe/RNAStructure_nucleic_acid.txt',"w")
txt.write('AAGGCTGTCCAGGCGCAATGTGGTGGCTGCTTCTCTGGGGAGTCCTCCAGGCTTGCCCAACCCGGGGCTCCGTCCTCTTGGCCCAAGAGCTACCCCAGCAGCTGACATCCCCCGGGTACCCAGAGCCGTATGGCAAAGGCCAAGAGAGCAGCACGGACATCAAGGCTCCAGAGGGCTTTGCTGTGAGGCTCGTCTTCCAGGACTTCGACCTGGAGCCGTCCCAGGACTGTGCAGGGGACTCTGTCACAGTGAGCTGGGGATGGGGGGGGTCCCGCCAGGACTGTGGCCAGGGAGATTCCCGGGGTTGTGGGAAGTGGCGGTGCCCTGAATCCCCCATCTGGAGGAGGGATGAAT')
txt = open('home/spectre/tools/RNAstructure/exe/RNAStructure_nucleic_acid.txt')
os.system(' cd ~/tools/RNAstructure/exe ; ./python_RNA_structure.sh')
nucleotides, structure, MFE = RNAStructure_from_file('home/spectre/tools/RNAstructure/exe/RNAStructure_bracket_output.txt')
I am not sure why this resolves the problem; I found this solution serendipitously. I'll update the answer when I figure out why, unless someone wants to beat me to it. It's magic to me for now.
It seems that after opening the file RNAStructure_nucleic_acid.txt and assigning it to the txt variable for writing, I need to reopen it once writing is complete. Otherwise the file is blank when I try printing its contents within the program, although after the program finishes executing, the file contains the correct text.
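A likely explanation, assuming CPython's default buffered file I/O: write() only fills an in-memory buffer, and the data reaches the disk when the file is flushed or closed. Rebinding txt to a new open() call drops the last reference to the first file object, which closes it as a side effect. Closing the file explicitly before running the script is the more direct fix. The sketch below uses a hypothetical helper and a temporary path rather than the real RNAstructure directory.

```python
import os
import tempfile


def write_sequence(path, sequence):
    """Write `sequence` to `path` and close the file before returning."""
    # Leaving the with-block flushes and closes the file, so an external
    # program started afterwards sees the full contents.
    with open(path, "w") as handle:
        handle.write(sequence)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "sequence.txt")
    write_sequence(target, "AAGGCTGTCC")
    # Size on disk once the file has been closed.
    print(os.path.getsize(target))
```

With the file closed this way, the os.system() call that follows should find the sequence on disk without any reopening.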

Python script doesn't delete file from archive -- printed command via terminal works fine

I'm creating an archive in Python using this code:
#Creates archive using string like [proxy_16-08-15_08.57.07.tar]
proxyArchiveLabel = 'proxy_%s' % EXECUTION_START_TIME + '.tar'
log.info('Packaging %s ...' % proxyArchiveLabel)
#Removes .tar from label during creation
shutil.make_archive(proxyArchiveLabel.rsplit('.',1)[0], 'tar', verbose=True)
So this creates an archive fine in the local directory. The problem is that there's a specific directory in my archive I want to remove, due to its size and lack of necessity for this task.
ExecWithLogging('tar -vf %s --delete ./roles/jobs/*' % proxyArchiveLabel)
# ------------
def ExecWithLogging(cmd):
    print cmd
    p = subprocess.Popen(cmd.split(' '), env=os.environ, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while(True):
        log.info(p.stdout.readline().strip())
        if(p.poll() is not None):
            break
However, this seems to do basically nothing. The size remains the same. If I print cmd inside ExecWithLogging and copy/paste that command into a terminal in the working directory of the script, it works fine. Just to be sure, I also tried hard-coding the full path to where the archive is created as part of the tar -vf %s --delete command, but still nothing seemed to happen.
I do get this output in my INFO log: tar: Pattern matching characters used in file names, so I'm kind of thinking Popen is interpreting my command incorrectly somehow... (or rather, I'm more likely passing in something incorrectly).
Am I doing something wrong? What else can I try?
You may have to use the --wildcards option in the tar command, which enables pattern matching. This may well be what the message in your log is telling you, albeit somewhat cryptically.
Edit:
In answer to your question of why: I suspect that in the terminal the shell performs the wildcard expansion, whereas a command passed to Popen as a list is not processed by a shell at all. The --wildcards option makes tar perform the pattern matching itself.
For a more detailed explanation see here:
Tar and wildcards
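The difference can be seen without tar. In this sketch a child Python process simply prints its first argument; when the command is passed as a list without shell=True, the pattern arrives unexpanded. The helper name argument_as_received is hypothetical.

```python
import subprocess
import sys


def argument_as_received(arg):
    """Return `arg` exactly as a child process receives it (no shell involved)."""
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", arg]
    result = subprocess.run(
        child, stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Prints the literal pattern, not a list of matching files.
    print(argument_as_received("./roles/jobs/*"))
```

Assuming GNU tar, the argument list for the delete would therefore need to include --wildcards itself, along the lines of ['tar', '-vf', proxyArchiveLabel, '--delete', '--wildcards', './roles/jobs/*'].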

python: How does subprocess.check_output create its calls?

I'm trying to read the duration of video files using mediainfo. This shell command works
mediainfo --Inform="Video;%Duration/String3%" file
and produces an output like
00:00:33.600
But when I try to run it in python with this line
subprocess.check_output(['mediainfo', '--Inform="Video;%Duration/String3%"', file])
the whole --Inform thing is ignored and I get the full mediainfo output instead.
Is there a way to see the command constructed by subprocess to see what's wrong?
Or can anybody just tell what's wrong?
Try:
subprocess.check_output(['mediainfo', '--Inform=Video;%Duration/String3%', file])
The " in your python string are likely passed on to mediainfo, which can't parse them and will ignore the option.
These kinds of problems are often caused by the shell requiring and then swallowing various special characters. Quotes such as " are removed by bash as part of its normal command-line parsing. Python, in contrast, attaches no special meaning to them inside a list of arguments and passes them along exactly as you wrote them. You only needed them in the first place because bash would otherwise treat the ; as a command separator.
For example, in bash I can do
$ dd of="foobar"
and it will write to a file named foobar, swallowing the quotes.
In python, if I do
subprocess.check_output(["dd", 'of="barfoo"', 'if=foobar'])
it will write to a file named "barfoo", keeping the quotes.
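The standard shlex module can show what a POSIX shell would do to a command line, and can also render an argument list as an equivalent shell command for inspection. A small sketch follows; video.mp4 is a placeholder file name.

```python
import shlex

# What the shell passes to mediainfo: the double quotes are removed.
shell_line = 'mediainfo --Inform="Video;%Duration/String3%" video.mp4'
argv = shlex.split(shell_line)
print(argv)

# The reverse direction (Python 3.8+): render a list as a shell command,
# which is handy for logging what subprocess is about to run.
print(shlex.join(argv))
```

Passing the list produced by shlex.split to subprocess.check_output gives mediainfo the same arguments it would receive from the shell.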

os.system: saving shell variables with multiple commands in one method

I am having a problem using my command/commands with one instance of os.system.
Unfortunately I have to use os.system as I have no control over this, as I send the string to the os.system method. I know I should really use subprocess module for my case, but that ain't an option.
So here is what I am trying to do.
I have a string like below:
cmd = "export BASE_PATH=`pwd`; export fileList=`python OutputString.py`; ./myscript --files ${fileList}; cp outputfile $BASE_PATH/.;"
This command then gets sent to the os.system module like so
os.system(cmd)
unfortunately when I consult my log file I get something that looks like this
os.system(r"""export BASE_PATH=/tmp/bla/bla; export fileList=; ./myscript --files ; cp outputfile /.;""")
As you can see, BASE_PATH seems to be set, but when it is used in cp outputfile $BASE_PATH/. it expands to an empty string, leaving cp outputfile /.
fileList is also empty, even though fileList=`python OutputString.py` should store the printed file list in this variable.
My thoughts:
Are these bugs due to a new process being started for each command, so that I lose the value of BASE_PATH in the next command?
Also, I am not sure why fileList is empty.
Is there a solution to my above problem using os.system and my command string?
Please Note I have to use os.system module. This is out of my control.
