Python: subprocess.Popen variable behavior

I want to read all files in a directory and pass them via command line to another program.
The following is part of my code (for one file here) which does not seem to work, and I don't really understand why.
My code (with a bit of debug print):
# -*- coding: iso-8859-15 -*-
# Python 3
avidemux_dir = "C:\\Program Files (x86)\\Avi Demux\\avidemux.exe"
start_dir = "F:\\aaa" # without ending backslash!
extension = ".mpg"
import os
import subprocess
for dirpath, dirnames, filenames in os.walk(start_dir):
    if filenames:
        first_file = os.path.join(dirpath, filenames[0])
        test2 = "--load " + first_file
        print(dirpath)     # results in: F:\aaa\av01
        print(first_file)  # results in: F:\aaa\av01\av01.mpg
        print(test2)       # results in: --load F:\aaa\av01\av01.mpg
        p1 = subprocess.Popen([avidemux_dir, "--load", first_file])
        p2 = subprocess.Popen([avidemux_dir, test2])
For this example, avidemux will work (load the correct file) only for p1. p2 does not work.
Why is that?
The commandline example that works in .bat:
avidemux.exe --load F:\aaa\av01\av01.mpg
I really would like to have it all in one string like in p2, because I join a larger list of files into one big string with the correct variables for avidemux.

shlex is one approach, but from the file paths it's obvious you're running on Windows, and shlex assumes conventions used in Unix-ish shells. Those conventions can get you in trouble on Windows.
As the docs say, the underlying Windows API call takes a single string as an argument, so on Windows you're generally much better off passing a single string to Popen().
Oops! I see you've already discovered that. But at least now you know why ;-)
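To see how shlex's Unix rules bite on Windows, compare POSIX and non-POSIX splitting of a backslashed path (a quick illustrative check; the path is made up):

```python
import shlex

win_cmd = r"--load F:\aaa\av01\av01.mpg"

# POSIX rules treat each backslash as an escape character and swallow it:
print(shlex.split(win_cmd))               # ['--load', 'F:aaaav01av01.mpg']

# posix=False keeps the backslashes intact:
print(shlex.split(win_cmd, posix=False))  # ['--load', 'F:\\aaa\\av01\\av01.mpg']
```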

Use shlex.split to break the string into separate arguments:
import shlex
p2 = subprocess.Popen([avidemux_dir] + shlex.split(test2))
See the docs about the args argument of Popen.

Ah, you're passing a string of two arguments there. You need to split it, if necessary using shlex.split:
p2 = subprocess.Popen([avidemux_dir, *shlex.split(test2)])
Or just pass a string:
p2 = subprocess.Popen(avidemux_dir + ' ' + test2, shell=True)

Just stumbled across the solution: don't use a list when doing something like that.
Solution:
test2 = avidemux_dir + " --load " + first_file
and
p2 = subprocess.Popen(test2) # no more list but the pure string.
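If the goal is one big string for many files, subprocess.list2cmdline can assemble it using the quoting rules Windows' CreateProcess expects (a sketch; the program path and file list are made up):

```python
import subprocess

# hypothetical program and file list for illustration
avidemux = r"C:\Program Files (x86)\Avi Demux\avidemux.exe"
files = [r"F:\aaa\av01\av01.mpg", r"F:\aaa\av02\av02.mpg"]

parts = [avidemux]
for f in files:
    parts += ["--load", f]

# list2cmdline quotes any argument that contains spaces
cmd = subprocess.list2cmdline(parts)
print(cmd)
# "C:\Program Files (x86)\Avi Demux\avidemux.exe" --load F:\aaa\av01\av01.mpg --load F:\aaa\av02\av02.mpg
```

On Windows, subprocess.Popen(cmd) would then receive the single string, exactly as in the solution above.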

Related

Need to pass argument from sys.argv[1] as input to glob() in python

I have a situation where I need to take an argument from the command line and use that string (an expression) to print files based on it.
I want to use glob to parse my string, since I can pass it a wildcard pattern as a filter.
excerpt from python file:
dated = sys.argv[1]
files = glob.glob(dated)
This gives me an empty list:
> python analysis.py <some_expression>
[]
However, if I give any value manually:
dated = '*.xlsx' # example sake
files = glob.glob(dated)
print(files)
it prints:
[<list of files conforming to the required filter>]
Obviously the hard-coded pattern prints the files above, but I want the CLI argument to work the same way.
I tested manually that the arguments actually arrive, and they do, so sys.argv[1] is working, but the result is not getting parsed by glob.glob().
any ideas if I am missing something somewhere?
The issue here is not in Python, but in the shell that invokes it. Most shells I know (definitely all Linux shells) perform glob expansion before passing arguments to the executable they spawn (your Python script, in this case). This means that, at most, sys.argv[1] would contain the first file matching the glob expression you pass, and in any case, applying glob to it would not do any good.
For example, if your working directory has files a.xlsx, b.xlsx and c.xlsx, and you invoke your code using:
python mycode.py *.xlsx
Then the shell will actually glob the argument you specified, and will pass the results to your script, making the following true:
sys.argv[1:] == ['a.xlsx', 'b.xlsx', 'c.xlsx']
In fact, instead of explicitly invoking glob, you can simply iterate on sys.argv[1:].
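A sketch of that idea which covers both cases, the shell having expanded the pattern already and a quoted pattern arriving verbatim (the helper name is made up):

```python
import glob
import sys

def collect_files(argv):
    """Expand any pattern the shell left untouched; pass plain names through."""
    files = []
    for arg in argv[1:]:
        matches = glob.glob(arg)
        files.extend(matches if matches else [arg])
    return files

if __name__ == '__main__':
    print(collect_files(sys.argv))
```

Invoked as python analysis.py '*.xlsx' (quoted), the script does the globbing itself; invoked unquoted, it simply collects the names the shell already expanded.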

Python foo > bar(input file, output file)

It's probably a very basic question but I couldn't find an answer. Right now I have something like:
import sys
inFile = sys.argv[1]
outFile = sys.argv[2]
with open(inFile, 'r+') as input, open(outFile, 'w+') as out:
    # do something
I can run it with ./modulname foo bar (working). How can I change it so it will work with ./modulname foo > bar? Right now it gives me the following error:
./pagereport.py today.log > sample.txt
Traceback (most recent call last):
File "./pagereport.py", line 7, in <module>
outFile = sys.argv[2]
IndexError: list index out of range
You could skip the second open (out) and instead use sys.stdout to write to.
If you want to be able to use both ways of calling it, argparse has a comfortable way of doing that: add_argument with type=argparse.FileType('w') and sys.stdout as the default.
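A minimal sketch of that argparse approach (the argument names are illustrative):

```python
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('infile', type=argparse.FileType('r'))
# optional second argument; defaults to stdout so "./modulname foo > bar" works
parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'),
                    default=sys.stdout)
```

With args = parser.parse_args(), calling ./modulname foo bar opens bar for writing, while ./modulname foo > bar leaves args.outfile as sys.stdout and lets the shell do the redirection.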
When you do:
./modulname foo > bar
> is acted upon by shell, and duplicates the STDOUT stream (FD 1) to the file bar. This happens before the command even runs, so no, you can't pass the command like that and have bar available inside the Python script.
If you insist on using >, a poor man's solution would be to make the arguments a single string, and do some string processing inside, something like:
./modulname 'foo >bar'
And inside your script:
infile, outfile = map(lambda x: x.strip(), sys.argv[1].split('>'))
This assumes no filename contains whitespace; if one does, it needs special treatment, such as passing two arguments in that case.
Also, take a look at the argparse module for more flexible argument parsing capabilities.
What error did you get?
import sys
inFile = sys.argv[1]
outFile = sys.argv[2]
with open(inFile, 'r+') as in_put, open(outFile, 'w+') as out:
    buff = in_put.read()
    out.write(buff)
I tried to run your code, but it had no import sys, so I fixed that as above. Now I can run it like a simple cp command:
python p4.py p4.py p4.py-bk

How to import without using "import" in python

The context of this question is that I am trying to write a program to assist in the analysis of data. It should be written in Python 3; however, the kind of data it is for is usually stored in a format that Python cannot read.
There is a package to read these data, but it is only compatible with Python 2. In order to read the data I therefore wanted to write a Python 2 script that reads the file and converts it into a numpy array, which I then want to read in my Python 3 program. (The package in question is axographio.)
In generality what I want is this:
Given a (python2) script like
#reading.py
import numpy
x = numpy.random.random(size=10000)
run a (python3) script that can somehow get x:
#analyze.py
import matplotlib.pyplot as plt
#fancyfunction that gets x from reading.py
plt.plot(x)
plt.show()
It is important here that reading.py be executed by the python2 interpreter since it will not work with python3.
Have you tried pickling the data?
In Python 2:
import pickle
with open('x.pkl', 'wb') as f:
    pickle.dump(x, f)
In Python 3:
import pickle
with open('x.pkl', 'rb') as f:
    x = pickle.load(f)  # for Python 2 pickles of numpy arrays, encoding='latin1' may be needed
If I remember correctly, a better approach is to save your numpy array in a JSON file (maybe using pandas over numpy) in Python 2, and do the reverse in Python 3.
Something like this:
df = pandas.Data[...]
See http://www.thegeeklegacy.com/t/how-to-import-without-using-import-in-python/35/#post-103 for the details
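A minimal sketch of that JSON round-trip, with plain json and no pandas (the filename is made up; on the Python 2 side you would dump x.tolist(), for which a plain list stands in here):

```python
import json
import os
import tempfile

x = [0.1, 0.2, 0.3]  # stand-in for x.tolist() from the Python 2 script
path = os.path.join(tempfile.gettempdir(), 'x.json')

# Python 2 side: dump the data as JSON
with open(path, 'w') as f:
    json.dump(x, f)

# Python 3 side: load it back (wrap in numpy.array() if an array is needed)
with open(path) as f:
    x_loaded = json.load(f)

print(x_loaded)  # [0.1, 0.2, 0.3]
```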
Below is the solution I used.
From the subprocess module I used the function Popen with the argument shell=True to call my Python 2 script from Python 3 and collected the stdout. (This means everything has to be printed to the console.)
This is python2 code I want to call in python3:
#reading.py
import axographio  # this is the python2 package
import os   # specific to the problem of reading data
import sys  # needed to use arguments from the command line

if __name__ == '__main__':  # when y
    path = sys.argv[1]
    filename = sys.argv[2]
    os.chdir(path)
    file = axographio.read(filename)
    # print 'done reading files'
    data = file.data
    no_measurements = len(data)
    measurement_len = len(data[0])
    print(no_measurements)
    print(measurement_len)
    for i, name in enumerate(file.names):
        print(name)
    for i in range(no_measurements):
        for j in range(measurement_len):
            print(data[i][j])
        print('next measurement')
        # this acts as a separator when reading the data in python3;
        # anything that cannot be converted to a float works
What this code does is simply take arguments from the command line. (Using the sys module they can be passed to a .py script on the command line in the form python2 script.py arg1 arg2.)
The output of reading.py is given out using print statements. Note that I print every datum individually in order to avoid truncation of the output. In this way the output becomes the standard output (stdout) of the python2 call.
The python3 code is:
#analyze.py
import subprocess
import numpy as np

def read_data(path, filename):
    module_name = 'reading.py'
    cmd = 'python2 ' + module_name + ' ' + path + ' ' + filename
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    out, err = p.communicate()
    # out is a bytes object; the output from different print statements
    # is separated by b'\n'
    result = out.split(b'\n')
    # everything below here is concerned with unpacking the output
    # and structuring it as I needed it
    output = []
    i = 0
    while i < len(result):
        output.append(result[i].decode("utf-8"))
        i += 1
    ind = 0  # this keeps track of where in the output list we are
    no_measurements = int(output[ind])
    ind += 1
    measurement_len = int(output[ind])
    ind += 1
    names = []
    for i in np.arange(ind, ind + no_measurements):
        names.append(output[i])
        ind += 1
    data = []
    measurement = []
    while ind < len(output):
        try:
            measurement.append(float(output[ind]))
            ind += 1
        except ValueError:  # hit a separator line
            data.append(measurement)
            measurement = []
            ind += 1
    data = np.array(data)
    return names, data
What this does is use the subprocess module to execute python2 reading.py path filename as a shell command. As mentioned above, the stdout of this call is the output of its print statements. These are bytes objects encoded with UTF-8. Using the decode method they can be converted to string objects, which can then be converted in type using functions like float. (This is another reason why it is useful to print everything separately, since it is rather annoying to have to scan through a string to find arrays or numbers.) In reading.py the data are separated explicitly while being printed, which makes it easy to structure them once they are read.

Issues calling awk from within Python using subprocess.call

Having some issues calling awk from within Python. Normally, I'd do the following to run the awk command from the command line:
Open up the command line, in admin mode or not.
Change my directory to where awk.exe lives, namely cd R\GnuWin32\bin
Call awk -F "," "{ print > (\"split-\" $10 \".csv\") }" large.csv
My command splits up the large.csv file based on the 10th column into a number of files named split-[COL VAL HERE].csv. I have no issues running this command. I tried to run the same code in Python using subprocess.call(), but I'm having some issues. I run the following code:
def split_ByInputColumn():
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', '\",\"',
                     '\"{ print > (\\"split-\\" $10 \\".csv\\") }\"', 'large.csv'],
                    cwd='C:/R/GnuWin32/bin/')
and clearly, something is running when I execute the function (CPU usage, etc) but when I go to check C:/R/GnuWin32/bin/ there are no split files in the directory. Any idea on what's going wrong?
As I stated in my previous answer (which was downvoted), you are over-protecting the arguments, which makes awk's argument parsing fail.
Since there was no comment, I supposed there was a typo, but the answer worked... So I suppose the downvote was because I should have strongly suggested a full-fledged Python solution, which is the best thing to do here (as stated in my previous answer).
Writing the equivalent in python is not trivial as we have to emulate the way awk opens files and appends to them afterwards. But it is more integrated, pythonic and handles quoting properly if quoting occurs in the input file.
I took the time to code & test it:
import csv
import glob
import os

def split_ByInputColumn():
    # get rid of the old data from previous runs
    for f in glob.glob("split-*.csv"):
        os.remove(f)
    open_files = dict()
    with open('large.csv') as f:
        cr = csv.reader(f, delimiter=',')
        for r in cr:
            tenth_row = r[9]
            filename = "split-{}.csv".format(tenth_row)
            if filename not in open_files:
                handle = open(filename, "wb")  # on Python 3, use open(filename, "w", newline='')
                open_files[filename] = (handle, csv.writer(handle, delimiter=','))
            open_files[filename][1].writerow(r)
    for f, _ in open_files.values():
        f.close()

split_ByInputColumn()
In detail:
read the big file as csv (advantage: quoting is handled properly)
compute the destination filename
if the filename is not in the dictionary, open the file and create a csv.writer object
write the row using the writer stored in the dictionary
at the end, close all file handles
Aside: my old solution, using awk properly:
import subprocess

def split_ByInputColumn():
    subprocess.call(['awk.exe', '-F', ',',
                     '{ print > ("split-" $10 ".csv") }', 'large.csv'],
                    cwd='some_directory')
Someone else posted an answer (and then subsequently deleted it), but the issue was that I was over-protecting my arguments. The following code works:
def split_ByInputColumn():
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', ',',
                     '{ print > ("split-" $10 ".csv") }', 'large.csv'],
                    cwd='C:/R/GnuWin32/bin/')
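To make "over-protecting" concrete: subprocess passes each list element to the child process verbatim, so the extra layer of quotes and backslashes reaches awk as literal characters rather than shell syntax. A quick check of what the two versions of the awk program argument actually contain:

```python
# the awk program as it appeared in the failing, over-protected call
over_protected = '\"{ print > (\\"split-\\" $10 \\".csv\\") }\"'
# the argument awk actually needs, as in the working call
correct = '{ print > ("split-" $10 ".csv") }'

print(over_protected)  # "{ print > (\"split-\" $10 \".csv\") }"
print(correct)         # { print > ("split-" $10 ".csv") }
```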

adding string in others

I have a string like script = "C:\Users\dell\byteyears.py". I want to insert "Python27\" into it, so that script = "C:\Users\dell\Python27\byteyears.py". The reason I need this is that build_scripts is not running correctly on Windows. Anyway, how can I do this in a time-efficient way?
EDIT: I will not print anything. The string is stored in the script variable in build_scripts:
script = convert_path(script)
I should put something to convert it, like
script = convert_path(script.something("Python27/"))
The question is that what something should be.
os.path is best for dealing with paths, also forward slashes are ok to use in Python.
In [714]: script = r"C:/Users/dell/byteyears.py"
In [715]: head, tail = os.path.split(script)
In [716]: os.path.join(head, 'Python27', tail)
Out[716]: 'C:/Users/dell/Python27/byteyears.py'
Or in a module:
import os
script = r"C:/Users/dell/byteyears.py"
head, tail = os.path.split(script)
newpath = os.path.join(head, 'Python27', tail)
print newpath
gives
'C:/Users/dell/Python27/byteyears.py'
internally Python is in general agnostic about the slashes, so use forward slashes "/" as they look nicer and save having to escape.
import os
os.path.join(script[:script.rfind('\\')], 'Python27', script[script.rfind('\\') + 1:])
# the + 1 drops the old separator; a component starting with '\' would reset os.path.join
Try:
from os.path import abspath
script = "C:\\Users\\dell\\byteyears.py"
script = abspath(script.replace('dell\\', 'dell\\Python27\\'))
Note: Never forget to escape \ when working with strings!
And if you're mixing / and \ then you'd better use abspath() to correct it to your platform!
Other ways:
print "C:\\Users\\dell\\%s\\byteyears.py" % "Python27"
or, if you want the path to be more dynamic, this way you can pass an empty string:
print "C:\\Users\\dell%s\\byteyears.py" % "\\Python27"
Also possible:
x = "C:\\Users\\dell%s\\byteyears.py"
print x
x = x % "\\Python27"
print x
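For completeness, the same insertion can be written with pathlib (Python 3.4+); a minimal sketch using PurePosixPath so the separators are predictable on any platform:

```python
from pathlib import PurePosixPath

script = PurePosixPath("C:/Users/dell/byteyears.py")
# insert the directory between the parent and the filename
newpath = script.parent / "Python27" / script.name
print(newpath)  # C:/Users/dell/Python27/byteyears.py
```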
