I'm using Python's subprocess module to unzip a zip archive. My code is shown below:
subprocess.Popen(['unzip', '{}.zip'.format(inputFile), '-d', output_directory])
Is there an unzip option to remove the zip source file after unzipping it? If not, how can I pipe an rm into the subprocess.Popen call while making sure it waits for the file to unzip first?
Thanks.
You could use && in the shell, which will execute the second command only if the first one succeeded:
import subprocess
import os
values = {'zipFile': '/tmp/simple-grid.zip', 'outDir': '/tmp/foo'}
command = 'unzip {zipFile} -d {outDir} && rm {zipFile}'.format(**values)
proc = subprocess.Popen(command, shell=True)
_ = proc.communicate()
print('Success' if proc.returncode == 0 else 'Error')
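On Python 3.5+, subprocess.run wraps this Popen/communicate pattern in a single call; an equivalent sketch using the command string from above:
proc = subprocess.run(command, shell=True)  # blocks until both commands have run
print('Success' if proc.returncode == 0 else 'Error')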
Or, call os.remove() from Python if unzip succeeded:
inputFile = values['zipFile']
output_directory = values['outDir']
proc = subprocess.Popen(
    ['unzip', inputFile, '-d', output_directory]
)
_ = proc.communicate() # communicate blocks!
if proc.returncode == 0:
os.remove(values['zipFile'])
print('Success' if not os.path.exists(inputFile) else 'Error')
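If you don't actually need the external unzip binary, the standard library's zipfile module avoids subprocess entirely. A minimal sketch, reusing the paths from above:
import os
import zipfile

# extract every member of the archive, then delete the archive itself
with zipfile.ZipFile(values['zipFile']) as archive:
    archive.extractall(values['outDir'])
os.remove(values['zipFile'])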
Related
Using Python subprocess with Popen or check_output, I need to list the files and directories in a given source directory, but I can only use the command ls -l.
Sample code
cmd = ["ls", "-l", source]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
stdout, stderr = proc.communicate()
exitcode = proc.returncode
if exitcode != 0:
raise SystemError("Exitcode '{}', stderr: '{}', stdout: '{}' for command: '{}'".format(
exitcode, stderr, stdout, cmd))
From the proc above, using grep or any other means, can I get only a list of the file and directory names inside the source directory, without the other information?
If you insist on using subprocess, please try:
[x.split(' ')[-1] for x in stdout.decode().split('\n')[1:-1]]  # keep the last field of each row, skipping the 'total' header and the trailing blank line
Obviously this is a pretty "hacky" way of doing it. Instead, I suggest the standard library's glob module:
import glob
glob.glob(source + '/*')
returns a list of all file and directory paths in source.
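Note that glob returns full paths (and skips dotfiles by default); to get bare names, strip the directory part with os.path.basename, e.g.:
import glob
import os

# 'source' is the same directory path as above
names = [os.path.basename(p) for p in glob.glob(source + '/*')]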
Edit:
cmd = ["ls", source]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
stdout, stderr = proc.communicate()
exitcode = proc.returncode
stdout.decode("utf-8").split('\n')[:-1]
should also do it; the -l option is not necessary here.
Parsing the output of ls is a bad idea for a few reasons. If a file name has a trailing space, ls will display it as 'trailing space ', and if you try to open("'trailing space '") it won't work. Also, file names can contain newlines.
Use pathlib instead:
from pathlib import Path
source = Path("/path/to/some/directory")
[x.name for x in source.iterdir()]
# ['a_file', 'some_other_file.txt', 'a_directory']
As Charles Duffy mentioned, you can use the os module, like this:
import os

directory = '/path/to/search'  # wherever you want to search
files_and_directories = os.listdir(directory)
# files_and_directories is now a list of the entry names in that directory
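If you need to tell files apart from directories, os.path can classify each entry; a short sketch (the directory path is a placeholder):
import os

directory = '/path/to/search'
entries = os.listdir(directory)
files = [e for e in entries if os.path.isfile(os.path.join(directory, e))]
dirs = [e for e in entries if os.path.isdir(os.path.join(directory, e))]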
How do I get a list of files from an HDFS (Hadoop) directory using a Python script?
I have tried the following line:
dir = sc.textFile("hdfs://127.0.0.1:1900/directory").collect()
The directory holds files "file1, file2, file3 ... fileN", but with that line I only got the contents of the files. I need the list of file names.
Can anyone please help me figure this out?
Thanks in advance.
Use subprocess:
import subprocess
p = subprocess.Popen("hdfs dfs -ls <HDFS Location> | awk '{print $8}'",
shell=True,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
for line in p.stdout.readlines():
    print(line)
EDIT: An answer without Python. The first option can also be used to recursively print all the sub-directories. The final redirect can be omitted or changed based on your requirements.
hdfs dfs -ls -R <HDFS LOCATION> | awk '{print $8}' > output.txt
hdfs dfs -ls <HDFS LOCATION> | awk '{print $8}' > output.txt
EDIT: Correcting a missing quote in the awk command.
import subprocess
path = "/data"
args = "hdfs dfs -ls "+path+" | awk '{print $8}'"
proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
s_output, s_err = proc.communicate()
all_dart_dirs = s_output.split() #stores list of files and sub-directories in 'path'
Why not have the HDFS client do the hard work by using the -C flag, instead of relying on awk or Python to print the specific columns of interest?
i.e. Popen(['hdfs', 'dfs', '-ls', '-C', dirname])
Afterwards, split the output on newlines and you will have your list of paths.
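A minimal sketch of that idea (a fuller version with logging follows):
from subprocess import Popen, PIPE

proc = Popen(['hdfs', 'dfs', '-ls', '-C', '/some/dir'], stdout=PIPE)
out, _ = proc.communicate()
paths = out.decode().splitlines()  # one HDFS path per line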
Here's an example along with logging and error handling (including for when the directory/file doesn't exist):
from subprocess import Popen, PIPE
import logging
logger = logging.getLogger(__name__)
FAILED_TO_LIST_DIRECTORY_MSG = 'No such file or directory'
class HdfsException(Exception):
pass
def hdfs_ls(dirname):
"""Returns list of HDFS directory entries."""
logger.info('Listing HDFS directory ' + dirname)
proc = Popen(['hdfs', 'dfs', '-ls', '-C', dirname], stdout=PIPE, stderr=PIPE)
    (out, err) = proc.communicate()
    out, err = out.decode(), err.decode()  # communicate() returns bytes on Python 3
if out:
logger.debug('stdout:\n' + out)
if proc.returncode != 0:
errmsg = 'Failed to list HDFS directory "' + dirname + '", return code ' + str(proc.returncode)
logger.error(errmsg)
logger.error(err)
        if FAILED_TO_LIST_DIRECTORY_MSG not in err:
raise HdfsException(errmsg)
return []
elif err:
logger.debug('stderr:\n' + err)
return out.splitlines()
For Python 3:
from subprocess import Popen, PIPE
hdfs_path = '/path/to/the/designated/folder'
process = Popen(f'hdfs dfs -ls -h {hdfs_path}', shell=True, stdout=PIPE, stderr=PIPE)
std_out, std_err = process.communicate()
lines = std_out.decode().split('\n')[1:-1]  # skip the "Found N items" header and the trailing blank line
list_of_file_names = [fn.split(' ')[-1].split('/')[-1] for fn in lines]
list_of_file_names_with_full_address = [fn.split(' ')[-1] for fn in lines]
Use the following (this relies on the third-party sh module):
import sh

hdfsdir = r"hdfs://VPS-DATA1:9000/dir/"
filepaths = [ line.rsplit(None,1)[-1] for line in sh.hdfs('dfs','-ls',hdfsdir).split('\n') if len(line.rsplit(None,1))][1:]
for path in filepaths:
print(path)
To get a list of HDFS files in a directory:
hdfsdir = '/path/to/hdfs/directory'
filelist = [ line.rsplit(None,1)[-1] for line in sh.hdfs('dfs','-ls',hdfsdir).split('\n') if len(line.rsplit(None,1))][1:]
for path in filelist:
    # read each data file from HDFS; this assumes an HDFS client module
    # imported as hdfs (e.g. pydoop.hdfs) and the standard json module
    with hdfs.open(path, "r") as read_file:
        # do whatever you want with the file contents
        data = json.load(read_file)
I want my script to go into each path visited by os.walk() under a particular file path, execute a grep command on all the files at that location, and redirect the output to a file. Below is the script I created, but the subprocess executes the ls -al command in the current directory, while the print statement shows me the contents from os.walk. So I need the subprocess to execute the command under the os.walk path as well.
with open('ipaddressifle.out', 'w') as outfile:
for pdir, dir, files in os.walk(r'/Users/skandasa/perforce/projects/releases/portal-7651'):
for items in files:
print(items)
#subprocess.call(['ls', '-al'])
process = subprocess.Popen(['ls', '-al'], shell= True, stdout=outfile, stderr=outfile)
#process = subprocess.Popen(['grep', 'PORTALSHARED','*', '|', 'awk', '-F', '[','{print', '$1}'], shell= True, stdout=outfile, stderr=outfile)
[output, err] = process.communicate()
And is there any other way, apart from adding a cd command to the subprocess call?
You can use os.chdir(path) to change the current working directory.
I reworked your snippet to use subprocess.check_output to call a command and retrieve its stdout. I also used shlex.split(command) so that the command can be written as a single string and split correctly for the call.
The script walks DIRECTORY with os.walk and writes the output of ls -al from each subdirectory into OUTPUT_FILE:
import os
import shlex
from subprocess import check_output
DIRECTORY = '/tmp'
OUTPUT_FILE = '/tmp/output.log'
with open(OUTPUT_FILE, 'w') as output:
for parent, _, _ in os.walk(DIRECTORY):
os.chdir(parent)
        output.write(check_output(shlex.split('ls -al')).decode())
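Note that you don't actually need os.chdir here: check_output (like Popen) accepts a cwd argument that runs the command in the given directory without changing your own working directory. A sketch of the same loop:
import os
from subprocess import check_output

with open('/tmp/output.log', 'w') as output:
    for parent, _, _ in os.walk('/tmp'):
        # run ls -al inside each subdirectory; our own cwd is untouched
        output.write(check_output(['ls', '-al'], cwd=parent).decode())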
I have an XML file that I recovered from FTP, and I want to convert this XML to JSON using xml2json. How can I call an external command from within a Python script?
python script:
#!/usr/bin/env python
import ftplib
import os
# Connection information
server = 'xxxxxx.xxxx'
username = 'xxxxxxxx'
password = 'xxxxxxxx'
# Directory and matching information
directory = '/datas/'
filematch = '*.xml'
src='/opt/scripts/'
dst='/opt/data/'
# Establish the connection
ftp = ftplib.FTP(server)
ftp.login(username, password)
# Change to the proper directory
ftp.cwd(directory)
# Loop through matching files and download each one individually
for filename in ftp.nlst(filematch):
fhandle = open(filename, 'wb')
    print('Getting ' + filename)
ftp.retrbinary('RETR ' + filename, fhandle.write)
fhandle.close()
#?????????????????? EXECUTE XML2JSON TO CONVERT MY XML INTO JSON ???????????????????????
#?????????????????? xml2json -t xml2json -o stockvo.json stockvo.xml --strip_text ?????????????
#move the stockvo.xml file to destination
os.system('mv %s %s' % (src+'stockvo.xml', dst+'stockvo.xml'))
#remove the src file
os.unlink(src+'stockvo.xml')
The subprocess module has a function for that.
You could do something like:
import subprocess
subprocess.call('xml2json -t xml2json -o stockvo.json stockvo.xml --strip_text', shell=True)
Please note that using the shell=True option can be a security hazard, but that depends on what you will do with your script and whether a potential user could try to do shell injection on it.
Edit: As #PadraicCunningham suggested, there's actually no need for shell=True, since you aren't using shell features such as wildcards or ~ for home expansion. Without the shell, though, the command must be passed as a list of arguments rather than a single string:
subprocess.call(['xml2json', '-t', 'xml2json', '-o', 'stockvo.json', 'stockvo.xml', '--strip_text'])
import subprocess
subprocess.call(['xml2json', '-t', 'xml2json', '-o', 'stockvo.json', 'stockvo.xml', '--strip_text'])
Using subprocess module:
subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
Run command with arguments. Wait for command to complete.
If the return code was zero then return, otherwise raise
CalledProcessError. The CalledProcessError object will have the return
code in the returncode attribute.
import subprocess
try:
    subprocess.check_call(['xml2json', '-t', 'xml2json', '-o', 'stockvo.json', 'stockvo.xml', '--strip_text'])
except subprocess.CalledProcessError:
pass # handle failure here
I have the following command:
lessc xyz.less > xyz.css
I want to run that command in Python, for which I have written this code:
try:
project_path = settings.PROJECT_ROOT
less_path = os.path.join(project_path, "static\\less")
css_path = os.path.join(project_path, "static\\css")
except Exception as e:
    print(traceback.format_exc())
less_file = [f for f in os.listdir(less_path) if isfile(join(less_path, f))]
for files in less_file:
file_name = os.path.splitext(files)[0]
cmd = '%s\%s > %s\%s' % (less_path, files, css_path, file_name + '.css')
p = subprocess.Popen(['lessc', cmd], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
but it gives the error WindowsError 2: cannot find the path specified.
Make sure that lessc is on your PATH; you could try using the full path to lessc instead.
You don't need to use shell-style redirection with Popen like this; check the subprocess.Popen docs.
Here is an example of how to do it without shell redirection:
import subprocess
lessc_command = '/path/to/lessc'
less_file_path = '/path/to/input.less'
css_file_path = '/path/to/output.css'
with open(css_file_path, 'w') as css_file:
less_process = subprocess.Popen([lessc_command, less_file_path], stdout=css_file)
less_process.communicate()
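If lessc turns out not to be on the PATH (a likely cause of the WindowsError above), shutil.which (Python 3.3+) can check for it up front; a small sketch:
import shutil

lessc_command = shutil.which('lessc')  # full path to the executable, or None if not found
if lessc_command is None:
    raise FileNotFoundError('lessc not found on PATH; install it or pass its full path')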