The short of it is that I need a program to upload all .txt files from a local directory to a specific remote directory via SFTP. Running mput *.txt from the sftp command line, while already in the right local directory, does exactly what I'm shooting for.
Here is the code I'm trying. No errors when I run it, but no results either: when I sftp to the server and ls the upload directory, it's empty. I may be barking up the wrong tree altogether; I see other solutions like lftp using mget in bash, but I really want this to work with Python. Either way I have a lot to learn still. This is what I've come up with after a few days reading what some Stack Overflow users suggested and looking at a few libraries that might help. I'm not sure I can do the "for i in allfiles:" with subprocess.
import os
import glob
import subprocess
os.chdir('/home/submitid/Local/Upload') #change pwd so i can use mget *.txt and glob similarly
pwd = '/Home/submitid/Upload' #remote directory to upload all txt files to
allfiles = glob.glob('*.txt') #get a list of txt files in lpwd
target="user#sftp.com"
sp = subprocess.Popen(['sftp', target], shell=False, stdin=subprocess.PIPE)
sp.stdin.write("chdir %s\n" % pwd) #change directory to pwd
for i in allfiles:
    sp.stdin.write("put %s\n" % allfiles) #for each file in allfiles, do a put %filename to pwd
sp.stdin.write("bye\n")
sp.stdin.close()
When you iterate over allfiles, you are not passing the iteration variable to sp.stdin.write, but allfiles itself. It should be
for i in allfiles:
    sp.stdin.write("put %s\n" % i) #for each file in allfiles, do a put %filename to pwd
You may also need to wait for sftp to authenticate before issuing commands. You could read stdout from the process, or just put some time.sleep delays in your code.
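For example, a minimal sketch of the delay approach (the 5-second figure is a guess, and key-based authentication is assumed so no password prompt appears):
import time
sp = subprocess.Popen(['sftp', target], shell=False, stdin=subprocess.PIPE)
time.sleep(5) # crude: give sftp time to connect and authenticate before sending commands
sp.stdin.write("chdir %s\n" % pwd)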
But why not just use scp and build the full command line, then check if it executes successfully? Something like:
result = os.system('scp %s %s:%s' % (' '.join(allfiles), target, pwd))
if result != 0:
    print 'error!'
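A variant using subprocess.call with an argument list avoids shell quoting problems with odd filenames (a sketch, reusing allfiles, target and pwd from the question):
import subprocess
result = subprocess.call(['scp'] + allfiles + ['%s:%s' % (target, pwd)])
if result != 0:
    print 'error!'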
You don't need to iterate over allfiles;
sp.stdin.write("put *.txt\n")
is enough. You instruct sftp to put all the files at once, instead of one by one.
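Alternatively, sftp's batch mode (-b) avoids driving stdin by hand; here is a minimal sketch, assuming key-based authentication since batch mode cannot prompt for a password (the batch file path is just an example):
import subprocess
batch = open('/tmp/sftp_batch', 'w') # write the sftp commands to a file
batch.write('chdir %s\nput *.txt\nbye\n' % pwd)
batch.close()
result = subprocess.call(['sftp', '-b', '/tmp/sftp_batch', target])
if result != 0:
    print 'sftp failed'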
I think it's easy, but my script doesn't work. It will be easier if I show you what I want: I want a script (in Python) that does this:
I have a directory like:
boite_noire/
....helloworld/
....test1.txt
....test2.txt
And after running the script I would like something like:
boite_noire/
helloworld/
....test1/
........test1_date.txt
....test2/
........test2_date.txt
and if I add an other test1.txt like:
boite_noire/
helloworld/
....test1/
........test1_date.txt
....test2/
........test2_date.txt
....test1.txt
The next time I run the script:
boite_noire/
helloworld/
....test1/
........test1_date.txt
........test1_date.txt
....test2/
........test2_date.txt
I wrote this script:
But os.walk reads the files in the directories and then creates a directory named after each file, and I don't want that :(
Can someone help me please?
You could loop through each file and move it into the correct directory. This will work on a Linux system (not sure about Windows; maybe better to use shutil.move).
import os
import time

d = 'www/boite_noire'
date = time.strftime('%Y_%m_%d_%H_%M_%S')

filesAll = os.listdir(d)
filesValid = [i for i in filesAll if i[-4:] == '.txt']
for f in filesValid:
    newName = f[:-4] + '_' + date + '.txt'
    try:
        os.mkdir('{0}/{1}'.format(d, f[:-4]))
    except OSError:
        print 'Directory {0}/{1} already exists'.format(d, f[:-4])
    os.system('mv {0}/{1} {0}/{2}/{3}'.format(d, f, f[:-4], newName))
This is what the code is doing:
Find all files in the specified directory
Check that the extension is .txt
For each valid file:
Create a new name by appending the date/time
Create the directory if it doesn't already exist
Move the file into the directory (renaming it as it is moved)
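A more portable variant of the same steps, using os.path.join and shutil.move instead of shelling out to mv (a sketch, reusing d and date from the answer above):
import os
import shutil
import time

d = 'www/boite_noire'
date = time.strftime('%Y_%m_%d_%H_%M_%S')
for f in os.listdir(d):
    if f.endswith('.txt'):
        base = f[:-4]
        target_dir = os.path.join(d, base)
        if not os.path.isdir(target_dir): # create the per-file directory only once
            os.mkdir(target_dir)
        # move and rename in one step
        shutil.move(os.path.join(d, f), os.path.join(target_dir, base + '_' + date + '.txt'))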
I am trying, in a Python script, to import a tar.gz file from HDFS and then untar it. The file comes as follows: 20160822073413-EoRcGvXMDIB5SVenEyD4pOEADPVPhPsg.tar.gz, and it always has the same structure.
In my Python script, I would like to copy it locally and then extract the file. I am using the following code to do this:
import subprocess
import os
import datetime
import time
today = time.strftime("%Y%m%d")
#Copy tar file from HDFS to local server
args = ["hadoop","fs","-copyToLocal", "/locationfile/" + today + "*"]
p=subprocess.Popen(args)
p.wait()
#Untar the CSV file
args = ["tar","-xzvf",today + "*"]
p=subprocess.Popen(args)
p.wait()
The import works perfectly, but I am not able to extract the file; I am getting the following error:
['tar', '-xzvf', '20160822*.tar']
tar (child): 20160822*.tar: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
put: `reportResults.csv': No such file or directory
Can anyone help me?
Thanks a lot!
Try with the shell option:
p=subprocess.Popen(args, shell=True)
From the docs:
If shell is True, the specified command will be executed through the
shell. This can be useful if you are using Python primarily for the
enhanced control flow it offers over most system shells and still want
convenient access to other shell features such as shell pipes,
filename wildcards, environment variable expansion, and expansion of ~
to a user’s home directory.
And notice:
However, note that Python itself offers implementations of many
shell-like features (in particular, glob, fnmatch, os.walk(),
os.path.expandvars(), os.path.expanduser(), and shutil).
In addition to @martriay's answer, you also have a typo: you wrote "20160822*.tar", while your files' pattern is "20160822*.tar.gz".
When applying shell=True, the command should be passed as a whole string (see documentation), like so:
p=subprocess.Popen('tar -xzvf 20160822*.tar.gz', shell=True)
If you don't need p, you can simply use subprocess.call:
subprocess.call('tar -xzvf 20160822*.tar.gz', shell=True)
But I suggest you use more standard libraries, like so:
import glob
import tarfile

today = "20160822" # compute your common prefix here
target_dir = "/tmp" # choose wherever you want to extract the content
for targz_file in glob.glob('%s*.tar.gz' % today):
    with tarfile.open(targz_file, 'r:gz') as opened_targz_file:
        opened_targz_file.extractall(target_dir)
I found a way to do what I needed: instead of using an os command, I used Python's tarfile module and it works!
import os
import tarfile
import glob

os.chdir("/folder_to_scan/")
for file in glob.glob("*.tar.gz"):
    print(file)
    with tarfile.open(file) as tar:
        tar.extractall()
Hope this helps.
Regards
Majid
I'm trying to download a large number of files that all share a common string (DEM) from an FTP server. These files are nested inside multiple directories, for example Adair/DEM* and Adams/DEM*.
The FTP server is located at ftp://ftp.igsb.uiowa.edu/gis_library/counties/ and requires no username or password.
So, I'd like to go through each county and download the files containing the string DEM.
I've read many questions here on Stack Overflow and the documentation from Python, but cannot figure out how to use ftplib.FTP() to get into the site without a username and password (which are not required), and I can't figure out how to grep or use glob.glob inside of ftplib or urllib.
Thanks in advance for your help
OK, this seems to work. There may be issues when trying to download a directory or scan a file; exception handling may come in handy to trap wrong filetypes and skip them.
glob.glob cannot work since you're on a remote filesystem, but you can use fnmatch to match the names.
Here's the code: it downloads all files matching *DEM* into the TEMP directory, sorting by directory.
import ftplib, sys, fnmatch, os

output_root = os.getenv("TEMP")

fc = ftplib.FTP("ftp.igsb.uiowa.edu")
fc.login()
fc.cwd("/gis_library/counties")

root_dirs = fc.nlst()
for l in root_dirs:
    sys.stderr.write(l + " ...\n")
    #print(fc.size(l))
    dir_files = fc.nlst(l)
    local_dir = os.path.join(output_root, l)
    if not os.path.exists(local_dir):
        os.mkdir(local_dir)
    for f in dir_files:
        if fnmatch.fnmatch(f, "*DEM*"): # cannot use glob.glob on a remote listing
            sys.stderr.write("downloading " + l + "/" + f + " ...\n")
            local_filename = os.path.join(local_dir, f)
            with open(local_filename, 'wb') as fh:
                fc.retrbinary('RETR ' + l + "/" + f, fh.write)
fc.close()
The answer by @Jean with the local pattern matching is the correct portable solution adhering to FTP standards.
Though, as most FTP servers do support non-standard wildcards in file listing commands, you can almost always use a simpler and more efficient solution like:
files = ftp.nlst("*DEM*")
for f in files:
    with open(f, 'wb') as fh:
        ftp.retrbinary('RETR ' + f, fh.write)
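Since wildcard support in NLST is non-standard and server-dependent, a cautious version could fall back to client-side matching (a sketch, reusing the ftp connection from above):
import fnmatch, ftplib
try:
    files = ftp.nlst("*DEM*") # works only if the server expands wildcards
except ftplib.error_perm:
    # fall back: list everything and filter locally
    files = [f for f in ftp.nlst() if fnmatch.fnmatch(f, "*DEM*")]
Note that some servers simply return an empty list for an unsupported wildcard instead of raising, so checking for an empty result is also worthwhile.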
You can use fsspec's FTPFileSystem for convenient globbing on an FTP server:
import fsspec.implementations.ftp
ftpfs = fsspec.implementations.ftp.FTPFileSystem("ftp.ncdc.noaa.gov")
files = ftpfs.glob("/pub/data/swdi/stormevents/csvfiles/*1985*")
print(files)
contents = ftpfs.cat(files[0])
print(contents[:100])
Result:
['/pub/data/swdi/stormevents/csvfiles/StormEvents_details-ftp_v1.0_d1985_c20160223.csv.gz', '/pub/data/swdi/stormevents/csvfiles/StormEvents_fatalities-ftp_v1.0_d1985_c20160223.csv.gz', '/pub/data/swdi/stormevents/csvfiles/StormEvents_locations-ftp_v1.0_d1985_c20160223.csv.gz']
b'\x1f\x8b\x08\x08\xcb\xd8\xccV\x00\x03StormEvents_details-ftp_v1.0_d1985_c20160223.csv\x00\xd4\xfd[\x93\x1c;r\xe7\x8b\xbe\x9fOA\xe3\xd39f\xb1h\x81[\\\xf8\x16U\x95\xac\xca\xc5\xacL*3\x8b\xd5\xd4\x8bL\xd2\xb4\x9d'
A nested search also works, for example, nested_files = ftpfs.glob("/pub/data/swdi/stormevents/**1985*"), but it can be quite slow.
I have a directory of CSV files that I want to import into MySQL. There are about 100 files, and doing a manual import is painful.
My command line is this:
mysqlimport -u root -ppassword --local --fields-terminated-by="|" data PUBACC_FR.dat
The files are all of type XX.dat, e.g. AC.dat, CP.dat, etc. I actually rename them first before processing them (via rename 's/^/PUBACC_/' *.dat). Ideally I'd like to be able to accomplish both tasks in one script: rename the files, then run the command.
From what I've found reading, something like this:
for filename in os.listdir("."):
    if filename.endswith("dat"):
        os.rename(filename, filename[7:])
Can anyone help me get started with a script that will accomplish this, please? Read the file names, rename them, then for each one run the mysqlimport command?
Thanks!
I suppose something like the python code below could be used:
import subprocess
import os

if __name__ == "__main__":
    for f in os.listdir("."):
        if f.endswith(".dat"):
            subprocess.call("echo %s" % f, shell=True)
Obviously, you should change the command from echo to your command instead.
See http://docs.python.org/2/library/subprocess.html for more details of using subprocess, or see the possible duplicate.
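As a sketch of the full task, something like this could rename each file and then run mysqlimport on it (the credentials and options are copied verbatim from the question; treat them as placeholders):
import os
import subprocess

for f in os.listdir("."):
    if f.endswith(".dat") and not f.startswith("PUBACC_"):
        newname = "PUBACC_" + f
        os.rename(f, newname) # add the PUBACC_ prefix, as the rename one-liner did
        # with a list of arguments, no shell quoting is needed around the pipe character
        subprocess.call(["mysqlimport", "-u", "root", "-ppassword", "--local",
                         "--fields-terminated-by=|", "data", newname])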
I'm trying to write a simple program in Python that takes all the music files from my Downloads folder and puts them in my Music folder. I'm using Windows, and I can move the files using the cmd prompt, but I get this error:
WindowsError: [Error 2] The system cannot find the file specified
Here's my code:
#! /usr/bin/python
import os
from subprocess import call

def main():
    os.chdir("C:\\Users\Alex\Downloads") #change directory to downloads folder
    suffix = ".mp3" #variable holding the .mp3 tag
    fnames = os.listdir('.') #looks at all files
    files = [] #an empty array that will hold the names of our mp3 files
    for fname in fnames:
        if fname.endswith(suffix):
            pname = os.path.abspath(fname)
            #pname = fname
            #print pname
            files.append(pname) #add the mp3 files to our array
    print files
    for i in files:
        #print i
        move(i)

def move(fileName):
    call("move /-y " + fileName + " C:\Music")
    return

if __name__ == '__main__': main()
I've looked at the subprocess library and countless other articles, but I still have no clue what I'm doing wrong.
The subprocess.call method takes a list of parameters, not a string with space separators, unless you tell it to use the shell, which is not recommended if the string can contain anything from user input.
The best way is to build the command as a list
e.g.
cmd = ["move", "/-y", fileName, "C:\Music"]
call(cmd)
This also makes it easier to pass parameters (e.g. paths or filenames) containing spaces to the called program.
Both these ways are given in the subprocess documentation.
You can pass in a delimited string, but then you have to let the shell process the arguments:
call("move /-y "+ fileName +" C:\Music", shell=True)
Also, in this case there is a Python function for move: shutil.move.
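For example (a sketch, assuming fileName holds an absolute path as in the question's move function):
from shutil import move
move(fileName, r"C:\Music") # moves the file into the directory; spaces in fileName are handled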
I'm not answering your question directly, but for such tasks plumbum is great and would make your life so much easier. subprocess's API is not very intuitive.
There could be several issues:
fileName might contain a space, so the move command sees only part of the filename.
if move is an internal command, you might need shell=True to run it:
from subprocess import check_call
check_call(r"move /-y C:\Users\Alex\Downloads\*.mp3 C:\Music", shell=True)
To move .mp3 files from Downloads folder to Music without subprocess:
from glob import glob
from shutil import move
for path in glob(r"C:\Users\Alex\Downloads\*.mp3"):
    move(path, r"C:\Music")