Untar file in Python script with wildcard

Untar file in Python script with wildcard - python

I am trying in a Python script to import a tar.gz file from HDFS and then untar it. The file comes as follow 20160822073413-EoRcGvXMDIB5SVenEyD4pOEADPVPhPsg.tar.gz, it has always the same structure.
In my python script, I would like to copy it locally and the extract the file. I am using the following command to do this:
import subprocess
import os
import datetime
import time
today = time.strftime("%Y%m%d")
#Copy tar file from HDFS to local server
args = ["hadoop","fs","-copyToLocal", "/locationfile/" + today + "*"]
p=subprocess.Popen(args)
p.wait()
#Untar the CSV file
args = ["tar","-xzvf",today + "*"]
p=subprocess.Popen(args)
p.wait()
The import works perfectly but I am not able to extract the file, I am getting the following error:
['tar', '-xzvf', '20160822*.tar']
tar (child): 20160822*.tar: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
put: `reportResults.csv': No such file or directory
Can anyone help me?
Thanks a lot!

Try with the shell option:
p=subprocess.Popen(args, shell=True)
From the docs:
If shell is True, the specified command will be executed through the
shell. This can be useful if you are using Python primarily for the
enhanced control flow it offers over most system shells and still want
convenient access to other shell features such as shell pipes,
filename wildcards, environment variable expansion, and expansion of ~
to a user’s home directory.
And notice:
However, note that Python itself offers implementations of many
shell-like features (in particular, glob, fnmatch, os.walk(),
os.path.expandvars(), os.path.expanduser(), and shutil).

In addition to #martriay answer, you also got a typo - you wrote "20160822*.tar", while your file's pattern is "20160822*.tar.gz"
When applying shell=True, the command should be passed as a whole string (see documentation), like so:
p=subprocess.Popen('tar -xzvf 20160822*.tar.gz', shell=True)
If you don't need p, you can simply use subprocess.call:
subprocess.call('tar -xzvf 20160822*.tar.gz', shell=True)
But I suggest you use more standard libraries, like so:
import glob
import tarfile
today = "20160822" # compute your common prefix here
target_dir = "/tmp" # choose where ever you want to extract the content
for targz_file in glob.glob('%s*.tar.gz' % today):
with tarfile.open(targz_file, 'r:gz') as opened_targz_file:
opened_targz_file.extractall(target_dir)

I found a way to do what I needed, instead of using os command, I used python tar command and it works!
import tarfile
import glob
os.chdir("/folder_to_scan/")
for file in glob.glob("*.tar.gz"):
print(file)
tar = tarfile.open(file)
tar.extractall()
Hope this help.
Regards
Majid

Related

Use command prompt from python

I write a python scripts that after execute some db queries, save the result of that queries on different csv files.
Now, it's mandatory to rename this file with the production's timestamps and so every hour i got new file with new name.
The script run with a task scheduler every hour and after save my csv files I need to run automatically the command prompt and execute some command that includes my csv files name in the path....
Is it possible to run the cmd and paste him the path of csv file like a variable? in python I save the file in this way:
date_time_str_csv1 = now.strftime("%Y%m%d_%H%M_csv1")
I don't know how to write automatically the different file name when i call the cmd

If I understand your question correctly, one solution would be to simply execute the command-line command directly from the Python script.
You can use the subprocess module from Python (as also explained here: How do I execute a program or call a system command?).
This could look like this for example:
csv_file_name = date_time_str_csv1 +".csv"
subprocess.run(["cat", csv_file_name)

You can run a system cmd from within Python using os.system:
import os
os.system('command filename.csv')
Since the argument to os.system is a string, you can build it with your created filename above.

you can try using the subprocess library, and get a list of the files in the folder in an array. This example is using the linux shell:
import subprocess
str = subprocess.check_output('ls', shell=True)
arr = str.decode('utf-8').split('\n')
print(arr)
After this you can iterate to find the newest file and use that one as the variable.

Use Python to launch Excel file

when i try
os.system("open " + 'myfile.xlsx')
i get the output '0'
similarly, trying
os.system("start excel.exe myfilepath")
gives the result 32512
I have imported os and system, and I'm on mac. How can I change this so it does actually launch that excel file? And out of curiosity, what do the numbers it prints out mean?
Thanks!

Just these two lines
import os
os.system("start EXCEL.EXE file.xlsx")
Provided that file.xlsx is in the current directory.

If you only want to open the excel application you could use subprocess:
import subprocess
subprocess.check_call(['open', '-a', 'Microsoft Excel'])
You can also use os and open a specific file:
import os
os.system("open -a 'path/Microsoft Excel.app' 'path/file.xlsx'")
If you on other hand want to open an excel file within python and modify it there's a number of packages to use as xlsxwriter, xlutils and openpyxl where the latter is prefered by me.
Another note, if you're on mac the excel application isn't .exe

On Windows 10, this works for me:
import os
full_path_to_file = "C:\blah\blah\filename.xlsx"
os.system(full_path_to_file)
Copy-pasting any full path to a file in the command prompt (or passing it to os.system()) seems to work as long as you have permission to open the file.
I suppose this only works when Excel is selected as default application for the .xlsx extention.

I don't know about Mac OS, but if Windows Operating System is the case and provided the Microsoft Windows is properly installed, then consider using :
import os
os.system('start "excel" "C:\\path\\to\\myfile.xlsx"')
double-quotation is important for excel and C:\\path\\to\\myfile.xlsx ( where C just denotes the letter for the partition within the file system, might be replaced by D,E ..etc. ), and single-quotation is needed for the whole string within the os.system().

As in: How to open an Excel file with Python to display its content?
import os
file = "C:\\Documents\\file.xlsx"
os.startfile(file)
It opens the file with the default application.

If you want this to work with your most recent downloaded xlsx file, you can use the following:
import glob
import os
list_of_files = glob.glob('C:\\Downloads\\*.xlsx') *#note the .xlsx extension is needed to retrieve the last downloaded xlsx file.*
latest_file = max(list_of_files, key=os.path.getctime)
os.system(latest_file)

A minor improvement that handles spaces and explicitly uses Excel:
import os
file = "C:/Documents/file name with spaces.xlsx"
file_with_quotes = '"' + file + '"'
os.system('start "EXCEL.exe" {}'.format(file_with_quotes))

Python - Execute command line function on every file in a directory?

I have a directory of CSV files that I want to import into MySQL. There are about 100 files, and doing a manual import is painful.
My command line is this:
mysqlimport -u root -ppassword --local --fields-terminated-by="|" data PUBACC_FR.dat
The files are all of type XX.dat, i.e. AC.dat, CP.dat, etc. I actually rename them first before processing them (via rename 's/^/PUBACC_/' *.dat). Ideally I'd like to be able to accomplish both tasks in one script: Rename the files, then run the command.
From what I've found reading, something like this:
for filename in os.listdir("."):
if filename.endswith("dat"):
os.rename(filename, filename[7:])
Can anyone help me get started with a script that will accomplish this, please? Read the file names, rename them, then for each one run the mysqlimport command?
Thanks!

I suppose something like the python code below could be used:
import subprocess
import os
if __name__ == "__main__":
for f in os.listdir("."):
if (f.endswith(".dat")):
subprocess.call("echo %s" % f, shell=True)
Obviously, you should change the command from echo to your command instead.
See http://docs.python.org/2/library/subprocess.html for more details of using subprocess, or see the possible duplicate.

How to move a file in Python?

How can I do the equivalent of mv in Python?
mv "path/to/current/file.foo" "path/to/new/destination/for/file.foo"

os.rename(), os.replace(), or shutil.move()
All employ the same syntax:
import os
import shutil
os.rename("path/to/current/file.foo", "path/to/new/destination/for/file.foo")
os.replace("path/to/current/file.foo", "path/to/new/destination/for/file.foo")
shutil.move("path/to/current/file.foo", "path/to/new/destination/for/file.foo")
The filename ("file.foo") must be included in both the source and destination arguments. If it differs between the two, the file will be renamed as well as moved.
The directory within which the new file is being created must already exist.
On Windows, a file with that name must not exist or an exception will be raised, but os.replace() will silently replace a file even in that occurrence.
shutil.move simply calls os.rename in most cases. However, if the destination is on a different disk than the source, it will instead copy and then delete the source file.

Although os.rename() and shutil.move() will both rename files, the command that is closest to the Unix mv command is shutil.move(). The difference is that os.rename() doesn't work if the source and destination are on different disks, while shutil.move() is files disk agnostic.

After Python 3.4, you can also use pathlib's class Path to move file.
from pathlib import Path
Path("path/to/current/file.foo").rename("path/to/new/destination/for/file.foo")
https://docs.python.org/3.4/library/pathlib.html#pathlib.Path.rename

For either the os.rename or shutil.move you will need to import the module.
No * character is necessary to get all the files moved.
We have a folder at /opt/awesome called source with one file named awesome.txt.
in /opt/awesome
○ → ls
source
○ → ls source
awesome.txt
python
>>> source = '/opt/awesome/source'
>>> destination = '/opt/awesome/destination'
>>> import os
>>> os.rename(source, destination)
>>> os.listdir('/opt/awesome')
['destination']
We used os.listdir to see that the folder name in fact changed.
Here's the shutil moving the destination back to source.
>>> import shutil
>>> source = '/opt/awesome/destination'
>>> destination = '/opt/awesome/source'
>>> shutil.move(source, destination)
>>> os.listdir('/opt/awesome/source')
['awesome.txt']
This time I checked inside the source folder to be sure the awesome.txt file I created exists. It is there
Now we have moved a folder and its files from a source to a destination and back again.

This is what I'm using at the moment:
import os, shutil
path = "/volume1/Users/Transfer/"
moveto = "/volume1/Users/Drive_Transfer/"
files = os.listdir(path)
files.sort()
for f in files:
src = path+f
dst = moveto+f
shutil.move(src,dst)
You can also turn this into a function, that accepts a source and destination directory, making the destination folder if it doesn't exist, and moves the files. Also allows for filtering of the src files, for example if you only want to move images, then you use the pattern '*.jpg', by default, it moves everything in the directory
import os, shutil, pathlib, fnmatch
def move_dir(src: str, dst: str, pattern: str = '*'):
if not os.path.isdir(dst):
pathlib.Path(dst).mkdir(parents=True, exist_ok=True)
for f in fnmatch.filter(os.listdir(src), pattern):
shutil.move(os.path.join(src, f), os.path.join(dst, f))

The accepted answer is not the right one, because the question is not about renaming a file into a file, but moving many files into a directory. shutil.move will do the work, but for this purpose os.rename is useless (as stated on comments) because destination must have an explicit file name.

Since you don't care about the return value, you can do
import os
os.system("mv src/* dest/")

Also possible with using subprocess.run() method.
python:
>>> import subprocess
>>> new = "/path/to/destination"
>>> old = "/path/to/new/destination"
>>> process = "mv ..{} ..{}".format(old,new)
>>> subprocess.run(process, shell=True) # do not remember, assign shell value to True.
This will work fine when working on Linux. Windows probably gives error since there is no mv Command.

Based on the answer described here, using subprocess is another option.
Something like this:
subprocess.call("mv %s %s" % (source_files, destination_folder), shell=True)
I am curious to know the pro's and con's of this method compared to shutil. Since in my case I am already using subprocess for other reasons and it seems to work I am inclined to stick with it.
This is dependent on the shell you are running your script in. The mv command is for most Linux shells (bash, sh, etc.), but would also work in a terminal like Git Bash on Windows. For other terminals you would have to change mv to an alternate command.

This is solution, which does not enables shell using mv.
from subprocess import Popen, PIPE, STDOUT
source = "path/to/current/file.foo",
destination = "path/to/new/destination/for/file.foo"
p = Popen(["mv", "-v", source, destination], stdout=PIPE, stderr=STDOUT)
output, _ = p.communicate()
output = output.strip().decode("utf-8")
if p.returncode:
print(f"E: {output}")
else:
print(output)

import os,shutil
current_path = "" ## source path
new_path = "" ## destination path
os.chdir(current_path)
for files in os.listdir():
os.rename(files, new_path+'{}'.format(f))
shutil.move(files, new_path+'{}'.format(f)) ## to move files from
different disk ex. C: --> D:

Problem with subprocess.call

In my current working directory I have the dir ROOT/ with some files inside.
I know I can exec cp -r ROOT/* /dst and I have no problems.
But if I open my Python console and I write this:
import subprocess
subprocess.call(['cp', '-r', 'ROOT/*', '/dst'])
It doesn't work!
I have this error: cp: cannot stat ROOT/*: No such file or directory
Can you help me?

Just came across this while trying to do something similar.
The * will not be expanded to filenames
Exactly. If you look at the man page of cp you can call it with any number of source arguments and you can easily change the order of the arguments with the -t switch.
import glob
import subprocess
subprocess.call(['cp', '-rt', '/dst'] + glob.glob('ROOT/*'))

Try
subprocess.call('cp -r ROOT/* /dst', shell=True)
Note the use of a single string rather than an array here.
Or build up your own implementation with listdir and copy

The * will not be expanded to filenames. This is a function of the shell. Here you actually want to copy a file named *. Use subprocess.call() with the parameter shell=True.

Provide the command as list instead of the string + list.
The following two commands are same:-
First Command:-
test=subprocess.Popen(['rm','aa','bb'])
Second command:-
list1=['rm','aa','bb']
test=subprocess.Popen(list1)
So to copy multiple files, one need to get the list of files using blob and then add 'cp' to the front of list and destination to the end of list and provide the list to subprocess.Popen().
Like:-
list1=blob.blob("*.py")
list1=['cp']+list1+['/home/rahul']
xx=subprocess.Popen(list1)
It will do the work.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Untar file in Python script with wildcard - python

I found a way to do what I needed, instead of using os command, I used python tar command and it works! import tarfile import glob os.chdir("/folder_to_scan/") for file in glob.glob("*.tar.gz"): print(file) tar = tarfile.open(file) tar.extractall() Hope this help. Regards Majid

Related

Use command prompt from python

Use Python to launch Excel file

Python - Execute command line function on every file in a directory?

How to move a file in Python?

Problem with subprocess.call

Categories

Resources