On Unix-like systems I use this script, which I'd like some help on porting to Python for execution on Windows hosts:
#!/bin/bash
SENTINEL_FILENAME='__sentinel__'
SENTINEL_MD5_CHECKSUM=''
SENTINEL_SHA_CHECKSUM=''
function is_directory_to_be_flattened() {
    local -r directory_to_consider="$1"
    local -r sentinel_filepath="${directory_to_consider}/${SENTINEL_FILENAME}"
    if [ ! -f "${sentinel_filepath}" ]; then
        return 1
    fi
    if [[
        "$(
            md5 "${sentinel_filepath}" \
                | awk '{ print $NF }' 2> /dev/null
        )" \
            == "${SENTINEL_MD5_CHECKSUM}"
        && \
        "$(
            shasum -a 512 "${sentinel_filepath}" \
                | awk '{ print $1 }' 2> /dev/null
        )" \
            == "${SENTINEL_SHA_CHECKSUM}"
    ]]; then
        return 0
    else
        return 1
    fi
}
function conditionally_flatten() {
    local -r directory_to_flatten="$1"
    local -r flatten_into_directory="$2"
    if is_directory_to_be_flattened "${directory_to_flatten}"; then
        if [ ! -d "${flatten_into_directory}" ]; then
            mkdir -v "${flatten_into_directory}"
        fi
        # find's -maxdepth must precede -type; the NUL-delimited read
        # copes with file names that contain spaces
        find "${directory_to_flatten}" -maxdepth 1 -type f -print0 \
        | while IFS= read -r -d '' file_to_move; do
            mv -v -n "${file_to_move}" "${flatten_into_directory}"
        done
    fi
}
function flatten_directory() {
    local -r directory_to_flatten="$1"
    local -r descend_depth="$2"
    local -r flattened_directory="${directory_to_flatten}/__flattened__"
    if [ ! -d "${directory_to_flatten}" ]; then
        printf "The argument '%s' does not seem to be a directory.\n" \
            "${directory_to_flatten}" \
            >&2
        return
    fi
    find "${directory_to_flatten}" \
        -maxdepth "${descend_depth}" \
        -type d \
    | \
    while read -r directory_path; do
        conditionally_flatten \
            "${directory_path}" \
            "${flattened_directory}"
    done
}
n_arguments="$#"
if [ "${n_arguments}" -eq 1 ]; then
    flatten_directory "$1" '1' # maybe use a constant, not a "magic #" here?
else
    echo usage: "$0" /path/to/directory/to/flatten
fi
unset is_directory_to_be_flattened
unset conditionally_flatten
unset flatten_directory
How would you port this to Python on Windows? I am a beginner in both Python and Bash scripting...
Feel free to upgrade my implementation as you port it if you find it lacking in any way, with a justification please. This is not Code Review, but a thumbs up/thumbs down on my effort in Bash would give me a sense of whether I am improving or whether I should change the way I study altogether...
Here we go, my attempt in Python: (criticise it hard if need be, it's the only way for me to learn!)
#!/usr/bin/env python2.7
import sys
import os
import shutil
SENTINEL_FILENAME = '__sentinel__'
SENTINEL_MD5_CHECKSUM = ''
SENTINEL_SHA_CHECKSUM = ''
DEFAULT_DEPTH = 1
FLATTED_DIRECTORY_NAME = '__flattened__'
def is_directory_to_be_flattened(directory_to_consider):
    sentinel_location = os.path.join(directory_to_consider, SENTINEL_FILENAME)
    if not os.path.isfile(sentinel_location):
        return False
    import hashlib
    # binary mode, so the digests match what md5/shasum compute
    with open(sentinel_location, 'rb') as sentinel_file:
        file_contents = sentinel_file.read()
    return (hashlib.md5(file_contents).hexdigest() == SENTINEL_MD5_CHECKSUM
            and hashlib.sha512(file_contents).hexdigest() == SENTINEL_SHA_CHECKSUM)
def flatten(directory, depth, to_directory, do_files_here):
    if depth < 0:
        return
    contained_filenames = os.listdir(directory)
    if do_files_here:
        for filename in contained_filenames:
            if filename == SENTINEL_FILENAME:
                continue
            filepath = os.path.join(directory, filename)
            if not os.path.isfile(filepath):
                continue
            file_to = os.path.join(to_directory, filename)
            if not os.path.isdir(to_directory):
                os.makedirs(to_directory)
            if not os.path.isfile(file_to):
                print "Moving: '{}' -> '{}'".format(filepath, file_to)
                shutil.move(filepath, file_to)
            else:
                sys.stderr.write('Error: {} exists already.\n'.format(file_to))
    next_depth = depth - 1
    # join with the parent directory: bare names would make isdir() look in the CWD
    for name in contained_filenames:
        subdirectory = os.path.join(directory, name)
        if not os.path.isdir(subdirectory):
            continue
        if is_directory_to_be_flattened(subdirectory):
            flatten(subdirectory, next_depth, to_directory, True)
def flatten_directory(to_flatten, depth):
    to_directory = os.path.join(to_flatten, FLATTED_DIRECTORY_NAME)
    if not os.path.isdir(to_flatten):
        sys.stderr.write(
            'The argument {} does not seem to be a directory.\n'.format(
                to_flatten))
        return
    flatten(to_flatten, depth, to_directory, False)
def main():
    if len(sys.argv) == 2:
        flatten_directory(sys.argv[1], DEFAULT_DEPTH)
    else:
        print 'usage: {} /path/to/directory/to/flatten'.format(sys.argv[0])

if __name__ == '__main__':
    main()
Although it's obvious from the code, the intent is:
Start at a given directory
Descend up to a certain depth
Consider subdirectories and move all files therein if and only if:
The directory contains a "sentinel file" with a given filename
The sentinel file is actually a sentinel file, not just a file renamed to the same name
Collate files in a __flattened__ directory under the directory in which the search started
Most file-handling functions in Python are in the module os. Therein you will find
os.rename (for renaming or moving a directory entry), os.listdir (which gives you a listing of the filenames in the directory passed as its first argument), os.walk (to recursively walk through a directory structure) and os.path.walk (to do the same, but with a callback); os.path.exists, os.path.isdir and os.mkdir are others that might be handy.
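For instance, a minimal sketch of the difference between the two listing calls, using "." as a stand-in starting directory:
import os

top = "."  # stand-in starting directory

# os.listdir returns just the names in one directory (files and subdirectories alike)
for name in os.listdir(top):
    print(os.path.join(top, name))

# os.walk descends recursively, yielding one (dirpath, dirnames, filenames) triple per directory
for dirpath, dirnames, filenames in os.walk(top):
    for filename in filenames:
        print(os.path.join(dirpath, filename))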
For a "quick and dirty" translation you might also cehck "os.system". which allows you to execute a shell command just like it was typed in the shell, and os.popen - which allows access to stdin and stdout of said process. A more carefull translation, tough, would require using anothe module: "subprocess" which can give one full control of a shell command executed as sub process (although if you need find, for example, it won't be available on windows)
Other moduless of interest are sys (sys.argv are the arguments passed to the script) and shutil (with things like copy, rmtree and such)
Your script does some error checking, and it is trivial, given the above funcion names in "os" and basic Python to add them - but a short "just do it" script in Python could be just:
import os, sys

dir_to_flatten = sys.argv[1]
for dirpath, dirnames, filenames in os.walk(dir_to_flatten):
    for filename in filenames:
        try:
            os.rename(os.path.join(dirpath, filename), os.path.join(dir_to_flatten, filename))
        except OSError:
            print ("Could not move %s " % os.path.join(dirpath, filename))
Related
I'm beginning with bash and I'm executing a script:
$ ./readtext.sh ./InputFiles/applications.txt
Here is my readtext.sh code:
#!/bin/bash
filename="$1"
counter=1
while IFS=: true; do
    line=''
    read -r line
    if [ -z "$line" ]; then
        break
    fi
    echo "$line"
    python3 ./download.py \
        -c ./credentials.json \
        --blobs \
        "$line"
done < "$filename"
I want to print the string ("./InputFiles/applications.txt") in a Python file. I used sys.argv[1], but that gives me -c. How can I get this string? Thank you.
It is easier for you to pass the parameter "$1" on to the inner python3 command, as sketched below.
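A minimal sketch of the Python side, assuming the bash loop is changed to pass "$1" as an extra trailing argument, i.e. python3 ./download.py -c ./credentials.json --blobs "$line" "$1" (the argument names below are hypothetical):
# download.py (sketch)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-c', dest='credentials')         # matches the existing -c option
parser.add_argument('--blobs', action='store_true')   # matches the existing flag
parser.add_argument('line')                           # the "$line" value
parser.add_argument('input_file')                     # hypothetical extra argument holding "$1"
args = parser.parse_args()
print(args.input_file)                                # ./InputFiles/applications.txt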
If you don't want to do that, you can still get the external command line parameter with the trick of /proc, for example:
$ cat parent.sh
#!/usr/bin/bash
python3 child.py
$ cat child.py
import os
ext = os.popen("cat /proc/" + str(os.getppid()) + "/cmdline").read()
print(ext.split('\0')[2:-1])
$ ./parent.sh aaaa bbbb
['aaaa', 'bbbb']
Note:
the shebang line in parent.sh is important; otherwise you should execute ./parent.sh with bash, or you will get no command-line parameters in ps or /proc/$PPID/cmdline.
The reason for [2:-1]: ext.split('\0') = ['bash', './parent.sh', 'aaaa', 'bbbb', '']; the real parameters of ./parent.sh begin at index 2 and end at -1.
Update: thanks to the comment from @cdarke that "/proc is not portable", I am not sure whether this way of getting the command line is any more portable:
$ cat child.py
import os
ext = os.popen("ps " + str(os.getppid()) + " | awk ' { out = \"\"; for(i = 6; i <= NF; i++) out = out$i\" \" } END { print out } ' ").read()
print(ext.split(" ")[1 : -1])
which still has the same output.
This is the Python file that you can use in your case:
import sys

file_name = sys.argv[1]
with open(file_name, "r") as f:
    data = f.read().split("\n")
    print("\n".join(data))
See also: how to use sys.argv, and how to use the join method inside your Python code.
I have thousands of text files on my disk.
I need to search for them in terms of selected words.
Currently, I use:
grep -Eri 'text1|text2|text3|textn' dir/ > results.txt
The result is saved to a file: results.txt
I would like the result to be saved to many files.
results_text1.txt, results_text2.txt, results_textn.txt
Maybe someone has encountered some kind of script, e.g. in Python?
One solution might be to use a bash for loop.
for word in text1 text2 text3 textn; do grep -Eri "$word" dir/ > "results_$word.txt"; done
You can run this directly from the command line.
By using a combination of sed and xargs:
echo "text1,text2,text3,textn" | sed "s/,/\n/g" | xargs -I{} sh -c "grep -ir {} * > result_{}"
One way (using Perl, because it's easier for regexes and one-liners):
Sample data:
% mkdir dir dir/dir1 dir/dir2
% echo -e "text1\ntext2\nnope" > dir/file1.txt
% echo -e "nope\ntext3" > dir/dir1/file2.txt
% echo -e "nope\ntext2" > dir/dir1/file3.txt
Search:
% find dir -type f -exec perl -ne '/(text1|text2|text3|textn)/ or next;
    $pat = $1; unless ($fh{$pat}) {
        ($fn = $1) =~ s/\W+/_/ag;
        $fn = "results_$fn.txt";
        open $fh{$pat}, ">>", $fn;
    }
    print { $fh{$pat} } "$ARGV:$_"' {} \;
Content of results_text1.txt:
dir/file1.txt:text1
Content of results_text2.txt:
dir/dir2/file3.txt:text2
dir/file1.txt:text2
Content of results_text3.txt:
dir/dir1/file2.txt:text3
Note:
you need to put the pattern inside parentheses to capture it; grep doesn't give you access to the captured match like this.
the captured pattern is then filtered (s/\W+/_/ag replaces runs of non-word characters with an underscore) to ensure it's safe as part of a filename.
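Since the question asks about Python, here is a minimal sketch of the same idea in Python (assuming the files under dir/ are plain text; the word list is hard-coded here):
import os
import re

words = ['text1', 'text2', 'text3', 'textn']
pattern = re.compile('|'.join(map(re.escape, words)), re.IGNORECASE)

handles = {}  # one results_<word>.txt handle per word, opened lazily
try:
    for dirpath, dirnames, filenames in os.walk('dir'):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path) as f:
                for line in f:
                    # like grep -i, match case-insensitively; fold matches to lowercase
                    for word in set(m.lower() for m in pattern.findall(line)):
                        if word not in handles:
                            handles[word] = open('results_{0}.txt'.format(word), 'w')
                        handles[word].write('{0}:{1}'.format(path, line))
finally:
    for handle in handles.values():
        handle.close()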
I use a simple test framework which converts the contents of an *.xlsx file into a Selenium browser test. This allows a high level of abstraction, but it is annoying to work with in git because it is a binary file.
The format of the excel test file is:
Column A -> Action
Column B -> Identity (which text field, etc)
Column C onwards -> Value
I wrote the following code which successfully converts the excel file into a text file (*.json), essentially just a list of lists (so that order is maintained and no extra keywords are introduced).
#!/usr/bin/env python
"""
This script takes a GUI test in xlsx format and outputs a text interpretation.
"""
import argparse
import json
from openpyxl import load_workbook
from openpyxl.cell import get_column_letter
def main(xlsx_test):
    ws = load_workbook(filename=xlsx_test, data_only=True).active
    col_size = len(ws.column_dimensions)
    row_size = len(ws.row_dimensions)
    full_document = []
    for col in xrange(2, col_size):
        col_letter = get_column_letter(col + 1)
        test_document = []
        for row in xrange(row_size):
            cell_reference = col_letter + str(row + 1)
            cell_value = ws[cell_reference].value
            if cell_value:
                action = ws['A' + str(row + 1)].value
                identity = ws['B' + str(row + 1)].value
                if not action:
                    action = ""
                if not identity:
                    identity = ""
                test_item = [action, identity, cell_value]
                test_document.append(test_item)
        if test_document:
            full_document.append(test_document)
    with open(xlsx_test + '.json', 'w') as outfile:
        json.dump(full_document, outfile, indent=4)
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Generate text version of tests')
    test_file = parser.add_argument(
        '--input',
        type=str,
        nargs='?',
        help='the file to generate text version of tests from'
    )
    args = parser.parse_args()
    if args.input is None:
        print '''No file selected'''
    else:
        main(args.input)
To run it, I would execute the following in a bash script:
python /path/to/this/script.py --input='/path/to/test.xlsx'
I would like the text file it produces to be committed along with the xlsx binary just so that we get a git change history.
I have been reading about pre-commit hooks (something I am happy to set up on each contributor's machine), but I am struggling to establish whether a hook can include an extra generated file in the commit or not. Any direction would be great, as I am not really sure what to try at the moment.
I also have to work out whether I can iterate over the list of files in the commit with extension *.xlsx and pass in the path as an argument to the python script above, in a loop. Is what I am asking doable in this way?
I worked out a satisfactory solution. I created two commit hooks.
A pre-commit hook:
#!/bin/bash
echo
touch .commit
exit
And a post-commit hook:
#!/bin/bash
if [ -e .commit ]
then
    rm .commit
    git diff-tree --no-commit-id --name-only -r HEAD | while read i; do
        if [[ "${i}" == *.xlsx ]]; then
            echo
            echo ➤ Found an Excel file: $(basename "${i}") in the git diff-tree
            if [ ! -f "${i}" ]; then
                echo ➤ $(basename "${i}") was deleted. Not generating a text interpretation
                if [ -f "${i}.json" ]; then
                    echo ➤ Found an orphaned text interpretation: $(basename "${i}").json
                    rm "${i}.json"
                    echo ➤ Removed $(basename "${i}").json
                    git rm -q "${i}.json"
                    echo ➤ Removed reference to $(basename "${i}").json from git
                fi
            else
                python "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )/../../relative/path/to/script.py" --input="$(pwd)/${i}"
                echo ➤ Generated text interpretation: $(basename "${i}").json
                git add "${i}.json"
                echo ➤ Added $(basename "${i}").json to git
            fi
        fi
    done
    echo -----------------------------------------------------------------------
    git commit --amend -C HEAD --no-verify
    echo ➤ Amended commit above ⬏
    echo -----------------------------------------------------------------------
    echo ➤ Initial commit below ⬎
fi
exit
Currently I can see an additional problem, to do with committing an added/modified file. A contributor will only be able to commit from the base directory of the repository.
I will update with changes once I work them out.
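One possible way around the base-directory restriction (an untested sketch): resolve the repository root at the top of the hook, so its relative paths work no matter where the commit was started. In Python that could look like the snippet below; the bash hooks could do the equivalent with cd "$(git rev-parse --show-toplevel)".
import os
import subprocess

# ask git for the absolute path of the working tree's top level
repo_root = subprocess.check_output(
    ['git', 'rev-parse', '--show-toplevel']).strip().decode('utf-8')
os.chdir(repo_root)  # relative paths like "${i}.json" now resolve consistently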
I have many files like these:
30.230201521829.jpg
Mens-Sunglasses_L.180022111040.jpg
progressive-sunglasses.180041285287.jpg
Atmosphere.222314222509.jpg
Womens-Sunglasses-L.180023271958.jpg
DAILY ESSENTIALS.211919012115.jpg
aviator-l.Sunglasses.240202216759.jpg
aviator-l.Sunglasses.women.240202218530.jpg
I want to rename them to the following:
230201521829.jpg
180022111040.jpg
180041285287.jpg
222314222509.jpg
180023271958.jpg
211919012115.jpg
240202216759.jpg
240202218530.jpg
230201521829, 180022111040, 180041285287, and so on are timestamps.
Ensure that the final file name looks like "timestamp.jpg".
But I am not able to get any further with the script.
Can a sed (Bash) command or Python be used to do this?
Could you give me an example? Thanks.
Using command substitution for renaming the files. The following code loops over the jpg files of the current directory (unless the path is modified).
Awk is used to pick out the penultimate and last dot-separated fields of each file name.
for file in *.jpg
do
    mv "$file" "$(echo "$file" | awk -F'.' '{print $(NF-1)"." $NF}')"
done
I use Python, for example:
import os

pth = r"C:\Users\Test"  # raw string, so the backslashes are taken literally
dir_show = os.listdir(pth)
for list_file in dir_show:
    if list_file.endswith(".JPG"):
        (shrname, exts) = os.path.splitext(list_file)
        path = os.path.join(pth, list_file)
        # rfind, so that only the part after the *last* dot is kept
        newname = os.path.join(pth, shrname[shrname.rfind(".") + 1:] + ".JPG")
        os.rename(path, newname)
Using a perl rename one-liner:
$ touch 30.230201521829.jpg Mens-Sunglasses_L.180022111040.jpg progressive-sunglasses.180041285287.jpg Atmosphere.222314222509.jpg Womens-Sunglasses-L.180023271958.jpg Womens-Eyeglasses-R.172254027299.jpg
$ ls -1
30.230201521829.jpg
Atmosphere.222314222509.jpg
Mens-Sunglasses_L.180022111040.jpg
progressive-sunglasses.180041285287.jpg
Womens-Eyeglasses-R.172254027299.jpg
Womens-Sunglasses-L.180023271958.jpg
$ prename -v 's/^[^.]*\.//' *.*.jpg
30.230201521829.jpg renamed as 230201521829.jpg
Atmosphere.222314222509.jpg renamed as 222314222509.jpg
Mens-Sunglasses_L.180022111040.jpg renamed as 180022111040.jpg
progressive-sunglasses.180041285287.jpg renamed as 180041285287.jpg
Womens-Eyeglasses-R.172254027299.jpg renamed as 172254027299.jpg
Womens-Sunglasses-L.180023271958.jpg renamed as 180023271958.jpg
You can use parameter expansion to strip off the extension, then
remove all but the last .-delimited field from the remaining name. After that, you can reapply the extension.
for f in *; do
    ext=${f##*.}
    base=${f%.$ext}
    mv -- "$f" "${base##*.}.$ext"
done
The first line sets ext to the string following the last .. The second line sets base to the string that precedes the last . (by removing the last . and whatever $ext was set to). The third line constructs a new file name by first removing everything up to, and including, the final . in base, then reapplying the extension to the result.
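For comparison, the same idea in Python, where str.rsplit plays the role of both expansions (a sketch, assuming the files sit in the current directory; the mv -n style overwrite check is left out):
import os

for filename in os.listdir('.'):
    if not filename.endswith('.jpg'):
        continue
    # keep only the last two dot-separated fields: "<timestamp>.jpg"
    timestamp, ext = filename.rsplit('.', 2)[-2:]
    os.rename(filename, timestamp + '.' + ext)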
#!/bin/bash
echo "test: "
echo "" > 30.230201521829.jpg
echo "" > Mens-Sunglasses_L.180022111040.jpg
echo "" > progressive-sunglasses.180041285287.jpg
echo "" > Atmosphere.222314222509.jpg
echo "" > Womens-Sunglasses-L.180023271958.jpg
echo "" > DAILY\ ESSENTIALS.211919012115.jpg
echo "" > aviator-l.Sunglasses.240202216759.jpg
echo "" > aviator-l.Sunglasses.women.240202218530.jpg
echo "before: "
ls -ltr
for f in *.jpg; do
    renamed=${f: -16}
    mv "${f}" "${renamed}"
done
How would I do a file-reading loop in Python? I'm trying to convert my bash script to Python but have never written Python before. FYI, the reason I am re-reading the file after a successful command completion is to make sure it reads the most recent edit (say, if the URLs were reordered).
Thanks!
#!/bin/bash
FILE=$1
declare -A SUCCESS=()
declare -A FAILED=()
for (( ;; )); do
    # find a new link
    cat "$FILE" > temp.txt
    HASNEW=false
    while read; do
        [[ -z $REPLY || -n ${SUCCESS[$REPLY]} || -n ${FAILED[$REPLY]} ]] && continue
        HASNEW=true
        break
    done < temp.txt
    [[ $HASNEW = true ]] || break
    # download
    if axel --alternate --num-connections=6 "$REPLY"; then
        echo
        echo "Succeeded at $DATETIME downloading following link $REPLY"
        echo "$DATETIME Finished: $REPLY" >> downloaded-links.txt
        echo
        SUCCESS[$REPLY]=.
    else
        echo
        echo "Failed at $DATETIME to download following link $REPLY"
        echo "$DATETIME Failed: $REPLY" >> failed-links.txt
        FAILED[$REPLY]=.
    fi
    # refresh file
    cat "$FILE" > temp.txt
    while read; do
        [[ -z ${SUCCESS[$REPLY]} ]] && echo "$REPLY"
    done < temp.txt > "$FILE"
done
This is what I've got so far, which is working, but I can't figure out how to make it re-read the top line of the file after every successful execution of the axel line, as the bash script does. I'm open to other options on the subprocess call, such as threading, but I'm not sure how to make that work.
#!/usr/bin/env python
import subprocess
from optparse import OptionParser

# create command line variables
axel = "axel --alternate --num-connections=6 "
usage = "usage: %prog [options] ListFile.txt"
parser = OptionParser(usage=usage)
parser.add_option("-s", "--speed", dest="speed",
                  help="speed in bits per second i.e. 51200 is 50kps", metavar="speedkbps")
(opts, args) = parser.parse_args()
if not args:
    print "No list file given\n"
    parser.print_help()
    exit(-1)
list_file_1 = args[0]
if opts.speed:  # optparse leaves the option as None when -s was not given
    with open(list_file_1, 'r+') as f:
        for line in f:
            axel_call = axel + "--max-speed=" + opts.speed + " " + line
            # print ("speed option set line send to subprocess is: " + axel_call)
            subprocess.call(axel_call, shell=True)
else:
    with open(list_file_1, 'r+') as f:
        for line in f:
            axel_call = axel + line
            # print ("no speed option set line send to subprocess is:" + axel_call)
            subprocess.call(axel_call, shell=True)
The fully Pythonic way to read a file is the following:
with open(...) as f:
    for line in f:
        <do something with line>
The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files.
There should be one -- and preferably only one -- obvious way to do it.
A demonstration of using a loop to read a file is shown at http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files
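To mirror the bash script's behavior of re-reading the file after every download (so that recent edits and reorderings are picked up), don't iterate over the file object just once; re-open the file on each pass and take the first link that hasn't been processed yet. A minimal sketch, assuming axel is on the PATH:
import subprocess
import sys

list_file = sys.argv[1]
processed = set()  # links already attempted, successful or not

while True:
    # re-read the list on every pass so the most recent edit wins
    with open(list_file) as f:
        links = [line.strip() for line in f if line.strip()]
    remaining = [link for link in links if link not in processed]
    if not remaining:
        break
    link = remaining[0]
    result = subprocess.call(
        ['axel', '--alternate', '--num-connections=6', link])
    log_name = 'downloaded-links.txt' if result == 0 else 'failed-links.txt'
    with open(log_name, 'a') as log:
        log.write(link + '\n')
    processed.add(link)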