Fastest way to merge a directory tree - python

I have multiple directories of the form
foo/bar/baz/alpha_1/beta/gamma/files/uniqueFile1
foo/bar/baz/alpha_2/beta/gamma/files/uniqueFile2
foo/bar/baz/alpha_3/beta/gamma/files/uniqueFile3
What is the fastest way to merge these directories to a single directory structure like
foo/bar/baz/alpha/beta/gamma/files/uniqueFile1...uniqueFile3
I could write a Python script to do this, but is there a faster way on a Debian machine? Can rsync help in this case?
EDIT:
Apologies for not making this clear earlier: the depth in the examples is ~10-12, and I do not know some of the directory names, such as alpha*; these are randomly generated while the logs are written out. I was using find with wildcards to list these files, but another level has since been added to the path, which pushed my find queries from 0.004s to over a minute. So I am looking for a faster solution.
/known_fixed_path_5_levels/*/known_name*/*/fixed_path_2_levels/n_unique_files
has become
/known_fixed_path_5_levels/*/known_name*/*/xx*/fixed_path_2_levels/unique_file_1
/known_fixed_path_5_levels/*/known_name*/*/xx*/fixed_path_2_levels/unique_file_2
.
.
/known_fixed_path_5_levels/*/known_name*/*/xx*/fixed_path_2_levels/unique_file_n
I basically want to collect all those unique files into one place like how it was before.

With find:
mkdir --parents foo/bar/baz/alpha/beta/gamma/files; # create target directory if necessary
find foo/bar/baz/alpha_[1-3]/beta/gamma/files -type f -exec cp {} foo/bar/baz/alpha/beta/gamma/files \;
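If you'd rather stay in Python, as the question mentions, here is a minimal sketch of the same copy (the alpha_[1-3] pattern and target path are taken from the question):
import glob, os, shutil

target = "foo/bar/baz/alpha/beta/gamma/files"
os.makedirs(target, exist_ok=True)  # create target directory if necessary

# copy every regular file from the alpha_1..alpha_3 trees into the target
for path in glob.glob("foo/bar/baz/alpha_[1-3]/beta/gamma/files/*"):
    if os.path.isfile(path):
        shutil.copy2(path, target)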

As the question is not clear about copying vs. moving, here are two ways that avoid copying. Even the second one doesn't actually duplicate your data!
Simple bash command
Simply:
cd foo/bar/baz
mv -it alpha/beta/gamma/files alpha_*/beta/gamma/files/uniqueFile*
The -i switch prevents overwriting.
This works well for a small number of files.
More robust and adaptive find syntax
Or by using find:
cd foo/bar/baz
find alpha_* -mindepth 3 -type f -exec mv -it alpha/beta/gamma/files {} +
Advantages of using find:
you can add many filter flags such as -name, -mtime, and so on
find will never pass more files to the command (mv) than the command line can hold.
cp -al, a UN*X-specific concept
Under Un*x, you can create a hard link, which is not a symbolic link but a second directory entry in the folder tree for the same inode.
Note: since all the links must reference a single inode, this works only within the same filesystem.
By using
cp -ialt alpha/beta/gamma/files alpha_*/beta/gamma/files/uniqueFile*
This gathers references to all the inodes into one directory while keeping only one copy of each file's data.
Using bash's globstar feature:
cd foo/bar/baz
shopt -s globstar
cp -alit alpha/beta/gamma/files alpha_*/**/uniqueFile*
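The same hard-link trick is available from Python through os.link (a minimal sketch over the question's layout; like cp -al, it only works within a single filesystem):
import glob, os

target = "foo/bar/baz/alpha/beta/gamma/files"
os.makedirs(target, exist_ok=True)

for path in glob.glob("foo/bar/baz/alpha_*/beta/gamma/files/*"):
    dest = os.path.join(target, os.path.basename(path))
    if not os.path.exists(dest):  # mimic -i: never overwrite
        os.link(path, dest)       # second directory entry, same inode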

Related

What can I safely remove in a python lib folder?

I am using:
mkdir -p build/python/lib/python3.6/site-packages
pipenv run pip install -r requirements.txt --target build/python/lib/python3.6/site-packages
to create a directory build with everything I need for my Python project, but I also need to save as much space as possible.
What can I safely remove in order to save space?
Maybe I can do find build -type d -iname "*.dist-info" -exec rm -R {} \; ?
Can I remove *.py if I leave *.pyc?
Thanks
Perhaps platform-specific *.exe files, if your project doesn't need to run on Windows:
How to prevent *.exe ...
Deleting *.pyc (byte-compiled files) is 100% supported, at a cost in load time, since Python regenerates them on import. The reverse trick (keeping just *.pyc and deleting most *.py sources) works in some Python versions, but it is not safe IMHO and I have never tried it.
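A sketch of those cleanups in Python (pathlib-based; the build directory name comes from the question, and deleting the caches rather than the sources is the supported direction):
import pathlib, shutil

build = pathlib.Path("build")

# drop byte-compiled caches; Python regenerates them on import
for cache in list(build.rglob("__pycache__")):
    shutil.rmtree(cache)

# drop packaging metadata if you don't need pip to see the packages
for info in list(build.rglob("*.dist-info")):
    shutil.rmtree(info)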

How does im2rec works? I keep getting syntax error

I am trying to create lst files for aws image classification algorithm.
My main directory is train which has 20 sub-directories of 40 images each.
I want to create a train_1st which contains all the converted lst files.
But I am getting issues with the code below. I'm new to this, so please help me: what do I do?
I have tried changing the current working directory (cwd) as well, setting it to train/ and also to the actual directory home/ec-2/sagemaker. Nothing helped.
%%bash
mkdir -p train_lst
for i in train/*; do
    c=`basename $i`
    mkdir -p train_lst/$c
    for j in `ls $i/*.jpg | shuf | head -n 60`; do
        mv $j train_lst/$c/
    done
done
python im2rec.py --list --recursive train train_lst/
ls: cannot access train/*/*.jpg: No such file or directory
The error message tells us that the variable i must contain the literal, unexpanded glob pattern train/*. This means that there are no subdirectories below $PWD/train.
You can verify this by turning on
shopt -s failglob
at the start of your script. This will print an error message whenever a pattern cannot be expanded.
BTW, what is the weird %%bash in your script supposed to do?
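For comparison, the shuffle-and-move step in Python avoids the unexpanded-glob pitfall entirely (a sketch; the train/train_lst names and the 60-file sample size come from the question):
import glob, os, random, shutil

for subdir in glob.glob("train/*/"):
    c = os.path.basename(os.path.dirname(subdir))
    os.makedirs(os.path.join("train_lst", c), exist_ok=True)
    jpgs = glob.glob(os.path.join(subdir, "*.jpg"))
    # an empty list here is the honest analogue of the failed glob
    for j in random.sample(jpgs, min(60, len(jpgs))):
        shutil.move(j, os.path.join("train_lst", c))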

Fortify scan for python project

How do I generate a Fortify .fpr file for Python files?
A similar question is Fortify, how to start analysis through command, but it lists the steps for Java.
To generate reports for a Python project, -python-path has to be used.
I tried the following steps, but they did not work.
Step 1: Clean,build
sourceanalyzer -64 -Xms1024M -Xmx10000M -b -verbose -Dcom.fortify.sca.ProjectRoot=/local/proj/9999/ -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/working/9999/working/sca.log -clean
Step 2: Scan: This step should generate the .fpr file
sourceanalyzer -b 9999 -verbose -Xms1024M -Xmx10000M -Dcom.fortify.sca.ProjectRoot=/local/proj/9999/ -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/9999/sca.log -python-path /path/to/python -f projec_999.fpr /local/proj/**/*.py
This did not generate any .fpr file.
The second step gives these warnings:
[warning]: The -f option has no effect without the -scan option
[warning]: You may need to add some arguments to the -python-path argument to SCA.
I am not sure if I am using the correct command.
How can I make sure that all Python files in the directory and its subdirectories are scanned?
Is there any option to add multiple Python paths?
Your first step only does the clean, not the build step.
To perform the translation step for Python you need to specify the directories for any Python references (-python-path) as well as the files to translate.
I am also not sure what you are doing with ProjectRoot and WorkingDirectory; you know these are used to store temp data/intermediate files for sourceanalyzer and not the location of your source code, correct?
Something like:
sourceanalyzer -b <buildId> -python-path <directories> <files to scan>
<buildId> can be used to group different projects; you are somewhat doing this yourself with ProjectRoot and WorkingDirectory (I am not sure if you need them both; I can't remember and no longer have access to test it out).
<directories> - this is where you list the directories that would normally be in your PYTHONPATH environment variable (you might be able to just pass that variable here and save a lot of hassle). This is a comma-separated list on Windows and a colon-separated list on Linux.
<files to scan> - this is where you specify the files you want to translate/scan. You can specify individual files or use wildcard characters (* and **/* [recursive]).
A sample command would look like:
sourceanalyzer -b MyApp -python-path %PYTHONPATH% ./MyApp/**/*
The other options you are using can be kept, and it would look something like this:
sourceanalyzer -b MyApp -Xms1024M -Xmx10G -logfile /local/proj/working/9999/working/sca.log -python-path %PYTHONPATH% ./MyApp/**/*
At this step you can check which files were translated from your program:
sourceanalyzer -b MyApp -show-files
Then you would perform the scan command:
sourceanalyzer -b MyApp -logfile /local/proj/working/9999/working/sca.log -scan -f project.fpr
You may pass -python-path multiple times, which sidesteps the question of which separator to use. The list of needed directories can be obtained from Python itself:
import sys
print(sys.path)
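To join that list into a single value for -python-path (a one-liner sketch; os.pathsep is the platform's separator, a semicolon on Windows and a colon on Linux, matching the rule above):
import os, sys
print(os.pathsep.join(p for p in sys.path if p))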

subprocess cp leaves some files empty

I'm trying to copy some files from one directory to another. I want all files in one directory to end up in the root of another directory.
This command does exactly what I want when I run it in the terminal:
cp -rv ./src/CopyPasteIntoBuildDir/* ./build-root/src/
This line of Python, however, copies most of the files just like the command above, but it leaves some of the new files empty. Specifically, files in subdirectories are left empty.
subprocess.check_call("cp -rv ./src/CopyPasteIntoBuildDir/* ./build-root/src/", shell=True)
It creates the files if they're not there, and it truncates them if they are.
What is going on?
Assuming that you've decided to use cp rather than native Python operations --
This code will be much more reliable if you write it to not invoke any shell whatsoever. To avoid the need for /* on the source (and its side effects -- i.e., skipping hidden files and failing outright when the expanded list exceeds the ARG_MAX combined environment and command-line size limit), use . as the last element of the name of the directory whose contents are to be copied, instead of passing a wildcard that needs a shell to expand it.
subprocess.check_call(["cp", "-R", "--", '%s/.' % src, dest])
The use of cp -R rather than cp -rv is because -R, but not -r, is POSIX-standardized (and thus portable across all compliant UNIX-like platforms).
Demonstrating In Action (copy/pasteable code)
tempdir=$(mktemp -d -t testdir.XXXXXX)
trap 'rm -rf "$tempdir"' EXIT
cd "$tempdir"
mkdir -p ./src/CopyPasteIntoBuildDir/subdir-1 ./build-root/src/
touch ./src/CopyPasteIntoBuildDir/file-1
touch ./src/CopyPasteIntoBuildDir/subdir-1/file-2
script='
import sys, shutil, subprocess
src = sys.argv[1]
dest = sys.argv[2]
subprocess.check_call(["cp", "-R", "--", "%s/." % src, dest])
'
python -c "$script" ./src/CopyPasteIntoBuildDir ./build-root/src/
find ./build-root -type f -print
rm -rf "$tempdir"
...emits output akin to:
./build-root/src/file-1
./build-root/src/subdir-1/file-2
...showing that content was correctly recursively copied with no prefix.
So apparently this is a problem with sh. Using bash instead worked.
subprocess.check_call("cp -rv ./src/CopyPasteIntoBuildDir/* ./build-root/src/", shell=True, executable="/bin/bash")
EDIT: See accepted answer!
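For completeness, the native Python operation the accepted answer alludes to sidesteps the shell question entirely (a sketch; dirs_exist_ok requires Python 3.8+):
import shutil

# copy the directory's contents into build-root/src, merging with
# whatever is already there instead of failing on an existing target
shutil.copytree("./src/CopyPasteIntoBuildDir", "./build-root/src/",
                dirs_exist_ok=True)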

Bash or Python : Append and prepend a string recursively in all .tex files

I'm trying to change my master/child document workflow in LaTeX to use the package "subfiles", so I want to prepend and append the following lines to every file I already have:
Lines to prepend:
\documentclass[<mainfile>]{subfiles}
\begin{document}
Line to append:
\end{document}
I was thinking of using bash, but Python could be nice too; I don't know which would be best.
Any suggestions? :-)
An intuitive way is to use cat.
First create two files, prepend.tex and append.tex, containing the content you want to add, then concatenate them with each source file to create the desired target file:
$ cat prepend.tex source.tex append.tex > target.tex
To apply it recursively to all existing files inside a directory say src/ you can use:
$ find src/ -type f -name "*.tex" -exec sh -c "cat prepend.tex '{}' append.tex > '{}'.tmp" \;
This will create a new .tmp file alongside each source file. Make sure that the results in those .tmp files are exactly what you want before overwriting the source files with the .tmp ones:
$ find src/ -type f -name "*.tmp" -exec rename -n 's/\.tmp$//' '{}' \;
Change the rename option from -n (dry run) to -f to force the overwrite.
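The same wrap can also be done in a few lines of Python, reusing the prepend.tex and append.tex files from above (a sketch; note that unlike the .tmp workflow it rewrites the sources in place, so try it on a copy of src/ first):
import pathlib

prepend = pathlib.Path("prepend.tex").read_text()
append = pathlib.Path("append.tex").read_text()

for tex in pathlib.Path("src").rglob("*.tex"):
    tex.write_text(prepend + tex.read_text() + append)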
