Renaming files recursively to make the filename a concatenation of their path - python

Sorry if the title is unclear. An example folder structure to help understand:
/images/icons/654/323/64/64/icon.png
/images/icons/837/283/64/64/icon.png
to be renamed to
/images-icons-654-323-64-64-icon.png
/images-icons-837-283-64-64-icon.png
I'm not great at bash so all I have to start with is:
find . -name "*.png"
which will find all of the files. I'm then hoping to use -exec rename with it, or whatever works. I'm also open to using any language to get the job done!

Solution in bash:
for f in $(find images_dir -type f); do
    mv -v "$f" "${f//\//-}"
done
This finds all files in the images_dir directory, replaces any / in their path with - thanks to the parameter expansion, and moves the file to the new path.
For example, the file images_dir/icons/654/321/b.png will be moved to images_dir-icons-654-321-b.png.
Note that if you execute find ., you will encounter an issue as find outputs filenames starting with ./, which means your files will be renamed to something like .-<filename>.
As @gniourf_gniourf notes in the comments, this will fail if your file names include spaces or newlines.
Whitespace-proof:
find images_dir -type f -exec bash -c 'for i; do mv -v "$i" "${i//\//-}"; done' _ {} +

In python you could do it like so:
import fnmatch
import os
def find(base_dir, some_string):
    matches = []
    for root, dirnames, filenames in os.walk(base_dir):
        for filename in fnmatch.filter(filenames, some_string):
            matches.append(os.path.join(root, filename))
    return matches
files = find('.', '*.png')
new_files = ['-'.join(ele.split('/')[1:]) for ele in files]
for idx, ele in enumerate(files):
    os.rename(ele, new_files[idx])
And to give proper credit, the find function I took from this answer.
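If you're on Python 3.4 or newer, pathlib makes the same idea shorter. A minimal sketch, assuming you run it from the directory that contains the image tree (not part of the original answer):

from pathlib import Path

# Walk the tree and rename each .png into the current directory,
# joining its path components with '-'.
for p in Path('.').rglob('*.png'):
    p.rename('-'.join(p.parts))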

This should do it for you:
for file in `find image -iname "*.png"`
do
    newfile=`echo "$file" | sed 's=/=-=g'`
    mv -v "$file" "$newfile"
done
The backtick find command expands to a list of files ending with ".png"; the -iname option makes the search case-insensitive.
The sed command will replace all slashes with dashes, resulting in the new target name.
The mv does the heavy lifting. The -v is optional and will cause a verbose move.
To debug, you can put an echo statement in front of the mv.

Related

os.listdir prints more files than `ls` but fewer than `ls -a`

I'd like to count the commands in /Users/me/anaconda3/bin:
In [3]: len(os.listdir("/Users/me/anaconda3/bin"))
Out[3]: 474
However, when I count with shell commands:
In [5]: !count=0; for f in $(ls /Users/me/anaconda3/bin) ;do count=$(( $count + 1)); done; echo $count
470
However, if I check all the files:
In [17]: ls -a /Users/me/anaconda3/bin | wc -l
476
What causes the difference?
It's easy if you read the documentation of os.listdir:
Return a list containing the names of the entries in the directory
given by path. The list is in arbitrary order, and does not include
the special entries '.' and '..' even if they are present in the
directory.
That means os.listdir always returns
no_of_elements_in(`ls -a`) - no_of_elements_in('.' and '..')
entries, that is,
len(os.listdir(path)) = no_of_elements_in(`ls -a`) - 2
In your case, 474 = 476 - 2. The remaining gap to plain `ls` (474 vs 470) comes from hidden files (names starting with a dot), which os.listdir includes but `ls` without -a does not.
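You can verify this arithmetic from Python itself. A small sketch, using the path from the question:

import os

path = "/Users/me/anaconda3/bin"
entries = os.listdir(path)                           # hidden entries included, '.' and '..' excluded

print(len(entries))                                  # 474, the os.listdir count
print(len(entries) + 2)                              # 476, what `ls -a` reports (adds '.' and '..')
print(sum(not e.startswith('.') for e in entries))   # 470, roughly what plain `ls` reports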

Command working in bash terminal but not in Python subprocess.Popen(); getting a 'paths must precede expression: `%p' error

I'm trying to find the location of the file that has been most recently modified. In bash, you can do this through
find /media/tiwa/usb/ -not -path '*/\.*' -type f -printf '%T# %p\n' 2> >(grep -v 'Permission denied' >&2) | sort -k1,1nr | head -1
Indeed, on my system, this returns
1527379702.1060795850 /media/tiwa/usb/hi.txt
I intend to take the output of this command (within Python), split it on the first space, and parse the file path (yes, I could use awk, but the same errors get thrown regardless). So I did
import subprocess
bashCommand = "find /media/tiwa/usb/ -not -path '*/\.*' -type f -printf '%T# %p\n' 2> >(grep -v 'Permission denied' >&2) | sort -k1,1nr | head -1"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
print(output)
but this prints out
find: paths must precede expression: `%p'
Escaping the backslashes doesn't appear to help either.
What is causing this issue, and how do I solve it?
You have an entire shell command line, not just a single command plus its arguments, which means you need to use the shell=True option instead of (erroneously) splitting the string into multiple strings. (Python string splitting is not equivalent to the shell's word splitting, which is much more involved and complicated.) Further, since your command line contains bash-specific features, you need to tell Popen to use /bin/bash explicitly, rather than the default /bin/sh.
import subprocess
bashCommand = "find /media/tiwa/usb/ -not -path '*/\.*' -type f -printf '%T# %p\n' 2> >(grep -v 'Permission denied' >&2) | sort -k1,1nr | head -1"
path_to_bash = "/bin/bash" # or whatever is appropriate
process = subprocess.Popen(bashCommand,
                           stdout=subprocess.PIPE,
                           shell=True,
                           executable=path_to_bash)
output, error = process.communicate()
print(output)
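On Python 3.7 and newer, the same thing can be written a bit more compactly with subprocess.run; a sketch along the same lines (not part of the original answer):

import subprocess

# Same pipeline as in the question; a raw string keeps the backslashes intact for the shell.
bashCommand = r"find /media/tiwa/usb/ -not -path '*/\.*' -type f -printf '%T# %p\n' 2> >(grep -v 'Permission denied' >&2) | sort -k1,1nr | head -1"

result = subprocess.run(bashCommand, shell=True, executable="/bin/bash",
                        capture_output=True, text=True)
print(result.stdout)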
(It would, however, be simpler and more robust to use os.walk() to get each file, and use os.stat() to get the modification time of each relevant file, and only keep the newest file found so far until you have examined every file.
import os
newest = (0, "")
for dirpath, subdirs, filenames in os.walk("/media/tiwa/usb"):
    for fname in filenames:
        if fname.startswith("."):
            continue
        path = os.path.join(dirpath, fname)
        mtime = os.stat(path).st_mtime
        if mtime > newest[0]:
            newest = (mtime, path)
Or perhaps
def names_and_times(d):
    for dirpath, _, filenames in os.walk(d):
        for fname in filenames:
            if fname.startswith("."):
                continue
            path = os.path.join(dirpath, fname)
            yield (os.stat(path).st_mtime, path)
newest = max(names_and_times("/media/tiwa/usb"))
)
Keep in mind that any of the preceding approaches will only return one file with the newest modification time.

How to delete files and folders but keep the directory structure, leaving behind an empty file instead of the file itself?

I want to delete files and folders but leave the directory structure intact.
I also need to keep the name of each file in its current path: something like leaving behind an empty text file with the same name, in place of the file itself.
My drive format is NTFS.
You can use os.walk to browse your directory structure and replace each file with an empty one (by overwriting it):
import io
import os
work_dir = '.'
for root, dirnames, filenames in os.walk(work_dir):
    for filename in filenames:
        path = os.path.join(root, filename)
        io.open(path, mode='w').close()
See the documentation: https://docs.python.org/3/library/os.html
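A pathlib-based variant of the same idea (a sketch, Python 3.5+), which truncates every file under work_dir in place:

from pathlib import Path

work_dir = Path('.')
for p in work_dir.rglob('*'):
    if p.is_file():
        p.write_bytes(b'')  # zero out the file, keeping its name and location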
In Bash 4+ you can do the following to zero out all the files under a certain path:
shopt -s globstar
for file in /mnt/c/path/to/clean/**; do
    [[ -f $file ]] && : > "$file"
done
For a windows cmd solution
for /r "x:\path\to\clean" %a in (*) do ">%a" type nul
To run it from a batch file, percent signs need to be escaped (doubling them)
for /r "x:\path\to\clean" %%a in (*) do ">%%a" type nul
In Windows cmd, you might want to use the robocopy command:
rem // Create copy with zero-length files at temporary location:
robocopy "\path\to\your\dir" "\path\to\temp\dir" *.* /E /CREATE
rem // Move tree at temporary location onto original location:
robocopy "\path\to\temp\dir" "\path\to\your\dir" *.* /E /MOVE /IS

bash: copy files with the same pattern

I want to copy files scattered in separate directories into a single directory.
find . -name "*.off" > offFile
while read line; do
cp "${line}" offModels #offModels is the destination directory
done < offFile
The file offFile has 1831 lines, but after cd offModels, ls | wc -l gives 1827. I think four files ending with ".off" were not copied.
At first, I thought that because I used double quotes in the shell script, files whose names contain a dollar sign, backtick or backslash might have been missed. Then I found one file named $.... But how do I find the other three? After cd offModels and ls > ../File, I wrote a Python script like this:
fname1="offFile" #records files scattered
with open(fname1) as f1:
    contents1=f1.readlines()
fname2="File"
with open(fname2) as f2:
    contents2=f2.readlines()
visited=[0]*len(contents1)
for substr in contents2:
    substr="/"+substr
    for i, string in enumerate(contents1):
        if string.find(substr)>=0:
            visited[i]=1
            break
for i,j in enumerate(visited):
    if j==0:
        print(contents1[i])
The output gives four lines, but they are wrong: I can find all four of those files in the destination directory.
Edit
As the comment and answers point out, there are four files whose names duplicate those of four others. One thing that interests me now is that, with the bash script I used, the file named $CROSS.off was copied. That really surprised me.
Looks like you have files with the same filenames, and cp just overwrites them.
You can use the --backup=numbered option for cp; here is a one-liner:
find -name '*.off' -exec cp --backup=numbered '{}' '/full/path/to/offModels' ';'
The -exec option allows you to execute a command on every file matched; you should use {} to get the file's name and end the command with ; (usually written as \; or ';', because bash treats semicolons as command separators).
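If you want to see which basenames collide before copying, you can also count them in Python. A sketch that reads the same offFile list of paths produced by find:

import os
from collections import Counter

with open("offFile") as f:
    paths = [line.rstrip("\n") for line in f if line.strip()]

# Count how often each basename occurs; anything above 1 would be overwritten by cp.
counts = Counter(os.path.basename(p) for p in paths)
for name, n in sorted(counts.items()):
    if n > 1:
        print(name, n)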

Find files with same name but different content

I need to find files with the same name but different content in a linux folder structure with a lot of files.
Something like this does the job partially; how do I eliminate files with different content?
#!/bin/sh
dirname=/path/to/directory
find "$dirname" -type f | sed 's_.*/__' | sort | uniq -d |
while read fileName
do
    find "$dirname" -type f | grep "$fileName"
done
(How to find duplicate filenames (recursively) in a given directory? BASH)
Thanks so much !
The first question is, how can you determine whether two files have the same content?
One obvious possibility is to read (or mmap) both files and compare them a block at a time. On some platforms, a stat is a lot faster than a read, so you may want to first compare sizes. And there are other optimizations that might be useful, depending on what you're actually doing (e.g., if you're going to run this thousands of times, and most of the files are the same every time, you could hash them and cache the hashes, and only check the actual files when the hashes match). But I doubt you're too worried about that kind of performance tweak if your existing code is acceptable (since it searches the whole tree once for every file in the tree), so let's just do the simplest thing.
Here's one way to do it in Python:
#!/usr/bin/env python3
import sys
def readfile(path):
    with open(path, 'rb') as f:
        return f.read()
contents = [readfile(fname) for fname in sys.argv[1:]]
sys.exit(all(content == contents[0] for content in contents[1:]))
This will exit with code 1 if all files are identical, code 0 if any pair of files are different. So, save this as allequal.py, make it executable, and your bash code can just run allequal.py on the results of that grep, and use the exit value (e.g., via $?) to decide whether to print those results for you.
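For the stat-before-read shortcut mentioned above, here is a hedged sketch (same_content is a hypothetical helper, not part of the original answer); it only reads file contents when the sizes already agree:

import os

def same_content(paths):
    # Cheap pre-check: if any sizes differ, the contents must differ,
    # so there is no need to read the files at all.
    if len({os.stat(p).st_size for p in paths}) > 1:
        return False
    # Sizes agree, so read and compare the actual bytes.
    with open(paths[0], 'rb') as f:
        first = f.read()
    for p in paths[1:]:
        with open(p, 'rb') as f:
            if f.read() != first:
                return False
    return True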
I am facing the same problem as described in the question. In a large directory tree, some files have the same name and either same content or different content. The ones where the content differs need human attention to decide how to fix the situation in each case. I need to create a list of these files to guide the person doing this.
The code in the question and the code in the abernet's response are both helpful. Here is how one would combine both: Store the python code from abernet's response in some file, e.g. /usr/local/bin/do_these_files_have_different_content:
sudo tee /usr/local/bin/do_these_files_have_different_content <<EOF
#!/usr/bin/env python3
import sys
def readfile(path):
    with open(path, 'rb') as f:
        return f.read()
contents = [readfile(fname) for fname in sys.argv[1:]]
sys.exit(all(content == contents[0] for content in contents[1:]))
EOF
sudo chmod a+x /usr/local/bin/do_these_files_have_different_content
Then extend the bash code from Illusionist's question to call this program when needed, and react on its outcome:
#!/bin/sh
dirname=$1
find "$dirname" -type f | sed 's_.*/__' | sort | uniq -d |
while read fileName
do
    if do_these_files_have_different_content $(find "$dirname" -type f | grep "$fileName")
    then
        find "$dirname" -type f | grep "$fileName"
        echo
    fi
done
This will write to stdout the paths of all files with the same name but different content. Groups of files with the same name but different content are separated by empty lines. I store the shell script in /usr/local/bin/find_files_with_same_name_but_different_content and invoke it as
find_files_with_same_name_but_different_content /path/to/my/storage/directory
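If you'd rather stay entirely in Python, the grouping and the comparison can be combined into one script. A sketch of the same approach (not one of the original scripts):

#!/usr/bin/env python3
import os
import sys
from collections import defaultdict

root = sys.argv[1]

# Group every file path under root by its basename.
by_name = defaultdict(list)
for dirpath, _, filenames in os.walk(root):
    for fname in filenames:
        by_name[fname].append(os.path.join(dirpath, fname))

# Print each group of same-named files whose contents are not all identical.
for name, paths in sorted(by_name.items()):
    if len(paths) < 2:
        continue
    contents = [open(p, 'rb').read() for p in paths]
    if any(c != contents[0] for c in contents[1:]):
        print('\n'.join(paths))
        print()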
