Removing randomly generated file extensions from .jpg files using Python

I recently recovered a folder that I had accidentally deleted. It has .jpg and .tar.gz files. However, all of the files now have some sort of hash extension appended to them, and it is different for every file. There are more than 600 files in the folders. Example names would be:
IMG001.jpg.3454637876876978068
IMG002.jpg.2345447786787689769
IMG003.jpg.3454356457657757876
and
folder1.tar.gz.45645756765876
folder2.tar.gz.53464575678588
folder3.tar.gz.42345435647567
I would like a script that goes through the files in turn (maybe I can specify the extension, or it could make two passes, one through the .jpg files and the other through the .tar.gz files) and cleans up the last part of each file name, starting from the dot right before the number, so that the final file names end in .jpg and .tar.gz.
What I have so far in Python:
import os
def scandirs(path):
    for root, dirs, files in os.walk(path):
        for currentFile in files:
            os.path.splitext(currentFile)
scandirs(r'C:\Users\ad\pics')
Obviously it doesn't work. I would appreciate any help. I would also consider using a bash script, but I do not know how to do that.

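Inside the loop, move each file to the name returned by os.path.splitext. This needs import shutil, and the name has to be joined with root, because currentFile is only the bare file name:
shutil.move(os.path.join(root, currentFile), os.path.join(root, os.path.splitext(currentFile)[0]))
At least, I think so...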
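os.path.splitext splits at the last dot only, so it does the right thing for names like IMG001.jpg.3454637876876978068, but it would also turn an already-clean IMG004.jpg into IMG004. A minimal sketch that guards against that, assuming the junk suffix is always made of digits only (the helper name strip_numeric_extension is hypothetical, not from the answer):

```python
import os


def strip_numeric_extension(filename):
    """Return filename without its last extension, but only if that extension is all digits."""
    stem, ext = os.path.splitext(filename)
    # splitext splits at the last dot: 'IMG001.jpg.345' -> ('IMG001.jpg', '.345')
    if ext[1:].isdigit():
        return stem
    # Already-clean names such as 'IMG004.jpg' come back unchanged.
    return filename
```

In the loop you would compute new_name = strip_numeric_extension(currentFile) and only call shutil.move when new_name differs from currentFile.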

Here is how I would do it, using regular expressions:
import os
import re
pattern = re.compile(r'^(.*)\.\d+$')
def scandirs(path):
    for root, dirs, files in os.walk(path):
        for currentFile in files:
            match = pattern.match(currentFile)
            if match:
                os.rename(
                    os.path.join(root, currentFile),
                    os.path.join(root, match.group(1))
                )
scandirs('C:/Users/ad/pics')
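The question also asked whether the script could be limited to particular extensions. Here is a sketch of one way to do that with the same regular-expression idea, assuming the only names to touch are *.jpg.<digits> and *.tar.gz.<digits> (matched case-insensitively). The function names and the dry_run switch are illustrative additions: by default the sketch only prints what it would do, and it refuses to overwrite an existing file with the clean name.

```python
import os
import re

# Assumption: the junk suffix is a dot followed only by digits, and only names whose
# "real" extension is .jpg or .tar.gz should be renamed.
SUFFIX_PATTERN = re.compile(r'^(?P<name>.+\.(?:jpg|tar\.gz))\.\d+$', re.IGNORECASE)


def cleaned_name(filename):
    """Return the filename without its numeric suffix, or None if it has none."""
    match = SUFFIX_PATTERN.match(filename)
    return match.group('name') if match else None


def strip_numeric_suffixes(path, dry_run=True):
    """Walk path and rename matching files; with dry_run=True only print the plan."""
    renamed = []
    for root, _dirs, files in os.walk(path):
        for filename in files:
            new_name = cleaned_name(filename)
            if new_name is None:
                continue
            src = os.path.join(root, filename)
            dst = os.path.join(root, new_name)
            if os.path.exists(dst):
                # Don't silently overwrite a file that already has the clean name.
                print(f'skipping {src}: {dst} already exists')
                continue
            print(f'{src} -> {dst}')
            if not dry_run:
                os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Run strip_numeric_suffixes(r'C:\Users\ad\pics') first to review the plan, then call it again with dry_run=False to actually rename.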

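Since you tagged the question with bash, here is a loop that removes the last extension from every file and directory in the current directory. Note that it strips the last extension from every entry, including names that are already clean (IMG004.jpg would become IMG004), so only run it in a folder where every entry carries the numeric suffix: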
for f in *; do
    mv "$f" "${f%.*}"
done

Related

Python: Finding files in directory but ignoring folders and their contents

So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os
# Directory that contains this script
dir_path = os.path.dirname(os.path.realpath(__file__))
# for loop is meant to scan through the directory the program is in
for root, dirs, files in os.walk(dir_path):
    for file in files:
        # Check if file ends with .log, if so print file name
        if file.endswith('.log'):
            print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files, it also picks up the .log files in the extra_file folder. But I do not want this. I only want it to print sample_1.log and sample_2.log. How would I approach this?
Try this:
import os
files = os.listdir()
for file in files:
    if file.endswith('.log'):
        print(file)
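The problem in your code is that os.walk traverses the whole directory tree, not just your current directory. os.listdir returns a list of all entry names (files and directories) in a single directory; with no argument it uses the current working directory. That is the script's folder only if you run the script from there, so pass dir_path (os.listdir(dir_path)) if you want the folder the script lives in.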
By default, os.walk does a top-down (root-first) traversal of the tree, so the first tuple it yields describes the top directory itself. So just ask for the first one with next(). And since you don't care about root or dirs, use _ as the "don't care" variable name:
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
    # Check if file ends with .log, if so print file name
    if file.endswith('.log'):
        print(file)
It's also common to use glob:
import os
from glob import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
    print(file)
This runs the risk of also matching a directory whose name ends in ".log", so you could also add a test using os.path.isfile(file).
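Both concerns (staying in the top directory and skipping directories named *.log) can be handled in one pass with os.scandir, whose entries know whether they are regular files. A minimal sketch; the function name top_level_files is illustrative:

```python
import os


def top_level_files(dir_path, suffix):
    """Return sorted names of regular files directly inside dir_path that end with suffix.

    Subdirectories are neither descended into nor returned, even if a
    directory's name happens to end with the suffix.
    """
    with os.scandir(dir_path) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file() and entry.name.endswith(suffix)
        )
```

In search_file.py you would call it as top_level_files(os.path.dirname(os.path.realpath(__file__)), '.log'), which for the layout in the question should give sample_1.log and sample_2.log only.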

Python loop through directories

I am trying to use the Python os library to loop through all the subdirectories of my root directory, target a specific file name, and rename those files.
Just to make it clear, this is my tree structure:
My Python file is located at the root level.
What I am trying to do is target the directory 942ba, loop through all the subdirectories, locate the file 000000, and rename it to 000000.csv.
The current code I have is as follows:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips the date from the subdirectory names.
Can anyone help me understand how to solve this issue?
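The stripped dates come from str.strip: its argument is treated as a set of characters to remove from both ends, not as a prefix. Since os.walk already yields the full directory path as its first element, no stripping is needed at all. The sketch below uses hypothetical paths modelled on the question (a date-named subfolder) to show the effect. Here strip() returns an empty string, because every character of the date also occurs in root. If you really need the part after the root, os.path.relpath (or str.removeprefix on Python 3.9+) does that.

```python
import os

# Hypothetical paths, modelled on the question's layout (a date-named subfolder).
root = '/data/942ba956-8967-4bec-9540-fbd97441d17f/'
subdir = root + '2019-01-01'

# str.strip trims any of the characters in root from both ends; it is not a prefix removal.
stripped = subdir.strip(root)

# The part of the path below root, computed properly.
relative = os.path.relpath(subdir, root)

print(repr(stripped))
print(relative)
```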
A more efficient way to iterate through the folders and only select the files you are looking for is below:
import os
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root, f)) for root, dirs, names in os.walk(source_folder) for f in names if '000000' in f and not f.endswith('.gz')]
for file in files:
    os.rename(file, f"{file}.csv")
The list comprehension stores the full path to the files you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of certain type, or remove unwanted files from the selected files.
In the for loop, files are renamed adding the .csv extension.
I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
# '**' with recursive=True matches 000000 at any depth below zdir
files = glob.glob(os.path.join(zdir, '**', '000000'), recursive=True)
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))
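The same search can be written with pathlib, where Path.rglob(name) looks for name at any depth. A minimal sketch, assuming the target is always a regular file called exactly 000000 (add_csv_extension is a hypothetical helper name). Matches are collected into a list before renaming, so the tree is not changed while it is still being searched, and files that already end in .csv are never matched again.

```python
from pathlib import Path


def add_csv_extension(root, target_name='000000'):
    """Rename every regular file called target_name under root to target_name + '.csv'.

    Returns the list of new paths.
    """
    # Collect first, rename afterwards, so rglob is not walking a changing tree.
    matches = [p for p in Path(root).rglob(target_name) if p.is_file()]
    renamed = []
    for path in matches:
        new_path = path.with_name(path.name + '.csv')
        path.rename(new_path)
        renamed.append(new_path)
    return renamed
```

For the question's layout this would be called as add_csv_extension('<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/').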

Apply procedure to files in many subdirectories

I am trying to apply a procedure to thousands of files spread across many subdirectories.
I was thinking of using os.listdir() first to list all the subdirectories, then looking in each subdirectory and applying my procedure. My directory tree is as follows:
Directory -> subdir1 -> file, file, file, .....
Directory -> subdir2 -> file, file, file, .....
Directory -> subdir3 -> file, file, file, .....
Directory -> subdir4 -> file, file, file, .....
Directory -> subdir5 -> file, file, file, .....
I can get the list of subdirectories with os.listdir(), but not the files inside them. Do you have an idea how to proceed?
Thanks
EDIT:
When using MikeH's method, in my case:
import os
from astropy.io import fits
ROOT_DIR='./'
for dirName, subdirList, fileList in os.walk(ROOT_DIR):
    for fname in fileList:
        hdul = fits.open(fname)
I get the error:
FileNotFoundError: [Errno 2] No such file or directory: 'lte08600-2.00+0.5.Alpha=+0.50.PHOENIX-ACES-AGSS-COND-2011-HiRes.fits'
And indeed, if I check the path of the file with print(os.path.abspath(fname)), I can see that the path is wrong: the subdirectory is missing, giving /root/dir/fname instead of /root/dir/subdir/fname.
What is wrong here?
EDIT2:
That's it, I found out what was wrong: I have to join the path of the file, writing os.path.join(dirName, fname) instead of just fname each time.
Thanks !
Something like this should work for you:
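import os
ROOT_DIR = './'
def performFunction(dirName, fname):
    # placeholder for your own procedure
    print(os.path.join(dirName, fname))
for dirName, subdirList, fileList in os.walk(ROOT_DIR):
    for fname in fileList:
        # dirName already starts with ROOT_DIR, so the full path is os.path.join(dirName, fname)
        performFunction(dirName, fname)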
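A slightly more reusable version of the same walk passes the already-joined path to the procedure and filters by suffix. This is a sketch; apply_to_files and its parameters are illustrative names, not part of the answer above. For the FITS case in the question you would pass a function of your own that opens each path with astropy's fits.open (ideally in a with block so thousands of files are not left open) together with the suffix '.fits'.

```python
import os


def apply_to_files(root_dir, procedure, suffix=''):
    """Call procedure(full_path) for every file under root_dir whose name ends with suffix.

    Returns a list of the procedure's return values.
    """
    results = []
    for dir_name, _subdirs, file_names in os.walk(root_dir):
        for file_name in file_names:
            if file_name.endswith(suffix):
                # dir_name already includes root_dir, so this path is usable as-is.
                results.append(procedure(os.path.join(dir_name, file_name)))
    return results
```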

First Practice Project in Automate the Boring Stuff with Python, Ch. 9

So my friend and I have been having a problem with the first practice project of the above chapter of Automate the Boring Stuff with Python. The prompt goes: "Write a program that walks through a folder tree and searches for files with a certain file extension (such as .pdf or .jpg). Copy these files from whatever location they are in to a new folder."
To simplify, we are trying to write a program that copies all of the .jpg files out of My Pictures to another directory. Here's our code:
#! python3
# moveFileType looks in My Pictures and copies .jpg files to my Python folder
import os, shutil
def moveFileType(folder):
    for folderName, subfolders, filenames in os.walk(folder):
        for subfolder in subfolders:
            for filename in filenames:
                if filename.endswith('.jpg'):
                    shutil.copy(folder + filename, '<destination>')
moveFileType('<source>')
We keep getting an error along the lines of "FileNotFoundError: [Errno 2] No such file or directory".
Edit: I added a "\" to the end of my source path (I'm not sure if that is what you meant, @Jacob H) and was able to copy all of the .jpg files in that directory, but I received an error when it tried to copy a file within a subfolder of that directory. I added a for loop over subfolders and I no longer get any errors, but it doesn't actually look in the subfolders for .jpg files.
There is a more fundamental problem with your code. When you use os.walk() it will already loop through every directory for you, so looping manually through the subfolders is going to produce the same results multiple times.
The other, and more immediate, problem is that os.walk() produces bare file names, relative to the directory it is currently visiting, so you need to join each one back onto that directory with os.path.join(). Basically, you are omitting the directory name and looking in the current directory for files that os.walk() found down in a subdirectory somewhere.
Here's a quick attempt at fixing your code:
import os
import shutil
def moveFileType(folder):
    for folderName, subfolders, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith('.jpg'):
                shutil.copy(os.path.join(folderName, filename), '<destination>')
Making the function accept a destination parameter as a second argument, instead of hardcoding <destination>, would make it a lot more useful for the future.
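Following that suggestion, here is a sketch with the destination and the extension as parameters. A few behaviours are assumptions added for illustration: the extension is matched case-insensitively (so .JPG counts), the destination folder is created if missing, and the walk does not descend into the destination if it happens to sit inside the source. Files with the same name in different subfolders will still overwrite each other in the destination.

```python
import os
import shutil


def copy_files_by_extension(source, destination, extension='.jpg'):
    """Copy every file under source whose name ends with extension into destination.

    Returns the list of paths of the copies.
    """
    os.makedirs(destination, exist_ok=True)
    dest_abs = os.path.abspath(destination)
    copied = []
    for folder_name, subfolders, filenames in os.walk(source):
        # Prune the destination folder so we never re-copy our own copies.
        subfolders[:] = [
            s for s in subfolders
            if os.path.abspath(os.path.join(folder_name, s)) != dest_abs
        ]
        for filename in filenames:
            if filename.lower().endswith(extension.lower()):
                copied.append(shutil.copy(os.path.join(folder_name, filename), destination))
    return copied
```

Usage would look like copy_files_by_extension('<source>', '<destination>', '.jpg').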
Make sure to type the source folder path correctly. While I tested your code, I wrote
moveFileType('/home/anum/Pictures')
and I got this error:
IOError: [Errno 2] No such file or directory:
and when I wrote
moveFileType('/home/anum/Pictures/')
the code worked perfectly...
Try doing that; I hope it works for you. I'm using Python 2.7.
Here's the redefined code for walking into subfolders and copying .jpg files from there as well.
import os, shutil
def moveFileType(folder):
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file.endswith('.jpg'):
                image_path = os.path.join(root, file)  # get the path location of each jpeg image.
                print('location: ' + image_path)
                shutil.copy(image_path, '/home/anum/Documents/Stackoverflow questions')
moveFileType('/home/anum/Pictures/')

How to list only regular files (excluding directories) under a directory in Python

One can use os.listdir('somedir') to get all the entries under somedir. However, what I want is just the regular files (excluding directories), like the result of find . -type f in the shell.
I know one can use [path for path in os.listdir('somedir') if not os.path.isdir('somedir/'+path)] to achieve similar result as in this related question: How to list only top level directories in Python?. Just wondering if there are more succinct ways to do so.
You could use os.walk, which yields a (path, folders, files) tuple for each directory it visits; the first tuple describes the top directory itself:
files = next(os.walk('somedir'))[2]
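Another fairly succinct option, sketched with pathlib: Path.iterdir() lists the entries of one directory and Path.is_file() keeps only regular files. One difference worth knowing is that is_file() follows symbolic links, so a symlink pointing to a file is included, whereas find -type f would not list it. The helper name regular_files is illustrative.

```python
from pathlib import Path


def regular_files(directory):
    """Sorted names of regular files directly inside directory (no recursion)."""
    return sorted(p.name for p in Path(directory).iterdir() if p.is_file())
```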
I have a couple of ways of doing such tasks. I cannot comment on how succinct they are. FWIW, here they are:
1. The code below takes all files that end with .txt. You may want to remove the .endswith part.
import os
for root, dirs, files in os.walk('./'):  # current directory in terminal
    for file in files:
        if file.endswith('.txt'):
            # here you can do whatever you want with the file, for example:
            print(os.path.join(root, file))
2. This code assumes that the path is provided to the function. It appends all .txt files to a list, and if there are subdirectories in the path, it also appends the files found in those subdirectories to subfiles:
import os
def readFilesNameList(path):
    basePath = path
    allfiles = []
    subfiles = []
    for root, dirs, files in os.walk(basePath):
        for f in files:
            if f.endswith('.txt'):
                allfiles.append(os.path.join(root, f))
                if root != basePath:
                    subfiles.append(os.path.join(root, f))
    return allfiles, subfiles
I know the code is just skeletal in nature, but I think you can get the general picture.
Post if you find a more succinct way! :)
The earlier os.walk answer is perfect if you only want the files in the top-level directory. If you want subdirectories' files too, though (a la find), you need to process each directory, e.g.:
import os
def find_files(path):
    for prefix, _, files in os.walk(path):
        for name in files:
            yield os.path.join(prefix, name)
Now list(find_files('.')) is a list of the same thing find . -type f -print would have given you (the list is because find_files is a generator, in case that's not obvious).
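The recursive case can also be sketched with pathlib: Path.rglob('*') yields every entry below the starting directory, and filtering with is_file() keeps the regular files. As with the previous sketch, is_file() follows symlinks, so the result can differ slightly from find . -type f when symlinks are present. The name find_files_pathlib is illustrative.

```python
from pathlib import Path


def find_files_pathlib(path):
    """Yield every regular file under path, recursively (roughly like find PATH -type f)."""
    for p in Path(path).rglob('*'):
        if p.is_file():
            yield p
```

As with find_files, wrap the call in list() if you need a list rather than a generator.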
