Moving half of the files from one directory into another - python

I am new to Python, and I am trying to use shutil to move files from one directory to another. I understand how to do this for one file or for an entire directory, but how can I move only some of the files? For example, if I have a directory of 50 files and I only want to move half of them (25), is there a way to specify them instead of calling
shutil.move(source, destination)
25 times?

shutil.move() takes a single file or directory for an argument, so you can't move more than one at a time. However, this is what loops are for!
Basically, first generate a list of files in the directory using os.listdir(), then loop through half the list, moving each file, like so:
import os, shutil
srcPath = './oldPath/'
destPath = './newPath/'
files = os.listdir(srcPath)
for file in files[:len(files)//2]:
    shutil.move(srcPath + file, destPath + file)
You didn't mention what to do if there is an odd number of files that doesn't divide evenly, so I rounded down. To round up instead, use (len(files) + 1)//2 as the end of the slice. Simply adding 1 after the integer division would also move one extra file when the count is even.
One caveat: this code moves half of the entries in the directory, including subdirectories. If the directory contains only files, that makes no difference, but if it also contains subdirectories and you don't want to move them, you'll need to remove the subdirectories from the "files" list first (for example, by keeping only the names for which os.path.isfile() is true).
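A minimal sketch combining both caveats, assuming you want to move only regular files, in a repeatable (sorted) order, with the option of rounding up for an odd count. The function name move_half and its round_up parameter are hypothetical, chosen for illustration.

```python
import os
import shutil


def move_half(src_path, dest_path, round_up=False):
    """Move half of the regular files in src_path into dest_path.

    Subdirectories are skipped; names are sorted so the result is repeatable.
    Returns the list of file names that were moved.
    """
    files = sorted(
        name for name in os.listdir(src_path)
        if os.path.isfile(os.path.join(src_path, name))
    )
    # (n + 1) // 2 rounds up for odd n; n // 2 rounds down
    count = (len(files) + 1) // 2 if round_up else len(files) // 2
    moved = files[:count]
    for name in moved:
        shutil.move(os.path.join(src_path, name), os.path.join(dest_path, name))
    return moved
```

With 5 files, move_half("./oldPath", "./newPath") moves 2 of them, while round_up=True moves 3.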

Put the names of the files you want to move in a collection such as a list. Then, on Python 3.4 and later, you can also use pathlib's Path class to move each file:
from pathlib import Path
SRC_DIR = "/src-dir"
DST_DIR = "/dst-dir"
FILES_TO_MOVE = ["file1", "file2", "file3"]  # .. and so on
for file in FILES_TO_MOVE:
    Path(f"{SRC_DIR}/{file}").rename(f"{DST_DIR}/{file}")
https://docs.python.org/3.4/library/pathlib.html#pathlib.Path.rename
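Path.rename() behaves like os.rename(), so it can fail when the source and destination are on different filesystems, and on Windows it raises FileExistsError if the target already exists. A sketch, assuming you want pathlib-style path handling together with shutil.move(), which falls back to copying and then deleting when a plain rename is not possible. The helper name move_files is hypothetical, and the directory and file names in the usage line are placeholders.

```python
import shutil
from pathlib import Path


def move_files(src_dir, dst_dir, names):
    """Move each named file from src_dir to dst_dir and return the new paths.

    shutil.move falls back to copy-and-delete when a plain rename is not
    possible (for example, across filesystems).
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the destination if needed
    moved = []
    for name in names:
        target = dst / name
        shutil.move(str(src / name), str(target))
        moved.append(target)
    return moved
```

Usage: move_files("/src-dir", "/dst-dir", ["file1", "file2", "file3"]).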


os.listdir getting slower over different runs

import os
dir_ = "/path/to/folder/with/huge/number/of/files"
subdirs = [os.path.join(dir_, file) for file in os.listdir(dir_)]
# one of subdirs contain huge number of files
files = [os.path.join(file, f) for file in subdirs for f in os.listdir(file)]
The code ran smoothly, in under 30 seconds, the first few times, but over repeated runs of the same code the time grew to 11 minutes, and now it doesn't finish even in 11 minutes. The problem is in the third line, and I suspect os.listdir.
EDIT: I just want to collect the files so they can be passed as arguments to a multiprocessing function. RAM is not an issue: there is plenty of it, and the program uses less than a tenth.
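os.listdir(dir_) reads the whole directory and returns a list of the names of all the files and subdirectories directly inside dir_ (it does not recurse into subdirectories). Building that list can take a long time when a directory holds a very large number of entries or when the system is under heavy load. os.scandir() instead returns DirEntry objects whose is_file() check can usually be answered from the directory listing itself, without an extra system call per entry.
Instead, use os.scandir() as in the code below, or use os.walk().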
import os
dir_ = "/path/to/folder/with/huge/number/of/files"
subdirs = [os.path.join(dir_, file) for file in os.listdir(dir_)]
# Create an empty list to store the file paths
files = []
for subdir in subdirs:
    # Use os.scandir() to iterate over the files and directories in the subdirectory
    with os.scandir(subdir) as entries:
        for entry in entries:
            # Check if the entry is a regular file
            if entry.is_file():
                # Add the file path to the list
                files.append(entry.path)
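The answer also mentions os.walk(). A minimal sketch of that route, assuming you want the same two-level listing as the question (the files inside each immediate subdirectory of dir_, nothing deeper). Calling next() on os.walk() returns only the first (root, dirs, files) tuple, so the walk never recurses; os.walk() itself is built on os.scandir(). The function name list_files_in_subdirs is hypothetical.

```python
import os


def list_files_in_subdirs(dir_):
    """Return paths of the files inside each immediate subdirectory of dir_."""
    files = []
    # next(os.walk(...)) yields just the top level: (root, dirnames, filenames)
    _, subdirs, _ = next(os.walk(dir_))
    for sub in subdirs:
        sub_path = os.path.join(dir_, sub)
        # the default tuple covers a subdirectory that vanished or is unreadable
        _, _, filenames = next(os.walk(sub_path), (sub_path, [], []))
        files.extend(os.path.join(sub_path, name) for name in filenames)
    return files
```

The returned list can be handed directly to something like multiprocessing.Pool.map().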

Simple Python program that checks how many files each subfolder contains and which extensions they have

I am writing a simple Python script that looks for files in the subfolders of a selected folder and summarizes which extensions are used and how many of each.
I am not really familiar with os.walk, and I am really stuck on the "for file in files" section:
for file in files:
    total_file_count += 1
    # Get the file extension
    extension = file.split(".")[-1]
    # If the extension is not in the dictionary, add it
    if extension not in file_counts[subfolder]:
        file_counts[subfolder][extension] = 1
    # If the extension is already in the dictionary, increase the count by 1
    else:
        file_counts[subfolder][extension] += 1
I thought a for loop was the best option for summarizing the files and extensions, but it only takes the last subfolder and only outputs the files in that last folder.
Does anybody have a fix or a different approach?
FULL CODE:
import os
# Set file path using / {End with /}
root_path = "C:/Users/me/Documents/"
# Initialize variables to keep track of file counts
total_file_count = 0
file_counts = {}
# Iterate through all subfolders and files using os.walk
for root, dirs, files in os.walk(root_path):
    # Get current subfolder name
    subfolder = root.split("/")[-1]
    print(subfolder)
    # Initialize a count for each file type
    file_counts[subfolder] = {}
# Iterate through all files in the subfolder
for file in files:
    total_file_count += 1
    # Get the file extension
    extension = file.split(".")[-1]
    # If the extension is not in the dictionary, add it
    if extension not in file_counts[subfolder]:
        file_counts[subfolder][extension] = 1
    # If the extension is already in the dictionary, increase the count by 1
    else:
        file_counts[subfolder][extension] += 1
# Print total file count
print(f"There are a total of {total_file_count} files.")
# Print the file counts for each subfolder
for subfolder, counts in file_counts.items():
    print(f"In the {subfolder} subfolder:")
    for extension, count in counts.items():
        print(f"There are {count} .{extension} files")
Thank you in advance :)
If I understand correctly, you want to count the extensions in ALL subfolders of the given folder, but are only getting one folder. If that is indeed the problem, then the issue is this loop:
for root, dirs, files in os.walk(root_path):
    # Get current subfolder name
    subfolder = root.split("/")[-1]
    print(subfolder)
You are iterating through os.walk, but you keep overwriting the subfolder (and files) variables on every pass. So while the loop prints every subfolder, the code that counts extensions only sees the LAST subfolder the walk encountered, which is why the output covers only one subfolder.
Solution 1: Fix the loop
If you want to stick with os.walk, you just need to fix the loop. First things first: store the file lists in a real variable instead of relying on the temporary files variable from the loop. You actually already have one: file_counts!
Then you need some way to save the files. I see that you want to split this up by subfolder, so we can use file_counts to map each subfolder to its list of files (you are trying to do this, but are fundamentally misunderstanding some Python code; see my note below).
So now, we have a dictionary mapping each subfolder to a list of files! We would just need to iterate through this and count the extensions. The final code looks something like this:
import os

root_path = "C:/Users/me/Documents/"
total_file_count = 0
file_counts = {}
extension_counts = {}
# Iterate through all subfolders and files using os.walk
for root, dirs, files in os.walk(root_path):
    subfolder = root.split("/")[-1]
    file_counts[subfolder] = files
    extension_counts[subfolder] = {}
# Iterate through all subfolders, and then through all files
for subfolder in file_counts:
    for file in file_counts[subfolder]:
        total_file_count += 1
        extension = file.split(".")[-1]
        if extension not in extension_counts[subfolder]:
            extension_counts[subfolder][extension] = 1
        else:
            extension_counts[subfolder][extension] += 1
Solution 2: Use glob
Instead of os.walk, you can use the glob module, which returns a list of all the files and directories that match a wildcard pattern. It is a powerful tool, and you can read about it in the Python documentation for the glob module.
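A minimal sketch of the glob route, assuming you want counts keyed by each folder's full path (which avoids clashes between subfolders that share a name) and that "extension" means what os.path.splitext() returns. With recursive=True, "**" matches the root folder itself and every folder below it; note that glob skips names starting with a dot by default. The function name count_extensions is hypothetical.

```python
import glob
import os
from collections import Counter, defaultdict


def count_extensions(root_path):
    """Map each folder under root_path (inclusive) to a Counter of extensions."""
    counts = defaultdict(Counter)
    # "**" with recursive=True matches zero or more directory levels
    pattern = os.path.join(root_path, "**", "*")
    for path in glob.glob(pattern, recursive=True):
        if os.path.isfile(path):  # glob also returns directories
            # normalize so the same folder always produces the same key
            folder = os.path.normpath(os.path.dirname(path))
            extension = os.path.splitext(path)[1]  # ".txt", or "" if none
            counts[folder][extension] += 1
    return counts
```

Looping over count_extensions("C:/Users/me/Documents/").items() then gives each folder with its Counter of extensions, ready to print.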
Note
In your code, you write
# Initialize a count for each file type
file_counts[subfolder] = {}
This feels like a MATLAB coding pattern. First, subfolder is a single value (a string), not a vector, so each pass through the loop only initializes one entry (and if it were a list, you would get an unhashable type error). Second, this seems to stem from the idea that repeatedly assigning to a variable in a loop builds up a list instead of overwriting the previous value, which is not true. If you want to build a list, you need to initialize an empty list and use .append().
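A tiny illustration of the difference between overwriting and accumulating (the folder names are made up for the example):

```python
# Assigning to the same name in a loop replaces the previous value each time
for subfolder in ["docs", "photos", "music"]:
    last_seen = subfolder
print(last_seen)  # music

# To keep every value, start with an empty list and append to it
seen = []
for subfolder in ["docs", "photos", "music"]:
    seen.append(subfolder)
print(seen)  # ['docs', 'photos', 'music']
```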
Note 2: Electric Boogaloo
There are two big ways to improve this code; here are some hints:
Look into default dictionaries (collections.defaultdict). They will make your code less redundant.
Do you REALLY need to save the file names and THEN count them? What if you counted directly?
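Following both hints, a sketch that counts directly while walking, using collections.defaultdict with collections.Counter so no "is this key already there?" checks are needed. It assumes you key results by each folder's full path rather than its last component (two subfolders with the same name would otherwise be merged), and it uses os.path.splitext(), which returns an empty string for a file without an extension where file.split(".")[-1] would return the whole name. The function name summarize_extensions is hypothetical.

```python
import os
from collections import Counter, defaultdict


def summarize_extensions(root_path):
    """Return (total_file_count, {folder_path: Counter of extensions})."""
    counts = defaultdict(Counter)
    total_file_count = 0
    for root, dirs, files in os.walk(root_path):
        for name in files:
            # count immediately instead of storing file names first
            counts[root][os.path.splitext(name)[1]] += 1
            total_file_count += 1
    return total_file_count, counts
```

Folders that contain no files simply do not appear in the result.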
Rather than using os.walk, you could use the rglob and glob methods of a Path object. For example:
from collections import Counter
from pathlib import Path

root_path = "C:/Users/me/Documents/"
root = Path(root_path)
# get a list of all the directories within root (recursively, including nested subdirectories)
dirs = [d for d in root.rglob("*") if d.is_dir()]
dirs.append(root)  # append the root directory itself
# loop through all directories
for curdir in dirs:
    # count the suffixes (i.e., extensions) of the files directly in this directory;
    # files without an extension are counted under ""
    suffix_counts = Counter(f.suffix for f in curdir.glob("*") if f.is_file())
    print(f"In {curdir}:")
    # loop through the suffixes and their counts
    for suffix, count in suffix_counts.items():
        print(f"There are {count} {suffix} files")

Copy random files, 5 at a time, into different folders

I have a big list of files that I was able to put in random order (using a custom "random number" column). (For some reason, I even saved the list of these files to a .txt file.)
But now I need to put them into... let's see... 740 files divided by 5...
148 new folders. OK, I can create the 148 new folders with an extDir utility, but how can I copy each group of 5 files into its own one of those 148 folders?
So files 1-5 go to dir1,
files 6-10 go to dir2,
files 11-15 go to dir3,
etc.
Yes, I tried to do it manually, but got lost. I also need to repeat the operation with different files about ten times. I tried to use Python for this, but I am a beginner programmer.
All I have is the text file listing all the files in the folder, and now I need to split it into "modules" of 5 files and copy each module into its own folder, in ascending order.
Assuming the files are stored in the directory files, the following code iterates over all files in the directory and moves them to dirX, incrementing X every five files.
import os
import shutil
filecounter = 0
dircounter = 0
directory = "files"
for file in os.listdir(directory):
    absoluteFilename = os.path.join(directory, file)
    if filecounter % 5 == 0:  # increment dir counter every five files processed
        dircounter += 1
        os.mkdir(os.path.join(directory, "dir"+str(dircounter)))
    targetfile = os.path.join(directory, "dir"+str(dircounter), file)  # builds the target filename
    shutil.move(absoluteFilename, targetfile)
    filecounter += 1
This uses the modulo operator (%) to increment dircounter every five files.
Note that the order of the files is arbitrary (see os.listdir). You might have to sort the list beforehand.
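Since the question starts from a text file that already lists the files in the desired (random) order, here is a sketch that reads that list and copies, rather than moves, each group of five into dir1, dir2, and so on. It assumes one file name per line, relative to src_dir; copy_in_groups and its parameters are hypothetical names. shutil.copy2() copies the file contents and also tries to preserve metadata such as modification times.

```python
import os
import shutil


def copy_in_groups(list_file, src_dir, dst_root, group_size=5):
    """Copy the files named in list_file into dst_root/dir1, dst_root/dir2, ...

    Each folder receives group_size files, in the order they appear in
    list_file. Returns the number of folders used.
    """
    with open(list_file, encoding="utf-8") as fh:
        names = [line.strip() for line in fh if line.strip()]  # skip blank lines
    for start in range(0, len(names), group_size):
        # list positions 0-4 -> dir1, 5-9 -> dir2, and so on
        group_dir = os.path.join(dst_root, f"dir{start // group_size + 1}")
        os.makedirs(group_dir, exist_ok=True)
        for name in names[start:start + group_size]:
            shutil.copy2(os.path.join(src_dir, name), group_dir)
    # ceiling division: 740 files in groups of 5 -> 148 folders
    return (len(names) + group_size - 1) // group_size
```

Because the order comes from the list file, the arbitrary ordering of os.listdir() no longer matters, and rerunning with a different list file handles the repeated batches.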

Is there a way to change your cwd in Python using a file as an input?

I have a Python program that calculates the number of files in different directories. Is it possible to use a text file containing a list of directory locations to change the cwd within my program?
Input: a text file listing different folder locations, each of which contains various files.
My program returns the total number of files in a given folder and writes that number to a count text file located in each folder the program is run on.
You can use the os module in Python:
import os
# dirs will store the list of directories, populated from your text file
# (your_text_file stands for the path to that file)
dirs = []
with open(your_text_file, "r") as text_file:
    for line in text_file:
        line = line.strip()  # drop the trailing newline
        if line:
            # resolve now, so relative paths don't depend on earlier chdir() calls
            dirs.append(os.path.abspath(line))
# Now simply loop over the dirs list
for directory in dirs:
    # Change directory
    os.chdir(directory)
    # Print cwd
    print(os.getcwd())
    # Print number of files in cwd
    print(len([name for name in os.listdir(".")
               if os.path.isfile(name)]))
Yes.
import os

start_dir = os.getcwd()
indexfile = open(dir_index_file, "r")
for line in indexfile.readlines():
    targetdir = line.strip()  # drop the trailing newline that readlines() keeps
    os.chdir(targetdir)
    # Do your stuff here
    os.chdir(start_dir)
indexfile.close()
Do bear in mind that if your code fails halfway through (for example in an interactive session, or in a larger program that catches the exception and carries on), the process is left in a different working directory from the one it started in. That is confusing for users and can occasionally be dangerous, especially if they don't notice it has happened and start deleting files they expect to be there: they might delete the wrong file. (The working directory belongs to the Python process, so the shell you launched the script from is not affected.) You might want to consider whether there's a way to achieve what you want without changing the working directory.
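One way to limit that risk, sketched here as an assumption rather than as part of the original answer, is to wrap the directory change in a context manager that always restores the previous directory, even if an exception is raised. On Python 3.11 and later the standard library provides contextlib.chdir() for this; the hand-written working_directory below (a hypothetical name) does the same on older versions.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # runs whether the block finished normally or raised an exception
        os.chdir(previous)
```

Inside the loop over the index file, the per-directory work then goes in an indented block under with working_directory(targetdir):, and no separate os.chdir(start_dir) call is needed.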
EDIT:
And to illustrate the latter: rather than changing directory, use os.listdir() to get the files in the directory of interest:
import os

with open(dir_index_file, "r") as indexfile:
    for line in indexfile:
        targetdir = line.strip()  # drop the trailing newline
        contents = os.listdir(targetdir)
        numfiles = len(contents)
        with open(os.path.join(targetdir, "count.txt"), "w") as countfile:
            countfile.write(str(numfiles))
Note that this counts both files and directories, not just files. If you only want files, you'll have to go through the list returned by os.listdir(), checking whether each item is a file with os.path.isfile(os.path.join(targetdir, item)).
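A sketch of that files-only variant, assuming you also want to leave out the count.txt written by an earlier run (otherwise a second run would include it in the total). count_files_in is a hypothetical helper name.

```python
import os


def count_files_in(targetdir, count_name="count.txt"):
    """Count regular files in targetdir (ignoring count_name) and write the total."""
    numfiles = sum(
        1
        for name in os.listdir(targetdir)
        if name != count_name and os.path.isfile(os.path.join(targetdir, name))
    )
    with open(os.path.join(targetdir, count_name), "w") as countfile:
        countfile.write(str(numfiles))
    return numfiles
```

Calling count_files_in(line.strip()) for each line of the index file replaces the body of the loop above.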

Finding duplicate folders and renaming them by prefixing parent folder name in python

I have a folder structure with several top-level folders (DIR1, DIR2, DIR3), each containing subfolders.
Several subfolders have duplicate names. Whenever a duplicate subfolder name is encountered, I want it to be prefixed with its parent folder's name.
For example:
DIR2>SUBDIR1 should be renamed to DIR2>DIR2_SUBDIR1. When the folder is renamed to DIR2_SUBDIR1, the files inside it should also get the same prefix as their parent folder.
Likewise, DIR2>SUBDIR1>subdirtst2.txt should become DIR2>DIR2_SUBDIR1>DIR2_subdirtst2.txt.
What have I done so far?
I have simply added all the folder names to a list; after that, I can't figure out an elegant way to do this.
import os
list_dir=[]
for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith(".txt"):
            path_file = os.path.join(root)
            print(path_file)
            list_dir.append(path_file)
The following snippet should achieve what you want. I've written it to show clearly what is being done, so there are probably tweaks that would make it more efficient or elegant.
import os
cwd = os.getcwd()
# None means "not initialized yet", so an empty intersection stays empty
to_be_renamed = None
for rootdir in next(os.walk(cwd))[1]:
    subdir_names = set(next(os.walk(os.path.join(cwd, rootdir)))[1])
    if to_be_renamed is None:
        to_be_renamed = subdir_names
    else:
        to_be_renamed &= subdir_names
to_be_renamed = to_be_renamed or set()  # no top-level folders -> nothing to rename
for rootdir in next(os.walk(cwd))[1]:
    subdirs = next(os.walk(os.path.join(cwd, rootdir)))[1]
    for s in subdirs:
        if s in to_be_renamed:
            srcpath = os.path.join(cwd, rootdir, s)
            dstpath = os.path.join(cwd, rootdir, rootdir+'_'+s)
            # First rename files
            for f in next(os.walk(srcpath))[2]:
                os.rename(os.path.join(srcpath, f), os.path.join(srcpath, rootdir+'_'+f))
            # Now rename dir
            os.rename(srcpath, dstpath)
            print('Renamed', s, 'and files')
Here, cwd stores the path to the directory that contains DIR1, DIR2 and DIR3. The first loop checks all immediate subdirectories of these 'root directories' and builds the set of subdirectory names they have in common by repeatedly taking the intersection (&) of their name sets.
Then it runs another loop, checks if the subdirectory is to be renamed and finally uses the os.rename function to rename it and all the files it contains.
At each step, os.walk() yields a 3-tuple with the path to the directory, the directories in it, and the files in it. It 'walks' the tree either top-down or bottom-up, and doesn't stop after one iteration.
So the built-in next() function is used to get the first result (that of the given directory), after which [1] or [2] selects its directories or files, respectively.
If you want to rename not just files, but all items in the subdirectories being renamed, then replace next(os.walk(srcpath))[2] with os.listdir(srcpath). This list contains both files and directories.
NOTE: The reason I'm computing the list of duplicated names first in a separate loop is so that the first occurrence is not left unchanged. Renaming in the same loop will miss that first one.
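Note that the intersection keeps only names that appear in every top-level folder. If "duplicate" should instead mean a name shared by at least two of them (for example, present in DIR1 and DIR2 but not in DIR3), one option, sketched under that assumption, is to count occurrences with collections.Counter and use the resulting set as to_be_renamed in the second loop. find_duplicate_subdirs is a hypothetical name.

```python
import os
from collections import Counter


def find_duplicate_subdirs(cwd):
    """Return subdirectory names found under more than one top-level folder of cwd."""
    name_counts = Counter()
    _, rootdirs, _ = next(os.walk(cwd))
    for rootdir in rootdirs:
        # immediate subdirectories of this top-level folder
        _, subdirs, _ = next(os.walk(os.path.join(cwd, rootdir)), (None, [], []))
        name_counts.update(set(subdirs))
    return {name for name, count in name_counts.items() if count > 1}
```

With DIR1 containing A and B, DIR2 containing A and C, and DIR3 containing only D, the intersection is empty, while this version returns {"A"}.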
