Shutil multiple files after reading with "pydicom"

What I basically want is for myvar to vary between 1-280 so that I can use it to read the files with pydicom, i.e. I want to read the files named 0_tfl3d1.IMA through 280_tfl3d1.IMA in /data/lfs2/model-mie/inputDataTest/subj2/mp2rage/. Then, if the gender is M, I want to copy them into a folder with shutil. It doesn't seem to be working with count.
Thanks for the help!
from pydicom import dicomio
myvar = str(count(0))
import shutil
file = "/data/lfs2/model-mie/inputDataTest/subj2/mp2rage/" + myvar + "_tfl3d1.IMA"
ds = dicomio.read_file(file)
gender = ds.PatientSex
print(gender)
if gender == "M":
    shutil.copy(file, "/mnt/nethomes/s4232182/Desktop/New")

I think the range() function should do what you want, something like this:
import shutil
from pydicom import dicomio
for i in range(281):
    filename = "/data/lfs2/model-mie/inputDataTest/subj2/mp2rage/" + str(i) + "_tfl3d1.IMA"
    ds = dicomio.read_file(filename)
    if ds.get('PatientSex') == "M":
        shutil.copy(filename, "/mnt/nethomes/s4232182/Desktop/New")
I've also used ds.get() to avoid problems if the dataset does not contain a PatientSex data element.
In one place in your question, the numbering is 1-280, in another it is 0-280. If the former, then use range(1, 281) instead.
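For newer pydicom releases, a minimal sketch of the same loop might use pydicom.dcmread, the documented reading function in current pydicom (read_file is its older, deprecated name). It skips numbers whose file is missing instead of crashing, and reads only the header. The helper names read_patient_sex and copy_by_sex are hypothetical; the directories are the ones from the question.

```python
import shutil
from pathlib import Path

# Directories taken from the question
SRC_DIR = Path("/data/lfs2/model-mie/inputDataTest/subj2/mp2rage")
DEST_DIR = Path("/mnt/nethomes/s4232182/Desktop/New")


def read_patient_sex(path):
    """Return the PatientSex value of a DICOM file, or None if the element is missing."""
    import pydicom  # imported here so the rest of the module loads without pydicom

    # stop_before_pixels=True skips the pixel data; only the header is needed
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return ds.get("PatientSex")


def copy_by_sex(src_dir, dest_dir, numbers, sex="M", reader=read_patient_sex):
    """Copy <n>_tfl3d1.IMA for each n in `numbers` whose PatientSex equals `sex`."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for n in numbers:
        path = Path(src_dir) / f"{n}_tfl3d1.IMA"
        if not path.is_file():
            continue  # tolerate gaps in the numbering
        if reader(path) == sex:
            shutil.copy(path, dest_dir)
            copied.append(path.name)
    return copied
```

With the question's paths this would be called as copy_by_sex(SRC_DIR, DEST_DIR, range(1, 281)), or range(281) if the files start at 0. The reader parameter exists only so the filtering can be exercised without real DICOM files.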

Related

Python grab substring between two specific characters

I have a folder with hundreds of files named like:
"2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
Convention:
year_month_ID_zone_date_0_L2A_B01.tif ("_0_L2A_B01.tif", and "zone" never change)
What I need is to iterate through every file and build a path based on their name in order to download them.
For example:
name = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
path = "2017/5/S2B_7VEG_20170528_0_L2A/B01.tif"
The path convention needs to be: path = year/month/ID_zone_date_0_L2A/B01.tif
I thought of making a loop which would "cut" my string into several parts every time it encounters a "_" character, then stitch the different parts in the right order to create my path name.
I tried this but it didn't work:
import re
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
try:
    found = re.search('_(.+?)_', filename).group(1)
except AttributeError:
    # _ not found in the original string
    found = ''  # apply your error handling
How could I achieve that in Python?

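Since you only have one separator character, you may as well simply use Python's built-in str.split method:
import os
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
items = filename.split('_')
year, month = items[:2]
directory = '_'.join(items[2:-1])  # S2B_7VEG_20170528_0_L2A
path = os.path.join(year, month, directory, items[-1])
# on Linux/macOS: 2017/05/S2B_7VEG_20170528_0_L2A/B01.tif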
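If the month should lose its leading zero, as in the question's example (05 becomes 5), splitting with a maxsplit argument keeps the middle part intact without re-joining it. A minimal sketch; remote_path and remote_paths are hypothetical names, and forward slashes are used on the assumption that the path is meant for a download URL rather than a local Windows path.

```python
from pathlib import Path, PurePosixPath


def remote_path(name):
    """Map 'year_month_ID_zone_date_0_L2A_B01.tif' to 'year/month/ID_zone_date_0_L2A/B01.tif'."""
    head, band = name.rsplit("_", 1)        # band is everything after the last '_'
    year, month, rest = head.split("_", 2)  # rest keeps its own underscores
    # int() drops the leading zero so '05' becomes '5', as in the question's example
    return str(PurePosixPath(year, str(int(month)), rest, band))


def remote_paths(folder):
    """Return {file name: remote path} for every .tif file in `folder`."""
    return {p.name: remote_path(p.name) for p in sorted(Path(folder).glob("*.tif"))}
```

Drop the int() call if the month should stay zero-padded.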

Try the following snippet:
import re
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
found = re.sub(r'(\d+)_(\d+)_(.*)_(.*)\.tif', r'\1/\2/\3/\4.tif', filename)
print(found)  # prints 2017/05/S2B_7VEG_20170528_0_L2A/B01.tif

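No need for a regex; you can just use split().
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
parts = filename.split("_")
year = parts[0]
month = parts[1]
directory = "_".join(parts[2:-1])  # everything between the month and the band
path = "/".join([year, month, directory, parts[-1]])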

Maybe you can do it like this:
from os import listdir, makedirs
from os.path import isfile, join
my_path = 'your_source_dir'
files_name = [f for f in listdir(my_path) if isfile(join(my_path, f))]
def create_dir(files_name):
    for file in files_name:
        # '2017_05_S2B_...' -> year '2017', month '05'
        year, month = file.split('_', 2)[:2]
        # create my_path/year/month (and missing parents); no error if it exists
        makedirs(join(my_path, year, month), exist_ok=True)
        ### your download code

filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
temp = filename.split('_')
result = "/".join(temp)
print(result)
The result is:
2017/05/S2B/7VEG/20170528/0/L2A/B01.tif

how to read file name with two variable parts

There is a folder with multiple Excel files in it. They are named in a systematic way. Like below:
a_b_12_2021043036548.xlsx
The a_b_12_ part is fixed
20210430 changes every day
36548 also change every day, and there is no rule for it, other than that it's always five digits
I have to read this Excel file every day from another script, and save it as a dataframe. How can I do this?
I tried the following lines but failed
datetime_format = datetime.datetime(2021, 4, 30) # I just want to change the date here to read the related excel
x = datetime_format.strftime("%Y%m%d")
file1 = r'C:/report/FG/a_b_12_' + x + * + '.xlsx' #failed
file1 = r'C:/report/FG/a_b_12_' + x + r'[\d\]+' + '.xlsx' #failed

Use glob.glob:
from glob import glob
pattern = rf'C:/report/FG/a_b_12_{x}*.xlsx'
matched_files = glob(pattern)
If there is indeed exactly one such file, as you assume, matched_files[0] will be it.

import datetime
datetime_format = datetime.datetime(2021, 4, 30)
x = datetime_format.strftime("%Y%m%d")
dailyrunningnumber = 36548
file1 = 'C:/report/FG/a_b_12_{}.xlsx'.format(x)
file1 = 'C:/report/FG/a_b_12_{}{}.xlsx'.format(x, str(dailyrunningnumber))
Edited:
import glob
# list the paths of all matching .xlsx files
f_glob = r'C:/report/FG/a_b_12_*.xlsx'
f_names = glob.glob(f_glob)
print(f_names)

Assuming the second part is always a date, and that you know the date, just add a glob wildcard after it.
In Python, * in isolation is multiplication; the glob in your code needs to be a quoted string, like "*" or '*', just like you quote the rest of the file name.
import datetime
import glob
datetime_format = datetime.datetime(2021, 4, 30)
x = datetime_format.strftime("%Y%m%d")
matching_files = glob.glob('C:/report/FG/a_b_12_' + x + '*.xlsx')
for file1 in matching_files:
    ...  # process file1 here
If you are certain that the glob will always match a single file, of course, just use file1 = matching_files[0]
The temporary variables are nice for legibility, but not really necessary, so this can be refactored to:
datetime_format = datetime.datetime(2021, 4, 30)
for file1 in glob.glob('C:/report/FG/a_b_12_' + datetime_format.strftime("%Y%m%d") + '*.xlsx'):
    ...  # process file1 here
You need an r'...' string when your string contains literal backslashes, but since you are using forward slash for the Windows directory separator (which is altogether more convenient anyway) I removed the r.
If you really want a regex solution, try:
import re
import os
pattern = re.compile(r'^a_b_12_(\d{8})(\d{5})\.xlsx$')
for file in os.scandir('C:/report/FG/'):
    matched = pattern.match(file.name)
    if matched:
        file1 = file.name
        date = matched.group(1)
        suffix = matched.group(2)
        break  # or whatever
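Putting the glob and regex ideas together: a minimal sketch that takes the date you care about, narrows the candidates with a glob, then uses a regex to insist on exactly five trailing digits, and complains if it does not find exactly one file. find_daily_report is a hypothetical name; the folder and prefix are the ones from the question.

```python
import re
from pathlib import Path


def find_daily_report(folder, day, prefix="a_b_12_"):
    """Return the single file named <prefix><YYYYMMDD><5 digits>.xlsx for `day`."""
    stamp = day.strftime("%Y%m%d")
    # the glob narrows the candidates; the regex then requires exactly five digits
    exact = re.compile(re.escape(prefix + stamp) + r"\d{5}\.xlsx")
    matches = [p for p in Path(folder).glob(prefix + stamp + "*.xlsx")
               if exact.fullmatch(p.name)]
    if len(matches) != 1:
        raise LookupError(
            "expected one report for {}, found {}".format(stamp, len(matches)))
    return matches[0]
```

To get the dataframe, something like pandas.read_excel(find_daily_report("C:/report/FG", datetime.date(2021, 4, 30))) should work, provided an Excel engine such as openpyxl is installed.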

moving files to folders after they got read and separated by a variable

I have created a Python script that reads the FILTER keyword from the files in a folder and prints the result. There are 4 possible values, and I would like the script to sort the files into the corresponding folders: for example, move every file with FILTER = V to a folder named "V", every file with FILTER = B to a folder named "B", and so on. The script below shows on screen which files have which filter.
import glob
import pyfits
import shutil
myList = []
for fitsName in glob.glob('*.fits'):
    hdulist = pyfits.open(fitsName)
    b = hdulist[0].header['FILTER']
    c = b
    myList.append(c)
    hdulist.close()
for item in sorted(myList):
    print item
Result on screen:
B
B
B
V
V
V
R
R
R
I
I
I
Now, with shutil, this is the code I run:
import os
import glob
import pyfits
import shutil
myList = []
for fitsName in glob.glob('*.fits'):
    hdulist = pyfits.open(fitsName)
    hdu = hdulist[0]
    prihdr = hdulist[0].header
    a = hdulist[0].header['FILTER']
    b = a
    if b == "B":
        shutil.move('/home/usr/Desktop/old/', '/home/usr/Desktop/new/B/')
    myList.append(b)
    hdulist.close()
This code runs without errors, but it moves all the files in Desktop/old/ to Desktop/new/B/, even though some files have b = V or other values. What is the problem here? How can I specify which files have the filter I want, so that they are moved automatically?
In other words, based on the code above: if c == FILTERNAME1, move the file to SOMEFOLDER1; if c == FILTERNAME2, move it to SOMEFOLDER2; and so on. I could not write working code for this, so any help would be much appreciated.
Solution:
import glob
import shutil
import pyfits
for fitsName in glob.glob('*.fits'):
    hdulist = pyfits.open(fitsName)
    a = hdulist[0].header['FILTER']
    hdulist.close()  # close the file before moving it
    # move the file itself (fitsName), not the whole source directory
    if a == "B":
        shutil.move(fitsName, '/home/usr/Desktop/new/B/')
    elif a == "V":
        shutil.move(fitsName, '/home/usr/Desktop/new/V/')
    elif a == "R":
        shutil.move(fitsName, '/home/usr/Desktop/new/R/')
    elif a == "I":
        shutil.move(fitsName, '/home/usr/Desktop/new/I/')

You can use the shutil module to move files.
shutil.move(source,destination)
Define the source file and the destination files as strings, then pass them to shutil.move() like so:
import shutil
if c == "A":
    shutil.move(source, destA)
elif c == "B":
    shutil.move(source, destB)
I would also recommend that you learn how if statements work. Here are some resources: https://www.tutorialspoint.com/python/python_if_else.htm, https://www.w3schools.com/python/python_conditions.asp, https://docs.python.org/3/tutorial/controlflow.html
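Rather than one if per filter, the destination folder name can be taken from the header value itself, which also copes with filters you did not anticipate. A minimal sketch, assuming Astropy is installed: PyFITS has since been folded into Astropy as astropy.io.fits, and fits.getheader reads the primary header. sort_by_filter and read_filter are hypothetical names; the reader parameter only exists so the moving logic can be tested without real FITS files.

```python
import shutil
from pathlib import Path


def read_filter(path):
    """Return the FILTER keyword from the primary header of a FITS file."""
    from astropy.io import fits  # imported lazily so the module loads without Astropy

    return fits.getheader(path)["FILTER"]


def sort_by_filter(src_dir, dest_root, reader=read_filter):
    """Move every *.fits file in src_dir into dest_root/<FILTER>/ and return the moves."""
    moved = {}
    # sorted() lists all files first, so moving them inside the loop is safe
    for path in sorted(Path(src_dir).glob("*.fits")):
        value = str(reader(path)).strip()  # e.g. "B", "V", "R" or "I"
        target_dir = Path(dest_root) / value
        target_dir.mkdir(parents=True, exist_ok=True)  # B/, V/, ... need not pre-exist
        shutil.move(str(path), str(target_dir / path.name))
        moved[path.name] = value
    return moved
```

For the question's layout this would be sort_by_filter('/home/usr/Desktop/old', '/home/usr/Desktop/new').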

How to change names of a list of numpy files?

I have a list of NumPy files and I need to rename them. Let's assume I have these files:
AES_Trace=1_key=hexaNumber_Plaintext=hexaNumber_Ciphertext=hexaNumber.npy
AES_Trace=2_key=hexaNumber_Plaintext=hexaNumber_Ciphertext=hexaNumber.npy
AES_Trace=3_key=hexaNumber_Plaintext=hexaNumber_Ciphertext=hexaNumber.npy
What I need to change is the trace number, so that I end up with:
AES_Trace=100001_key=hexaNumber_Plaintext=hexaNumber_Ciphertext=hexaNumber.npy
AES_Trace=100002_key=hexaNumber_Plaintext=hexaNumber_Ciphertext=hexaNumber.npy
AES_Trace=100003_key=hexaNumber_Plaintext=hexaNumber_Ciphertext=hexaNumber.npy
I have tried:
import os
import numpy as np
import struct
path_For_Numpy_Files='C:\\Users\\user\\My_Test_Traces\\1000_Traces_npy'
os.chdir(path_For_Numpy_Files)
list_files_Without_Sort=os.listdir(os.getcwd())
list_files_Sorted=sorted((list_files_Without_Sort),key=os.path.getmtime)
for file in list_files_Sorted:
    print (file)
    os.rename(file,file[11]+100000)
I don't think this is a good solution. First of all, it doesn't work: it gives me this error:
os.rename(file,file[11]+100000)
IndexError: string index out of range

Your file variable is a str, so you can't add an int like 100000 to it. Build the new name as a string instead, splitting once at the first =:
>>> file = 'Tracenumber=01_Pltx5=23.npy'
>>> '{}=1000{}'.format(*file.split('=', 1))
'Tracenumber=100001_Pltx5=23.npy'
So, you can rather use
os.rename(file, '{}=1000{}'.format(*file.split('=', 1)))

I'm sure that you can do this in one line, or with regex but I think that clarity is more valuable. Try this:
import os
path = 'C:\\Users\\user\\My_Test_Traces\\1000_Traces_npy'
file_names = os.listdir(path)
for file in file_names:
    start = file[0:file.index("Trace=")+6]
    end = file[file.index("_key"):]
    num = file[len(start): file.index(end)]
    new_name = start + str(100000+int(num)) + end
    os.rename(os.path.join(path, file), os.path.join(path, new_name))
This will work with numbers >9, which the other answer will stick extra zeros onto.
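The same arithmetic can be done with a single regex substitution, which avoids slicing at hard-coded offsets and leaves files without a Trace= field alone. A minimal sketch; renumber_traces and its dry_run flag are hypothetical names, and the 100000 offset is the one from the question.

```python
import re
from pathlib import Path


def renumber_traces(folder, offset=100000, dry_run=False):
    """Rename 'AES_Trace=<n>_...npy' to 'AES_Trace=<n + offset>_...npy'; return {old: new}."""
    renames = {}
    # sorted() builds the full list first, so renaming inside the loop is safe
    for path in sorted(Path(folder).glob("*.npy")):
        new_name = re.sub(r"Trace=(\d+)",
                          lambda m: "Trace={}".format(int(m.group(1)) + offset),
                          path.name, count=1)
        if new_name == path.name:
            continue  # no 'Trace=<digits>' field: leave the file alone
        renames[path.name] = new_name
        if not dry_run:
            path.rename(path.with_name(new_name))
    return renames
```

Running it twice adds the offset twice, so it may be worth checking the mapping returned by renumber_traces(path, dry_run=True) before renaming for real.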

Python - Error when opening two files [duplicate]

I'm creating a program that creates a file and saves it to the directory with the filename sample.xml. Once the file is saved, running the program again overwrites the old file with the new one, because they have the same file name. How do I increment the file name so that every time I run the code it creates a new file instead of overwriting the existing one? I am thinking of first checking the filenames in the directory and, if the name is already taken, generating a new filename:
fh = open("sample.xml", "w")
rs = [blockresult]
fh.writelines(rs)
fh.close()

I would iterate through sample[int].xml for example and grab the next available name that is not used by a file or directory.
import os
i = 0
while os.path.exists("sample%s.xml" % i):
    i += 1
fh = open("sample%s.xml" % i, "w")
....
That should give you sample0.xml initially, then sample1.xml, etc.
Note that relative paths are resolved against the current working directory, i.e. the directory you run the code from, which is not necessarily the directory containing the script. Use absolute paths if necessary. Use os.getcwd() to read your current directory and os.chdir(path_to_dir) to set a new one.

Sequentially checking each file name to find the next available one works fine with small numbers of files, but quickly becomes slower as the number of files increases.
Here is a version that finds the next available file name in log(n) time:
import os
def next_path(path_pattern):
    """
    Finds the next free path in a sequentially named list of files
    e.g. path_pattern = 'file-%s.txt':
    file-1.txt
    file-2.txt
    file-3.txt
    Runs in log(n) time where n is the number of existing files in sequence
    """
    i = 1
    # First do an exponential search
    while os.path.exists(path_pattern % i):
        i = i * 2
    # Result lies somewhere in the interval (i/2..i]
    # We call this interval (a..b] and narrow it down until a + 1 = b
    a, b = (i // 2, i)
    while a + 1 < b:
        c = (a + b) // 2  # interval midpoint
        a, b = (c, b) if os.path.exists(path_pattern % c) else (a, c)
    return path_pattern % b
To measure the speed improvement I wrote a small test function that creates 10,000 files:
for i in range(1,10000):
    with open(next_path('file-%s.foo'), 'w'):
        pass
And implemented the naive approach:
def next_path_naive(path_pattern):
    """
    Naive (slow) version of next_path
    """
    i = 1
    while os.path.exists(path_pattern % i):
        i += 1
    return path_pattern % i
And here are the results:
Fast version:
real 0m2.132s
user 0m0.773s
sys 0m1.312s
Naive version:
real 2m36.480s
user 1m12.671s
sys 1m22.425s
Finally, note that either approach is susceptible to race conditions if multiple actors are trying to create files in the sequence at the same time.
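Both versions check for a name and only then create the file, so two processes can pick the same one. One way to close that gap, assuming a local file system, is Python's exclusive-creation mode "x": open() then fails with FileExistsError if the file already exists, so the existence check and the creation happen in one step. A minimal sketch; create_next_file is a hypothetical name.

```python
def create_next_file(pattern="sample%d.xml", start=0, limit=100000):
    """Create and open the first free file named pattern % i for i = start, start + 1, ...

    Returns (name, open file handle). Mode "x" refuses to open an existing file,
    so a name that another process created first is simply skipped.
    """
    for i in range(start, start + limit):
        name = pattern % i
        try:
            fh = open(name, "x")
        except FileExistsError:
            continue  # taken (possibly just now by another process): try the next one
        return name, fh
    raise FileExistsError("no free name among %d candidates" % limit)
```

Call it as name, fh = create_next_file("sample%d.xml") and write through fh inside a with block. As written it scans linearly like the naive version; an index from next_path could serve as the start value.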

import os
def get_nonexistant_path(fname_path):
    """
    Get the path to a filename which does not exist by incrementing path.
    Examples
    --------
    >>> get_nonexistant_path('/etc/issue')
    '/etc/issue-1'
    >>> get_nonexistant_path('whatever/1337bla.py')
    'whatever/1337bla.py'
    """
    if not os.path.exists(fname_path):
        return fname_path
    filename, file_extension = os.path.splitext(fname_path)
    i = 1
    new_fname = "{}-{}{}".format(filename, i, file_extension)
    while os.path.exists(new_fname):
        i += 1
        new_fname = "{}-{}{}".format(filename, i, file_extension)
    return new_fname
Before you open the file, call
fname = get_nonexistant_path("sample.xml")
This will give you either 'sample.xml' or, if that already exists, 'sample-i.xml', where i is the lowest positive integer such that the file does not already exist.
I recommend using os.path.abspath("sample.xml"). If your path uses ~ for the home directory, expand it first with os.path.expanduser().
Please note that race conditions might occur with this simple code if you have multiple instances running at the same time. If this might be a problem, please check this question.

Try setting a count variable and incrementing it inside the same loop in which you write your file. Include the counter in the file name with a format specifier, so that every iteration ticks +1 and so does the number in the file name.
Some code from a project I just finished:
numberLoops = 10  # some limit determined by the user
currentLoop = 1
while currentLoop < numberLoops:
    currentLoop = currentLoop + 1
    fileName = "log%d_%d.txt" % (currentLoop, now())
For reference:
from time import mktime, gmtime
def now():
    return mktime(gmtime())
This is probably irrelevant in your case, but I was running multiple instances of this program and making tons of files. Hope this helps!

There are two ways to do it:
Check whether the old file exists and, if it does, try the next file name (+1)
Save state data somewhere
An easy way to do it off the bat would be:
import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(pth.abspath(filename+str(filenum)+".py")):
    filenum+=1
my_next_file = open(filename+str(filenum)+".py",'w')
As a design choice, I avoid while True with a break here: a loop whose condition states when to stop is easier to read.
Edit (EOL's contributions/thoughts):
I think code without .format is more readable at first glance, but .format is better for generality and convention, so:
import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(pth.abspath(filename+str(filenum)+".py")):
    filenum+=1
my_next_file = open("{}{}.py".format(filename, filenum),'w')
# or
my_next_file = open(filename + "{}.py".format(filenum),'w')
You don't have to use abspath; relative paths work too. I sometimes prefer absolute paths because they help normalize the paths passed in :).
import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(filename+str(filenum)+".py"):
    filenum+=1
##removed for conciseness

Another solution, which avoids a while loop, is os.listdir(), which returns a list of all the files and directories in the directory whose path it is given.
To answer the example in the question: supposing the directory you are working in contains only "sample_i.xml" files indexed from 0 with no gaps, you can obtain the next index for the new file with the following code.
import os
new_index = len(os.listdir('path_to_dir_containing_only_sample_i_files'))
new_file = open('path_to_dir_containing_only_sample_i_files/sample_%s.xml' % new_index, 'w')

You can use a while loop with a counter that checks whether a file with the given name plus the counter's value exists; if it does, move on to the next number, otherwise stop and create the file.
I have done it this way for one of my projects:
from os import path
i = 0
flnm = "Directory\\Filename" + str(i) + ".txt"
while path.exists(flnm):
    i += 1
    flnm = "Directory\\Filename" + str(i) + ".txt"
f = open(flnm, "w")  # do what you want with that file...
f.write(str(var))
f.close()  # make sure to close it
Here the counter i starts from 0 and the while loop checks each time whether the file exists; if it does, it moves on to the next number, otherwise it exits the loop and the file is created, which you can then customize. Also make sure to close the file; otherwise it stays open, which can cause problems when you try to delete it.
I used path.exists() to check whether a file exists.
Don't use from os import *: it replaces the built-in open() with os.open(), which expects integer flags rather than a mode string such as "w", so the call fails with a TypeError such as: TypeError: Integer expected. (got str)
Happy New Year to all!

Without storing state data in an extra file, a quicker solution than the ones presented here would be to do the following:
from glob import glob
import os
files = glob("somedir/sample*.xml")
# take the highest number rather than sorting the names:
# a plain string sort would put sample10.xml before sample9.xml
nums = [int(os.path.basename(f)[6:-4]) for f in files
        if os.path.basename(f)[6:-4].isdigit()]
cur_num = max(nums, default=-1) + 1  # start at 0 if there are no files yet
fh = open("somedir/sample%s.xml" % cur_num, 'w')
rs = [blockresult]
fh.writelines(rs)
fh.close()
This will also keep incrementing, even if some of the lower-numbered files disappear.
The other solution here that I like (pointed out by Eiyrioü) is the idea of keeping a temporary file that contains your most recent number:
with open('somedir/curr_num.txt') as temp_fh:
    cur_num = int(temp_fh.readline().strip())
cur_num += 1
with open("somedir/sample%s.xml" % cur_num, 'w') as fh:
    fh.writelines([blockresult])
# save the new number for the next run
with open('somedir/curr_num.txt', 'w') as temp_fh:
    temp_fh.write(str(cur_num))

Another example, using recursion:
import os
def checkFilePath(testString, extension, currentCount):
    if os.path.exists(testString + str(currentCount) + extension):
        return checkFilePath(testString, extension, currentCount+1)
    else:
        return testString + str(currentCount) + extension
Use:
checkFilePath("myfile", ".txt" , 0)

I needed to do something similar, but for output directories in a data processing pipeline. I was inspired by Vorticity's answer, but added a regex to grab the trailing number. This method continues to increment from the highest-numbered directory, even if intermediate numbered output directories are deleted. It also adds leading zeros so the names sort alphabetically (e.g. width 3 gives 001).
import logging
import os
import re
from glob import glob
log = logging.getLogger(__name__)
def get_unique_dir(path, width=3):
    # if it doesn't exist, create it
    if not os.path.isdir(path):
        log.debug("Creating new directory - {}".format(path))
        os.makedirs(path)
        return path
    # if it's empty, use it
    if not os.listdir(path):
        log.debug("Using empty directory - {}".format(path))
        return path
    # otherwise, increment the highest number folder in the series
    def get_trailing_number(search_text):
        search_obj = re.search(r"([0-9]+)$", search_text)
        if not search_obj:
            return 0
        else:
            return int(search_obj.group(1))
    dirs = glob(path + "*")
    num_list = sorted([get_trailing_number(d) for d in dirs])
    highest_num = num_list[-1]
    next_num = highest_num + 1
    new_path = "{0}_{1:0>{2}}".format(path, next_num, width)
    log.debug("Creating new incremented directory - {}".format(new_path))
    os.makedirs(new_path)
    return new_path
get_unique_dir("output")
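If two pipeline runs may start at the same moment, the numbered-directory idea can be made race-safe by letting os.mkdir do the check: it raises FileExistsError when the directory already exists. A minimal sketch; make_numbered_dir is a hypothetical name, and unlike get_unique_dir it always creates a new numbered directory instead of reusing an empty base directory.

```python
import os
import re


def make_numbered_dir(base, width=3):
    """Create base_001, base_002, ... one above the highest existing number."""
    parent = os.path.dirname(os.path.abspath(base))
    pattern = re.compile(re.escape(os.path.basename(base)) + r"_(\d+)$")
    nums = [int(m.group(1)) for m in map(pattern.match, os.listdir(parent)) if m]
    n = max(nums, default=0) + 1
    while True:
        path = f"{base}_{n:0{width}d}"  # e.g. output_001
        try:
            os.mkdir(path)  # fails if another process created it first
            return path
        except FileExistsError:
            n += 1
```

Like the answer above, it keeps counting upward even if lower-numbered directories have been deleted.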

Here is one more example. The code tests whether a file already exists in the directory; if it does, it increments the last index in the file name, and then saves the file.
The file name is the three-letter month, the day, an underscore and the index, e.g. May10_1.dat.
import time
import datetime
import shutil
import os
import os.path
da=datetime.datetime.now()
data_id =1
ts = time.time()
st = datetime.datetime.fromtimestamp(ts).strftime("%b%d")
data_id=str(data_id)
filename = st+'_'+data_id+'.dat'
while (os.path.isfile(str(filename))):
    data_id=int(data_id)
    data_id=data_id+1
    print(data_id)
    filename = st+'_'+str(data_id)+'.dat'
print(filename)
shutil.copyfile('Autonamingscript1.py',filename)
f = open(filename,'a+')
f.write("\n\n\n")
f.write("Data comments: \n")
f.close()

This continues the sequence numbering from the given filename, whether or not it already has an appended sequence number.
The given filename is used if it doesn't exist; otherwise a sequence number is applied, and gaps between numbers are candidates.
This version is quick if the given filename is not already sequenced or is the highest-numbered pre-existing file.
For example, the provided filename can be
sample.xml
sample-1.xml
sample-23.xml
import os
import re
def get_incremented_filename(filename):
    name, ext = os.path.splitext(filename)
    seq = 0
    # continue from existing sequence number if any
    rex = re.search(r"^(.*)-(\d+)$", name)
    if rex:
        name = rex[1]
        seq = int(rex[2])
    while os.path.exists(filename):
        seq += 1
        filename = f"{name}-{seq}{ext}"
    return filename

My 2 cents: an always increasing, macOS-style incremental naming procedure
get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir ; then
get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (1) ; then
get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (2) ; etc.
If ./some_new_dir (2) exists but not ./some_new_dir (1), then get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (3) anyway, so indexes always increase and you always know which is the latest.
from pathlib import Path
import re
def get_increased_path(file_path):
    fp = Path(file_path).resolve()
    # siblings named "<stem> (<n>)<suffix>", e.g. "some_new_dir (2)" or "report (2).txt";
    # re.escape keeps characters such as "." or "(" in the name from acting as regex syntax
    pattern = re.compile(r"^{} \((\d+)\){}$".format(re.escape(fp.stem), re.escape(fp.suffix)))
    vals = []
    if fp.parent.is_dir():
        for n in fp.parent.iterdir():
            m = pattern.match(n.name)
            if m:
                vals.append(int(m.group(1)))
    if vals:
        ext = " ({})".format(max(vals) + 1)
    elif fp.exists():
        ext = " (1)"
    else:
        ext = ""
    return fp.parent / (fp.stem + ext + fp.suffix)
