Python, Opening files in loop (dicom) - python

I am currently reading in 200 dicom images manually using the code:
ds1 = dicom.read_file('1.dcm')
So far this has worked, but I am trying to make my code shorter and easier to use by creating a loop to read in the files:
for filename in os.listdir(dirName):
    dicom_file = os.path.join("/",dirName,filename)
    exists = os.path.isfile(dicom_file)
    print filename
    ds = dicom.read_file(dicom_file)
This code is not currently working and I am receiving the error:
raise InvalidDicomError("File is missing 'DICM' marker. "
dicom.errors.InvalidDicomError: File is missing 'DICM' marker. Use force=True to force reading
Could anyone advise me on where I am going wrong, please?

I think the line:
dicom_file = os.path.join("/",dirName,filename)
might be an issue? It will join all three to form a path rooted at '/'. For example:
os.path.join("/","directory","file")
will give you "/directory/file" (an absolute path), while:
os.path.join("directory","file")
will give you "directory/file" (a relative path)
If you know that all the files you want are "*.dcm"
you can try the glob module:
import glob
files_with_dcm = glob.glob("*.dcm")
This will also work with full paths:
import glob
files_with_dcm = glob.glob("/full/path/to/files/*.dcm")
Also, os.listdir(dirName) will include everything in the directory, including other directories, dot files, and so on.
Your exists = os.path.isfile(dicom_file) line will filter out everything that is not a regular file if you add an "if exists:" check before reading.
I would recommend the glob approach if you know the pattern; otherwise:
from dicom.errors import InvalidDicomError  # needed for the except clause

if exists:
    try:
        ds = dicom.read_file(dicom_file)
    except InvalidDicomError as exc:
        print "something wrong with", dicom_file
If you do a try/except, the if exists: is a bit redundant, but doesn't hurt...
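The error message refers to the 'DICM' marker, so another option is to test for that marker yourself before handing a file to the reader. This is only a minimal sketch using the standard library: it assumes the usual DICOM file layout (a 128-byte preamble followed by the four bytes DICM), and the function names has_dicm_marker and list_dicom_candidates are made up for illustration. Files written without the preamble, which is the case force=True exists for, would be skipped by this check.

```python
import os


def has_dicm_marker(path):
    """Return True if bytes 128-131 of the file are b'DICM'."""
    with open(path, "rb") as f:
        f.seek(128)  # skip the 128-byte preamble
        return f.read(4) == b"DICM"


def list_dicom_candidates(dir_name):
    """Return full paths of regular files in dir_name that carry the marker."""
    candidates = []
    for filename in sorted(os.listdir(dir_name)):
        full_path = os.path.join(dir_name, filename)
        # isfile() drops subdirectories; the marker test drops stray files
        if os.path.isfile(full_path) and has_dicm_marker(full_path):
            candidates.append(full_path)
    return candidates
```

Each path returned by list_dicom_candidates(dirName) could then be passed to dicom.read_file inside the try/except shown above.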

Try adding:
dicom_file = os.path.join("/",dirName,filename)
if not dicom_file.endswith('.dcm'):
    continue

Related

Rename all xml files within a given directory with Python

I have lot of xml files which are named like:
First_ExampleXML_Only_This_Should_Be_Name_20211234567+1234565.xml
Second_ExampleXML_OnlyThisShouldBeName_202156789+55684894.xml
Third_ExampleXML_Only_This_Should_Be_Name1_2021445678+6963696.xml
Fourth_ExampleXML_Only_This_Should_Be_Name2_20214567+696656.xml
I have to make a script that will go through all of the files and rename them, so only this is left from the example:
Only_This_Should_Be_Name.xml
OnlyThisShouldBeName.xml
Only_This_Should_Be_Name1.xml
Only_This_Should_Be_Name2.xml
At the moment I have something like this, but I am really struggling to get exactly what I need. I guess I have to count from the second _ up to _202 and take everything in between.
from os import listdir

fnames = listdir('.')
for fname in fnames:
    # replace .xml with type of file you want this to have impact on
    if fname.endswith('.xml'):
        pass  # renaming logic would go here
Does anyone have an idea what the best approach would be?
You can split each XML file name on underscores, keep the fields between the second underscore and the last one, and rename the file with the result, as below.
import os
fnames = os.listdir('.')
for fname in fnames:
    # replace .xml with type of file you want this to have impact on
    if fname.endswith('.xml'):
        newName = '_'.join(fname.split("_")[2:-1])
        os.rename(fname, newName+".xml")
    else:
        continue
Here you are dropping the first two underscore-separated fields and the last one (the numeric suffix).
There are two problems here:
Finding files of one kind in the directory
Whilst listdir will work, you might as well glob them:
from pathlib import Path
for fn in Path("/path").glob("*.xml"):
    ....
Renaming files
In this case your files are named "file_name_NUMBERS.xml" and we want to strip the numbers out, so we'll use a regex. Edit: this is not the best way in this case. Just split and combine as in the other answer.
import re
from pathlib import Path
for fn in Path("dir").glob("*.xml"):
    new_name = re.search(r"(.*?)_[0-9]+", fn.stem).group(1)
    fn.rename(fn.with_name(new_name + ".xml"))
Edit: I don't know why I overcomplicated things. I'll leave the re solution there for more difficult cases, but in this case you can just do:
new_name = "_".join(fn.stem.split("_")[:-1])
This is greatly superior, as it doesn't depend on the precise naming of the files.
Note that you can do all this without pathlib, but you asked for the best way ;)
Lastly, to answer an implicit question, nothing stops you wrapping all this in a function and passing an argument to glob for different types of files.
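The idea of wrapping this in a function that takes the glob pattern as an argument might look roughly like the following. Treat it as a sketch under assumptions: clean_stem, rename_matching, the skip_prefix_fields parameter and the dry_run flag are names invented here, and the field slicing assumes names shaped like the examples in the question (two prefix fields, the wanted part, one trailing numeric field).

```python
from pathlib import Path


def clean_stem(stem, skip_prefix_fields=2):
    """Drop the leading fields and the trailing numeric field of a stem."""
    fields = stem.split("_")
    return "_".join(fields[skip_prefix_fields:-1])


def rename_matching(directory, pattern="*.xml", dry_run=True):
    """Rename files matching pattern; return (old, new) name pairs."""
    renamed = []
    # Materialize the glob first so renames cannot disturb the directory scan.
    for path in sorted(Path(directory).glob(pattern)):
        new_stem = clean_stem(path.stem)
        if not new_stem:  # fewer than four fields: leave the file alone
            continue
        target = path.with_name(new_stem + path.suffix)
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Calling rename_matching(".") first only reports what would change; passing dry_run=False performs the renames. The sketch does not guard against two files mapping to the same new name.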
I think a regex will be the simplest approach here, which in Python can be done with the re module.
import os
import re
fnames = os.listdir('.')
for fname in fnames:
    result = re.sub(r"^.*?_ExampleXML_(.*?)_[\d+]+\.xml$", r"\1.xml", fname)
    if result != fname:
        os.rename(fname, result)
There are several pattern matching strategies you could employ, depending on your use case.
For instance you could try variants like the following, depending on how specific/general you need to be:
^.*?_ExampleXML_(.*?)_\d+\.xml$ (https://regex101.com/r/hYOLMF/1)
^.*?_ExampleXML_(.*?)_2021\d+\.xml$ (https://regex101.com/r/UzEsbO/1)
^.*?_ExampleXML_(.*?)_[^_]+\.xml$ (https://regex101.com/r/lKzYhq/1)

How to expand a code to delete files via python glob for multiple filetypes?

I've done the following code to delete one filetype in a directory and its sub directories using python. But I need to expand it for multiple file types. Can anybody help me?
import os
import glob
fileList = glob.glob('D:\\analises para integrar\\pasta para teste do scrip\\**\\*.bin', recursive=True)
for filePath in fileList:
    try:
        os.remove(filePath)
    except OSError:
        print("Error while deleting file")
Please take the time to format your code correctly when posting in the future.
Globbing: the * you use before .bin matches any number of any characters. So if you want to glob another type of file, replace the .bin portion with the extension you want and it will be captured the same way.
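To cover several file types at once, one option is to loop over a list of extensions and build one recursive pattern per extension. The sketch below is only an illustration: the function name delete_by_extension and its dry_run flag are invented here, and the root folder and extension list are whatever you choose to pass in.

```python
import glob
import os


def delete_by_extension(root, extensions, dry_run=True):
    """Delete files under root (recursively) whose extension is listed."""
    matched = []
    for ext in extensions:
        # "**" with recursive=True matches root itself and all subdirectories
        pattern = os.path.join(root, "**", "*" + ext)
        for file_path in glob.glob(pattern, recursive=True):
            matched.append(file_path)
            if dry_run:
                continue
            try:
                os.remove(file_path)
            except OSError as exc:
                print("Error while deleting", file_path, exc)
    return matched
```

For example, delete_by_extension(some_folder, [".bin", ".tmp"]) returns the matching paths without touching them, and adding dry_run=False actually removes them.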

Renaming/copying in windows python

I am trying to copy and rename some PDFs with absolute paths.
i.e., c:\users\andrew\pdf\p.pdf gets copied to c:\users\pdf\ORGp.pdf,
leaving two files in the directory: p.pdf and ORGp.pdf.
I've been working on this issue for the past hour and I can't seem to nail it.
Is there a more pythonic way to do it than splitting the string into a list and rejoining them after adding ORG on the last element?
Using python 2.7 on windows 8.
Your question is a bit ambiguous, but I will try to answer it anyway.
This Python sample copies every file under a particular folder, specified at the beginning of the script, to a new name in the same folder:
import os
import shutil
folder_name = "c:\\users\\andrew\\pdf"
for root_folder, _, file_names in os.walk(folder_name):
    for file_n in file_names:
        new_name = os.path.join(root_folder, "ORG" + file_n)
        old_name = os.path.join(root_folder, file_n)
        print "We will copy at ", new_name, old_name
        shutil.copyfile(old_name, new_name)
This code will copy and rename a list of absolute file paths:
import os
import shutil
files_to_rename = ["c:\\users\\andrew\\pdf\\p.pdf", "c:\\users\\andrew\\pdf2\\p2.pdf"]
for file_full_path in files_to_rename:
    folder_n, file_n = os.path.split(file_full_path)
    new_name = os.path.join(folder_n, "ORG" + file_n)
    print "We will copy at ", new_name, file_full_path
    shutil.copyfile(file_full_path, new_name)
I tested this script on Mac OS with Python 2.7.7, but I think it should also work on Windows.
You can try
import os
.......some logic.....
os.rename(filename, newfilename)
Splitting the string into a list and rejoining (after removing 'andrew' from the list and prefixing 'ORG' to the last element) is quite Pythonic. It's an explicit and obvious way to do it.
You can use the standard str and list methods to do it; they are fine when you are sure that all the file names you're processing are sane. However, there are various dedicated file path manipulation functions in the os.path module which you should become familiar with. os.path also has other useful file-related functions: you can check if a file exists, whether it's a file or a directory, get a file's timestamps, etc.
To actually copy the file once you've generated the new name, use shutil.copyfile(). You may also wish to check first that the file doesn't already exist using os.path.exists(). Unfortunately, some metadata gets lost in this process, e.g. file owners, as mentioned in the warning in the shutil docs.
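Putting those pieces together (os.path.split to build the new name, os.path.exists to avoid clobbering, then a copy) could look like the sketch below. Some assumptions to flag: it is written for Python 3 rather than the 2.7 mentioned in the question, the helper name copy_with_prefix and its overwrite flag are made up for illustration, and it swaps in shutil.copy2, which also tries to preserve timestamps, in place of shutil.copyfile. It places the copy in the same folder as the source, so it does not drop the 'andrew' component.

```python
import os
import shutil


def copy_with_prefix(src, prefix="ORG", overwrite=False):
    """Copy src next to itself under prefix + original name; return new path."""
    folder, name = os.path.split(src)
    dst = os.path.join(folder, prefix + name)
    if os.path.exists(dst) and not overwrite:
        raise FileExistsError(dst)
    shutil.copy2(src, dst)  # copy2 also tries to preserve timestamps
    return dst
```

Even with copy2, ownership and some other metadata may still not be carried over, so the warning in the shutil docs still applies.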
This is what I ended up doing to do the rename. I'm not sure how pythonic it is, but it works.
split=fle.split('\\')
print split
pdf=split[len(split)-1]
pdf='ORG%s' % pdf
print pdf
del split[len(split)-1]
split.append(pdf)
fle1 = '\\'.join(split)
try:
    shutil.copy(fle, fle1)
except:
    print('failed copy')
    return''

Python parse dirs, remove subdirectory tree

I have a long list of directories, with something like this
C:\Users\vanstrie\Desktop\ntnu\SCHEMA\2012\07_paper\results\026\onsets
I want to parse through folders 001-040 (026 shown above) and remove the onsets subdirectory with all files and subfolders that are in it. I am unsure how to achieve this with python 3. If you have a solution, please advise. Many thanks in advance.
Niels
I would think that something like this should work...
import glob
import os.path
import shutil
files_dirs = glob.glob(r'C:\Users\vanstrie\Desktop\ntnu\SCHEMA\2012\07_paper\results\*')
for d in files_dirs:
    head,tail = os.path.split(d)
    try:
        if (0 < int(tail) < 41) and (len(tail) == 3): #don't want to delete `\results\3\onsets` I guess...
            print("about to delete:",d)
            shutil.rmtree(os.path.join(d,'onsets'),ignore_errors=True)
    except ValueError: #apparently we got a non-integer. Leave that directory.
        pass
As with anything when deleting files, I would definitely print the things that would be deleted on a first pass -- Just to make sure the script is actually working as expected (and to make sure you don't delete something you want to keep).
import shutil, os.path
root_folder = "C:\\Users\\vanstrie\\Desktop\\ntnu\\SCHEMA\\2012\\07_paper\\results"
suffix = "onsets"
for i in range(1,41):
    folder = os.path.join( root_folder, "%03d" % i, suffix )
    shutil.rmtree( folder, ignore_errors=True, onerror=None )
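The advice above about printing what would be deleted on a first pass can be built into the loop itself. The following is a sketch only: the function name remove_onsets and its dry_run flag are invented for illustration, and it assumes the numbered folders are always zero-padded to three digits, as in the question.

```python
import os
import shutil


def remove_onsets(root_folder, first=1, last=40, suffix="onsets", dry_run=True):
    """Return the existing <root>/<NNN>/<suffix> dirs; delete them unless dry_run."""
    targets = []
    for i in range(first, last + 1):
        folder = os.path.join(root_folder, "%03d" % i, suffix)
        if os.path.isdir(folder):  # only report folders that really exist
            targets.append(folder)
    for folder in targets:
        print("would delete:" if dry_run else "deleting:", folder)
        if not dry_run:
            shutil.rmtree(folder)
    return targets
```

Run it once with the default dry_run=True, check the printed list, and only then call it again with dry_run=False.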

Problem reading text files without extensions in python

I have written a piece of code which is supposed to read the text inside several files located in a directory. These files are basically text files, but they do not have any extensions. My code is not able to read them:
corpus_path = 'Reviews/'
for infile in glob.glob(os.path.join(corpus_path,'*.*')):
    review_file = open(infile,'r').read()
    print review_file
To test whether this code works, I added a dummy text file, dummy.txt, which worked because it has an extension. But I don't know what should be done so that files without extensions can be read.
Can someone help me? Thanks
Glob patterns don't work the same way as wildcards on the Windows platform. Just use * instead of *.*. i.e. os.path.join(corpus_path,'*'). Note that * will match every file in the directory - if that's not what you want then you can revise the pattern accordingly.
See the glob module documentation for more details.
Just use * instead of *.*.
The latter requires an extension to be present (more precisely, there needs to be a dot in the filename), the former doesn't.
You could search for * instead of *.*, but this will match every file in your directory.
Fundamentally, this means that you will have to handle cases where the file you are opening is not a text file.
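One way to handle that case is to glob with *, skip anything that is not a regular file, and catch the decoding error for files that are not text. This is a sketch under assumptions: it is written for Python 3, the helper name read_text_files is made up, and treating "fails to decode as UTF-8" as "not a text file" is a heuristic, so a binary file that happens to be valid UTF-8 would not be caught.

```python
import glob
import os


def read_text_files(corpus_path, encoding="utf-8"):
    """Return {filename: text} for every readable text file in corpus_path."""
    texts = {}
    for infile in glob.glob(os.path.join(corpus_path, "*")):
        if not os.path.isfile(infile):  # skip subdirectories
            continue
        try:
            with open(infile, "r", encoding=encoding) as f:
                texts[os.path.basename(infile)] = f.read()
        except UnicodeDecodeError:
            print("skipping non-text file:", infile)
    return texts
```

With the layout from the question, read_text_files('Reviews/') would pick up both dummy.txt and the files that have no extension.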
It seems that you need
from os import listdir
for filename in ( fn for fn in listdir(corpus_path) if '.' not in fn):
    # do something
You could write
from os import listdir
for fn in listdir(corpus_path):
    if '.' not in fn:
        # do something
but the former, with a generator, spares one indentation level.
