itertools chain.from_iterable Backslash Normalisation? - python

I posted a question a while back:
Writing List To Text File
I was able to write a list of files from a dictionary into a text file.
So my code looks like this:
simonDuplicates = chain.from_iterable([files for files in file_dict.values() if len(files) > 1])
text_file.write("Duplicates Files:%s" % '\n'.join(simonDuplicates))
This basically prints out directories and files, for example:
C:/Users/Simon/Desktop/myfile.jpg
The problem is the forward slashes. I want them to be backslashes (as used in Windows). I tried using os.path.normpath, but it doesn't work:
simonDuplicates = chain.from_iterable([files for files in file_dict.values() if len(files) > 1])
os.path.normpath(simonDuplicates)
text_file.write("Duplicates Files:%s" % '\n'.join(simonDuplicates))
I get the following error:
Traceback (most recent call last):
File "duff.py", line 125, in <module>
os.path.normpath(duplicates)
File "C:\Python27\lib\ntpath.py", line 402, in normpath
path = path.replace("/", "\\")
AttributeError: 'itertools.chain' object has no attribute 'replace'
I think it doesn't work because I should be using islice?
Martijn Pieters' answer to this question looks about right:
Troubleshooting 'itertools.chain' object has no attribute '__getitem__'
Any suggestions?

You are trying to apply os.path.normpath to the result of chain.from_iterable, which is an iterator (an itertools.chain object), not a string. os.path.normpath expects a single path string, so you have to iterate over the chain and normalise each filename it yields.
You can use a list comprehension like this:
simonDuplicates = [os.path.normpath(path) for path in chain.from_iterable(files for files in file_dict.values() if len(files) > 1)]
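Or you can use the map function like this (on Python 2, as in the traceback, map returns a list; on Python 3 it returns a lazy iterator, which '\n'.join accepts as well):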
simonDuplicates = map(os.path.normpath, chain.from_iterable(files for files in file_dict.values() if len(files) > 1))
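As a minimal sketch of the per-path approach, the snippet below normalises each filename individually before joining. The contents of file_dict are hypothetical sample data, and ntpath.normpath is used only so that the example behaves the same on any operating system; on Windows, os.path.normpath is the same function.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform
from itertools import chain

# Hypothetical sample data standing in for the question's file_dict.
file_dict = {
    "hash1": ["C:/Users/Simon/Desktop/myfile.jpg",
              "C:/Users/Simon/Pictures/myfile.jpg"],
    "hash2": ["C:/Users/Simon/Desktop/unique.txt"],
}


def normalised_duplicates(file_dict):
    """Return the normalised paths of every group with more than one file."""
    groups = (files for files in file_dict.values() if len(files) > 1)
    # normpath takes one string at a time, so apply it per path.
    return [ntpath.normpath(path) for path in chain.from_iterable(groups)]


duplicates = normalised_duplicates(file_dict)
report = "Duplicate Files:\n%s" % "\n".join(duplicates)
print(report)
```

Building the list first also means the paths can be reused after writing, which a one-shot chain iterator would not allow.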

Related

How can I rename every file that matches a regex?

I want to rename filenames of the form xyz.ogg.mp3 to xyz.mp3.
I have a regex that looks for .ogg in every file then it replaces the .ogg with an empty string but I get the following error:
Traceback (most recent call last):
File ".\New Text Document.py", line 7, in <module>
os.rename(files, '')
TypeError: rename() argument 1 must be string, not _sre.SRE_Match
Here is what I tried:
for file in os.listdir("./"):
    if file.endswith(".mp3"):
        files = re.search('.ogg', file)
        os.rename(files, '')
How can I make this loop look for every .ogg in each file then replace it with an empty string?
The file structure looks like this: audiofile.ogg.mp3
You can do something like this:
for file in os.listdir("./"):
    if file.endswith(".mp3") and '.ogg' in file:
        os.rename(file, file.replace('.ogg',''))
It would be much quicker to do this from the command line:
rename 's/\.ogg//' *.ogg.mp3
(perl's rename)
An example using Python 3's pathlib (but not regular expressions, as it's kind of overkill for the stated problem):
from pathlib import Path
for path in Path('.').glob('*.mp3'):
    if '.ogg' in path.stem:
        new_name = path.name.replace('.ogg', '')
        path.rename(path.with_name(new_name))
A few notes:
Path('.') gives you a Path object pointing to the current working directory
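Path.glob() matches the pattern against the entries of that directory, and the * there is a wildcard (so you get anything ending in .mp3); it only searches recursively if the pattern contains **, or if you use Path.rglob() instead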
Path.stem gives you the file name minus the extension (so if your path were /foo/bar/baz.bang, the stem would be baz)
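Since the question asks about a regular expression, here is a rough sketch of how re.sub could be used instead of re.search. The helper names strip_ogg and rename_all are made up for this illustration, and the demonstration runs in a temporary directory with empty sample files rather than on real audio.

```python
import os
import re
import tempfile

# Matches ".ogg" only when it sits directly before a final ".mp3".
OGG_BEFORE_MP3 = re.compile(r"\.ogg(?=\.mp3$)")


def strip_ogg(name):
    """Return name with the '.ogg' before a trailing '.mp3' removed."""
    return OGG_BEFORE_MP3.sub("", name)


def rename_all(directory):
    """Rename every *.ogg.mp3 file in directory and return the new names."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        new_name = strip_ogg(name)
        if new_name != name:
            # os.rename needs two path strings, not a match object.
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))
            renamed.append(new_name)
    return renamed


# Demonstration in a throwaway directory with empty sample files.
with tempfile.TemporaryDirectory() as tmp:
    for sample in ("audiofile.ogg.mp3", "other.mp3"):
        open(os.path.join(tmp, sample), "w").close()
    result = rename_all(tmp)
    remaining = sorted(os.listdir(tmp))

print(result, remaining)
```

The escaped dot and the lookahead keep the pattern from touching names that merely contain the letters "ogg" elsewhere, which the unescaped '.ogg' pattern in the question would also match.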

Can anyone tell me why I'm getting the AttributeError? 'str' object has no attribute 'real'

def getFFtMag(data):
    maglist = []
    for x in range(0, len(data)):
        dataVal = data[x]
        firstVal = dataVal.real
        secondVal = dataVal.imag
        mag = math.sqrt(firstVal*firstVal + secondVal*secondVal)
        maglist.append(mag)
    return maglist
>>> getFFtMag("25 - Copy.xlsx")
Traceback (most recent call last):
File "<pyshell#11>", line 1, in <module>
getFFtMag("25 - Copy.xlsx")
File "<pyshell#10>", line 5, in getFFtMag
firstVal = dataVal.real
AttributeError: 'str' object has no attribute 'real'
>>>
So can anyone tell me why my code is wrong? The attached error seems to be popping up. I'm new to python and learning what is going on. Is the problem with my data that I inputted? Thanks.
You're attempting to read data out of a file without first parsing it into complex numbers. Data read out of a file will be either a string or a bytes-like object, depending on how you opened the file. But here, you haven't even opened the file yet.
In order to tell you how to specifically resolve this problem we will need to see the format of the file you are trying to operate on.
You're calling the getFFtMag function with a string (apparently a filename) as its argument. The function, however, requires a list (or, in general, an iterable) whose elements have .real and .imag attributes. The elements of a string are one-character strings, and strings don't have those attributes; complex numbers do. You need to read in the contents of your file and convert them to complex numbers.
You are passing a string to your function rather than data with .real and .imag attributes, and since a string does not have those attributes, you get this error.
You have to read the contents of your file first. I suggest having a look at the pandas.read_excel() function.
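To illustrate what the function expects, here is a small sketch that takes a list of complex numbers that are already in memory. The sample values are invented, and how to get such values out of the spreadsheet depends on its layout, which the question does not show. The built-in abs() returns the magnitude of a complex number, so the explicit square root is not needed.

```python
def get_fft_mag(data):
    """Return the magnitude of every (complex) number in data."""
    return [abs(value) for value in data]


# Invented sample values; real data would come from parsing the file.
magnitudes = get_fft_mag([3 + 4j, 1 - 1j, 2.0])
print(magnitudes)

# Passing a filename instead of numbers fails, much like in the question:
try:
    get_fft_mag("25 - Copy.xlsx")
    string_failed = False
except TypeError:
    string_failed = True
```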

Python: how to pass a file from a zip to a function that reads data from that file

I have a zip-file that contains .nrrd type files. The pynrrd lib comes with a read function. How can I pull the .nrrd file from the zip and pass it to the nrrd.read() function?
I tried the following, but it gives this error at the nrrd.read() line:
TypeError was unhandled by user code, file() argument 1 must be
encoded string without NULL bytes, not str
in_dir = r'D:\Temp\Slikvideo\JPEG\SV_4_1_mask'
zip_file = 'Annotated.mitk'
zf = zipfile.ZipFile(in_dir + '\\' + zip_file)
f_name = 'datafile.nrrd' # .nrrd file in zip
file_nrrd = zf.read(f_name) # pull the file from the zip
img_nrrd, options = nrrd.read(file_nrrd) # read the .nrrd image data from the file
I could write the file pulled from the .zip to disk, and then read it from disk with nrrd.read() but I am sure there is a better way.
I think your approach of writing the file to disk first is a reasonable one.
There is a similar question here:
Similar question
In addition, the problem may be that you do not pass a mode when you call zipfile.ZipFile (although "r" is the default mode). Try using:
zipfile.ZipFile(path, "r")
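The following works, because zf.extract() writes the member to disk and returns the path of the extracted file:
file_nrrd = zf.extract(f_name) # extract the file from the zip; returns its path on disk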
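A possible way to keep the extracted file out of the working directory is to extract it into a temporary directory, as sketched below. The helper name extract_member is hypothetical, and the archive here is a throwaway one holding placeholder bytes rather than a real .nrrd image, so the nrrd.read() call is only indicated in a comment.

```python
import os
import tempfile
import zipfile


def extract_member(zip_path, member, dest_dir):
    """Extract one member of the archive into dest_dir and return its path."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.extract(member, path=dest_dir)


# Demonstration with a throwaway archive; the payload is placeholder bytes.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "Annotated.mitk")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("datafile.nrrd", b"placeholder")

    extracted = extract_member(archive, "datafile.nrrd",
                               os.path.join(tmp, "out"))
    extracted_name = os.path.basename(extracted)
    with open(extracted, "rb") as fh:
        payload = fh.read()
    # With pynrrd installed and a real file, nrrd.read(extracted) would go
    # here, while the temporary directory still exists.
```

The temporary directory and everything in it are deleted when the with block ends, so the image data has to be read inside the block.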

Ignore bracket in csv file python

I wrote a python script for a friend that:
takes a CSV of photos she's been cataloging that has the name of the photos in an ordered list
finds the image files on the filesystem
matches the files in the csv with files on the system
copies the images on the filesystem to a folder with a figure name in the order the files appear in the CSV
So essentially, it does:
INPUT: myphoto1.tiff, mypainting.jpeg, myphoto9.jpg, orderedlist.csv
OUTPUT: fig001.jpg, fig002.tiff, fig003.jpeg
This code is going to run on a mac. This works fine except we ran into an issue where some of the files (all by the same photographer) have 1 bracket in them, e.g.
myphoto[fromitaly.jpg
This seems to break my regular expression search:
The relevant code:
orderedpaths = [path for item in target for path in filenames if re.search(item, path)]
Where filenames is a list of the photo files on the system and target is the list from the CSV. This code is supposed to match each CSV file name (and its position in the list) to the filename on disk, to give an ordered list of the filenames on the system.
The error:
Traceback (most recent call last):
File "renameimages.py", line 43, in <module>
orderedpaths = [path for item in target for path in filenames if re.search(item, path)]
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 244, in _compile
raise error, v # invalid expression
sre_constants.error: unexpected end of regular expression
I tried or considered:
Changing the filenames/csv, but this isn't scalable and ideally her
department will be using this script more in the future
Investigating treating the files as "raw" -- but it didn't seem like
that was possible for input from CSV
Deleting the [ character from the input, but the problem is that
then the input won't match the actual files on the system.
I suppose I should mention I only suspect this was the issue: by printing out the progress of the code, it appears as if the code gets to the CSV item with the bracket and errors.
The relevant code is the part where you build a regular expression from user input without sanitizing it. You should not do that.
I believe you don't need to use regular expressions at all. You can find the matching string using if item in path or path.endswith(item) or something similar.
The best option is to use the standard library:
from os.path import basename
orderedpaths = [ ... if basename(path) == item]
If you insist on using REs, you should escape your input using re.escape():
orderedpaths = [path for item in target for path in filenames
                if re.search(re.escape(item), path)]
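The difference between the raw and the escaped pattern can be seen in a short sketch. The target and filenames lists below are invented sample data, and the paths are hypothetical.

```python
import re
from os.path import basename

# Invented sample data: names from the CSV and paths found on disk.
target = ["myphoto[fromitaly.jpg", "mypainting.jpeg"]
filenames = ["/photos/mypainting.jpeg",
             "/photos/italy/myphoto[fromitaly.jpg"]

# Used as a raw pattern, the unmatched "[" opens a character set that is
# never closed, so compiling the pattern fails.
try:
    re.search(target[0], filenames[1])
    raw_pattern_failed = False
except re.error:
    raw_pattern_failed = True

# Escaping turns every special character into a literal one.
escaped = [path for item in target for path in filenames
           if re.search(re.escape(item), path)]

# Comparing base names avoids regular expressions altogether.
by_name = [path for item in target for path in filenames
           if basename(path) == item]

print(raw_pattern_failed, escaped, by_name)
```

Both variants keep the CSV order, because the outer loop runs over target.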

Reading results from glob into a python function

I am trying to automate some plotting using python and fortran together.
I am very close to getting it to work, but I'm having problems getting the result from a glob search to feed into my python function.
I have a .py script that says
import glob
run=glob.glob('JUN*.aijE*.nc')
from plot_check import plot_check
plot_check(run)
But I am getting this error
plot_check(run)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "plot_check.py", line 7, in plot_check
ncfile=Dataset(run,'r')
File "netCDF4.pyx", line 1328, in netCDF4.Dataset.__init__ (netCDF4.c:6336)
RuntimeError: No such file or directory
I checked that the glob is doing its job and it is, but I think it's the format of my variable "run" that's screwing me up.
In python:
>>> run
['JUN3103.aijE01Ccek0kA.nc']
>>> type(run)
<type 'list'>
So my glob is finding the file name of the file I want to put into my function, but something isn't quite working when I try to input the variable "run" in to my function "plot_check".
I think it might be something to do with the format of my variable "run", but I'm not quite sure how to fix it.
Any help would be greatly appreciated!
glob.glob returns a list of all matching filenames. If you know there's always going to be exactly one file, you can just grab the first element:
filenames = glob.glob('JUN*.aijE*.nc')
plot_check(filenames[0])
Or, if it might match more than one file, then iterate over the results:
filenames = glob.glob('JUN*.aijE*.nc')
for filename in filenames:
    plot_check(filename)
Perhaps Dataset expects to be passed a single string filename, rather than a list with one element?
Try using run[0] instead (though you may want to check to make sure your glob actually matches a file before you do that).
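The check mentioned above could look roughly like the sketch below. The helper name first_match is made up, and the demonstration creates an empty file in a temporary directory instead of using a real netCDF file, so plot_check itself is not called.

```python
import glob
import os
import tempfile


def first_match(pattern):
    """Return the first filename matching pattern, or raise if none does."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no file matches %r" % pattern)
    return matches[0]  # a single string, not a list


# Demonstration with an empty stand-in file in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "JUN3103.aijE01Ccek0kA.nc"), "w").close()

    found = first_match(os.path.join(tmp, "JUN*.aijE*.nc"))
    found_name = os.path.basename(found)

    try:
        first_match(os.path.join(tmp, "JUL*.nc"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(found_name, missing_raised)
```

Raising early gives a clearer message than the "No such file or directory" error that appears when an unexpected value reaches Dataset.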
