Python - processing all files in a specific folder

I'm somewhat new to Python (which is the only programming language I know), and I've got a bunch of spectral data saved as .txt files, where each row is a data point: the first number is the wavelength of light used, then a tab, then the second number, which is the instrument's signal/response at that wavelength.
I want to be able to take all the data files I have in a folder and write out a file that is an average of all the signal/response column entries for each wavelength of light (they all contain data for responses from 350-2500nm light). Is there any way to do this? If it weren't for the fact that I need to average together 103 spectra, I'd just do it by hand, but....
EDIT: I realize I worded this terribly. I now realize I can probably just use os to access all the files in a given folder. The thing is that I want to average the signal values for each wavelength. I.e., I want to read all the data from the folder and get an average value for the signal/response at 350nm, 351nm, etc.
I'm thinking this is something I could do with a loop once I get all the files read into Python, but I'm not 100% sure. I'm also hesitant because I'm worried that will slow down the program a lot.

Something like this (assuming all your .txt files are formatted the same, and that all files have the same range of wavelength values):
import os
import numpy as np

dat_dir = '/my/dat/dir'
fnames = [os.path.join(dat_dir, x) for x in os.listdir(dat_dir) if x.endswith('.txt')]
data = [np.loadtxt(f) for f in fnames]
xvals = data[0][:, 0]  # wavelengths, should be the same in each file
yvals = [d[:, 1] for d in data]  # measurements
y_mean = np.mean(yvals, axis=0)  # average the signal column across all files
np.savetxt('spectral_ave.txt', np.column_stack((xvals, y_mean)), fmt='%.4f')  # something like that

import os

dir = "./"  # Your directory
lengths = 0
responses = 0
total = 0
for x in os.listdir(dir):
    # Check if x has a *.txt extension.
    if os.path.splitext(x)[1] != ".txt": continue
    fullname = os.path.join(dir, x)
    # We don't want directories ending with *.txt to mess up our program
    # (although in your case this is very unlikely)
    if os.path.isdir(fullname): continue
    # Now open and read the file as binary
    file = open(fullname, "rb")
    content = file.read()
    file.close()
    # Take the first two entries (wavelength, response) from each file:
    content = content.split()
    l = float(content[0])
    r = float(content[1])
    lengths += l; responses += r
    total += 1
print "Avg of lengths:", lengths/total
print "Avg of responses:", responses/total
If you want it to enter subdirectories, put it into a function and make it recurse when os.path.isdir(fullname) is True; a rough sketch of that recursive variant follows below.
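For example, a sketch of that recursive variant (the helper name sum_dir is made up here, it keeps the same first-two-entries logic as the loop above, and it uses Python 3 prints):
import os

def sum_dir(path, totals):
    # Recursively visit every .txt file under path and accumulate
    # the first (wavelength, response) pair of each file.
    for name in os.listdir(path):
        fullname = os.path.join(path, name)
        if os.path.isdir(fullname):
            sum_dir(fullname, totals)  # recurse into subdirectories
        elif name.endswith(".txt"):
            with open(fullname) as f:
                parts = f.read().split()
            totals["lengths"] += float(parts[0])
            totals["responses"] += float(parts[1])
            totals["count"] += 1

totals = {"lengths": 0.0, "responses": 0.0, "count": 0}
sum_dir("./", totals)
print("Avg of lengths:", totals["lengths"] / totals["count"])
print("Avg of responses:", totals["responses"] / totals["count"])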
Although I wrote you the code, SO is not for that. Mind that in your next question.

If you're on anything but Windows, a common way to do this would be to write a Python program that handles all the files you put on the command line. Then you can run it on results/* to process everything, or just on a single file, or just on a few files.
This would be the more Unixy way to go about things. There are many unix programs that can handle multiple input files (cat, sort, awk, etc.), but most of them leave the directory traversal to the shell.
http://www.diveintopython.net/scripts_and_streams/command_line_arguments.html has some examples of getting at the command line args for your program.
import sys

for arg in sys.argv[1:]:  # argv[0] is the script's name; skip it
    # print arg
    sum_file(arg)  # or put the code inline here, so you don't need global variables to keep state between calls
print "totals ..."
See also this question: What is "argv", and what does it do?

Related

Delete every 1 of 2 or 3 files on a folder with Python

What I'm trying to do is write code that will delete every second (or every third) file in a folder. I have batch-renamed the files so the names increment like 0.jpg, 1.jpg, 2.jpg ... n.jpg and so on. What I had in mind for the every-second-file scenario was to use something like "if % 2 == 0", but I couldn't figure out how to actually remove the files from the list object and from my folder.
Below is the piece of NON-WORKING code. I guess it is not working because file_name is a str.
import os

os.chdir('path_to_my_folder')
for f in os.listdir():
    file_name, file_ext = os.path.splitext(f)
    print(file_name)
    if file_name % 2 == 0:
        os.remove()
Yes, that's your problem: you're trying to use an integer operation on a string. Simply convert:
if int(file_name)%2 == 0:
... that should fix your current problem.
Your filename is a string, like '0.jpg', and you can’t % 2 a string.[1]
What you want to do is pull the number out of the filename, like this:
name, ext = os.path.splitext(filename)
number = int(name)
And now, you can use number % 2.
(Of course this only works if every file in the directory is named in the format N.jpg, where N is an integer; otherwise you’ll get a ValueError.)
[1] Actually, you can do that, it just doesn’t do what you want. For strings, % means printf-style formatting, so filename % 2 means “find the %d or similar format spec in filename and replace it with a string version of 2.”
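A quick illustration of the difference:
name = '4'
print('img%d.jpg' % 2)  # printf-style substitution: prints img2.jpg
print(int(name) % 2)    # 0 -- the numeric test you actually want
# print(name % 2)       # TypeError: not all arguments converted during string formatting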
Thanks a lot for the answers! I have amended the code and now it looks like this:
import os

os.chdir('path_to_the_folder')
for f in os.listdir():
    name, ext = os.path.splitext(f)
    number = int(name)
    if number % 2 == 0:
        os.remove()
It doesn't give an error, but it also doesn't remove/delete the files from the folder. What I want to achieve in the end is that every file whose number is divisible by two is removed, so only 1.jpg, 3.jpg, 5.jpg and so on remain.
Thanks so much for your time.
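The remaining issue in the amended code is that os.remove() is called with no argument; it needs the path of the file to delete. A minimal corrected sketch, assuming every filename is just a number plus .jpg:
import os

os.chdir('path_to_the_folder')
for f in os.listdir():
    name, ext = os.path.splitext(f)
    if ext == '.jpg' and name.isdigit() and int(name) % 2 == 0:
        os.remove(f)  # pass the filename (or full path) so the file is actually deleted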
A non-Python method, but sharing for future reference:
cd path_to_your_folder
mkdir odd; mv *[13579].png odd
also works on OS X. This reverses the file order, but that can be re-corrected easily. Still want to manage this within Python, though!

Given a filename, go to the next file in a directory

I am writing a method that takes a filename and a path to a directory and returns the next available filename in the directory or None if there are no files with names that would sort after the file.
There are plenty of questions about how to list all the files in a directory or iterate over them. What I am not sure about is whether the best way to find a single next filename is to build that list, find the location of the current file in it, and choose the next element (or None if we're already on the last one).
EDIT: here's my current file-picking code. It's reused from a different part of the project, where it is used to pick a random image from a potentially nested series of directories.
# picks a file from a directory
# if the file is also a directory, pick a file from the new directory
# this might choke up if it encounters a directory only containing invalid files
def pickNestedFile(directory, bad_files):
    file = None
    while file is None or file in bad_files:
        file = random.choice(os.listdir(directory))
        #file = directory+file # use the full path name
        print "Trying " + file
    if os.path.isdir(os.path.join(directory, file)) == True:
        print "It's a directory!"
        return pickNestedFile(directory + "/" + file, bad_files)
    else:
        return directory + "/" + file
The program I am using this in now is to take a folder of chatlogs, pick a random log, starting position, and length. These will then be processed into a MOTD-like series of (typically) short log snippets. What I need the next-file picking ability for is when the length is unusually long or the starting line is at the end of the file, so that it continues at the top of the next file (a.k.a. wrap around midnight).
I am open to the idea of using a different method to choose the file, since the above method does not discretely give a separate filename and directory, and I'd have to go use a listdir and match to get an index anyway.
You should probably consider rewriting your program to not have to use this. But this would be how you could do it:
import os

def nextFile(filename, directory):
    fileList = os.listdir(directory)
    nextIndex = fileList.index(filename) + 1
    if nextIndex == 0 or nextIndex == len(fileList):
        return None
    return fileList[nextIndex]

print(nextFile("mail", "test"))
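One caveat: os.listdir() returns names in arbitrary order, so "next" is only well-defined if the listing is sorted first, e.g.:
fileList = sorted(os.listdir(directory))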
I tweaked the accepted answer to allow new files to be added to the directory on the fly and for it to work if a file is deleted or changed or doesn't exist. There are better ways to work with filenames/paths, but the example below keeps it simple. Maybe it's helpful:
import os

def next_file_in_dir(directory, current_file=None):
    file_list = os.listdir(directory)
    next_index = 0
    if current_file in file_list:
        next_index = file_list.index(current_file) + 1
    if next_index >= len(file_list):
        next_index = 0
    return file_list[next_index]

file_name = None
directory = "videos"
user_advanced_to_next = True
while user_advanced_to_next:
    file_name = next_file_in_dir(directory=directory, current_file=file_name)
    user_advanced_to_next = play_video("{}/{}".format(directory, file_name))
finish_and_clean_up()

How to go through files in a directory and delete them depending on if the "time" column is less than local time

I am very new to Python, and right now I am trying to go through a directory of 8 "train" text files, where each file contains, for example, a line like: Paris - London 19.10
What I want to do is write some code (probably some sort of for loop) that automatically goes through the files and deletes the ones whose time column is less than the local time, i.e. when the train has left. I want this to happen when I start my code. What I have managed so far only works when I give an input to open a specific file; I can't make it happen without any input from the user.
def read_from_file(textfile):
    try:
        infile = open(textfile + '.txt', 'r')
        infotrain = infile.readline().rstrip().split(' ')
        localtime = time.asctime(time.localtime(time.time()))
        localtime = localtime.split(' ')
        if infotrain[2] < localtime[3]:
            os.remove(textfile + '.txt')
            print('This train has left the station.')
            return None, None
    except:
        pass
(Be aware that this is not the whole function as it is very long and contains code that does not relate to my question)
Does anyone have a solution?
os.listdir() gives you all of the file names in a directory.
import os

file_names = os.listdir(".")
for fname in file_names:
    # do your time checking stuff here
    pass
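A rough sketch of how the pieces could fit together (it assumes each file's first line ends with the departure time, and that times are zero-padded HH.MM strings so a plain string comparison works):
import os
import time

directory = "."
now = time.strftime("%H.%M")  # current local time, e.g. "19.05"

for fname in os.listdir(directory):
    if not fname.endswith(".txt"):
        continue
    path = os.path.join(directory, fname)
    with open(path) as f:
        fields = f.readline().rstrip().split(" ")  # e.g. ["Paris", "-", "London", "19.10"]
    departure = fields[-1]  # the time is assumed to be the last field
    if departure < now:
        os.remove(path)
        print("This train has left the station:", fname)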

Combine a .txt's name, and second interior string to create a dictionary in Python

I've written a batch script that is deployed to our network via Chocolatey and FOG. It acquires the serial number of the machine and then writes it out to a .txt file bearing the name of the PC that the serial number belongs to:
net use y: \\192.168.4.104\shared
wmic bio get serialnumber > Y:\IT\SSoftware\Serial_Numbers\%COMPUTERNAME%.txt
net use y: /delete
The folder Serial_Numbers is subsequently filled with .txts bearing the names of every computer on Campus. With this in mind I'd like to write a Python script to go through and grab every .txt name and second interior string to form a dictionary, where you can call for the PC's name, and have the serial number returned.
I'm aware of how I'd create the dictionary and call from it, but I'm having trouble figuring out how to properly grab the .txt's name and second interior string. Any help would be greatly appreciated.
Format of .txt documents:
SerialNumber
#############
You can use os.listdir to list the directory's files and a list comprehension to filter them.
Use glob to list the files in your directory.
You can read just the serial number line (the second line, after the SerialNumber header) while populating the dictionary and you're done:
import glob

d = {}
# loop over '.txt' files only
for filename in glob.glob('/path_to_Serial_Numbers_folder/*.txt'):
    with open(filename, 'r') as f:
        f.readline()  # skip the "SerialNumber" header line
        file_name_no_extension = '.'.join(filename.split('.')[:-1])
        d[file_name_no_extension] = f.readline().strip()
print d
import glob

data = {}
for fnm in glob.glob('*.txt'):
    data[fnm[:-4]] = open(fnm).readlines()[1].strip()
or, more succinctly
import glob
data = {f[:-4]:open(f).readlines()[1].strip() for f in glob.glob('*.txt')}
In the dictionary comprehension above,
f[:-4] is the filename minus the last four characters (i.e., ".txt"),
open(f).readlines()[1].strip() is the second line of the file (stripped of whitespace), and
f is an element of the list of filenames returned by glob.glob().
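A quick usage check (the computer name below is just a placeholder):
print(data['SOME-PC-NAME'])  # placeholder key; prints that machine's serial number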

Finding and printing file name of zero length files in python

I'm learning about the os module and I need to work out how to print the file names of only the zero-length files, plus a count of them.
So far I've figured the easiest way to do it is to generate a list or a tuple of files and their sizes in this format:
(('zerotextfile1.txt', 0), ('notazerotextfile.txt', 15))
Then use an if statement to only print out only files with zero length.
Then use a sum function to add the number of list items to get the count of zero length files.
So far, I've got bits and pieces - it's how to put them together I'm having trouble with.
Some of my bits (viable code I've managed to write, not much, I know):
import os

place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for files in os.walk(place):
    print (files)
Then there is stuff like os.path.getsize(), which requires a filename, so I figure I've got to use a for loop over the file names and feed each one into that function to get it to work, right?
Any tips or pointing in the right direction would be vastly appreciated!
import os

place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for root, dirs, files in os.walk(place):
    for f in files:
        file_path = os.path.join(root, f)  # Full path to the file
        size = os.path.getsize(file_path)  # pass the full path to getsize()
        if size == 0:
            print f, file_path
Are you looking for the following?
import os

place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for files in os.walk(place):
    # files is a (root, dirs, filenames) tuple; this only checks the first file in each directory
    if files[2] and os.path.getsize(files[0] + '\\' + files[2][0]) == 0:
        print (files)
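Since the question also asks for the count of zero-length files, here is a small extension of the first answer above that keeps a running list and prints its length at the end (a sketch, using the same place variable and Python 3 prints):
import os

place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
zero_length = []
for root, dirs, files in os.walk(place):
    for f in files:
        file_path = os.path.join(root, f)
        if os.path.getsize(file_path) == 0:
            zero_length.append(file_path)
            print(file_path)
print("Zero-length files found:", len(zero_length))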
