Opening and performing an operation on multiple files - python

I have 3 files that contain lists of other files in the directory. I'm trying to take the files that are in the lists and copy them to a new directory. I think I'm tripping up on the best way to open the files, as I get an IOError: [Errno 2] No such file or directory. I had a play around using with to open the files but couldn't get my operation to work. Here's my code and a bit of one of the files I'm trying to read.
import shutil
import os

f = open('polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt')
res_files = [line.split()[1] for line in f]
f = close()
os.mkdir(os.path.expanduser('~/Clustered/polymorph_matches'))
for file in res_files:
    shutil.copy(file, os.path.expanduser('~/Clustered/polymorph_matches') + "/" + file)
PENCEN.res 2.res number molecules matched: 15 rms deviation 0.906016
PENCEN.res 3.res number molecules matched: 15 rms deviation 1.44163
PENCEN.res 5.res number molecules matched: 15 rms deviation 0.867366
Edit: I used Aya's code below to fix this, but now get IOError: [Errno 2] No such file or directory: 'p'. I'm guessing it's reading the first character of the file name and failing there, but I can't figure out why.
res_files = []
for filename in 'polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt':
    res_files += [line.split()[1] for line in open(filename)]

Python treats consecutive string constants as a single string, so the line...
f=open('polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt')
...is actually interpreted as...
f=open('polymorphI_hits.txtpolymorphII_hits.txtpolymorphIII_hits.txt')
...which presumably refers to a non-existent file.
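This implicit concatenation is easy to demonstrate in isolation:

```python
# Adjacent string literals are joined at compile time, with no operator
# between them -- the three names collapse into a single string:
name = 'polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt'
print(name)
```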
I don't believe there's a way to use open() to open multiple files in one call, so you'll need to change...
f=open('polymorphI_hits.txt' 'polymorphII_hits.txt' 'polymorphIII_hits.txt')
res_files=[line.split()[1] for line in f]
f=close()
...to something more like...
res_files = []
for filename in 'polymorphI_hits.txt', 'polymorphII_hits.txt', 'polymorphIII_hits.txt':
    res_files += [line.split()[1] for line in open(filename)]
The rest of the code looks okay, though.
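As a minor variation, the standard library's fileinput module chains several files into a single line iterator, so the comprehension stays one expression. A sketch, assuming the same three file names; second_fields is a hypothetical helper name:

```python
import fileinput

def second_fields(paths):
    """Collect the second whitespace-separated field from every line of
    every file in paths, treating them as one continuous stream."""
    with fileinput.input(files=paths) as f:
        return [line.split()[1] for line in f]

# e.g. second_fields(['polymorphI_hits.txt',
#                     'polymorphII_hits.txt',
#                     'polymorphIII_hits.txt'])
```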

Related

Search a directory for a file that fits a filemask, search it line by line for specific text and return that line

In Python, it's the vagueness that I struggle with.
Let's start with what I know. I know that I want to search a specific directory for a file. And I want to search that file for a specific line that contains a specific string, and return only that line.
Which brings me to what I don't know. I have a vague description of the specific filename:
some_file_{variable}_{somedatestamp}_{sometimestamp}.log
So I know the file will start with some_file followed by a known variable and ending in .log. I don't know the date stamp or the time stamp. And to top it off, the file might not still exist at the time of searching, so I need to cover for that eventuality.
To better describe the problem, I have the line of the BASH script that accomplishes this:
ls -1tr /dir/some_file_${VARIABLE}_*.log | tail -2 | xargs -I % grep "SEARCH STRING" %
So basically, I want to recreate that line of code from the BASH script in Python, and throw a message in the case that the search returns no files.
Some variant of this will work. It handles an arbitrary folder structure and searches all subfolders; results will hold the directory path, filename, line number (1-based), and the matching line of text.
Some key pieces:
os.walk is beautiful for searching directory trees. The top= argument can be relative or absolute
use a context manager (with ... as ...) to open files, as it closes them automatically
Python iterates over text-based files line by line
Code
from os import walk, path

magic_text = 'lucky charms'
results = []
for dirpath, _, filenames in walk(top='.'):
    for f in filenames:
        # check the name. can use string methods or indexing...
        if f.startswith('some_name') and f[-4:] == '.log':
            # read the lines and look for magic text
            with open(path.join(dirpath, f), 'r') as src:
                # if you iterate over a file, it returns line-by-line
                for idx, line in enumerate(src):
                    # strings support the "in" operator...
                    if magic_text in line:
                        results.append((dirpath, f, idx + 1, line))

for item in results:
    print(f'on path {item[0]} in file {item[1]} on line {item[2]} found: {item[3]}')
In a trivial folder tree, I placed the magic words in one file and got this result:
on path ./subfolder/subsub in file some_name_331a.log on line 2 found: lucky charms are delicious
See if this works for you:
from glob import glob
import os

log_dir = 'C:\\Apps\\MyApp\\logs\\'
log_variable = input("Enter variable:")
filename = "some_file" + log_variable

# Option 1
searched_files1 = glob(log_dir + filename + '*.log')
print(f'Total files found = {len(searched_files1)}')
print(searched_files1)

# Option 2
searched_files2 = []
for entry in os.listdir(log_dir):
    if (os.path.isfile(os.path.join(log_dir, entry))
            and entry.startswith(filename) and entry.endswith('.log')):
        searched_files2.append(entry)
print(f'Total files found = {len(searched_files2)}')
print(searched_files2)
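One thing neither option reproduces is the tail -2 in the BASH line, which restricts the search to the two most recently modified matches. Sorting the glob results by modification time covers that; newest_matches and grep_lines below are hypothetical helper names:

```python
import os
from glob import glob

def newest_matches(pattern, n=2):
    """Return the n most recently modified files matching pattern,
    oldest first (the same order as `ls -1tr ... | tail -n`)."""
    return sorted(glob(pattern), key=os.path.getmtime)[-n:]

def grep_lines(search_string, paths):
    """Yield (path, line) for each matching line, like xargs grep."""
    for p in paths:
        with open(p) as f:
            for line in f:
                if search_string in line:
                    yield p, line
```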

Selectively replacing csv header names

I have been searching for a solution for this and haven't been able to find one. I have a directory of folders which contain multiple very large csv files. I'm looping through each csv in each folder of the directory to replace the values of certain headers. I need the headers to be consistent (from file to file) in order to run a different script to process all the data properly.
I found this solution that I thought would work: change first line of a file in python.
However, it is not working as expected. My code:
from_file = open(filepath)
# for line in f:
#     if
data = from_file.readline()
# print(data)
# with open(filepath, "w") as f:
print 'DBG: replacing in file', filepath
# s = s.replace(search_pattern, replacement)
for i in range(len(search_pattern)):
    data = re.sub(search_pattern[i], replacement[i], data)
# data = re.sub(search_pattern, replacement, data)
to_file = open(filepath, mode="w")
to_file.write(data)
shutil.copyfileobj(from_file, to_file)
I want to replace the header values in search_pattern with values in replacement without saving or writing to a different file - I want to modify the file. I have also tried
shutil.copyfileobj(from_file, to_file, -1)
As I understand it that should copy the whole file rather than breaking it up in chunks, but it doesn't seem to have an effect on my output. Is it possible that the csv is just too big?
I haven't been able to determine a different way to do this or make this way work. Any help would be greatly appreciated!
The answer from change first line of a file in python that you copied doesn't work on Windows.
On Linux, you can open a file for reading & writing at the same time. The system ensures that there's no conflict, but behind the scenes, 2 different file objects are being handled. And this method is very unsafe: if the program crashes while reading/writing (power off, disk full)... the file has a great chance to be truncated/corrupt.
Anyway, in Windows, you cannot open a file for reading and writing at the same time using 2 handles. It just destroys the contents of the file.
So there are 2 options, which are portable and safe:
create a file in the same directory, once copied, delete first file, and rename the new one
Like this:
import os
import shutil

filepath = "test.txt"
with open(filepath) as from_file, open(filepath + ".new", "w") as to_file:
    data = from_file.readline()
    to_file.write("something else\n")
    shutil.copyfileobj(from_file, to_file)
os.remove(filepath)
os.rename(filepath + ".new", filepath)
This doesn't take much longer, because the rename operation is instantaneous. Besides, if the program/computer crashes at any point, one of the files (old or new) is valid, so it's safe.
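On Python 3.3+, the separate os.remove() isn't needed: os.replace() overwrites an existing destination, on Windows as well. A sketch of the same copy-and-rename idea wrapped in a function (replace_first_line is a hypothetical name):

```python
import os
import shutil

def replace_first_line(filepath, new_first_line):
    """Rewrite filepath with its first line replaced, via a temp copy."""
    tmp = filepath + ".new"
    with open(filepath) as from_file, open(tmp, "w") as to_file:
        from_file.readline()                  # discard the old first line
        to_file.write(new_first_line + "\n")  # write the replacement
        shutil.copyfileobj(from_file, to_file)
    os.replace(tmp, filepath)                 # rename over the original
```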
if patterns have the same length, use read/write mode
like this:
filepath = "test.txt"
with open(filepath, "r+") as rw_file:
    data = rw_file.readline()
    data = "h" * (len(data) - 1) + "\n"
    rw_file.seek(0)
    rw_file.write(data)
Here we read the line, replace the first line with the same number of h characters, rewind the file, and write the first line back, overwriting the previous contents while keeping the rest of the lines. This is also safe, and even if the file is huge, it's very fast. The only constraint is that the replacement must be exactly the same size (otherwise you would leave remainders of the previous data, or overwrite the next line(s), since no data is shifted).

Python - IOError: [Errno 13] Permission denied

I'm trying to get a local directory from argv, iterate through the folder, and print the contents of each file within. However, I am getting an [Errno 13] saying permission denied. I've tried researching the problem but have come up empty-handed.
#!/usr/bin/python
import os
import sys

path = sys.argv[1]  # 'inputs/' path to working input dir
file_list = os.listdir(path)  # create list of filenames in path dir
for fn in file_list:
    file = open(path + '/' + fn)  # open each file in dir for manipulation
    for line in file:
        print(line)
os.listdir(), as its name implies, returns a list of all occupants of the given directory, including both files and directories (and, if you're on Unix/Linux, other stuff like symlinks and devices and whatnot). You are then blindly trying to open() each item in the list and print() its contents. Unfortunately, open() only works on file-like objects, and specifically does not work on directories, hence Errno 13, Permission Denied.
An alternative is to use os.scandir(), which works a little bit differently. Instead of returning a flat list that you can read immediately, os.scandir() returns a generator which essentially gives you objects as you ask for them, instead of giving them all to you at once. In fact, the following code adapted from the docs is a good starting place for what you need:
for entry in os.scandir(path):
    if entry.is_file():
        print(entry.name)
os.scandir() is returning DirEntry objects. Simply use os.path.join() to create a full pathname out of the path argument you pass to os.listdir() in your original code, and entry.name from the code above, and then, using the with context manager, open() the file and display its contents:
for entry in os.scandir(path):
    if entry.is_file():
        with open(os.path.join(path, entry.name), "r") as f:
            for line in f:
                print(line)
One of the advantages of using with is that you don't have to remember to close the file handle that is assigned when you use something like this:
f = open("myfile.txt", "r")
# do stuff with f
...
f.close()
Otherwise, you have a dangling file handle that could potentially cause problems, depending on how many there are and what you've done with them. It's just a good practice to close() what you open(). With with, you don't have to worry about it - the file handle is closed as soon as you exit the block.

How to search string in files recursively in python

I am trying to find all log files on my C:\ drive and then, in these log files, find a string. If the string is found, the output should be the absolute path of the log file where the string was found. Below is what I have done till now.
import os

rootdir = ('C:\\')
for folder, dirs, file in os.walk(rootdir):
    for files in file:
        if files.endswith('.log'):
            fullpath = open(os.path.join(folder, files), 'r')
            for line in fullpath.read():
                if "saurabh" in line:
                    print(os.path.join(folder, files))
Your code is broken at:
for line in fullpath.read():
The statement fullpath.read() will return the entire file as one string, and when you iterate over it, you will be iterating a character at a time. You will never find the string 'saurabh' in a single character.
A file is its own iterator for lines, so just replace this statement with:
for line in fullpath:
Also, for cleanliness, you might want to close the file when you're done, either explicitly or by using a with statement.
Finally, you may want to break when you find a file, rather than printing the same file out multiple times (if there are multiple occurrences of your string):
import os

rootdir = ('C:\\')
for folder, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith('.log'):
            fullpath = os.path.join(folder, file)
            with open(fullpath, 'r') as f:
                for line in f:
                    if "saurabh" in line:
                        print(fullpath)
                        break

Searching many files for a keyword and printing the sentence containing the keyword, plus the filename, in Python

import os

path = 'C:\\Users\\Kabeer\\Documents\\testdata'
listing = os.listdir(path)
for infile in listing:
    read_f = open(infile)
    for line in read_f:
        if 'arch' in line:
            print(line)
            print("current file is: " + infile)
I have placed all the files in a folder. I want to search each file for a keyword. If it contains the keyword, then print the name of the file and the entire sentence containing the keyword.
I am an absolute beginner in Python. The code above is also picked from a Stack Overflow post.
I am getting an error on read_f = open(infile): No such file or directory.
I know the error is in the loop. How do I put each file through the loop? I also tried putting all the files in a list, but I am not able to get each file and read it from the loop.
Thanks
Kabeer
Try read_f = open(os.path.join(path, infile)) instead; os.listdir() returns bare file names, not full paths.
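Put together, a minimal sketch of the fixed loop (find_keyword is a hypothetical helper name; skipping subdirectories is an added precaution):

```python
import os

def find_keyword(path, keyword):
    """Return (filename, line) pairs for every line containing keyword."""
    hits = []
    for infile in os.listdir(path):
        full = os.path.join(path, infile)  # listdir gives bare names only
        if not os.path.isfile(full):       # skip subdirectories
            continue
        with open(full) as read_f:
            for line in read_f:
                if keyword in line:
                    hits.append((infile, line))
    return hits
```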
