Python 3 loop through subprocess output to search for filename

I am running a terminal command to list a directory. I would like to loop through each line returned and search for a particular filename. I have this so far...
import subprocess
for line in subprocess.check_output(['ls', '-l']):
    if "(myfile.txt)" in line:
        print("File Found")
But this just outputs the listing and doesn't seem to search for the file. Does anyone have an example they can point me at?

Calling ls from within subprocess returns a bytes object.
So, first, you might want to convert the returned value to a string.
Then split the string with the newline ("\n") as the delimiter.
Afterwards, you can iterate over the resulting list and search for your needle.
import subprocess
# calling subprocess.check_output(['ls', '-l']) returns a bytes object,
# so we decode the bytes into a string first,
# and then split at the newline boundary to convert it to a list
for line in bytes.decode(subprocess.check_output(['ls', '-l'])).split(sep="\n"):
    # now we can check if the desired file is in the line
    if "(myfile.txt)" in line:
        print("File Found")

You can try to pass in the encoding utf-8 and split it by \n.
for line in subprocess.check_output(['ls', '-l'], encoding="utf-8").split("\n"):
    # print(line)
    if "myfile.txt" in line:
        print("File Found")
Originally, check_output was returning bytes, so we pass in encoding here. Also, since you want to search line by line, we split on \n. (Tested on Python 3.)
subprocess.check_output: ... By default, this function will return
the data as encoded bytes. The actual encoding of the output data may
depend on the command being invoked, so the decoding to text will
often need to be handled at the application level.
This behaviour may be overridden by setting universal_newlines to True
as described above in Frequently Used Arguments. -- cited from https://docs.python.org/3/library/subprocess.html#subprocess.check_output
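In practice this means you can let check_output do the decoding for you. A minimal sketch, assuming Python 3.7+ for the text=True keyword (older versions spell it universal_newlines=True):
import subprocess

# text=True makes check_output return str instead of bytes,
# decoded with the locale's preferred encoding
output = subprocess.check_output(['ls', '-l'], text=True)
for line in output.splitlines():  # splitlines() avoids a trailing empty entry
    if "myfile.txt" in line:
        print("File Found")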

Why not use something that is more reliable such as os.listdir or glob:
import glob
if glob.glob('myfile.txt'):
    print('File found')
else:
    print('File not found')
The glob.glob function returns a list of files that match the wildcard. In this case, you will have ['myfile.txt'] if the file exists, or [] if not.
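Since glob accepts wildcard patterns, the same check extends to several files at once. A small sketch (the *.txt pattern is just an example):
import glob

# every .txt file in the current directory
for match in glob.glob('*.txt'):
    print('Found:', match)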

import os

def find(name):
    # walk the whole drive and report every directory that contains the file
    for root, dirs, files in os.walk('C:\\'):
        if name in files:
            print(root, name)
    print("FINISH")
    input()

try:
    s = input("name: ")
    find(s)
except Exception:
    pass

To output the contents of a directory, I would recommend the os module.
import os
content = os.listdir(os.getcwd())
Then you have a searchable list.
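For instance, a plain membership test against that list, as a minimal sketch:
import os

content = os.listdir(os.getcwd())
# exact-name lookup; no subprocess or output parsing needed
if 'myfile.txt' in content:
    print('File Found')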
But are you sure your file is named (myfile.txt)?

Related

Read all the text files in a folder and change a character in a string if it is present

I have a folder with CSV-formatted documents with a .arw extension. The files are named 1.arw, 2.arw, 3.arw ... etc.
I would like to write code that reads all the files, checks for the forward slash / and replaces it with a dash -, and finally creates new files with the replaced character.
The code I wrote as follows:
for i in range(1,6):
    my_file=open("/path/"+str(i)+".arw", "r+")
    str=my_file.read()
    if "/" not in str:
        print("There is no forwardslash")
    else:
        str_new = str.replace("/","-")
        print(str_new)
        f = open("/path/new"+str(i)+".arw", "w")
        f.write(str_new)
    my_file.close()
But I get an error saying:
'str' object is not callable.
How can I make it work for all the files in a folder? Apparently my for loop does not work.
The actual error is that you are replacing the built-in str with your own variable with the same name, then try to use the built-in str() after that.
Simply renaming the variable fixes the immediate problem, but you really want to refactor the code to avoid reading the entire file into memory.
import logging
import os

for i in range(1,6):
    seen_slash = False
    input_filename = "/path/"+str(i)+".arw"
    output_filename = "/path/new"+str(i)+".arw"
    with open(input_filename, "r+") as input, open(output_filename, "w") as output:
        for line in input:
            if not seen_slash and "/" in line:
                seen_slash = True
            line_new = line.replace("/","-")
            print(line_new.rstrip('\n'))  # don't duplicate the newline
            output.write(line_new)
    if not seen_slash:
        logging.warning("{0}: No slash found".format(input_filename))
        os.unlink(output_filename)
Using logging instead of print for error messages helps because you keep standard output (the print output) separate from the diagnostics (the logging output). Notice also how the diagnostic message includes the name of the file we found the problem in.
Going back and deleting the output filename when you have examined the entire input file and not found any slashes is a mild wart, but should typically be more efficient.
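If the files are small enough to read whole, a sketch of an alternative that avoids the delete-afterwards wart: read the entire file first and only create the output file when a slash was actually found (same hypothetical /path/ layout as above):
for i in range(1, 6):
    input_filename = "/path/" + str(i) + ".arw"
    with open(input_filename) as input_file:
        text = input_file.read()
    if "/" in text:
        # the output file is only created when there is something to replace
        with open("/path/new" + str(i) + ".arw", "w") as output_file:
            output_file.write(text.replace("/", "-"))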
This is how I would do it:
for i in range(1,6):
    with open(str(i)+'.arw', 'r') as f:
        data = f.readlines()
    # str.replace returns a new string, so collect the results
    data = [element.replace('/', '-') for element in data]
    with open(str(i)+'.arw', 'w') as f:
        for element in data:
            f.write(element)
This assumes, from your post, that you know how many files you have.
If you don't know how many files there are, you can use glob or the os module to find the files in the directory, as in the sketch below.
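A minimal sketch of that approach, assuming the .arw files sit in the current directory:
import glob

# match every .arw file instead of hard-coding range(1, 6)
for filename in glob.glob('*.arw'):
    print(filename)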

How to deal with invalid utf8 in fileinput?

I have basically the following code:
import fileinput

def main():
    for filename in fileinput.input():
        filename = filename.strip()
        process_file(filename)
The script takes a newline-separated list of file names as its input. However, some of the file names contain invalid utf8, which causes fileinput.input() to implode. I've read about the surrogateescape error handler, which I think is what I want, but I don't know how to set the error handler for fileinput.
In short: how do I get fileinput to deal with invalid Unicode?
Filenames on POSIX may be arbitrary sequences of bytes (except b'\0' and b'/'), i.e., no character encoding can decode them in the general case (that is why os.fsdecode() exists; it uses the surrogateescape error handler).
You could use a binary mode to read the filenames then either skip undecodable filenames if the input shouldn't contain them or pass them as is (or os.fsdecode()) to functions that expect filenames:
for filename in fileinput.input(mode='rb'):
    process_file(os.fsdecode(filename).strip())
Beware, there were several known Python bugs related to using a binary mode and fileinput e.g.:
fileinput should use stdin.buffer for "rb" mode
fileinput.FileInput.readline() always returns str object at the end even if in 'rb' mode
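If those bugs affect your Python version, a workaround is to bypass fileinput and read the byte stream directly. A minimal sketch, assuming the filenames arrive on standard input:
import os
import sys

for raw_line in sys.stdin.buffer:  # iterate over raw bytes lines
    # os.fsdecode() uses the surrogateescape error handler,
    # so undecodable bytes survive the round trip to str
    filename = os.fsdecode(raw_line).strip()
    print(filename)  # or hand off to process_file() from the question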
Following the documentation, use an opening hook:
def main():
    for filename in fileinput.input(openhook=fileinput.hook_encoded("utf-8")):
        filename = filename.strip()
        process_file(filename)

python zipfile module with TextIOWrapper

I wrote the following piece of code to read a text file inside of a zipped directory. Since I don't want the output in bytes I added the TextIOWrapper to display the output as a string. Assuming that this is the right way to read a zip file line by line (if it isn't let me know), then why does the output print a blank line? Is there any way to get rid of it?
import zipfile
import io

def test():
    zf = zipfile.ZipFile(r'C:\Users\test\Desktop\zip1.zip')
    for filename in zf.namelist():
        words = io.TextIOWrapper(zf.open(filename, 'r'))
        for line in words:
            print(line)
    zf.close()

test()
>>>
This is a test line...
This is a test line...
>>>
The two lines in the file inside of the zipped folder are:
This is a test line...
This is a test line...
Thanks!
ZipFile.open opens the zipped file in binary mode, which doesn't strip out carriage returns (i.e. '\r'), and neither did the defaults for TextIOWrapper in my test. Try configuring TextIOWrapper to use universal newlines (i.e. newline=None):
import zipfile
import io

zf = zipfile.ZipFile('data/test_zip.zip')
for filename in zf.namelist():
    with zf.open(filename, 'r') as f:
        words = io.TextIOWrapper(f, newline=None)
        for line in words:
            print(repr(line))
Output:
'This is a test line...\n'
'This is a test line...'
The normal behavior when iterating a file by line in Python is to retain the newline at the end. The print function also adds a newline, so you'll get a blank line. To just print the file you could instead use print(words.read()). Or you could use the end option of the print function: print(line, end='').

f.read coming up empty

I'm doing all this in the interpreter...
loc1 = '/council/council1'
file1 = open(loc1, 'r')
At this point I can do file1.read() and it prints the file's contents as a string to standard output.
But if I add this...
string1 = file1.read()
string1 comes back empty. I have no idea what I could be doing wrong; this seems like the most basic thing!
If I go on to type file1.read() again, the output to standard output is just an empty string. So, somehow I am losing my file when I try to create a string with file1.read().
You can only read a file once. After that, the current read-position is at the end of the file.
If you add file1.seek(0) before you re-read it, you should be able to read the contents again. A better approach, however, is to read into a string the first time and then keep it in memory:
loc1 = '/council/council1'
file1 = open(loc1, 'r')
string1 = file1.read()
print(string1)
You do not lose it; you just move the offset pointer to the end of the file and then try to read more data. Since it is the end of the file, no more data is available and you get an empty string. Try reopening the file or seeking back to position zero:
f.read()
f.seek(0)
f.read()
Using with is the best syntax because it closes the file automatically when the block ends (since Python 2.5):
with open('/council/council1', 'r') as input_file:
    text = input_file.read()
    print(text)
To quote the official documentation on read():
To read a file’s contents, call f.read(size)
When size is omitted or negative, the entire contents of the file will
be read and returned;
And the most relevant part:
If the end of the file has been reached, f.read() will return an empty
string ('').
Which means that if you use read() twice consecutively, it is expected that the second time you'll get an empty string. Either store the contents the first time or use f.seek(0) to go back to the start; read() and seek() together give you lower-level control over the file position.
Besides using a context manager to automatically open and close the file, there's another way to read a whole text file, using pathlib, example below:
#!/usr/bin/env python3
from pathlib import Path

txt_file = Path("myfile.txt")
try:
    content = txt_file.read_text()
except FileNotFoundError:
    print("Could not find file")
else:
    print(f"The content is: {content}")
    print(f"I can also read again: {txt_file.read_text()}")
As you can see, you can call read_text() several times and you'll get the full content, no surprises. Of course you wouldn't want to do that in production code, since read_text() opens and closes the file each time; it's still best to store it. I can highly recommend pathlib when dealing with files and file paths.
It's outside the scope, but it may be worth noting a difference when reading line by line. Unlike the file object obtained by open(), PosixPath returned by Path() is not iterable. The equivalent of:
with open('file.txt') as f:
    for line in f:
        print(line)
Would be something like:
for line in Path('file.txt').read_text().split('\n'):
    print(line)
One advantage of the first approach, with open, is that the entire file is not read into memory at once.
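If you prefer pathlib but still want lazy, line-by-line reading, Path.open() hands back an ordinary file object. A short sketch:
from pathlib import Path

# Path.open() returns the same kind of file object as open(),
# so lines are read one at a time rather than all at once
with Path('file.txt').open() as f:
    for line in f:
        print(line, end='')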
Make sure your location is correct. Do you actually have a directory called /council under your root directory (/)? Also, use os.path.join() to create your path:
loc1 = os.path.join("/path","dir1","dir2")

Replace string in a specific line using python

I'm writing a python script to replace strings from a each text file in a directory with a specific extension (.seq). The strings replaced should only be from the second line of each file, and the output is a new subdirectory (call it clean) with the same file names as the original files, but with a *.clean suffix. The output file contains exactly the same text as the original, but with the strings replaced. I need to replace all these strings: 'K','Y','W','M','R','S' with 'N'.
This is what I've come up with after googling. It's very messy (2nd week of programming), and it stops at copying the files into the clean directory without replacing anything. I'd really appreciate any help.
Thanks before!
import os, shutil

os.mkdir('clean')
for file in os.listdir(os.getcwd()):
    if file.find('.seq') != -1:
        shutil.copy(file, 'clean')
os.chdir('clean')

for subdir, dirs, files in os.walk(os.getcwd()):
    for file in files:
        f = open(file, 'r')
        for line in f.read():
            if line.__contains__('>'): # indicator for the first line. the first line always starts with '>'. It's a FASTA file, if you've worked with dna/protein before.
                pass
            else:
                line.replace('M', 'N')
                line.replace('K', 'N')
                line.replace('Y', 'N')
                line.replace('W', 'N')
                line.replace('R', 'N')
                line.replace('S', 'N')
some notes:
- string.replace and re.sub are not in-place, so you should assign the return value back to your variable.
- glob.glob is better for finding files in a directory matching a defined pattern...
- maybe you should check whether the directory already exists before creating it (I just assumed this; it might not be your desired behavior)
- the with statement takes care of closing the file in a safe way; if you don't want to use it, you have to use try/finally.
- in your example you were forgetting to add the suffix *.clean ;)
- you were not actually writing the files; you could do it like I did in my example or use the fileinput module (which until today I did not know)
here's my example:
import re
import os
import glob

source_dir = os.getcwd()
target_dir = "clean"
source_files = [fname for fname in glob.glob(os.path.join(source_dir, "*.seq"))]

# check if target directory exists... if not, create it.
if not os.path.exists(target_dir):
    os.makedirs(target_dir)

for source_file in source_files:
    target_file = os.path.join(target_dir, os.path.basename(source_file) + ".clean")
    with open(source_file, 'r') as sfile:
        with open(target_file, 'w') as tfile:
            lines = sfile.readlines()
            # do the replacement in the second line.
            # (remember that arrays are zero indexed)
            lines[1] = re.sub("K|Y|W|M|R|S", 'N', lines[1])
            tfile.writelines(lines)

print("DONE")
hope it helps.
You should replace line.replace('M', 'N') with line=line.replace('M', 'N'). replace returns a copy of the original string with the relevant substrings replaced.
An even better way (IMO) is to use re.
import re
line = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
line = re.sub("K|Y|W|M|R|S", 'N', line)
print(line)
Here are some general hints:
Don't use find for checking the file extension (e.g., this would also match "file1.seqdata.xls"). At least use file.endswith('.seq'), or, better yet, os.path.splitext(file)[1]
Actually, don't do that altogether. This is what you want:
import glob
seq_files = glob.glob("*.seq")
Don't copy the files, it's much easier to use just one loop:
for filename in seq_files:
in_file = open(filename)
out_file = open(os.path.join("clean", filename), "w")
# now read lines from in_file and write lines to out_file
Don't use line.__contains__('>'). What you mean is
if '>' in line:
(which will call __contains__ internally). But actually, you want to know whether the line starts with a ">", not whether there's one somewhere within the line. So the better way would be this:
if line.startswith(">"):
I'm not familiar with your file type; if the ">" check really is just for determining the first line, there are better ways to do that.
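For instance, enumerate() gives you the line number directly, so the content of the line never needs to be inspected. A sketch, assuming only the very first line is the FASTA header (the file name is hypothetical):
import re

with open("example.seq") as in_file:
    for line_number, line in enumerate(in_file):
        if line_number == 0:
            continue  # skip the header line without inspecting it
        # remaining lines are sequence data
        print(re.sub("[KYWMRS]", "N", line), end="")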
You don't need the if block (you just pass). It's cleaner to write
if not something:
    do_things()
other_stuff()
instead of
if something:
    pass
else:
    do_things()
other_stuff()
Have fun learning Python!
You need to assign the result of the replacement back to the line variable:
line = line.replace('M', 'N')
You can also use the fileinput module for in-place editing:
import os, shutil, fileinput

if not os.path.exists('clean'):
    os.mkdir('clean')
for file in os.listdir("."):
    if file.endswith(".seq"):
        shutil.copy(file, 'clean')
os.chdir('clean')

for subdir, dirs, files in os.walk("."):
    for file in files:
        f = fileinput.FileInput(file, inplace=0)
        for n, line in enumerate(f):
            if line.lstrip().startswith('>'):
                pass
            elif n == 1:  # replace the 2nd line
                for repl in ["M", "K", "Y", "W", "R", "S"]:
                    line = line.replace(repl, 'N')
            print(line.rstrip())
        f.close()
Change inplace=0 to inplace=1 for in-place editing of your files.
line.replace is not a mutator, it leaves the original string unchanged and returns a new string with the replacements made. You'll need to change your code to line = line.replace('R', 'N'), etc.
I think you also want to add a break statement at the end of your else clause, so that you don't iterate over the entire file, but stop after having processed line 2.
Lastly, you'll need to actually write the file out containing your changes. So far, you are just reading the file and updating the line in your program variable 'line'. You need to actually create an output file as well, to which you will write the modified lines.
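Putting that advice together, a minimal sketch (file names are hypothetical, and only the second line is rewritten):
import re

# read everything, fix only the second line, then write the output copy
with open("input.seq") as in_file:
    lines = in_file.readlines()
if len(lines) > 1:
    lines[1] = re.sub("[KYWMRS]", "N", lines[1])
with open("input.seq.clean", "w") as out_file:
    out_file.writelines(lines)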
