Pretty new to Python and have been writing a script to pick out certain lines of a basic log file.
Basically the function searches the lines of the file, and when it finds one I want to output to a separate file, it adds it to a list, then also adds the next five lines following it. The list then gets written to a separate file at the end in a different function.
What I've been trying to do after that is make the loop continue from the last of those five lines, rather than going over them again. I thought the last line in the code would solve the problem, but unfortunately not.
Are there any recommended variations of a for loop I could use for this purpose?
def readSingleDayLogs(aDir):
    print 'Processing files in ' + str(aDir) + '\n'
    lineNumber = 0
    try:
        open_aDirFile = open(aDir)  # open the log file
        for aLine in open_aDirFile:  # total the num. lines in file
            lineNumber = lineNumber + 1
        lowerBound = 0
        for lineIDX in range(lowerBound, lineNumber):
            currentLine = linecache.getline(aDir, lineIDX)
            if (bunch of logic conditions):
                issueList.append(currentLine)
                for extraLineIDX in range(1, 6):  # loop over the next five lines of the error and append to issue list
                    extraLine = linecache.getline(aDir, lineIDX + extraLineIDX)  # get the x extra line after problem line
                    issueList.append(extraLine)
                issueList.append('\n\n')
                lowerBound = lineIDX
You should use a while loop:
line = lowerBound
while line < lineNumber:
    ...
    if conditions:
        ...
        for lineIDX in range(line, line+6):
            ...
        line = line + 6
    else:
        line = line + 1
A for-loop uses an iterator over the range, so reassigning the loop variable inside the body has no effect on the next iteration.
Consider using a while-loop instead. That way, you can update the line index directly.
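A minimal runnable sketch of the while-loop idea, assuming the log has already been read into a list of lines. The names collect_issues and is_issue are hypothetical; is_issue stands in for the real matching conditions, and the sample log is made up.

```python
def collect_issues(lines, is_issue, extra=5):
    """Collect each matching line plus the `extra` lines that follow it."""
    issues = []
    idx = 0
    while idx < len(lines):
        if is_issue(lines[idx]):
            # Slicing past the end of the list is safe: it just returns fewer lines.
            issues.extend(lines[idx:idx + 1 + extra])
            idx += 1 + extra  # jump past the lines that were just copied
        else:
            idx += 1
    return issues


# Made-up sample data: "ERROR b" falls inside the five lines after "ERROR a",
# so it is copied as context but not treated as a new match.
log = ["ok 1", "ERROR a", "d1", "d2", "ERROR b", "d4", "d5", "ok 2", "ERROR c"]
print(collect_issues(log, lambda line: "ERROR" in line))
```

Because the index is an ordinary variable here, advancing it by six really does skip the copied lines, which is what reassigning the loop variable of a for-loop cannot do.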
I would look at something like:
from itertools import islice
with open('somefile') as fin:
    line_count = 0
    my_lines = []
    for line in fin:
        line_count += 1
        if some_logic(line):
            my_lines.append(line)
            next_5 = list(islice(fin, 5))
            line_count += len(next_5)
            my_lines.extend(next_5)
This way, by using islice on the input, you're able to move the iterator ahead and resume after the 5 lines (perhaps fewer if near the end of the file) are exhausted.
This assumes I'm understanding correctly that you can read forward through the file, identify a line, take only a fixed number of lines after that point, and then resume looping as normal. (You may not even need the line counting if that's all you're after, as it appears to be used only for getline and nothing else.)
If you do indeed want to take the next 5 and still consider the following line, you can use itertools.tee to branch at the point of the faulty line, islice that, and let the fin iterator resume on the next line.
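A rough sketch of the overlapping variant, where the lines after a match are still checked themselves. Note that this deliberately swaps in a simpler technique than tee: the itertools documentation warns that the original iterable should not be used elsewhere once tee() has split it, so this version assumes the lines fit in memory as a list and slices it instead. The names collect_with_overlap and is_issue are hypothetical.

```python
def collect_with_overlap(lines, is_issue, extra=5):
    """Collect each matching line plus `extra` following lines.

    Every line is still tested, so a match inside another match's
    trailing lines starts its own group (and appears twice).
    """
    issues = []
    for idx, line in enumerate(lines):
        if is_issue(line):
            issues.extend(lines[idx:idx + 1 + extra])
    return issues


# Made-up sample data with a second match inside the first one's context.
log = ["ERROR a", "d1", "ERROR b", "d2"]
print(collect_with_overlap(log, lambda line: "ERROR" in line, extra=2))
```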
Related
I would like to write a script that reads a text file line by line and, whenever a line contains a certain parameter, populates an array. The idea is this:
Read line
if Condition 1
    # True
    nested if Condition 2
        ...
else Condition 1 is not true
    read next line
I can't get it to work though. I'm using readline() to read the text line by line, but the main problem is that the call meant to make it read the next line never works. Can you help me? Below is an extract of my actual code:
col = 13  # columns
rig = 300  # rows
a = [ [ None for x in range(col) ] for y in range(rig) ]
counter = 1
file = open('temp.txt', 'r')
files = file.readline()
for line in files:
    if 'bandEUTRA: 32' in line:
        if 'ca-BandwidthClassDL-EUTRA: a' in line:
            a[counter][5] = 'DLa'
            counter = counter + 1
        else:
            next(files)
    else:
        next(files)
print('\n'.join(map(str, a)))
Fixes for the code you asked about inline, and some other associated cleanup, with comments:
col = 13  # columns
rig = 300  # rows
a = [[None] * col for y in range(rig)]  # Innermost repeated list of immutable
                                        # can use multiplication, just don't do it for
                                        # outer list(s), see: https://stackoverflow.com/q/240178/364696
counter = 1
with open('temp.txt') as file:  # Use with statement to get guaranteed file closure; 'r' is implicit mode and can be omitted
    # Removed: files = file.readline()  # This makes no sense; files would be a single line from the file, but your original code treats it as the lines of the file
    # Replaced: for line in files:  # Since files was a single str, this iterated the characters of that line
    for line in file:  # File objects are iterators of their own lines, so you can get the lines one by one this way
        if 'bandEUTRA: 32' in line and 'ca-BandwidthClassDL-EUTRA: a' in line:  # Perform both tests in single if to minimize arrow pattern
            a[counter][5] = 'DLa'
            counter += 1  # May as well not say "counter" twice and use +=
        # All next() code removed; next() advances an iterator and returns the next value,
        # but files was not an iterator, so it was nonsensical, and the new code uses a for loop that advances it for you, so it was unnecessary.
        # If the goal is to intentionally skip the next line under some conditions, you *could*
        # use next(file, None) to advance the iterator so the for loop will skip it, but
        # it's rare that a line *failing* a test means you don't want to look at the next line
        # so you probably don't want it
# This works:
print('\n'.join(map(str, a)))
# But it's even simpler to spell it as:
print(*a, sep="\n")
# which lets print do the work of stringifying and inserting the separator, avoiding
# the need to make a potentially huge string in memory; it *might* still do so (no documented
# guarantees), but if you want to avoid that possibility, you could do:
import sys  # needed for sys.stdout
sys.stdout.writelines(map('{}\n'.format, a))
# which technically doesn't guarantee it, but definitely actually operates lazily, or
for x in a:
    print(x)
# which is 100% guaranteed not to make any huge strings
You can do:
with open("filename.txt", "r") as f:
    for line in f:
        clean_line = line.rstrip('\r\n')
        process_line(clean_line)
Edit:
For your application of populating an array, you could do something like this:
with open("filename.txt", "r") as f:
    contains = ["text" in l for l in f]
This gives you a list with one entry per line of filename.txt: False for each line that doesn't contain "text", and True for each line that does.
Edit 2: To reflect @ShadowRanger's comments, I've changed my code to iterate over each line in the file without reading the whole thing at once.
What I'm trying to do is match a phrase in a text file, then print that line (this works fine). I then need to move the cursor up 4 lines so I can do another match on that line, but I can't get the seek() method to move up 4 lines from the matched line so that I can do another regex search. All I can seem to do with seek() is seek from the very end of the file or from the beginning. It doesn't seem to let me just do seek(105,1) from the line that is matched.
### This is the example test.txt
This is 1st line
This is 2nd line # Needs to seek() to this line from the 6th line. This needs to be dynamic as it wont always be 4 lines.
This is 3rd line
This is 4th line
This is 5th line
This is 6th line # Matches this line, now need to move it up 4 lines to the "2nd line"
This is 7 line
This is 8 line
This is 9 line
This is 10 line
#
def Findmatch():
    file = open("test.txt", "r")
    print file.tell()  # shows 0 which is the beginning of the file
    string = file.readlines()
    for line in string:
        if "This is 6th line" in line:
            print line
            print file.tell()  # shows 171 which is the end of the file. I need for it to be on the line that matches my search which should be around 108. seek() only lets me search from end or beginning of file, but not from the line that was matched.
Findmatch()
Since you've read the whole file into memory at once with file.readlines(), tell() does indeed correctly point to the end, and you already have all your lines in a list. If you still wanted to use seek(), you'd have to read the file line by line and record the position of each line start so that you could go back four lines.
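A minimal sketch of that record-the-offsets idea, written for Python 3 and assuming the file is opened in binary mode so that tell() and seek() deal in plain byte offsets. The function name line_before_match is hypothetical, and io.BytesIO stands in for a real file so the example is self-contained.

```python
import io


def line_before_match(in_file, needle, back=4):
    """Return (earlier_line, matching_line) for the first match, else None.

    `in_file` must be a binary-mode file object (or io.BytesIO).
    """
    needle = needle.encode()
    offsets = []  # offsets[i] is the byte position where line i starts
    while True:
        start = in_file.tell()
        line = in_file.readline()
        if not line:
            return None  # reached end of file without a match
        offsets.append(start)
        if needle in line:
            # Go back `back` lines, but never before the first line.
            target = max(len(offsets) - 1 - back, 0)
            in_file.seek(offsets[target])
            return in_file.readline().decode(), line.decode()


# Made-up stand-in for test.txt: ten lines, "This is line 1" to "This is line 10".
sample = io.BytesIO("".join("This is line %d\n" % n for n in range(1, 11)).encode())
print(line_before_match(sample, "This is line 6"))
```

With a real file, the call would look like line_before_match(open("test.txt", "rb"), "This is 6th line"). The readline() loop is used instead of iterating over the file because in Python 3 text-mode files refuse tell() while being iterated.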
For the problem you describe, you can first find the index of the first matching line and then do the second operation starting from the list slice four items before it.
Here is a very rough example of that (return None isn't really needed; it's there just for the sake of verbosity, clearly stating intent/expected behavior; raising an exception might be just as desirable, depending on what the overall plan is):
def relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            break  # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0  # Just return all lines up to now? Or was that broken input and fail?
        return lines[idx:]
    else:
        return None
with open("test.txt") as in_file:
    lines = in_file.readlines()
    print(''.join(relevant("This is 6th line", lines)))
Please also note: it's a bit confusing to name a list of lines string (one would probably expect a str there); go with lines or something similar. It's also not advisable (especially since you indicate you are using 2.7) to reuse the names of built-ins, like file, for your variables. Use in_file, for instance.
EDIT: As requested in a comment, here is an example that just prints. I'm adding it alongside the earlier one, as the former seems potentially more useful for further extension. :)
def print_relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            print(line.rstrip('\n'))
            break  # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0  # Just return all lines up to now? Or was that broken input and fail?
        print(lines[idx].rstrip('\n'))
with open("test.txt") as in_file:
    lines = in_file.readlines()
    print_relevant("This is 6th line", lines)
Note, since lines are read in with trailing newlines and print would add one of its own I've rstrip'ed the line before printing. Just be aware of it.
I have a text file I wish to analyze. I'm trying to find every line that contains certain characters (ex: "#") and then print the line located 3 lines before it (ex: if line 5 contains "#", I would like to print line 2)
This is what I got so far:
file = open('new_file.txt', 'r')
a = list()
x = 0
for line in file:
    x = x + 1
    if '#' in line:
        a.append(x)
        continue
x = 0
for index, item in enumerate(a):
    for line in file:
        x = x + 1
        d = a[index]
        if x == d - 3:
            print line
            continue
It won't work (it prints nothing when I feed it a file that has lines containing "#"), any ideas?
First, you are going through the file multiple times without re-opening it (or rewinding it) in between. That means every attempt to iterate the file after the first will terminate immediately without reading anything.
Second, your indexing logic is a little convoluted. Assuming your files are not huge relative to your memory size, it is much easier to simply read the whole file into memory (as a list) and manipulate it there.
myfile = open('new_file.txt', 'r')
a = myfile.readlines()
for index, item in enumerate(a):
    if '#' in item and index - 3 >= 0:
        print a[index - 3].strip()
This has been tested on the following input:
PrintMe
PrintMe As Well
Foo
#Foo
Bar#
hello world will print
null
null
##
Ok, the issue is that you have already iterated completely through the file object file in line 4 by the time you try again in line 11. So line 11 will be an empty loop. Maybe it would be a better idea to iterate over the file only once and remember the last few lines...
file = open('new_file.txt', 'r')
a = ["","",""]
for line in file:
    if "#" in line:
        print(a[0], end="")
    a.append(line)
    a = a[1:]
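The same remember-the-last-few-lines idea can be sketched with collections.deque, whose maxlen argument drops the oldest entry automatically. The function name lines_before_matches is hypothetical. Unlike the snippet above, this version skips matches that occur before three lines have been read, rather than printing an empty string for them.

```python
from collections import deque


def lines_before_matches(lines, marker="#", back=3):
    """Yield the line `back` positions before each line containing `marker`."""
    recent = deque(maxlen=back)  # holds only the previous `back` lines
    for line in lines:
        if marker in line and len(recent) == back:
            yield recent[0]  # the oldest remembered line is `back` lines back
        recent.append(line)


# Sample data in the spirit of the test input from the other answer.
sample = ["PrintMe", "PrintMe As Well", "Foo", "#Foo", "Bar#",
          "hello", "null", "null", "##"]
print(list(lines_before_matches(sample)))
```

Since the function accepts any iterable of lines, an open file can be passed in directly, and the file is read only once.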
For file I/O, it is usually most efficient, in both programmer time and runtime, to use regular expressions to match patterns, in combination with iterating through the lines of the file. Your problem really isn't a problem.
import re
file = open('new_file.txt', 'r')
document = file.read()
lines = document.split("\n")
LinesOfInterest = []
for lineNumber, line in enumerate(lines):
    WhereItsAt = re.search(r'#', line)
    if lineNumber > 2 and WhereItsAt:
        LinesOfInterest.append(lineNumber - 3)
print(LinesOfInterest)
for lineNumber in LinesOfInterest:
    print(lines[lineNumber])
LinesOfInterest is now a list of the line numbers (three lines before each match) that meet your criteria.
I used
line1,0
line2,0
line3,0
#
line1,1
line2,1
line3,1
#
line1,2
line2,2
line3,2
#
line1,3
line2,3
line3,3
#
as input yielding
[0, 4, 8, 12]
line1,0
line1,1
line1,2
line1,3
I have some CSV files that I have to modify which I do through a loop. The code loops through the source file, reads each line, makes some modifications and then saves the output to another CSV file. In order to check my work, I want the first line and the last line saved in another file so I can confirm that nothing was skipped.
What I've done is put all of the lines into a list then get the last one from the index minus 1. This works but I'm wondering if there is a more elegant way to accomplish this.
Code sample:
def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv','wb')
    check = open('C:\\HP\\WS\\check-all.csv','wb')
    check_count = 0
    check_list = []
    with open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            check_list.append(line)
            check_count += 1
            if check_count == 1:
                check.write(line)
            [CSV modifications become a string called "newline"]
            fb.write(newline)
    final_check = check_list[len(check_list)-1]
    check.write(final_check)
    fb.close()
If you actually need check_list for something, then, as the other answers suggest, using check_list[-1] is equivalent to but better than check_list[len(check_list)-1].
But do you really need the list? If all you want to keep track of is the first and last lines, you don't. If you keep track of the first line specially, and keep track of the current line as you go along, then at the end, the first line and the current line are the ones you want.
In fact, since you appear to be writing the first line into check as soon as you see it, you don't need to keep track of anything but the current line. And the current line, you've already got that, it's line.
So, let's strip all the other stuff out:
from itertools import islice

def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv','wb')
    check = open('C:\\HP\\WS\\check-all.csv','wb')
    first_line = True
    with open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            if first_line:
                check.write(line)
                first_line = False
            [CSV modifications become a string called "newline"]
            fb.write(newline)
        check.write(line)  # after the loop, line still holds the last line read
    fb.close()
    check.close()
You can enumerate the CSV rows of the input file and check the index to catch the first line. An islice object has no len(), so the last line cannot be detected by index; instead, line still holds the last row once the loop has finished:
from itertools import islice

def CVS1():
    with open('C:\\HP\\WS\\final-cir.csv','wb') as fb, open('C:\\HP\\WS\\check-all.csv','wb') as check, open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        for idx, line in enumerate(skip_first_line):
            if idx == 0:
                check.write(line)  # first line
            #[CSV modifications become a string called "newline"]
            fb.write(newline)
        check.write(line)  # last line (assumes at least one line was read)
I've replaced the separate open calls with a single with block, so the files are closed automatically when the block exits.
You can access index -1 directly:
final_check = check_list[-1]
which is nicer than what you have now:
final_check = check_list[len(check_list)-1]
If it's not an empty or one-line file, you can:
my_file = open(path_to_file, 'r')  # path_to_file is a placeholder for your file's path
my_lines = my_file.readlines()
first_line = my_lines[0]
last_line = my_lines[-1]
I'm only just beginning my journey into Python. I want to build a little program that will calculate shim sizes for when I do the valve clearances on my motorbike. I will have a file that will have the target clearances, and I will query the user to enter the current shim sizes, and the current clearances. The program will then spit out the target shim size. Looks simple enough, I have built a spread-sheet that does it, but I want to learn python, and this seems like a simple enough project...
Anyway, so far I have this:
def print_target_exhaust(f):
    print f.read()
#current_file = open("clearances.txt")
print print_target_exhaust(open("clearances.txt"))
Now, I've got it reading the whole file, but how do I make it ONLY get the value on, for example, line 4. I've tried print f.readline(4) in the function, but that seems to just spit out the first four characters... What am I doing wrong?
I'm brand new, please be easy on me!
-d
To read all the lines:
lines = f.readlines()
Then, to print line 4:
print lines[4]
Note that indices in Python start at 0, so lines[4] is actually the fifth line in the file; the fourth line is lines[3].
with open('myfile') as myfile:  # Use a with statement so you don't have to remember to close the file
    for line_number, data in enumerate(myfile):  # Use enumerate to get line numbers starting with 0
        if line_number == 3:
            print(data)
            break  # stop looping when you've found the line you want
More information:
with statement
enumerate
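As a compact alternative sketch, itertools.islice can pick out a single line without reading the rest of the file. The helper name nth_line is hypothetical, lines are counted from 1 here, and the sample list is made-up stand-in data for an open file.

```python
from itertools import islice


def nth_line(lines, n):
    """Return line number `n` (counting from 1), or None if there are fewer lines."""
    # islice(lines, n - 1, n) skips n - 1 items and then yields at most one.
    return next(islice(lines, n - 1, n), None)


# Any iterable of lines works, including an open file object.
sample = ["first\n", "second\n", "third\n", "fourth\n", "fifth\n"]
print(nth_line(sample, 4))
```

With a real file this would be used as nth_line(f, 4) inside a with open("clearances.txt") as f: block.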
Not very efficient, but it should show you how it works. Basically it keeps a running counter of the lines it reads. When the counter reaches 4, it prints that line.
## Open the file with read only permit
f = open("clearances.txt", "r")
counter = 0
## Read the first line
line = f.readline()
## If the file is not empty keep reading line one at a time
## till the file is empty
while line:
    counter = counter + 1
    if counter == 4:
        print line
    line = f.readline()
f.close()