I have a big log file, and I want to read only the relevant part of it.
Every section starts with ###start log###, so I need to find the last occurrence of ###start log### and read the lines from there until the end of the file.
I have seen a solution that finds a line by its seek position (a number), but I don't know that position; I only know the content of the line.
What is the best solution for this case?
I'd suggest reading the file backwards until the first occurrence of the start tag.
You can do it in one of two ways. If the file fits into memory, try this: Read a file in reverse order using python
If the file is too large, you may find this link helpful:
http://code.activestate.com/recipes/120686-read-a-text-file-backwards/
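For the large-file case, the backward scan can be sketched without any external recipe. This is a minimal illustration, not the linked recipe's implementation: the function name `read_after_last_marker` and the `chunk_size` parameter are made up for this example. It reads fixed-size chunks from the end of the file and stops as soon as the marker shows up.

```python
import os

def read_after_last_marker(path, marker=b"###start log###", chunk_size=64 * 1024):
    """Return the bytes that follow the last occurrence of `marker`, or None."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""  # bytes from `pos` to the end of the file
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Only occurrences that begin in the newly read chunk can be new,
            # so limit the search to that chunk plus a marker-sized overlap.
            idx = tail.rfind(marker, 0, step + len(marker) - 1)
            if idx != -1:
                return tail[idx + len(marker):]
    return None  # marker not found anywhere in the file
```

The result is raw bytes, starting with the newline that ends the marker line; call `.decode()` and `.splitlines()` on it if you need text lines. Only the part of the file after the last marker (plus at most one extra chunk) is held in memory.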
Given the size of the file, you basically need to read it in reverse order. There are several posts on how to read a file in reverse order in Python. If you are on a Unix system, you can also use the tac command, read its output through a pipe, and stop when you hit the start of the log:
>>> from subprocess import PIPE, Popen
>>> from itertools import takewhile
>>> with Popen(['tac', 'tmp.txt'], stdout=PIPE) as proc:
...     # tac emits the file last line first; stop at the start marker
...     tail = takewhile(lambda line: line != b'###start log###\n', proc.stdout)
...     lines = list(tail)
Then the last log lines in correct order would be:
>>> list(reversed(lines))
with open(filename) as handle:
    text = handle.read()
lines = text.splitlines()
lines.reverse()
# Index of the first marker in the reversed list, i.e. the last one in the file
i = next(i for i, line in enumerate(lines) if line == '###start log###')
relevant_lines = lines[:i]
relevant_lines.reverse()
I have a file of configs. I am trying to get my Python code to search a text file for two different strings, copy the matching lines (cut would make my life so much easier), and paste them into another text file without duplicates. My code works for just one string, but every time I try to make it handle two, it either does not work or only finds the lines containing both strings.
What am I doing wrong?
import sys
with open("ns-batch.bak.txt") as f:
    lines = f.readlines()
    lines = [l for l in lines if "10.42.88.192" in l]
with open("Py_parse2.txt", "w") as f1:
    f1.writelines(lines)
Okay, here's my take on things.
Assuming that you are looking for certain strings within each line, and then want to "copy" those lines to another file to see in which lines those strings were found, this, for example, should work:
lines = list()
with open("ns-batch.bak.txt", "r") as orig_file:
    for line in orig_file:
        if ("12.32.45.1" in line) or ("27.82.1.0" in line):  # if "12.32.45.1" in line:
            lines.append(line)
# "w" overwrites an existing file; "x" would raise FileExistsError instead
with open("Py_parse2.txt", "w") as new_file:
    for line in lines:
        new_file.write(line)  # each line already ends with its newline
Depending on how many strings you are looking for on each line, you can add or remove in tests in the if statement of my example code (on the same line I also left a commented-out version that looks for only one string). The import sys statement does absolutely nothing in this case; the sys module is not needed for this work, so do not include that import. If you want to learn more about file I/O, check out this link ( https://docs.python.org/3/tutorial/inputoutput.html?highlight=write ) and go to section "7.2 Reading and Writing Files".
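The question also asked for output without duplicates, which the loop above does not handle. One possible sketch, assuming "duplicate" means an identical line and that first-seen order should be kept; the helper name `matching_lines` is made up for this example:

```python
def matching_lines(lines, needles):
    """Yield each line that contains at least one needle, skipping repeats."""
    seen = set()
    for line in lines:
        if any(needle in line for needle in needles) and line not in seen:
            seen.add(line)
            yield line

# Possible usage with the file names from the question:
# with open("ns-batch.bak.txt") as src, open("Py_parse2.txt", "w") as dst:
#     dst.writelines(matching_lines(src, ["12.32.45.1", "27.82.1.0"]))
```

Using `any()` over a list of needles avoids chaining `or` conditions by hand, so adding a third search string only means extending the list.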
I want to go to line 34 in a .txt file and read it. How would you do that in Python?
Use the Python standard library's linecache module:
import linecache
line = linecache.getline(thefilename, 34)
This should do exactly what you want (linecache counts lines from 1). You don't even need to open the file -- linecache does it all for you!
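A small self-contained check of how linecache numbers lines. It builds a throwaway three-line file (the contents are invented for the demonstration) and shows that numbering starts at 1 and that an out-of-range request returns an empty string rather than raising:

```python
import linecache
import os
import tempfile

# Build a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

first = linecache.getline(path, 1)     # line numbers start at 1
missing = linecache.getline(path, 99)  # out of range: empty string, no exception
print(repr(first), repr(missing))

linecache.clearcache()  # drop the cached copy of the file
os.remove(path)
```

Note that linecache reads and caches the whole file on first access, so it is convenient for repeated lookups but does not save memory on a very large file.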
This code will open the file, read the line and print it.
# Open the file and read all of its lines into a list
with open(filename, "r") as f:
    lines = f.readlines()
# List indexes start at 0, so line 34 is at index 33
x = lines[33]
print(x)
A solution that will not read more of the file than necessary is
from itertools import islice
line_number = 34
with open(filename) as f:
    # Adjust index since Python/islice indexes from 0 and the first
    # line of a file is line 1
    line = next(islice(f, line_number - 1, line_number))
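If the file might have fewer lines than requested, `next()` on an exhausted `islice` raises StopIteration. A hedged variant that returns a default instead; the wrapper name `get_line` is chosen here for illustration:

```python
from itertools import islice

def get_line(path, line_number, default=None):
    """Return line `line_number` (1-based), or `default` if the file is too short."""
    with open(path) as f:
        # islice stops reading as soon as the requested line has been produced.
        return next(islice(f, line_number - 1, line_number), default)
```

The returned line keeps its trailing newline, as with any line read by iterating over a file.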
A very straightforward solution is
line_number = 34
with open(filename) as f:
    line = f.readlines()[line_number - 1]
There are two ways:
Read the file, line by line, stop when you've gotten to the line you want
Use f.readlines() which will read the entire file into memory, and return it as a list of lines, then extract the 34th item from that list.
Solution 1
Benefit: You only keep, in memory, the specific line you want.
code:
for i in range(34):
    line = f.readline()
# when you get here, line will be the 34th line, or an empty string if
# there weren't enough lines in the file
Solution 2
Benefit: Much less code
Downside: Reads the entire file into memory
Problem: Raises IndexError if the file has fewer than 34 lines; needs error handling
line = f.readlines()[33]
You could just read all the lines and index the line you're after.
line = open('filename').readlines()[33]
for linenum, line in enumerate(open("file")):
    if linenum + 1 == 34:
        print(line.rstrip())
I made a thread about this and didn't receive help, so I took matters into my own hands.
Nothing complicated here.
import linecache
# Import the linecache module to read the line of our choosing
number = int(input("Enter a number from 1-10 for a random quote "))
# Ask the user which line number they would like to read (not necessary)
lines = linecache.getline("Quotes.txt", number)
# Store the requested line in a variable; `number` can be replaced
# by any integer of your choosing.
print(lines)
# This prints the chosen line.
If you are doing this in Python, make sure the .py and .txt files are in the same location (more precisely, that the .txt file is in the directory you run the script from); otherwise Python will not be able to find it unless you specify the file location, e.g.
linecache.getline("C:/Directory/Folder/Quotes.txt
This is needed when the file is in a different folder than the .py file you are using.
Hope this helps!
Option that always closes the file and doesn't load the whole file into memory
with open('file.txt') as f:
    for i, line in enumerate(f):
        if i + 1 == 34:
            # Print inside the loop so a shorter file prints nothing
            print(line.rstrip())
            break
I am trying to write a python script to read in a large text file from some modeling results, grab the useful data and save it as a new array. The text file is output in a way that has a ## starting each line that is not useful. I need a way to search through and grab all the lines that do not include the ##. I am used to using grep -v in this situation and piping to a file. I want to do it in python!
Thanks a lot.
-Tyler
I would use something like this:
with open(r"C:\Path\To\File.txt", "r") as fh:
    raw_text = fh.readlines()
clean_text = []
for line in raw_text:
    if not line.startswith("##"):
        clean_text.append(line)
Or you could also clean the newline and carriage return non-printing characters at the same time with a small modification:
for line in raw_text:
    if not line.startswith("##"):
        clean_text.append(line.rstrip("\r\n"))
You would be left with a list that contains one line of required text per element. You could split each line into individual words using str.split(), which would turn every element into a nested list that you could easily index (assuming your text is whitespace-separated, of course).
clean_text[4][7]
would return the 5th line, 8th word.
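Filtering and splitting can be combined in a single pass. This is a rough sketch, with an invented helper name (`parse_rows`) and made-up sample data standing in for the modeling output:

```python
def parse_rows(lines, comment_prefix="##"):
    """Return one list of words per line that does not start with the prefix."""
    return [line.split() for line in lines if not line.startswith(comment_prefix)]

# Sample data invented for the demonstration.
sample = ["## header\n", "1.0 2.0 3.0\n", "## note\n", "4.0 5.0 6.0\n"]
rows = parse_rows(sample)
print(rows[1][2])  # second kept line, third word
```

Because `parse_rows` accepts any iterable of strings, an open file object can be passed in directly, which avoids holding the unfiltered lines in memory.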
Hope this helps.
[Edit: corrected indentation in loop]
My suggestion would be to do the following:
listoflines = []
with open("file.txt", "r") as f:  # "file.txt" is a placeholder for your file; "r" = read
    for line in f:
        if line[:2] != "##":  # compare the first two characters
            listoflines.append(line)
print(listoflines)
If you're feeling brave, you can also do the following (credit goes to Alex Thornton):
with open("file.txt", "r") as f:  # "file.txt" is a placeholder for your file
    listoflines = [l for l in f if not l.startswith('##')]
The other answer is great as well, especially for teaching the .startswith method, but I think this is the more Pythonic way, and the with statement has the advantage of automatically closing the file as soon as you're done with it.
I need to get a specific line number from a file that I am passing into a python program I wrote. I know that the line I want will be line 5, so is there a way I can just grab line 5, and not have to iterate through the file?
If you know how many bytes you have before the line you're interested in, you could seek to that point and read out a line. Otherwise, a "line" is not a first class construct (it's just a list of characters terminated by a character you're assigning a special meaning to - a newline). To find these newlines, you have to read the file in.
Practically speaking, you could call the readline method five times; the fifth call returns line 5.
Why are you trying to do this?
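To illustrate the seek idea: if the same file is queried repeatedly, one pass can record the byte offset where each line starts, and later lookups can jump straight there. This is only a sketch under that assumption; both function names are invented for the example, and the initial pass still has to read the whole file once.

```python
def build_line_offsets(path):
    """Return a list where entry i is the byte offset at which line i + 1 starts."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def read_line_at(path, offsets, line_number):
    """Seek straight to a 1-based line using a precomputed offset table."""
    with open(path, "rb") as f:
        f.seek(offsets[line_number - 1])
        return f.readline()
```

The file is opened in binary mode so that the offsets are exact byte positions; decode the returned bytes if you need a string.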
You can use linecache:
import linecache
get = linecache.getline
print(get(path_of_file, number_of_line))
I think the following should do:
line_number = 4
# Avoid reading the whole file
with open('path/to/my/file', 'r') as f:
    count = 1
    for line in f:  # iterate over lines; f.readline() would iterate over characters
        if count == line_number:
            print(line)
            break
        count += 1
# By reading the whole file
with open('path/to/my/file', 'r') as f:
    lines = f.read().splitlines()
print(lines[line_number - 1])  # Index starts from 0
This should give you the 4th line in the file.