I am looking to get only the last value that the line variable produces:
bash-4.1$ cat file1_dup.py
#!/usr/bin/python
with open("file1.txt") as f:
    lines = f.readlines()
    for line in lines:
        if "!" in line:
            line = line.split()[-1].strip()
            print(line)
The output I am getting is as follows:
-122.1058
-123.1050
-125.10584323
The result I want printed is:
-125.10584323
I also found a hint by googling that gives the desired output, but it seems a bit complicated to me at this point:
bash-4.1$ cat file2_dup.py
#!/usr/bin/python
def file_read_from_tail(fname, n):
    with open(fname) as f:
        f = f.read().splitlines()
        lines = [x for x in f]
        for i in range(len(lines) - n, len(lines)):
            line = lines[i].split()[-1]
            #line = line.split()[-1]
            print(line)

file_read_from_tail('file1.txt', 1)
This yields the desired output, as follows:
bash-4.1$ ./file2_dup.py
-125.10584323
PS: I borrowed this question, out of interest, from:
how to read a specific line and print a specific position in this line using python
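You could test whether each new value is smaller than the previous one, like this (note that this prints the smallest value, which is the last one here only because the values decrease through the file):
#!/usr/bin/python
res_line = float("inf")  # start above any value so the first comparison always succeeds
with open("file1.txt") as f:
    lines = f.readlines()
    for line in lines:
        if "!" in line:
            line = float(line.split()[-1].strip())
            if res_line > line:
                res_line = line
print(res_line)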
Edit:
You can use enumerate() to get each line's index in a loop:
with open("file1.txt", "rt") as f:
    lines = f.readlines()
    for line, content in enumerate(lines):
        # apply your logic to line and/or content here
        # by adding ifs to select the lines you want...
        print(line, content.strip())  # do your thing
This will output the following (just to illustrate, because I didn't specify any conditions in the code above):
0 -122.1058
1 -123.1050
2 -125.10584323
3
Alternatively, select your specific line with a condition in a list comprehension by using this code instead:
with open("file1.txt", "rt") as f:
    lines = f.readlines()
    # keep only the line at index len(lines) - 2, i.e. the last non-empty
    # line, because the file ends with a blank line (index 3 above)
    result = [content.strip() for line, content in enumerate(lines)
              if line == len(lines) - 2]
print(result[0])
That will output:
-125.10584323
Try the following (with lines read as in your script):
print([line.split()[-1].strip() for line in lines if '!' in line][-1])
I think a better way is to create an empty list, append each value that satisfies the condition, and then pick the index of your choice. This is useful because it works for any line you want to pick.
For example, if you want the second-to-last line, you can reuse the code by changing the print statement to print(lst[-2]), which prints the last field of the second-to-last matching line.
#!/usr/bin/python
lst = list()
with open('file1.txt', 'r') as file:  # closes the file automatically
    for line in file:
        if "!" in line:
            x = line.split()
            lst.append(x[-1])
print(lst[-1])
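If the file is large, a minimal sketch that avoids building a list keeps only the most recent match while iterating. It assumes, like the snippets above, that the wanted lines contain "!" and that the value is the last whitespace-separated field; the function name last_field_of_last_match is hypothetical.

```python
def last_field_of_last_match(path, marker="!"):
    """Return the last field of the last line containing `marker`, or None."""
    last = None
    with open(path) as f:
        for line in f:
            if marker in line:
                # overwrite on every match, so only the final one survives
                last = line.split()[-1]
    return last
```

Call it as print(last_field_of_last_match("file1.txt")). Memory use stays constant because only one value is kept, at the cost of not being able to pick other positions the way lst[-2] can.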
Related
I have this text file and let's say it contains 10 lines.
Bye
Hi
2
3
4
5
Hi
Bye
7
Hi
Every time "Hi" or "Bye" appears, I want that line removed, except for its first occurrence.
My current code is below (yes, filename does point to a real file; I just didn't include it here):
text_file = open(filename)
for i, line in enumerate(text_file):
    if i == 0:
        var_Line1 = line
    if i == 1:
        var_Line2 = line
    if i > 1:
        if line == var_Line2:
            del line  # only unbinds the name; the file itself is not changed
text_file.close()
It does detect the duplicates, although it takes a long time given the number of lines, but I'm not sure how to delete them and save the result.
You could use dict.fromkeys to remove duplicates and preserve order efficiently (dicts preserve insertion order in Python 3.7+):
with open(filename, "r") as f:
    lines = dict.fromkeys(f.readlines())
with open(filename, "w") as f:
    f.writelines(lines)
Idea from Raymond Hettinger
Using a set and some basic filtering logic:
with open('test.txt') as f:
    seen = set()  # keep track of the lines already seen
    deduped = []
    for line in f:
        line = line.rstrip()
        if line not in seen:  # if not seen already, add the line to the result
            deduped.append(line)
            seen.add(line)
# re-write the file with the de-duplicated lines
with open('test.txt', 'w') as f:
    f.writelines([l + '\n' for l in deduped])
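Both answers above remove every repeated line. If, as in the question, only particular lines (here "Hi" and "Bye") should be deduplicated while other repeats are kept, a minimal sketch tracks first occurrences only for those values. The function name dedupe_selected and the rewrite-in-place behaviour are assumptions for illustration.

```python
def dedupe_selected(path, targets=("Hi", "Bye")):
    """Keep only the first occurrence of each line listed in `targets`;
    all other lines are kept unchanged, in their original order."""
    targets = set(targets)
    seen = set()
    kept = []
    with open(path) as f:
        for line in f:
            key = line.rstrip("\n")
            if key in targets:
                if key in seen:
                    continue  # a repeat of a target line: drop it
                seen.add(key)
            kept.append(line)
    # rewrite the file with the filtered lines (original newlines preserved)
    with open(path, "w") as f:
        f.writelines(kept)
    return kept
```

The whole pass is a single loop with set lookups, so it stays fast even for many lines.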
I am trying to parse a huge 50K-line file, from which I have to remove any line that starts with a word present in a predefined list.
I have tried the code below, but the output file (DB12_NEW) is not as desired:
rem = ['remove', 'remove1', 'remove2'....., 'removen']
inputFile = open(r"C:\file", "r")
outputFile = open(r"C:\file_12", "w")
lines = inputFile.readlines()
inputFile.close()
for line in lines:
    for i in rem:
        if line.startswith(i):
            outputFile.write('\n')
        else:
            outputFile.write(line)
The output is the same as the input file: the script does not remove the lines that start with any of the strings in the list.
Can you please help me understand how to achieve this?
Use a tuple instead of a list with str.startswith, which accepts a tuple of prefixes and returns True if the string starts with any of them.
# rem = ['remove', 'rem-ove', 'rem ove']
rem = ('remove', 'rem-ove', 'rem ove')
with open('DB12', 'r') as inputFile, open('DB12_NEW', 'w') as outputFile:
    for line in inputFile:  # iterate lazily instead of reading all lines first
        if not line.startswith(rem):
            outputFile.write(line)
Currently you check whether the line starts with each word from the remove list one at a time, and write the line whenever one of those checks fails. For example:
If the line starts with "rem ABCDF..." and your loop checks whether it starts with 'remove', the if condition is false and the else branch writes the line to your output file.
You could try something like this:
remove = ['remove', 'rem-ove', 'rem', 'rem ove' ...... 'n']
with open(r"C:\DB12", "r") as inputFile, open(r"C:\DB12_NEW", "w") as outputFile:
    for line in inputFile:  # file objects have no splitlines(); iterate them directly
        if not any(line.startswith(i) for i in remove):
            outputFile.write(line)
any() is a built-in function, not a keyword; it returns False only if all elements are False.
The problem can also be caused by leading or trailing whitespace.
Try stripping whitespace with strip() and check again:
rem = [x.strip() for x in rem]
lines = [line.strip() for line in lines]
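Note that str.startswith matches prefixes, so 'rem' also matches a line starting with 'remove'. If the requirement is that the line's first word must equal one of the listed words, a minimal sketch compares the first whitespace-separated token against a set instead. The function name filter_by_first_word and the file paths are hypothetical.

```python
def filter_by_first_word(src, dst, words):
    """Copy src to dst, skipping lines whose first whitespace-separated
    word is exactly one of `words`. Blank lines are kept."""
    words = set(words)  # O(1) membership tests, even for a long list
    with open(src) as inp, open(dst, "w") as out:
        for line in inp:
            # split() with no separator ignores leading whitespace
            fields = line.split(maxsplit=1)
            if fields and fields[0] in words:
                continue  # first word is on the remove list
            out.write(line)
```

For example, filter_by_first_word("DB12", "DB12_NEW", ["remove", "rem"]) drops "rem ABC" and "  remove x" but keeps "remover stays".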
I have a big text file with a lot of parts. Every part has 4 lines, and the next part starts immediately after the previous one.
The first line of each part starts with #, the 2nd line is a sequence of characters, the 3rd line is a + and the 4th line is again a sequence of characters.
Small example:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACGCTTATCGATAAAATTTTGAATTTTGTAACTTGTTTTTGTAATTCTTTAGTTTGTATGTCTGTTGCTATTATGTCTACTATTCTTTCCCCTGCACTGTACCCCCCAATCCCCCCTTTTCTTTTAAAAGTTAACCGATACCGTCGAGATCCGTTCACTAATCGAACGGATCTGTCTCTGTCTCTCTC
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5AEG1EF511F1?GFH3#BFADGD55F?#GFHFGGFCGG/GHGHHHHHHHDBG4E?FB?BGHHHHHHHHHHHHHHHHHFHHHHHHHHHGHGHGHHHHHFHHHHHGGGGHHHHGGGGHHHHHHHGHGHHHHHHFGHCFGGGHGGGGGGGGFGGEGBFGGGGGGGGGFGGGGFFB9/BFFFFFFFFFF/
I want to change the 2nd and 4th lines of each part and make a new file with the same structure (4 lines per part). Specifically, I want to keep the first 65 characters of lines 2 and 4 and remove the rest. The expected output for the small example would look like this:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACG
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5A
I wrote the following code:
infile = open("file.fastq", "r")
new_line=[]
for line_number in len(infile.readlines()):
    if line_number ==2 or line_number ==4:
        new_line.append(infile[line_number])
with open('out_file.fastq', 'w') as f:
    for item in new_line:
        f.write("%s\n" % item)
But it does not produce what I want. How can I fix it to get the expected output?
This code will achieve what you want:
from itertools import islice
with open('bio.txt', 'r') as infile, open('mod_bio.txt', 'w') as f:
    while True:
        lines_gen = list(islice(infile, 4))
        if not lines_gen:
            break
        a, b, c, d = lines_gen
        # drop the newline before slicing so short lines don't get a second one
        b = b.rstrip('\n')[0:65] + '\n'
        d = d.rstrip('\n')[0:65] + '\n'
        f.write(a + b + c + d)
How does it work?
We first use islice to take 4 lines at a time from the file, as you describe.
Then we unpack them into the individual lines a, b, c, d and slice b and d to 65 characters. Finally, we join the four strings and write them to the new file.
I think itertools.cycle could be nice here:
import itertools
with open("transformed.file.fastq", "w+") as output_file:
    with open("file.fastq", "r") as input_file:
        for i in itertools.cycle((1, 2, 3, 4)):
            line = input_file.readline().strip()
            if not line:
                break
            if i in (2, 4):
                line = line[:65]
            output_file.write("{}\n".format(line))
readlines() returns a list of the lines in your file, so you don't need to build a separate list new_line. Iterate directly over the index-value pairs of that list, and modify the values at the positions you want.
Modifying your code, try this:
infile = open("file.fastq", "r")
new_lines = infile.readlines()
infile.close()
for i, t in enumerate(new_lines):
    if i % 4 == 1 or i % 4 == 3:  # 2nd and 4th line of every 4-line part
        new_lines[i] = t.rstrip('\n')[:65] + '\n'  # keep the newline after slicing
with open('out_file.fastq', 'w') as f:
    for item in new_lines:
        f.write("%s" % item)
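Another way to walk the file one 4-line part at a time is to zip the file iterator with itself, which yields tuples of 4 consecutive lines. This is a minimal sketch, assuming every part has exactly 4 lines (zip silently drops an incomplete trailing part); truncate_parts and its parameters are hypothetical names.

```python
def truncate_parts(src, dst, keep=65):
    """Write src to dst, cutting the 2nd and 4th line of each 4-line
    part to at most `keep` characters."""
    with open(src) as inp, open(dst, "w") as out:
        it = iter(inp)
        # zip(it, it, it, it) pulls 4 consecutive lines per tuple
        for header, seq, plus, qual in zip(it, it, it, it):
            out.write(header)
            out.write(seq.rstrip("\n")[:keep] + "\n")
            out.write(plus)
            out.write(qual.rstrip("\n")[:keep] + "\n")
```

For the question's files this would be called as truncate_parts("file.fastq", "out_file.fastq"). Only one part is held in memory at a time, unlike the readlines() approach.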
I am trying to print the next 3 lines after a match.
For example, the input is:
Testing
Result
test1 : 12345
test2 : 23453
test3 : 2345454
So I am trying to search for the string "Result" in the file and print the next 3 lines after it.
The output should be:
test1 : 12345
test2 : 23453
test3 : 2345454
My code is:
with open(filename, 'r+') as f:
    for line in f:
        print line
        if "Benchmark Results" in f:
            print f
            print next(f)
It only gives me the output:
testing
How do I get my desired output? Help, please.
First, you need to check that the text is in the line (not in the file object f); then you can use islice to take the next 3 lines from f and print them, e.g.:
from itertools import islice
with open(filename) as f:
    for line in f:
        if 'Result' in line:
            print(''.join(islice(f, 3)))
The loop will continue from the line after the three printed. If you don't want that - put a break inside the if.
I would suggest opening the file and splitting its content into lines, assigning the result to a variable so you can manipulate the data more comfortably:
file = open("test.txt").read().splitlines()
Then you can just check which line contains the string "Result", and print the three following lines:
for index, line in enumerate(file):
    if "Result" in line:
        print(file[index+1:index+4])
You are testing (and printing) f instead of line. Be careful about that: f is the file object, while line holds your data.
with open(filename, 'r+') as f:
    line = f.readline()
    while line:
        if "Benchmark Results" in line:
            # Current line matches, print next 3 lines
            print(f.readline(), end="")
            print(f.readline(), end="")
            print(f.readline(), end="")
        line = f.readline()
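To see why testing f instead of line printed only the first line: the in operator on a file object iterates over its remaining lines, comparing each one for equality with the string, so it consumes the rest of the file and the for loop has nothing left to read. The sketch below reproduces this with io.StringIO standing in for the real file, using the question's sample input.

```python
import io

f = io.StringIO("Testing\nResult\ntest1 : 12345\ntest2 : 23453\ntest3 : 2345454\n")

printed = []
for line in f:
    printed.append(line.strip())
    # `in f` scans the remaining lines looking for one equal to the string;
    # none is equal, so it returns False after exhausting the iterator
    if "Benchmark Results" in f:
        pass

print(printed)  # prints ['Testing']
```

Real text-file objects behave the same way, which is why the original loop stopped after "Testing".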
This waits for the first "Result" in the file and then prints the rest of the input:
import re, sys
bool = False
with open("input.txt", 'r+') as f:
    for line in f:
        if bool == True:
            sys.stdout.write(line)
        if re.search("Result", line):  # to match the whole line instead, you could use: if line == "Result\n":
            bool = True
If you want to stop after the first 3 lines, add a variable cnt = 0 before the loop and change this part of the code, for example like this:
if bool == True:
    sys.stdout.write(line)
    cnt = cnt + 1
    if cnt == 3:
        break
with open('data', 'r') as f:
    lines = [line.strip() for line in f]
# get "Result" index
ind = lines.index("Result")
# get slice, add 4 since upper bound is non inclusive
li = lines[ind:ind+4]
print(li)
['Result', 'test1 : 12345', 'test2 : 23453', 'test3 : 2345454']
Or, as an exercise, with a regex:
import re
with open('data', 'r') as f:
    text = f.read()
# the regex assumes the data is as shown, i.e., no blank lines between 'Result'
# and the last needed line.
mo = re.search(r'Result(.*?\n){4}', text, re.MULTILINE|re.DOTALL)
print(mo.group(0))
Result
test1 : 12345
test2 : 23453
test3 : 2345454
I was wondering whether it is possible to make Python disregard the first 4 lines of my text file. For example, if I had a text file that looked like this:
aaa
aaa
aaa
aaa
123412
1232134
Can I make Python start working from the numbers?
Use next and a loop:
with open("/path/to/myfile.txt") as myfile:
    for _ in range(4):  # Use xrange here if you are on Python 2.x
        next(myfile)
    for line in myfile:
        print(line)  # Just to demonstrate
Because file objects are iterators in Python, a line will be skipped each time you do next(myfile).
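An alternative to the next() loop is itertools.islice, which starts yielding items after a given offset. A minimal sketch, using io.StringIO with the question's sample in place of a real file:

```python
import io
from itertools import islice

myfile = io.StringIO("aaa\naaa\naaa\naaa\n123412\n1232134\n")

# islice(iterable, start, stop): start=4 skips the first 4 lines, stop=None reads to the end
numbers = [int(line) for line in islice(myfile, 4, None)]
print(numbers)  # prints [123412, 1232134]
```

With a real file, pass the open file object in place of myfile. One design difference: if the file has fewer than 4 lines, islice simply yields nothing, whereas next(myfile) raises StopIteration.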
This should do the trick:
f = open('file.txt')
for index, line in enumerate(f):
    if index > 3:
        print(line)
Assuming you know the number of lines to discard, you can use this method:
for i, line in enumerate(open('myfile')):
    if i < number_of_lines_to_discard:
        continue
    # do your stuff here
Or, if you just want to disregard non-numeric lines:
import re

for line in open('myfile'):
    if not re.match(r'^[0-9]+$', line):  # $ also matches just before a trailing newline
        continue
    # do your stuff here
A more robust solution, not relying on the exact number of lines:
with open(filename) as f:
    for line in f:
        try:
            line = int(line)
        except ValueError:
            continue
        # process line