Print line numbers for lines containing a specific integer - python

I have tried this
import itertools
import numpy as np
with open('base.txt','r') as f:
    lst = map(int, itertools.imap(float, f))
num=1200
for line in lst:
    if num == line:
        print (line)
This just prints 1200...
Then I thought of using re:
import re
import itertools
with open('base.txt','r') as f:
    lst = map(int, itertools.imap(float, f))
p = re.compile(r'(\d+)')
num=1200
for line in lst:
    if num in p.findall(line):
        print line
But I got
  File "a7.py", line 12, in <module>
    if num in p.findall(line) :
TypeError: expected string or buffer
What I want is all the line numbers that contain 1200. The file has one numerical input per line; I have checked this.

Staying as close to your proposed solution as possible, this should print out the line numbers for all lines containing your chosen num.
import itertools
with open('base.txt','r') as f:
    lst = map(int, itertools.imap(float, f))
num=1200
line_number = 1
for line in lst:
    if num == line:
        print (line_number)
    line_number += 1
Edit
However, your code just truncates the floats in your file - it will not round them correctly. 1200.9 becomes 1200 instead of 1201, for instance.
If this isn't a problem in your case, that is fine. However, in general it would be better to change your
lst = map(int, itertools.imap(float, f))
function call to something like
lst = map(int,map(round, itertools.imap(float, f)))
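To make the difference between truncating and rounding concrete, here is a small sketch. It is written for Python 3 (where itertools.imap no longer exists), and the sample values are made up for illustration rather than taken from base.txt.

```python
# Hypothetical sample values, standing in for lines read from base.txt.
raw_lines = ["1199.4\n", "1200.9\n", "1200.0\n"]

# int() simply drops the fractional part.
truncated = [int(float(s)) for s in raw_lines]

# round() picks the nearest integer before converting.
rounded = [int(round(float(s))) for s in raw_lines]

print(truncated)
print(rounded)
```

Note that in Python 3, round() sends exact halves to the nearest even integer, so 1200.5 would become 1200.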

You can use enumerate(), starting the count at 1 so that it matches the line numbers:
num = 1200
with open('base.txt', 'r') as f:
    for i, line in enumerate(f, 1):
        if num == int(float(line)):
            print i

If you just want to print the line numbers, then you need to keep track of which line you are on.
This code also avoids reading the entire contents of the file into memory at once, which is useful for large files.
num = 1200
line_num = 0
with open('base.txt','r') as f:
    for line in f:
        line_num += 1
        if int(line) == num:
            print line_num
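For readers on Python 3, the same idea can be sketched as a small function. The name matching_line_numbers is hypothetical, and the sketch assumes one number per line, possibly written as a float, with blank lines ignored.

```python
def matching_line_numbers(path, num):
    """Return the 1-based numbers of the lines whose value truncates to num.

    Hypothetical helper: assumes one number per line, possibly a float.
    """
    hits = []
    with open(path) as f:
        # Iterating over the file reads one line at a time.
        for line_number, line in enumerate(f, start=1):
            text = line.strip()
            if not text:
                continue  # skip blank lines instead of failing on float('')
            if int(float(text)) == num:
                hits.append(line_number)
    return hits
```

Calling matching_line_numbers('base.txt', 1200) would then return a list of line numbers that you can print or process further.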

Related

How do I properly create new files and apply multiple lines of input to it?

I am trying to create 3 different files and, in each one, write a line of ten random letters 3 times.
Here is my code:
import string
import random
for i in range(3):
    with open('data%i.txt' % i, 'w+') as f:
        line = 0
        while line < 3:
            for j in range(10):
                myStr = random.choice(string.ascii_lowercase)
                f.write(myStr)
            line = line + 1
and output:
dfzanwalkccdipukrbwsrzrbheceqi
I've tried including the newline character in the file write, but that instead produces 30 lines with a single letter per line. Any help is appreciated!
Write the newline in the line loop, not the character loop.
You can also change while line < 3: to for line in range(3):
for i in range(3):
    with open('data%i.txt' % i, 'w+') as f:
        for line in range(3):
            for j in range(10):
                myStr = random.choice(string.ascii_lowercase)
                f.write(myStr)
            f.write("\n")
You can also use random.choices() to get 10 random characters at once, instead of looping.
for i in range(3):
    with open('data%i.txt' % i, 'w+') as f:
        for line in range(3):
            myStr = "".join(random.choices(string.ascii_lowercase, k=10))
            f.write(myStr + "\n")

For loop under With statement not working correctly

I'm trying to get some sample lines from a file, and this is my approach
import gzip, random
random_set = []
with gzip.open('/home/qsnake/Downloads/bigfile.txt.gz') as f:
    lc = sum(1 for x in f)
    random_set += random.sample(xrange(lc), 3)
    for i, x in enumerate(f):
        if i in random_set:
            print "First loop", str(i)
            break
with gzip.open('/home/qsnake/Downloads/biggfile.txt.gz') as f:
    for i, x in enumerate(f):
        if i in random_set:
            print "Second loop", str(i)
            break
Here is the result
Second loop 4
I don't know why the for loop in the first with statement is not working. If I remove
lc = sum(1 for x in f)
it works again.
Many thanks!!!
You have already read the file once when you have this line in the code:
lc = sum(1 for x in f)
Now, when you try to enumerate the file again, the file pointer is at the end of the file, so there is nothing left to read.
If you want to read the file again from the start within the same with block, you can reset the pointer to 0 before calling enumerate:
f.seek(0)
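A minimal sketch of the effect, written for Python 3. It builds a small throwaway .gz file (the path and contents are made up) so that it can run on its own; the names leftover, wanted and picked are illustrative only.

```python
import gzip
import os
import random
import tempfile

# Build a small throwaway .gz file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(path, "wt") as out:
    out.writelines("line %d\n" % i for i in range(10))

with gzip.open(path, "rt") as f:
    lc = sum(1 for _ in f)   # first pass consumes the whole file
    leftover = list(f)       # empty: the position is already at the end
    wanted = set(random.sample(range(lc), 3))
    f.seek(0)                # rewind before the second pass
    picked = [i for i, _ in enumerate(f) if i in wanted]

print(lc, leftover, picked)
```

Without the f.seek(0) call, picked would be empty for the same reason that leftover is.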

Read a file from a specific line and N lines at a time

I want to read a file starting from a specific line, N lines at a time. So far I can read N lines at a time like this:
from itertools import islice
n = 10
with open(fname, 'r') as f:
    while True:
        next_n_lines = list(islice(f, n))
        for line in next_n_lines:
            print line.rstrip()
        if not next_n_lines:
            break
How can I start reading from a specific line number?
There is a simple solution using itertools.islice:
N = 100 # starting line number
n = 10 # size of a chunk
with open(fname) as f:
    f = islice(f, N, None) # creates an iterator that starts after N lines
    while True:
        next_n_lines = list(islice(f, n))
        for line in next_n_lines:
            print line.rstrip()
        if not next_n_lines:
            break
Can you use fileinput as shown below?
import fileinput
startNo = 1  # first line to print (fileinput.lineno() is 1-based)
N = 10       # number of lines to print
for line in fileinput.input("fileName"):
    if fileinput.lineno() >= startNo + N:
        break
    if fileinput.lineno() >= startNo:
        print fileinput.lineno(),line
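The islice approach can also be wrapped in a generator, which keeps the skipping and the chunking in one place. This is only a sketch for Python 3; read_chunks and its parameters are hypothetical names, and start counts the lines to skip (so it is a 0-based index).

```python
from itertools import islice

def read_chunks(lines, start, size):
    """Yield lists of up to `size` lines, skipping the first `start` lines.

    `lines` can be an open file or any other iterable of lines.
    """
    it = islice(lines, start, None)     # drop the first `start` lines
    while True:
        chunk = list(islice(it, size))  # take the next `size` lines
        if not chunk:
            return                      # input exhausted
        yield chunk

# Made-up sample data standing in for an open file.
sample = ["row %d\n" % i for i in range(7)]
chunks = list(read_chunks(sample, start=2, size=3))
for chunk in chunks:
    print([line.rstrip() for line in chunk])
```

With a real file you would pass the file object itself, for example read_chunks(f, 100, 10) inside a with open(fname) as f: block.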

Access each index result from enumerate() - Python

I have data, that looks like this:
Name Nm1 * *
Ind1 AACTCAGCTCACG
Ind2 GTCATCGCTACGA
Ind3 CTTCAAACTGACT
I need to grab the letter at each position marked by an asterisk in the "Name" line and print it, along with the index of the asterisk.
So the result would be
Ind1, 12, T
Ind2, 12, A
Ind3, 12, C
Ind1, 17, T
Ind2, 17, T
Ind3, 17, T
I'm trying to use enumerate() to retrieve the positions of the asterisks, and my thought was that I could then use these indexes to grab the letters.
import sys
import csv
input = open(sys.argv[1], 'r')
Output = open(sys.argv[1]+"_processed", 'w')
indlist = (["Individual_1,", "Individual_2,", "Individual_3,"])
with (input) as searchfile:
    for line in searchfile:
        if '*' in line:
            LocusID = line[2:13]
            LocusIDstr = LocusID.strip()
            hit = line
            for i, x in enumerate(hit):
                if x=='*':
                    position = i
                    print position
    for item in indlist:
        Output.write("%s%s%s\n" % (item, LocusIDstr, position))
Output.close()
If enumerate() outputs e.g.
12
17
how do I access each index separately?
Also, when I print the position, I get the numbers I want. When I write to the file, however, only the last position is written. Why is this?
----------------EDIT-----------------
Following the advice below, I have split up my code to make it a bit simpler (for me) to understand.
import sys
import csv
input = open(sys.argv[1], 'r')
Output = open(sys.argv[1]+"_FGT_Data", 'w')
indlist = (["Individual_1,", "Individual_2,", "Individual_3,"])
with (input) as searchfile:
    for line in searchfile:
        if '*' in line:
            LocusID = line[2:13]
            LocusIDstr = LocusID.strip()
            print LocusIDstr
            hit = line
            for i, x in enumerate(hit):
                if x=='*':
                    position = i
                    #print position
input = open(sys.argv[1], 'r')
with (input) as searchfile:
    for line in searchfile:
        if line [0] == ">":
            print line[position], position
with (Output) as writefile:
    for item in indlist:
        writefile.write("%s%s%s\n" % (item, LocusIDstr, position))
Output.close()
I still do not have a solution for how to access each of the indexes, though.
Edit
Changed to work with the file you gave me in your comment. If you made this file yourself, consider working with columns next time.
import sys
read_file = sys.argv[1]
# Build the output name, e.g. data.txt -> data_processed.txt (assumes the name has an extension)
write_file = "%s_processed.%s"%(sys.argv[1].split('.')[0],sys.argv[1].split('.')[1])
indexes = []
lines_to_write = []
# First pass: collect the position of every asterisk on the first line.
with open(read_file,'r') as getindex:
    first_line = getindex.readline()
    for i, x in enumerate(first_line):
        if x == '*':
            # subtract 11, apparently the width of the name column in that file
            indexes.append(i-11)
# Second pass: look up each marked position in every sequence line.
with open(read_file,'r') as getsnps:
    for line in getsnps:
        if line.startswith(">"):
            sequence = line.split(" ")[1]
            for z in indexes:
                string_to_append = "%s\t%s\t%s"%(line.split(" ")[0],z+1,sequence[z])
                lines_to_write.append(string_to_append)
with open(write_file,"w") as out:
    out.write("\n".join(lines_to_write))
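The core of the fix is to collect every asterisk position in a list instead of overwriting a single position variable, which is why only the last position reached the output file in the question. Here is a minimal sketch for Python 3 on made-up data. It assumes a fixed-width layout with a 5-character name column, which may not match the real file.

```python
# Made-up sample: a 5-character name column followed by the sequence.
# The marker line flags sequence columns with '*'.
marker_line = "Name " + "   *    * "
records = [
    "Ind1 AACTCAGCTC",
    "Ind2 GTCATCGCTA",
    "Ind3 CTTCAAACTG",
]

# Keep every marked column in a list, so none of them is overwritten.
positions = [i for i, ch in enumerate(marker_line) if ch == "*"]

rows = []
for pos in positions:              # each index is handled separately here
    for record in records:
        name = record.split()[0]
        rows.append((name, pos, record[pos]))

for name, pos, base in rows:
    print("%s, %d, %s" % (name, pos, base))
```

Because positions is a list, positions[0] and positions[1] can also be accessed individually if needed.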

Extract from current position until end of file

I want to pull all data from a text file from a specified line number until the end of a file. This is how I've tried:
def extract_values(f):
    line_offset = []
    offset = 0
    last_line_of_heading = False
    if not last_line_of_heading:
        for line in f:
            line_offset.append(offset)
            offset += len(line)
            if whatever_condition:
                last_line_of_heading = True
    f.seek(0)
    # non-functioning pseudocode follows
    data = f[offset:] # read from current offset to end of file into this variable
There is actually a blank line between the header and the data I want, so ideally I could skip this also.
Do you know the line number in advance? If so,
def extract_values(f, line_number):
    # line_number is the 0-based index of the first line to keep
    return f.readlines()[line_number:]
If not, and you need to determine the line number based on the content of the file itself,
def extract_values(f):
    lines = f.readlines()
    for line_number, line in enumerate(lines):
        if some_condition(line):  # some_condition is a placeholder for your own test
            return lines[line_number:]
    return []  # no line matched
This will not be ideal if your files are enormous (since the lines of the file are loaded into memory); in that case, you might want to do it in two passes, only storing the file data on the second pass.
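As a variation on that idea, a single streaming pass can discard the header lines as it goes and keep only the data. This is a sketch for Python 3, not the two-pass approach itself; extract_after_header and is_last_header_line are hypothetical names, and the sample text is made up.

```python
import io

def extract_after_header(lines, is_last_header_line):
    """Return the lines that follow the header, skipping blank lines
    that come directly after it. Header lines are never stored."""
    it = iter(lines)
    for line in it:
        if is_last_header_line(line):
            break
    else:
        return []  # the end of the header was never found
    data = []
    for line in it:
        if not data and not line.strip():
            continue  # skip the blank separator after the header
        data.append(line)
    return data

# Made-up sample standing in for an open file.
sample = io.StringIO("title\nunits\n\n1 2\n3 4\n")
data = extract_after_header(sample, lambda line: line.startswith("units"))
print(data)
```

If even the data is too large to hold in memory, the second loop could yield each line instead of appending it to a list.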
Your if clause is in the wrong position. It should be inside the loop:
for line in f:
    if not last_line_of_heading:
Consider this code:
def extract_values(f):
    rows = []
    last_line_of_heading = False
    for line in f:
        if last_line_of_heading:
            rows.append(line)
        elif whatever_condition:
            last_line_of_heading = True
    # if you want a string instead of a list of lines
    # (each line already ends with its newline, so join on the empty string):
    data = "".join(rows)
    return data
You can use enumerate:
f = open('your_file')
for i, x in enumerate(f):
    if i >= your_line:
        pass  # do your stuff
Here i holds the line number, starting from 0, and x holds the line itself.
Using a list comprehension,
[x for i, x in enumerate(f) if i >= your_line]
gives you a list of the lines from the specified line onward.
Using a dictionary comprehension,
{i: x for i, x in enumerate(f) if i >= your_line}
gives you the line number as the key and the line as the value, from the specified line number onward.
Each of these consumes the file iterator, so reopen the file (or call f.seek(0)) before running another one.
Try this small Python program, LastLines.py
import sys
def main():
    firstLine = int(sys.argv[1])
    lines = sys.stdin.read().splitlines()[firstLine:]
    for curLine in lines:
        print curLine
if __name__ == "__main__":
    main()
Example input, test1.txt:
a
b
c
d
Example usage:
python LastLines.py 2 < test1.txt
Example output:
c
d
This program assumes that the first line in a file is the 0th line.
