I have this text file:
>P1;3RVYA
sequence:: : : : :::-1.00:-1.00
MYLRITNIVESSFFTKFIIYLIVLNGITMGLETSKTFMQSFGVYTTLFNQIVITIFTIEIILR-IYVHRISFFKD
PWSLFDFFVVAISLVPTSS---GFEILRVLRVLRLFRLVTAVPQMRKIVSALISVIPGMLSVIALMTLFFYIFAI
MATQLFGERFP---------------------------------------------EWFGTLGESFYTLFQVMTL
ESWSMGIVRP-LMEVYPYAWVFFIPFIFVVTFVMINLVVAICVDAM*
>P1;Dominio1
sequence:: : : : :::-1.00:-1.00
EWPPFEYMILATIIANCIVLALEQH---LPDDDKTPMSERLDDTEPYFIGIFCFEAGIKIIALGFAFHKGSYLRN
GWNVMDFVVVLTGILATVGTEFDLRTLRAVRVLRPLKLVSGIPSLQVVLKSIMKAMIPLLQIGLLLFFAILIFAI
IGLEFYMGKFHTTCFEEGTDDIQGESPAPCGTEEPARTCPNGTKCQPYWEGPNNGITQFDNILFAVLTVFQCITM
EGWTDLLYNSNDASGNTWNWLYFIPLIIIGSFFMLNLVLGVLSGEF*
I need to replace the word "sequence", but only on the line directly below the one that contains 3RVYA.
I have this command:
a="3RVYA"
for line in file('%s'%I):
    if a in line:
        print line
But it just printed the "3RV" line. I need to print only the next line, the one that contains the word "sequence", so that I can replace "sequence" with "structure".
I'm a beginner in Python, so... can somebody help me, please?
Thanks so much
You appear to be trying to modify a text file in place by replacing some parts of it with strings of a different length. This isn't really possible, because a file is stored as a flat sequence of bytes: a longer replacement would overwrite whatever follows it. However, the stdlib module fileinput simulates it well (behind the curtains it moves the original file to a backup and redirects standard output to a new file with the original name). So...:
import fileinput
replacing = False
for line in fileinput.input('thefile.txt', inplace=True):
    if replacing and 'sequence' in line:
        line = line.replace('sequence', 'structure')
        replacing = False
    elif '3RVYA' in line:
        replacing = True
    print line,
This is Python 2; in Python 3, the last line becomes, instead:
print(line, end='')
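If you would rather not rewrite the file in place, the same flag-based logic can be sketched as a plain function over a list of lines (Python 3). The name replace_after_marker and its parameters are made up for illustration; like the loop above, it changes only the first matching line that follows the marker.

```python
def replace_after_marker(lines, marker, old, new):
    """Return a new list of lines in which `old` is replaced by `new`
    only on the first line containing `old` that follows a line
    containing `marker`."""
    result = []
    replacing = False
    for line in lines:
        if replacing and old in line:
            line = line.replace(old, new)
            replacing = False  # stop after the first replacement
        elif marker in line:
            replacing = True   # the next line containing `old` gets changed
        result.append(line)
    return result


# Shortened sample in the same layout as the file in the question
sample = [
    ">P1;3RVYA\n",
    "sequence:: : : : :::-1.00:-1.00\n",
    ">P1;Dominio1\n",
    "sequence:: : : : :::-1.00:-1.00\n",
]
fixed = replace_after_marker(sample, "3RVYA", "sequence", "structure")
print("".join(fixed), end="")
```

Only the line under 3RVYA changes; the "sequence" line under Dominio1 is left alone because the flag is reset after the first replacement.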
I need to print all the lines of a file in which "python" appears, using the find() method, in Python.
This is my file content:
python is fun
python java
sai python
sachin
ganesha
Currently it prints only the first 2 lines.
What I have tried:
fhand=open('demo.txt')
for line in fhand:
    line=line.rstrip()
    if(line.find('python')):
        continue
    print(line)
find returns the position of the substring, or -1 if it is not found, so:
line.find('python')
returns a non-zero (truthy) value unless the line starts with python, in which case it returns 0. So you enter the if, and skip the line, for every line except the ones that start with python (your first 2 lines).
You need:
if line.find('python') == -1:  # != -1 would mean the string is there
    # python is not in the line
    continue
but it's much better to just write:
if "python" not in line:
    # python is not in the line
    continue
since you don't need to know where python is located in the line.
Also: perform the rstrip() operation only if you need to print the line. Otherwise it just wastes CPU, since the result of find doesn't depend on it.
So to sum it up here's how I would write it:
with open('demo.txt') as fhand:
    for line in fhand:
        if "python" in line:
            print(line.rstrip())
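To see why the original test misbehaved, here is a small sketch that prints what find() returns for each line of the sample file and how that value behaves as a condition. The helper name lines_containing is made up for this example.

```python
def lines_containing(lines, word):
    """Return the lines (without trailing whitespace) that contain `word`."""
    return [line.rstrip() for line in lines if word in line]


content = ["python is fun\n", "python java\n", "sai python\n", "sachin\n", "ganesha\n"]

for line in content:
    pos = line.find("python")
    # 0 is falsy, while -1 and every positive index are truthy
    print(repr(line), pos, bool(pos))

matches = lines_containing(content, "python")
print(matches)
```

The loop shows that "sai python" gives a truthy position (4) and "sachin" gives a truthy -1, so a bare `if line.find(...)` cannot tell them apart.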
I have the following code.
import fileinput
import sys
map_dict = {'*':'999999999', '**':'999999999'}
for line in fileinput.FileInput("test.txt",inplace=1):
    for old, new in map_dict.iteritems():
        line = line.replace(old, new)
    sys.stdout.write(line)
I have a txt file
1\tab*
*1\tab**
Then running the python code generates
1\tab999999999
9999999991\tab999999999
However, I want to replace "cell" (sorry if this is not standard terminology in python. I am using the terminology of Excel) not string.
The second cell is
*
So I want to replace it.
The third cell is
1*
This is not *. So I don't want to replace it.
My desired output is
1\tab999999999
*1\tab999999999
How should I do this? The user will tell the program which delimiter is in use, but the program should replace only whole cells, not substrings.
Also, how can I write to a separate output txt file rather than overwriting the input?
Open a file for writing, and write to it.
Since you want to replace only exact, complete values (and, for example, not touch 1*), do not use replace. Instead, to examine each value, split your lines on the tab character ('\t').
You must also remove end-of-line characters (as they may prevent the last cell in a row from matching).
Which gives:
MAPS = (('*','999999999'),('**','999999999'))
with open('output.txt','w') as out_file, open("test.txt",'r') as in_file:
    for line in in_file:
        out_list = []
        # strip the end of line so the last cell can match exactly
        for inp_cell in line.rstrip('\n').split('\t'):
            out_cell = inp_cell
            for old, new in MAPS:
                if out_cell == old:
                    out_cell = new
            out_list.append(out_cell)
        out_file.write( "\t".join(out_list) + "\n" )
There are more condensed/compact/optimized ways to do it, but I detailed each step on purpose, so that you may adapt to your needs (I was not sure this is exactly what you ask for).
The csv module can help:
#!python3
import csv
map_dict = {'*':'999999999','**':'999999999'}
with open('test.txt',newline='') as inf, open('test2.txt','w',newline='') as outf:
    w = csv.writer(outf,delimiter='\t')
    for line in csv.reader(inf,delimiter='\t'):
        line = [map_dict[item] if item in map_dict else item for item in line]
        w.writerow(line)
Notes:
with will automatically close files.
csv.reader parses and splits lines on a delimiter.
A list comprehension translates line items in the dictionary into a new line.
csv.writer writes the line back out.
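Since the question says the user chooses the delimiter, the same idea might be wrapped in a function that takes the delimiter as a parameter. This is only a sketch: replace_cells is a made-up name, and it works on in-memory streams so it can be tried without any files on disk.

```python
import csv
import io


def replace_cells(in_stream, out_stream, mapping, delimiter="\t"):
    """Copy delimited rows, replacing only cells that equal a mapping key."""
    reader = csv.reader(in_stream, delimiter=delimiter)
    writer = csv.writer(out_stream, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        # dict.get returns the cell unchanged when it is not a key
        writer.writerow([mapping.get(cell, cell) for cell in row])


# Same two rows as the question, with real tab characters between cells
source = io.StringIO("1\t*\n*1\t**\n", newline="")
target = io.StringIO(newline="")
replace_cells(source, target, {"*": "999999999", "**": "999999999"})
result = target.getvalue()
print(result, end="")
```

To use it with real files, pass file objects opened with newline='' as in the answer above, plus whatever delimiter the user supplies.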
This code, borrowed from another place on stackoverflow, removes all of the places that the csv has "None" written in it. However, it also adds an extra line to the csv. How can I change this code to remove that extra line? I think the problem is caused by inplace, but when I take inplace away the file is no longer altered by running the code.
def cleanOutputFile(filename):
    for line in fileinput.FileInput(filename,inplace=1):
        line = line.replace('None',"")
        print line
Thanks!
If you want to replace all the None's:
with open(filename) as f:
    lines = f.read().replace("None","")
with open(filename,"w") as f1:
    f1.write(lines)
Using rstrip with fileinput should also work:
import fileinput
for line in fileinput.FileInput(filename,inplace=1):
    print line.replace('None',"").rstrip() # remove newline character to avoid adding extra lines in the output
The problem here has nothing to do with fileinput, or with the replace.
Lines read from a file end in a newline (except possibly the very last one).
print adds a newline, even if the thing you're printing already ends with a newline.
You can see this without even a file involved:
>>> a = 'abc'
>>> print a
abc
>>> a = 'abc\n'
>>> print a
abc

>>>
The solution is any of the following:
rstrip the newlines off the input: print line.rstrip('\n') (or do the strip earlier in your processing)
Use the "magic comma" to prevent print from adding the newline: print line,
Use Python 3-style print with from __future__ import print_function, so you can use the more flexible keyword arguments: print(line, end='')
Use sys.stdout.write instead of print.
Completely reorganize your code so you're no longer writing to stdout at all, but instead writing directly to a temporary file, or reading the whole file into memory and then writing it back out, etc.
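The doubled newline and the end='' fix can be checked without touching any file. The sketch below uses Python 3 and captures print output in a string buffer; the helper echo and the two sample rows are invented for the demonstration.

```python
import io


def echo(lines, **print_kwargs):
    """Print each line into a string buffer and return what was written."""
    buffer = io.StringIO()
    for line in lines:
        print(line, file=buffer, **print_kwargs)
    return buffer.getvalue()


lines = ["a,None,b\n", "c,d,None\n"]
cleaned = [line.replace("None", "") for line in lines]

doubled = echo(cleaned)          # print adds its own "\n" after each line
intact = echo(cleaned, end="")   # end="" suppresses the extra newline
print(repr(doubled))
print(repr(intact))
```

Comparing the two repr() outputs makes the extra blank lines visible as consecutive "\n" characters.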
Specifically I have exported a csv file from Google Adwords.
I read the file line by line and change the phone numbers.
Here is the literal script:
for line in open('ads.csv', 'r'):
newdata = changeNums(line)
sys.stdout.write(newdata)
And changeNums() just performs some string replaces and returns the string.
The problem is that at the end of each printed line there is a musical note.
The original CSV does not have this note at the end of lines. Also, I cannot copy-paste the note.
Is this some kind of encoding issue or what's going on?
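Try opening with universal newline support (in Python 3 this is already the default in text mode, and the 'U' flag no longer exists as of 3.11):
for line in open('ads.csv', 'rU'):
    # etc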
Either:
the original file has some characters in it (and they're being shown as this symbol in the terminal)
changeNums is creating those characters
stdout.write is sending some uninterpreted newline symbol that, again, is being shown by the terminal as this symbol; change this line to print(newdata)
My guess: changeNums is adding it.
Best debugging commands:
print([ord(x) for x in line])
print([ord(x) for x in newdata])
print(line == newdata)
And check for the character values present in the string.
You can strip out the newlines by:
for line in open('ads.csv', 'r'):
    line = line.rstrip('\r\n')  # drop both CR and LF, if present
    newdata = changeNums(line)
    sys.stdout.write(newdata + '\n')  # put a plain newline back
An odd "note" character at the end is usually a CR/LF newline issue between *nix and *dos/*win environments.
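A quick way to check the CR/LF theory is to look at the raw characters at the end of a line. The sketch below uses a made-up sample row and a hypothetical helper, strip_line_ending. On some Windows consoles character 13 (carriage return) is drawn as a note-like glyph, which would fit the symptom, but that is an assumption about the asker's terminal.

```python
def strip_line_ending(line):
    """Remove any trailing CR and LF characters from a single line."""
    return line.rstrip("\r\n")


dos_line = "555-1234,Some Ad\r\n"  # made-up row with a Windows line ending

only_lf_removed = dos_line.rstrip("\n")
print(repr(only_lf_removed))               # the "\r" is still there
print(repr(strip_line_ending(dos_line)))   # both characters are gone
print([ord(ch) for ch in dos_line[-2:]])   # code points of the line ending
```

If the ord() values at the end of your real lines include 13, stripping only '\n' will leave the stray character behind.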
I need to get a specific line number from a file that I am passing into a python program I wrote. I know that the line I want will be line 5, so is there a way I can just grab line 5, and not have to iterate through the file?
If you know how many bytes you have before the line you're interested in, you could seek to that point and read out a line. Otherwise, a "line" is not a first class construct (it's just a list of characters terminated by a character you're assigning a special meaning to - a newline). To find these newlines, you have to read the file in.
Practically speaking, you could use the readline method to read off the first 4 lines and then read your line.
Why are you trying to do this?
You can use linecache:
import linecache
get = linecache.getline
print(get(path_of_file, number_of_line))
I think the following should do:
line_number=4
# Avoid reading the whole file
f = open('path/to/my/file','r')
count=1
for line in f:  # iterate over lines; f.readline() would yield the characters of line 1
    if count==line_number:
        print line
        break
    count+=1
f.close()
# By reading the whole file
f = open('path/to/my/file','r')
lines = f.read().splitlines()
f.close()
print lines[line_number-1] # Index starts from 0
This should give you the 4th line in the file.
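Another way to stop reading as soon as the wanted line is reached is itertools.islice, sketched here in Python 3. The function name read_line is made up, and the demonstration writes a throwaway temporary file so it can run anywhere. The file is still read from the start up to the requested line, as the first answer explains.

```python
import os
import tempfile
from itertools import islice


def read_line(path, line_number):
    """Return line `line_number` (1-based) without its newline, or None
    if the file has fewer lines. Reading stops once the line is found."""
    with open(path) as handle:
        # islice skips the first line_number - 1 lines and yields at most one
        for line in islice(handle, line_number - 1, line_number):
            return line.rstrip("\n")
    return None


# Demonstration with a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\nfour\nfive\nsix\n")
    demo_path = tmp.name

fifth = read_line(demo_path, 5)
missing = read_line(demo_path, 99)
os.remove(demo_path)
print(fifth)
```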