I want to add some letters to the beginning and end of each line using Python.
I found various methods of doing this; however, whichever method I use, the letters I want to add to the end are always added to the beginning.
input = open("input_file",'r')
output = open("output_file",'w')
for line in input:
    newline = "A" + line + "B"
    output.write(newline)
input.close()
output.close()
I have tried various methods I found here, and with each of them both letters end up at the front. From "inserting characters at the start and end of a string":
''.join(('L', yourstring, 'LL'))
or
yourstring = "L%sLL" % yourstring
or
yourstring = "L{0}LL".format(yourstring)
I'm clearly missing something here. What can I do?
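When reading lines from a file, Python keeps the \n at the end of each line, so "B" is written after the newline and ends up at the start of the next line. You can .rstrip() it off and add it back after your suffix: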
yourstring = 'L{0}LL\n'.format(yourstring.rstrip('\n'))
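Put together, here is a minimal sketch of the whole loop, assuming you want the newline kept after the suffix and that the last line of the file may not have one. The function name wrap_lines and the io.StringIO stand-in for the input file are illustrative, not from the question.

```python
import io

def wrap_lines(lines, prefix="A", suffix="B"):
    """Return each line with prefix/suffix added, keeping any newline at the very end."""
    wrapped = []
    for line in lines:
        body = line.rstrip("\n")                       # drop the newline Python kept
        newline = "\n" if line.endswith("\n") else ""  # the last line may not have one
        wrapped.append(prefix + body + suffix + newline)
    return wrapped

# io.StringIO stands in for open("input_file"); iterating it yields lines ending in "\n"
src = io.StringIO("first\nsecond\nthird")
print(wrap_lines(src))  # ['AfirstB\n', 'AsecondB\n', 'AthirdB']
```

With real files, output.writelines(wrap_lines(input)) would write the result.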
I have a list, complete_list_of_records, with a length of 550.
The list looks something like this:
Apples
Pears
Bananas
The issue is that when I use:
with open("recordedlines.txt", "a") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i)
the resulting file is only 393 lines long, and in some places the structure looks like this:
Apples
PearsBananas
Pineapples
I have tried "w" instead of "a" (append), and manually adding "\n" to each item in the list, but this just creates blank lines on every second row, and some rows still have the same issue with two records on one line.
Has anyone encountered something similar?
From the comments seen so far, I think there are strings in the source list that contain newline characters in positions other than at the end. Also, it seems that some strings end with newline character(s) but not all.
I suggest replacing embedded newlines with some other character, e.g. an underscore.
Therefore I suggest this:
with open("recordedlines.txt", "w") as recorded_lines:
    for line in complete_list_of_records:
        line = line.rstrip()  # remove trailing whitespace
        line = line.replace('\n', '_')  # replace any embedded newlines with underscore
        print(line, file=recorded_lines)  # print function will add a newline
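A small sketch of the same idea that can be checked without touching the disk, assuming the records are strings that may or may not end in a newline and may contain embedded line breaks. write_records is a hypothetical helper name, and io.StringIO stands in for the real file.

```python
import io

def write_records(records, fh):
    """Write exactly one line per record, whatever newlines the records contain."""
    for record in records:
        clean = record.rstrip()  # drop trailing "\n", "\r", spaces and tabs
        # replace line breaks left inside the record (handles \r\n, \n and \r)
        clean = clean.replace("\r\n", "_").replace("\n", "_").replace("\r", "_")
        fh.write(clean + "\n")   # add back exactly one newline

# io.StringIO stands in for open("recordedlines.txt", "w")
records = ["Apples\n", "Pears", "Bananas\n\n", "Pine\napples"]
buf = io.StringIO()
write_records(records, buf)
print(buf.getvalue().splitlines())  # ['Apples', 'Pears', 'Bananas', 'Pine_apples']
```

The number of output lines now equals len(records), which is an easy sanity check for a 550 vs 393 mismatch.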
You could simply strip all leading and trailing whitespace in any case and then add a newline by hand, like so (note that strip() does not touch newlines embedded in the middle of a record):
with open("recordedlines.txt", "a") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i.strip() + "\n")
You need to use
file.writelines(listOfRecords)
but the list values must end with '\n', because writelines() does not add any line separators itself:
f = open("demofile3.txt", "a")
li = ["See you soon!", "Over and out."]
li = [i+'\n' for i in li]
f.writelines(li)
f.close()
#open and read the file after the appending:
f = open("demofile3.txt", "r")
print(f.read())
The output will be (assuming the file was empty before appending):
See you soon!
Over and out.
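To see why the '\n' has to be added, here is a small sketch that writes the same two strings with and without it; io.StringIO is used as a stand-in for the file so nothing touches the disk. Passing a generator expression instead of building a second list is just a variant, not something the answer requires.

```python
import io

items = ["See you soon!", "Over and out."]

# writelines() writes the strings back to back; it never inserts separators
bare = io.StringIO()
bare.writelines(items)
print(repr(bare.getvalue()))   # 'See you soon!Over and out.'

# so the newline has to be part of each item (a generator avoids a second list)
fixed = io.StringIO()
fixed.writelines(item + "\n" for item in items)
print(repr(fixed.getvalue()))  # 'See you soon!\nOver and out.\n'
```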
You can also use a for loop, calling write() with '\n' on each iteration:
complete_list_of_records = ['1.Apples', '2.Pears', '3.Bananas', '4.Pineapples']
with open("recordedlines.txt", "w") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i + "\n")
I think it should work.
Make sure each item you write is a string.
Hello all,
In a text file I need to replace an unknown string with another one.
To find it, I first need to find the line before it, 'name Blur2', because there are many lines beginning with 'xpos':
name Blur2
xpos 12279 # 12279 is the end of line to find and put in a variable
Code to get the unknown string:
#string to find:
keyString = ' name Blur2'
f2 = open("output_file.txt", 'w+')
with open("input_file.txt", 'r+') as f1:
    lines = f1.readlines()
    for i in range(0, len(lines)):
        line = lines[i]
        if keyString in line:
            nextLine = lines[i + 1]
            print ' nextLine: ',nextLine #result: nextLine: xpos 12279
            number = nextLine.rsplit(' xpos ', 1)[1]
            print ' number: ',number #result: number: 12279
            #convert string to float:
            newString = '{0}\n'.format(int(number)+ 10)
            print ' newString: ',newString #result: newString: 12289
            f2.write("".join([nextLine.replace(number, str(newString))])) #this line isn't working
f1.close()
f2.close()
So I completely changed my method, but the last line, f2.write(...), isn't working as expected. Does anyone know why?
Thanks again for your help :)
A regex seems like it would help; https://regex101.com/ is handy for testing patterns.
A regex searches a string using a small language that defines a pattern. Below I list the most important parts for understanding this pattern; a regex is sometimes a better alternative to Python's native string manipulation.
You first describe the pattern you will be using, then compile it. I defined the pattern as a raw string using r'', which means I don't have to escape each \ in it (for example, the regex \s can be written r'\s' instead of '\\s').
There are a couple of parts to this regex.
\s matches whitespace (characters like a space, ' ', but also tabs and newlines).
\n and \r match a newline and a carriage return.
[^...] defines which characters not to match, so [^\n\r] matches any character that is not a newline or carriage return.
* means zero or more of the preceding item.
$ matches at the end of a line (when re.MULTILINE is set).
So the pattern searches for 'name Blur2' specifically, followed by any number of whitespace characters and a newline. The parentheses make this group 1 (explained below). The second part, '([^\n\r]*$)', captures any number of characters that aren't a newline or carriage return, up to the end of that line.
Each pair of parentheses forms a group, so '(name Blur2\s*\n)' is group 1, and the line you want replaced, '([^\n\r]*$)', is group 2. checkre.sub replaces the whole match with group 1 followed by the new string, so the first line is replaced by itself and the second line by your new string:
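import re
# filestring: the whole file read into one string; newstring: the text for the replaced line
check = r'(name Blur2\s*\n)([^\n\r]*$)'
checkre = re.compile(check, re.MULTILINE)
newfilestring = checkre.sub(r'\g<1>' + newstring, filestring)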
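You need to set re.MULTILINE so that $ matches at the end of each line rather than only at the end of the whole string. If the '\n' isn't matched, note that an end-of-string anchor can't go inside a character class like [\n\r\z]; use an alternation such as (?:\r?\n|\Z) instead, which matches a newline (optionally preceded by a carriage return) or the absolute end of the string.
rioV8's comment works, but you could also use '.{5}$', which matches the last 5 characters before the end of the line. It could be helpful within a larger regex.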
It should be possible to get the old string with
oldstring = checkre.search(filestring).group(2)
I have not played with span yet, but since span() is a method returning the (start, end) indices of the whole match, this should splice the new string in by hand:
stringmatch = checkre.search(filestring)
oldstring = stringmatch.group(2)
start, end = stringmatch.span()
newfilestring = filestring[:start] + stringmatch.group(1) + newstring + filestring[end:]
This should be pretty close to what you're looking for.
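A self-contained sketch of the regex idea, assuming the line after name Blur2 has the form xpos followed by digits, and that you want to add 10 to the number as in the question. Unlike the answer above, it captures only the number and computes the replacement in a function passed to re.sub; bump_after_key is a hypothetical name.

```python
import re

def bump_after_key(text, key="name Blur2", delta=10):
    """Add `delta` to the xpos number on the line directly after `key`."""
    pattern = re.compile(
        "(" + re.escape(key) + r"[ \t]*\r?\n)"  # group 1: the key line and its line break
        r"([ \t]*xpos[ \t]+)"                    # group 2: indentation plus "xpos "
        r"(\d+)"                                 # group 3: the number to change
    )
    # a replacement function computes the new number per match, no template string needed
    return pattern.sub(
        lambda m: m.group(1) + m.group(2) + str(int(m.group(3)) + delta), text
    )

sample = " name Blur2\n xpos 12279\n name Other\n xpos 500\n"
print(bump_after_key(sample), end="")
```

Only the xpos line that follows name Blur2 changes (12279 becomes 12289); the xpos line under name Other is left alone.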
The initial program was pretty close. I edited a little bit of it to tweak a few things that were wrong.
Your original code only wrote the replaced line and never wrote the other lines. I'm not sure why you needed to join things; just replacing the number directly seems to work. Reassigning i inside a for loop has no effect on the next iteration, and you need to skip one line so it isn't written twice, so I changed it to a while loop. Feel free to ask any questions, but the code below seems to work.
#string to find:
keyString = ' name Blur2'
f2 = open("output_file.txt", 'w+')
with open("test.txt", 'r+') as f1:
    lines = f1.readlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if keyString in line:
            f2.write(line)
            nextLine = lines[i + 1]
            # end of the necessary 'i' uses; increment i so the replaced line isn't written again
            i += 1
            print(' nextLine: ', nextLine)  # result: nextLine: xpos 12279
            number = nextLine.rsplit(' xpos ', 1)[1]
            # as was said in a comment, this could also be nextLine.rstrip('\n')[-5:] (then drop '\n' from newString)
            print(' number: ', number)  # result: number: 12279
            # convert the string to an int and add 10:
            newString = '{0}\n'.format(int(number) + 10)
            print(' newString: ', newString)  # result: newString: 12289
            f2.write(nextLine.replace(number, str(newString)))  # write the updated xpos line
        else:
            f2.write(line)
        i += 1
f1.close()
f2.close()
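The while loop works; another option, sketched here as an alternative rather than a fix to the code above, is to keep a for loop and pull the following line from the same iterator with next(). It assumes the number is the last space-separated word on the xpos line; bump_xpos is a hypothetical name.

```python
def bump_xpos(lines, key="name Blur2", delta=10):
    """Yield lines unchanged, except the line right after `key`, whose last number gets +delta."""
    it = iter(lines)
    for line in it:
        yield line
        if key in line:
            nxt = next(it, None)  # consume the following line from the same iterator
            if nxt is None:       # key was on the last line; nothing to change
                break
            head, _, num = nxt.rstrip("\n").rpartition(" ")
            yield f"{head} {int(num) + delta}\n"

# with real files: f2.writelines(bump_xpos(f1))
print(list(bump_xpos([" name Blur2\n", " xpos 12279\n", " xpos 5\n"])))
# [' name Blur2\n', ' xpos 12289\n', ' xpos 5\n']
```

Because the for loop and next() share one iterator, the consumed line is never seen twice, so no index bookkeeping is needed.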
I am a biologist and need to make a quick script to process some files.
The file format is fasta:
>line1
ACCGAGCTACTAGXXXXX
>line2
ACGTAX
et cetera.
I want to remove all X characters and quickly put together this script:
print """Input file must be named FILE.fasta"""
fasta_file = raw_input('Input file name:') # Input fasta file
char = raw_input('Which sequence should be stripped?:')
OutFileName = fasta_file.strip('.fasta') + '_stripped.fasta'
OutFile = open(OutFileName, 'w')
WriteOutFile = True
data = open(fasta_file, "r")
for line in data:
    if line.startswith('>'):
        OutPut = line
    else:
        OutPut = line.strip(char)
    print OutPut
    OutFile.write(OutPut)
print(char)
OutFile.close()
quit()
It does not work and I can't figure out why. Any help?
P.S. sorry for the terrible code.
The other answers give better alternatives, but in your case [Python 3.Docs]: Built-in Types - str.strip([chars]) didn't work because each line read from a file ends with the end-of-line (EOLN) terminator, "\n", so the X characters are not actually at the end of the string.
The option that requires the fewest code changes is to modify the 3rd line from:
char = raw_input('Which sequence should be stripped?:')
to:
char = raw_input('Which sequence should be stripped?:') + "\n"
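Beware: the line fasta_file.strip('.fasta') might not do what you think it does. strip() removes any of the characters '.', 'f', 'a', 's' and 't' from both ends, so for example 'data.fasta'.strip('.fasta') returns 'd'. Here, it would be better to use:
OutFileName = fasta_file.replace('.fasta', '_stripped.fasta')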
EDIT0:
I think that you need to add the EOLN back when writing to the output file, so you also need to replace this line:
OutPut = line.strip(char)
by:
OutPut = line.strip(char) + "\n"
Use line.replace(char, '') instead of line.strip(char).
The strip() method only removes characters from the ends of the string: https://docs.python.org/2/library/string.html#string.strip
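A minimal Python 3 sketch of that fix applied to the loop from the question, assuming header lines (starting with '>') should be copied untouched and only sequence lines cleaned. strip_char_from_sequences is a hypothetical name, and io.StringIO stands in for the input and output files.

```python
import io

def strip_char_from_sequences(src, dst, char="X"):
    """Copy a FASTA stream, deleting every `char` from sequence lines only."""
    for line in src:
        if line.startswith(">"):
            dst.write(line)                    # header line: copied unchanged
        else:
            dst.write(line.replace(char, ""))  # removes all occurrences; "\n" is untouched

src = io.StringIO(">line1\nACCGAGCTACTAGXXXXX\n>line2\nACGTAX\n")
dst = io.StringIO()
strip_char_from_sequences(src, dst)
print(dst.getvalue(), end="")
```

With real files, the same function can be called inside with open(fasta_file) as src, open(OutFileName, 'w') as dst:. Since replace() never touches the newline, nothing needs to be added back.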
You could do this using regex:
import re
pattern = re.compile(r"(\w[^X]+)")  # a word character followed by everything up to the first X
stripped = pattern.match(line).group()
For your case, you can do something similar in the 'else' branch of your code, replacing the 'X' in r"(\w[^X]+)" with your char variable (re.escape guards against characters that are special in a regex):
pattern = re.compile(r"(\w[^" + re.escape(char) + r"]+)")
Salutations, I am trying to write a function that prints data from a text file line by line. The output needs to have the number of the line followed by a colon and a space. I came up with the following code:
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number)+": "+line, end=' ')
        line_number += 1
The issue is that when I run this function on test text files I created, the first line is not at the same indentation level as the rest of the lines in the output, i.e. the output looks something like this:
1: 9874234,12.5,23.0,50.0
 2: 7840231,70,60,85.4
 3: 3845913,55.5,60.5,80.0
 4: 3849511,20,60,50
Where am I going wrong? Thanks
Replace the value of the end argument with an empty string instead of a space. Because end is a space, print() writes a space after every line, and since each line already ends with its own \n, that space lands at the start of the next line.
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number) + ": " + line, end='')
        line_number += 1
Another way to do this is to strip the newlines and call print without passing end at all. line.strip("\n") removes the \n at the end of the line, and print then adds one back because end defaults to "\n".
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number) + ": " + line.strip("\n"))
        line_number += 1
This has to do with your print statement.
print(str(line_number)+": "+line, end=' ')
You probably saw that when printing your lines there was an extra line between them and then you tried to work around this by using end=' '.
If you want to get rid of the 'empty' lines, use line.strip(), which removes the trailing newline (along with any other leading or trailing whitespace).
Use this:
print(str(line_number)+": "+line.strip())
strip can also take an argument. This is from the documentation:
str.strip([chars])
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:
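A few calls illustrating that passage; the last two are the examples the Python documentation gives for str.strip(), and the first uses a line from the question:

```python
# with no argument, strip() removes leading and trailing whitespace, including "\n"
print(repr('9874234,12.5,23.0,50.0\n'.strip()))  # '9874234,12.5,23.0,50.0'
print(repr('   spacious   '.strip()))             # 'spacious'
# with an argument, every character in it is stripped, in any order, from both ends
print(repr('www.example.com'.strip('cmowz.')))   # 'example'
```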
What's up with that?
The lines in your file are not separated by nothing: on Linux, a newline is represented by \n, and text editors display it by moving the following text down onto a new line.
When reading a file, Python splits lines on exactly these \n characters but doesn't throw them away. When you print a line, its own \n is output, and combined with the newline that print adds, there is one newline 'too many'.
The end parameter in your print call simply changes what print writes after its arguments. The default is \n.
Check what it does when you use end=" !":
1: aaa
!2: bbb
!3: ccc
You can see the \n after 'aaa' causing a newline (which is part of the string) and after that print adds the contents of end. So it adds a !. The next line is printed in the same line because there is no other newline that would cause a line break before printing it.
You specified the end argument as a space, so every line after the first starts with that extra space.
A line that you read from the file looks something like this:
'9874234,12.5,23.0,50.0\n'
Look at the ending: the line break comes from the original line itself.
So to get what you want, you just need to change print's end argument to an empty string (not a space).
Moreover, I advise you to change the implementation of the function and use enumerate for line numbering.
def print_numbered_lines(filename):
    data = open(filename)
    for i, line in enumerate(data):
        print(str(i+1)+": "+line, end='')
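A variant of the enumerate version, sketched under two assumptions: it takes an already-open file object instead of a filename (so it can be tested with io.StringIO, and the caller closes the file with a with block), and it strips the newline and lets print add one back, as in the second answer. The out parameter is a hypothetical addition for testing; None means standard output.

```python
import io

def print_numbered_lines(fileobj, out=None):
    """Print each line as 'N: text', numbering from 1.

    The trailing newline is removed from each line and print() adds exactly one back,
    so no line picks up a stray leading space.
    """
    for number, line in enumerate(fileobj, start=1):
        print(f"{number}: " + line.rstrip("\n"), file=out)

buf = io.StringIO()
print_numbered_lines(io.StringIO("9874234,12.5,23.0,50.0\n7840231,70,60,85.4\n"), out=buf)
print(buf.getvalue(), end="")
```

With a real file: with open(filename) as f: print_numbered_lines(f).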
How can I reduce multiple blank lines in a text file to a single line at each occurrence?
I have read the entire file into a string, because I want to do some replacement across line endings.
with open(sourceFileName, 'rt') as sourceFile:
    sourceFileContents = sourceFile.read()
This doesn't seem to work:
while '\n\n\n' in sourceFileContents:
    sourceFileContents = sourceFileContents.replace('\n\n\n', '\n\n')
and neither does this:
sourceFileContents = re.sub('\n\n\n+', '\n\n', sourceFileContents)
It's easy enough to strip them all, but I want to reduce multiple blank lines to a single one, each time I encounter them.
I feel that I'm close, but just can't get it to work.
This is a reach, but perhaps some of the lines aren't completely blank (i.e. they have only whitespace characters that give the appearance of blankness). You could try removing all possible whitespace between newlines.
re.sub(r'(\n\s*)+\n+', '\n\n', sourceFileContents)
Edit: I realized the second '+' was superfluous, as \s* will catch any newlines between the first and the last. We just want to make sure the last character is definitely a newline, so we don't remove leading whitespace from a line with other content.
re.sub(r'(\n\s*)+\n', '\n\n', sourceFileContents)
Edit 2
re.sub(r'\n\s*\n', '\n\n', sourceFileContents)
This should be an even simpler solution. We really just want to catch any whitespace (which includes intermediate newlines) between the two anchor newlines that make up a single blank line, and collapse it down to just those two newlines.
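A quick check of the Edit 2 pattern on a string that mixes empty lines, whitespace-only lines and an indented content line; squeeze_blank_lines is a hypothetical wrapper name, not part of the answer.

```python
import re

def squeeze_blank_lines(text):
    """Collapse every run of blank or whitespace-only lines into a single empty line."""
    # \s* also matches newlines, so the whole run between the two anchor newlines is consumed;
    # backtracking leaves the final \n for the pattern, so indentation on the next line survives
    return re.sub(r'\n\s*\n', '\n\n', text)

text = "a\n\n \n\t\nb\n    c\n\n\nd"
print(repr(squeeze_blank_lines(text)))  # 'a\n\nb\n    c\n\nd'
```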
Your code works for me. Maybe there are carriage returns (\r) present:
re.sub(r'[\r\n][\r\n]{2,}', '\n\n', sourceFileContents)
You can use just the str methods split() and join(). Note that this removes every empty line rather than keeping one:
text = "some text\n\n\n\nanother line\n\n"
print("\n".join(item for item in text.split('\n') if item))
A very simple approach using the re module (this also removes blank lines entirely rather than keeping one):
import re
text = 'Abc\n\n\ndef\nGhijk\n\nLmnop'
text = re.sub('[\n]+', '\n', text) # Replacing one or more consecutive newlines with single \n
Result:
'Abc\ndef\nGhijk\nLmnop'
If the lines are completely empty, you can use a regex positive lookahead to collapse each run of them into a single blank line:
sourceFileContents = re.sub(r'\n+(?=\n)', '\n', sourceFileContents)
If you replace your read statement with the following, then you don't have to worry about whitespace or carriage returns:
with open(sourceFileName, 'rt') as sourceFile:
    sourceFileContents = ''.join([l.rstrip() + '\n' for l in sourceFile])
After doing this, both of the methods you tried in the question work.
OR
Just write it out in a simple loop.
with open(sourceFileName, 'rt') as sourceFile:
    lines = ['']
    for line in (l.rstrip() for l in sourceFile):
        if line != '' or lines[-1] != '\n':
            lines.append(line + '\n')
sourceFileContents = "".join(lines)
I guess another option, which is longer but maybe prettier:
with open(sourceFileName, 'rt') as sourceFile:
    last_line = None
    lines = []
    for line in sourceFile:
        # if you want whitespace-only lines to count as blank, you could add something like:
        # line = "\n" if not line.strip() else line
        if line != "\n" or last_line != "\n":  # skip a blank line only if the previous one was blank too
            lines.append(line)
        last_line = line
contents = "".join(lines)
I was trying to find some clever generator function way of writing this, but it's been a long week so I can't.
Code untested, but I think it should work?
(edit: One upside is I removed the need for regular expressions which fixes the "now you have two problems" problem :) )
(another edit based on Marc Chiesa's suggestion of lingering whitespace)
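For the generator version mentioned above, one possibility (a sketch, not the answerer's code) is itertools.groupby, which groups consecutive lines by whether they are blank; each blank run is replaced by a single empty line. collapse_blank_runs is a hypothetical name, and whitespace-only lines are treated as blank.

```python
import io
from itertools import groupby

def collapse_blank_runs(lines):
    """Yield lines, replacing each run of blank (or whitespace-only) lines with one empty line."""
    for is_blank, run in groupby(lines, key=lambda line: not line.strip()):
        if is_blank:
            yield "\n"        # one empty line stands in for the whole run
        else:
            yield from run    # non-blank lines pass through unchanged

src = io.StringIO("a\n\n \n\nb\nc\n\n")
print(repr("".join(collapse_blank_runs(src))))  # 'a\n\nb\nc\n\n'
```

With a file: contents = "".join(collapse_blank_runs(sourceFile)).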
For someone like me who can't do regex: if the code to process is Python, autopep8 can clean it up (fix_code() returns the fixed source as a string):
import autopep8
fixed = autopep8.fix_code('your_code')
Another quick solution, just in case your code isn't Python:
for x in range(100):
    content = content.replace("  ", " ")  # reduce runs of multiple spaces
# then
for x in range(20):
    content = content.replace("\n\n\n", "\n\n")  # reduce runs of blank lines to one
Note that str.replace() returns a new string, so the result must be assigned back. Each pass shrinks every run at once (replace() handles all non-overlapping occurrences), so these counts are generous; to be safe for any length, loop with while "  " in content: instead.
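If your text was decoded from UTF-8, watch out for non-breaking spaces, which cat -vet displays as M-BM- :
sourceFileContents = sourceFile.read()
# a decoded non-breaking space is "\xa0" in Python 3 ("\xc2\xa0" is its raw UTF-8 byte pair)
sourceFileContents = sourceFileContents.replace("\xa0", " ")
sourceFileContents = re.sub(r'\n(\s*\n)+', '\n\n', sourceFileContents)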