'''
This is single line.
This is second long line
... continue from previous line.
This third single line.
'''
I want to join lines that are split by an ellipsis (...) in Python. A long line is broken by a newline (\n), and its continuation starts with an ellipsis (...). I am reading the file line by line and performing an operation on specific lines, but because a continued line ends with a newline and the next line starts with an ellipsis, I cannot get the full line to operate on.
The lines above are an example taken from a large file (more than 800 lines). The Python utility parses the files, searches for lines containing specific keywords, and replaces part of each such line with new syntax. I want to do this on multiple files.
Please advise.
You can simply do:
delim = '...'
text = '''This is single line.
This is second long line
... continue from previous line.
This third single line.
'''
# here we're building a list containing each line
# we'll clean up the leading and trailing whitespace
# by mapping Python's `str.strip` method onto each
# line
# this gives us:
#
# ['This is single line.', 'This is second long line',
# '... continue from previous line.', 'This third single line.', '']
cleaned_lines = map(str.strip, text.split('\n'))
# next, we'll join our cleaned string on newlines, so we'll get back
# the original string without excess whitespace
# this gives us:
#
# This is single line.
# This is second long line
# ... continue from previous line.
# This third single line.
cleaned_str = '\n'.join(cleaned_lines)
# now, we'll split on our delimiter '...'
# this gives us:
#
# ['This is single line.\nThis is second long line\n',
# ' continue from previous line.\nThis third single line.\n']
split_str = cleaned_str.split(delim)
# lastly, we'll now strip off trailing whitespace (which includes)
# newlines. Then, we'll join our list together on an empty string
new_str = ''.join(map(str.rstrip, split_str))
print(new_str)
which outputs
This is single line.
This is second long line continue from previous line.
This third single line.
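You can split the text on line breaks, then loop through and append each line that starts with an ellipsis to the previous line. Build a new list rather than deleting from the list being iterated, since deleting during the loop skips elements:
lines = text.split('\n')  # text holds the whole file contents
joined = []
for line in lines:
    line = line.strip()
    if line.startswith('...') and joined:
        # drop only the leading '...' and append to the previous line
        joined[-1] += line[3:]
    else:
        joined.append(line)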
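Since the question mentions reading the file line by line, the same joining can be sketched as a generator that buffers one logical line at a time. This is only a sketch: the name join_continuations and the marker parameter are made up for illustration, and it assumes a continuation line always starts with the marker after optional leading whitespace.

```python
def join_continuations(lines, marker='...'):
    """Yield logical lines, merging lines that start with marker into the previous one."""
    pending = None  # the logical line built so far, not yet yielded
    for raw in lines:
        line = raw.rstrip('\n')
        stripped = line.lstrip()
        if stripped.startswith(marker) and pending is not None:
            # continuation: drop the marker and glue the rest onto the buffer
            pending += stripped[len(marker):]
        else:
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:
        yield pending


sample = [
    "This is single line.\n",
    "This is second long line\n",
    "... continue from previous line.\n",
    "This third single line.\n",
]
for logical_line in join_continuations(sample):
    print(logical_line)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so a large file does not have to be read into memory first.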
I have a file which contains the following content:
This is the first line.
&
This is the second line
but without separator.
&
This is the third line.
...
Each line terminates with a \n. I want to convert this file input into the following list:
['This is the first line.', 'This is the second line but without separator.', 'This is the third line.', ...]
My actual code looks like:
file = open("/path/to/file", "r")
list = [line.rstrip() for line in file if not line.rstrip() is "&"]
The problem is that the multi-line section gets separated in the list, but I want it kept together, with or without a \n in it.
I hope someone can give me a hint. Thanks!
Just split the whole file contents on & and strip the whitespace (assuming that records are separated only by &):
l = [s.strip().replace('\n', ' ') for s in file.read().split('&')]
Here is a working example. You already know how to read the file, here is how you might parse the contents.
file_contents = """This is the first line.
&
This is the second line
but without separator.
&
This is the third line."""
all_lines = []
for l in file_contents.split('&'):
    all_lines.append(" ".join(l.split('\n')).rstrip())
print(all_lines)
Prints:
['This is the first line.', ' This is the second line but without separator.', ' This is the third line.']
How about reading the whole file into a single string and then using str.split("&")?
with open("test.txt") as file:
    lines = file.read()
    print(lines.split("&"))
    # to remove the \n
    print(lines.replace("\n", "").split("&"))
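The answers above split the whole file contents on &. As a sketch of an alternative that keeps the line-by-line reading from the question, the lines between separators can be collected and joined with a space. The function name split_records is hypothetical, and the sketch assumes the separator always sits alone on its own line.

```python
def split_records(lines, separator='&'):
    """Group lines into records delimited by lines that contain only the separator."""
    records = []
    current = []  # stripped lines of the record being built
    for line in lines:
        text = line.strip()
        if text == separator:
            if current:
                records.append(' '.join(current))
            current = []
        elif text:
            current.append(text)
    if current:
        records.append(' '.join(current))
    return records


sample = [
    "This is the first line.\n",
    "&\n",
    "This is the second line\n",
    "but without separator.\n",
    "&\n",
    "This is the third line.\n",
]
print(split_records(sample))
```

Comparing with == rather than is matters here: is tests object identity, so it is not a reliable way to compare a line with "&".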
What I'm trying to do is match a phrase in a text file and then print that line (this works fine). I then need to move the cursor up 4 lines so I can do another match on that line, but I can't get the seek() method to move up 4 lines from the matched line so that I can run another regex search. All I can seem to do with seek() is seek from the very end of the file or from the beginning. It doesn't seem to let me just do seek(105, 1) from the line that was matched.
### This is the example test.txt
This is 1st line
This is 2nd line # Needs to seek() to this line from the 6th line. This needs to be dynamic as it wont always be 4 lines.
This is 3rd line
This is 4th line
This is 5th line
This is 6th line # Matches this line, now need to move it up 4 lines to the "2nd line"
This is 7 line
This is 8 line
This is 9 line
This is 10 line
#
def Findmatch():
    file = open("test.txt", "r")
    print file.tell() # shows 0 which is the beginning of the file
    string = file.readlines()
    for line in string:
        if "This is 6th line" in line:
            print line
            print file.tell() # shows 171 which is the end of the file. I need for it to be on the line that matches my search which should be around 108. seek() only lets me search from end or beginning of file, but not from the line that was matched.
Findmatch()
Since you've read the whole file into memory at once with file.readlines(), the tell() method correctly points to the end of the file, and you already have all your lines in a list. If you still wanted to use seek(), you'd have to read the file line by line and record the position of each line start so that you could go back four lines.
For the problem you describe, you can first find the index of the first matching line and then do the second operation starting from the list slice four items before it.
Here is a very rough example of that (return None isn't really needed; it's there for verbosity, to state the intent and expected behavior clearly. Raising an exception might be just as desirable, depending on the overall plan):
def relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            break # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0 # Just return all lines up to now? Or was that broken input and fail?
        return lines[idx:]
    else:
        return None
with open("test.txt") as in_file:
    lines = in_file.readlines()
    print(''.join(relevant("This is 6th line", lines)))
Please also note: it's a bit confusing to name a list of lines string (one would probably expect a str there); go with lines or something similar. It's also not advisable (especially since you indicate you are using 2.7) to reuse the names of built-ins, like file, for your variables. Use in_file, for instance.
EDIT: As requested in a comment, here is a printing-only example. I'm adding it alongside the former one, which seems potentially more useful for further extension. :)
def print_relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            print(line.rstrip('\n'))
            break # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0 # Just return all lines up to now? Or was that broken input and fail?
        print(lines[idx].rstrip('\n'))
with open("test.txt") as in_file:
    lines = in_file.readlines()
    print_relevant("This is 6th line", lines)
Note, since lines are read in with trailing newlines and print would add one of its own I've rstrip'ed the line before printing. Just be aware of it.
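For completeness, the seek()-based alternative mentioned at the start of this answer (recording where each line starts) might look roughly like the sketch below. The function name seek_back_from_match and its lines_back parameter are invented for illustration. It uses readline() rather than iterating over the file, because tell() is not reliable during for line in file iteration. An in-memory io.StringIO stands in for test.txt so the example is self-contained.

```python
import io


def seek_back_from_match(handle, needle, lines_back=4):
    """Find the first line containing needle, seek lines_back lines up, return that line."""
    offsets = []  # offsets[i] is the file position where line i starts
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            return None  # reached end of file without a match
        offsets.append(position)
        if needle in line:
            # clamp at the first line if the match is too close to the top
            target = max(0, len(offsets) - 1 - lines_back)
            handle.seek(offsets[target])
            return handle.readline()


text = "".join("This is line %d\n" % number for number in range(1, 11))
print(seek_back_from_match(io.StringIO(text), "line 6"))
```

With a real file, the open file object would be passed instead of the StringIO; the recorded offsets come from tell(), which are the only values that are safe to hand back to seek() in text mode.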
Salutations, I am trying to write a function that prints data from a text file line by line. The output needs to have the number of the line followed by a colon and a space. I came up with the following code;
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number)+": "+line, end=' ')
        line_number += 1
The issue is that when I run this function on test text files I created, the first line is not at the same indentation level as the rest of the lines in the output, i.e. the output looks like
1: 9874234,12.5,23.0,50.0
 2: 7840231,70,60,85.4
 3: 3845913,55.5,60.5,80.0
 4: 3849511,20,60,50
Where am I going wrong? Thanks
Replace the value of the end argument with an empty string instead of a space. Because end is a space, print writes a space after every line, so each later line starts with a space.
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number) + ": " + line, end='')
        line_number += 1
Another way to do this is to strip the newlines and print without passing any value for end. This removes the \n at the end of each line, and print then adds a newline of its own because end="\n" by default.
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number) + ": " + line.strip("\n"))
        line_number += 1
This has to do with your print statement.
print(str(line_number)+": "+line, end=' ')
You probably saw that when printing your lines there was an extra line between them and then you tried to work around this by using end=' '.
If you want to remove the 'empty' lines you should use line.strip(). This removes them.
Use this:
print(str(line_number)+": "+line.strip())
strip can also take an argument. This is from the documentation:
str.strip([chars])
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:
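A few calls illustrate the behavior the quoted documentation describes. The sample strings are only for illustration; the second one is the kind of example the Python documentation itself uses for the chars argument.

```python
# With no argument, whitespace (including the newline) is stripped from both ends.
no_argument = "  9874234,12.5\n".strip()
print(repr(no_argument))     # '9874234,12.5'

# With an argument, any character from the given set is stripped from both ends.
character_set = "www.example.com".strip("cmowz.")
print(repr(character_set))   # 'example'

# Only the ends are affected; a newline in the middle survives.
ends_only = "\nfirst\nsecond\n".strip("\n")
print(repr(ends_only))       # 'first\nsecond'
```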
Whats up with that?
The lines in your file are not separated by nothing: on Linux, a newline is represented by the \n character. Editors render it by moving the following text down onto a new line.
When reading a file, Python splits lines on exactly these \n characters but doesn't throw them away. When printed, they are interpreted again, and combined with the newline that print adds, there is one newline too many.
The end parameter of print simply changes what print writes after the line. The default is \n.
Check what it does when you use end=" !":
1: aaa
!2: bbb
!3: ccc
You can see the \n after 'aaa' causing a newline (which is part of the string) and after that print adds the contents of end. So it adds a !. The next line is printed in the same line because there is no other newline that would cause a line break before printing it.
You specified the end argument as a space, so every line after the first starts with this extra space.
A line that you read from the file looks something like this:
'9874234,12.5,23.0,50.0\n'
Look at the ending. The line break comes from the original line itself.
So to get what you want, you just need to change the end argument of print to an empty string (not a space).
Moreover, I advise you to change the implementation of the function and use enumerate for line numbering.
def print_numbered_lines(filename):
    data = open(filename)
    for i, line in enumerate(data):
        print(str(i+1)+": "+line, end='')
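A slightly tidier sketch of the same idea, assuming Python 3: enumerate accepts a start argument, so the i+1 is not needed, and a with block closes the file automatically. The helper numbered_lines is a made-up name, split out here only so the formatting can be checked without a file on disk.

```python
def numbered_lines(lines):
    """Return each line prefixed with its 1-based number, without the trailing newline."""
    return ["{}: {}".format(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)]


def print_numbered_lines(filename):
    # The with block closes the file even if printing fails part way through.
    with open(filename) as data:
        for text in numbered_lines(data):
            print(text)


print("\n".join(numbered_lines(["9874234,12.5,23.0,50.0\n", "7840231,70,60,85.4\n"])))
```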
Having a hard time understanding what is happening in this snippet of code. Particularly with the 2nd line of code.
for line in infile:
    data = line.strip('\n').split(':')
    user_dict[data[0]] = data[1]
The line sets the variable data equal to the string represented by the variable line with the new line character '\n' removed and then split anywhere a : occurs.
It parses a file having this structure:
a:52
b:hi
key:value
for line in infile: is a loop over each line in the file. Each line (except maybe the last) ends with the newline character \n.
line.strip('\n') removes the newline character.
.split(':') splits the string into the substrings that were separated by :. For example: "qwe:rty:uio".split(':') -> ["qwe", "rty", "uio"]
user_dict[data[0]] = data[1] saves the data into the dictionary user_dict, taking the first string as the key and the second as the value.
For the file mentioned above this code creates the following dictionary:
{"a": "52", "b": "hi", "key": "value"}
line.strip('\n') removes any leading and trailing \n (newline) characters from the string, and split(':') splits the string, using : as the delimiter, into a list of strings.
The code above stores the file's contents in a dictionary. The content of the file looks like this:
key1:value1
key2:value2
.
.
.
key3:value3
The second line strips the \n character from the line and then splits the line on the : character. However, you should try to understand and debug the code line by line.
line.strip('\n') will remove the newlines at the start and end of the string.
and
split(':') will split your string on ':' into a list of strings.
data = line.strip('\n').split(':')
There are two string functions in one line. You also can separate the calls. This should be the same:
my_line = line.strip('\n')
my_line1 = my_line.split(':')
line.strip --> removes the new line character at the end of a line
line.split(':') --> splits the values at colon character and return a list of each record
It is easier to understand with concrete values.
Your file looks like this, and you loop through each line.
Name: Paul
Age: 18
Gender: Male
At the end of each line there is a newline character, which line.strip('\n') removes.
Then you split the values at ":".
Finally, you create a dictionary entry (line 3) where the key is the left side and the value is the right side. Note that the space after the colon stays in the value:
user_dict['Name'] = ' Paul'
user_dict['Age'] = ' 18'
Basically, line.strip('\n') removes consecutive leading and trailing newlines from line but leaves embedded newlines alone; split(':') then splits the result wherever ":" occurs. The result is stored as a list in the variable named data.
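Putting the explanations above together, a somewhat more defensive version of the loop could be sketched as follows. The function name parse_pairs is hypothetical. Splitting with a maximum of one split keeps values that themselves contain a colon intact, and stripping the key and value removes stray spaces around the colon; whether that is wanted depends on the file format.

```python
def parse_pairs(lines):
    """Build a dict from key:value lines, ignoring lines without a colon."""
    result = {}
    for line in lines:
        line = line.strip('\n')
        if ':' not in line:
            continue  # not a key:value line, skip it
        key, value = line.split(':', 1)  # split only at the first colon
        result[key.strip()] = value.strip()
    return result


sample = ["Name: Paul\n", "Age: 18\n", "time:10:30\n", "\n"]
print(parse_pairs(sample))
```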
I'm trying to append lines to an empty list reading from a file, and I've already stripped the lines of returns and newlines, but what should be one line is being entered as two separate items into the list.
DNA = open('DNAGCex.txt')
DNAID = []
DNASEQ = []
for line in DNA:
    line = line.rstrip()
    line = line.lstrip()
    if line.startswith('>')==True:
        DNAID.append(line)
    if line.startswith('>')==False:
        DNASEQ.append(line)
print DNAID
print DNASEQ
And here's the output
['>Rosalind_6404', '>Rosalind_5959', '>Rosalind_0808']
['CCTGCGGAAGATCGGCACTAGA', 'TCCCACTAATAATTCTGAGG', 'CCATCGGTAGCGCATCCTTAGTCCA', 'ATATCCATTTGTCAGCAGACACGC', 'CCACCCTCGTGGTATGGCTAGGCATTCAG', 'TGGGAACCTGCGGGCAGTAGGTGGAAT']
I want it to look like this:
['>Rosalind_6404', '>Rosalind_5959', '>Rosalind_0808']
['CCTGCGGAAGATCGGCACTAGATCCCACTAATAATTCTGAGG', 'CCATCGGTAGCGCATCCTTAGTCCAATATCCATTTGTCAGCAGACACGC', 'CCACCCTCGTGGTATGGCTAGGCATTCAGTGGGAACCTGCGGGCAGTAGGTGGAAT']
Here is the source material, just remove the ''s:
['>Rosalind_6404'
CCTGCGGAAGATCGGCACTAGA
TCCCACTAATAATTCTGAGG
'>Rosalind_5959'
CCATCGGTAGCGCATCCTTAGTCCA
ATATCCATTTGTCAGCAGACACGC
'>Rosalind_0808'
CCACCCTCGTGGTATGGCTAGGCATTCAG
TGGGAACCTGCGGGCAGTAGGTGGAAT]
You can combine the .lstrip() and .rstrip() into a single .strip() call.
Also, .append() only adds an item to a list; it does not join lines into a single line. Here, we start each DNASEQ entry as an empty string and use += to join the lines into one long string:
DNA = open('DNAGCex.txt')
DNAID = []
DNASEQ = []
for line in DNA:
    line = line.strip()
    if line.startswith('>'):
        DNAID.append(line)
        DNASEQ.append('')
    else:
        DNASEQ[-1] += line
print DNAID
print DNASEQ
Within each iteration of the loop, you're only looking at a certain line from the file. This means that, although you certainly are appending lines that don't contain a linefeed at the end, you're still appending one of the file's lines at a time. You'll have to let the interpreter know that you want to combine certain lines, by doing something like setting a flag when you first start to read in a DNASEQ and clearing it when the next DNAID starts.
for line in DNA:
    line = line.strip() # gets both sides
    if line.startswith('>'):
        starting = True
        DNAID.append(line)
    elif starting:
        starting = False
        DNASEQ.append(line)
    else:
        DNASEQ[-1] += line
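As a variation on the two answers above, the IDs and sequences can be kept together in one dictionary instead of two parallel lists. This is only a sketch: read_fasta is a made-up name, and it assumes every sequence line is preceded by some line starting with '>' (sequence lines before the first ID are ignored).

```python
def read_fasta(lines):
    """Map each '>' header (without the '>') to its sequence lines joined together."""
    parts_by_id = {}
    current = None  # ID of the record currently being read
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            current = line[1:]
            parts_by_id[current] = []
        elif current is not None:
            parts_by_id[current].append(line)
    # join once at the end instead of repeated string concatenation
    return {name: ''.join(parts) for name, parts in parts_by_id.items()}


sample = [
    ">Rosalind_6404\n",
    "CCTGCGGAAGATCGGCACTAGA\n",
    "TCCCACTAATAATTCTGAGG\n",
    ">Rosalind_5959\n",
    "CCATCGGTAGCGCATCCTTAGTCCA\n",
    "ATATCCATTTGTCAGCAGACACGC\n",
]
print(read_fasta(sample))
```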