Reading a multiline file with "&" as separator - python

I have a file which contains the following content:
This is the first line.
&
This is the second line
but without separator.
&
This is the third line.
...
Each line terminates with a \n. I want to convert this file input into the following list:
['This is the first line.', 'This is the second line but without separator.', 'This is the third line.', ...]
My actual code looks like:
file = open("/path/to/file", "r")
list = [line.rstrip() for line in file if not line.rstrip() is "&"]
The problem is that the multiline section is split into separate list entries, but I want it kept together, with or without a \n in it.
I hope someone can give me a hint. Thanks!

Just split the whole file on & and strip the whitespace (assuming the entries are separated only by &):
l = [s.strip().replace('\n', ' ') for s in file.read().split('&')]

Here is a working example. You already know how to read the file, here is how you might parse the contents.
file_contents = """This is the first line.
&
This is the second line
but without separator.
&
This is the third line."""
all_lines = []
for l in file_contents.split('&'):
    # strip() removes the space left over from the line break on both sides
    all_lines.append(" ".join(l.split('\n')).strip())
print(all_lines)
Prints:
['This is the first line.', 'This is the second line but without separator.', 'This is the third line.']

How about reading the whole file into a single string and then using str.split("&")?
with open("test.txt") as file:
    lines = file.read()
print(lines.split("&"))
# to remove the \n (use " " instead of "" to keep a space between joined lines)
print(lines.replace("\n", "").split("&"))
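If the separator only counts when "&" stands alone on a line, a line-by-line pass avoids splitting on an ampersand that happens to occur inside the text. The following is a minimal sketch under that assumption about the format; the function name read_records and the sample text (including the "Tom & Jerry" line) are made up for illustration.

```python
import io


def read_records(lines, separator="&"):
    """Group lines into records; a line holding only the separator ends a record."""
    records, current = [], []
    for line in lines:
        stripped = line.strip()
        if stripped == separator:
            # Separator line: close the record collected so far.
            if current:
                records.append(" ".join(current))
            current = []
        elif stripped:
            current.append(stripped)
    if current:  # the last record has no trailing separator
        records.append(" ".join(current))
    return records


# io.StringIO stands in for an open file so the sketch is self-contained.
sample = io.StringIO(
    "This is the first line.\n&\nThis is the second line\n"
    "but without separator.\n&\nTom & Jerry is the third line.\n"
)
result = read_records(sample)
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly in place of the StringIO.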

Related

Slice strings in .txt and return only one of the new strings

I want to use lines of strings of a .txt file as search queries in other .txt files. But before this, I need to slice those strings of the lines of my original text data. Is there a simple way to do this?
This is my original .txt data:
CHEMBL2057820|MUBD_HDAC2_ligandset|mol2|42|dock12
CHEMBL1957458|MUBD_HDAC2_ligandset|mol2|58|dock10
CHEMBL251144|MUBD_HDAC2_ligandset|mol2|41|dock98
CHEMBL269935|MUBD_HDAC2_ligandset|mol2|30|dock58
... (over thousands)
And I need a new file whose lines contain only part of those strings, like:
CHEMBL2057820
CHEMBL1957458
CHEMBL251144
CHEMBL269935

Open the file, read in the lines, split each line at the | character, then take the first result:
with open("test.txt") as f:
    parts = (line.lstrip().split('|', 1)[0] for line in f)
    with open('dest.txt', 'w') as dest:
        dest.write("\n".join(parts))
Explanation:
lstrip() removes whitespace from the leading part of the line.
split("|") returns a list like ['CHEMBL2057820', 'MUBD_HDAC2_ligandset', 'mol2', '42', 'dock12'] for each line.
Since we're only concerned with the first section, it's redundant to split the rest of the line on the | character, so we can pass a maxsplit argument, which stops splitting once that many splits have been made.
So split("|", 1) gives ['CHEMBL2057820', 'MUBD_HDAC2_ligandset|mol2|42|dock12'].
Since we're only interested in the first part, split("|", 1)[0] returns the "CHEMBL..." section.
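The effect of maxsplit described above can be checked on a single sample line. This is only a small sketch: the variable names are arbitrary, and str.partition is shown as an alternative that was not part of the answer.

```python
line = "CHEMBL2057820|MUBD_HDAC2_ligandset|mol2|42|dock12\n"

# Without maxsplit, every field is split out.
full = line.strip().split("|")

# With maxsplit=1, splitting stops after the first separator.
limited = line.strip().split("|", 1)

# partition() is another way to cut at the first separator only.
first_field, _, rest = line.strip().partition("|")

print(full)
print(limited)
print(first_field)
```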

Use split and readlines:
with open('foo.txt') as f, open('bar.txt', 'w') as g:
    lines = f.readlines()
    for line in lines:
        l = line.strip().split('|')[0]
        # write() does not add a line break, so append one
        g.write(l + '\n')

Python - replace a line by its column in file

Sorry for posting such an easy question, but I couldn't find an answer on Google.
I want my code to do something like this:
code:
lines = open("Bal.txt").write
lines[1] = new_value
lines.close()
P.S. I want to replace a line in a file with a new value.

xxx.txt before:
ddddddddddddddddd
EEEEEEEEEEEEEEEEE
fffffffffffffffff
with open('xxx.txt','r') as f:
    x = f.readlines()
x[1] = "QQQQQQQQQQQQQQQQQQQ\n"
with open('xxx.txt','w') as f:
    f.writelines(x)
xxx.txt after:
ddddddddddddddddd
QQQQQQQQQQQQQQQQQQQ
fffffffffffffffff
Note: f.read() returns a string, whereas f.readlines() returns a list, which lets you replace a single element of that list.
Including the \n newline character is important to separate x[1] from x[2] when you next read the file; otherwise you would end up with:
ddddddddddddddddd
QQQQQQQQQQQQQQQQQQQfffffffffffffffff
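The read, modify, write-back steps can be wrapped in a small helper. This is a sketch only: replace_line is a hypothetical name, the index is assumed to be zero-based, and the demonstration runs on a temporary file rather than a real data file.

```python
import os
import tempfile


def replace_line(path, index, new_text):
    """Replace the line at zero-based `index` in the text file at `path`."""
    with open(path, "r") as f:
        lines = f.readlines()
    # Keep exactly one trailing newline on the replacement line.
    lines[index] = new_text.rstrip("\n") + "\n"
    with open(path, "w") as f:
        f.writelines(lines)


# Demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("ddddddddddddddddd\nEEEEEEEEEEEEEEEEE\nfffffffffffffffff\n")

replace_line(demo_path, 1, "QQQQQQQQQQQQQQQQQQQ")
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

The whole file is held in memory, which is fine for small files; an out-of-range index raises IndexError before anything is written.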

How to join continued lines separated by ellipses (...) in Python

'''
This is single line.
This is second long line
... continue from previous line.
This third single line.
'''
I want to join lines that are separated by an ellipsis (...), and I want to do this in Python. A long line is broken by a newline (\n) followed by an ellipsis (...). I am reading the file line by line and operating on specific lines, but a continued line ends with a newline (\n) and the next line starts with an ellipsis (...), so I cannot get the full line to operate on.
The lines in the example come from a big file (more than 800 lines). The Python utility parses the files, searches for lines with specific keywords, and replaces part of each such line with new syntax. I want to do this on multiple files.
Please advise.

You can simply do:
delim = '...'
text = '''This is single line.
This is second long line
... continue from previous line.
This third single line.
'''
# here we're building a list containing each line
# we'll clean up the leading and trailing whitespace
# by mapping Python's `str.strip` method onto each
# line
# this gives us:
#
# ['This is single line.', 'This is second long line',
# '... continue from previous line.', 'This third single line.', '']
cleaned_lines = map(str.strip, text.split('\n'))
# next, we'll join our cleaned string on newlines, so we'll get back
# the original string without excess whitespace
# this gives us:
#
# This is single line.
# This is second long line
# ... continue from previous line.
# This third single line.
cleaned_str = '\n'.join(cleaned_lines)
# now, we'll split on our delimiter '...'
# this gives us:
#
# ['This is single line.\nThis is second long line\n',
# ' continue from previous line.\nThis third single line.\n']
split_str = cleaned_str.split(delim)
# lastly, we'll strip off trailing whitespace (which includes
# newlines). Then, we'll join our list together on an empty string
new_str = ''.join(map(str.rstrip, split_str))
print(new_str)
which outputs:
This is single line.
This is second long line continue from previous line.
This third single line.

You can split the text on line breaks, then loop through and append each line that starts with an ellipsis to the previous line, like so:
merged = []
for line in lines.split('\n'):
    line = line.strip()
    if line.startswith('...') and merged:
        # drop the three leading dots and glue the rest onto the previous line
        merged[-1] += line[3:]
    else:
        merged.append(line)
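When the whole text fits in memory, the same join can also be done with a single regular-expression substitution. This is a sketch of an alternative technique, not one of the approaches from the answers, and it assumes that a continuation line always starts with exactly three dots after optional indentation.

```python
import re

text = (
    "This is single line.\n"
    "This is second long line\n"
    "... continue from previous line.\n"
    "This third single line.\n"
)

# Remove a newline, any indentation after it, and the three dots
# that open a continuation line.
joined = re.sub(r"\n[ \t]*\.\.\.", "", text)
print(joined)
```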

How to delete the last letter in the word if it is 'a' or 'o'?

I've got txt file with list of words, something like this:
adsorbowanie
adsorpcje
adular
adwena
adwent
adwentnio
adwentysta
adwentystka
adwersarz
adwokacjo
And I want to delete the last letter in every word, if that letter is "a" or "o".
I'm very new to this, so please explain this simply.

re.sub(r"[ao]$","",word)
This should do it for you.
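Applied to a few of the words from the question, the substitution behaves as follows. This is a minimal sketch; the list is a subset of the sample data and the variable names are arbitrary. Note that without re.MULTILINE, $ also matches just before a single trailing newline, so the pattern still works on lines that have not been stripped.

```python
import re

words = ["adsorbowanie", "adwena", "adwent", "adwentnio", "adwokacjo"]

# [ao]$ matches a final "a" or "o" only; other endings are left alone.
trimmed = [re.sub(r"[ao]$", "", word) for word in words]
print(trimmed)

# A line read from a file still carries its newline; $ matches before it.
from_file = re.sub(r"[ao]$", "", "adwena\n")
print(repr(from_file))
```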

Try this:
import re
# Read the file, split the contents into a list of lines,
# removing line separators
with open('input.txt') as infile:
    lines = infile.read().splitlines()
# Remove any whitespace around the word.
# If you are certain the list doesn't contain whitespace
# around the word, you can leave this out...
# (this is called a "list comprehension", by the way)
lines = [line.strip() for line in lines]
# Remove letters if necessary, using regular expressions.
outlines = [re.sub('[ao]$', '', line) for line in lines]
# Join the output with line separators
# ('\n' is translated to the platform's separator when writing in text mode)
outdata = '\n'.join(outlines)
# Write the output to a file
with open('output.txt', 'w') as outfile:
    outfile.write(outdata)

First read the file and split it into lines. Then cut off the last character if the condition is fulfilled, and append the resulting string to a list of the processed lines:
#!/usr/bin/env python3
# coding: utf-8
# open file, read lines and store them in a list
with open('words.txt') as f:
    lines = f.read().splitlines()
# analyse lines read from file
new_lines = []
for s in lines:
    # analyse last char of string,
    # get rid of it if condition is fulfilled and append string to new list
    # (endswith is also safe for empty lines, where s[-1] would fail)
    s = s[:-1] if s.endswith(('a', 'o')) else s
    new_lines.append(s)
print(new_lines)

How to read a text file into a string variable and strip newlines?

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.

You could use:
with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
Or, if the file content is guaranteed to be a single line:
with open('data.txt', 'r') as file:
    data = file.read().rstrip()

In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')

You can read from a file in one line:
text = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython closes the file when the file object is garbage collected, which normally happens right after this statement.
Other Python implementations may not do so promptly. To write portable code, it is better to use with or to close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951

To join all lines into a string and remove newlines, I normally use:
with open('t.txt') as f:
    s = " ".join([l.rstrip("\n") for l in f])

with open("data.txt") as myfile:
    data = "".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
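The difference between rstrip() and rstrip("\n") matters when lines end in spaces or tabs that should be kept. Here is a small sketch with made-up sample lines; a plain list stands in for the open file.

```python
lines = ["ABC  \n", "DEF\t\n"]

# rstrip() with no arguments removes all trailing whitespace.
all_whitespace = "".join(line.rstrip() for line in lines)

# rstrip("\n") removes only trailing newline characters.
newline_only = "".join(line.rstrip("\n") for line in lines)

print(repr(all_whitespace))
print(repr(newline_only))
```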

This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or, since the default mode is already 'r' (read), simply use:
text_as_string = open('Your_Text_File.txt').read()

I'm surprised nobody mentioned splitlines() yet.
with open("data.txt", "r") as myfile:
    data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print the lines back to the console, which you can achieve with a for loop:
for line in data:
    print(line)
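For comparison, here is how splitlines() differs from split("\n") on the question's sample content. This is a minimal sketch with the file content inlined as a string; the variable names are arbitrary.

```python
content = "ABC\nDEF\n"

# split("\n") leaves an empty string after the final newline.
by_split = content.split("\n")

# splitlines() drops the line breaks without a trailing empty entry.
by_splitlines = content.splitlines()

# Joining the pieces gives the single-line string asked for.
joined = "".join(by_splitlines)

# splitlines() also recognises \r\n and \r line endings.
mixed = "A\r\nB\rC".splitlines()

print(by_split, by_splitlines, joined, mixed)
```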

It's hard to tell exactly what you're after, but something like this should get you started:
with open("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])

I have fiddled around with this for a while and prefer to use read() in combination with rstrip(). Without rstrip("\n"), the string keeps the file's trailing newline, which in most cases is not very useful.
with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
print(file_content)

Here are four snippets to choose from:
with open("my_text_file.txt", "r") as file:
    data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
    data = "".join([line.rstrip("\n") for line in file])

You can compress this into one or two lines of code:
content = open('filepath','r').read().replace('\n',' ')
print(content)
If your file reads:
hello how are you?
who are you?
blank blank
the Python output is:
hello how are you? who are you? blank blank

You can also strip each line and concatenate into a final string.
data = ""
with open("data.txt", "r") as myfile:
    for line in myfile.readlines():
        data = data + line.strip()
This would also work out just fine.

This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()

f = open('data.txt','r')
string = ""
while True:
    line = f.readline()
    if not line:
        break
    string += line
f.close()
print(string)

python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
    lines = [line.strip('\n') for line in f]

One-liner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
A list is generally faster than a generator but heavier on memory; a generator is slower but lighter on memory, like iterating over the lines one at a time. In the case of "".join(), I think both should work well. Remove the "".join() call to get the list or the generator itself.
Note: an explicit close() of the file is probably not needed here.

Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)

To remove line breaks using Python, you can use the replace method of a string.
This example removes all 3 types of line breaks (\r\n, \r and \n):
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
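Whether any "\r" characters reach the string at all depends on how the file is opened: in the default text mode, open() translates \r\n and \r to \n while reading, and passing newline="" turns that translation off. The sketch below illustrates this on a temporary file with mixed line endings; the file content and variable names are made up for the demonstration.

```python
import os
import tempfile

# Write raw bytes so the mixed line endings land in the file unchanged.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a\r\nb\rc\n")

with open(path) as f:  # default: line endings are translated to \n
    translated = f.read()

with open(path, newline="") as f:  # newline="" keeps them as written
    untranslated = f.read()

os.remove(path)

# The chained replace() handles all three kinds of line break.
cleaned = untranslated.replace("\r", "").replace("\n", "")
print(repr(translated), repr(untranslated), repr(cleaned))
```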
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it in this repl:
https://repl.it/repls/AnnualJointHardware

I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, because there were multiple lines before you replaced the \n with '', you ended up creating a list. If you have a variable x and print it out with
x
or print(x)
or str(x)
you will see the entire list with the brackets. If you access a single element of the list,
x[0]
then the brackets are omitted. If you print that element, you will see just the data, without the '' quotes either.
print(x[0])

Maybe you could try this? I use this in my programs.
with open('data.txt', 'r') as f:
    data = f.readlines()
for i in range(len(data)):
    data[i] = data[i].strip() + ' '
data = ''.join(data).strip()

A regular expression works too:
import re
with open("depression.txt") as f:
    l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
print(l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']

with open('data.txt', 'r') as file:
    data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)

from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
This is the best way to get all the lines of a file; the '\n' are already stripped by splitlines() (which smartly recognizes Windows/Mac/Unix line endings).
But if you nonetheless want to strip each line:
line_lst = [line.strip() for line in Path("to/the/file.txt").read_text().splitlines()]
strip() is just a useful example; you can process each line as you please.
And if, in the end, you just want the concatenated text:
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())

This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words:  # Assuming words is the list above
    print(word)  # Prints each word in the file on a different line
Or:
print(words[0] + ",", words[1])  # "+" joins the strings without a space;
# the comma between the two arguments makes print insert a space
This prints:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

with open(player_name, 'r') as myfile:
    data = myfile.readline()
words = data.split(" ")
word = words[0]
This code reads the first line and then uses split to store the space-separated words of that line in a list.
Then you can easily access any word, or even store it in a string.
You can also do the same thing using a for loop.

with open("myfile.txt", "r") as file:
    lines = file.readlines()
text = ''  # string declaration
for i in range(len(lines)):
    text += lines[i].rstrip('\n') + ' '
print(text)

Try the following:
with open('data.txt', 'r') as myfile:
    data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
    print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n
