Removing formatting when reading from file [duplicate] - python

This question already has answers here:
How to read a file without newlines?
(12 answers)
Closed 5 years ago.
I have a .txt file with values in it.
The values are listed like so:
Value1
Value2
Value3
Value4
My goal is to put the values in a list. When I do so, the list looks like this:
['Value1\n', 'Value2\n', ...]
The \n is not needed.
Here is my code:
t = open('filename.txt')
contents = t.readlines()

This should do what you want (file contents in a list, by line, without \n):
with open(filename) as f:
    mylist = f.read().splitlines()
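To see the difference, both approaches can be run on the same data. This is a small sketch that uses io.StringIO as a stand-in for filename.txt, so it runs without a file on disk. A real file opened in text mode should behave the same way for plain \n line endings.

```python
import io

# Simulated file contents; io.StringIO stands in for a file opened in text mode.
data = "Value1\nValue2\nValue3\nValue4\n"

with io.StringIO(data) as f:
    with_newlines = f.readlines()             # each item keeps its trailing "\n"

with io.StringIO(data) as f:
    without_newlines = f.read().splitlines()  # separators are removed

print(with_newlines)     # ['Value1\n', 'Value2\n', 'Value3\n', 'Value4\n']
print(without_newlines)  # ['Value1', 'Value2', 'Value3', 'Value4']
```

Note that splitlines() does not produce an extra empty string for the newline at the very end of the data.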

I'd do this:
alist = [line.rstrip() for line in open('filename.txt')]
or:
with open('filename.txt') as f:
    alist = [line.rstrip() for line in f]

You can use .rstrip('\n') to remove only the newline from the end of each string:
alist = []
for i in contents:
    alist.append(i.rstrip('\n'))
This leaves all other whitespace intact. If you don't care about whitespace at the start and end of your lines, then the big heavy hammer is .strip().
However, since you are reading from a file and pulling everything into memory anyway, it is better to use the str.splitlines() method. It splits one string on line separators and returns a list of lines without those separators. Call it on the result of file.read() and don't use file.readlines() at all:
alist = t.read().splitlines()
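The three stripping options mentioned above differ only in which characters they remove. Here is a quick check on a made-up line with surrounding spaces and a tab:

```python
# A made-up line with leading spaces and a space + tab before the newline.
line = "  Value1 \t\n"

only_newline = line.rstrip("\n")  # removes just the trailing newline
trailing_ws = line.rstrip()       # removes all trailing whitespace
both_ends = line.strip()          # removes leading and trailing whitespace

print(repr(only_newline))  # '  Value1 \t'
print(repr(trailing_ws))   # '  Value1'
print(repr(both_ends))     # 'Value1'
```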

After opening the file, a list comprehension can do this in one line:
fh = open('filename')
newlist = [line.rstrip() for line in fh.readlines()]
fh.close()
Just remember to close your file afterwards.
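If you open and close the file by hand, wrapping the work in try/finally ensures that close() still runs when an exception is raised. A with block does the same thing automatically. The sketch below assumes a throwaway temporary file created with tempfile in place of 'filename'. It also iterates over the file object directly instead of calling readlines(), which avoids building an intermediate list.

```python
import os
import tempfile

# Create a small temporary file to read back (stand-in for 'filename').
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Value1\nValue2\n")
    path = tmp.name

# Manual version: try/finally guarantees close() runs even if reading fails.
fh = open(path)
try:
    newlist = [line.rstrip() for line in fh]
finally:
    fh.close()

# Equivalent with a context manager: the file is closed when the block exits.
with open(path) as fh2:
    newlist2 = [line.rstrip() for line in fh2]

print(fh.closed, fh2.closed)  # True True
os.remove(path)
```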

I used strip() to get rid of the newline characters, because splitlines() was throwing memory errors on a 4 GB file.
Sample code:
with open('C:\\aapl.csv', 'r') as apple:
    # Iterate over the file object itself: readlines() would also load
    # the whole file into memory at once.
    for apps in apple:
        print(apps.strip())
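For files too large to hold in memory, the stripping can be wrapped in a generator so that only one line is held at a time. stripped_lines is a hypothetical helper name, and io.StringIO with placeholder rows stands in for the CSV file:

```python
import io
from typing import Iterable, Iterator


def stripped_lines(f: Iterable[str]) -> Iterator[str]:
    """Yield each line with surrounding whitespace removed, one at a time.

    Iterating over the file object (instead of calling readlines()) keeps
    only the current line in memory, which matters for very large files.
    """
    for line in f:
        yield line.strip()


# Placeholder rows; io.StringIO stands in for a large file on disk.
sample = io.StringIO("col_a,col_b\n1,2\n")
for row in stripped_lines(sample):
    print(row)
```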

For each string in your list, use .strip(), which removes whitespace from both the beginning and the end of the string:
alist = []
for i in contents:
    alist.append(i.strip())
But depending on your use case, you might be better off using something like numpy.loadtxt or even numpy.genfromtxt if you need a nice array of the data you're reading from the file.

with open('bvc.txt') as f:
    # string.rstrip existed only in Python 2; in Python 3 use str.rstrip.
    # list() consumes the lazy map object before the with block closes f.
    alist = list(map(str.rstrip, f))
Nota bene: rstrip() with no argument removes all trailing whitespace: spaces, \t, \n, \r, \v, \f and other Unicode whitespace characters.
But I suppose you're only interested in keeping the significant characters in the lines. In that case a plain map(str.strip, f) fits better, since it also removes the leading whitespace.
If you really want to eliminate only the newline \n and carriage-return \r characters, do:
with open('bvc.txt') as f:
    alist = f.read().splitlines()
splitlines() with no argument doesn't keep the \n and \r characters (Windows writes text files with CRLF, \r\n, at the end of lines), but it keeps the other whitespace, notably spaces and tabs.
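On Python 3 there is an extra wrinkle: a file opened in the default text mode translates \r\n to \n while reading, so readlines() already returns '\n' endings for a Windows file. The sketch below writes CRLF bytes to a temporary file (the file name comes from tempfile, not from the answer). It then compares the default mode with newline='', which turns the translation off.

```python
import os
import tempfile

# Write Windows-style CRLF line endings as raw bytes.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"Value1\r\nValue2\r\n")
    path = tmp.name

# Default text mode (newline=None) translates "\r\n" to "\n" while reading.
with open(path) as f:
    translated = f.readlines()

# newline="" disables the translation, so "\r\n" survives in each line.
with open(path, newline="") as f:
    raw = f.readlines()

# splitlines() treats "\n", "\r" and "\r\n" all as line boundaries.
with open(path, newline="") as f:
    split_lines = f.read().splitlines()

os.remove(path)
print(translated)   # ['Value1\n', 'Value2\n']
print(raw)          # ['Value1\r\n', 'Value2\r\n']
print(split_lines)  # ['Value1', 'Value2']
```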
with open('bvc.txt') as f:
    alist = f.read().splitlines(True)
has nearly the same effect as
with open('bvc.txt') as f:
    alist = f.readlines()
that is to say, the line endings are kept.
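For ordinary text the two calls give the same list. However, str.splitlines() recognizes more line boundaries than file reading does (for example \v, \f, \x1c to \x1e, \x85, \u2028 and \u2029), while iterating over a file in text mode splits only on newlines. A made-up string containing a form feed shows the difference:

```python
import io

# A form feed ("\f") inside a line: str.splitlines() treats it as a line
# boundary, but reading lines from a text stream only splits on "\n".
data = "page1\fpage2\nnext\n"

keep = data.splitlines(True)  # keepends=True keeps the separators
with io.StringIO(data) as f:
    lines = f.readlines()

print(keep)   # ['page1\x0c', 'page2\n', 'next\n']
print(lines)  # ['page1\x0cpage2\n', 'next\n']
```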

I had the same problem and found that the following solution works well. I hope it helps you or anyone else who wants to do the same thing.
First of all, I would start with a with statement, as it ensures the file is properly closed.
It should look something like this:
with open("filename.txt", "r") as f:  # "r" is enough; "r+" would also open the file for writing
    contents = [x.strip() for x in f.readlines()]
If you want to convert those strings (every item in the contents list is a string) to integers or floats, you can do the following:
contents = [float(x) for x in contents]
Use int instead of float if you want to convert to integers.
It's my first answer on SO, so sorry if it's not in the proper formatting.
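The following variant of the conversion step assumes that the file may contain blank lines (float('') raises ValueError) and that every non-blank line holds one number. io.StringIO with made-up values replaces filename.txt:

```python
import io

# Made-up contents: note the blank line and the spaces around "2".
with io.StringIO("1.5\n 2 \n\n3.25\n") as f:
    # Strip each line, then skip the empty strings left by blank lines.
    numbers = [float(x) for x in (line.strip() for line in f) if x]

print(numbers)  # [1.5, 2.0, 3.25]
```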

I recently used this to read all the lines from a file:
alist = open('maze.txt').read().split()
or you can use this for a little extra safety, since the file is closed automatically:
with open('maze.txt') as f:
    alist = f.read().split()
It doesn't work if a line contains whitespace between words, because split() with no argument splits on any whitespace. However, your example file doesn't seem to have whitespace inside the values. It is a simple solution: it returns an accurate list of values and does not add an empty string '' for every blank line (for example, an extra empty line at the end of the file).
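The trade-off between split() and splitlines() is easiest to see on a made-up string that has a space inside one value and a blank line:

```python
text = "Value1\nValue 2\n\nValue3\n"

by_whitespace = text.split()  # splits on any run of whitespace, drops empties
by_line = text.splitlines()   # one item per line; blank lines become ''
non_blank = [line for line in by_line if line.strip()]  # drop blank lines

print(by_whitespace)  # ['Value1', 'Value', '2', 'Value3']
print(by_line)        # ['Value1', 'Value 2', '', 'Value3']
print(non_blank)      # ['Value1', 'Value 2', 'Value3']
```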

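with open('D:\\file.txt', 'r') as f1:
    lines = f1.readlines()
# rstrip('\n') rather than s[:-1]: slicing would cut a real character
# off a last line that has no trailing newline.
lines = [s.rstrip('\n') for s in lines]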

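For a single line, you can write file.readline()[0:-1].
This returns everything except the last character, which is normally the newline. However, if the last line of the file has no trailing newline, it cuts off a real character instead; file.readline().rstrip('\n') avoids that.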
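The slicing approach fails only when the final line has no trailing newline, which is common in hand-edited files. Here is a small comparison on made-up data, with io.StringIO standing in for the file:

```python
import io

data = "Value1\nValue2"  # no newline after the last line

with io.StringIO(data) as f:
    sliced = [s[:-1] for s in f]        # also cuts the "2" from "Value2"
with io.StringIO(data) as f:
    safe = [s.rstrip("\n") for s in f]  # removes only an actual "\n"

print(sliced)  # ['Value1', 'Value']
print(safe)    # ['Value1', 'Value2']
```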

Related

python doesn't append each line but skips some

I have a complete_list_of_records which has a length of 550
This list looks something like this:
Apples
Pears
Bananas
The issue is that when I use:
with open("recordedlines.txt", "a") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i)
the resulting file is only 393 lines long, and in some places the structure looks like this:
Apples
PearsBananas
Pineapples
I have tried "w" instead of "a" (append) and manually inserted "\n" after each item in the list. However, this just creates blank lines on every second row, and some rows still have the same issue with two lines merged into one.
Anyone who has encountered something similar?
From the comments seen so far, I think there are strings in the source list that contain newline characters in positions other than at the end. Also, it seems that some strings end with newline character(s) but not all.
I suggest replacing embedded newlines with some other character, e.g. an underscore, like this:
with open("recordedlines.txt", "w") as recorded_lines:
    for line in complete_list_of_records:
        line = line.rstrip()  # remove trailing whitespace
        line = line.replace('\n', '_')  # replace any embedded newlines with underscore
        print(line, file=recorded_lines)  # print function will add a newline
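To check what the loop above writes, you can run it against an in-memory buffer. The records below are hypothetical, chosen to include one embedded newline and one item without a trailing newline; io.StringIO replaces recordedlines.txt:

```python
import io

# Hypothetical records: "Bana\nnas\n" has an embedded newline,
# "Pears" and "Pineapples" have no trailing newline.
records = ["Apples\n", "Pears", "Bana\nnas\n", "Pineapples"]

out = io.StringIO()  # stands in for open("recordedlines.txt", "w")
for line in records:
    line = line.rstrip()            # drop trailing whitespace, including "\n"
    line = line.replace("\n", "_")  # neutralize embedded newlines
    print(line, file=out)           # print() appends exactly one "\n"

print(out.getvalue().splitlines())  # ['Apples', 'Pears', 'Bana_nas', 'Pineapples']
```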
You could simply strip all whitespace off in any case and then add a newline by hand, like so:
with open("recordedlines.txt", "a") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i.strip() + "\n")
You need to use
file.writelines(listOfRecords)
but each list value must end with '\n', because writelines() does not add line separators:
f = open("demofile3.txt", "a")
li = ["See you soon!", "Over and out."]
li = [i+'\n' for i in li]
f.writelines(li)
f.close()
#open and read the file after the appending:
f = open("demofile3.txt", "r")
print(f.read())
The output will be (assuming demofile3.txt did not exist before):
See you soon!
Over and out.
You can also use a for loop, with write() adding '\n' on each iteration:
complete_list_of_records =['1.Apples','2.Pears','3.Bananas','4.Pineapples']
with open("recordedlines.txt", "w") as recorded_lines:
    for i in complete_list_of_records:
        recorded_lines.write(i + "\n")
I think it should work.
Make sure that each item you write is a string.
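Both answers rely on the same fact: neither writelines() nor write() adds a line separator, and write() on a text stream raises a TypeError for non-string values. Here is a minimal sketch with an in-memory buffer (the integer item is made up to show the str() conversion):

```python
import io

items = ["See you soon!", "Over and out.", 42]  # 42: a made-up non-string item

buf = io.StringIO()  # stands in for a file opened in "a" or "w" mode
buf.writelines(str(i) for i in items)  # writelines() adds no separators
run_together = buf.getvalue()

buf = io.StringIO()
buf.write("\n".join(str(i) for i in items) + "\n")  # one item per line
one_per_line = buf.getvalue()

print(repr(run_together))  # 'See you soon!Over and out.42'
print(repr(one_per_line))  # 'See you soon!\nOver and out.\n42\n'
```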

Remove commas and newlines from text file in python

I have a text file which looks like this:
ab initio
ab intestato
ab intra
a.C.
acanka, acance, acanek, acankach, acankami, acanką
Achab, Achaba, Achabem, Achabie, Achabowi
I would like to parse every word separated by a comma into a list, so that it looks like ['ab initio', 'ab intestato', 'ab intra', 'a.C.', 'acanka', ...]. Also note that some lines hold a single entry and contain no commas at all.
When I used
list1.append(line.strip()), it gave me a string for every line instead of separate words. Can someone provide some insight into this?
Full code below:
list1 = []
filepath = "words.txt"
with open(filepath, encoding="utf8") as fp:
    line = fp.readline()
    while line:
        list1.append(line.strip(','))
        line = fp.readline()
Very close, but I think you want split instead of strip, and extend instead of append.
You can also iterate directly over the lines with a for loop.
list1 = []
filepath = "words.txt"
with open(filepath, encoding="utf8") as fp:
    for line in fp:
        list1.extend(line.strip().split(', '))
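A slightly more forgiving variant of the loop above splits on ',' alone and strips each piece, so it works whether or not a space follows the comma. It also skips empty pieces, such as those left by a trailing comma. The sample lines come from the question; io.StringIO replaces words.txt:

```python
import io

# Sample lines from the question; io.StringIO stands in for words.txt.
sample = io.StringIO(
    "ab initio\n"
    "a.C.\n"
    "acanka, acance, acanek\n"
    "Achab, Achaba\n"
)

words = []
for line in sample:
    # Split at every comma, then strip spaces and the trailing newline;
    # empty pieces (e.g. from a trailing comma) are dropped.
    words.extend(w.strip() for w in line.split(",") if w.strip())

print(words)
# ['ab initio', 'a.C.', 'acanka', 'acance', 'acanek', 'Achab', 'Achaba']
```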
You can use your code to get a list of line contents and then apply:
cleaned = [x.strip() for y in list1 for x in y.split(',')]
This takes everything you parsed into your list, splits each entry at ',' and creates a new flat list. strip() removes the space after each comma and the trailing newline.
sberry's all-in-one solution, which uses no intermediate list, is faster.

Slice strings in .txt and return only one of the new strings

I want to use the lines of a .txt file as search queries in other .txt files. Before that, I need to slice the strings on each line of my original text data. Is there a simple way to do this?
This is my original .txt data:
CHEMBL2057820|MUBD_HDAC2_ligandset|mol2|42|dock12
CHEMBL1957458|MUBD_HDAC2_ligandset|mol2|58|dock10
CHEMBL251144|MUBD_HDAC2_ligandset|mol2|41|dock98
CHEMBL269935|MUBD_HDAC2_ligandset|mol2|30|dock58
... (over thousands)
And I need a new file whose lines contain only part of those strings, like:
CHEMBL2057820
CHEMBL1957458
CHEMBL251144
CHEMBL269935
Open the file, read in the lines and split each line at the | character, then index the first result:
with open("test.txt") as f:
    parts = (line.lstrip().split('|', 1)[0] for line in f)
    # parts is a lazy generator, so it must be consumed while f is still open.
    with open('dest.txt', 'w') as dest:
        dest.write("\n".join(parts))
Explanation:
lstrip() removes leading whitespace from the line.
split("|") returns a list like ['CHEMBL2057820', 'MUBD_HDAC2_ligandset', 'mol2', '42', 'dock12'] for each line.
Since we're only concerned with the first section, it's redundant to split the rest of the line on the | character. We can therefore pass a maxsplit argument, which stops splitting the string after that many splits have been made.
So split("|", 1)
gives ['CHEMBL2057820', 'MUBD_HDAC2_ligandset|mol2|42|dock12'].
Since we're only interested in the first part, split("|", 1)[0] returns the "CHEMBL..." section.
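You can try the steps in the explanation directly on two of the sample lines. str.partition() is shown as an alternative that the answer does not use: it splits at the first separator only and always returns a 3-tuple.

```python
lines = [
    "CHEMBL2057820|MUBD_HDAC2_ligandset|mol2|42|dock12\n",
    "CHEMBL1957458|MUBD_HDAC2_ligandset|mol2|58|dock10\n",
]

first = lines[0]
all_fields = first.split("|")        # every "|" is a split point
head_and_rest = first.split("|", 1)  # maxsplit=1: at most one split
chembl_id = first.split("|", 1)[0]   # just the leading ID

# str.partition() returns (before, separator, after) for the first "|".
ids = [s.partition("|")[0] for s in lines]

print(len(all_fields))  # 5
print(head_and_rest)    # ['CHEMBL2057820', 'MUBD_HDAC2_ligandset|mol2|42|dock12\n']
print(chembl_id)        # CHEMBL2057820
print(ids)              # ['CHEMBL2057820', 'CHEMBL1957458']
```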
Use split and readlines:
with open('foo.txt') as f, open('bar.txt', 'w') as g:  # 'w': bar.txt is written to
    lines = f.readlines()
    for line in lines:
        l = line.strip().split('|')[0]
        g.write(l + '\n')  # write() does not add a newline by itself
