How to convert chars to lines in Python using read method - python

Noob here. I need to read in a file, using the read (rather than readlines()) method (which provides the input to several functions), and identify all of the lines in that file (i.e. to print or to append to a list).
I've tried join, split, appending to lists, all with little to show.
# Code I'm stuck with:
with open("text.txt", 'r') as file:
a = file.read()
# Stuff that doesn't work
for line in a:
# can't manipulate when using the below, but prints fine
# print(line, end = '')
temp = (line, end = '')
for line in a:
temp = ''
while not ' ':
temp += line
new = []
for i in a:
i = i.strip()
I tend to get either everything in a long string, or
'I', ' ', 't','e','n','d',' ', 't','o' .... get individual chars. I'm just looking to get each line up to the newline char \n, or basically, what readlines() would give me, despite the file being stored in memory using read()

with open('text.txt') as file:
for line in file:
# do whatever you want with the line
The file object is iterable over the lines in the file - for a text file.

All you need to do is split the file after reading and you get the list of each line.
with open("text.txt", 'r') as file:
a = file.read()
a.split('\n')

With the above help, and using read rather than readlines, I was able to separate out individual lines from a file as follows:
with open("fewwords.txt", "r") as file:
a = file.read()
empty_list = []
# break a, which is read in as 1 really big string, into lines, then remove newline char
a = a.split('\n')
for i in range(len(a)):
initial_list.append(a[i])

Related

Trying to remove multiple space in txt file using python [duplicate]

So I have this crazy long text file made by my crawler and it for some reason added some spaces inbetween the links, like this:
https://example.com/asdf.html (note the spaces)
https://example.com/johndoe.php (again)
I want to get rid of that, but keep the new line. Keep in mind that the text file is 4.000+ lines long. I tried to do it myself but figured that I have no idea how to loop through new lines in files.
Seems like you can't directly edit a python file, so here is my suggestion:
# first get all lines from file
with open('file.txt', 'r') as f:
lines = f.readlines()
# remove spaces
lines = [line.replace(' ', '') for line in lines]
# finally, write lines in the file
with open('file.txt', 'w') as f:
f.writelines(lines)
You can open file and read line by line and remove white space -
Python 3.x:
with open('filename') as f:
for line in f:
print(line.strip())
Python 2.x:
with open('filename') as f:
for line in f:
print line.strip()
It will remove space from each line and print it.
Hope it helps!
Read text from file, remove spaces, write text to file:
with open('file.txt', 'r') as f:
txt = f.read().replace(' ', '')
with open('file.txt', 'w') as f:
f.write(txt)
In #Leonardo Chirivì's solution it's unnecessary to create a list to store file contents when a string is sufficient and more memory efficient. The .replace(' ', '') operation is only called once on the string, which is more efficient than iterating through a list performing replace for each line individually.
To avoid opening the file twice:
with open('file.txt', 'r+') as f:
txt = f.read().replace(' ', '')
f.seek(0)
f.write(txt)
f.truncate()
It would be more efficient to only open the file once. This requires moving the file pointer back to the start of the file after reading, as well as truncating any possibly remaining content left over after you write back to the file. A drawback to this solution however is that is not as easily readable.
I had something similar that I'd been dealing with.
This is what worked for me (Note: This converts from 2+ spaces into a comma, but if you read below the code block, I explain how you can get rid of ALL whitespaces):
import re
# read the file
with open('C:\\path\\to\\test_file.txt') as f:
read_file = f.read()
print(type(read_file)) # to confirm that it's a string
read_file = re.sub(r'\s{2,}', ',', read_file) # find/convert 2+ whitespace into ','
# write the file
with open('C:\\path\\to\\test_file.txt', 'w') as f:
f.writelines('read_file')
This helped me then send the updated data to a CSV, which suited my need, but it can help for you as well, so instead of converting it to a comma (','), you can convert it to an empty string (''), and then [or] use a read_file.replace(' ', '') method if you don't need any whitespaces at all.
Lets not forget about adding back the \n to go to the next row.
The complete function would be :
with open(str_path, 'r') as file :
str_lines = file.readlines()
# remove spaces
if bl_right is True:
str_lines = [line.rstrip() + '\n' for line in str_lines]
elif bl_left is True:
str_lines = [line.lstrip() + '\n' for line in str_lines]
else:
str_lines = [line.strip() + '\n' for line in str_lines]
# Write the file out again
with open(str_path, 'w') as file:
file.writelines(str_lines)

How do I split each line into two strings and print without the comma?

I'm trying to have output to be without commas, and separate each line into two strings and print them.
My code so far yields:
173,70
134,63
122,61
140,68
201,75
222,78
183,71
144,69
But i'd like it to print it out without the comma and the values on each line separated as strings.
if __name__ == '__main__':
# Complete main section of code
file_name = "data.txt"
# Open the file for reading here
my_file = open('data.txt')
lines = my_file.read()
with open('data.txt') as f:
for line in f:
lines.split()
lines.replace(',', ' ')
print(lines)
In your sample code, line contains the full content of the file as a str.
my_file = open('data.txt')
lines = my_file.read()
You then later re-open the file to iterate the lines:
with open('data.txt') as f:
for line in f:
lines.split()
lines.replace(',', ' ')
Note, however, str.split and str.replace do not modify the existing value, as strs in python are immutable. Also note you are operating on lines there, rather than the for-loop variable line.
Instead, you'll need to assign the result of those functions into new values, or give them as arguments (E.g., to print). So you'll want to open the file, iterate over the lines and print the value with the "," replaced with a " ":
with open("data.txt") as f:
for line in f:
print(line.replace(",", " "))
Or, since you are operating on the whole file anyway:
with open("data.txt") as f:
print(f.read().replace(",", " "))
Or, as your file appears to be CSV content, you may wish to use the csv module from the standard library instead:
import csv
with open("data.txt", newline="") as csvfile:
for row in csv.reader(csvfile):
print(*row)
with open('data.txt', 'r') as f:
for line in f:
for value in line.split(','):
print(value)
while python can offer us several ways to open files this is the prefered one for working with files. becuase we are opening the file in lazy mode (this is the prefered one espicialy for large files), and after exiting the with scope (identation block) the file io will be closed automaticly by the system.
here we are openening the file in read mode. files folow the iterator polices, so we can iterrate over them like lists. each line is a true line in the file and is a string type.
After getting the line, in line variable, we split (see str.split()) the line into 2 tokens, one before the comma and the other after the comma. split return new constructed list of strings. if you need to omit some unwanted characters you can use the str.strip() method. usualy strip and split combined together.
elegant and efficient file reading - method 1
with open("data.txt", 'r') as io:
for line in io:
sl=io.split(',') # now sl is a list of strings.
print("{} {}".format(sl[0],sl[1])) #now we use the format, for printing the results on the screen.
non elegant, but efficient file reading - method 2
fp = open("data.txt", 'r')
line = None
while (line=fp.readline()) != '': #when line become empty string, EOF have been reached. the end of file!
sl=line.split(',')
print("{} {}".format(sl[0],sl[1]))

Reading and writing to files in python, comparing characters in different files to each other

I want to read 2 files in python, and based off those 2 create another file. The first file contains regular english (ex "hello") and the second file contains "cipher text" (2 5 letter random string Ex "aiwld" and "pqmcx") I want to match up the letter 'h' with the first letter in the cipher text and store it in the third file (the one that we created)
def cipher():
file = english.txt
file2 = secret.txt
file3 = cipher.txt
outputFile = open(file, 'r')
outputFile = open(file2, 'r')
So I have open, for reading, file and file2 and I want to match the first letter in the english.txt with the first letter in the secret.txt and and then write that letter to the cipher.txt file. I am completely lost on where to start and any help would be great.
Do I need to open both files, read from both, somehow compare and then write to the file?
I guess I am really unsure on how to compare individual letters in each file with other individual letters in a different file.
I think I would want something like set english.txt[0] == secret.txt[0] but I am not really sure.
The key thing you're looking at here is how to iterate over a file character by character (rather than the line by line you get more simply).
The simplest solution to this is to read the two files entirely into memory and iterate over them together. This can be done with the file.read() call and the zip() built-in. This suffers because large files would cause us to run out of memory.
Writing out the result is just a normal file.write() call.
For example:
with open('plaintext.text') as ptf:
plaintext = ptf.read()
with open('key.txt') as keyf:
key = keyf.read()
with open('output.txt') as f:
for plaintext_char, key_char in zip(plaintext, key):
# Do something to combine the characters
f.write(new_char)
So this might be overly complicated but
def cipher(file1 = 'english.txt',
file2 = 'secret.txt',
file3 = 'cipher.txt'):
fh1 = open(file1, 'r') # open the files
fh2 = open(file2, 'r')
fh3 = open(file3, 'w+') # write this file if it doesn't exist
ls1 = list() # initiate lists
ls2 = list()
for line in fh1: # add the charecters to the list
for char in line:
ls1.append(char)
for line in fh2:
for char in line:
ls2.append(char)
if ' ' in ls1: # remove blank spaces
ls1.remove(' ')
if ' ' in ls2:
ls2.remove(' ')
print ls1, ls2
for i in range(len(ls1)): # traverse through the list and write things! :)
fh3.write(ls1[i] + ' ' + ls2[i] + '\n')

to read line from file without getting "\n" appended at the end [duplicate]

This question already has answers here:
How to read a file without newlines?
(12 answers)
Closed 6 years ago.
My file is "xml.txt" with following contents:
books.xml
news.xml
mix.xml
if I use readline() function it appends "\n" at the name of all the files which is an error because I want to open the files contained within the xml.txt. I wrote this:
fo = open("xml.tx","r")
for i in range(count.__len__()): #here count is one of may arrays that i'm using
file = fo.readline()
find_root(file) # here find_root is my own created function not displayed here
error encountered on running this code:
IOError: [Errno 2] No such file or directory: 'books.xml\n'
To remove just the newline at the end:
line = line.rstrip('\n')
The reason readline keeps the newline character is so you can distinguish between an empty line (has the newline) and the end of the file (empty string).
From Best method for reading newline delimited files in Python and discarding the newlines?
lines = open(filename).read().splitlines()
You could use the .rstrip() method of string objects to get a version with trailing whitespace (including newlines) removed.
E.g.:
find_root(file.rstrip())
I timed it just for curiosity. Below are the results for a vary large file.
tldr;
File read then split seems to be the fastest approach on a large file.
with open(FILENAME, "r") as file:
lines = file.read().split("\n")
However, if you need to loop through the lines anyway then you probably want:
with open(FILENAME, "r") as file:
for line in file:
line = line.rstrip("\n")
Python 3.4.2
import timeit
FILENAME = "mylargefile.csv"
DELIMITER = "\n"
def splitlines_read():
"""Read the file then split the lines from the splitlines builtin method.
Returns:
lines (list): List of file lines.
"""
with open(FILENAME, "r") as file:
lines = file.read().splitlines()
return lines
# end splitlines_read
def split_read():
"""Read the file then split the lines.
This method will return empty strings for blank lines (Same as the other methods).
This method may also have an extra additional element as an empty string (compared to
splitlines_read).
Returns:
lines (list): List of file lines.
"""
with open(FILENAME, "r") as file:
lines = file.read().split(DELIMITER)
return lines
# end split_read
def strip_read():
"""Loop through the file and create a new list of lines and removes any "\n" by rstrip
Returns:
lines (list): List of file lines.
"""
with open(FILENAME, "r") as file:
lines = [line.rstrip(DELIMITER) for line in file]
return lines
# end strip_readline
def strip_readlines():
"""Loop through the file's read lines and create a new list of lines and removes any "\n" by
rstrip. ... will probably be slower than the strip_read, but might as well test everything.
Returns:
lines (list): List of file lines.
"""
with open(FILENAME, "r") as file:
lines = [line.rstrip(DELIMITER) for line in file.readlines()]
return lines
# end strip_readline
def compare_times():
run = 100
splitlines_t = timeit.timeit(splitlines_read, number=run)
print("Splitlines Read:", splitlines_t)
split_t = timeit.timeit(split_read, number=run)
print("Split Read:", split_t)
strip_t = timeit.timeit(strip_read, number=run)
print("Strip Read:", strip_t)
striplines_t = timeit.timeit(strip_readlines, number=run)
print("Strip Readlines:", striplines_t)
# end compare_times
def compare_values():
"""Compare the values of the file.
Note: split_read fails, because has an extra empty string in the list of lines. That's the only
reason why it fails.
"""
splr = splitlines_read()
sprl = split_read()
strr = strip_read()
strl = strip_readlines()
print("splitlines_read")
print(repr(splr[:10]))
print("split_read", splr == sprl)
print(repr(sprl[:10]))
print("strip_read", splr == strr)
print(repr(strr[:10]))
print("strip_readline", splr == strl)
print(repr(strl[:10]))
# end compare_values
if __name__ == "__main__":
compare_values()
compare_times()
Results:
run = 1000
Splitlines Read: 201.02846901328783
Split Read: 137.51448011841822
Strip Read: 156.18040391519133
Strip Readline: 172.12281272950372
run = 100
Splitlines Read: 19.956802833188124
Split Read: 13.657361738959867
Strip Read: 15.731161020969516
Strip Readlines: 17.434831199281092
run = 100
Splitlines Read: 20.01516321280158
Split Read: 13.786344555543899
Strip Read: 16.02410587620824
Strip Readlines: 17.09326775703279
File read then split seems to be the fastest approach on a large file.
Note: read then split("\n") will have an extra empty string at the end of the list.
Note: read then splitlines() checks for more then just "\n" possibly "\r\n".
It's better style to use a context manager for the file, and len() instead of calling .__len__()
with open("xml.tx","r") as fo:
for i in range(len(count)): #here count is one of may arrays that i'm using
file = next(fo).rstrip("\n")
find_root(file) # here find_root is my own created function not displayed here
To remove the newline character fro the end you could also use something like this:
for line in file:
print line[:-1]
A use case with #Lars Wirzenius's answer:
with open("list.txt", "r") as myfile:
for lines in myfile:
lines = lines.rstrip('\n') # the trick
try:
with open(lines) as myFile:
print "ok"
except IOError as e:
print "files does not exist"
# mode : 'r', 'w', 'a'
f = open("ur_filename", "mode")
for t in f:
if(t):
fn.write(t.rstrip("\n"))
"If" condition will check whether the line has string or not, if yes next line will strip the "\n" at the end and write to a file.
Code Tested. ;)

How to read a text file into a string variable and strip newlines?

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.
You could use:
with open('data.txt', 'r') as file:
data = file.read().replace('\n', '')
Or if the file content is guaranteed to be one-line
with open('data.txt', 'r') as file:
data = file.read().rstrip()
In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')
You can read from a file in one line:
str = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will close the file when it exits as part of the garbage collection.
But other python implementations won't. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951
To join all lines into a string and remove new lines, I normally use :
with open('t.txt') as f:
s = " ".join([l.rstrip("\n") for l in f])
with open("data.txt") as myfile:
data="".join(line.rstrip() for line in myfile)
join() will join a list of strings, and rstrip() with no arguments will trim whitespace, including newlines, from the end of strings.
This can be done using the read() method :
text_as_string = open('Your_Text_File.txt', 'r').read()
Or as the default mode itself is 'r' (read) so simply use,
text_as_string = open('Your_Text_File.txt').read()
I'm surprised nobody mentioned splitlines() yet.
with open ("data.txt", "r") as myfile:
data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
print(line)
It's hard to tell exactly what you're after, but something like this should get you started:
with open ("data.txt", "r") as myfile:
data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])
I have fiddled around with this for a while and have prefer to use use read in combination with rstrip. Without rstrip("\n"), Python adds a newline to the end of the string, which in most cases is not very useful.
with open("myfile.txt") as f:
file_content = f.read().rstrip("\n")
print(file_content)
Here are four codes for you to choose one:
with open("my_text_file.txt", "r") as file:
data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
data = "".join([line for line in file])
you can compress this into one into two lines of code!!!
content = open('filepath','r').read().replace('\n',' ')
print(content)
if your file reads:
hello how are you?
who are you?
blank blank
python output
hello how are you? who are you? blank blank
You can also strip each line and concatenate into a final string.
myfile = open("data.txt","r")
data = ""
lines = myfile.readlines()
for line in lines:
data = data + line.strip();
This would also work out just fine.
This is a one line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()
f = open('data.txt','r')
string = ""
while 1:
line = f.readline()
if not line:break
string += line
f.close()
print(string)
python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
lines = [ line.strip('\n') for line in list(f) ]
Oneliner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
List is faster than generator but heavier on memory. Generators are slower than lists and is lighter for memory like iterating over lines. In case of "".join(), I think both should work well. .join() function should be removed to get list or generator respectively.
Note: close() / closing of file descriptor probably not needed
Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)
To remove line breaks using Python you can use replace function of a string.
This example removes all 3 types of line breaks:
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it using this replay scenario:
https://repl.it/repls/AnnualJointHardware
I don't feel that anyone addressed the [ ] part of your question. When you read each line into your variable, because there were multiple lines before you replaced the \n with '' you ended up creating a list. If you have a variable of x and print it out just by
x
or print(x)
or str(x)
You will see the entire list with the brackets. If you call each element of the (array of sorts)
x[0]
then it omits the brackets. If you use the str() function you will see just the data and not the '' either.
str(x[0])
Maybe you could try this? I use this in my programs.
Data= open ('data.txt', 'r')
data = Data.readlines()
for i in range(len(data)):
data[i] = data[i].strip()+ ' '
data = ''.join(data).strip()
Regular expression works too:
import re
with open("depression.txt") as f:
l = re.split(' ', re.sub('\n',' ', f.read()))[:-1]
print (l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']
with open('data.txt', 'r') as file:
data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)
from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
Is the best way to get all the lines of a file, the '\n' are already stripped by the splitlines() (which smartly recognize win/mac/unix lines types).
But if nonetheless you want to strip each lines:
line_lst = [line.strip() for line in txt = Path("to/the/file.txt").read_text().splitlines()]
strip() was just a useful exemple, but you can process your line as you please.
At the end, you just want concatenated text ?
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())
This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
file = open("file.txt")
line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words: # Assuming words is the list above
print word # Prints each word in file on a different line
Or:
print words[0] + ",", words[1] # Note that the "+" symbol indicates no spaces
#The comma not in parentheses indicates a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
with open(player_name, 'r') as myfile:
data=myfile.readline()
list=data.split(" ")
word=list[0]
This code will help you to read the first line and then using the list and split option you can convert the first line word separated by space to be stored in a list.
Than you can easily access any word, or even store it in a string.
You can also do the same thing with using a for loop.
file = open("myfile.txt", "r")
lines = file.readlines()
str = '' #string declaration
for i in range(len(lines)):
str += lines[i].rstrip('\n') + ' '
print str
Try the following:
with open('data.txt', 'r') as myfile:
data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n

Categories

Resources