I want to split a text file in Python, using the following piece of code:
inputfile = open(sys.argv[1]).read()
for line in inputfile.strip().split("\n"):
    print line
The problem is that it reads only the first 12 lines, although the file has more than 10,000 lines.
What could be the reason?
Thanks,
import sys

with open(sys.argv[1]) as inputfile:
    for line in inputfile:
        print(line)
Use readlines(), which returns a list of lines, so there is no need to split on "\n" yourself. Note that each line in the list keeps its trailing newline.
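A minimal sketch of how these approaches differ, using an in-memory io.StringIO object as a stand-in for the real file (the sample text is made up for illustration):

```python
import io

# Hypothetical sample content standing in for the real input file.
sample = "first line\nsecond line\nthird line\n"

# readlines() returns a list and keeps the trailing newline on each item.
with_newlines = io.StringIO(sample).readlines()

# Iterating over the file object yields the same lines one at a time,
# without building the whole list in memory first.
stripped = [line.rstrip("\n") for line in io.StringIO(sample)]

# str.splitlines() drops the line endings for you.
split = sample.splitlines()

print(with_newlines)
print(stripped)
print(split)
```

For a file with thousands of lines, iterating over the file object is usually the lightest option, since only one line is held at a time.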
Try this:
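text = r"C:\Users\Desktop\Test\Text.txt"
oFile = open(text, 'r')
line = oFile.readline()
while line:  # readline() returns '' only at end of file
    # strip the newline here; slicing it off before the test would end the loop at the first blank line
    splitLine = line.rstrip('\n').split(' ')
    print(splitLine)
    line = oFile.readline()
oFile.close()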
I use this style to iterate through huge text files at work
So I have this crazy long text file made by my crawler, and for some reason it added some spaces in between the links, like this:
https://example.com/asdf.html (note the spaces)
https://example.com/johndoe.php (again)
I want to get rid of those spaces but keep the newlines. Keep in mind that the text file is 4,000+ lines long. I tried to do it myself, but I have no idea how to loop through the lines of a file.
It seems you can't edit a file in place in Python, so here is my suggestion:
# first get all lines from file
with open('file.txt', 'r') as f:
    lines = f.readlines()
# remove spaces
lines = [line.replace(' ', '') for line in lines]
# finally, write lines in the file
with open('file.txt', 'w') as f:
    f.writelines(lines)
You can open the file, read it line by line, and strip the whitespace:
Python 3.x:
with open('filename') as f:
    for line in f:
        print(line.strip())
Python 2.x:
with open('filename') as f:
    for line in f:
        print line.strip()
This removes the leading and trailing whitespace from each line and prints it; spaces inside a line are left untouched.
Hope it helps!
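A small sketch of the difference between strip() and replace(), since the question is about spaces inside the links. The sample line and the position of its spaces are assumptions made for illustration:

```python
# Hypothetical line with spaces at both ends and one inside the URL.
line = "  https://example.com/ asdf.html  \n"

# strip() removes whitespace only at the start and end (including the newline).
trimmed = line.strip()

# replace(" ", "") removes every space but leaves the newline in place.
no_spaces = line.replace(" ", "")

print(repr(trimmed))
print(repr(no_spaces))
```

If the goal is to drop the spaces inside each link while keeping one link per line, replace() is the closer fit.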
Read text from file, remove spaces, write text to file:
with open('file.txt', 'r') as f:
    txt = f.read().replace(' ', '')
with open('file.txt', 'w') as f:
    f.write(txt)
In @Leonardo Chirivì's solution, it's unnecessary to create a list to store the file contents when a string is sufficient and more memory efficient. The .replace(' ', '') operation is called only once on the string, which is more efficient than iterating through a list and calling replace on each line individually.
To avoid opening the file twice:
with open('file.txt', 'r+') as f:
    txt = f.read().replace(' ', '')
    f.seek(0)
    f.write(txt)
    f.truncate()
It would be more efficient to open the file only once. This requires moving the file pointer back to the start of the file after reading, as well as truncating any content left over after you write back to the file. A drawback of this solution, however, is that it is not as easy to read.
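To see why truncate() matters, here is a minimal sketch that runs the same in-place rewrite with and without it. It works on a temporary file; the helper names and the sample content are hypothetical:

```python
import os
import tempfile

ORIGINAL = "a b c\nd e f\n"  # made-up sample content


def rewrite_without_spaces(path, truncate):
    """Remove spaces in place; truncate() can be skipped to show its effect."""
    with open(path, "r+") as f:
        txt = f.read().replace(" ", "")
        f.seek(0)      # back to the start before overwriting
        f.write(txt)   # the new text is shorter than the old one
        if truncate:
            f.truncate()  # cut off the old bytes past the new end


def run_demo(truncate):
    """Create a temp file, rewrite it, and return its final contents."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(ORIGINAL)
        rewrite_without_spaces(path, truncate)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


with_truncate = run_demo(truncate=True)
without_truncate = run_demo(truncate=False)
print(repr(with_truncate))
print(repr(without_truncate))
```

Without truncate(), the tail of the old, longer content stays at the end of the file, because writing from position 0 only overwrites as many bytes as the new text occupies.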
I had something similar that I'd been dealing with.
This is what worked for me (Note: This converts from 2+ spaces into a comma, but if you read below the code block, I explain how you can get rid of ALL whitespaces):
import re
# read the file
with open('C:\\path\\to\\test_file.txt') as f:
    read_file = f.read()
print(type(read_file))  # to confirm that it's a string
read_file = re.sub(r'\s{2,}', ',', read_file)  # find/convert 2+ whitespace into ','
# write the file
with open('C:\\path\\to\\test_file.txt', 'w') as f:
    f.write(read_file)  # write the variable's contents, not the literal string 'read_file'
This let me send the updated data to a CSV, which suited my need, but it can help you as well: instead of converting to a comma (','), you can convert to an empty string (''), or use read_file.replace(' ', '') if you don't want any spaces at all.
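One thing to watch with \s is that it also matches newlines, so a run of whitespace that spans a line break gets collapsed too. A minimal sketch of a variant that removes spaces and tabs but keeps line breaks, assuming that is what is wanted (the sample strings are made up):

```python
import re

# Hypothetical crawler output with stray spaces inside the links.
text = "https://example.com/  asdf.html\nhttps://example.com/   johndoe.php\n"

# [^\S\n] reads as "any whitespace character except a newline".
cleaned = re.sub(r"[^\S\n]+", "", text)

# For comparison: \s{2,} also swallows a newline when it sits inside the run.
collapsed = re.sub(r"\s{2,}", ",", "a  b \n c\n")

print(repr(cleaned))
print(repr(collapsed))
```

The character class is a design choice, not the only option; `[ \t]+` would do the same job if only spaces and tabs can occur.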
Let's not forget to add back the \n so that each line still ends with a newline.
The complete function would be :
with open(str_path, 'r') as file:
    str_lines = file.readlines()
# remove spaces
if bl_right is True:
    str_lines = [line.rstrip() + '\n' for line in str_lines]
elif bl_left is True:
    str_lines = [line.lstrip() + '\n' for line in str_lines]
else:
    str_lines = [line.strip() + '\n' for line in str_lines]
# Write the file out again
with open(str_path, 'w') as file:
    file.writelines(str_lines)
I have a problem with a code in python. I want to read a .txt file. I use the code:
f = open('test.txt', 'r') # We need to re-open the file
data = f.read()
print(data)
I would like to read ONLY the first line from this .txt file. I use
f = open('test.txt', 'r') # We need to re-open the file
data = f.readline(1)
print(data)
But only the first letter of the line is shown on screen.
Could you help me read all the letters of the line? (I mean, read the whole first line of the .txt file.)
with open("file.txt") as f:
    print(f.readline())
This opens the file in a with block (which closes the file automatically when we are done with it) and reads the first line. It is equivalent to:
f = open("file.txt")
print(f.readline())
f.close()
Your attempt with f.readline(1) won't work because the argument is the maximum number of characters to read from the line, so it returns only the first character.
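A short sketch of what the size argument does, using io.StringIO as a stand-in for the opened file (the sample text is made up):

```python
import io

f = io.StringIO("hello world\nsecond line\n")

first_char = f.readline(1)  # reads at most 1 character
rest = f.readline()         # continues from the current position to the end of the line
next_line = f.readline()    # a further call moves on to the following line

print(repr(first_char))
print(repr(rest))
print(repr(next_line))
```

Note that readline(1) does not skip the rest of the line; the next call picks up where it stopped.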
Second method:
with open("file.txt") as f:
    print(f.readlines()[0])
This second method reads all lines into a list and prints only the first one.
To read the fifth line, use
with open("file.txt") as f:
    print(f.readlines()[4])
Or:
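with open("file.txt") as f:
    lines = []
    # append() adds each line as one item; += would add its characters one by one
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())
    print(lines[-1])
The index -1 refers to the last item of the list.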
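As an alternative sketch, itertools.islice can pick out a single line without loading the whole file or repeating readline() calls. The helper name read_line is hypothetical, and io.StringIO stands in for the real file:

```python
import io
from itertools import islice


def read_line(f, line_no):
    """Return line number line_no (1-based) from an open file, or None if the file is shorter."""
    return next(islice(f, line_no - 1, line_no), None)


sample = io.StringIO("one\ntwo\nthree\nfour\nfive\nsix\n")
fifth = read_line(sample, 5)

short = io.StringIO("only one line\n")
missing = read_line(short, 5)

print(repr(fifth))
print(repr(missing))
```

With a real file this would be called inside a with open(...) block, passing the file object as f.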
Learn more:
with statement
files in python
readline method
Your first try is almost there, you should have done the following:
f = open('my_file.txt', 'r')
line = f.readline()
print(line)
f.close()
A safer approach to read file is:
with open('my_file.txt', 'r') as f:
    print(f.readline())
Both ways will print only the first line.
Your error was that you passed 1 to readline(), which limits the read to a size of 1, i.e. a single character. Please refer to https://www.w3schools.com/python/ref_file_readline.asp
I tried this after your suggestions, and it works:
f = open('test.txt', 'r')
data = f.readlines()[0]  # index 0 is the first line
print(data)
Use with open(...) instead:
with open("test.txt") as file:
    line = file.readline()
    print(line)
Call f.readline() without parameters.
It returns the first line as a string and moves the cursor to the second line.
The next time you call f.readline(), it returns the second line and moves the cursor to the next one, and so on.
I have raw data in a .txt file and would like to convert it to .csv format.
This is sample data from the txt file:
(L2-CR666 Reception Counter) L2-CR666 Reception Counter has been forced.
(L7-CR126 Handicapped Toilet) L7-CR126 Handicapped Toilet has been forced.
I would like to achieve the following result:
L2-CR666 Reception Counter, forced
L7-CR126 Handicapped Toilet, forced
I have tried the following code but was unable to achieve the stated result. Where did I go wrong?
import csv
with open('Converted Detection\\Testing 01\\2019-02-21.txt') as infile, open('Converted Detection\\Converted CSV\\log.csv', 'w') as outfile:
    for line in infile:
        outfile.write(infile.read().replace("(", ""))
    for line in infile:
        outfile.write(', '.join(infile.read().split(')')))
    outfile.close()
You can try this:
with open('Converted Detection\\Testing 01\\2019-02-21.txt') as infile, open('Converted Detection\\Converted CSV\\log.csv', 'w') as outfile:
    for line in infile:
        # Get text inside ()
        text = line[line.find("(")+1:line.find(")")]
        # Remove \r\n
        line = line.rstrip("\r\n")
        # Get last word
        forcedText = line.split(" ")[-1]
        # Remove . char
        forcedText = forcedText[:-1]
        outfile.write(text+", "+forcedText+"\n")
Best
You could use .partition() to truncate everything before ) and then simply replace the parts you do not want accordingly. Also, you do not have to close the file when using the with statement as it automatically closes it for you, and you do not have to import the csv library to save a file with the .csv extension.
The following code outputs your wanted result:
infile_path = "Converted Detection\\Testing 01\\2019-02-21.txt"
outfile_path = "Converted Detection\\Converted CSV\\log.csv"
with open(infile_path, "r") as infile, open(outfile_path, "w") as outfile:
    for line in infile:
        line = line.partition(")")[2].replace(" has been forced.", ", forced").strip()
        outfile.write(line + "\n")
The first for loop already reads infile line by line, so there is no need to call infile.read() again or to add a second loop.
Also, the with block takes care of closing the files.
for line in infile:
    line = line.replace("(", "")
    outfile.write(', '.join(line.split(')')))
I would suggest using:
lineout = ', '.join(linein.replace('(','').replace(')','').split(' has been '))
where:
linein = line.strip()
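If the status word is not always "forced", a regular expression combined with csv.writer is one possible alternative. This is only a sketch: the pattern assumes every line has the shape "(location) location has been status.", and the names LINE_RE and convert are made up for illustration.

```python
import csv
import io
import re

# Assumed line layout: "(<location>) <location> has been <status>."
LINE_RE = re.compile(r"^\((?P<location>[^)]*)\)\s.*\bhas been (?P<status>\w+)\.\s*$")


def convert(infile, outfile):
    """Write one 'location,status' row per matching input line; skip the rest."""
    writer = csv.writer(outfile, lineterminator="\n")
    for line in infile:
        match = LINE_RE.match(line)
        if match:
            writer.writerow([match.group("location"), match.group("status")])


# In-memory stand-ins for the real input and output files.
sample = io.StringIO(
    "(L2-CR666 Reception Counter) L2-CR666 Reception Counter has been forced.\n"
    "(L7-CR126 Handicapped Toilet) L7-CR126 Handicapped Toilet has been forced.\n"
)
out = io.StringIO()
convert(sample, out)
print(out.getvalue())
```

Note that csv.writer separates fields with a bare comma (no space after it) and quotes any field that itself contains a comma, which plain string concatenation would not do. When writing to a real file, the csv documentation recommends opening it with newline="".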
I'm writing code that goes over a text file counting how many words are in every line, and I'm having trouble putting the result (many lines that each consist of a number) into a new text file.
My code:
in_file = open("our_input.txt")
out_file = open("output.txt", "w")
for line in in_file:
    line = (str(line)).split()
    x = (len(line))
    x = str(x)
    out_file.write(x)
in_file.close()
out_file.close()
But the file I'm getting has all the numbers together on one line.
How do I separate them in the file I'm making?
You need to add a newline after each line:
out_file.write(x + '\n')
Also, a more Pythonic way of dealing with files is to open them with the with statement, which closes the files at the end of the block.
And instead of several assignments and a separate conversion of the length to a string, you can use the str.format() method to do all of this in one line:
with open("our_input.txt") as in_file, open("output.txt", "w") as out_file:
    for line in in_file:
        out_file.write('{}\n'.format(len(line.split())))
Add a newline to the file while writing:
in_file = open("our_input.txt")
out_file = open("output.txt", "w")
for line in in_file:
    line = (str(line)).split()
    x = (len(line))
    x = str(x)
    out_file.write(x)
    # Write newline
    out_file.write('\n')
in_file.close()
out_file.close()
As the previous answers have pointed out, you need to write a newline to separate the output.
Here is yet another way to write the code:
with open("our_input.txt") as in_file, open("output.txt", "w") as out_file:
    res = map(lambda line: len(line.split()), in_file)
    for r in res:
        out_file.write('%d\n' % r)
I have a file that I am currently reading from using
fo = open("file.txt", "r")
Then by doing
file = open("newfile.txt", "w")
file.write(fo.read())
file.write("Hello at the end of the file")
fo.close()
file.close()
I basically copy the file to a new one, but also add some text at the end of the newly created file. How would I be able to insert that line say, in between two lines separated by an empty line? I.e:
line 1 is right here
<---- I want to insert here
line 3 is right here
Can I tokenize different sentences by a delimiter like \n for new line?
First, open the file with open() and call .readlines(), which returns a list of lines, each still ending in "\n". Then insert the new string into the list at the position you want, and write the list to the new file with new_file.writelines(updated_list).
NOTE: This method only works for files that fit in memory.
with open("filename.txt", "r") as prev_file, open("new_filename.txt", "w") as new_file:
    prev_contents = prev_file.readlines()
    # prev_contents is now a list of strings (newlines included); insert the new line at any position
    prev_contents.insert(4, "This is a new line\n")
    # the lines already end in "\n", so write them back without adding extra separators
    new_file.writelines(prev_contents)
readlines() is not recommended because it reads the whole file into memory. It is also not needed because you can iterate over the file directly.
The following code will insert "Hello at line 2" at line 2:
with open('file.txt', 'r') as f_in:
    with open('file2.txt', 'w') as f_out:
        for line_no, line in enumerate(f_in, 1):
            if line_no == 2:
                f_out.write('Hello at line 2\n')
            f_out.write(line)
Note the use of the with open('filename','w') as filevar idiom. This removes the need for an explicit close() because it closes the file automatically at the end of the block, and better, it does this even if there is an exception.
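The question asks for the text to go where the empty line is, rather than at a fixed line number. A minimal streaming sketch of that variant follows; the function name fill_first_blank_line is hypothetical, and it assumes the first blank line is the insertion point and should be replaced by the new text:

```python
import io


def fill_first_blank_line(f_in, f_out, text):
    """Copy f_in to f_out, replacing the first blank line with text."""
    inserted = False
    for line in f_in:
        if not inserted and line.strip() == "":
            f_out.write(text + "\n")
            inserted = True
        else:
            f_out.write(line)


# In-memory stand-ins for the input and output files.
src = io.StringIO("line 1 is right here\n\nline 3 is right here\n")
dst = io.StringIO()
fill_first_blank_line(src, dst, "inserted text")
print(dst.getvalue())
```

Like the loop above, this reads one line at a time, so it does not depend on the file fitting in memory.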
For large files:
with open("s.txt", "r") as inp, open("s1.txt", "w") as ou:
    # iterate over the file object directly; readlines() would load the whole file into memory
    for a, d in enumerate(inp):
        if a == 2:
            ou.write("hi there\n")
        ou.write(d)
You could use a marker:
#FILE1
line 1 is right here
<INSERT_HERE>
line 3 is right here
#FILE2
some text
with open("FILE1") as file:
    original = file.read()
with open("FILE2") as insert_file:
    # drop the trailing newline so the replacement does not add a blank line
    myinsert = insert_file.read().rstrip("\n")
newfile = original.replace("<INSERT_HERE>", myinsert)
with open("FILE1", "w") as replaced:
    replaced.write(newfile)
#FILE1
line 1 is right here
some text
line 3 is right here