I have a text file which is about 400,000 lines long. I need to import this text file into a program which only accepts text files delimited with spaces or tabs, but this text file is delimited with semi-colons. There is no option in the program I am exporting the text file from (ArcMap) to change the delimiter, and doing find and replace in the text file itself will literally take 2 days.
I have searched for a script to do this, but they all seem to replace the whole LINE of the text file with a space instead of replacing each semi-colon individually, leaving me with an empty text file.
Here is a sample of my text file:
"OID_";"POINTID";"GRID_CODE";"POINT_X";"POINT_Y"
;1;-56.000000;200900.250122;514999.750122
;2;-56.000000;200900.750122;514999.750122
;3;-56.000000;200901.250122;514999.750122
;4;-57.000000;200901.750122;514999.750122
;5;-57.000000;200902.250122;514999.750122
;6;-57.000000;200902.750122;514999.750122
;7;-57.000000;200903.250122;514999.750122
;8;-57.000000;200903.750122;514999.750122
;9;-57.000000;200904.250122;514999.750122
;10;-57.000000;200904.750122;514999.750122
I need it to look something like this:
1 -56.000000 200900.250122 514999.750122
2 -56.000000 200900.750122 514999.750122
How about this:
sed -i 's/;/ /g' yourBigFile.txt
This is not a Python solution; you have to run it in a shell. But if you use Notepad, I guess you are on Windows, so here is a Python solution:
f1 = open('yourBigFile.txt', 'r')
f2 = open('yourBigFile.txt.tmp', 'w')
for line in f1:
    f2.write(line.replace(';', ' '))
f1.close()
f2.close()
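The same loop can be wrapped in with blocks so that both files are closed even if something fails midway. The sketch below is one possible shape, not the only one. The function name convert_delimiters is made up for illustration, and two assumptions are read off the sample in the question: the first line is a header that should be dropped, and each data row starts with an empty field that should not turn into a leading space.

```python
def convert_delimiters(src_path, dst_path, old=";", new=" ", skip_header=True):
    """Copy src_path to dst_path line by line, swapping delimiters.

    Hypothetical helper. Assumes (from the sample in the question) that the
    first line is a header to drop and that a line starting with the
    delimiter has an empty first field that should be removed.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for index, line in enumerate(src):
            if skip_header and index == 0:
                continue
            # Remove the empty first field, then replace the other delimiters.
            if line.startswith(old):
                line = line[len(old):]
            dst.write(line.replace(old, new))
```

Because the file is processed one line at a time, memory use stays small even for 400,000 lines.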
With Python, you can use fileinput.
import fileinput
for line in fileinput.FileInput("file", inplace=1):
    line = line.replace(";", " ")
    print line,
This will replace every ";" with a space, in place.
Python 3.2 added the ability to use fileinput as a context manager, so the files are always closed, even if processing fails partway through:
import fileinput
def main():
    with fileinput.input(inplace=True) as f:
        for line in f:
            line = line.replace(";", " ")
            print(line, end='')
if __name__ == '__main__':
    main()
(inspiration)
Use it by passing the text file you want to process as a command-line argument.
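If you would rather pass the file name from code than on the command line, fileinput.input also accepts a files argument, and its backup argument keeps a copy of the original. A minimal sketch, where the function name replace_in_place and the ".bak" suffix are illustrative choices rather than anything from the answers above:

```python
import fileinput

def replace_in_place(path, old=";", new=" ", backup=".bak"):
    """Rewrite path in place, keeping the original as path + backup."""
    # With inplace=True, print() writes into the file being processed.
    with fileinput.input(files=[path], inplace=True, backup=backup) as f:
        for line in f:
            print(line.replace(old, new), end="")
```

Keeping the backup is a cautious default for a large export that takes a while to regenerate.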
I am trying to delete all blank lines in all YAML files in a folder. I have multiple lines with nothing but CRLF (using Notepad++), and I can't seem to eliminate these blank lines. I researched this before posting, as always, but I can't seem to get this working.
import glob
import re
path = 'C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml'
for fname in glob.glob(path):
    with open(fname, 'r') as f:
        sfile = f.read()
        for line in sfile.splitlines(True):
            line = sfile.rstrip('\r\n')
            f = open(fname,'w')
            f.write(line)
            f.close()
Here is a view in Notepad++
I want to delete the very first row shown here, as well as all other blank rows. Thanks.
If you use python, you can update the line using:
re.sub(r'[\s\r\n]','',line)
Close the reading file handler before writing.
If you use Notepad++, install the plugin called TextFX.
Replace all occurrences of \r\n with blank.
Select all the text
Use the new menu TextFX -> TextFX Edit -> E:Delete Blank Lines
I hope this helps.
You can't write to the file you are currently reading. Also, you are stripping \r\n from every line produced by splitlines(), so you would remove all line breaks, not only those on empty lines. Store the content under a new name and delete/rename the file afterwards:
Create demo file:
with open("t.txt", "w") as f:
    f.write("""
asdfb
adsfoine
""")
Load / create new file from it:
with open("t.txt", 'r') as r, open("q.txt", "w") as w:
    for l in r:
        if l.strip(): # only write line if when stripped it is not empty
            w.write(l)
with open("q.txt", "r") as f:
    print(f.read())
Output:
asdfb
adsfoine
(You need to strip() each line to detect lines that contain only spaces and a newline.)
For renaming and deleting, see e.g. How to rename a file using Python and Delete a file or folder:
import os
os.remove("t.txt") # remove original
os.rename("q.txt","t.txt") # rename cleaned one
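The write-then-swap steps above can be combined into one function. This is a sketch under a few assumptions: the helper name remove_blank_lines is invented here, the temporary file is created next to the original so the final swap stays on one filesystem, and os.replace (Python 3.3+) stands in for the separate remove and rename calls.

```python
import os
import tempfile

def remove_blank_lines(path):
    """Rewrite path without blank or whitespace-only lines."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the cleaned copy to a temporary file in the same directory.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as tmp:
        for line in src:
            if line.strip():  # keep only lines with visible content
                tmp.write(line)
    # Swap the cleaned copy into place once both files are closed.
    os.replace(tmp.name, path)
```

To process a whole folder, it could be called in a loop such as `for fname in glob.glob(path): remove_blank_lines(fname)`.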
It's nice and easy...
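import glob
path = "C:\\Users\\ryans\\OneDrive\\Desktop\\output\\*.yaml"
for file_path in glob.glob(path):  # open() cannot expand the * wildcard itself
    with open(file_path, "r+") as file:
        lines = file.readlines()
        file.seek(0)
        for i in lines:
            if i.rstrip():
                file.write(i)
        file.truncate()  # cut off what is left of the longer original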
Where you open the file, read the lines, and if they're not blank write them back out again.
Basically, I want a script that opens a file, and then goes through the file and sees if the file contains any curse words. If a line in the file contains a curse word, then I want to replace that line with "CENSORED". So far, I think I'm just messing up the code somehow because I'm new to Python:
filename = input("Enter a file name: ")
censor = input("Enter the curse word that you want censored: ")
with open(filename)as fi:
    for line in fi:
        if censor in line:
            fi.write(fi.replace(line, "CENSORED"))
print(fi)
I am new to this, so I'm probably just messing something up...
By "line" I mean the following. If "Hat" were a curse word, this file:
There Is
A
Hat
Would be:
There Is
A
CENSORED
You cannot write to the same file you are reading, for two reasons:
You opened the file in read-only mode, you cannot write to such a file. You'd have to open the file in read-write mode (using open(filename, mode='r+')) to be able to do what you want.
You are replacing data as you read, with lines that are most likely going to be shorter or longer. You cannot do that in a file. For example, replacing the word cute with censored would create a longer line, and that would overwrite not just the old line but the start of the next line as well.
You need to write out your changed lines to a new file, and at the end of that process replace the old file with the new.
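To see the second point concretely, here is a small sketch that overwrites cute with the longer censored in place. It uses binary mode and a throwaway temporary file, and the function name overwrite_first_line is chosen purely for illustration.

```python
import os
import tempfile

def overwrite_first_line(path, replacement):
    """Write replacement at the start of path without resizing the file."""
    with open(path, "r+b") as f:
        f.write(replacement)  # overwrites bytes in place, starting at offset 0

# Demo on a throwaway file with two lines: "cute" and "next".
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"cute\nnext\n")

overwrite_first_line(demo_path, b"censored\n")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

print(result)  # b'censored\n\n': the word "next" has been overwritten
```

The file keeps its original length of ten bytes, so the longer replacement simply eats into the following line.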
Note that your replace() call is also incorrect; you'd call it on the line:
line = line.replace(censor, 'CENSORED')
The easiest way for you to achieve what you want is to use the fileinput module; it'll let you replace a file in-place, as it'll handle writing to another file and the file swap for you:
import fileinput
filename = input("Enter a file name: ")
censor = input("Enter the curse word that you want censored: ")
for line in fileinput.input(filename, inplace=True):
    line = line.replace(censor, 'CENSORED')
    print(line, end='')
The print() call is a little magic here; the fileinput module temporarily replaces sys.stdout meaning that print() will write to the replacement file rather than your console. The end='' tells print() not to include a newline; that newline is already part of the original line read from the input file.
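The question asks for the whole line to become CENSORED, while the loop above replaces only the word. One possible variant is sketched below. The name censor_file is hypothetical, and the check is a plain substring test, so a curse word embedded in a longer word would also match.

```python
import fileinput

def censor_file(path, word, marker="CENSORED"):
    """Replace every line of path that contains word with marker."""
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            if word in line:
                # Keep the line ending so the line count is unchanged.
                ending = "\n" if line.endswith("\n") else ""
                print(marker + ending, end="")
            else:
                print(line, end="")
```

Called as `censor_file(filename, censor)`, it would slot in after the two input() prompts.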
Consider:
filename = input("Enter a file name: ")
censor = input("Enter the curse word that you want censored: ")
# Open the file, iterate through the lines and censor them, storing them in lines list
with open(filename) as f:
    lines = [line.replace(censor, 'CENSORED').strip() for line in f]
# If you want to re-write the censored file, re-open it, and write the lines
with open(filename, 'w') as f:
    f.write('\n'.join(lines))
We're using a list comprehension to censor the lines of the file.
If you want to replace the entire line, and not just the word, replace
lines = [line.replace(censor, 'CENSORED').strip() for line in f]
with
lines = ['CENSORED' if censor in line else line.strip() for line in f]
filename = input("Enter a file name: ")
censor = input("Enter the curse word that you want censored: ")
with open(filename) as fi:
    for line in fi:
        if censor in line:
            print("CENSORED")
        else:
            print(line, end='')  # the line already ends with a newline
# curse_word is assumed to hold the word to censor
with open('filename.txt', 'r') as data:
    the_lines = data.readlines()
with open('filename.txt', 'w') as data:
    for line_content in the_lines:
        if curse_word in line_content:
            data.write('Censored\n')  # keep the line break
        else:
            data.write(line_content)
You have only opened the file for reading. Some options:
Read the whole file in, do the replacement, and write it over the original file again.
Read the file line-by-line, process and write the lines to a new file, then delete the old file and rename in the new file.
Use the fileinput module, which does all the work for you.
Here's an example of the last option:
import fileinput,sys
for line in fileinput.input(inplace=1):
    line = line.replace('bad','CENSORED')
    sys.stdout.write(line)
And use:
test.py file1.txt file2.txt file3.txt
Each file will be edited in place.
I'm supposed to open a file, read it line by line, and display the lines.
Here's the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
csv_read_line = open(in_path, "rb").read().split("\n")
line_number = 0
for line in csv_read_line:
    line_number+=1
    print str(line_number) + line
Here's the contents of the input file:
12345^67890^abcedefg
random^test^subject
this^sucks^crap
And here's the result:
this^sucks^crapjectfg
Some weird combo of all three. In addition to this, the result of line_number is missing. Printing out the result of len(csv_read_line) outputs 1, for some reason, no matter how many lines are in the input file. Changing the split character from \n to ^ gives the expected output, though, so I'm assuming the problem is probably with the input file.
I'm using a Mac, and did both the python code and the input file (on Sublime Text) on the Mac itself.
Am I missing something?
You seem to be splitting on "\n" which isn't necessary, and could be incorrect depending on the line terminators used in the input file. Python includes functionality to iterate over the lines of a file one at a time. The advantages are that it will worry about processing line terminators in a portable way, as well as not requiring the entire file to be held in memory at once.
Further, note that you are opening the file in binary mode (the b character in your mode string) when you actually intend to read the file as text. This can cause problems similar to the one you are experiencing.
Also, you do not close the file when you are done with it. In this case that isn't a problem, but you should get in the habit of using with blocks when possible to make sure the file gets closed at the earliest possible time.
Try this:
with open(in_path, "r") as f:
    line_number = 0
    for line in f:
        line_number += 1
        print str(line_number) + line.rstrip('\r\n')
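The symptoms in the question (a single element after splitting on \n, and lines printed on top of each other) are consistent with a file that uses bare carriage returns as line endings, though that is an inference from the output rather than something the question confirms. A short Python 3 sketch of the difference, using a made-up string in place of the real file:

```python
# Hypothetical file contents with carriage-return-only line endings.
raw = "12345^67890^abcedefg\rrandom^test^subject\rthis^sucks^crap"

# Splitting on "\n" finds no separator, so everything stays in one piece.
pieces = raw.split("\n")

# str.splitlines() recognises \r, \n and \r\n as line boundaries.
lines = raw.splitlines()

for number, line in enumerate(lines, start=1):
    print(number, line)
```

Printing the single unsplit piece to a terminal makes each \r move the cursor back to the start of the line, which would explain the overlapping text.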
So your example just works for me.
But then, I just copied your text into a text editor on Linux and did it that way, so any carriage returns will have been wiped out.
Try this code though:
import os
in_path = "input.txt"
with open(in_path, "rU") as inputFile:  # 'rU' turns on universal newlines in Python 2
    for lineNumber, line in enumerate(inputFile):
        print lineNumber, line.strip()
It's a little cleaner, and the for line in file style deals with line breaks for you in a system-independent way, provided the file is opened with universal newline support ('rU' in Python 2; binary mode 'rb' splits on \n only).
I'd try the following Pythonic code:
#!/usr/bin/env python
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
with open(in_path, 'rb') as f:
    for i, line in enumerate(f):
        print(str(i) + line)
There are several improvements that can be made here to make it more idiomatic python.
import csv
in_path = "../vas_output/Glyph/20140623-FLYOUT_mins_cleaned.csv"
out_path = "../vas_gender/Glyph/"
# Open the file and make sure that it closes when we unindent
with open(in_path,"rb") as input_file:
    # Create a csv reader object that will parse the input for us
    reader = csv.reader(input_file,delimiter="^")
    # Enumerate over the rows (these will be lists of strings) and keep track
    # of the line number using Python's built-in enumerate function
    for line_num, row in enumerate(reader):
        # You can process whatever you would like here. But for now we will just
        # print out what you were originally printing
        print str(line_num) + "^".join(row)
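The code above is Python 2. In Python 3 the csv module expects a text-mode file opened with newline='' rather than binary mode. A sketch of a possible equivalent, with read_caret_rows as an illustrative name and line numbers starting at 1 as an added choice:

```python
import csv

def read_caret_rows(path):
    """Return (line_number, row) pairs from a ^-delimited file."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", newline="") as input_file:
        reader = csv.reader(input_file, delimiter="^")
        return list(enumerate(reader, start=1))
```

Each row comes back as a list of strings, so `"^".join(row)` rebuilds the original line if that is needed.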
When I run the following in the Python IDLE Shell:
f = open(r"H:\Test\test.csv", "rb")
for line in f:
    print line
#this works fine
however, when I run the following for a second time:
for line in f:
    print line
#this does nothing
This does not work because you've already seeked to the end of the file the first time. You need to rewind (using .seek(0)) or re-open your file.
Some other pointers:
Python has a very good csv module. Do not attempt to implement CSV parsing yourself unless doing so as an educational exercise.
You probably want to open your file in 'rU' mode, not 'rb'. 'rU' is universal newline mode, which will deal with source files coming from platforms with different line endings for you.
Use with when working with file objects, since it will clean up the handles for you even in the case of errors. Example:
with open(r"H:\Test\test.csv", "rU") as f:
    for line in f:
        ...
You can read the data from the file into a variable, and then iterate over that data any number of times in your script. This is better than seeking back and forth.
f = open(r"H:\Test\test.csv", "rb")
data = f.readlines()
for line in data:
    print line
for line in data:
    print line
Output:
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
# This is test.csv
Line1,This is line 1, there are, some numbers here,321423423
Line2,This is line2 , there are some characters here,sdfdsfdsf
Because you've gone all the way through the CSV file, and the iterator is exhausted. You'll need to re-open it before the second loop.
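The exhaustion described in these answers can be reproduced without a real CSV file. In the sketch below, io.StringIO stands in for the open file and the two sample lines are made up; a file object returned by open() behaves the same way for iteration and seek().

```python
import io

# io.StringIO stands in for an open file here.
f = io.StringIO("Line1,first\nLine2,second\n")

first_pass = list(f)    # consumes the file up to the end
second_pass = list(f)   # nothing left to read
f.seek(0)               # rewind to the beginning
third_pass = list(f)    # the lines are available again
```

After the first loop the position sits at the end of the file, so the second pass is empty until seek(0) rewinds it.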
I'm very new to programming (obviously) and to really advanced computer stuff in general. I only have basic computer knowledge, so I decided I wanted to learn more. Thus I'm teaching myself (through videos and ebooks) how to program.
Anyways, I'm working on a piece of code that will open a file, print out the contents on the screen, ask you if you want to edit/delete/etc the contents, do it, and then re-print out the results and ask you for confirmation to save.
I'm stuck at printing the contents of the file. I don't know what command to use to do this. I've tried several commands previously, but here is the latest I've tried (and no, the code isn't complete):
from sys import argv
script, filename = argv
print "Who are you?"
name = raw_input()
print "What file are you looking for today?"
file = raw_input()
print (file)
print "Ok then, here's the file you wanted."
print "Would you like to delete the contents? Yes or No?"
I'm trying to write these practice codes to include as much as I've learned thus far. Also I'm working on Ubuntu 13.04 and Python 2.7.4 if that makes any difference. Thanks for any help thus far :)
Opening a file in python for reading is easy:
f = open('example.txt', 'r')
To get everything in the file, just use read()
file_contents = f.read()
And to print the contents, just do:
print (file_contents)
Don't forget to close the file when you're done.
f.close()
Just do this:
>>> with open("path/to/file") as f: # The with keyword automatically closes the file when you are done
...     print f.read()
This will print the file in the terminal.
with open("filename.txt", "r") as file:
    for line in file:
        print line
This with statement automatically opens and closes the file for you, and you can iterate over its lines with a simple for loop. Note the "r" mode: "w+" would empty the file as soon as it is opened.
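The mode passed to open() decides what happens to existing contents, which matters when the goal is only to print a file. A small sketch on a throwaway file (the path comes from tempfile and the two sample lines are invented for the demonstration):

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("first line\nsecond line\n")

# "r" opens for reading and leaves the contents alone.
with open(path, "r") as f:
    read_mode_lines = [line.rstrip("\n") for line in f]

# "w+" allows reading too, but truncates the file as soon as it is opened.
with open(path, "w+") as f:
    write_plus_lines = [line.rstrip("\n") for line in f]

os.remove(path)
```

Here read_mode_lines holds both lines, while write_plus_lines is empty because the data was discarded on open.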
How to read and print the content of a txt file
Assume you have a file called file.txt that you want to read in a program, with this content:
this is the content of the file
with open you can read it and
then with a loop you can print it
on the screen. Using enconding='utf-8'
you avoid some strange convertions of
caracters. With strip(), you avoid printing
an empty line between each (not empty) line
To read this content, write the following script in Notepad:
with open("file.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.strip())
Save it as readfile.py, for example, in the same folder as the txt file.
Then run it (shift + right-click in the folder and select the command prompt from the context menu) by typing at the prompt:
C:\examples> python readfile.py
You should get the output below. Pay attention to the words, which have to be written exactly as you see them, and to the indentation, which is significant in Python. Always use the same indentation within a file (4 spaces is a good choice).
output
this is the content of the file
with open you can read it and
then with a loop you can print it
on the screen. Using enconding='utf-8'
you avoid some strange convertions of
caracters. With strip(), you avoid printing
an empty line between each (not empty) line
To open a file for input:
fin = open(filename) # filename should be a string, e.g. filename = 'file.txt'
To output this file you can do:
for element in fin:
    print element
Since each element is a string that still ends with a newline, you may want to add this before the print:
    element = element.strip()
strip() removes leading and trailing whitespace, including the newline character \n.
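Because strip() trims both ends, it also removes indentation at the start of a line. A quick comparison of the three common calls, using a made-up sample string:

```python
line = "    indented text  \n"

print(repr(line.strip()))        # removes whitespace on both sides
print(repr(line.rstrip("\n")))   # removes only the trailing newline
print(repr(line.rstrip()))       # removes all trailing whitespace
```

When leading spaces carry meaning, rstrip("\n") is the safer choice.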
print ''.join(file('example.txt'))
This will give you the contents of a file as a list, one line per element:
with open('xyz.txt') as f_obj:
    lines = f_obj.readlines()
It's pretty simple
# Open the file
f = open('sample.txt')
# Read everything in the file
r = f.read()
print(r)
# Rewind, then read a single character from the current position
f.seek(0)
r = f.read(1)
print(r)
f.close()
single line to read/print contents of a file
reading file : example.txt
print(open('example.txt', 'r').read())
output:
u r reading the contents of example.txt file
Reading and printing the content of a text file (.txt) in Python3
Consider this as the content of text file with the name world.txt:
Hello World! This is an example of Content of the Text file we are about to read and print
using python!
First we will open this file by doing this:
file= open("world.txt", 'r')
Now we will get the content of file in a variable using .read() like this:
content_of_file= file.read()
Finally we will just print the content_of_file variable using print command.
print(content_of_file)
Output:
Hello World! This is an example of Content of the Text file we are about to read and print
using python!