How to delete non-ASCII characters in a text file?

I have a .log file. I changed its extension to .txt, but it is still read as a log file.
When I copied its contents, pasted them into a new editor, and saved the result as a .txt file, this is what it showed:
Somebody told me that these are non-ASCII characters that I should delete. Is there any way to delete them, or any way to copy the contents of a log file into a text file using Python?

In Python you can specify the input encoding.
with open('trendx.log', 'r', encoding='utf-16le') as reader, \
        open('trendx.txt', 'w') as writer:
    for line in reader:
        if "ROW" in line:
            writer.write(line)
I have obviously copied over some stuff from your earlier questions. Kudos for finally identifying the actual problem.
Notice in particular how we avoid reading the entire file into memory and instead process it one line at a time.
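If the goal really is to drop every non-ASCII character, rather than to decode the file with the right encoding, one option is to round-trip the text through the ASCII codec with errors='ignore'. This is only a minimal sketch, and the helper name strip_non_ascii is made up for illustration:

```python
def strip_non_ascii(text):
    """Return text with every character outside the ASCII range removed."""
    # errors='ignore' silently drops anything the ASCII codec cannot represent.
    return text.encode('ascii', errors='ignore').decode('ascii')


print(strip_non_ascii('ROW 1: caf\u00e9 \u2713'))
```

Note that this would not repair a UTF-16 file that was opened with the wrong encoding, because the stray NUL characters that show up in that case are themselves ASCII. Specifying the input encoding, as above, is the more reliable fix.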

Related

Python script not finding value in log file when the value is in the file

The code below is meant to find any xls or csv file used in a process. The .log file contains full paths with extensions and definitely contains multiple values with "xls" or "csv". However, Python can't find anything...Any idea? The weird thing is when I copy the content of the log file and paste it to another notepad file and save it as log, it works then...
infile=r"C:\Users\me\Desktop\test.log"
important=[]
keep_words=["xls","csv"]
with open(infile,'r') as f:
    for line in f:
        for word in keep_words:
            if word in line:
                important.append(line)
print(important)

I was able to figure it out...encoding issue...
with io.open(infile,encoding='utf16') as f:
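For completeness, here is a sketch of how the whole script might look with that change. The sample file and its contents are invented so that the snippet runs on its own, and it uses any() so that a line matching both words is only added once, which differs slightly from the original loop:

```python
import io
import os
import tempfile

# Build a small UTF-16 sample log so the sketch is self-contained.
# The path and the contents are made up for illustration.
infile = os.path.join(tempfile.mkdtemp(), "test.log")
with io.open(infile, "w", encoding="utf-16") as f:
    f.write(u"C:\\data\\report.xls\nnothing here\nC:\\data\\export.csv\n")

important = []
keep_words = ["xls", "csv"]
# Decoding as UTF-16 yields ordinary text, so substring tests work again.
with io.open(infile, encoding="utf-16") as f:
    for line in f:
        if any(word in line for word in keep_words):
            important.append(line)

print(important)
```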

You can change the line
for line in f:
to
for line in f.readlines():
This reads all the lines into a list before looping over them. Iterating over f directly yields the same lines, though, so the search only succeeds once the file is opened with the right encoding.
I hope I was able to help (sorry about my bad English).

Python: Putting Character in front or behind a file

I want to write a couple characters into a file where there is already text inside. What would be the code to add characters to the front of the file and to the back of the text file if I want the text that was initially in the file to remain in the center?

To add some text to the end of your file, simply open it in append mode and then write to it as usual.
open('file.txt', 'a')
If you want to add something to the beginning of the file and you don't mind temporarily loading its contents into memory, you can do this:
addedText = 'Hello World!'
with open('file.txt', 'r+') as myFile:
    filecontents = myFile.read()
    myFile.seek(0, 0)
    # Write the new text first, then the original contents after it.
    myFile.write(addedText.rstrip('\r\n') + '\n' + filecontents)

When you want to open a file and keep its content, you have to open it in append mode. Also have a look at:
file.seek (can be used to set the file's current position)

There is no function in any known underlying file system that allows inserting bytes into a file. You can only:
add bytes (characters) at the end of the file (append mode)
rewrite bytes in place anywhere in the file
truncate a file at the current position.
So if you want to add anything other than at the end of the file, the common way (used by many text editors) is:
rename the old file to a temporary name (this is known as a backup copy)
create a new file with the original name and write what you want to it (here the prefix, the original content and the suffix)
(optionally) delete the backup copy.
That way you can recover your file even if something goes wrong while writing the new copy: you can at least get the previous copy back and redo your edit.
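The rename-and-rewrite steps above can be sketched roughly as follows. The function name wrap_file and the ".bak" suffix are assumptions made for this example, and the demo works on a throwaway temporary file:

```python
import os
import tempfile


def wrap_file(path, prefix, suffix):
    """Rewrite path so its old content sits between prefix and suffix."""
    backup = path + ".bak"
    # Step 1: keep the old content under a backup name.
    os.replace(path, backup)
    # Step 2: recreate the original name with prefix + old content + suffix.
    with open(backup, "r") as src, open(path, "w") as dst:
        dst.write(prefix)
        for line in src:
            dst.write(line)
        dst.write(suffix)
    # Step 3 (optional): drop the backup once the new file is written.
    os.remove(backup)


# Demo on a throwaway file.
demo = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(demo, "w") as f:
    f.write("middle\n")
wrap_file(demo, "front\n", "back\n")
with open(demo) as f:
    result = f.read()
print(result)
```

Be aware that os.replace overwrites an existing backup of the same name, so a real tool would probably want a more careful naming scheme.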

Convert shift_jis to utf-8

I have a bunch of txt files that are encoded in shift_jis. I want to convert them to utf-8 so that the special characters display properly. This has probably been asked before, but I can't seem to get it right.
Update: I changed my code so that it first reads the content into a list and then writes the content from the list.
words = []
with codecs.open("dummy.txt", mode='r+', encoding='shiftjis') as file:
    words = file.read()
    file.seek(0)
    for line in words:
        file.write(line.encode('utf-8'))
However, now I get a runtime error and the program just crashes.
Upon further investigation, it seems that "file.seek(0)" causes the crash. The program runs without error if this line is commented out. I don't know why. How is it causing errors?

You can't read from and write to the same file at the same time like this. That's why it's not working. Input and output are buffered, and reads and writes share the same file pointer, so it's hard to predict what will happen. You either need to write the output to a different file, or read the entire file into memory, close it, reopen it and write it back out.
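import codecs

# Read the whole file, decoding it from Shift_JIS.
with codecs.open("dummy.txt", mode='r', encoding='shiftjis') as file:
    text = file.read()
# Reopen the same file for writing, this time encoding as UTF-8.
with codecs.open("dummy.txt", mode='w', encoding='utf-8') as file:
    file.write(text)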
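The other option mentioned, writing the output to a different file, avoids holding the whole file in memory. A minimal sketch, assuming Python 3 (where the built-in open accepts an encoding argument); the function name convert_to_utf8 and the output file name are made up for illustration:

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="shift_jis"):
    """Decode src_path with src_encoding and write it to dst_path as UTF-8."""
    # newline="" on both sides leaves the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demo: create a small Shift_JIS file, then convert it.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "dummy.txt")
dst = os.path.join(workdir, "dummy_utf8.txt")
with open(src, "w", encoding="shift_jis") as f:
    f.write("日本語のテキスト\n")
convert_to_utf8(src, dst)
with open(dst, encoding="utf-8") as f:
    converted = f.read()
```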

Script to search and replace strings in a file

I'm new to Python and am struggling to understand why this program
#!/usr/bin/env python
infile = open('/usr/src/scripts/in_file.conf')
outfile = open('/usr/src/scripts/in_file.conf', 'w')
replacements = {'abcd':'ABCD', '1234':'bob'}
for line in infile:
    for src, target in replacements.items():
        line = line.replace(src, target)
    outfile.write(line)
infile.close()
outfile.close()
results in a blank file after script execution.
The original in_file.conf is:
testfile of junk
abcd
******************
1234
*************
Correct me if I'm wrong, but it is my understanding that the script opens in_file.conf and loads the contents into two temporary files in memory, infile and outfile. The dictionary variable replacements acts like an array that holds the "to find" and "to replace" strings.
It loops over each line, then a nested loop goes down the line and loads the variables src and target with the contents of the replacements variable (like an array), then writes the line, until all the lines are written.
Am I way off in my understanding?
The in_file.conf is in the same directory as the script. Could it just not be finding in_file.conf and writing a blank file?
I told you I was new to Python.
Kind Regards,
Reggie.

The problem is that you're opening the same file in read mode and then in write mode (which truncates the file). You should ideally have a different file for the output, but if you need the output to be in the same file, you can delete the old file and rename the new one afterwards.

Please use different files for infile and outfile. Opening a file in write mode deletes its contents. Because your infile and outfile are the same file, its contents are deleted and the body of your for loop never runs.
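One way to follow that advice while still ending up with the result under the original name is to write to a temporary file and swap it in afterwards. A sketch only: the function name replace_in_file and the ".tmp" suffix are assumptions for this example, and os.replace performs the "delete the old file and rename the new one" step in a single call:

```python
import os
import tempfile


def replace_in_file(path, replacements):
    """Apply each src -> target replacement to every line of path."""
    tmp_path = path + ".tmp"
    # Read from the original and write to a separate temporary file.
    with open(path, "r") as infile, open(tmp_path, "w") as outfile:
        for line in infile:
            for src, target in replacements.items():
                line = line.replace(src, target)
            outfile.write(line)
    # Only now swap the finished copy in place of the original.
    os.replace(tmp_path, path)


# Demo on a throwaway copy of the sample config.
conf = os.path.join(tempfile.mkdtemp(), "in_file.conf")
with open(conf, "w") as f:
    f.write("testfile of junk\nabcd\n1234\n")
replace_in_file(conf, {"abcd": "ABCD", "1234": "bob"})
with open(conf) as f:
    result = f.read()
print(result)
```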

How to delete a line from a file in Python

I tried to use this: Deleting a line from a file in Python. I changed the code a bit to suit my purposes, like so:
def deleteLine():
    f=open("cata.testprog","w+")
    content=f.readlines()
    outp = []
    print(f.readline())
    for lines in f:
        print(lines)
        if(lines[:4]!="0002"):
            #if the first four characters of a line do not equal
            outp.append(lines)
    print(outp)
    f.writelines(outp)
    f.close()
The problem is that the entire file gets replaced by an empty string and outp is equal to an empty list. My file is not empty, and I want to delete the line that begins with "0002". Printing f.readline() gave me an empty string. Does anyone have any idea of how to fix it?
Thanks in advance,

You changed the file handling completely compared to the linked question. There, the file is first opened to read the content, closed and then opened to write the processed content. Here you open it and you first use f.readlines(), so all the data of the file is in content and the file pointer is at the end of the file.

You have opened the file in the wrong mode: "w+". This mode allows both writing and reading, but it truncates the file first. Open the file for reading first. If the file is not huge, read it into memory, make your changes there, and then save the result by opening the file in write mode.
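A minimal sketch of that read-then-rewrite approach, assuming the file fits in memory. The function name delete_lines_starting_with is made up for this example, and the demo runs against a temporary file rather than the real cata.testprog:

```python
import os
import tempfile


def delete_lines_starting_with(path, prefix):
    """Remove every line of path that begins with prefix."""
    # First pass: open for reading only and load the lines into memory.
    with open(path, "r") as f:
        lines = f.readlines()
    kept = [line for line in lines if not line.startswith(prefix)]
    # Second pass: "w" truncates the file, which is safe now that we hold the data.
    with open(path, "w") as f:
        f.writelines(kept)


# Demo on a throwaway file.
demo = os.path.join(tempfile.mkdtemp(), "cata.testprog")
with open(demo, "w") as f:
    f.write("0001 first\n0002 second\n0003 third\n")
delete_lines_starting_with(demo, "0002")
with open(demo) as f:
    remaining = f.read()
print(remaining)
```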
