How to convert _io.TextIOWrapper to string? - python

I read a text file using the code below:
f = open("document.txt", "r+", encoding='utf-8-sig')
f.read()
But the type of f is _io.TextIOWrapper, and I need a string to move on.
Please help me convert _io.TextIOWrapper to a string.

You need to use the output of f.read().
string = f.read()
I think the confusion is the expectation that f turns into a string when you call its .read() method, but that's not the case: .read() returns a new string and leaves f unchanged. I don't think it's even possible for builtins to do that.
For reference, _io.TextIOWrapper is the class of an open text file. See the documentation for io.TextIOWrapper.
By the way, best practice is to use a with-statement for opening files:
with open("document.txt", "r", encoding='utf-8-sig') as f:
    string = f.read()
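To make the difference between the file object and its contents concrete, here is a minimal sketch. The temporary file and the variable names file_type and text are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "document.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")

with open(path, "r", encoding="utf-8-sig") as f:
    file_type = type(f).__name__  # type of the file object itself
    text = f.read()               # the contents, returned as a str

print(file_type)            # TextIOWrapper
print(type(text).__name__)  # str
```

The file object stays a TextIOWrapper; only the value returned by read() is a string.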

This is good:
with open(file, 'r', encoding='utf-8-sig') as f:
    data = f.read()
This is not good, because it rebinds the name file, which held the path, to the file object:
with open(file, 'r', encoding='utf-8-sig') as file:
    data = file.read()

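It's not a super elegant solution, but it works for me:
def extractPath(innie):
    # str(innie) looks like "<_io.TextIOWrapper name='...' mode='r' encoding='UTF-8'>"
    iggy = str(innie)
    getridofme = "<_io.TextIOWrapper name='"
    getridofmetoo = "' mode='r' encoding='UTF-8'>"
    # Strip the fixed prefix and suffix, leaving only the file name.
    iggy = iggy.replace(getridofme, "")
    iggy = iggy.replace(getridofmetoo, "")
    #iggy.trim()
    print(iggy)
    return iggy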
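If the goal is only to recover the file name from an open file object, the object's name attribute may be a simpler route than parsing its repr. A minimal sketch, with a temporary file invented for the example:

```python
import os
import tempfile

# Throwaway file so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("data")

with open(path, "r", encoding="utf-8") as f:
    name = f.name  # the path that was passed to open()

print(name == path)  # True
```

Unlike the string-replacement approach, this does not depend on the mode or encoding that happen to appear in the repr.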

Related

How to substitute a variable for a string when using the open() function in python

I am a beginner at Python and coding in general, and I am trying to pass a variable as an argument to the open() function.
Normally I would write something like this:
f = open("text.txt" , "r")
print(f.read())
However, I would like to do something along these lines:
var = "text.txt"
f = open("var", "r")
print(f.read())
Any explanations or resources would be very helpful. Thanks in advance.
f = open("var", "r")
is wrong: "var" in quotes is the literal string "var", so Python looks for a file named var. var is a variable, so pass it without quotes:
f = open(var, "r")
#################################################
# Generate two files to use in the demonstration.
#################################################
filename_1 = 'test1.txt'
filename_2 = 'test2.txt'
with open(filename_1, 'w') as f_out:
    test_text = '''
I am beginner to python and coding in general, and am attempting to pass a
\nvariable as an argument to the open() function.
'''
    f_out.write(test_text)
with open(filename_2, 'w') as f_out:
    test_text = '''
How to substitute a variable for a string when using the open() function in python
'''
    f_out.write(test_text)
#############################################
# Demonstrate opening files using a variable.
#############################################
with open(filename_1, 'r') as f_in:
    print(f_in.read())
with open(filename_2, 'r') as f_in:
    print(f_in.read())

Cannot save string to file with `\n` characters

The following code produces a file with content test\\nstring, but I need the file to contain test\nstring. I can't figure out a way to replace the \\ symbol either.
s = "test\nstring"
with open('test.txt', 'w') as f:
f.write(s)
How can I make sure that the file contains only \n instead of \\n?
Use s = "test\\nstring".
I tried the following code and it worked:
s = "test\\nstring"
with open('test.txt', 'w') as f:
    f.write(s)
and the test.txt file contains
test\nstring
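Besides escaping and raw strings, you can encode it. In Python 2, use the 'string_escape' codec. That codec does not exist in Python 3, so use 'unicode_escape' there and decode the resulting bytes back to str:
s = "test\nstring".encode('unicode_escape').decode('ascii')
with open('test.txt', 'w') as f:
    f.write(s)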
A raw string may help:
s = r"test\nstring"
with open('test.txt', 'w') as f:
    f.write(s)
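The three spellings discussed above can be compared directly. This is a small sketch; the variable names are chosen only for illustration.

```python
newline = "test\nstring"    # \n is one character: a real line break
escaped = "test\\nstring"   # \\ is one backslash, followed by the letter n
raw = r"test\nstring"       # raw string: backslash and n are kept as typed

print(len(newline))    # 11
print(len(escaped))    # 12
print(escaped == raw)  # True
print(repr(newline))   # 'test\nstring'
print(escaped)         # test\nstring
```

Writing newline to a file produces two lines, while writing escaped or raw produces the single line test\nstring.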

How to find byte at specific index from file?

I need to print the byte at a specific position in a file whose path I know. So I open the file in "rb" mode, and then I need to know which byte is at position 15. Is it possible?
Here's how you can achieve this with seek:
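with open('my_file', 'rb') as f:
    # Offsets are zero-based, so offset 15 is the 16th byte; use f.seek(14) for the 15th.
    f.seek(15)
    print(f.read(1))  # read(1) returns a bytes object of length 1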
Another way you could do this is to read the entire file and index into it.
First read the contents of the file:
file = open('test.txt', 'rb')
a = file.read()
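Then take the desired value (in Python 3, indexing a bytes object gives an int; use a[14:15] if you want a one-byte bytes object):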
b = a[14]
Then don't forget to close the file afterwards:
file.close()
Or, so that it closes automatically:
with open('test.txt', 'rb') as file:
    a = file.read()
b = a[14]
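To compare the two approaches side by side, here is a minimal sketch. It assumes a throwaway file whose byte at index i has the value i, so the results are easy to check; the file and variable names are invented for the example.

```python
import os
import tempfile

# Test file: the byte at index i has the value i.
path = os.path.join(tempfile.mkdtemp(), "bytes.bin")
with open(path, "wb") as f:
    f.write(bytes(range(32)))

# Approach 1: seek to the offset and read a single byte.
with open(path, "rb") as f:
    f.seek(14)
    one = f.read(1)      # bytes object of length 1

# Approach 2: read everything, then index or slice.
with open(path, "rb") as f:
    data = f.read()
as_int = data[14]        # indexing bytes gives an int in Python 3
as_bytes = data[14:15]   # slicing gives a bytes object

print(one, as_int, as_bytes)  # b'\x0e' 14 b'\x0e'
```

seek avoids loading the whole file into memory, which matters for large files.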

How can I replace UTF chars using Python?

I'm having a bad time with character encoding. It's kind of hard to understand why this happens when I open my .txt file:
Questions:
What type of encoding is this? Why does this happen?
How can I rewrite my txt file to use normal accents or even without accents and special chars?
Is there any special library to handle this? I could create a huge function that will replace() all these chars, but I don't know when or which chars will appear in my future txts.
My code:
folder = 'E:\\WinPython\\notebooks\\scripts\\script1\\'
txtFile = folder + 'PROF_SAI_318_210117_310117_orig.txt'
with open(txtFile, 'r') as f:
    with open('PROF_SAI_318_210117_310117_clean.txt', 'w') as g:
        for line in f:
            do_something() # what should I write here to 'clean' my file?
            g.write(line)
print("Ok!")
Output excerpt:
SPLEONARDO SIM\xc3\x83O ESTARLING
GOFLORESTA S/A A\xc3\x87UCAR E ALCOOL
SPFOCO REPRESENTA\xc3\x87\xc3\x95ES E CONSULTORIA
It looks like you are using Notepad++ to display your file. The encoding displayed looks like cp1252:
>>> b'COMUNICA\xc7\xc3O M\xc1QUINAS'.decode('cp1252')
'COMUNICAÇÃO MÁQUINAS'
In Notepad++, on the menu select Encoding->Character sets->Western European->Windows-1252 and your file should display correctly.
Here's an example that converts to UTF-8 (your output excerpt):
>>> b'SPLEONARDO SIM\xc3O ESTARLING'.decode('cp1252')
'SPLEONARDO SIMÃO ESTARLING'
>>> b'SPLEONARDO SIM\xc3O ESTARLING'.decode('cp1252').encode('utf8')
b'SPLEONARDO SIM\xc3\x83O ESTARLING'
For your example code, you can do:
with open(txtFile, 'r', encoding='cp1252') as f:
    with open('PROF_SAI_318_210117_310117_clean.txt', 'w', encoding='utf8') as g:
        for line in f:
            g.write(line)
If your files aren't too large, you can just do:
with open(txtFile, 'r', encoding='cp1252') as f:
    with open('PROF_SAI_318_210117_310117_clean.txt', 'w', encoding='utf8') as g:
        g.write(f.read())
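The question also asks about writing the text without accents. One possible approach, sketched here as an assumption rather than as part of the answer above, is to decompose each character with unicodedata.normalize and drop the combining marks. The helper strip_accents and the temporary file names are hypothetical.

```python
import os
import tempfile
import unicodedata

def strip_accents(text):
    """Return text with combining accent marks removed (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "orig.txt")
dst = os.path.join(workdir, "clean.txt")

# Sample input stored as cp1252 bytes, as in the answer's example.
with open(src, "wb") as f:
    f.write(b"COMUNICA\xc7\xc3O M\xc1QUINAS\n")

# Decode as cp1252, strip accents, and write the result as UTF-8.
with open(src, "r", encoding="cp1252") as f, open(dst, "w", encoding="utf8") as g:
    for line in f:
        g.write(strip_accents(line))

with open(dst, "r", encoding="utf8") as f:
    result = f.read()

print(result)  # COMUNICACAO MAQUINAS
```

This only removes marks that decompose cleanly; characters with no accent-free decomposition pass through unchanged.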

Python 3 Replace Lines

I have a little trouble with Python. Here is the code:
f = open('/path/to/file', 'r')
filedata = f.read()
f.close()
postgres = filedata.replace('# DBENGINE=MYSQL', 'DBENGINE=PGSQL')
dbname = filedata.replace('# DBNAME=DB1', 'DBNAME=DB1')
dbrwuser = filedata.replace('# DBRWUSER="user1"', 'DBRWUSER="user1"')
f = open('/path/to/file', 'w')
f.write(postgres)
f.write(dbname)
f.write(dbrwuser)
f.close()
As you can see, I'm trying to read a big file and replace several strings in it, but only the "postgres" replacement takes effect; "dbname", "dbrwuser", etc. are not changed. I tried to figure it out but couldn't.
Any ideas or samples?
Thanks.
You make three copies of the input instead of replacing it each time. Use the following:
filedata = filedata.replace('# DBENGINE=MYSQL', 'DBENGINE=PGSQL')
filedata = filedata.replace('# DBNAME=DB1', 'DBNAME=DB1')
filedata = filedata.replace('# DBRWUSER="user1"', 'DBRWUSER="user1"')
...
f.write(filedata)
You can also do it by putting all the replacement strings into a dictionary:
import re
repString = {'# DBENGINE=MYSQL': 'DBENGINE=PGSQL', '# DBNAME=DB1': 'DBNAME=DB1', '# DBRWUSER="user1"': 'DBRWUSER="user1"'}
# Escape the keys so they are matched literally (use .items() in Python 3, not .iteritems()).
repString = dict((re.escape(k), v) for k, v in repString.items())
pattern = re.compile("|".join(repString.keys()))
# Look up each match by its escaped form, which is how the keys are now stored.
filedata = pattern.sub(lambda m: repString[re.escape(m.group(0))], filedata)
f = open('/path/to/file', 'w')
f.write(filedata)
f.close()
A few suggestions and clarifications:
f.read() reads the entire file. This is probably not a good idea for large files. Instead, use
with open(filename, "r") as f:
    for line in f:
        # do something with the line
Using with open() also eliminates the need for closing the file afterwards - it's done automatically.
str.replace() returns a new string in which every occurrence of the first argument is replaced by the second; the original string is left unchanged. Since you make a new variable every time you use replace, the changes only apply to the individual variables. Changes made in postgres will not exist in dbname.
Instead, reassign the variable filedata for every replace so that the changes accumulate:
filedata = filedata.replace('# DBENGINE=MYSQL', 'DBENGINE=PGSQL')
filedata = filedata.replace('# DBNAME=DB1', 'DBNAME=DB1')
filedata = filedata.replace('# DBRWUSER="user1"', 'DBRWUSER="user1"')
# at this point, filedata contains all three changes
When you open a file for writing using the w option, the file is truncated when it is opened, and each write() then adds to what has already been written. This means your three writes put three full copies of the file one after another, each with only one of the replacements applied. Instead, make all your changes and write only once:
filedata = filedata.replace('# DBENGINE=MYSQL', 'DBENGINE=PGSQL')
...
...
with open('/path/to/file', 'w') as f:
    f.write(filedata)
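For large files, the suggestions above can be combined: process the file line by line, apply every replacement to each line, and write the result once. The following is a minimal sketch under stated assumptions; the sample settings file, the temporary output path, and the use of os.replace to swap the files are choices made for this example, not part of the original answers.

```python
import os
import tempfile

replacements = {
    '# DBENGINE=MYSQL': 'DBENGINE=PGSQL',
    '# DBNAME=DB1': 'DBNAME=DB1',
    '# DBRWUSER="user1"': 'DBRWUSER="user1"',
}

# Sample input file (hypothetical contents) so the example is runnable.
path = os.path.join(tempfile.mkdtemp(), "settings.conf")
tmp_path = path + ".tmp"
with open(path, "w") as f:
    f.write('# DBENGINE=MYSQL\n# DBNAME=DB1\n# DBRWUSER="user1"\nOTHER=1\n')

# Read one line at a time and apply every replacement to it.
with open(path, "r") as src, open(tmp_path, "w") as dst:
    for line in src:
        for old, new in replacements.items():
            line = line.replace(old, new)
        dst.write(line)

# Swap the rewritten file into place.
os.replace(tmp_path, path)

with open(path, "r") as f:
    result = f.read()
print(result)
```

Writing to a separate file and swapping it in afterwards means the original is never half-written if something fails midway.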
