Processing Russian text file fails

I have this code:
# -*- coding: utf-8 -*-
import codecs
prefix = u"а"
rus_file = "rus_names.txt"
output = "rus_surnames.txt"
with codecs.open(rus_file, 'r', 'utf-8') as infile:
    with codecs.open(output, 'a', 'utf-8') as outfile:
        for line in infile.readlines():
            outfile.write(line+prefix)
This produces something like Chinese text in the output file. Even when I only call outfile.write(line), I get the same garbage in the output. I just don't get it.
The purpose: I have a huge file with male surnames and need to produce the same file with female surnames. In Russian it looks like this: Ivanov - Ivanova | Иванов - Иванова

Try
lastname = str(line+prefix, 'utf-8')
outfile.write(lastname)

So #AndreyAtapin was partially right. I had been appending lines to a file that still contained my previous mistakes with the Chinese-looking characters, and even flushing the file didn't help. But when I deleted it and let the script create it again, it worked! Thanks.
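
A minimal sketch of the approach that ended up working, assuming Python 3 (or 2.6+) and a UTF-8 input file: write to a fresh file with mode 'w' instead of appending to one that already holds wrongly encoded bytes, and strip the newline before adding the suffix so that "а" lands at the end of the surname rather than at the start of the next line. The function name feminize_surnames and the constant SUFFIX are hypothetical.

```python
import io

SUFFIX = u"а"  # Cyrillic "а" that turns Иванов into Иванова


def feminize_surnames(src_path, dst_path, suffix=SUFFIX):
    """Copy surnames from src_path to dst_path, appending suffix to each one."""
    # Mode 'w' truncates dst_path, so bytes left over from earlier runs are discarded.
    with io.open(src_path, 'r', encoding='utf-8') as infile, \
            io.open(dst_path, 'w', encoding='utf-8') as outfile:
        for line in infile:
            surname = line.strip()
            if surname:  # skip blank lines
                outfile.write(surname + suffix + u"\n")
```

Called as feminize_surnames("rus_names.txt", "rus_surnames.txt"), this rewrites the output file from scratch on every run.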

Related

How to display cyrillic text from a file in python?

I want to read some Cyrillic text from a txt file in Python 3.
This is what the text file contains.
абцдефгчийклмнопярстувшхыз
I used:
with open('text.txt', 'r') as myfile:
    text=myfile.read()
print (text)
But this is the output in the Python shell:
ÿþ01F45D3G89:;<=>?O#ABC2HEK7
Can someone explain why this is the output?

Python supports utf-8 for this sort of thing.
You should be able to do:
with open('text.txt', encoding = 'utf-8', mode = 'r') as my_file:
    ...
Also, be sure that your text file is saved with utf-8 encoding. I tested this in my shell and without proper encoding my output was:
?????????????????????
With proper encoding:
file = open('text.txt', encoding='utf-8', mode='r')
text = file.read()
print(text)
абцдефгчийклмнопярстувшхы

Try working on the file using codecs. You need to
import codecs
and then do
text = codecs.open('text.txt', 'r', 'utf-8').read()
Basically, you need to tell Python that the file is UTF-8.
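
The ÿþ at the start of the garbled output in the question is what the UTF-16 little-endian byte order mark (bytes FF FE) looks like when it is decoded with a single-byte Western encoding. That suggests the file was saved as UTF-16 rather than UTF-8, in which case encoding='utf-16' is the one to pass. A small sketch, assuming that is the situation; the helper name read_text_guessing_bom is hypothetical:

```python
import codecs


def read_text_guessing_bom(path, default='utf-8'):
    """Read a text file, choosing the encoding from its byte order mark if present."""
    with open(path, 'rb') as f:
        raw = f.read()
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode('utf-8-sig')   # strips the UTF-8 BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode('utf-16')      # the BOM selects the byte order and is stripped
    # No BOM found (UTF-32 is not handled in this sketch): fall back to the default.
    return raw.decode(default)
```

If the file is known to be UTF-16, open('text.txt', encoding='utf-16') does the same job without the helper.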

python read from chinese file err

I am trying to read from a file containing Chinese text,
but it doesn't run normally.
The code is below:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
#read from file
file=open('temp2','rb',encoding='utf-8')
lines=file.readlines()
for line in lines:
    print(line)
file.close()
The file content is below:
http://www.sina.com.cn/intro/copyright.shtml
新浪新闻
国内、国际。
国内、国际。

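You can write utf-8 as utf8 if you like, but both spellings name the same codec, so the hyphen is not the problem.
The actual error is the mode: in Python 3, open() raises a ValueError when an encoding is passed together with a binary mode, so change rb to r to read the file as text.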
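
For what it is worth, Python accepts both 'utf-8' and 'utf8' as names for the same codec; in Python 3 the call in the question fails because an encoding cannot be combined with a binary mode. A short sketch, assuming Python 3 and a UTF-8 file (a temporary file stands in for temp2):

```python
import os
import tempfile

# Create a small UTF-8 file to stand in for the question's 'temp2'.
path = os.path.join(tempfile.mkdtemp(), 'temp2')
with open(path, 'w', encoding='utf-8') as f:
    f.write('新浪新闻\n国内、国际。\n')

# Binary mode plus an encoding is rejected before anything is read.
try:
    open(path, 'rb', encoding='utf-8')
    error = None
except ValueError as exc:
    error = exc

# Text mode with an explicit encoding decodes each line to str.
with open(path, 'r', encoding='utf-8') as f:
    lines = [line.rstrip('\n') for line in f]

# The spelling without the hyphen resolves to the same codec.
with open(path, 'r', encoding='utf8') as f:
    same_lines = [line.rstrip('\n') for line in f]
```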

Korean txt file encoding with utf-8

I'm trying to process a Korean text file with python, but it fails when I try to encode the file with utf-8.
#!/usr/bin/python
#-*- coding: utf-8 -*-
f = open('tag.txt', 'r', encoding='utf=8')
s = f.readlines()
z = open('tagresult.txt', 'w')
y = z.write(s)
z.close
=============================================================
Traceback (most recent call last):
  File "C:\Users\******\Desktop\tagging.py", line 5, in <module>
    f = open('tag.txt', 'r', encoding='utf=8')
TypeError: 'encoding' is an invalid keyword argument for this function
[Finished in 0.1s]
==================================================================
And when I just open a Korean txt file encoded with utf-8, the characters come out broken like this. What can I do?
\xc1\xc1\xbe\xc6\xc1\xf6\xb4\xc2\n',
'\xc1\xc1\xbe\xc6\xc7\xcf\xb0\xc5\xb5\xe7\xbf\xe4\n',
'\xc1\xc1\xbe\xc6\xc7\xcf\xbd\xc3\xb4\xc2\n',
'\xc1\xcb\xbc\xdb\xc7\xd1\xb5\xa5\xbf\xe4\n',
'\xc1\xd6\xb1\xb8\xbf\xe4\

I don't know Korean and don't have a sample string to try, but here is some advice:
1
f = open('tag.txt', 'r', encoding='utf=8')
You have a typo here: it should be utf-8, not utf=8. On its own that typo would raise a LookupError for an unknown encoding; the TypeError in your traceback means you are running Python 2, where the built-in open() does not accept an encoding argument at all.
The default mode of open() is 'r', so you don't have to specify it again.
2 Don't just call open; use a context manager (the with statement) to manage opening and closing the file. Since s is a list of lines, write it with writelines:
with open('tagresult.txt', 'w') as f:
    f.writelines(s)

In Python 2 the open function does not take an encoding parameter. Instead you read a line and convert it to unicode. This article on kitchen (as in kitchen sink) modules provides details and some lightweight utilities to work with unicode in python 2.x.
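
The byte values shown in the question (\xc1\xc1\xbe\xc6 and so on) are not valid UTF-8, since the byte 0xC1 never appears in UTF-8; they do fall in the range that EUC-KR/CP949 uses for Hangul. Assuming the file really is CP949 (a guess from those bytes, not something the question confirms), io.open, which accepts an encoding argument on both Python 2.6+ and Python 3, can read it and re-save it as UTF-8. The function name convert_to_utf8 is hypothetical.

```python
import io


def convert_to_utf8(src_path, dst_path, src_encoding='cp949'):
    """Re-encode a text file from src_encoding (assumed CP949) to UTF-8."""
    with io.open(src_path, 'r', encoding=src_encoding) as src:
        lines = src.readlines()      # list of unicode strings
    with io.open(dst_path, 'w', encoding='utf-8') as dst:
        dst.writelines(lines)        # writelines accepts a list; write does not
    return lines
```

Something like convert_to_utf8('tag.txt', 'tagresult.txt') would then replace the read and write steps in the question.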

How to convert hex string to plain text?

How to read a hex file and convert it to plain text?
For example, this is my file user.dat. (For China mainland: user.dat)
And here is what I have tried so far:
# -*- coding:utf-8 -*-
with open('user.dat','rb') as f:
    data = f.read()
print data
And the result is like this. Some of it is right, while some is not.
How can I get the entire content right?

Just add this line to your code; in Python 2, str.decode('hex') decodes a hex string into plain text.
output = data.decode('hex')
print output
If that gives you an error, try this instead, which prints the raw bytes of the file as a hex string:
import binascii
with open('user.dat', 'rb') as f:
    data = f.read()
print(binascii.hexlify(data))
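
Note that data.decode('hex') only exists in Python 2. A rough Python 3 sketch of both directions, using made-up sample bytes rather than the contents of user.dat (which are not shown here) and assuming the decoded data is UTF-8 text:

```python
import binascii

raw = '新浪新闻'.encode('utf-8')          # sample bytes, standing in for f.read()

hex_text = raw.hex()                      # bytes -> hex string
same_hex = binascii.hexlify(raw).decode('ascii')   # equivalent via binascii

restored = bytes.fromhex(hex_text)        # hex string -> bytes
text = restored.decode('utf-8')           # bytes -> text, if the data is UTF-8
```

If only part of the real file decodes cleanly, it probably mixes text with binary fields, and no single decode call will turn all of it into readable text.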

Confusing Error when Reading from a File in Python

I'm having a problem opening the names.txt file. I have checked that I am in the correct directory. Below is my code:
import os
print(os.getcwd())
def alpha_sort():
    infile = open('names', 'r')
    string = infile.read()
    string = string.replace('"','')
    name_list = string.split(',')
    name_list.sort()
    infile.close()
    return 0
alpha_sort()
And the error I got:
FileNotFoundError: [Errno 2] No such file or directory: 'names'
Any ideas on what I'm doing wrong?

You mention in your question body that the file is "names.txt", however your code shows you trying to open a file called "names" (without the ".txt" extension). (Extensions are part of filenames.)
Try this instead:
infile = open('names.txt', 'r')

As a side note, in Python 2 it is a good idea to open text files in universal newlines mode, because Windows and Mac/Unix represent line endings differently (\r\n vs \n, etc.). Universal mode lets Python handle this for you. In Python 3, text mode does this translation by default, and the 'U' flag was removed in Python 3.11.
So in Python 2 the code would look like this:
infile = open('names.txt', 'rU')  # capital U opens the file in universal newlines mode
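
In Python 3, plain text mode already translates \r\n and \r to \n when reading, so no extra flag is needed there. A small sketch, assuming Python 3; a temporary file with invented contents stands in for names.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'names.txt')
with open(path, 'wb') as f:
    f.write(b'"MARY","PATRICIA"\r\n"LINDA"\r\n')   # Windows-style line endings

with open(path, 'r') as f:               # default newline=None translates line endings
    translated = f.read()

with open(path, 'r', newline='') as f:   # newline='' leaves line endings untouched
    untranslated = f.read()
```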

This doesn't solve that issue, but you might consider using with when opening files:
def alpha_sort():
    with open('names', 'r') as infile:
        string = infile.read()
    string = string.replace('"','')
    name_list = string.split(',')
    name_list.sort()
    return 0
This closes the file for you, even if an exception is raised inside the block.
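
Putting the suggestions in this thread together, here is a sketch of alpha_sort that returns the sorted list instead of discarding it. It assumes the file holds comma-separated, double-quoted names, as the replace and split calls in the question imply; the path parameter and its default are additions for illustration.

```python
def alpha_sort(path='names.txt'):
    """Return the names in a comma-separated, double-quoted file, sorted."""
    with open(path, 'r') as infile:
        text = infile.read()
    # Drop surrounding whitespace and quotes from each comma-separated field.
    names = [name.strip().strip('"') for name in text.split(',')]
    return sorted(name for name in names if name)
```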
