Writing Russian characters to txt file using python [duplicate] - python

This question already has answers here:
Writing Unicode text to a text file?
(8 answers)
Closed 6 years ago.
I am trying to write a list of Russian strings to a txt file. (I get the list with unique1 = np.unique(df['search_term']); it is a numpy.ndarray.)
thefile = open('search_term.txt', 'w')
for item in unique1:
    thefile.write("%s\n" % item)
In the list the strings look correct, but after writing them to the file they look like this:
предметов berger bg bg045-14 отзывы
звезд
воронеж
Why do I get that?

Try writing to the file like this:
import codecs
thefile = codecs.open('search_term.txt', 'w', encoding='utf-8')
for item in unique1:
    thefile.write("%s\n" % item)
The problem is most likely that the file is not written in the encoding it is later read with, which is why the characters display incorrectly. Opening it explicitly as UTF-8 avoids that.
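On Python 3 the built-in open() accepts an encoding argument, so codecs is not strictly needed. A minimal sketch, assuming Python 3; the helper name write_lines_utf8 is made up for illustration and is not part of the original answer:

```python
def write_lines_utf8(path, items):
    """Write each item on its own line, always encoding the file as UTF-8."""
    # The with-block also closes the file, which flushes buffered text to disk.
    with open(path, "w", encoding="utf-8") as thefile:
        for item in items:
            thefile.write("%s\n" % item)
```

Any iterable of strings works here, so the numpy array from the question could be passed as write_lines_utf8('search_term.txt', unique1).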

Related

how to check if text file contains string? [duplicate]

This question already has answers here:
How to search for a string in text files?
(13 answers)
Closed 12 months ago.
Hello, I'm having trouble checking whether a text file contains a string.
For example, I want to write a program that checks whether this file contains the string 'banana'.
I have tried this, but it didn't work:
with open("test.txt","r") as f:
    content = f.read()
    for line in content:
        if 'banana' in line:
            do_something()
        else:
            exit()
The text file looks like this:
banana is yellow
apple is red
python is good
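You don't need a loop. f.read() returns the whole file as one string, so iterating over it yields single characters, not lines, and 'banana' is never found in a single character. Just use a conditional on the whole content: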
with open("test.txt","r") as f:
    content = f.read()
    if 'banana' in content:
        do_something()
    else:
        exit()
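If the file is too large to read in one go, the same check can be done line by line. This is only a sketch, assuming Python 3 and a UTF-8 file; file_contains is a hypothetical helper name, not something from the answer above:

```python
def file_contains(path, needle, encoding="utf-8"):
    """Return True as soon as any line of the file contains `needle`."""
    with open(path, "r", encoding=encoding) as f:
        # Iterating over the file object yields one line at a time,
        # and any() stops at the first line that matches.
        return any(needle in line for line in f)
```

Note that this only finds the string when it sits within a single line, which is the case for the example file.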

Python: How to remove specific string from a file? [duplicate]

This question already has answers here:
How to bypass memory error when replacing a string in a large txt file?
(2 answers)
Closed 1 year ago.
I have a file, for example, "data.txt" with "1234567890" text inside. How can my program delete "678", so that "data.txt" will consist of "1234590"?
In addition, data.txt is a really heavy file. So you can't use pure read() or readlines().
I want to use only python tools, so "shell" is not an option.
You can do something like the following, which reads the input one line at a time and writes the cleaned lines to a second file:
with open("data.txt", "r") as fin:
    with open("out.txt", "w") as fout:
        for line in fin:
            fout.write(line.replace('678', ''))
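Reading line by line only saves memory when the file actually has line breaks. If the data is one huge line, a chunked variant may help. The following is a sketch under assumptions (Python 3, UTF-8 text); remove_substring and its parameters are invented for illustration. It keeps back the last len(needle) - 1 characters of each buffer so that a match split across two chunks is still found:

```python
def remove_substring(src_path, dst_path, needle, chunk_size=1024 * 1024,
                     encoding="utf-8"):
    """Copy src_path to dst_path, dropping every occurrence of `needle`."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    keep = len(needle) - 1  # longest partial match that can end a buffer
    carry = ""
    with open(src_path, "r", encoding=encoding, newline="") as fin, \
         open(dst_path, "w", encoding=encoding, newline="") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            buf = carry + chunk
            pos = 0
            while True:
                hit = buf.find(needle, pos)
                if hit == -1:
                    break
                fout.write(buf[pos:hit])  # text before the match
                pos = hit + len(needle)   # skip the match itself
            rest = buf[pos:]
            # Hold back the tail: it may be the start of a match that
            # continues in the next chunk.
            cut = max(len(rest) - keep, 0)
            fout.write(rest[:cut])
            carry = rest[cut:]
        fout.write(carry)
```

For the file from the question this would be called as remove_substring("data.txt", "out.txt", "678"). Memory use is bounded by roughly chunk_size plus the length of the needle.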

Why my file end up blank when I try to remove characters? [duplicate]

This question already has answers here:
replacing text in a file with Python
(7 answers)
What is the best way to modify a text file in-place?
(4 answers)
Closed 1 year ago.
I followed an existing thread on this subject because I want to remove the <br /> tags that are in my output text file.
My code is the following:
def file_cleaner(video_id):
    with open('comments_'+video_id+'.txt', 'r') as infile, open('comments_'+video_id+'.txt', 'w') as outfile:
        temp = infile.read().replace("<br />", "")
        outfile.write(temp)
If I remove this function call, my file has content, but after I call this function my file is empty. What did I do wrong?
Opening a file in w mode truncates the file first. So there's nothing to read from the file.
Read the file first, then open it for writing.
def file_cleaner(video_id):
    with open('comments_'+video_id+'.txt', 'r') as infile:
        temp = infile.read().replace("<br />", "")
    with open('comments_'+video_id+'.txt', 'w') as outfile:
        outfile.write(temp)
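A variation on the same idea, in case a crash between truncating and writing is a concern: write the cleaned text to a temporary file first and then swap it into place. This is a sketch, assuming Python 3; replace_in_file is a hypothetical name and this is not the method from the answer above:

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding="utf-8"):
    """Rewrite `path` with `old` replaced by `new`, via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r", encoding=encoding) as infile:
        text = infile.read().replace(old, new)
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as outfile:
            outfile.write(text)
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.remove(tmp_path)  # clean up the leftover temporary file
        raise
```

The original file is only replaced once the new content has been written completely. One trade-off: the new file gets the permissions of the temporary file, not those of the original.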

Adding 'r' to json string from file python [duplicate]

This question already has answers here:
Convert regular Python string to raw string
(12 answers)
Closed 3 years ago.
I am getting the following error while reading JSON from a file:
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 2 column 1 (char 948)
The json content is read from file using the script below
import json
if __name__ == "__main__":
    file_path = "D:\\Freelancing\\Scraping_football_historic_data\\Data\\1.138103502"
    with open(file_path,'r') as datafile:
        dict_data = json.load(datafile)
    print(dict_data)
While searching for an answer, I found a question whose answer suggested adding r before the JSON string.
How can that be done in the case above, or is there a better way to read the file?
The contents of the file can be read from the pastebin link:
https://pastebin.com/ZyyrtcZW
You are missing a comma between each of your individual dictionaries. Your data should look like this:
....
{"op":"mcm","clk":"5733948534","pt":1514206953689,"mc":[{"id":"1.138103502","rc":[{"ltp":2.02,"id":48756}]}]},
{"op":"mcm","clk":"5739085003","pt":1514309273736,"mc":[{"id":"1.138103502","rc":[{"ltp":2.0,"id":48756}]}]},
{"op":"mcm","clk":"5739711407","pt":1514327265235,"mc":[{"id":"1.138103502","rc":[{"ltp":2.06,"id":48756}]}]},
.....
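If editing the file is not an option, another approach is to parse it one line at a time. This is only a sketch and rests on an assumption that cannot be checked here, since the pastebin content is not shown: that each line holds exactly one complete JSON object and there are no enclosing brackets or trailing commas. load_json_lines is a made-up helper name:

```python
import json


def load_json_lines(path, encoding="utf-8"):
    """Parse a file holding one JSON object per line into a list."""
    records = []
    with open(path, "r", encoding=encoding) as datafile:
        for line in datafile:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Each element of the returned list is then an ordinary dict, so records[0]["op"] would give "mcm" for data shaped like the sample above.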

Why does my first element in readlines() of a CSV have additional characters? [duplicate]

This question already has answers here:
Reading Unicode file data with BOM chars in Python
(7 answers)
Closed 4 years ago.
I ran the following python code to open a CSV, and the first element had some extra characters in it that aren't present when I view the CSV in a text editor, say Notepad++.
priorities_file = open('priorities.txt', 'r')
print('Name of the file: ', priorities_file.name)
p = priorities_file.readlines()
print('Read Line: %s' % (p))
The output looked like this:
Name of the file: priorities.txt
Read Line: ['Autonomy\n', 'Travel\n',...
I understand the '\n' and how to remove that from each element, but I don't understand why there are the additional characters in front of the element ' Autonomy'. Can anyone tell me why this is? Bonus points for a way to remove those characters which I honestly couldn't find how to reproduce.
repr() would help (on Python 3.x, use ascii() instead).
p = priorities_file.readlines()
print(repr(p))
My hunch is that the encoding of the CSV file is not plain ASCII or UTF-8; it probably starts with a byte order mark (BOM).
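One way to test that hunch is to look at the raw bytes at the start of the file. A small sketch, assuming the suspect is a UTF-8 BOM; starts_with_utf8_bom is a hypothetical helper, while codecs.BOM_UTF8 is the standard library constant for the bytes EF BB BF:

```python
import codecs


def starts_with_utf8_bom(path):
    """Return True if the file begins with the UTF-8 byte order mark."""
    # Binary mode avoids any decoding, so the bytes are seen as stored.
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

If starts_with_utf8_bom('priorities.txt') returns True, the extra characters in the first element are the BOM decoded with the wrong codec.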
UPDATE:
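This should do the trick on Python 3 (p is a list, so it has no decode() method; instead, reopen the file with the utf-8-sig codec, which strips a leading BOM):
priorities_file = open('priorities.txt', 'r', encoding='utf-8-sig')
p = priorities_file.readlines()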
