Python readline() method caused UnicodeDecodeError

I'm trying to read a large txt file, extract information from it, and write it to another document, but I get a UnicodeDecodeError.
Here is my code:
#Create list with PLZ, city and state
cepfinal = open("cepfinal.txt", "w") #file to be written
with open("ceptest2.txt", "r") as fp: #read file
    while True:
        line = fp.readline()
        # print(str(line))
        x = line.split("\t") #separate all that have double space
        plz = x[0] #extract PLZ
        # print(plz)
        y = x[1]
        mun = y.split("/") #separe city from state
        # print(mun)
        plzmun = [plz] + mun
        # print(plzmun)
        final = plzmun.pop(2) #remove state
        plzmun = " ".join(plzmun) #create string
        print(plzmun)
        cepfinal.write(plzmun + "\n")
fp.close()
It is a 45 GB file, so I suppose I have a memory issue. Can someone help me make the code leaner?

Your problem is with the encoding. Try specifying it explicitly when you open the file:
with open("ceptest2.txt", "r", encoding="utf8") as fp:
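To make the encoding fix concrete for a file this large, here is a minimal sketch that reads one line at a time, so memory use stays small regardless of file size. The function name `extract_plz_city` and the `errors="replace"` fallback are assumptions added for illustration, and the real encoding of the file is not known from the question; if it is not UTF-8, a different value (for example "latin-1") may be needed.

```python
def extract_plz_city(src_path, dst_path, encoding="utf-8"):
    """Copy 'PLZ<TAB>city/state' lines to dst_path as 'PLZ city' lines."""
    written = 0
    # errors="replace" (an assumption) substitutes undecodable bytes
    # instead of raising UnicodeDecodeError.
    with open(src_path, "r", encoding=encoding, errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # iterating the file object reads one line at a time
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            plz = fields[0]
            city = fields[1].split("/")[0]  # drop the state after the slash
            dst.write(f"{plz} {city}\n")
            written += 1
    return written
```

Calling `extract_plz_city("ceptest2.txt", "cepfinal.txt")` would then process the file without an explicit `while True` loop, which also avoids the IndexError the original loop would hit at the end of the file.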

Related

How to search and replace a specific value in a line in Python

I have a text file that has some values as follows:
matlab.file.here.we.go{1} = 50
matlab.file.here.sxd.go{1} = 50
matlab.file.here.asd.go{1} = 50
I want the code to look for "matlab.file.here.sxd.go{1}" and replace the value assigned to it from 50 to 1. But I want it to be dynamic (i.e., later I will have over 20 values to change and I don't want to search for that specific phrase). I'm new to python so I don't have much information in order to search for it online. Thanks
I tried the following
file_path = r'test\testfile.txt'
file_param = 'matlab.file.here.we.go{1}'
changing = 'matlab.file.here.we.go{1} = 1'
with open(file_path, 'r') as f:
    content = f.readlines()
    content = content.replace(file_param , changing)
with open(file_path, 'w') as f:
    f.write(content)
but it didn't achieve what I wanted

You can split on the equal sign. You can read and write files at the same time.
import os
file_path = r'test\testfile.txt'
file_path_temp = r'test\testfile.txt.TEMP'
file_param = 'matlab.file.here.sxd.go{1}' # parameter whose value should change
new_value = 1
with open(file_path, 'r') as rf, open(file_path_temp, 'w') as wf:
    for line in rf:
        # match the parameter name on the left of the equal sign
        if line.split(' = ')[0].strip() == file_param:
            line = file_param + ' = ' + str(new_value) + '\n'
        wf.write(line)
# replace the original file with the updated copy
os.replace(file_path_temp, file_path)
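Since the question mentions changing over 20 values later, the split-on-equal-sign idea can be driven by a dictionary. The following is only a sketch: the helper name `update_values` and the `new_values` mapping are hypothetical, and it assumes every line has the exact form `name = value`.

```python
def update_values(lines, new_values):
    """Return the lines with 'name = value' entries updated from new_values."""
    updated = []
    for line in lines:
        # partition() splits on the first ' = ' only; sep is '' if it is absent
        name, sep, _old_value = line.partition(" = ")
        if sep and name in new_values:
            line = f"{name} = {new_values[name]}\n"
        updated.append(line)
    return updated
```

With this, adding another parameter to change means adding one entry to the dictionary, for example `{"matlab.file.here.sxd.go{1}": 1}`, rather than another search string.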

How to write the output to a file, the name of which is passed as a second parameter?

def generate_daily_totals(input_filename, output_filename):
    """result in the creation of a file blahout.txt containing the two lines"""
    with open(input_filename, 'r') as reader, open(output_filename, 'w') as writer: #updated
        for line in reader: #updated
            pieces = line.split(',')
            date = pieces[0]
            rainfall = pieces[1:] #each data in a line
            total_rainfall = 0
            for data in rainfall:
                pure_data = data.rstrip()
                total_rainfall = total_rainfall + float(pure_data)
            writer.write(date + "=" + '{:.2f}'.format(total_rainfall) + '\n') #updated
            #print(date, "=", '{:.2f}'.format(total_rainfall)) #two decimal point format,
generate_daily_totals('data60.txt', 'totals60.txt')
checker = open('totals60.txt')
print(checker.read())
checker.close()
The original program, which only read a file, ran well, but I was required to convert it so that it writes to a file. I am confused: since the write method accepts only strings, does that mean only the print call can be replaced by the write method? This is the first time I am using the write method. Thanks!
EDIT: the code above has been updated based on blhsing's instructions, which helped a lot! But it is still not running well, as the for loop gets skipped for some reason. Suggestions would be appreciated!
expected output:
2006-04-10 = 1399.46
2006-04-11 = 2822.36
2006-04-12 = 2803.81
2006-04-13 = 1622.71
2006-04-14 = 3119.60
2006-04-15 = 2256.14
2006-04-16 = 3120.05
2006-04-20 = 1488.00

You should open both the input file for reading, and the output file for writing, so change:
with open(input_filename, 'w') as writer:
    for line in writer: # error not readable
to:
with open(input_filename, 'r') as reader, open(output_filename, 'w') as writer:
    for line in reader:
Also, unlike the print function, the write method of a file object does not automatically add a trailing newline character to the output, so you would have to add it on your own.
Change:
writer.write(date + "=" + '{:.2f}'.format(total_rainfall))
to:
writer.write(date + "=" + '{:.2f}'.format(total_rainfall) + '\n')
or you can use print with the outputting file object specified as the file argument:
print(date, "=", '{:.2f}'.format(total_rainfall), file=writer)
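The difference between the two calls can be seen side by side in the sketch below. It uses an in-memory `io.StringIO` buffer as a stand-in for a file opened with `open(..., 'w')`; the helper `format_total` and the sample numbers are made up for illustration.

```python
import io

def format_total(date, values):
    """Format one output line body, e.g. '2006-04-10 = 1399.46'."""
    return "{} = {:.2f}".format(date, sum(float(v) for v in values))

buffer = io.StringIO()  # in-memory stand-in for an output file

# write() emits exactly the string it is given, so the newline is explicit
buffer.write(format_total("2006-04-10", ["1000.5", "398.96"]) + "\n")

# print() appends the newline itself when given file=...
print(format_total("2006-04-11", ["2822.36"]), file=buffer)

result = buffer.getvalue()
```

Both calls leave one line each in the buffer, so either style works as long as the newline is accounted for.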

How to loop through a list of strings in Python

I'm a bit new to Python and I am trying to simplify my existing code.
Right now, I have the code repeated 5 times with different strings. I'd like to have the code one time and have it run through a list of strings.
Currently what I have:
def wiScanFormat():
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    MAC = data.replace("Address:", "\nAddress, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(MAC)
    File.close()
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    SSID = data.replace("ESSID:", "\nESSID, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(SSID)
    File.close()
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    FREQ = data.replace("Frequency:", "\nFrequency, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(FREQ)
    File.close()
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    QUAL = data.replace("Quality", "\nQuality, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(QUAL)
    File.close()
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    SIG = data.replace("Signal level", "\nSignal Level, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(SIG)
    File.close()
What I'd like to have:
ORG = ['Address:', 'ESSID:'...etc]
NEW = ['\nAddress, ' , '\nESSID, ' , ... etc]
and run that through:
File = open("/home/pi/gpsMaster/WiScan.txt", "r")
data = File.read()
File.close()
ID = data.replace("ORG", "NEW")
File = open("/home/pi/gpsMaster/WiScan.txt", "w")
File.write(ID)
File.close()
I've tried running exactly what I put up, but it does not seem to format it the way I need to.
The output from above looks like:
Cell 46 - Address: xx:xx:xx:xx:xx:xx ESSID:"MySSID" Frequency:2.412 GHz (Channel 1) Quality=47/100 Signal level=48/100 Quality=47/100 Signal level=48/100
But it is supposed to look like this (And it does when I run that same block over the strings separately):
xx:xx:xx:xx:xx:xx MySSID 5.18 GHz (Channel 36) 0.81 0.99
How should I go about looping this block of code through my list of strings?
There are two strings that I need for each find and replace, old and new, so they have to work together. The lists will be the same size, obviously, and I need them to be in the correct order: Address with Address, ESSID with ESSID, etc.
Thanks in advance!

Try something like this:
ORG = ['Address:', 'ESSID:'...etc]
NEW = ['\nAddress, ' , '\nESSID, ' , ... etc]
File = open("/home/pi/gpsMaster/WiScan.txt", "r")
data = File.read()
File.close()
for org, new in zip(ORG, NEW):
    data = data.replace(org, new)
File = open("/home/pi/gpsMaster/WiScan.txt", "w")
File.write(data)
File.close()
(Note the way zip works: https://docs.python.org/2/library/functions.html#zip)
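To see what zip contributes here, the sketch below pairs each search string with its replacement and applies them in order. The helper name `replace_all`, the shortened two-item lists, and the sample string are illustrative assumptions; note that zip stops at the shorter list, so the two lists should have the same length.

```python
def replace_all(text, originals, replacements):
    """Apply each (original, replacement) pair to text, in list order."""
    for org, new in zip(originals, replacements):
        text = text.replace(org, new)
    return text

ORG = ["Address:", "ESSID:"]
NEW = ["\nAddress, ", "\nESSID, "]

pairs = list(zip(ORG, NEW))  # [('Address:', '\nAddress, '), ('ESSID:', '\nESSID, ')]

sample = 'Cell 46 - Address: xx:xx ESSID:"MySSID"'
formatted = replace_all(sample, ORG, NEW)
```

Keeping the pairs in two parallel lists works, but a single list of tuples avoids the risk of the lists drifting out of order.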

If I am reading your question right, you are opening the same file, making a small alteration, saving it, and then closing it again, five times. You could just open it once, make all the alterations, and then save it. For instance, like this:
filename = "/home/pi/gpsMaster/WiScan.txt"
with open(filename, 'r') as fin:
    data = fin.read()
data = data.replace("Address:", "\nAddress, ")
data = data.replace("ESSID:", "\nESSID, ")
data = data.replace("Frequency:", "\nFrequency, ")
data = data.replace("Quality", "\nQuality, ")
data = data.replace("Signal level", "\nSignal Level, ")
with open(filename, 'w') as fout:
    fout.write(data)
If you want to use lists (ORG and NEW) for your replacements, you could do this:
with open(filename, 'r') as fin:
    data = fin.read()
for o,n in zip(ORG, NEW):
    data = data.replace(o,n)
with open(filename, 'w') as fout:
    fout.write(data)

Given your ORG and NEW, the simplest way to do this would be something like:
# Open once for both read and write; use with statement for guaranteed close at end of block
with open("/home/pi/gpsMaster/WiScan.txt", "r+") as f:
    data = f.read() # Slurp file
    f.seek(0) # Seek back to beginning of file
    # Perform all replacements
    for orig, repl in zip(ORG, NEW):
        data = data.replace(orig, repl)
    f.write(data) # Write new data over old
    f.truncate() # If replacement shrunk file, truncate extra

You could just do this:
def wiScanFormat(path = "/home/pi/gpsMaster/WiScan.txt"):
    # List of tuples with strings to find and strings to replace with
    replacestr = [
        ("Address:", "\nAddress, "),
        ("ESSID:", "\nESSID, "),
        ("Frequency:", "\nFrequency, "),
        ("Quality", "\nQuality, "),
        ("Signal level", "\nSignal Level, ")
    ]
    with open(path, "r") as file: # Open a file
        data = file.read()
    formated = data
    for i in replacestr: # Loop over each element (tuple) in the list
        formated = formated.replace(i[0], i[1]) # Replace the data
    with open(path, "w") as file:
        written = file.write(formated) # Write the data
    return written

how to read the content of .txt file using python?

output_filename = r"C:\Users\guage\Output.txt"
RRA:
GREQ-299684_6j
GREQ-299684_6k
CZM:
V-GREQ-299684_6k
V-GREQ-299524_9
F_65624_1
R-GREQ-299680_5
DUN:
FB_71125_1
FR:
VQ-299659_18
VR-GREQ-299659_19
VEQ-299659_28
VR-GREQ-299659_31
VR-GREQ-299659_32
VEQ-299576_1
GED:
VEQ-299622_2
VR-GREQ-299618_13
VR-GREQ-299559_1
VR-GREQ-299524_14
FB_65624_1
VR-GREQ-299645_1
MNT:
FB_71125_1
FB_71125_2
VR-534_4
The above is the content of the .txt file. How can I read each section of it separately? For example:
RRA:VR-GREQ-299684_6j VR-GREQ-299684_6k VR-GREQ-299606_3 VR-GREQ-299606_4 VR-GREQ-299606_5 VR-GREQ-299606_7
and save it in a variable or something similar. Later I want to read CZM separately, and so on. This is what I have so far:
with open(output_filename, 'r') as f:
    excel = f.read()
But how do I read the sections separately? Can someone tell me how to do it?

Something like this:
def read_file_with_custom_record_separator(file_path, delimiter=':'):
    data = ""
    with open(file_path) as fh:
        for line in fh:
            # a line ending with the delimiter starts a new record
            if line.strip().endswith(delimiter) and data != "":
                print("VARIABLE:\n<", data, ">\n")
                data = line
            else:
                data += line
    print("LAST VARIABLE:\n<", data, ">\n")
And then:
read_file_with_custom_record_separator("input.txt", ":")
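If the goal is to keep each section in a variable rather than print it, the same "line ending in a colon starts a new record" idea can fill a dictionary. This is a sketch only; the function name `parse_sections` is hypothetical, and it assumes every header line ends with ":" and every other non-blank line belongs to the most recent header.

```python
def parse_sections(lines):
    """Group lines under the most recent 'NAME:' header line."""
    sections = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.endswith(":"):
            current = line[:-1]      # header such as 'RRA:' becomes key 'RRA'
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections
```

Because a file object is iterable line by line, it can be passed directly: `with open(output_filename) as f: sections = parse_sections(f)`. After that, `" ".join(sections["RRA"])` would give the RRA entries on one line, and `sections["CZM"]` can be read later in the same way.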

You can use the trailing : in the file text as an indicator to create a new file, like this:
sf = None # currently open output file, if any
with open(filename, 'r') as f:
    for line in f:
        line = line.strip() # get rid of the unnecessary white chars
        if not line:
            continue # skip blank lines
        if line.endswith(":"): # if the last char is ":"
            if sf:
                sf.close() # finish the previous file
            savefilename = line[0:-1] # get file name from line (except the ":")
            sf = open(savefilename + ".txt", 'w') # create a new file
        elif sf:
            sf.write(line + "\n") # write the data to the opened file
if sf:
    sf.close()
Then you should get collection of files:
RRA.txt
CZM.txt
DUN.txt
# etc
which contains all the appropriate data:
RRA.txt
VR-GREQ-299684_6j
VR-GREQ-299684_6k
VR-GREQ-299606_3
VR-GREQ-299606_4
VR-GREQ-299606_5
VR-GREQ-299606_7
CZM.txt
VR-GREQ-299684_6k
VR-GREQ-299606_6
VR-GREQ-299606_8
VR-GREQ-299640_1
VR-GREQ-299640_5
VR-GREQ-299524_9
FB_65624_1
VR-GREQ-299680_5
DUN.txt
FB_71125_1
# and so on
You can replace the sf = open and the sf.write with whatever way you feel is best to separate the data. Here, I use files.

You can iterate over the file and use the lines and indices to your advantage; something like this:
with open(output_filename, 'r') as f:
    for index, line in enumerate(f):
        # here you have access to each line and its index
        # so you can save any number of lines you wish
        pass

What about reading it into a list and then processing its elements as you prefer?
>>> f = open('myfile.txt', 'r').readlines()
>>> len(f)
46
>>> f[0]
RRA:
>>> f[-1]
VR-GREQ-299534_4
>>> f[:3]
['RRA:\n', 'VR-GREQ-299684_6j \n', 'VR-GREQ-299684_6k \n']
>>>
>>> [l for l in f if l.startswith('FB_')]
['FB_65624_1 \n', 'FB_71125_1 \n', 'FB_69228_1 \n', 'FB_65624_1 \n', 'FB_71125_1 \n', 'FB_71125_2 \n']
>>>

Reading comma separated values from text file in python

I have a text file consisting of 100 records like
fname,lname,subj1,marks1,subj2,marks2,subj3,marks3.
I need to extract and print lname and marks1+marks2+marks3 in python. How do I do that?
I am a beginner in python.
Please help
When I used split, I got an error saying
TypeError: Can't convert 'type' object to str implicitly.
The code was
import sys
file_name = sys.argv[1]
file = open(file_name, 'r')
for line in file:
    fname = str.split(str=",", num=line.count(str))
    print fname

If you want to do it that way, you were close. Is this what you were trying?
with open(file_name, 'r') as file:
    for line in file:
        fname = line.rstrip().split(',') #using rstrip to remove the \n
        print(fname)

Note: this code is untested, but it should solve your problem. Please give it a try.
import csv
with open(file_name, 'r', newline='') as csvfile:
    marksReader = csv.reader(csvfile)
    for row in marksReader:
        if len(row) < 8: # 8 is the number of columns in your file.
            # row has some missing columns or empty
            continue
        # Unpack columns of row; you can also do like fname = row[0] and lname = row[1] and so on ...
        (fname,lname,subj1,marks1,subj2,marks2,subj3,marks3) = row[:8]
        # you can use float in place of int if marks contains decimals
        totalMarks = int(marks1) + int(marks2) + int(marks3)
        print('%s %s scored: %s'%(fname, lname, totalMarks))
print('End.')
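The csv approach can be tried without a file on disk by feeding `csv.reader` an in-memory buffer. In the sketch below, the function name `total_marks` and the two sample records are invented for illustration; it assumes the column order from the question (fname, lname, subj1, marks1, subj2, marks2, subj3, marks3) and integer marks.

```python
import csv
import io

def total_marks(csv_file):
    """Return (lname, marks1 + marks2 + marks3) for each complete row."""
    totals = []
    for row in csv.reader(csv_file):
        if len(row) < 8:
            continue  # skip empty or incomplete rows
        lname = row[1]
        marks = (row[3], row[5], row[7])  # the three marks columns
        totals.append((lname, sum(int(m) for m in marks)))
    return totals

# Made-up sample data standing in for the real file
sample = io.StringIO(
    "Ada,Lovelace,math,90,physics,85,art,70\n"
    "Alan,Turing,math,95,physics,80,art,60\n"
)
results = total_marks(sample)
```

For a real file, the same function can be called as `with open(file_name, newline='') as f: results = total_marks(f)`.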

"""
sample file content
poohpool#signet.com; meixin_kok#hotmail.com; ngai_nicole#hotmail.com; isabelle_gal#hotmail.com; michelle-878#hotmail.com;
valerietan98#gmail.com; remuskan#hotmail.com; genevieve.goh#hotmail.com; poonzheng5798#yahoo.com; burgergirl96#hotmail.com;
insyirah_powergals#hotmail.com; little_princess-angel#hotmail.com; ifah_duff#hotmail.com; tweety_butt#hotmail.com;
choco_ela#hotmail.com; princessdyanah#hotmail.com;
"""
import pandas as pd
emails = []
with open('emaildump.txt', 'r') as file:
    for line in file:
        # split on ';' to form a list, dropping whitespace and empty pieces
        emails.extend(part.strip() for part in line.split(';') if part.strip())
df1 = pd.DataFrame(emails, columns=['Email'])
print(df1)
