Why do I receive this error on parsing? - python

I am reading in a textfile and converting it into a python dictionary:
The file looks like this with labelword:
20001 World Economies
20002 Politics
20004 Internet Law
20005 Philipines Elections
20006 Israel Politics
20007 Science
This is the code to read the file and create a dictionary:
def get_pair(line):
    key, sep, value = line.strip().partition("\t")
    return int(key), value
with open("mapped.txt") as fd:
    d = dict(get_pair(line) for line in fd)
    print(d)
I receive {} when I print the contents of d.
Additionally, I receive this error:
Traceback (most recent call last):
  File "predicter.py", line 23, in <module>
    d = dict(get_pair(line) for line in fd)
  File "predicter.py", line 23, in <genexpr>
    d = dict(get_pair(line) for line in fd)
  File "predicter.py", line 19, in get_pair
    return int(key), value
ValueError: invalid literal for int() with base 10: ''
What does this mean? I do have content inside the file, I am not sure why is it not being read.

It means key is empty, which in turn means you have a line with a \t tab at the start or an empty line:
>>> '\tScience'.partition('\t')
('', '\t', 'Science')
>>> ''.partition('\t')
('', '', '')
My guess is that it is the latter; you can skip such lines in your generator expression:
d = dict(get_pair(line) for line in fd if '\t' in line.strip())
Because line.strip() returns the lines without leading and trailing whitespace, empty lines or lines with only a tab at the start result in a string without a tab in it altogether. This won't handle all cases, but you could also strip the value passed to get_pair():
d = dict(get_pair(line.strip()) for line in fd if '\t' in line.strip())
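The filtering described above can also live inside the parser itself. The following is only a sketch, not the asker's code: load_mapping is a hypothetical helper name, the sample lines are made up to mirror the file in the question, and it assumes that keys are always plain digits followed by a tab.

```python
def get_pair(line):
    """Return (int key, label) for a 'key<TAB>label' line, or None if malformed."""
    key, sep, value = line.strip().partition("\t")
    # No tab found, or a key that is not purely digits: treat the line as unusable.
    if not sep or not key.isdigit():
        return None
    return int(key), value


def load_mapping(lines):
    """Build a dict from an iterable of lines, silently skipping malformed ones."""
    pairs = (get_pair(line) for line in lines)
    return dict(pair for pair in pairs if pair is not None)


# Made-up sample: one blank line and one line that starts with a tab.
sample = ["20001\tWorld Economies\n", "\n", "\tScience\n", "20007\tScience\n"]
print(load_mapping(sample))
```

Because load_mapping accepts any iterable of lines, an open file object can be passed in directly, for example load_mapping(fd).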

Related

Python readlines() function ignores line written by program

I want to read lines and check whether a specific number is in them, but when I read and print the list of lines I can't get the first line, where my test string was written by the same program.
The code I use to write to the file:
with open('db.txt', 'a') as f:
    f.write(f"Request's channel id from guild {guild.id}:{request_channel_id} \n")
and the code I'm using to read the files and check the lines is:
with open('db.txt', 'r') as f:
    index = 0
    for line in f:
        index += 1
        if str(message.guild.id) in line or str(message.channel.id) in line:
            break
    content = f.readlines()
    print(content)
    content = content[index]
    content.strip(":")
The second block of code returns [], an empty list, even though I opened the file and the line is there. But when I type directly into the file with my keyboard, the code "sees" the random stuff I wrote.
.txt file content:
Id do canal de request servidor 833434062248869899: 888273958263222332
a.a
all
a
a
a
a
Error:
['a.a\n', '\n', 'all\n', 'a\n', 'a\n', 'a\n', 'a']
Ignoring exception in on_message
Traceback (most recent call last):
  File "C:\Users\CARVALHO\AppData\Local\Programs\Python\Python39\lib\site-packages\discord\client.py", line 343, in _run_event
    await coro(*args, **kwargs)
  File "C:\Users\CARVALHO\desktop\gabriel\codando\music_bot\main.py", line 48, in on_message
    request_channel_id = int(content[1])
IndexError: string index out of range

Python 2.7 - Read text replace and write to same file

I am reading a text file (20+ lines) and doing a find and replace at multiple places in the text with the code below.
with open(r"c:\TestFolder\test_file_with_State.txt","r+") as fp:
    finds = 'MI'
    pattern = re.compile(r'[,\s]+' + re.escape(finds) + r'[\s]+')
    textdata = fp.read()
    line = re.sub(pattern,'MICHIGAN',textdata)
    fp.write(line)
When trying to write it back to the same file, I get the below error.
IOError                                   Traceback (most recent call last)
<ipython-input> in <module>()
      6 line = re.sub(pattern,'MICHIGAN',textdata)
      7 print line
----> 8 fp.write(line)
      9
What am I doing wrong?
You've already read the whole file, so the file position is at the end of the file when you try to write.
You can get around this by going back to the beginning of the file with fp.seek(0) before writing.
Also, the surrounding whitespace is consumed by the regex, so you can add it back in the replacement.
So your code would be:
import re
with open(r"c:\TestFolder\test_file_with_State.txt","r+") as fp:
    finds = 'MI'
    pattern = re.compile(r'[,\s]+' + re.escape(finds) + r'[\s]+')
    textdata = fp.read()
    line = re.sub(pattern,' MICHIGAN ',textdata)
    fp.seek(0)  # go back to the start before writing
    fp.write(line)
    fp.truncate()  # drop leftover bytes in case the new text is shorter
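Instead of re-adding a fixed space on each side, the separators can be captured and written back unchanged. This is a minimal sketch of that variation, not the answerer's code: expand_state and rewrite_in_place are hypothetical names, and the sample address is made up.

```python
import re


def expand_state(text, abbreviation="MI", full_name="MICHIGAN"):
    """Replace a standalone abbreviation while keeping the separators around it."""
    # Group 1 holds the leading commas/whitespace, group 2 the trailing whitespace.
    pattern = re.compile(r"([,\s]+)" + re.escape(abbreviation) + r"(\s+)")
    return pattern.sub(lambda m: m.group(1) + full_name + m.group(2), text)


def rewrite_in_place(path):
    """Read the whole file, substitute, and write the result back to the same file."""
    with open(path, "r+") as fp:
        text = fp.read()
        fp.seek(0)        # return to the start before writing
        fp.write(expand_state(text))
        fp.truncate()     # cut off any leftover tail from the old content


print(repr(expand_state("Detroit, MI 48201\n")))
```

Capturing the separators means commas and line breaks next to the abbreviation survive the substitution.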

Find the first instance of a variable in a .txt file and delete the line it is in

I have created a text file that holds different names, followed by a number in the format:
Name 1, 10
Name 2, 5
Name 3, 5
Name 2, 7
Name 2, 6
Name 4, 8
etc.
I want to find the line on which 'Name 2' first appears (line 2 here) and then delete that line. Is that possible?
def skip_line_with(it, name):
    # Yield lines until we find the line with `name`
    for line in it:
        if line.startswith(name):
            break  # Do not yield the line => skip the line
        yield line
    # Yield lines after the line
    for line in it:
        yield line
with open('a.txt') as fin, open('b.txt', 'w') as fout:
    fout.writelines(skip_line_with(fin, 'Name 2,'))
A new file b.txt is created without the unwanted line.
UPDATE: If you want to replace the file in place (assuming the file is not huge):
def skip_line_with(it, name):
    for line in it:
        if line.startswith(name):
            break
        yield line
    for line in it:
        yield line
with open('a.txt', 'r+') as f:
    replaced = list(skip_line_with(f, 'Name 2,'))
    f.seek(0)
    f.writelines(replaced)
    f.truncate()
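The two-loop generator relies on both loops sharing one iterator. A file object is its own iterator, so the second loop resumes where the first stopped; a plain list would start over from the beginning. The sketch below is a small variation (the iter() call is an addition, and the sample lines are taken from the question) that makes the helper behave the same for lists.

```python
def skip_line_with(it, name):
    """Yield every line except the first one that starts with `name`."""
    it = iter(it)  # make sure both loops consume the same iterator
    for line in it:
        if line.startswith(name):
            break  # drop this line and stop searching
        yield line
    # Everything after the dropped line passes through unchanged.
    for line in it:
        yield line


lines = ["Name 1, 10\n", "Name 2, 5\n", "Name 3, 5\n", "Name 2, 7\n"]
print(list(skip_line_with(lines, "Name 2,")))
```

Only the first 'Name 2,' line is dropped; later lines with the same name are kept, and if no line matches, the input passes through unchanged.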

python unicode replace MemoryError

I want to replace a unicode string in a file with Python.
This is my code:
with codecs.open('/etc/bluetooth/main.conf', "r", "utf8") as fi:
    mainconf=fi.read()
forrep = ''.decode('utf8')
for line in mainconf.splitlines():
    if('Name = ' in line):
        forrep = line.split('=')[1]
print 'name',type(name)
print 'mainconf',type(mainconf)
print 'forrep',type(forrep)
mainconf = mainconf.replace(forrep, name)
#mainconf = mainconf.replace(forrep.decode('utf8'),' '+name)
with codecs.open('/etc/bluetooth/main.conf','w',"utf8") as fi:
    fi.write(mainconf)
But Python always gives me a MemoryError.
This is the output:
name <type 'unicode'>
mainconf <type 'unicode'>
forrep <type 'unicode'>
Traceback (most recent call last):
  File "WORK/Bluetooth/Bluetooth.py", line 359, in <module>
    if __name__ == '__main__':main()
  File "WORK/Bluetooth/Bluetooth.py", line 336, in main
    BLMan.SetAllHCIName(common.cfg.get('BLUETOOTH', 'HCI_DEVICE_NAME'))
  File "WORK/Bluetooth/Bluetooth.py", line 194, in SetAllHCIName
    mainconf = mainconf.replace(forrep, name)
MemoryError
Iterate over the file object instead; you are storing the whole file content in memory with mainconf=fi.read():
with codecs.open('/etc/bluetooth/main.conf', "r", "utf8") as fi:
    for line in fi:
        pass  # handle one line at a time here
You store the whole file with read(), then you build a list of all the lines with splitlines(), so you hold the file content twice. As @abarnet pointed out in a comment, you then try to create a third copy with
mainconf = mainconf.replace(forrep, name).
Iterating over the file object gives you one line at a time. If you need to keep the lines after replacing, do so each time through the loop, so that at most you have one copy of the file content in memory.
I have no idea what name is, but writing to a tempfile will be the most efficient way to do what you want:
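from shutil import move
from tempfile import NamedTemporaryFile
with open('/etc/bluetooth/main.conf') as fi, NamedTemporaryFile(dir=".", delete=False) as out:
    for line in fi:
        if line.startswith("Name ="):
            # Keep the key, replace the value, and restore the newline
            a, b = line.split("=", 1)
            out.write("{} = {}\n".format(a.strip(), name.encode("utf-8")))
        else:
            out.write(line)
# Both files are closed here, so the temp file can replace the original
move(out.name, '/etc/bluetooth/main.conf')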
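For readers on Python 3, the same stream-to-a-temp-file idea might look like the sketch below. It is an illustration under assumptions rather than a drop-in replacement: set_config_name is a hypothetical helper name, the config text is made up, and the file is assumed to be UTF-8 with the setting written as a line starting with "Name =". The demo runs in a temporary directory instead of touching /etc.

```python
import os
import tempfile


def set_config_name(path, new_name):
    """Rewrite the 'Name = ...' line of a config file, one line at a time."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so the final rename stays on one filesystem.
    with open(path, encoding="utf-8") as src, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=directory, delete=False
    ) as dst:
        for line in src:
            if line.startswith("Name ="):
                dst.write("Name = {}\n".format(new_name))
            else:
                dst.write(line)
    # Both files are closed here; swap the new file into place.
    os.replace(dst.name, path)


# Demo on a throwaway file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "main.conf")
    with open(conf, "w", encoding="utf-8") as f:
        f.write("[General]\nName = old-name\nClass = 0x000100\n")
    set_config_name(conf, "Café Speaker")
    with open(conf, encoding="utf-8") as f:
        print(f.read())
```

Only one line is held in memory at a time. Note that the temp file is created with its own permissions, which may differ from those of the original file.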

Python: re-formatting multiple lines in text file

I apologize if this post is long, but I am trying to be as detailed as possible. I have done a considerable amount of research on the topic, and would consider myself an "intermediate" skilled programmer.
My problem: I have a text file with multiple lines of data. I would like to remove certain parts of each line in an effort to get rid of some irrelevant information, and then save the file with the newly formatted lines.
Here is an example of what I am trying to accomplish. The original line is something like:
access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594
I am trying to have the code read the text file, and output:
permit tcp any 209.143.156.200 www
The following code works, but only if there is a single line in the text file:
input_file = open("ConfigInput.txt", "r")
output_file = open("ConfigOutput.txt", "w")
for line in input_file:
    line = line.split("extended ", 1)[1]
    line = line.split("(", 1)[0]
    line = line.replace(" host", "")
    line = line.replace(" eq", "")
    output_file.write(line)
output_file.close()
input_file.close()
However, when I attempt to run this with a full file of multiple lines of data, I receive an error:
  File "C:\Python27\asaReader", line 5, in <module>
    line = line.split("extended ", 1)[1]
IndexError: list index out of range
I suspect that it is not moving onto the next line of data in the text file, and therefore there isn't anything in [1] of the previous string. I would appreciate any help I can get on this.
Some possible causes:
You have blank lines in your file (blank lines obviously won't contain the word extended)
You have lines that aren't blank, but don't contain the word extended
You could try printing your lines individually to see where the problem occurs:
for line in input_file:
    print("Got line: %s" % (line))
    line = line.split("extended ", 1)[1]
Oh, and it's possible that the last line is blank and it's failing on that. It would be easy enough to miss.
Print something out when you hit a line that can't be processed:
for line in input_file:
    try:
        line = line.split("extended ", 1)[1]
        line = line.split("(", 1)[0]
        line = line.replace(" host", "")
        line = line.replace(" eq", "")
        output_file.write(line)
    except Exception as e:
        # Show the offending line and the error, then carry on
        print("Choked on this line: %r" % line)
        print(e)
An alternate approach would be to cache all the lines (assuming the file is not humongous).
>>> import re
>>> with open('/tmp/ConfigInput.txt', 'rU') as f:
...     lines = f.readlines()
...
>>> lines
['access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594\n']
>>> lines = [re.sub(r'(^.*extended |\(.*$)', '', line) for line in lines]
>>> lines
['permit tcp any host 209.143.156.200 eq www \n']
>>> with open('/tmp/ConfigOutput.txt', 'w') as f:
...     f.writelines(lines)
...
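Putting the suggestions above together, a line-by-line version could skip anything without "extended " instead of raising. This is a sketch with hypothetical helper names (reformat_rule, reformat_lines). It also appends a newline to each result, because cutting the line at "(" in the question's loop discards the original line ending, which would run all output together on one line.

```python
def reformat_rule(line):
    """Return the trimmed rule text, or None when the line has no 'extended ' part."""
    head, sep, rest = line.partition("extended ")
    if not sep:
        return None  # blank line or a line in another format
    rest = rest.split("(", 1)[0]  # drop the hit counter and everything after it
    rest = rest.replace(" host", "").replace(" eq", "")
    return rest.strip()


def reformat_lines(lines):
    """Yield one reformatted, newline-terminated rule per usable input line."""
    for line in lines:
        rule = reformat_rule(line)
        if rule is not None:
            yield rule + "\n"


sample = [
    "access-list inbound_outside1 line 165 extended permit tcp any host "
    "209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594\n",
    "\n",  # a blank line like the one suspected of causing the IndexError
]
print(list(reformat_lines(sample)))
```

Since reformat_lines is a generator over any iterable of lines, it can be fed an open input file and passed straight to output_file.writelines().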
