Why does this code throw an IndexError? - python

I am trying to read values from a file, data.txt, with this code. However, it throws an IndexError when I run it. Why would this be happening?
def main():
    myfile=open('data.txt','r')
    line=myfile.readline()
    while line!='':
        line=line.split()
        age=line[1]
        line=myfile.readline()
    myfile.close()
main()

If a line contains only one whitespace-separated fragment, line.split() returns a list with exactly one element, and accessing its second element (index 1) raises an IndexError. The same happens for a blank line, where split() returns an empty list.
Also, to make your code easier to follow, avoid reusing one variable name for different things (here line is first a string and then a list). It hampers readers, and code is mostly written to be read, especially by yourself.
I'd use a simpler loop:
for line in myfile:  # this iterates over the lines of the file
    fragments = line.split()
    if len(fragments) >= 2:
        age = fragments[1]
        ...
Also, the idiomatic way to open the file for a particular duration and close it automatically is the use of with:
with open(...) as myfile:
    for line in myfile:
        ...
# At this point, the file will be automatically closed.
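Putting both suggestions together, a minimal sketch might look like the following. The function name read_ages and the assumption that the age is the second whitespace-separated field are illustrative only, since the question does not show the actual data.

```python
def read_ages(path):
    """Collect the second field of every line that has at least two fields."""
    ages = []
    with open(path, 'r') as myfile:
        for line in myfile:
            fragments = line.split()
            # Blank or one-word lines are skipped instead of raising IndexError.
            if len(fragments) >= 2:
                ages.append(fragments[1])
    return ages
```

Called as read_ages('data.txt'), this returns a list of strings; converting them with int() is left to the caller because the file format is not known here.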

Python starts indexing at 0, so line[1] is the second item.
In your age=line[1] statement, if there is only one word on the line, Python raises an IndexError to tell you that the index is out of range. Seeing your data would be helpful, but the following is the generally accepted and much easier way of reading a file:
with open('data.txt', 'r') as myfile:
    for line in myfile:
        # strip() removes the trailing newline, so a blank line becomes '',
        # which is a false value
        if line.strip():
            line = line.split()
            # if age is always the first thing on the line:
            age = line[0]
            # if age can be somewhere else on the line, you need to do something more complicated
Note that, because you used with, you don't need to close the file yourself; the with statement does that.

def main():
    myfile=open('data.txt','r')
    line=myfile.readline()
    while line!='':
        line=line.split()
        try:
            age=line[1]
        except IndexError:
            age = None
        line=myfile.readline()
    myfile.close()
main()
The try statement works as follows.
First, the try clause (the statement(s) between the try and except keywords) is executed.
If no exception occurs, the except clause is skipped and execution of the try statement is finished.
If an exception occurs during execution of the try clause, the rest of the clause is skipped. Then if its type matches the exception named after the except keyword, the except clause is executed, and then execution continues after the try statement.
If an exception occurs which does not match the exception named in the except clause, it is passed on to outer try statements; if no handler is found, it is an unhandled exception and execution stops with a message.
For more details, see https://docs.python.org/2/tutorial/errors.html#handling-exceptions
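The flow described above can be seen in a small sketch. The helper name second_field is made up for illustration; it only shows which path is taken when the index exists and when it does not.

```python
def second_field(line):
    """Return the second whitespace-separated field, or None if it is missing."""
    fields = line.split()
    try:
        # If this succeeds, the except clause is skipped.
        return fields[1]
    except IndexError:
        # Only IndexError is handled here; any other exception propagates.
        return None

print(second_field("bob 42"))  # 42
print(second_field("bob"))     # None
```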

Related

How to only read the first non-empty line in Python using sys.stdin

I want my Python code to read a file that will contain numbers only in one line. But that one line will not necessarily be the first one. I want my program to ignore all empty lines until it gets to the first line with numbers.
The file will look something like this:
In this example I would want my Python code to ignore the first 2 lines, which are empty, and just grab the first line that contains numbers.
I know that when doing the following I can read the first line:
import sys
line = sys.stdin.readline()
And I tried doing a for loop like the following to try to get it done:
for line in sys.stdin.readlines():
    values = line.split()
    # rest of code ....
However, I cannot get the code to work properly when a line in the file is empty. I did try a while loop, but it became an infinite loop. Any suggestions on how to properly skip empty lines and perform specific actions only on the first line that is not empty?
Here is an example of a function that gets the next line containing some non-whitespace character from a given input stream.
You might want to modify the exact behaviour in the event that no line is found (e.g. return None or do something else instead of raising an exception).
import sys
import re

def get_non_empty_line(fh):
    for line in fh:
        if re.search(r'\S', line):
            return line
    raise EOFError

line = get_non_empty_line(sys.stdin)
print(line)
Note: you can happily call the function more than once; the iteration (for line in fh:) will carry on from wherever it got to the last time.
You probably want to use the continue keyword with a check if the line is empty, like this:
for line in sys.stdin.readlines():
    if not line.strip():
        continue
    values = line.split()
    # rest of code ....
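If only the first non-empty line matters, a compact alternative is next() over a generator expression with a default value. This is a sketch under the assumption that returning None for "no such line" is acceptable; the function name first_non_empty_line is made up, and io.StringIO stands in for sys.stdin.

```python
import io

def first_non_empty_line(stream):
    """Return the first line with a non-whitespace character, or None."""
    return next((line for line in stream if line.strip()), None)

# io.StringIO stands in for sys.stdin so the example is self-contained.
sample = io.StringIO("\n   \n10 20 30\n40 50\n")
values = first_non_empty_line(sample).split()
print(values)  # ['10', '20', '30']
```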

What is the difference between these two ways to read lines in Python with `sys.stdin`?

I don't think I understand the ways to read lines from the input using sys.stdin.
What is the difference between
import sys
while True:
    foo(sys.stdin.readline())
and
import sys
for line in sys.stdin:
    foo(line)
and why would I pick one choice over the other?
Also, how would I get the behavior of
import sys
first_line = sys.stdin.readline()
foo(first_line)
while True:
    bar(sys.stdin.readline())
by using a for-in loop? Specifically, what would be an elegant way to treat the first line separately from the other lines in the input? Does something along the lines of for line in sys.stdin still work?
There's nothing special about sys.stdin here; it's just a normal text file object.
Iterating any iterable, including a file object, with for x in iterable:, just calls next on it over and over until it raises a StopIteration.
Notice that this means that if you want to skip over a header line before processing the rest of a file, you can just call next(f) before the loop.
And readline does the same thing as next, except for the size parameter (which you're not using), what happens on various error conditions (which aren't likely to matter here), and what happens at EOF: readline returns an empty string, while next raises a StopIteration.
So, there's no general reason to pick one over the other; it comes down to which is more readable in your particular case.
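The EOF difference is easy to check with a small sketch. Here io.StringIO stands in for sys.stdin, and the variable names are only illustrative.

```python
import io

# readline() signals end of input by returning an empty string.
stream = io.StringIO("only line\n")
first = stream.readline()
at_eof = stream.readline()
print(repr(first), repr(at_eof))

# next() signals end of input by raising StopIteration.
stream = io.StringIO("only line\n")
next(stream)
raised = False
try:
    next(stream)
except StopIteration:
    raised = True
print(raised)
```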
If your goal is to loop over all the lines, it's a lot more readable to use a for loop. Compare:
for line in sys.stdin:
    do_stuff(line)

while True:
    line = sys.stdin.readline()
    if not line:
        break
    do_stuff(line)
If, on the other hand, your loop involves reading variable chunks of stuff with some non-trivial logic, readline is usually going to be clearer:
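while True:
    line = sys.stdin.readline()
    if not line:
        break
    while line.rstrip().endswith('\\'):
        line = line.rstrip().rstrip('\\') + sys.stdin.readline()
    do_stuff(line)
Compare that with the for-loop version, which has to carry the partial line between iterations:
logical_line = ''
for line in sys.stdin:
    logical_line += line
    if line.rstrip().endswith('\\'):
        # Drop the trailing backslash and keep accumulating.
        logical_line = logical_line.rstrip().rstrip('\\')
    else:
        do_stuff(logical_line)
        logical_line = ''
# The input ended in the middle of a continued line.
if logical_line:
    do_stuff(logical_line)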
while True:
    foo(sys.stdin.readline())
This code will loop forever. If there is an EOF on sys.stdin -- for instance, if input was redirected from a file, and the end of that file has been reached -- then it will call foo('') repeatedly. This is probably bad.
for line in sys.stdin:
    foo(line)
This code will stop looping when an EOF is encountered. This is good.
If you want to handle the first line differently, you can simply call sys.stdin.readline() once before entering the loop:
first_line = sys.stdin.readline()
foo(first_line)
for line in sys.stdin:
    bar(line)
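An alternative sketch uses next() with a default value, so an empty input is handled without a special case. The names process, handle_first and handle_rest are hypothetical, and io.StringIO stands in for sys.stdin.

```python
import io

def process(stream, handle_first, handle_rest):
    """Give the first line to one handler and every later line to another."""
    first_line = next(stream, None)  # None if the input is empty
    if first_line is None:
        return
    handle_first(first_line)
    for line in stream:  # continues from the second line
        handle_rest(line)

header, rows = [], []
process(io.StringIO("name age\nann 31\nbob 42\n"), header.append, rows.append)
print(header)  # ['name age\n']
print(rows)    # ['ann 31\n', 'bob 42\n']
```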

EOF Error in python Hackerrank

I am trying to solve a problem, but the HackerRank compiler keeps throwing an EOFError while parsing the input. I don't know where I am going wrong.
#!/usr/bin/python
b=[]
b=raw_input().split()
c=[]
d=[]
a=raw_input()
c=a.split()
f=b[1]
l=int(b[1])
if(len(c)==int(b[0])):
    for i in range(l,len(c)):
        d.append(c[i])
        #print c[i]
    for i in range(int(f)):
        d.append(c[i])
        #print c[i]
    for j in range(len(d)):
        print d[j],
I also tried try/except to solve it, but then I get no input.
try:
    a=input()
    c=a.split()
except(EOFError):
    a=""
The input format is two space-separated integers on the first line, followed by the array.
The traceback is:
Traceback (most recent call last):
  File "solution.py", line 4, in <module>
    b=raw_input().split()
EOFError: EOF when reading a line
There are several ways to handle the EOFError.
1. Catch the exception:
while True:
    try:
        value = raw_input()
        do_stuff(value) # next line was found
    except (EOFError):
        break #end of file reached
2. Check the input content (this stops at the first empty line; raw_input() still raises EOFError if the input ends without one):
while True:
    value = raw_input()
    if (value != ""):
        do_stuff(value) # next line was found
    else:
        break
3. Use sys.stdin.readlines() to read the remaining lines into a list, and then use a for-each loop. A more detailed explanation is in "Why does standard input() cause an EOF error".
import sys
# Read input and assemble Phone Book
n = int(input())
phoneBook = {}
for i in range(n):
    contact = input().split(' ')
    phoneBook[contact[0]] = contact[1]
# Process Queries
lines = sys.stdin.readlines() # convert lines to list
for i in lines:
    name = i.strip()
    if name in phoneBook:
        print(name + '=' + str( phoneBook[name] ))
    else:
        print('Not found')
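For Python 3, where input() replaces raw_input(), the first approach can be sketched as a small helper. The name read_lines_until_eof and the read_line parameter are assumptions made so that the loop can be tried without real standard input.

```python
def read_lines_until_eof(read_line=input):
    """Call read_line() repeatedly and stop cleanly when it raises EOFError."""
    lines = []
    while True:
        try:
            lines.append(read_line())
        except EOFError:
            break  # end of input reached
    return lines
```

Calling read_lines_until_eof() with no argument reads from standard input until it is exhausted, so a missing second line produces a shorter list instead of a traceback.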
I faced the same issue. This is what I noticed. I haven't seen your "main" function, but HackerRank already reads in all the data for us, so we do not have to read anything ourselves. For example, given a function def doSomething(a, b):, the arguments a and b, whether arrays or plain integers, will be read in for us. We just have to focus on our main code without worrying about reading input. Also make sure your function returns something at the end, otherwise you will get another error. HackerRank takes care of printing the final output too. Their code samples and FAQs are a bit misleading. This was my observation from my own tests; yours could be different.
It's because your program is expecting input, but none was provided. Provide a custom input and try to run it. It should work.
I don't know why, but providing a custom input and compiling got me in, and it passed all cases without my changing anything.
There is some code hidden below the main visible code in HackerRank.
You need to expand that code (note the line number where you got the error and check that line after expanding). The hidden code is valid; you need to make the visible code at the top match it.
In my case there was something like this:
regex_integer_in_range = r"___________" # Do not delete 'r'.
regex_alternating_repetitive_digit_pair = r"__________" # Do not delete 'r'.
I just filled in the blanks as below, and it worked fine with the given hidden code:
regex_integer_in_range = r"^[0-9][\d]{5}$" # Do not delete 'r'.
regex_alternating_repetitive_digit_pair = r"(\d)(?=\d\1)" # Do not delete 'r'.

Python File Remains Empty After Writing to it Issue

I am trying to read URLs directly from a MySQL table, use tldextract to get the domain from each URL, and find the SPF (Sender Policy Framework) record for the domain.
When I try to write the SPF records of every domain I scan, my Ouput_SPF_Records.txt does not contain any of the records I write.
I am not sure what the issue is. Any suggestions, please?
import sys
import socket
import dns.resolver
import re
import MySQLdb
import tldextract
from django.utils.encoding import smart_str, smart_unicode

def getspf (domain):
    answers = dns.resolver.query(domain, 'TXT')
    for rdata in answers:
        for txt_string in rdata.strings:
            if txt_string.startswith('v=spf1'):
                return txt_string.replace('v=spf1','')

db=MySQLdb.connect("x.x.x.x","username","password","db_table")
cursor=db.cursor()
cursor.execute("SELECT application_id,url FROM app_info.app_urls")
data=cursor.fetchall()
x=0
while x<len(data):
    c=tldextract.extract(data[x][1])
    #print c
    app_id=data[x][0]
    #print app_id
    d=str(app_id)+','+c[1]+'.'+c[2]
    #with open('spfout.csv','a') as out:
    domain=smart_str(d)
    #print domain
    with open('Ouput_SPF_Records.txt','w') as g:
        full_spf=""
        spf_rec=""
        y=domain.split(',')
        #print "y===",y,y[0],y[1]
        app_id=y[0]
        domains=y[1]
        try:
            full_spf=getspf(domains.strip())+"\n"
            spf_rec=app_id+","+full_spf
            print spf_rec
        except Exception:
            pass
        g.write(spf_rec)
    x=x+1
g.close()
Try opening the file in append mode instead of w mode. w mode overwrites the file on each iteration. Example:
with open('Ouput_SPF_Records.txt','a') as g:
Most probably, the last time you open the file in write mode you do not write anything, since you are catching and ignoring all exceptions, which leaves the file empty.
Also, if you know which error you are expecting, you should use except <Error>: instead of except Exception:. Example:
try:
    full_spf=getspf(domains.strip())+"\n"
    spf_rec=app_id+","+full_spf
    print spf_rec
except <Error you want to catch>:
    pass
Your problem is you open the file many times, each time through the loop. You use w mode, which erases the contents and writes from the beginning.
Either open the file once before the loop, or open in append mode a, so you don't delete the previously written data.
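The "open once before the loop" option can be sketched as follows. The helper name write_records and the sample records are hypothetical, and a temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile

def write_records(path, records):
    """Write every record on its own line, opening the file only once."""
    with open(path, 'w') as out:
        for record in records:
            out.write(record + "\n")

# Made-up sample data; the real records would come from the DNS lookups.
path = os.path.join(tempfile.mkdtemp(), "records.txt")
write_records(path, ["1,example.com", "2,example.org"])
with open(path) as f:
    print(f.read())
```

Because the file is opened a single time, w mode truncates it once at the start and every later write is kept.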
You can use:
import pdb;pdb.set_trace()
to debug your code and try to figure out the problem.
Also note that:
1. You shouldn't just write 'pass' in the except block. Deal with the exception.
2. with open('Ouput_SPF_Records.txt','w') as g: automatically closes the file, so there is no need to call g.close() explicitly.
I think this is the result of getspf returning None by default.
The problem is that Python can't concatenate str and NoneType (the type of None), which throws an exception that you quickly discard.
You may try this instead:
def getspf (domain):
    answers = dns.resolver.query(domain, 'TXT')
    for rdata in answers:
        for txt_string in rdata.strings:
            if txt_string.startswith('v=spf1'):
                return txt_string.replace('v=spf1','')
    return ""#"Error"
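The failure mode can be reproduced without any DNS lookup. In this sketch, find_spf is a hypothetical stand-in for getspf that takes a plain list of strings; it is not the dnspython API.

```python
def find_spf(txt_strings):
    """Hypothetical stand-in for getspf: returns None when nothing matches."""
    for txt_string in txt_strings:
        if txt_string.startswith('v=spf1'):
            return txt_string.replace('v=spf1', '')
    # Falling off the end of a function returns None implicitly.

record = find_spf(["google-site-verification=abc"])
try:
    line = record + "\n"  # None + str is not allowed
except TypeError as exc:
    line = ""
    print("caught:", exc)
```

With a bare except Exception: pass, this TypeError disappears silently and an empty string is what ends up being written.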
Probably you should check for the exception; my guess is that the statements inside the try block are not performed, and spf_rec is left as "".
As per the POSIX definition (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206), every line you write should end with "\n".
You might consider initialising spf_rec with "\n" rather than "".
Also, as "Anand S Kumar" said, without an append, the file is overwritten at every "while x
I think that if you open the Ouput_SPF_Records.txt file with "vi", you will see the last line written (unless an exception occurred on the last iteration of the loop, leaving the file as just "").
In other words, the problem is that much software may not read a line that doesn't respect the POSIX standard, and because your file probably consists of a single line that doesn't respect this standard, the file won't be read at all.

What is this JSON Decoder piece of code doing?

I have been using this piece of code:
def read_text_files(filename):
    # Creates JSON Decoder
    decoder = json.JSONDecoder()
    with open(filename, 'r') as inputfile:
        # Returns next item in input file, removes whitespace from it and saves it in line
        line = next(inputfile).strip()
        while line:
            try:
                # Returns 2-tuple of Python representation of data and index where data ended
                obj, index = decoder.raw_decode(line)
                # Remove object
                yield obj
                # Remove already scanned part of line from rest of file
                line = line[index:]
            except ValueError:
                line += next(inputfile).strip()
            if not line:
                line += next(inputfile).strip()
            global count
            count+=1
            print str(count)

all_files = glob.glob('Documents/*')
for filename in all_files:
    for data in read_text_files(filename):
        rawTweet = data['text']
        print 'Here'
It reads in a JSON file and decodes it. However, what I realise is that when I place the count and print statements inside the ValueError, I'm losing almost half of the documents being scanned in here - they never make it back to the main method.
Could somebody explain to me exactly what the try statement is doing and why I'm losing documents in the except part. Is it due to bad JSON?
Edit: Including more code
Currently, with the code posted, the machine prints:
"Here"
2
3 etc...
199
Here
200
Here (alternating like this until)...
803
804
805 etc...
1200
Is this happening because some of the JSON is corrupt? Is it because some of the documents are duplicates (and some definitely are)?
Edit 2:
Interestingly, deleting:
line=next(inputfile).strip()
while line:
and replacing it with:
for line in inputfile:
appears to have fixed the problem. Is there a reason for this?
The try statement specifies a block of statements whose exceptions are handled by the except blocks that follow (only one in your case).
My impression is that with your modifications you are making a second exception trigger inside the exception handler itself. This makes control go to a higher-level exception handler, even outside function read_text_files. If no exception occurs in the exception handler, the loop can continue.
Please check that count exists and has been initialized with an integer value (say 0).
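To see what raw_decode itself does, here is a minimal sketch that walks through a string holding several JSON documents back to back. The function name iter_json_objects is made up, and the sketch works on a complete string rather than reading a file line by line as the question's code does.

```python
import json

def iter_json_objects(text):
    """Yield each JSON document found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Returns the decoded object and the index where that document ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

print(list(iter_json_objects('{"a": 1} {"b": 2}\n[3]')))
```

Malformed input raises json.JSONDecodeError, a subclass of ValueError, which is the exception the question's except clause catches.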
