my_string[0] throws string index out of range in Python

I am extracting API calls from a log as strings, storing them in a file, and then reading them from that file in another process.
I want to catch the lines that are non-empty, longer than 5 characters, and start with a double quote:
if (input_string):
    if (len(input_string)>5):
        if (input_string[0] == '"'):
This throws the following exception:
  File "/home/api_calls_simulator.py", line 58, in doWork
    if (input_string[0] == '"'):
IndexError: string index out of range
input_string comes from:
call_file = open("./myfile.txt")
for input_string in call_file:
Example of a matching line:
"/api/v2/product/id/2088?user_name=website&key=secretkey&format=json" 200 2790 "-" hit
What am I missing here?
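As posted, the three checks cannot raise IndexError by themselves: any string that passes len(input_string) > 5 has an index 0, so the code actually running at line 58 presumably differs from the snippet. A minimal sketch that sidesteps indexing altogether, assuming the goal is only to keep lines longer than 5 characters that start with a double quote (the function name matching_lines and the sample data are hypothetical):

```python
def matching_lines(lines):
    """Yield lines longer than 5 characters that start with a double quote."""
    for raw in lines:
        line = raw.rstrip("\n")  # file iteration keeps the trailing newline
        # startswith() returns False on an empty string instead of raising
        if len(line) > 5 and line.startswith('"'):
            yield line


sample = ['"/api/v2/product/id/2088" 200 2790 "-" hit\n', "\n", '"ab"\n', "GET /x\n"]
print(list(matching_lines(sample)))
```

With a real file, the same function can be fed the open file object, since iterating over it yields one line at a time.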

Related

Getting EOF error but running my code in Thonny produces no errors

I'm learning python and one of my labs required me to:
Write a program whose input is a string which contains a character and a phrase, and whose output indicates the number of times the character appears in the phrase. The output should include the input character and use the plural form, n's, if the number of times the character appears is not exactly 1.
My code ended up being:
char = input()
string = input()
count = 0
for i in string:
    if i == char:
        count +=1
if count > 1 or count == 0:
    print(f"{count} {char}'s")
else:
    print(f'{count} {char}')
Whenever I run the code in Thonny or in the Zybooks development tab it works, but when I select the submit option I keep getting an EOF error:
Traceback (most recent call last):
  File "main.py", line 2, in <module>
    string = input()
EOFError: EOF when reading a line
Does anyone know what's causing the error?
I tried using the break command, but it didn't help; I think that if I used break at the end of my for statement, it wouldn't count all the way. Any ideas, folks?
Thank you, Mr. Roberts, the number of inputs was the issue. I had to read a single input line and pull what I needed from it. My code ended up being:
string = input()
char = string[0]
phrase = string[1:]
count = 0
for i in phrase:
    if i == char:
        count +=1
All good now.
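For reference, a compact sketch of the same single-line approach, assuming the grader supplies the character and the phrase on one line (the function name describe_count is hypothetical, and str.count replaces the explicit loop):

```python
def describe_count(line):
    """Return e.g. "1 n" or "0 z's" for a line such as "n Monday"."""
    if not line:
        raise ValueError("expected a character followed by a phrase")
    char = line[0]      # first character is the one to count
    phrase = line[1:]   # everything after it is the phrase
    count = phrase.count(char)
    suffix = "" if count == 1 else "'s"  # plural unless exactly one
    return f"{count} {char}{suffix}"


print(describe_count("n Monday"))
print(describe_count("z Today is Monday"))
```

In the submitted program the argument would come from a single input() call, which avoids the second read that triggered the EOFError.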

Placing variable in single quotes

I am receiving an integer error when reading from my CSV sheet. It's giving me problems reading the last column. I know there are characters in the last column, but how do I define digit as a character? The API function psspy.two_winding_chng_4 requires an input using single quotes ' ' as shown below in that function (the 3rd argument).
Traceback (most recent call last):
  File "C:\Users\RoszkowskiM\Desktop\win4.py", line 133, in <module>
    psspy.two_winding_chng_4(from_,to,'%s'%digit,[_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i],[_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f, max_value, min_value,_f,_f,_f],[])
  File ".\psspy.py", line 25578, in two_winding_chng_4
TypeError: an integer is required
ValueError: invalid literal for int() with base 10: 'T1'
The code:
for row in data:
    data_location, year_link, from_, to, min_value,max_value,name2,tla_2,digit = row[5:14]
    output = 'From Bus #: {}\tTo Bus #: {}\tVMAX: {} pu\tVMIN: {} pu\t'
    if year_link == year and data_location == location and tla_2==location:
        from_=int(from_)
        to=int(to)
        min_value=float(min_value)
        max_value=float(max_value)
        digit=int(digit)
        print(output.format(from_, to, max_value, min_value))
        _i=psspy.getdefaultint()
        _f=psspy.getdefaultreal()
        psspy.two_winding_chng_4(from_,to,'%s'%digit,[_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i],[_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f, max_value, min_value,_f,_f,_f],[])
The easiest and probably most usable option would be to use your own function to keep only the digits. Example:
def return_digits(string):
    return int(''.join([x for x in string if x.isdigit()]))
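One caveat with the function above: if the field contains no digits at all, int('') raises ValueError. A hedged variant that returns a fallback instead is sketched below. The default parameter is an addition made for illustration, and str.isdecimal is used rather than isdigit because int() accepts every character isdecimal accepts. Note that 'T1' becomes 1, so the letter is dropped; whether that suits the psspy call is an assumption not checked here.

```python
def return_digits(text, default=None):
    """Return the decimal digits in text as an int, or default if none exist."""
    digits = "".join(ch for ch in text if ch.isdecimal())
    return int(digits) if digits else default


print(return_digits("T1"))
print(return_digits("abc"))
```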

Prevent datetime.strptime from exit in case of format mismatch

I am parsing dates from a measurement file (about 200k lines). Each line holds a date and a measurement. The date looks like "2013-08-07-20-46", i.e. the time format "%Y-%m-%d-%H-%M". Every so often the timestamp has a bad character (the data came from a serial link which had interruptions). Such an entry would look like: 201-08-11-05-15.
My parsing line to convert the time string into seconds is:
time.mktime(datetime.datetime.strptime(dt, "%Y-%m-%d-%H-%M").timetuple())
I got it online and don't fully understand how it works (but it works).
My problem is preventing the program from exiting with an error when a format mismatch happens. Is there a way to make strptime not exit but gracefully return an error flag, in which case I would simply discard the data line and move on to the next? Yes, I can perform a pattern check with a regexp, but I was wondering if some smart mismatch handling is already built into strptime.
Append @Anand S Kumar
It worked for a few bad lines but then it failed.
fp = open('bmp085.dat', 'r')
for line in fp:
    [dt,t,p]= string.split(line)
    try:
        sec= time.mktime(datetime.datetime.strptime(dt, "%Y-%m-%d-%H-%M").timetuple()) - sec0
    except ValueError:
        print 'Bad data : ' + line
        continue #If you are doing this in a loop (looping over the lines) so that it moves onto next iteration
    print sec, p ,t
    t_list.append(sec)
    p_list.append(p)
fp.close()
Output:
288240.0 1014.48 24.2
288540.0 1014.57 24.2
288840.0 1014.46 24.2
Bad data : �013-08-11-05-05 24.2! 1014.49
Bad data : 2013=0▒-11-05-10 �24.2 1014.57
Bad data : 201�-08-11-05-15 24.1 1014.57
Bad data : "0�#-08-1!-p5-22 24.1 1014.6
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: too many values to unpack
>>>
Append @Anand S Kumar
It crashed again.
for line in fp:
    print line
    dt,t,p = line.split(' ',2)
    try:
        sec= time.mktime(datetime.datetime.strptime(dt, "%Y-%m-%d-%H-%M").timetuple()) - sec0
    except ValueError:
        print 'Bad data : ' + line
        continue #If you are doing this in a loop (looping over the lines) so that it moves onto next iteration
    print sec, p ,t
Failed :
2013-08-11�06-t5 03/9 9014.y
Bad data : 2013-08-11�06-t5 03/9 9014.y
2013-08-11-06-50 (23. 1014.96
295440.0 (23. 1014.96
2013-08-11-06%55 23.9 !�1015.01
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
TypeError: must be string without null bytes, not str
>>> fp.close()
>>>
You can use try..except to catch any ValueError and, if one occurs, move on to the next line. Example -
try:
    time.mktime(datetime.datetime.strptime(dt, "%Y-%m-%d-%H-%M").timetuple())
except ValueError:
    continue #If you are doing this in a loop (looping over the lines) so that it moves onto next iteration
If you are doing something else (for example a function call for each line), then return None or similar in the except block.
The second ValueError you are getting should be occurring on this line -
[dt,t,p]= string.split(line)
This happens because some line splits into more than 3 elements. One thing you can do is use the maxsplit argument of str.split() to split at most twice, which yields at most 3 elements. Example -
dt,t,p = line.split(None,2)
Or if you really want to use string.split() -
[dt,t,p]= string.split(line,None,2)
Or, if you are not expecting spaces inside any of the fields, you can move the line causing the ValueError inside the try..except block and treat such a line as a bad line.
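The last suggestion can be sketched as a small helper that wraps both the split and the parsing in a single try block. This is written for Python 3, unlike the Python 2 snippets in the question, and it makes assumptions: the names FMT and parse_line are hypothetical, the fields are taken to be date, temperature, pressure as in the posted output, and converting the two measurements to float is an extra validation step the original code does not perform.

```python
import time
from datetime import datetime

FMT = "%Y-%m-%d-%H-%M"


def parse_line(line):
    """Return (seconds, t, p) for a clean line, or None for a malformed one."""
    try:
        dt, t, p = line.split()  # wrong field count raises ValueError
        seconds = time.mktime(datetime.strptime(dt, FMT).timetuple())
        return seconds, float(t), float(p)  # garbled numbers raise ValueError
    except (ValueError, OverflowError):
        return None


for raw in ["2013-08-07-20-46 24.2 1014.48", "201-08-11-05-15 24.1 1014.57"]:
    print(parse_line(raw))
```

The calling loop then only has to test for None and continue, which keeps every failure mode of a corrupted line in one place.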
Use try - except in a for-loop:
for dt in data:
    try:
        print time.mktime(datetime.datetime.strptime(dt, "%Y-%m-%d-%H-%M").timetuple())
    except ValueError:
        print "Wrong format!"
        continue
Output for data = ["1998-05-14-15-45","11998-05-14-15-45","2002-05-14-15-45"]:
895153500.0
Wrong format!
1021383900.0

Why am I getting an IndexError in Python 3 when indexing a string and not slicing?

I'm new to programming, and experimenting with Python 3. I've found a few topics which deal with IndexError but none that seem to help with this specific circumstance.
I've written a function which opens a text file, reads it one line at a time, and slices the line up into individual strings which are each appended to a particular list (one list per 'column' in the record line). Most of the slices are multiple characters [x:y] but some are single characters [x].
I'm getting an IndexError: string index out of range message, when as far as I can tell, it isn't. This is the function:
def read_recipe_file():
    recipe_id = []
    recipe_book = []
    recipe_name = []
    recipe_page = []
    ingred_1 = []
    ingred_1_qty = []
    ingred_2 = []
    ingred_2_qty = []
    ingred_3 = []
    ingred_3_qty = []
    f = open('recipe-file.txt', 'r') # open the file
    for line in f:
        # slice out each component of the record line and store it in the appropriate list
        recipe_id.append(line[0:3])
        recipe_name.append(line[3:23])
        recipe_book.append(line[23:43])
        recipe_page.append(line[43:46])
        ingred_1.append(line[46])
        ingred_1_qty.append(line[47:50])
        ingred_2.append(line[50])
        ingred_2_qty.append(line[51:54])
        ingred_3.append(line[54])
        ingred_3_qty.append(line[55:])
    f.close()
    return recipe_id, recipe_name, recipe_book, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, ingred_3, \
        ingred_3_qty
This is the traceback:
Traceback (most recent call last):
  File "recipe-test.py", line 84, in <module>
    recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, ingred_3, ingred_3_qty = read_recipe_file()
  File "recipe-test.py", line 27, in read_recipe_file
    ingred_1.append(line[46])
The code which calls the function in question is:
print('To show list of recipes: 1')
print('To add a recipe: 2')
user_choice = input()
recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, \
    ingred_3, ingred_3_qty = read_recipe_file()
if int(user_choice) == 1:
    print_recipe_table(recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty,
                       ingred_2, ingred_2_qty, ingred_3, ingred_3_qty)
elif int(user_choice) == 2:
    #code to add recipe
The failing line is this:
ingred_1.append(line[46])
There are more than 46 characters in each line of the text file I am trying to read, so I don't understand why I'm getting an out of bounds error (a sample line is below). If I change the code to this:
ingred_1.append(line[46:])
to read a slice, rather than a specific character, the line executes correctly, and the program fails on this line instead:
ingred_2.append(line[50])
This leads me to think it is somehow related to appending a single character from the string, rather than a slice of multiple characters.
Here is a sample line from the text file I am reading:
001Cheese on Toast     Meals For Two       012120038005002
I should probably add that I'm well aware this isn't great code overall - there are lots of ways I could generally improve the program, but as far as I can tell the code should actually work.
This will happen if some of the lines in the file are empty or at least short. A stray newline at the end of the file is a common cause, since that comes up as an extra blank line. The best way to debug a case like this is to catch the exception, and investigate the particular line that fails (which almost certainly won't be the sample line you reproduced):
try:
    ingred_1.append(line[46])
except IndexError:
    print(line)
    print(len(line))
Catching this exception is also usually the right way to deal with the error: you've detected a pathological case, and now you can consider what to do. You might for example:
Continue, which will silently skip processing that line.
Log something and then continue.
Bail out by raising a new, more topical exception, e.g. raise ValueError("Line too short").
Printing something relevant, with or without continuing, is almost always a good idea if this represents a problem with the input file that warrants fixing. Continuing silently is a good option if it is something relatively trivial that you know can't cause flow-on errors in the rest of your processing. You may want to differentiate between the "too short" and "completely empty" cases by detecting the "completely empty" case early, such as by doing this at the top of your loop (a blank line read from a file still contains its newline, so strip it before testing):
if not line.strip():
    # Skip blank lines
    continue
And handling the error for the other case appropriately.
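Put together, the options above might look like the following sketch. It assumes the fixed-width layout from the question, where the highest single-character index is 54, so a usable line needs at least 55 characters. The function name read_records and the min_length parameter are hypothetical.

```python
import logging


def read_records(lines, min_length=55):
    """Return usable lines; skip blank ones silently and log short ones."""
    records = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank line, e.g. a stray newline at end of file
        if len(line) < min_length:
            logging.warning("line %d too short (%d chars): %r",
                            number, len(line), line)
            continue
        records.append(line)
    return records


print(len(read_records(["\n", "001Cheese\n", "x" * 58 + "\n"])))
```

Replacing the second continue with raise ValueError("Line too short") gives the bail-out variant.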
The reason changing it to a slice works is because string slices never fail. If both indexes in the slice are outside the string (in the same direction), you will get an empty string - eg:
>>> 'abc'[4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> 'abc'[4:]
''
>>> 'abc'[4:7]
''
Your code fails on line[46] because line contains fewer than 47 characters. The slice operation line[46:] still works because an out-of-range string slice returns an empty string.
You can verify that the line is too short by replacing
ingred_1.append(line[46])
with
try:
    ingred_1.append(line[46])
except IndexError:
    print('line = "%s", length = %d' % (line, len(line)))
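If a missing single-character field is acceptable rather than an error, the same slice behaviour can be used deliberately: line[46:47] yields the character at index 46 when it exists and an empty string otherwise. A small sketch, with a hypothetical helper name field; note that this hides short lines instead of reporting them, so it only suits cases where that is intended.

```python
def field(line, start, stop):
    """Slice a fixed-width field; returns '' when the line is too short."""
    return line[start:stop]


short = "001Cheese"
full = "x" * 46 + "1" + "200"
print(repr(field(short, 46, 47)))
print(repr(field(full, 46, 47)))
```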

re.search() & group not working in file with identical format to one that works. Python

Finding coding sequence
cds_position = ''
cds_start = 0
cds_end = 0
cds_sequence = ''
for line in data:
    cds_temp = ''
    if re.findall(r' CDS ',line):
        cds_temp = cds_temp + line.replace('\n','')
        position = re.search(r'(\d+)\.\.(\d+)',cds_temp)
        cds_start = cds_start + int(position.group(1))
        cds_end = cds_end + int(position.group(2))
        cds_position = str(cds_start)+':'+str(cds_end)
        cds_sequence = cds_sequence + sequence[(cds_start-1):(cds_end-1)]
I get this error:
Traceback (most recent call last):
  File "Upstream_ORF.py", line 357, in <module>
    GenBank_Reader(test_file)
  File "Upstream_ORF.py", line 317, in GenBank_Reader
    cds_start = cds_start + int(position.group(1))
AttributeError: 'NoneType' object has no attribute 'group'
OK, I really don't understand why I am getting this error.
I wrote a script that goes through a file of a particular format line by line, and whenever it encounters a particular string followed by 10 spaces, it takes the number values that follow it:
exon 1..1333
/gene="BRD2"
/gene_synonym="D6S113E; FSH; FSRG1; NAT; RING3; RNF3"
/inference="alignment:Splign:1.39.8"
/number=3
STS 350..463
/gene="BRD2"
/gene_synonym="D6S113E; FSH; FSRG1; NAT; RING3; RNF3"
/standard_name="CGCb278"
/db_xref="UniSTS:240930"
So whenever it finds the word exon followed by 10 spaces, it takes the numbers flanking the '..'.
It worked for 5 different files, but for one of them it just isn't working, even though it is the exact same format. I'm not sure why it isn't working now, because it still works with the other ones. I found all the occurrences of 'exon' in the file, and none of them were flanked by 10 spaces like the one I was looking for.
Why would this error come up when it works for other files with the same format?
If re.search returns None, that means that it failed to find a match. The file in question must have something different about it which causes the expression to fail.
A couple of small comments about your code:
if re.findall(r' CDS ',line): is unnecessary. Just do if ' CDS ' in line:, which does a substring search.
Instead of line.replace('\n','') you should use line.rstrip('\n'), as that is more typical.
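To find the line that breaks the pattern instead of crashing on it, the result of re.search can be tested before calling group. A minimal sketch, assuming the same ' CDS ' substring test and digit-range pattern as the question; the names RANGE and cds_ranges and the sample lines are hypothetical:

```python
import re

RANGE = re.compile(r"(\d+)\.\.(\d+)")


def cds_ranges(lines):
    """Yield (start, end) for CDS lines; report lines without an N..M range."""
    for line in lines:
        if " CDS " not in line:
            continue
        match = RANGE.search(line)
        if match is None:
            # The offending line is printed so it can be inspected
            print(f"no range on CDS line: {line.rstrip()!r}")
            continue
        yield int(match.group(1)), int(match.group(2))


sample = ["     CDS             10..20\n", "note about CDS here\n", "exon 1..5\n"]
print(list(cds_ranges(sample)))
```

Running this over the failing file should show which line contains the CDS marker without a start..end pair right after it.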
