Python List: IndexError: list index out of range

When I try to print splited_data[1], I get the error IndexError: list index out of range. On the other hand, splited_data[0] works fine.
I want to insert the data into MySQL. splited_data[0] holds my MySQL column names and splited_data[1] holds the column values. If splited_data[1] is empty, I want to insert an empty string into MySQL instead, but I get IndexError: list index out of range. How can I avoid this error? Please help me. Thank you.
Here is my code. It works fine; I only get this error message when splited_data[1] is empty.
import re

def clean(data):
    data = data.replace('[[','')
    data = data.replace(']]','')
    data = data.replace(']','')
    data = data.replace('[','')
    data = data.replace('|','')
    data = data.replace("''",'')
    data = data.replace("<br/>",',')
    return data

for t in xml.findall('//{http://www.mediawiki.org/xml/export-0.5/}text'):
    m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', t.text)
    if m:
        k = m.group(1)
        k.encode('utf-8')  # note: the encoded result is discarded, so this line has no effect
        clean_data = clean(k)  # clean() strips wiki markup (garbage) from the text
        filter_data = clean_data.splitlines(True)  # split into lines, keeping the line endings
        filter_data.pop(0)
        for index, item in enumerate(filter_data):
            splited_data = item.split(' = ', 1)
            print splited_data[0], splited_data[1]
            # splited_data[0] used as mysql column
            # splited_data[1] used as mysql values
Here is some sample splited_data output:
[u' music ', u'Jatin Sharma\n']
[u' cinematography', u'\n']
[u' released ', u'Film datedf=y201124']

split_data = item.partition('=')
# If there was an '=', then it is now in split_data[1],
# and the pieces you want are split_data[0] and split_data[2].
# Otherwise, split_data[0] is the whole string, and
# split_data[1] and split_data[2] are empty strings ('').
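A minimal sketch of the partition approach applied to lines like the ones in the question. It assumes each line is either "name = value" or has no '=' at all, and it strips surrounding whitespace so the names can be used as column names; the helper name split_field and the sample lines are hypothetical.

```python
def split_field(item):
    """Split 'name = value' into (name, value); value is '' if there is no '='."""
    # str.partition always returns a 3-tuple, so unpacking never raises IndexError.
    column, _sep, value = item.partition('=')
    return column.strip(), value.strip()

# Hypothetical sample lines modelled on the question's data.
for line in [" music = Jatin Sharma\n", " cinematography\n", " released = Film datedf=y201124\n"]:
    print(split_field(line))
```

Because partition splits only at the first '=', a value that itself contains '=' (such as "Film datedf=y201124") is kept whole.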

Try removing the whitespace on both sides of the equals sign, like this:
splited_data = item.split('=',1)

A list is contiguous, so you need to make sure its length is greater than your index before you try to access it:
'' if len(splited_data) < 2 else splited_data[1]
You could also check before you split:
if '=' in item:
    col, val = item.split('=', 1)
else:
    col, val = item, ''
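To connect this back to the MySQL insert in the question, here is a sketch that uses the same "check before you split" idea to build a column-to-value mapping and then a parameterized INSERT statement. The table name films, the helper parse_infobox, and the input lines are hypothetical; it also assumes the column names are trusted, valid MySQL identifiers, since DB-API drivers only accept values (not column names) as parameters.

```python
def parse_infobox(lines):
    """Map each column name to its value; the value is '' when a line has no '='."""
    row = {}
    for item in lines:
        if '=' in item:
            col, val = item.split('=', 1)
        else:
            col, val = item, ''
        col = col.strip()
        if col:  # skip blank lines, which carry no column name
            row[col] = val.strip()
    return row


# Hypothetical input lines, modelled on the question's splited_data output.
row = parse_infobox([" music = Jatin Sharma\n",
                     " cinematography =\n",
                     " released = Film datedf=y201124\n"])

# Column names cannot be passed as parameters, so they are assumed trusted;
# the values go through %s placeholders instead of string formatting.
columns = ", ".join(row)
placeholders = ", ".join(["%s"] * len(row))
query = "INSERT INTO films ({}) VALUES ({})".format(columns, placeholders)
print(query)
print(list(row.values()))
# With a MySQL DB-API driver you would then call cursor.execute(query, list(row.values())).
```

An empty value such as cinematography ends up as '' in the parameter list, which is the empty string the question wants stored.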

Related

Parsing a list of lists and manipulating it in place

So I have a list of lists that I need to parse through and manipulate the contents of. There are strings of numbers and words in the sublists, and I want to change the numbers into integers. I don't think it's relevant but I'll mention it just in case: my original data came from a CSV that I split on newlines, and then split again on commas.
Here is what my code looks like:
def prep_data(data):
    list = data.split('\n') #Splits data on newline
    list = list[1:-1] #Gets rid of header and last row, which is an empty string
    prepped = []
    for x in list:
        prepped.append(x.split(','))
    for item in prepped: #Converts the item into an int if it is able to be converted
        for x in item:
            try:
                item[x] = int(item[x])
            except:
                pass
    return prepped
I tried to loop through every sublist in prepped and change the type of its values, but the loop doesn't seem to do anything: prep_data returns the same thing as it did before I added that for loop.
I think I see what is wrong: you are thinking Python is more generous with its assignment than it actually is.
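def prep_data(data):
    list = data.split('\n') #Splits data on newline
    list = list[1:-1] #Gets rid of header and last row, which is an empty string
    prepped = []
    for x in list:
        prepped.append(x.split(','))
    for i in range(len(prepped)): #Converts the item into an int if it is able to be converted
        item = prepped[i]
        for x in range(len(item)):  # loop over indices, not values
            try:
                item[x] = int(item[x])
            except ValueError:  # leave non-numeric strings as they are
                pass
        prepped[i] = item  # not strictly needed: item is the same list object as prepped[i]
    return prepped
I can't run this on the machine I'm on right now, but the problem seems to be that for x in item gives you the values themselves, not their positions. So item[x] tries to index a list with a string, which raises a TypeError, and the bare except silently swallows it. Looping over range(len(...)) gives you indices you can actually assign through.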
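A minimal sketch of the same in-place conversion using enumerate, which yields each index together with its value, so the assignment goes through the index. The function name and sample rows are hypothetical.

```python
def convert_numbers_in_place(rows):
    """Convert every int-like string in each sublist to int, in place."""
    for row in rows:
        # enumerate yields (index, value); assigning through the index
        # modifies the sublist, which is the same object stored in rows.
        for i, value in enumerate(row):
            try:
                row[i] = int(value)
            except ValueError:  # leave non-numeric strings unchanged
                pass
    return rows

prepped = [["alice", "30", "7"], ["bob", "x", "12"]]
convert_numbers_in_place(prepped)
print(prepped)  # [['alice', 30, 7], ['bob', 'x', 12]]
```

Catching ValueError instead of using a bare except means real bugs, like the TypeError from indexing with a string, are no longer hidden.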
I'm not sure about your function, because maybe I didn't understand your input data, but you could try something like the following, because if you just pass on errors, strings or malformed data slip through without you noticing:
def parse_data(raw_data):
    data_lines = raw_data.split('\n') #Splits data on newline
    data_rows_without_header = data_lines[1:-1] #Gets rid of header and last row, which is an empty string
    parsed_data = []
    for raw_row in data_rows_without_header:
        splited_row = raw_row.split(',')
        parsed_row = []
        for value in splited_row:
            try:
                parsed_row.append(int(value))
            except ValueError:
                print("The value '{}' is not castable".format(value))
                parsed_row.append(value) # if cast fails, add the string as it is
        parsed_data.append(parsed_row)
    return parsed_data
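Since the data originally comes from a CSV file, the standard csv module can do the splitting instead of str.split, which also copes with quoted fields that contain commas. This sketch assumes the first row is a header; the function name parse_csv_text and the sample text are hypothetical.

```python
import csv
import io

def parse_csv_text(raw_data):
    """Parse CSV text, skip the header row, and convert int-like fields to int."""
    reader = csv.reader(io.StringIO(raw_data))
    next(reader, None)  # skip the header row
    parsed = []
    for raw_row in reader:
        if not raw_row:  # csv.reader yields [] for blank lines
            continue
        parsed_row = []
        for value in raw_row:
            try:
                parsed_row.append(int(value))
            except ValueError:
                parsed_row.append(value)  # keep non-numeric values as strings
        parsed.append(parsed_row)
    return parsed

sample = "name,age,score\nalice,30,7\nbob,x,12\n"
print(parse_csv_text(sample))  # [['alice', 30, 7], ['bob', 'x', 12]]
```

With a real file, you would pass the open file object (opened with newline='') to csv.reader instead of wrapping a string in io.StringIO.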

python searching for string starting with letter

data = cursor.fetchone()
while data is not None:
    data = str(data)
    print(data)
    data.split(",")
    for index in data:
        if index.startswith("K"):
            print(index, end=';')
    data = cursor.fetchone()
Here is the relevant part of my code. The data is retrieved from a MySQL server and is a long string of text separated by commas. I can split the string on the commas fine; however, I then need to search for 4-letter strings. I know they start with K, but when I run the program it only prints the K. How do I get it to print the whole string?
Sample Data:
"N54,W130,KRET,KMPI,COFFEE"
Expected output:
"N54,W130,KRET,KMPI,COFFEE"
"KRET;KMPI;"
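Your line data.split(",") does nothing because you need to assign its result: strings are immutable, so split() returns a new list and leaves data unchanged. Without the assignment, for index in data loops over the individual characters of the string, which is why only the K gets printed. You also said you want to only print the K? If so, you only want to print the first character of the string, d[0].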
data = cursor.fetchone()
while data is not None:
    data = str(data)
    print(data)
    data = data.split(",")
    for d in data:
        if d.startswith("K"):
            print(d, end=';')
    data = cursor.fetchone()
EDIT: Based on your edit, it seems you want the entire string to be printed, so I updated the code for that.
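A sketch of the same filtering on the sample data, with the database loop left out and the 4-character requirement made explicit. It assumes the row value is already a plain string; if cursor.fetchone() returns a one-column tuple, as DB-API cursors typically do, data[0] should give that string, whereas str(data) wraps it in parentheses and quotes. The helper name k_codes is hypothetical.

```python
def k_codes(text):
    """Return the comma-separated fields that are 4 characters long and start with 'K'."""
    fields = text.split(",")  # split() returns a new list, so it must be assigned
    return [f for f in fields if len(f) == 4 and f.startswith("K")]

data = "N54,W130,KRET,KMPI,COFFEE"
print(data)
output = "".join(code + ";" for code in k_codes(data))
print(output)  # KRET;KMPI;
```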
If you are looking for 4-letter strings starting with "K", what about using regular expressions?
import re
regex = r"K[a-zA-Z0-9]{3}"
test_str="N54,W130,KRET,KMPI,COFFEE"
matches = re.finditer(regex,test_str)
output=""
for matchNum, match in enumerate(matches):
    output += match.group() + ";"
print(output)
The output is: KRET;KMPI;
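One caveat with the pattern above: it also matches the first four characters of a longer word, so a field like "KABCD" would be reported as "KABC". A sketch that anchors the match with word boundaries (\b) so only whole 4-character fields count; the extra "KABCD" field in the test string is hypothetical.

```python
import re

# \b requires a word boundary on both sides, so "KABCD" is not matched as "KABC".
pattern = re.compile(r"\bK[A-Za-z0-9]{3}\b")

test_str = "N54,W130,KRET,KMPI,COFFEE,KABCD"
matches = pattern.findall(test_str)
print("".join(m + ";" for m in matches))  # KRET;KMPI;
```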

Getting slashes and letters while I only want number

I am using the following code to bring back prices from an ecommerce website:
response.css('div.price.regularPrice::text').extract()
but I am getting the following result:
'\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t',
I don't want the slashes and letters, only the number 5. How do I get it?
First, you can use strip() to remove the surrounding whitespace: tabs "\t", carriage returns "\r", and newlines "\n".
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
data = [item.strip() for item in data]
and you get
['Dhs 5.00', '']
Next, you can use an if clause to skip empty elements:
data = [item for item in data if item]
and you get
['Dhs 5.00']
If each item always has the same structure, Dhs XXX.00,
then you can use the slice [4:-3] to remove "Dhs " and ".00":
data = [item[4:-3] for item in data]
and you get
['5']
Now you only need the first element, data[0], to get 5.
If you need to, you can convert the string "5" to the integer 5 using int():
result = int(data[0])
You can even put all in one line
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
data = [item.strip()[4:-3] for item in data if item.strip()]
result = int(data[0])
If you only ever need the first element of the list, you can write:
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
result = int( data[0].strip()[4:-3] )
Use a regex to fetch only the numbers.
The expression \d+ should do the trick; note that on "Dhs 5.00" it matches both "5" and "00", so take the first match.
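A sketch of the regex approach on the scraped strings from the question. The pattern is extended with an optional decimal part so that "5.00" is captured as one number rather than as "5" and "00"; converting with float is an assumption about what you want to do with the price.

```python
import re

raw = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
       '\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']

prices = []
for text in raw:
    # \d+(?:\.\d+)? matches "5.00" as a single number.
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match:  # whitespace-only entries contain no digits and are skipped
        prices.append(float(match.group()))

print(prices)  # [5.0]
```

int(prices[0]) then gives 5; if exact amounts matter (for example when summing prices), decimal.Decimal is a safer choice than float.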

Remove text:u from strings in python

I am using the xlrd library to import values from an Excel file into a Python list.
I have a single column in the Excel file and am extracting the data row by row.
The problem is that the data I get in the list looks like this:
list = ["text:u'__string__'","text:u'__string__'",.....so on]
How can I remove this text:u to get a plain list of strings?
Here is my code, using Python 2.7:
from xlrd import open_workbook

book = open_workbook("blabla.xlsx")
sheet = book.sheet_by_index(0)
documents = []
for row in range(1, 50): #start from 1, to leave out row 0
    documents.append(sheet.cell(row, 0)) #extract from first col
data = [str(r) for r in documents]
print data
Iterate over items and remove extra characters from each word:
s = []
for x in list:
    s.append(x[7:-1]) # Slice from index 7 up to, but not including, the last character
If that's the standard input list you have, you can do it with a simple split:
[s.split("'")[1] for s in list]
# if the string itself contains a "'" character, using a regex is safer
import re
[re.findall(r"u'(.*)'", s)[0] for s in list]
#Output
#['__string__', '__string__']
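A slightly more defensive variant of the regex answer (Python 3, since re.fullmatch does not exist in 2.7): it only strips strings that match the whole text:u'...' form, also accepts the form without the u prefix, and leaves anything else unchanged. The helper name and sample strings are hypothetical. If you control the xlrd code, reading sheet.cell(row, 0).value instead of calling str() on the Cell object should avoid the prefix altogether.

```python
import re

# Matches the whole string "text:u'...'" (the u is optional) and captures the text.
CELL_REPR_RE = re.compile(r"text:u?'(.*)'", re.DOTALL)

def strip_cell_repr(s):
    """Return the text inside text:u'...'; return s unchanged if it doesn't match."""
    m = CELL_REPR_RE.fullmatch(s)
    return m.group(1) if m else s

raw = ["text:u'__string__'", "text:u'hello world'", "plain"]
cleaned = [strip_cell_repr(s) for s in raw]
print(cleaned)  # ['__string__', 'hello world', 'plain']
```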
I had the same problem. The following code helped me:
list = ["text:u'__string__'","text:u'__string__'",.....so on]
for index, item in enumerate(list):
    list[index] = list[index][7:] #Deletes first 7 characters
    list[index] = list[index][:-1] #Deletes last character

While iterating, if statement won't evaluate

This little snippet of code is my attempt to pull multiple unique values out of rows in a CSV. The CSV header looks something like this:
descr1, fee part1, fee part2, descr2, fee part1, fee part2,
The descr columns have many unique names within a single column. I want to take these unique fee names and make a new header out of them. To do this, I decided to start by getting all the different descr column names, so that when I start pulling data from the actual rows I can check whether a row has a fee amount or one of the fee names I need. There are probably a lot of things wrong with this code, but I am a beginner. I really just want to know why my first if statement is never triggered when the l in fin does equal a comma; I know it must at some point, as it writes a comma to my row string. Thanks!
row = ''
header = ''
columnames = ''
cc = ''
#fout = open(","w")
fin = open ("raw data.csv","rb")
for l in fin:
    if ',' == l:
        if 'start of cust data' not in row:
            if 'descr' in row:
                columnames = columnames + ' ' + row
                row = ''
            else:
                pass
        else:
            pass
    else:
        row = row+l
    print(columnames)
print(columnames)
When you iterate over a file, you get lines, not characters -- and they have the newline character, \n, at the end. Your if ',' == l: statement will never succeed because even if you had a line with only a single comma in it, the value of l would be ",\n".
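A small demonstration of this, using io.StringIO as a stand-in for the opened CSV file (the contents are hypothetical): iterating yields whole lines with their trailing newline, so comparing a line to ',' never succeeds, and looking at individual characters requires a second loop over each line.

```python
import io

# io.StringIO stands in for the opened CSV file (hypothetical contents).
fin = io.StringIO("descr1,fee part1\n,\n")

lines = list(fin)
print(lines)  # ['descr1,fee part1\n', ',\n']

# Even the line that holds only a comma is ',\n', so ',' == line is never true.
print(any(line == ',' for line in lines))  # False

# To look at individual characters you have to iterate over each line as well.
fin.seek(0)
commas = sum(1 for line in fin for ch in line if ch == ',')
print(commas)  # 2
```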
I suggest using the csv module: you'll get much better results than trying to do this by hand like you're doing.
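A sketch of the csv module approach for the goal described in the question: find every column whose header starts with "descr" and collect the unique fee names found in those columns, in first-seen order. The file contents below are hypothetical and stand in for "raw data.csv"; with the real file you would use open("raw data.csv", newline="") in Python 3 instead of io.StringIO.

```python
import csv
import io

# Hypothetical file contents; the real file layout may differ.
raw = io.StringIO(
    "descr1,fee part1,fee part2,descr2,fee part1,fee part2\n"
    "Late fee,1,2,Card fee,3,4\n"
    "Card fee,5,6,Wire fee,7,8\n"
)

reader = csv.reader(raw)
header = next(reader)
# Positions of every column whose header name starts with "descr".
descr_cols = [i for i, name in enumerate(header) if name.strip().startswith("descr")]

fee_names = []
for row in reader:
    for i in descr_cols:
        name = row[i].strip()
        if name and name not in fee_names:  # keep first-seen order, no duplicates
            fee_names.append(name)

print(fee_names)  # ['Late fee', 'Card fee', 'Wire fee']
```

The resulting fee_names list could then serve as the new header the question describes.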
