List index out of range while creating jagged array - python

All, I am trying to create a jagged list in Python 3.x. Specifically, I am pulling a number of elements from a list of webpages using Selenium. Each row of my jagged list ("matrix") represents the contents of one of these said webpages. Each of these rows should have as many columns as there are elements pulled from its respective webpage - this number will vary from page to page.
e.g.
webpage1 has 3 elements: a,b,c
webpage2 has 6 elements: d,e,f,g,h,i
webpage3 has 4 elements: j,k,l,m
...
would look like:
[[a,b,c],
[d,e,f,g,h,i],
[j,k,l,m],...]
Here's my code, thus far:
from selenium import webdriver
chromePath = "/Users/me/Documents/2018/chromedriver"
browser = webdriver.Chrome(chromePath)
url = 'https://us.testcompany.com/eng-us/women/handbags/_/N-r4xtxc/to-1'
browser.get(url)
hrefLinkArray = []
hrefElements = browser.find_elements_by_class_name("product-item")
for eachOne in hrefElements:
    hrefLinkArray.append(eachOne.get_attribute('href'))
pics = [[]]
for y in range(0, len(hrefLinkArray)): # or type in "range(0, 1)" to debug
    browser.get(hrefLinkArray[y])
    productViews = browser.find_elements_by_xpath("//*[@id='lightSlider']/li")
    b = -1
    for a in productViews:
        b = b + 1
        # print(y) for debugging
        # print(b) for debugging
        pics[y][b] = a.get_attribute('src') # <------------ ERROR!
        # pics[y][b].append(a.get_attribute('src') GIVES SAME ERROR AS ABOVE
    del productViews[:]
browser.quit()
Whenever I run this, I get an error on the first iteration of the a in productViews loop:
line 64, in <module>
pics[y][b] = a.get_attribute('src')
IndexError: list assignment index out of range
From what I can tell, the integer references are correct (see my debugging lines in the for a in productViews loop), so pics[0][0] is a proper way to reference the jagged list. That said, I have a feeling pics[0][0] does not exist yet. Or maybe only pics[0] does? I've seen similar posts about this error, but the only solution I've understood is to use .append(), and even then only on a 1D list. As you can see in my code, I've used .append() on hrefLinkArray successfully, whereas it fails on line 64/65. I'm stumped as to why this might be.
Please let me know:
Why my lines .append() and [][]=... are throwing this error.
If there is a more efficient way to accomplish my goal, I'd like to learn!
UPDATE: using @User4343502's answer, in conjunction with @StephenRauch's input, the error is resolved and I am now getting a jagged list of the intended size! My amended code is:
listOfLists = []
for y in range(0, len(hrefLinkArray)):
    browser.get(hrefLinkArray[y])
    productViews = browser.find_elements_by_xpath("//*[@id='lightSlider']/li")
    otherList = []
    for other in productViews:
        otherList.append(other.get_attribute('src'))
    # print(otherList)
    listOfLists.append(otherList)
    del otherList[:]
    del productViews[:]
print(listOfLists)
Note, this code prints a jagged list of totally empty indices e.g. [[][],[][][][],[],[][][],[][],[][][][][]...], but that is a separate issue - I believe related to my productViews object and how it retrieves by xpath... What's important, though, is that my original question was answered. Thanks!

list.append will add an element into a list. This works regardless of what the element is.
a = [1, 2, 3]
b = [float, {}]
c = [[[None]]]
## We will append to this empty list
list_of_lists = []
for x in (a, b, c):
    list_of_lists.append(x)
## Prints (Python 3): [[1, 2, 3], [<class 'float'>, {}], [[[None]]]]
print(list_of_lists)
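The same append pattern builds the jagged list from the question: create a fresh inner list for each page, fill it, then append it to the outer list. The sketch below uses a hypothetical `pages` dictionary in place of the Selenium results. It also illustrates a pitfall in the amended code: `append` stores a reference to the inner list rather than a copy, so running `del otherList[:]` afterwards empties the row that was just appended.

```python
# Hypothetical stand-in for the elements scraped from each webpage.
pages = {
    "webpage1": ["a", "b", "c"],
    "webpage2": ["d", "e", "f", "g", "h", "i"],
    "webpage3": ["j", "k", "l", "m"],
}

jagged = []
for elements in pages.values():
    row = []                  # a new inner list for every page
    for element in elements:
        row.append(element)   # grow the row one item at a time
    jagged.append(row)        # the outer list now refers to this row

print(jagged)

# Pitfall: clearing a list after appending it also clears the stored row,
# because both names refer to the same list object.
cleared = []
row = ["x", "y"]
cleared.append(row)
del row[:]
print(cleared)  # [[]]
```

Because a new `row` is created on every pass through the outer loop, there is no need to delete its contents at the end of the iteration.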

Related

Python insertion sorting a csv by row

My objective is to use an insertion sort to sort the contents of a CSV file by the numbers in the first column. For example, I want to take this:
[[7831703, Christian, Schmidt]
[2299817, Amber, Cohen]
[1964394, Gregory, Hanson]
[1984288, Aaron, White]
[9713285, Alexander, Kirk]
[7025528, Janice, Lee]
[6441979, Sarah, Browning]
[8815776, Rick, Wallace]
[2395480, Martin, Weinstein]
[1927432, Stephen, Morrison]]
and sort it to:
[[1927432, Stephen, Morrison]
[1964394, Gregory, Hanson]
[1984288, Aaron, White]
[2299817, Amber, Cohen]
[2395480, Martin, Weinstein]
[6441979, Sarah, Browning]
[7025528, Janice, Lee]
[7831703, Christian, Schmidt]
[8815776, Rick, Wallace]
[9713285, Alexander, Kirk]]
based on the numbers in the first column. My current Python code looks like:
import csv
with open('EmployeeList.csv', newline='') as File:
    reader = csv.reader(File)
    readList = list(reader)
    for row in reader:
        print(row)
def insertionSort(readList):
    #Traverse through 1 to the len of the list
    for row in range(len(readList)):
        # Traverse through 1 to len(arr)
        for i in range(1, len(readList[row])):
            key = readList[row][i]
            # Move elements of arr[0..i-1], that are
            # greater than key, to one position ahead
            # of their current position
            j = i-1
            while j >=0 and key < readList[row][j] :
                readList[row] = readList[row]
                j -= 1
            readList[row] = key
insertionSort(readList)
print ("Sorted array is:")
for i in range(len(readList)):
    print ( readList[i])
The code can already sort the contents of a 2d array, but as it is it tries to sort everything.
I think if I got rid of the [] it would work but in testing it hasn't given what I needed.
To clarify again: I want to sort the positions of the rows based on the numerical value in the first column.
Sorry if I didn't understand your need correctly, but you have a list and you need to sort it? Why don't you just use the sort method of the list object?
>>> data = [[7831703, "Christian", "Schmidt"],
... [2299817, "Amber", "Cohen"],
... [1964394, "Gregory", "Hanson"],
... [1984288, "Aaron", "White"],
... [9713285, "Alexander", "Kirk"],
... [7025528, "Janice", "Lee"],
... [6441979, "Sarah", "Browning"],
... [8815776, "Rick", "Wallace"],
... [2395480, "Martin", "Weinstein"],
... [1927432, "Stephen", "Morrison"]]
>>> data.sort()
>>> from pprint import pprint
>>> pprint(data)
[[1927432, 'Stephen', 'Morrison'],
[1964394, 'Gregory', 'Hanson'],
[1984288, 'Aaron', 'White'],
[2299817, 'Amber', 'Cohen'],
[2395480, 'Martin', 'Weinstein'],
[6441979, 'Sarah', 'Browning'],
[7025528, 'Janice', 'Lee'],
[7831703, 'Christian', 'Schmidt'],
[8815776, 'Rick', 'Wallace'],
[9713285, 'Alexander', 'Kirk']]
>>>
Note that the first element here is parsed as an integer. That matters if you want to sort by numerical value (so that 99 comes before 100), because strings compare character by character.
And don't be confused by the pprint import. You don't need it to sort; I just used it to get nicer output in the console.
Also note that list.sort() is an in-place method. It doesn't return a sorted list but sorts the list itself.
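Rows read with csv.reader arrive as strings, so the integer conversion mentioned above has to happen somewhere. One option is to convert inside the sort key. The following is a minimal sketch, assuming Python 3; the in-memory `sample` text is a hypothetical stand-in for EmployeeList.csv.

```python
import csv
import io

# Hypothetical sample standing in for EmployeeList.csv.
sample = io.StringIO(
    "7831703,Christian,Schmidt\n"
    "99,Amber,Cohen\n"
    "100,Gregory,Hanson\n"
)

rows = list(csv.reader(sample))       # every field is read as a string

# Sorting the raw strings compares text, so "100" lands before "99".
by_text = sorted(rows)

# Converting the first column to int in the key gives numeric order.
by_number = sorted(rows, key=lambda row: int(row[0]))

for row in by_number:
    print(row)
```

Unlike list.sort(), sorted() returns a new list and leaves `rows` untouched, which makes it easy to compare the two orderings side by side.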
*** EDIT ***
Here are two different approaches to a sort function. Both could be heavily optimized, but I hope they give you some ideas about how this can be done. Both should work, and you can add some print calls in the loops to see what happens there.
First, the recursive version. It orders the list a little bit more on every run until it is fully ordered.
def recursiveSort(readList):
    # You don't want to mess up the original data, so we work on a copy of it
    data = readList.copy()
    changed = False
    res = []
    while len(data): # "while 1" should work here as well because eventually we break the loop
        if len(data) == 1:
            # There is only one element left. Let's add it to the end of our result.
            res.append(data[0])
            break
        if data[0][0] > data[1][0]:
            # We compare the first two elements in the list.
            # If the first one is bigger, we remove the second element from the original list and add it next to the result list.
            # Then we raise the changed flag to tell that we changed the order of the original list.
            res.append(data.pop(1))
            changed = True
        else:
            # Otherwise we remove the first element from the list and add it next to the result list.
            res.append(data.pop(0))
    if not changed:
        # If no changes have been made, the list is in order
        return res
    else:
        # If we made changes, we sort the list one more time.
        return recursiveSort(res)
And here is an iterative version, closer to your original function.
def iterativeSort(readList):
    res = []
    for i in range(len(readList)):
        print (res)
        #loop through the original list
        if len(res) == 0:
            # if we don't have any items in our result list, we add first element here.
            res.append(readList[i])
        else:
            done = False
            for j in range(len(res)):
                #loop through the result list this far
                if res[j][0] > readList[i][0]:
                    #if our item in list is smaller than element in res list, we insert it here
                    res.insert(j, readList[i])
                    done = True
                    break
            if not done:
                #if our item in list is bigger than all the items in result list, we put it last.
                res.append(readList[i])
    print(res)
    return res
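For comparison, here is a sketch of a textbook in-place insertion sort that moves whole rows and compares only the first column, which is what the question describes. It is an illustration rather than either answerer's code; the function name `insertion_sort_rows` is made up, and it assumes the first column holds digit strings, as csv.reader would produce.

```python
def insertion_sort_rows(rows):
    """Sort rows in place by the integer value of their first column."""
    for i in range(1, len(rows)):
        current = rows[i]
        key = int(current[0])
        j = i - 1
        # Shift rows with a larger first column one position to the right.
        while j >= 0 and int(rows[j][0]) > key:
            rows[j + 1] = rows[j]
            j -= 1
        # Drop the current row into the gap that opened up.
        rows[j + 1] = current
    return rows


employees = [
    ["7831703", "Christian", "Schmidt"],
    ["2299817", "Amber", "Cohen"],
    ["1964394", "Gregory", "Hanson"],
    ["1927432", "Stephen", "Morrison"],
]
insertion_sort_rows(employees)
for row in employees:
    print(row)
```

The key difference from the code in the question is that the loops index rows, not the cells inside a row, so each row stays intact while it moves.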

Python ( iteration problem ) with an exercice

The code :
import pandas as pd
import numpy as np
import csv
data = pd.read_csv("/content/NYC_temperature.csv", header=None,names = ['temperatures'])
np.cumsum(data['temperatures'])
printcounter = 0
list_30 = [15.22]#first temperature , i could have also added it by doing : list_30.append(i)[0] since it's every 30 values but doesn't append the first one :)
list_2 = [] #this is for the values of the subtraction (for the second iteration)
for i in data['temperatures']:
    if (printcounter == 30):
        list_30.append(i)
        printcounter = 0
    printcounter += 1
**for x in list_30:
    substract = list_30[x] - list_30[x+1]**
    list_2.append(substraction)
print(max(list_2))
Hey guys! I'm really having trouble with the bold part (marked with **):
**for x in list_30:
    substract = list_30[x] - list_30[x+1]**
I'm trying to iterate over the elements and subtract the next element (x+1) from element x, but the following error pops up: TypeError: 'float' object is not iterable. I have also tried iterating using x instead of list_30[x], but then when I use next(x) I get another error.
for x in list_30: iterates over list_30 and assigns to x the value of each item in the list, not its index.
In your case you would be better off looping over the list by index:
index = 0
while index < len(list_30):
    substract = list_30[index] - list_30[index + 1]
    index += 1
Edit: you will still have a problem when you reach the last element of list_30, because there is no element list_30[last_index + 1],
so you should stop one step before the end with while index < len(list_30) - 1:
If you want both the index and the value, you can do:
for i, v in enumerate(list_30[:-1]):
    substract = v - list_30[i + 1]
but the first one looks cleaner in my opinion.
If you're trying to find the difference between adjacent elements of an array (like differentiating it), you should probably use the zip function:
inp = [1, 2, 3, 4, 5]
delta = []
for x0,x1 in zip(inp, inp[1:]):
    delta.append(x1-x0)
print(delta)
Note that the list of deltas will be one element shorter than the input.
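Applied to the temperature exercise, the zip approach can be combined with slicing to take every 30th reading. This is only a sketch: the `temperatures` list is made-up data standing in for data['temperatures'], and the subtraction here is "later minus earlier", so swap the operands if the exercise wants the opposite sign.

```python
# Hypothetical readings standing in for data['temperatures'].
temperatures = [15.0 + (n % 7) * 0.5 for n in range(95)]

# Every 30th reading, starting with the first one.
sampled = temperatures[::30]

# Pair each sample with its successor and subtract.
differences = [later - earlier for earlier, later in zip(sampled, sampled[1:])]

print(sampled)
print(differences)
print(max(differences))
```

Because zip stops at the shorter of its two inputs, there is no index to manage and no risk of reading past the end of the list.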

Creating heap algorithm to output a list of all permutations, in base python without other modules

I am trying to build an algorithm that outputs a list of all permutations of an input string, and I am getting very lost, especially when it comes to Heap's algorithm. I tried to copy the code listed on the Wikipedia page, to no avail. I want a solution in base Python.
# Desired output
heaps_func('art')
['rta', 'tra', 'tar', 'rat', 'art', 'atr']
# Current code
def heaps_func(a):
    lst=[a]
    l=len(a)
    if len(a)==1:
        return lst
    else:
        for x in range(len(a)-1):
            if x<(l-1):
                if l%2==0:
                    k=list(a)
                    p=k[i]
                    k[i]=k[l-1]
                    k[l-1]=p
                    k=''.join(k)
                    lst.append(k)
                else:
                    k=list(a)
                    p=k[0]
                    k[0]=k[l-1]
                    k[l-1]=p
                    k=''.join(k)
                    lst.append(k)
    return lst
You can do it using recursion. Here is the Python code:
def heaps_func(a,size):
    if size ==1:
        a = ''.join(a)
        print(a)
        return
    for i in range(size):
        heaps_func(a,size-1)
        if size%2==1:
            a[0],a[size-1]=a[size-1],a[0]
        else:
            a[i], a[size - 1] = a[size - 1], a[i]
heaps_func(list('art'),3)
If the given string contains duplicate characters, this program will also print duplicate permutations. For example, in the string "arr", 'r' occurs twice, and the output of this program will be:
arr rar rar arr rra rra
To get rid of the duplicates, we can keep a list of the words printed so far. Before printing, we check whether the word is already in that list. If it is not, we print it and store it in the list.
Program:
def heaps_func(a,size,listofwords):
    if size ==1:
        a = ''.join(a)
        #print(a)
        if listofwords.count(a)==0:
            print(a)
            listofwords.append(a)
        return
    for i in range(size):
        heaps_func(a,size-1,listofwords)
        if size%2==1:
            a[0],a[size-1]=a[size-1],a[0]
        else:
            a[i], a[size - 1] = a[size - 1], a[i]
listofwords=[]
heaps_func(list('arr'),len('arr'),listofwords)
For details, please read the following link, although the algorithm is described there in C/C++.
https://www.geeksforgeeks.org/heaps-algorithm-for-generating-permutations/
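Since the question asks for a list rather than printed output, the same recursion can collect its results and return them. The following is a sketch that keeps the swap scheme from the answer above; the name `heap_permutations` and the use of a set for the duplicate check are choices made for this illustration.

```python
def heap_permutations(text):
    """Return the distinct permutations of text as a list of strings."""
    chars = list(text)
    results = []
    seen = set()  # set membership is faster than list.count for the duplicate check

    def generate(size):
        if size <= 1:
            word = "".join(chars)
            if word not in seen:
                seen.add(word)
                results.append(word)
            return
        for i in range(size):
            generate(size - 1)
            # Same swaps as in the answer: odd size swaps the first and last
            # items, even size swaps item i with the last item.
            if size % 2 == 1:
                chars[0], chars[size - 1] = chars[size - 1], chars[0]
            else:
                chars[i], chars[size - 1] = chars[size - 1], chars[i]

    generate(len(chars))
    return results


print(heap_permutations("art"))
print(heap_permutations("arr"))
```

The order of the returned list follows the order in which the recursion visits the permutations, so it will not necessarily match the ordering shown in the question's desired output.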

Extract field using Python

I am looking for help extracting one specific field.
Here are two examples. I am not able to split the line and pick the field by its number, because the number may change when the content changes.
Example 1
[["cn","Phone",1,"","LI(\"\")","0","19%","",""],["OS_DisplayName","Display Name",1,"","LI(\"\")","1,0","19%","",""],["OS_ProductPackage","Product Package",1,"","CO(\"\",\"REQ;1_BASIC!OS!TRV;2_Messaging!OS!OEM;3_Extended!OS!EAC;4_Enhanced!OS!APO;5_Analog Port!OS!CCA;6_Contact Center Agent\",\"\",\";\",\"\",\"\")","2,0","19%","",""],["sn","Last name",1,"","LI(\"\")","3,0","12%","",""],["givenName","First name",1,"","LI(\"\")","4,0","12%","",""],["OS_SiteCode","Site Code",1,"","LI(\"\")","5,0","19%","",""]],[["917845678923","Backup","OEM","917845678923","","CNdd_RD_91784567","","cn=917845678923,cn=Subscribers,cn=np_CNdd_RnD_WangJing,cn=IPC_APAC_1_01,cn=DN,cn=Resources,cn=Users,cn=OS"]],
Output should be
cn=917845678923,cn=Subscribers,cn=np_CNdd_RnD_WangJing,cn=IPC_APAC_1_01,cn=DN,cn=Resources,cn=Users,cn=OS
Example 2
[["cn","Phone",1,"","LI(\"\")","0","19%","",""],["OS_DisplayName","Display Name",1,"","LI(\"\")","1,0","19%","",""],["OS_ProductPackage","Product Package",1,"","CO(\"\",\"REQ;1_BASIC!OS!TRV;2_Messaging!OS!OEM;3_Extended!OS!EAC;4_Enhanced!OS!APO;5_Analog Port!OS!CCA;6_Contact Center Agent\",\"\",\";\",\"\",\"\")","2,0","19%","",""],["sn","Last name",1,"","LI(\"\")","3,0","12%","",""],["givenName","First name",1,"","LI(\"\")","4,0","12%","",""],["OS_SiteCode","Site Code",1,"","LI(\"\")","5,0","19%","",""]],[["868694755000","Yaeng Danning","EAC","Yaeng","Dainning","CNdd_DT_86869475","","cn=868694755000,cn=Subscribers,cn=np_CNdd_DN,cn=IPC_APAC_1_01,cn=DN,cn=Resources,cn=Users,cn=OS"]],
Output should be
cn=868694755000,cn=Subscribers,cn=np_CNdd_DN,cn=IPC_APAC_1_01,cn=DN,cn=Resources,cn=Users,cn=OS
Can someone help me with this?
I tried the code below, but I am not able to use a constant field number (e[8]) because the field number changes.
e = line3.split('","","')
print "e"
print e
e = e[8].replace('"]],','').replace('","','').strip()
print "e:" ,e
You could flatten the list and then search through it.
myList = (['one', 'two', ['cn=blahblah', 4, [5],['hi']], [6, [[[7, 'hello']]]]])
def flatten(container):
    for i in container:
        if isinstance(i, (list,tuple)):
            for j in flatten(i):
                yield j
        else:
            yield i
flattenedList = list(flatten(myList))
for x in flattenedList:
    if str(x).startswith('cn='):
        print(x)
If you are guaranteed that the cn field is the very last one, you could do something like:
cnFields = array[-1][-1]
and then parse it as you see fit.
Otherwise, you'll need to iterate through the 2D array until you find a string that starts with cn=.
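Both suggestions assume the line has already been turned into nested lists. One way to get there, sketched below for Python 3, is to treat the line as JSON. This rests on an assumption: that wrapping the line in square brackets, after dropping the trailing comma, yields valid JSON, which appears to hold for the two examples in the question. The function name `find_cn_fields` and the shortened sample line are made up for the illustration.

```python
import json


def find_cn_fields(line):
    """Return every string in the parsed line that starts with 'cn='."""
    # Assumption: the line becomes valid JSON once the trailing comma is
    # removed and the whole thing is wrapped in one more pair of brackets.
    data = json.loads("[" + line.strip().rstrip(",") + "]")

    def walk(item):
        # Recurse through nested lists and yield the matching strings.
        if isinstance(item, list):
            for child in item:
                yield from walk(child)
        elif isinstance(item, str) and item.startswith("cn="):
            yield item

    return list(walk(data))


# Shortened, hypothetical line in the same shape as the examples.
line = r'[["cn","Phone",1,"","LI(\"\")","0"]],[["917845678923","Backup","cn=917845678923,cn=Subscribers,cn=OS"]],'
print(find_cn_fields(line))
```

Searching by the "cn=" prefix instead of a fixed position means the result does not depend on how many fields come before it.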

2d list not working

I am trying to create a 2D list, and I keep getting the same error: "TypeError: list indices must be integers, not tuple". I do not understand why, or how to use a 2D list correctly.
Total = 0
server = xmlrpclib.Server(url);
mainview = server.download_list("", "main")
info = [[]]
info[0,0] = hostname
info[0,1] = time
info[0,2] = complete
info[0,3] = Errors
for t in mainview:
    Total += 1
    print server.d.get_hash(t)
    info[Total, 0] = server.d.get_hash(t)
    info[Total, 1] = server.d.get_name(t)
    info[Total, 2] = server.d.complete(t)
    info[Total, 3] = server.d.message(t)
    if server.d.complete(t) == 1:
        Complete += 1
    else:
        Incomplete += 1
    if (str(server.d.message(t)).__len__() >= 3):
        Error += 1
info[0,2] = Complete
info[0,3] = Error
everything works, except for trying to deal with info.
Your mistake is in accessing the 2D-list, modify:
info[0,0] = hostname
info[0,1] = time
info[0,2] = complete
info[0,3] = Errors
to:
info[0].append(hostname)
info[0].append(time)
info[0].append(complete)
info[0].append(Errors)
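The same goes for info[Total, 0] and so on, except that row number Total does not exist yet, so a new row has to be appended to info before it can be filled.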
The way you created info, it is a list containing only one element, namely an empty list. When working with lists, you have to address the nested items like
info[0][0] = hostname
For initialization, you have to create a list of lists by e.g.
# create list of lists of 0, size is 10x10
info = [[0]*10 for i in range(10)]
When using numpy arrays, you can address the elements as you did.
One advantage of "lists of lists" is that the entries of the "2D list" do not all need to have the same data type!
info = [[] for i in range(4)] # create 4 empty lists inside a list
info[0].append(hostname)
info[0].append(time)
info[0].append(complete)
info[0].append(Errors)
You need to create the 2d array first.
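Putting the answers together, a table like the one in the question can be grown one row at a time. The sketch below uses made-up values in place of the XML-RPC calls (`hostname`, `time_stamp` and `downloads` are all hypothetical) and keeps the question's layout, with row 0 holding the summary counters.

```python
# Hypothetical values standing in for the XML-RPC results.
hostname, time_stamp = "server01", "12:00"
downloads = [
    ("hash1", "file-a", 1, ""),
    ("hash2", "file-b", 0, "Tracker error"),
]

info = [[hostname, time_stamp, 0, 0]]   # row 0: summary row

for file_hash, name, complete, message in downloads:
    # Append a whole new row; assigning to info[n] would fail because
    # row n does not exist yet.
    info.append([file_hash, name, complete, message])
    if complete == 1:
        info[0][2] += 1                 # completed count
    if len(message) >= 3:
        info[0][3] += 1                 # error count

print(info)
print(info[2][1])                       # row 2, column 1
```

Each cell is addressed with two separate index operations, info[row][column], whereas info[row, column] passes a tuple as the index, which is what triggers the TypeError in the question.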
