list index out of range, python

I get an "index out of range" error when I try to split a big list into a list of smaller lists.
I have no idea why this is happening.
The end result of this code should be a list of lists, so that I can later call, for example, val[5] and get 10 values.
I'm able to print val if the print statement is inside the for loop, and the code works as it should. But if I move the print statement outside the for loop, I get the index out of range error.
import sys
import numpy as np
from numpy import array, random, dot
def vectorizeImages(filename):
    file_object = open(filename)
    lines = file_object.read().split()
    # lists to save all values that are digits
    strings = []
    values = []
    images = []
    val = []
    test = []
    # loop that checks if the position in the list contains a digit;
    # if it does, save the digit to the strings list
    for i in lines:
        if i.isdigit():
            strings.append(i)
    # converting all the strings to ints
    strings = list(map(int, strings))
    # copying every value into the values list
    for i in strings:
        a = i
        values.append(a)
    # splits the large list into smaller lists and adds a 1
    for i in range(len(values)):
        a = []
        for j in range(400):
            a.append(values[i*400+j]) # ERROR: list index out of range
        a.append(1)
        val.append(a)

Your error is here: a.append(values[i*400+j]).
When i = 0, you populate a and end up with 400 elements; the full inner loop succeeds as long as values has at least 400 elements.
But because the outer loop runs len(values) times rather than len(values) // 400 times, at some point you ask for more elements than values contains: the biggest index you request is len(values) * 400 - 1, which is obviously greater than your list size.

This is what I did to solve my problem (it also improved my Python skills):
def readImages(images):
    lines = open(images).read().split()
    images = []
    values = [float(i)/32.0 for i in lines if i.isdigit()]
    for i in range(len(values) // 400):  # integer division; a plain / produces a float and fails on Python 3
        a = [values[i * 400 + j] for j in range(400)]
        images.append(a)
    return np.array(images)
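As a footnote of mine (not part of the original answer): when the flat list is this regular, NumPy can do the chunking in a single reshape call, assuming you drop any incomplete trailing chunk first:

```python
import numpy as np

values = [float(i) / 32.0 for i in range(800)]  # stand-in data: 2 full chunks of 400
usable = len(values) // 400 * 400               # drop any incomplete trailing chunk
images = np.array(values[:usable]).reshape(-1, 400)
print(images.shape)  # (2, 400)
```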


How do I end a while loop with a for loop in it?

I'm trying to create a sequence of jobs and put them in an array.
The code works if I run the lines separately. The one problem is that the while loop does not stop when count equals amountofmachines; it gives the error:
IndexError: list assignment index out of range
I'm a bit new to Python and used to Matlab. How can I end this while loop and make the code resume at the line a.sort()?
import random
import numpy as np
from random import randint

MachineNumber = 6 # number of machines, imported from AnyLogic
JobNumber = 4 # number of job sequences
JobSeqList = np.zeros((JobNumber,MachineNumber), dtype=np.int64)
amountofmachines = randint(1, MachineNumber) # dictates how many machines the order goes through
a = [0]*amountofmachines # initialize list for the machine sequence
count = 0 # loop counter
element = [n for n in range(1, MachineNumber+1)]
while count <= amountofmachines:
    a[count] = random.choice(element)
    element.remove(a[count])
    count = count + 1
a.sort() # sorts the randomized sequence
A = np.asarray(a) # make an array of the list
A = np.pad(A, (0,MachineNumber-len(a)), 'constant') # pad the sequence with zeros at the end
# add the sequence to the array of all sequences
JobSeqList[0,:] = A[:]
I have tested your code and found the answer!
Matlab indexes start at 1, so the first item in a list is at index 1.
However, Python indexes start at 0, so the first item in a list is at index 0.
Change this line:
while count <= amountofmachines:
to:
while count < amountofmachines:
Updated Code:
import random
import numpy as np
from random import randint

MachineNumber = 6 # number of machines, imported from AnyLogic
JobNumber = 4 # number of job sequences
JobSeqList = np.zeros((JobNumber,MachineNumber), dtype=np.int64)
amountofmachines = randint(1, MachineNumber) # dictates how many machines the order goes through
a = [0]*amountofmachines # initialize list for the machine sequence
count = 0 # loop counter
element = [n for n in range(1, MachineNumber+1)]
while count < amountofmachines:
    a[count] = random.choice(element)
    element.remove(a[count])
    count = count + 1
a.sort() # sorts the randomized sequence
A = np.asarray(a) # make an array of the list
A = np.pad(A, (0,MachineNumber-len(a)), 'constant') # pad the sequence with zeros at the end
# add the sequence to the array of all sequences
JobSeqList[0,:] = A[:]
The problem with your while loop using < vs. <= has already been answered, but I'd like to go a bit further and suggest that building a list this way (with a counter you increment or decrement manually) is almost never done in Python in the first place, in the hope that some more "Pythonic" tools will help you avoid similar stumbling blocks as you get used to the language.
Python has really great tools for iterating over and building data structures that eliminate a lot of opportunities for minor errors like this, by taking all the busy work off your shoulders.
All of this code:
a = [0]*amountofmachines #initialize array of machines sequence
count = 0 #initialize array list of machines
element = [n for n in range(1, MachineNumber+1)]
while count < amountofmachines:
    a[count] = random.choice(element)
    element.remove(a[count])
    count = count + 1
a.sort() #sorts the randomized sequence
amounts to "build a sorted array of amountofmachines unique numbers taken from range(1, MachineNumber+1)", which can be more simply expressed using random.sample and sorted:
a = sorted(random.sample(range(1, MachineNumber + 1), amountofmachines))
Note that a = sorted(a) is the same as a.sort() -- sorted does a sort and returns the result as a list, whereas sort does an in-place sort on an existing list. In the line of code above, random.sample returns a list of random elements taken from the range, and sorted returns a sorted version of that list, which is then assigned to a.
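A tiny illustration of that difference (my own example, not from the original answer):

```python
nums = [3, 1, 2]
print(sorted(nums))  # [1, 2, 3] -- a new sorted list; the original is untouched
print(nums)          # [3, 1, 2]
print(nums.sort())   # None -- sort() works in place and returns nothing
print(nums)          # [1, 2, 3]
```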
If random.sample didn't exist, you could use random.shuffle and a list slice. Think of this as shuffling a deck of cards (element) and then taking amountofmachines cards off the top before re-sorting them:
element = [n for n in range(1, MachineNumber+1)]
random.shuffle(element)
a = sorted(element[:amountofmachines])
If neither of those existed and you had to use random.choice to pick elements one by one, there are still easier ways to build a list through iteration; there's no need to statically pre-allocate the list, and there's no need to track your iteration with a counter you manage yourself, because for does that for you:
a = []
element = [n for n in range(1, MachineNumber+1)]
for i in range(amountofmachines):
    a.append(random.choice(element))
    element.remove(a[i])
a.sort()
To make it simpler yet, it's not necessary to even have the for loop keep track of i for you, because you can access the last item in a list with [-1]:
a = []
element = [n for n in range(1, MachineNumber+1)]
for _ in range(amountofmachines):
    a.append(random.choice(element))
    element.remove(a[-1])
a.sort()
and to make it simpler yet, you can use pop() instead of remove():
a = []
element = [n for n in range(1, MachineNumber+1)]
for _ in range(amountofmachines):
    a.append(element.pop(random.choice(range(len(element)))))
a.sort()
which could also be expressed as a list comprehension:
element = [n for n in range(1, MachineNumber+1)]
a = [
    element.pop(random.choice(range(len(element))))
    for _ in range(amountofmachines)
]
a.sort()
or as a generator expression passed as an argument to sorted:
element = [n for n in range(1, MachineNumber+1)]
a = sorted(
    element.pop(random.choice(range(len(element))))
    for _ in range(amountofmachines)
)
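All of these variants produce the same kind of result; here is a quick self-contained check of the random.sample version (the assertions are my addition, not from the original answer):

```python
import random

MachineNumber = 6
amountofmachines = 4

a = sorted(random.sample(range(1, MachineNumber + 1), amountofmachines))
print(a)  # e.g. [1, 3, 4, 6]
assert len(a) == len(set(a)) == amountofmachines  # distinct values
assert all(1 <= x <= MachineNumber for x in a)    # within machine range
assert a == sorted(a)                             # ascending order
```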

How to find the averages of an argument which consists of nested arrays in python

This is the argument I would be passing to my logic:
var= '--a1=[[1,2,3],[]] --a2=4'
I need to find the average of these two arrays, as mentioned below:
"1 4"
because [1,2,3] averages to 2, [] averages to 0, and the average of [2,0] is 1.
I have tried the code given below, but it's of no use:
var="--a1=[1,2,3] --a2=4.0"
args=var.split(' ')
args=[s[s.find('=') + 1:] for s in args]
print (args)
for elem in args:
    for i in elem:
        print (i)
Can someone please help me?
Here is one way to do it:
(assuming you want the average, not the item located in the middle; if the middle element is what you need, you should be able to adapt the following solution yourself)
var="--a1=[1,2,3] --a2=4.0"
args=var.split(' ')
args=[s[s.find('=') + 1:] for s in args]
for elem in args:
    # remove everything but digits and commas (we don't care about nested arrays here)
    # "[[1,2,3],[]]" --> "1,2,3,"
    stripped = "".join([c for c in elem if c.isdigit() or c == ','])
    # now split by commas and convert each number from string to int (ignore empty strings with `if d`)
    # "1,2,3," --> [1,2,3]
    numbers = [int(d) for d in stripped.split(",") if d]
    # now just compute the mean and print it
    avg = sum(numbers) / len(numbers)
    print(avg)
I've utilized the built-in eval() function to solve your problem. I've tried my best to explain the code in the comments. Take a look at the code below:
def average(array): # returns the average of the values in an array
    n = 0
    if array == []:
        return 0
    else:
        for i in array:
            n += i
        return n/len(array)

var= '--a1=[[1,2,3],[]] --a2=4'
start = var.find('[') # find the first "[" in the string
end = len(var) - var[::-1].find(']') # reverse the string to find the last "]"
a1 = eval(var[start:end]) # converts the string representation of a list into an actual list
# for example "[1,2,3]" will be converted into [1,2,3] by the eval function
if var.count('[') == 1: # the code below requires a 2d array, so wrap a 1d array if one was passed
    a1 = [a1]
averages = [] # to store the averages of all of the lists
for array in a1:
    averages.append(average(array)) # find the average of each array in the string
print(averages)
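One caveat of mine (not raised in the original answer): eval() will happily execute arbitrary Python, so if the argument string can come from an untrusted source, ast.literal_eval is a safer stand-in that only parses literals. A sketch of the same extraction with it:

```python
import ast

var = '--a1=[[1,2,3],[]] --a2=4'
start = var.find('[')                   # first "[" in the string
end = len(var) - var[::-1].find(']')    # one past the last "]"
a1 = ast.literal_eval(var[start:end])   # [[1, 2, 3], []] -- literals only, no code execution

def average(array):
    return sum(array) / len(array) if array else 0

averages = [average(arr) for arr in a1]
print(averages)  # [2.0, 0]
```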

Get indexes for Subsample of list of lists

I have several lists of data in python:
a = [2,45,1,3]
b = [4,6,3,6,7,1,37,48,19]
c = [45,122]
total = [a,b,c]
I want to get n random indexes from them:
n = 7
# some code
result = [[1,3], [2,6,8], [0,1]] # or
result = [[0], [0,2,6,8], [0,1]] # or
result = [[0,1], [0,2,3,6,8], []] # or any other
The idea: take any elements (more precisely, the indexes of those elements) at random from any of the arrays, but the total count of them must be n.
So my idea - generate random flat indexes:
n = 7
total_len = sum(len(el) for el in total)
inds = random.sample(range(total_len), n)
But how do I then map these flat indexes back to the individual lists?
I thought about np.cumsum() and shifting the indices afterwards, but can't find an elegant solution...
P.S.
Actually, I need this for loading data from several csv files using the skiprows option. So my idea: get indexes for every file, which lets me load only the necessary rows from each file.
So my real task:
I have several csv files of different lengths and need to get n random rows from them in total.
My idea:
lengths = my_func_to_get_lengths_for_every_csv(paths) # list of lengths
# generate a random subsample of indexes
skip = ...
for ind, fil in enumerate(files):
    pd.read_csv(fil, skiprows=skip[ind])
You could flatten the list first and then sample positions from it:
total_flat = [item for sublist in total for item in sublist]
inds = random.sample(range(len(total_flat)), k=n)
Is this what you mean?
relative_inds = []
min_bound = 0
for lst in total:
    relative_inds.append([i - min_bound for i in inds if min_bound <= i < min_bound + len(lst)])
    min_bound += len(lst)
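Putting the sampling and the mapping together in one runnable sketch (the seed and the assertions are my additions, for reproducibility):

```python
import random

random.seed(0)
a = [2, 45, 1, 3]
b = [4, 6, 3, 6, 7, 1, 37, 48, 19]
c = [45, 122]
total = [a, b, c]
n = 7

total_len = sum(len(el) for el in total)
inds = random.sample(range(total_len), n)  # n distinct flat positions

relative_inds = []
min_bound = 0
for lst in total:
    relative_inds.append([i - min_bound for i in inds
                          if min_bound <= i < min_bound + len(lst)])
    min_bound += len(lst)

print(relative_inds)
assert sum(len(r) for r in relative_inds) == n            # exactly n indexes in total
assert all(0 <= i < len(lst)                              # every index is valid for its list
           for lst, r in zip(total, relative_inds) for i in r)
```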

Improve runtime of python code with indexing and string concatenation

I tried hard to improve the runtime of the following code snippet, which turned out to be the CPU-bottleneck in an asyncio-client package that I'm developing:
data = [''] * n
for i, ix in enumerate(indices):
    data[ix] = elements[i]
s = '\t'.join(data)
return s
What I do is basically very simple:
elements is a list of str (each <= 7 characters) that I eventually write at specific positions into a tab-separated file.
indices is a list of int giving the positions of each of the elements in the file
If there is no element at a certain position, an empty string is inserted
I finally write the string into a text file using aiofiles.
So far, I tried to use a generator to create the data on the fly, as well as to use numpy for faster indexing, but with no success. Any idea how to make this code run faster would be great. Here is a reproducible example with timing:
import numpy as np
import timeit

n = 1_000_000 # total number of items
k = 500_000 # number of elements to insert
elements = ['ade/gua'] * k # elements to insert, <= 7 unicode characters
indices = list(range(0, n, 2)) # indices where to insert, sorted
assert len(elements) == len(indices)

# This is where I started
def baseline():
    data = [''] * n
    for i, ix in enumerate(indices):
        data[ix] = elements[i]
    s = '\t'.join(data)
    return s

# Generate values on the fly
def generator():
    def f():
        it = iter(enumerate(indices))
        i, ix = next(it)
        for j in range(n):
            if j == ix:
                yield elements[i]
                try:
                    i, ix = next(it)
                except StopIteration:
                    pass
            else:
                yield ''
    s = '\t'.join(f()) # iterating through a generator seems too costly
    return s

# Harness numpy
indices_np = np.array(indices) # indices could also be a numpy array
def numpy():
    data = np.full(n, '', dtype='<U7')
    data[indices_np] = elements # this is faster with numpy
    s = '\t'.join(data) # much slower; array2string or savetxt does not help
    return s

assert baseline() == generator() == numpy()
timeit.timeit(baseline, number=10) # 0.8463204780127853
timeit.timeit(generator, number=10) # 2.048296730965376 -> great job
timeit.timeit(numpy, number=10) # 4.486689139157534 -> life sucks
Edit 1
To address some of the points raised in the comments:
I write the string using aiofiles.open(filename, mode='w') as file and file.write()
Indices can generally not be expressed as a range
Indices can be assumed to always be sorted, at no extra cost
ASCII characters are sufficient
Edit 2
Based on the answer of Mad Physicist, I tried the following code, with no success.
def buffer_plumbing():
    m = len(elements) # total number of data points to insert
    k = 7 # each element is 7 bytes long, ASCII only
    total_bytes = n - 1 + m * 7 # total number of bytes for the buffer
    # find the number of preceding gaps for each element
    gap = np.empty_like(indices_np)
    gap[0] = indices_np[0] # that many gaps at the beginning
    np.subtract(indices_np[1:], indices_np[:-1], out=gap[1:])
    gap[1:] -= 1 # subtract one to get the gaps (except for the first)
    # pre-allocate a large enough byte buffer
    s = np.full(total_bytes, '\t', dtype='S1')
    # write each element into the buffer
    start = 0
    for i, (g, e) in enumerate(zip(gap, elements)):
        start += g
        s[start: start + k].view(dtype=('S', k))[:] = e
        start += k + 1
    return s.tostring().decode('utf-8')

timeit.timeit(buffer_plumbing, number=10) # 26.82
You can pre-sort your data, after converting it to a pair of numpy arrays. This will allow you to manipulate a single pre-existing buffer rather than copying strings over and over as you reallocate them. The difference between my suggestion and your attempt is that we will use ndarray.tobytes (or ndarray.tostring) with the assumption that you only have ASCII characters. In fact, you can completely bypass the copy operation involved in converting into a bytes object by using ndarray.tofile directly.
If you have elements in-hand, you know that the total length of your line will be the combined length of the elements plus n-1 tab separators. The start of an element in the full string is therefore its index (the number of tabs that precede it) plus the cumulative length of all the elements that come before it. Here is a simple implementation of a single-buffer fill using mostly Python loops:
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
elements = np.array(elements, dtype='S7')
order = np.argsort(indices)
elements = elements[order]
indices = indices[order]
lengths = lengths[order]
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += indices  # start = preceding tabs (the index) + preceding element bytes
s = np.full(lengths.sum() + n - 1, '\t', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
    s[i:i + l].view(dtype=('S', l))[:] = e
There are lots of possible optimizations to play with here, such as the possibility of allocating s using np.empty and only filling in the required elements with tabs. This will be left as an exercise for the reader.
Another possibility is to avoid converting elements to a numpy array entirely (it probably just wastes space and time). You can then rewrite the for loop as
for i, l, o in zip(cumulative, lengths, order):
    s[i:i + l].view(dtype=('S', l))[:] = elements[o]
You can dump the result into a bytes object with
s = s.tobytes()
OR
s = s.tostring()
You can write the result as-is to a file opened for binary writing. In fact, if you don't need a copy of the buffer in the form of a bytes, you can just write to the file directly:
s.tofile(f)
That will save you some memory and processing time.
In the same vein, you may be better off just writing to a file directly, piece by piece. This saves you not only the need to allocate the full buffer, but also the cumulative lengths. In fact, the only thing you need this way is the diff of successive indices to tell you how many tabs to insert:
indices = np.asanyarray(indices)
order = np.argsort(indices)
indices = indices[order]
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
for t, o in zip(tabs, order):
    f.write('\t' * t)
    f.write(elements[o])
f.write('\t' * (n - indices[-1] - 1))
This second approach has two major advantages besides the reduced amount of calculation. The first is that it works with unicode characters rather than ASCII only. The second is that it does not allocate any buffers besides strings of tabs, which should be extremely fast.
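To convince myself that the streaming idea reproduces the baseline exactly, here is a small check with io.StringIO standing in for the file (this harness is mine, not from the answer; note the tab run before each element is the plain difference of successive indices):

```python
import io
import numpy as np

n = 10
elements = ['ab', 'cde', 'f']
indices = np.array([1, 4, 7])  # already sorted

# baseline: fill a list and join with tabs
data = [''] * n
for i, ix in enumerate(indices):
    data[ix] = elements[i]
expected = '\t'.join(data)

# streaming: write runs of tabs, then each element
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])

f = io.StringIO()
for t, e in zip(tabs, elements):
    f.write('\t' * t)
    f.write(e)
f.write('\t' * (n - indices[-1] - 1))

print(f.getvalue() == expected)  # True
```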
In both cases, having elements and indices sorted into ascending order by index will speed things up dramatically. The first case reduces to
lengths = np.array([len(e) for e in elements])
indices = np.asanyarray(indices)
cumulative = np.empty_like(lengths)
cumulative[0] = 0
np.cumsum(lengths[:-1], out=cumulative[1:])
cumulative += indices
s = np.full(lengths.sum() + n - 1, '\t', dtype='S1')
for i, l, e in zip(cumulative, lengths, elements):
    s[i:i + l].view(dtype=('S', l))[:] = e
And the second becomes just
indices = np.asanyarray(indices)
tabs = np.empty_like(indices)
tabs[0] = indices[0]
np.subtract(indices[1:], indices[:-1], out=tabs[1:])
for t, e in zip(tabs, elements):
    f.write('\t' * t)
    f.write(e)
f.write('\t' * (n - indices[-1] - 1))

Issues with printing in python a list of arrays

I'm new to the magical Python world.
I have a problem: I save data from a file into a list of arrays (13 arrays of 130 elements each), but when I try to print the values, only the same values, with brackets, are printed.
I defined the list and the array in this way:
List = []
v = np.ndarray(shape=(130,1),dtype=float)
After that I fill my data structure from the file:
f = open(filename, 'r').readlines()
k = 0
for i in range(0,13):
    for j in range(0,130):
        v[j] = float(f[k])
        k += 1
    List.append(v)
In the file I have one float value per line, and the total length is 13*130.
I don't even know if I fill my data structure correctly; what I do know is that the function I would like to use expects a list of arrays instead of a matrix.
I tried different ways to check whether I saved the data correctly.
Do you have any suggestion?
You need to recreate v in every iteration of the outer loop:
for i in range(0,13):
    v = np.ndarray(shape=(130,1),dtype=float)
    for j in range(0,130):
        v[j] = float(f[k])
        k += 1
    List.append(v)
Otherwise, you're just updating the same array over and over again (list.append only appends a reference to the list; it does not copy the object in question).
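The reference pitfall in miniature (my own example): every append stores the same object, so a later mutation shows up in every entry:

```python
buf = [0]
refs = []
for i in range(3):
    buf[0] = i
    refs.append(buf)   # stores a reference to buf, not a copy
print(refs)            # [[2], [2], [2]] -- all three entries are the same object

copies = []
for i in range(3):
    copies.append([i]) # a fresh list each iteration
print(copies)          # [[0], [1], [2]]
```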
If your only problem is printing without the brackets, then I think what you want (see my comment on the post) is the following:
for row in List:
    for item in row:
        print(item, end=' ')
    print() # print a newline
You can try:
for i in List:
    print(i)
If you do not want the brackets to be displayed:
for i in List:
    print(' '.join(map(str,i)))
But if you want constant row widths, create a function that returns a fixed-length string for each float:
def auxString(x, length):
    s = str(x)
    s += (length-len(s))*' '
    return s

string = lambda length: lambda x: auxString(x, length)
for i in List:
    print(' '.join(map(string(5),i))) # if the maximum length of your float strings is 5
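A small aside of mine (not in the original answer): the padding helper duplicates what the built-in str.ljust already does, so the same output can be had with:

```python
List = [[1.5, 22.25, 3.0]]  # hypothetical sample data
for row in List:
    print(' '.join(str(x).ljust(5) for x in row))
```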
