Lists in Python - python

I noticed that you can multiply a list by a scalar, but the behavior is weird.
If I multiply:
[2,3,4] * 3
I get:
[2,3,4,2,3,4,2,3,4]
I understand the result, but what is it good for? Are there any other weird operations like that?

The main purpose of this operator is initialisation. For example, if you want to initialize a list with 20 equal numbers, you can do it using a for loop:
arr = []
for i in range(20):
    arr.append(3)
An alternative is to use this operator:
arr = [3] * 20
You can find more list operations, both weird and normal, here: http://www.discoversdk.com/knowledge-base/using-lists-in-python

The operation is useful for creating lists initialized with some value.
For example, [5]*1000 means "create a list of length 1000 initialized with 5".
If you want to multiply each element by 3 instead, use
list(map(lambda x: 3*x, arr))
(In Python 3, map() returns an iterator, hence the list() call.)
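As a rough illustration of the difference between repetition and element-wise scaling, here is a minimal sketch. The variable names are made up for the example. It also shows one further "weird" behavior that is easy to trip over: repetition copies references, so repeating a nested list gives several references to the same inner list.

```python
# Repetition: the list is concatenated with itself n times.
repeated = [2, 3, 4] * 3

# Element-wise scaling needs an explicit comprehension (or map).
scaled = [3 * x for x in [2, 3, 4]]

# Caveat: repetition copies references, not objects.
rows_shared = [[0] * 2] * 3                      # three references to ONE inner list
rows_shared[0][0] = 1                            # the change shows up in every "row"

rows_independent = [[0] * 2 for _ in range(3)]   # three separate inner lists
rows_independent[0][0] = 1                       # only the first row changes

print(repeated)
print(scaled)
print(rows_shared)
print(rows_independent)
```

For flat lists of immutable values such as numbers, the shared references do not matter, which is why [3] * 20 is safe.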

Related

Assigning loop results into an array using Python

I'm copying some code over into Python from MATLAB and realised certain parts don't transfer as easily. I want to write a loop in Python where the size of the array increases with every iteration, i.e., a new value is assigned to a new index of the array. For the sake of example, consider the vector in MATLAB as x = [1 2 3 4 5 6]. The resulting loop would be:
x = [];
for j = 1:6
    x(j,1) = j;
end
Now, in Python I have
r_x = []
for i in range(1, 6):
    x = i
    r_x.append(x)
Surely there is a more efficient way to assign values to an array when iterating in Python? Why is it not possible to do x[i,1] = i (error: list indices must be integers or slices, not tuple) or r_x.append(x) = x (error: 'int' object has no attribute 'append')?
In Python you can't create list elements by assigning to an index. You grow it incrementally using the append() method.
r_x = []
for i in range(1, 7):
    r_x.append(i)
print(r_x)
Python ranges don't include the ending number, so if you want 1 to 6 you have to use range(1, 7).
However, Python also has shortcuts. For example, list comprehensions:
r_x = [i for i in range(1, 7)]
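To put the options side by side, here is a small sketch. The names r_x_comp, r_x_direct and prealloc are introduced only for this example. The last part shows that assigning by index does work once the positions already exist.

```python
# Three equivalent ways to build [1, 2, 3, 4, 5, 6].

# 1. Grow the list with append().
r_x = []
for i in range(1, 7):      # range excludes the end value, so stop at 7
    r_x.append(i)

# 2. List comprehension.
r_x_comp = [i for i in range(1, 7)]

# 3. Convert the range directly.
r_x_direct = list(range(1, 7))

# Assigning to an index only works for positions that already exist,
# so preallocate first if you want MATLAB-style indexed assignment.
prealloc = [0] * 6
for i in range(6):         # valid indices are 0..5
    prealloc[i] = i + 1

print(r_x, r_x_comp, r_x_direct, prealloc)
```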
Your two errors are explained by basic properties of Python lists and ints.
Make a list:
In [31]: x = [1,2,3]
In [32]: x[1,0] = 4
Traceback (most recent call last):
Input In [32] in <cell line: 1>
x[1,0] = 4
TypeError: list indices must be integers or slices, not tuple
Here 1,0 is a tuple (same as (1,0)). Lists do not allow such an index. numpy arrays do, but that's another story.
Lists can be indexed with a single integer that is in the valid range. With too large an index, you'll get an IndexError (list index out of range). Lists don't grow to accommodate the index.
In [33]: x[1]
Out[33]: 2
In [34]: x[1] = 4
In [35]: x
Out[35]: [1, 4, 3]
List append is relatively efficient. The underlying data structure has "head room" that permits such growth.
In [36]: x.append(4)
In [37]: x
Out[37]: [1, 4, 3, 4]
Your second error must have been the result of assigning an int to the variable:
In [40]: y = 3
In [41]: y.append(3)
Traceback (most recent call last):
Input In [41] in <cell line: 1>
y.append(3)
AttributeError: 'int' object has no attribute 'append'
Only a list class object has an append method.
The basic data structure in MATLAB is a 2d matrix. It has a known size. It's been years since I worked extensively in MATLAB, but at the time, we tried to avoid iterations, preferring to work with 'whole-matrix' approaches. I don't recall being able to grow a matrix by simply indexing new values. Even if we could, it wouldn't have been efficient. Newer MATLAB does a lot of JIT compiling, which lets you get away with a lot of things that would have been inefficient earlier. Still, I suspect there's a time penalty to such practices.
I wonder, for example, if this is faster:
x = zeros(6,1);
for j = 1:6
    x(j,1) = j;
end
Or better yet
x = [1:6].'
numpy arrays are closer to MATLAB matrices in character. But there are enough differences to catch wayward MATLAB users off guard.
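For comparison, a rough numpy counterpart of the two MATLAB snippets might look like the sketch below. It assumes numpy is installed, and the name x_vec is made up for the example. Note the 0-based indices and that a tuple index such as x[j, 0] is valid for numpy arrays, unlike for lists.

```python
import numpy as np

# Preallocate a 6x1 column and fill it by index, like x = zeros(6,1) in MATLAB.
x = np.zeros((6, 1), dtype=int)
for j in range(6):          # Python indices start at 0
    x[j, 0] = j + 1         # tuple indexing works on numpy arrays

# Whole-array version, comparable to x = [1:6].' in MATLAB.
x_vec = np.arange(1, 7).reshape(-1, 1)

print(x.shape, x_vec.shape)
```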

Most Efficient Way To Separate Out Data from List within a List

I have a function that outputs 2 arrays, let's call them X_i and Y_i, which are two N x 1 arrays, where N is the number of points. By using multiprocessing's pool.apply_async, I was able to parallelize this function, which gave me my results in a HUGE list. The result is a list of M values, where each value is a list containing X_i and Y_i. So in summary, I have a huge list of M smaller lists, each containing the two arrays X_i and Y_i.
Now I want to append all the X_i's into one array called X and all the Y_i's into one called Y. What is the most efficient way to do this? I'm looking for some sort of parallel algorithm. Order does NOT matter!
So far I have just a simple for loop that separates this massive data array:
X = np.zeros((N,1))
Y = np.zeros((N,1))
for i in range(len(results)):
    X = np.append(results[i][0].reshape(N,1),X,axis = 1)
    Y = np.append(results[i][1].reshape(N,1),Y,axis = 1)
I found this algorithm to be rather slow, so I need to speed it up! Thanks!
You should provide a simple scenario of your problem: break it down and give us a simple input and output example. It would help a lot, as all these variables and text make it a bit confusing. Maybe this can help:
You can unpack the lists, then grab the ones you need by index, append the list to your new empty X[] and append the other list you needed to Y[], at the end get the arrays out of the lists and merge those into your new N dimensional array or into a new list.
data = [[[1,2],[3,4]],[[4,5],[6,7]]]  # renamed from "list" to avoid shadowing the built-in
sub_pre = []
flat_list = []
for sublist in data:
    sub_pre.append(sublist)
    for item in sublist:
        flat_list.append(item)
print(data)
print(flat_list)
Thanks to @JonSG for the brilliant insight. This type of unpacking can be sped up using array manipulation. With most parallel-processing packages, a function that outputs multiple arrays will most likely have its results collected into a huge list. Here I have a list called results, which contains M smaller lists of two N x 1 arrays.
To unpack the main list and collect all the X_i and Y_i into their own X and Y arrays respectively, you can do this:
# np.shape(results) is (M, 2, N)
X = np.array(results)[:,0,:]
Y = np.array(results)[:,1,:]
This gave me a 100x speed increase!
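A self-contained sketch of that slicing approach is shown below. The sizes M and N and the stand-in results list are invented for illustration, and each X_i and Y_i is assumed to be a 1-D array of length N, so that np.array(results) has shape (M, 2, N).

```python
import numpy as np

M, N = 4, 3  # hypothetical sizes, for illustration only

# Stand-in for the pool output: M pairs [X_i, Y_i], each a 1-D array of length N.
results = [[np.full(N, i), np.full(N, -i)] for i in range(M)]

stacked = np.array(results)   # one conversion, shape (M, 2, N)
X = stacked[:, 0, :]          # every X_i as a row, shape (M, N)
Y = stacked[:, 1, :]          # every Y_i as a row, shape (M, N)

# Transpose if one column per result (N x M) is preferred, as in the question's loop.
X_cols = X.T

print(stacked.shape, X.shape, Y.shape, X_cols.shape)
```

The likely reason this is faster is that np.append copies the whole array on every call, while the conversion above copies the data once.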

Find a deleted number from a list

Consider two lists: a sequence of numbers from 1 to N (arr) and the same sequence shuffled but missing one number (mixed_arr). The goal is to find the number that was deleted.
Example:
arr = [1,2,3,4,5]
mixed_arr = [3,4,1,5]
The output should be 2.
If no number was deleted from the array and there is no difference between the two lists, the function has to return 0. Note that N may be 1 or less (in the latter case, the first array will be []).
Testcases:
arr = [1,2,3,4,5,6,7,8,9]
mixed_arr = [1,9,7,4,6,2,3,8]
output = 5
arr = [1,2,3,4,5,6,7,8,9]
mixed_arr = [5,7,6,9,4,8,1,2,3]
output = 0
Here's my code:
def find_deleted_number(arr, mixed_arr):
    arr.sort()
    mixed_arr.sort()
    for x in range(arr):
        for y in range(mixed_arr):
            if arr[x] != mixed_arr[y]:
                return arr[x]
            elif arr[x] == mixed_arr[y]:
                return 0
The error I get is:
Traceback:
in <module>
in find_deleted_number
TypeError: 'list' object cannot be interpreted as an integer
Why not use a set?
>>> arr = [1,2,3,4,5,6,7,8,9]
>>> mixed_arr = [1,9,7,4,6,2,3,8]
>>> list(set(arr) - set(mixed_arr))
[5]
This general solution will handle arrays with no constraints on integers, or on the size of lists (or the size of the difference).
Edit. In your (very) specific case with positive integers, and only one missing in the other array, it's much more efficient to use the solution from comments below:
>>> abs(sum(arr) - sum(mixed_arr))
5
You could simply use symmetric_difference(), a member of set:
set(arr).symmetric_difference(mixed_arr)
See also sets.
Please note, as pointed out by DeepSpace in the comments, if it is guaranteed that both lists contain integers from 1 to N, with the exception of one missing in one of those lists, the much more efficient solution is to compute the absolute value of the difference of the sum of both lists:
abs(sum(arr) - sum(mixed_arr))
The error you're getting:
You're getting the error because of this line:
for x in range(arr): # arr is a list, not an int
You surely intended to pass the length of the array:
for x in range(len(arr)):
The same goes for the inner loop.
Your problem is that you are trying to pass a list object to the range() function. But since range() only accepts integers, Python complains. You are also making the same mistake with mixed_arr.
But there are easier ways to accomplish this. In this case, you can use set()s to find the difference between two lists:
>>> set([1,2,3,4,5]) - set([3,4,1,5])
{2}
>>>
Or this could be done using a simple list comprehension:
>>> arr = [1,2,3,4,5]
>>> mixed_arr = [3,4,1,5]
>>>
>>> [el for el in arr if el not in mixed_arr]
[2]
>>>
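Putting the suggestions above together, one possible version of the function is sketched below, including the required 0 when nothing is missing. The second function name is made up for the example, and it relies on the stated guarantee that arr is 1 to N with at most one number deleted.

```python
def find_deleted_number(arr, mixed_arr):
    """Return the number present in arr but absent from mixed_arr, or 0 if none."""
    missing = set(arr) - set(mixed_arr)
    # pop() is enough here because at most one number is assumed to be missing.
    return missing.pop() if missing else 0


def find_deleted_number_by_sum(arr, mixed_arr):
    """Same result via sums; only valid under the 1..N, one-deletion guarantee."""
    return sum(arr) - sum(mixed_arr)


print(find_deleted_number([1, 2, 3, 4, 5], [3, 4, 1, 5]))
print(find_deleted_number_by_sum([1, 2, 3, 4, 5], [3, 4, 1, 5]))
```

The set version does not depend on the order of either list, so no sorting is needed.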

Python matrix indexing

I have the following code
l = len(time) #time is a 300 element list
ll = len(sample) #sample has 3 sublists each with 300 elements
w, h = ll, l
Matrix = [[0 for x in range(w)] for y in range(h)]
for n in range(0,l):
    for m in range(0,ll):
        x=sample[m]
        Matrix[m][n]= x
When I run the code to fill the matrix I get an error message saying "list index out of range". I put in a print statement to see where the error happens, and when m=0 and n=3 the index goes out of range.
From what I understand, on the fourth line of the code I initialize a 3X300 matrix, so why does it go out of range at 0X3?
You need to change Matrix[m][n]= x to Matrix[n][m]= x
The indexing of nested lists happens from the outside in. So for your code, you'll probably want:
Matrix[n][m] = x
If you prefer the other order, you can build the matrix differently (swap w and h in the list comprehensions).
Note that if you're going to be doing mathematical operations with this matrix, you may want to be using numpy arrays instead of Python lists. They're almost certainly going to be much more efficient at doing math operations than anything you can write yourself in pure Python.
Note that indexing in nested lists in Python happens from outside in, and so you'll have to change the order in which you index into your array, as follows:
Matrix[n][m] = x
For mathematical operations and matrix manipulations, numpy two-dimensional arrays are almost always a better choice. You can read more about them here.
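To make the row and column order concrete, here is a scaled-down sketch in plain Python. The data is invented (5 time points instead of 300), and storing sample[m][n] in each cell is an assumption about what the question intended.

```python
# Small stand-ins for the question's data (300 samples shrunk to 5).
time = [0, 1, 2, 3, 4]
sample = [[10, 11, 12, 13, 14],
          [20, 21, 22, 23, 24],
          [30, 31, 32, 33, 34]]

l, ll = len(time), len(sample)
w, h = ll, l

# The OUTER comprehension sets the number of rows: h rows, each with w columns.
# With h = 5 and w = 3 this is a 5x3 structure, not 3x5.
Matrix = [[0 for x in range(w)] for y in range(h)]

for n in range(l):           # n selects the row (outer list)
    for m in range(ll):      # m selects the column (inner list)
        Matrix[n][m] = sample[m][n]   # assumption: one value per cell

print(len(Matrix), len(Matrix[0]))
print(Matrix[3])
```

With the original Matrix[m][n], the inner index n reaches 3 while each inner list only has 3 elements (indices 0 to 2), which is exactly where the error appears.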

Dealing with multi-dimensional arrays when ndims not known in advance

I am working with data from netcdf files, with multi-dimensional variables, read into numpy arrays. I need to scan all values in all dimensions (axes in numpy) and alter some values. But, I don't know in advance the dimension of any given variable. At runtime I can, of course, get the ndims and shapes of the numpy array.
How can I program a loop thru all values without knowing the number of dimensions, or shapes in advance? If I knew a variable was exactly 2 dimensions, I would do
shp=myarray.shape
for i in range(shp[0]):
    for j in range(shp[1]):
        do_something(myarray[i][j])
You should look into ravel, nditer and ndindex.
# For the simple case
for value in np.nditer(a):
    do_something_with(value)
# This is similar to above
for value in a.ravel():
    do_something_with(value)
# Or if you need the index
for idx in np.ndindex(a.shape):
    a[idx] = do_something_with(a[idx])
On an unrelated note, numpy arrays are indexed a[i, j] instead of a[i][j]. In Python, a[i, j] is equivalent to indexing with a tuple, i.e. a[(i, j)].
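As a concrete instance of the np.ndindex loop, the sketch below alters values in place for arrays of any dimensionality. The function clip_negatives and its rule (replace negatives with 0) are made up for illustration and stand in for whatever alteration is actually needed.

```python
import numpy as np

def clip_negatives(a):
    """Set every negative entry of `a` to 0 in place, whatever its number of dimensions."""
    for idx in np.ndindex(a.shape):   # idx is a tuple with one entry per axis
        if a[idx] < 0:
            a[idx] = 0
    return a

a2 = clip_negatives(np.array([[1, -2], [-3, 4]]))          # 2-D input
a3 = clip_negatives(np.arange(-4, 4).reshape(2, 2, 2))     # 3-D input

print(a2)
print(a3)
```

For a simple rule like this one, a vectorized form such as a[a < 0] = 0 would usually be faster; the explicit loop is mainly useful when the per-element logic cannot be vectorized.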
You can use the flat property of numpy arrays, which returns an iterator over all values (no matter the shape).
For instance:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> for x in A.flat:
...     print(x)
1
2
3
4
5
6
You can also set the values in the same order they're returned, e.g. like this:
>>> A.flat[:] = [x // 2 if x % 2 == 0 else x for x in A.flat]
>>> A
array([[1, 1, 3],
       [2, 5, 3]])
I am not sure the order in which flat returns the elements should be relied upon. (flat iterates in row-major, C-style order, with the last index varying fastest, so the order is always the same, but be careful if your code silently depends on it.)
And this will work for any dimension.
** -- Edit -- **
To clarify what I meant by 'order not guaranteed', the order of elements returned by flat does not change, but I think it would be unwise to count on it for things like row1 = A.flat[:N], although it will work most of the time.
This might be the easiest with recursion:
import numpy
a = numpy.array(range(30)).reshape(5, 3, 2)
def recursive_do_something(array):
    if len(array.shape) == 1:
        # 1-D array: handle each element
        for obj in array:
            do_something(obj)
    else:
        # more than one axis: recurse into each sub-array
        for subarray in array:
            recursive_do_something(subarray)
recursive_do_something(a)
In case you want the indices:
import numpy
a = numpy.array(range(30)).reshape(5, 3, 2)
def do_something(x, indices):
    print(indices, x)
def recursive_do_something(array, indices=None):
    indices = indices or []
    if len(array.shape) == 1:
        # include the position along the last axis in the reported index
        for i, obj in enumerate(array):
            do_something(obj, indices + [i])
    else:
        for i, subarray in enumerate(array):
            recursive_do_something(subarray, indices + [i])
recursive_do_something(a)
Look into Python's itertools module.
Python 2: http://docs.python.org/2/library/itertools.html#itertools.product
Python 3: http://docs.python.org/3.3/library/itertools.html#itertools.product
This will allow you to do something along the lines of
from itertools import product
for idx in product(*(range(n) for n in shp)):
    do_something(myarray[idx])
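A runnable sketch of the itertools.product idea follows. The helper scan and the callback signature are assumptions made for this example. product is given one range per axis, so each idx is a full index tuple that can be used directly on a numpy array.

```python
from itertools import product

import numpy as np

def scan(myarray, do_something):
    """Call do_something(index_tuple, value) for every element, for any number of axes."""
    shp = myarray.shape
    for idx in product(*(range(n) for n in shp)):
        do_something(idx, myarray[idx])

seen = []
scan(np.arange(6).reshape(1, 2, 3),
     lambda idx, value: seen.append((idx, int(value))))

print(seen)
```

This visits the elements in the same order as np.ndindex, with the last index varying fastest.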
