Say that I have an array map[x][y] where I don't know the height or width of the array. array.length() will return the overall height * width, but what if I want to know the height and width independently? Is that possible through Python's built-in utilities?
I imagine you're trying to code some kind of game on a two-dimensional map.
Python doesn't have multidimensional arrays in the C or C++ sense (which are, themselves, just syntactic sugar around a 1D array). Instead, Python has lists, which are a strictly one-dimensional affair.
You can fake a two-dimensional array by creating a list which contains other lists. Like so:
width = 10
height = 10
map = [ [None]*width for i in range(height) ]
And you can get the width and height by:
height = len(map)
width = len(map[0])
This will only give the expected result if every sublist of map is the same length, i.e. if the map is a rectangular list of lists. Python will not enforce this restriction for you (why would it?) so you will have to enforce it yourself.
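One way to enforce it yourself is a quick check, sketched below (the variable is named grid here to avoid shadowing the built-in map; the all() test is my own, not anything Python runs for you):

```python
grid = [[None] * 10 for _ in range(10)]

# rectangularity check: every row must match the first row's length
is_rectangular = all(len(row) == len(grid[0]) for row in grid)
print(is_rectangular)  # True for the comprehension above
```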
As stated in the other answer, numpy has true N-dimensional arrays optimised for numerical math.
Sidenote: note the use of the list comprehension map = [ [None]*width for i in range(height)] as opposed to the following:
map = [ [None]*width ] * height
which has unexpected effects:
>>> a = [ [0] * 3 ] * 3 #create a 3x3 array of zeroes
>>> a
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]
>>> a[1][1] = 9 #change the number in the middle of the grid to '9'
>>> a
[[0, 9, 0], # wtf?
[0, 9, 0],
[0, 9, 0]] # wtf?
This is because the [list] * n operator for lists doesn't make n new copies of list; instead, it makes n copies of a reference to the same list. Mutating the list through any one of those references changes what you see through all of them.
array (from the array module) is a one-dimensional array type. If you want multiple dimensions, take a look at the standard Python list. See http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
Note that in the case of a list, you can get the "height" of the list (the number of "rows"), but the rows don't have to contain the same number of columns, so you can't get the width directly. You could find the shortest row, the longest row, the average row length, etc., but you'll need to loop through the list to do it.
Depending on what you're doing, you might consider numpy instead.
Disclaimer: I don't know Python at all, but figured I'd take a look :)
Looking at this page, it looks like you're after map.ndim
EDIT
Hmm, it looks like I stumbled across a library called NumPy
EDIT 2
map.ndim in your case would be 2, i.e. the number of dimensions in your array. map.shape will give you a tuple of integers describing the size of each dimension of your array, so you would use map.shape[1] for the width of your array.
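If map really is a NumPy array, these attributes look like this (the 10x20 array here is just an illustration):

```python
import numpy as np

m = np.zeros((10, 20))   # 10 rows ("height"), 20 columns ("width")
print(m.ndim)            # 2
print(m.shape)           # (10, 20)
height, width = m.shape  # unpack both dimensions at once
```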
Related
I combined the lists like this: allSpeed = np.concatenate((smallspeed, bigspeed)). I then sorted it and took the 100 smallest values. I now need to identify how many of them came from each of the original lists. Is this possible?
Yes, this is possible.
If you use argsort, you will get the indices of the sort order, e.g.
import numpy as np
a = np.array([1, 4, 0, 2])
print(np.argsort(a)) # => [2, 0, 3, 1]
Depending on whether you need the actual sorted values or just to know how many came from each array, you can work directly from the argsort result:
smallspeed = np.array([1, 3, 2, 5])
bigspeed = np.array([4, 0, 6])
all_speed = np.concatenate((smallspeed, bigspeed))
sort_order = np.argsort(all_speed)
lowest_4 = sort_order[:4] # or 100 for your case
small_count = np.count_nonzero(lowest_4 < len(smallspeed))
big_count = 4 - small_count
# or
big_count = np.count_nonzero(lowest_4 >= len(smallspeed))
Note that you will need to decide what it means if there are values that appear in both arrays and that value happens to be at the 100 value cutoff. This approach will give you an answer according to where each value came from, but that won't necessarily be meaningful. You will need to consider which sort algorithm you wish to use and which order you concatenate your arrays. For example, if you want to preferentially count "small speeds" in the lowest 100 when there are duplicates, then concatenate small + big (as you currently have), and use a stable sorting algorithm.
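For instance, a sketch of that tie handling with a stable sort (the numbers are arbitrary; the value 4 appears in both arrays and sits right at the cutoff):

```python
import numpy as np

smallspeed = np.array([1, 4])
bigspeed = np.array([4, 0])
all_speed = np.concatenate((smallspeed, bigspeed))

# kind='stable' keeps equal values in concatenation order,
# so a tie at the cutoff is attributed to smallspeed
sort_order = np.argsort(all_speed, kind='stable')
lowest_3 = sort_order[:3]
small_count = np.count_nonzero(lowest_3 < len(smallspeed))
print(small_count)  # 2: the tied value 4 counts toward smallspeed
```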
I learned in my web search that numpy.arange takes less space than Python's range function, but when I tried the code below it gave me a different result.
import sys
import numpy as np
x = range(1,10000)
print(sys.getsizeof(x)) # --> Output is 48
a = np.arange(1,10000,1,dtype=np.int8)
print(sys.getsizeof(a)) # --> OutPut is 10095
Could anyone please explain?
In PY3, range is an object that can generate a sequence of numbers; it is not the actual sequence. You may need to brush up on some basic Python reading, paying attention to things like lists and generators, and their differences.
In [359]: x = range(3)
In [360]: x
Out[360]: range(0, 3)
We have to use something like list() or a list comprehension to actually create those numbers:
In [361]: list(x)
Out[361]: [0, 1, 2]
In [362]: [i for i in x]
Out[362]: [0, 1, 2]
A range is often used in a for i in range(3): print(i) kind of loop.
arange is a numpy function that produces a numpy array:
In [363]: arr = np.arange(3)
In [364]: arr
Out[364]: array([0, 1, 2])
We can iterate on such an array, but it is slower than [362]:
In [365]: [i for i in arr]
Out[365]: [0, 1, 2]
But for doing math, the array is much better:
In [366]: arr * 10
Out[366]: array([ 0, 10, 20])
The array can also be created from the list [361] (and for compatibility with earlier Py2 usage from the range itself):
In [376]: np.array(list(x)) # np.array(x)
Out[376]: array([0, 1, 2])
But this is slower than using arange directly (that's an implementation detail).
Despite the similarity in names, these shouldn't be seen as simple alternatives. Use range in basic Python constructs such as for loop and comprehension. Use arange when you need an array.
An important innovation in Python (compared to earlier languages) is that we can iterate directly on a list. We don't have to step through indices. And if we need indices along with the values, we can use enumerate:
In [378]: alist = ['a','b','c']
In [379]: for i in range(3): print(alist[i]) # index iteration
a
b
c
In [380]: for v in alist: print(v) # iterate on list directly
a
b
c
In [381]: for i,v in enumerate(alist): print(i,v) # index and values
0 a
1 b
2 c
Thus you might not see range used that much in basic Python code.
The range type constructor creates range objects, which represent sequences of integers with a start, stop, and step in a space-efficient manner, calculating the values on the fly.
The np.arange function returns a numpy.ndarray object, which is essentially a wrapper around a primitive array. This is a fast and relatively compact representation compared to creating a Python list with list(range(N)). Range objects are more space efficient still: they take constant space, so for all practical purposes range(a) is the same size as range(b) for any integers a and b.
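A quick illustration of that constant-size property (the exact byte count depends on the Python build, so no absolute numbers here):

```python
import sys

# a range object stores only start/stop/step, never the values,
# so its reported size does not grow with the length of the sequence
print(sys.getsizeof(range(10)))
print(sys.getsizeof(range(10**9)))  # same size despite a billion "elements"
assert sys.getsizeof(range(10)) == sys.getsizeof(range(10**9))
```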
As an aside, you should take care when interpreting the results of sys.getsizeof; you must understand what it actually measures (it does not include the sizes of referenced objects, for instance). So do not naively compare the sizes of Python lists and numpy.ndarray objects, for example.
Perhaps whatever you read was referring to Python 2, where range returned a list. List objects do require more space than numpy.ndarray objects, generally.
arange stores each individual value of the array, while range stores only three values (start, stop, and step). That's the reason arange takes more space than range.
As the question is about the size, this will be the answer.
But there are many advantages of using numpy array and arange than python lists for speed, space and efficiency perspective.
This is a simple example.
Assume that I have an input tensor M. Now I have a tensor of indices into M with size 2 x 3, such as [[0, 1], [2, 2], [0, 1]], and an array of new values corresponding to the index tensor, [1, 2, 3]. I want to assign these values to M such that, when several values map to the same index, the element of M at that index (here [0, 1]) receives the minimum of them (1 in this example).
It means M[0,1] = 1 and M[2,2] = 2.
Can I do that by using some available functions in Pytorch without a loop?
It can be done without loops, but I am generally not sure whether it is such a great idea, due to significantly increased runtime.
The basic idea is relatively simple: since, when the same position is indexed several times, tensor assignment keeps the last write, it is sufficient to sort your index tuples in M in descending order of the respective values stored in the value list (let's call it v).
To do this in pytorch, let us consider the following example:
import torch as t
X = t.randn([3, 3]) # Random matrix of size 3x3
v = t.tensor([1, 2, 3])
M = t.tensor([[0, 2, 0],
[1, 2, 1]]) # accessing the elements described above
# Showcase pytorch's result with "naive" tensor assignment:
X[tuple(M)] = v # This would assign the value 3 to position (0, 1)
# To correct behavior, sort v in decreasing order.
v_desc = v.sort(descending=True)
# v now contains both the values and the indices of original position
print(v_desc)
# torch.return_types.sort(
# values=tensor([3, 2, 1]),
# indices=tensor([2, 1, 0]))
# Access M in the correct order:
M_desc = M[:, v_desc.indices]
# Finally assign correct order:
X[tuple(M_desc)] = v_desc.values
Again, this is relatively complicated, because it involves sorting the values, and "re-shuffling" of the tensors. You can surely save at least some memory if you perform operations in-place, which is something I disregarded for the sake of legibility.
As to whether this can also be achieved without sorting, I am fairly certain that the answer is "no"; tensor assignment can only handle fairly simple conditions, not the kind of inter-dependent condition your requirement involves.
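That said, if a recent PyTorch (1.12 or later) is available, Tensor.scatter_reduce_ with reduce="amin" can resolve the duplicates in a single call, at the cost of linearising the 2-D indices first. A sketch under that version assumption:

```python
import torch

X = torch.zeros(3, 3)
v = torch.tensor([1., 2., 3.])
M = torch.tensor([[0, 2, 0],
                  [1, 2, 1]])  # columns are (row, col) positions

flat = M[0] * X.shape[1] + M[1]  # linearise (row, col) -> flat index
# include_self=False: each target takes the min over the sources only,
# ignoring whatever value X held before
X.view(-1).scatter_reduce_(0, flat, v, reduce="amin", include_self=False)
print(X[0, 1].item(), X[2, 2].item())  # 1.0 2.0
```

Since view(-1) shares storage with X, the in-place scatter updates X directly.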
Suppose we start with an integer numpy array with integers between 0 and 99, i.e.
x = np.array([[1,2,3,1],[10,5,0,2]],dtype=int)
Now we want to represent each row in this array by a single unique value. One simple way to do this is to represent it as a floating-point number. An intuitive approach is
rescale = np.power(10,np.arange(0,2*x.shape[1],2)[::-1],dtype=float)
codes = np.dot(x,rescale)
where we exploit the fact that the integers have at most 2 digits. (I cast rescale to float to avoid exceeding the maximum value of a fixed-width int in case x has more columns; this is not very elegant.)
This returns
array([ 1020301., 10050002.])
How can this process be reversed to obtain x again?
I'm thinking of converting codes to a string, then splitting the string after every 2nd character. I'm not too familiar with these string operations, especially when they have to be executed on all entries of an array simultaneously. A problem is also that the first number has a varying number of digits, so leading zeros have to be added in some way.
Maybe something simpler is possible using some divisions or rounding, or perhaps representing the rows of the array in a different manner. What's important is that at least the initial conversion is fast and vectorized.
Suggestions are welcome.
First, you need to find the correct number of columns:
number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
Note that if your first column is always 0, then there is no way with your code to know it even existed: [[0, 1], [0, 2]] -> [1., 2.] -> [[1], [2]] or [[0, 0, 0, 1], [0, 0, 0, 2]]. It might be something to consider.
Anyways, here is a mockup for the string way:
import math
from math import ceil

def decode_with_string(codes):
    number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
    str_format = '{:0%dd}' % (2 * number_of_cols)  # prepare to format numbers as strings
    return [[int(str_format.format(int(code))[2*i:2*i+2])  # extract the wanted digits
             for i in range(number_of_cols)]  # for all columns
            for code in codes]  # for all rows
But you can also compute the numbers directly:
import math
from math import ceil, floor

def decode_direct(codes):
    number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
    return [[floor(code / (100**index)) % 100
             for index in range(number_of_cols - 1, -1, -1)]
            for code in codes]
Example:
>>> codes = [ 1020301., 10050002.]
>>> number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
>>> print(number_of_cols)
4
>>> print(decode_with_string(codes))
[[1, 2, 3, 1], [10, 5, 0, 2]]
>>> print(decode_direct(codes))
[[1, 2, 3, 1], [10, 5, 0, 2]]
Here is a numpy solution (codes must be a numpy array here, e.g. codes = np.array([1020301., 10050002.])):
>>> divisors = np.power(0.01, np.arange(number_of_cols-1, -1, -1))
>>> x = np.mod(np.floor(divisors*codes.reshape((codes.shape[0], 1))), 100)
Finally, you say you use float in case of integer overflow. First, the mantissa of floating-point numbers is also limited, so you don't eliminate the risk of overflow. Second, in Python 3, integers actually have unlimited precision.
You could exploit the fact that NumPy stores its arrays as contiguous blocks in memory. So storing the memory block as a byte string and remembering the shape of the array should be sufficient:
import numpy as np
x = np.array([[1,2,3,1],[10,5,0,2]], dtype=np.uint8) # 8 Bit are enough for 2 digits
x_sh = x.shape
# flatten the array and convert it to a byte string
xs = x.ravel().tobytes()
# convert back and reshape:
y = np.reshape(np.frombuffer(xs, np.uint8), x_sh)
The reason for flattening the array first is that you don't need to pay attention to the storage order of 2D arrays (C or FORTRAN order). Of course you also could generate a string for each row separately too:
import numpy as np
x = np.array([[1,2,3,1],[10,5,0,2]], dtype=np.uint8) # 8 Bit are enough for 2 digits
# conversion:
xss = [xr.tobytes() for xr in x]
# conversion back:
y = np.array([np.frombuffer(xs, np.uint8) for xs in xss])
Since your numbers are between 0 and 99, you should pad to exactly 2 digits: 0 becomes "00", 5 becomes "05", and 50 stays "50". That way, all you need to do is repeatedly divide your number by 100 and you'll get the values back. Your encoding will also be smaller, since every number is encoded in 2 digits instead of 2-3 as you currently do.
If you want to be able to detect [0, 0, 0] (which is currently indistinguishable from [0] or [0, ..., 0]) as well, add a 1 in front of your number: 1000000 is [0, 0, 0] and 100 is [0]. When your division leaves you with 1, you know you've finished.
You can easily construct a string with that information and cast it to a number afterwards.
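A minimal pure-Python sketch of that sentinel scheme (plain ints throughout, so no float-precision concerns; the function names are my own):

```python
def encode(row):
    # leading 1 acts as a sentinel so leading zeros survive the round trip
    n = 1
    for d in row:
        n = n * 100 + d
    return n

def decode(n):
    out = []
    while n > 1:          # stop once only the sentinel remains
        n, d = divmod(n, 100)
        out.append(d)
    return out[::-1]

print(encode([0, 0, 0]))              # 1000000
print(decode(1000000))                # [0, 0, 0]
print(decode(encode([10, 5, 0, 2])))  # [10, 5, 0, 2]
```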
What is the easiest and cleanest way to get the first AND the last elements of a sequence? E.g., I have a sequence [1, 2, 3, 4, 5], and I'd like to get [1, 5] via some kind of slicing magic. What I have come up with so far is:
l = len(s)
result = s[0:l:l-1]
I actually need this for a bit more complex task. I have a 3D numpy array, which is cubic (i.e. is of size NxNxN, where N may vary). I'd like an easy and fast way to get a 2x2x2 array containing the values from the vertices of the source array. The example above is an oversimplified, 1D version of my task.
Use this:
result = [s[0], s[-1]]
Since you're using a numpy array, you may want to use fancy indexing:
a = np.arange(27)
indices = [0, -1]
b = a[indices] # array([0, 26])
For the 3d case:
vertices = [(0,0,0),(0,0,-1),(0,-1,0),(0,-1,-1),(-1,-1,-1),(-1,-1,0),(-1,0,0),(-1,0,-1)]
indices = tuple(zip(*vertices)) # must be a tuple for multidimensional indexing; can store this for later use
a = np.arange(27).reshape((3,3,3)) #dummy array for testing. Can be any shape size :)
vertex_values = a[indices].reshape((2,2,2))
I first write down all the vertices (although I am willing to bet there is a clever way to do it using itertools, which would let you scale this up to N dimensions ...). The order in which you specify the vertices is the order they will be in the output array. Then I "transpose" the list of vertices (using zip) so that all the x indices are together, all the y indices are together, etc. (that's how numpy likes it). At this point, you can save that index tuple and use it to index your array whenever you want the corners of your box. You can easily reshape the result into a 2x2x2 array (although the order I have them in is probably not the order you want).
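The itertools hunch pans out: product([0, -1], repeat=ndim) generates the vertex tuples in a fixed order for any number of dimensions. A sketch:

```python
import itertools
import numpy as np

ndim = 3
vertices = list(itertools.product([0, -1], repeat=ndim))
indices = tuple(zip(*vertices))  # transpose into per-axis index tuples

a = np.arange(27).reshape(3, 3, 3)  # dummy cube for testing
corners = a[indices].reshape((2,) * ndim)
print(corners.ravel().tolist())  # [0, 2, 6, 8, 18, 20, 24, 26]
```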
This would give you a list of the first and last element in your sequence:
result = [s[0], s[-1]]
Alternatively, this would give you a tuple
result = s[0], s[-1]
For the particular case of an (N, N, N) ndarray X that you mention, would the following work for you?
s = slice(0,N,N-1)
X[s,s,s]
Example
>>> N = 3
>>> X = np.arange(N*N*N).reshape(N,N,N)
>>> s = slice(0,N,N-1)
>>> print(X[s,s,s])
[[[ 0 2]
[ 6 8]]
[[18 20]
[24 26]]]
>>> from operator import itemgetter
>>> first_and_last = itemgetter(0, -1)
>>> first_and_last([1, 2, 3, 4, 5])
(1, 5)
Why do you want to use a slice? Getting each element with
result = [s[0], s[-1]]
is better and more readable.
If you really need to use the slice, then your solution is the simplest working one that I can think of.
This also works for the 3D case you've mentioned.