Difference between a numpy.array and numpy.array[:] - python

Me again... :)
I tried finding an answer to this question but again I was not fortunate enough. So here it is.
What is the difference between referring to a numpy array as a whole (let's say "iris") and referring to all of the data in that array (using iris[:], for instance)?
I'm asking because of the error I get when I run the first example below, while the second example works fine.
Here is the code:
At this first part I load the library and import the dataset from the internet.
import statsmodels.api as sm

iris = sm.datasets.get_rdataset(dataname='iris',
                                package='datasets')['data']
If I run this code I get an error:
iris.columns.values = [iris.columns.values[x].lower() for x in range( len( iris.columns.values ) ) ]
print(iris.columns.values)
Now if I run this code it works fine:
iris.columns.values[:] = [iris.columns.values[x].lower() for x in range( len( iris.columns.values ) ) ]
print(iris.columns.values)
Best regards,

The difference is that when you do iris.columns.values = ... you try to rebind the values property of iris.columns, which is read-only (see the pandas implementation in pandas.core.frame.DataFrame), whereas when you do iris.columns.values[:] = ... you access the data of the underlying np.ndarray and overwrite it with new values. The second assignment statement never touches the reference to the numpy object: the [:] is a slice that is passed to the __setitem__ method of the numpy array.
EDIT:
The exact implementation (there are multiple; here is the pd.Series implementation) of such a property is:
@property
def values(self):
    """ return the array """
    return self.block.values
Thus you are trying to overwrite a property that is constructed with the @property decorator around a getter function, and it cannot be rebound since it is provided with only a getter and no setter. See Python's docs on builtins - property()
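A minimal sketch of the same mechanism, using a hypothetical Wrapper class (not pandas code), showing why the first assignment fails while the slice assignment succeeds:

import numpy as np

class Wrapper:
    def __init__(self):
        self._arr = np.array(['A', 'B', 'C'], dtype=object)

    @property
    def values(self):          # getter only, no setter defined
        return self._arr

w = Wrapper()
try:
    w.values = np.array(['a', 'b', 'c'], dtype=object)  # tries to rebind the attribute
except AttributeError:
    print("no setter, assignment refused")
w.values[:] = ['a', 'b', 'c']  # in-place write via ndarray.__setitem__
print(w.values)                # ['a' 'b' 'c']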

iris.columns.values = val
calls
type(iris.columns).__setattr__(iris.columns, 'values', val)
This is running pandas' code, because iris.columns is a pandas object (an Index)
iris.columns.values[:] = val
calls
type(iris.columns.values).__setitem__(iris.columns.values, slice(None), val)
This is running numpy's code, because type(iris.columns.values) is np.ndarray
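The same dispatch can be demonstrated with a bare ndarray (a small illustrative sketch):

import numpy as np

a = np.arange(3)
b = a                  # second name bound to the same object
a[:] = [7, 8, 9]       # __setitem__ with slice(None): writes in place
print(b)               # [7 8 9] -- b sees the change, same object
a = np.zeros(3)        # plain assignment: rebinds the name a, object untouched
print(b)               # [7 8 9] -- b still refers to the original array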

Related

Dynamically adding functions to array columns

I'm trying to dynamically add function calls to fill in array columns. I will be accessing the array millions of times, so it needs to be quick.
I'm thinking of adding the call of a function into a dictionary by using a string variable:
numpy_array[row,column] = dict[key[index containing function call]]
The full scope of the code I'm working with is too large to post; here is an equivalent, simplistic example I've tried:
def hello(input):
    return input

dict1 = {}
# another function returns the name and ID values
name = 'hello'
ID = 0
dict1["hi"] = globals()[name](ID)
print(dict1)
but this immediately executes the function when evaluating
globals()[name](ID)
instead of storing the call hello(0) in the dictionary unevaluated.
I'm a bit out of my depth here.
What is the proper way to implement this?
Is there a more efficient way to do this than looking up the dictionary on every call of
numpy_array[row,column] = dict[key[index containing function call]]
as I will be accessing and updating it millions of times?
I don't know if the dictionary is called every time the array is written to or if the location of the column is already saved into cache.
Would appreciate the help.
Edit
Ultimately what I'm trying to do is initialize some arrays, dictionaries, and values with a function:
def initialize(*args):
    # create arrays and dictionaries
    # assign values to global and local variables, arrays, dictionaries
Each time the initialize() function is used it creates a new set of variables (names, values, etc.) that direct to a different function with a different set of variables.
I have a numpy array in which I want to store information from the function and the associated values created by the initialize() function.
So, in other words, for the above example hello(0): the name of the function, its value, and some other things as set up within initialize().
What I'm trying to do is add the function with these settings to the numpy array as a new column before I run the main program.
As another example: if I was setting up hello() (and hello() was a complex function), then when I used initialize() it might give me a value of 1 for hello(1).
Then if I use initialize() again it might give me a value of 2 for hello(2).
If I used it one more time it might give the value 0 for the function goodbye(0).
So in this scenario, let's say I have an array:
array[row,0] = stuff()
array[row,1] = things()
array[row,2] = more_stuff()
array[row,3] = more_things()
Now I want it to look like
array[row,0] = stuff()
array[row,1] = things()
array[row,2] = more_stuff()
array[row,3] = more_things()
array[row,4] = hello(1)
array[row,5] = hello(2)
array[row,6] = goodbye(0)
As a third example:
def function1():
    # do something
def function2():
    # do something
def function3():
    # do something

numpy_array(size)

initialize():
    # do some stuff
    # then add function1(23) to the next column in numpy_array
initialize():
    # do some stuff
    # then add function2(5) to the next column in numpy_array
initialize():
    # do some stuff
    # then add function3(50) to the next column in numpy_array
So as you can see, I need to permanently append new columns to the array and feed the new columns with the function/value as directed by the initialize() function, without manual intervention.
Fundamentally, I need to figure out how to assign a function call to an array column based upon a string value, without executing the call on assignment.
Edit #2
I guess my explanations weren't clear enough, so here is another way to look at it.
I'm trying to dynamically assign functions to an additional column in a numpy array, based upon the output of a function.
The functions added to the array column will be used to fill the array with data millions of times.
The functions added to the array can be various different functions with various different input values, and the number of functions added can vary.
I've tried assigning the functions to a dictionary using exec(), eval(), and globals(), but all of these execute the functions immediately during assignment instead of storing them.
numpy_array = np.array((1, 5))

def some_function():
    # do some stuff
    return 'other_function(15)'

# somehow add 'other_function(15)' to the array column:
numpy_array[1, 6] = other_function(15)
The functions returned by some_function() may or may not exist each time the program is run, so the functions added to the array are also dynamic.
I'm not sure this is what the OP is after, but here is a way to make an indirection of functions by name:
def make_fun_dict():
    magic = 17

    def foo(x):
        return x + magic

    def bar(x):
        return 2 * x + 1

    def hello(x):
        return x ** 2

    return {k: f for k, f in locals().items() if hasattr(f, '__name__')}
mydict = make_fun_dict()
>>> mydict
{'foo': <function __main__.make_fun_dict.<locals>.foo(x)>,
'bar': <function __main__.make_fun_dict.<locals>.bar(x)>,
'hello': <function __main__.make_fun_dict.<locals>.hello(x)>}
>>> mydict['foo'](0)
17
Example usage:
x = np.arange(5, dtype=int)
names = ['foo', 'bar', 'hello', 'foo', 'hello']
>>> np.array([mydict[name](v) for name, v in zip(names, x)])
array([17, 3, 4, 20, 16])
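If the goal is to store such a call without executing it (the original complaint about globals()[name](ID)), one possible sketch is to bind the function and its argument with functools.partial and invoke it later; this reuses mydict from above:

from functools import partial

deferred = partial(mydict['hello'], 3)  # nothing is executed yet
print(deferred())                       # 9 -- evaluated only when called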

Postpone function execution syntactically

I've got a quite extensive simulation tool written in Python which requires the user to call functions to set up the environment in a strict order, since np.ndarrays are created first (and changed by appending etc.) and memory views to specific cells of these arrays are defined afterwards.
Currently each part of the environment requires around 4 different function calls to be set up, with easily >> 100 parts.
Thus I need to combine each part's function calls by syntactically (not timer-based) postponing the execution of some functions until all preceding functions have been executed, while still maintaining the strict order so that the memory views can be used.
Furthermore, all functions to be called by the user take PEP 3102 style keyword-only arguments to reduce the probability of input errors, and all are instance methods with self as the first parameter, where self holds the references to the arrays from which the memory views are constructed.
My current implementation uses a list to store the functions along with a dict of each function's keyword arguments. This is shown here, omitting the class and self parameters to keep it short:
import numpy as np

def fun1(*, x, y):  # easy minimal example function 1
    print(x * y)

def fun2(*, x, y, z):  # easy minimal example function 2
    print((x + y) / z)

fun_list = []  # list to store the functions and kwargs
fun_list.append([fun1, {'x': 3.4, 'y': 7.0}])  # add functions and kwargs
fun_list.append([fun2, {'x': 1., 'y': 12.8, 'z': np.pi}])
fun_list.append([fun2, {'x': 0.3, 'y': 2.4, 'z': 1.}])

for fun in fun_list:
    fun[0](**fun[1])
What I'd like to implement is using a decorator to postpone the function execution by adding a generator, to be able to pass all arguments to the functions as they are called, but not execute them, as shown below:
def postpone(myfun):  # define generator decorator
    def inner_fun(*args, **kwargs):
        yield myfun(*args, **kwargs)
    return inner_fun

fun_list_dec = []  # list to store the decorated functions
fun_list_dec.append(postpone(fun1)(x=3.4, y=7.0))  # add decorated functions
fun_list_dec.append(postpone(fun2)(x=1., y=12.8, z=np.pi))
fun_list_dec.append(postpone(fun2)(x=0.3, y=2.4, z=1.))

for fun in fun_list_dec:  # execute functions
    next(fun)
What is the best (most Pythonic) method to do so? Are there any drawbacks?
And most important: Will my references to np.ndarrays passed to the functions within self still be a reference, so that the memory addresses of these arrays are still correct when executing the functions, if the memory addresses change in between saving the function calls to a list (or being decorated) and executing them?
Execution speed does not matter here.
Using a generator here doesn't make much sense. You are essentially simulating partial application, so this looks like a use case for functools.partial. Since you are sticking with keyword-only arguments, this will work just fine:
In [1]: def fun1(*, x, y):  # easy minimal example function 1
   ...:     print(x * y)
   ...: def fun2(*, x, y, z):  # easy minimal example function 2
   ...:     print((x + y) / z)
   ...:

In [2]: from functools import partial

In [3]: fun_list = []

In [4]: fun_list.append(partial(fun1, x=3.4, y=7.0))

In [5]: fun_list.append(partial(fun2, x=1., y=12.8, z=3.14))

In [6]: fun_list.append(partial(fun2, x=0.3, y=2.4, z=1.))

In [7]: for f in fun_list:
   ...:     f()
   ...:
23.8
4.3949044585987265
2.6999999999999997
You don't have to use functools.partial either; you can do your partial application "manually", just to demonstrate:
In [8]: fun_list.append(lambda:fun1(x=5.4, y=8.7))
In [9]: fun_list[-1]()
46.98
Since this would be too long for a comment and it builds on juanpa.arrivillaga's answer, I'll add a full post with a short explanation of what I mean by updating the reference to the arrays:
def fun1(*, x, y):  # easy minimal example function 1
    print(x * y)

arr = np.random.rand(5)
f1_lam = lambda: fun1(x=arr, y=5.)
f1_par = partial(fun1, x=arr, y=5.)

f1_lam()  # Out[1]: [0.55561103 0.9962626  3.60992174 2.55491852 3.9402079 ]
f1_par()  # Out[2]: [0.55561103 0.9962626  3.60992174 2.55491852 3.9402079 ]

# manipulate the array so that the memory address changes and
# passing by reference is "complicated":
arr = np.append(arr, np.ones((2, 1)))

f1_lam()  # Out[3]: [0.55561103 0.9962626  3.60992174 2.55491852 3.9402079  5. 5.]
f1_par()  # Out[4]: [0.55561103 0.9962626  3.60992174 2.55491852 3.9402079 ]
The behaviour of lambda is exactly what I was looking for in this question.
My examples with the dict and with decorators don't work, and neither does functools.partial. Any idea why lambda works? And just out of interest: would there be any way to update the references to the arrays in the dict so that it would also work this way?
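One way to see the difference, sketched below: partial captures the object that arr refers to at definition time, while the lambda body re-evaluates the name arr on every call (late binding), so rebinding arr is only visible to the lambda:

import numpy as np
from functools import partial

arr = np.zeros(3)
show_par = partial(print, arr)  # stores the array object itself
show_lam = lambda: print(arr)   # stores code that looks up the name 'arr'

arr = np.ones(3)                # rebind the global name
show_par()                      # [0. 0. 0.] -- still the originally captured object
show_lam()                      # [1. 1. 1.] -- late binding sees the new object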

Changes to copies of object mutate original object

I have a class containing a DataFrame property. I want to be able to perform arithmetic on the objects using the built-in operators while keeping the original objects immutable. Unfortunately, the operations seem to be mutating the original objects as well. Here's an example:
import numpy as np
import pandas as pd

class Container:
    def __init__(self):
        self.data = pd.DataFrame()

    def generate(self):
        self.data = pd.DataFrame(np.random.randint(0, 100, size=(100, 1)),
                                 columns=['A'])
        return self

    def __add__(self, other):
        copy = self
        new = Container()
        new.data['A'] = copy.data.eval(f"A + {0}".format(other))
        return new

one = Container().generate()
two = one + 1
print(one.data == two.data)
I think the problem is the copy = self line, but I can't seem to preserve the original object even using the copy() method.
How do I make sure the original object doesn't change when a new one is created from it?
Surprisingly, while copy = self isn't a copy, your bug doesn't actually have anything to do with that. I don't think you even need a copy there.
Your bug is due to double-formatting a string:
f"A + {0}".format(other)
f"A + {0}" is an f-string. Unlike format, it evaluates the text 0 as a Python expression and substitutes the string representation of the resulting object into the resulting string, producing "A + 0". Calling format on that doesn't do anything, since there's no format placeholder left. You end up calling
copy.data.eval("A + 0")
instead of adding what you wanted to add.
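A sketch of the corrected __add__, assuming the intent was to substitute the operand other into the expression (the f-string placeholder does this directly, no .format needed):

def __add__(self, other):
    new = Container()
    new.data['A'] = self.data.eval(f"A + {other}")  # evaluates e.g. "A + 1"
    return new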
Did you deepcopy?
from copy import deepcopy

dupe = deepcopy(thing)
# now thing and dupe are two separate objects
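Applied to the Container class from the question above (a small sketch; deepcopy also copies the nested DataFrame):

from copy import deepcopy

one = Container().generate()
two = deepcopy(one)
two.data['A'] += 1
print(one.data.equals(two.data))  # False -- the original is untouched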

How can I call an instance on a former instance from the same class?

I apologize in advance if there is an obvious solution to this question or it is a duplicate.
I have a class as follows:
import numpy as np
from scipy import integrate

class Kernel(object):
    """ creates kernels with the necessary input data """
    def __init__(self, Amplitude, random=None):
        self.Amplitude = Amplitude
        self.random = random
        if random is not None:
            self.dims = list(random.shape)

    def Gaussian(self, X, Y, sigmaX, sigmaY, muX=0.0, muY=0.0):
        """ return a 2 dimensional Gaussian kernel """
        kernel = np.zeros([X, Y])
        theta = [self.Amplitude, muX, muY, sigmaX, sigmaY]
        for i in range(X):
            for j in range(Y):
                # G2 is a 2D Gaussian helper defined elsewhere in my code
                kernel[i][j] = integrate.dblquad(
                    lambda x, y: G2(x + float(i) - (X - 1.0) / 2.0,
                                    y + float(j) - (Y - 1.0) / 2.0, theta),
                    -0.5, 0.5, lambda y: -0.5, lambda y: 0.5)[0]
        return kernel
It just basically creates a bunch of convolution kernels (I've only included the first).
I want to add a method to this class so that I can use something like
conv = Kernel(1.5)
conv.Gaussian(9, 9, 2, 2).kershow()
and have the array pop up using Matplotlib. I know how to write this method and do the plotting with Matplotlib, but I don't know how to write the class so that for each method I would like to have this additional ability (i.e. .kershow()), I may call it in this manner.
I think I could use decorators? But I've never used them before. How can I do this?
The name of the thing you're looking for is function or method chaining.
Strings are a really good example of this in Python. Because a string is immutable, each string method returns a new string. So you can call string methods on the return values, rather than storing the intermediate value. For example:
lower = ' THIS IS MY NAME: WAYNE '.lower()
without_left_padding = lower.lstrip()
without_right_padding = without_left_padding.rstrip()
title_cased = without_right_padding.title()
Instead you could write:
title_cased = ' THIS IS MY NAME: WAYNE '.lower().lstrip().rstrip().title()
Of course really you'd just do .strip().title(), but this is an example.
So if you want a .kershow() option, then you'll need to include that method on whatever you return. In your case, numpy arrays don't have a .kershow method, so you'll need to return something that does.
Your options are mostly:
A subclass of numpy arrays
A class that wraps the numpy array
I'm not sure what is involved with subclassing the numpy array, so I'll stick with the latter as an example. Either you can use the kernel class, or create a second class.
Alex provided an example of using your kernel class, but alternatively you could have another class like this:
import matplotlib.pyplot as plt

class KernelPlotter(object):
    def __init__(self, kernel):
        self.kernel = kernel

    def kershow(self):
        # do the plotting here, e.g.:
        plt.imshow(self.kernel)
        plt.show()

Then you would pretty much follow your existing code, but rather than return kernel you would do return KernelPlotter(kernel).
Which option you choose really depends on what makes sense for your particular problem domain.
There's another sister to function chaining called a fluent interface that's basically function chaining but with the goal of making the interface read like English. For example you might have something like:
Kernel(with_amplitude=1.5).create_gaussian(with_x=9, and_y=9, and_sigma_x=2, and_sigma_y=2).show_plot()
Though obviously there can be some problems when writing your code this way.
Here's how I would do it:
class Kernel(object):
def __init__ ...
def Gaussian(...):
self.kernel = ...
...
return self # not kernel
def kershow(self):
do_stuff_with(self.kernel)
Basically the Gaussian method doesn't return a numpy array, it just stores it in the Kernel object to be used elsewhere in the class. In particular kershow can now use it. The return self is optional but allows the kind of interface you wanted where you write
conv.Gaussian(9, 9, 2, 2).kershow()
instead of
conv.Gaussian(9, 9, 2, 2)
conv.kershow()
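A compact, runnable sketch of this return-self pattern (the Gaussian body is replaced by a toy closed-form kernel here, just to demonstrate the chaining):

import numpy as np
import matplotlib.pyplot as plt

class Kernel(object):
    def __init__(self, Amplitude):
        self.Amplitude = Amplitude
        self.kernel = None

    def Gaussian(self, X, Y, sigmaX, sigmaY):
        # toy stand-in for the real integral-based kernel
        x = np.arange(X) - (X - 1) / 2.0
        y = np.arange(Y) - (Y - 1) / 2.0
        xx, yy = np.meshgrid(x, y, indexing='ij')
        self.kernel = self.Amplitude * np.exp(
            -(xx ** 2 / (2 * sigmaX ** 2) + yy ** 2 / (2 * sigmaY ** 2)))
        return self  # enables chaining

    def kershow(self):
        plt.imshow(self.kernel)
        plt.show()

Kernel(1.5).Gaussian(9, 9, 2, 2).kershow()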

Nested list object does not support indexing

I have a nested list, named env, created in the constructor and another method to populate an element of the grid defined as below:
class Environment(object):
    def __init__(self, rowCount, columnCount):
        env = [[None for i in range(columnCount)] for j in range(rowCount)]
        return env

    def addElement(self, row, column):
        self[row][column] = 0
Later in the code I create an instance of Environment by running:
myEnv = createEnvironment(6,6)
Then I want to add an element to the environment by running:
myEnv.addElement(2,2)
So what I expected to happen was that I would receive a new Environment object as a 6x6 grid with a 0 in position 2,2 of the grid. But that did not work.
I have two errors:
I am unable to return anything other than None from the __init__ method.
The main issue is that when trying to execute addElement(2, 2) I get this error:
TypeError: 'Environment' object does not support indexing
I looked at the __getitem__ and __setitem__ methods but was unable to get them working over a multidimensional list. Is there a better data structure I should be using to create a grid?
The problem here is that you can't replace the object from within __init__. You could probably subclass list and do something in __new__, but that would be massive overkill; the better option is just to wrap the list:
class Environment(object):
    def __init__(self, rows, columns):
        self.env = [[None for column in range(columns)] for row in range(rows)]

    def addElement(self, row, column):
        self.env[row][column] = 0
Note that it's a little odd that you say you call myEnv = createEnvironment(6, 6) - creating the instance through a function rather than the constructor (Environment(6, 6)) is unusual.
If you really want your object to act like a list, you can of course provide a load of extra wrapper functions like __getitem__/__setitem__. Note that some_environment[5, 6] passes the key (5, 6) as a single tuple, so the wrapper has to unpack it:
def __getitem__(self, key):
    row, column = key
    return self.env[row][column]
This would allow you to do some_environment[5, 6], for example. (You may rather return the column; that depends on your system and what works best for you.)
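A sketch of the wrapped class with both dunders under the same tuple-key convention:

class Environment(object):
    def __init__(self, rows, columns):
        self.env = [[None for _ in range(columns)] for _ in range(rows)]

    def __getitem__(self, key):
        row, column = key  # the key arrives as the tuple (row, column)
        return self.env[row][column]

    def __setitem__(self, key, value):
        row, column = key
        self.env[row][column] = value

myEnv = Environment(6, 6)
myEnv[2, 2] = 0        # addElement(2, 2) expressed via __setitem__
print(myEnv[2, 2])     # 0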
