I am trying to calculate a dot product between two matrices, pairing each row of the first with a pair of rows of the second.
I have a matrix D with dimensions (u x 2) and a matrix R with dimensions (u*2 x c).
Here is an example:
D = np.array([[0.02747092, 0.11233295],
[0.02747092, 0.07295284],
[0.01245856, 0.19935923],
[0.01245856, 0.13520913],
[0.11233295, 0.07295284]])
R = np.array([[-3. , 0. , 1. , -1. ],
[-1.25 , 0.75 , 1.75 , -1.25 ],
[-2.33333333, -0.33333333, 1.66666667, -1.33333333],
[-1.25 , 0.75 , 1.75 , -1.25 ],
[ 0. , -2. , 2. , -4. ],
[-1.25 , 0.75 , 1.75 , -1.25 ],
[ 0.66666667, -3.33333333, 2.66666667, -4.33333333],
[-1.25 , 0.75 , 1.75 , -1.25 ],
[-2.33333333, -0.33333333, 1.66666667, -1.33333333],
[-3. , 0. , 1. , -1. ]])
The result should be matrix M with dimensions (u x c) as follows (example of first row):
M = np.array([[-0.2185, 0.0825, 0.2195, -0.1645],
[...]])
This is the result of the dot product between the first row of D and the first two rows of R (values rounded), like so:
D_ = np.array([[0.027, 0.11]])
R_ = np.array([[-3., 0., 1., -1.],
[-1.25, 0.75, 1.75, -1.25]])
D_.dot(R_)
I tried various np.tensordot calls after reshaping D into a tensor, but without any luck. I am looking for a vectorized solution that avoids loops (my current solution uses them and is quite slow).
Reshape R to 3D and use np.einsum:
np.einsum('ijk,ij->ik',R.reshape(len(D),2,-1),D)
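As a quick sanity check, here is a minimal sketch that compares the einsum call against an explicit loop on random data. The variable names (rng, M_loop, M_einsum, M_matmul) and the random inputs are illustrative assumptions, not part of the question; the batched matmul variant is just an alternative spelling of the same contraction.

```python
import numpy as np

# Random stand-ins with the shapes from the question: D is (u, 2), R is (2*u, c).
rng = np.random.default_rng(0)
u, c = 5, 4
D = rng.random((u, 2))
R = rng.random((2 * u, c))

# Reference: row i of D times rows 2*i and 2*i+1 of R, one row at a time.
M_loop = np.stack([D[i] @ R[2 * i:2 * i + 2] for i in range(u)])

# Vectorized: view R as u blocks of shape (2, c) and contract over the block axis.
M_einsum = np.einsum('ijk,ij->ik', R.reshape(u, 2, c), D)

# Same contraction as a batched matrix product: (u, 1, 2) @ (u, 2, c) -> (u, 1, c).
M_matmul = (D[:, None, :] @ R.reshape(u, 2, c))[:, 0, :]

print(np.allclose(M_loop, M_einsum), np.allclose(M_loop, M_matmul))
```

The reshape only works when R has exactly 2 * len(D) rows, which matches the shapes stated in the question.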
I have a set of RGB values in an array rgb_array of the form
[255.000, 56.026, 0.000]
[246.100, 60.000, 0.000]
...
>>> print(rgb_array.shape)
(1000, 3)
that I'd like to plot similarly to the color gradient shown above.
How can I best use matplotlib's imshow to achieve this?
Supposing your array has N rows where each row contains 3 floats between 0 and 255, you can create an image as follows. First convert it to a numpy array of integers, and reshape it to (1, N, 3). This will make it a 1xN image. Then, display the image using imshow. You need to set an extent to get the x and y axes as in your example, or just set them to [0, 1, 0, 1]. Also the aspect ratio needs to be controlled, as otherwise the pixels would be considered "square".
import numpy as np
import matplotlib.pyplot as plt
rgb_array = [[255.000, 56.026 + (255 - 56.026) * i / 400, 255 * i / 400] for i in range(400)]
rgb_array += [[255 - 255 * i / 600, 255 - 255 * i / 600, 255] for i in range(600)]
img = np.array(rgb_array, dtype=int).reshape((1, len(rgb_array), 3))
plt.imshow(img, extent=[0, 16000, 0, 1], aspect='auto')
plt.show()
Don't use this method: @JohanC provides a much better solution that creates an image rather than a bar graph.
I'm not very experienced with Matplotlib, but I came up with this. There may be more efficient methods, so please correct me if this is the wrong approach.
#!/usr/bin/env python3
import numpy as np
import matplotlib.pyplot as plt
NSAMPLES = 100
# Synthesize R, G, B and A channels with dummy data
# The thing to note is that the samples are REAL and in range [0..1]
# np.float was removed from NumPy; the builtin float is the equivalent dtype
r = np.linspace(0, 1, NSAMPLES).astype(float)
g = 1.0 - r
b = np.full(NSAMPLES, 0.5, float)
a = np.full(NSAMPLES, 1, float)
# Merge into a single array, 4 deep
RGBA = np.dstack((r,g,b,a))
# Plot
height, width = 40, 1
# One bar per sample, each colored by its own RGBA row
plt.bar(np.arange(NSAMPLES), height, width, color=RGBA.reshape(-1, 4))
plt.title("Some Funky Barplot")
plt.show()
The array RGBA looks like this:
array([[[0. , 1. , 0.5 , 1. ],
[0.01010101, 0.98989899, 0.5 , 1. ],
[0.02020202, 0.97979798, 0.5 , 1. ],
[0.03030303, 0.96969697, 0.5 , 1. ],
[0.04040404, 0.95959596, 0.5 , 1. ],
[0.05050505, 0.94949495, 0.5 , 1. ],
[0.06060606, 0.93939394, 0.5 , 1. ],
[0.07070707, 0.92929293, 0.5 , 1. ],
[0.08080808, 0.91919192, 0.5 , 1. ],
[0.09090909, 0.90909091, 0.5 , 1. ],
[0.1010101 , 0.8989899 , 0.5 , 1. ],
[0.11111111, 0.88888889, 0.5 , 1. ],
[0.12121212, 0.87878788, 0.5 , 1. ],
[0.13131313, 0.86868687, 0.5 , 1. ],
[0.14141414, 0.85858586, 0.5 , 1. ],
[0.15151515, 0.84848485, 0.5 , 1. ],
[0.16161616, 0.83838384, 0.5 , 1. ],
[0.17171717, 0.82828283, 0.5 , 1. ],
[0.18181818, 0.81818182, 0.5 , 1. ],
[0.19191919, 0.80808081, 0.5 , 1. ],
[0.2020202 , 0.7979798 , 0.5 , 1. ],
[0.21212121, 0.78787879, 0.5 , 1. ],
[0.22222222, 0.77777778, 0.5 , 1. ],
[0.23232323, 0.76767677, 0.5 , 1. ],
[0.24242424, 0.75757576, 0.5 , 1. ],
[0.25252525, 0.74747475, 0.5 , 1. ],
[0.26262626, 0.73737374, 0.5 , 1. ],
[0.27272727, 0.72727273, 0.5 , 1. ],
[0.28282828, 0.71717172, 0.5 , 1. ],
[0.29292929, 0.70707071, 0.5 , 1. ],
[0.3030303 , 0.6969697 , 0.5 , 1. ],
[0.31313131, 0.68686869, 0.5 , 1. ],
[0.32323232, 0.67676768, 0.5 , 1. ],
[0.33333333, 0.66666667, 0.5 , 1. ],
[0.34343434, 0.65656566, 0.5 , 1. ],
[0.35353535, 0.64646465, 0.5 , 1. ],
[0.36363636, 0.63636364, 0.5 , 1. ],
[0.37373737, 0.62626263, 0.5 , 1. ],
[0.38383838, 0.61616162, 0.5 , 1. ],
[0.39393939, 0.60606061, 0.5 , 1. ],
[0.4040404 , 0.5959596 , 0.5 , 1. ],
[0.41414141, 0.58585859, 0.5 , 1. ],
[0.42424242, 0.57575758, 0.5 , 1. ],
[0.43434343, 0.56565657, 0.5 , 1. ],
[0.44444444, 0.55555556, 0.5 , 1. ],
[0.45454545, 0.54545455, 0.5 , 1. ],
[0.46464646, 0.53535354, 0.5 , 1. ],
[0.47474747, 0.52525253, 0.5 , 1. ],
[0.48484848, 0.51515152, 0.5 , 1. ],
[0.49494949, 0.50505051, 0.5 , 1. ],
[0.50505051, 0.49494949, 0.5 , 1. ],
[0.51515152, 0.48484848, 0.5 , 1. ],
[0.52525253, 0.47474747, 0.5 , 1. ],
[0.53535354, 0.46464646, 0.5 , 1. ],
[0.54545455, 0.45454545, 0.5 , 1. ],
[0.55555556, 0.44444444, 0.5 , 1. ],
[0.56565657, 0.43434343, 0.5 , 1. ],
[0.57575758, 0.42424242, 0.5 , 1. ],
[0.58585859, 0.41414141, 0.5 , 1. ],
[0.5959596 , 0.4040404 , 0.5 , 1. ],
[0.60606061, 0.39393939, 0.5 , 1. ],
[0.61616162, 0.38383838, 0.5 , 1. ],
[0.62626263, 0.37373737, 0.5 , 1. ],
[0.63636364, 0.36363636, 0.5 , 1. ],
[0.64646465, 0.35353535, 0.5 , 1. ],
[0.65656566, 0.34343434, 0.5 , 1. ],
[0.66666667, 0.33333333, 0.5 , 1. ],
[0.67676768, 0.32323232, 0.5 , 1. ],
[0.68686869, 0.31313131, 0.5 , 1. ],
[0.6969697 , 0.3030303 , 0.5 , 1. ],
[0.70707071, 0.29292929, 0.5 , 1. ],
[0.71717172, 0.28282828, 0.5 , 1. ],
[0.72727273, 0.27272727, 0.5 , 1. ],
[0.73737374, 0.26262626, 0.5 , 1. ],
[0.74747475, 0.25252525, 0.5 , 1. ],
[0.75757576, 0.24242424, 0.5 , 1. ],
[0.76767677, 0.23232323, 0.5 , 1. ],
[0.77777778, 0.22222222, 0.5 , 1. ],
[0.78787879, 0.21212121, 0.5 , 1. ],
[0.7979798 , 0.2020202 , 0.5 , 1. ],
[0.80808081, 0.19191919, 0.5 , 1. ],
[0.81818182, 0.18181818, 0.5 , 1. ],
[0.82828283, 0.17171717, 0.5 , 1. ],
[0.83838384, 0.16161616, 0.5 , 1. ],
[0.84848485, 0.15151515, 0.5 , 1. ],
[0.85858586, 0.14141414, 0.5 , 1. ],
[0.86868687, 0.13131313, 0.5 , 1. ],
[0.87878788, 0.12121212, 0.5 , 1. ],
[0.88888889, 0.11111111, 0.5 , 1. ],
[0.8989899 , 0.1010101 , 0.5 , 1. ],
[0.90909091, 0.09090909, 0.5 , 1. ],
[0.91919192, 0.08080808, 0.5 , 1. ],
[0.92929293, 0.07070707, 0.5 , 1. ],
[0.93939394, 0.06060606, 0.5 , 1. ],
[0.94949495, 0.05050505, 0.5 , 1. ],
[0.95959596, 0.04040404, 0.5 , 1. ],
[0.96969697, 0.03030303, 0.5 , 1. ],
[0.97979798, 0.02020202, 0.5 , 1. ],
[0.98989899, 0.01010101, 0.5 , 1. ],
[1. , 0. , 0.5 , 1. ]]])
So I have a file that looks something like this:
# 3 # Number of network ROIs
# 2 # Number of netcc matrices
# WITH_ROI_LABELS
001 002 003
1 2 3
# CC
1.0000 0.9800 0.9895
0.9800 1.0000 0.9817
0.9895 0.9817 1.0000
# FZ
4.0000 2.2965 2.6240
2.2965 4.0000 2.3426
2.6240 2.3426 4.0000
I want to extract the 3x3 matrix labelled "CC"
I want to extract the 3x3 matrix labelled "FZ"
So I did the following:
import numpy
file = "/users/3dfile1"
A = numpy.genfromtxt(file)
m= A[:,:]
m
So the output I get looks like this:
array([[ 1. , 2. , 3. ],
[ 1. , 2. , 3. ],
[ 1. , 0.98 , 0.9895],
[ 0.98 , 1. , 0.9817],
[ 0.9895, 0.9817, 1. ],
[ 4. , 2.2965, 2.624 ],
[ 2.2965, 4. , 2.3426],
[ 2.624 , 2.3426, 4. ]])
However, my question is: what if I have multiple files where the matrix size is NOT CONSISTENT? In some files the matrix will be 3x3, in others 8x8, 1x1, etc. In that case, how can I write code that will:
differentiate the matrix CC from FZ
extract the matrix (can detect the size of matrix somehow and give me the exact matrix I'm looking for)
Try
import numpy as np
x = np.array([[ 1. , 2. , 3. ],
[ 1. , 2. , 3. ],
[ 1. , 0.98 , 0.9895],
[ 0.98 , 1. , 0.9817],
[ 0.9895, 0.9817, 1. ],
[ 4. , 2.2965, 2.624 ],
[ 2.2965, 4. , 2.3426],
[ 2.624 , 2.3426, 4. ]])
x1 = x[2:,:]
x2 = x1.reshape(2,3,3)
CC ,FZ = x2
Result:
In [23]: CC
Out[23]:
array([[ 1. , 0.98 , 0.9895],
[ 0.98 , 1. , 0.9817],
[ 0.9895, 0.9817, 1. ]])
In [24]: FZ
Out[24]:
array([[ 4. , 2.2965, 2.624 ],
[ 2.2965, 4. , 2.3426],
[ 2.624 , 2.3426, 4. ]])
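The slicing above assumes a fixed 3x3 size. For files where the size varies, one option is to read the text line by line and collect the rows that follow each "# CC" or "# FZ" marker. The sketch below is only an illustration under that assumption about the file layout: read_labeled_matrices and its wanted parameter are hypothetical names, and the sample text is copied from the question.

```python
import io
import numpy as np

def read_labeled_matrices(lines, wanted=("CC", "FZ")):
    """Collect the numeric rows that follow each '# LABEL' line (hypothetical helper)."""
    rows = {}
    label = None
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith("#"):
            parts = text[1:].split()
            # Only a comment consisting of a single wanted token starts a block;
            # any other comment line ends the current block.
            if len(parts) == 1 and parts[0] in wanted:
                label = parts[0]
                rows[label] = []
            else:
                label = None
        elif label is not None:
            rows[label].append([float(v) for v in text.split()])
    return {name: np.array(block) for name, block in rows.items()}

sample = """# 3 # Number of network ROIs
# 2 # Number of netcc matrices
# WITH_ROI_LABELS
001 002 003
1 2 3
# CC
1.0000 0.9800 0.9895
0.9800 1.0000 0.9817
0.9895 0.9817 1.0000
# FZ
4.0000 2.2965 2.6240
2.2965 4.0000 2.3426
2.6240 2.3426 4.0000
"""

matrices = read_labeled_matrices(io.StringIO(sample))
CC, FZ = matrices["CC"], matrices["FZ"]
print(CC.shape, FZ.shape)
```

For a real file, the same function can be given an open file object, e.g. `with open(path) as fh: matrices = read_labeled_matrices(fh)`. The matrix size is never hard-coded, so 1x1 or 8x8 blocks are handled the same way.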
I have a data set which has some categorical columns. Here is a small sample:
Temp precip dow tod
-20.44 snow 4 14.5
-22.69 snow 4 15.216666666666667
-21.52 snow 4 17.316666666666666
-21.52 snow 4 17.733333333333334
-20.51 snow 4 18.15
Here, dow and precip are categorical, whereas the others are continuous.
Is there a way I can create a OneHotEncoder for just those columns? I don't want to use pd.get_dummies because that won't put the data in the proper format unless of each dow and precip are in the new data.
Two things you could check out: sklearn-pandas and, as mentioned by @Grr, pipelines with this good intro.
I prefer pipelines, as they are tidy, work easily with things like grid search, avoid leakage between folds in cross-validation, etc. So I usually end up with a pipe like this (given you have precip LabelEncoded first):
from sklearn.pipeline import Pipeline, FeatureUnion, make_pipeline, make_union
from sklearn.preprocessing import OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
class Columns(BaseEstimator, TransformerMixin):
    # Select a subset of DataFrame columns by name
    def __init__(self, names=None):
        self.names = names
    def fit(self, X, y=None, **fit_params):
        return self
    def transform(self, X):
        return X[self.names]
class Normalize(BaseEstimator, TransformerMixin):
    # Apply an optional function to the data; pass it through unchanged if none is given
    def __init__(self, func=None, func_param={}):
        self.func = func
        self.func_param = func_param
    def transform(self, X):
        if self.func is not None:
            return self.func(X, **self.func_param)
        else:
            return X
    def fit(self, X, y=None, **fit_params):
        return self
cat_cols = ['precip', 'dow']
num_cols = ['Temp','tod']
pipe = Pipeline([
    ("features", FeatureUnion([
        ('numeric', make_pipeline(Columns(names=num_cols),Normalize())),
        # Note: in scikit-learn >= 1.2 the sparse parameter is named sparse_output
        ('categorical', make_pipeline(Columns(names=cat_cols),OneHotEncoder(sparse=False)))
    ])),
    ('model', LinearRegression())
])
The short answer is yes, but with some caveats.
First off, you won't be able to use OneHotEncoder directly on the precip feature. You will need to encode those labels into integers with LabelEncoder.
Secondly, if you just want to encode those features, you can pass the proper values to the n_values and categorical_features parameters (these exist only in older scikit-learn releases; they were deprecated in 0.20 and later removed).
Example:
I will assume dow is day of the week, which will have seven values, and precip will have (rain, sleet, snow, and mix) as values.
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
df2 = df.copy()
le = LabelEncoder()
le.fit(['rain', 'sleet', 'snow', 'mix'])
df2.precip = le.transform(df2.precip)
df2
Temp precip dow tod
0 -20.44 3 4 14.500000
1 -22.69 3 4 15.216667
2 -21.52 3 4 17.316667
3 -21.52 3 4 17.733333
4 -20.51 3 4 18.150000
# Initialize OneHotEncoder with 4 values for precip and 7 for dow.
ohe = OneHotEncoder(n_values=np.array([4,7]), categorical_features=[1,2])
X = ohe.fit_transform(df2)
X.toarray()
array([[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.44 , 14.5 ],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -22.69 ,
15.21666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.31666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.73333333],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.51 , 18.15 ]])
OK, that works, but you have to either mutate your data in place or create a copy, and things can get a little messy. A more organized way to do this would be to use a Pipeline.
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import FeatureUnion, Pipeline
def get_precip(X):
    le = LabelEncoder()
    le.fit(['rain', 'sleet', 'snow', 'mix'])
    return le.transform(X.precip).reshape(-1,1)
def get_dow(X):
    return X.dow.values.reshape(-1,1)
def get_rest(X):
    return X.drop(['precip', 'dow'], axis=1)
precip_trans = FunctionTransformer(get_precip, validate=False)
dow_trans = FunctionTransformer(get_dow, validate=False)
rest_trans = FunctionTransformer(get_rest, validate=False)
union = FeatureUnion([('precip', precip_trans), ('dow', dow_trans), ('rest', rest_trans)])
ohe = OneHotEncoder(n_values=[4,7], categorical_features=[0,1])
pipe = Pipeline([('union', union), ('one_hot', ohe)])
X = pipe.fit_transform(df)
X.toarray()
array([[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.44 , 14.5 ],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -22.69 ,
15.21666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.31666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.73333333],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.51 , 18.15 ]])
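I do want to point out that sklearn v0.20 was expected to ship a CategoricalEncoder to make this kind of thing even easier. In the released version that functionality ended up in OneHotEncoder (which now accepts string columns directly) and OrdinalEncoder, alongside the new ColumnTransformer.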
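On scikit-learn 0.20 or later, the same encoding can be sketched with ColumnTransformer and OneHotEncoder's categories parameter, without a LabelEncoder step. This is a minimal sketch, not a drop-in replacement for the code above: the DataFrame is rebuilt from the sample in the question, and the level lists (four precip values, dow running from 0 to 6) are assumptions carried over from this answer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Sample rows from the question
df = pd.DataFrame({
    "Temp": [-20.44, -22.69, -21.52, -21.52, -20.51],
    "precip": ["snow"] * 5,
    "dow": [4] * 5,
    "tod": [14.5, 15.216666666666667, 17.316666666666666,
            17.733333333333334, 18.15],
})

# Assumed full sets of levels, fixed up front so new data encodes consistently
precip_levels = ["mix", "rain", "sleet", "snow"]
dow_levels = list(range(7))

encoder = ColumnTransformer(
    transformers=[
        ("onehot",
         OneHotEncoder(categories=[precip_levels, dow_levels],
                       handle_unknown="ignore"),
         ["precip", "dow"]),
    ],
    remainder="passthrough",  # keep Temp and tod unchanged, after the encoded columns
    sparse_threshold=0,       # always return a dense array
)

X = encoder.fit_transform(df)
print(X.shape)
```

Because the levels are passed explicitly, the output always has 4 + 7 indicator columns followed by Temp and tod, regardless of which values happen to appear in a given batch.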
I don't want to use pd.get_dummies because that won't put the data in
the proper format unless of each dow and precip are in the new data.
Assuming you want to encode but also maintain those two columns--are you sure this wouldn't work for you?
import numpy as np
import pandas as pd
df = pd.DataFrame({
'temp': np.random.random(5) + 20.,
'precip': pd.Categorical(['snow', 'snow', 'rain', 'none', 'rain']),
'dow': pd.Categorical([4, 4, 4, 3, 1]),
'tod': np.random.random(5) + 10.
})
pd.concat((df[['dow', 'precip']],
pd.get_dummies(df, columns=['dow', 'precip'], drop_first=True)),
axis=1)
dow precip temp tod dow_3 dow_4 precip_rain precip_snow
0 4 snow 20.7019 10.4610 0 1 0 1
1 4 snow 20.0917 10.0174 0 1 0 1
2 4 rain 20.3978 10.5766 0 1 1 0
3 3 none 20.9804 10.0770 1 0 0 0
4 1 rain 20.3121 10.3584 0 0 1 0
In the case where you'll be interacting with new data that includes categories that df hasn't "seen," you can use
df['col'] = df['col'].cat.add_categories(...)
where you pass a list holding the set difference, i.e. the categories that appear in the new data but not in df. This adds them to the list of "recognized" categories for the resulting pd.Categorical object.
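A minimal sketch of that idea, with made-up frames named train and new (both hypothetical): the unseen categories are added to the training column, and the new data is then cast to the same category list so that pd.get_dummies produces identical columns for both.

```python
import pandas as pd

# Hypothetical training data and new data containing an unseen category
train = pd.DataFrame({"precip": pd.Categorical(["snow", "snow", "rain"])})
new = pd.DataFrame({"precip": ["snow", "sleet"]})

# Set difference: categories present in the new data but unknown to train
unseen = sorted(set(new["precip"]) - set(train["precip"].cat.categories))
train["precip"] = train["precip"].cat.add_categories(unseen)

# Cast the new data to the same category list so both frames agree
new["precip"] = pd.Categorical(new["precip"],
                               categories=train["precip"].cat.categories)

# get_dummies emits one column per category, including ones with no rows
train_dummies = pd.get_dummies(train, columns=["precip"])
new_dummies = pd.get_dummies(new, columns=["precip"])
print(list(train_dummies.columns))
print(list(new_dummies.columns))
```

Both frames end up with the same dummy columns in the same order, which addresses the concern about pd.get_dummies producing a different layout on new data.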