I have a .dat file of coordinates (x, y, and z), separated by markers (integers). Here's a snippet of it:
500
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
0.06223 0.06222 0
0.04705 0.05386 0
0.03388 0.04528 0
0.02281 0.03663 0
0.01391 0.02808 0
42
0.00733 0.01969 0
0.00297 0.01152 0
0.01809 -0.01422 0
0.03068 -0.01687 0
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
42
0.14166 0.09077 0
0.11918 0.08461 0
0.09838 0.07771 0
0.07937 0.07022 0
What's the best way to separate it into chunks (preferably one array per interval between markers)?
It's just a fraction of the data; in reality there are a few thousand points.
I would suggest applying the power of the pandas and numpy libraries.
We start by loading the input file into a dataframe, skipping the first row (skiprows=1) and explicitly specifying the number of columns via column names (names=['x','y','z']); this means that the marker lines will be treated as one-column rows with NaN values (like 42.00000 NaN NaN):
import pandas as pd
import numpy as np
coords = pd.read_table('test.dat', delim_whitespace=True, header=None,
                       engine='python', skiprows=1, names=['x','y','z'])
Then we find the positions of the marker lines, on which the coords dataframe will be split into chunks:
na_markers = coords.loc[coords['y'].isna()].index
Finally splitting and getting the needed numpy arrays:
coords = [chunk.dropna().to_numpy() for chunk in np.split(coords, na_markers)]
That's it: coords now contains a list of the needed coordinate "chunks":
[array([[0.14166, 0.09077, 0. ],
[0.11918, 0.08461, 0. ],
[0.09838, 0.07771, 0. ],
[0.07937, 0.07022, 0. ],
[0.06223, 0.06222, 0. ],
[0.04705, 0.05386, 0. ],
[0.03388, 0.04528, 0. ],
[0.02281, 0.03663, 0. ],
[0.01391, 0.02808, 0. ]]), array([[ 0.00733, 0.01969, 0. ],
[ 0.00297, 0.01152, 0. ],
[ 0.01809, -0.01422, 0. ],
[ 0.03068, -0.01687, 0. ],
[ 0.14166, 0.09077, 0. ],
[ 0.11918, 0.08461, 0. ],
[ 0.09838, 0.07771, 0. ],
[ 0.07937, 0.07022, 0. ]]), array([[0.14166, 0.09077, 0. ],
[0.11918, 0.08461, 0. ],
[0.09838, 0.07771, 0. ],
[0.07937, 0.07022, 0. ]])]
I'm working on a project where I need to calibrate two cameras. As you know, one needs to define planar grid points in the 3D world and find their correspondences on the image plane. Therefore, the first camera has the following 3D grid points:
import cv2 as cv
import numpy as np
WPoints_cam1 = np.zeros((9*3,3), np.float64)
WPoints_cam1[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4
print(WPoints_cam1)
[[0. 0. 0. ]# world coordinate center
[0.4 0. 0. ]
[0.8 0. 0. ]
[1.2 0. 0. ]
[1.6 0. 0. ]
[2. 0. 0. ]
[2.4 0. 0. ]
[2.8 0. 0. ]
[3.2 0. 0. ]
[0. 0.4 0. ]
[0.4 0.4 0. ]
[0.8 0.4 0. ]
[1.2 0.4 0. ]
[1.6 0.4 0. ]
[2. 0.4 0. ]
[2.4 0.4 0. ]
[2.8 0.4 0. ]
[3.2 0.4 0. ]
[0. 0.8 0. ]
[0.4 0.8 0. ]
[0.8 0.8 0. ]
[1.2 0.8 0. ]
[1.6 0.8 0. ]
[2. 0.8 0. ]
[2.4 0.8 0. ]
[2.8 0.8 0. ]
[3.2 0.8 0. ]]
As seen above, the first grid (for the first camera) starts from the defined reference 3D point (0, 0, 0) and ends at the point (3.2, 0.8, 0), with a constant offset of 0.4 and a 9x3 dimension.
Note that all Z coordinates were set to Z=0 (Zhengyou Zhang calibration).
Now my question is: as I need to define a second grid (for the second camera) that also refers to the defined 3D coordinate center (0, 0, 0), I need to define a grid that starts from (3.6, 0, 0) and ends with (6.8, 0.8, 0), with the same offset 0.4 and a 9x3 dimension.
I believe this is easy to do. However, I can't think outside the box due to my beginner level of experience.
Would appreciate some help, and thanks in advance.
You can scale each column like this:
np.mgrid[0:8, 0:3].T.reshape(-1,2) * np.array([(7.8 - 3.6) / 7, 0.4]) + np.array([3.6, 0])
or combine it into a scaling matrix like this (and then add a vector for the translation):
np.mgrid[0:8, 0:3].T.reshape(-1,2) @ np.array([[(7.8 - 3.6) / 7, 0], [0, 0.4]]).T + np.array([3.6, 0])
Regarding where (7.8 - 3.6) / 7 comes from: the numerator should be self-evident. The denominator is the same but for your original dimensions; with 0:8 the max is 7 and the min is 0, so the denominator becomes 7 - 0.
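For the exact numbers in the question (start at (3.6, 0, 0), end at (6.8, 0.8, 0), spacing 0.4, 9x3 points), a minimal sketch is to build the same grid as for the first camera and shift it along x (WPoints_cam2 is just an assumed name):
import numpy as np

# Same 9x3 grid as for the first camera, shifted by 3.6 along x so it still
# refers to the world origin (0, 0, 0): x runs 3.6..6.8, y runs 0..0.8, z = 0.
WPoints_cam2 = np.zeros((9*3, 3), np.float64)
WPoints_cam2[:, :2] = np.mgrid[0:9, 0:3].T.reshape(-1, 2) * 0.4
WPoints_cam2[:, 0] += 3.6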
So I have a file that looks something like this:
# 3 # Number of network ROIs
# 2 # Number of netcc matrices
# WITH_ROI_LABELS
001 002 003
1 2 3
# CC
1.0000 0.9800 0.9895
0.9800 1.0000 0.9817
0.9895 0.9817 1.0000
# FZ
4.0000 2.2965 2.6240
2.2965 4.0000 2.3426
2.6240 2.3426 4.0000
I want to extract the 3x3 matrix labelled "CC"
I want to extract the 3x3 matrix labelled "FZ"
So I did the following:
import numpy

file = '/users/3dfile1'
A = numpy.genfromtxt(file)
m = A[:, :]
m
So the output I get looks like this:
array([[ 1. , 2. , 3. ],
[ 1. , 2. , 3. ],
[ 1. , 0.98 , 0.9895],
[ 0.98 , 1. , 0.9817],
[ 0.9895, 0.9817, 1. ],
[ 4. , 2.2965, 2.624 ],
[ 2.2965, 4. , 2.3426],
[ 2.624 , 2.3426, 4. ]])
However, my question is: what if I have multiple files, where the matrix size is NOT consistent? This means that in some files the matrix will be 3x3, in some files 8x8, 1x1, etc. In this case, how can I code something that will:
differentiate the matrix CC from FZ
extract the matrix (i.e. detect the size of the matrix somehow and give me the exact matrix I'm looking for)?
Try
import numpy as np
x = np.array([[ 1. , 2. , 3. ],
[ 1. , 2. , 3. ],
[ 1. , 0.98 , 0.9895],
[ 0.98 , 1. , 0.9817],
[ 0.9895, 0.9817, 1. ],
[ 4. , 2.2965, 2.624 ],
[ 2.2965, 4. , 2.3426],
[ 2.624 , 2.3426, 4. ]])
x1 = x[2:,:]             # drop the first two rows (the ROI label rows)
x2 = x1.reshape(2,3,3)   # split the remaining rows into two 3x3 matrices
CC, FZ = x2
Result:
In [23]: CC
Out[23]:
array([[ 1. , 0.98 , 0.9895],
[ 0.98 , 1. , 0.9817],
[ 0.9895, 0.9817, 1. ]])
In [24]: FZ
Out[24]:
array([[ 4. , 2.2965, 2.624 ],
[ 2.2965, 4. , 2.3426],
[ 2.624 , 2.3426, 4. ]])
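Since the matrix size varies between files, a rough sketch of a more general approach is to read the ROI count from the first header comment and use it for the reshape; this assumes the layout shown in the question (header comments, two label rows, then the CC and FZ blocks):
import numpy as np

fname = '/users/3dfile1'           # hypothetical path, as in the question

# The ROI count is on the first line, e.g. "# 3 # Number of network ROIs"
with open(fname) as fh:
    n = int(fh.readline().split('#')[1])

A = np.genfromtxt(fname)           # '#' comment lines are skipped by default
data = A[2:]                       # drop the two ROI-label rows
CC, FZ = data.reshape(2, n, n)     # works for 3x3, 8x8, 1x1, ...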
I have a data set which has some categorical columns. Here is a small sample:
Temp precip dow tod
-20.44 snow 4 14.5
-22.69 snow 4 15.216666666666667
-21.52 snow 4 17.316666666666666
-21.52 snow 4 17.733333333333334
-20.51 snow 4 18.15
Here, the dow and precip are categorical, where as the others are continuous.
Is there a way I can create a OneHotEncoder for just those columns? I don't want to use pd.get_dummies because that won't put the data in the proper format unless every value of dow and precip is present in the new data.
Two things you could check out: sklearn-pandas and, as mentioned by @Grr, pipelines (there is a good intro to them).
I prefer pipelines, as they are a tidy way to organize preprocessing, allow easy use with things like grid search, avoid leakage between folds in cross-validation, etc. So I usually end up with a pipe like this (given you have precip label-encoded first):
from sklearn.pipeline import Pipeline, FeatureUnion, make_pipeline, make_union
from sklearn.preprocessing import OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
class Columns(BaseEstimator, TransformerMixin):
    def __init__(self, names=None):
        self.names = names

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X):
        return X[self.names]


class Normalize(BaseEstimator, TransformerMixin):
    def __init__(self, func=None, func_param={}):
        self.func = func
        self.func_param = func_param

    def transform(self, X):
        if self.func is not None:
            return self.func(X, **self.func_param)
        else:
            return X

    def fit(self, X, y=None, **fit_params):
        return self


cat_cols = ['precip', 'dow']
num_cols = ['Temp', 'tod']

pipe = Pipeline([
    ("features", FeatureUnion([
        ('numeric', make_pipeline(Columns(names=num_cols), Normalize())),
        ('categorical', make_pipeline(Columns(names=cat_cols), OneHotEncoder(sparse=False)))
    ])),
    ('model', LinearRegression())
])
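A minimal usage sketch, assuming df holds the columns above with precip already label-encoded to integers, and target is a made-up name for whatever numeric column you want to predict:
y = df['target']               # hypothetical target column
pipe.fit(df, y)
predictions = pipe.predict(df)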
The short answer is yes, but with some caveats.
First off, you won't be able to use OneHotEncoder directly on the precip feature. You will need to encode those labels into integers with LabelEncoder.
Secondly, if you just want to encode those features, you can pass the proper values to the n_values and categorical_features parameters.
Example:
I will assume dow is day of the week, which will have seven values, and precip will have (rain, sleet, snow, and mix) as values.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
df2 = df.copy()
le = LabelEncoder()
le.fit(['rain', 'sleet', 'snow', 'mix'])
df2.precip = le.transform(df2.precip)
df2
Temp precip dow tod
0 -20.44 3 4 14.500000
1 -22.69 3 4 15.216667
2 -21.52 3 4 17.316667
3 -21.52 3 4 17.733333
4 -20.51 3 4 18.150000
# Initialize OneHotEncoder with 4 values for precip and 7 for dow.
ohe = OneHotEncoder(n_values=np.array([4,7]), categorical_features=[1,2])
X = ohe.fit_transform(df2)
X.toarray()
array([[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.44 , 14.5 ],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -22.69 ,
15.21666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.31666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.73333333],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.51 , 18.15 ]])
OK, that works, but you have to either mutate your data in place or create a copy, and things can get a little messy. A more organized way to do this would be to use a Pipeline.
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import FeatureUnion, Pipeline
def get_precip(X):
le = LabelEncoder()
le.fit(['rain', 'sleet', 'snow', 'mix'])
return le.transform(X.precip).reshape(-1,1)
def get_dow(X):
return X.dow.values.reshape(-1,1)
def get_rest(X):
return X.drop(['precip', 'dow'], axis=1)
precip_trans = FunctionTransformer(get_precip, validate=False)
dow_trans = FunctionTransformer(get_dow, validate=False)
rest_trans = FunctionTransformer(get_rest, validate=False)
union = FeatureUnion([('precip', precip_trans), ('dow', dow_trans), ('rest', rest_trans)])
ohe = OneHotEncoder(n_values=[4,7], categorical_features=[0,1])
pipe = Pipeline([('union', union), ('one_hot', ohe)])
X = pipe.fit_transform(df)
X.toarray()
array([[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.44 , 14.5 ],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -22.69 ,
15.21666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.31666667],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -21.52 ,
17.73333333],
[ 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. ,
1. , 0. , 0. , -20.51 , 18.15 ]])
I do want to point out that in the upcoming release of sklearn v0.20 there will be a CategoricalEncoder which should make this kind of thing even easier.
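As a rough sketch of that direction: on scikit-learn 0.20 and later, ColumnTransformer together with a string-aware OneHotEncoder covers this use case without the LabelEncoder step (column names taken from the question's sample):
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode the categorical columns, pass the numeric ones through unchanged.
ct = ColumnTransformer(
    [('cat', OneHotEncoder(handle_unknown='ignore'), ['precip', 'dow'])],
    remainder='passthrough')
X = ct.fit_transform(df)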
I don't want to use pd.get_dummies because that won't put the data in
the proper format unless every value of dow and precip is present in the new data.
Assuming you want to encode but also maintain those two columns--are you sure this wouldn't work for you?
import numpy as np
import pandas as pd

df = pd.DataFrame({
'temp': np.random.random(5) + 20.,
'precip': pd.Categorical(['snow', 'snow', 'rain', 'none', 'rain']),
'dow': pd.Categorical([4, 4, 4, 3, 1]),
'tod': np.random.random(5) + 10.
})
pd.concat((df[['dow', 'precip']],
pd.get_dummies(df, columns=['dow', 'precip'], drop_first=True)),
axis=1)
dow precip temp tod dow_3 dow_4 precip_rain precip_snow
0 4 snow 20.7019 10.4610 0 1 0 1
1 4 snow 20.0917 10.0174 0 1 0 1
2 4 rain 20.3978 10.5766 0 1 1 0
3 3 none 20.9804 10.0770 1 0 0 0
4 1 rain 20.3121 10.3584 0 0 1 0
In the case where you'll be interacting with new data that includes categories that df hasn't "seen," you can use
df['col'] = df['col'].cat.add_categories(...)
Where you pass a list of the set difference. This adds to the list of "recognized" categories for the resulting pd.Categorical object.
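For illustration, a small sketch; 'hail' is a made-up category standing in for whatever the set difference turns out to be:
# Suppose new data contains a precip value the training frame has never seen.
df['precip'] = df['precip'].cat.add_categories(['hail'])
# get_dummies on the updated Categorical now emits a precip_hail column as well.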
I have a function that returns a numpy array every second, which I want to store in another array for reference. For example (array_a is returned):
array_a = [[ 25. 50. 25. 25. 50. ]
[ 1. 1. 1. 1. 1. ]]
array_collect = np.append(array_a,array_collect)
But when I print array_collect, I get one flattened array, not a bigger array with arrays inside it.
array_collect = [ 25. 50. 25. 25. 50.
1. 1. 1. 1. 1.
25. 50. 25. 25. 50.
1. 1. 1. 1. 1.
25. 50. 25. 25. 50. ]
What I want is:
array_collect = [ [[ 25. 50. 25. 25. 50. ]
[1. 1. 1. 1. 1. ]]
[[ 25. 50. 25. 25. 50. ]
[1. 1. 1. 1. 1. ]]
[[ 25. 50. 25. 25. 50. ]
[1. 1. 1. 1. 1. ]] ]
How do I get it?
You could use vstack:
array_collect = np.array([[25.,50.,25.,25.,50.],[1.,1.,1.,1.,1.]])
array_a = np.array([[2.,5.,2.,2.,5.],[1.,1.,1.,1.,1.]])
array_collect=np.vstack((array_collect,array_a))
However, if you know the total number of minutes in advance, it would be better to define your array first (e.g. using zeros) and gradually fill it - this way, it is easier to stay within memory limits.
no_minutes = 5 #say 5 minutes
array_collect = np.zeros((no_minutes,array_a.shape[0],array_a.shape[1]))
Then, for every minute m:
array_collect[m] = array_a
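For illustration, the collection loop might then look like this, where get_array() is a hypothetical stand-in for the function that produces a new array each second:
for m in range(no_minutes):
    array_a = get_array()      # hypothetical source of the 2x5 array
    array_collect[m] = array_a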
Just use np.concatenate() and reshape this way:
import numpy as np
array_collect = np.array([[25.,50.,25.,25.,50.],[1.,1.,1.,1.,1.]])
array_a = np.array([[2.,5.,2.,2.,5.],[1.,1.,1.,1.,1.]])
array_collect = np.concatenate((array_collect,array_a),axis=0).reshape(2,2,5)
>>
[[[ 25. 50. 25. 25. 50.]
[ 1. 1. 1. 1. 1.]]
[[ 2. 5. 2. 2. 5.]
[ 1. 1. 1. 1. 1.]]]
I found it; this can be done by using:
np.reshape()
The new array formed can be reshaped using
y = np.reshape(y, (a, b, c))
where a is the number of arrays stored and (b, c) is the shape of the original array.
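A minimal runnable sketch of that approach, using the example array from the question (three appends stand in for three updates):
import numpy as np

array_a = np.array([[25., 50., 25., 25., 50.],
                    [ 1.,  1.,  1.,  1.,  1.]])

array_collect = np.array([])
for _ in range(3):                                       # e.g. three updates
    array_collect = np.append(array_collect, array_a)    # flattens, as observed

a = 3                                                    # number of arrays stored
array_collect = np.reshape(array_collect, (a, *array_a.shape))
print(array_collect.shape)                               # (3, 2, 5)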