Converting a time format to seconds in a pandas DataFrame - python

I have a DataFrame with time data, and I would like to convert these values to seconds (see the example below).
Compression_level Size (M) Real time (s) User time (s) Sys time (s)
0 0 265 0:19.938 0:24.649 0:3.062
1 1 76 0:17.910 0:25.929 0:3.098
2 2 74 1:02.619 0:27.724 0:3.014
3 3 73 0:20.607 0:27.937 0:3.193
4 4 67 0:19.598 0:28.853 0:2.925
5 5 67 0:21.032 0:30.119 0:3.206
6 6 66 0:27.013 0:31.462 0:3.106
7 7 65 0:27.337 0:36.226 0:3.060
8 8 64 0:37.651 0:47.246 0:2.933
9 9 64 0:59.241 1:8.333 0:3.027
This is the output I would like to obtain.
df["Real time (s)"]
0 19.938
1 17.910
2 62.619
...
I have some useful code, but I do not know how to apply it to every row of a DataFrame:
import time
import datetime
x = time.strptime("00:01:00","%H:%M:%S")
datetime.timedelta(hours=x.tm_hour,minutes=x.tm_min, seconds=x.tm_sec).total_seconds()

Prepend 00: (0 hours) to each value with Series.radd, pass the result to pd.to_timedelta, and then call Series.dt.total_seconds:
df["Real time (s)"] = pd.to_timedelta(df["Real time (s)"].radd('00:')).dt.total_seconds()
print (df)
Compression_level Size (M) Real time (s) User time (s) Sys time (s)
0 0 265 19.938 0:24.649 0:3.062
1 1 76 17.910 0:25.929 0:3.098
2 2 74 62.619 0:27.724 0:3.014
3 3 73 20.607 0:27.937 0:3.193
4 4 67 19.598 0:28.853 0:2.925
5 5 67 21.032 0:30.119 0:3.206
6 6 66 27.013 0:31.462 0:3.106
7 7 65 27.337 0:36.226 0:3.060
8 8 64 37.651 0:47.246 0:2.933
9 9 64 59.241 1:8.333 0:3.027
Solution for processing multiple columns:
def to_td(x):
    return pd.to_timedelta(x.radd('00:')).dt.total_seconds()
cols = ["Real time (s)", "User time (s)", "Sys time (s)"]
df[cols] = df[cols].apply(to_td)
print (df)
Compression_level Size (M) Real time (s) User time (s) Sys time (s)
0 0 265 19.938 24.649 3.062
1 1 76 17.910 25.929 3.098
2 2 74 62.619 27.724 3.014
3 3 73 20.607 27.937 3.193
4 4 67 19.598 28.853 2.925
5 5 67 21.032 30.119 3.206
6 6 66 27.013 31.462 3.106
7 7 65 27.337 36.226 3.060
8 8 64 37.651 47.246 2.933
9 9 64 59.241 68.333 3.027
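As a cross-check on the conversion above, here is a minimal pure-Python sketch of the same arithmetic. The helper name to_seconds and the sample list are illustrative assumptions, not part of the original answer; it assumes every value is colon-separated text such as 'M:SS.sss' or 'H:MM:SS.sss'.

```python
def to_seconds(value):
    """Convert 'M:SS.sss' or 'H:MM:SS.sss' text to seconds as a float."""
    seconds = 0.0
    for part in value.split(':'):
        # Each field to the left is worth 60 times the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds

# A few values taken from the question's table.
samples = ["0:19.938", "1:02.619", "1:8.333"]
converted = [to_seconds(s) for s in samples]
print(converted)
```

If preferred, such a helper could be applied per column with something like df[col].map(to_seconds), although the vectorized pd.to_timedelta approach above is likely to be faster on large frames.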

Related

I need help comparing data within a table in python

I have the following table
a1b1  a1Eb1  a1b2  a1Eb2  a2b1  a2Eb1  a2b2  a2Eb2  a3b1  a3Eb1  a3b2  a3Eb2
   2     20     8     54     3     56     3     67     2     78     7     75
   8     30     6     67     6     35     4     56     3     85     6     74
   5     54     4     64     7     23     6     48     4     67     4     82
   6     65     7     53     8     27     7     35     5     25     3     64
   4     34     2     52     4     28     8     27     6     94     2     29
I want to compare the following data:
a1b1 vs a1b2;
then generate arrays containing
a1b1  a1b2  minor a1b1
   2     8          20
a1b2  a1b1  minor a1b2
   6     8          30
and so on for each row of the table, and for each of the following comparisons:
a2b1 vs a2b2;
a3b1 vs a3b2;
I have tried to do it with pandas in Python:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a1b1':[2,8,5,6,4],
                   'a1Eb1':[20,30,54,65,34],
                   'a1b2':[8,6,4,7,2],
                   'a1Eb2':[54,67,64,53,52],
                   'a2b1':[3,6,7,8,4],
                   'a2Eb1':[56,35,23,27,28],
                   'a2b2':[3,4,6,7,8],
                   'a2Eb2':[67,56,48,35,27],
                   'a3b1':[2,3,4,5,6],
                   'a3Eb1':[78,85,67,25,94],
                   'a3b2':[7,6,4,3,2],
                   'a3Eb2':[75,74,82,64,29],
                   })
but I don't know how to go on.
Output expected
For the first row, a1b1 < a1b2, so print the following:
df1 = pd.DataFrame({'a1b1':[2],
                    'a1b2':[8],
                    'a1Eb1':[20]})
The result can be a DataFrame, a list, or any other data structure.
If you want to display only specific columns of your DataFrame, put [[ and ]] after the name of the DataFrame (df), and in between add the names of the columns you want to see. It can be 2, 3 or even all of the columns of the DataFrame, as long as you quote each name and separate the names with commas.
df[['a1b1','a1b2']] # to display two columns
df[['a2b1','a2b2']]
df[['a3b1','a3b2']]
To display 3 columns, it could for example be:
df[['a3b1','a3b2','a3Eb1']]
and so on.
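Column selection alone does not do the row-by-row comparison the question asks for. A minimal sketch of one way to do it follows. The helper compare_pair and the output column names minor and minor_E are hypothetical, and it assumes the E value wanted is the one paired with the smaller column (the question's second example is ambiguous on this point).

```python
import numpy as np
import pandas as pd

# Only the first pair of columns from the question, to keep the sketch short.
df = pd.DataFrame({
    'a1b1': [2, 8, 5, 6, 4], 'a1Eb1': [20, 30, 54, 65, 34],
    'a1b2': [8, 6, 4, 7, 2], 'a1Eb2': [54, 67, 64, 53, 52],
})

def compare_pair(frame, left, right, left_e, right_e):
    """For each row, report which of left/right is smaller and its E value."""
    left_is_minor = frame[left] < frame[right]  # ties count as "right"
    out = frame[[left, right]].copy()
    out['minor'] = np.where(left_is_minor, left, right)
    # Assumption: take the E column that belongs to the smaller value.
    out['minor_E'] = frame[left_e].where(left_is_minor, frame[right_e])
    return out

result = compare_pair(df, 'a1b1', 'a1b2', 'a1Eb1', 'a1Eb2')
print(result)
```

The same call could be repeated for the a2 and a3 pairs; swapping the two E arguments gives the other reading of the question.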

How to read the selected lines from a dataset with non-even intervals?

I have a text file whose second column is in increasing order, but at some points values repeat, e.g. 0, 12, 12, 36, ... I'm referring to the blocks of rows that begin with 0 0, then 1 0, and so on. I just want to skip the repeated values while reading the data, so that the second column only increases.
Can someone tell me a way to do that in Python?
0 0 1 1 1 0 0 0
0 3 0.999551 0.998204 0.995963 2.02497e-06 8.08878e-06 1.81582e-05
0 6 0.999226 0.996908 0.993056 3.50103e-06 1.39702e-05 3.13067e-05
0 9 0.998916 0.995669 0.990283 4.90435e-06 1.95504e-05 4.3739e-05
0 12 0.998613 0.994464 0.987587 6.27845e-06 2.50041e-05 5.58512e-05
0 15 0.998309 0.993255 0.984888 7.63421e-06 3.03781e-05 6.77611e-05
0 18 0.998008 0.992055 0.982214 8.97082e-06 3.56643e-05 7.9433e-05
0 21 0.99771 0.990872 0.979581 1.03001e-05 4.09117e-05 9.09826e-05
0 24 0.997413 0.989692 0.976958 1.16094e-05 4.60742e-05 0.000102324
0 27 0.997111 0.988494 0.974298 1.29506e-05 5.13517e-05 0.000113877
0 30 0.996811 0.987306 0.971666 1.42973e-05 5.66363e-05 0.000125395
0 33 0.996514 0.986129 0.969062 1.56102e-05 6.17854e-05 0.000136606
0 36 0.99622 0.984966 0.96649 1.6868e-05 6.67128e-05 0.000147314
1 0 1 1 1 0 0 0
1 12 0.998615 0.994472 0.987606 1.24824e-05 4.97091e-05 0.000111026
1 24 0.997408 0.989674 0.976917 2.32538e-05 9.22819e-05 0.000204924
1 36 0.996216 0.98495 0.966456 3.37665e-05 0.000133547 0.000294894
1 48 0.995023 0.98024 0.956083 4.41221e-05 0.000173927 0.000381972
1 60 0.993849 0.975622 0.945978 5.45843e-05 0.000214354 0.000467853
1 72 0.992678 0.971031 0.93599 6.49638e-05 0.000254364 0.000552466
1 84 0.991501 0.966432 0.926044 7.5403e-05 0.000294247 0.000635589
1 96 0.990323 0.961846 0.916176 8.55362e-05 0.000332815 0.000715435
1 108 0.989133 0.95723 0.90631 9.602e-05 0.000372371 0.000796123
1 120 0.987925 0.952552 0.89635 0.000106095 0.000410211 0.000872709
1 132 0.986728 0.947946 0.886629 0.000116829 0.000449985 0.000951404
1 144 0.985536 0.943378 0.87706 0.000127786 0.000490311 0.00103029
1 156 0.984333 0.938787 0.867512 0.000138898 0.000531114 0.00110972
1 168 0.983124 0.93419 0.858003 0.000149945 0.000571148 0.00118605
2 0 1 1 1 0 0 0
2 60 0.993889 0.975779 0.946334 0.000122674 0.000481801 0.0010518
2 120 0.98802 0.95292 0.897129 0.000235474 0.000910013 0.0019347
2 180 0.981998 0.929939 0.849324 0.000360693 0.00136728 0.00281767
2 240 0.976087 0.907868 0.805034 0.00048759 0.00180865 0.0036021
2 300 0.970186 0.886203 0.762767 0.000606964 0.00221121 0.0042844
2 360 0.964519 0.865822 0.724262 0.000723555 0.00257783 0.0048463
2 420 0.959195 0.846993 0.689658 0.000830297 0.00290486 0.00533017
2 480 0.953931 0.828808 0.657473 0.000940967 0.00322907 0.00579317
2 540 0.948992 0.812283 0.629672 0.00105503 0.0035387 0.00617566
2 600 0.94387 0.795353 0.601452 0.00116622 0.00381699 0.00650445
2 660 0.938843 0.778862 0.57426 0.00126677 0.00406694 0.00680719
2 720 0.933909 0.762839 0.548423 0.0013606 0.0043114 0.00712883
2 780 0.929153 0.7477 0.525167 0.00145272 0.00455818 0.0074014
2 840 0.924413 0.732931 0.503387 0.00154657 0.00480149 0.00765192
2 900 0.919724 0.718536 0.482191 0.00163803 0.0050077 0.00783869
You can load the file with np.loadtxt and delete the second column with np.delete using the axis 1:
arr = np.loadtxt('test.txt')
arr = np.delete(arr, 1, axis=1)
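If the goal is instead to keep whole rows while making the second column strictly increasing, one possible reading of the question can be sketched with a running maximum. The inline text below is a shortened, hypothetical stand-in for the file, and the "keep a row only if its second column exceeds everything seen so far" rule is an assumption about what is wanted.

```python
import io
import numpy as np

# Hypothetical stand-in for the file; only the first two columns matter here.
text = """0 0 1
0 12 0.9986
0 36 0.9962
1 0 1
1 12 0.9986
1 36 0.9962
1 48 0.9950
2 0 1
2 60 0.9939
"""
arr = np.loadtxt(io.StringIO(text))
col = arr[:, 1]

# Largest second-column value seen in all earlier rows.
seen_max = np.maximum.accumulate(col)[:-1]
# Always keep the first row; keep later rows only if they exceed seen_max.
keep = np.concatenate(([True], col[1:] > seen_max))
filtered = arr[keep]
print(filtered[:, 1])
```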

How can I Extract only numbers from this columns?

Suppose you have a column in Excel with values like the ones below. Only 5500 numbers are present, but the reported length is 5602, which means that 102 strings are present.
4 SELECTIO
6 N NO
14 37001
26 37002
38 37003
47 37004
60 37005
73 37006
82 37007
92 37008
105 37009
119 37010
132 37011
143 37012
157 37013
168 37014
184 37015
196 37016
207 37017
220 37018
236 37019
253 37020
267 37021
280 37022
287 Krishan
290 37023
300 37024
316 37025
337 37026
365 37027
...
74141 42471
74154 42472
74169 42473
74184 42474
74200 42475
74216 42476
74233 42477
74242 42478
74256 42479
74271 42480
74290 42481
74309 42482
74323 42483
74336 42484
74350 42485
74365 42486
74378 42487
74389 42488
74398 42489
74413 42490
74430 42491
74446 42492
74459 42493
74474 42494
74491 42495
74504 42496
74516 42497
74530 42498
74544 42499
74558 42500
Name: Selection No., Length: 5602, dtype: object
I want to keep only the numeric values, like this, in Python using pandas:
37001
37002
37003
37004
37005
How can I do this? Here is my code in Python using pandas:
def selection(sle):
    if sle in re.match('[3-4][0-9]{4}',sle):
        return 1
    else:
        return 0
select['status'] = select['Selection No.'].apply(selection)
Now I am getting an "argument of type 'NoneType' is not iterable" error.
Try using NumPy's np.isreal to select only the numbers:
import pandas as pd
import numpy as np
df = pd.DataFrame({'SELECTIO':['N NO',37002,37003,'Krishan',37004,'singh',37005], 'some_col':[4,6,14,26,38,47,60]})
df
SELECTIO some_col
0 N NO 4
1 37002 6
2 37003 14
3 Krishan 26
4 37004 38
5 singh 47
6 37005 60
>>> df[df[['SELECTIO']].applymap(np.isreal).all(1)]
SELECTIO some_col
1 37002 6
2 37003 14
4 37004 38
6 37005 60
The same filter, specific to column SELECTIO:
df[df[['SELECTIO']].applymap(np.isreal).all(1)]
SELECTIO some_col
1 37002 6
2 37003 14
4 37004 38
6 37005 60
Or, as another approach, import numbers and use a lambda:
import numbers
df[df[['SELECTIO']].applymap(lambda x: isinstance(x, numbers.Number)).all(1)]
SELECTIO some_col
1 37002 6
2 37003 14
4 37004 38
6 37005 60
Note: there is a problem with how you extract the column. You are using ['Selection No.'], but the name actually contains a trailing space, like ['Selection No. '], and that is why you get a KeyError when executing it. Try it and see!
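The type-based filters above only work when the numbers are stored as numbers. If some of them arrive as text (a common outcome when reading Excel), a possible alternative is pd.to_numeric with errors='coerce'. This is a sketch on made-up sample data, not the asker's actual column.

```python
import pandas as pd

# Hypothetical sample: numbers stored both as int and as text, plus stray strings.
s = pd.Series(['N NO', 37002, '37003', 'Krishan', 37004], name='Selection No.')

# Anything that cannot be parsed as a number becomes NaN.
numeric = pd.to_numeric(s, errors='coerce')
# Drop the NaN rows and restore an integer dtype.
clean = numeric.dropna().astype(int)
print(clean)
```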
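Your function contains a wrong expression: if sle in re.match('[3-4][0-9]{4}',sle): checks whether the column value sle is contained IN the object returned by re.match. That is a match object (which "always has a boolean value of True") on success, but None when there is no match, hence the "argument of type 'NoneType' is not iterable" error.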
I would suggest to proceed with pd.Series.str.isnumeric function:
In [544]: df
Out[544]:
Selection No.
0 37001
1 37002
2 37003
3 asnsh
4 37004
5 singh
6 37005
In [545]: df['Status'] = df['Selection No.'].str.isnumeric().astype(int)
In [546]: df
Out[546]:
Selection No. Status
0 37001 1
1 37002 1
2 37003 1
3 asnsh 0
4 37004 1
5 singh 0
6 37005 1
If a strict regex pattern is required - use pd.Series.str.contains function:
df['Status'] = df['Selection No.'].str.contains('^[3-4][0-9]{4}$', regex=True).astype(int)
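For comparison, the asker's original function can be repaired with the standard library alone by testing the match object against None. The sketch below keeps the name selection from the question; the sample values and the use of re.fullmatch (to reject longer strings that merely start with five matching digits) are assumptions.

```python
import re

PATTERN = re.compile(r'[3-4][0-9]{4}')

def selection(value):
    """Return 1 if value is a five-digit number starting with 3 or 4, else 0."""
    # str() lets the check handle cells stored as integers as well as text.
    match = PATTERN.fullmatch(str(value).strip())
    return 1 if match is not None else 0

# Hypothetical sample values in the style of the question's column.
flags = [selection(v) for v in ['N NO', '37001', 37002, 'Krishan', '142471']]
print(flags)
```

It could then be applied as in the question, for example with select['Selection No.'].apply(selection).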

Scale values of a particular column of python dataframe between 1-10

I have a DataFrame that contains YouTube video view counts, and I want to scale these values to the range 1-10.
Below is a sample of what the values look like. How do I normalize them to the range 1-10, or is there a more efficient way to do this?
rating
4394029
274358
473691
282858
703750
255967
3298456
136643
796896
2932
220661
48688
4661584
2526119
332176
7189818
322896
188162
157437
1153128
788310
1307902
One possibility is performing a scaling with max.
1 + df / df.max() * 9
rating
0 6.500315
1 1.343433
2 1.592952
3 1.354073
4 1.880933
5 1.320412
6 5.128909
7 1.171046
8 1.997531
9 1.003670
10 1.276217
11 1.060946
12 6.835232
13 4.162121
14 1.415808
15 10.000000
16 1.404192
17 1.235536
18 1.197075
19 2.443451
20 1.986783
21 2.637193
Similar solution by Wen (now deleted):
1 + (df - df.min()) * 9 / (df.max() - df.min())
rating
0 6.498887
1 1.339902
2 1.589522
3 1.350546
4 1.877621
5 1.316871
6 5.126922
7 1.167444
8 1.994266
9 1.000000
10 1.272658
11 1.057299
12 6.833941
13 4.159739
14 1.412306
15 10.000000
16 1.400685
17 1.231960
18 1.193484
19 2.440368
20 1.983514
21 2.634189
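The min-max formula above can be wrapped in a small helper that takes the target range as parameters. This is only a sketch: the function name scale_to_range, its low and high parameters, and the choice to map a constant column to the lower bound are assumptions, and the sample uses four values from the question.

```python
import pandas as pd

def scale_to_range(series, low=1.0, high=10.0):
    """Min-max scale a numeric Series into the interval [low, high]."""
    span = series.max() - series.min()
    if span == 0:
        # All values equal: map everything to the lower bound (an assumption).
        return pd.Series(low, index=series.index, dtype=float)
    return low + (series - series.min()) * (high - low) / span

# Four of the view counts from the question, including its min and max.
views = pd.Series([4394029, 274358, 2932, 7189818], name='rating')
scaled = scale_to_range(views)
print(scaled)
```

Unlike the plain division by the maximum, this maps the smallest value exactly to 1 and the largest exactly to 10.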

R's pdIdent function in RPy

I am working on translating the code for the lmeSplines tutorial to RPy.
I am now stuck at the following line:
fit1s <- lme(y ~ time, data=smSplineEx1,random=list(all=pdIdent(~Zt - 1)))
I have worked with nlme.lme before, and the following works just fine:
from rpy2.robjects.packages import importr
nlme = importr('nlme')
nlme.lme(r.formula('y ~ time'), data=some_data, random=r.formula('~1|ID'))
But this one has a different random argument. I am wondering how I can translate the list(all=pdIdent(~Zt - 1)) part and put it into my RPy code as well.
The structure of the (preprocessed) example data smSplineEx1 looks like this (with Zt.* up to 98):
time y y.true all Zt.1 Zt.2 Zt.3
1 1 5.797149 4.235263 1 1.168560e+00 2.071261e+00 2.944953e+00
2 2 5.469222 4.461302 1 1.487859e-01 1.072013e+00 1.948857e+00
3 3 4.567237 4.678477 1 -5.449190e-02 7.276623e-02 9.527613e-01
4 4 3.645763 4.887137 1 -5.364552e-02 -1.359115e-01 -4.333438e-02
5 5 5.094126 5.087615 1 -5.279913e-02 -1.337708e-01 -2.506194e-01
6 6 4.636121 5.280233 1 -5.195275e-02 -1.316300e-01 -2.466158e-01
7 7 5.501538 5.465298 1 -5.110637e-02 -1.294892e-01 -2.426123e-01
8 8 5.011509 5.643106 1 -5.025998e-02 -1.273485e-01 -2.386087e-01
9 9 6.114037 5.813942 1 -4.941360e-02 -1.252077e-01 -2.346052e-01
10 10 5.696472 5.978080 1 -4.856722e-02 -1.230670e-01 -2.306016e-01
11 11 6.615363 6.135781 1 -4.772083e-02 -1.209262e-01 -2.265980e-01
12 12 8.002526 6.287300 1 -4.687445e-02 -1.187854e-01 -2.225945e-01
13 13 6.887444 6.432877 1 -4.602807e-02 -1.166447e-01 -2.185909e-01
14 14 6.319205 6.572746 1 -4.518168e-02 -1.145039e-01 -2.145874e-01
15 15 6.482771 6.707130 1 -4.433530e-02 -1.123632e-01 -2.105838e-01
16 16 7.938015 6.836245 1 -4.348892e-02 -1.102224e-01 -2.065802e-01
17 17 7.585533 6.960298 1 -4.264253e-02 -1.080816e-01 -2.025767e-01
18 18 7.560287 7.079486 1 -4.179615e-02 -1.059409e-01 -1.985731e-01
19 19 7.571020 7.194001 1 -4.094977e-02 -1.038001e-01 -1.945696e-01
20 20 8.922418 7.304026 1 -4.010338e-02 -1.016594e-01 -1.905660e-01
21 21 8.241394 7.409737 1 -3.925700e-02 -9.951861e-02 -1.865625e-01
22 22 7.447076 7.511303 1 -3.841062e-02 -9.737785e-02 -1.825589e-01
23 23 7.317292 7.608886 1 -3.756423e-02 -9.523709e-02 -1.785553e-01
24 24 7.077333 7.702643 1 -3.671785e-02 -9.309633e-02 -1.745518e-01
25 25 8.268601 7.792723 1 -3.587147e-02 -9.095557e-02 -1.705482e-01
26 26 8.216013 7.879272 1 -3.502508e-02 -8.881481e-02 -1.665447e-01
27 27 8.968495 7.962427 1 -3.417870e-02 -8.667405e-02 -1.625411e-01
28 28 9.085605 8.042321 1 -3.333232e-02 -8.453329e-02 -1.585375e-01
29 29 9.002575 8.119083 1 -3.248593e-02 -8.239253e-02 -1.545340e-01
30 30 8.763187 8.192835 1 -3.163955e-02 -8.025177e-02 -1.505304e-01
31 31 8.936370 8.263695 1 -3.079317e-02 -7.811101e-02 -1.465269e-01
32 32 9.033403 8.331776 1 -2.994678e-02 -7.597025e-02 -1.425233e-01
33 33 8.248328 8.397188 1 -2.910040e-02 -7.382949e-02 -1.385198e-01
34 34 5.961721 8.460035 1 -2.825402e-02 -7.168873e-02 -1.345162e-01
35 35 8.400489 8.520418 1 -2.740763e-02 -6.954797e-02 -1.305126e-01
36 36 6.855125 8.578433 1 -2.656125e-02 -6.740721e-02 -1.265091e-01
37 37 9.798931 8.634174 1 -2.571487e-02 -6.526645e-02 -1.225055e-01
38 38 8.862758 8.687729 1 -2.486848e-02 -6.312569e-02 -1.185020e-01
39 39 7.282970 8.739184 1 -2.402210e-02 -6.098493e-02 -1.144984e-01
40 40 7.484208 8.788621 1 -2.317572e-02 -5.884417e-02 -1.104949e-01
41 41 8.404670 8.836120 1 -2.232933e-02 -5.670341e-02 -1.064913e-01
42 42 8.880734 8.881756 1 -2.148295e-02 -5.456265e-02 -1.024877e-01
43 43 8.826189 8.925603 1 -2.063657e-02 -5.242189e-02 -9.848418e-02
44 44 9.827906 8.967731 1 -1.979018e-02 -5.028113e-02 -9.448062e-02
45 45 8.528795 9.008207 1 -1.894380e-02 -4.814037e-02 -9.047706e-02
46 46 9.484073 9.047095 1 -1.809742e-02 -4.599961e-02 -8.647351e-02
47 47 8.911947 9.084459 1 -1.725103e-02 -4.385885e-02 -8.246995e-02
48 48 10.201343 9.120358 1 -1.640465e-02 -4.171809e-02 -7.846639e-02
49 49 8.908016 9.154849 1 -1.555827e-02 -3.957733e-02 -7.446283e-02
50 50 8.202368 9.187988 1 -1.471188e-02 -3.743657e-02 -7.045927e-02
51 51 7.432851 9.219828 1 -1.386550e-02 -3.529581e-02 -6.645572e-02
52 52 8.063268 9.250419 1 -1.301912e-02 -3.315505e-02 -6.245216e-02
53 53 10.155756 9.279810 1 -1.217273e-02 -3.101429e-02 -5.844860e-02
54 54 7.905281 9.308049 1 -1.132635e-02 -2.887353e-02 -5.444504e-02
55 55 9.688337 9.335181 1 -1.047997e-02 -2.673277e-02 -5.044148e-02
56 56 9.437176 9.361249 1 -9.633582e-03 -2.459201e-02 -4.643793e-02
57 57 9.165873 9.386295 1 -8.787198e-03 -2.245125e-02 -4.243437e-02
58 58 9.120195 9.410358 1 -7.940815e-03 -2.031049e-02 -3.843081e-02
59 59 9.955840 9.433479 1 -7.094432e-03 -1.816973e-02 -3.442725e-02
60 60 9.314230 9.455692 1 -6.248048e-03 -1.602897e-02 -3.042369e-02
61 61 9.706852 9.477035 1 -5.401665e-03 -1.388821e-02 -2.642014e-02
62 62 9.615765 9.497541 1 -4.555282e-03 -1.174746e-02 -2.241658e-02
63 63 7.918843 9.517242 1 -3.708898e-03 -9.606695e-03 -1.841302e-02
64 64 9.352892 9.536172 1 -2.862515e-03 -7.465935e-03 -1.440946e-02
65 65 9.722685 9.554359 1 -2.016132e-03 -5.325176e-03 -1.040590e-02
66 66 9.186888 9.571832 1 -1.169748e-03 -3.184416e-03 -6.402346e-03
67 67 8.652299 9.588621 1 -3.233650e-04 -1.043656e-03 -2.398788e-03
68 68 8.681421 9.604751 1 5.230184e-04 1.097104e-03 1.604770e-03
69 69 10.279181 9.620249 1 1.369402e-03 3.237864e-03 5.608328e-03
70 70 9.314963 9.635140 1 2.215785e-03 5.378623e-03 9.611886e-03
71 71 6.897151 9.649446 1 3.062168e-03 7.519383e-03 1.361544e-02
72 72 9.343135 9.663191 1 3.908552e-03 9.660143e-03 1.761900e-02
73 73 9.273135 9.676398 1 4.754935e-03 1.180090e-02 2.162256e-02
74 74 10.041796 9.689086 1 5.601318e-03 1.394166e-02 2.562612e-02
75 75 9.724713 9.701278 1 6.447702e-03 1.608242e-02 2.962968e-02
76 76 8.593517 9.712991 1 7.294085e-03 1.822318e-02 3.363323e-02
77 77 7.401988 9.724244 1 8.140468e-03 2.036394e-02 3.763679e-02
78 78 10.258688 9.735057 1 8.986852e-03 2.250470e-02 4.164035e-02
79 79 10.037192 9.745446 1 9.833235e-03 2.464546e-02 4.564391e-02
80 80 9.637510 9.755427 1 1.067962e-02 2.678622e-02 4.964747e-02
81 81 8.887625 9.765017 1 1.152600e-02 2.892698e-02 5.365102e-02
82 82 9.922013 9.774230 1 1.237239e-02 3.106774e-02 5.765458e-02
83 83 10.466709 9.783083 1 1.321877e-02 3.320850e-02 6.165814e-02
84 84 11.132830 9.791588 1 1.406515e-02 3.534926e-02 6.566170e-02
85 85 10.154038 9.799760 1 1.491154e-02 3.749002e-02 6.966526e-02
86 86 10.433068 9.807612 1 1.575792e-02 3.963078e-02 7.366881e-02
87 87 9.666781 9.815156 1 1.660430e-02 4.177154e-02 7.767237e-02
88 88 9.478004 9.822403 1 1.745069e-02 4.391230e-02 8.167593e-02
89 89 10.002749 9.829367 1 1.829707e-02 4.605306e-02 8.567949e-02
90 90 7.593259 9.836058 1 1.914345e-02 4.819382e-02 8.968305e-02
91 91 10.915754 9.842486 1 1.998984e-02 5.033458e-02 9.368660e-02
92 92 8.855580 9.848662 1 2.083622e-02 5.247534e-02 9.769016e-02
93 93 8.884683 9.854596 1 2.168260e-02 5.461610e-02 1.016937e-01
94 94 9.757451 9.860298 1 2.252899e-02 5.675686e-02 1.056973e-01
95 95 10.222361 9.865775 1 2.337537e-02 5.889762e-02 1.097008e-01
96 96 9.090410 9.871038 1 2.422175e-02 6.103838e-02 1.137044e-01
97 97 8.837872 9.876095 1 2.506814e-02 6.317914e-02 1.177080e-01
98 98 9.413135 9.880953 1 2.591452e-02 6.531990e-02 1.217115e-01
99 99 9.295531 9.885621 1 2.676090e-02 6.746066e-02 1.257151e-01
100 100 9.698118 9.890106 1 2.760729e-02 6.960142e-02 1.297186e-01
You can put list(all=pdIdent(~Zt - 1)) in R's global environment using the reval() method:
In [55]:
import rpy2.robjects as ro
import pandas.rpy.common as com
mydata = ro.r['data.frame']
read = ro.r['read.csv']
head = ro.r['head']
summary = ro.r['summary']
library = ro.r['library']
In [56]:
formula = '~ time'
library('lmeSplines')
ro.reval('data(smSplineEx1)')
ro.reval('smSplineEx1$all <- rep(1,nrow(smSplineEx1))')
ro.reval('smSplineEx1$Zt <- smspline(~ time, data=smSplineEx1)')
ro.reval('rnd <- list(all=pdIdent(~Zt - 1))')
#result = ro.r.smspline(formula=ro.r(formula), data=ro.r.smSplineEx1) #notice: data=ro.r.smSplineEx1
result = ro.r.lme(ro.r('y~time'), data=ro.r.smSplineEx1, random=ro.r.rnd)
In [57]:
print com.convert_robj(result.rx('coefficients'))
{'coefficients': {'random': {'all': Zt1 Zt2 Zt3 Zt4 Zt5 Zt6 Zt7 \
1 0.000509 0.001057 0.001352 0.001184 0.000869 0.000283 -0.000424
Zt8 Zt9 Zt10 ... Zt89 Zt90 Zt91 \
1 -0.001367 -0.002325 -0.003405 ... -0.001506 -0.001347 -0.000864
Zt92 Zt93 Zt94 Zt95 Zt96 Zt97 Zt98
1 -0.000631 -0.000569 -0.000392 -0.000049 0.000127 0.000114 0.000071
[1 rows x 98 columns]}, 'fixed': (Intercept) 6.498800
time 0.038723
dtype: float64}}
Be careful: the result is a little out of shape. Basically, it is a nested dictionary, which cannot be converted into a pandas.DataFrame.
You can access y in smSplineEx1 with ro.r.smSplineEx1.rx('y'), similar to smSplineEx1$y in R.
Now suppose you have the result variable in Python, generated by
result = ro.r.lme(ro.r('y~time'), data=ro.r.smSplineEx1, random=ro.r.rnd)
and you want to plot it using R (instead of, say, matplotlib). You then need to assign it to a variable in the R workspace:
ro.R().assign('result', result)
Now there is a variable named result in the R workspace, and you can access it using ro.r.result.
Plotting it using R:
In [17]:
ro.reval('plot(smSplineEx1$time,smSplineEx1$y,pch="o",type="n", \
main="Spline fits: lme(y ~ time, random=list(all=pdIdent(~Zt-1)))", \
xlab="time",ylab="y")')
Out[17]:
rpy2.rinterface.NULL
In [21]:
ro.reval('lines(smSplineEx1$time, fitted(result),col=2)')
Out[21]:
rpy2.rinterface.NULL
Or you can do everything in R:
ro.reval('result <- lme(y ~ time, data=smSplineEx1,random=list(all=pdIdent(~Zt - 1)))')
ro.reval('plot(smSplineEx1$time,smSplineEx1$y,pch="o",type="n", \
main="Spline fits: lme(y ~ time, random=list(all=pdIdent(~Zt-1)))", \
xlab="time",ylab="y")')
ro.reval('lines(smSplineEx1$time, fitted(result),col=2)')
and access the R variables using ro.r.smSplineEx1.rx2('time') or ro.r.result.
Edit
Notice that some R objects cannot be converted to a pandas.DataFrame as-is because they mix data structures:
In [62]:
ro.r["smSplineEx1"]
Out[62]:
<DataFrame - Python:0x108525518 / R:0x109e5da38>
[FloatVe..., FloatVe..., FloatVe..., FloatVe..., Matrix]
time: <class 'rpy2.robjects.vectors.FloatVector'>
<FloatVector - Python:0x10807e518 / R:0x1022599e0>
[1.000000, 2.000000, 3.000000, ..., 98.000000, 99.000000, 100.000000]
y: <class 'rpy2.robjects.vectors.FloatVector'>
<FloatVector - Python:0x108525a70 / R:0x102259d30>
[5.797149, 5.469222, 4.567237, ..., 9.413135, 9.295531, 9.698118]
y.true: <class 'rpy2.robjects.vectors.FloatVector'>
<FloatVector - Python:0x1085257a0 / R:0x10225dfb0>
[4.235263, 4.461302, 4.678477, ..., 9.880953, 9.885621, 9.890106]
all: <class 'rpy2.robjects.vectors.FloatVector'>
<FloatVector - Python:0x1085258c0 / R:0x10225e300>
[1.000000, 1.000000, 1.000000, ..., 1.000000, 1.000000, 1.000000]
Zt: <class 'rpy2.robjects.vectors.Matrix'>
<Matrix - Python:0x108525908 / R:0x103e8ba00>
[1.168560, 0.148786, -0.054492, ..., -0.030141, -0.030610, 0.757597]
Notice that we have a few vectors, but the last one is a Matrix. We have to convert smSplineEx1 to Python in two parts.
In [63]:
ro.r["smSplineEx1"].names
Out[63]:
<StrVector - Python:0x108525dd0 / R:0x1042ca7c0>
['time', 'y', 'y.true', 'all', 'Zt']
In [64]:
print com.convert_robj(ro.r["smSplineEx1"].rx(ro.IntVector(range(1, 5)))).head()
time y y.true all
1 1 5.797149 4.235263 1
2 2 5.469222 4.461302 1
3 3 4.567237 4.678477 1
4 4 3.645763 4.887137 1
5 5 5.094126 5.087615 1
In [65]:
print com.convert_robj(ro.r["smSplineEx1"].rx2('Zt')).head(2)
0 1 2 3 4 5 6 \
1 1.168560 2.071261 2.944953 3.782848 4.584037 5.348937 6.078121
2 0.148786 1.072013 1.948857 2.789264 3.593423 4.361817 5.095016
7 8 9 ... 88 89 90 \
1 6.772184 7.431719 8.057321 ... 0.933947 0.769591 0.619420
2 5.793601 6.458153 7.089255 ... 0.904395 0.745337 0.599976
91 92 93 94 95 96 97
1 0.484029 0.36401 0.259959 0.172468 0.102133 0.049547 0.015305
2 0.468893 0.35267 0.251890 0.167135 0.098986 0.048026 0.014836
[2 rows x 98 columns]
com.convert_robj(ro.r["smSplineEx1"]) will not work due to the mixed data structure issue.
