Below I have two dataframes, the first being dataframe det and the second being orig. I need to compare det['Detection'] with orig['Date/Time']. Once matching values are found during the comparison, I need to copy values from orig and det into a final dataframe (final). The format I need the final dataframe in is: det['Date/Time'], orig['Lat'], orig['Lon'], orig['Dep'], det['Mag']. I hope my formatting is adequate; I was not sure how to handle the dataframes, so I just placed them in tables. Some additional information that probably won't matter: det is 3385 rows by 3 columns and orig is 818 rows by 9 columns.
det:

Date/Time                 Mag     Detection
2008/12/27T01:06:56.37    0.280   2008/12/27T13:50:07.00
2008/12/27T01:17:39.39    0.485   2008/12/27T01:17:39.00
2008/12/27T01:33:23.00   -0.080   2008/12/27T01:17:39.00
orig:

Date/Time               Lat      Lon        Dep   Ml     Mc    N  Dmin  ehz
2008/12/27T01:17:39.00  44.5112  -110.3742  5.07  -9.99  0.51  5  6     3.2
2008/12/27T04:33:30.00  44.4985  -110.3750  4.24  -9.99  1.63  9  8     0.9
2008/12/27T05:38:22.00  44.4912  -110.3743  4.73  -9.99  0.37  8  8     0.8
final:

det['Date/Time']  orig['Lat']  orig['Lon']  orig['Dep']  det['Mag']
You can merge the two dataframes. Since you want to use the Detection column from the first dataframe and the Date/Time column from the second, you can rename the second dataframe's column while merging, since the name Date/Time already exists in the first dataframe:
det.merge(orig.rename(columns={'Date/Time': 'Detection'}))
OUTPUT:
Date/Time Mag Detection Lat Lon Dep Ml Mc N Dmin ehz
0 2008/12/27T01:17:39.39 0.485 2008/12/27T01:17:39.00 44.5112 -110.3742 5.07 -9.99 0.51 5 6 3.2
1 2008/12/27T01:33:23.00 -0.080 2008/12/27T01:17:39.00 44.5112 -110.3742 5.07 -9.99 0.51 5 6 3.2
You can then select the columns you want.
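For example, to get the final format from the question, keep the merge result and select the columns (a minimal sketch; `merged` is just a name introduced here for the merge result above):

merged = det.merge(orig.rename(columns={'Date/Time': 'Detection'}))
final = merged[['Date/Time', 'Lat', 'Lon', 'Dep', 'Mag']]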
df['concentration']
'4.0±0.0'
'2.5±0.2'
'5.8±0.1'
'45.0'
'23'
'26.07'
'64'
I want the result as:
4.00
2.70
5.90
45.00
23.00
26.07
64.00
which is (4.0 + 0.0) for the first row, taking the highest possible value.
However, my concentration column is not float, so how can I perform calculations on this type of data?
Any kind of help will be appreciated.
Thank you.
Use str.split('±', expand=True), convert the resulting parts to float with a lambda, and sum row-wise.
Data
import pandas as pd

df = pd.DataFrame({'concentration': ['4.0±0.0', '2.5±0.2', '5.8±0.1', '45.0', '23', '26.07', '64']})
Solution
df['concentration'] = df.concentration.str.split('±', expand=True).apply(lambda x: x.astype(float)).sum(axis=1)
concentration
0 4.00
1 2.70
2 5.90
3 45.00
4 23.00
5 26.07
6 64.00
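For reference, rows without '±' (e.g. '45.0') get a missing value in the second split column, and .sum(axis=1) skips missing values, so plain numbers pass through unchanged. You can inspect the intermediate split with:

df.concentration.str.split('±', expand=True)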
I am trying to calculate the position of an object based on a timestamp. For this I have two dataframes in pandas: one for the measurement data and one for the position. All of the movement is straightforward acceleration.
Dataframe 1 contains the measurement data:
ms force ... ... ...
1 5 20
2 10 20
3 15 25
4 20 30
5 25 20
..... (~ 6000 lines)
Dataframe 2 contains "positioning data"
ms speed (m/s)
1 0 0.66
2 4500 0.66
3 8000 1.3
4 16000 3.0
5 20000 3.0
.....(~300 lines)
Now I want to calculate the position for each row of the first dataframe using the data from the second dataframe.
In Excel I solved the problem using an array formula, but now I have to use Python/Pandas and I can't find a way to select the correct row from dataframe 2.
My idea is to use some kind of if-condition (see the pseudocode in the update below).
In the end I want to display a "force <-> distance" graph rather than a "force <-> time" graph.
Thank you in advance.
==========================================================================
Update:
In the meantime I have almost solved my issue. Now my data looks like this:
Dataframe 2 (Speed Data):
pos v a t t-end t-start
0 -3.000 0.666667 0.000000 4.500000 4.500000 0.000000
1 0.000 0.666667 0.187037 0.071287 4.571287 4.500000
2 0.048 0.680000 0.650794 0.010244 4.581531 4.571287
3 0.055 0.686667 0.205432 0.064904 4.646435 4.581531
...
15 0.055 0.686667 0.5 0.064904 23.0 20.0
...
28 0.055 0.686667 0.6 0.064904 35.0 34.0
...
30 0.055 0.686667 0.9 0.064904 44.0 39.0
And Dataframe 1 (time based measurement):
Fx Fy Fz abs_t expected output ('a' from DF2)
0 -13.9 170.3 45.0 0.005 0.000000
1 -14.1 151.6 38.2 0.010 0.000000
...
200 -14.1 131.4 30.4 20.015 0.5
...
300 -14.3 111.9 21.1 34.01 0.6
...
400 -14.5 95.6 13.2 40.025
So I want to check the time (abs_t) from DF1 and search for the correct 'a' in DF2.
So something like this (pseudocode):

if DF1['abs_t'] between (DF2['t-start'], DF2['t-end']):
    DF1['a'] = DF2['a']

I could write two for loops, but that looks like the wrong way and is very, very slow.
I hope you understand my problem; providing a running sample is very hard.
I found a very slow solution, but at least it's working:

df1['a'] = 0
for index, row in df2.iterrows():
    start = row['t-start']
    end = row['t-end']
    a = row['a']
    df1.loc[(df1['abs_t'] > start) & (df1['abs_t'] < end), 'a'] = a
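For reference, here is a vectorized sketch that avoids the row loop, assuming the (t-start, t-end) intervals in df2 do not overlap (column names taken from the code above):

import pandas as pd

# build one interval per df2 row; closed='neither' matches the strict > and < above
intervals = pd.IntervalIndex.from_arrays(df2['t-start'], df2['t-end'], closed='neither')
idx = intervals.get_indexer(df1['abs_t'])   # -1 where abs_t falls in no interval
df1['a'] = df2['a'].to_numpy()[idx]         # look up 'a' by interval position
df1.loc[idx == -1, 'a'] = 0                 # restore the default for unmatched rows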
I have an output file from old Fortran code, which outputs values as double precision, so any numbers given in scientific notation are in the form 1.23D+4, for example. I'm reading this CSV into a pandas dataframe, dedx, to do data analysis.
I'm looking for a way to convert the D to an E within every entry of the dataframe. I've tried:
for c in dedx.columns:
    for i in dedx[c]:
        if isinstance(i, str):
            i = float(i.replace('D', 'E'))
This changes the value within the loop, as can be seen using print(i), but it does not change the actual dataframe.
A sample of the dataframe is shown below:
ENERGY(MEV) DE/DX(MEV/MM) DE/DX(MEV.CM2/MG) RANGE(MM) RANGE(MG/CM2)
0 0.01 4.908059D+01 4.811823D-02 0.000477 0.486766
1 0.50 4.917734D+02 4.821308D-01 0.002121 2.162930
2 1.00 5.261802D+02 5.158630D-01 0.003088 3.149690
3 1.50 5.105083D+02 5.004984D-01 0.004050 4.130490
4 2.00 4.842530D+02 4.747579D-01 0.005054 5.155440
5 2.50 4.568363D+02 4.478788D-01 0.006117 6.239750
6 3.00 4.309473D+02 4.224973D-01 0.007245 7.389450
7 3.50 4.072914D+02 3.993053D-01 0.008438 8.607170
8 4.00 3.859186D+02 3.783516D-01 0.009700 9.894000
9 4.50 3.666619D+02 3.594725D-01 0.011030 11.250200
10 5.00 3.492947D+02 3.424458D-01 0.012427 12.675800
11 5.50 3.335896D+02 3.270486D-01 0.013892 14.170300
12 6.00 3.193387D+02 3.130772D-01 0.015425 15.733200
13 6.50 3.063596D+02 3.003526D-01 0.017024 17.364200
14 7.00 2.944946D+02 2.887202D-01 0.018689 19.062500
15 7.50 2.836086D+02 2.780477D-01 0.020419 20.827500
16 8.00 2.735860D+02 2.682215D-01 0.022215 22.658800
17 8.50 2.643277D+02 2.591448D-01 0.024074 24.555600
18 9.00 2.557488D+02 2.507341D-01 0.025998 26.517400
19 9.50 2.477762D+02 2.429178D-01 0.027984 28.543700
20 10.00 2.403466D+02 2.356339D-01 0.030033 30.633900
You can do this kind of conversion while reading the file with pandas.read_csv rather than looping. It is more efficient.
import pandas as pd

d_to_e = lambda x: float(x.replace('D', 'E'))
df = pd.read_csv('yourfilename.csv', converters={'DE/DX(MEV/MM)': d_to_e, 'DE/DX(MEV.CM2/MG)': d_to_e})
The converters parameter allows you to apply a function to the data of each column; it takes a dict mapping column names to the functions to apply to that column's data. The result is stored in the dataframe.
I defined the function d_to_e, which does the letter replacement and returns a float, as you did in your loop.
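If every column needs the conversion, one variant is to read the header row first and build the converters dict programmatically (a sketch, assuming the same file name as above; plain numbers like 0.01 pass through float() unchanged):

import pandas as pd

d_to_e = lambda x: float(x.replace('D', 'E'))
cols = pd.read_csv('yourfilename.csv', nrows=0).columns          # header only
df = pd.read_csv('yourfilename.csv', converters={c: d_to_e for c in cols})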
Try this one (to replace your whole code):
for c in dedx.columns:
    dedx[c] = dedx[c].apply(lambda x: float(x.replace("D", "E")) if isinstance(x, str) else x)

(assuming each dedx[c] is a reference to a dataframe column that you want to modify)
I was wondering how I could calculate the average for a specific category via Python. I have a CSV file called demo.csv:
import pandas as pd
import numpy as np
#loading the data into data frame
X = pd.read_csv('demo.csv')
The two columns of interest are the Category and Totals columns:
Category Totals estimates
2 2777 0.43
4 1003 0.26
4 3473 0.65
4 2638 0.17
1 2855 0.74
0 2196 0.13
0 2630 0.91
2 2714 0.39
3 2472 0.51
0 1090 0.12
I'm interested in finding the average of Totals for the rows where Category is 2. I know how to do this in Excel: I would just filter to show only category 2 and take the average (which ends up being 2745.5). But how would I code this in Python?
You can restrict your dataframe to the subset of rows you want (Category == 2), then take the mean of the Totals column as follows:
df[df['Category'] == 2]['Totals'].mean()
2745.5
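Equivalently, a single .loc call avoids chained indexing:

df.loc[df['Category'] == 2, 'Totals'].mean()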
I'm interested in finding the average for the Totals corresponding with Category 2
You may set the category as the index, then calculate the mean for any category using the .loc indexer (the older .ix indexer worked the same way but is deprecated):
df.set_index('Category').loc[2, 'Totals'].mean()
=> 2745.50
The same can be achieved by using groupby:
df.groupby('Category')['Totals'].mean().loc[2]
=> 2745.50
Note the sample data shows numeric categories, so I'm indexing with the integer 2; if Category is stored as strings, use '2' instead.
I am brand new to pandas and working with two dataframes. My goal is to append the non-date values of df_ls (below) column-wise to their nearest respective date in df_1. Is the only way to do this a traditional for-loop, or is there some more effective built-in method/function? I have googled this extensively without any luck and have only found ways to append blocks of dataframes to other dataframes; I haven't found a way to search through a dataframe and append a row from another dataframe at the nearest respective date. See the example below:
Example of first dataframe (lets call it df_ls):
DATE ALBEDO_SUR B13_RATIO B23_RATIO B1_RAW B2_RAW
0 1999-07-04 0.070771 1.606958 1.292280 0.128069 0.103018
1 1999-07-20 0.030795 2.326290 1.728147 0.099020 0.073595
2 1999-08-21 0.022819 2.492871 1.762536 0.096888 0.068502
3 1999-09-06 0.014613 2.792271 1.894225 0.090590 0.061445
4 1999-10-08 0.004978 2.781847 1.790768 0.089291 0.057521
5 1999-10-24 0.003144 2.818474 1.805257 0.090623 0.058054
6 1999-11-09 0.000859 3.146100 1.993941 0.092787 0.058823
7 1999-12-11 0.000912 2.913604 1.656642 0.097239 0.055357
8 1999-12-27 0.000877 2.974692 1.799949 0.098282 0.059427
9 2000-01-28 0.000758 3.092533 1.782112 0.095153 0.054809
10 2000-03-16 0.002933 2.969185 1.727465 0.083059 0.048322
11 2000-04-01 0.016814 2.366437 1.514110 0.089720 0.057398
12 2000-05-03 0.047370 1.847763 1.401930 0.109767 0.083290
13 2000-05-19 0.089432 1.402798 1.178798 0.137965 0.115936
14 2000-06-04 0.056340 1.807828 1.422489 0.118601 0.093328
Example of second dataframe (let's call it df_1)
Sample Date Value
0 2000-05-09 1.68
1 2000-05-09 1.68
2 2000-05-18 1.75
3 2000-05-18 1.75
4 2000-05-31 1.40
5 2000-05-31 1.40
6 2000-06-13 1.07
7 2000-06-13 1.07
8 2000-06-27 1.49
9 2000-06-27 1.49
10 2000-07-11 2.29
11 2000-07-11 2.29
In the end, my goal is to have something like this (note the appended values are the values closest to the Sample Date, even though the dates don't match up perfectly):
Sample Date Value ALBEDO_SUR B13_RATIO B23_RATIO B1_RAW B2_RAW
0 2000-05-09 1.68 0.047370 1.847763 1.401930 0.109767 0.083290
1 2000-05-09 1.68 0.047370 1.847763 1.401930 0.109767 0.083290
2 2000-05-18 1.75 0.089432 1.402798 1.178798 0.137965 0.115936
3 2000-05-18 1.75 0.089432 1.402798 1.178798 0.137965 0.115936
4 2000-05-31 1.40 0.056340 1.807828 1.422489 0.118601 0.093328
5 2000-05-31 1.40 0.056340 1.807828 1.422489 0.118601 0.093328
6 2000-06-13 1.07 ETC.... ETC.... ETC ...
7 2000-06-13 1.07
8 2000-06-27 1.49
9 2000-06-27 1.49
10 2000-07-11 2.29
11 2000-07-11 2.29
Thanks for any and all help. As I said, I am new to this; I have experience with this sort of thing in MATLAB, but pandas is new to me.
Thanks
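For reference, a minimal sketch of one way to do this with pandas.merge_asof, assuming both date columns are parsed as datetimes and both frames are sorted by date (column names taken from the examples above):

import pandas as pd

df_ls['DATE'] = pd.to_datetime(df_ls['DATE'])
df_1['Sample Date'] = pd.to_datetime(df_1['Sample Date'])

result = pd.merge_asof(
    df_1.sort_values('Sample Date'),
    df_ls.sort_values('DATE'),
    left_on='Sample Date',
    right_on='DATE',
    direction='nearest',  # match the closest DATE in either direction
).drop(columns='DATE')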