Data sorting from Excel sheet - python

I tried to sort the values of a particular column in a DataFrame. The values are sorted, but the index values do not change. I want the index to be renumbered to match the sorted data.
rld=pd.read_excel(r"C:\Users\DELL\nagrajun sagar reservoir data - Copy.xlsx")
rl = rld.iloc[:,1].sort_values()
rl
output:
15 0.043
3 0.370
17 0.391
2 0.823
16 1.105
1 1.579
0 2.070
12 2.235
4 2.728
18 4.490
9 4.905
13 5.036
14 5.074
11 6.481
10 6.613
6 6.806
7 6.807
8 6.824
5 6.841
Name: 2 October, dtype: float64
rl[0]
output:
2.07
I expected rl[0] to be 0.043, but the actual result is 2.07, which is the value that had index 0 before sorting.

sort_values() keeps the original index labels attached to the values, and rl[0] looks up the label 0 rather than the first position. You can renumber the index with reset_index(drop=True):
rl = rl.reset_index(drop=True)
or do it while sorting:
rl = rld.iloc[:,1].sort_values().reset_index(drop=True)
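To make the label-versus-position difference concrete, here is a minimal sketch. The Series values are a hypothetical stand-in for one column of the Excel sheet, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for one column of the sheet.
s = pd.Series([2.070, 1.579, 0.823, 0.370], name="2 October")

sorted_s = s.sort_values()

# The original labels travel with the values, so label 0 is still 2.070.
by_label = sorted_s[0]

# .iloc is purely positional, so position 0 is the smallest value.
by_position = sorted_s.iloc[0]

# After reset_index(drop=True) the labels run 0..n-1 in sorted order.
renumbered = sorted_s.reset_index(drop=True)

print(by_label, by_position, renumbered[0])
```

If you only need the first sorted value and do not care about the labels, sorted_s.iloc[0] avoids renumbering the index at all.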

Related

Determining average values over irregular number of rows in a csv file

I have a CSV file with days of the year in one column and temperature in another. The days are fractional, and I want to find the average temperature for each whole day (day 0, 1, 2, 3, etc.).
The temperature measurements were taken irregularly, so each day has a different number of measurements.
Typically I would use df.groupby(np.arange(len(df)) // n).mean(), but n, the number of rows per group, varies in this case.
Here is an example of what the data looks like:
Days Temp
0.75 19
0.8 18
1.2 18
1.25 18
1.75 19
3.05 18
3.55 21
3.60 21
3.9 18
4.5 20
You could convert Days to an integer and use that to group.
>>> df.groupby(df["Days"].astype(int)).mean()
Days Temp
Days
0 0.775 18.500000
1 1.400 18.333333
3 3.525 19.500000
4 4.500 20.000000
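As a self-contained sketch of the same idea, the snippet below rebuilds the sample data from the question and averages only the Temp column. The name "Day" for the grouping key is my own choice, and astype(int) truncates toward zero, which matches "whole day" only for non-negative values.

```python
import pandas as pd

df = pd.DataFrame({
    "Days": [0.75, 0.8, 1.2, 1.25, 1.75, 3.05, 3.55, 3.60, 3.9, 4.5],
    "Temp": [19, 18, 18, 18, 19, 18, 21, 21, 18, 20],
})

# Truncate each fractional day to its whole-day number and use it as the key.
day = df["Days"].astype(int).rename("Day")

# Group sizes can differ per day; groupby handles that automatically.
daily_mean = df.groupby(day)["Temp"].mean()
print(daily_mean)
```

Days with no measurements (day 2 here) simply do not appear in the result.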

Selecting data in pandas based on conditions

I am importing data from Excel using pandas, and it looks like this:
time Column1 Column2 Column3 ID
0 1.0 181.359 -1.207 9.734 10
1 2.0 181.357 -1.179 9.729 10
2 3.0 181.357 -0.713 9.732 10
3 602.0 179.148 505.520 17.774 1810
4 603.0 179.153 506.824 17.765 1810
5 604.0 179.128 506.169 17.773 1810
6 605.0 179.129 504.141 17.776 1810
7 606.0 179.165 505.214 17.774 1810
8 3003.0 180.032 278.810 17.748 2010
9 3004.0 180.025 279.382 17.749 2010
10 16955.0 450.377 7.271 17.710 4510
11 16956.0 450.375 6.806 17.720 4510
12 16957.0 450.368 7.428 17.710 4510
13 16958.0 450.372 7.892 17.723 4510
14 16959.0 450.359 8.085 17.714 4510
I want to pick values from Column1, Column2 and Column3 based on a given value of ID.
For example, if I give ID=1810, I should get the values from Column1, Column2 and Column3 in the rows where ID is 1810 (rows 3 to 7).
I am using the numpy.where function to get the matching row numbers:
a = np.where(data['ID'] == 1810)
but I could not work out how to select the column data based on that. Thank you in advance for your help!
Use pandas.DataFrame.loc with a boolean mask for the rows and a list of column names: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
data.loc[data['ID'] == 1810, ['Column1', 'Column2', 'Column3']]
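A small runnable sketch of this selection, using a few rows copied from the question. The variable names cols and selected are my own.

```python
import pandas as pd

# A subset of the rows shown in the question.
data = pd.DataFrame({
    "time": [1.0, 2.0, 602.0, 603.0, 3003.0],
    "Column1": [181.359, 181.357, 179.148, 179.153, 180.032],
    "Column2": [-1.207, -1.179, 505.520, 506.824, 278.810],
    "Column3": [9.734, 9.729, 17.774, 17.765, 17.748],
    "ID": [10, 10, 1810, 1810, 2010],
})

cols = ["Column1", "Column2", "Column3"]

# Row mask and column list go in a single .loc call: rows first, columns second.
selected = data.loc[data["ID"] == 1810, cols]
print(selected)
```

Passing the mask and the column list to one .loc call avoids chained indexing, and no np.where step is needed because the boolean Series can be used directly.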

Fastest way to access a DataFrame cell by column values?

I have the following dataframe :
time bk1_lvl0_id bk2_lvl0_id pr_ss order_upto_level initial_inventory leadtime1 leadtime2 adjusted_leadtime
0 2020 1000 3 16 18 17 3 0.100000 1
1 2020 10043 3 65 78 72 12 0.400000 1
2 2020 1005 3 0 1 1 9 0.300000 1
3 2020 1009 3 325 363 344 21 0.700000 1
4 2020 102 3 0 1 1 7 0.233333 1
I want a function that returns pr_ss for a given pair of keys, for example (bk1_lvl0_id=1000, bk2_lvl0_id=3).
This is the code I have tried, but it is slow:
def get_safety_stock(df, bk1, bk2):
    # a function that returns the safety stock for any given (bk1, bk2)
    for index, row in df.iterrows():
        if (row["bk1_lvl0_id"] == bk1) and (row["bk2_lvl0_id"] == bk2):
            return int(row["pr_ss"])
If your DataFrame has no duplicate (bk1_lvl0_id, bk2_lvl0_id) pairs, you can write the function as follows:
def get_safety_stock(df, bk1, bk2):
    return df.loc[df.bk1_lvl0_id.eq(bk1) & df.bk2_lvl0_id.eq(bk2), 'pr_ss'].iloc[0]
Note that .iloc[0] takes the first matching value by position, which is not an issue if there are no duplicates in the data. (A plain [0] would look up the index label 0, so it would only work when the matching row happens to have that label.) If you want all matches, remove the .iloc[0] from the end and you get the whole Series. The function can be called as follows:
>>> get_safety_stock(df, 1000, 3)
16
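If the lookup is called many times, one option is to index the data once by the two key columns and then look values up by key. This is only a sketch: it assumes the key pairs are unique, and the names pr_ss_by_key and the lookup parameter are my own, not from the question.

```python
import pandas as pd

# The key columns and pr_ss from the question's sample.
df = pd.DataFrame({
    "bk1_lvl0_id": [1000, 10043, 1005, 1009, 102],
    "bk2_lvl0_id": [3, 3, 3, 3, 3],
    "pr_ss": [16, 65, 0, 325, 0],
})

# Build the lookup once; each (bk1, bk2) pair is assumed to be unique.
pr_ss_by_key = df.set_index(["bk1_lvl0_id", "bk2_lvl0_id"])["pr_ss"]

def get_safety_stock(lookup, bk1, bk2):
    """Return pr_ss for the given key pair from a pre-indexed Series."""
    return int(lookup.loc[(bk1, bk2)])

print(get_safety_stock(pr_ss_by_key, 1000, 3))
```

The cost of set_index is paid once, so this tends to help when there are many lookups against the same frame. For a single lookup, the boolean-mask version is simpler.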

How to compare value in Pandas DataFrame against a value in the previous row AND the previous column?

I have a DataFrame consisting of two columns filled with float values. For each row, I need to calculate the value of 'h' minus the value of 'c' from the previous row.
So for instance, for 'h' in row 1, I need to calculate 1.17322 - 1.17285 (the value of 'c' in the previous row).
I have tried several different methods to accomplish this, including .iloc, .shift(), .groupby(), and .diff(), but I cannot get exactly what I'm looking for.
If anybody could help, it would be greatly appreciated.
c h
0 1.17285 1.17310
1 1.17287 1.17322
2 1.17298 1.17340
3 1.17346 1.17348
4 1.17478 1.17511
5 1.17595 1.17700
6 1.17508 1.17633
7 1.17474 1.17545
8 1.17463 1.17546
9 1.17224 1.17468
10 1.17437 1.17456
11 1.17552 1.17641
12 1.17750 1.17784
13 1.17694 1.17770
Try this using shift, for example:
df['c_shift'] = df['c'].shift()
df['diff'] = df['h'] - df['c_shift']
print(df)
Output:
c h c_shift diff
0 1.17285 1.17310 NaN NaN
1 1.17287 1.17322 1.17285 0.00037
2 1.17298 1.17340 1.17287 0.00053
3 1.17346 1.17348 1.17298 0.00050
4 1.17478 1.17511 1.17346 0.00165
5 1.17595 1.17700 1.17478 0.00222
6 1.17508 1.17633 1.17595 0.00038
7 1.17474 1.17545 1.17508 0.00037
8 1.17463 1.17546 1.17474 0.00072
9 1.17224 1.17468 1.17463 0.00005
10 1.17437 1.17456 1.17224 0.00232
11 1.17552 1.17641 1.17437 0.00204
12 1.17750 1.17784 1.17552 0.00232
13 1.17694 1.17770 1.17750 0.00020
Of course, you can do this in one step:
df['diff'] = df['h'] - df['c'].shift()

Appending data row from one dataframe to another with respect to date

I am brand new to pandas and working with two dataframes. My goal is to append the non-date values of df_ls (below) column-wise to their nearest respective date in df_1. Is a traditional for-loop the only way to do this, or is there some more effective built-in method or function? I have googled this extensively without any luck and have only found ways to append blocks of dataframes to other dataframes. I haven't found a way to search through a dataframe and append a row to another dataframe at the nearest respective date. See the example below:
Example of first dataframe (let's call it df_ls):
DATE ALBEDO_SUR B13_RATIO B23_RATIO B1_RAW B2_RAW
0 1999-07-04 0.070771 1.606958 1.292280 0.128069 0.103018
1 1999-07-20 0.030795 2.326290 1.728147 0.099020 0.073595
2 1999-08-21 0.022819 2.492871 1.762536 0.096888 0.068502
3 1999-09-06 0.014613 2.792271 1.894225 0.090590 0.061445
4 1999-10-08 0.004978 2.781847 1.790768 0.089291 0.057521
5 1999-10-24 0.003144 2.818474 1.805257 0.090623 0.058054
6 1999-11-09 0.000859 3.146100 1.993941 0.092787 0.058823
7 1999-12-11 0.000912 2.913604 1.656642 0.097239 0.055357
8 1999-12-27 0.000877 2.974692 1.799949 0.098282 0.059427
9 2000-01-28 0.000758 3.092533 1.782112 0.095153 0.054809
10 2000-03-16 0.002933 2.969185 1.727465 0.083059 0.048322
11 2000-04-01 0.016814 2.366437 1.514110 0.089720 0.057398
12 2000-05-03 0.047370 1.847763 1.401930 0.109767 0.083290
13 2000-05-19 0.089432 1.402798 1.178798 0.137965 0.115936
14 2000-06-04 0.056340 1.807828 1.422489 0.118601 0.093328
Example of second dataframe (let's call it df_1)
Sample Date Value
0 2000-05-09 1.68
1 2000-05-09 1.68
2 2000-05-18 1.75
3 2000-05-18 1.75
4 2000-05-31 1.40
5 2000-05-31 1.40
6 2000-06-13 1.07
7 2000-06-13 1.07
8 2000-06-27 1.49
9 2000-06-27 1.49
10 2000-07-11 2.29
11 2000-07-11 2.29
In the end, my goal is to have something like this (note that the appended values are the ones closest to the Sample Date, even though the dates don't match up perfectly):
Sample Date Value ALBEDO_SUR B13_RATIO B23_RATIO B1_RAW B2_RAW
0 2000-05-09 1.68 0.047370 1.847763 1.401930 0.109767 0.083290
1 2000-05-09 1.68 0.047370 1.847763 1.401930 0.109767 0.083290
2 2000-05-18 1.75 0.089432 1.402798 1.178798 0.137965 0.115936
3 2000-05-18 1.75 0.089432 1.402798 1.178798 0.137965 0.115936
4 2000-05-31 1.40 0.056340 1.807828 1.422489 0.118601 0.093328
5 2000-05-31 1.40 0.056340 1.807828 1.422489 0.118601 0.093328
6 2000-06-13 1.07 ETC.... ETC.... ETC ...
7 2000-06-13 1.07
8 2000-06-27 1.49
9 2000-06-27 1.49
10 2000-07-11 2.29
11 2000-07-11 2.29
Thanks for any and all help. As I said, I am new to this. I have experience with this sort of thing in MATLAB, but pandas is new to me.
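One possible approach that avoids an explicit loop is pandas.merge_asof with direction="nearest", which joins each left row to the right row with the closest key. The sketch below uses a few rows from the question and assumes both date columns are already datetime64. merge_asof also requires both frames to be sorted by their key.

```python
import pandas as pd

# A few rows from the question; both date columns are parsed to datetime64.
df_ls = pd.DataFrame({
    "DATE": pd.to_datetime(["2000-05-03", "2000-05-19", "2000-06-04"]),
    "ALBEDO_SUR": [0.047370, 0.089432, 0.056340],
    "B13_RATIO": [1.847763, 1.402798, 1.807828],
})
df_1 = pd.DataFrame({
    "Sample Date": pd.to_datetime(["2000-05-09", "2000-05-18", "2000-05-31"]),
    "Value": [1.68, 1.75, 1.40],
})

# For each Sample Date, attach the df_ls row whose DATE is closest in time.
merged = pd.merge_asof(
    df_1.sort_values("Sample Date"),
    df_ls.sort_values("DATE"),
    left_on="Sample Date",
    right_on="DATE",
    direction="nearest",
)
print(merged)
```

The result keeps the matched DATE column next to Sample Date, which makes it easy to check how far apart each pair is. If matches that are too far away should be left empty, merge_asof also accepts a tolerance argument (for example pd.Timedelta("10D")).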
