Adding value to each row - python

I have a pandas DataFrame and, for each row, I would like to add 5 to the value in the Num column. That is, I want to keep the original numbers and simply add 5 to each.
Dataframe:
import pandas as pd
info= {"Num":[12,14,13,12,14,13,15], "NAME":['John','Camili','Rheana','Joseph','Amanti','Alexa','Siri']}
data = pd.DataFrame(info)
print("Original Data frame:\n")
print(data)
Output:
Original Data frame:
   Num    NAME
0   12    John
1   14  Camili
2   13  Rheana
3   12  Joseph
4   14  Amanti
5   13   Alexa
6   15    Siri
Desired output:
   Num    NAME
0   17    John
1   19  Camili
2   18  Rheana
3   17  Joseph
4   19  Amanti
5   18   Alexa
6   20    Siri
Attempt to solve:
for i, e in enumerate(data['Num']):
    data.at[i, 'Num'] = +5  # note: '= +5' assigns the value 5, it does not add 5
Output:
data
Out[391]:
   Num    NAME
0    5    John
1    5  Camili
2    5  Rheana
3    5  Joseph
4    5  Amanti
5    5   Alexa
6    5    Siri
I would appreciate an example with a for loop.

You simply need
data['Num'] += 5
without a for loop.
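Equivalently, Series.add returns a new column instead of mutating in place (a small sketch on the same frame):
data['Num'] = data['Num'].add(5)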

import pandas as pd
info= {"Num":[12,14,13,12,14,13,15], "NAME":['John','Camili','Rheana','Joseph','Amanti','Alexa','Siri']}
data = pd.DataFrame(info)
Answer:
for index in range(len(data)):
    data.loc[index, 'Num'] += 5  # .loc assignment avoids chained-indexing pitfalls
Output:
data
Out[617]:
   Num    NAME
0   17    John
1   19  Camili
2   18  Rheana
3   17  Joseph
4   19  Amanti
5   18   Alexa
6   20    Siri
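If you really want per-element processing, apply is another loop-flavoured alternative (a sketch on the same frame; the vectorised += above remains the faster, more idiomatic choice):
data['Num'] = data['Num'].apply(lambda v: v + 5)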

select specific rows from a large data frame

I have a data frame with 790 rows. I want to create a new data frame that excludes rows from 300 to 400 and leave the rest.
I tried:
df.loc[[:300, 400:]]
df.iloc[[:300, 400:]]
df_new = df.drop(labels=range([300:400]), axis=0)
This does not work. How can I achieve this goal?
Thanks in advance
Use range, or numpy.r_ to join ranges of indices:
df_new = df.drop(range(300, 400))
df_new = df.iloc[np.r_[0:300, 400:len(df)]]
Sample:
df = pd.DataFrame({'a':range(20)})
# print (df)
df1 = df.drop(labels=range(7,15))
print (df1)
     a
0    0
1    1
2    2
3    3
4    4
5    5
6    6
15  15
16  16
17  17
18  18
19  19
df1 = df.iloc[np.r_[0:7, 15:len(df)]]
print (df1)
     a
0    0
1    1
2    2
3    3
4    4
5    5
6    6
15  15
16  16
17  17
18  18
19  19
First select the index labels you want to drop, and then create a new df:
i = df.iloc[299:400].index
new_df = df.drop(i)
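Another option is a boolean mask over row positions (a minimal sketch, assuming the same 300-400 positional range as the question):
import numpy as np
mask = np.ones(len(df), dtype=bool)   # keep everything by default
mask[300:400] = False                 # drop positions 300-399
df_new = df[mask]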

How can I split pandas dataframe into groups of peaks

I have a dataset in an excel file I'm trying to analyse.
Example data:
Time in s  Displacement in mm  Force in N
0          0                   Not Relevant
1          1                   Not Relevant
2          2                   Not Relevant
3          3                   Not Relevant
4          2                   Not Relevant
5          1                   Not Relevant
6          0                   Not Relevant
7          2                   Not Relevant
8          3                   Not Relevant
9          4                   Not Relevant
10         5                   Not Relevant
11         6                   Not Relevant
12         5                   Not Relevant
13         4                   Not Relevant
14         3                   Not Relevant
15         2                   Not Relevant
16         1                   Not Relevant
17         0                   Not Relevant
18         4                   Not Relevant
19         5                   Not Relevant
20         6                   Not Relevant
21         7                   Not Relevant
22         6                   Not Relevant
23         5                   Not Relevant
24         4                   Not Relevant
24         0                   Not Relevant
The data is imported from an xls file; I then plot a graph of time vs displacement:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel(
    'DATA.xls',
    engine='xlrd',
    usecols=['Time in s', 'Displacement in mm', 'Force in N'])
fig, ax = plt.subplots()
ax.plot(df['Time in s'], df['Displacement in mm'])
ax.set(xlabel='Time (s)', ylabel='Disp',
       title='time disp')
ax.grid()
fig.savefig("time_disp.png")
plt.show()
I'd like to split the data into multiple groups to analyse separately.
So if I plot displacement against time, I get a sawtooth as a sample is being cyclically loaded.
I'd like to split the data so that each "tooth" is its own group or dataset, so I can analyse each cycle.
Can anyone help?
You can create a group column whose value changes at each local minimum. First get True at each local minimum by combining two diff calls, one forward and one backward; then use cumsum to increase the group number each time a local minimum occurs.
df['gr'] = (~(df['Deplacement'].diff(1) > 0)
            & ~(df['Deplacement'].diff(-1) > 0)).cumsum()
print(df)
    Time  Deplacement  gr
0      0            0   1
1      1            1   1
2      2            2   1
3      3            3   1
4      4            2   1
5      5            1   1
6      6            0   2
7      7            2   2
8      8            3   2
9      9            4   2
10    10            5   2
11    11            6   2
12    12            5   2
13    13            4   2
14    14            3   2
15    15            2   2
16    16            1   2
17    17            0   3
18    18            4   3
19    19            5   3
You can then split the data by selecting each group individually, or loop over the groups and do whatever you need inside each iteration:
s = (~(df['Deplacement'].diff(1) > 0)
     & ~(df['Deplacement'].diff(-1) > 0)).cumsum()
for _, dfg in df.groupby(s):
    print(dfg)
    # analyze as needed
Edit: for the data in your question, where each minimum is 0, df['gr'] = df['Deplacement'].eq(0).cumsum() would work as well, but it is specific to the minimum being exactly 0.
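Once the gr column exists, per-cycle statistics fall out of a plain groupby aggregation (a small sketch, reusing the column names from this answer):
cycle_stats = df.groupby('gr')['Deplacement'].agg(['min', 'max', 'count'])
print(cycle_stats)  # one row of summary statistics per tooth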

Disproportionate stratified sampling in Pandas

How can I randomly select one row from each group (column Name) in the following dataframe:
   Distance   Name  Time  Order
1        16   John     5      0
4        31   John     9      1
0        23   Kate     3      0
3        15   Kate     7      1
2        32  Peter     2      0
5        26  Peter     4      1
Expected result:
   Distance   Name  Time  Order
4        31   John     9      1
0        23   Kate     3      0
2        32  Peter     2      0
You can use a groupby on the Name column and apply sample:
df.groupby('Name', as_index=False).apply(lambda x: x.sample()).reset_index(drop=True)
   Distance   Name  Time  Order
0        31   John     9      1
1        15   Kate     7      1
2        32  Peter     2      0
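For what it's worth, pandas 1.1+ also has a dedicated GroupBy.sample that avoids the apply entirely (a sketch; check your pandas version):
df.groupby('Name').sample(n=1)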
You can shuffle all rows using, for example, the numpy function random.permutation. Then groupby Name and take the first N rows from each group:
df.iloc[np.random.permutation(len(df))].groupby('Name').head(1)
You can also achieve that by shuffling and then dropping duplicates, so that exactly one row survives for each value in df['Name'].unique().
Shuffle the dataframe:
df = df.sample(frac=1)
And then drop the duplicated rows:
df.drop_duplicates(subset=['Name'])
df.drop_duplicates(subset='Name')
   Distance   Name  Time  Order
1        16   John     5      0
0        23   Kate     3      0
2        32  Peter     2      0
This should help, but it is not a random choice; it keeps the first row for each name.
How about using the random module? Like this: import your provided data,
import random
df = pd.read_csv('random_data.csv', header=0)
which looks like this,
   Distance  Name  Time  Order
1        16  John     5      0
4         3  John     9      1
0        23  Kate     3      0
3        15  Kate     7      1
then get a random column name,
colname = df.columns[random.randint(1, 3)]
and in this case it selected 'Name':
print(df[colname])
1    John
4    John
0    Kate
3    Kate
Name: Name, dtype: object
Of course I could have condensed this to,
print(df[df.columns[random.randint(1, 3)]])
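If the goal was a random row rather than a random column, the same random module works against the index (a small sketch on the frame above):
row = df.loc[random.choice(df.index)]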

Split a Pandas DataFrame where one factor column is evenly distributed among the splits

I'm trying to split a Pandas DataFrame into multiple separate DataFrames where one of the columns is evenly distributed among the resulting DataFrames. For example, if I wanted the following DataFrame split into 3 distinct DataFrames where each one contains one record of each sector (selected at random).
So a df that looks like this:
id  Name       Sector
1   John       A
2   Steven     A
3   Jane       A
4   Kyle       A
5   Ashley     B
6   Ken        B
7   Tom        B
8   Peter      B
9   Elaine     C
10  Tom        C
11  Adam       C
12  Simon      C
13  Stephanie  D
14  Jan        D
15  Marsha     D
16  David      D
17  Drew       E
18  Kit        E
19  Corey      E
20  James      E
Would yield two DataFrames: one could look like this, while the other consists of the remaining records.
id  Name       Sector
1   John       A
2   Steven     A
7   Tom        B
8   Peter      B
10  Tom        C
11  Adam       C
13  Stephanie  D
16  David      D
19  Corey      E
20  James      E
I know np.array_split(df, 2) will get me part way there, but it may not evenly distribute the sectors like I need.
(Edited for clarity)
Update per comments and updated question:
df_1 = df.groupby('Sector', as_index=False, group_keys=False).apply(lambda x: x.sample(n=2))
df_2 = df[~df.index.isin(df_1.index)]
print(df_1)
    id       Name Sector
2    3       Jane      A
3    4       Kyle      A
7    8      Peter      B
5    6        Ken      B
11  12      Simon      C
9   10        Tom      C
12  13  Stephanie      D
15  16      David      D
19  20      James      E
17  18        Kit      E
print(df_2)
    id    Name Sector
0    1    John      A
1    2  Steven      A
4    5  Ashley      B
6    7     Tom      B
8    9  Elaine      C
10  11    Adam      C
13  14     Jan      D
14  15  Marsha      D
16  17    Drew      E
18  19   Corey      E
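A quick sanity check on the split (a sketch using the frames above):
assert len(df_1) + len(df_2) == len(df)           # nothing lost
assert df_1['Sector'].value_counts().eq(2).all()  # two rows sampled per sector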
Here is a "funky" method, using sequential numbering and random sampling:
df['grp'] = df.groupby('Sector')['Sector']\
              .transform(lambda x: x.notna().cumsum().sample(frac=1))
dd = dict(tuple(df.groupby('grp')))
Output:
dd[1]
   id    Name Sector  grp
0   1    John      A    1
4   5     Ken      B    1
6   7  Elaine      C    1
dd[2]
   id    Name Sector  grp
2   3    Jane      A    2
5   6     Tom      B    2
7   8     Tom      C    2
dd[3]
   id    Name Sector  grp
1   2  Steven      A    3
3   4  Ashley      B    3
8   9    Adam      C    3
Details:
Create a sequence of numbers in each sector group starting from 1, then randomize that numbering within the group to create a grouping key, grp.
Use grp to groupby, then create a dictionary with one entry per value of grp.
Here's my way: you can groupby Sector and randomly select from each group with a loop, using the sample function:
for x, i in df.groupby('Sector'):
    print(i.sample())
If you need multiple random selections, pass sample the number of items you want. For example:
for x, i in df.groupby('Sector'):
    print(i.sample(2))
will return 2 random values from each group.
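To get all the way to the original goal (k separate DataFrames, each holding an even share of every sector), one approach is to shuffle within each sector and deal the rows out round-robin. A minimal sketch; split_evenly is a hypothetical helper, not a pandas function, and it assumes the column is named 'Sector':
import numpy as np

def split_evenly(df, k, seed=None):
    rng = np.random.default_rng(seed)
    # random rank within each sector, dealt out round-robin across k splits
    grp = (df.groupby('Sector')['Sector']
             .transform(lambda s: rng.permutation(len(s)) % k))
    return [part for _, part in df.groupby(grp)]

parts = split_evenly(df, k=2, seed=0)  # two frames with sectors evenly spread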

I want to get the relative index of a column in a pandas dataframe

I want to make a new column containing the 5-day return for a stock, say. I am using a pandas DataFrame. I computed a moving average using the rolling_mean function, but I'm not sure how to reference rows like I would in a spreadsheet (B6-B1, for example). Does anyone know how I can do this index referencing and subtraction?
sample data frame:
day  price  5-day-return
1    10     -
2    11     -
3    15     -
4    14     -
5    12     -
6    18     I want to find this ((day 6 price) - (day 1 price))
7    20     then continue this down the list
8    19
9    21
10   22
Are you wanting this:
In [10]:
df['5-day-return'] = (df['price'] - df['price'].shift(5)).fillna(0)
df
Out[10]:
   day  price  5-day-return
0    1     10             0
1    2     11             0
2    3     15             0
3    4     14             0
4    5     12             0
5    6     18             8
6    7     20             9
7    8     19             4
8    9     21             7
9   10     22            10
shift returns the value at a given row offset; we subtract that from the current row. fillna fills the NaN values that occur before the first valid calculation.
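Equivalently, diff does the shift-and-subtract in one call (a small sketch on the same frame):
df['5-day-return'] = df['price'].diff(5).fillna(0)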
