How to apply my function to the first row of a dataframe? - python

def calcScore(p):
    if p[0] > p[1]:
        x = 3
        y = 0
    elif p[0] == p[1]:
        x = 1
        y = 1
    else:
        x = 0
        y = 3
    return x, y
How would I apply this function to the first row of my dataframe?
I know how to apply it to the whole dataframe, but I can't seem to apply it to the first row only. Below is what I did with the whole dataframe. I am new to Python, so please forgive silly mistakes. Thank you. :)
result =(prem[['FTHG','FTAG']].apply(calcScore, axis = 1))
print(result)

apply is for applying a function to all rows or columns. If you just want the first row, you can select it and call the function directly:
result = calcScore(prem[['FTHG', 'FTAG']].iloc[0])
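As a quick check (a minimal sketch with made-up data; the real prem dataframe is assumed to have integer FTHG and FTAG columns):
import pandas as pd

# compact version of calcScore from the question above
def calcScore(p):
    if p[0] > p[1]:
        return 3, 0
    elif p[0] == p[1]:
        return 1, 1
    return 0, 3

# hypothetical stand-in for the real prem dataframe
prem = pd.DataFrame({'FTHG': [2, 1, 0], 'FTAG': [0, 1, 3]})

print(calcScore(prem[['FTHG', 'FTAG']].iloc[0]))  # home side won the first match, so this prints (3, 0)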


Apply custom function to entire dataframe

I have a function which calls another one.
The objective is, by calling the function get_substr, to extract a substring based on the position of the nth occurrence of a character.
def find_nth(string, char, n):
    start = string.find(char)
    while start >= 0 and n > 1:
        start = string.find(char, start + len(char))
        n -= 1
    return start

def get_substr(string, char, n):
    if n == 1:
        return string[0:find_nth(string, char, n)]
    else:
        return string[find_nth(string, char, n-1) + len(char):find_nth(string, char, n)]
The function works.
Now I want to apply it on a dataframe by doing this.
df_g['F'] = df_g.apply(lambda x: get_substr(x['EQ'],'-',1))
I get an error:
KeyError: 'EQ'
I don't understand it as df_g['EQ'] exists.
Can you help me?
Thanks
You forgot about axis=1; without it the function is applied to each column rather than each row. Consider this simple example:
import pandas as pd
df = pd.DataFrame({'A':[1,2],'B':[3,4]})
df['Z'] = df.apply(lambda x:x['A']*100,axis=1)
print(df)
output
A B Z
0 1 3 100
1 2 4 200
As a side note, if you are working with values from a single column you might use pandas.Series.apply rather than pandas.DataFrame.apply; in the above example it would mean
df['Z'] = df['A'].apply(lambda x:x*100)
in place of
df['Z'] = df.apply(lambda x:x['A']*100,axis=1)
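Applied to the original question, either form should work (a sketch, assuming df_g['EQ'] holds strings such as 'AB-CD-EF'):
# Series.apply on the single column
df_g['F'] = df_g['EQ'].apply(lambda x: get_substr(x, '-', 1))
# or DataFrame.apply with the missing axis=1
df_g['F'] = df_g.apply(lambda x: get_substr(x['EQ'], '-', 1), axis=1)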

Python List in List trouble

Before I start to explain my problem, sorry for my grammar; my English is not very good. I'm a Python learner. Today I was working on a project but I ran into trouble. I'm trying to write a loop.
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
Here is my list. I'm trying to create a loop that subtracts each first value from the next first value, and each second value from the next second value, then prints the results. To explain more simply: given [[x,y],[x1,y1],[x2,y2]], I need to print x1-x and y1-y, then x2-x1 and y2-y1, and so on, so the console output should look like this:
1,1
1,2
2,1...
Method I've tried:
while True:
    for x, y in coordinates:
        x = x - y
        print(x)
This did not work because it subtracts the y values from the x values. I know it's quite wrong.
I've researched on the internet but I did not understand this subject very well.
I'm looking for help. Thanks everyone.
A simple and naive implementation
def pr(arr):
    i = 1
    while i < len(arr):
        (x, y) = arr[i]
        (a, b) = arr[i-1]
        print(x - a, y - b)
        i += 1

if __name__ == '__main__':
    coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
    pr(coordinates)
O/P:
1 1
1 2
2 1
2 1
-6 -5
This is fairly similar to your original code:
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
x_prev = None
for x, y in coordinates:
    if x_prev is not None:
        print('{}, {}'.format(x - x_prev, y - y_prev))
    x_prev, y_prev = x, y
If you want to generalize a bit, for different lengths of coordinates, you could do this:
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
prev = None
for c in coordinates:
    if prev is not None:
        print(', '.join(str(c2 - c1) for c1, c2 in zip(prev, c)))
    prev = c
You need to iterate over the list using the range function so that you can access the current and next elements together and do the subtraction inside the loop.
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
for i in range(len(coordinates) - 1):
    print(coordinates[i+1][0] - coordinates[i][0], coordinates[i+1][1] - coordinates[i][1])
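As an alternative sketch (a common idiom, not from the answers above), you can pair each coordinate with the next one using zip and avoid index arithmetic entirely:
coordinates = [[1,2],[2,3],[3,5],[5,6],[7,7],[1,2]]
# zip the list with itself shifted by one to get (previous, current) pairs
for (x0, y0), (x1, y1) in zip(coordinates, coordinates[1:]):
    print(x1 - x0, y1 - y0)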

using previous row value by looping through index conditioning

If I have a dataframe with a column x,
I want to make a new column x_new, but I want the first row of this new column to be set to a specific number (let's say -2).
Then, from the 2nd row on, use the previous row's value in the cx function.
import pandas as pd

data = {'x': [1,2,3,4,5]}
df = pd.DataFrame(data)

def cx(x):
    if df.loc[1,'x_new'] == 0:
        df.loc[1,'x_new'] = -2
    else:
        x_new = -10*x + 2
    return x_new

df['x_new'] = cx(df['x'])
The final dataframe
I am not sure how to do this.
Thank you for your help.
This is what I have so far:
data = {'depth': [1,2,3,4,5]}
df = pd.DataFrame(data)
df

# calculate equation
def depth_cal(d):
    z = -3*d + 1  # d must be previous row
    return z

depth_cal = depth_cal(df['depth'])  # how to set d as previous row?
print(depth_cal)
depth_new = []
for row in df['depth']:
    if row == 1:
        depth_new.append('-5.63')
    else:
        depth_new.append(depth_cal)  # Does not put list in a column
df['Depth_correct'] = depth_new
correct output:
There are still two problems with this:
1. It does not put the depth_cal list properly into the column.
2. In the depth_cal function, I want d to be the previous row.
Thank you
I would do this by just using a loop to generate your new data; it might not be ideal if the data is particularly huge, but it's a quick operation. Let me know how you get on with this:
data = {'depth': [1,2,3,4,5]}
df = pd.DataFrame(data)

res = data['depth']
res[0] = -5.63
for i in range(1, len(res)):
    res[i] = -3 * res[i-1] + 1

df['new_depth'] = res
print(df)
print(df)
To get
depth new_depth
0 1 -5.63
1 2 17.89
2 3 -52.67
3 4 159.01
4 5 -476.03
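The same recurrence can also be expressed with itertools.accumulate instead of an index loop (just an alternative sketch, not part of the original answer):
import itertools
import pandas as pd

df = pd.DataFrame({'depth': [1, 2, 3, 4, 5]})

# accumulate carries the previous result forward: next = -3 * previous + 1
seed = -5.63
df['new_depth'] = list(itertools.accumulate(
    [seed] + [None] * (len(df) - 1),   # placeholders; only the seed value is used
    lambda prev, _: -3 * prev + 1))
print(df)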

Creation a tridiagonal block matrix in python [duplicate]

This question already has answers here:
Block tridiagonal matrix python
(9 answers)
Closed 5 years ago.
How can I create this matrix using Python?
I've already created S, T, X, W, Y and Z, as well as the first and the last line of L.
Something like this (it's a draft!). Make a class that stores 3 lists (diagonal, upper diagonal and lower diagonal) and expose a way of editing those values.
class TBMatrix:
    def __init__(self, size):
        self._size = size  # has to be square I guess?
        self._diagonal = [None for i in range(0, size)]
        self._upper_diagonal = [None for i in range(0, size - 1)]
        self._lower_diagonal = [None for i in range(0, size - 1)]

    def get(self, row, col):
        if row == col:
            return self._diagonal[row]
        if row == col - 1:  # just above the main diagonal
            return self._upper_diagonal[row]
        if row == col + 1:  # just below the main diagonal
            return self._lower_diagonal[col]
        return 0  # or None, if you want a matrix that contains objects

    def set(self, row, col, value):
        if row == col:
            self._diagonal[row] = value
        elif row == col - 1:
            self._upper_diagonal[row] = value
        elif row == col + 1:
            self._lower_diagonal[col] = value
        else:
            # No effect, maybe you want to throw an exception?
            pass
This is a quick draft; you'll need to add a few checks to make sure nothing assigns an index outside of the list sizes. But this should get you started.
Another alternative is to override __getitem__ and __setitem__ to return a row full of 0's or None except where it needs to hold a spot for self._diagonal, self._upper_diagonal and self._lower_diagonal. But that just seems more complicated.
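Since the question is about assembling a block tridiagonal matrix from blocks that are already computed (S, T, X, W, Y, Z), a dense NumPy assembly along these lines may also help; this is only a sketch and assumes all blocks are square and the same size:
import numpy as np

def block_tridiag(diag_blocks, upper_blocks, lower_blocks):
    # diag_blocks: n square blocks; upper_blocks and lower_blocks: n-1 blocks each
    b = diag_blocks[0].shape[0]   # block size
    n = len(diag_blocks)          # number of diagonal blocks
    M = np.zeros((n * b, n * b), dtype=diag_blocks[0].dtype)
    for i in range(n):
        M[i*b:(i+1)*b, i*b:(i+1)*b] = diag_blocks[i]
        if i < n - 1:
            M[i*b:(i+1)*b, (i+1)*b:(i+2)*b] = upper_blocks[i]
            M[(i+1)*b:(i+2)*b, i*b:(i+1)*b] = lower_blocks[i]
    return M

# hypothetical usage with 2x2 blocks standing in for the real S, T, W
S = np.eye(2); T = 2 * np.eye(2); W = -np.eye(2)
L = block_tridiag([S, S, S], [T, T], [W, W])
print(L)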

Pandas every nth row

Dataframe.resample() works only with timeseries data. I cannot find a way of getting every nth row from non-timeseries data. What is the best method?
I'd use iloc, which takes a row/column slice, both based on integer position and following normal python syntax. If you want every 5th row:
df.iloc[::5, :]
Though #chrisb's accepted answer does answer the question, I would like to add to it the following.
A simple method I use to get the nth data or drop the nth row is the following:
df1 = df[df.index % 3 != 0] # Excludes every 3rd row starting from 0
df2 = df[df.index % 3 == 0] # Selects every 3rd row starting from 0
This arithmetic based sampling has the ability to enable even more complex row-selections.
This assumes, of course, that you have an index column of ordered, consecutive, integers starting at 0.
There is an even simpler solution to the accepted answer that involves directly invoking df.__getitem__.
df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df
a b c
0 x x x
1 x x x
2 x x x
3 x x x
4 x x x
For example, to get every 2nd row, you can do
df[::2]
a b c
0 x x x
2 x x x
4 x x x
There's also GroupBy.first/GroupBy.head, you group on the index:
df.index // 2
# Int64Index([0, 0, 1, 1, 2], dtype='int64')
df.groupby(df.index // 2).first()
# Alternatively,
# df.groupby(df.index // 2).head(1)
a b c
0 x x x
1 x x x
2 x x x
The index is floor-divved by the stride (2, in this case). If the index is non-numeric, instead do
# df.groupby(np.arange(len(df)) // 2).first()
df.groupby(pd.RangeIndex(len(df)) // 2).first()
a b c
0 x x x
1 x x x
2 x x x
Adding reset_index() to metastableB's answer allows you to only need to assume that the rows are ordered and consecutive.
df1 = df[df.reset_index().index % 3 != 0] # Excludes every 3rd row starting from 0
df2 = df[df.reset_index().index % 3 == 0] # Selects every 3rd row starting from 0
df.reset_index().index will create an index that starts at 0 and increments by 1, allowing you to use the modulo easily.
I had a similar requirement, but I wanted the nth item within a particular group. This is how I solved it.
groups = data.groupby(['group_key'])
selection = groups['index_col'].apply(lambda x: x % 3 == 0)
subset = data[selection]
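If you literally need the single nth row of each group rather than every nth row within it, pandas also provides GroupBy.nth (a sketch with made-up data, since the original group_key and index_col columns are not shown):
import pandas as pd

data = pd.DataFrame({'group_key': ['a', 'a', 'a', 'b', 'b', 'b'],
                     'value': [10, 11, 12, 20, 21, 22]})

# third row (zero-based position 2) of each group
print(data.groupby('group_key').nth(2))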
A solution I came up with when using the index was not viable (possibly the multi-gig .csv was too large, or I missed some technique that would have allowed me to reindex without crashing).
Instead, walk through the file one row at a time and add every nth row to a new dataframe.
import pandas as pd
from csv import DictReader

def make_downsampled_df(filename, interval):
    with open(filename, 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        column_names = csv_dict_reader.fieldnames
        df = pd.DataFrame(columns=column_names)
        for index, row in enumerate(csv_dict_reader):
            if index % interval == 0:
                print(str(row))
                # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
                df = df.append(row, ignore_index=True)
    return df
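A lighter-weight alternative (just a sketch, not from the original answer) is to let pandas skip rows while reading, using the callable form of skiprows:
import pandas as pd

def read_every_nth(filename, interval):
    # keep the header (file row 0) and every interval-th data row after it
    return pd.read_csv(filename,
                       skiprows=lambda i: i != 0 and (i - 1) % interval != 0)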
df.drop(labels=df[df.index % 3 != 0].index, axis=0) # every 3rd row (mod 3)
