I need to create a new column at the end of a data frame, where the values in that new column are the result of applying some function whose parameters are based on other columns. Specifically, on another column, but in a different row. So for example, if my data frame had two columns containing values x_i and y_i respectively, my third column would be f(x_(i-1), y_(i-1)).
I know that to create a new column, the easiest way would be to do something like
df['new_row'] = ...
But I'm not sure what to assign to it.
How do I do this?
Something like this? Or is your function more complicated?
print(df)
   0  1  2  3
0  1  2  3  4

df[4] = df[2] * df[3] / .3
print(df)
   0  1  2  3   4
0  1  2  3  4  40
Here's an example:
df['new_col'] = df['old_col'] * df['old_col']
Or if you wrote a custom function that took in two arrays, such as:
def f(arr1, arr2):
    new_arr = ...  # put logic here
    return new_arr
You could try:
df['new_col'] = f(df['old_col'], df['old_col2'])
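Since the question asks for values taken from the previous row, one option (just a sketch, reusing the example column names 'old_col' and 'old_col2' from above) is to pass shifted copies of the columns, so that row i sees the values from row i-1:

df['new_col'] = f(df['old_col'].shift(1), df['old_col2'].shift(1))

shift(1) moves each column down by one row and leaves NaN in the first row, so the first entry of new_col will reflect whatever f does with NaN inputs.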
Given a file named "file" containing:
1 3
3 3
43 4
2 3
the following reads each line and prints n, r, and n raised to the power r:
with open("file", 'r') as f:
    for line in f:
        n, r = line.split()
        formula = pow(int(n), int(r))
        print("{:4}{:4}{:9}".format(n, r, formula))
Output:
1   3         1
3   3        27
43  4   3418801
2   3         8
I have a dataframe and a function that I want to apply elementwise.
Of course I can iterate over the dataframe but this is very slow and I want to find a quicker way.
My dataframe:
A B C D pattern
4 4 5 6 0
6 4 1 2 0
5 2 2 1 0
5 6 7 9 0
My function takes three rows as an input and returns a value that I want to store in the pattern column of current_row.
def findPattern(previous_row, current_row, next_row):
    .....
    return "pattern"
How can I apply this function to my dataframe without iterating over it with a for loop?
Thanks for any help :)
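One possible approach (a sketch only; the real findPattern logic is not shown, so the rule below is a made-up placeholder) is to align the previous and next rows with the current one using shift(), and then express the pattern test on whole columns:

import pandas as pd

df = pd.DataFrame({'A': [4, 6, 5, 5],
                   'B': [4, 4, 2, 6],
                   'C': [5, 1, 2, 7],
                   'D': [6, 2, 1, 9]})

prev_rows = df.shift(1)    # row i-1 aligned with row i
next_rows = df.shift(-1)   # row i+1 aligned with row i

# Placeholder rule: flag rows where every value is greater than in the
# previous row and smaller than in the next row; swap in the real logic.
df['pattern'] = ((df > prev_rows).all(axis=1) &
                 (df < next_rows).all(axis=1)).astype(int)

If the pattern logic really cannot be expressed on whole columns, the fallback is a row-wise loop over the index with .iloc lookups, which is exactly the slow path the question is trying to avoid.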
I have a dataframe table which looks like the following:
How do I reconstruct this table and sum the rows where length is under 2, so that the output df looks like the following?
Please, any suggestion would be greatly appreciated.
Thanks,
Shei
Add a new column to indicate the length group:
df['len_group'] = df['length'].astype(str)
df.loc[df['length']<=2, 'len_group'] = '<=2'
Then groupby on the new column:
df.groupby('len_group')
Test Case:
df = pd.DataFrame({'length':[1,2,3,4,5], 'val':[2,3,4,5,6]})
   length  val
0       1    2
1       2    3
2       3    4
3       4    5
4       5    6
df['len_group'] = df['length'].astype(str)
df.loc[df['length']<=2, 'len_group'] = '<=2'
df_result = df.groupby('len_group')[['val']].sum()
           val
len_group
3            4
4            5
5            6
<=2          5
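As a side note (not part of the original answer), the two assignment lines can be collapsed into one with numpy's where, assuming numpy is imported:

import numpy as np

# Where length <= 2 use the '<=2' label, otherwise keep the length as a string.
df['len_group'] = np.where(df['length'] <= 2, '<=2', df['length'].astype(str))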
I would like to know whether I can get some help in "translating" a multi-dimensional list into a single column of a pandas frame.
I found help here for translating a multi-dimensional list into a frame with multiple columns, but I need to translate the data into one column.
Suppose I have the following list of lists:
x=[[1,2,3],[4,5,6]]
If I create a frame, I get
frame = pd.DataFrame(x)
   0  1  2
0  1  2  3
1  4  5  6
But my desired outcome should be
0
1
2
3
4
5
6
with the zero as column header.
I can of course get the result with a for loop, which from my point of view takes a lot of time. Is there any pythonic/pandas way to get it?
Thanks for helping me.
You can use np.concatenate:
import numpy as np
import pandas as pd

x = [[1, 2, 3], [4, 5, 6]]
frame = pd.DataFrame(np.concatenate(x))
print(frame)
Output:
0
0 1
1 2
2 3
3 4
4 5
5 6
First, it is necessary to flatten the values of the nested lists and pass them to the DataFrame constructor:
df = pd.DataFrame([z for y in x for z in y])
Or:
from itertools import chain
df = pd.DataFrame(list(chain.from_iterable(x)))
print (df)
0
0 1
1 2
2 3
3 4
4 5
5 6
If you use numpy, you can use the ravel() method:
pd.DataFrame(np.array(x).ravel())
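For completeness, a self-contained run of that one-liner with the same x as above; note that np.array(x) only gives a regular 2-D array to ravel when the sublists have equal lengths:

import numpy as np
import pandas as pd

x = [[1, 2, 3], [4, 5, 6]]
# ravel() flattens the 2x3 array row by row into a 1-D array.
print(pd.DataFrame(np.array(x).ravel()))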
So I have an extremely simple dataframe:
values
1
1
1
2
2
I want to add a new column and, for each row, assign the count of occurrences of its value, so the table would look like:
values unique_sum
1 3
1 3
1 3
2 2
2 2
I have seen some examples in R, but for python and pandas I have not come across anything and am stuck. I can list the value counts using .value_counts(), and I have tried groupby routines but cannot fathom it.
Just use map to map your column onto its value_counts:
>>> x
   A
0  1
1  1
2  1
3  2
4  2
>>> x['unique'] = x.A.map(x.A.value_counts())
>>> x
   A  unique
0  1       3
1  1       3
2  1       3
3  2       2
4  2       2
(I named the column A instead of values. values is not a great choice for a column name, because DataFrames have a special attribute called values, which prevents you from getting the column with x.values --- you'd have to use x['values'] instead.)
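To illustrate that last point (a sketch using the asker's original column name, not part of the answer above), bracket indexing works fine even though attribute access does not:

import pandas as pd

df = pd.DataFrame({'values': [1, 1, 1, 2, 2]})
# df.values is the underlying array, so the column must be accessed as df['values'].
df['unique_sum'] = df['values'].map(df['values'].value_counts())
print(df)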
Given a DataFrame like this:
>>> df
   0  1  2
0  2  3  5
1  3  4  7
and a function that returns multiple results, like this:
def sumprod(x, y, z):
    return x+y+z, x*y*z
I want to add new columns, so the result would be:
>>> df
   0  1  2  sum  prod
0  2  3  5   10    30
1  3  4  7   14    84
I have been successful with functions that return one result:
df["sum"] = df.apply(sum, axis=1)
but not if it returns more than one result.
One way to do this is to pass the columns of the DataFrame to the function by unpacking the transpose of the array:
>>> df['sum'], df['prod'] = sumprod(*df.values.T)
>>> df
   0  1  2  sum  prod
0  2  3  5   10    30
1  3  4  7   14    84
sumprod returns a tuple of columns and, since Python supports multiple assignment, you can assign them to new column labels as above.
You could write df['sum'], df['prod'] = sumprod(df[0], df[1], df[2]) to get the same result. This is clearer and is preferable if you need to pass the columns to the function in a particular order. On the other hand, it's a lot more verbose if you have a lot of columns to pass to the function.
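Putting the pieces together, a self-contained version of the approach above, with the same data and function as in the question:

import pandas as pd

df = pd.DataFrame([[2, 3, 5], [3, 4, 7]])

def sumprod(x, y, z):
    return x + y + z, x * y * z

# Unpacking the transposed values passes each column as one argument.
df['sum'], df['prod'] = sumprod(*df.values.T)
print(df)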