I have a DataFrame of numbers and would like to multiply each observation row-wise (along axis=1) and output the answer in another column. As an example:
import pandas as pd
import numpy as np
arr = np.array([2, 3, 4])
df = pd.DataFrame(arr).transpose()
df
What I would like is a column that has value 24 from multiplying column 0 by column 1 by column 2.
I tried the df.mul(axis = 1) but that didn't work.
I'm sure this is easy but all I find is multiplying each column by a constant.
This is prod:
df.prod(axis=1)
Out[69]:
0 24
dtype: int32
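Since the question asks for the result in another column, the row-wise product can be assigned directly. A minimal sketch using the same frame as above:
import pandas as pd
import numpy as np

arr = np.array([2, 3, 4])
df = pd.DataFrame(arr).transpose()

# row-wise product of all columns, stored in a new column
df['result'] = df.prod(axis=1)
print(df)
#    0  1  2  result
# 0  2  3  4      24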
Try something like this:
import numpy

def multiplyFunction(row):
    return numpy.prod(row)

df['result'] = df.apply(multiplyFunction, axis=1)
df.head()
Result:
   0  1  2  result
0  2  3  4      24
Let me know if it helps.
I have this dataset
In [4]: df = pd.DataFrame({'A':[1, 2, 3, 4, 5]})
In [5]: df
Out[5]:
A
0 1
1 2
2 3
3 4
4 5
I want to add a new column to the dataset based on the last value of the item, like this:
A  New Column
1
2  1
3  2
4  3
5  4
I tried to use apply with iloc, but it didn't work.
Can you help?
Thank you
With the samples you have shown, you could try the following. The shift function builds the new column by moving all elements of the given column down by one, leaving a NaN in the first element.
import pandas as pd
df['New_Col'] = df['A'].shift()
OR
In case you would like to fill the NaNs with zeros, try the following; the approach is the same as above.
import pandas as pd
df['New_Col'] = df['A'].shift().fillna(0)
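For completeness, a small end-to-end sketch with the sample frame from the question; note that New_Col comes out as float because shift introduces a NaN before the fill:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
df['New_Col'] = df['A'].shift().fillna(0)
print(df)
#    A  New_Col
# 0  1      0.0
# 1  2      1.0
# 2  3      2.0
# 3  4      3.0
# 4  5      4.0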
I have a pandas Series instance defined as follows:
import pandas as pd
timestamps = [1,2,3,4,5,6,7,8,9,10]
quantities = [1,9,6,6,6,4,4,4,5,2]
series = pd.Series(quantities, index=timestamps)
Is it possible to supply an array of index values and retrieve the quantities at them? And if it is, what's the fastest way of achieving this, please?
For example, if I supply:
timestamps = [1,1,1,4]
I expect the following back from series:
quantities = [1,1,1,6]
Thanks for any help here.
It is possible:
>>> series[[1,1,1,4]]
1 1
1 1
1 1
4 6
dtype: int64
>>> series[[1,1,1,4]].values
array([1, 1, 1, 6])
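On newer pandas versions (0.24+ for to_numpy), the same lookup is usually spelled with the explicit label-based indexer and to_numpy() instead of .values; an equivalent sketch:
>>> series.loc[[1, 1, 1, 4]].to_numpy()
array([1, 1, 1, 6])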
Suppose I have a pandas dataframe given by
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5,2))
df
0 1
0 0.264053 -1.225456
1 0.805492 -1.072943
2 0.142433 -0.469905
3 0.758322 0.804881
4 -0.281493 0.602433
I want to return a Series object with 4 rows, containing max(df[0,0], df[1,1]), max(df[1,0], df[2,1]), max(df[2,0], df[3,1]), max(df[3,0], df[4,1]). More generally, what is the best way to compare the max of column 0 and column 1 offset by n rows?
Thanks.
You want to apply max to rows after having shifted the first column.
pd.concat([df.iloc[:, 0].shift(), df.iloc[:, 1]], axis=1).apply(max, axis=1).dropna()
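If performance is a concern, the same comparison can also be done without apply by taking the element-wise maximum of the shifted column; a small sketch of that alternative (np.maximum propagates the leading NaN, so dropna still trims the first row):
import numpy as np

# max(df[i-1, 0], df[i, 1]) for each row i, then drop the NaN produced by the shift
result = np.maximum(df.iloc[:, 0].shift(), df.iloc[:, 1]).dropna()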
So I got a pandas DataFrame with a single column and a lot of data.
I need to access each of the elements, not to change them (with apply()) but to parse them into another function.
When looping through the DataFrame, it always stops after the first one.
If I convert it to a list first, my numbers are all wrapped in brackets (e.g. [12] instead of 12), which breaks my code.
Does anyone see what I am doing wrong?
import pandas as pd
def go_trough_list(df):
    for number in df:
        print(number)
df = pd.read_csv("my_ids.csv")
go_trough_list(df)
df looks like:
1
0 2
1 3
2 4
dtype: object
[Finished in 1.1s]
Edit: I found one mistake. My first value is recognized as a header.
So I changed my code to:
df = pd.read_csv("my_ids.csv",header=None)
But with
for ix in df.index:
    print(df.loc[ix])
I get:
0 1
Name: 0, dtype: int64
0 2
Name: 1, dtype: int64
0 3
Name: 2, dtype: int64
0 4
Name: 3, dtype: int64
Edit: here is my solution, thanks to jezrael and Nick!
First I added header=None because my data has no header.
Then I changed my function to:
def go_through_list(df):
    new_list = df[0].apply(my_function, parameter=par1)
    return new_list
And it works perfectly! Thank you again guys, problem solved.
You can use the index as in other answers, and also iterate through the df and access the row like this:
for index, row in df.iterrows():
    print(row['column'])
However, I suggest solving the problem differently if performance is of any concern. Also, if there is only one column, it is more correct to use a pandas Series.
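For example, a single-column frame can be reduced to a Series first, after which plain iteration (or apply) sees scalars rather than one-element rows; a minimal sketch:
s = df.iloc[:, 0]   # the single column as a Series
for value in s:
    print(value)    # plain scalar values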
What do you mean by "parse it into another function"? Perhaps take the value, do something to it, and put the result into another column?
"I need to access each of the elements, not to change them (with apply()) but to parse them into another function."
Perhaps this example will help:
import pandas as pd

df = pd.DataFrame([20, 21, 12])

def square(x):
    return x**2

df['new_col'] = df[0].apply(square)  # can use a lambda here nicely
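The lambda variant mentioned in the comment would simply be:
df['new_col'] = df[0].apply(lambda x: x**2)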
You can convert the column to a list with tolist:
for x in df['Colname'].tolist():
    print x
Sample:
import pandas as pd
df = pd.DataFrame({'a': pd.Series([1, 2, 3]),
                   'b': pd.Series([4, 5, 6])})
print df
a b
0 1 4
1 2 5
2 3 6
for x in df['a'].tolist():
    print x
1
2
3
If you have only one column, use iloc to select the first column:
for x in df.iloc[:,0].tolist():
    print x
Sample:
import pandas as pd
df = pd.DataFrame({1: pd.Series([2, 3, 4])})
print df
1
0 2
1 3
2 4
for x in df.iloc[:,0].tolist():
    print x
2
3
4
This can work too, but it is not a recommended approach, because 1 can be a number or a string and it can raise a KeyError:
for x in df[1].tolist():
    print x
2
3
4
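To make that concrete, here is a hypothetical variation of the sample where the column label is the string '1' rather than the integer 1; df[1] then raises a KeyError, while positional selection still works:
import pandas as pd

df = pd.DataFrame({'1': pd.Series([2, 3, 4])})   # column label is the string '1'
# df[1]  # would raise KeyError: 1
for x in df.iloc[:, 0].tolist():
    print(x)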
Say you have one column named 'myColumn', and you have an index on the dataframe (which is automatically created with read_csv). Try using the .loc function:
for ix in df.index:
    print(df.loc[ix]['myColumn'])
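If you only need the scalar value, the label-based scalar accessor .at avoids building an intermediate row and is usually a bit faster; a sketch with the same hypothetical 'myColumn' name, assuming a reasonably recent pandas:
for ix in df.index:
    print(df.at[ix, 'myColumn'])   # scalar lookup by row label and column name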
I have a Pandas Series where each element of the series is a one row Pandas DataFrame which I would like to append together into one big DataFrame. For example:
import pandas as pd
import numpy

mySeries = pd.Series(numpy.arange(start=1, stop=5, step=1))

def myFun(val):
    return pd.DataFrame({'square': [val**2],
                         'cube': [val**3]})

## returns a Pandas Series where each element is a single row dataframe
myResult = mySeries.apply(myFun)
so how do I take myResult and combine all the little dataframes into one big dataframe?
import pandas as pd
import numpy as np
mySeries = pd.Series(np.arange(start=1, stop=5, step=1))
def myFun(val):
    return pd.Series([val ** 2, val ** 3], index=['square', 'cube'])
myResult = mySeries.apply(myFun)
print(myResult)
yields
square cube
0 1 1
1 4 8
2 9 27
3 16 64
Concatenate them:
In [58]: pd.concat(myResult).reset_index(drop=True)
Out[58]:
cube square
0 1 1
1 8 4
2 27 9
3 64 16
Since the original indexes are all 0, I also reset them.
It seems overly complicated, although you probably posted a simplified example. Creating a new Series for each row creates a lot of overhead. This, for example, is over 200 times faster (for n=500) on my machine:
myResult = pd.DataFrame({'square': mySeries**2, 'cube': mySeries**3})