why won't pandas equations use the values in the columns - python

I have a dataframe with numbers; they print fine with the print command, so I know they are in the dataframe. But when I run my equations and my conditional, the values don't seem to be used.
import pandas as pd
data = pd.read_excel('Cam_practice1.xlsx')
df = pd.DataFrame(data, columns = ['x_block', 'y_block'])
print(df)
equation_x = ((df.x_block))**2
equation_y = ((df.y_block))**2
eq = equation_x + equation_y
if eq <= 4:
    df.to_csv('gridoutput.csv')
What I want is this: whenever the value of the complete formula eq is less than or equal to 4, that row should be written to a new output file. Where am I going wrong?

You can do this:
equation_x = ((df.x_block))**2
equation_y = ((df.y_block))**2
eq = equation_x + equation_y
df[eq<=4].to_csv('gridoutput.csv')
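Here eq <= 4 evaluates to a boolean Series, and indexing the dataframe with that mask keeps only the rows where the condition holds, so no if statement is needed (an if on a whole Series is exactly what raises an error in the original code). For reference, an equivalent sketch using an explicit mask and df.loc, with the same column names as in the question:
mask = df['x_block']**2 + df['y_block']**2 <= 4
df.loc[mask].to_csv('gridoutput.csv')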

Related

How to do a multiplication of two different columns and rows

How can I reproduce in Python this calculation that I made in Excel?
I want to build the column 'Acumulado': multiply the previous row's accumulated value by that row's 'Selic por dia', put the result in that row, and repeat successively down the column.
import pandas as pd
# Creating the dataframe
df = pd.DataFrame({"Data": ['06/03/2006', '07/03/2006', '08/03/2006', '09/03/2006', '10/03/2006',
                            '13/03/2006', '14/03/2006', '15/03/2006', '16/03/2006', '17/03/2006'],
                   "Taxa SELIC": [17.29, 17.29, 17.29, 16.54, 16.54, 16.54, 16.54, 16.54, 16.54, 16.54]})
df['Taxa Selic %'] = df['Taxa SELIC'] / 100
df['Selic por dia'] = (1 + df['Taxa SELIC'])**(1/252)
(Screenshots in the original post: the resulting dataframe, the calculation done in Excel, and a second example of how I would like the result to look.)
Not an efficient method, but you can try this:
import numpy as np

selic_per_dia = list(df['Selic por dia'].values)

# Start from 1,000,000 and multiply the running total by each day's factor
accumulado = [1000000 * selic_per_dia[0]]
for i, value in enumerate(selic_per_dia):
    if i == 0:
        continue
    accumulado.append(accumulado[i - 1] * value)
df['Acumulado'] = accumulado

# Prepend a row holding the initial 1,000,000, then restore the index order
df.loc[-1] = [np.nan, np.nan, np.nan, np.nan, 1000000]
df.index = df.index + 1
df = df.sort_index()
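If you prefer a vectorised version, a sketch that replaces the loop above (assuming the same starting capital of 1,000,000) uses cumprod:
# Each row becomes 1,000,000 times the cumulative product of 'Selic por dia'
df['Acumulado'] = 1000000 * df['Selic por dia'].cumprod()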

how to apply fsolve over pandas dataframe columns?

I'm trying to solve a system of three equations (defined in func below), and I would like to apply fsolve to every row of a pandas dataframe. How can I do that?
This is my code:
import numpy as np
import pandas as pd
import scipy.optimize as opt

a = np.linspace(300, 400, 30)
b = np.random.randint(700, 18000, 30)
c = np.random.uniform(1.4, 4.0, 30)
df = pd.DataFrame({'A': a, 'B': b, 'C': c})

def func(zGuess, *Params):
    x, y, z = zGuess
    a, b, c = Params
    eq_1 = ((3.47 - np.log10(y))**2 + (np.log10(c) + 1.22)**2)**0.5
    eq_2 = (a / 101.32) * (101.32 / b)**z
    eq_3 = 0.381 * x + 0.05 * (b / 101.32) - 0.15
    return eq_1, eq_2, eq_3

zGuess = np.array([2.6, 20.2, 0.92])
df['result'] = df.apply(lambda x: opt.fsolve(func, zGuess, args=(x['A'], x['B'], x['C'])))
But it's still not working, and I can't see the problem.
The error: KeyError: 'A'
The error basically means pandas can't find the column reference 'A'.
That's happening because apply does not operate on rows by default.
By passing 1 (i.e. axis=1) as the last argument, it will iterate over each row, where the column references 'A', 'B', ... do exist:
df['result'] = df.apply(lambda x: opt.fsolve(func, zGuess, args=(x['A'], x['B'], x['C'])), 1)
That, however, might not give the desired result, as it saves the whole output (an array) into a single column.
To get three separate columns instead, assign to the three columns you want to create and build an iterator with zip(*...):
df['output_a'], df['output_b'], df['output_c'] = zip(*df.apply(lambda x: opt.fsolve(func, zGuess, args=(x['A'], x['B'], x['C'])), 1))
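On pandas 0.23 or newer, a slightly cleaner sketch lets apply expand the returned array into a DataFrame with result_type='expand' and then joins it back (the output_* column names are just illustrative):
out = df.apply(lambda x: opt.fsolve(func, zGuess, args=(x['A'], x['B'], x['C'])),
               axis=1, result_type='expand')
out.columns = ['output_a', 'output_b', 'output_c']
df = df.join(out)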

Python Pandas rolling mean DataFrame Constructor not properly called

I am trying to create a simple time series with different rolling calculations. One specific example is a rolling mean of N periods using the pandas Python package.
I get the following error: ValueError: DataFrame constructor not properly called!
Below is my code:
def py_TA_MA(v, n, AscendType):
    df = pd.DataFrame(v, columns=['Close'])
    df = df.sort_index(ascending=AscendType)  # ascending/descending flag
    M = pd.Series(df['Close'].rolling(n), name='MovingAverage_' + str(n))
    df = df.join(M)
    df = df.sort_index(ascending=True)  # need to double-check this
    return df
Would anyone be able to advise?
Kind regards
Found the correction! It was erroring out (with a new error) because I had to explicitly declare n as an integer. With that, the code below works:
@xw.func
@xw.arg('n', numbers=int, doc='this is the rolling window')
@xw.ret(expand='table')
def py_TA_MA(v, n, AscendType):
    df = pd.DataFrame(v, columns=['Close'])
    df = df.sort_index(ascending=AscendType)  # ascending/descending flag
    M = pd.Series(df['Close'], name='Moving Average').rolling(window=n).mean()
    # df = pd.Series(df['Close']).rolling(window=n).mean()
    df = df.join(M)
    df = df.sort_index(ascending=True)  # need to double-check this
    return df
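For what it's worth, the root cause of the original ValueError is that df['Close'].rolling(n) returns a Rolling object, which the pd.Series constructor cannot handle; calling .mean() on it first yields a plain Series. A minimal self-contained sketch with made-up prices:
import pandas as pd

df = pd.DataFrame({'Close': [10, 11, 12, 13, 14, 15]})
M = df['Close'].rolling(window=3).mean().rename('MovingAverage_3')
df = df.join(M)
print(df)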

Excel xlwings data input for Python Technical Indicators

I am trying to replicate a simple technical-analysis indicator using xlwings. However, the function does not seem to be able to read the Excel values it is passed. Below is the code:
import pandas as pd
import datetime as dt
import numpy as np
import xlwings as xw

@xw.func
def EMA(df, n):
    EMA = pd.Series(pd.ewma(df['Close'], span=n, min_periods=n - 1), name='EMA_' + str(n))
    df = df.join(EMA)
    return df
When I enter a list of Excel data, =EMA({1,2,3,4,5}, 5), I get the following error message on the pd.Series line:
TypeError: list indices must be integers, not str
(Expert) help much appreciated! Thanks.
EMA() expects a DataFrame df and a scalar n, and it returns the EMA in a separate column of the source DataFrame. You are passing a simple list of values; that is not supposed to work.
Construct a DataFrame and assign the values to the Close column:
v = range(100) # use your list of values instead
df = pd.DataFrame(v, columns=['Close'])
Call EMA() with this DataFrame:
EMA(df, 5)
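Note that pd.ewma() only exists in older pandas releases. If you are on a current version, the line inside EMA() can be written with the .ewm() accessor instead (a sketch with the same span and min_periods):
EMA = df['Close'].ewm(span=n, min_periods=n - 1).mean().rename('EMA_' + str(n))
df = df.join(EMA)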

Pandas assign each row the mean of its bin

I have the following dataframe (p1.head(7)):
ColA
0 6.286333
1 3.317000
2 13.24889
3 26.20667
4 26.25556
5 60.59000
6 79.59000
7 1.361111
I can get the bin ranges using:
pandas.qcut(p1.ColA, 4)
Is there a way I can create a new column where each value corresponds to the mean value of its bin? I.e. for each bin (a, b], I want (a+b)/2.
The key here is the retbins option on qcut.
import numpy as np
import pandas

df = pandas.DataFrame(np.random.random(100) * 100, columns=['val1'])
pctiles = pandas.qcut(df['val1'], 4, retbins=True)
pctile_object = pctiles[0]
pctile_boundaries = pctiles[1]
Here pctile_object is just what qcut would return if you hadn't passed retbins=True, and pctile_boundaries is a numpy array of the interval boundaries.
import numpy
bin_halfway = pctile_boundaries[:-1] + (numpy.diff(pctile_boundaries)/2)
This gives us the halfway points of the bins.
Now we make a dataframe with just the interval names (as strings) and the halfway points.
df2 = pandas.DataFrame({'quartile boundaries': pctile_object.levels,
                        'midway point': bin_halfway})
Finally, merge the bin halfway points back into the original dataframe.
df['quartile boundaries'] = pctile_object
pandas.merge(df, df2, on='quartile boundaries')
Then you can drop quartile boundaries if you want.
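On more recent pandas (roughly 0.20 and later), qcut returns Interval categories, so a shorter sketch can read each row's bin midpoint straight from the interval's .mid attribute instead of merging:
bins = pandas.qcut(df['val1'], 4)
df['midway point'] = bins.apply(lambda interval: interval.mid)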
I wrote a function to utilize @exp1orer's logic:
def midway_quantiles(feature_series, q=4):
    import pandas as pd
    import numpy as np

    pctiles = pd.qcut(feature_series, q, retbins=True)
    pctile_object = pctiles[0]
    df1 = pd.DataFrame({"feature": feature_series, "q_bound": pctile_object})
    pctile_boundaries = pctiles[1]
    bin_halfway = pctile_boundaries[:-1] + (np.diff(pctile_boundaries) / 2)
    df2 = pd.DataFrame({"q_bound": pctile_object.cat.categories,
                        "midpoint": bin_halfway})
    df3 = pd.merge(df1, df2, on="q_bound", how="left")
    return df3["midpoint"]
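A hypothetical usage example on the p1 dataframe from the question (this assumes p1 has a default integer index, since the merge inside the function resets it):
p1['bin_midpoint'] = midway_quantiles(p1['ColA'], q=4)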
