I'm getting the following error when calling a function from another function:
TypeError: 'GLMResultsWrapper' object is not callable
I get the error at the coeffs = model_results(model_results) line below.
That line calls another function, which runs error-free outside of the table_to_graph function. The model_results function takes the summary output from a statsmodels model and puts it into a data frame.
The table_to_graph function joins that dataframe to another table, the df in the input. The table_to_graph function is below.
The full function is the following:
# Add into table generation table
def table_to_graph(model_results, df):
    '''
    function that combines rating tables and model summary
    '''
    coeffs = model_results(model_results)
    try:
        df['key'] = df['variable'] + "_" + df['level']
        df = pd.merge(df, coeffs, left_on='key', right_on='index', how='left')
        df['factor'] = np.exp(df['factor'])
        df['factor'].fillna(1, inplace=True)
        df['error_up'] = np.exp(df['error_up'])
        df['error_down'] = np.exp(df['error_down'])
        #title2 = title1
        df = df[['model', 'variable', 'level', 'total_incurred', 'total_count', 'cmeu', 'factor', 'error_up', 'error_down',
                 'pricing_model_1_p_values']]
        return df
        #df1 = df1.append(df)
    except:
        #df['level'] = df['level'].astype('str')
        df['key'] = df['variable'] + "_" + df['level'].astype('str')
        df['level'] = df['level'].astype('int')
        df = pd.merge(df, coeffs, left_on='key', right_on='index', how='left')
        df['factor'] = np.exp(df['factor'])
        df['factor'].fillna(1, inplace=True)
        df['error_up'] = np.exp(df['error_up'])
        df['error_down'] = np.exp(df['error_down'])
        df = df[['model', 'variable', 'level', 'total_incurred', 'total_count', 'cmeu', 'factor', 'error_up',
                 'error_down', 'pricing_model_1_p_values']]
        #df1 = df1.append(df)
        return df
The model_results function is below:
def model_results(model_results):
    '''
    function that puts model parameters into a data frame
    '''
    df = pd.DataFrame(model_results.params, columns=['factor'])
    df['error_down'] = model_results.conf_int()[0]
    df['error_up'] = model_results.conf_int()[1]
    df['standard_error'] = model_results.bse
    df['pvalues'] = round(model_results.pvalues, 3)
    df.reset_index(inplace=True)
    return df
The problem is that you are not calling the function you defined as model_results. Inside table_to_graph, the parameter named model_results shadows the function of the same name, so model_results(model_results) tries to "call" the fitted-model results object on itself, and a GLMResultsWrapper is not callable. This is why you get the error that the object is not callable.
Change either the function name or the name of the model_results parameter to something else. This lets Python distinguish between the two and do what you want: call the function model_results on the model_results data.
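For example, renaming the parameter to results (a trimmed sketch; the name is arbitrary and the rest of the body stays exactly as posted above):
def table_to_graph(results, df):
    '''
    function that combines rating tables and model summary
    '''
    coeffs = model_results(results)  # model_results now unambiguously refers to the function
    df['key'] = df['variable'] + "_" + df['level'].astype('str')
    df = pd.merge(df, coeffs, left_on='key', right_on='index', how='left')
    # ... rest of the body unchanged ...
    return df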
Does a pandas.Series instance know its parent pandas.DataFrame when it comes from one?
Example:
import pandas
df = pandas.DataFrame({'col': range(10)})
series_column = df.col
print('My parent is {}'.format(series_column.parent))
# or
print('My parent is {}'.format(df.col.parent))
My goal is to simplify the signature of a method.
def foobar(data: DataFrame, column: str):
    return data[column].do_something()

# I would like to save one argument
def foobar(column: pandas.Series):
    return column.parent[column].do_something()
Here is a more real-world example:
def frequency(data: pandas.DataFrame, column: str, dropna: bool = False):
    tab = data[column].value_counts(dropna=dropna)
    # sort index if it is an ordered category
    if data[column].dtype.name == 'category':
        if data[column].cat.ordered:
            tab = tab.sort_index()
    # Series to DataFrame
    tab = tab.to_frame()
    # two column MultiIndex
    a = random_label()
    tab[a] = column
    tab = tab.reset_index()
    tab = tab.set_index([a, column])
    tab.index.names = (None, None)
    tab.columns = ['n']
    return tab
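For what it's worth, a Series does not keep a reference to its parent DataFrame, but when it is extracted with df[column] its .name attribute holds the column label, which is often enough to drop the extra argument. A minimal sketch of that idea (it flattens away the MultiIndex detail and is only meant to show the .name trick):
import pandas as pd

def frequency(column: pd.Series, dropna: bool = False) -> pd.DataFrame:
    # values, dtype, and the column label (.name) all live on the Series itself
    tab = column.value_counts(dropna=dropna)
    if isinstance(column.dtype, pd.CategoricalDtype) and column.cat.ordered:
        tab = tab.sort_index()
    tab = tab.to_frame(name='n')
    tab.index.name = column.name  # the label the column had in its parent DataFrame
    return tab

df = pd.DataFrame({'col': list('aabbc')})
print(frequency(df['col']))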
I am trying to add a new row to an AgGrid table using Streamlit and Python.
At this point, I just want to add one or more new rows to the table generated by AgGrid by pressing the "add row" button.
After pressing the "add row" button I mistakenly generate a second table with the new row, so I get two data tables instead of updating the main one.
The initial data df = get_data() is gathered from a SQL query. I want to add a new row and (for now) save it to a CSV file, or at least get the updated df with the new row as an output and graph it.
My current code:
import datetime

import pandas as pd
import streamlit as st
from st_aggrid import AgGrid, AgGridTheme, DataReturnMode, GridOptionsBuilder

from metrics.get_metrics import get_data
from metrics.config import PATH_SAMPLES

filename: str = 'updated_sample.csv'
save_path = PATH_SAMPLES.joinpath(filename)

def generate_agrid(data: pd.DataFrame):
    gb = GridOptionsBuilder.from_dataframe(data)
    gb.configure_default_column(editable=True)  # Make columns editable
    gb.configure_pagination(paginationAutoPageSize=True)  # Add pagination
    gb.configure_side_bar()  # Add a sidebar
    gb.configure_selection('multiple', use_checkbox=True,
                           groupSelectsChildren="Group checkbox select children")  # Enable multi-row selection
    gridOptions = gb.build()

    grid_response = AgGrid(
        data,
        gridOptions=gridOptions,
        data_return_mode=DataReturnMode.AS_INPUT,
        update_on='MANUAL',  # <- Should it let me update before returning?
        fit_columns_on_grid_load=False,
        theme=AgGridTheme.STREAMLIT,  # Add theme color to the table
        enable_enterprise_modules=True,
        height=350,
        width='100%',
        reload_data=True
    )

    data = grid_response['data']
    selected = grid_response['selected_rows']
    df = pd.DataFrame(selected)  # Pass the selected rows to a new dataframe df
    return grid_response

def onAddRow(grid_table):
    df = pd.DataFrame(grid_table['data'])
    # build one blank row whose fill value matches each column's dtype
    column_fillers = {
        column: (False if pd.api.types.is_bool_dtype(df.dtypes[column])
                 else 0 if pd.api.types.is_float_dtype(df.dtypes[column])
                 else datetime.datetime.utcnow() if pd.api.types.is_datetime64_any_dtype(df.dtypes[column])
                 else '')
        for column in df.columns
    }
    data = [column_fillers]
    df_empty = pd.DataFrame(data, columns=df.columns)
    df = pd.concat([df, df_empty], axis=0, ignore_index=True)
    grid_table = generate_agrid(df)  # <- this draws a second grid
    return grid_table

# First data gather
df = get_data()

if __name__ == '__main__':
    # Start graphing
    grid_table = generate_agrid(df)
    # add row
    st.sidebar.button("Add row", on_click=onAddRow, args=[grid_table])
Here is a minimal code sample.
import streamlit as st
import pandas as pd
from st_aggrid import AgGrid, GridOptionsBuilder, GridUpdateMode

def generate_agrid(df):
    gb = GridOptionsBuilder.from_dataframe(df)
    gb.configure_selection(selection_mode="multiple", use_checkbox=True)
    gridoptions = gb.build()

    grid_response = AgGrid(
        df,
        height=200,
        gridOptions=gridoptions,
        update_mode=GridUpdateMode.MANUAL
    )

    selected = grid_response['selected_rows']

    # Show the selected row.
    if selected:
        st.write('selected')
        st.dataframe(selected)

    return grid_response

def add_row(grid_table):
    df = pd.DataFrame(grid_table['data'])
    new_row = [['', 100]]
    df_empty = pd.DataFrame(new_row, columns=df.columns)
    df = pd.concat([df, df_empty], axis=0, ignore_index=True)

    # Save new df to sample.csv.
    df.to_csv('sample.csv', index=False)

def get_data():
    """Reads sample.csv and return a dataframe."""
    return pd.read_csv('sample.csv')

if __name__ == '__main__':
    df = get_data()
    grid_response = generate_agrid(df)
    st.sidebar.button("Add row", on_click=add_row, args=[grid_response])
Initial output: (screenshot)
Output after pressing "Add row": (screenshot)
sample.csv
team,points
Lakers,120
Celtics,130
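A note on why this stays a single table (my reading of the flow, not something the original answer states): the on_click callback runs before Streamlit reruns the script, so by the time generate_agrid is called again, sample.csv already contains the new row and only one grid is drawn. To also graph the updated data, something along these lines could go at the end of the main block (st.bar_chart and the team/points columns are assumptions based on the sample.csv above):
# after grid_response = generate_agrid(df) in the main block
updated_df = get_data()  # re-read sample.csv, including any rows added via the button
st.bar_chart(updated_df.set_index('team')['points'])  # quick chart of the updated data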
I have already created an AgGrid by loading data from a CSV file. I am adding rows one by one via an external button, but when I try to edit the row I just added, it disappears. I would be very grateful for help locating the error. The code is as follows.
import pandas as pd
import streamlit as st
from st_aggrid import AgGrid, GridUpdateMode, JsCode
from st_aggrid.grid_options_builder import GridOptionsBuilder
import sys
import os
import altair as alt
from streamlit.runtime.legacy_caching import caching

def data_upload():
    df = pd.read_csv("data.csv")
    return df

if 'grid' in st.session_state:
    grid_table = st.session_state['grid']
    df = pd.DataFrame(grid_table['data'])
    df.to_csv("data.csv", index=False)
else:
    df = data_upload()

gd = GridOptionsBuilder.from_dataframe(df)
gd.configure_column("Location", editable=True)
gd.configure_column("HourlyRate", editable=True)
gd.configure_column("CollaboratorName", editable=True)
gridOptions = gd.build()

button = st.sidebar.button("Add Line")
if "button_state" not in st.session_state:
    st.session_state.button_state = False
if button or st.session_state.button_state:
    st.session_state.button_state = True
    data = [['', '', 0]]
    df_empty = pd.DataFrame(data, columns=['CollaboratorName', 'Location', "HourlyRate"])
    df = pd.concat([df, df_empty], axis=0, ignore_index=True)
    df.to_csv("data.csv", index=False)
    gd = GridOptionsBuilder.from_dataframe(df)

grid_table = AgGrid(df,
                    gridOptions=gridOptions,
                    fit_columns_on_grid_load=True,
                    height=500,
                    width='100%',
                    theme="streamlit",
                    key='unique',
                    update_mode=GridUpdateMode.GRID_CHANGED,
                    reload_data=True,
                    allow_unsafe_jscode=True,
                    editable=True
                    )

if 'grid' not in st.session_state:
    st.session_state['grid'] = grid_table
else:
    grid_table_df = pd.DataFrame(grid_table['data'])
    grid_table_df.to_csv("data.csv", index=False)
You can see the running app in the attached screenshot.
This one takes a different approach, but the goal is the same.
A radio control with two options is created: if the value is yes, a new line is added; if the value is no, no new line is created.
If you want to add a new line, select yes, add your entry, and then press the Update button in the sidebar.
If you want to edit without adding a new line, select no, edit the existing entry, and then press the Update button.
Code
import streamlit as st
from st_aggrid import AgGrid, GridOptionsBuilder
import pandas as pd

def data_upload():
    df = pd.read_csv("data.csv")
    return df

def show_grid(newline):
    st.header("This is AG Grid Table")
    df = data_upload()
    if newline == 'yes':
        data = [['', '', 0]]
        df_empty = pd.DataFrame(data, columns=['CollaboratorName', 'Location', "HourlyRate"])
        df = pd.concat([df, df_empty], axis=0, ignore_index=True)

    gb = GridOptionsBuilder.from_dataframe(df)
    gb.configure_default_column(editable=True)

    grid_table = AgGrid(
        df,
        height=400,
        gridOptions=gb.build(),
        fit_columns_on_grid_load=True,
        allow_unsafe_jscode=True,
    )
    return grid_table

def update(grid_table):
    grid_table_df = pd.DataFrame(grid_table['data'])
    grid_table_df.to_csv('data.csv', index=False)

# start
addline = st.sidebar.radio('Add New Line', options=['yes', 'no'], index=1, horizontal=True)
grid_table = show_grid(addline)
st.sidebar.button("Update", on_click=update, args=[grid_table])
That happened because of your if button: statement. st.button() does not hold its state: it returns True only on the rerun triggered by the click, so anything nested under if st.button(...): is lost when the page reloads on the next interaction. To prevent this, you can either initialize a session state for your button or use st.checkbox() in place of st.button().
In this case I am going to fix your code by initializing a session state for the button.
import pandas as pd
import streamlit as st
from st_aggrid import AgGrid, GridUpdateMode
from st_aggrid.grid_options_builder import GridOptionsBuilder
from streamlit.runtime.legacy_caching import caching

def data_upload():
    df = pd.read_csv("data.csv")
    return df

st.header("This is AG Grid Table")

if 'grid' in st.session_state:
    grid_table = st.session_state['grid']
    df = pd.DataFrame(grid_table['data'])
    df.to_csv('data.csv', index=False)
else:
    df = data_upload()

gd = GridOptionsBuilder.from_dataframe(df)
gd.configure_column("Location", editable=True)
gd.configure_column("HourlyRate", editable=True)
gd.configure_column("CollaboratorName", editable=True)
gridOptions = gd.build()

def update():
    caching.clear_cache()

button = st.sidebar.button("Add Line")

# Initialized session states  # New code
if "button_state" not in st.session_state:
    st.session_state.button_state = False

if button or st.session_state.button_state:
    st.session_state.button_state = True  # End of new code
    data = [['', '', 0]]
    df_empty = pd.DataFrame(data, columns=['CollaboratorName', 'Location', "HourlyRate"])
    df = pd.concat([df, df_empty], axis=0, ignore_index=True)
    gd = GridOptionsBuilder.from_dataframe(df)
    df.to_csv('data.csv', index=False)
    gridOptions = gd.build()

grid_table = AgGrid(df,
                    gridOptions=gridOptions,
                    fit_columns_on_grid_load=True,
                    height=500,
                    width='100%',
                    theme="streamlit",
                    key='unique',
                    update_mode=GridUpdateMode.GRID_CHANGED,
                    reload_data=True,
                    allow_unsafe_jscode=True,
                    editable=True
                    )

if 'grid' not in st.session_state:
    st.session_state['grid'] = grid_table

grid_table_df = pd.DataFrame(grid_table['data'])
grid_table_df.to_csv('data.csv', index=False)
I think your code should work fine now with regards to the button issue.
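As an aside, newer Streamlit versions also let you attach an on_click callback to st.button, which is another way to persist the add-row action without the button_state flag. A minimal sketch, assuming the same data.csv schema as above:
import pandas as pd
import streamlit as st

def add_line():
    # the callback runs before the rerun, so the grid is rebuilt from the updated CSV
    df = pd.read_csv("data.csv")
    new_row = pd.DataFrame([['', '', 0]], columns=['CollaboratorName', 'Location', 'HourlyRate'])
    pd.concat([df, new_row], ignore_index=True).to_csv("data.csv", index=False)

st.sidebar.button("Add Line", on_click=add_line)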
I am trying to fetch data from an API for 50 parcels, and I want the results in a single data frame. While running this loop, the data frame keeps only the last parcel, the one that ends the loop. Is there any way to also store all the previous outputs in the same dataframe?
For example, upon running this code it only returns the data frame for foreign id = 50; I want the dataframe for all ids 1-50.
import requests
import pandas as pd

foreign = 1
while foreign <= 50:
    s1_time_series_url_p6 = 'https://demodev2.kappazeta.ee/ard_api_demo/v1/time_series/s1?limit_to_rasters=true&parcel_foreign_id=0&properties=parcel_foreign_id%2Cs1product_end_time%2Cs1product_ron%2Ccohvh_avg%2Ccohvv_avg%2Cvhvv_avg'
    s2_time_series_url_p6 = 'https://demodev2.kappazeta.ee/ard_api_demo/v1/time_series/s2?limit_to_rasters=true&parcel_foreign_id=0&properties=parcel_foreign_id%2Cs2product_start_time%2Cs2product_ron%2Cndvi_avg'
    position = 101
    foreign_n = str(foreign)
    s1_time_series_url_p6 = s1_time_series_url_p6[:position] + foreign_n + s1_time_series_url_p6[position+1:]
    s2_time_series_url_p6 = s2_time_series_url_p6[:position] + foreign_n + s2_time_series_url_p6[position+1:]
    r_s1_time_series_p6 = requests.get(s1_time_series_url_p6)
    r_s2_time_series_p6 = requests.get(s2_time_series_url_p6)
    json_s1_time_series_p6 = r_s1_time_series_p6.json()
    json_s2_time_series_p6 = r_s2_time_series_p6.json()
    df_s1_time_series_p6 = pd.DataFrame(json_s1_time_series_p6['s1_time_series'])
    df_s2_time_series_p6 = pd.DataFrame(json_s2_time_series_p6['s2_time_series'])
    df_s2_time_series_p6.s2product_start_time = df_s2_time_series_p6.s2product_start_time.str[0:11]
    df_s1_time_series_p6.s1product_end_time = df_s1_time_series_p6.s1product_end_time.str[0:11]
    dfinal_p6 = df_s1_time_series_p6.merge(df_s2_time_series_p6, how='inner', left_on='s1product_end_time', right_on='s2product_start_time')
    cols_p6 = ['parcel_foreign_id_x', 's1product_ron', 'parcel_foreign_id_y', 's2product_ron']
    dfinal_p6[cols_p6] = dfinal_p6[cols_p6].apply(pd.to_numeric, errors='coerce', axis=1)
    foreign = foreign + 1  # increment (without it the loop never terminates)

dfinal_p6
The issue is resolved by first creating an empty data frame and then appending each iteration's output to it within the loop.
The updated code is as follows:
column_names = ["parcel_foreign_id_x", "s1product_end_time", "s1product_ron", "cohvh_avg", "cohvv_avg", "vhvv_avg", "parcel_foreign_id_y", "s2product_start_time", "s2product_ron", "ndvi_avg"]
df = pd.DataFrame(columns=column_names)

foreign = 1
while foreign <= 50:
    s1_time_series_url_p6 = 'https://demodev2.kappazeta.ee/ard_api_demo/v1/time_series/s1?limit_to_rasters=true&parcel_foreign_id=0&properties=parcel_foreign_id%2Cs1product_end_time%2Cs1product_ron%2Ccohvh_avg%2Ccohvv_avg%2Cvhvv_avg'
    s2_time_series_url_p6 = 'https://demodev2.kappazeta.ee/ard_api_demo/v1/time_series/s2?limit_to_rasters=true&parcel_foreign_id=0&properties=parcel_foreign_id%2Cs2product_start_time%2Cs2product_ron%2Cndvi_avg'
    position = 101
    foreign_n = str(foreign)
    s1_time_series_url_p6 = s1_time_series_url_p6[:position] + foreign_n + s1_time_series_url_p6[position+1:]
    s2_time_series_url_p6 = s2_time_series_url_p6[:position] + foreign_n + s2_time_series_url_p6[position+1:]
    r_s1_time_series_p6 = requests.get(s1_time_series_url_p6)
    r_s2_time_series_p6 = requests.get(s2_time_series_url_p6)
    json_s1_time_series_p6 = r_s1_time_series_p6.json()
    json_s2_time_series_p6 = r_s2_time_series_p6.json()
    df_s1_time_series_p6 = pd.DataFrame(json_s1_time_series_p6['s1_time_series'])
    df_s2_time_series_p6 = pd.DataFrame(json_s2_time_series_p6['s2_time_series'])
    df_s2_time_series_p6.s2product_start_time = df_s2_time_series_p6.s2product_start_time.str[0:11]
    df_s1_time_series_p6.s1product_end_time = df_s1_time_series_p6.s1product_end_time.str[0:11]
    dfinal_p6 = df_s1_time_series_p6.merge(df_s2_time_series_p6, how='inner', left_on='s1product_end_time', right_on='s2product_start_time')
    cols_p6 = ['parcel_foreign_id_x', 's1product_ron', 'parcel_foreign_id_y', 's2product_ron']
    dfinal_p6[cols_p6] = dfinal_p6[cols_p6].apply(pd.to_numeric, errors='coerce', axis=1)
    df = pd.concat([dfinal_p6, df], ignore_index=True)
    foreign = foreign + 1
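As a side note (not part of the original answer): pd.concat inside the loop copies the accumulated frame on every iteration. Collecting the pieces in a list and concatenating once at the end is usually faster. A sketch, where fetch_parcel is a hypothetical helper wrapping the request-and-merge logic above:
import pandas as pd

frames = []
for foreign in range(1, 51):
    dfinal_p6 = fetch_parcel(foreign)  # hypothetical helper: builds dfinal_p6 as in the loop above
    frames.append(dfinal_p6)

df = pd.concat(frames, ignore_index=True)  # one concat instead of 50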
I have a dataframe df.
I would like to partition df into sub-dataframes and apply the function find_root to each of them. The function takes only the columns id and parent_id as input.
Then I would like to concatenate the resulting dataframes. Because my dataframe is huge (over 4 million rows), I would like to use Dask. I then get the following error:
ValueError: The columns in the computed data do not match the columns in the provided metadata
Extra: []
Missing: [2]
Could you please elaborate on how to solve this error?
import pandas as pd
import networkx as nx
from dask.distributed import Client
import dask.dataframe as dd

client = Client(n_workers=2, threads_per_worker=1, processes=False, memory_limit='4GB')

def find_root(df):
    g = nx.from_pandas_edgelist(df, source='parent_id', target='id', create_using=nx.DiGraph())
    roots = {n for n, d in g.in_degree() if d == 0}
    tmp = {}
    for r in roots:
        tree = nx.dfs_tree(g, r)
        tmp[r] = list(tree.nodes)
    tmp = pd.DataFrame.from_dict(tmp, orient='index').T
    tmp = tmp.melt(value_name='node', var_name='root').dropna()
    return tmp

path = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/sample_df.csv'
df = dd.read_csv(path, header=0)
df = df[['id', 'created_utc', 'ups', 'link_id', 'author', 'body', 'parent_id']]
df['parent_id'] = df['parent_id'].str.split('_', expand=True, n=2)[1]
df['link_id'] = df['link_id'].str.split('_', expand=True, n=2)[1]
result = df.groupby('link_id').apply(find_root, meta=object)
computed_result = result.compute()
Update: I added dtype to dd.read_csv
df = dd.read_csv(path, header = 0, dtype = {'id': 'str', 'parent_id': 'str', 'link_id': 'str'})
but the problem persists.
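For what it's worth (a suggestion, not a confirmed fix): the usual cause of this ValueError is that meta=object tells Dask to expect a Series, while find_root actually returns a DataFrame with root and node columns. Declaring the output schema explicitly should align the metadata with the computed result:
# meta describes what find_root returns: a DataFrame with
# string-typed 'root' and 'node' columns
result = df.groupby('link_id').apply(
    find_root,
    meta={'root': 'object', 'node': 'object'},
)
computed_result = result.compute()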