filtering pd.DataFrame with dash dropdown - python

I'm new to Dash and trying to filter the following dataframe with a dropdown and show the filtered dataframe. I am not able to define the callback function properly. Any sample code would be really appreciated.
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'B', 'B', 'A', 'B', 'A'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})

Here is the page from the docs that I think might be a good place to start. If you know the dataframe ahead of time, then you can populate the dropdown with the values that you know you'll need. If you don't, then you would need to use a callback to update the dropdown options prop.
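If the values are known up front, the options list can be built straight from the dataframe. A minimal sketch, assuming the df above and the list-of-dicts format (each with a label and a value) that dcc.Dropdown accepts for its options; the helper name dropdown_options is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'B', 'B', 'A', 'B', 'A'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})

def dropdown_options(frame, column):
    """Build a dropdown options list from the distinct values of one column."""
    # tolist() turns NumPy scalars into plain Python values, which serialize cleanly to JSON
    values = sorted(frame[column].dropna().unique().tolist())
    return [{'label': str(v), 'value': v} for v in values]

print(dropdown_options(df, 'col1'))
```

The result would then be passed as something like dcc.Dropdown(id='dropdown-id', options=dropdown_options(df, 'col1')), where 'dropdown-id' is a made-up ID.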
The first example on that docs page shows how to get the value from a dropdown, and output it. In your case, you'd use the dropdown to filter the dataframe perhaps like:
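from dash.dependencies import Input, Output

# assumes app = dash.Dash(__name__) and the df above already exist
@app.callback(
    Output('data-table-id', 'data'),
    [Input('dropdown-id', 'value')]
)
def callback_func(dropdown_value):
    # keep only the rows whose col1 matches the selected dropdown value
    df_filtered = df[df['col1'].eq(dropdown_value)]
    return df_filtered.to_dict(orient='records')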
Then set the output of the callback to the data prop of a Dash DataTable. I've used made-up IDs for the Output and Input (data-table-id and dropdown-id); hopefully that gives the general idea.
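The body of the callback is plain pandas, so it can be factored out and checked without running the app. A sketch, assuming the df from the question; filter_records is a hypothetical helper name. It also covers two cases the snippet above does not: a cleared dropdown (value None, where eq(None) would match no rows) and a dropdown with multi=True, whose value is a list:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'B', 'B', 'A', 'B', 'A'],
    'col2': [2, 1, 9, 8, 7, 4],
    'col3': [0, 1, 9, 4, 2, 3],
})

def filter_records(frame, dropdown_value, column='col1'):
    """Return the matching rows as a list of dicts, the shape DataTable's data prop expects."""
    if dropdown_value is None:
        # Dropdown cleared: show the whole table
        filtered = frame
    elif isinstance(dropdown_value, list):
        # multi=True passes a list; an empty list also means "nothing selected"
        filtered = frame[frame[column].isin(dropdown_value)] if dropdown_value else frame
    else:
        # Single selection: same comparison as in the callback above
        filtered = frame[frame[column].eq(dropdown_value)]
    return filtered.to_dict(orient='records')

print(filter_records(df, 'A'))
```

Inside the callback, the body then reduces to return filter_records(df, dropdown_value).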

Related

Creating a function that accepts a data frame and returns N row with largest values

I was trying to create a function that would take a pandas dataframe and return the N rows with the highest values. I found that the pandas nlargest function can take more than one column to order the top rows. I used it and came up with the function below:
def largestvalue(df, x):
    lar = df.nlargest(x, ['AGI', 'COSTT4_A'])
    return lar
In this function I hard-coded the columns from which to select the x largest rows. It sort of worked, as below:
largestvalue(df_merged, 2)
[Screenshot: function result]
However, I was wondering what I would need to do to give the user a function that lets them specify the columns either by name or by position in the chosen data frame, so that they could specify not only the dataframe and the number of rows but also the columns of interest.
-Hassan
You can use arguments in your function definition that allow choosing the number of rows and columns.
import pandas as pd

def largestvalue(df, num_of_rows, cols):
    return df.nlargest(num_of_rows, cols)

data = {'AAA': [3, 8, 2, 1],
        'BBB': [5, 4, 7, 2],
        'CCC': [2, 5, 6, 4]}
df = pd.DataFrame(data)
lar = largestvalue(df, 1, ['AAA', 'BBB'])
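The answer above accepts column names only. To also accept positions, as the question asks, integers can be translated to labels through df.columns before calling nlargest. A sketch, assuming the dataframe's own column labels are not integers (otherwise a number would be ambiguous between a label and a position):

```python
import numbers

import pandas as pd

def largestvalue(df, num_of_rows, cols):
    """Return the num_of_rows largest rows, ordered by cols (names and/or positions)."""
    # Allow a single column as well as a list of columns
    if not isinstance(cols, (list, tuple)):
        cols = [cols]
    # Integers are treated as positions and mapped to the matching column label
    labels = [df.columns[c] if isinstance(c, numbers.Integral) else c for c in cols]
    return df.nlargest(num_of_rows, labels)

data = {'AAA': [3, 8, 2, 1],
        'BBB': [5, 4, 7, 2],
        'CCC': [2, 5, 6, 4]}
df = pd.DataFrame(data)
print(largestvalue(df, 2, [2]))         # by position: column CCC
print(largestvalue(df, 1, [0, 'BBB']))  # mixing a position and a name
```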

Wrong order of values on the X axis when building charts with groups using plotly.py

Rows in my data consist of three columns: version, configuration, and value. I want two lines on my chart, one per configuration, showing how value (y axis) depends on version (x axis). Everything works perfectly as long as every configuration (group) has the same set of values on the x axis:
import plotly.express as px
import pandas
rows = [
['1', 'a', 4],
['1', 'b', 3],
['2', 'a', 6],
['2', 'b', 3],
['3', 'a', 6],
['3', 'b', 7],
]
df = pandas.DataFrame(columns=['version', 'config', 'value'],
data=rows)
fig = px.line(df,
x='version',
y='value',
color='config',
line_group='config'
)
fig.write_html("charts.html")
[Chart: charts.html]
Problems start when one category does not have some value on x axis:
rows = [
['1', 'a', 4],
['1', 'b', 3],
# ['2', 'a', 6],
['2', 'b', 3],
['3', 'a', 6],
['3', 'b', 7],
]
As you can see, the versions are in the wrong order on the chart: charts.html
The problem seems to be that the values on the x axis are ordered by the first category in the input data (a in our case). For example, when I remove a row from category b instead, I see the correct order.
Using strings as versions is essential in my case; the one-digit versions are just to keep the example simple.
The question is: how can I order the x axis based on the values in all categories?
My solution is to use the category_orders argument:
fig = px.line(df,
              x='version',
              y='value',
              color='config',
              line_group='config',
              category_orders={'version': list(df['version'].unique())}
              )
category_orders to override the default category ordering behaviour, which is to use the order in which the data appears in the input. category_orders accepts a dict whose keys are the column name to reorder and whose values are a list of values in the desired order. These orderings apply everywhere categories appear: in legends, on axes, in bar stacks, in the order of facets, in the order of animation frames etc.
Source: https://plotly.com/python/styling-plotly-express/
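Building the order from df['version'] only gives the right axis order if the rows are already sorted by version. If they are not, and the version strings have multi-digit or dotted parts, a plain sorted() is lexicographic ('10' sorts before '2'). A sketch of a numeric sort key, assuming versions look like '1', '2.5' or '1.10.3' (purely numeric, dot-separated); version_key and version_order are hypothetical helpers:

```python
def version_key(v):
    """Map '1.10.2' to (1, 10, 2) so versions compare numerically."""
    # Assumes every dot-separated part is an integer
    return tuple(int(part) for part in str(v).split('.'))

def version_order(values):
    """Distinct versions in ascending numeric order."""
    return sorted(set(values), key=version_key)

print(sorted(['1', '10', '2']))  # lexicographic: ['1', '10', '2']
print(version_order(['1', '10', '2', '2', '1.5']))
```

The result would then be passed as category_orders={'version': version_order(df['version'])} in the px.line call.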
When I ran your code in a Google Colab notebook, it produced a chart with the axis correctly ordered.
[Screenshot: results chart]
I assume the difference is caused by package versions. These are the versions of the relevant packages in my Colab notebook:
pandas 1.1.5
pandas-datareader 0.9.0
pandas-gbq 0.13.3
pandas-profiling 1.4.1
plotly 4.4.1
Perhaps try running your code in Colab and/or making sure your package versions match; I would expect the plotly version to be the most likely culprit.
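To compare environments, the installed versions can be printed in both places. A small sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_versions is made up:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found

print(installed_versions(['pandas', 'plotly']))
```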

Strange layout of the HDF tables from pandas.HDFStore

When I output a pandas.DataFrame as a table in HDFStore:
import pandas as pd
df=pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=range(2))
with pd.HDFStore("test.hdf5") as store:
    store.put("test", df, format="table")
I get the following layout when reading in ViTables:
I can read it back correctly with pandas.read_hdf(), but I find the data hard to read: it's split into these blocks, and the column names are hidden behind a values_block_0 label.
Is there a way to have a more intuitive layout in the HDF?
Adding data_columns=True to the store.put() arguments gives a better layout, with each column stored under its own name.
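A sketch of the put call with data_columns=True, written to a temporary file. It assumes PyTables is installed (HDFStore requires it) and inspects the on-disk column names through the storer's underlying PyTables table; the where query shows the other benefit of data columns, filtering on disk:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=range(2))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.hdf5')
    with pd.HDFStore(path) as store:
        # data_columns=True stores every column as its own named table column
        store.put('test', df, format='table', data_columns=True)
        colnames = store.get_storer('test').table.colnames
        # Data columns can also be used in on-disk queries
        subset = store.select('test', where='A > 1')

print(colnames)
print(subset)
```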

Why won't barchart in Pandas stack different values?

I'm using pandas with Python 3, working in Jupyter.
I've made the graph below using the following code:
temp3 = pd.crosstab(df['Credit_History'], df['Loan_Status'])
temp3.plot(kind = 'bar', stacked = True, color = ['red', 'blue'], grid = False)
print(temp3)
Then I tried to do the same, but split by Gender. I wanted to make this:
So I wrote this code:
And it made this monstrosity. I'm unfamiliar with pivot tables in pandas and, after reading the documentation, am still confused. I'm assuming that aggfunc affects the values given, but not the indices. How can I separate the loan status so that 'Y' and 'N' show up as different colors?
Trying a method similar to the one used for temp3 simply yields a KeyError:
temp3x = pd.crosstab(df['Credit_History'], df['Loan_Status', 'Gender'])
temp3x.plot(kind = 'bar', stacked = True, color = ['red', 'blue'], grid = False)
print(temp3)
How can I make the 'Y' and 'N' appear separately as they are in the first graph, but for all 4 bars instead of using just 2 bars?
You need to map Loan_Status to a new column called loan_status_word and then pivot:
df['loan_status_word'] = df['Loan_Status'].map({'N': 'No', 'Y': 'Yes'})
df.pivot_table(values='Loan_Status',
               index=['Credit_History', 'Gender'],
               columns='loan_status_word',
               aggfunc='size')
Try to format your data such that each item you want in your legend is in a single column.
import pandas as pd

df = pd.DataFrame(
    [
        [3, 1],
        [4, 1],
        [1, 4],
        [1, 3]
    ],
    pd.MultiIndex.from_product([(1, 0), list('MF')], names=['Credit', 'Gender']),
    pd.Index(['Yes', 'No'], name='Loan Status')
)
df
Then you can plot it:
df.plot.bar(stacked=True)
Below is the code to achieve the desired result:
temp4=pd.crosstab([df['Credit_History'],df['Gender']],df['Loan_Status'])
temp4.plot(kind='bar',stacked=True,color=['red','blue'],grid=False)
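To see the shape this produces, here is the same crosstab on a small made-up sample (the data is hypothetical; only the column names come from the question). A list of columns as the first argument gives one row per (Credit_History, Gender) pair and one column per Loan_Status value:

```python
import pandas as pd

# Hypothetical sample shaped like the question's loan dataset
df = pd.DataFrame({
    'Credit_History': [1, 1, 1, 0, 0, 1, 0, 1],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Male'],
    'Loan_Status': ['Y', 'Y', 'N', 'N', 'N', 'Y', 'Y', 'Y'],
})

# Rows: every (Credit_History, Gender) pair; columns: each Loan_Status value
temp4 = pd.crosstab([df['Credit_History'], df['Gender']], df['Loan_Status'])
print(temp4)
```

Because crosstab sorts the Loan_Status columns ('N' before 'Y'), color=['red', 'blue'] draws 'N' in red and 'Y' in blue, and temp4.plot(kind='bar', stacked=True, ...) then gives one stacked bar per row, four in total.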

Why can't I assign to part of my Pandas DataFrame?

I'm confused why the following pandas code does not assign the last two values of column A to the first two entries of column B2:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7], 'B': [10, 20, 30, 40, 50, 60, 70]})
df = df.join(pd.DataFrame({'C': ['a', 'b', 'c', 'd', 'e', 'f', 'g']}))
df['B2'] = df.B.shift(2)
df[:2].B2 = list(df[-2:].A)
What's perplexing to me is that in an (apparently) equivalent "real" application, it does appear to work (and to generate some strange behavior).
Why does the final assignment fail to change the values of the two entries in the dataframe?
It can work, and that's why it's insidious; see here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Generally, with mixed-dtype frames, whether it works depends on how the frame was constructed (e.g. if you create it all at once, I think it will always work). Since you are creating it afterwards (via join), it depends on the underlying NumPy view-creation mechanism.
Don't ever, ever, ever assign like that; use loc:
# .loc slices by label and includes the end point, so :1 selects the first two rows here
df.loc[:1, 'B2'] = ....
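A complete version of the loc fix, using the question's own frame and the values it wanted to copy. Two details matter: .loc label slices include the end label, so :1 covers rows 0 and 1 of this default index, and the right-hand side is converted to a plain array so pandas assigns by position instead of aligning on index labels 5 and 6 (which would leave NaN):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7], 'B': [10, 20, 30, 40, 50, 60, 70]})
df = df.join(pd.DataFrame({'C': ['a', 'b', 'c', 'd', 'e', 'f', 'g']}))
df['B2'] = df.B.shift(2)

# One .loc call selects rows and column together and writes into df itself
# (no intermediate copy, unlike df[:2].B2 = ...).
# to_numpy() drops the source index so the two values are assigned by position.
df.loc[:1, 'B2'] = df['A'].iloc[-2:].to_numpy()
print(df['B2'].tolist())
```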
