Strange layout of the HDF tables from pandas.HDFStore - python

When I output a pandas.DataFrame as a table in HDFStore:
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=range(2))
with pd.HDFStore("test.hdf5") as store:
    store.put("test", df, format="table")
I get the following layout when reading in ViTables:
I can read it back correctly with pandas.read_hdf(), but I find the data difficult to read in the viewer: it is grouped into blocks, and the column names are hidden behind a values_block_0 label.
Is there a way to have a more intuitive layout in the HDF?

Adding data_columns=True to the store.put() arguments gives a better layout, with each column stored under its own name.
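A minimal sketch of that suggestion, assuming PyTables is installed and writing to a temporary file (the path and the `where` query are illustrative and not part of the original post):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=range(2))

# Hypothetical location: a temporary directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "test.hdf5")

with pd.HDFStore(path) as store:
    # data_columns=True stores every column as its own named table column
    # instead of packing them into a values_block_* array.
    store.put("test", df, format="table", data_columns=True)
    # Data columns can also be used in `where` queries.
    subset = store.select("test", where="A > 1")

roundtrip = pd.read_hdf(path, "test")
```

Being able to filter on a column with `where` is a side benefit of data columns; the trade-off is that each data column is indexed individually, which may cost some write time and file size on wide tables.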

Related

Trailing zeros for rounded Pandas DataFrame after applying style [duplicate]

I created a DataFrame in pandas whose cells I want to colour using a colour index (low values red, high values green). I succeeded in doing so, but the colouring prevents me from formatting the cells.
import pandas as pd
df = pd.DataFrame({'a': [0.5, 1.5, 5],
                   'b': [2, 3.5, 7]})
df = df.style.background_gradient(cmap='RdYlGn')
df
which returns
However, when I try to use df.round(2) for example to format the numbers, the following error pops up:
AttributeError: 'Styler' object has no attribute 'round'
Is there anyone who can tell me what's going wrong?
Take a look at the pandas styling guide. The df.style property returns a Styler instance, not a dataframe. From the examples in the pandas styling guide, it seems like dataframe operations (like rounding) are done first, and styling is done last. There is a section on precision in the pandas styling guide. That section proposes three different options for displaying precision of values.
import pandas as pd
df = pd.DataFrame({'a': [0.5, 1.5, 5],
                   'b': [2, 3.5, 7]})
# Option 1. Round before using style.
style = df.round(2).style.background_gradient(cmap='RdYlGn')
style
# Option 2. Use option_context to set precision.
with pd.option_context('display.precision', 2):
    style = df.style.background_gradient(cmap='RdYlGn')
style
# Option 3. Use .set_precision() method of styler
# (deprecated in pandas 1.3 and removed in 2.0; .format(precision=2) replaces it).
style = df.style.background_gradient(cmap='RdYlGn').set_precision(2)
style
This works for me:
stylish_df = df.style.background_gradient(cmap='RdYlGn').format(precision=2)

HDFStore and querying by attributes

I am currently running a parameter study in which the results are returned as pandas DataFrames. I want to store these DFs in a HDF5 file together with the parameter values that were used to create them (parameter foo in the example below, with values 'bar' and 'foo', respectively).
I would like to be able to query the HDF5 file based on these attributes to arrive at the respective DFs - for example, I would like to be able to query for a DF with the attribute foo equal to 'bar'. Is it possible to do this in HDF5? Or would it be smarter in this case to create a multiindex DF instead of saving the parameter values as attributes?
import pandas as pd
df_1 = pd.DataFrame({'col_1': [1, 2],
                     'col_2': [3, 4]})
df_2 = pd.DataFrame({'col_1': [5, 6],
                     'col_2': [7, 8]})
store = pd.HDFStore('file.hdf5')
store.put('table_1', df_1)
store.put('table_2', df_2)
store.get_storer('table_1').attrs.foo = 'bar'
store.get_storer('table_2').attrs.foo = 'foo'
store.close()

Is there a way to allow NaN values to be written to CSV from pandas?

I get the following error: builtins.AssertionError: 12 columns passed, passed data had 6 columns. The data in the last 6 columns varies, so I'm happy to have None wherever data is missing. However, I can't find a simple way to do this. I'm pretty sure there must be an option for it, but I can't see it in the docs or in any Google searches.
Any help would be appreciated. To reiterate: I know what is causing the problem, and I know data is missing from some columns. I would like to ignore the missing data and am happy to have None or NaN in the output CSV.
I imagine you have fixed headers, so my solution would be to extend each row respectively:
import pandas as pd
import numpy as np
columns = ('Person', 'Title', 'AnotherPerson', 'AnotherPerson2', 'AnotherPerson3', 'AnotherPerson4', 'Date', 'Group')
mandatory = len(columns)
data = [[1,2,3], [1, 2], [1, 2, 3, 4]]
data = list(map(lambda x: dict(enumerate(x)), data))
data = [[item.get(i, np.nan) for i in range(mandatory)] for item in data]
df = pd.DataFrame(data=data, columns=columns)
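As a hedged alternative to padding each row by hand, the sketch below lets pandas pad the short rows itself and then widens the frame to the fixed header with reindex. The sample rows and the use of an in-memory buffer instead of a real file are assumptions for illustration.

```python
import io

import pandas as pd

columns = ["Person", "Title", "AnotherPerson", "AnotherPerson2",
           "AnotherPerson3", "AnotherPerson4", "Date", "Group"]
rows = [[1, 2, 3], [1, 2], [1, 2, 3, 4]]

# Without explicit column names, pandas pads short rows with NaN
# up to the length of the longest row.
df = pd.DataFrame(rows)

# Add the positions no row reached, so the frame matches the header width.
df = df.reindex(columns=range(len(columns)))
df.columns = columns

# na_rep controls how missing values are written to the CSV.
buffer = io.StringIO()
df.to_csv(buffer, index=False, na_rep="NaN")
csv_text = buffer.getvalue()
```

By default to_csv writes missing values as empty fields; passing na_rep is only needed when an explicit marker such as NaN is wanted in the file.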

read and write a dataset description as a pandas dataframe "attribute"?

The R-help for feather_metadata states "Returns the dimensions, field names, and types; and optional dataset description."
@hrbrmstr kindly posted a PR to answer this SO question and make it possible to add a dataset description to a feather file from R.
I'd like to know whether it is also possible to read (and write) such a dataset description in Python / pandas using feather.read_dataframe and feather.write_dataframe. I searched the documentation but couldn't find any information about this. I was hoping something like the following might work:
import feather
import pandas as pd
dat = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
dat._metadata = "A dataset description ..."
feather.write_dataframe(dat, "pydf.feather")
Or else perhaps:
feather.write_dataframe(dat, "pydf.feather", "A dataset description ...")
