Pandas DataFrame Display in Jupyter Notebook - python

I want to make my displayed tables bigger so users can see them better when they are used in conjunction with Jupyter RISE (slide shows).
How do I do that?
I don't need to show more columns, but rather I want the table to fill up the whole width of the Jupyter RISE slide.
Any idea on how to do that?
Thanks

If df is a pandas.DataFrame object, you can do:
df.style.set_properties(**{'max-width': '200px', 'font-size': '15pt'})

Related

Pandas Dataframe Table Vertical Scrollbars in Jupyter Notebook

I have a large (vertically) pandas Dataframe that I would like to display as a nice table with (vertical) scrollbars in a jupyter notebook in vs code.
I have come across a post that addresses this, but it is 5 years old, so I was wondering if there is now a better method. Here is the post:
Pandas DataFrame Table Vertical Scrollbars
Right now I use the following to see all the data:
pd.set_option("display.max_rows", None)
But this shows all the rows, which becomes problematic when there are, say, more than 100 rows.
Just to be clear, I am looking for a scroll bar (as in the image):
I don't think there is a solution for plain Jupyter, but for the successor JupyterLab it's quite easy, not just for DataFrames but for all outputs.
It looks like this:
To enable this view, set pd.set_option("display.max_rows", None), then right-click on the blue column and choose Enable Scrolling for Outputs.
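In environments without that menu entry, a common workaround is to wrap the rendered table in a container with a fixed height and CSS scrolling. A minimal sketch; the helper name scrollable_html and the 300px height are assumptions for illustration, and whether inline styles are honored depends on the notebook front end:

```python
import pandas as pd


def scrollable_html(df, max_height="300px"):
    """Return the DataFrame as an HTML table inside a vertically scrolling box."""
    table = df.to_html()  # renders every row, independent of display.max_rows
    return (
        f'<div style="max-height: {max_height}; overflow-y: auto;">'
        f"{table}</div>"
    )


# Hypothetical data with more rows than fit on screen.
df = pd.DataFrame({"value": range(200)})
html = scrollable_html(df)
# In a notebook: from IPython.display import HTML; HTML(html)
```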

Show completely DataFrame PySpark

I have a DataFrame in PySpark. When I show the data, it does not display well, as in the following image:
How can I fix it? Thank You.
There's not a whole lot you can do. The issue is with line wrap. A common workaround is to convert to pandas:
df.limit(5).toPandas().head()
If you're using a Jupyter Notebook, you can read more choices here: pyspark show dataframe as table with horizontal scroll in ipython notebook

pandas style options to latex

Pandas has two nice functionalities I use a lot - that's the df.style... option and the df.to_latex() call. But I do not know how to combine both.
The .style option makes looking at tables much more pleasant. It lets you grasp information rapidly because of visual enhancements. This works perfectly in a jupyter notebook, for example. Here is an arbitrary example I copied from the documentation.
df.style.bar(subset=['A', 'B'], align='mid', color=['#d65f5f', '#5fba7d'])
This yields:
However, as nice as this looks in a Jupyter notebook, I cannot export it to LaTeX code. If I chain a to_latex() call at the end of my visual enhancements, I get the following error message instead: AttributeError: 'Styler' object has no attribute. Does that mean it's simply not possible, because the displayed colorful table is no longer a DataFrame object but a Styler object?
Is there any workaround? At least with easier tables, let's say where only cells have a single background color with respect to their value, instead of a 'complicated' bar graph.
As of pandas v1.3.0 these are now combined in pandas.io.formats.style.Styler.to_latex.
Instead of trying to export this formatting to bulky LaTeX markup, I would go the route explored already over in TeX.SE: add the functionality as LaTeX code that draws similar formatting based on the same data.
Red/green value bars:
Partially coloring cell background with histograms
Coloured cell backgrounds:
Are there an easy way to coloring tables depending on the value in each cell?
Shaded backgrounds (similar, but points to excellent package pgfplotstable):
Parametrize shading in table through TikZ
I don't know how you can use latex directly but you can use df.to_html(). Once you get the html you can style it as you like and then use any of the following.
Python's html2latex
Website: HTML to TEX Converter
Python's weasyprint
Pandoc
Here is one latex package that I found on googling that talks about embedding html in latex.
https://alvinalexander.com/blog/post/latex/use-html-package-in-latex-control-your-output/
When I used LaTeX in the past, I simply embedded images of the tables.

How to set the size for seaborn pairplot chart?

I am trying to draw an sns.pairplot with one value on the x-axis but multiple values on the y-axis.
This is what I got: all the figures are in one row. Does anyone know how I can get a bigger plot, or lay it out in multiple columns?
The chart is here:
As per the comment to your question, the plot is small because it does not fit into the Jupyter Notebook output cell. Try right-clicking and opening it in a new tab, or saving it to a file.
If you want to display the figure within a Notebook, you can make several calls to sns.pairplot with a subset of columns each time. You could also use sns.FacetGrid with plt.scatter for more granular control over what is plotted where.
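The "several calls" idea can be sketched as follows. Both helper names (chunked, pairplot_in_chunks) and the default sizes are hypothetical; the x_vars, y_vars, height and aspect parameters are those of sns.pairplot:

```python
def chunked(columns, size):
    """Split a list of column names into consecutive groups of at most `size`."""
    return [columns[i:i + size] for i in range(0, len(columns), size)]


def pairplot_in_chunks(df, x_var, y_vars, per_plot=3, height=3.0, aspect=1.5):
    """Draw one pairplot per group of y variables (hypothetical helper)."""
    import seaborn as sns  # imported here so `chunked` works without seaborn

    grids = []
    for group in chunked(list(y_vars), per_plot):
        # height and aspect control the size of each facet in the grid.
        grids.append(
            sns.pairplot(df, x_vars=[x_var], y_vars=group,
                         height=height, aspect=aspect)
        )
    return grids
```

Each call produces its own figure, so the individual facets can be larger than when everything is squeezed into a single grid.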

ipython notebook pandas max allowable columns

I have a simple csv file with ten columns!
When I set the following option in the notebook and print my csv file (which is in a pandas dataframe) it doesn't print all the columns from left to right, it prints the first two, the next two underneath and so on.
I used this option, why isn't it working?
pd.option_context("display.max_rows",1,"display.max_columns",100)
Even this doesn't seem to work:
pandas.set_option('display.max_columns', None)
I assume you want to display your data in the notebook; then the following options work fine for me (IPython 2.3):
import pandas as pd
from IPython.display import display
data = pd.read_csv('yourdata.txt')
Either directly set the option
pd.options.display.max_columns = None
display(data)
Or use the set_option method you showed, which actually works fine as well
pd.set_option('display.max_columns', None)
display(data)
If you don't want to set this option for the whole script, use the context manager
with pd.option_context('display.max_columns', None):
    display(data)
If this doesn't help, you might give a minimal example to reproduce your issue.
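One likely reason the option_context call in the question has no visible effect: pd.option_context only changes options while its with block is active, so calling it on a line of its own changes nothing. A small sketch illustrating this:

```python
import pandas as pd

default = pd.get_option("display.max_columns")

# Creating the context manager without `with` leaves the option untouched.
pd.option_context("display.max_columns", 100)
unchanged = pd.get_option("display.max_columns")

# Inside the block the option is overridden; it is restored on exit.
with pd.option_context("display.max_columns", None):
    inside = pd.get_option("display.max_columns")
restored = pd.get_option("display.max_columns")
```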
You can also display all the data by asking pandas to return HTML markup, and then having IPython render the HTML table.
import pandas as pd
from IPython.display import HTML
data = pd.read_csv('yourdata.csv')
HTML(data.to_html())
Using IPython 3.0.0 and Python 3.4, I found that display(data) as described by @Jakob will render as a table with up/down and left/right scroll bars, but the table is still wider than the cell and some columns are off-screen to the right. To see all the data, one must collapse the cell - which adds scroll bars. Consequently you have a scrolling box in a scrolling box, which is not ideal as you have to shift focus between the doubled-up scroll bars to navigate all the way through the data.
Using the HTML method, you render the enormous table as-is without any scroll bars. This cell can then be collapsed down to show only a single vertical and horizontal bar, which is more user-friendly.
The caveat to using HTML is the table takes longer to render. I was only using a ~150x50 matrix and the speed difference was noticeable, but not inconvenient. If you have an enormous table, don't use this method to display the entire thing at once. That said, if you do have an enormous table, rendering the whole thing at once is obviously going to be a bad idea however you try to do it.
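For large tables, one way to keep the HTML approach responsive is to cap what gets rendered. A sketch using the max_rows and max_cols parameters of DataFrame.to_html; the generated data only mirrors the 150x50 shape mentioned above, and the limits of 10 are arbitrary:

```python
import pandas as pd

# Hypothetical 150x50 table filled with consecutive integers.
data = pd.DataFrame([[r * 50 + c for c in range(50)] for r in range(150)])

# Full rendering: every row and column ends up in the markup.
full = data.to_html()

# Truncated rendering: only the first and last few rows and columns are kept.
preview = data.to_html(max_rows=10, max_cols=10)
# In a notebook: from IPython.display import HTML; HTML(preview)
```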
I found this question as one of the first hits on Google. In jupyter lab,
pandas.set_option("display.max_columns", None)
now seems to work fine: my example had 32 columns, which used to be truncated and no longer are.
