How to get a better DataFrame display in a Jupyter notebook? - python

I'm taking an online course on data analysis using pandas and NumPy.
In the course videos, when the tutors display a DataFrame in a Jupyter notebook, this is what they get:
[image: what I need]
But when I display the DataFrame, this is what I get:
[image: what I get]
I have tried print(df), display(df), and just df in the Jupyter notebook, and none of them seems to make any difference.
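For reference, here is a minimal sketch of the usual ways a DataFrame gets the rich HTML rendering in Jupyter. The example frame and the assumption that a standard IPython kernel with pandas is running are mine, not part of the question:
import pandas as pd
from IPython.display import display

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})  # small stand-in frame

print(df)    # always plain monospaced text
display(df)  # rich HTML table
df           # also an HTML table, but only when it is the last expression in the cell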

Related

Finding output cells causing large file size in jupyter notebook

I have a jupyter notebook which has ~400 cells. The total file size is 8MB so I'd like to suppress the output cells that have a large size so as to reduce the overall file size.
There are quite a few output cells that could be causing this (mainly matplotlib and seaborn plots), so to avoid spending time on trial and error, is there a way of finding the size of each output cell? I'd like to keep as many output plots as possible, as I'll be pushing the work to GitHub for others to see.
Here is my idea with nbformat, spelled out so it can be run in a Jupyter notebook cell, to list the code cell numbers from largest to smallest output size (it fetches an example notebook first so there is something to try it on):
############### Get test notebook ########################################
import os
notebook_example = "matplotlib3d-scatter-plots.ipynb"
if not os.path.isfile(notebook_example):
    !curl -OL https://raw.githubusercontent.com/fomightez/3Dscatter_plot-binder/master/matplotlib3d-scatter-plots.ipynb

### Use nbformat to get estimate of output size from code cells. #########
import nbformat as nbf
ntbk = nbf.read(notebook_example, nbf.NO_CONVERT)
size_estimate_dict = {}
for cell in ntbk.cells:
    if cell.cell_type == 'code':
        size_estimate_dict[cell.execution_count] = len(str(cell.outputs))
# Execution counts of the code cells, sorted from largest to smallest output estimate.
out_size_info = [k for k, v in sorted(size_estimate_dict.items(), key=lambda item: item[1], reverse=True)]
out_size_info
(To have a place to easily run that code go here and click on the launch binder button. When the session spins up, open a new notebook and paste in the code and run it. Static form of the notebook is here.)
The example I tried didn't include Plotly, but the approach seemed to work similarly on a notebook made entirely of Plotly plots. I don't know how it will handle a mix of output types though; the sorting may not be perfect in that case.
Hopefully this gives you an idea of how to do what you asked. The code example could be further expanded to use the retrieved size estimates and have nbformat write a copy of the input notebook without the output for, say, the ten largest code cells.
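As a rough illustration of that expansion, here is a minimal sketch (my addition, not from the original answer; the output file name and the cutoff of ten cells are assumptions) that clears the outputs of the largest code cells and writes a stripped copy:
import nbformat as nbf

def strip_largest_outputs(in_path, out_path, n=10):
    ntbk = nbf.read(in_path, nbf.NO_CONVERT)
    code_cells = [c for c in ntbk.cells if c.cell_type == 'code']
    # Same size estimate as above: length of the stringified outputs.
    largest = sorted(code_cells, key=lambda c: len(str(c.outputs)), reverse=True)[:n]
    for cell in largest:
        cell.outputs = []             # drop the stored output
        cell.execution_count = None   # reset the execution counter
    nbf.write(ntbk, out_path)

strip_largest_outputs("matplotlib3d-scatter-plots.ipynb",
                      "matplotlib3d-scatter-plots-stripped.ipynb")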

Pandas Dataframe Table Vertical Scrollbars in Jupyter Notebook

I have a large (vertically) pandas DataFrame that I would like to display as a nice table with vertical scrollbars in a Jupyter notebook in VS Code.
I have come across a post that addresses this, but it is 5 years old, so I was wondering if there is now a better method. Here is the post:
Pandas DataFrame Table Vertical Scrollbars
Right now I use the following to see all the data:
pd.set_option("display.max_rows", None)
But this shows all the rows, which becomes problematic with, say, more than 100 rows.
Just to be clear, I am looking for a scroll bar (as in the image):
I don't think there is a solution for plain Jupyter, but for its successor JupyterLab it's quite easy, not just for DataFrames but for all outputs.
It looks like this:
To enable this view, set pd.set_option("display.max_rows", None), then right-click on the blue column next to the output and choose Enable Scrolling for Outputs:
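For completeness, the code part of that approach is just the pandas option; the scrolling itself is toggled in the JupyterLab UI. A minimal sketch, where the example DataFrame is a stand-in for the large frame in the question:
import pandas as pd

df = pd.DataFrame({"value": range(1000)})  # stand-in for the large DataFrame

pd.set_option("display.max_rows", None)  # let pandas render every row
df  # run as the cell's last expression, then right-click the output and enable scrolling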

Show completely DataFrame PySpark

I have a DataFrame in PySpark, and when I show the data it does not display well, as in the following image:
How can I fix it? Thank You.
There's not a whole lot you can do; the issue is line wrapping in the console output. A common workaround is to convert to pandas:
df.limit(5).toPandas().head()
If you're using a Jupyter Notebook, you can read more choices here: pyspark show dataframe as table with horizontal scroll in ipython notebook
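Alongside the pandas conversion, Spark's own DataFrame.show() accepts truncate and vertical parameters that can help. A short sketch, assuming df is the Spark DataFrame from the question:
import pandas as pd

pd.set_option("display.max_columns", None)  # show every column in the pandas rendering
df.limit(5).toPandas()                      # displayed as an HTML table in Jupyter

# Staying in PySpark:
df.show(5, truncate=False)   # do not cut long cell values (they may still wrap in a narrow console)
df.show(5, vertical=True)    # print one field per line, handy for very wide rows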

Pandas DataFrame Display in Jupyter Notebook

I want to make my display tables bigger so users can see them better when they are used in conjunction with Jupyter RISE (slide shows).
How do I do that?
I don't need to show more columns, but rather I want the table to fill up the whole width of the Jupyter RISE slide.
Any idea on how to do that?
Thanks
If df is a pandas.DataFrame object, you can do:
df.style.set_properties(**{'max-width': '200px', 'font-size': '15pt'})
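As a self-contained illustration of that call (the example DataFrame and the width:100% variant for filling the slide are my assumptions, not part of the answer):
import pandas as pd

df = pd.DataFrame({"metric": ["a", "b", "c"], "value": [1, 2, 3]})  # stand-in data

# Enlarge cells and font, as in the answer:
df.style.set_properties(**{'max-width': '200px', 'font-size': '15pt'})

# Assumed variant: attach a width attribute so the table stretches across the slide.
df.style.set_table_attributes('style="width:100%"')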

Jupyter NoteBook Slide - Dataframe format and size

For a "code presenting session", I would like to transform my Jupyter NoteBook to slides. It works pretty well actually but for the Dataframe display.
I have currently this display in my notebook:
and I get that on the Slide:
Why? Is it possible to get the same rendering in both the notebook and the slides?
Thanks
