When I use style to format a pandas dataframe in a Jupyter notebook, the name of the columns (df.columns.name) is not displayed.
How can I fix this?
Set up:
import pandas as pd
from IPython.display import HTML, display
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns = [-1,0,1], index=[-1,0,1])
df.index.name = 'A'
df.columns.name = 'B'
This is what the data looks like:
display(df) # Has name of columns 'B'
Now, I want to add percentage formatting to all columns:
display(df.style.format("{:.1%}"))
but now the name of the columns is lost!
I ran your code and could not reproduce the problem; this is what I get:
I think updating your pandas or Jupyter version will fix it.
I want to read the data that comes after the string "Executed Trade", and I want to do it dynamically, without using "skiprows". I know openpyxl may be an option, but I am still struggling with it. Could you please help? I have many files like the one shown in the image.
Try:
import pandas as pd
#change the Excel filename and the two mentions of 'col1' for whatever the column is
df = pd.read_excel('dictatorem.xlsx')
df = df.iloc[df.col1[df.col1 == 'Executed Trades'].index.tolist()[0]+1:]
df.columns = df.iloc[0]
df = df[1:]
df = df.reset_index(drop=True)
print(df)
Example input/output:
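The same idea can be tried without an Excel file. Below is a minimal sketch, assuming the marker text sits in the first column and the header row comes directly after it. The sample rows and the helper name rows_after_marker are made up for illustration, and the frame stands in for what pd.read_excel(..., header=None) might return.

```python
import pandas as pd

def rows_after_marker(raw, marker, column=0):
    """Return the table that starts right after `marker` (hypothetical helper)."""
    # Assumes the default 0..n-1 index, so index labels equal row positions.
    hits = raw.index[raw[column] == marker]
    if len(hits) == 0:
        raise ValueError(f"marker {marker!r} not found")
    header_pos = hits[0] + 1                       # header row follows the marker
    table = raw.iloc[header_pos + 1:].copy()       # data rows follow the header
    table.columns = raw.iloc[header_pos].tolist()  # promote the header row
    return table.reset_index(drop=True)

# Stand-in for a sheet read with header=None; all values are invented.
raw = pd.DataFrame([
    ["Report", None, None],
    ["Executed Trades", None, None],
    ["Symbol", "Qty", "Price"],
    ["ABC", 10, 1.5],
    ["XYZ", 20, 2.5],
])
trades = rows_after_marker(raw, "Executed Trades")
print(trades)
```

Reading with header=None means the marker is found by position, so no column name such as 'col1' has to be known in advance.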
I am stuck here, and it's a two-part question. Looking at the output of .describe(include = 'all'), not all columns are showing; how do I get all columns to show?
This is a problem I run into all the time with Spyder: how do I get all columns to show in the console? Any help is appreciated.
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
import seaborn as sns
mydata = pd.read_csv(r"E:\ho11.csv")  # raw string so the backslash is not treated as an escape
mydata.head()
print(mydata.describe(include="all", exclude = None))
mydata.info()
Solution
You could use either of the following methods:
Method-1:
pd.options.display.max_columns = None
Method-2:
pd.set_option('display.max_columns', None)
# to reset this
pd.reset_option('display.max_columns')
Method-3:
# assuming df is your dataframe
pd.set_option('display.max_columns', df.columns.size)
# to reset this
pd.reset_option('display.max_columns')
Method-4:
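# assuming df is your dataframe
# note: the abbreviated name 'max_columns' can be ambiguous in newer pandas versions;
# if it raises an OptionError, use the full 'display.max_columns' instead
pd.set_option('max_columns', df.columns.size)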
# to reset this
pd.reset_option('max_columns')
To keep the output from wrapping onto multiple lines, do this:
pd.set_option('display.expand_frame_repr', False)
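If the setting should apply to a single print only, pd.option_context can scope it to a block. This is a minimal sketch using an invented 30-column frame; the variable names are illustrative.

```python
import pandas as pd

# Invented wide frame with 30 columns, so the default display would truncate it.
wide = pd.DataFrame([[i * j for j in range(30)] for i in range(3)],
                    columns=[f"c{j}" for j in range(30)])

before = pd.get_option("display.max_columns")

# Inside the block every column is shown and the frame is not wrapped.
with pd.option_context("display.max_columns", None,
                       "display.expand_frame_repr", False):
    full_text = repr(wide.describe(include="all"))
    print(full_text)

# Outside the block the previous setting is back in effect.
after = pd.get_option("display.max_columns")
```

This avoids having to remember to call pd.reset_option afterwards.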
References
I recommend exploring the following resources for more details and examples:
How to show all of columns name on pandas dataframe?
How do I expand the output display to see more columns of a pandas DataFrame?
How to show all columns / rows of a Pandas Dataframe?
Since you are using Spyder the easiest thing to do would be:
myview = mydata.describe()
Then you can inspect 'myview' in the variable explorer.
With pd.set_option, the column names listed in the console were still truncated in the middle with three dots.
To print a full list of the column names from a dataframe to the console in Spyder:
list(df.columns)
I have a best-practice question. Today I learned how to read and write files in pandas: how to create a table, how to add a column or row, and how to drop them.
I have an excel file with the following content:
I create a new column "Price_average" by averaging "Price_min" and "Price_max", and write the result to output_1.xlsx:
#!/usr/bin/env python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd
df = pd.read_excel('original.xlsx')
print (df)
df['Price_average'] = (df.Price_min + df.Price_max)/2
df.to_excel('output_1.xlsx', sheet_name='sheet1', index=False)
print (df)
I then drop the columns "Price_min" and "Price_max" with:
df = df.drop(['Price_min', 'Price_max'], axis=1)
Now let's say I want to create this table:
Should I delete "Age" and "Price_average" and swap "email" with "brand", or can I simply select the columns I want and create a new spreadsheet from them?
What is the best and cleanest way to do it: remove the unwanted columns from the file, then rearrange and optionally rename the rest, or pick the needed columns and create a new file with them in the right order? Any suggestions?
You can select the columns you want, in the order you want them:
selected = df[['Age', 'Price_average', 'Email', 'Brand']]
If you want to change the column names:
renamed = selected.rename(columns={'Brand': 'brand', 'Email':'email'})
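Selecting and renaming can also be chained in one step. This is a minimal sketch with invented rows; the column choice in wanted and the commented-out output filename are hypothetical, so adjust them to the table you actually need.

```python
import pandas as pd

# Invented rows; only the column names follow the ones used above.
df = pd.DataFrame({
    "Brand": ["Acme", "Globex"],
    "Email": ["a@example.com", "g@example.com"],
    "Age": [31, 45],
    "Price_min": [10.0, 20.0],
    "Price_max": [14.0, 30.0],
})
df["Price_average"] = (df["Price_min"] + df["Price_max"]) / 2

wanted = ["Email", "Brand"]  # hypothetical selection, listed in output order
result = df[wanted].rename(columns={"Email": "email", "Brand": "brand"})
print(result)

# The original frame is untouched, so nothing has to be dropped first.
# result.to_excel("output_2.xlsx", index=False)  # hypothetical filename
```

Picking the columns by name gives the subset and the order in one expression, which is usually easier to read than a series of drops.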
I am currently teaching myself Python and running into some issues. My challenge requires me to count the number of unique values in a column of an Excel spreadsheet, using only the rows that have no missing values. Here is what I've got so far, but I can't seem to get it to work:
import xlrd
import pandas as pd
workbook = xlrd.open_workbook("*name of excel spreadsheet*")
worksheet = workbook.sheet_by_name("*name of specific sheet*")
pd.value_counts(df.*name of specific column*)
s = pd.value_counts(df.*name of specific column*)
s1 = pd.Series({'nunique': len(s), 'unique values': s.index.tolist()})
s.append(s1)
print(s)
Thanks in advance for any help.
Use the built-in unique() method to find the unique values in a column.
Here is an example:
import pandas as pd
df=pd.DataFrame(columns=["a","b"])
df["a"]=[1,3,3,3,4]
df["b"]=[1,2,2,3,4]
print(df["a"].unique())
which gives the following result:
[1 3 4]
You can store the result in a variable if you like:
l_of_unique_vals=df["a"].unique()
and then find its length or do anything else you like. For your Excel file:
df = pd.read_excel("nameoffile.xlsx", sheet_name=name_of_sheet_you_are_loading)
# the line above reads the file into a pandas DataFrame named df
df["column you want to find vals from"].unique()
First use pandas read_excel, then unique, as @Inder suggested.
import pandas as pd
df = pd.read_excel('name_of_your_file.xlsx')
print(df['columns'].unique())
See more here.
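The "no missing values" part of the question can be handled with dropna() before counting. This is a minimal sketch on invented data, assuming that "no missing values" means the whole row must be complete; the column names are made up.

```python
import pandas as pd

# Invented data; None marks a missing cell.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", None, "Lima"],
    "score": [1.0, 2.0, None, 4.0, 5.0],
})

complete = df.dropna()                    # keep only rows with no missing values
n_unique = complete["city"].nunique()     # number of distinct values in the column
counts = complete["city"].value_counts()  # how often each value occurs
print(n_unique)
print(counts)
```

If only the column itself must be non-missing, df["city"].nunique() is enough, since nunique() ignores missing values by default.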
If I use DataFrame.set_index, I get this result:
import pandas as pd
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
df.set_index('name')
Note the unnecessary row... I know it does this because it reserves the upper left cell for the column title, but I don't care about it, and it makes my table look somewhat unprofessional if I use it in a presentation.
If I don't use DataFrame.set_index, the extra row is gone, but I get numeric row indices, which I don't want:
If I use to_html(index=False) then I solve those problems, but the first column isn't bold:
import pandas as pd
from IPython.display import HTML
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
HTML(df.to_html(index=False))
If I want to control styling to make the names boldface, I guess I could use the new Styler API via HTML(df.style.do_something_here().render()) but I can't figure out how to achieve the index=False functionality.
What's a hacker to do? (besides construct the HTML myself)
I poked around in the source for Styler and figured it out; if you set df.index.names = [None] then this suppresses the "extra" row (along with the column header that I don't really care about):
import pandas as pd
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
df = df.set_index('name')
df.index.names = [None]
df
These days pandas actually has a keyword for this:
df.to_html(index_names=False)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_html.html
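A quick way to see the effect of the keyword is to compare the generated HTML with and without it. This is a minimal sketch that reuses the frame from the question; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([["foo", 1, 3.0], ["bar", 2, 2.9],
                   ["baz", 4, 2.85], ["quux", 3, 2.82]],
                  columns=["name", "order", "gpa"]).set_index("name")

with_names = df.to_html()                      # default: index name gets its own header row
without_names = df.to_html(index_names=False)  # the index name row is left out

print("<th>name</th>" in with_names)
print("<th>name</th>" in without_names)
```

In a notebook, HTML(without_names) from IPython.display would render the second table.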