I am stuck here, and it's a two-part question. Looking at the output of .describe(include='all'), not all columns are showing; how do I get all columns to show?
This is a problem I run into all the time with Spyder: how do I get all columns to show in the console? Any help is appreciated.
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
import seaborn as sns
mydata = pd.read_csv(r"E:\ho11.csv")  # raw string so the backslash is not treated as an escape
mydata.head()
print(mydata.describe(include="all", exclude = None))
mydata.info()
OUTPUT:
[image: code output]
Solution
You could use either of the following methods:
Method-1:
pd.options.display.max_columns = None
Method-2:
pd.set_option('display.max_columns', None)
# to reset this
pd.reset_option('display.max_columns')
Method-3:
# assuming df is your dataframe
pd.set_option('display.max_columns', df.columns.size)
# to reset this
pd.reset_option('display.max_columns')
Method-4:
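# assuming df is your dataframe; the short name 'max_columns' may raise an
# OptionError in newer pandas versions where it matches more than one option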
pd.set_option('max_columns', df.columns.size)
# to reset this
pd.reset_option('max_columns')
To keep the output from wrapping onto multiple lines, do this:
pd.set_option('display.expand_frame_repr', False)
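If you would rather not change these settings globally, pd.option_context applies them only inside a with block. A minimal sketch, assuming a made-up 30-column frame (df and the column names are placeholders, not the data from the question):

```python
import pandas as pd

# Hypothetical wide frame standing in for the CSV from the question.
df = pd.DataFrame({f"col{i}": [i, i + 1] for i in range(30)})

before = pd.get_option("display.max_columns")

# Inside the block every column is shown and rows are not wrapped;
# the previous settings are restored automatically on exit.
with pd.option_context("display.max_columns", None,
                       "display.expand_frame_repr", False):
    full_text = repr(df.describe(include="all"))

print(full_text)
```

This keeps the rest of the session on the default display settings, which may be preferable when only one wide summary needs to be printed.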
References
I recommend exploring the following resources for more details and examples.
How to show all of columns name on pandas dataframe?
How do I expand the output display to see more columns of a pandas DataFrame?
How to show all columns / rows of a Pandas Dataframe?
Since you are using Spyder the easiest thing to do would be:
myview = mydata.describe()
Then you can inspect 'myview' in the variable explorer.
When I used pd.set_option, the column names listed in the console were still truncated in the middle with three dots.
To print a full list of the column names from a dataframe to the console in Spyder:
list(df.columns)
Related
I am reading the output of dataframe.isnull().sum(), but it is shown collapsed. How can I expand it so that I can see the NA count for every column? There are 81 columns in total, but I am seeing only a few.
Try adding this to your code:
import pandas as pd
pd.set_option('display.max_rows', None)
You can also print the entire data frame as well:
print(df.to_string())
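As a rough sketch of the same idea applied to the null counts themselves (the 81-column frame below is invented to mirror the question; the column names are placeholders), the row limit can be lifted only while the Series is being rendered:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 81 columns, mirroring the count in the question.
df = pd.DataFrame(np.ones((3, 81)), columns=[f"c{i}" for i in range(81)])
df.iloc[0, 5] = np.nan  # one missing value so the counts are not all zero

na_counts = df.isnull().sum()  # one row per column of df

# Lift the row limit only for this print; it is restored afterwards.
with pd.option_context("display.max_rows", None):
    full_text = repr(na_counts)

print(full_text)
```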
I am new to plotting charts in Python. I've been told to use Pandas for that, using the following command. Right now it assumes the CSV file has headers (time, speed, etc.). How can I change it for the case where the CSV file doesn't have headers (data starts from row 0)?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("P1541350772737.csv")
#df.head(5)
df.plot(figsize=(15,5), kind='line',x='timestamp', y='speed') # scatter plot
You can specify x and y by column position; you don't need the column names for that:
df.plot(figsize=(15,5), kind='line', x=0, y=1)
This works if the x column is first and the y column is second, and so on; columns are numbered from 0.
For example:
The same result with the names of the columns instead of positions:
I may have misinterpreted your question, but I'll do my best.
The problem seems to be that you have to read a CSV that has no header, but you want to add one. I would use this code:
cols=['time', 'speed', 'something', 'else']
df = pd.read_csv('useful_data.csv', names=cols, header=None)
For your plot, the code you used should be fine with my correction. I would also suggest looking at matplotlib for your graph.
You can try
df = pd.read_csv("P1541350772737.csv", header=None)
With the names keyword argument you can set arbitrary column headers; this silently implies header=None, i.e. data is read from row 0.
You might also want to check the doc https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
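To make the difference concrete, here is a small sketch that reads the same header-less content three ways. The CSV text and the column names "time" and "speed" are made up for illustration, and io.StringIO stands in for a file on disk:

```python
import io
import pandas as pd

# Hypothetical header-less CSV content (two columns, three rows).
raw = "0,10\n1,12\n2,15\n"

# Default: the first data row is consumed as the header.
default_df = pd.read_csv(io.StringIO(raw))

# header=None keeps every row and labels the columns 0, 1, ...
plain_df = pd.read_csv(io.StringIO(raw), header=None)

# names= supplies labels; header=None is stated explicitly for clarity.
named_df = pd.read_csv(io.StringIO(raw), names=["time", "speed"], header=None)

print(len(default_df), list(plain_df.columns), list(named_df.columns))
```

With plain_df, a call such as plain_df.plot(x=0, y=1) would pick the columns by their integer labels; with named_df, x="time" and y="speed" would do the same.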
Pandas is more focused on data structures and data analysis tools; its plotting support uses Matplotlib as a backend. If you're interested in building different types of plots in Python, you might want to check Matplotlib out.
Back to Pandas: it assumes that the first row of your CSV is a header. However, if your file doesn't have a header, you can pass header=None as a parameter, as in pd.read_csv("P1541350772737.csv", header=None), and then plot it as you are doing right now.
The full list of parameters that you can pass to Pandas for reading a CSV can be found in the Pandas read_csv documentation; you'll find a lot of useful options there (such as skipping rows, defining the index column, etc.).
Happy coding!
For most commands you will find help in the respective documentation. Looking at pandas.read_csv you'll find an argument names
names : array-like, default None
List of column names to use. If file contains no header row, then you should explicitly
pass header=None.
So you will want to give your columns names by which they appear in the dataframe.
As an example: Suppose you have this data file
1, 2
3, 4
5, 6
Then you can do
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("data.txt", names=["A", "B"], header=None)
print(df)
df.plot(x="A", y="B")
plt.show()
which outputs
A B
0 1 2
1 3 4
2 5 6
Currently self-teaching Python and running into some issues. My challenge requires me to count the number of unique values in a column of an excel spreadsheet in which the rows have no missing values. Here is what I've got so far but I can't seem to get it to work:
import xlrd
import pandas as pd
workbook = xlrd.open_workbook("*name of excel spreadsheet*")
worksheet = workbook.sheet_by_name("*name of specific sheet*")
pd.value_counts(df.*name of specific column*)
s = pd.value_counts(df.*name of specific column*)
s1 = pd.Series({'nunique': len(s), 'unique values': s.index.tolist()})
s.append(s1)
print(s)
Thanks in advance for any help.
Use the built-in unique() method to find the unique values in a column.
Here is an example:
import pandas as pd
df=pd.DataFrame(columns=["a","b"])
df["a"]=[1,3,3,3,4]
df["b"]=[1,2,2,3,4]
print(df["a"].unique())
will give the following result:
[1 3 4]
So you can store the result (a NumPy array) in a variable if you like, with:
l_of_unique_vals=df["a"].unique()
and then take its length or do anything else you like with it.
df = pd.read_excel("nameoffile.xlsx", sheet_name=name_of_sheet_you_are_loading)
#in the line above we are reading the file in a pandas dataframe and giving it a name df
df["column you want to find vals from"].unique()
First you can use Pandas read_excel and then unique, as @Inder suggested.
import pandas as pd
df = pd.read_excel('name_of_your_file.xlsx')
print(df['columns'].unique())
When I use style to format a pandas dataframe in a Jupyter notebook, the name of the columns (df.columns.name) is not displayed.
How can I fix this?
Set up:
import pandas as pd
from IPython.display import HTML, display
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns = [-1,0,1], index=[-1,0,1])
df.index.name = 'A'
df.columns.name = 'B'
This is what the data looks like:
display(df) # Has name of columns 'B'
Now, I want to add percentage formatting to all columns:
display(df.style.format("{:.1%}"))
but I have lost the name of columns!
I tried your code and could not reproduce the problem; I get this:
I think updating your pandas or Jupyter version will fix it.
If I use DataFrame.set_index, I get this result:
import pandas as pd
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
df.set_index('name')
Note the unnecessary row... I know it does this because it reserves the upper left cell for the column title, but I don't care about it, and it makes my table look somewhat unprofessional if I use it in a presentation.
If I don't use DataFrame.set_index, the extra row is gone, but I get numeric row indices, which I don't want:
If I use to_html(index=False) then I solve those problems, but the first column isn't bold:
import pandas as pd
from IPython.display import HTML
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
HTML(df.to_html(index=False))
If I want to control styling to make the names boldface, I guess I could use the new Styler API via HTML(df.style.do_something_here().render()) but I can't figure out how to achieve the index=False functionality.
What's a hacker to do? (besides construct the HTML myself)
I poked around in the source for Styler and figured it out; if you set df.index.names = [None] then this suppresses the "extra" row (along with the column header that I don't really care about):
import pandas as pd
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
df = df.set_index('name')
df.index.names = [None]
df
These days pandas actually has a keyword for this:
df.to_html(index_names=False)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_html.html
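A short sketch comparing the two approaches mentioned above, using a trimmed version of the example frame. It only inspects the generated HTML strings; how they look once rendered in a notebook may still depend on your pandas version:

```python
import pandas as pd

df = pd.DataFrame([["foo", 1, 3.0], ["bar", 2, 2.9]],
                  columns=["name", "order", "gpa"]).set_index("name")

with_name = df.to_html()                      # header has an extra row for "name"
without_name = df.to_html(index_names=False)  # index-name row is dropped
cleared = df.rename_axis(None).to_html()      # same effect by clearing the index name

print(without_name)
```

The rename_axis(None) call is an alternative spelling of df.index.names = [None] that returns a new frame instead of modifying df in place.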