How to print a specific row of a pandas DataFrame? - python

I have a massive DataFrame, and I'm getting the error:
TypeError: ("Empty 'DataFrame': no numeric data to plot", 'occurred at index 159220')
I've already dropped nulls, and checked dtypes for the DataFrame so I have no guess as to why it's failing on that row.
How do I print out just that row (at index 159220) of the DataFrame?

When you call loc with a scalar value, you get a pd.Series, and that Series has a single dtype. If you want to see the row as it is in the DataFrame, pass an array-like indexer to loc instead.
Wrap your index value in an additional pair of square brackets:
print(df.loc[[159220]])
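A small self-contained sketch of the difference. The toy DataFrame and its index labels are made up and only stand in for the real frame around label 159220:

```python
import pandas as pd

# Toy frame with mixed dtypes; the index labels are hypothetical.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": ["x", "y", "z"]},
    index=[159219, 159220, 159221],
)

as_series = df.loc[159220]   # scalar label -> Series with one common dtype (object here)
as_frame = df.loc[[159220]]  # list of labels -> one-row DataFrame, per-column dtypes kept

print(as_series.dtype)
print(as_frame.dtypes)       # column "a" keeps its integer dtype
```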

To print a specific row, pandas has a couple of methods:
loc - label-based: selects rows and columns by their index labels and column names
iloc - integer-based (the i stands for integer): selects rows and columns by position
ix - a mix of label and integer indexing (deprecated in pandas 0.20 and removed in 1.0; use loc or iloc instead)
How to use them for a specific row:
loc
df.loc[row, column]
For the first row and all columns (assuming the first row's label is 0, as with the default index)
df.loc[0, :]
For the first row and a specific column
df.loc[0, 'column_name']
iloc
For the first row and all columns
df.iloc[0, :]
For the first row and some specific columns, i.e. the first three columns
df.iloc[0, 0:3]
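The label vs. position distinction only becomes visible when the index is not the default 0, 1, 2, ... A minimal sketch with a deliberately shuffled, made-up index:

```python
import pandas as pd

# Index labels deliberately differ from positions (e.g. after sorting or filtering).
df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.5, 2.5, 3.5], "z": ["p", "q", "r"]},
    index=[2, 0, 1],
)

first_by_label = df.loc[0, :]      # row whose label is 0 (the second physical row)
first_by_position = df.iloc[0, :]  # first physical row (its label is 2)
first_two_cols = df.iloc[0, 0:2]   # first row, first two columns by position
one_cell = df.loc[0, "x"]          # scalar at label 0, column "x"
```

With a default RangeIndex, df.loc[0] and df.iloc[0] would return the same row; here they do not.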

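Use the ix operator (pandas < 1.0 only; .ix was deprecated in 0.20 and removed in 1.0):
print(df.ix[159220])
On current pandas, use the label-based equivalent instead:
print(df.loc[159220])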

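If you want to display the row at row=159220 (display() is available in Jupyter/IPython; loc uses labels and iloc uses positions, so the two agree only with the default index):
row=159220
# To display as a one-row DataFrame (table format)
display(df.loc[row:row])     # label slice: both ends are included
display(df.iloc[row:row+1])  # positional slice: the end is excluded
# To display as a Series (one value per column)
display(df.loc[row])
display(df.iloc[row])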

Sounds like you're calling df.plot(). That error indicates that you're trying to plot a frame that has no numeric data. The data types shouldn't affect what you print().
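Use print(df.iloc[159220]) (iloc is positional; if 159220 is an index label rather than a position, use print(df.loc[159220])).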
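If the cause is a column that looks numeric but is stored as text (an assumption; the question does not show the data), one way to locate the offending rows is to coerce the column with pd.to_numeric and look for values that turned into NaN. The column name "value" and its contents are hypothetical:

```python
import pandas as pd

# Hypothetical column: numbers stored as strings, with one unparseable entry.
df = pd.DataFrame({"value": pd.Series(["1.0", "2.5", "n/a", "4.0"], dtype=object)})

print(df.dtypes)  # "value" is object, so df.plot() finds no numeric data

# Coerce to numbers; entries that cannot be parsed become NaN.
converted = pd.to_numeric(df["value"], errors="coerce")

# Rows that were non-null before but failed to convert are the culprits.
bad_rows = df[converted.isna() & df["value"].notna()]
print(bad_rows)

df["value"] = converted  # now a float column that can be plotted
```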

You can also index the index and use the result to select row(s) using loc:
row = 159220    # an integer: df.loc[df.index[row]] returns a pandas Series
row = [159220]  # a list: df.loc[df.index[row]] returns a pandas DataFrame
df.loc[df.index[row]]
This is especially useful if you want to select rows by integer location and columns by name. For example:
row = 159220
cols = ['col2', 'col6']
df.loc[df.index[row], cols]  # <--- OK
df.iloc[row, cols]           # <--- doesn't work: iloc needs integer column positions
df[cols].iloc[row]           # <--- OK, but creates an intermediate copy
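A runnable sketch comparing the working variants on a small made-up frame, plus one more option (translating the column names into positions with Index.get_indexer so iloc can be used for both axes):

```python
import pandas as pd

# Toy frame with string labels so positions and labels clearly differ.
df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4, 5, 6], "col6": [7, 8, 9]},
    index=["r0", "r1", "r2"],
)

row = 1                  # integer position
cols = ["col2", "col6"]  # column names

via_loc = df.loc[df.index[row], cols]  # position -> label, then label-based loc
via_chain = df[cols].iloc[row]         # select columns first, then position
# Alternative: turn the column names into positions so iloc handles both axes.
via_iloc = df.iloc[row, df.columns.get_indexer(cols)]
```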

Related

Pandas - Find a column with a specific value in the entire dataframe

I have a DataFrame which has a few columns. There is a column with a value that only appears once in the entire dataframe. I want to write a function that returns the column name of the column with that specific value. I can manually find which column it is with the usual data exploration, but since I have multiple dataframes with the same properties, I need to be able to find that column for multiple dataframes. So a somewhat generalized function would be of better use.
The problem is that I don't know beforehand which column is the one I am looking for since in every dataframe the position of that particular column with that particular value is different. Also the desired columns in different dataframes have different names, so I cannot use something like df['my_column'] to extract the column.
Thanks
You'll need to iterate columns and look for the value:
def find_col_with_value(df, value):
    for col in df:
        if (df[col] == value).any():
            return col
This will return the name of the first column that contains value. If value does not exist, it will return None.
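If the value might appear in more than one column, a small variation collects every match instead of stopping at the first. The helper name find_cols_with_value and the sample frame are hypothetical:

```python
import pandas as pd

def find_cols_with_value(df, value):
    """Return the names of every column that contains `value` (hypothetical helper)."""
    return [col for col in df.columns if df[col].eq(value).any()]

df = pd.DataFrame({
    "a": [1, 2],
    "b": pd.Series(["x", "needle"], dtype=object),
    "c": [3.0, 4.0],
})

print(find_cols_with_value(df, "needle"))
```

An empty list plays the role of the None returned by the single-column version.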
Check the entire DataFrame for the specific value, use any() to see whether it ever appears in each column, then slice the columns (or the DataFrame if you want the Series):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.normal(0, 5, (100, 200)),
                  columns=[chr(i+40) for i in range(200)])
df['Y'] = df['Y'].astype(object)  # make the float column able to hold a string
df.loc[5, 'Y'] = 'secret_value'   # Secret value in column 'Y'
df.eq('secret_value').any().loc[lambda x: x].index
# or
df.columns[df.eq('secret_value').any()]
Index(['Y'], dtype='object')
I have another solution:
names = ds.columns
for i in names:
    for j in ds[i]:
        if j == 'your_value':
            print(i)
            break
This collects all the column names, then scans each column's values until the value is found, and prints the name of every column that contains it.

How do I assign to a new variable the value of a specific column at index 1

I have a CSV with the following columns: STATION, DATE, TEMP, etc.
I need to assign a value at index 0 of the DATE column to a new variable (let's call it first_observation). So, I need to specify the index of the column DATE. The DataFrame is called "Data" and the column "DATE".
I tried something like this:
data = pd.read_csv(fp, sep=r'\s+', skiprows=[1], na_values=['-9999'])
first_observation = data.loc[idx[0], 'DATE']
But it is not working.
The question differs from the answer pandas - how to access cell in pandas, equivalent of df[3,4] in R since it discusses a bit different terms. My column name is a string and row names are integers. In the other question, this corresponds as the answer:
first_obs = data.at['Column_name', 'Row_name']
The same can't apply or I don't know how to apply it in this case where the answer is:
first_obs = data['Column_name'][0] - the [0] being the index of the row
Please correct me if I am wrong.
DataFrame.loc takes rows as the first argument and columns as the second.
Rows are labelled by the index of the DataFrame; with the default index that read_csv creates, the labels start at 0.
So the first row is selected with the label 0 as the first argument, and the column with 'DATE' as the second.
Hence, to get the first element of the 'DATE' column,
first_observation = data.loc[0, 'DATE']
can be used (data.at[0, 'DATE'] returns the same scalar). Avoid data.loc[0:1, 'DATE'] here: label slices with .loc include both ends, so it returns a two-row Series rather than a single value.
Note that you have to pass rows first and columns second as arguments for .loc and .at methods.
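A runnable sketch of these options. The whitespace-separated sample data is made up and only mimics the STATION/DATE/TEMP layout from the question:

```python
import io
import pandas as pd

# Hypothetical data standing in for the real file.
raw = io.StringIO(
    "STATION DATE TEMP\n"
    "A 20200101 5\n"
    "A 20200102 7\n"
    "A 20200103 6\n"
)
data = pd.read_csv(raw, sep=r"\s+")

first_observation = data.loc[0, "DATE"]  # label 0, column "DATE" -> scalar
same_value = data.at[0, "DATE"]          # fast scalar access, same argument order
by_position = data["DATE"].iloc[0]       # position-based, independent of the labels
two_rows = data.loc[0:1, "DATE"]         # label slice includes both ends -> Series
```

The iloc variant is the one to prefer if the index might not start at 0 (for example after filtering rows).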

Python Pandas Don't Repeat Item Labels

I have a table (not reproduced here).
How would I roll up Group so that the group numbers don't repeat? I don't want to use df.groupby, as I don't want to summarize the other columns. I just want to not repeat item labels, sort of like an Excel pivot table.
Thanks!
In your dataframe it appears that 'Group' is in the index. The purpose of the index is to label each row, so it is unusual to have blank row index labels.
You could do this:
df2.reset_index().set_index('Group', append=True).swaplevel(0,1,axis=0)
Or, if you really must show blank row index labels, you could do this, but you must change the dtype of the index to str:
df1 = df.set_index('Group')
df1.index = df1.index.astype(str)
df1.index = df1.index.where(~df1.index.duplicated(), ' ')
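A self-contained sketch of the blank-label approach on made-up data that is already sorted by Group (the blanking only looks right when equal groups are adjacent):

```python
import pandas as pd

# Hypothetical data, already sorted by group.
df = pd.DataFrame({"Group": [1, 1, 2, 2, 2], "Item": list("abcde")})

out = df.set_index("Group")
out.index = out.index.astype(str)  # string labels so a blank " " fits the dtype
# Keep the first occurrence of each label; blank out the repeats.
out.index = out.index.where(~out.index.duplicated(), " ")
print(out)
```

The resulting labels are display-only strings; keep the original frame for any further grouping or lookups.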

How can I retrieve the label index of a pandas dataframe row given its integer index?

Forgive me if the answer is simplistic. I am a beginner with pandas. Basically, I want to retrieve the label index of a row of my pandas dataframe. I know the integer index of it.
For example, suppose that I want to get the label index of the last row of my pandas dataframe df. I tried:
df.iloc[-1].index
But that retrieved the column headers of my dataframe, rather than the label index of the last row. How can I get that label index?
Passing a scalar to iloc will return a Series of the last row, putting the columns into the index. Pass iloc a list to return a dataframe which will allow you to grab the index how you normally would.
df.iloc[[-1]].index
You can also grab the index first and then get the last value with df.index[-1]
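A short runnable comparison of the approaches on a made-up frame with string labels. The third line uses the fact that a row returned by iloc carries its label in the Series .name attribute:

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30]}, index=["a", "b", "c"])

label_via_list = df.iloc[[-1]].index[0]  # one-row DataFrame, then its only label
label_via_index = df.index[-1]           # position -1 of the index itself
label_via_name = df.iloc[-1].name        # a row Series stores its label in .name
```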

How to set in pandas the first column and row as index?

When I read in a CSV, I can say pd.read_csv('my.csv', index_col=3) and it sets the column at position 3 (the fourth column, counting from 0) as the index.
How can I do the same if I have a pandas dataframe in memory? And how can I say to use the first row also as an index? The first column and row are strings, rest of the matrix is integer.
You can try this regardless of the number of rows
df = pd.read_csv('data.csv', index_col=0)
Making the first (or n-th) column the index, in increasing order of verbosity:
df.set_index(list(df)[0])
df.set_index(df.columns[0])
df.set_index(df.columns.tolist()[0])
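Making the first (or n-th) row the column labels: set_index works on the row axis and needs one value per row, so passing a row's values to it (df.set_index(df.iloc[0].values)) only works if the frame happens to be square. Assign the row to df.columns instead and drop it:
df.columns = df.iloc[0]
df = df.iloc[1:]
You can combine both, using the first row as column labels and then the first column as the index:
df = df.set_index(df.columns[0])
Observe that using a column as index will automatically drop it as a column, whereas a row copied into df.columns stays in the data until you drop it (here with df.iloc[1:]).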
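A self-contained sketch of one way to treat an in-memory frame's first row as column labels and its first column as the row index. The sample data is made up, and the final astype(int) assumes the remaining cells are all integers, as the question describes:

```python
import pandas as pd

# Hypothetical raw frame: first row holds column names, first column holds row names.
raw = pd.DataFrame([
    ["id", "x", "y"],
    ["r1", 1, 2],
    ["r2", 3, 4],
])

df = raw.iloc[1:].copy()
df.columns = raw.iloc[0]  # first row becomes the column labels
df.columns.name = None    # drop the leftover name (0) inherited from the row label
df = df.set_index("id")   # first column becomes the row index and is removed as a column
df = df.astype(int)       # the cells were object dtype because of the mixed rows
print(df)
```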
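Maybe try set_index()? Passing [2] selects the column labelled 2, so this works when the columns have integer labels (e.g. after reading with header=None); otherwise pass the column name:
df = df.set_index([2])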
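Maybe try df = pd.read_csv('my.csv', header=0, index_col=0), which uses the first row as the column header and the first column as the index.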
