How to set in pandas the first column and row as index? - python

When I read in a CSV, I can say pd.read_csv('my.csv', index_col=3) and it sets the fourth column (index_col is zero-based) as the index.
How can I do the same if I already have a pandas DataFrame in memory? And how can I also use the first row as an index? The first column and row are strings; the rest of the matrix is integers.

You can try this regardless of the number of rows
df = pd.read_csv('data.csv', index_col=0)

Making the first (or n-th) column the index, in increasing order of verbosity:
df.set_index(list(df)[0])
df.set_index(df.columns[0])
df.set_index(df.columns.tolist()[0])
Making the first (or n-th) row the index:
df.set_index(df.iloc[0].values)
(The array you pass must have one value per row of the frame, which fits the square matrix described in the question.)
You can use both if you want a multi-level index:
df.set_index([df.iloc[0], df.columns[0]])
Observe that using a column as the index automatically drops it as a column, whereas using a row as the index is just a copy operation and won't drop the row from the DataFrame.
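A minimal sketch of both variants on a hypothetical square frame (the column names and values below are made up for illustration):

```python
import pandas as pd

# hypothetical 3x3 frame standing in for the question's matrix
df = pd.DataFrame({'name': ['r1', 'r2', 'r3'],
                   'x': [1, 2, 3],
                   'y': [4, 5, 6]})

by_col = df.set_index(df.columns[0])       # 'name' becomes the index and is dropped as a column
by_row = df.set_index(df.iloc[0].values)   # first row's values become the index; the row stays in the body
```

Note that by_col has one column fewer than df, while by_row keeps all three columns.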

Maybe try set_index()? To use the third column (position 2):
df = df.set_index(df.columns[2])

Maybe try df = pd.read_csv('my.csv', header=0, index_col=0), which uses the first row as the column headers and the first column as the index.

Related

Transpose column/row, change column name and reset index

I have a Pandas DF and I need to:
Transpose my columns to rows,
Transform these rows to indexes,
Set the actual columns as titles for each columns (and not as part of rows)
How can I do that?
Here is my DF before the transposition:
Here is my DF after my failed transposition:
After transposing, use:
df.columns = df.iloc[0]
to set column headers to the first row.
Then use the set_axis() function to set the index for your rows; see the pandas documentation for DataFrame.set_axis for details.
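A sketch of the whole workflow on made-up data (the original frames were shown as images, so the column names and values here are assumptions):

```python
import pandas as pd

# hypothetical frame: one row per sample, metrics as columns
df = pd.DataFrame({'id': ['a', 'b'], 'height': [1, 2], 'width': [3, 4]})

t = df.T               # transpose: columns become rows
t.columns = t.iloc[0]  # promote the first row ('a', 'b') to column headers
t = t.iloc[1:]         # drop the promoted row from the body
```

After this, the old column names ('height', 'width') are the row index and the old 'id' values are the column titles.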

Collapsing values of a Pandas column based on Non-NA value of other column

I have data like this in a CSV file, which I am importing into a pandas df.
I want to collapse the values of the Type column by concatenating its strings into one sentence, keeping it in the first row next to the date value, while keeping all the other rows and values the same.
As shown below.
You can try ffill + transform:
df1 = df.copy()
df1[['Number', 'Date']] = df1[['Number', 'Date']].ffill()        # fill the group keys downward
df1['Type'] = df1['Type'].fillna('')
s = df1.groupby(['Number', 'Date'])['Type'].transform(' '.join)  # join the strings within each group
df.loc[df['Date'].notnull(), 'Type'] = s   # the sentence lands on the first row of each group
df.loc[df['Date'].isnull(), 'Type'] = ''
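Here is the same approach run end to end on made-up data (the question's CSV was shown as an image, so the Number/Date/Type values below are assumptions):

```python
import pandas as pd
import numpy as np

# hypothetical stand-in for the CSV: group keys only on the first row
df = pd.DataFrame({
    'Number': [1.0, np.nan, np.nan],
    'Date':   ['2020-01-01', np.nan, np.nan],
    'Type':   ['foo', 'bar', 'baz'],
})

df1 = df.copy()
df1[['Number', 'Date']] = df1[['Number', 'Date']].ffill()        # fill group keys downward
df1['Type'] = df1['Type'].fillna('')
s = df1.groupby(['Number', 'Date'])['Type'].transform(' '.join)  # one sentence per group

df.loc[df['Date'].notnull(), 'Type'] = s   # sentence goes on the first row of the group
df.loc[df['Date'].isnull(), 'Type'] = ''   # remaining rows are blanked
```

The first row's Type becomes 'foo bar baz' and the other rows' Type become empty strings.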

How to find if a value exists in all rows of a dataframe?

I have an array of unique elements and a dataframe.
I want to find out if the elements in the array exist in all the rows of the dataframe.
P.S. - I am new to Python.
This is the piece of code I've written.
for i in uniqueArray:
    for index, row in newDF.iterrows():
        if i in row['MKT']:
            # do something to find out if the element i exists in all rows
Also, this way of iterating is quite expensive; is there a better way to do the same?
Thanks in advance.
Pandas allows you to filter a whole column, as if it were Excel:
import pandas
df = pandas.DataFrame(tableData)
Imagine your column names are "Column1", "Column2", etc.
df2 = df[df["Column1"] == "ValueToFind"]
df2 now has only the rows that have "ValueToFind" in df["Column1"]. You can combine several filters using the logical operators & (AND) and | (OR).
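For example, combining two filters (tableData and the column names here are placeholders):

```python
import pandas as pd

# hypothetical data with the answer's placeholder column names
df = pd.DataFrame({'Column1': ['x', 'y', 'x'], 'Column2': [1, 2, 3]})

only_x = df[df['Column1'] == 'x']                        # single filter
both = df[(df['Column1'] == 'x') & (df['Column2'] > 1)]  # AND; use | for OR
```

The parentheses around each condition are required, because & and | bind more tightly than ==.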
You can try
for i in uniqueArray:
    if newDF['MKT'].str.contains(i).any():
        # do your task
(note the .str accessor: a plain Series has no contains() method).
You can use the isin() method of a pd.Series object.
Assuming you have a data frame named df, you can check whether your column 'MKT' includes any items of your uniqueArray:
new_df = df[df.MKT.isin(uniqueArray)].copy()
new_df will only contain the rows where the value of MKT is contained in uniqueArray.
Now do your things on new_df, and join/merge/concat it back to the former df as you wish.
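A vectorized sketch of both checks on made-up data (the MKT values below are assumptions):

```python
import pandas as pd

uniqueArray = ['a', 'b']
newDF = pd.DataFrame({'MKT': ['a', 'b', 'a', 'c']})

# does every element of the array appear somewhere in the column?
all_found = set(uniqueArray).issubset(newDF['MKT'])

# rows whose MKT value is one of the array's elements
subset = newDF[newDF['MKT'].isin(uniqueArray)]
```

Both avoid the nested iterrows() loop entirely.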

Using pandas, how do I loop through a dataframe row by row but with each row being its own dataframe

I am aware of:
for index, row in dataframe.iterrows():
But I wish for each row to be its own DataFrame instead of the type Series. How would I go about doing this? Do I have to convert it, or is there a better way of looping through?
If you need each row as a DataFrame:
for i in range(len(df)):
    df.iloc[[i], :]
or, selecting by label:
for index in df.index:
    df.loc[[index], :]
I would suggest loading the dataframes into a list of dataframes, then you can access them individually.
df_list = []
for index, row in df.iterrows():
    df_list.append(df[index:index+1])
then you can access the list, df_list[0] for example
You can use numpy.split:
import numpy as np
dfs = np.split(df, list(range(1, len(df))), axis=0)
Depends on the usage.
From what I could guess (a function that only applies on a DataFrame), you have two options :
Option 1
convert your row to a frame (and transpose it back, so it remains a row):
df_row = row.to_frame().T
Option 2
group your df (doing something as silly as resetting its index to have uniquely DataFrames of only 1 row), and apply the function to each group:
df.reset_index().groupby('index').apply(func)
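Putting the simplest variant together on a small made-up frame:

```python
import pandas as pd

# hypothetical data; column names are made up
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

frames = []
for i in range(len(df)):
    frames.append(df.iloc[[i], :])  # double brackets keep each row a one-row DataFrame
```

Each element of frames is a (1, n_columns) DataFrame rather than a Series, which is the distinction the question is after.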

How to print a specific row of a pandas DataFrame?

I have a massive DataFrame, and I'm getting the error:
TypeError: ("Empty 'DataFrame': no numeric data to plot", 'occurred at index 159220')
I've already dropped nulls, and checked dtypes for the DataFrame so I have no guess as to why it's failing on that row.
How do I print out just that row (at index 159220) of the DataFrame?
When you call loc with a scalar value, you get a pd.Series. That series will then have one dtype. If you want to see the row as it is in the dataframe, you'll want to pass an array like indexer to loc.
Wrap your index value with an additional pair of square brackets
print(df.loc[[159220]])
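The difference is easy to see on a tiny made-up frame with mixed dtypes (the index label matches the question's; the columns are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': ['x']}, index=[159220])

as_series = df.loc[159220]    # Series: values coerced to a single dtype (object here)
as_frame = df.loc[[159220]]   # one-row DataFrame: original column dtypes preserved
```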
To print a specific row, pandas has a couple of indexers:
loc - label-based: selects by index label or column name
iloc - integer-based ('i' stands for integer): selects by row/column position
ix - a mix of label and integer (deprecated, and removed in modern pandas; use loc or iloc instead)
How to use them for a specific row
loc
df.loc[row, column]
For the first row and all columns:
df.loc[0, :]
For the first row and a specific column:
df.loc[0, 'column_name']
iloc
For the first row and all columns:
df.iloc[0, :]
For the first row and specific columns, e.g. the first three:
df.iloc[0, 0:3]
Use the ix indexer (note that ix is deprecated and has been removed from modern pandas; prefer loc or iloc):
print(df.ix[159220])
If you want to display at row=159220
row=159220
#To display in a table format
display(df.loc[row:row])
display(df.iloc[row:row+1])
#To display in print format
display(df.loc[row])
display(df.iloc[row])
Sounds like you're calling df.plot(). That error indicates that you're trying to plot a frame that has no numeric data. The data types shouldn't affect what you print().
Use print(df.iloc[159220])
You can also index the index and use the result to select row(s) using loc:
rows = 159220   # this creates a pandas Series (`rows` is an integer)
rows = [159220] # this creates a pandas DataFrame (`rows` is a list)
df.loc[df.index[rows]]
This is especially useful if you want to select rows by integer location and columns by name. For example:
rows = [159220]
cols = ['col2', 'col6']
df.loc[df.index[rows], cols]   # <--- OK
df.iloc[rows, cols]            # <--- doesn't work (iloc needs integer positions for columns)
df[cols].iloc[rows]            # <--- OK but creates an intermediate copy
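A quick sketch of the pattern on a small frame (the labels and column names are made up):

```python
import pandas as pd

df = pd.DataFrame({'col2': [10, 20], 'col6': ['a', 'b'], 'col9': [0.1, 0.2]},
                  index=['r1', 'r2'])

rows = [1]                # integer positions
cols = ['col2', 'col6']   # column labels
out = df.loc[df.index[rows], cols]   # positional rows, labeled columns
```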
