Removing header column from pandas dataframe


I have the following dataframe:
df
A B
0 23 12
1 21 44
2 98 21
How do I remove the column names A and B from this dataframe? One way might be to write it to a CSV file and then read it back in with header=None. Is there a way to do that without writing out to CSV and re-reading?

I don't think you can remove the column names; you can only reset them to a range based on the shape:
print(df.shape[1])
2
print(list(range(df.shape[1])))
[0, 1]
df.columns = range(df.shape[1])
print(df)
0 1
0 23 12
1 21 44
2 98 21
This is the same as using to_csv and read_csv:
print(df.to_csv(header=None, index=False))
23,12
21,44
98,21
print(pd.read_csv(io.StringIO(df.to_csv(header=None, index=False)), header=None))
0 1
0 23 12
1 21 44
2 98 21
Another solution uses skiprows:
print(df.to_csv(index=False))
A,B
23,12
21,44
98,21
print(pd.read_csv(io.StringIO(df.to_csv(index=False)), header=None, skiprows=1))
0 1
0 23 12
1 21 44
2 98 21
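For readers on Python 3, the first approach in this answer can be sketched as follows. This is a minimal sketch: the sample frame is rebuilt from the question's data, and the name `result` is only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [23, 21, 98], "B": [12, 44, 21]})

# Replace the column labels with positional integers (0, 1, ...).
# The labels cannot be removed entirely; pandas always keeps a column index.
result = df.copy()
result.columns = range(result.shape[1])

print(result)
```

Working on a copy leaves the original `df` with its A and B labels, which is useful if the names are still needed elsewhere.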

How to get rid of a header (first row) and an index (first column).
To write to CSV file:
df = pandas.DataFrame(your_array)
df.to_csv('your_array.csv', header=False, index=False)
To read from CSV file:
df = pandas.read_csv('your_array.csv')
a = df.values
If you want to read a CSV file that doesn't contain a header, pass the additional parameter header=None, otherwise the first data row is used as the column names:
df = pandas.read_csv('your_array.csv', header=None)
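The write and read steps above can be tried without touching the disk. The sketch below assumes an in-memory `io.StringIO` buffer in place of 'your_array.csv'; the names `buffer`, `csv_text` and `restored` are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame([[23, 12], [21, 44], [98, 21]], columns=["A", "B"])

# Write only the values: no header row, no index column
buffer = io.StringIO()
df.to_csv(buffer, header=False, index=False)
csv_text = buffer.getvalue()

# Read it back; header=None stops pandas from treating the
# first data row as column names
restored = pd.read_csv(io.StringIO(csv_text), header=None)

print(restored)
```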

I had the same problem but solved it in this way:
df = pd.read_csv('your-array.csv', skiprows=[0])

I haven't seen this solution yet, so here's how I did it without using read_csv:
df = df.rename(columns={'A': '', 'B': ''})
If you rename all your column names to empty strings, the table is displayed with a blank header. Note that rename returns a new DataFrame, so assign the result.
And if you have a lot of columns, you can build the dictionary first instead of renaming each one manually:
df_dict = dict.fromkeys(df.columns, '')
df = df.rename(columns=df_dict)
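One consequence of this approach is worth seeing in a small example: the columns are still there, they just all share the label ''. The sketch below uses the question's sample data; the name `blank` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [23, 21, 98], "B": [12, 44, 21]})

# rename returns a new DataFrame, so the result has to be assigned
blank = df.rename(columns=dict.fromkeys(df.columns, ""))

# The header prints as blank, but both columns now share the label "",
# so selecting by label returns both of them
both = blank[""]

print(blank.to_string())
```

This is mainly a display trick; positional access with `iloc` still works, but label-based selection becomes ambiguous.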

You can first convert the DataFrame to a NumPy array:
s1 = df.iloc[:, 0:2].values
Then convert the NumPy array back to a DataFrame:
s2 = pd.DataFrame(s1)
This returns a DataFrame whose column names are replaced by the default integer labels (0, 1).
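The round trip through NumPy can be sketched as below. It uses `DataFrame.to_numpy()`, which is equivalent to `.values` here; the names `values` and `rebuilt` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [23, 21, 98], "B": [12, 44, 21]})

values = df.to_numpy()          # plain 2-D NumPy array, labels dropped
rebuilt = pd.DataFrame(values)  # columns default to 0, 1, ...

print(rebuilt)
```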

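This works perfectly:
To get the dataframe without the header use:
totalRow = len(df.index)
df.iloc[1:totalRow]
Or, if the index is a default RangeIndex, you can use the second method like this:
totalRow = df.index.stop
df.iloc[1:totalRow]
Note that iloc slices data rows, so this drops the first row of data; the column names themselves are unchanged.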
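A quick check on the question's data shows what row slicing with iloc actually affects. This is a small sketch; `sliced` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"A": [23, 21, 98], "B": [12, 44, 21]})

# iloc works on data rows only; the header is not row 0
sliced = df.iloc[1:]

print(sliced)
```

The column labels A and B survive, and the first data row (23, 12) is gone, so this only helps when the unwanted header has been read in as the first data row.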

Related

Increment a value of column in pandas/csv file when the row is appended in python

I have this code which selects a column from a csv file and appends it as a row to another csv file:
def append_pandas(s, d):
    import pandas as pd
    df = pd.read_csv(s, sep=';', header=None)
    df_t = df.T
    df_t.iloc[0:1, 0:1] = 'Time Point'
    df_t.columns = df_t.iloc[0]
    df_new = df_t.drop(0)
    pdb = pd.read_csv(d, sep=';')
    newpd = pdb.append(df_new)
    from pandas import DataFrame
    newpd.to_csv(d, sep=';')
As you can see, there is a Time Point column, and every time a row is appended, I want the value in this column to increment by 1. For example, when the first row is appended, it is 0, the second row will have 1, the third row will have 2, etc.
Could you please help with this?
The resulting file looks like this:
Time Point A B C ...
1 23 65 98
2 10 24 85
3 1 54 72
4 33 77 0
5 7 73 81
6 122 43 5 # <- row added with new Time Point
P.S. The Row which is being appended doesn't have a Time Point value and looks like this:
Well ID Cell Count
A 100
B 200
C 54
D 77
E 73
F 49
The resulting file shouldn't have the headers of the first file added either ('Well ID','Cell Count'); so just the values of the 'Cell Count' column.
Please, help :(

Try the following code and check the output CSV file:
import io
import pandas as pd
# Load an empty CSV file and read it as dataframe
empty_csv = 'Time Point;A;B;C;D;E;F'
df = pd.read_csv(io.StringIO(empty_csv), sep=';')
# Load a CSV file which will be added to `df` defined above
add_csv = 'Well ID;Cell Count\nA;100\nB;200\nC;54\nD;77\nE;73\nF;49\n'
df_add = pd.read_csv(io.StringIO(add_csv), sep=';')
def append_a_row(df, df_add):
    df_add = df_add.set_index('Well ID').T
    df_add.insert(0, 'Time Point', len(df) + 1)
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    return pd.concat([df, df_add])
df_new = append_a_row(df, df_add)
# Save as csv
d = 'path_to_the_output.csv'
df_new.to_csv(d, sep=';', index=False)
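If rows may already exist in the output file, deriving the next Time Point from the largest value so far is one alternative to counting rows. The sketch below assumes pandas 2.x (hence `pd.concat`) and uses made-up in-memory data; `append_time_point`, `existing`, `new` and `combined` are hypothetical names.

```python
import io
import pandas as pd

# Existing results (hypothetical contents), with a Time Point column
existing = pd.read_csv(
    io.StringIO("Time Point;A;B;C\n1;23;65;98\n2;10;24;85\n"), sep=";"
)

# New measurements in the 'Well ID' / 'Cell Count' layout from the question
new = pd.read_csv(io.StringIO("Well ID;Cell Count\nA;122\nB;43\nC;5\n"), sep=";")

def append_time_point(existing, new):
    # Turn the Cell Count column into a single row keyed by Well ID
    row = new.set_index("Well ID")["Cell Count"].to_frame().T
    # Next time point is one more than the largest so far (1 if empty)
    next_tp = 1 if existing.empty else int(existing["Time Point"].max()) + 1
    row.insert(0, "Time Point", next_tp)
    return pd.concat([existing, row], ignore_index=True)

combined = append_time_point(existing, new)
print(combined)
```

Only the Cell Count values are carried over, so the 'Well ID' and 'Cell Count' headers never reach the output.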

Formatting Excel to DataFrame

Please take a look at my excel sheet snapshot attached on the top-left end. When I create a DataFrame from this sheet my first column and row are filled with NaN. I need to skip this blank row and column to select the second row and column for DataFrame creation.
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
0 NaN ID SCOPE TASK
1 NaN 34 XX something_1
2 NaN 534 SS something_2
3 NaN 43 FF something_3
4 NaN 32 ZZ something_4
I want my DataFrame to look like this
0 ID SCOPE TASK
1 34 XX something_1
2 534 SS something_2
3 43 FF something_3
4 32 ZZ something_4
I tried this code but didn't get what I expected
df = pd.read_excel("Book1.xlsx")
df.columns = df.iloc[0]
df.drop(df.index[1])
df.head()
NaN ID SCOPE TASK
0 NaN ID SCOPE TASK
1 NaN 34 XX something_1
2 NaN 534 SS something_2
3 NaN 43 FF something_3
4 NaN 32 ZZ something_4
I still need to drop the first column and the row at index 0 from here.
Can anyone help?

Specify the row number which will be the header (column names) of the dataframe using header parameter; in your case it is 1. Also, specify the column names using usecols parameter, in your case, they are 'ID', 'SCOPE', and 'TASK'.
df = pd.read_excel('your_excel_file.xlsx', header=1, usecols=['ID','SCOPE', 'TASK'])
See the header and usecols parameters in the pandas read_excel documentation.

If it's an entire column you wish to delete, try this:
del df["name of the column"]
Here's an example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10,2),columns=['a','b'])
# created a random dataframe 'df' with 'a' and 'b' as columns
del df['a'] # deleted column 'a' using 'del'
print(df) # no column 'a' in 'df' now

You can actually do it all while reading your excel file with pandas. You want to :
skip the first line : use the argument skiprows=0
use the columns from B to D : use the argument usecols="B:D"
use the row #2 as the header (I assumed here) : use the argument header=1 (0 indexed)
Answer :
df = pd.read_excel("Book1.xlsx", skiprows=0, usecols="B:D", header=1)
Edit: you don't even need skiprows when using header. skiprows=0 skips no rows anyway, and header=1 already makes the second row the header.
df = pd.read_excel("Book1.xlsx", usecols="B:D", header=1)
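If the sheet has already been loaded with the blank row and column, the same result can be reached afterwards. The sketch below builds a stand-in for the loaded frame in memory rather than reading Book1.xlsx; `raw`, `trimmed` and `fixed` are illustrative names.

```python
import numpy as np
import pandas as pd

# Stand-in for what pd.read_excel returned in the question
raw = pd.DataFrame(
    [
        [np.nan, "ID", "SCOPE", "TASK"],
        [np.nan, 34, "XX", "something_1"],
        [np.nan, 534, "SS", "something_2"],
    ],
    columns=["Unnamed: 0", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3"],
)

# Drop the empty first column, then promote the first row to header
trimmed = raw.iloc[:, 1:]
fixed = trimmed.iloc[1:].reset_index(drop=True)
fixed.columns = trimmed.iloc[0].tolist()

print(fixed)
```

Doing it at read time with header and usecols is usually simpler, since pandas can then infer column dtypes from the data rows alone.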

Preferred method for setting a row as a header after transposing a DataFrame

What's the best practice for such a simple task?
Is there any way to speed up the messy lines of code with a simple method?
Let's say we have the following raw data:
#Example Data
df = pd.read_csv('example.csv', header = None, sep=';')
display(df)
To manipulate the data appropriately we transpose and edit:
#Transpose the Dataframe
df = df.transpose()
display(df)
#Set the first row as a header
header_row = 0 #<-- Input, should be mandatory? could it be default?
#Set the column names to the first row values
df.columns = df.iloc[header_row]
#Get rid of the unnecessary row
df = df.drop(header_row).reset_index(drop=True)
#Display final result
display(df)

You can do that by setting the first column as the index before transposing:
df.set_index(0,inplace=True)
df.T
An example of the same kind is given below:
df2=pd.DataFrame({"Col":[9,8,0],"Row":[6,4,22],"id":[26,55,27]})
df2
Col Row id
0 9 6 26
1 8 4 55
2 0 22 27
df2.set_index("id",inplace=True)
df2.T
id 26 55 27
Col 9 8 0
Row 6 4 22
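Applied to the question's situation (a header-less file whose first column holds the labels), the idea might look like the sketch below. The data is made up, and `raw` and `tidy` are illustrative names.

```python
import pandas as pd

# Stand-in for a header-less CSV: each row starts with its label
raw = pd.DataFrame([["name", "a", "b"], ["score", 1, 2]])

# Use column 0 as the index, then transpose so those labels become the header
tidy = raw.set_index(0).T.reset_index(drop=True)
tidy.columns.name = None  # drop the leftover index name (0)

print(tidy)
```

Compared with assigning `df.columns` and dropping the row by hand, this needs no explicit header row number, at the cost of a leftover columns name that has to be cleared.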

How to split column names and their values into different columns instead of all in one (Pandas)

I'm using pandas and I'm trying to read a csv that looks like the following:
And I'm trying to separate the column names and their values. So, my desired end result is: df.head()
A B C D
12 32 43 23
33 42 32 44
11 43 65 23
55 66 77 88
I have tried using both
df = pd.read_csv("test.csv",sep=",") and df = pd.read_csv("test.csv",delimiter=",") when reading the csv file but it's not working.
Any ideas?

You are looking for str.split.
I created sample data like yours:
Code:
import pandas as pd
df = pd.read_csv('stack.csv')
df[['A','B','C','D']] = df['A,B,C,D'].str.split(',',expand=True)
#output:
# A,B,C,D A B C D
#0 1,2,3,4 1 2 3 4
#1 5,6,7,8 5 6 7 8
#2 9,10,11,12 9 10 11 12
del df['A,B,C,D'] #deleting the first column(A,B,C,D) as you don't need
df

The delimiter and sep arguments of read_csv work at the table level. Because the delimiter you want to split on sits inside a single cell, they do not apply to it.
Instead, you can process the column further after loading. The code below does that:
df = pd.read_csv("nba_logreg.csv")
df2 = df['a,b,c,d'].str.split(',', expand=True)
df2.columns = df.columns.str.split(',')[0]
df2
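The splitting step can be tried on a small in-memory frame. This sketch assumes the file was read as a single column whose header is the text 'A,B,C,D'; `combined` and `parts` are illustrative names.

```python
import pandas as pd

# One column whose header and cells each hold comma-joined text
df = pd.DataFrame({"A,B,C,D": ["12,32,43,23", "33,42,32,44"]})

combined = df.columns[0]

# expand=True spreads the pieces over separate columns
parts = df[combined].str.split(",", expand=True)
parts.columns = combined.split(",")

# split yields strings; convert if numbers are needed
parts = parts.astype(int)

print(parts)
```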

I created the exact same CSV file. With the command pd.read_csv("test.csv"), it works for me the way you want it to. What exactly does not work for you, or what kind of error do you get?

Drop / move header of a dataframe into first row

I have a dataframe that looks like this:
A B 10
0 A B 20
1 C A 10
So the headers are not the real headers of the dataframe (I have to map them from another dataframe). How can I move the headers into the first row, so that it looks like this:
0 1 2
0 A B 10
1 A B 20
2 C A 10
Note that pd.read_csv(..., header=None) leads to an error in this case, I don't know why, so I am searching for a solution to fix it after I load the file.

The best option is to avoid the problem with the header=None parameter in read_csv:
df = pd.read_csv(file, header=None)
If that is not possible, convert the column names to a one-row DataFrame, prepend it to the original data, and then set the column names to a range:
df = pd.concat([df.columns.to_frame().T, df], ignore_index=True)
df.columns = range(len(df.columns))
print (df)
0 1 2
0 A B 10
1 A B 20
2 C A 10

Let us try reset_index for fixing
df = df.T.reset_index().T
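For comparison, here is a sketch of moving the header into the first row with `pd.concat`, which also works on pandas 2.x where `DataFrame.append` is no longer available. The sample frame mirrors the question; `header_row` and `moved` are illustrative names.

```python
import pandas as pd

# Frame from the question: the "header" is really the first data row
df = pd.DataFrame([["A", "B", 20], ["C", "A", 10]], columns=["A", "B", 10])

# Turn the current header into a one-row frame and stack it on top
header_row = pd.DataFrame([df.columns.tolist()], columns=df.columns)
moved = pd.concat([header_row, df], ignore_index=True)

# Replace the old labels with positional integers
moved.columns = range(moved.shape[1])

print(moved)
```

Unlike the transpose and reset_index one-liner, this keeps a clean 0..n-1 row index without a further reset.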
