Set the first column of pandas dataframe as header - python

This is the DataFrame I get from reading an Excel file.
I would like the first column to be the index/header.
one Entity
0 two v1
1 three Prod
2 four 2015-05-27 00:00:00
3 five 2018-04-27 00:00:00
4 six Both
5 seven id
6 eight hello

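To set the header of a pandas DataFrame from a row of the file, you have three options.
Set header=1 while reading the file (rows are counted from 0, so this uses the second line as the column names):
e.g.: df = pd.read_csv(inputfilePath, header=1)
Set skiprows=1 while reading the file (the first line is skipped and the next one becomes the header):
e.g.: df = pd.read_csv(inputfilePath, skiprows=1)
Assign iloc[0] to the columns of a DataFrame that is already loaded, then drop that row from the data:
e.g.: df.columns = df.iloc[0]
      df = df.iloc[1:]
I hope this will help.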
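The three options above can be compared on a small in-memory file. This is only a sketch: the sample text (a junk first line followed by the real header) is made up for illustration and is not the asker's actual data.

```python
import io
import pandas as pd

# Hypothetical sample: one junk line, then the real header row, then data.
raw = "junk,junk\none,Entity\ntwo,v1\nthree,Prod\n"

# header=1: use the line at 0-based position 1 as the column names.
df_header = pd.read_csv(io.StringIO(raw), header=1)

# skiprows=1: skip the first line, so the next line becomes the header.
df_skip = pd.read_csv(io.StringIO(raw), skiprows=1)

# Frame that is already loaded without a header: promote its first row,
# then drop that row from the data and renumber the index.
df = pd.read_csv(io.StringIO(raw), header=None, skiprows=1)
df.columns = df.iloc[0].tolist()
df = df.iloc[1:].reset_index(drop=True)

print(df)
```

All three frames end up with the columns one and Entity; which option fits depends on whether the file can be re-read or only the loaded frame is available.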

One way is using T twice
df=df.T.set_index(0).T
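A minimal sketch of what the double transpose does, assuming the frame was loaded with default integer column labels (0, 1, ...) and that its first row holds the desired names. The sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with integer column labels, as produced by header=None.
df = pd.DataFrame([["one", "Entity"], ["two", "v1"], ["three", "Prod"]])

# After the first .T, the original first row is column 0 of the transposed
# frame. set_index(0) moves it into the index, and the second .T turns that
# index back into the column labels.
out = df.T.set_index(0).T

print(out)
```

The remaining rows keep their original index labels (1, 2, ...); call reset_index(drop=True) if a fresh 0-based index is wanted.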

Related

Python pandas drop issue on a date header

I have an excel document that has three rows ahead of the main header(name of columns).
When loading the data in pandas data frame using :
import pandas
df = pandas.read_excel('output/tracker.xlsx')
print(df)
I get this data(which is fine):
Date/Time:13/06/2022 Unnamed: 1 Unnamed: 2 Unnamed: 3
0 NaN NaN NaN NaN
1 NaN 2763 2763 NaN
2 NaN Site ID Company Site ID Region
3 203990318_700670803 203990318 689179 Nord-Ost
I do not need the first three rows so I run :
df = df.iloc[2:]
It removes the rows that have ID of 0 and 1.
But it doesn't remove the Date/Time:13/06/2022 Unnamed: 1 etc row.
How do I remove that top row?
Instead, load the data directly without the unneeded rows by using the skiprows parameter of pandas.read_excel:
df = pandas.read_excel('output/tracker.xlsx', skiprows=3)
By default, pandas.read_excel assumes that the first row is the header, i.e. that it holds the column names. Judging by the snippet of your data, that is not the case here. Use header=None to tell pandas that the first row contains data rather than column names:
import pandas
df = pandas.read_excel('output/tracker.xlsx',header=None)
print(df)
then you should be able to remove these as you already did
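As a rough sketch of that last step: the frame below is built by hand to mimic what read_excel with header=None would return for the snippet in the question, so it runs without the spreadsheet. The variable names raw, header_pos and clean are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in for pandas.read_excel('output/tracker.xlsx', header=None).
raw = pd.DataFrame([
    ["Date/Time:13/06/2022", np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan, np.nan],
    [np.nan, 2763, 2763, np.nan],
    [np.nan, "Site ID", "Company Site ID", "Region"],
    ["203990318_700670803", 203990318, 689179, "Nord-Ost"],
])

header_pos = 3  # position of the row that holds the real column names

# Keep only the rows below the header row, then use that row as the labels.
clean = raw.iloc[header_pos + 1:].reset_index(drop=True)
clean.columns = raw.iloc[header_pos].tolist()

print(clean)
```

Because every row is now ordinary data, the Date/Time line can be sliced away like any other row.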

How to split a column in DataFrame with pandas?

I imported a dataset that looks like this.
Peak, Trough
0 1857-06-01, 1858-12-01
1 1860-10-01, 1861-06-01
2 1865-04-01, 1867-12-01
3 1869-06-01, 1870-12-01
4 1873-10-01, 1879-03-01
5 1882-03-01, 1885-05-01
6 1887-03-01, 1888-04-01
It is a CSV file, but when I check .shape, it is
(7, 1)
I thought a CSV file would automatically be separated at its commas, but that does not work for this one.
I want to split this column into two at the comma, and split the column name as well. How can I do that?
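Use the sep parameter of read_csv.
It's like:
df = pd.read_csv(path, sep=', ', engine='python')
A separator longer than one character is treated as a regular expression and needs the Python parsing engine, so it is requested explicitly here.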
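A small sketch of the sep approach on in-memory text, using two rows copied from the question. It assumes every field is separated by a comma followed by exactly one space.

```python
import io
import pandas as pd

# Two rows from the question, with ", " between the fields.
raw = "Peak, Trough\n1857-06-01, 1858-12-01\n1860-10-01, 1861-06-01\n"

# A two-character separator is handled by the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=", ", engine="python")

print(df.shape)
print(list(df.columns))
```

If the spacing after the comma varies, skipinitialspace=True with the default comma separator is the more forgiving choice.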
Save the data to a text file or CSV, then use read_csv with the parameter skipinitialspace=True, and parse_dates to convert the values to datetimes:
df = pd.read_csv('data.txt', skipinitialspace=True, parse_dates=[0,1])
print (df.head())
Peak Trough
0 1857-06-01 1858-12-01
1 1860-10-01 1861-06-01
2 1865-04-01 1867-12-01
3 1869-06-01 1870-12-01
4 1873-10-01 1879-03-01
print (df.dtypes)
Peak datetime64[ns]
Trough datetime64[ns]
dtype: object
If the data are in Excel in a single column, it is possible to use Series.str.split on the first column, convert the result to datetimes, and finally set the new column names:
df = pd.read_excel('data.xlsx')
df1 = df.iloc[:, 0].str.split(', ', expand=True).apply(pd.to_datetime)
df1.columns = df.columns[0].split(', ')
print (df1.head())
Peak Trough
0 1857-06-01 1858-12-01
1 1860-10-01 1861-06-01
2 1865-04-01 1867-12-01
3 1869-06-01 1870-12-01
4 1873-10-01 1879-03-01
print (df1.dtypes)
Peak datetime64[ns]
Trough datetime64[ns]
dtype: object
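The same split can be tried without an Excel file. In this sketch the single-column frame is built by hand to stand in for the result of pd.read_excel('data.xlsx'); only the first two rows of the question's data are used.

```python
import pandas as pd

# Stand-in for a sheet whose only column holds "Peak, Trough" pairs.
df = pd.DataFrame({
    "Peak, Trough": ["1857-06-01, 1858-12-01", "1860-10-01, 1861-06-01"],
})

# Split the first column at ", " into two columns and parse each as dates.
df1 = df.iloc[:, 0].str.split(", ", expand=True).apply(pd.to_datetime)

# Split the original column name the same way to get the two new names.
df1.columns = df.columns[0].split(", ")

print(df1.dtypes)
```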

Pandas read_csv function not aligning column headers well

I have a csv file with no headers. The first column is ID and so on... Here is how I read that file in pandas.
rss_content=pd.read_csv("rss_content.csv",header=None,names=["id","feedId","url","imageUrl","title","desc","author","createTimestamp"])
However when the file gets imported I see the first two columns of the data become index and the Id column gets assigned to third column and so on. Basically the headers are shifted by two columns to right and first two columns have no header.
Why is that and how to fix it?
Assuming you have the following CSV file:
1,2,3,4,5
11,22,33,44,55
If you specify too few column names, the names are assigned to the last columns and the leftover leading columns become index columns:
In [1]: fn = r'D:\temp\.data\41066716.csv'
In [2]: df = pd.read_csv(fn, header=None, names=['a','b','c'])
In [3]: df
Out[3]:
a b c
1 2 3 4 5
11 22 33 44 55
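The effect, and one way to avoid it, can be reproduced on the same two lines held in memory. This is a sketch: the names d and e are placeholders added for illustration, and the fix assumed here is simply to pass one name per column in the file.

```python
import io
import pandas as pd

raw = "1,2,3,4,5\n11,22,33,44,55\n"

# Three names for five columns: the first two columns become a MultiIndex.
short = pd.read_csv(io.StringIO(raw), header=None, names=["a", "b", "c"])

# One name per column: nothing is pushed into the index.
full = pd.read_csv(io.StringIO(raw), header=None,
                   names=["a", "b", "c", "d", "e"])

print(short.index.nlevels)  # number of index levels in the short case
print(full)
```

For the file in the question this suggests counting the fields in one line and checking that the names list has the same length.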

Can pandas read a transposed CSV?

Can pandas read a transposed CSV? Here's the file (note I'd also like to select a subset of columns):
A,x,x,x,x,1,2,3
B,x,x,x,x,4,5,6
C,x,x,x,x,7,8,9
Would like to get this DataFrame:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
pd.read_csv('file.csv', index_col=0, header=None).T
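The one-liner above keeps the x columns as rows. To also select the subset asked for in the question, one possible sketch is to read only the label column and the numeric columns with usecols before transposing. The column positions 0, 5, 6, 7 are taken from the sample file and are an assumption about its layout.

```python
import io
import pandas as pd

raw = "A,x,x,x,x,1,2,3\nB,x,x,x,x,4,5,6\nC,x,x,x,x,7,8,9\n"

# Read the label column (0) and the three numeric columns (5, 6, 7) only.
wide = pd.read_csv(io.StringIO(raw), header=None, usecols=[0, 5, 6, 7])

# Labels become the index, then the transpose turns them into columns.
df = wide.set_index(0).T.reset_index(drop=True)
df.columns.name = None  # drop the leftover "0" label on the column axis

print(df)
```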
In addition, if your file looks like this:
"some-line-you-want-to-skip"
A,x,x,x,x,1,2,3
B,x,x,x,x,4,5,6
C,x,x,x,x,7,8,9
It is possible to do the following:
df = pd.read_csv(filename, skiprows=1, header=None).T # Read csv, and transpose
df.columns = df.iloc[0] # Set new column names
df.drop(0,inplace=True) # Drop duplicated row
This will also end up with the df looking the way you want

Pandas indexing and accessing columns by names

I am trying to access pandas dataframe by column names after indexing the df with a specific column and it returns incorrect column values.
import pandas as pd
rs =pd.read_csv('rs.txt', header="infer", sep="\t", names=['id', 'exp','fov','cycle', 'color', 'values'], index_col=2)
rs.cycle.head()
I am indexing the df here with 'fov' and I want to access the 'cycle' column, it gives me the color column instead. I think I am missing something here?
EDIT
The first few lines of the input file are:
6 3 1 G 0.96593
6 3 1 O 0.88007
6 3 1 R 0.94305
6 3 2 B 0.90554
6 3 2 G 0.93146
I think the problem arises because your data file has 5 columns and your names list has 6 elements, so every name is shifted one column to the left of where you expect it. To verify, check the first few values in the id column: these will all be 6 if I am right. The first few items in the exp column will have the value 3.
To fix this, read your input file like so:
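rs = pd.read_csv('rs.txt', header="infer", sep="\t", names=['exp', 'fov', 'cycle', 'color', 'values'], index_col='fov')
With five names, position 2 is now cycle, so fov is selected by name here to keep it as the index.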
Pandas will automatically insert row identifiers.
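A quick check of this fix on the sample lines from the question, read from memory rather than from rs.txt. It is a sketch that assumes the five columns are exp, fov, cycle, color and values, in that order.

```python
import io
import pandas as pd

# First lines of the input file from the question, tab-separated.
raw = (
    "6\t3\t1\tG\t0.96593\n"
    "6\t3\t1\tO\t0.88007\n"
    "6\t3\t2\tB\t0.90554\n"
)

names = ["exp", "fov", "cycle", "color", "values"]

# One name per column; the index column is chosen by name, not position.
rs = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                 names=names, index_col="fov")

print(rs["cycle"].head())
```

With matching names, rs["cycle"] returns the cycle numbers instead of the color letters.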
