Can pandas read a transposed CSV? - python

Can pandas read a transposed CSV? Here's the file (note I'd also like to select a subset of columns):
A,x,x,x,x,1,2,3
B,x,x,x,x,4,5,6
C,x,x,x,x,7,8,9
Would like to get this DataFrame:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

pd.read_csv('file.csv', index_col=0, header=None).T
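The one-liner above keeps the x columns as extra rows after transposing. Here is a minimal sketch of selecting the subset at read time, assuming the numeric values sit in columns 5 to 7 as in the sample file. The in-memory csv_text is only a stand-in for file.csv:

```python
import io
import pandas as pd

# In-memory stand-in for file.csv from the question (assumption for illustration).
csv_text = "A,x,x,x,x,1,2,3\nB,x,x,x,x,4,5,6\nC,x,x,x,x,7,8,9\n"

# usecols keeps column 0 (the labels) and columns 5-7 (the numbers);
# index_col=0 turns the labels into the index, so .T makes them column names.
df = (
    pd.read_csv(io.StringIO(csv_text), header=None, index_col=0, usecols=[0, 5, 6, 7])
    .T
    .reset_index(drop=True)  # renumber rows 0..2 instead of 5..7
)
df.columns.name = None  # drop the leftover axis name from the original column label
print(df)
```

Passing usecols avoids loading the unwanted columns at all, which may matter for wide files.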

In addition, if your file looks like this:
"some-line-you-want-to-skip"
A,x,x,x,x,1,2,3
B,x,x,x,x,4,5,6
C,x,x,x,x,7,8,9
It is possible to do the following:
df = pd.read_csv(filename, skiprows=1, header=None).T # Read csv, and transpose
df.columns = df.iloc[0] # Set new column names
df.drop(0,inplace=True) # Drop duplicated row
This also gives the column names you want; the rows holding the x placeholders are still present and need to be dropped separately.
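A sketch of the same three steps on in-memory data, followed by one possible clean-up. It assumes the first four transposed rows hold the x placeholders, as in the sample; csv_text and result are illustrative names:

```python
import io
import pandas as pd

# In-memory stand-in for the file with one leading line to skip (assumption).
csv_text = (
    '"some-line-you-want-to-skip"\n'
    "A,x,x,x,x,1,2,3\n"
    "B,x,x,x,x,4,5,6\n"
    "C,x,x,x,x,7,8,9\n"
)

df = pd.read_csv(io.StringIO(csv_text), skiprows=1, header=None).T  # read and transpose
df.columns = df.iloc[0].tolist()  # first transposed row holds A, B, C
df = df.drop(0)                   # drop the row that is now the header

# Skip the four placeholder rows, convert the remaining values to integers
# and renumber the rows from 0.
result = df.iloc[4:].astype(int).reset_index(drop=True)
print(result)
```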

Related

How to read an excel file with data and some empty cells in panda's python?

I have an Excel file with a huge dataset, saved as CSV. I tried to read it with pandas using the command below.
df = pd.read_csv(f'{cwd}/data.csv', keep_default_na=False, header=None)
print(df)
However, the empty rows in the CSV file are missing from the output. I get something like this:
Input:   Output from the code:
1        1
2        2
3        3
         4
4        5
5        6
6
You need to specify the parameter skip_blank_lines=False from pandas.read_csv. Here's a fixed version of your code:
import pandas as pd
df = pd.read_csv(f'{cwd}/data.csv', header=None, na_filter=False, skip_blank_lines=False)
df
Or:
import pandas as pd
df = pd.read_csv(f'{cwd}/data.csv', header=None, skip_blank_lines=False)
df
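A small self-contained check of the effect, using an in-memory string in place of data.csv (the csv_text name and its contents are assumptions for illustration):

```python
import io
import pandas as pd

# Seven lines, the fourth one blank (stand-in for data.csv).
csv_text = "1\n2\n3\n\n4\n5\n6"

# Default behaviour: blank lines are skipped.
default = pd.read_csv(io.StringIO(csv_text), header=None)

# skip_blank_lines=False keeps the blank line as a row of missing values.
kept = pd.read_csv(io.StringIO(csv_text), header=None, skip_blank_lines=False)

print(len(default), len(kept))
print(kept[0].isna().tolist())
```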

Drop / move header of a dataframe into first row

I have a DataFrame that looks like this:
   A  B  10
0  A  B  20
1  C  A  10
The headers are not the real headers of the DataFrame (I have to map them from another DataFrame). How can I move the headers down into the first row, so that it looks like this?
   0  1   2
0  A  B  10
1  A  B  20
2  C  A  10
Note that pd.read_csv(..., header=None) leads to an error in this case and I don't know why, so I am looking for a way to fix it after loading the file.
The best approach is to avoid the problem by passing header=None to read_csv:
df = pd.read_csv(file, header=None)
If that is not possible, convert the column names to a one-row DataFrame, prepend it to the original data, and then replace the column names with a range. DataFrame.append was removed in pandas 2.0, so use pd.concat:
df = pd.concat([df.columns.to_frame().T, df], ignore_index=True)
df.columns = range(len(df.columns))
print (df)
   0  1   2
0  A  B  10
1  A  B  20
2  C  A  10
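Another option is reset_index on the transposed frame; the trailing reset_index(drop=True) renumbers the rows from 0:
df = df.T.reset_index().T.reset_index(drop=True)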
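Both approaches can be compared on a small frame. This is a sketch, assuming the header values arrive as strings (as they would from a CSV header); the names via_concat and via_reset are illustrative:

```python
import pandas as pd

# Frame whose header row is really data; '10' is a string because it was read as a header.
df = pd.DataFrame([["A", "B", 20], ["C", "A", 10]], columns=["A", "B", "10"])

# Approach 1: prepend the column names as a one-row frame, then renumber the columns.
via_concat = pd.concat([df.columns.to_frame().T, df], ignore_index=True)
via_concat.columns = range(len(via_concat.columns))

# Approach 2: transpose, move the index (the old header) into a column, transpose back.
via_reset = df.T.reset_index().T.reset_index(drop=True)

print(via_concat)
print(via_reset)
```

Note that the former header value stays a string in both results, so the last column has object dtype until it is converted.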

Set the first column of pandas dataframe as header

This is my output DataFrame from reading an excel file
I would like my first column to be index/header
     one               Entity
0    two                   v1
1  three                 Prod
2   four  2015-05-27 00:00:00
3   five  2018-04-27 00:00:00
4    six                 Both
5  seven                   id
6  eight                hello
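To change which row is used as the header:
Set header=1 while reading the file, which uses the second line of the file as the header and discards the line before it:
eg: df = pd.read_csv(inputfilePath, header=1)
Set skiprows=1 while reading the file, which skips the first line so that the next line becomes the header:
eg: df = pd.read_csv(inputfilePath, skiprows=1)
Assign iloc[0] to the columns of an existing DataFrame:
eg: df.columns = df.iloc[0]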
I hope this will help.
One way is to transpose twice, setting the index in between:
df = df.T.set_index(0).T
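A sketch of what these options do, on a small frame with default integer column labels (as produced by header=None). The data and the names promoted, transposed and indexed are assumptions for illustration:

```python
import pandas as pd

# Stand-in for a sheet read without a header: columns are labelled 0 and 1.
df = pd.DataFrame([["one", "Entity"], ["two", "v1"], ["three", "Prod"]])

# Promote row 0 to the header and remove it from the data.
promoted = df.iloc[1:].reset_index(drop=True)
promoted.columns = df.iloc[0].tolist()

# Transposing twice gives the same column names; the original row labels (1, 2) are kept.
transposed = df.T.set_index(0).T

# Literal reading of the question: use the first column as the index instead.
indexed = df.set_index(0)

print(promoted)
print(transposed)
print(indexed)
```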

How to read all rows of a csv file using pandas in python?

I am using the pandas module for reading the data from a .csv file.
I can write out the following code to extract the data belonging to an individual column as follows:
import pandas as pd
df = pd.read_csv('somefile.tsv', sep='\t', header=0)
some_column = df.column_name
print(some_column)  # Gives the values of all entries in the column
However, the file that I am trying to read now has more than 5000 columns and writing out the statement
some_column = df.column_name
is now not feasible. How can I get all the column values so that I can access them using indexing?
e.g to extract the value present at the 100th row and the 50th column, I should be able to write something like this:
df([100][50])
Use DataFrame.iloc or DataFrame.iat. Python counts from 0, so you need 99 and 49 to select the 100th row and the 50th column:
value = df.iloc[99, 49]
Sample: select the 3rd row and the 4th column:
df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,10],
                   'E':[5,3,6],
                   'F':[7,4,3]})
print (df)
   A  B  C   D  E  F
0  1  4  7   1  5  7
1  2  5  8   3  3  4
2  3  6  9  10  6  3
print (df.iloc[2,3])
10
print (df.iat[2,3])
10
Selecting by column name and row position together is possible with Series.iloc or Series.iat:
print (df['D'].iloc[2])
10
print (df['D'].iat[2])
10
Pandas has indexing for dataframes, so you can use
df.iloc[[index]]["column header"]
The index is wrapped in a list so that you can pass multiple positions at once.
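A short sketch of the difference between list-based and scalar positional access, reusing two columns of the sample frame above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "D": [1, 3, 10]})

# A list of positions returns a Series, even when the list has one element.
single = df.iloc[[2]]["D"]
print(type(single).__name__)

# Several positions at once.
several = df.iloc[[0, 2]]["D"]
print(several.tolist())

# A bare integer with iat returns the scalar value itself.
scalar = df["D"].iat[2]
print(scalar)
```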

Removing header column from pandas dataframe

I have the following DataFrame:
df
    A   B
0  23  12
1  21  44
2  98  21
How do I remove the column names A and B from this DataFrame? One way might be to write it to a CSV file and then read it back in with header=None. Is there a way to do that without writing out to CSV and re-reading?
I think you can't remove column names, only reset them to a range based on the shape:
print(df.shape[1])
2
print(list(range(df.shape[1])))
[0, 1]
df.columns = range(df.shape[1])
print(df)
    0   1
0  23  12
1  21  44
2  98  21
This is the same as using to_csv and read_csv (requires import io):
print(df.to_csv(header=False, index=False))
23,12
21,44
98,21
print(pd.read_csv(io.StringIO(df.to_csv(header=False, index=False)), header=None))
    0   1
0  23  12
1  21  44
2  98  21
Another solution uses skiprows (requires import io):
print(df.to_csv(index=False))
A,B
23,12
21,44
98,21
print(pd.read_csv(io.StringIO(df.to_csv(index=False)), header=None, skiprows=1))
    0   1
0  23  12
1  21  44
2  98  21
How to get rid of a header (first row) and an index (first column).
To write to a CSV file:
df = pandas.DataFrame(your_array)
df.to_csv('your_array.csv', header=False, index=False)
To read from the CSV file, pass header=None, because the file was written without a header and read_csv would otherwise use the first data row as column names:
df = pandas.read_csv('your_array.csv', header=None)
a = df.values
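I had the same problem and solved it this way. Note that skiprows=[0] skips the first line of the file, and without header=None the next line is then used as the header:
df = pd.read_csv('your-array.csv', skiprows=[0])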
I haven't seen this solution yet, so here's how I did it without using read_csv:
df = df.rename(columns={'A':'','B':''})
If you rename all your column names to empty strings, the table is displayed with a blank header. rename returns a new DataFrame, so the result has to be assigned.
And if you have a lot of columns in your table, you can build the dictionary first instead of renaming manually:
df_dict = dict.fromkeys(df.columns, '')
df = df.rename(columns = df_dict)
You can first convert the DataFrame to a NumPy array:
s1=df.iloc[:,0:2].values
Then convert the NumPy array back to a DataFrame:
s2=pd.DataFrame(s1)
This returns a DataFrame whose columns carry the default integer labels 0 and 1 instead of the original names.
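A sketch comparing the in-memory options from the answers above on the question's data; relabelled, rebuilt and csv_lines are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"A": [23, 21, 98], "B": [12, 44, 21]})

# Option 1: replace the column names with a range, leaving the data untouched.
relabelled = df.copy()
relabelled.columns = range(relabelled.shape[1])

# Option 2: round-trip through a NumPy array, which discards the names.
rebuilt = pd.DataFrame(df.to_numpy())

# If the goal is only headerless output, to_csv can omit names and index directly.
csv_lines = df.to_csv(header=False, index=False).splitlines()

print(relabelled)
print(rebuilt)
print(csv_lines)
```

Neither option removes the labels entirely; a DataFrame always has column labels, so the default integers are the closest equivalent to "no header".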
To get the DataFrame without its first row, use the following. Note that this drops the first row of data rather than the column names:
totalRow = len(df.index)
df.iloc[1:totalRow]
Or use the second method, which relies on the default RangeIndex:
totalRow = df.index.stop
df.iloc[1:totalRow]
