Can pandas read a transposed CSV? Here's the file (note I'd also like to select a subset of columns):
A,x,x,x,x,1,2,3
B,x,x,x,x,4,5,6
C,x,x,x,x,7,8,9
I would like to get this DataFrame:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
pd.read_csv('file.csv', index_col=0, header=None).T
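Since the question also asks for a subset of columns, the one-liner can be combined with usecols. This is only a minimal sketch: it assumes the file content shown in the question (inlined through io.StringIO so the example is self-contained) and assumes the wanted columns are at positions 5 to 7.

```python
import io

import pandas as pd

# Assumption: same content as the question's file, inlined for a runnable example.
csv_text = "A,x,x,x,x,1,2,3\nB,x,x,x,x,4,5,6\nC,x,x,x,x,7,8,9\n"

# Keep the label column (position 0) plus the numeric columns (positions 5, 6, 7),
# use the labels as the index, then transpose so the labels become column names.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    index_col=0,
    usecols=[0, 5, 6, 7],
).T

# Tidy up: replace the leftover positional labels (5, 6, 7) with 0, 1, 2
# and drop the axis name inherited from index_col.
df = df.reset_index(drop=True)
df.columns.name = None

print(df)
```

Without usecols, the transposed frame would also contain the rows of x values.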
In addition, if your file looks like this:
"some-line-you-want-to-skip"
A,x,x,x,x,1,2,3
B,x,x,x,x,4,5,6
C,x,x,x,x,7,8,9
It is possible to do the following:
df = pd.read_csv(filename, skiprows=1, header=None).T # Read csv, and transpose
df.columns = df.iloc[0] # Set new column names
df.drop(0,inplace=True) # Drop duplicated row
This will also end up with the df looking the way you want
I have an Excel file with a huge dataset. I tried to read the file with pandas using the command below.
df = pd.read_csv(f'{cwd}/data.csv', keep_default_na=False, header=None)
print(df)
However, the empty rows in the CSV file are missing from the output. I get something like the following.
Input:    Output from the code:
1         1
2         2
3         3
          4
4         5
5         6
6
You need to pass skip_blank_lines=False to pandas.read_csv. Here's a fixed version of your code:
import pandas as pd
df = pd.read_csv(f'{cwd}/data.csv', header=None, na_filter=False, skip_blank_lines=False)
df
Or:
import pandas as pd
df = pd.read_csv(f'{cwd}/data.csv', header=None, skip_blank_lines=False)
df
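The effect of skip_blank_lines can be checked on a small sample. This is a sketch only: the CSV text is an assumed stand-in for data.csv, with two blank lines in it.

```python
import io

import pandas as pd

# Assumption: a one-column file with blank lines after "3" and after "5".
csv_text = "1\n2\n3\n\n4\n5\n\n6\n"

# Default behaviour: blank lines are dropped while parsing.
skipped = pd.read_csv(io.StringIO(csv_text), header=None)

# With skip_blank_lines=False, each blank line becomes a row of missing values.
kept = pd.read_csv(io.StringIO(csv_text), header=None, skip_blank_lines=False)

print(len(skipped), len(kept))
print(kept)
```

Because the blank rows are read as NaN, the column in kept is stored as floats rather than integers.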
I have a dataframe that looks like this:
A B 10
0 A B 20
1 C A 10
The headers are not the real headers of the dataframe (I have to map them from another dataframe). How can I move the headers down into the first row, so that it looks like this:
0 1 2
0 A B 10
1 A B 20
2 C A 10
Note that pd.read_csv(..., header=None) leads to an error in this case, I don't know why, so I am searching for a solution to fix it after I load the file.
The best approach is to avoid the problem by passing header=None to read_csv:
df = pd.read_csv(file, header=None)
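If that is not possible, convert the column names to a one-row DataFrame, put it in front of the original data, and then set the column names to a range:
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df = pd.concat([df.columns.to_frame().T, df], ignore_index=True)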
df.columns = range(len(df.columns))
print (df)
0 1 2
0 A B 10
1 A B 20
2 C A 10
Another option is to use reset_index:
df = df.T.reset_index().T
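The double-transpose approach leaves a row label named "index" on the first row, so a final reset of the row index may be wanted. A minimal sketch, assuming the sample frame from the question (with the header value 10 read as the string "10"):

```python
import pandas as pd

# Assumption: the question's frame, where the first data row was taken as the header.
df = pd.DataFrame([["A", "B", 20], ["C", "A", 10]], columns=["A", "B", "10"])

# Transpose, turn the old header into a regular column, and transpose back.
fixed = df.T.reset_index().T

# The old header row is labelled "index"; renumber the rows from 0.
fixed = fixed.reset_index(drop=True)

print(fixed)
```

Note that the former header values stay strings, so a column that mixes "10" with integers ends up with object dtype.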
This is my output DataFrame from reading an excel file
I would like my first column to be index/header
one Entity
0 two v1
1 three Prod
2 four 2015-05-27 00:00:00
3 five 2018-04-27 00:00:00
4 six Both
5 seven id
6 eight hello
To set the first row of a pandas DataFrame as the header, there are three options:
Set header=1 while reading the file (header is zero-based, so this uses the second line of the file as the header)
e.g.: df = pd.read_csv(inputfilePath, header=1)
Set skiprows=1 while reading the file
e.g.: df = pd.read_csv(inputfilePath, skiprows=1)
Assign iloc[0] to the columns of the DataFrame
e.g.: df.columns = df.iloc[0]
I hope this helps.
One way is using T twice
df=df.T.set_index(0).T
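If the goal is to promote the first row to column names after loading, the steps can also be written out explicitly. This is a sketch under assumptions: the frame below is a shortened, hypothetical version of the question's data, read with default integer column labels.

```python
import pandas as pd

# Assumption: a shortened version of the question's data with integer column labels.
raw = pd.DataFrame([["one", "Entity"], ["two", "v1"], ["three", "Prod"]])

# Drop the first row from the data and renumber the remaining rows.
promoted = raw.iloc[1:].reset_index(drop=True)

# Use the values of the original first row as the column names.
promoted.columns = raw.iloc[0].tolist()

print(promoted)
```

Compared with assigning df.columns = df.iloc[0] alone, this also removes the promoted row from the data.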
I am using the pandas module for reading the data from a .csv file.
I can write out the following code to extract the data belonging to an individual column as follows:
import pandas as pd
df = pd.read_csv('somefile.tsv', sep='\t', header=0)
some_column = df.column_name
print(some_column)  # Gives the values of all entries in the column
However, the file that I am trying to read now has more than 5000 columns and writing out the statement
some_column = df.column_name
is now not feasible. How can I get all the column values so that I can access them using indexing?
e.g to extract the value present at the 100th row and the 50th column, I should be able to write something like this:
df([100][50])
Use DataFrame.iloc or DataFrame.iat. Python counts from 0, so you need 99 and 49 to select the 100th row and the 50th column:
value = df.iloc[99, 49]
Sample: select the 3rd row and the 4th column:
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,10],
'E':[5,3,6],
'F':[7,4,3]})
print (df)
A B C D E F
0 1 4 7 1 5 7
1 2 5 8 3 3 4
2 3 6 9 10 6 3
print (df.iloc[2,3])
10
print (df.iat[2,3])
10
Selecting by column name and by row position can be combined with Series.iloc or Series.iat:
print (df['D'].iloc[2])
10
print (df['D'].iat[2])
10
Pandas supports positional indexing for dataframes, so you can use
df.iloc[[index]]["column header"]
The index is given as a list, so you can pass multiple row positions at once this way.
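A short sketch of how the list form differs from a purely positional lookup, assuming a small sample frame and the column name D chosen for illustration:

```python
import pandas as pd

# Assumption: a small sample frame; "D" is just an illustrative column name.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [1, 3, 10]})

# A list of row positions returns a Series, one entry per requested row.
rows = df.iloc[[0, 2]]["D"]

# For a single cell, translate the column name to its position and stay with iloc.
value = df.iloc[2, df.columns.get_loc("D")]

print(rows)
print(value)
```

Looking up the column position with get_loc avoids chaining two indexing steps when only one value is needed.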
I have the following dataframe:
df
A B
0 23 12
1 21 44
2 98 21
How do I remove the column names A and B from this dataframe? One way might be to write it into a csv file and then read it in specifying header=None. Is there a way to do that without writing out to csv and re-reading?
I think you can't remove column names, only reset them to a range based on shape:
print(df.shape[1])
2
print(list(range(df.shape[1])))
[0, 1]
df.columns = range(df.shape[1])
print(df)
0 1
0 23 12
1 21 44
2 98 21
This is the same as using to_csv and read_csv:
print(df.to_csv(header=False, index=False))
23,12
21,44
98,21
import io
print(pd.read_csv(io.StringIO(df.to_csv(header=False, index=False)), header=None))
0 1
0 23 12
1 21 44
2 98 21
Another solution uses skiprows:
print(df.to_csv(index=False))
A,B
23,12
21,44
98,21
print(pd.read_csv(io.StringIO(df.to_csv(index=False)), header=None, skiprows=1))
0 1
0 23 12
1 21 44
2 98 21
How to get rid of a header(first row) and an index(first column).
To write to CSV file:
df = pandas.DataFrame(your_array)
df.to_csv('your_array.csv', header=False, index=False)
To read from CSV file:
df = pandas.read_csv('your_array.csv')
a = df.values
If you want to read a CSV file that doesn't contain a header, pass the additional parameter header=None:
df = pandas.read_csv('your_array.csv', header=None)
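The write and read steps can be checked as a round trip. This is a minimal sketch that assumes a small NumPy array and uses an in-memory buffer in place of your_array.csv:

```python
import io

import numpy as np
import pandas as pd

# Assumption: a small integer array standing in for your_array.
your_array = np.array([[23, 12], [21, 44], [98, 21]])

# Write without header and index; an in-memory buffer replaces the file on disk.
buffer = io.StringIO()
pd.DataFrame(your_array).to_csv(buffer, header=False, index=False)

# Read back with header=None so the first data row is not consumed as a header.
buffer.seek(0)
restored = pd.read_csv(buffer, header=None).to_numpy()

print(restored)
```

Reading the same buffer without header=None would treat the first row (23, 12) as column names and return only two data rows.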
I had the same problem but solved it in this way:
df = pd.read_csv('your-array.csv', skiprows=[0])
Haven't seen this solution yet so here's how I did it without using read_csv:
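df = df.rename(columns={'A':'','B':''})
rename returns a new DataFrame, so assign the result. If you rename all your column names to empty strings, the table is displayed with a blank header (the columns still exist, they are just labelled with empty strings).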
And if you have a lot of columns in your table you can just create a dictionary first instead of renaming manually:
df_dict = dict.fromkeys(df.columns, '')
df = df.rename(columns=df_dict)
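If the aim is only to output the data without a header, the frame itself does not need to change. A sketch assuming the sample frame from the question, comparing the rename approach with the header=False option of to_csv and to_string:

```python
import pandas as pd

# Assumption: the sample frame from the question.
df = pd.DataFrame({"A": [23, 21, 98], "B": [12, 44, 21]})

# rename returns a new frame with empty-string labels; df itself is unchanged.
blank = df.rename(columns=dict.fromkeys(df.columns, ""))

# Alternatively, keep the labels and suppress them only when producing output.
csv_text = df.to_csv(header=False, index=False)
table_text = df.to_string(header=False, index=False)

print(csv_text)
print(table_text)
```

Keeping the real labels has the advantage that columns can still be selected by name afterwards.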
You can first convert the DataFrame to a NumPy array, using this:
s1=df.iloc[:,0:2].values
Then, convert the numpy array back to DataFrame:
s2=pd.DataFrame(s1)
This returns a DataFrame whose column names are replaced by the default integer labels (0, 1, ...).
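The round trip through NumPy can be verified on the sample data. A minimal sketch, assuming the frame from the question and using to_numpy() in place of .values:

```python
import pandas as pd

# Assumption: the sample frame from the question.
df = pd.DataFrame({"A": [23, 21, 98], "B": [12, 44, 21]})

# Converting to an array discards both the row index and the column labels.
s1 = df.to_numpy()

# Rebuilding a DataFrame assigns default integer labels to rows and columns.
s2 = pd.DataFrame(s1)

print(s2)
```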
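This works for dropping the first row (for example, when the header was read in as a data row):
To get the dataframe without its first row use:
totalRow = len(df.index)
df.iloc[1:totalRow]
Or, if the index is the default RangeIndex, you can use the second method like this:
totalRow = df.index.stop
df.iloc[1:totalRow]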
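A quick check of what this slicing does, assuming the sample frame used earlier: it removes the first data row, while the column names stay in place.

```python
import pandas as pd

# Assumption: the sample frame from the earlier question.
df = pd.DataFrame({"A": [23, 21, 98], "B": [12, 44, 21]})

# An open-ended slice is equivalent to slicing up to the number of rows.
without_first = df.iloc[1:]
same = df.iloc[1:len(df.index)]

print(without_first)
```

The remaining rows keep their original labels (1 and 2 here); reset_index(drop=True) renumbers them if needed.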