Read Excel file with blank cells as Pandas dataframe with multiindex - python

Suppose there is an Excel file:
Is there a way to read it directly as a Pandas dataframe with multiindex, without filling blank spaces in the first column?

Code:
import pandas as pd
df = pd.read_excel('test.xlsx')
Forward-fill the blank cells in the first index column with .ffill():
df['i0'] = df['i0'].ffill()
Then build the MultiIndex with set_index():
df = df.set_index(['i0', 'i1'])
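The fill-then-index steps can be tried without an Excel file. The sketch below is only an illustration: the in-memory frame stands in for what `pd.read_excel('test.xlsx')` might return, and the `val` column and all cell values are made up. Only the column names `i0` and `i1` come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the result of pd.read_excel('test.xlsx'):
# blank cells in the first column arrive as NaN.
df = pd.DataFrame({
    "i0": ["a", np.nan, "b", np.nan],
    "i1": ["x", "y", "x", "y"],
    "val": [1, 2, 3, 4],  # made-up data column
})

# Forward-fill the blanks so every row carries its outer index label.
df["i0"] = df["i0"].ffill()

# Build the two-level MultiIndex from both columns.
df = df.set_index(["i0", "i1"])

print(df)
```

Assigning the result back (rather than calling `ffill(inplace=True)` on a column) avoids relying on in-place modification of a column view, which is not guaranteed to affect the parent frame in recent pandas versions.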

Related

Delete index column in pandas dataframe

How to delete index column in pandas Dataframe? I do have '0,1,2,3' numbers columnwise and I want to delete it to plot the heatmap of my dataframe.
To write the CSV without the index column:
df.to_csv(filename, index=False)
and to read from the CSV without using the first column as the index:
df = pd.read_csv(filename, index_col=False)
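A small round trip shows the effect of `index=False`. This is a sketch with a made-up frame, using an in-memory buffer in place of `filename` so nothing is written to disk.

```python
import io

import pandas as pd

# Made-up data; the default RangeIndex shows up as 0, 1, 2 when printed.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Write to an in-memory buffer (stand-in for a file) without the index.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()

# Read it back; no extra unnamed index column appears.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored.columns.tolist())
```

Note that a DataFrame always has some index in memory; `index=False` only keeps it out of the written file.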

Unable to read a column of an excel by Column Name using Pandas

I want to read values of the column 'Site Name' but in this sheet, the location of this tab is not fixed.
I tried,
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites', usecols=['Site Name'])
but got value error,
ValueError: Usecols do not match columns, columns expected but not found: ['RBS Name']
The output should be, List of RBS=['TestSite1', 'TestSite2',........]
Try reading the Excel column like this:
import pandas as pd
# read the whole sheet, then select the column by name
df = pd.read_excel('File.xlsx', sheet_name='Sheet1')
for i in df.index:
    print(df['Site Name'][i])
You can first load the sheet without restricting the columns and inspect the dataframe.
Then check the column names.
The code is as follows:
import pandas as pd
df = pd.read_excel('TestFile.xlsx', sheet_name='List of problematic Sites')
print(df.head())
print(df.columns)
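Once the real column names are known, the column can be selected by name regardless of its position. The sketch below assumes, purely for illustration, that the header carries stray whitespace (one possible cause of a `usecols` mismatch); the in-memory frame and the `Region` column are hypothetical stand-ins for the sheet.

```python
import pandas as pd

# Hypothetical stand-in for the sheet as loaded by pd.read_excel;
# the header ' Site Name ' has stray spaces and is not in a fixed position.
df = pd.DataFrame({
    "Region": ["North", "South"],
    " Site Name ": ["TestSite1", "TestSite2"],
})

# Normalise header text so lookups by name are reliable.
df.columns = df.columns.str.strip()

# Select by name, independent of column position, and convert to a list.
sites = df["Site Name"].tolist()

print(sites)
```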

Reset labels in Pandas DataFrame, Python

I have a CSV file whose first row contains wrong data. The column names are in row 2, so when I load the file into a DataFrame the column labels are wrong and the correct names end up as the values of row 0. Is there a function similar to reset_index() but for columns? PS: I cannot change the CSV file.
Suppose your CSV file is data.csv.
Try this code:
import pandas as pd
# read the CSV file (the wrong first row becomes the header)
df = pd.read_csv('data.csv')
# replace the header names with integers
df.columns = range(df.shape[1])
# save the data to another CSV file without a header row
df.to_csv('data_without_header.csv', header=None, index=False)
# read the new CSV file; its first row now holds the correct names
new_df = pd.read_csv('data_without_header.csv')
# preview the new data
new_df.head()
If you do not care about the rows preceding your column names, you can pass the header argument with the zero-based position of the correct row. For example, if the proper column names are in row 2 (counting from 0):
df = pd.read_csv('my_csv.csv', header=2)
Keep in mind that this drops the preceding rows from the DataFrame. If you want to keep them, you can do the following instead:
df = pd.read_csv('my_csv.csv')
df.columns = df.iloc[2, :] # replace columns with values in row 2
Cheers.
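Both approaches can be compared on a small in-memory CSV. This is a sketch with made-up content in which the real names sit on the second line of the file, so the zero-based `header` value is 1; adjust the position for your own file.

```python
import io

import pandas as pd

# Made-up CSV: the first line is junk, the second line holds the real names.
raw = "junk1,junk2,junk3\nname,age,city\nAnn,30,Oslo\nBob,25,Rome\n"

# Option 1: let read_csv use the second line (zero-based position 1) as header.
df_skip = pd.read_csv(io.StringIO(raw), header=1)

# Option 2: read everything, then promote the first data row to column labels.
df_all = pd.read_csv(io.StringIO(raw))
df_fixed = df_all.iloc[1:].reset_index(drop=True)
df_fixed.columns = df_all.iloc[0].tolist()

print(df_skip)
print(df_fixed)
```

With option 1 pandas infers column dtypes from the data rows only. With option 2 the columns were parsed together with the name row, so numeric columns stay as strings unless converted afterwards.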

Rearrange the columns of a dataframe read from CSV and add formatting to empty cells

I need to read a CSV file in Python, rearrange its columns, and build a new dataframe from the rearranged columns.
I tried using a list, but that might be slow.
Is there an alternative using NumPy or pandas?
Edit:
I am rearranging the row using df.reindex()
I then export the dataframe, leaving the first 4 rows blank:
df_reindexed.to_excel(writer, sheet_name='Sheet1',startrow=4, index=False)
I need to add formatting and text to cells in those top 4 rows, corresponding to the column names in the rows below.
I know I can use iloc, but is there any way to select the cell above a cell with a specified name?
import pandas as pd
# read a CSV with pandas
src = "your/path"
old_df = pd.read_csv(src, sep=",")
# the columns that you want
desired_cols = ['c1','c2']
# pandas will return a new df only with the columns that you want
new_df = old_df[desired_cols]
Another way to do it is with reindex():
desired_cols = ['c1', 'c2', 'c3']
df_final = old_df.reindex(columns=desired_cols)
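The two ways of reordering differ when a requested column does not exist. A short sketch with a made-up frame (the `c0` column and all values are illustrative):

```python
import pandas as pd

# Made-up frame whose columns are in the "wrong" order.
old_df = pd.DataFrame({"c2": [1, 2], "c1": [3, 4], "c0": [5, 6]})

# Plain selection: reorders and drops columns; every name must exist.
selected = old_df[["c1", "c2"]]

# reindex: reorders as well, but a missing name ('c3') becomes an all-NaN column.
reindexed = old_df.reindex(columns=["c1", "c2", "c3"])

print(selected)
print(reindexed)
```

Plain selection raises a `KeyError` for an unknown name, which can be preferable when a missing column should be treated as an error.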

Pandas Data Frame saving into csv file

I wonder how to save a new pandas Series into a CSV file as a separate column. Suppose I have two CSV files that both contain a column 'A'. I apply a mathematical function to that column and store the result in a new column 'B'.
For example:
data = pd.read_csv('filepath')
data['B'] = data['A']*10
# and append data.B to a list: B_list.append(data.B)
This continues until all rows of the first and second CSV files have been read.
I would like to save column B from both CSV files in a new spreadsheet.
For example, I need this result:
column1 (from csv1)    column2 (from csv2)
data.B value           data.B value
With this code:
pd.DataFrame(np.array(B_list)).T.to_csv('file.csv', index=False, header=None)
I do not get the result I want.
Each column in a pandas DataFrame is a pandas Series, so your B_list is actually a list of pandas Series. You can pass it to the DataFrame() constructor and then transpose (or, as @jezrael shows, merge horizontally with pd.concat(..., axis=1)):
finaldf = pd.DataFrame(B_list).T
finaldf.to_csv('output.csv', index=False, header=None)
If the CSV files have different numbers of rows, the shorter series are filled with NaN in the missing rows.
I think you need to concat the column from data1 with the column from data2 first:
df = pd.concat(B_list, axis=1)
df.to_csv('file.csv', index=False, header=None)
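The whole flow can be sketched end to end with two made-up frames of unequal length standing in for the two CSV files. The names `data1` and `data2` and all values are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files; note the different lengths.
data1 = pd.DataFrame({"A": [1, 2, 3]})
data2 = pd.DataFrame({"A": [4, 5]})

B_list = []
for data in (data1, data2):
    data["B"] = data["A"] * 10   # the "mathematical function" from the question
    B_list.append(data["B"])     # collect each result as a Series

# Place the Series side by side; rows are aligned on the index,
# so the shorter Series gets NaN in the missing row.
df = pd.concat(B_list, axis=1)

# Serialise without index or header (returns a string when no path is given).
csv_text = df.to_csv(index=False, header=False)

print(df)
print(csv_text)
```

Because of the NaN, the second column is stored as floats, and the missing value is written as an empty field in the CSV.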
