I want to combine multiple DataFrames that have different column names, put each one on its own sheet, and export the result to Excel.
import numpy as np
import pandas as pd

column = [["Banana", "apple"],
          ["Banana", "Grape"],
          ["Apple", "Pizza"]]
for i in range(3):
    # two columns of data to match the two names in column[i]
    random_data = np.random.randint(10, 25, size=(5, 2))
    df = pd.DataFrame(random_data, columns=column[i])
I would like to end up with three sheets, each with its own column names.
I've tried something like pd.concat([sheet_df, df]), but then every column shows up in the combined DataFrame, even for frames that don't have that column, which is not what I want.
I appreciate your help!
Use an ExcelWriter:
from pandas import ExcelWriter
...
sheets = ['Sheet1', 'Sheet2', 'Sheet3']
path = r'yourpath.xlsx'
with ExcelWriter(path, engine='openpyxl') as writer:
    for cols, sheet in zip(column, sheets):
        # size=(5, 2) matches the two column names per sheet
        random_data = np.random.randint(10, 25, size=(5, 2))
        df = pd.DataFrame(random_data, columns=cols)
        df.to_excel(writer, sheet_name=sheet)
Related
I need to merge several Excel sheets into one and also add a new column holding the name of the sheet each row came from.
The code below merges all sheets, but how do I add the sheet name as a column?
import pandas as pd
df = pd.concat(pd.read_excel(r"C:\\Users\\xx\\FC_List.xlsx", sheet_name=None), ignore_index=True)
print(df)
df.to_csv(r"C:\\Users\\Users\\FC_List.csv", index=False)
The code below fetches the sheet names:
import pandas as pd
df = pd.read_excel(r"C:\\Users\\cc\\FC_List.xlsx", sheet_name=None)
df.keys()
Can you advise how to combine the two so that the sheet name is added as a new column?
Split it into steps.
import pandas as pd
dfs = pd.read_excel(r"C:\\Users\\xx\\FC_List.xlsx", sheet_name=None)
df = pd.concat(dfs,keys=dfs.keys())
This puts the sheet name in the outer level of the index; you can then reset the index and rename the resulting column.
You could also do something like this:
df = pd.concat([sheet.assign(src_sheet=sheet_name) for sheet_name,sheet in dfs.items()])
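The first approach (concatenate with keys, then reset the index) can be sketched as follows. This is only an illustration: the dictionary below stands in for the result of pd.read_excel(..., sheet_name=None), and the sheet names, column names and the src_sheet label are made up for the example.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None), which returns
# a dict of {sheet name: DataFrame}. The data here is hypothetical.
dfs = {
    "North": pd.DataFrame({"item": ["a", "b"], "qty": [1, 2]}),
    "South": pd.DataFrame({"item": ["c"], "qty": [3]}),
}

df = (
    pd.concat(dfs, names=["src_sheet"])   # dict keys become the outer index level
    .reset_index(level="src_sheet")       # move that level into a regular column
    .reset_index(drop=True)               # discard the per-sheet row numbers
)
print(df)
```

Naming the index level up front with names= avoids a separate rename step after reset_index.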
I am trying to copy multiple columns from one xlsx file to another, but my code only copies a single column. How can I copy more than one?
column = data_Sheet['NDB_No']
with pd.ExcelWriter('parsedData.xlsx', mode='w') as writer:
    column.to_excel(writer, sheet_name="new sheet name", index=False)
Choose multiple columns from one dataframe like this:
list_of_columns = ['col1', 'col2', 'col3',...] ## Put your actual columns here
columns = data_Sheet[list_of_columns]
Now write this to another Excel file:
with pd.ExcelWriter('parsedData.xlsx', mode='w') as writer:
    columns.to_excel(writer, sheet_name="new sheet name", index=False)
I have multiple data tables saved as pandas DataFrames, and I want to output all of them into the same CSV for ease of access. However, I am not sure of the best way to go about this, because I want to keep each DataFrame's own structure (i.e. its columns and index), so I can't combine them all into a single DataFrame.
Is there a way to write them all at once, akin to the usual DataFrame.to_csv method?
Use mode='a' to append the second DataFrame to the same file:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,(4,4)))
df1 = pd.DataFrame(np.random.randint(0,500,(5,5)))
df.to_csv('out.csv')
df1.to_csv('out.csv', mode='a')
!type out.csv
Output:
,0,1,2,3
0,0,0,36,53
1,5,38,17,79
2,4,42,58,31
3,1,65,41,57
,0,1,2,3,4
0,291,358,119,267,430
1,82,91,384,398,99
2,53,396,121,426,84
3,203,324,262,452,47
4,127,131,460,356,180
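Because the tables run straight into each other in the output above, it can help to separate them. A minimal sketch, assuming a "# name" label line and a blank line between tables are acceptable in your file (both are conventions invented for this example, not part of the CSV format); io.StringIO stands in for a real file opened with open('out.csv', 'w', newline=''):

```python
import io
import pandas as pd

# Hypothetical tables with different shapes and column names.
frames = {
    "first": pd.DataFrame({"a": [1, 2]}),
    "second": pd.DataFrame({"x": [3], "y": [4]}),
}

buffer = io.StringIO()  # stand-in for an open text file handle
for name, frame in frames.items():
    buffer.write(f"# {name}\n")   # label line (assumed convention)
    frame.to_csv(buffer)          # header and index are written per table
    buffer.write("\n")            # blank line separating the tables

text = buffer.getvalue()
print(text)
```

Writing through one open handle avoids reopening the file in append mode for each table.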
For Excel you can do:
from pandas import ExcelWriter
frames = [df1, df2, df3]
saveFile = 'file.xlsx'
# the context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with ExcelWriter(saveFile) as writer:
    for x, frame in enumerate(frames):
        sheet_name = 'sheet' + str(x + 1)
        frame.to_excel(writer, sheet_name=sheet_name)
You should now have all of your dataframes in 3 different sheets: sheet1, sheet2 and sheet3.
I have an Excel file with 100 sheets. I need to extract the data in column P of each sheet, starting from row 7, and create a new file with all of the extracted data in the same column. In my output file, the data ends up in different columns (e.g. Sheet 2's data in column R, Sheet 3's in column B).
How can I make the data in the same column in the new Output excel? Thank you.
ps. Combining all sheets' column P data into a single column in single sheet is enough for me
import pandas as pd
import os
Flat_Price = "Flat Pricing.xlsx"
dfs = pd.read_excel(Flat_Price, sheet_name=None, usecols="P", skiprows=6)
df = pd.concat(dfs)
print(df)
with pd.ExcelWriter("Output.xlsx") as writer:
    df.to_excel(writer, sheet_name="Sheet1")
print (os.path.abspath("Output.xlsx"))
You need the parameter header=None so that the column gets the default name 0 in every sheet:
dfs = pd.read_excel(Flat_Price,
                    sheet_name=None,
                    usecols="P",
                    skiprows=6,
                    header=None)
Then concatenate, extract the number from the first level of the MultiIndex, convert it to an integer, and sort with sort_index:
df = pd.concat(dfs)
df = df.set_index([df.index.get_level_values(0).str.extract(r'(\d+)', expand=False).astype(int),
                   df.index.get_level_values(1)]).sort_index()
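To see why the numeric conversion matters, here is a small sketch with an in-memory dictionary standing in for the read_excel result. The sheet names and values are hypothetical; the point is that "Sheet 10" would sort before "Sheet 2" as text, but not once the number is extracted.

```python
import pandas as pd

# Stand-in for pd.read_excel(..., sheet_name=None, header=None):
# each sheet has a single column with the default name 0.
dfs = {
    "Sheet 10": pd.DataFrame({0: [5, 6]}),
    "Sheet 2": pd.DataFrame({0: [3, 4]}),
    "Sheet 1": pd.DataFrame({0: [1, 2]}),
}

df = pd.concat(dfs)  # MultiIndex: (sheet name, row number)

# Pull the digits out of the sheet name and convert them to integers.
sheet_no = df.index.get_level_values(0).str.extract(r"(\d+)", expand=False).astype(int)

# Replace the text level with the numeric one, then sort by sheet and row.
df = df.set_index([sheet_no, df.index.get_level_values(1)]).sort_index()
combined = df[0].tolist()
print(combined)
```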
I'm trying to use pandas.read_excel() to import multiple worksheets from a spreadsheet. If I do not specify the columns with the parse_cols keyword I'm able to get all the data from the sheets, but I can't seem to figure out how to specify specific columns for each sheet.
import pandas as pd
workSheets = ['sheet1', 'sheet2', 'sheet3','sheet4']
cols = ['A,E','A,E','A,C','A,E']
df = pd.read_excel(excelFile, sheetname=workSheets, parse_cols='A:E') #This works fine
df = pd.read_excel(excelFile, sheetname=workSheets, parse_cols=cols) #This returns empty dataFrames
Does anyone know if there is a way, using read_excel(), to import multiple worksheets from excel, but also specify specific columns based on which worksheet?
Thanks.
When you pass a list of sheet names to read_excel, it returns a dictionary. You can achieve the same thing with a loop:
workSheets = ['sheet1', 'sheet2', 'sheet3', 'sheet4']
cols = ['A,E', 'A,E', 'A,C', 'A,E']
df = {}
for ws, c in zip(workSheets, cols):
    df[ws] = pd.read_excel(excelFile, sheetname=ws, parse_cols=c)
With Python 3.6.5 and pandas 0.23.4, the call inside the loop uses the keywords sheet_name and usecols instead:
pd.read_excel(excelFile, sheet_name=ws, usecols=c)