I have an .xls file which contains multiple sheets, and I want to merge all of these sheets into one single sheet.
import numpy as np
import pandas as pd
import glob
import os
import xlrd
df = pd.concat(map(pd.read_excel, glob.glob(os.path.join('', "bank.xls"))))
I tried this and got a warning:
WARNING *** file size (25526815) not 512 + multiple of sector size (512)
and nothing happened.
I want to concatenate all of these sheets.
This works for me (just tested).
import pandas as pd

input_file = 'C:\\your_path\\Book1.xlsx'
output_file = 'C:\\your_path\\BookFinal.xlsx'
# sheet_name=None reads every sheet into a dict of DataFrames keyed by sheet name
df = pd.read_excel(input_file, sheet_name=None)
all_df = []
for key in df.keys():
    all_df.append(df[key])
data_concatenated = pd.concat(all_df, axis=0, ignore_index=True)
# the context manager saves the file on exit
# (writer.save() was deprecated and later removed in newer pandas)
with pd.ExcelWriter(output_file) as writer:
    data_concatenated.to_excel(writer, sheet_name='merged', index=False)
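Since pd.read_excel(..., sheet_name=None) returns a dict of DataFrames keyed by sheet name, the key loop can also collapse into a single concat over the dict's values. A minimal sketch, with a hand-built dict standing in for the read_excel result ("Sheet1"/"Sheet2" and the values are placeholders):

```python
import pandas as pd

# Stand-in for the dict returned by pd.read_excel(input_file, sheet_name=None)
sheets = {
    "Sheet1": pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    "Sheet2": pd.DataFrame({"a": [5], "b": [6]}),
}

# One concat over the dict values replaces the key loop entirely
merged = pd.concat(sheets.values(), ignore_index=True)
```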
So I have code like this:
pip install pyxlsb
import numpy as np
import pandas as pd
import openpyxl as xl
import csv as cs
import xlwings as xw
import pyxlsb as pyx
from pyxlsb import open_workbook
df = []
with open_workbook(path) as wb:
    with wb.get_sheet('XYZ') as XYZ:
        for row in XYZ.rows():
            df.append([item.v for item in row])
df1 = pd.DataFrame(df[1:], columns=df[0])
df2 = pd.read_excel(path1)
df2 = pd.DataFrame(df2)
This code is working so far, but I cannot find a way to delete all rows in sheet XYZ (in the .xlsb file), insert all the data from df2 (from the .xlsx file), and then save everything back to the current .xlsb file (path).
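As far as I know, pyxlsb can only read .xlsb files, not write them back. If an .xlsx output is acceptable, one workaround is to write the replacement data to a sheet named XYZ in a new workbook with pandas. A minimal sketch using an in-memory buffer in place of a real path (the column name and values are placeholders):

```python
import io
import pandas as pd

# Stand-in for df2 from the question (the frame read from the .xlsx file)
df2 = pd.DataFrame({"col1": [10, 20, 30]})

# Writing df2 as the sole content of a sheet named XYZ effectively
# "deletes" the old rows and "inserts" the new data in one step
buf = io.BytesIO()  # stands in for a real output path
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    df2.to_excel(writer, sheet_name="XYZ", index=False)

round_trip = pd.read_excel(io.BytesIO(buf.getvalue()), sheet_name="XYZ")
```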
I am trying to iterate through json files in a folder and append them all into one pandas dataframe.
If I say
import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize
import os
directory_in_str = 'building_data'
directory = os.fsencode(directory_in_str)
df_all = pd.DataFrame()
with open("building_data/rooms.json") as file:
    data = json.load(file)
df = json_normalize(data['rooms'])
df_y.append(df, ignore_index=True)
I get a dataframe with the data from that one file. If I turn this approach into a for loop, I have tried:
import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize
import os
directory_in_str = 'building_data'
directory = os.fsencode(directory_in_str)
df_all = pd.DataFrame()
for file in os.listdir(directory):
    with open(directory_in_str+'/'+filename) as file:
        data = json.load(file)
        df = json_normalize(data['rooms'])
        df_all.append(df, ignore_index=True)
print(df_all)
This returns an empty dataframe. Does anyone know why this is happening? If I print df before appending it, it prints the correct values, so I am not sure why it is not appending.
Thank you!
Instead of appending the next DataFrame, I would try to join them like this:
if df_all.empty:
    df_all = df
else:
    df_all = df_all.join(df)
When joining DataFrames, you can specify what they should be joined on - the index or a specific (key) column - as well as how ('left' is the default, which is similar to appending).
Here's docs about pandas.DataFrame.join.
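Note that join lines rows up side by side rather than stacking them: each frame needs distinct column names (or a suffix), and rows are matched on the index. A tiny sketch with placeholder frames:

```python
import pandas as pd

# Two placeholder frames sharing a default index but with different columns
left = pd.DataFrame({"room": ["A", "B"]})
right = pd.DataFrame({"area": [10, 20]})

# Default join: 'left', matched on the index
joined = left.join(right)
```

If the goal is to stack rows from many files instead, pd.concat(list_of_frames, ignore_index=True) is usually the better fit.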
In these instances, I load everything from the JSON into a list by appending each file's returned dict onto that list, then pass the list to pandas.DataFrame.from_records (docs).
In this case the source would become something like...
import pandas as pd
import json
import os

directory_in_str = 'building_data'
directory = os.fsencode(directory_in_str)
json_data = []
for file in os.listdir(directory):
    filename = os.fsdecode(file)  # os.listdir on a bytes path yields bytes names
    with open(os.path.join(directory_in_str, filename)) as f:
        data = json.load(f)
        json_data.extend(data['rooms'])  # collect the raw room dicts
df_all = pd.DataFrame.from_records(json_data)
print(df_all)
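For reference, from_records expects a sequence of plain records (e.g. dicts), one per row - so the list should hold the raw room dicts rather than already-normalized DataFrames. A small sketch with placeholder records:

```python
import pandas as pd

# Placeholder room dicts, as if collected from several JSON files
records = [
    {"room": "A", "area": 10},
    {"room": "B", "area": 20},
]
df_all = pd.DataFrame.from_records(records)
```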
I am still learning python. I am trying to import multiple workbooks and all the worksheets into one data frame.
Here is what I have so far:
import glob
import pandas as pd
import numpy as np
import os  # checking the working directory

print(os.getcwd())
all_data = pd.DataFrame()  # creating an empty data frame
for file in glob.glob("*.xls"):  # import every file that ends in .xls
    df = pd.read_excel(file)
    all_data = all_data.append(df, ignore_index=True)
all_data.shape  # 12796 rows with 19 columns; we will have to find a way to check if this is accurate
I am having real trouble finding documentation that confirms or explains whether or not this code imports all of the data sheets in every workbook. Some of these files have 15-20 sheets.
Here is a link to where I found the glob explanation: http://pbpython.com/excel-file-combine.html
Any and all advice is greatly appreciated. I am still really new to R and Python so if you could explain this in as much detail as possible I would greatly appreciate it!
What you are missing is importing all the sheets in the workbook.
import glob
import pandas as pd
import os

print(os.getcwd())  # checking the working directory
frames = []  # collect each sheet, then concat once at the end
rows = 0
for file in glob.glob("*.xls"):  # every file that ends in .xls
    # pd.read_excel(file) alone would import only the first sheet
    xls = pd.ExcelFile(file)
    for sheet_name in xls.sheet_names:  # names of all the sheets
        df = pd.read_excel(file, sheet_name=sheet_name)
        rows += df.shape[0]
        frames.append(df)
all_data = pd.concat(frames, ignore_index=True)
print(all_data.shape[0])  # now you will get all the rows, which should be equal to rows
print(rows)
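An alternative worth knowing: pd.read_excel(file, sheet_name=None) pulls every sheet at once into a dict keyed by sheet name, which avoids the explicit ExcelFile loop. A sketch that builds a two-sheet workbook in memory (sheet names and values are placeholders) and reads it all back:

```python
import io
import pandas as pd

# Build a two-sheet workbook in memory to stand in for one real file
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="S1", index=False)
    pd.DataFrame({"x": [3]}).to_excel(writer, sheet_name="S2", index=False)

# sheet_name=None returns {sheet name: DataFrame} for every sheet
sheets = pd.read_excel(io.BytesIO(buf.getvalue()), sheet_name=None)
all_data = pd.concat(sheets.values(), ignore_index=True)
```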
I have multiple folders and subfolders, containing Excel workbooks with multiple tabs. How do I concat all the information into 1 pandas dataframe?
Here is my code so far:
from pathlib import Path
import os
import pandas as pd
import glob
p = Path(r'C:\Users\user1\Downloads\key_folder')
globbed_files = p.glob('**/**/*.xlsx')
df = []
for file in globbed_files:
    frame = pd.read_excel(file, sheet_name=None, ignore_index=True)
    frame['File Path'] = os.path.basename(file)
    df.append(frame)
# df = pd.concat([d.values() for d in df], axis=0, ignore_index=True)
df = pd.concat(df, axis=0, ignore_index=True)
This is generating the following error:
cannot concatenate object of type "<class 'collections.OrderedDict'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
When I ran pd.DataFrame(df), I saw that each Excel spreadsheet tab is a separate column. The cells contain the data and headers in text form, forming a really long string.
Any help is appreciated! Thank you!
Here is the final code:
from pathlib import Path
import pandas as pd
import xlrd  # note: xlrd 2.0+ no longer reads .xlsx, so this needs xlrd < 2.0

p = Path('path here')
globbed_files = p.glob('**/**/*.xlsx')
list_dfs = []
for file in globbed_files:
    xls = xlrd.open_workbook(file, on_demand=True)
    for sheet_name in xls.sheet_names():
        df = pd.read_excel(file, sheet_name)
        df['Sheet Name'] = sheet_name
        list_dfs.append(df)
dfs = pd.concat(list_dfs, axis=0)
dfs.to_excel('merged spreadsheet.xlsx')
I am attempting to use the range of a list, specifically col_test, and then use that range to specify the cells to be filled on the new worksheet.
I want the list col_test to fill the second column of the new sheet, starting at row 6.
Here I am trying to use the "write" function to do this, but I do not know the correct parameters to use.
import os
import glob
import pandas as pd
for csvfile in glob.glob(os.path.join('.', '*.csv')):
    df = pd.read_csv(csvfile)
    col_test = df['Test #'].tolist()
    col_retest = df['Retest #'].tolist()
from xlrd import open_workbook
from xlutils.copy import copy
rb = open_workbook("Excel FDT Master_01_update.xlsx")
wb = copy(rb)
s = wb.get_sheet(3)
s.write(range_of_col_test, col_test)
wb.save('didthiswork.xls')
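For what it's worth, xlwt-style sheets (what xlutils.copy returns) take one cell per write(row, col, value) call; there is no single range parameter, so a list fills a column through an index loop. A sketch of that loop, with a placeholder list in place of the real col_test (column B is index 1, and "starting at row 6" is row index 5, since both are zero-based):

```python
col_test = [7, 8, 9]  # placeholder for the real list of test numbers

# Build the (row, col, value) triples that s.write() would receive,
# filling column B (index 1) from row 6 (index 5) downward
cells = [(5 + i, 1, value) for i, value in enumerate(col_test)]

# With the sheet from the question this would be:
# for row, col, value in cells:
#     s.write(row, col, value)
```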