Saving data to an Excel file but too many entries

I am using this script to grab a CSV from a local microcontroller and store the information in an Excel file. The issue I am running into is that I hit the limit on how many rows an Excel sheet can hold, so I need a way to adapt the script to do something like
if excel_file == full:
    open new excel sheet and print data there
Does anyone have any ideas?
Here is the exact error in case anyone is curious:
ValueError('This sheet is too large! Your sheet size is: 1744517, 27 Max sheet size is: 1048576, 16384')

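Solved by putting everything into a pandas DataFrame and using
df_1 = df.iloc[:1000000, :]
df_2 = df.iloc[1000000:, :]
which splits the DataFrame after one million rows; the two parts were then written to different sheets of the same Excel file. Because iloc slices exclude the end position, the second slice has to start at 1000000 so that no row is dropped.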
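The two-way split above can be generalized to any number of rows. The following is only a sketch under stated assumptions: the helper names `split_for_excel` and `write_chunks`, the `part_N` sheet names, and the default chunk size are made up for illustration, and writing the workbook requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # row limit reported in the error message above


def split_for_excel(df, rows_per_sheet=1_000_000):
    """Return {sheet_name: chunk}; every chunk fits into one worksheet."""
    if rows_per_sheet + 1 > EXCEL_MAX_ROWS:  # +1 leaves room for the header row
        raise ValueError("rows_per_sheet leaves no room for the header row")
    chunks = {}
    starts = range(0, len(df), rows_per_sheet)
    for number, start in enumerate(starts, start=1):
        # iloc excludes the end position, so consecutive chunks do not overlap
        chunks[f"part_{number}"] = df.iloc[start:start + rows_per_sheet]
    return chunks


def write_chunks(df, path, rows_per_sheet=1_000_000):
    """Write each chunk to its own sheet of a single workbook."""
    with pd.ExcelWriter(path) as writer:
        for name, chunk in split_for_excel(df, rows_per_sheet).items():
            chunk.to_excel(writer, sheet_name=name, index=False)
```

For the 1,744,517-row sheet from the error message, `split_for_excel` would produce two chunks, one with 1,000,000 rows and one with 744,517.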

Related

Openpyxl Unable to edit data in large excel files (Keep the original formatting of cells like cell background)

Problem: I have to change a few cell values in an Excel file and save it.
File details:
file size: 80 MB
number of sheets: 12
average columns per sheet: 2800
average rows per sheet: 1200
Tried solution:
wb = openpyxl.load_workbook(template_path)
The issue is that this runs out of memory whenever I try it.
I can't use read-only or write-only mode because I have to update the existing workbook.
Please suggest another library if that would solve the issue.

Extracting data from excel using python and writing to an empty excel file

I have a large set of data that I am trying to extract from multiple Excel files, each with multiple sheets, using Python, and then write to a new Excel file. I am new to Python and have used various tutorials to come up with code that can help me automate the process. However, I am stuck and need some guidance on how to write the extracted data to a new Excel file. If someone could point me in the right direction, it would be greatly appreciated. See code below:
import os
from pandas.core.frame import DataFrame
path = r"Path where all excel files are located"
os.chdir(path)
for WorkingFile in os.listdir(path):
    if os.path.isfile(WorkingFile):
        DataFrame = pd.read_excel(WorkingFile, sheet_name = None, header = 12, skipfooter = 54)
        DataFrame.to_excel(r'Empty excel file where to write all the extracted data')
When I execute the code I get the error "AttributeError: 'dict' object has no attribute 'to_excel'", and I am not sure how to fix it. Any help would be appreciated.
A little more background on what I am trying to do: I have a folder with about 50 Excel files, and each file might have multiple sheets. The data I need is in a table of one row and 14 columns, in the same location in each file and each sheet. I need to pull that data and compile it into a single Excel file. When I run the code above and add a print statement, it shows me exactly the data I want, but when I try to write it to Excel it doesn't work.
Thanks for help in advance!

Not sure why you're importing DataFrame instead of pandas; your code looks incomplete. The code below should clear up your doubts. (It does not include any conditions for excluding non-Excel files, directories, etc.)
import os
import pandas as pd
path = "Dir path to excel files"  # directory that holds the Excel files
frames = []  # one DataFrame per file
for file in os.listdir(path):
    data = pd.read_excel(os.path.join(path, file))  # read the first sheet of each file
    frames.append(data)
df = pd.concat(frames, ignore_index=True)  # DataFrame.append was removed in pandas 2.0
# process df
df.to_excel("path/file.xlsx")
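The AttributeError in the question comes from `sheet_name=None`: with that argument `pd.read_excel` returns a dict that maps each sheet name to a DataFrame, and a dict has no `to_excel` method. A minimal sketch of stacking every sheet of every file is shown below. The helper names `combine_sheets` and `combine_workbooks` and the `source_sheet` / `source_file` columns are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def combine_sheets(sheets):
    """Stack the DataFrames of one workbook, remembering each sheet's name."""
    labelled = [frame.assign(source_sheet=name) for name, frame in sheets.items()]
    return pd.concat(labelled, ignore_index=True)


def combine_workbooks(paths, **read_kwargs):
    """Read every sheet of every workbook in `paths` and stack them all."""
    per_file = []
    for path in paths:
        # sheet_name=None returns {sheet name: DataFrame} for the whole workbook
        sheets = pd.read_excel(path, sheet_name=None, **read_kwargs)
        per_file.append(combine_sheets(sheets).assign(source_file=str(path)))
    return pd.concat(per_file, ignore_index=True)
```

With the question's layout, this could be called as `combine_workbooks(paths, header=12, skipfooter=54)`, where `paths` is a list of the Excel files in the folder, and the result written once with `to_excel`.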

Pandas: ValueError: Worksheet index 0 is invalid, 0 worksheets found

Simple problem that has me completely dumbfounded. I am trying to read an Excel document with pandas but I am stuck with this error:
ValueError: Worksheet index 0 is invalid, 0 worksheets found
My code snippet works well for all but one Excel document linked below. Is this an issue with my Excel document (which definitely has sheets when I open it in Excel) or am I missing something completely obvious?
Excel Document
EDIT - Forgot the code. It is quite simply:
import pandas as pd
df = pd.read_excel(FOLDER + 'omx30.xlsx')
FOLDER is the absolute path to the folder in which the file is located.

Your file is saved as Strict Open XML Spreadsheet (*.xlsx). Because it shares the same extension as Excel Workbook, it isn't obvious that the format is different. Open the file in Excel and Save As. If the selected option is Strict Open XML Spreadsheet (*.xlsx), change it to Excel Workbook (*.xlsx), save it and try loading it again with pandas.
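One way to check for this without opening Excel is to look inside the file, since an .xlsx file is a zip archive. The sketch below assumes the workbook part is stored at the usual location `xl/workbook.xml` and relies on the fact that Strict files use the `purl.oclc.org/ooxml` namespaces instead of the `schemas.openxmlformats.org` ones. The function name `looks_like_strict_ooxml` is made up for illustration.

```python
import zipfile

# Main spreadsheet namespace used by Strict Open XML workbooks
STRICT_MAIN_NS = b"http://purl.oclc.org/ooxml/spreadsheetml/main"


def looks_like_strict_ooxml(xlsx, workbook_part="xl/workbook.xml"):
    """Return True if the workbook part declares the Strict namespace.

    `xlsx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as archive:
        return STRICT_MAIN_NS in archive.read(workbook_part)
```

If this returns True for a file that pandas cannot read, re-saving it as Excel Workbook (*.xlsx) as described above is the likely fix.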

EDIT: with the info that you have the original .csv, re-do your cleaning and save it as a .csv from Excel; or, if you prefer, pd.read_csv the original, and do your cleaning from the CLI with pandas directly.

It may be that the first sheet (index 0) was deleted from your workbook, so the remaining sheets no longer match the default sheet_name=0 that pd.read_excel uses, and the error is raised.

It seems there indeed is a problem with my excel file. We have not been able to figure out what though. For now the path of least resistance is simply saving as a .csv in excel and using pd.read_csv to read this instead.

I have a CSV file with many columns and many rows. How do I create a one column one Excel sheet from Python?

This is my database:
https://archive.ics.uci.edu/ml/datasets/Parkinson+Speech+Dataset+with++Multiple+Types+of+Sound+Recordings
This database consist of training data and test data. The training data consists of many features; one column is one feature. I intend to convert each column into a separate Excel sheet.
The following is my Python code that I formulated to convert the entire text file into a CSV. But I intend to convert the entire text file into Excel sheets. For example, the entire text file contains 10 columns, so I want to create 10 Excel sheets with each column separated into one Excel sheet. Can any expert guide me on how to do it? I am completely new to Python so I hope someone can help me.
import pandas as pd
read_file = pd.read_csv (r'C://Users/RichardStone/Pycharm/Project/train_data.txt')
read_file.to_csv (r'C://Users/RichardStone/Pycharm/Project/train_data.csv', index=None)

Try this. It writes each column to its own Excel file (Sheet1.xlsx, Sheet2.xlsx, and so on).
sheetnames = list()
for i in range(len(read_file.columns)):
    sheetnames.append('Sheet' + str(i+1))
for i in range(len(read_file.columns)):
    read_file.iloc[:, i].to_excel(sheetnames[i] + '.xlsx', index = False)
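The loop above produces one workbook per column. If the goal is a single workbook with one sheet per column, a `pd.ExcelWriter` can be shared across the writes. This is a sketch only: the helper names `columns_as_sheets` and `write_columns` are hypothetical, and an Excel engine such as openpyxl must be installed for the write to work.

```python
import pandas as pd


def columns_as_sheets(df):
    """Map 'Sheet1', 'Sheet2', ... to single-column DataFrames."""
    # iloc with a list keeps each column as a DataFrame, header included
    return {f"Sheet{pos + 1}": df.iloc[:, [pos]] for pos in range(df.shape[1])}


def write_columns(df, path):
    """Write every column of `df` to its own sheet of one workbook."""
    with pd.ExcelWriter(path) as writer:
        for name, column in columns_as_sheets(df).items():
            column.to_excel(writer, sheet_name=name, index=False)
```

Called as `write_columns(read_file, ...)` with an output path of your choice, this would give a ten-sheet workbook for a ten-column file.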

Loop through list of pandas dataframes and write them to different tabs in one Excel file (from Jupyter notebook)

I have a dataframe in my Jupyter notebook that I can successfully write to an Excel file with pandas ExcelWriter, but I'd rather split the dataframe into smaller dataframes (based on its index), then loop through them to write each to a different sheet in one Excel file. This seems syntactically correct but my code cell just runs without ever finishing:
path = r'/root/notebooks/my_file.xlsx'
writer = ExcelWriter(path)
sheets = df.index.unique().tolist()
for sheet in sheets:
    df.loc[sheet].to_excel(writer, sheet_name=sheet, index=False)
writer.save()
I've tried a few different approaches without any luck. Am I missing something simple?

It is hard to determine the issue in your system without the error message (as you have said, you have an infinite loop). You might check the size of your dataset as you are putting only one row for each excel sheet. If you have plenty of rows, then you will have that many sheets.
However, I tried your code with my own dataset, and there are some errors that can be fixed anyway.
path = 'raw/test_so.xlsx'
sheets = df.index.unique().tolist()
with pd.ExcelWriter(path) as writer:  # the file is saved when the block exits
    for sheet in sheets:
        df.loc[[sheet]].to_excel(writer, sheet_name=str(sheet), index=False)
Use df.loc[[sheet]] (double brackets) for each sheet so that the result stays a DataFrame and is written to Excel with its column headers.
If your DataFrame index is an integer, make sure that you pass sheet_name=str(sheet), because the sheet name cannot be an integer.
In pandas 2.0 and later, writer.save() no longer exists; the with block (or writer.close()) takes its place.
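Converting the label with `str()` is not always enough, because Excel also limits sheet names to 31 characters and rejects the characters `[ ] : * ? / \`. The helper below is a minimal sketch of a sanitizer. The name `safe_sheet_name`, the underscore replacement, and the numeric suffix scheme are assumptions made for illustration.

```python
import re

MAX_SHEET_NAME = 31                      # Excel's limit on sheet-name length
FORBIDDEN = re.compile(r"[\[\]:*?/\\]")  # characters Excel rejects in sheet names


def safe_sheet_name(label, taken):
    """Turn an index label into a unique, Excel-compatible sheet name.

    `taken` is a set of lower-cased names already used; it is updated in place.
    """
    base = FORBIDDEN.sub("_", str(label))[:MAX_SHEET_NAME] or "Sheet"
    name, counter = base, 1
    while name.lower() in taken:  # Excel compares sheet names case-insensitively
        suffix = f"_{counter}"
        name = base[:MAX_SHEET_NAME - len(suffix)] + suffix
        counter += 1
    taken.add(name.lower())
    return name
```

In the loop above, this would be used as `sheet_name=safe_sheet_name(sheet, taken)` with `taken = set()` created before the loop.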
