pandas dataframe to excel - python

I am trying to save a pandas DataFrame to an Excel file. After a few scraping methods, I end up with a final method that writes the data to an Excel file.
The problem is that I want the sheet_name to be an input variable for each scrape I do.
But with the code below, I got the error:
ValueError: No engine for filetype: ''
def datacollection(self, filename):
    tbl = self.find_element_by_xpath("/html/body/form/div[3]/div[2]/div[3]/div[3]/div[1]/table").get_attribute('outerHTML')
    df = pd.read_html(tbl)
    print(df[0])
    print(type(df[0]))
    final = pd.DataFrame(df[0])
    final.to_excel(r'C:\Users\ADMIN\Desktop\PROJECTS\Python', sheet_name=f'{filename}')

I believe the problem here is that you are asking it to write to a file called Python, without any file extension.
You could name it Python.xlsx for example.
Or, if Python was the directory name, then it should be Python/somefilename.xlsx
EDIT: Given that you were trying to name the file after filename, note that the sheet_name parameter names the sheet inside the workbook, not the file. Drop sheet_name and change the last line to:
final.to_excel(fr'C:\Users\ADMIN\Desktop\PROJECTS\Python\{filename}.xlsx')
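pandas picks the Excel writer engine from the file's extension, so a path such as C:\Users\ADMIN\Desktop\PROJECTS\Python gives it an empty extension to look up, which is what "No engine for filetype: ''" reports. A minimal sketch of guarding against that, assuming you always want .xlsx; with_xlsx_suffix is a hypothetical helper, and PureWindowsPath is used only so the example behaves the same on any OS:

```python
from pathlib import PureWindowsPath


def with_xlsx_suffix(path: str) -> str:
    """Return the path with a .xlsx extension appended if it has none.

    Hypothetical helper: to_excel chooses its writer engine from the
    extension, so a path without one cannot be matched to an engine.
    """
    p = PureWindowsPath(path)
    if p.suffix == "":
        p = p.with_suffix(".xlsx")
    return str(p)


print(with_xlsx_suffix(r"C:\Users\ADMIN\Desktop\PROJECTS\Python"))
# C:\Users\ADMIN\Desktop\PROJECTS\Python.xlsx
```

A path that already has an extension is returned unchanged, so the helper can wrap every to_excel target.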

You need to give a file extension for the excel file:
final.to_excel(r'C:\Users\ADMIN\Desktop\PROJECTS\Python.xlsx',sheet_name=f'{filename}')

SOLUTION:
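With a plain (non-raw) f-string, the backslashes in the path are read as escape sequences, so either change them from \ to / as below, or use a raw f-string (fr'...'):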
def datacollection(self, filename):
    tbl = self.find_element_by_xpath("/html/body/form/div[3]/div[2]/div[3]/div[3]/div[1]/table").get_attribute('outerHTML')
    df = pd.read_html(tbl)
    print(df[0])
    print(type(df[0]))
    final = pd.DataFrame(df[0])
    final.to_excel(f'C:/Users/ADMIN/Desktop/PROJECTS/Python/{filename}.xlsx')
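Why the backslashes matter: in a normal string literal, Python treats \ as the start of an escape sequence, so 'C:\Users' is a SyntaxError (\U expects eight hex digits) and 'C:\temp' silently contains a tab character. A short sketch showing three spellings that all produce the same Windows path; filename is just an example value standing in for the method's argument:

```python
from pathlib import PureWindowsPath

filename = "scrape1"  # example value standing in for the method's argument

# 1. Raw f-string: backslashes are kept literally, {filename} is still substituted.
raw = fr"C:\Users\ADMIN\Desktop\PROJECTS\Python\{filename}.xlsx"

# 2. Forward slashes: Windows accepts them in file paths.
forward = f"C:/Users/ADMIN/Desktop/PROJECTS/Python/{filename}.xlsx"

# 3. pathlib: join the folder and the file name without typing separators.
joined = PureWindowsPath(r"C:\Users\ADMIN\Desktop\PROJECTS\Python") / f"{filename}.xlsx"

# Without the r prefix, "\t" becomes a tab, so the string is one character shorter.
print(len("C:\temp"), len(r"C:\temp"))  # 6 7
print(PureWindowsPath(raw) == PureWindowsPath(forward) == joined)  # True
```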

This might solve the error (the r prefix stops Python from reading the backslashes as escape sequences):
final.to_excel(fr'C:\Users\ADMIN\Desktop\PROJECTS\Python\{filename}.xlsx')

Related

Pandas: ValueError: Worksheet index 0 is invalid, 0 worksheets found

Simple problem that has me completely dumbfounded. I am trying to read an Excel document with pandas but I am stuck with this error:
ValueError: Worksheet index 0 is invalid, 0 worksheets found
My code snippet works well for all but one Excel document linked below. Is this an issue with my Excel document (which definitely has sheets when I open it in Excel) or am I missing something completely obvious?
Excel Document
EDIT - Forgot the code. It is quite simply:
import pandas as pd
df = pd.read_excel(FOLDER + 'omx30.xlsx')
FOLDER is the absolute path to the folder in which the file is located.
Your file is saved as Strict Open XML Spreadsheet (*.xlsx). Because it shares the same extension as Excel Workbook, it isn't obvious that the format is different. Open the file in Excel and Save As. If the selected option is Strict Open XML Spreadsheet (*.xlsx), change it to Excel Workbook (*.xlsx), save it and try loading it again with pandas.
EDIT: given that you still have the original .csv, redo your cleaning and save the result as a .csv from Excel; or, if you prefer, load the original with pd.read_csv and do the cleaning in pandas directly.
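To check a file for this without opening it in Excel, a sketch that peeks inside the archive. It assumes the workbook part sits at xl/workbook.xml (the usual location, although the package relationships technically decide it) and that Strict files use the http://purl.oclc.org/ooxml/spreadsheetml/main namespace, while regular workbooks use http://schemas.openxmlformats.org/spreadsheetml/2006/main; is_strict_ooxml is a hypothetical helper name:

```python
import zipfile
import xml.etree.ElementTree as ET

STRICT_NS = "http://purl.oclc.org/ooxml/spreadsheetml/main"


def is_strict_ooxml(path_or_file) -> bool:
    """Return True if the workbook part uses the Strict Open XML namespace."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    # ElementTree reports a namespaced root tag as "{namespace}workbook".
    return root.tag == f"{{{STRICT_NS}}}workbook"
```

For example, is_strict_ooxml(FOLDER + 'omx30.xlsx') returning True would point to the Save As fix above.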
It may be that the first sheet (index 0) was deleted from your workbook, so the actual index is now > 0; since the sheet_name parameter of pd.read_excel defaults to 0, the error is raised.
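If you suspect the sheet index, you can list the sheets the workbook actually declares and pass one by name via sheet_name=. A stdlib-only sketch, assuming the workbook part is at its usual location xl/workbook.xml; it matches sheet elements in any namespace, so it also works on Strict files. list_sheet_names is a hypothetical name:

```python
import zipfile
import xml.etree.ElementTree as ET


def list_sheet_names(path_or_file):
    """Return the sheet names in workbook order, read from xl/workbook.xml."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    names = []
    for elem in root.iter():
        # Tags look like "{namespace}sheet"; compare only the local part.
        if elem.tag.rsplit("}", 1)[-1] == "sheet":
            names.append(elem.get("name"))
    return names
```

An empty list here would match the "0 worksheets found" message; otherwise something like pd.read_excel(path, sheet_name=list_sheet_names(path)[0]) names the sheet explicitly instead of relying on index 0.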
It seems there indeed is a problem with my excel file. We have not been able to figure out what though. For now the path of least resistance is simply saving as a .csv in excel and using pd.read_csv to read this instead.

Python - create CSV file

I am using the code below to create a file using Python. I don't get any error message when I run it, but no file gets created either.
df_csv = pd.read_csv (r'X:\Google Drive\Personal_encrypted\Training\Ex_Files_Python_Excel\Exercise Files\names.csv', header=None)
df_csv.to_csv = (r"C:\temp\modified_names.csv")
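You are assigning a value to df_csv.to_csv instead of calling it, which is not how you call methods in Python: the assignment replaces the method with a string (the parentheses around a single value do not make a tuple) and nothing is written.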
Solution:
df_csv.to_csv(r"C:\temp\modified_names.csv")
DataFrame.to_csv documentation here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
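A small stdlib-only illustration of the difference, using a hypothetical Report class in place of the DataFrame so nothing touches the disk: assignment replaces the method on that instance and records nothing, while a call runs it. It also shows that parentheses alone do not make a tuple; a trailing comma does.

```python
class Report:
    """Hypothetical stand-in for a DataFrame with a to_csv method."""

    def __init__(self):
        self.written = []

    def to_csv(self, path):
        # Record the call instead of writing a real file.
        self.written.append(path)


report = Report()

# Assignment: shadows the method with a plain string; nothing is "written".
report.to_csv = (r"C:\temp\modified_names.csv")
print(type(report.to_csv).__name__, report.written)  # str []

# Call: runs the method with the path as its argument.
fresh = Report()
fresh.to_csv(r"C:\temp\modified_names.csv")
print(fresh.written)  # ['C:\\temp\\modified_names.csv']

print(type(("x")).__name__, type(("x",)).__name__)  # str tuple
```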
Edit: I also noticed the title says "Create Excel File"
To do that you would do the following:
df_csv.to_excel(r"C:\temp\modified_names.xlsx")
Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
I usually make the .csv file like this:
import csv
with open(FILENAME, 'w', newline='') as file:
    csv_write = csv.writer(file, delimiter='\t')
    csv_write.writerow(LINE)
LINE is a list of the values in the row you want to write.
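A slightly fuller sketch of the same csv-module approach, with FILENAME and the rows as example values: newline='' is what the csv module documentation recommends when opening the file, and writerows writes a list of rows in one call. Note that delimiter='\t' in the answer produces tab-separated output; this sketch uses the default comma instead.

```python
import csv

FILENAME = "modified_names.csv"  # example output path
ROWS = [
    ["first", "last"],  # header row (example data)
    ["Ada", "Lovelace"],
    ["Alan", "Turing"],
]

# newline="" lets the csv module control line endings (avoids blank lines on Windows).
with open(FILENAME, "w", newline="") as file:
    csv_write = csv.writer(file)  # comma-separated by default
    csv_write.writerows(ROWS)

# Read the file back to confirm what was written.
with open(FILENAME, newline="") as file:
    print(list(csv.reader(file)))
```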

Python function to import filename and city name

Very new to Python and trying my best to learn. I understand the concept of functions, and they don't seem complicated, but for some reason I have the worst time with them.
I have an Excel spreadsheet, and I need to open the file and read data from a specific sheet within it.
I set up the function like so:
def process_data(file, city):
    file_name = "../data/" + file  # path to file + file name
    sheet = city  # sheet name

process_data("Jan 10.xlsx", "Seattle")
but it doesn't work. I ultimately want to read this into a pandas DataFrame so I can manipulate the data. Can someone give a newbie a little guidance?
All help is greatly appreciated....
pip install pandas  # run in a terminal, if pandas isn't installed
import pandas as pd  # importing pandas
df_sheet = pd.read_excel('file_name', 'sheet_name')  # creating the dataframe
The first parameter is the file name, and the second parameter is the name of the sheet in that Excel file. Hope this helps.
Here is the link to the official pandas read_excel() documentation.
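Putting the question and this answer together, a sketch of process_data that actually reads the sheet and hands the DataFrame back to the caller. It assumes the ../data/ folder from the question (resolved relative to the current working directory) and an installed Excel engine such as openpyxl for .xlsx files; the key changes are using both parameters in the read_excel call and returning the result.

```python
from pathlib import Path

import pandas as pd

DATA_DIR = Path("../data")  # folder from the question, relative to the working directory


def process_data(file, city):
    """Read sheet `city` from DATA_DIR/file and return it as a DataFrame."""
    file_name = DATA_DIR / file  # path to file + file name
    return pd.read_excel(file_name, sheet_name=city)


# Usage: keep the returned DataFrame so you can manipulate it afterwards.
# df = process_data("Jan 10.xlsx", "Seattle")
```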

Convert Excel zip file content to actual Excel file?

I am using the cmis package in Python to download a document from a FileNet repository, via the getContentStream method the package provides. However, it returns content that begins with 'PK' and ends in 'PK'. When I googled it, I learned that this is the content of a zipped Excel package. Is there a way to save the content into an Excel file? I should be able to open the downloaded Excel file. I am using the code below, but I get the error "a bytes-like object is required, not 'str'". I noticed that result is a StringIO object.
# export the result
result = testDoc.getContentStream()
outfile = open('sample.xlsx', 'wb')
outfile.write(result.read())
result.close()
outfile.close()
Hi there and welcome to Stack Overflow. There are a few things I noticed about your post.
To address the error you are getting directly: you opened outfile in binary mode ('wb'), but result.read() returns a Unicode string, which is why you get this error. You can try to encode it before passing it to outfile.write() (e.g. outfile.write(result.read().encode())).
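A sketch that handles both cases, assuming result is a file-like object whose read() returns either bytes or str. One caution on the encode route: it only restores the original bytes if you encode with the same codec that produced the string. latin-1 is used below as an assumption because it maps each of the 256 byte values to one character and back; save_stream is a hypothetical name.

```python
def save_stream(result, path, encoding="latin-1"):
    """Write the content of `result` to `path` as raw bytes and return the byte count.

    Assumption: if read() returns str, it was decoded with `encoding`, so
    encoding it back with the same codec restores the original bytes.
    """
    data = result.read()
    if isinstance(data, str):
        data = data.encode(encoding)
    with open(path, "wb") as outfile:  # 'wb' requires bytes, hence the check above
        outfile.write(data)
    return len(data)
```

Usage would look like save_stream(testDoc.getContentStream(), 'sample.xlsx').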
You can also write the string directly into a zip archive with ZipFile.writestr:
from zipfile import ZipFile

result = testDoc.getContentStream()
result_text = result.read()
with ZipFile(filepath, 'w') as zf:
    zf.writestr('filename_that_is_zipped', result_text)
Note: I am not sure what your ContentStream contains, but an Excel file is made up of XML files zipped together. The minimum file structure you need for an Excel file is as follows:
_rels/.rels contains Excel schemas
docProps/app.xml contains the number of sheets and the sheet names
docProps/core.xml contains boilerplate user info and the creation date
xl/workbook.xml contains the sheet names and the rId links to the workbook relationships
xl/worksheets/sheet1.xml (and more sheets in this folder) contains the cell data for each sheet
xl/_rels/workbook.xml.rels contains the sheet file locations within the zip file
xl/sharedStrings.xml is needed if you have string cell values
[Content_Types].xml applies schemas to file types
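Given that list, a quick stdlib check of what the downloaded content actually is: zipfile.is_zipfile looks for a valid zip structure (which the leading PK signature hints at), and the archive's namelist shows which of the parts above are present. REQUIRED_PARTS mirrors the list in this answer, and missing_xlsx_parts is a hypothetical helper.

```python
import zipfile

# Parts from the list above; sheet1.xml stands in for the xl/worksheets/ folder.
# sharedStrings.xml is left out because it is only needed for string cells.
REQUIRED_PARTS = [
    "[Content_Types].xml",
    "_rels/.rels",
    "docProps/app.xml",
    "docProps/core.xml",
    "xl/workbook.xml",
    "xl/_rels/workbook.xml.rels",
    "xl/worksheets/sheet1.xml",
]


def missing_xlsx_parts(path_or_file):
    """Return the expected parts not found in the archive (raises if not a zip)."""
    if not zipfile.is_zipfile(path_or_file):
        raise ValueError("content is not a zip archive, so it cannot be an .xlsx file")
    with zipfile.ZipFile(path_or_file) as zf:
        present = set(zf.namelist())
    return [part for part in REQUIRED_PARTS if part not in present]
```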
I recently went through piecing together an excel file from scratch, if you want to see the code check out https://github.com/PydPiper/pylightxl

Python cannot use loadtxt for csv file

I have an Excel spreadsheet containing only numbers, and when I try to open it I get this error:
for fname in glob.glob("Train*"):
    prob = 0
    a = array(loadtxt(fname, skiprows=1, dtype=object)[prob], dtype=float)
ERROR: a = array(loadtxt(fname, skiprows=1, dtype=object)[prob], dtype=float)
ValueError: setting an array element with a sequence.
I remember this working before but I haven't opened it in a while, not sure what is wrong.
Break it down.
The first step is to identify the file that is giving you the problem. Insert
print(fname)
as the first line inside the loop. The last name it prints before the error is the file in question.
Then, at the Python prompt, run
loadtxt("thebadfilename", skiprows=1, dtype=object)
See what you get.
At about this point you should see what is going wrong.
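The same break-it-down idea as a loop that keeps going and reports every file that fails, instead of stopping at the first error. A sketch that assumes comma-separated numeric files with one header row (hence delimiter=',' and skiprows=1; the question's call used loadtxt's default whitespace delimiter); find_bad_files is a hypothetical name.

```python
import glob

import numpy as np


def find_bad_files(pattern="Train*", skiprows=1, delimiter=","):
    """Try to load each matching file as floats and return the names that fail."""
    bad = []
    for fname in sorted(glob.glob(pattern)):
        print(fname)  # progress: which file is being loaded
        try:
            np.loadtxt(fname, skiprows=skiprows, delimiter=delimiter, dtype=float)
        except ValueError as exc:  # non-numeric cells or ragged rows end up here
            print(f"  failed: {exc}")
            bad.append(fname)
    return bad
```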
As said in the comments numpy.loadtxt cannot read Excel files.
You could try pandas.ExcelFile to read your data (not sure if this will work, as you didn't give an example).
docstring:
Class for parsing tabular excel sheets into DataFrame objects.
Uses xlrd for parsing .xls files or openpyxl for .xlsx files.
See ExcelFile.parse for more documentation
Parameters
----------
path : string or file-like object
Path to xls file
kind : {'xls', 'xlsx', None}, default None
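Since loadtxt only understands text, one way to route each file is to check whether it is a zip container (an .xlsx workbook) before choosing a reader. A sketch under that assumption: load_numeric is hypothetical, reading .xlsx files needs an Excel engine such as openpyxl, and the text branch assumes comma-separated values with one header row.

```python
import zipfile

import numpy as np
import pandas as pd


def load_numeric(fname):
    """Return the file's numbers as a float array, picking a reader by content."""
    if zipfile.is_zipfile(fname):
        # .xlsx workbooks are zip archives; read_excel uses the first row as the header.
        return pd.read_excel(fname).to_numpy(dtype=float)
    # Otherwise treat it as comma-separated text with one header row.
    return np.loadtxt(fname, skiprows=1, delimiter=",", dtype=float)
```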
