Authentication to my SharePoint succeeds, but pandas then can't read the xlsx file (which is held in memory as a bytes object).
I get the error:
"ValueError: File is not a recognized excel file"
Code:
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import pandas as pd
#target url taken from sharepoint and credentials
url = 'https://**[company-name]**-my.sharepoint.com/:x:/p/**[email-prefix]**/EYSZCv_Su0tBkarOa5ggMfsB-5DAB-FY8a0-IKukCIaPOw?e=iW2K6r' # this is just the link you get when clicking "copy link" on sharepoint
username = '...'
password = '...'
ctx_auth = AuthenticationContext(url)
if ctx_auth.acquire_token_for_user(username, password):
    ctx = ClientContext(url, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
response = File.open_binary(ctx, url)
#save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) #set file object to start
#read excel file and each sheet into pandas dataframe
df = pd.read_excel(bytes_file_obj)
df
Any thoughts on what could be going wrong here?
I got the same error (and arrived at this page).
I solved it by changing the URL.
Using the file path (obtained from 'Copy path' in the opened Excel file) instead of the sharing link may work for you too.
Example:
url = 'https://**[company-name]**-my.sharepoint.com/personal/**[email-prefix]**/Documents/filename.xlsx?web=1'
Osugi's method above worked for me! For added clarity: I had to open the Excel file in the actual Excel application, not OneDrive. I did this by clicking File -> info -> Open in Desktop App.
Once in the Excel application, I went File -> info -> Copy path. I pasted that path as my URL and it worked.
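If the 'Copy path' URL is what makes the difference, it may be because File.open_binary is meant to receive a path to the file itself rather than a sharing link. Below is a minimal sketch, assuming the copied path has the form shown in the example above, that drops the ?web=1 query and separates the host from the server-relative path. The helper name split_sharepoint_path and the contoso / jane_doe values are made up for illustration.

```python
from urllib.parse import urlsplit


def split_sharepoint_path(copied_path):
    """Split a 'Copy path' URL into (host URL, server-relative file path).

    Hypothetical helper: the query string (for example ?web=1) is dropped.
    """
    parts = urlsplit(copied_path)
    host_url = f"{parts.scheme}://{parts.netloc}"
    return host_url, parts.path


# Placeholder URL in the same shape as the example above.
host, relative_path = split_sharepoint_path(
    "https://contoso-my.sharepoint.com/personal/jane_doe/Documents/filename.xlsx?web=1"
)
print(host)
print(relative_path)
```

The resulting relative_path could then be tried as the second argument to File.open_binary; whether your library version accepts it is something to verify against your own site.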
Related
I am trying to read an excel file from SharePoint to python and I get the following error:
ValueError: Excel file format cannot be determined, you must specify an engine manually
My Code:
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
url_sp = 'https://company.sharepoint.com/teams/TeamE'
username_sp = 'MyUsername'
password_sp = 'MyPassword'
folder_url_sp = '/Shared%20Documents/02%20Team%20IAP/06_Da-An/Data/E/Edate.xlsx?web=1'
#Authentication
ctx_auth = AuthenticationContext(url_sp)
if ctx_auth.acquire_token_for_user(username_sp, password_sp):
    ctx = ClientContext(url_sp, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print('Authentication successful')
else:
    print(ctx_auth.get_last_error())
import io
import pandas as pd
response = File.open_binary(ctx,folder_url_sp)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
data = pd.read_excel(bytes_file_obj,sheet_name = None)
Can it be related to the fact that the Excel file consists of several worksheets?
Can you help me further?
Thanks in advance
Several sheets should not be a problem. Have you tried specifying an engine in your code, as the error message says?
data = pd.read_excel(bytes_file_obj, sheet_name=None, engine= ... )
Possible options are listed in the pandas documentation for read_excel (see engine: str, default None). The explanation
If io is not a buffer or path, this must be set to identify io
seems to fit your problem.
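Before picking an engine, it can help to check what was actually downloaded, since this error often appears when the response is not a workbook at all (for example an HTML page). The sketch below uses only the standard library: an .xlsx file is a ZIP container and a legacy .xls file starts with the OLE2 signature. The function name guess_excel_engine is hypothetical, and the mapping to "openpyxl" and "xlrd" is only a suggestion for the engine argument, not something pandas does this way internally.

```python
import io
import zipfile

# Signature at the start of legacy .xls (OLE2 compound) files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def guess_excel_engine(data):
    """Suggest a pandas engine from raw bytes, or None if they do not look like Excel."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "openpyxl"  # ZIP container, as used by .xlsx
    if data.startswith(OLE2_MAGIC):
        return "xlrd"  # legacy .xls
    return None  # probably an error page or some other payload


# Demo with a tiny in-memory ZIP standing in for a workbook, and an HTML snippet.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")

print(guess_excel_engine(buffer.getvalue()))
print(guess_excel_engine(b"<!DOCTYPE html><html>"))
```

If the function returns None for response.content, the problem is more likely the URL passed to File.open_binary than the number of worksheets.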
I have this dataframe, and I want to save it as an Excel file in a SharePoint folder.
This is my code:
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext
# auth
client_credentials = ClientCredential(var_client_id, var_client_secret)
ctx = ClientContext(var_sp_site).with_credentials(client_credentials)
df = pd.DataFrame(sql_table)
var_relative_url = "sharepoint_path/sharepoint_path"
target_folder = ctx.web.get_folder_by_server_relative_url(var_relative_url)
target_folder.upload_file(content=df.to_excel(excel_writer='teste.xlsx'), file_name='teste.xlsx').execute_query() # Here is my problem
When I execute this code, the Excel file is created in the folder, but when I try to open it in the SharePoint interface it raises an error ("cannot be opened").
This code will run on a cloud function, so I can't use local files to upload.
I'm investigating this issue right now. Not solved yet, but I can give you a workaround: use .save()
wb = pd.ExcelWriter( outputFile, mode='w', engine="openpyxl" )
myDataFrame.to_excel( wb, sheet_name='sheet1', index=False )
wb.save()
From error to warning ;)
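One likely cause of the "cannot be opened" error in the question is that DataFrame.to_excel returns None, so the upload receives no workbook bytes. A minimal sketch of an in-memory alternative follows, assuming pandas and openpyxl are installed. The helper name dataframe_to_xlsx_bytes and the sample data are made up for illustration, and the upload call is left as a comment because it needs a real site.

```python
import io

import pandas as pd


def dataframe_to_xlsx_bytes(df):
    """Serialize a DataFrame to .xlsx bytes without touching the local disk."""
    buffer = io.BytesIO()
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="Sheet1", index=False)
    return buffer.getvalue()


# Illustrative data only.
payload = dataframe_to_xlsx_bytes(pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}))
print(payload[:2])  # first bytes of the ZIP container

# With the target_folder from the question, the bytes could then be uploaded:
# target_folder.upload_file(file_name="teste.xlsx", content=payload).execute_query()
```

Because nothing is written to disk, this approach should also suit the cloud function constraint mentioned in the question.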
I am using the following code to connect to my sharepoint site and try to access the files on the site:
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
####inputs########
# This will be the URL that points to your sharepoint site.
# Make sure you change only the parts of the link that start with "Your"
url_shrpt = 'https://YourOrganisation.sharepoint.com/sites/YourSharepointSiteName'
username_shrpt = 'YourUsername'
password_shrpt = 'YourPassword'
folder_url_shrpt = '/sites/YourSharepointSiteName/Shared%20Documents/YourSharepointFolderName/'
#######################
###Authentication###For authenticating into your sharepoint site###
ctx_auth = AuthenticationContext(url_shrpt)
if ctx_auth.acquire_token_for_user(username_shrpt, password_shrpt):
    ctx = ClientContext(url_shrpt, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print('Authenticated into sharepoint as: ',web.properties['Title'])
else:
    print(ctx_auth.get_last_error())
############################
####Function for extracting the file names of a folder in sharepoint###
###If you want to extract the folder names instead of file names, you have to change "sub_folders = folder.files" to "sub_folders = folder.folders" in the below function
global print_folder_contents
def print_folder_contents(ctx, folder_url):
    try:
        folder = ctx.web.get_folder_by_server_relative_url(folder_url)
        fold_names = []
        sub_folders = folder.files #Replace files with folders for getting list of folders
        ctx.load(sub_folders)
        ctx.execute_query()
        for s_folder in sub_folders:
            fold_names.append(s_folder.properties["Name"])
        return fold_names
    except Exception as e:
        print('Problem printing out library contents: ', e)
######################################################
# Call the function by giving your folder URL as input
filelist_shrpt=print_folder_contents(ctx,folder_url_shrpt)
#Print the list of files present in the folder
print(filelist_shrpt)
However I get the message:
Authenticated into sharepoint as: My Team
Problem printing out library contents: (None, None, "400 Client Error: Bad Request for url: /sites/YourSharepointSiteName/Shared%20Documents/YourSharepointFolderName/")
Why am I able to access the site but cannot access the folder or files in it? I would really appreciate some insight on this
I need to load an Excel file from SharePoint into a pandas DataFrame so that I can analyse it with Python. I use the code below to access SharePoint, and it works: I can reach the Excel file located there. Now I want to read that Excel file into a pandas DataFrame. How can I modify the code below to do that?
from office365.sharepoint.client_context import ClientContext
SP_SITE_URL ='https://asdfgh.sharepoint.com/sites/ABC/'
SP_DOC_LIBRARY ='Publications'
USERNAME ='asd#fgh.onmicrosoft.com'
PASSWORD ='******'
# 1. Create a ClientContext object and use the user’s credentials for authentication
ctx =ClientContext(SP_SITE_URL).with_user_credentials(USERNAME, PASSWORD)
# 2. Read file entities from the SharePoint document library
files = ctx.web.lists.get_by_title(SP_DOC_LIBRARY).root_folder.files
ctx.load(files)
ctx.execute_query()
# 3. loop through file entities
for file in files:
    # 4. Access the file object properties
    print(file.properties['Name'], file.properties['UniqueId'])
    # 5. Access list item object through the file object
    item = file.listItemAllFields
    ctx.load(item)
    ctx.execute_query()
    print('Access metadata - Category: {0}, Status: {1}'.format(item.properties['Category'], item.properties['Status']))
# 4. The Output:
# File Handling in SharePoint Document Library Using Python.docx 77819f08-5fbe-450f-9f9b-d3ae2862cbb5
# Access metadata - Category: Python, Status: Submitted
For pandas to read the file, it first needs to be loaded into memory.
Find the path of the file. It should be in the metadata of the file, which you are already accessing.
With the imports below:
from office365.sharepoint.files.file import File
import io
import pandas as pd
you can use the following code to store the file in memory and read it into a pandas DataFrame.
response = File.open_binary(ctx, url)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) #set file object to start
df = pd.read_excel(bytes_file_obj, sheet_name = <Sheetname>)
In Python I am utilizing Office 365 REST Python Client library to access and read an excel workbook that contains many sheets.
Authentication is successful, but I can't work out how to append the sheet name to the file path in order to access the 1st or 2nd worksheet by name. As a result, the response is not JSON but raw bytes, which my code is not able to process.
My end goal is to simply access the specific work sheet by its name 'employee_list' and transform it into JSON or Pandas Data frame for further usage.
Code snippet below -
import io
import json
import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.auth.user_credential import UserCredential
from office365.runtime.http.request_options import RequestOptions
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
from io import BytesIO
username = 'abc#a.com'
password = 'abcd'
site_url = 'https://sample.sharepoint.com/sites/SAMPLE/_layouts/15/Doc.aspx?OR=teams&action=edit&sourcedoc={739271873}'
# HOW TO ACCESS WORKSHEET BY ITS NAME IN ABOVE LINE
ctx = ClientContext(site_url).with_credentials(UserCredential(username, password))
request = RequestOptions("{0}/_api/web/".format(site_url))
response = ctx.execute_request_direct(request)
json_data = json.loads(response.content) # ERROR ENCOUNTERED JSON DECODE ERROR SINCE DATA IS IN BYTES
You can access it by sheet index, check the following code....
import xlrd
loc = ("File location")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
# For row 0 and column 0
print(sheet.cell_value(0, 0))
You can try to add the component 'sheetname' to the url like so.
https://site/lib/workbook.xlsx#'Sheet1'!A1
It seems that the URL constructed to access the data is not correct. You should first confirm that the full URL works in your browser, and then modify the code accordingly. You may try the following with some changes; I have verified that a URL formed with this logic returns JSON data.
import io
import json
import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.auth.user_credential import UserCredential
from office365.runtime.http.request_options import RequestOptions
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
from io import BytesIO
username = 'abc#a.com'
password = 'abcd'
site_url = "https://sample.sharepoint.com/_vti_bin/ExcelRest.aspx/RootFolder/ExcelFileName.xlsx/Model/Ranges('employee_list!A1%7CA10')?$format=json"
# Replace RootFolder/ExcelFileName.xlsx with actual path of excel file from the root.
# Replace A1 and A10 with actual start and end of cell range.
ctx = ClientContext(site_url).with_credentials(UserCredential(username, password))
request = RequestOptions(site_url)
response = ctx.execute_request_direct(request)
json_data = json.loads(response.content)
Source: https://learn.microsoft.com/en-us/sharepoint/dev/general-development/sample-uri-for-excel-services-rest-api
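Since the URL above embeds single quotes and an encoded | character, it is easy to get wrong by hand. Here is a small sketch that assembles it from its parts using only the standard library. The function name excel_rest_range_url and its parameters are hypothetical; the URL layout is simply copied from the answer above, not taken from any library API.

```python
from urllib.parse import quote


def excel_rest_range_url(site_url, workbook_path, sheet, first_cell, last_cell):
    """Build an Excel Services REST URL for a cell range, returned as JSON."""
    # The range separator | must be percent-encoded; ! is left as-is.
    cell_range = quote(f"{sheet}!{first_cell}|{last_cell}", safe="!")
    path = quote(workbook_path.strip("/"))
    base = site_url.rstrip("/")
    return (
        f"{base}/_vti_bin/ExcelRest.aspx/{path}"
        f"/Model/Ranges('{cell_range}')?$format=json"
    )


rest_url = excel_rest_range_url(
    "https://sample.sharepoint.com",
    "RootFolder/ExcelFileName.xlsx",
    "employee_list",
    "A1",
    "A10",
)
print(rest_url)
```

Building the string in double quotes or via a helper like this also avoids the quoting clash that occurs when the whole URL is wrapped in single quotes.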
The update I'm using (Office365-REST-Python-Client==2.3.11) allows simpler access to an Excel file in the SharePoint repository.
# from original_question import pd,\
# username,\
# password,\
# UserCredential,\
# File,\
# BytesIO
user_credentials = UserCredential(user_name=username,
password=password)
file_url = ('https://sample.sharepoint.com'
'/sites/SAMPLE/{*recursive_folders}'
'/sample_worksheet.xlsx')
## absolute path of excel file on SharePoint
excel_file = BytesIO()
## initiating binary object
excel_file_online = File.from_url(abs_url=file_url)
## requesting file from SharePoint
excel_file_online = excel_file_online.with_credentials(
credentials=user_credentials)
## validating file with accessible credentials
excel_file_online.download(file_object=excel_file).execute_query()
## writing binary response of the
## file request into bytes object
We now have a binary copy of the Excel file as a BytesIO object named excel_file. From here, reading it into a pd.DataFrame is as straightforward as reading an Excel file stored on a local drive. E.g.:
pd.read_excel(excel_file) # -> pd.DataFrame
Hence, if you are interested in a specific sheet like 'employee_list', you may preferably read it as
employee_list = pd.read_excel(excel_file,
sheet_name='employee_list')
# -> pd.DataFrame
or
data = pd.read_excel(excel_file,
sheet_name=None) # -> dict
employee_list = data.get('employee_list')
# -> [pd.DataFrame, None]
I know you stated you can't use a BytesIO object, but for those coming here who are reading the file in as a BytesIO object like I was looking for, you can use the sheet_name arg in pd.read_excel:
url = "https://sharepoint.site.com/sites/MySite/MySheet.xlsx"
relative_url = "/sites/MySite/MySheet.xlsx"  # server-relative part of url; ctx is an authenticated ClientContext
sheet_name = 'Sheet X'
response = File.open_binary(ctx, relative_url)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name = sheet_name)  # select the sheet by name