Convert SharePoint list into pandas DataFrame - Python

I have a table in SharePoint that I want to convert into a pandas DataFrame. I've largely based my approach on this question: Get SharePoint List with Python. However, I'm running into issues.
Here is what I have so far...
import pandas as pd
from shareplum import Site
from requests_ntlm import HttpNtlmAuth
url = 'https://share.corporation.com/sites/group/subgroup/'
username = 'username'
password = 'password'
cred = HttpNtlmAuth(username, password)
site = Site(url, auth=cred, verify_ssl=False)
Up to this point, I can run the code without an error being thrown. However, when I run this bit:
sp_list = site.List('Q22020') # this creates a SharePlum List object
ShareplumRequestError: Shareplum HTTP Post Failed : 500 Server Error: Internal Server Error for url: https://share.corporation.com/sites/group/subgroup/_vti_bin/lists.asmx
I'm actually not entirely sure that my site.List('Q22020') is even correct.
However, following the instructions from this video: https://www.youtube.com/watch?v=dvFbVPDQYyk
When I manually enter the following url into my browser, it does generate an xml file, so I believe it's correct: https://share.corporation.com/sites/group/subgroup/_vti_bin/ListData.svc/Q22020

A friend passed me this code a while ago; it returns a DataFrame with your SharePoint list's contents:
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext
import pandas as pd

def dataframeSP(lista):
    # Fetch all items from the named list and return them as a DataFrame
    sp_list = lista
    sp_lists = ctx.web.lists
    s_list = sp_lists.get_by_title(sp_list)
    l_items = s_list.get_items()
    ctx.load(l_items)
    ctx.execute_query()
    # Column names come from the first item's property keys
    columnas = list(pd.DataFrame.from_dict(l_items[0].properties.items()).iloc[:, 0])
    valores = list()
    for item in l_items:
        data = list(pd.DataFrame.from_dict(item.properties.items()).iloc[:, 1])
        valores.append(data)
    resultado = pd.DataFrame(valores, columns=columnas)
    return resultado

client_id = "########"
client_secret = "##############"
site_url = "https://YOURSHAREPOINT.sharepoint.com/YOURLIST"
ctx = ClientContext(site_url).with_credentials(ClientCredential(client_id, client_secret))
listaSP = ctx.web.lists.get_by_title("THE NAME OF YOUR SHAREPOINT LIST")
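With those credentials in place, calling the function with your list's title should give you the DataFrame:

df = dataframeSP("THE NAME OF YOUR SHAREPOINT LIST")
print(df.head())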

Try:
https://share.corporation.com/sites/group/subgroup/Lists/Q22020/_vti_bin/lists.asmx
If that doesn't work, go to the list on the web and look at the URL while viewing the 'Q22020' list; your "url" parameter may be incorrect.

I had the same problem and followed the same logic of getting the list name from URL. However, I found that the list name actually had a space in it, despite the URL not showing it. Adding the space solved the issue.
Using your example, if the URL is https://share.corporation.com/sites/group/subgroup/_vti_bin/ListData.svc/Q22020
but the list is actually named 'Q2 2020', then you would change your code to:
sp_list = site.List('Q2 2020')
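Once the name resolves, a minimal sketch of the original goal with SharePlum (reusing pd and sp_list from the question, and assuming the list's default view) would be:

sp_data = sp_list.GetListItems()  # returns the rows as a list of dicts
df = pd.DataFrame(sp_data)
print(df.head())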

Related

using python to automate sharepoint actions on organisation site

I'm trying to automate some data processing in my org, but I'm struggling to work out how to use the Python module to make the requests. When I use
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext, UserCredential
from office365.sharepoint.files.file import File
sharepoint_base_url = 'https://organisation.sharepoint.com/sites/suborganisation'
sharepoint_user = 'username#org.org'
sharepoint_password = 'password'
ctx = ClientContext(sharepoint_base_url).with_credentials(UserCredential(sharepoint_user, sharepoint_password))
web = ctx.web
ctx.load(web)
ctx.execute_query()
print(f"Web title: {web.properties['Title']}")
It will access the main site, but as soon as I try to work with any of the subdirectories I start to run into errors. I think the problem is either that I just don't understand the relative URLs, or that there's something about the structure of the site that I'm missing.
For example, the full URL for the folder I want to point to (with modifications, obviously) is
folder_in_sharepoint = 'https://organisation.sharepoint.com/sites/suborganisation/current_project/Forms/AllItems.aspx?id=%2Fsites%2Fsuborganisation%2Fcurrent_project%2FApplications%202023&viewid=longalphanumericstring' # copied directly from the browser
def folder_details(ctx, folder_in_sharepoint):
    folder = ctx.web.get_folder_by_server_relative_url(folder_in_sharepoint)
    fold_names = []
    sub_folders = folder.files
    ctx.load(sub_folders)
    ctx.execute_query()
    for s_folder in sub_folders:
        fold_names.append(s_folder.properties["Name"])
    return fold_names

# listing objects in the folder
file_list = folder_details(ctx, folder_in_sharepoint)
I get the error
ClientRequestException: (None, None, "400 Client Error: Bad Request for url: https://organisation.sharepoint.com/sites/suborganisation/_api/Web/getFolderByServerRelativeUrl('https:%2F%2Forganisation.sharepoint.com%2Fsites%2Fsuborganisation%2Fcurrent_project%2FForms%2FAllItems.aspx%3Fid=%252Fsites%252Fsuborganisation%252Fcurrent_project%252FApplications%25202023%26viewid=longalphanumericstring')/Files")
It appears to me that ctx.web.get_folder_by_server_relative_url(folder_in_sharepoint) is where the problem is, and it's something to do with what I'm passing as the relative URL, but I've tried commenting out every line and every permutation of the URL I can think of, and I'm still getting nowhere.
Any guidance appreciated.
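For what it's worth, get_folder_by_server_relative_url expects a server-relative path rather than the full browser URL with its id and viewid query string. A minimal sketch of that change (the folder path below is an assumption, decoded from the id parameter in the URL above):

folder_path = '/sites/suborganisation/current_project/Applications 2023'  # assumed server-relative path
folder = ctx.web.get_folder_by_server_relative_url(folder_path)
files = folder.files
ctx.load(files)
ctx.execute_query()
for f in files:
    print(f.properties["Name"])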

Unable to extract the table from API using python

I am trying to extract a table using an API but I am unable to do so. I am pretty sure that I am not using it correctly, and any help would be appreciated.
Actually, I am trying to extract a table from this API but can't figure out the right way to do it. This is what is mentioned on the website. I want to extract the Latest_full_data table.
This is my code to get the table but I am getting error:
import json
import urllib
import urllib.request
import requests

locu_api = 'api_Key'

def locu_search(query):
    api_key = locu_api
    url = 'https://www.quandl.com/api/v3/databases/WIKI/metadata?api_key=' + api_key
    response = urllib.request.urlopen(url).read()
    json_obj = str(response, 'utf-8')
    datanew = json.loads(json_obj)
    return datanew
The error occurs when I do print(datanew). Update: even if I change it to return datanew, the error is still the same.
I am getting the error below:
name 'datanew' is not defined
I had the same issues with urllib before. If possible, try to use requests; it's a better-designed and more reliable library in my opinion. It can also parse JSON with a single call, so there's no need to run the response through multiple lines. (As for the NameError itself: datanew only exists inside locu_search, so printing it at module level fails.) Sample code here:
import requests

locu_api = 'api_Key'

def locu_search():
    url = 'https://www.quandl.com/api/v3/databases/WIKI/metadata?api_key=' + locu_api
    return requests.get(url).json()

locu_search()
Edit:
The endpoint that you are calling might not be the correct one. I think you are looking for the following one:
import requests

api_key = 'your_api_key_here'

def locu_search(dataset_code):
    url = f'https://www.quandl.com/api/v3/datasets/WIKI/{dataset_code}/metadata.json?api_key={api_key}'
    req = requests.get(url)
    return req.json()

data = locu_search("FB")
This will return all the metadata for a company, in this case Facebook.
Maybe it doesn't apply to your specific problem, but what I normally do is the following:
import json
import requests

def get_values(url):
    response = requests.get(url).text
    values = json.loads(response)
    return values
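For example, pointed at the dataset endpoint from the previous answer (the API key is a placeholder):

metadata = get_values('https://www.quandl.com/api/v3/datasets/WIKI/FB/metadata.json?api_key=your_api_key_here')
print(metadata)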

Google Indexing API - Invalid attribute. 'url' is not in standard URL format - But my URL is Correct

I am currently using Indexing API v3.
When I am using this API in a loop, I got this error:
Invalid attribute. 'url' is not in standard URL format
But I am pretty sure that my URL is correct, because it was downloaded from Google Search Console.
Here is the code:
from oauth2client.service_account import ServiceAccountCredentials
import httplib2
import json
import pandas as pd

JSON_KEY_FILE = "key.json"
SCOPES = ["https://www.googleapis.com/auth/indexing"]

credentials = ServiceAccountCredentials.from_json_keyfile_name(JSON_KEY_FILE, scopes=SCOPES)
http = credentials.authorize(httplib2.Http())

def indexURL(url, http):
    ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"
    content = {}
    content['url'] = url
    content['type'] = "URL_UPDATED"
    json_ctn = json.dumps(content)
    response, content = http.request(ENDPOINT, method="POST", body=json_ctn)
    result = json.loads(content.decode())
    if "error" in result:
        print("Error({} - {}): {}".format(result["error"]["code"], result["error"]["status"], result["error"]["message"]))
    else:
        print("urlNotificationMetadata.url: {}".format(result["urlNotificationMetadata"]["url"]))
        print("urlNotificationMetadata.latestUpdate.url: {}".format(result["urlNotificationMetadata"]["latestUpdate"]["url"]))
        print("urlNotificationMetadata.latestUpdate.type: {}".format(result["urlNotificationMetadata"]["latestUpdate"]["type"]))
        print("urlNotificationMetadata.latestUpdate.notifyTime: {}".format(result["urlNotificationMetadata"]["latestUpdate"]["notifyTime"]))

# This file contains 2 columns: URL and date
csv = pd.read_csv("my_data.csv")
csv[["URL"]][0:10].apply(lambda x: indexURL(x.to_string(), http), axis=1)
Here is a sample list of URLs:
Can anyone please tell me what's wrong with my code?
Thank you very much in advance for all your help.
It turns out that even after applying .strip() to each row, there was still a \n at the end of each URL.
So instead of passing the rows one by one to the lambda, I passed the whole series to the lambda and used a for loop to handle it.
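A minimal sketch of that change (indexURLs is a hypothetical helper; it reuses indexURL from above):

def indexURLs(urls, http):
    # iterating the series directly yields plain strings, unlike row.to_string()
    for url in urls:
        indexURL(url.strip(), http)

indexURLs(csv["URL"][0:10], http)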
The whole working example is here:
Google Indexing API v3 Working Example with Python 3

Connecting to YouTube API and download URLs - getting KeyError

My goal is to connect to the YouTube API and download the URLs of specific music producers. I found the following script at this link: https://www.youtube.com/watch?v=_M_wle0Iq9M. In the video the code works beautifully, but when I try it on Python 2.7 it gives me KeyError: 'items'.
I know KeyErrors can occur when there is an incorrect use of a dictionary or when a key doesn't exist.
I have tried going to the google developers site for youtube to make sure that 'items' exist and it does.
I am also aware that using get() may help with my problem, but I am not sure. Any suggestions for fixing my KeyError with the following code, or for improving the code to reach my main goal of downloading the URLs (I have a YouTube API key)?
Here is the code:
# these modules help with HTTP requests to YouTube
import urllib
import urllib2
import json

API_KEY = open("/Users/ereyes/Desktop/APIKey.rtf", "r")
API_KEY = API_KEY.read()

searchTerm = raw_input('Search for a video:')
searchTerm = urllib.quote_plus(searchTerm)

url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&q=' + searchTerm + '&key=' + API_KEY
response = urllib.urlopen(url)
videos = json.load(response)

videoMetadata = []  # declaring our list
for video in videos['items']:  # cycle through the items in the JSON response
    if video['id']['kind'] == 'youtube#video':  # make sure the item we are looking at is a video
        videoMetadata.append(video['snippet']['title'] +  # get the title and URL of the video
                             "\nhttp://youtube.com/watch?v=" + video['id']['videoId'])

videoMetadata.sort()  # sorts our list alphabetically

print("\nSearch Results:\n")  # print out search results
for metadata in videoMetadata:
    print(metadata + "\n")

raw_input('Press Enter to Exit')
The problem is most likely a combination of using an RTF file instead of a plain-text file for the API key, and confusion over whether to use urllib or urllib2, since you imported both.
Personally, I would recommend requests, but with urllib I think you need to read() the contents of the response to get a string:
response = urllib.urlopen(url).read()
You can check that by printing the response variable
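For comparison, a minimal sketch of the requests approach (the key-file path is a placeholder, and the key should live in a plain-text file rather than an .rtf):

import requests

with open("/path/to/apikey.txt") as f:  # plain text, not RTF
    api_key = f.read().strip()

params = {"part": "snippet", "q": "search term", "key": api_key}
videos = requests.get("https://www.googleapis.com/youtube/v3/search", params=params).json()

for video in videos.get("items", []):  # .get avoids the KeyError when the response is an error payload
    if video["id"]["kind"] == "youtube#video":
        print(video["snippet"]["title"] + "\nhttp://youtube.com/watch?v=" + video["id"]["videoId"])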

Using the Python GData API, cannot get editable video entry

I am having trouble getting a video entry which includes a link rel="edit". I need such an entry in order to be able to call DeleteVideoEntry(...) on it.
I am retrieving the video using GetYouTubeVideoEntry(youtube_id=XXXXXXX). My yt_service is initialized with a username, password, and a developer key. I use ProgrammaticLogin. This part seems to work fine. I use the same yt_service to upload said video earlier. Also, if I change the developer key to something bogus (during debugging) and try to authenticate, I get a 403 error. This leads me to believe that authentication works OK.
Needless to say, the video entry retrieved with GetYouTubeVideoEntry(youtube_id=XXXXXXX) does not contain the edit link, and I cannot use the entry in a DeleteVideoEntry(...) call.
Is there some special way to get a video entry which will contain a link element with a rel="edit"? Can anyone suggest some way to resolve my issue? Could this possibly be a bug?
Update:
For the record, when I tried getting the feed of all my uploads and then looping through the video entries, those entries do have an edit link. So this works:
uri = 'http://gdata.youtube.com/feeds/api/users/%s/uploads' % username
feed = yt_service.GetYouTubeVideoFeed(uri)
for entry in feed.entry:
    yt_service.DeleteVideoEntry(entry)
But this does not:
entry = yt_service.GetYouTubeVideoEntry(video_id = video.youtube_id)
yt_service.DeleteVideoEntry(entry)
Using the same yt_service.
I've just deleted a YouTube video using gdata and ProgrammaticLogin().
Here are the steps to reproduce:
import gdata.youtube.service
yt_service = gdata.youtube.service.YouTubeService()
yt_service.developer_key = 'developer_key'
yt_service.email = 'email'
yt_service.password = 'password'
yt_service.ProgrammaticLogin()
# video_id should looks like 'iu6Gq-tUsTc'
uri = 'https://gdata.youtube.com/feeds/api/users/%s/uploads/%s' % (username, video_id)
entry = yt_service.GetYouTubeUserEntry(uri=uri)
response = yt_service.DeleteVideoEntry(entry)
print response # True
yt_service.GetYouTubeVideoFeed(uri) works because GetYouTubeVideoFeed doesn't check the uri and just calls self.Get(uri, ...), although originally, I think, it expected an 'https://gdata.youtube.com/feeds/api/videos' uri.
Conversely, yt_service.GetYouTubeVideoEntry() uses YOUTUBE_VIDEO_URI = 'https://gdata.youtube.com/feeds/api/videos', but that entry doesn't contain rel="edit".
Hope that helps you out
You can view the HTTP headers of the generated requests by setting the debug flag to true. This is as simple as:
yt_service = gdata.youtube.service.YouTubeService()
yt_service.debug = True
You can read about this in the documentation here.
