So I'm trying to come up with a script that checks and reports files from my SharePoint according to some criteria. However, I need to authenticate using the company Active Directory.
I have done some research, and everything I've found seems far too advanced for what I intend; I'd prefer not to spend the whole time learning Django custom auth. Haufe.sharepoint seems simpler, but I didn't manage to list all the folders under a URL with it.
I must confess I don't really understand SharePoint very well either, though I'm assuming it can behave more or less like a file repository for my purposes. So I won't be offended if someone points out that what I'm trying to do doesn't make sense.
Anyway, I have a URL like http://intranet/sharedir/Archive/Projects/Customer1, and under it many folders containing subfolders and files.
What I want is, given a URL like the one above, to list all the directories and files it contains. After that I will iterate over the items and apply the rules I'm interested in.
If someone could provide a Python code example or a reference, that would be great.
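To illustrate the kind of thing I'm after, here is a rough, untested sketch using SharePoint's REST API with NTLM authentication (the requests and requests_ntlm libraries); the domain, credentials, and server-relative folder path are placeholders:

import requests
from requests_ntlm import HttpNtlmAuth

# Placeholders: your AD domain, username, and password
auth = HttpNtlmAuth("DOMAIN\\username", "password")
base = "http://intranet/sharedir"
folder = "/sharedir/Archive/Projects/Customer1"  # server-relative path

# SharePoint 2013+ exposes folder contents through its _api REST endpoints
for kind in ("Folders", "Files"):
    url = f"{base}/_api/web/GetFolderByServerRelativeUrl('{folder}')/{kind}"
    r = requests.get(url, auth=auth,
                     headers={"Accept": "application/json;odata=verbose"})
    r.raise_for_status()
    for item in r.json()["d"]["results"]:
        print(kind, item["Name"])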
Related
I have started to explore the Graph API, but SharePoint is quite complicated and I'm not sure how to proceed. I have previously worked with OneNote using this API successfully.
Purpose: there are thousands of folders/files, and I need to go through the list in order to organize it in a better way. I am looking for a way to export this list to Excel/CSV using Python and the Graph API.
I want to dynamically get a list of all the underlying folders and files visible from this URL:
https://company1.sharepoint.com/teams/TEAM1/Shared Documents/Forms/AllItems.aspx?id=/teams/TEAMS_BI_BI-AVS/Shared Documents/Team Channel1/Folder_Name1&viewid=6fa603f8-82e2-477c-af68-8b3985cfa525
When I open this URL, I see that this folder is part of a private group called PRIVATE_GROUP1 (shown at the top left).
Looking at some sample API calls here:
GET /drives/{drive-id}/items/{item-id}/children -> not sure what drive-id is
GET /groups/{group-id}/drive/items/{item-id}/children -> I assume group-id refers to the private group, but I'm not sure how to get its ID
GET /sites/{site-id}/drive/items/{item-id}/children -> assuming site-id is 'company1.sharepoint.com'?
For all of the above, I'm not sure what item-id refers to...
Thanks
Refer to the code below; it might help you.
https://gist.github.com/keathmilligan/590a981cc629a8ea9b7c3bb64bfcb417
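As a supplement, here is a minimal sketch of how those IDs fit together, assuming you already have an OAuth access token for Graph; the token, site path, and folder path are placeholders:

import requests

token = "ACCESS_TOKEN"  # placeholder: acquire via MSAL or a similar library
headers = {"Authorization": f"Bearer {token}"}
graph = "https://graph.microsoft.com/v1.0"

# 1. Resolve the site-id from the hostname and site path
site = requests.get(f"{graph}/sites/company1.sharepoint.com:/teams/TEAM1",
                    headers=headers).json()

# 2. List the site's drives (each document library is a drive) to get a drive-id
drives = requests.get(f"{graph}/sites/{site['id']}/drives",
                      headers=headers).json()["value"]
drive_id = drives[0]["id"]  # e.g. the "Documents" library

# 3. Address a folder by path instead of item-id, and list its children
children = requests.get(
    f"{graph}/drives/{drive_id}/root:/Team Channel1/Folder_Name1:/children",
    headers=headers).json()["value"]
for item in children:
    print(item["name"], "folder" if "folder" in item else "file")

The item-id is just the id field on each of these children; you only need it if you address items by ID instead of by path.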
This is my first post on Stack Overflow.
I'm looking to gather a large amount of data from a multitude of files on ProjectWise (PW) so I can quantify a few things about the records.
The directories I'm working with have unique numbers, and each contains files similar to those in the other folders.
Is there a Python library I can use, or any other useful tips for taking on this task?
It could potentially save many hours of work if I can do this with code.
A pseudocode example might look like:
extracted_data = []
for element in data_field:              # data_field: the record identifiers
    folder = search_folder(element)     # locate the uniquely numbered folder
    if folder is not None:
        file_path = search_file(folder) # locate the matching file inside it
        if file_path is not None:
            data = extract(file_path)   # pull the fields of interest
            extracted_data.append(data)
Thank you,
R
Based on a quick web search for "projectwise api", there is a web-based REST API available, so you'll definitely want to look into that. You'll need to read the docs carefully to figure out which endpoint does what, but once you know what information you need to send and what kind of data you'll receive, programming a basic Python interface shouldn't be too difficult. One may already exist; I didn't look too hard.
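As a very rough sketch of the shape such an interface might take, here is a generic requests-based skeleton. The base URL, auth scheme, endpoint path, and response shape are all hypothetical placeholders; consult the ProjectWise Web API documentation for the real ones:

import requests

BASE = "https://pw-server.example.com/ws"  # hypothetical gateway URL
session = requests.Session()
session.auth = ("username", "password")  # real auth depends on your deployment

# Hypothetical endpoint and parameter names
resp = session.get(f"{BASE}/documents", params={"folder": "12345"})
resp.raise_for_status()
for doc in resp.json():  # response shape is also hypothetical
    print(doc)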
I have a Node project that's bundled and published to GitHub as releases. At the moment, it checks my GitHub for a new release via the API and lets the user download it. The user must then stop the Node server, unzip the release.zip into the folder, and overwrite everything to update the project.
What I'm trying to do is write a Python script that I can execute from Node by spawning a new process. This will kill the Node server using PM2; the Python script will then check the GitHub API, grab the download URL, download the file, unzip the contents into the current folder, delete the zip, and start the Node server again.
What I'm struggling with, though, is checking the GitHub API and downloading the latest release file. Can anyone point me in the right direction? I've read that wget shouldn't be called from Python, and that urlopen should be used instead.
If you are asking for ways to get data from a web server, the two main libraries are:
requests
urllib (part of the standard library)
Personally, I prefer requests. They both have good documentation.
With requests, getting JSON data is as simple as:
import requests

r = requests.get("https://api.github.com")  # the URL must include a scheme
data = r.json()
You can add headers and other information easily, and it supports both HTTP and HTTPS.
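For your specific case, a minimal sketch of grabbing and unpacking the latest release could look like this. OWNER/REPO are placeholders, and it assumes the release carries a single zip asset:

import io
import zipfile
import requests

# Placeholders: your GitHub account and repository names
api = "https://api.github.com/repos/OWNER/REPO/releases/latest"
release = requests.get(api).json()

# Assumes the first asset is the release.zip uploaded with the release
asset_url = release["assets"][0]["browser_download_url"]
archive = requests.get(asset_url)
archive.raise_for_status()

# Unzip into the current folder, overwriting existing files
zipfile.ZipFile(io.BytesIO(archive.content)).extractall(".")

Since the archive is held in memory, there is no zip file on disk to delete afterwards.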
You need to map out your workflow and dataflow better, in words or pictures. If you can express your problem clearly and completely, step by step, in list form, you can then translate it to pseudocode. Python is great because you can go almost immediately from a good written description to pseudocode to a working implementation. Then at least you have something that works, and you can optimize performance or simplify functionality and usability from there. This is the process of translating a problem into a solution.
When asking questions on SO, you need to show your current thinking and what you've already tried, preferably with your code that doesn't yet work, or doesn't work the way you need it to. People can vote you down and cost you reputation points if you ask a question with just a vague description, an obvious cry for homework help (yours is not that), or a vague musing with no attempt at a solution, because such questions don't contribute anything back to the community.
Do you have any code or detailed pseudocode steps for calling the GitHub API and checking for the "latest release" of the file(s) you are trying to update?
We're creating gamma-cat, an open data collection for gamma-ray astronomy, and are looking for advice (here, or links to resources, formats, tools, packages) on how best to set it up.
The data we have consists of measurements for different sources, from different papers. It's pretty heterogeneous: sometimes there's data for multiple sources in one paper, for each source there are usually several papers, sometimes there's no spectrum, sometimes one, sometimes many, ...
Currently we just collect the data in an input folder as YAML and CSV files, and now we'd like to expose it to users: mainly access from Python, but also from JavaScript, and accessible from a static website.
The question is what format and organisation we should use for the data, and whether there are any Python packages that would help us generate the output files as a set of linked data, as well as Python and JavaScript packages that would help us access it.
We would like to get multiple "views" or simple "queries" of the data, e.g. "list of all sources", "list of all papers", "list of all spectra for source X", "spectrum A from paper B for source C".
As for the format, JSON would probably be a good choice? Although YAML is a bit nicer to read, and it allows comments and ordered maps. We're storing the output files in a git repo and have had a lot of meaningless diffs for JSON files, because the key order changes all the time.
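As an aside on the diff problem: when the JSON is written with Python's json module, sorting the keys and fixing the indentation makes the output deterministic. A minimal sketch, assuming the data is already in a dict called data and a hypothetical output path:

import json

# sort_keys plus a fixed indent yields byte-identical output for identical data,
# which keeps git diffs meaningful
with open("output/sources.json", "w") as f:
    json.dump(data, f, indent=2, sort_keys=True)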
To make the datasets discoverable and linked, I don't know what to use. I found e.g. http://jsonapi.org/, but that seems to be for REST APIs, not for just a series of flat JSON files on a static webserver? Maybe it could still be used that way?
I also found http://json-ld.org/, which looks relevant but also pretty complex. Would either of those, or something else, be a good choice?
And finally, we'd like to generate the linked and discoverable output files from the bunch of somewhat organised YAML and CSV input files using Python scripts. So far we've just written a number of Python classes and scripts based on Python dicts/lists and YAML/JSON files. Is there a Python package that would help with the task of generating the linked data files?
Apologies for the long and complex question! I hope it's still in scope for SO and someone will have some advice to share.
Judging from the breadth of your question, you are new to linked data. The least "strange" format for you might be the Data Package. In the most common case it's just a zip archive of CSV files plus JSON metadata. It has a Python package.
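For example, a minimal sketch with the datapackage library, assuming your CSV files live under data/:

from datapackage import Package

package = Package()
package.infer("data/*.csv")      # scan the CSVs and infer the table schemas
package.save("datapackage.zip")  # bundles the CSVs with a datapackage.json descriptor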
If you have queries to run against the data, you should set up a database (a triplestore) with a SPARQL endpoint. Take a look at Fuseki. You can then use Turtle or RDF/XML for file export.
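Queried from Python, that might look like the following sketch with the SPARQLWrapper library; the endpoint URL and the query itself are placeholders for your Fuseki dataset and vocabulary:

from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder: a local Fuseki dataset named "gammacat"
sparql = SPARQLWrapper("http://localhost:3030/gammacat/sparql")
sparql.setQuery("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"])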
If the data comes from some kind of a tool, you can model the domain it represents using Eclipse Lyo (tutorial).
These tools are maintained by three different communities; you can reach out to their user mailing lists separately if you have further questions about them.
I have a Microsoft Access database, and pictures whose file names match entries in the database (though part of each file name is irrelevant). I need to read the file names and insert links to them into the database, attached to the correct entries. I've been playing around with PowerShell and Python for a while, but I have little experience with Access and can't find much documentation on the subject. So my questions are:
Am I better off using PowerShell, Python, or something else for this project? I just happen to have experience with those two languages, so they're my preferred jumping-off point.
I imagine this will take a good amount of work and I don't mind getting my hands dirty/doing a lot of research, but after looking around I can't seem to find a good place to get started. Are there any specific commands, documentation, functions, etc. that could give me a jumpstart on this project?
Thanks!
EDIT: Thanks to #ako for bringing up a good point and something I was concerned about. Putting the photos in the DB itself is likely a bad idea, so I'd instead like to host them elsewhere and then automatically generate links to the files in the DB, based on the file names and the matching DB entries.
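A minimal sketch of that approach with Python and pyodbc. The table and column names (Entries, EntryID, PhotoLink) are hypothetical, as are the folder paths and the assumption that the matching key is the part of the file name before the first underscore:

import os
import pyodbc

PHOTO_DIR = r"C:\photos"  # placeholder: folder holding the pictures

conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\path\to\database.accdb;"  # placeholder path to the .accdb
)
cur = conn.cursor()

for name in os.listdir(PHOTO_DIR):
    key = name.split("_")[0]  # hypothetical: the entry ID precedes the first underscore
    path = os.path.join(PHOTO_DIR, name)
    # Entries / EntryID / PhotoLink are hypothetical; adjust to your schema
    cur.execute("UPDATE Entries SET PhotoLink = ? WHERE EntryID = ?", path, key)

conn.commit()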