So I'm building a pipeline that is not going into a real production environment. Basically, I have some data in a defined folder structure, and I want to access it from different stages in my pipeline. Right now, data is ordered kind of like this:
.../data/03-14-2019/unprocessed/raw/student_id_12345-raw.csv
or
.../data/05-04-2020/processed/position/student_id_1234345-position.csv
Now, I wrote a modular pipeline that looks in a folder and runs the pipeline on all the .csv files in all the contained directories. If I point it at .../data/03-14-2019/unprocessed/raw/ then my pipeline will process all of the raw data for every student. I built this under the assumption that we were going to rename all the files to a more manageable schema, but things may have changed. My question is this: using os.link() (or rather os.symlink(), since I want symbolic links) in Python 3, would it be possible to make an alternate filepath system that includes what I want? For example, one way I may want to go through files might be:
.../data/unprocessed/student_id_12345/2019/03/14/raw/student_id_12345-raw.csv
or maybe
.../data/unprocessed/raw/2019/03/14/student_id_12345/student_id_12345-raw.csv
depending on whether I want to process a certain batch of students or only raw data from a certain day. I remember using a batch renaming tool as part of either Total Commander or Nautilus, but I don't remember if it could create symbolic links. Basically, I want to use symbolic links to build a directory structure on top of an existing structure.
I was going to try implementing this, but I figured I should check whether anyone has already done this or whether solutions already exist before I start, as well as get some suggestions of where to start. Thanks!
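For reference, this is roughly the sketch I have in mind (the data_by_view root is a made-up name, and the layout assumptions match the MM-DD-YYYY paths above):

import os

SRC = "data"          # root of the existing structure
DST = "data_by_view"  # root of the alternate symlinked view (made-up name)

for dirpath, _dirnames, filenames in os.walk(SRC):
    parts = os.path.relpath(dirpath, SRC).split(os.sep)
    if len(parts) != 3:              # expect <MM-DD-YYYY>/<state>/<stage>
        continue
    date_dir, state, stage = parts   # e.g. '03-14-2019', 'unprocessed', 'raw'
    month, day, year = date_dir.split("-")
    for name in filenames:
        if not name.endswith(".csv"):
            continue
        student = name.rsplit("-", 1)[0]   # e.g. 'student_id_12345'
        view_dir = os.path.join(DST, state, student, year, month, day, stage)
        os.makedirs(view_dir, exist_ok=True)
        link_name = os.path.join(view_dir, name)
        if not os.path.lexists(link_name):
            # os.symlink creates symbolic links; os.link would create hard links
            os.symlink(os.path.abspath(os.path.join(dirpath, name)), link_name)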
Related
I have images in 100 folders and the search results are slow, so I want faster access to those images, maybe with Python (if it is faster), in the way that Windows does when we select all the files and drag and drop them. Then I realized that drag and drop in Windows uses the Component Object Model, according to this source.
So I want to know: is there any way in Python to have COM objects for the image files in those 100 folders in the same place (a specific folder)? In other words, can we create COM objects for other files (the equivalent of shortcuts)? I know shortcuts won't work for my purpose.
The question in general is about how to access direct handles or COM objects for files from different folders in one folder. If it's possible, please tell me how. To put it simply, I want something with a similar function to file shortcuts, but not the 'shortcuts' that exist in Windows, because for my purpose 'shortcuts' won't work, so I think it can be done with COM.
Tkinter equivalent question:
Let me ask my question another way. Suppose I want to make a Windows file-search application in Python with a library like tkinter. One background part of my code finds the file paths of the desired search results, and another part, the 'GUI part', should show the result files with the ability to open them from the GUI, or drag them from the GUI to another folder or application. How should I build the 'GUI part'?
This tutorial suggested by #Thingamabobs is about getting external files into the window (GUI) of the app, but I want the opposite: file handles I can open from the app, something like Windows Explorer.
My question may be based on a misunderstanding of the concept of COM, so please point me to more relevant sources for my use case. Finally, if the title seems unsuitable, feel free to change it.
Based on an interpretation of the question, the following is an initial summary approach to a solution.
"""
This module will enable easy access to files spread across 100 plus
directories. A file should be as easy to open as clicking on a link.
Analysis:
Will any files be duplicated in any other directory? Do not know.
Will any file name be the same as another file in a different directory? Do
not know.
Initial design in pseudocode:
> Capture absolute path to each file in each directory.
> Store files information in python data structure
> for instance a list of tuples <path>,<filename>
> Once a data structure is determined use Tkinter, ttk.treeview to open a
file as easy as clicking on a link in the tree.
"""
I tried with SASPy but it's not working. I am able to open the SAS .egp file but not able to run the multiple scripts within it in sequence.
import subprocess

def OpenProject(sas_exe, egp_path):
    # Launch the Enterprise Guide executable with the .egp as its argument
    subprocess.call([sas_exe, egp_path])

sas_exe = r"path\path"        # placeholder: full path to the EG executable
egp_path = r"path\path\path"  # placeholder: full path to the .egp project
OpenProject(sas_exe, egp_path)
This depends a bit on exactly what the workflow is. A few side notes, then the full solution.
First: EGP is not really intended to store production processes, in my opinion. EGP should really be used for development, then production is done with .sas (text) files. EGP can directly store the nodes as .sas files; ask a new question about that if you want to know more, but it's pretty easy to figure out. Best practice is to have EGP save the code modules as .sas files, then run those - SASPy will easily do that for you.
Second: If you use SAS's built-in Git connectivity, then you can do this a bit more easily I suspect. Consider doing that if you already use Git for your other processes. Again, then you end up with a .sas file, and can directly run that via SASPy.
So: how can you do this in Python, assuming you do have to use the .egp itself, without too many moving parts? The key here is the .egp format. EGP is a container file - actually a .zip format container - that holds, among other things, all of the SAS code you want to run, as text. Text in XML format, but still text.
You can write a Python program that opens the .egp as a .zip file using the zipfile library, and then use xml.etree.ElementTree to parse the project.xml file inside it. Exactly what you do from there depends on your particular details, and is well out of scope for a Stack Overflow answer, but if you do better visually you can simply rename the .egp to .zip, open it in the unzip program of your choice, browse project.xml in your text editor, and find the nodes and the code related to those nodes.
You can then extract the .sas code as text, and submit it directly via SASPy, or extract it to a .sas file and then submit that however you prefer (SASPy or something else).
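A rough sketch of that approach (the file name and the assumption that embedded code members end in .sas are mine; inspect z.namelist() and project.xml before relying on either):

import zipfile
import xml.etree.ElementTree as ET

EGP_PATH = "myproject.egp"  # placeholder

with zipfile.ZipFile(EGP_PATH) as z:
    # project.xml describes the process flow; tag names vary by EG version,
    # so print the tree once and adapt.
    root = ET.fromstring(z.read("project.xml"))
    for elem in root.iter():
        print(elem.tag)

    # The code itself is stored as plain-text members of the container.
    for member in z.namelist():
        if member.lower().endswith(".sas"):  # assumption: adjust after inspecting namelist()
            code_text = z.read(member).decode("utf-8", errors="replace")
            print(code_text)
            # Then submit via SASPy, e.g.:
            #   import saspy
            #   sas = saspy.SASsession()
            #   result = sas.submit(code_text)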
I do something similar to this for a project - I don't actually run code from it, I'm just parsing it to verify that the correct programs were synced from the EGP to production - but it would be trivial to actually submit the code from what I've written, which is about 50 lines of code total. I may write an SGF paper this year or next year on this topic, in which case I'll try to remember to post it here - or you can head over to my GitHub page and see if it's there (in the future!).
Currently I'm working on automation in a project that has a very big site divided into sections and/or pages.
I'm using python-behave for the first time and I'm still learning (I used to work with ProtractorJS).
Since I started I've been putting all my step files in the steps folder (each page has its own step file), but in my case that's not going to be scalable: the site has around 50 pages, each with subsections, so having all step files in one folder is going to get messy as I keep adding files.
What I want to do is be able to have folders to separate each of the pages and have step files in each of them to better organize the files.
Now, I know that this framework does not support having a folder structure inside the step definition folder, so I started looking around and found this post with a possible solution, but I noticed it uses wildcard imports to add all the nested step files.
I know wildcard imports are not considered good practice in general, but I feel in this case it's the only way to allow folder structures for step files.
Is there any other way to achieve this?
Example
features
-->steps
---->*.py   // where python-behave expects to find the step definitions; the "recommended" way

What I'm trying to do
features
-->steps
---->login
------>login_steps.py
---->some_page
------>some_pages_steps.py
-->all_steps.py   // file that imports the nested step files using wildcard imports;
                  // the only file behave finds when looking for steps
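For context, a sketch of what that importer file might look like (assuming the layout above, that each subfolder is a package with an __init__.py, and that behave picks this file up as described in the linked post):

# all_steps.py -- pulls the nested step definitions into one module so
# behave can find them; names follow the example tree above.
import os
import sys

# Make the directory containing steps/ importable regardless of the cwd.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from steps.login.login_steps import *           # noqa: F401,F403
from steps.some_page.some_pages_steps import *  # noqa: F401,F403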
We're creating gamma-cat, an open data collection for gamma-ray astronomy, and are looking for advice (here, or links to resources, formats, tools, packages) how to best set it up.
The data we have consists of measurements for different sources, from different papers. It's pretty heterogeneous: sometimes there's data for multiple sources in one paper; for each source there are usually several papers; sometimes there's no spectrum, sometimes one, sometimes many, ...
Currently we just collect the data in an input folder as YAML and CSV files, and now we'd like to expose it to users. Mainly access from Python, but also from Javascript and accessible from a static website.
The question is what format and organisation we should use for the data, and whether there are any Python packages that would help us generate the output files as a set of linked data, as well as Python and Javascript packages that would help us access it.
We would like to get multiple "views" or simple "queries" of the data, e.g. "list of all sources", "list of all papers", "list of all spectra for source X", "spectrum A from paper B for source C".
For format, probably JSON would be a good choice? Although YAML is a bit nicer to read, and it's possible to have comments and ordered maps. We're storing the output files in a git repo, and have had a lot of meaningless diffs for JSON files because key order changes all the time.
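(Aside: the key-order churn at least seems fixable by serialising deterministically; a minimal sketch with the standard library, file name made up:)

import json

record = {"source": "C", "paper": "B", "spectrum": "A"}

# sort_keys plus a fixed indent makes the output byte-stable, so
# re-running the generation scripts no longer churns the git diff.
with open("sources.json", "w") as fh:
    json.dump(record, fh, indent=2, sort_keys=True)
    fh.write("\n")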
To make the datasets discoverable and linked, I don't know what to use. I found e.g. http://jsonapi.org/ but that seems to be for REST APIs, not for just a series of flat JSON files on a static webserver? Maybe it could still be used that way?
I also found http://json-ld.org/ which looks relevant, but also pretty complex. Would either of those or something else be a good choice?
And finally, we'd like to use Python scripts to generate the linked and discoverable files in output from the somewhat organised YAML and CSV files in input. So far we've just written a bunch of Python classes and scripts based on Python dicts/lists and YAML/JSON files. Is there a Python package that would help with the task of generating the linked data files?
Apologies for the long and complex question! I hope it's still in scope for SO and someone will have some advice to share.
Judging from the breadth of your question, you are new to linked data. The least "strange" format for you might be the Data Package. In the most common case it's just a zip archive of a CSV file and JSON metadata. It has a Python package.
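A minimal sketch with the datapackage library (names are placeholders, and API details may differ between releases):

# pip install datapackage
from datapackage import Package

# Infer a descriptor (paths, schemas, types) from existing CSV files.
package = Package()
package.infer("data/**/*.csv")

# Save descriptor plus data as a single zip container.
package.save("gamma-cat-datapackage.zip")

# Consumers read it back the same way:
consumer = Package("gamma-cat-datapackage.zip")
for resource in consumer.resources:
    print(resource.name, resource.read(keyed=True)[:2])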
If you have queries to the data, you should settle for a database (triplestore) with a SPARQL endpoint. Take a look at Fuseki. You can then use Turtle or RDF/XML for file export.
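Once the data is loaded into Fuseki, a Python client could query it roughly like this (the dataset name and predicate URIs are made up):

# pip install SPARQLWrapper
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:3030/gammacat/sparql")
sparql.setQuery("""
    SELECT ?source ?paper WHERE {
        ?spectrum <http://example.org/forSource> ?source ;
                  <http://example.org/fromPaper> ?paper .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["source"]["value"], row["paper"]["value"])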
If the data comes from some kind of a tool, you can model the domain it represents using Eclipse Lyo (tutorial).
These tools are maintained by 3 different communities, you can reach out to their user mailing lists separately if you have further questions about them.
So I'm trying to come up with a script to check and report files from my SharePoint according to some criteria. However, I need to authenticate using the company Active Directory.
I have done some research, and everything I've found seems far too advanced for what I intend; I'd prefer not to spend the whole time learning Django custom auth. Haufe.sharepoint seems simpler, but I didn't manage to list all folders from a URL with it.
I must confess I don't really understand SharePoint very much either, though I'm assuming it can behave sort of like a file repository for my purpose. So I will not be offended if someone points out that what I'm trying to do doesn't make sense.
Anyway, I have a URL like http://intranet/sharedir/Archive/Projects/Customer1, and under it many folders containing subfolders and files.
What I want is, given a URL like the one above, to list all the directories and files contained under it. After that I will iterate over the items and apply the rules I'm interested in.
If someone could provide a Python code example or reference, that would be great.
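If the server accepts NTLM (typical for on-premises SharePoint backed by AD), a sketch using requests and requests-ntlm against the REST API might look like this (URL, credentials, and the exact endpoint shape are assumptions to verify against your site):

# pip install requests requests-ntlm
import requests
from requests_ntlm import HttpNtlmAuth

SITE = "http://intranet"                             # site root from the question
FOLDER = "/sharedir/Archive/Projects/Customer1"      # server-relative folder path
AUTH = HttpNtlmAuth("DOMAIN\\username", "password")  # placeholder AD account

def list_folder(server_relative_url):
    # Return (files, subfolders) under one SharePoint folder via the REST API.
    base = f"{SITE}/_api/web/GetFolderByServerRelativeUrl('{server_relative_url}')"
    headers = {"Accept": "application/json;odata=verbose"}
    files = requests.get(base + "/Files", auth=AUTH, headers=headers).json()
    folders = requests.get(base + "/Folders", auth=AUTH, headers=headers).json()
    return files["d"]["results"], folders["d"]["results"]

files, folders = list_folder(FOLDER)
for f in files:
    print("FILE", f["ServerRelativeUrl"])
for d in folders:
    print("DIR ", d["ServerRelativeUrl"])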