How to load kedro DataSet object dynamically - python

I am currently using the YAML API to create all of my datasets with kedro==15.5. I would like to be able to peer into this information from time to time dynamically. It appears that I can get to it through io.datasets, which is a _FrozenDatasets object, but I cannot loop over it or access it programmatically.
Specific Use Case
Specifically, I would like to add a test that loops over the datasets to check that there are not multiple catalog entries using the same filepath. Is this possible without using eval? Currently I think I would need to do something like this:
filepaths = {}
for entry_name in io.list():
    filepaths[entry_name] = eval(f"io.datasets.{entry_name}").filepath

Unfortunately, I don't think AbstractDataSet (from which they are all derived) has a public property for the filepath or for the config that built it. You can read the ProjectContext config, but that won't cover datasets that were built dynamically.

Related

how to filter / query data from Python SASPY to_df function

I am working in Python on some data obtained from a SAS server. I am currently using the SASPY to_df() function to bring it from SAS into a local pandas DataFrame.
I would like to know if it's possible to filter/query the data that is being transferred, so I could avoid bringing over unneeded data and speed up my download.
I couldn't find anything in the saspy documentation; it only mentions "**kwargs", and I couldn't figure out how to use it.
Thanks.
You need to define the sasdata object using the WHERE= dataset option to limit the observations pulled.
https://sassoftware.github.io/saspy/api.html#saspy.sasdata.SASdata
Then when you use the to_df() method only the selected data will be transferred.
You can also use the KEEP= or DROP= dataset option to limit the variables that are transferred. Remember that in order to reference any variables in the WHERE= option they have to be kept.
The "**kwargs" looks to be about changing how you connect to the SAS server, so that is not important for what you want.

Mapping fields inside other fields

Hello, I would like to make an app that allows the user to import data from a source of their choice (Airtable, XLS, CSV, JSON) and export it to JSON, which will be pushed to an SQLite database using an API.
The "core" functionality of the app is that it allows the user to create a "template" that "maps" the source columns onto the destination columns. Which source column(s) go to which destination column is up to the user. I am attaching two photos here (from Airtable/Zapier), so you can get a better idea of the end result:
[image: adding fields inside fields - Airtable] [image: adding fields inside fields - Zapier]
I would like to know if you can recommend a library or an approach to this problem. I have looked for Python and Node.js libraries, and I am torn between the options people recommend: ETL libraries, mapping/zipping features, or coding my own classes. Do you know any libraries that do the same thing as Airtable/Zapier? Any suggestions?
Saving files in a database is really bad practice, since it takes up a lot of database storage space and adds latency to the communication.
I strongly recommend saving the file on disk and storing its path in the database.

Storing Variables in Django RestAPI

I am trying to host a REST API using Django which takes in some parameters, processes them, and returns a result. For this processing to happen, I have to use certain datasets which are loaded from Excel, TIFF, CSV, and TXT files. Loading these datasets into Python variables (to use them, of course) takes a bit of time; the problem is that I don't want my backend to extract info from all these files every time I get a request. The best way I could think of is literally copying the raw values from these files into Python variables using the = operator, but that would be at least 100,000 lines of code. Is there some way to pre-define certain variables in my backend? i.e. a variable that would get defined on one run, so that every time I get a request I can just use these pre-defined variables instead of loading them again.

Data description on the fly using CKAN/Python

I wish to access the data in a resource that a user has uploaded to CKAN (i.e. after they have clicked the 'upload' button and are idle), generate a list of the attribute names in the data using Python and relevant modules, and use the generated list to build HTML drop-down lists that let the user map their data attribute names to my data attribute names, which come from a PostgreSQL database.
This sounds like a plugin but from my preliminary research I didn't find any which could help.
Does this sound like a viable project?
Any rough blueprints of efficient strategy to go about it?

Store simple user settings in Python

I am programming a website in which users will have a number of settings, such as their choice of colour scheme, etc. I'm happy to store these as plain text files, and security is not an issue.
The way I currently see it is: there is a dictionary, where all the keys are users and the values are dictionaries with the users' settings in them.
For example, userdb["bob"]["colour_scheme"] would have the value "blue".
What is the best way to store it on file? Pickling the dictionary?
Are there better ways of doing what I am trying to do?
I would use the ConfigParser module, which produces some pretty readable and user-editable output for your example:
[bob]
colour_scheme: blue
british: yes
[joe]
color_scheme: that's 'color', silly!
british: no
The following code would produce the config file above, and then print it out:
import sys
from ConfigParser import *

c = ConfigParser()
c.add_section("bob")
c.set("bob", "colour_scheme", "blue")
c.set("bob", "british", str(True))
c.add_section("joe")
c.set("joe", "color_scheme", "that's 'color', silly!")
c.set("joe", "british", str(False))

c.write(sys.stdout)  # this outputs the configuration to stdout
                     # you could put a file-handle here instead

for section in c.sections():  # this is how you read the options back in
    print section
    for option in c.options(section):
        print "\t", option, "=", c.get(section, option)

print c.get("bob", "british")  # to access the "british" attribute for bob directly
Note that ConfigParser only supports strings, so you'll have to convert as I have above for the Booleans. See effbot for a good run-down of the basics.
Using cPickle on the dictionary would be my choice. Dictionaries are a natural fit for this kind of data, so given your requirements I see no reason not to use them. That is, unless you are thinking of reading them from non-Python applications, in which case you'd have to use a language-neutral text format. Even then, you could get away with the pickle plus an export tool.
I won't tackle the question of which one is best. If you want to handle text files, I'd consider the ConfigParser module. Others you could give a try are simplejson and YAML. You could also consider a real database table.
For instance, you could have a table called userattrs, with three columns:
Int user_id
String attribute_name
String attribute_value
If there are only a few settings, you could store them in cookies for quick retrieval.
Here's the simplest way. Use simple variables and import the settings file.
Call the file userprefs.py
# a user prefs file
color = 0x010203
font = "times new roman"
position = ( 12, 13 )
size = ( 640, 480 )
In your application, you need to be sure that you can import this file. You have many choices:
1. Use PYTHONPATH. Require PYTHONPATH to be set to include the directory with the preferences file.
2. Name the preferences file explicitly:
   a. via a command-line parameter (not the best, but simple)
   b. via an environment variable
3. Extend sys.path to include the user's home directory.
Example
import sys
import os
sys.path.insert(0,os.path.expanduser("~"))
import userprefs
print userprefs.color
For a database-driven website, of course, your best option is a db table. I'm assuming that you are not doing the database thing.
If you don't care about human-readable formats, then pickle is a simple and straightforward way to go. I've also heard good reports about simplejson.
If human readability is important, two simple options present themselves:
Module: Just use a module. If all you need are a few globals and nothing fancy, then this is the way to go. If you really got desperate, you could define classes and class variables to emulate sections. The downside here: if the file will be hand-edited by a user, errors could be hard to catch and debug.
INI format: I've been using ConfigObj for this, with quite a bit of success. ConfigObj is essentially a replacement for ConfigParser, with support for nested sections and much more. Optionally, you can define expected types or values for a file and validate it, providing a safety net (and important error feedback) for users/administrators.
I would use shelve or an sqlite database if I would have to store these setting on the file system. Although, since you are building a website you probably use some kind of database so why not just use that?
The built-in sqlite3 module would probably be far simpler than most alternatives, and gets you ready to update to a full RDBMS should you ever want or need to.
If human readability of config files matters, an alternative might be the ConfigParser module, which allows you to read and write .ini-like files. But then you are restricted to one nesting level.
If you have a database, I might suggest storing the settings in the database. However, it sounds like ordinary files might suit your environment better.
You probably don't want to store all the users settings in the same file, because you might run into trouble with concurrent access to that one file. If you stored each user's settings as a dictionary in their own pickled file, then they would be able to act independently.
Pickling is a reasonable way to store such data, but unfortunately the pickle data format is notoriously not-human-readable. You might be better off storing it as repr(dictionary) which will be a more readable format. To reload the user settings, use eval(open("file").read()) or something like that.
Is there a particular reason you're not using the database for this? It seems the normal and natural thing to do - or store a pickle of the settings in the db, keyed on user id or something.
You haven't described the usage patterns of the website, but just thinking of a general website, I would expect that keeping the settings in a database would cause much less disk I/O than using files.
OTOH, for settings that might be used by client-side code, storing them as javascript in a static file that can be cached would be handy - at the expense of having multiple places you might have settings. (I'd probably store those settings in the db, and rebuild the static files as necessary)
I agree with the reply about using a pickled dictionary - very simple and effective for storing simple data in a dictionary structure.
If you don't care about being able to edit the file yourself, and want a quick way to persist python objects, go with pickle. If you do want the file to be readable by a human, or readable by some other app, use ConfigParser. If you need anything more complex, go with some sort of database, be it relational (sqlite), or object-oriented (axiom, zodb).
