Using ConceptNet with the Divisi reasoning toolkit - Python

I am trying to use ConceptNet with the divisi2 package. Divisi is designed specifically for reasoning over knowledge in semantic networks: it takes a graph as input and converts it into an SVD-based representation. The package distribution includes the basic ConceptNet data in graph form, but that data appears to be outdated. The basic usage is described in Using Divisi with ConceptNet (link), but the data needs to be updated with ConceptNet 5 data. Is there any way to do that?

I have all the ConceptNet data set up locally as described in Running your own copy, so I have the full data in an SQLite database, and I also have the same data as separate CSV files. How can I load this data into the Divisi package? Thanks.
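Not a definitive answer, but one plausible route: divisi2's sparse matrices are built from a list of (value, row_label, column_label) entries, following the AnalogySpace convention of pairing a concept with a (direction, relation, concept) feature. A sketch of parsing a ConceptNet 5 CSV dump into such entries is below; the column positions are assumptions (they vary between dump versions), and the divisi2 calls themselves are shown only as comments since divisi2 is an old Python 2-era package.

```python
import csv

def conceptnet_rows_to_entries(path, rel_col=1, start_col=2, end_col=3, weight_col=5):
    """Parse a ConceptNet 5 tab-separated dump into (value, row, col) entries.

    Column positions are assumptions -- check them against your dump version.
    The AnalogySpace convention pairs a concept (row) with a
    (direction, relation, concept) feature (column).
    """
    entries = []
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.reader(f, delimiter='\t'):
            rel, start, end = row[rel_col], row[start_col], row[end_col]
            weight = float(row[weight_col])
            # Each assertion yields two features, one seen from each side.
            entries.append((weight, start, ('right', rel, end)))
            entries.append((weight, end, ('left', rel, start)))
    return entries

# With divisi2 installed, the entries can then be turned into a matrix
# and decomposed, roughly:
#   import divisi2
#   matrix = divisi2.make_sparse(entries)
#   concept_axes, diag, feature_axes = matrix.normalize_all().svd(k=100)
```

The same entry list could be built from the SQLite database instead of the CSVs; only the iteration over assertions changes.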

Store and reuse feature engineering performed by tsfresh

I am currently using the tsfresh package for a project (predictive maintenance).
It is working really well and now I want to implement it live.
However, the issue is that I don't know how to store the feature engineering that has been applied to my original dataset in order to do the same feature engineering to the data that I am streaming (receiving live).
Do you have any idea if there is a parameter or a function that allows me to store the feature engineering performed by tsfresh?
(I am using the extract_relevant_features function).
After searching through various posts, it turns out that the answer is that you can save your parameters in a dictionary (see here).
This dictionary can later be passed to the extract_features function to compute only those features.
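As a sketch of that workflow: tsfresh's `from_columns` helper derives the `kind_to_fc_parameters` dictionary from the selected feature columns, and since that dictionary is plain data it can be persisted with the standard library. The tsfresh calls are shown as comments; the example dictionary contents are illustrative, not taken from a real run.

```python
import json

# Offline, after feature selection:
#   from tsfresh import extract_relevant_features
#   from tsfresh.feature_extraction.settings import from_columns
#   features = extract_relevant_features(timeseries, y, column_id="id", column_sort="time")
#   kind_to_fc_parameters = from_columns(features)
#
# The resulting dict maps each time-series kind to the calculators (and
# their parameters) that survived selection, for example:
kind_to_fc_parameters = {
    "temperature": {"mean": None, "autocorrelation": [{"lag": 3}]},
}

# It is plain data, so it serializes with the standard library:
with open("fc_parameters.json", "w") as f:
    json.dump(kind_to_fc_parameters, f)

with open("fc_parameters.json") as f:
    restored = json.load(f)

# At prediction time, reuse it on the live/streamed data:
#   from tsfresh import extract_features
#   live_features = extract_features(live_df, column_id="id", column_sort="time",
#                                    kind_to_fc_parameters=restored)
```

Pickle would work equally well; JSON has the advantage of being inspectable by hand.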

Numpy, Pandas counterpart in .Net or .Netcore

In ML.Net what are the counterparts of Numpy/ Pandas python libraries?
Here are all the available .NET counterparts that I know of:
Numpy
there are a few Tensor type proposals in dotnet/corefx:
https://github.com/dotnet/corefx/issues/25779
https://github.com/dotnet/corefx/issues/34527
There is also an implementation of NumPy made by the SciSharp org.
Pandas
On dotnet/corefx there is a DataFrame Discussion issue, which has spawned a dotnet/corefxlab project to implement a C# DataFrame library similar to Pandas.
There are also other DataFrame implementations:
Deedle
SciSharp Pandas.NET
ML.NET
In ML.NET, IDataView is an interface that abstracts the underlying storage for tabular data, e.g. a DataFrame. It doesn't have the rich API that a Pandas DataFrame does; instead, it supports reading data from any underlying source, such as a text file, a SQL table, or in-memory objects.
There currently isn't a "data exploration" API in ML.NET v1.0, like you would have with a Pandas DataFrame. The current plan is for the corefxlab DataFrame class to implement IDataView, and then you can use DataFrame to do the data exploration, and feed it directly into ML.NET.
UPDATE: For a "data exploration" API similar to Pandas, check out the Microsoft.Data.Analysis package, which is currently in preview. It implements IDataView and can be fed directly into ML.NET to train or make predictions.
Otherwise, the types involved are mostly the regular .NET types plus the IDataView types. Note that the documentation is a bit out of date.

Does Seaborn's sns.load_dataset() function use real data?

I know the datasets that can be loaded with sns.load_dataset() are example datasets used for Seaborn's documentation, but do these example datasets use actual data?
I'm asking because I want to know if it's useful to pay attention to the results I get as I play around with these datasets, or if I should just see them as solely a means to learning the module.
The data does appear to be real. This is not formally documented by Seaborn, but:
Several of the datasets are "real", well-known datasets that can be verified elsewhere, such as the Iris dataset hosted in UCI's Machine Learning Repository.
All of the data are sourced from https://github.com/mwaskom/seaborn-data, which appears to collect actual CSVs assembled by Michael Waskom (a core Seaborn developer). If the data were random or fake, it would more likely have been generated with a Python library such as NumPy.

Python GeoJson to GML conversion

I have a Python application that creates polygons to identify geographic areas of interest at specific times. So far I've been using GeoJSON because the handy geojson library makes writing it easy, and I've been putting the time information in the file name. However, I now need to publish my polygons via a WMS with TIME (probably using MapServer). Since GeoJSON doesn't appear to support a feature time and geojson-events hasn't been accepted yet, I thought I would convert to GML; however, I cannot find a library that makes writing GML from Python simple. Does one exist? I tried using the geojson-events format and then ogr2ogr to convert from geojson-events to GML, but the time information gets dropped.
So looking for either:
a) an efficient way to write GML from python,
b) a way to encode datetime information into geojson such that ogr will recognize it or
c) another brilliant solution I haven't thought of.
To convert GeoJSON into GML you could use GDAL (the Geospatial Data Abstraction Library). There are numerous ways of using the library, including directly from Python.
However, since you want to set up a WMS to serve your data, you might instead set up a spatial database, for example PostgreSQL/PostGIS, import the GeoJSON directly into the database, and let MapServer do the conversion for you.
See Store a GeoJSON FeatureCollection to postgres with postgis for details of how you might do this.
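For option (a), it's also worth noting that GML is just XML, so a minimal polygon feature with a timestamp can be assembled with the standard library's xml.etree.ElementTree. The gml:Polygon/gml:exterior/gml:posList element names follow GML 3 conventions, but the wrapping feature element and the `when` attribute layout here are an invented application schema, purely for illustration.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML)

def polygon_to_gml(coords, when):
    """Serialize a polygon ring plus a timestamp as a small GML fragment.

    `coords` is a list of (x, y) tuples. The wrapping <feature>/<when>
    layout is an application-schema assumption, not part of the standard.
    """
    feature = ET.Element("feature")
    ts = ET.SubElement(feature, "when")
    ts.text = when
    polygon = ET.SubElement(feature, f"{{{GML}}}Polygon")
    exterior = ET.SubElement(polygon, f"{{{GML}}}exterior")
    ring = ET.SubElement(exterior, f"{{{GML}}}LinearRing")
    poslist = ET.SubElement(ring, f"{{{GML}}}posList")
    # posList is a flat, space-separated sequence of coordinates.
    poslist.text = " ".join(f"{x} {y}" for x, y in coords)
    return ET.tostring(feature, encoding="unicode")

gml = polygon_to_gml([(0, 0), (1, 0), (1, 1), (0, 0)], "2019-06-01T12:00:00Z")
```

Whether MapServer's TIME support will pick up a custom attribute like this depends on your mapfile configuration, so validate the output against whatever schema your WMS layer expects.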

Saving models from Python

Is it possible to save a predictive model in Python?
Background: I'm building regression models in SPSS and I need to share those when done. Unfortunately, no one else has SPSS.
Idea: My idea is to build the model in Python, do something XYZ, then use another library to convert XYZ into an exe that will pick up a CSV file with data and output the model-fit results on that data. That way, I can share the model with anyone without them needing SPSS or other expensive software.
Challenge: I need to figure out XYZ: how do I save the model instance once it is built? In the case of linear/logistic regression, it would be the set of coefficients.
PS: I'm using linear/logistic as examples, in reality, I need to share more complex models like SVM etc.
Using FOSS (Free & Open Source Software) is great to facilitate collaboration. Consider using R or Sage (which has a Python backbone and includes R) so that you can freely share programs and data. Or even use Sagemath Cloud so that you can work collaboratively in real-time.
Yes, this is possible. What you're looking for is scikit-learn in combination with joblib. A working example for your problem can be found in this question.
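To make the principle concrete: a fitted model is just a Python object holding its learned parameters, so serializing that object saves the model. The sketch below uses pickle with a tiny hand-rolled stand-in model (hypothetical, for illustration only); with scikit-learn you would serialize the fitted estimator itself, idiomatically via joblib, as shown in the comments.

```python
import pickle

# With scikit-learn, the idiomatic tool is joblib:
#   from joblib import dump, load
#   dump(fitted_model, "model.joblib")
#   model = load("model.joblib")
# This works for any estimator, including SVMs, since it serializes the
# whole fitted object, not just a coefficient list.

# A minimal stand-in model to illustrate the round trip:
class LinearModel:
    def __init__(self, coef, intercept):
        self.coef = coef            # learned coefficients
        self.intercept = intercept  # learned intercept

    def predict(self, xs):
        return [sum(c * x for c, x in zip(self.coef, row)) + self.intercept
                for row in xs]

model = LinearModel(coef=[2.0, -1.0], intercept=0.5)
blob = pickle.dumps(model)      # "save" -- could equally be a file on disk
restored = pickle.loads(blob)   # later, in the script you share
```

For the exe step, a small script that loads the saved model and reads a CSV could then be bundled with a tool such as PyInstaller, though simply sharing the script plus the model file with anyone who has Python is usually easier.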
