I want to implement a new model language for spaCy.
I have installed spaCy (following the guide on the official website) on my Windows OS, but I haven't understood where and how I should write and run my future files.
Help me, Thanks.
I hope I understand your question correctly: If you only want to use spaCy, you can simply create a Python file, import spacy and run it.
However, if you want to add things to the spaCy source – for example to add new language data that doesn't yet exist – you need to compile spaCy from source. On Windows, this needs a little more preparation – but it's not that difficult:
Install the Visual C++ Build Tools, which include the compiler you need.
Fork and clone the spaCy repository on GitHub.
Navigate to that directory and install spaCy's dependencies (other packages plus developer requirements like Cython) by running pip install -r requirements.txt.
Then run python setup.py build_ext --inplace from the same directory. This will build and compile spaCy into the directory.
Make sure your PYTHONPATH is set to the new spaCy directory. This is important so Python knows that you want to execute this exact version of spaCy, and not some other one you have installed somewhere else. On Windows, I normally use this command: set PYTHONPATH=C:\path\to\spacy\directory. There's also this thread with more info. (I'm no Windows expert, though – so if anyone reads this and disagrees, feel free to correct me here.)
You can now edit the source, add files and run them. If you want to add a new language, I'd recommend starting by adding a new directory to spacy/lang and creating an __init__.py. You can find more info on how this should look in the usage guide on adding languages.
To test if everything works, start the Python interpreter and import and initialise your language. For example, let's assume you've added Icelandic. You should then be able to do this:
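from importlib import import_module
# "is" is a Python keyword, so "from spacy.lang.is import Icelandic" is a SyntaxError
Icelandic = import_module("spacy.lang.is").Icelandic
nlp = Icelandic()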
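To double-check the PYTHONPATH step, it can help to ask Python which file it would actually import. This is only a small sketch using the standard library; module_origin is a helper name made up for this example, not part of spaCy.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Expected to print a path inside your cloned spaCy directory.
    # Prints None if spaCy cannot be found on the import path at all.
    print(module_origin("spacy"))
```

If the printed path points somewhere else (for example into site-packages), Python is still picking up a different spaCy installation.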
When working with JVM languages a pattern commonly followed is to use a build system (ant+ivy / maven / gradle), where using a build file, the dependencies of your code can be defined. The build system is able to fetch these dependencies when you build your code. Moreover IDEs like Eclipse/IntelliJ are also able to read these build files and continuously build/verify your code as you write it.
How is something similar done while developing in Python? While there may not necessarily be a build step, I want a developer to be able to check out my code and then run a single bootstrap command that will set up a virtualenv and pull in any third-party dependencies necessary to run the code. I could include some sort of script to do this, but I am wondering if there is a tool for it. Most of my searching so far has led me to packaging tools, which are more for distribution to end users than for this purpose (or so I understand).
This is managed by virtualenv and the pip install -r requirements.txt command. More info here: Virtual Environments
I guess requirements.txt is what you are looking for. For example, PyCharm IDE will definitely see it as a dependency list.
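As a rough idea of what a single bootstrap command could look like, here is a sketch that creates an environment and installs from requirements.txt. It assumes Python 3 and uses the standard library venv module instead of the virtualenv package mentioned above. The function names and the .venv directory are made up for this example.

```python
import subprocess
import sys
import venv
from pathlib import Path


def pip_install_command(env_dir, requirements="requirements.txt"):
    """Build the pip command that installs the requirements into env_dir."""
    on_windows = sys.platform == "win32"
    bin_dir = "Scripts" if on_windows else "bin"
    python = Path(env_dir) / bin_dir / ("python.exe" if on_windows else "python")
    return [str(python), "-m", "pip", "install", "-r", str(requirements)]


def bootstrap(env_dir=".venv", requirements="requirements.txt"):
    """Create the environment, then install the listed dependencies into it."""
    venv.create(env_dir, with_pip=True)
    subprocess.check_call(pip_install_command(env_dir, requirements))


# Usage (needs network access to download packages):
# bootstrap(".venv", "requirements.txt")
```

A developer would check out the code and run this one script; everything else is driven by requirements.txt.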
Using from setuptools.command.install import install, I can easily run a custom post-install script if I run python setup.py install. This is fairly trivial to do.
Currently, the script does nothing but print some text but I want it to deal with system changes that need to happen when a new package is installed -- for example, back up the database that the package is using.
I want to generate a Python wheel for my package and then copy it to a set of deployment machines and install it there. However, my custom install script is no longer run on the deployment machines.
What am I doing wrong? Is that even possible?
Do not mix package installation and system deployment
Installation of Python packages (using any sort of packaging tools or formats) shall be focused on making that package usable from Python code.
Deployment, which might include database modifications etc., is definitely out of scope and shall be handled by other tools like fab, salt-stack etc.
The fact that something seems fairly trivial does not mean one should do it.
The risk is that you will make your package installation difficult to reuse, as it will be spoiled by other things which are not related to pure package installation.
The option to hook into the installation process and modify the environment is even considered a design flaw by some people, one that caused a big mess in the Python packaging situation - see Armin Ronacher in Python Packaging: Hate, Hate, Hate Everywhere, chapter "PTH: The failed Design that enabled it all"
PEP 427 which specifies the wheel package format does not leave any provisions for custom pre or post installation scripts.
Therefore running a custom script is not possible during wheel package installation.
You'll have to add the custom script to a place in your package where you expect the developer to execute first.
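One way to follow that advice is to ship the deployment step as an ordinary function in the package and call it explicitly (or from a deployment tool) after the wheel is installed. The sketch below assumes the database is a single file; backup_database and its arguments are hypothetical names, not something pip or the wheel format provides.

```python
import shutil
import time
from pathlib import Path


def backup_database(db_path, backup_dir):
    """Copy the database file into backup_dir under a timestamped name."""
    db_path = Path(db_path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{db_path.stem}-{stamp}{db_path.suffix}"
    # copy2 also preserves the file's modification time and permissions.
    shutil.copy2(db_path, target)
    return target
```

The deployment script then runs pip install on the wheel and calls this function as a separate, visible step.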
This is probably a question that has a very easy and straightforward answer, however, despite having a few years programming experience, for some reason I still don't quite get the exact concepts of what it means to "build" and then to "install". I know how to use them and have used them a lot, but have no idea about the exact processes which happen in the background...
I have looked across the web, wikipedia, etc... but there is no one simple answer to it, neither can I find one here.
A good example, which I tried to understand, is adding new modules to python:
http://docs.python.org/2/install/index.html#how-installation-works
It says that "the build command is responsible for putting the files to install into a build directory"
And then for the install command: "After the build command runs (whether you run it explicitly, or the install command does it for you), the work of the install command is relatively simple: all it has to do is copy everything under build/lib (or build/lib.plat) to your chosen installation directory."
So essentially what this is saying is:
1. Copy everything to the build directory and then...
2. Copy everything to the installation directory
There must be a process missing somewhere in the explanation... compilation?
Would appreciate some straightforward not too techy answer but in as much detail as possible :)
Hopefully I am not the only one who doesn't know the detailed answer to this...
Thanks!
Aivoric
Building means compiling the source code to binary in a sandbox location where it won't affect your system if something goes wrong, like a build subdirectory inside the source code directory.
Install means copying the built binaries from the build subdirectory to a place in your system path, where they become easily accessible. This is rarely done by a straight copy command, and it's often done by some package manager that can track the files created and easily uninstall them later.
Usually, a build command does all the compiling and linking needed, but Python is an interpreted language, so if there are only pure Python files in the library, there's no compiling step in the build. Indeed, everything is copied to a build directory, and then copied again to a final location. Only if the library depends on code written in other languages that needs to be compiled you'll have a compiling step.
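For a pure-Python library, the two steps can be imitated with plain copies. The sketch below is a toy illustration, not what distutils actually runs: build, install, demo and toy_library are names invented for this example, and a temporary directory stands in for the real installation directory.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def build(source_dir, build_dir):
    """'Build' a pure-Python library: copy its sources into build/lib."""
    target = Path(build_dir) / "lib"
    shutil.copytree(source_dir, target)
    return target


def install(built_lib, install_dir):
    """'Install': copy everything under build/lib to the install directory."""
    shutil.copytree(built_lib, install_dir, dirs_exist_ok=True)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src = tmp / "src"
        src.mkdir()
        (src / "toy_library.py").write_text("ANSWER = 42\n")

        built = build(src, tmp / "build")
        site = tmp / "site-packages"
        install(built, site)

        # The library becomes importable once its directory is on sys.path.
        sys.path.insert(0, str(site))
        try:
            return importlib.import_module("toy_library").ANSWER
        finally:
            sys.path.remove(str(site))
            sys.modules.pop("toy_library", None)
```

With compiled extensions, the build step would additionally run a compiler before anything is copied.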
You want a new chair for your living-room and you want to make it yourself. You browse through a catalog and order a pile of parts. When they arrive at your door, you can't immediately use them. You have to build the chair at your workshop. After a bit of elbow-grease, you can sit down in it. Afterwards, you install the chair in your living-room, in a convenient place to sit down.
The chair is a program you want to use. It arrives at your house as source code. You build it by compiling it into a runnable program. You install it by making it easier to use.
The build and install commands you are referring to come from the setup.py file, right?
Setup.py (http://docs.python.org/2/distutils/setupscript.html)
This file is created by 3rd party applications / extensions of Python. They are not part of:
Python source code (bunch of c files, etc)
Python libraries that come bundled with Python
When a developer makes a library for python that he wants to share to the world he creates a setup.py file so the library can be installed on any computer that has python. Maybe this is the MISSING STEP
Setup.py sdist
This creates a source distribution of the Python module (the tar.gz file). It copies all the files used by the Python library into a folder, creates a setup.py file for the module, and archives everything so the library can be built somewhere else.
Setup.py build
This builds the python module back into a library (SPECIFICALLY FOR THIS OS).
As you may know, the computer that the python library originally came from will be different from the computer that you are installing it on.
It might have a different version of python
It might have a different operating system
It might have a different processor / motherboard / etc
For all the reasons listed above the code will not work on another computer. So setup.py sdist creates a module with only the source files needed to rebuild the library on another computer.
What setup.py does exactly is similar to what a makefile would do. It compiles sources / creates libraries all that stuff.
Now we have a copy of all the files we need in the library and they will work on our computer / operating system.
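To see the details that make a built library specific to one machine, you can ask Python itself. This is just a sketch with the standard library; build_environment is a name made up for this example.

```python
import platform
import sys
import sysconfig


def build_environment():
    """Collect the details a compiled library is tied to."""
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        "platform": sysconfig.get_platform(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in build_environment().items():
        print(key, "=", value)
```

If any of these values differ between two computers, a library with compiled parts generally has to be rebuilt from the source files.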
Setup.py install
Great we have all the files needed. But they won't work. Why? Well they have to be added to Python that's why. This is where install comes in. Now that we have a local copy of the library we need to install it into python so you can use it like so:
import mycustomlibrary
In order to do this we need to do several things including:
Copy files to their library folders in our version of python.
Make sure library can be imported using import command
Run any special install instructions for this library. (setting up paths, etc)
This is the most complicated part of the task. What if our library uses BeautifulSoup? This is not a part of Python Library. We'd have to install it in a way such that our library and any others can use BeautifulSoup without interfering with each other.
Also what if python was installed someplace else? What if it was installed on a server with many users?
Install handles all these problems transparently. What it does is make the library that we just built able to run. All you have to do is use the import command, install handles the rest.
I'm following Google's OR-Tools instructions and reading this instruction:
> "Then you can download all dependencies and build them using:
>
> make third_party"
What is this make command? Should I run it from Windows command prompt? Where is this third_party file located?
Sorry for this basic question. I'm new to this realm.
That page seems very clear to me.
Please make sure that svn.exe, nmake.exe and cl.exe are in your path.
You need to do exactly that. nmake.exe implements the make command, from the sound of things. As to where you should run this command, run it, as the page says, from the terminal in the Tools menu in Visual Studio.
NAME
make - GNU make utility to maintain groups of programs
SYNOPSIS
make [ -f makefile ] [ option ] ... target ...
Simply put, make is a build tool: the make command, commonly used on Linux, performs all necessary recompilations. Make requires a configuration file (a makefile). Once this file is written for your project, you usually just type make to rebuild the changed files.
Take a look at this link for some make examples.
http://linuxdevcenter.com/pub/a/linux/2002/01/31/make_intro.html
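The core idea behind "rebuild the changed files" can be sketched in a few lines of Python. This is only an illustration of the timestamp check, not how make is implemented; needs_rebuild is a name invented here, and real make also handles rules, dependency chains and much more.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

For example, needs_rebuild("main.o", ["main.c"]) is true right after main.c has been edited, and false once main.o has been regenerated.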
As per the link you provided, the instructions are straight forward:
Compiling libraries
All build rules use make (GNU make), even on Windows. A make.exe binary is provided in the tools sub-directory. Because they provide make.exe, you can run the following commands on Windows as well; just make sure you are in a path that includes the make binary.
If you do not find svn.exe, please install a svn version that offers the command line tool.
http://www.collab.net/downloads/subversion
Just execute the following commands to build the dependencies:
make
To compile in debug mode while in windows, use the following:
make DEBUG="/Od /Zi" all
If you need to clean everything and do it again, run:
make clean
This will clean all downloaded sources, all compiled dependencies, and Makefile.local. It is useful to get a clean state, or if you have added an archive in dependencies.archives.
Finally, to compile the library run:
make all
When everything is compiled, you will find under or-tools/bin and or-tools/lib:
some static libraries (libcp.a, libutil.a and libbase.a, and more)
One binary per C++ example (e.g. nqueens)
C++ wrapping libraries (pywrapcp.so, linjniwrapconstraint_solver.so)
Java jars (com.google.ortools.constraintsolver.jar...)
C# assemblies
Then we can edit Makefile.local.
First off, download Python 2.7 and JDK 7, install them.
Edit Makefile.local to point to the correct Python and Java installation. For instance, on my system, it is:
WINDOWS_JDK_DIR = c:\\Program Files\\Java\\jdk1.7.0_02
WINDOWS_PYTHON_VERSION = 27
WINDOWS_PYTHON_PATH = C:\\python27
Afterwards, to use python, you need to install google-apputils.
cd dependencies/sources/google-apputils
c:\python27\python.exe setup.py install
I want to distribute some python code, with a few external dependencies, to machines with only core python installed (and to users who are unfamiliar with easy_install etc.).
I was wondering if perhaps virtualenv can be used for this purpose? I should be able to write some bash scripts that trigger the virtualenv (with the suitable packages) and then run my code.. but this seems somewhat messy, and I'm wondering if I'm re-inventing the wheel?
Are there any simple solutions to distributing python code with dependencies, that ideally doesn't require sudo on client machines?
Buildout - http://pypi.python.org/pypi/zc.buildout
As a sample, look at my clean project (http://hg.jackleo.info/hyde-0.5.3-buildout-enviroment/src): only 2 files do the magic. Moreover, the Makefile is optional, but without it you'll need to get bootstrap.py yourself (the Makefile downloads it, but it runs only on Linux). buildout.cfg is the main file, where you write the dependencies and the configuration of how the project is laid out.
To get bootstrap.py just download from http://svn.zope.org/repos/main/zc.buildout/trunk/bootstrap/bootstrap.py
Then run python bootstrap.py and bin/buildout. I do not recommend installing buildout locally, although it is possible; just use the one that bootstrap downloads.
I must admit that buildout is not the easiest solution, but it's really powerful, so learning it is worth the time.
UPDATE 2014-05-30
Since this was recently up-voted and (probably) used as an answer, I want to point out a few changes.
First off, buildout is now downloaded from GitHub: https://raw.githubusercontent.com/buildout/buildout/master/bootstrap/bootstrap.py
That hyde project would probably fail due to the breaking changes in buildout 2.
Here you can find better samples: http://www.buildout.org/en/latest/docs/index.html I also suggest looking at the "collection of links related to Buildout" part, as it might contain info for your project.
Secondly, I am personally more in favor of a setup.py script that can be installed using Python. More about the egg structure can be found here: http://peak.telecommunity.com/DevCenter/PythonEggs and if that looks too scary, search Google for "python egg". In my opinion it's actually simpler than buildout (and definitely easier to debug), and it is probably more useful, since it can be distributed more easily and installed anywhere with the help of virtualenv, or globally, whereas with buildout you have to provide all of the build scripts along with the source all of the time.
You can use a tool like PyInstaller for this purpose. Your application will appear as a single executable on all platforms, and include dependencies. The user doesn't even need Python installed!
See as an example my logview package, which has dependencies on PyQt4 and ZeroMQ and includes distributions for Linux, Mac OSX and Windows all created using PyInstaller.
You don't want to distribute your virtualenv, if that's what you're asking. But you can use pip to create a requirements file - typically called requirements.txt - and tell your users to create a virtualenv then run pip install -r requirements.txt, which will install all the dependencies for them.
See the pip docs for a description of the requirements file format, and the Pinax project for an example of a project that does this very well.
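After users have run pip install -r requirements.txt, a small check like the one below can confirm that the pinned versions really ended up in the environment. It is a sketch that assumes Python 3.8+ (for importlib.metadata) and only understands plain name==version lines; the real requirements format supports much more. Both function names are made up for this example.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Extract {name: version} from simple 'name==version' lines."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def missing_or_mismatched(pins):
    """Return the pins that are not installed at exactly the pinned version."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems
```

Calling missing_or_mismatched(parse_pins(open("requirements.txt").read())) inside the virtualenv returns an empty dict when every pinned dependency is present.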