I hope you are feeling good and safe.
I'm working on a natural language processing project for my master's degree, and I need to translate
my local dialect to English. I noticed that Facebook's machine translation did very well with my local dialect.
So my question is: is there any way to use Facebook's translation service in my project? For example, is there an API or Python module that uses it?
Which language is your local language?
Facebook has many machine translation models, so the right choice depends on the translation quality you need and how much computing power you have. I am not sure whether the latest state-of-the-art models they use in their products are also offered as a standalone translation tool.
First Option: Run full models locally
One way would be using one of their models on huggingface (see the "Generation" part):
https://huggingface.co/docs/transformers/model_doc/m2m_100#training-and-generation
They also have some easy-to-use pretrained models in their torch.hub module (but that probably doesn't cover your local language):
https://github.com/pytorch/fairseq/blob/main/examples/translation/README.md
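As a rough illustration of the Hugging Face route, here is a minimal sketch based on the generation example in the linked M2M100 docs. The checkpoint name facebook/m2m100_418M is the smallest public one; the function name and the language codes are placeholders I chose, since the actual dialect is not known, and you would need transformers, torch and sentencepiece installed.

```python
def translate_m2m100(texts, src_lang, tgt_lang="en",
                     model_name="facebook/m2m100_418M"):
    """Translate a list of strings with an M2M100 checkpoint (sketch)."""
    if not texts:
        return []  # nothing to translate, so skip loading the model

    # Imported lazily so this file can be loaded without the heavy dependencies.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    # M2M100 needs the source language set on the tokenizer and the target
    # language forced as the first generated token.
    tokenizer.src_lang = src_lang
    encoded = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling something like translate_m2m100(["some sentence"], src_lang="ar") downloads the checkpoint on first use. Whether a given dialect is covered depends on the language codes the model was trained on, so check that list first.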
Second Option: APIs
As I said, it depends on the quality you need. You could try some easy-to-use (non-Facebook) APIs and see how far that gets you, as this is much easier and you can use them online:
e.g. https://libretranslate.com/
Or check out this comparison of APIs: https://rapidapi.com/collection/google-translate-api-alternatives
APIs are usually limited to a maximum number of characters/words/requests per month/day/minute so you'll have to see if that is enough for your case and if the quality is acceptable.
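To give an idea of how you might deal with such limits, here is a small sketch that splits a long text into pieces under a character limit and prepares one request per piece. The request shape is my understanding of LibreTranslate's /translate endpoint (JSON fields q, source, target, format, api_key); treat it as an assumption and verify it against the instance you use. The helper names and the limit are made up for illustration.

```python
import json
import urllib.request


def split_into_chunks(text, max_chars):
    """Split text on whitespace into pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is cut into hard slices.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def build_translate_request(text, target="en", source="auto",
                            base_url="https://libretranslate.com", api_key=""):
    """Build (but do not send) a POST request for one chunk of text."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    if api_key:
        payload["api_key"] = api_key  # hosted instances may require a key
    return urllib.request.Request(
        f"{base_url}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

You would then send each request with urllib.request.urlopen (with a pause between calls if the service limits requests per minute) and join the translated pieces.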
Third Option: pip packages which use APIs
For example check out: https://pypi.org/project/deep-translator/
Fourth Option: pip wrapper packages which run translation locally
A great package that includes some pretty strong Facebook MT models is: https://pypi.org/project/EasyNMT/ (it also has their strong m2m_100 models)
More lightweight but probably not as strong: https://github.com/argosopentech/argos-translate
Conclusion:
Since I assume your local language is not supported by many models, I would first try the fourth option (start with the biggest models and, if that doesn't work, try smaller ones).
If that doesn't work out, you can check whether one of the APIs works for your case.
If you have a lot of computing power and want to go a bit deeper, you can run full model inference locally using huggingface or fairseq.
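The "try one thing, then fall back to the next" order can be sketched in a few lines. This is only an illustration: the backends are plain callables that you would write yourself around whichever package or API you end up testing, and the names are hypothetical.

```python
def translate_with_fallback(text, backends):
    """Try each (name, translate_fn) pair in order; return the first success."""
    errors = []
    for name, translate_fn in backends:
        try:
            result = translate_fn(text)
        except Exception as exc:  # a backend can fail for many reasons
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

Returning the backend name together with the translation makes it easy to see afterwards which option actually handled your dialect.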
I am building a simple Python API to search addresses in a database using free-form text. I would like to use the libpostal library to convert the inputs into something structured that makes lookups in my DB quick.
However, I am hoping to do this very lean on AWS or GCP (read: cheap), and since the libpostal trained model seems to be almost 4 GB because it covers the whole world, I was wondering if anyone knows a way to do this for a specific country only (the U.K. in my case).
Why is Kinto using Cliquet, and what is the difference between the two?
Disclaimer: I am one of the authors of both tools. Since this is a frequently asked question, I thought it would be relevant to share a proper answer here :)
At Mozilla Services we regularly implement and deploy micro-services.
Since most services share the same production requirements (in terms of monitoring, REST protocols etc.), we decided to develop and package a reusable toolkit using Cornice.
Kinto is one of those services. It uses Cliquet as one of its core libraries.
The Kinto HTTP API is made of several REST endpoints that all share a set of common properties (filterable, sortable, etc.). The common code base for those REST resources is implemented as a reusable class in Cliquet.
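To make the "reusable class" idea concrete, here is a toy sketch of a base class that gives every resource the same filtering and sorting behaviour. It is not Cliquet's actual API; every class, method and field name here is hypothetical, and records are plain dicts held in memory.

```python
class ListResource:
    """Hypothetical base class: subclasses only declare which fields are exposed."""
    sortable_fields = ()
    filterable_fields = ()

    def __init__(self, records):
        self.records = list(records)

    def fetch(self, filters=None, sort=None):
        """Return records matching `filters`, ordered by `sort` ("-field" = descending)."""
        filters = filters or {}
        unknown = set(filters) - set(self.filterable_fields)
        if unknown:
            raise ValueError(f"cannot filter on: {sorted(unknown)}")
        rows = [r for r in self.records
                if all(r.get(k) == v for k, v in filters.items())]
        if sort:
            reverse = sort.startswith("-")
            field = sort.lstrip("-")
            if field not in self.sortable_fields:
                raise ValueError(f"cannot sort on: {field}")
            rows.sort(key=lambda r: r[field], reverse=reverse)
        return rows


class ArticleResource(ListResource):
    # A concrete resource only states what may be filtered or sorted.
    sortable_fields = ("title", "year")
    filterable_fields = ("year",)
```

Each new endpoint then becomes a short subclass, which is the kind of sharing described above.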
We really like the name Cliquet. However, given the confusion about its scope, we will probably (some day) split it into two packages, called something like cornice-mozprod and cornice-crud.
Kinto and Cliquet have now been merged together, and Cliquet is no longer a thing.
See all the nifty details at https://mail.mozilla.org/pipermail/kinto/2016-May/000119.html
Can someone help me out with some suggestions for a full-text search engine that supports Python?
Right now we have a MySQL database in place and I'd like to add the ability to have a full-text search engine index some of the text in some of the tables in this database. This text data would be used by a web application to search for the corresponding records in the database. For instance, we would index the customer name information in our customer table, then run a full-text search on it from the web application to get the MySQL record for the customer.
I've looked (briefly) at Lucene, Swish-E, MongoDB and a few others, but I'm not sure what would be a good choice for me considering a couple of things:
I'm not a Java guy (though I've been programming for a long time),
we only want to search a relatively small set of data,
we're looking to index text in a MySQL database,
and would like that index to be updated in semi-realtime.
Any hints, tips or pointers would be greatly appreciated!
Have a look at Whoosh. I've heard it doesn't scale up terribly well (maybe that's fixed now) but for small collections, it might be useful.
For a scalable solution, consider using Lucene with PyLucene or Jython.
Building PyLucene a few months ago was one of the most painful experiences I have had. The project won't get any traction, IMHO, if it's so hard to build.
With a few other folks having the same itch to scratch, we started https://code.google.com/a/apache-extras.org/p/pylucene-extra/ to gather prebuilt pylucene and jcc eggs on several operating systems, Python versions and Java runtimes combos. It is not very active lately, though.
Whoosh might be a good fit, or you may want to have a look at Sphinx, ElasticSearch or HaystackSearch (CAVEAT: I did not work on any of these).
Or maybe try to access Solr via Python (there are a few APIs), which might be much easier than using PyLucene. Keep in mind that Lucene will still need a JVM to run, of course.
Since you don't have huge scalability needs, I would focus on simple usage and community support rather than performance and scale. Hope it helps.
Solr is a great wrapper around Lucene, and it greatly simplifies things. It doesn't require any Java tinkering for most things; you just need to configure some XML files. It does run as a separate process, so this may complicate your deployment.
I have had great results with pysolr, but really, you could write your own Python communication library, since Solr uses REST and it is really simple to send and retrieve data in either XML or JSON.
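To show how little is needed to talk to Solr directly, here is a small sketch that builds a query URL for the select handler and pulls the documents out of a JSON response. The core name "customers" and the field "name" are hypothetical, and the base URL assumes a default local install on port 8983.

```python
import json
from urllib.parse import urlencode


def build_solr_select_url(query, core, base_url="http://localhost:8983/solr", rows=10):
    """Return the URL for a JSON search against one Solr core."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{base_url}/{core}/select?{params}"


def extract_docs(response_body):
    """Pull the list of matching documents out of a Solr JSON response."""
    return json.loads(response_body)["response"]["docs"]
```

You would fetch the URL with urllib.request.urlopen (or any HTTP client) and pass the body to extract_docs, then use the ids in the returned documents to look up the full rows in MySQL.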
The site won't be that complicated and will resemble a modern blog (users, messages, news and other similar features).
Do I need to use a framework for this, and if so, which is best?
Pyramid, Django?
You certainly don't need a web framework to create a simple website. Given that you're new to Python and interested in building a Python website, I imagine this implies that you're interested in learning Python. If you're exclusively interested in learning Python through Django, there's no reason you can't jump into Django, as Ronak said, of course. He's right. It has a lot of documentation. But it will make for a somewhat odd intro to Python.
If I were in your shoes, I'd either start by making some offline programs first, or consider an ultra-lightweight framework. Many would advocate web2py or Pyramid as lightweight options. I might consider going even lighter: something like Bottle, where you're more or less just pairing functions with URLs. This way you can at least do a bit of hacking and trial-and-error, instead of launching right into Django.
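To show what "pairing functions with URLs" means, here is a tiny sketch that uses nothing beyond plain Python and the WSGI calling convention. It is not how Bottle is implemented; the names route, ROUTES and app are made up for the example.

```python
ROUTES = {}  # maps a URL path to the function that handles it


def route(path):
    """Register the decorated function as the handler for `path`."""
    def decorator(func):
        ROUTES[path] = func
        return func
    return decorator


@route("/")
def index():
    return "Hello from the front page"


@route("/news")
def news():
    return "No news yet"


def app(environ, start_response):
    """WSGI entry point: look up the handler for the requested path."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [handler().encode("utf-8")]

# To try it in a browser, serve `app` with the standard library:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```

A micro-framework gives you roughly this, plus templates, request parsing and so on, which is why it is a gentle way to see what is going on before moving to something bigger.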
It's not that Django doesn't use Python; it will tell you many times that it is in fact 'just Python.' But it's adapted at its core to be used in a large business setting (it came out of a newspaper's web team, the Lawrence Journal-World, I think). So it enforces various rules that are helpful in managing many different employees working on a project together. You may or may not wish for this kind of 'help.' It also means the scale of projects is assumed to be large and the time horizon limitless. If you want to see how a Python dictionary works, you may not want to spend a long time configuring settings and creating the pseudo-static-typing you need for your database, and so on, just to execute your project and see a result.
I realize I will automatically get downvoted for this, but I believe it to be sound advice.
It depends on what kind of website you are planning to build. If the website is going to be just a set of static HTML files, then you don't really need a framework. But if your website will have lots of dynamic content that gets updated on a regular basis, you should go with a framework. That will make maintaining the website much simpler.
Django is the most popular framework written in Python. It has very good documentation and a strong community base too.
Go with Django - 10,000 Elvis fans can't be wrong.
Or roll your own from scratch. You'll learn a lot, know everything about how your site works, and better appreciate what a framework does for you.
As RonakG first pointed out, it all depends on the kind of website you intend to have up and running. Actually, your question is too general for a single, definitive answer. There are more aspects to consider than just being in Python. For example, deadlines. This means considering the learning curve to achieve your results. If you don't have much time, a steep learning curve (time spent learning the tool before you can develop with it) is certainly something you will want to avoid. Perhaps you already develop in other languages and need integration and/or migration support, scalability, reusability, and so on.
Another thing that is not so clear in your question is what you mean by "The site won't be that complicated and will resemble a modern blog (users, messages, news and other similar features)". If it really resembles just a modern blog, with users, messages and news, you could google for a CMS (Content Management System). There are many options available that could have your site up and running in almost no time. All you'll have to learn is how to customize it to meet your needs.
That said, if you prefer Python, there are some good CMSs available with which you can develop your site quickly, like Plone. And if you prefer Django, there's Django CMS and there's the excellent Pinax project, which builds on Django's code reusability to give you sample websites that are complete and fully customizable.
I am having to look into some code and consider working in a Python framework called Glashammer.
I know and love Django. I have some experience with the App Engine native framework and Django on App Engine.
I'd like to hear from those of you who have used one or more of these how Glashammer compares and contrasts with the others. What are the pros and cons, and what else do I need to be aware of?
I am highly biased, because I am the Glashammer author. But the pros for me are:
Werkzeug-based framework removes much of the boilerplate in creating Werkzeug based applications
Easy pluggability and high flexibility: 2 levels of plugins, Appliances which are reusable components and Bundles which are behavioural modifiers.
Well unit tested
Documentation is not bad (for an open source project)
Versus something like Django, I would just have to say "Werkzeug-based, with a nicer plugin framework."
Did I mention the code is beautiful like a glowing orb of ... (oh maybe this is subjective)
After a bit of googling (and finding your question:) and half an hour of reading docs and code I can say that
Glashammer is great because it:
is well-documented;
is lightweight and very flexible;
provides almost everything to rapidly build a complex web app -- unlike Werkzeug itself;
does not suffer from NIH syndrome, which is arguably Django's greatest wart;
does not impose database-related libraries and thus supports whatever storage one could use from Python. Django only supports a number of relational databases and assumes you are happy with them. Of course you can drop the Django ORM, but this renders the admin (Django's strongest point) useless;
appliances are the best way to define views I've seen so far.
Glashammer is not so great because it:
has shorter development history and much smaller community than Django's, which leads to:
inevitably lower quality of core code, and
inevitably lower quantity of contributed code;
makes use of some components that may be unstable (e.g. flatland which is in alpha stage, though it's an arbitrary label and may be irrelevant to quality; moreover, it's only used in glashammer.utils.yconfig);
does not provide an API to define models (e.g. some declarative wrapper with backends), so the "pluginability" of applications can be significantly weaker than in Django (applications will make too many assumptions about the environment).
Anyway, I think this framework is worth diving into.