Solr Search Spell check and Stemming configuration without using text file - python

I need help to get some information on Solr-Search. Below is the problem statement:
Problem Statement
Implement spell check functionality (like Google's "Did you mean").
Stemming of search words, e.g. dose, dossier, dosing. If someone searches for dose, the results should also include dossier and dosing.
Requirement
I need to implement both features without using any manually maintained text file, such as spellcheck.txt for spell check or synonym.txt for stemming. I want this to be configured through the search engine, and I want it to use a general English dictionary.
My Understanding
Solr does not provide any dictionary. Spell check can be implemented by providing a text file for spell check.
For stemming we also need to upload a text file.
I need to reference these files in Solr's schema.xml, and the text files need to be maintained manually.
I would like to confirm whether there is any other way to configure a general dictionary with Solr, or any other way to meet these requirements through Solr configuration changes without using text files.

You can use the DirectSolrSpellChecker, which takes its suggestions from the index itself, so no dictionary files are needed.
You don't need text files for stemming, just an analyzer with a stemming filter.
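Since the question is tagged python, here is a minimal sketch of how a client might ask Solr for "did you mean" suggestions once a spellcheck component is configured. It only builds the request URL. The base URL, the core name `mycore`, and the assumption that the spellcheck component is attached to the `/select` handler are all hypothetical and depend on your solrconfig.xml.

```python
from urllib.parse import urlencode


def build_spellcheck_url(base_url, core, query, count=5):
    """Build a Solr select URL that also requests spelling suggestions.

    Assumes (hypothetically) that the spellcheck search component is
    enabled on the /select request handler of the given core.
    """
    params = {
        "q": query,
        "spellcheck": "true",          # turn the spellcheck component on
        "spellcheck.q": query,         # text to check for misspellings
        "spellcheck.count": count,     # max suggestions per term
        "spellcheck.collate": "true",  # ask for a rewritten full query
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
url = build_spellcheck_url("http://localhost:8983/solr", "mycore", "dosier")
print(url)
```

The response would then carry a spellcheck section next to the normal results, which the client can show as a suggestion.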

Related

Is there a way to scan a document and copy specific information into a word template?

Essentially I need to be able to scan an invoice, pull only the names and then insert those names into a Word template for printing. Preferably the solution will open multiple Word documents at a time, so the user only has to hit print. The main issue is that there needs to be minimal interaction from a user perspective. I'm strong in Python and weak in Java, if that helps.
You can achieve that in Java using Tesseract OCR and Apache POI:
https://github.com/tesseract-ocr/tesseract
https://poi.apache.org/

How to create a dynamic form with python using translated text as input?

I have an original text that I want to translate. I normally do it manually, but I know I could save a lot of time by automatically translating the most frequent words and expressions.
I will find out how to translate simple words; that is not the problem. I have read some books on Python and I think it can be done with string manipulation.
But I am lost about how to create the output file.
The output file will contain:
short empty forms ready to be filled wherever there is text that has not been translated
the translated words wherever they were in the original file
In the output file I will fill in the empty forms manually; after pressing Tab, the cursor should jump to the next empty form.
I am lost here. I know how to build forms in HTML, but the language I am used to is Python.
I would like to know what modules from Python I could use. I need some guidance on this.
Can you recommend me a book or a tool that explains how to do something similar to this?
This is what I want to do, assuming I have managed to create a simple database to translate colors from Spanish to English.
The first step contains the original file.
The second step contains the automatic translation.
In the third step I complete the manual translation.
After finishing everything is grouped into a normal txt file ready to be used.
I think it is quite clear. I don't expect people to tell me the code to do this, I just need to know what tools could be used to achieve my goal.
Thanks for editing.
To create an interface that works with a web browser, Flask for Python is a good method for creating webforms. There are tutorials available.
One method for storing data would be an SQLite file. That may be more than you need, so I'd recommend starting with a CSV file. Libraries exist in Python for both CSVs and SQLite.
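As a rough illustration of the CSV suggestion, the sketch below loads a tiny glossary from CSV text and renders a line of source text as an HTML form: known words are replaced by their translation and unknown words become empty input fields (a browser moves between inputs with Tab). The glossary contents, the function names and the field naming scheme are made up for the example. It uses only the standard library, so serving the result through Flask is left out.

```python
import csv
import html
import io


def load_glossary(csv_text):
    """Read 'source,translation' rows into a lowercase lookup dict."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip().lower(): row[1].strip()
            for row in reader if len(row) == 2}


def render_form(text, glossary):
    """Translate known words; emit an empty input field for the rest."""
    parts = []
    for word in text.split():
        translated = glossary.get(word.lower())
        if translated is not None:
            parts.append(html.escape(translated))
        else:
            # The original word is shown as a placeholder hint.
            parts.append('<input type="text" name="w{}" placeholder="{}">'.format(
                len(parts), html.escape(word, quote=True)))
    return "<form>" + " ".join(parts) + "</form>"


# Hypothetical Spanish-to-English colour glossary.
GLOSSARY_CSV = "rojo,red\nazul,blue\n"
glossary = load_glossary(GLOSSARY_CSV)
print(render_form("rojo casa azul", glossary))
```

In a real tool the glossary would be read from a file with `open(...)`, and the submitted form values would be merged back into the final text file.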

What python modules should I use to edit a word document then turn it to a pdf?

I want users to be able to create the report template in Microsoft Word; I'll then probably add document fields. The script then evaluates a number of things, adds the appropriate text to the fields, and creates a PDF of the filled-in form.
So which modules would be best for this? I've looked at reportlab but I need to work from a pre-generated template and that doesn't seem feasible.
If you will only use it under Windows with Word installed, you could use PyWin32, which lets you access the API of the Office suite. You could also try IronPython as suggested here.
If you need to read a docx template regardless of the platform you could try this outdated extension.
If it suits your application to use a cloud service to populate Doc/DocX files, there is a commercial system called Docmosis that can populate plain-text (or merge) fields and stream back populated PDF documents to your Python system, or deliver them via email etc.
You would upload your "template" Doc files to Docmosis via the website (or API calls), then invoke Docmosis using an HTTPS POST from your Python code.
Please note I work for the company that created Docmosis.
Hope that helps.

How to Customize the full text search function provided by postgresql

I am trying to customize the PostgreSQL full text search functionality so that it tolerates small spelling differences.
For example, I can enter "Stouffers" and get a match on Stouffer's frozen foods. If I leave off one of the "f"s and spell it "Stoufers" I don't get a match. That's one of the things that the customized text search is supposed to handle. It converts all the text into a phonetic type code and searches on that.
Please help me with how I can achieve that.
I found some advice saying that I need to write a custom parser in C for this, but I am very poor at C.
Maybe you can try using the ISpell dictionary for postgresql or Tsearch2 which has some spelling correction module.
Or get a standalone search engine such as Solr or Xapian with stemming, spelling, phonetic etc.
Django-haystack brings you both of them.
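To make the "phonetic type code" idea from the question concrete, here is a small Python sketch of American Soundex, one common phonetic code. This is an illustration of the general technique, not what PostgreSQL full text search or the engines above do internally (PostgreSQL does ship a `soundex()` function in its fuzzystrmatch extension).

```python
# Letter groups that share a Soundex digit.
_CODES = {}
for _digit, _letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                         ("4", "L"), ("5", "MN"), ("6", "R")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


# Both spellings from the question collapse to the same code.
print(soundex("Stouffer's"), soundex("Stoufers"))
```

Storing such a code next to each indexed term and comparing codes at query time is one way to get the misspelling tolerance described in the question.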

How to programmatically insert comments into a Microsoft Word document?

Looking for a way to programmatically insert comments (using the comments feature in Word) into a specific location in a MS Word document. I would prefer an approach that is usable across recent versions of MS Word standard formats and implementable in a non-Windows environment (ideally using Python and/or Common Lisp). I have been looking at the OpenXML SDK but can't seem to find a solution there.
Here is what I did:
Create a simple document with Word (i.e. a very small one).
Add a comment in Word.
Save as docx.
Use Python's zipfile module to access the archive (docx files are ZIP archives).
Dump the content of the entry "word/document.xml" in the archive. This is the XML of the document itself.
This should give you an idea what you need to do. After that, you can use one of the XML libraries in Python to parse the document, change it and add it back to a new ZIP archive with the extension ".docx". Simply copy every other entry from the original ZIP and you have a new, valid Word document.
There is also a library which might help: openxmllib
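The steps above can be sketched with the standard library alone. The two helpers below (their names are made up for this example) read `word/document.xml` out of a docx and write a new archive in which only that entry is replaced while every other entry is copied unchanged. The demo uses a tiny stand-in archive built in memory rather than a real Word file, and it does not generate the comment markup itself; a real comment most likely involves more than one part of the archive, which inspecting a sample document would reveal.

```python
import io
import zipfile

DOCUMENT_ENTRY = "word/document.xml"


def read_document_xml(docx_bytes):
    """Return the main document XML of a .docx as text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read(DOCUMENT_ENTRY).decode("utf-8")


def replace_document_xml(docx_bytes, new_xml):
    """Copy every entry to a new archive, swapping in new_xml."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == DOCUMENT_ENTRY:
                data = new_xml.encode("utf-8")
            dst.writestr(item, data)
    return out.getvalue()


# Stand-in archive for the demo; a real file would be read with
# open("some.docx", "rb").read().
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as _z:
    _z.writestr(DOCUMENT_ENTRY, "<doc>old</doc>")
    _z.writestr("other.xml", "<x/>")
sample = _buf.getvalue()

updated = replace_document_xml(sample, "<doc>new</doc>")
print(read_document_xml(updated))
```

Working on bytes keeps the sketch independent of file paths; the XML edit itself would sit between the read and the replace call, using an XML library of your choice.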
If this is server side (non-interactive), use of the Word application itself is unsupported (but I see this is not applicable). So either take that route or use the OpenXML SDK to learn the markup needed to create a comment. With that knowledge it is all about manipulating data.
The .docx format is a ZIP of XML files with a defined structure, so once you get into the ZIP and find the right XML file it mostly becomes a matter of modifying an XML DOM.
The best route might be to take a docx, copy it, add a comment (using Word) to one, and compare. A diff will show you the kind of elements/structures you need to be looking up in the SDK (or ISO/Ecma standard).
