Plural of words using Open Office API for Python (UNO) - python

I would like to retrieve the plural forms of words in different languages in Python.
I know that OpenOffice has an API called UNO (import uno), and it should let me do this using OpenOffice's language dictionaries, but I could not find any reference to it.
As a concrete example, I would like something like this:
>>> print getPluralOf('table')
tables
One possibility is to download the dictionary files through this link and write a method that reads the dictionary and forms the plurals. But I can't believe that this is not already available through uno.
I appreciate any help.

You can introspect the module with dir(uno) and then try dir() on uno.XXX, with whatever looks helpful. You can also use help() on uno and its members. I've never used it and I don't have access to OO on this computer so I can't help more than that...
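As a rough sketch of that kind of introspection, the snippet below lists the public names of a module. It uses the standard-library json module as a stand-in, because uno is only importable where OpenOffice is installed. public_names is a hypothetical helper, not part of uno.

```python
import json  # stand-in module; swap in `uno` where OpenOffice is installed


def public_names(obj):
    """Return the names on obj that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


# Print one candidate name per line, then call help() on whatever looks useful.
for name in public_names(json):
    print(name)
```

The same call works on any member you find this way, for example public_names(json.decoder), so you can walk down the module one level at a time.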

Nodebox Linguistics includes a convenient function for pluralizing nouns, albeit only in English.
>>> import en
>>> en.noun.plural('table')
'tables'
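If installing Nodebox is not an option, a very rough English-only fallback can be written with a few suffix rules. This is only a sketch under that assumption: it is not Nodebox's algorithm, get_plural_of and the IRREGULAR table are hypothetical names, and many nouns will come out wrong.

```python
# A few irregular nouns; this table is illustrative, not exhaustive.
IRREGULAR = {"child": "children", "man": "men", "mouse": "mice", "foot": "feet"}


def get_plural_of(noun):
    """Return a best-guess English plural using simple suffix rules."""
    word = noun.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    # Sibilant endings take "es": bus -> buses, box -> boxes, dish -> dishes.
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    # Consonant + y becomes "ies": city -> cities (but day -> days).
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


print(get_plural_of("table"))  # tables
```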

How to perform a Google search and take the text result?

I am wondering how to use Python 3 with Google to build a dictionary of words: I enter a word, and Python takes the definition that Google gives and then stores or displays it.
I haven't done much coding, but I know how to manage the words after. I'm just a bit confused using urllib and stuff. I have only been able to find help for this on other versions of Python, which I have not been able to replicate on Python 3.3.
EDIT: Yes, I want to use Google because I like the way it defines words and phrases, and I plan to use the define protocol you mentioned, icedtrees.
Edit: it appears that Google Search grabs its definitions using AJAX calls or something. The below solution will not work.
If you are having trouble using urllib (urllib.request in Python 3, urllib2 in Python 2), I suggest the nice Python Requests package, which is a lot easier to use.
If you are absolutely committed to getting the Google definition and no other definition, I would suggest making an HTTP request to a page using the Google Search "define" protocol.
For example:
https://www.google.com.au/search?q=define:test
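A small sketch of building that URL for an arbitrary word with the standard library follows. define_url and its host parameter are hypothetical names, and the colon comes out percent-encoded as %3A, which is an equivalent form of the same query.

```python
from urllib.parse import urlencode


def define_url(word, host="www.google.com.au"):
    """Build the search URL for the "define:" query shown above."""
    query = urlencode({"q": "define:" + word})
    return "https://{}/search?{}".format(host, query)


print(define_url("test"))
```

The resulting string can be passed to requests.get() or urllib.request.urlopen() to fetch the page.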
You would then save the HTML result, and then parse it for the definitions that you require. Some examples of Python HTML parsers are the HTMLParser module, and also BeautifulSoup. However, this parsing operation seems pretty simple, so a basic regex should be more than enough. All definitions are stored as follows:
<div style="display:inline" data-dobid="dfn"> # the order of the style and the data-dobid can change
<span>definition goes here</span>
</div>
An example of a regex to grab the definitions of "test" from the HTML page:
import re
definitions = re.findall(r'data-dobid="dfn".*?>.*?\<span>(.*?)</span>.*?</div>', html, re.DOTALL)
>>> len(definitions)
18
>>> definitions[0]
'a\n procedure intended to establish the quality, performance, or \nreliability of something, especially before it is taken into widespread \nuse.'
# Looks like you might need to remove the newlines
>>> definitions[5]
'the result of a medical examination or analytical procedure.'
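To tidy up those embedded newlines, here is a minimal sketch that runs a slightly simplified version of the regex over a small sample and collapses runs of whitespace. The sample_html string is made up to match the markup described above, and extract_definitions is a hypothetical helper name.

```python
import re

# Hypothetical sample shaped like the markup described above.
sample_html = """
<div style="display:inline" data-dobid="dfn">
<span>a
 procedure intended to establish the quality of something.</span>
</div>
<div data-dobid="dfn" style="display:inline">
<span>the result of a medical examination.</span>
</div>
"""

# DOTALL lets ".*?" cross line breaks between the attribute and the span.
DEFINITION_RE = re.compile(r'data-dobid="dfn".*?<span>(.*?)</span>', re.DOTALL)


def extract_definitions(html):
    """Return the definition texts with runs of whitespace collapsed."""
    return [" ".join(text.split()) for text in DEFINITION_RE.findall(html)]


for definition in extract_definitions(sample_html):
    print(definition)
```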
As a sidenote, there also exists a Google Dictionary API, which can give you definition results in JSON format in response to a request.

Microsoft Speech Recognition Custom Training

I have been wanting to create an application using the Microsoft Speech Recognition.
My application's users are expected to often say abbreviated things, such as 'LHC' for 'Large Hadron Collider' or 'CERN'. Given that exact order, my application will return
You said: At age C.
You said: Cern
While it did work for 'CERN', it failed very badly for 'LHC'.
However, if I could make my own custom training files, I could easily place the term 'LHC' somewhere in there. Then, I could make the user access the Speech Control Panel and run my training file.
All the links I have found for this have been frustratingly useless, as they just say things like 'This is ----, you should try going to the ---- forum instead'.
If it does help, here is a list of the links:
http://compgroups.net/comp.speech.users/add-my-own-training/153194
https://groups.google.com/forum/#!topic/microsoft.public.speech.server/v58SH1ov22s
http://social.msdn.microsoft.com/Forums/en/servercorefordevelopers/thread/f7a35f3f-b352-464a-b264-e16eb4afd049
Is my problem even possible? Or are the training files themselves in a special format? If so, can that format be reproduced?
A solution that can also work on Windows XP would be ideal.
Thanks in advance!
P.S. If there are any libraries or modules out there already for this, could anyone point me to some? A Python or C/C++ solution would be splendid. Also, since I'd rather not post another question regarding this, is it possible to utilize the train utilities from command prompt (or without the GUI visible, but still having total command of all controls)?
Okay, pulling this from a thing I wrote three or four years ago now, but I believe you want to do something like this.
The grammar library is a trained system which can recognize words. You can create your own grammar library cued to specific words.
C#, sorry
using System.Speech;
using System.Speech.Recognition;
using System.Speech.AudioFormat;
SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
string[] words = {"L H C", "CERN"};
Choices choices = new Choices(words);
GrammarBuilder gb = new GrammarBuilder(choices);
Grammar grammar = new Grammar(gb);
sre.LoadGrammar(grammar);
That is as far as I can get you. From docs it looks like you can define the pronunciations somehow. So perhaps that way you could have LHC map directly to a single word. Here are the docs on the grammar class - http://msdn.microsoft.com/en-us/library/system.speech.recognition.grammar.aspx
Small update - see example in their docs here http://msdn.microsoft.com/en-us/library/ms554228.aspx

Python pefile member question

Gang,
I apologize if this is a really dumb question. I want to use the super convenient Python script pefile (http://code.google.com/p/pefile/), which parses an executable and lists particular information about the PE structure. My question is: where can I find information about how to access particular members of the executable? I've scoured the wiki and read the usage examples, but that documentation only covers 4-5 members. Do you guys have a list of the members I can access to display the information I care about? Specifically, if I wanted to list the Stack Commit Size of an executable, would it look like pe.FILE_HEADER.StackCommitSize? Obviously I can run this code and figure it out, but have you seen API docs floating around where I can find the members I need?
THANKS!
From the PE docstring:
Basic headers information will be available in the attributes:
DOS_HEADER
NT_HEADERS
FILE_HEADER
OPTIONAL_HEADER
All of them will contain among their attributes the members of the
corresponding structures as defined in WINNT.H
So, look at winnt.h and you'll see which attributes are available.
Or just read the source code for the module. It's big, but everything you need to know is in there.
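To illustrate how the attribute names follow WINNT.H, here is a sketch that decodes an IMAGE_FILE_HEADER by hand with the struct module. It is not pefile's own code. ImageFileHeader and parse_file_header are hypothetical names, and the sample bytes are synthetic rather than read from a real executable.

```python
import struct
from collections import namedtuple

# Field names and order follow IMAGE_FILE_HEADER in WINNT.H (20 bytes).
ImageFileHeader = namedtuple(
    "ImageFileHeader",
    "Machine NumberOfSections TimeDateStamp PointerToSymbolTable "
    "NumberOfSymbols SizeOfOptionalHeader Characteristics",
)
# Little-endian: WORD, WORD, DWORD, DWORD, DWORD, WORD, WORD.
FILE_HEADER_FORMAT = "<HHIIIHH"


def parse_file_header(raw):
    """Decode the leading bytes of raw as an IMAGE_FILE_HEADER."""
    size = struct.calcsize(FILE_HEADER_FORMAT)
    return ImageFileHeader(*struct.unpack(FILE_HEADER_FORMAT, raw[:size]))


# Synthetic header bytes for illustration only.
sample = struct.pack(FILE_HEADER_FORMAT, 0x014C, 3, 0, 0, 0, 224, 0x0102)
header = parse_file_header(sample)
print(header.NumberOfSections)
print(ImageFileHeader._fields)
```

The names printed by ImageFileHeader._fields are the structure members from WINNT.H, which is why looking a field up in that header tells you the attribute name to try on pe.FILE_HEADER.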
You can generally find everything that is in a PE file in the Microsoft PE/COFF specification.
Once you've looked there, you know that the StackCommitSize is in the optional image header. Then all you have to do is to look for the corresponding structure in pefile, which usually bears a similar name, if not indeed the very same name. In this case:
import pefile
pe = pefile.PE("C:\\Windows\\Notepad.exe")
print(pe.OPTIONAL_HEADER.SizeOfStackCommit)
Will give you what you want.
If you have trouble finding SizeOfStackCommit (after you've found it in the specification), just use your quick find on the source code. It's as easy to read as you can get, and I don't think you'll have any trouble finding the required structure.
Now, there probably aren't any API docs for pefile itself, but as you can see there's really no need for it, since it's just a nice Pythonic wrapper around the PE specification itself.

Media Kind in iTunes COM for Windows SDK

I recently found out about the awesomeness of the iTunes COM for Windows SDK. I am using Python with win32com to talk to my iTunes library. Needless to say, my head is in the process of exploding. This API rocks.
I have one issue though, how do I access the Media Kind attribute of the track? I looked through the help file provided in the SDK and saw no sign of it. If you go into iTunes, you can modify the track's media kind. This way if you have an audiobook that is showing up in your music library, you can set the Media Kind to Audiobook and it will appear in the Books section in iTunes. Pretty nifty.
The reason I ask is because I have a whole crap load of audiobooks that are showing up in my LibraryPlaylist.
Here is my code thus far.
import win32com.client
iTunes = win32com.client.gencache.EnsureDispatch('iTunes.Application')
track = win32com.client.CastTo(iTunes.LibraryPlaylist.Tracks.Item(1), 'IITFileOrCDTrack')
print track.Artist, '-', track.Name
print
print 'Is this track an audiobook?'
print 'How the hell should I know?'
Thanks in advance.
One reason why you may not be able to find it is that the atom structure the com object references may be out of date. The most popular list of atoms from the MP4 structure is here: http://atomicparsley.sourceforge.net/mpeg-4files.html I don't see a media kind atom. I suppose you could try to parse the structure through atomicparsley but to my knowledge it only finds atoms that it knows about.
Short Answer: The COM object may not know about the MediaKind Attribute.
The only reference I can find to that "Media Kind" attribute is the ITUserPlaylistSpecialKind enum. The only place that is used is a getter method IITUserPlaylist::SpecialKind. So it seems that is a read-only playlist-level attribute. I would guess that in order to read it you need to get the playlist of the track and then get the playlist's SpecialKind attribute. In order to write it, you probably have to move the track to the appropriate playlist.
Well, the Media Kind is in interface IITTrack.Kind, but that probably isn't what you want - the answer will be one of:
public enum ITTrackKind
{
ITTrackKindUnknown = 0,
ITTrackKindFile = 1,
ITTrackKindCD = 2,
ITTrackKindURL = 3,
ITTrackKindDevice = 4,
ITTrackKindSharedLibrary = 5,
}
Probably you need to look at IITTrack.Genre, which gives the string form of the ID3 tag Genre, so you can find "Audiobook" or Apple's "Books & Spoken". (Some genres are treated specially by iTunes/iPods).
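A minimal sketch of that genre check, kept free of COM calls so it can run anywhere, might look like this. looks_like_audiobook and AUDIOBOOK_GENRES are hypothetical names, and the set holds only the two genre strings mentioned above.

```python
# Genre strings mentioned above; extend the set to match your own library.
AUDIOBOOK_GENRES = {"audiobook", "books & spoken"}


def looks_like_audiobook(genre):
    """Return True if a track's Genre string suggests an audiobook."""
    if not genre:
        return False
    return genre.strip().lower() in AUDIOBOOK_GENRES


print(looks_like_audiobook("Books & Spoken"))
```

In the question's script you would pass it track.Genre for each track in the playlist.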
Tip: the compiled help file in the iTunes SDK I downloaded seemed to be broken - I had to convert it back to HTML files and use Firefox/grep to find the information I needed.
It is actually quite easy: use the IITFileOrCDTrack.Podcast property:
yourTrack.Podcast
If it is a podcast, it will return True, otherwise it will return False.
You can of course set it through
yourTrack.Podcast(bool)
Glad I could help.

Search function with PyGTKsourceview

I'm writing a small HTML editor in Python, mostly for personal use, and have integrated a gtksourceview2 object into my Python code. All the major functions seem to work more or less, but I'm having trouble getting a search function to work. Obviously the GUI work is already done, but I can't figure out how to use the built-in methods of the GTKsourceview.Buffer object (http://www.gnome.org/~gianmt/pygtksourceview2/class-gtksourcebuffer2.html) to actually search through the text in it.
Does anybody have a suggestion? I find the documentation not very verbose and can't really find a working example on the web.
Thanks in advance.
The reference for the C API can probably be helpful, including this chapter that I found "Searching in a GtkSourceBuffer".
As is the reference for the superclass gtk.TextBuffer
Here is the Python doc; I couldn't find any up-to-date documentation, so I stuffed it in my dropbox. Here is the link. What you want to look at are the gtk.TextIter.forward_search and gtk.TextIter.backward_search methods.
