Pyenchant Module - Spell checker

Pyenchant Module - Spell checker - python

How do I trim the output of Python Pyenchat Module's 'suggested words list ?
Quite often it gives me a huge list of 20 suggested words that looks awkward when displayed on the screen and also has a tendency to go out of the screen .

Like sentinel, I'm not sure if the problem you're having is specific to pyenchant or a python-familiarity issue. If I assume the latter, you could simply select the number of values you'd like as part of your program. In simple form, this could be as easy as:
suggestion_list = pyenchant_function(document_filled_with_typos)
number_of_suggestions = len(suggestion_list)
MAX_SUGGESTIONS = 3 # you choose what you like
if number_of_suggestions > MAX_SUGGESTIONS:
answer = suggestion_list[0:(MAX_Suggestions-1)] # python lists are indexed to 0
else:
answer = suggestion_list
Note: I'm choosing to be clear rather than concise here, since I'm guessing that will be valued by asker, if asker is unclear on using list indices.
Hope this helps and good luck with python.

Assuming it returns a standard Python list, you use standard Python slicing syntax. E.g. suggestedwords[:10] gets just the first 10.

Related

data exchange between R and python (music21)

My goal is to take a text file with a number list generated by R (e.g 1 2 3 4), and "translate" the numbers into music21 notes (that is, to compose a melody when each note is identified with a number).
Having the number list, one idea I had was creating a R vector with strings that matches with music21 note names, and trying to get a new output with the note names instead of numbers. But I'm not very sure of that. Besides, I don't know how to proceed after that.
I also read some topics talking about using R as a subprocess in Python, but again, I couldn't clearly understand how that works (the fact that running the subprocess almost makes my poor old laptop crash had something to do with that...)
How can I proceed here?

Personally, I would try to use only python. I realize you have little experience with it; but python is more general purpose than R and should be able to do anything R can do. Trying to use both at the same time seems like it would generate additional complexity and overhead you simply don't need.
It looks this music21 takes notes and lengths; however there are also rests. Let's say you have a list for durations called "durations", and a list for notes (and rests) called notes:
from music21 import *
mymusic = stream.Stream()
notes = ["F4", "F4", "rest", "F4"]
durations = [0.25, 1, 0.25, 1]
for n,d in zip(notes, durations):
if n == "rest":
mymusic.append(note.Rest(d))
else:
mymusic.append(note.Note(n,d))
mymusic.show("midi")
Music21 uses a special kind of list called a stream. We're making an empty stream first, and then populating it with notes and durations. Zip lets us walk through both lists at the same time. We chekc if the note is supposed to be a rest; if it is a rest we add the rest with the right duration, else we continue to add a note of the right duration. (notice I am not a composer, you could generate the notes and durations any way you like :-) ).
If you really wanted to; you could write a csv file or something of notes and durations in R and read that in python. However, I think generating the lists in python itself is a cleaner approach.
Thanks for introducing me to this music21 library, it looks very neat.

Python: replacing multiple words in a text file from a dictionary

I am having trouble figuring out where I'm going wrong. So I need to randomly replace words and re-write them to the text file, until it no longer makes sense to anyone else. I chose some words just to test it, and have written the following code which is not currently working:
# A program to read a file and replace words until it is no longer understandable
word_replacement = {'Python':'Silly Snake', 'programming':'snake charming', 'system':'table', 'systems':'tables', 'language':'spell', 'languages':'spells', 'code':'snake', 'interpreter':'charmer'}
main = open("INF108.txt", 'r+')
words = main.read().split()
main.close()
for x in word_replacement:
for y in words:
if word_replacement[x][0]==y:
y==x[1]
text = " ".join(words)
print text
new_main = open("INF108.txt", 'w')
new_main.write(text)
new_main.close()
This is the text in the file:
Python is a widely used general-purpose, high-level programming
language. It's design philosophy emphasizes code readability, and its
syntax allows programmers to express concepts in fewer lines of code
than would be possible in languages such as C++ or Java. The language
provides constructs intended to enable clear programs on both a small
and large scale.Python supports multiple programming paradigms,
including object-oriented, imperative and functional programming or
procedural styles. It features a dynamic type system and automatic
memory management and has a large and comprehensive standard
library.Python interpreters are available for installation on many
operating systems, allowing Python code execution on a wide variety
of systems. Using third- party tools, such as Py2exe or Pyinstaller,
Python code can be packaged into stand-alone executable programs for
some of the most popular operating systems, allowing for the
distribution of Python-based software for use on those environments
without requiring the installation of a Python interpreter.
I've tried a few methods of this but as someone new to Python it's been a matter of guessing, and the last two days spent researching it online, but most of the answers I've found are either far too complicated for me to understand, or are specific to that person's code and don't help me.

OK, let's take this step by step.
main = open("INF108.txt", 'r+')
words = main.read().split()
main.close()
Better to use the with statement here. Also, r is the default mode. Thus:
with open("INF108.txt") as main:
words = main.read().split()
Using with will make main.close() get called automatically for you when this block ends; you should do the same for the file write at the end as well.
Now for the main bit:
for x in word_replacement:
for y in words:
if word_replacement[x][0]==y:
y==x[1]
This little section has several misconceptions packed into it:
Iterating over a dictionary (for x in word_replacement) gives you its keys only. Thus, when you want to compare later on, you should just be checking if word_replacement[x] == y. Doing a [0] on that just gives you the first letter of the replacement.
Iterating over the dictionary is defeating the purpose of having a dictionary in the first place. Just loop over the words you want to replace, and check if they're in the dictionary using y in word_replacement.
y == x[1] is wrong in two ways. First of all, you probably meant to be assigning to y there, not comparing (i.e. y = x[1] -- note the single = sign). Second, assigning to a loop variable doesn't even do what you want. y will just get overwritten with a new value next time around the loop, and the words data will NOT get changed at all.
What you want to do is create a new list of possibly-replaced words, like so:
replaced = []
for y in words:
if y in word_replacement:
replaced.append(word_replacement[y])
else:
replaced.append(y)
text = ' '.join(replaced)
Now let's do some refinement. Dictionaries have a handy get method that lets you get a value if the key is present, or a default if it's not. If we just use the word itself as a default, we get a nifty reduction:
replaced = []
for y in words:
replacement = word_replacement.get(y, y)
replaced.append(replacement)
text = ' '.join(replaced)
Which you can just turn into a one-line list-comprehension:
text = ' '.join(word_replacement.get(y, y) for y in words)
And now we're done.

It looks like you want something like this as your if statement in the nested loops:
if x==y:
y=word_replacement[x]
When you loop over a dictionary, you get its keys, not key-value pairs:
>>> mydict={'Python':'Silly Snake', 'programming':'snake charming', 'system':'table'}
>>> for i in mydict:
... print i
Python
programming
system
You can then get the value with mydict[i].
This doesn't quite work, though, because assigning to y doesn't change that element of words. You can loop over its indices instead of elements to assign to the current element:
for x in word_replacement:
for y in range(len(words)):
if x==words[y]:
words[y]=word_replacement[x]
I'm using range() and len() here to get a list of indices of words ([0, 1, 2, ...])

Your issue is probably here:
if word_replacement[x][0]==y:
Here's a small example of what is actually happening, which is probably not what you intended:
w = {"Hello": "World", "Python": "Awesome"}
print w["Hello"]
print w["Hello"][0]
Which should result in:
"World"
"W"
You should be able to figure out how to correct the code from here.

You used word_replacement (which is a dictionary) in a wrong way. You should change for loop to something like this:
for y in words:
if y in word_replacement:
words[words.index(y)] = word_replacement[y]

Difference between two "contains" operations for python lists

I'm fairly new to python and have found that I need to query a list about whether it contains a certain item.
The majority of the postings I have seen on various websites (including this similar stackoverflow question) have all suggested something along the lines of
for i in list
if i == thingIAmLookingFor
return True
However, I have also found from one lone forum that
if thingIAmLookingFor in list
# do work
works.
I am wondering if the if thing in list method is shorthand for the for i in list method, or if it is implemented differently.
I would also like to which, if either, is more preferred.

In your simple example it is of course better to use in.
However... in the question you link to, in doesn't work (at least not directly) because the OP does not want to find an object that is equal to something, but an object whose attribute n is equal to something.
One answer does mention using in on a list comprehension, though I'm not sure why a generator expression wasn't used instead:
if 5 in (data.n for data in myList):
print "Found it"
But this is hardly much of an improvement over the other approaches, such as this one using any:
if any(data.n == 5 for data in myList):
print "Found it"

the "if x in thing:" format is strongly preferred, not just because it takes less code, but it also works on other data types and is (to me) easier to read.
I'm not sure how it's implemented, but I'd expect it to be quite a lot more efficient on datatypes that are stored in a more searchable form. eg. sets or dictionary keys.

The if thing in somelist is the preferred and fastest way.
Under-the-hood that use of the in-operator translates to somelist.__contains__(thing) whose implementation is equivalent to: any((x is thing or x == thing) for x in somelist).
Note the condition tests identity and then equality.

for i in list
if i == thingIAmLookingFor
return True
The above is a terrible way to test whether an item exists in a collection. It returns True from the function, so if you need the test as part of some code you'd need to move this into a separate utility function, or add thingWasFound = False before the loop and set it to True in the if statement (and then break), either of which is several lines of boilerplate for what could be a simple expression.
Plus, if you just use thingIAmLookingFor in list, this might execute more efficiently by doing fewer Python level operations (it'll need to do the same operations, but maybe in C, as list is a builtin type). But even more importantly, if list is actually bound to some other collection like a set or a dictionary thingIAmLookingFor in list will use the hash lookup mechanism such types support and be much more efficient, while using a for loop will force Python to go through every item in turn.
Obligatory post-script: list is a terrible name for a variable that contains a list as it shadows the list builtin, which can confuse you or anyone who reads your code. You're much better off naming it something that tells you something about what it means.

A better way to assign list into a var

Was coding something in Python. Have a piece of code, wanted to know if it can be done more elegantly...
# Statistics format is - done|remaining|200's|404's|size
statf = open(STATS_FILE, 'r').read()
starf = statf.strip().split('|')
done = int(starf[0])
rema = int(starf[1])
succ = int(starf[2])
fails = int(starf[3])
size = int(starf[4])
...
This goes on. I wanted to know if after splitting the line into a list, is there any better way to assign each list into a var. I have close to 30 lines assigning index values to vars. Just trying to learn more about Python that's it...

done, rema, succ, fails, size, ... = [int(x) for x in starf]
Better:
labels = ("done", "rema", "succ", "fails", "size")
data = dict(zip(labels, [int(x) for x in starf]))
print data['done']

What I don't like about the answers so far is that they stick everything in one expression. You want to reduce the redundancy in your code, without doing too much at once.
If all of the items on the line are ints, then convert them all together, so you don't have to write int(...) each time:
starf = [int(i) for i in starf]
If only certain items are ints--maybe some are strings or floats--then you can convert just those:
for i in 0,1,2,3,4:
starf[i] = int(starf[i]))
Assigning in blocks is useful; if you have many items--you said you had 30--you can split it up:
done, rema, succ = starf[0:2]
fails, size = starf[3:4]

I might use the csv module with a separator of | (though that might be overkill if you're "sure" the format will always be super-simple, single-line, no-strings, etc, etc). Like your low-level string processing, the csv reader will give you strings, and you'll need to call int on each (with a list comprehension or a map call) to get integers. Other tips include using the with statement to open your file, to ensure it won't cause a "file descriptor leak" (not indispensable in current CPython version, but an excellent idea for portability and future-proofing).
But I question the need for 30 separate barenames to represent 30 related values. Why not, for example, make a collections.NamedTuple type with appropriately-named fields, and initialize an instance thereof, then use qualified names for the fields, i.e., a nice namespace? Remember the last koan in the Zen of Python (import this at the interpreter prompt): "Namespaces are one honking great idea -- let's do more of those!"... barenames have their (limited;-) place, but representing dozens of related values is not one -- rather, this situation "cries out" for the "let's do more of those" approach (i.e., add one appropriate namespace grouping the related fields -- a much better way to organize your data).

Using a Python dict is probably the most elegant choice.
If you put your keys in a list as such:
keys = ("done", "rema", "succ" ... )
somedict = dict(zip(keys, [int(v) for v in values]))
That would work. :-) Looks better than 30 lines too :-)
EDIT: I think there are dict comphrensions now, so that may look even better too! :-)
EDIT Part 2: Also, for the keys collection, you'd want to break that into multpile lines.
EDIT Again: fixed buggy part :)

Thanks for all the answers. So here's the summary -
Glenn's answer was to handle this issue in blocks. i.e. done, rema, succ = starf[0:2] etc.
Leoluk's approach was more short & sweet taking advantage of python's immensely powerful dict comprehensions.
Alex's answer was more design oriented. Loved this approach. I know it should be done the way Alex suggested but lot of code re-factoring needs to take place for that. Not a good time to do it now.
townsean - same as 2
I have taken up Leoluk's approach. I am not sure what the speed implication for this is? I have no idea if List/Dict comprehensions take a hit on speed of execution. But it reduces the size of my code considerable for now. I'll optimize when the need comes :) Going by - "Pre-mature optimization is the root of all evil"...

How to Sort Arrays in Dictionary?

I'm currently writing a program in Python to track statistics on video games. An example of the dictionary I'm using to track the scores :
ten = 1
sec = 9
fir = 10
thi5 = 6
sec5 = 8
games = {
'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"],
'nethack': [fir+fir+fir+sec+thi5, "Nethack"]
}
Right now, I'm going about this the hard way, and making a big long list of nested ifs, but I don't think that's the proper way to go about it. I was trying to figure out a way to sort the dictionary, via the arrays, and then, finding a way to display the first ten that pop up... instead of having to work deep in the if statements.
So... basically, my question is : Do you have any ideas that I could use to about making this easier, instead of wayyyy, way harder?
===== EDIT ====
the ten+fir produces numbers. I want to find a way to go about sorting the lists (I lack the knowledge of proper terminology) to go by the number (basically, whichever ones have the highest number in the first part of the array go first.
Here's an example of my current way of going about it (though, it's incomplete, due to it being very tiresome : Example Nests (paste2) (let's try this one?)
==== SECOND EDIT ====
In case someone doesn't see my comment below :
ten, fir, et cetera - these are just variables for scores. Basically, it goes from a top ten list into a variable number.
ten = 1, nin = 2, fir = 10, fir5 = 10, sec5 = 8, sec = 9...
so : 'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"] actually registers as : 'adom': [1+10+9+8, "Ancient Domain of Mysteries"] , which ends up looking like :
'adom': [28, "Ancient Domain of Mysteries"]
So, basically, if I ended up doing the "top two" out of my example, it'd be :
((1)) Nethack (48)
((2)) ADOM (28)
I'd write an actual number, but I'm thinking of changing a few things up, so the numbers might be a touch different, and I wouldn't want to rewrite it.
== THIRD (AND HOPEFULLY THE FINAL) EDIT ==
Fixed my original code example.

How about something like this:
scores = games.items()
scores.sort(key = lambda key, value: value[0])
return scores[:10]
This will return the first 10 items, sorted by the first item in the array.
I'm not sure if this is what you want though, please update the question (and fix the example link) if you need something else...

import heapq
return heapq.nlargest(10, games.iteritems(), key=lambda k, v: v[0])
is the most direct way to get the top ten key / value pairs, sorted by the first item of each "value" list. If you can define more precisely what output you want (just the names, the name / value pairs, or what else?) and the sorting criterion, this is easy to adjust, of course.

Wim's solution is good, but I'd say that you should probably go the extra mile and push this work off onto a database, rather than relying on Python. Python interfaces well with most types of databases, where much of what you're exploring is already a solved problem.
For example, instead of worrying about shifting your dictionaries to various other data types in order to properly sort them, you can simply get all the data for each pertinent entry pre-sorted based on the criteria of your query. There goes the need for convoluted sorting and resorting right there.
While dictionaries are tempting to use, because they give the illusion of database-like abilities to access data based on its attributes, I still think they stumble quite a bit with respect to implementation. I don't really have any numbers to throw at you, but just from personal experience, anything you do on Python when it comes to manipulating large amounts of data, you can do much faster and more efficient both in code and computation with something like MySQL.
I'm not sure what you have planned as far as the structure of your data goes, but along with adding data, changing its structure is a lot easier using a database, too.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Pyenchant Module - Spell checker - python

How do I trim the output of Python Pyenchat Module's 'suggested words list ? Quite often it gives me a huge list of 20 suggested words that looks awkward when displayed on the screen and also has a tendency to go out of the screen .

Assuming it returns a standard Python list, you use standard Python slicing syntax. E.g. suggestedwords[:10] gets just the first 10.

Related

data exchange between R and python (music21)

Python: replacing multiple words in a text file from a dictionary

Difference between two "contains" operations for python lists

A better way to assign list into a var

How to Sort Arrays in Dictionary?

Categories

Resources