Using difflib to detect missing characters - python

I've created a program that takes in a password and tells the user by how many characters the password is wrong. To do this I've used difflib.Differ().
However, I'm not sure how to create another loop so that it can also tell me by how many characters the password is wrong when the input is missing characters.
This is the checking function itself:
import difflib

def check_text(correctPass, inputPass):
    count = 0
    difference = 0
    d = difflib.Differ()
    count_difference = list(d.compare(correctPass, inputPass))
    while count < len(count_difference):
        if '+' in count_difference[count]:
            difference += 1
        count += 1
    return difference
At the moment the function can only pick up mistakes if the mistakes are extra characters (correct password in this case is 'rusty').
My understanding of difflib is quite poor. Do I create a new loop, swap the < for a > and the +s for -s? Or do I just change the conditions already in the while loop?
EDIT:
example input/output: rusty33 –> wrong by 2 characters
rsty –> wrong by 0 characters
I'm mainly trying to make the function detect if there are characters missing or extra, not too fussed about order of characters for the moment.
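One way to cover both cases in a single pass, as a minimal sketch: Differ prefixes each output line with '+ ' for an element only in the second sequence and '- ' for an element only in the first, so counting both prefixes picks up extra and missing characters alike. The name count_mismatches is made up for this illustration.

```python
import difflib

def count_mismatches(correct, attempt):
    """Count characters that are extra in, or missing from, the attempt."""
    diff = difflib.Differ().compare(correct, attempt)
    # '+ ' marks a character found only in the attempt, '- ' one found only
    # in the correct password. Unchanged ('  ') and hint ('? ') lines are skipped.
    return sum(1 for line in diff if line.startswith(('+ ', '- ')))

print(count_mismatches('rusty', 'rusty33'))  # 2
print(count_mismatches('rusty', 'rsty'))     # 1
```

Checking the two-character prefix rather than `'+' in line` also avoids miscounting when the password itself contains a '+'. Note that a substituted character shows up as one '-' plus one '+', so it counts as 2 here.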

Related

Checking if name of a person is valid or not

I am working on a project in which I have to figure out if the name of a person is valid or not. One case of invalidity is a single character name.
In English, it is straightforward to figure out by checking the length.
if len(name) < 2:
    return 0
I am not sure if checking the length will work for other languages too, for example with a name like 玺. I am not sure whether this is one character or something else.
Can someone help me to solve this issue?
Dataset info:
countries: 125
total names: 11 Million
While I can't vouch for all other languages, running a Python script with the character you provided confirms that, according to Python, it is still a single character.
print(len("玺"))
if len("玺") < 2:
    print("Single char name")
Another potential solution would be to check ord(char) (29626 for the given char) to see whether it is outside the standard Latin alphabet, and then perform additional conditional checks.
Maybe use a dictionary:
language_dict = {
    'english': 2,
    'chinese': 1
}
if len(name) < language_dict['english']:
    return 0
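The two suggestions above could be combined roughly as follows. This is only a sketch under a loud assumption: that a single-character name is acceptable when it is a CJK ideograph and not otherwise. The helper names is_cjk and is_valid_name are made up for illustration, and with 125 countries in the dataset the real rules will likely need more cases.

```python
import unicodedata

def is_cjk(ch):
    # Unicode names of Han ideographs start with this prefix,
    # e.g. 'CJK UNIFIED IDEOGRAPH-73BA' for 玺.
    return unicodedata.name(ch, '').startswith('CJK UNIFIED IDEOGRAPH')

def is_valid_name(name):
    name = name.strip()
    if not name:
        return False
    # Assumption: one character is enough if the name contains a CJK
    # ideograph, otherwise at least two characters are required.
    min_len = 1 if any(is_cjk(c) for c in name) else 2
    return len(name) >= min_len

print(is_valid_name('玺'))  # True
print(is_valid_name('A'))   # False
```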

Characters formatting in Python

I'm a beginner to Python. I'd like to format characters in Python using the basic concepts and operations of tuples and lists, as below.
I enter a 10-digit number, and all digits except the last 4 should be replaced by 'X'. For example:
number = 1234567890
Expected output:
number = XXXXXX7890
How can I mask entered characters/numbers in Python using only tuples and lists, without importing any modules or using existing high-level functions? Is it possible?
For example, the characters entered should be masked with * (asterisk) or # (hash) while they are being typed:
password : pa55w0rd
Expected output while entering the password:
password : ********
OR
password: ########
It is always better to use built-in modules for sensitive things like passwords. One way of doing it is the following:
import getpass

number = 1234567890
first = 'X' * max(0, len(str(number)[:-4]))
last = str(number)[-4:]
n = first + last
print(n)

# part 2
p = getpass.getpass(prompt='Enter the number : ')
if int(p) == 123:
    print('Welcome..!!!')
else:
    print('Please enter correct number..!!!')
If you don't want to display the typed password, just print:
print('######')
It does not have to be the same length; you just have to print something.
Break down what's needed: convert the number to a string, figure out how many characters to replace, generate a replacement string of that length, then append the tail of the original string. You also need to be robust against, e.g., strings too short to have any characters replaced.
'X' * max(0, len(str(number)) - 4) + str(number)[-4:]
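Wrapped in a function, the one-liner above might look like this. It is a sketch only; the name mask_number and the visible and fill parameters are made up for illustration.

```python
def mask_number(number, visible=4, fill='X'):
    """Replace all but the last `visible` characters with `fill`."""
    text = str(number)
    # max(0, ...) keeps the repeat count non-negative for short inputs.
    hidden = max(0, len(text) - visible)
    return fill * hidden + text[-visible:]

print(mask_number(1234567890))  # XXXXXX7890
print(mask_number(123))         # 123 (too short, nothing to mask)
```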
For the second part: use a library.
Doing this directly is more complicated than it might seem to a beginner, because you're having to communicate with the systems which take text entry. It's going to depend upon the operating system, Windows vs "roughly everything else". For text entry outside of a web-browser or a GUI, most systems are emulating ancient text-only terminal devices because there's not yet enough reason to change that. Those devices have modes of text input (character at a time, line at a time, raw, etc) and changing them to not immediately "echo" the character typed involves some intricate system calls, and then other programming to echo a different character instead.
Thus you're going to want to use a library to take care of all those intricate details for you. Something around password entry. Given the security implications, using tested and hardened code instead of rolling your own is something I strongly encourage. Be aware that there are all sorts of issues around password handling too (constant time comparisons, memory handling, etc) such that as much as possible, you should avoid doing it at all, or move it to another program, and when you do handle it, use the existing libraries.
If you can, stick to the Python standard library and use getpass which won't echo anything for passwords, instead of printing stars.
If you really want the stars, then search https://pypi.org/ for getpass and see all the variants people have produced. Most of the ones I saw in a quick look didn't inspire confidence; pysectools seemed better than the others, but I've not used it.
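For the standard-library route, a minimal sketch could look like the following. getpass.getpass reads without echoing, and hmac.compare_digest is one way to get the constant-time comparison mentioned above. The function names and the read parameter are made up for illustration (read exists only so the logic can be exercised without a terminal). Comparing against a plain-text expected value is itself a simplification; real code would compare against a stored hash.

```python
import getpass
import hmac

def verify_secret(entered, expected):
    # compare_digest takes time independent of where the inputs differ.
    # Encoding to bytes lets it handle non-ASCII input as well.
    return hmac.compare_digest(entered.encode('utf-8'),
                               expected.encode('utf-8'))

def prompt_and_verify(expected, read=getpass.getpass):
    # getpass.getpass shows the prompt and reads a line without echoing it.
    entered = read('Password: ')
    return verify_secret(entered, expected)
```

Calling prompt_and_verify('pa55w0rd') in a terminal prompts once and returns True or False, with nothing shown while typing.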

Giant set of strings leads to memory error - alternative?

I'm trying to create a really huge list of incrementing 9-digit numbers (worst case). My plan is to have something like this:
['000000001', '000000002' , ..............,'999999999']
I already wrote the code. However, as soon as I run it, my console prints a "Memory Error" message.
Here is my current code:
from itertools import product
HUGE_LIST = [''.join(i) for i in product('012345678', repeat=9)]
I know this might not be the best code to produce the list. Can someone help me find a better way to solve this memory issue?
I'm planning to use HUGE_LIST for comparison with user input.
Example: a user enters '12345678' as input, then I want my code to check that input against HUGE_LIST.
The best way to solve an issue like this is to avoid a memory-intensive algorithm entirely. In this case, since your goal is to test whether a particular string is in the list, just write a function that checks whether the string satisfies the criteria to be in the list. For example, if your list contains all sequences of 9 digits, then your function just has to check whether a given input is a sequence of 9 digits.
def check(string):
    return len(string) == 9 and all(c.isdigit() for c in string)
(in practice, give it a better name than check). Or if you want all sequences of 9 digits in which none of them is a 9, as your current code defining HUGE_LIST suggests, you could write
def check(string):
    return len(string) == 9 and all(c.isdigit() and c != '9' for c in string)
Or so on.
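One caveat with the checks above: str.isdigit() also returns True for some non-ASCII digit characters (for example the superscript '²'). If only the characters 0-9 should pass, a variant that tests against an explicit set of allowed characters is a possible sketch. The names is_nine_digits and ALLOWED are made up for illustration.

```python
ALLOWED = set('0123456789')

def is_nine_digits(string):
    # Length check first, then require every character to be in the set.
    return len(string) == 9 and all(c in ALLOWED for c in string)

print(is_nine_digits('123456789'))  # True
print(is_nine_digits('12345678²'))  # False, although '²'.isdigit() is True
```

Changing ALLOWED to set('012345678') gives the "no 9s" version.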
If you can't write an algorithm to decide whether a string (or whatever) is in the list or not, the next best thing is to make a generator that will produce the values one at a time. If you already have a list comprehension, like
HUGE_LIST = [<something> for <variable> in <expression>]
then you can turn that into a generator by replacing the square brackets with parentheses:
HUGE_GENERATOR = (<something> for <variable> in <expression>)
Then you can test for membership using string in HUGE_GENERATOR. Note that after doing so, HUGE_GENERATOR will be (at least partially) consumed, so you can't use it for another membership test; you will have to recreate it if you want to test again.
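To make the consumption behaviour concrete, here is a small sketch. It uses 3-character strings instead of 9 so that it runs instantly, and make_candidates is a made-up helper name.

```python
from itertools import product

def make_candidates(digits='012345678', length=3):
    # A generator expression: values are produced one at a time,
    # so the full list is never held in memory.
    return (''.join(t) for t in product(digits, repeat=length))

gen = make_candidates()
first = '012' in gen    # True: iterates up to and including '012'
second = '012' in gen   # False: '012' was already consumed
fresh = '012' in make_candidates()  # True: a new generator starts over
print(first, second, fresh)
```

Wrapping the generator expression in a function like this makes it cheap to recreate for every membership test.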

Strict validation for latin character set (ISO 8859)

I'm looking to validate user input to ensure that all characters in the input string fall within the western-latin character set.
Background
I'm specifically working in Python, but I'm looking more to understand the ISO-8859 character set than receive actual Python code.
For a simpler example of what I'm looking for, if I was looking to ensure that user input was entirely ASCII, I could easily do so by checking that each character's numeric value falls in the range [0-126]:
def is_ascii(s):
    for c in s:
        if not (0 <= ord(c) <= 126):
            return False
    return True
Simple enough! But now I want to validate for ISO-8859 (western latin character set).
Question
Is this a simple case of changing the upper bound for the value of ord(c)?
If so, what value should I replace 126 with?
If not, how do I perform this validation?
Note
I'm expecting to receive characters that are certainly outside of ISO-8859, for example emotes entered from a mobile device's keyboard.
Edit
After some further research, it looks like replacing 126 with 255 would potentially be a valid solution, but I would appreciate it if anyone could confirm this.
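A minimal sketch of that check, assuming "western Latin" here means ISO-8859-1 (Latin-1): its 256 byte values correspond one-to-one to Unicode code points 0 to 255, so asking Python's 'latin-1' codec to encode the string is equivalent to testing ord(c) <= 255 for every character. Other parts of ISO 8859 (for example ISO-8859-15, which includes '€') would need their own codec name. The function name is_latin1 is made up for illustration.

```python
def is_latin1(s):
    """Return True if every character of s is encodable as ISO-8859-1."""
    try:
        s.encode('latin-1')
    except UnicodeEncodeError:
        return False
    return True

print(is_latin1('café'))      # True
print(is_latin1('hello 😀'))  # False
```

Note that this range also includes control characters (0-31 and 127-159), which may be worth rejecting separately for user input. The same pattern with 'ascii' accepts code points 0 to 127.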

how to keep count of replaced strings

I have a massive string that I'm trying to parse as a series of tokens in string form, and I found a problem: because many of the strings are alike, sometimes doing string.replace() will cause previously replaced characters to be replaced again.
Say the string being replaced is 'goto' and it gets replaced by '41' (hex), which gets converted into ASCII ('A'). Later on, the string 'A' is also to be replaced, so that converted token gets replaced again, causing problems.
What would be the best way to get the strings to be replaced only once? Breaking each token off the original string and searching for them one at a time takes very long.
This is the code I have now. Although it more or less works, it's not very fast:
# The largest token is 8 ASCII chars long
# 'out' is the string with the final outputs
while len(data) != 0:
    length = 8
    # sorry THC4k, I used your code at first, but it didn't work out
    # for this and I was too lazy to change it
    while reverse_search(data[:length]) is None:
        length -= 1
    out += reverse_search(data[:length])
    data = data[length:]
If you're trying to substitute all the strings at once, you can use a dictionary:
translation = {'PRINT': '32', 'GOTO': '41'}
code = ' '.join(translation[i] if i in translation else i for i in code.split(' '))
which is roughly O(|S|): one pass to split, one pass to join, and an average O(1) dictionary lookup per token. Very fast, although memory usage could be quite substantial for a massive string. Because every token is looked up exactly once and the output is never rescanned, a replacement cannot itself be replaced again.
Unless there is a function in Python to translate strings via dictionaries that I don't know about, this seems to be the simplest way of putting it.
It turns
10 PRINT HELLO
20 GOTO 10
into
10 32 HELLO
20 41 10
I hope this has something to do with your problem.
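If the tokens are not separated by spaces, a different technique from the split/join above is a single re.sub pass over an alternation of all keys. This is only a sketch: the TOKENS table is a made-up example, not the asker's real token set. Each match is replaced once and the output is never rescanned, so a replacement cannot be replaced again.

```python
import re

# Hypothetical table: note that 'A' is both an output and a key.
TOKENS = {'GOTO': 'A', 'PRINT': '2', 'A': 'x'}

def translate_once(text, table):
    # Longest keys first, so a long token wins over a shorter one
    # that could match at the same position.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)

print(translate_once('GOTOA', TOKENS))  # Ax
# Chained str.replace calls would give 'xx' instead:
print('GOTOA'.replace('GOTO', 'A').replace('A', 'x'))
```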
