How to properly set fst rules - python

I have started working with transducers in Python, so I am using the default FST library. For example, I have a list ['a','b','c'] and I need to replace 'b' if it is followed by 'c'. I wrote the following rules, but they work only if 'b' is between 'a' and 'c', and only for a list of this length.
from fst import fst
list = ['a','b','c']
t = fst.FST('example')
for i in range(0,len(list)):
    t.add_state(str(i))
t.initial_state = '0'
t.add_arc('0','0',('a'),('a'))
t.add_arc('0','1',('b'),('d'))
t.add_arc('1','1',('c'),('c'))
t.set_final('1')
print t.transduce(list)
I get ['a','d','c'].
I need to be able to replace 'b' with 'd' wherever it occurs.
For example, replace 'b' with 'o' when it is followed by 'l':
['m','r','b','l'] => ['m','r','o','l']
['m','b','l'] => ['m','o','l']
['b','l','o'] => ['o','l','o']
Please help me, thanks!

Consider this function...
lists = [['m','r','b','l'],
         ['m','b','l'],
         ['b','l','o'],
         ['b','m','o']]

def change(list_, find_this, followed_by, replace_to):
    return_list = list_.copy()
    # Check every adjacent pair, so all occurrences are handled and
    # a missing or trailing find_this does not raise an exception
    for idx in range(len(list_) - 1):
        if list_[idx] == find_this and list_[idx + 1] == followed_by:
            return_list[idx] = replace_to
    return return_list

for lst in lists:
    print(change(lst, 'b', 'l', 'o'))
''' output:
['m', 'r', 'o', 'l']
['m', 'o', 'l']
['o', 'l', 'o']
['b', 'm', 'o']
'''
You should add other pertinent validations, though.
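Since the question is about transducers, the same rule can also be sketched as a tiny two-state machine with delayed output. This is only an illustration in plain Python and does not use the fst library's API; the function name transduce_lookahead and its parameters are hypothetical. The idea is that the machine holds back the target symbol until it has seen the next symbol, and only then decides what to emit.

```python
def transduce_lookahead(symbols, target='b', follower='l', replacement='o'):
    """Replace `target` with `replacement` whenever it is directly followed by `follower`.

    Hypothetical helper: simulates a two-state transducer by hand.
    State "pending" means a `target` has been read but not yet emitted.
    """
    output = []
    pending = False
    for sym in symbols:
        if pending:
            # Decide what the held-back target becomes now that we see its successor
            output.append(replacement if sym == follower else target)
            pending = False
        if sym == target:
            pending = True       # hold the symbol back until the next one is known
        else:
            output.append(sym)   # every other symbol is copied through unchanged
    if pending:
        output.append(target)    # input ended right after a target: emit it unchanged
    return output


print(transduce_lookahead(['m', 'r', 'b', 'l']))
print(transduce_lookahead(['b', 'm', 'o']))
```

Because the decision is made one symbol late, the rule works at any position and for any input length, which is what the hard-coded arcs in the question could not do.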

Related

To check whether at least one list contains a specific element

Could someone please tell me what is the shortest way to write this logic?
I have two lists, list_one and list_two, containing some letters. If neither of these lists contains 'B', I need to print(True). The snippet I have written works, but I am curious whether there is a more Pythonic way to write this without repeating 'B' twice in the same line.
list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
if 'B' not in list_one and 'B' not in list_two:
    print('True')
Thanks in advance and any help would be greatly appreciated. 
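Well, you can do this (even though I think your way is the best). Note that the sets have to be combined with a union (|); an intersection (&) would only contain letters present in both lists, so the test would pass even if 'B' were in one of them:
list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
if 'B' not in (set(list_one) | set(list_two)):
    print('True')
Or:
if 'B' not in list_one + list_two:
    print('True')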
You can try the all function if it is more readable for you.
list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
print(all('B' not in current_list for current_list in [list_one, list_two]))
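Python has sets, and a membership test on a set is much faster than on a list (although building the set itself takes time proportional to the number of elements).
Here are some features of sets.
Sets are unordered.
Set elements are unique; duplicate elements are not allowed.
Therefore you can search for the item in a combined set.
list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']
if 'B' not in set(list_one + list_two):
    print('True')
Bonus:
Be careful with the extend method: list_one.extend(list_two) modifies list_one in place and returns None, so set(list_one.extend(list_two)) raises a TypeError. To avoid building the concatenated list, use union instead:
set(list_one).union(list_two)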
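If the goal is only to avoid the intermediate concatenated list, one possible sketch uses itertools.chain from the standard library, which walks both lists lazily. The variable name b_absent is just illustrative.

```python
from itertools import chain

list_one = ['A', 'K', 'L', 'J']
list_two = ['N', 'M', 'P', 'O']

# chain() yields the elements of both lists one after another
# without building a combined list or modifying either input
b_absent = 'B' not in chain(list_one, list_two)
print(b_absent)
```

The membership test stops as soon as 'B' is found, and neither input list is changed.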
A different way of doing this is to put both lists in a pandas DataFrame first (zip() stops at the shorter list, so this assumes both lists have the same length):
import pandas as pd
df = pd.DataFrame(list(zip(list_one, list_two)), columns=['l1', 'l2'])
Then you can easily check whether the character 'B' is absent; the expression below evaluates to True when it is. The double .any() reduces over the rows and then over the columns:
not df.isin(['B']).any().any()

How can a method that checks whether a list contains specific consecutive items be improved?

I have a nested list of tens of millions of lists (I can use tuples also). Each list is 2-7 items long. Each item in a list is a string of 1-5 characters and occurs no more than once per list. (I use single char items in my example below for simplicity)
#Example nestedList:
nestedList = [
['a', 'e', 'O', 'I', 'g', 's'],
['w', 'I', 'u', 'O', 's', 'g'],
['e', 'z', 's', 'I', 'O', 'g']
]
I need to find which lists in my nested list contain a pair of items so I can do stuff to these lists while ignoring the rest. This needs to be as efficient as possible.
I am using the following function but it seems pretty slow and I just know there has to be a smarter way to do this.
def isBadInList(bad, checkThisList):
    numChecks = len(checkThisList) - 1
    for x in range(numChecks):
        if checkThisList[x] == bad[0] and checkThisList[x + 1] == bad[1]:
            return True
        elif checkThisList[x] == bad[1] and checkThisList[x + 1] == bad[0]:
            return True
    return False
I use it like this:
bad = ['O', 'I']
for checkThisList in nestedList:
    result = isBadInList(bad, checkThisList)
    if result:
        doStuffToList(checkThisList)
#The function isBadInList() returns True only for the first and third lists in nestedList and False for all others.
I need a way to do this faster if possible. I can use tuples instead of lists, or whatever it takes.
nestedList = [
['a', 'e', 'O', 'I', 'g', 's'],
['w', 'I', 'u', 'O', 's', 'g'],
['e', 'z', 's', 'I', 'O', 'g']
]
#first create a map
pairdict = dict()
for i in range(len(nestedList)):
    for j in range(len(nestedList[i])-1):
        pair1 = (nestedList[i][j],nestedList[i][j+1])
        if pair1 in pairdict:
            pairdict[pair1].append(i+1)
        else:
            pairdict[pair1] = [i+1]
        pair2 = (nestedList[i][j+1],nestedList[i][j])
        if pair2 in pairdict:
            pairdict[pair2].append(i+1)
        else:
            pairdict[pair2] = [i+1]
del nestedList
print(pairdict.get(('e','z'),None))
Create every adjacent pair (in both orders) and store it in a map: the key is the pair, and the value is the list of 1-based positions of the sublists that contain it. Then delete your original list, since keeping both may take too much memory.
After that you can take advantage of the dict for lookups and print the positions where a given pair appears.
I think you could use a regex here to speed this up, although it is still a sequential operation: every sublist has to be joined and scanned, so the running time grows linearly with the total number of items. Note that joining the items is only safe for single-character items; with longer items a match could straddle an item boundary.
import re
p = re.compile('OI|IO') # match only OI or IO
def is_bad(pattern, to_check):
    for item in to_check:
        maybe_found = pattern.search(''.join(item))
        if maybe_found:
            yield True
        else:
            yield False
l = list(is_bad(p, nestedList))
print(l)
# [True, False, True]
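For items longer than one character, a comparison on adjacent pairs avoids the boundary problem of joining strings. The following is a minimal sketch, assuming the pair may appear in either order; the names has_adjacent_pair and flagged are hypothetical, and no claim is made about how its speed compares on tens of millions of lists.

```python
def has_adjacent_pair(items, first, second):
    """Return True if `first` and `second` are neighbours in `items`, in either order."""
    wanted = {first, second}
    # zip(items, items[1:]) yields each pair of neighbouring items exactly once
    return any({a, b} == wanted for a, b in zip(items, items[1:]))


nested_list = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g'],
]

# Zero-based positions of the sublists that contain the pair
flagged = [i for i, sub in enumerate(nested_list) if has_adjacent_pair(sub, 'O', 'I')]
print(flagged)
```

Because the question states that an item occurs at most once per list, a cheap pre-check such as `first in items and second in items` could be added before the pair scan if most lists do not contain both items.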

Iterating through list, ignoring duplicates

I've written a program that attempts to find a series of letters (toBeFound - these letters represent a word) in a list of letters (letterList), however it refuses to acknowledge the current series of 3 letters as it counts the 'I' in the first list twice, adding it to the duplicate list.
Currently this code returns "incorrect", when it should return "correct".
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
List = []
for i in toBeFound[:]:
    for l in letterList[:]:
        if l== i:
            letterList.remove(l)
            List.append(i)
if List == toBeFound:
    print("Correct.")
else:
    print("Incorrect.")
letterList and toBeFound are sample values, the letters in each can be anything. I can't manage to iterate through the code and successfully ensure that duplicates are ignored. Any help would be greatly appreciated!
Basically, you're looking to see if toBeFound is a subset of letterList, right?
That is a hint to use sets:
In [1]: letters = set(['F','I', 'I', 'X', 'O', 'R', 'E'])
In [2]: find = set(['F', 'I', 'X'])
In [3]: find.issubset(letters)
Out[3]: True
In [4]: find <= letters
Out[4]: True
(BTW, [3] and [4] are different notations for the same operator.)
I think this would solve your problem. Please try it and let me know
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
found_list = [i for i in toBeFound if i in letterList]
print("Correct" if toBeFound == found_list else "Incorrect")
You could make the initial list a set, but if you want to look up a word like 'hello' it won't work, because you'll need both l's.
One way to solve this is to use a dictionary to keep track of how many of each letter we have seen so far.
letterList = ['H', 'E', 'L', 'X', 'L', 'I', 'O']
toBeFound = ['H', 'E', 'L', 'L', 'O']
# build dictionary to hold our desired letters and their counts
toBeFoundDict = {}
for i in toBeFound:
    if i in toBeFoundDict:
        toBeFoundDict[i] += 1
    else:
        toBeFoundDict[i] = 1
letterListDict = {} # dictionary that holds values from input
output_list = [] # don't call it list: that would shadow the built-in type
for letter in letterList:
    if letter in letterListDict: # already in dictionary
        # if we don't have too many of the letter, add it
        # (.get avoids a KeyError for repeated letters that are not wanted at all)
        if letterListDict[letter] < toBeFoundDict.get(letter, 0):
            output_list.append(letter)
        # update the dictionary
        letterListDict[letter] += 1
    else: # not in dictionary so let's add it
        letterListDict[letter] = 1
        if letter in toBeFoundDict:
            output_list.append(letter)
if output_list == toBeFound:
    print('Success')
else:
    print('fail')
How about this (I tested it in Python 3.6):
import collections
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
a=collections.Counter(letterList) # print(a) does not show order
# but a.keys() preserves insertion order (guaranteed from Python 3.7)
final = [i for i in a.keys() if i in toBeFound]
if final == toBeFound:
    print("Correct")
else:
    print("Incorrect")
If you're looking to check if letterList has the letters of toBeFound in the specified order and ignoring repeating letters, this would be a simple variation on the old "file match" algorithm. You could implement it in a non-destructive function like this:
def letterMatch(letterList,toBeFound):
    i = 0
    for letter in letterList:
        if letter == toBeFound[i]:
            i += 1
        elif i > 0 and letter != toBeFound[i-1]:
            break
        if i == len(toBeFound):
            return True
    return False
letterMatch(['F','I', 'I', 'X', 'O', 'R', 'E'],['F', 'I', 'X'])
# returns True
On the other hand, if what you're looking for is testing if letterList has all the letters needed to form toBeFound (in any order), then the logic is much simpler as you only need to "check out" the letters of toBeFound using the ones in letterList:
def lettermatch(letterList,toBeFound):
    missing = toBeFound.copy()
    for letter in letterList:
        if letter in missing:
            missing.remove(letter)
    return not missing
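The same "check out the letters" idea can be sketched with collections.Counter, which compares letter counts directly and therefore also handles words with repeated letters. The function name has_all_letters is hypothetical.

```python
from collections import Counter


def has_all_letters(letter_list, to_be_found):
    """Return True if letter_list holds every letter of to_be_found, counting repeats."""
    needed = Counter(to_be_found)
    available = Counter(letter_list)
    # Counter subtraction keeps only positive counts, so an empty
    # result means no required letter is missing
    return not (needed - available)


print(has_all_letters(['F', 'I', 'I', 'X', 'O', 'R', 'E'], ['F', 'I', 'X']))
print(has_all_letters(['H', 'E', 'L', 'O'], ['H', 'E', 'L', 'L', 'O']))
```

This ignores order entirely; if the letters must also appear in sequence, the letterMatch variant above is the closer fit.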
As requested.
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
List = []
for i in toBeFound[:]:
    for l in set(letterList):
        if l== i:
            List.append(i)
if List == toBeFound:
    print("Correct.")
else:
    print("Incorrect.")
This prints correct. I made the letterList a set! Hope it helps.
One simple way is to just iterate through toBeFound and look for each element in letterList.
letterList= ['F','I', 'I', 'X', 'O', 'R', 'E']
toBeFound = ['F', 'I', 'X']
found = True
for x in toBeFound:
    if x not in letterList:
        found = False
        break
if found:
    print("Correct.")
else:
    print("Incorrect.")

Search for a char in list of lists

I have a list of lists, and I want to return those sublists that have a specific char.
If the list is:
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
I want to retrieve ['g', 'j'] (or its position) if I search using 'j' or 'g'.
Try this:-
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
def search(search_char):
    result = [x for x in lst if search_char in x]
    return result
print(search('g'))
For a start, avoid naming a variable list: it is the name of a built-in type, and reusing it shadows the built-in. Try my_list instead.
This works for returning the list you want, without modifying the input:
#Try this
my_list = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
def check_for_letter(a_list,search):
    for sub_list in a_list:
        if search in sub_list:
            return sub_list
Session below:
>>> check_for_letter(my_list,"j")
['g', 'j']
>>>
This is one way. It works even for repeats.
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
def searcher(lst, x):
    for i in range(len(lst)):
        if x in lst[i]:
            yield i
list(searcher(lst, 'g')) # [1]
list(map(lst.__getitem__, searcher(lst, 'g'))) # [['g', 'j']]
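When only the first match is needed, both the position and the sublist can be obtained in one step. A minimal sketch using enumerate() and next() follows; the name find_first is hypothetical, and None is returned when nothing matches.

```python
def find_first(nested, ch):
    """Return (position, sublist) for the first sublist containing ch, or None."""
    matches = ((i, sub) for i, sub in enumerate(nested) if ch in sub)
    # next() pulls the first match from the generator; the second argument
    # is the default returned when there is no match at all
    return next(matches, None)


lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
print(find_first(lst, 'j'))
print(find_first(lst, 'q'))
```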
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
spec_char = input("What character do you want to find?: ")#ask for a character to find
def find_char(spec_char):
    for list_count, each_list in enumerate(lst): #iterates through each list in lst
        if spec_char in each_list: #checks if specified character is in each_list
            return spec_char, lst[list_count] #returns both the character and list that contains the character
def main(): #good habit for organisation
    print(find_char(spec_char)) #print the returned value of find_char
if __name__ == '__main__':
    main()
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
def search(spec_char):
    for subList in lst:
        if spec_char in subList:
            return subList
    return False
print(search('g'))
# ['g', 'j']
y=[['a','b'],['c','d'],['e','f'],['f']]
result=[x for x in y if 'f' in x]
Here I took 'f' as the character to search for; result is [['e', 'f'], ['f']].
Alternatively, we can also use the lambda and the filter functions.
The basics for lambda and filter function can be found in python documentation.
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']]
ch = input('Enter the character') # or directly type the character in ''
# lambda <parameter to the function>: <value to be returned>
# filter(<lambda function to check for the condition>, <sequence to iterate over>)
r = list(filter(lambda sub: ch in sub, lst)) # sub is each sublist; a separate name avoids shadowing lst
print(r)
Note: To see the value returned by the lambda and the filter functions, I am storing the result in a list and printing the final output.
Below is the solution and an explanation for your question.
lst = [['a', 'e'], ['g', 'j'], ['m', 'n', 'w'], ['z']] #So this is your list
character=input("Enter the character: ") #The character you are searching for
for a in lst: #it means that variable [a] is an item in the list [lst]
    for b in a: #it means that variable [b] is an item in the list [a]
        if(b==character): #To check if the given character is present in the list
            print(a) #Finally to print the list in which the given character is present
So, the code part is over. Now, let's look what the output will be.
C:\Users\User\Desktop\python>coc.py
Enter the character: a
['a', 'e']
C:\Users\User\Desktop\python>coc.py
Enter the character: w
['m', 'n', 'w']

Why does != not work in this case and are there other options to say two things are not equal or should I just try and restart my code?

I am trying to write code for the task described below, but when I run it, the shell returns a message (see below) saying I can't use != because list indices must be integers or slices, not str. However, when I evaluate something like ['a'] != ['b'] or ['a', 'c'] != ['a', 'c'] in the shell, it works (the first returns True and the second returns False), and those are lists of strings, so I'm not sure what the problem is. Maybe the way I am writing the code is wrong, or I don't need to use != at all?
Why do I receive an error, and are there alternatives to !=, or should I rewrite my code? (If my code seems completely off, it may be; I am a beginner and tried to construct it all on my own. I think/hope I'm headed in the right direction.)
Code:
def remove_match(list1, list2):
    '''(list, list) -> list
    Given two lists of a single character strings, return a new list
    that contains only the characters in list1 that were not the same string
    and in the same position in list 2. Both list1 and list2
    have the same length.
    >>> remove_match(['P', 'O', 'Y', 'R'], ['P', 'B', 'G', 'R'])
    ['O', 'Y']
    >>> remove_match(['P', 'G', 'G', 'R'], ['P', 'G', 'G', 'R'])
    []
    >>> remove_match(['O', 'R', 'B', 'Y'], ['P', 'P', 'P', 'P'])
    ['O', 'R', 'B', 'Y']
    '''
    edit_list1=[]
    for i in list1:
        if list1[i] != list2[i]:
            edit_list1.append(list1[i])
    return edit_list1
The message that comes up when I try and use the function (such as one of the doc string examples ) is:
Traceback (most recent call last):
Python Shell, prompt 2, line 1
File "/Users/Rir/Desktop/folder/file.py", line 56, in <module>
if list1[i] != list2[i]:
builtins.TypeError: list indices must be integers or slices, not str
Thanks!
First of all, list indices must be integers. You can use enumerate() to get the index and the content at the same time while looping.
However, I suggest you use a list comprehension. Note that a plain membership test (x not in list2) would ignore positions, so pair the elements up with zip():
list1 = ['P', 'O', 'Q', 'Y']
list2 = ['P', 'O', 'Q', 'A']
list_nomatch = [x for x, y in zip(list1, list2) if x != y]
This will produce ['Y']. Hope that helps.
You probably wanted an incrementing integer index i:
def remove_match(list1, list2):
    edit_list1=[]
    for i in range(len(list1)): # get indices from range the length of your list
        if list1[i] != list2[i]:
            edit_list1.append(list1[i])
    return edit_list1
zip() is good for "parallel" operations on lists:
def remove_match(list1, list2):
    edit_list1=[]
    for a, b in zip(list1, list2): # zip lists together, get elements in pairs
        if a != b: # test on the paired elements, no indexing
            edit_list1.append(a)
    return edit_list1
And, as shown in other answers, your loop matches the list comprehension pattern:
def remove_match(list1, list2):
    return [a for a, b in zip(list1, list2) if a != b]
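i in list1 corresponds to a value, not an index.
What you want could be written as follows, but note that list1.index(i) returns the position of the first occurrence of i, so this gives wrong results when list1 contains duplicates:
def remove_match(list1, list2):
    edit_list1=[]
    for i in list1:
        if i != list2[list1.index(i)]:
            edit_list1.append(i)
    return edit_list1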
Since both lists are of the same size, you can use enumerate() -
def remove_match(list1, list2):
    edit_list1=[]
    for idx, i in enumerate(list1):
        if i != list2[idx]:
            edit_list1.append(i)
    return edit_list1
print(remove_match(['P', 'O', 'Y', 'R'], ['P', 'B', 'G', 'R']))
print(remove_match(['P', 'G', 'G', 'R'], ['P', 'G', 'G', 'R']))
print(remove_match(['O', 'R', 'B', 'Y'], ['P', 'P', 'P', 'P']))
Output -
['O', 'Y']
[]
['O', 'R', 'B', 'Y']
The same code using List Comprehension -
def remove_match(list1, list2):
    return [i for idx, i in enumerate(list1) if i != list2[idx]]
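The cause of the original TypeError can be seen in a small sketch: a for loop over a list yields its elements, while range(len(...)) yields integer positions. The variable names below are only illustrative.

```python
list1 = ['P', 'O', 'Y', 'R']
list2 = ['P', 'B', 'G', 'R']

# Iterating over a list yields its elements (strings here), not positions
elements = [item for item in list1]

# range(len(...)) yields integer positions that are valid indices
positions = [i for i in range(len(list1))]

# Using an element as an index reproduces the error from the question
try:
    list2[elements[0]]
except TypeError as exc:
    error_message = str(exc)

print(elements)
print(positions)
print(error_message)
```

So != itself was never the problem; the expression list1[i] failed before the comparison was evaluated.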
