Searching for a string in a 2D array in Python, moving right or down [closed]

I have an array that contains strings, like this:
l = ["abc",
"def",
"hij",
"klm",
"nop",
"qrs"]
and another one with words:
word = ["abc","knq","knop"]
I need to find each word in the list and return the corresponding coordinates.
The particularity is that the search may move horizontally (right), vertically (down), or combine both within one word.
For example:
abc returns the indices [(0,0), (0,1), (0,2)]
knq returns [(3,0), (4,0), (5,0)]
knop returns [(3,0), (4,0), (4,1), (4,2)]
The characters in the strings are not unique, and I need to record each movement, for example "move one character down" or "move one character right".
This is not a diagonal word finder.

I'm guessing characters have to be directly connected. You can save the indices of each character in l in a dict, where the key is the character and the value is a list of index tuples. Then you could loop through the words in word and do something like a DFS from each position of the word's first character (going right or down). An example (follow the comments):
l = ["abc",
"def",
"hij",
"klm",
"nop",
"qrs"]
word = ["abc","knq","knop"]
# save the indices of each character
indices = {}
for i, w in enumerate(l):
    for j, c in enumerate(w):
        indices.setdefault(c, []).append((i, j))

def check(i, j, k, w, path):
    # every character of w has been matched: return the list of indices
    if k == len(w):
        return path
    # otherwise fail if we have walked off the bottom or right edge of l
    if i == len(l) or j == len(l[i]):
        return False
    # if the character at (i, j) matches the k'th character of w
    if l[i][j] == w[k]:
        # extend a copy of the path so failed branches leave no stale entries
        path = path + [(i, j)]
        # check the next position, either down or right
        return (check(i + 1, j, k + 1, w, path)
                or check(i, j + 1, k + 1, w, path))
    return False

# loop through each word and then through each position of its first character
for w in word:
    for idx in indices.get(w[0], []):
        # if the word is found, print the indices and stop searching
        chk = check(idx[0], idx[1], 0, w, [])
        if chk:
            print(w, chk)
            break
output
abc [(0, 0), (0, 1), (0, 2)]
knq [(3, 0), (4, 0), (5, 0)]
knop [(3, 0), (4, 0), (4, 1), (4, 2)]

If you have a lot of words and build this type of list repeatedly, I would build a translation dictionary first:
tr = dict()
for r, w in enumerate(l):
    for c, ch in enumerate(w):
        tr[ch] = (r, c)
Having that, you can easily create the lists with a list comprehension:
for w in word:
    res = [tr[ch] for ch in w]
    print(w)
    print(res)
OUTPUT:
abc
[(0, 0), (0, 1), (0, 2)]
knq
[(3, 0), (4, 0), (5, 0)]
knop
[(3, 0), (4, 0), (4, 1), (4, 2)]
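One caveat, since the question says the characters are not unique: a dictionary holding a single position per character keeps only the last occurrence of a repeated letter. A minimal sketch, assuming a hypothetical grid (not the one from the question) in which 'a' appears three times:

```python
# Hypothetical grid for illustration only: 'a' occurs at (0, 0), (0, 2) and (1, 1).
grid = ["aba",
        "cad"]

tr = {}
for r, row in enumerate(grid):
    for c, ch in enumerate(row):
        tr[ch] = (r, c)  # later occurrences overwrite earlier ones

path = [tr[ch] for ch in "ab"]
print(path)  # [(1, 1), (0, 1)]
```

Here "ab" actually sits at (0, 0), (0, 1), so with repeated letters the lookup has to be replaced by a search from every candidate start position.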

Related

Get all word combinations and paths from a letters array

I am creating a Boggle game, and I need to build a function that receives as input: the letter board (list of lists), the list of legal words and an integer n.
The function must return all n-length paths that spell valid words.
For example, if n = 3, the function must return all the paths of length three on the board that are actually valid words.
I wrote code that, in a particular example, returns one path out of the three paths that must be returned.
Input:
board1 = [['Q', 'O', 'Q', 'Q'],
['D', 'O', 'G', 'Q'],
['Q', 'O', 'Q', 'Q'],
['Q', 'Q', 'Q', 'Q']]
word_dict = {'DOG': True}
n = 3
board = Board(board1)
length_n_paths(3, board, word_dict)
My Output:
[((1, 0), (1, 1), (1, 2))]
Wanted Output:
[[(1, 0), (0, 1), (1, 2)], [(1, 0), (1, 1), (1, 2)], [(1, 0), (2, 1), (1, 2)]]
I used combinations: first I found all the possible combinations of n letters, then I went through each one coordinate by coordinate and checked whether each coordinate is in a valid position relative to the one before it, and then I checked whether the word spelled by the letter combination is in the word list.
If so, I return its path in a list with the paths of the other legal words.
My code:
direct_lst = ['Up', 'Down', 'Right', 'Left', 'Up_right', 'Up_left', 'Down_right', 'Down_left']

class Board:
    def __init__(self, board):
        self.board = board

    def get_board_coordinate(self):
        cord_lst = []
        row = len(self.board)
        col = len(self.board[0])
        for i in range(row):
            for j in range(col):
                cord_lst.append((i, j))
        return cord_lst

    def possible_directions(self, coordinate, next_coordinate):
        y, x = coordinate
        directions_funcs = {
            # A dictionary that maps each direction name to the neighbouring coordinate
            'Up': (y - 1, x),
            'Down': (y + 1, x),
            'Right': (y, x + 1),
            'Left': (y, x - 1),
            'Up_right': (y - 1, x + 1),
            'Up_left': (y - 1, x - 1),
            'Down_right': (y + 1, x + 1),
            'Down_left': (y + 1, x - 1)
        }
        it_ok = False
        for direction in direct_lst:
            if directions_funcs[direction] == next_coordinate:
                it_ok = True
        return it_ok
def is_valid_path(board, path, words):
    word = board.board[path[0][0]][path[0][1]]
    board_coordinates = board.get_board_coordinate()
    for cord in range(len(path)-1):
        if path[cord] in board_coordinates and path[cord+1] in board_coordinates:
            if not board.possible_directions(path[cord], path[cord + 1]):
                return None
            else:
                word += board.board[path[cord + 1][0]][path[cord + 1][1]]
        else:
            return None
    if word in set(words):
        return word

import itertools

def create_dict(board, n):
    new_dict = dict()
    row = len(board.board)
    col = len(board.board[0])
    for i in range(row):
        for j in range(col):
            new_dict[(i, j)] = board.board[i][j]
    result_list = list(map(list, itertools.combinations(new_dict.items(), n)))
    return result_list

def coordinates_lst_and_str_lst(board, n):
    combine = create_dict(board, n)
    all_cord_dic = dict()
    for lst in combine:
        is_it_ok = True
        cord_lst = []
        str_l = ""
        for i in range(n):
            cord_lst.append(lst[i][0])
            str_l += lst[i][1]
            try:
                if not board.possible_directions(lst[i][0], lst[i + 1][0]):
                    is_it_ok = False
                    break
            except IndexError:
                break
        if is_it_ok:
            all_cord_dic[tuple(cord_lst)] = str_l
            all_cord_dic[tuple(cord_lst)[::-1]] = str_l[::-1]
    return all_cord_dic

def length_n_paths(n, board, words):
    possible_words = coordinates_lst_and_str_lst(board, n)
    my_dict = {key: val for key, val in possible_words.items() if val in words}
    return list(my_dict.keys())
I think the problem is in the combinations, but I don't know how to fix it.
I would be happy for any help.
After debugging, it's apparent that possible_words does not contain the key ((1, 0), (0, 1), (1, 2)), which explains why it's not part of the answer. So the question becomes: why doesn't the call to coordinates_lst_and_str_lst() generate that tuple (and the other 'missing' one)?
If you break after constructing combine in coordinates_lst_and_str_lst, you will find that [((1, 0), 'D'), ((0, 1), 'O'), ((1, 2), 'G')] is not in combine, which means coordinates_lst_and_str_lst can't find it as a solution.
So, the problem must be in create_dict, which apparently isn't creating all the legal moves.
And indeed, in create_dict, you use itertools.combinations(), which gives you all the unique combinations of n items from a collection, disregarding their order, but you care about the order.
So, you don't want itertools.combinations(new_dict.items(), n), you want itertools.permutations(new_dict.items(), n). Have a closer look at the difference between combinations and permutations (of size n).
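To see the difference on a small case, here is a minimal sketch using three cells from the board above (the names cells and target are only for illustration):

```python
from itertools import combinations, permutations

# Three cells in row-major order, as a scan over the board would produce them.
cells = [(0, 1), (1, 0), (1, 2)]
# The wanted path D -> O -> G through the top 'O'.
target = ((1, 0), (0, 1), (1, 2))

combos = list(combinations(cells, 3))
perms = list(permutations(cells, 3))

print(len(combos), target in combos)  # 1 False
print(len(perms), target in perms)    # 6 True
```

combinations() only ever emits the cells in their input order, so a path that visits them in a different order can never appear. The price is more candidates to check: for a 4x4 board and n = 3 there are 3360 ordered triples instead of 560.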

Tokenize concatenated characters based on given dictionary

I would like to tokenize concatenated characters based on a given dictionary and output the tokenized words found. For example, I have the following:
dictionary = ['yak', 'kin', 'yakkin', 'khai', 'koo']
chars = 'yakkinpadthaikhaikoo'
Output should be something like the following:
[('yakkin', (0, 6), 6), ('padthai', (6, 13), 7), ('khai', (13, 17), 4), ('koo', (17, 20), 3)]
I would like to get a list of tuples as output. The first element of each tuple is the word found in the dictionary, the second is its character offset and the third is the length of the word found. If characters are not found in the dictionary, we chunk them together into one word, e.g. padthai in the case above. If multiple words from the dictionary match, we select the longest one (yakkin instead of yak and kin).
My current implementation is below. It starts at index 0 and then loops through the characters (it doesn't work yet).
import numpy as np

def tokenize(chars, dictionary):
    n_chars = len(chars)
    start = 0
    char_found = []
    words = []
    for _ in range(int(n_chars/3)):
        for r in range(1, n_chars + 1):
            if chars[start:(start + r)] in dictionary:
                char_found.append((chars[start:(start + r)], (start, start + r), len(chars[start:start+r])))
        id_offset = np.argmax([t[1][1] for t in char_found])
        start = char_found[id_offset][2]
        if char_found[id_offset] not in words:
            words.append(char_found[id_offset])
    return words

tokenize(chars, dictionary) # gives only [('yakkin', (0, 6), 6)]
I'm having a hard time wrapping my head around this problem. Please feel free to comment or suggest!
It can look a bit nasty, but it works:
def tokenize(string, dictionary):
    # sorting dictionary words by length
    # because we need to find longest word if its possible
    # like "yakkin" instead of "yak"
    sorted_dictionary = sorted(dictionary,
                               key=lambda word: len(word),
                               reverse=True)
    start = 0
    tokens = []
    while start < len(string):
        substring = string[start:]
        try:
            word = next(word
                        for word in sorted_dictionary
                        if substring.startswith(word))
            offset = len(word)
        except StopIteration:
            # no words from dictionary were found
            # at the beginning of substring,
            # looking for next appearance of dictionary words
            words_indexes = [substring.find(word)
                             for word in sorted_dictionary]
            # if word is not found, "str.find" method returns -1
            appeared_words_indexes = filter(lambda index: index > 0,
                                            words_indexes)
            try:
                offset = min(appeared_words_indexes)
            except ValueError:
                # an empty sequence was passed to "min" function
                # because there are no words from dictionary in substring
                offset = len(substring)
            word = substring[:offset]
        token = word, (start, start + offset), offset
        tokens.append(token)
        start += offset
    return tokens
gives output
>>>tokenize('yakkinpadthaikhaikoo', dictionary)
[('yakkin', (0, 6), 6),
('padthai', (6, 13), 7),
('khai', (13, 17), 4),
('koo', (17, 20), 3)]
>>>tokenize('lolyakhaiyakkinpadthaikhaikoolol', dictionary)
[('lol', (0, 3), 3),
('yak', (3, 6), 3),
('hai', (6, 9), 3),
('yakkin', (9, 15), 6),
('padthai', (15, 22), 7),
('khai', (22, 26), 4),
('koo', (26, 29), 3),
('lol', (29, 32), 3)]
You can use find() to find the starting index of the word, and the length of the word is known thanks to len(). Iterate through each word in the dictionary, and your list is complete!
def tokenize(chars, word_list):
    tokens = []
    for word in word_list:
        word_len = len(word)
        index = 0
        # skips words that appear in longer words
        skip = False
        for other_word in word_list:
            if word in other_word and len(other_word) > len(word):
                print("skipped word:", word)
                skip = True
        if skip:
            continue
        while index < len(chars):
            index = chars.find(word, index) # start search from index
            if index == -1: # find() returns -1 if not found
                break
            # Append the tuple and continue the search at the end of the word
            tokens.append((word, (index, word_len+index), word_len))
            index += word_len
    return tokens
Then we can run it for the following output:
>>>tokenize('yakkinpadthaikhaikoo', ['yak', 'kin', 'yakkin', 'khai', 'koo'])
skipped word: yak
skipped word: kin
[('yakkin', (0, 6), 6),
('khai', (13, 17), 4),
('koo', (17, 20), 3)]

Python: Empty spaces(those with 0's) in lists

I am trying to create a function which returns the empty slots in this list:
grid = [[0,0,0,4],[0,0,4,2],[2,4,4,2],[0,8,4,2]]
The empty slots in this case are the slots containing zeroes.
This was my program for it:
def empty_slots():
    lst = []
    for i in grid:
        for j in grid:
            if j == 0:
                lst = lst + [(i,j)]
    return lst
However, when I run this program I get an empty list []. And the function should output: [(0,0), (0,1), (0,2), (1,0), (1,1), (3,0)]. Note: I'm using Python 2.7.
for i in grid: iterates over the items in grid, it doesn't iterate over their indices. However, you can get the indices as you iterate over the items of an iterable via the built-in enumerate function:
def empty_slots(grid):
    return [(i, j) for i, row in enumerate(grid)
            for j, v in enumerate(row) if not v]
grid = [[0,0,0,4],[0,0,4,2],[2,4,4,2],[0,8,4,2]]
print(empty_slots(grid))
output
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (3, 0)]
Here's the same thing using "traditional" for loops instead of a list comprehension.
def empty_slots(grid):
    zeros = []
    for i, row in enumerate(grid):
        for j, v in enumerate(row):
            if v == 0:
                zeros.append((i, j))
    return zeros
In this version I use the explicit test of v == 0 instead of not v; the latter will test true if v is any "falsey" value, eg, 0, or an empty string, list, tuple, set or dict.
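To make that difference concrete, here is a small sketch on a hypothetical grid that mixes zeroes with other falsy values (the function names empty_slots_falsy and empty_slots_zero are invented for this comparison):

```python
def empty_slots_falsy(grid):
    # Treats every falsy cell (0, "", None, ...) as empty.
    return [(i, j) for i, row in enumerate(grid)
            for j, v in enumerate(row) if not v]

def empty_slots_zero(grid):
    # Treats only cells equal to 0 as empty.
    return [(i, j) for i, row in enumerate(grid)
            for j, v in enumerate(row) if v == 0]

# Hypothetical grid containing an empty string and None besides zeroes.
mixed = [[0, "", 4],
         [None, 2, 0]]

print(empty_slots_falsy(mixed))  # [(0, 0), (0, 1), (1, 0), (1, 2)]
print(empty_slots_zero(mixed))   # [(0, 0), (1, 2)]
```

For a grid that only ever holds integers, as in the question, both versions give the same result.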
You don't need enumerate to do this. You could do this:
def empty_slots(grid):
    zeros = []
    for i in range(len(grid)):
        row = grid[i]
        for j in range(len(row)):
            if row[j] == 0:
                zeros.append((i, j))
    return zeros
However, it is considered more Pythonic to iterate directly over the items in an iterable, so this sort of thing is generally avoided, when practical:
for i in range(len(grid)):
Occasionally you will need to do that sort of thing, but usually code like that is a symptom that there's a better way to do it. :)
In list comprehension:
grid = [[0,0,0,4],[0,0,4,2],[2,4,4,2],[0,8,4,2]]
[(i,j) for i,b in enumerate(grid) for j,a in enumerate(b) if a==0]
Out[81]: [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (3, 0)]

Python: manipulating lists [closed]

Problem
I have to get elements from a text file into lists, both diagonally and from top to bottom. It should work for a letters.txt of any dimensions. The file would look like this:
Text file: letters.txt (I thought it would be hard, so I removed 'Y' and 'Z' from my original post)
A B C D E F
G H I J K L
M N O P Q R
S T U V W X
The lists should look like this:
topButtom_List = ['AGMS', 'BHNT', 'CIOU', 'DJPV', 'EKQW', 'FLRX']
bLeftToURight = ['A', 'GB', 'MHC', 'SNID', 'TOJE', 'UPKF', 'VQL', 'WR', 'X']
My current code for top to bottom:
# top to bottom
topButtom_List = [] #should be ['AGMS', 'BHNT', 'CIOU', 'DJPV', 'EKQW', 'FLRX']
openFile = open("letters.txt")
for i in openFile:
    i = i.replace(" ","")
    length = len(i)
openFile.close()
openFile = open("letters.txt")
counter = 0
for eachIterration in range(length):
    for line in openFile:
        line = line.replace(" ","")
        # counter should be incremented by 1 each time the inner loop has iterated 4 times and the outer loop once.
        topButtom_List.append(line[counter])
    counter = counter + 1
openFile.close()
What I was trying to do with the code above:
I was trying to read the characters from top to bottom in the text file and collect them in a list called topButtom_List. I used counter as the index, to be incremented by 1 on every iteration of the outer loop. The way I see it, the outer loop starts, the inner loop iterates four times and adds AGMS to topButtom_List on the first iteration, then the outer loop iterates again and adds 1 to counter. BHNT will be added on the second iteration, and so on.
From the text file: letters.txt
I want to populate topButtom_List
Output I am getting:
['A', 'G', 'M', 'S']
Expected output:
['AGMS', 'BHNT', 'CIOU', 'DJPV', 'EKQW', 'FLRX']
#!/usr/bin/python3
field = """A B C D E F
G H I J K L
M N O P Q R
S T U V W X"""
arr = [col.split(' ') for col in [row.strip() for row in field.split('\n')]]
len_x, len_y = len(arr[0]), len(arr)
len_all = len_x + len_y - 1
lines, groups = [], []
# start and end point of every bottom-left to upper-right diagonal
for i in range(len_all):
    start = (i, 0) if i < len_y else (len_y-1, i-len_y+1)
    end = (0, i) if i < len_x else (i-len_x+1, len_x-1)
    lines.append([start, end])
print('List of start and end points: ', lines)
for start, end in lines:
    group = ''
    # walk up and to the right until we leave the grid
    for i in range(len_x):
        y, x = start[0] - i, start[1] + i
        if y >= 0 and y < len(arr) and x < len(arr[y]):
            group += arr[y][x]
        else:
            break
    # append after the loop so a diagonal of full length len_x is not lost
    groups.append(group)
print(groups)
Returns
List of start and end points: [[(0, 0), (0, 0)], [(1, 0), (0, 1)],
[(2, 0), (0, 2)], [(3, 0), (0, 3)], [(3, 1), (0, 4)], [(3, 2), (0, 5)],
[(3, 3), (1, 5)], [(3, 4), (2, 5)], [(3, 5), (3, 5)]]
and
['A', 'GB', 'MHC', 'SNID', 'TOJE', 'UPKF', 'VQL', 'WR', 'X']
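The code above builds the diagonal list only. For the top-to-bottom list, a minimal sketch using zip() to transpose the rows might look like this; it assumes the same in-memory field string instead of reading letters.txt, and the names rows and top_bottom are only for illustration:

```python
field = """A B C D E F
G H I J K L
M N O P Q R
S T U V W X"""

# Split each line into its letters, then transpose rows into columns.
rows = [line.split() for line in field.splitlines()]
top_bottom = ["".join(column) for column in zip(*rows)]

print(top_bottom)  # ['AGMS', 'BHNT', 'CIOU', 'DJPV', 'EKQW', 'FLRX']
```

Note that zip() stops at the shortest row, so this assumes every line has the same number of letters.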

Python: determine length of sequence of equal items in list

I have a list as follows:
l = [0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,2,2,2]
I want to determine the length of a sequence of equal items, i.e for the given list I want the output to be:
[(0, 6), (1, 6), (0, 4), (2, 3)]
(or a similar format).
I thought about using a defaultdict but it counts the occurrences of each item and accumulates it for the entire list, since I cannot have more than one key '0'.
Right now, my solution looks like this:
out = []
cnt = 0
last_x = l[0]
for x in l:
    if x == last_x:
        cnt += 1
    else:
        out.append((last_x, cnt))
        cnt = 1
    last_x = x
out.append((last_x, cnt))
print(out)
I am wondering if there is a more pythonic way of doing this.
You almost surely want to use itertools.groupby:
import itertools

l = [0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,2,2,2]
answer = []
# "group" rather than "iter", to avoid shadowing the built-in iter()
for key, group in itertools.groupby(l):
    answer.append((key, len(list(group))))
# answer is [(0, 6), (1, 6), (0, 4), (2, 3)]
If you want to make it more memory efficient, yet add more complexity, you can add a length function:
import itertools

def length(l):
    if hasattr(l, '__len__'):
        return len(l)
    else:
        # count the items without building an intermediate list
        i = 0
        for _ in l:
            i += 1
        return i

l = [0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,2,2,2]
answer = []
for key, group in itertools.groupby(l):
    answer.append((key, length(group)))
# answer is [(0, 6), (1, 6), (0, 4), (2, 3)]
Note though that I have not benchmarked the length() function, and it's quite possible it will slow you down.
Mike's answer is good, but the itertools._grouper returned by groupby will never have a __len__ method, so there is no point testing for it.
I use sum(1 for _ in i) to get the length of the itertools._grouper:
>>> import itertools as it
>>> L = [0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,2,2,2]
>>> [(k, sum(1 for _ in i)) for k, i in it.groupby(L)]
[(0, 6), (1, 6), (0, 4), (2, 3)]
