Input
['select', '*', 'from', 'ak','.','person']
I need to create a new list in which the elements after 'from' are merged into one.
Expected Output
['select', '*', 'from', 'ak.person']
Code is below
m = []
for i in a:
    if '.' == i:
        ind = a.index('.')
        m.append(a[ind-1] + a[ind] + a[ind+1])
    else:
        m.append(i)
My output >> ['select', '*', 'from', 'ak', 'ak.person', 'person']
Expected is ['select', '*', 'from', 'ak.person']
Relatively short:
arr = ['select', '*', 'from', 'ak','.','person']
ind = arr.index('from') + 1
# we join the initial array until 'from' then joining the rest:
print(arr[:ind] + ["".join(arr[ind:])])
The loop here checks whether the previous element is 'from' and, if so, joins the three elements that follow it.
This should work for inputs that follow the same pattern as the one you've given, including longer queries with a where clause, as mentioned in the updated question.
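a = ['select', '*', 'from', 'ak','.','person']
m = []
i = 0  # the index has to be initialised before the loop
while i < len(a):
    # i > 0 keeps a[i-1] from wrapping around to the last element
    if i > 0 and a[i-1] == "from":
        m.append("".join(a[i:i+3]))
        i += 3
    else:
        m.append(a[i])
        i += 1
print(m)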
Output
['select', '*', 'from', 'ak.person']
Try this. I first extracted a sub-list after 'from' and merged that.
orig = ['select', '*', 'from', 'ak', '.' ,'person']
from_pos = orig.index('from')
sublist = orig[from_pos+1:from_pos+4]
new_string = ''
for txt in sublist:
    new_string += txt
new_list = orig[0:from_pos+1]
new_list.append(new_string)
print(orig)
print(new_list)
And if you have any where clause or group by after that, you can try this -
orig = ['select', '*', 'from', 'ak', '.' ,'person', 'where', 'filter', 'group', 'by']
from_pos = orig.index('from')
sublist = orig[from_pos+1:from_pos+4]
new_string = ''
for txt in sublist:
    new_string += txt
new_list = orig[0:from_pos+1]
new_list.append(new_string)
where = orig[from_pos+4:]
new_list = new_list + where
print(orig)
print(new_list)
You get -
['select', '*', 'from', 'ak.person', 'where', 'filter', 'group', 'by']
Try this:
def my_func(my_list):
    lp=my_list[(my_list.index('from')+1):(len(my_list))]
    ans=my_list[0:(my_list.index('from')+1)]+["".join(lp)]
    return ans
my_func(['select', '*', 'from', 'ak','.','person'])
Output:
['select', '*', 'from', 'ak.person']
Alternate Solution:
def sub_func(my_list):
    lp=my_list[(my_list.index('.')-1):(my_list.index('.')+2)]
    ans=my_list[0:(my_list.index('.')-1)]+["".join(lp)]+my_list[(my_list.index('.')+2):(len(my_list))]
    return ans
def my_func(my_list):
    lst=my_list
    for i in range(my_list.count('.')):
        lst=(lambda x: sub_func(x))(lst)
    return lst
my_func(['select', '*', 'from', 'ak','.','person','where','foo','.','bar', '=', '30'])
Output:
['select', '*', 'from', 'ak.person', 'where', 'foo.bar', '=', '30']
This solution targets any elements separated by a period and joins them. Unlike the original solution, it will function with lists containing multiple periods and with lists that do not use a 'from' statement.
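The same idea, joining whatever sits on either side of a '.', can also be sketched as a single pass that builds a new list instead of calling index() repeatedly. This is only an illustration: merge_dotted is a hypothetical name that does not appear in the answer above, and it assumes each '.' is its own list element.

```python
def merge_dotted(tokens):
    """Glue 'x', '.', 'y' into 'x.y' while walking the list once."""
    merged = []
    i = 0
    while i < len(tokens):
        # A '.' with a token on both sides is merged into the previous result.
        if tokens[i] == '.' and merged and i + 1 < len(tokens):
            merged[-1] = merged[-1] + '.' + tokens[i + 1]
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


query = ['select', '*', 'from', 'ak', '.', 'person',
         'where', 'foo', '.', 'bar', '=', '30']
print(merge_dotted(query))
```

Because each merge extends the last element of the result, a chain such as 'a', '.', 'b', '.', 'c' collapses into 'a.b.c' as well.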
Golf answer:
print(a[:(i:=(a:=['select', '*', 'from', 'ak','.','person']).index('from')+1)]+[''.join(a[i:])])
Related
I am trying to trace to what extent is listA, listB, listC... similar to the original list. How do I print the number of elements that occur in the same sequence in listA as they occur in the original list?
original_list = ['I', 'live', 'in', 'space', 'with', 'my', 'dog']
listA = ['my', 'name', 'my', 'dog', 'is', 'two', 'years', 'old']
listB = ['how', 'where', 'I', 'live', 'in', 'space', 'with']
listC = ['I', 'live', 'to', 'the', 'in', 'space', 'with', 'my', 'football', 'my','dog']
Output:
listA: Count = 2 #'my', 'dog'
listB: Count = 5 #'I', 'live', 'in', 'space', 'with'
listC: Count = 2,4,2 #'I', 'live'
#'in', 'space', 'with', 'my'
#'my', 'dog'
I wrote a function that does the job I think. It might be a bit too complex, but I can't see an easier way at the moment:
original = ['I', 'live', 'in', 'space', 'with', 'my', 'dog']
listA = ['my', 'name', 'my', 'dog', 'is', 'two', 'years', 'old']
listB = ['how', 'where', 'I', 'live', 'in', 'space', 'with']
listC = ['I', 'live', 'to', 'the', 'in', 'space', 'with', 'my', 'football', 'my', 'dog']
def get_sequence_lengths(original_list, comparative_list):
    # every contiguous slice of length >= 2 from the original list
    original_options = []
    for i in range(len(original_list)):
        for j in range(i + 1, len(original_list)):
            original_options.append(original_list[i:j + 1])
    # the same for the list being compared, longest slices first
    comparative_options = []
    for i in range(len(comparative_list)):
        for j in range(i+1, len(comparative_list)):
            comparative_options.append(comparative_list[i:j+1])
    comparative_options.sort(key=len, reverse=True)
    matches = []
    while comparative_options:
        for option in comparative_options:
            if option in original_options:
                matches.append(option)
                # drop the slices that are already covered by this match
                new_comparative_options = comparative_options.copy()
                for l in comparative_options:
                    counter = 0
                    for v in option:
                        counter = counter + 1 if v in l else 0
                        if counter == len(l):
                            new_comparative_options.remove(l)
                            break
                comparative_options = new_comparative_options
                break
        # stop when nothing is left or the last slice was checked without a match
        if not comparative_options or option == comparative_options[-1]:
            break
    matches = [option for option in original_options if option in matches]
    lengths = [len(option) for option in matches]
    print(lengths)
    print(matches)
    return lengths
If you call it with the original list and example lists, it prints the following.
get_sequence_lengths(original, listA) prints [2] [['my', 'dog']].
get_sequence_lengths(original, listB) prints [5] [['I', 'live', 'in', 'space', 'with']].
get_sequence_lengths(original, listC) prints [2, 4, 2] [['I', 'live'], ['in', 'space', 'with', 'my'], ['my', 'dog']].
EDITED
I found this problem fun to do and wanted to explore some other options from the accepted one.
def _get_sequences(inter_dict : dict, list_range : int) -> tuple[list, list]:
    occuring = [0] * list_range
    for key, indices in inter_dict.items(): # lays out intersecting strings as they occur
        for idx in indices:
            occuring[idx] = key
    _temp_list = []
    lengths = []
    matches = []
    for idx in range(len(occuring)):
        item = occuring.pop(0)
        if item != 0: # if on python 3.8+ you could use (( item := occuring.pop(0) ) != 0) instead
            _temp_list.append(item)
        elif (bool(_temp_list) and len(_temp_list) > 1):
            matches.append( _temp_list.copy() )
            lengths.append( len(_temp_list) )
            _temp_list.clear()
        elif (bool(_temp_list) and item == 0) and len(_temp_list) == 1: # if its a single occurrence ignore
            _temp_list.clear()
    if bool(_temp_list) and len(_temp_list) > 1: # ensures no matching strings are missed
        matches.append( _temp_list )
        lengths.append( len(_temp_list) )
    return lengths, matches
def get_intersecting(list_a, list_b) -> tuple[list, list]:
    intersecting = set(list_a) & set(list_b) # returns intersecting strings
    indices_dict = {}
    for item in intersecting:
        indices = [ index for index, value in enumerate(list_b) if value == item ] # gets occuring indices of each string
        indices_dict[item] = indices
    return _get_sequences( indices_dict, len(list_b) )
if __name__ == "__main__":
    original = ['I', 'live', 'in', 'space', 'with', 'my', 'dog']
    listA = ['my', 'name', 'my', 'dog', 'is', 'two', 'years', 'old']
    listB = ['how', 'where', 'I', 'live', 'in', 'space', 'with']
    listC = ['I', 'live', 'to', 'the', 'in', 'space', 'with', 'my', 'football', 'my', 'dog']
    lengths, matches = get_intersecting(original, listA)
    print(lengths, matches) # [2] [['my', 'dog']]
    lengths, matches = get_intersecting(original, listB)
    print(lengths, matches) # [5] [['I', 'live', 'in', 'space', 'with']]
    lengths, matches = get_intersecting(original, listC)
    print(lengths, matches) # [2, 4, 2] [['I', 'live'], ['in', 'space', 'with', 'my'], ['my', 'dog']]
EDITED x2
This would probably be my final solution.
def ordered_intersecting(list_a, list_b) -> tuple[int, list]:
    matches = []
    for item in list_b:
        if item in list_a: # while iterating we can just add them to a return list as they appear
            matches.append(item)
        else: # an item that does not intersect ends the current run
            if len(matches) > 1: # only runs longer than 1 are yielded
                yield len(matches), matches.copy() # a shallow copy should be good enough, but if needed it can be changed to a deep one
            matches.clear() # also discards a lone match, such as the first 'my' in listA
    if len(matches) > 1: # catch any remaining matches
        yield len(matches), matches
if __name__ == "__main__":
    original = ['I', 'live', 'in', 'space', 'with', 'my', 'dog']
    listA = ['my', 'name', 'my', 'dog', 'is', 'two', 'years', 'old']
    listB = ['how', 'where', 'I', 'live', 'in', 'space', 'with']
    listC = ['I', 'live', 'to', 'the', 'in', 'space', 'with', 'my', 'football', 'my', 'dog']
    print( list(ordered_intersecting(original, listA)) )
    print( list(ordered_intersecting(original, listB)) )
    print( list(ordered_intersecting(original, listC)) )
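The versions above group items by membership in the original list, so they do not check that neighbouring items are also neighbours in the original. A position-aware variant could be sketched as follows. It is only an illustration under one loud assumption: every word occurs at most once in the original list. The name consecutive_runs is hypothetical and not taken from any answer above.

```python
def consecutive_runs(original, candidate):
    """Return runs (length >= 2) of candidate items that are adjacent,
    in the same order, in `original`."""
    # Assumption: each word appears only once in `original`.
    position = {word: i for i, word in enumerate(original)}
    runs, current, prev = [], [], None
    for word in candidate:
        pos = position.get(word)  # None when the word is not in original
        if pos is not None and prev is not None and pos == prev + 1:
            current.append(word)  # continues the run started earlier
        else:
            if len(current) > 1:
                runs.append(current)
            # a known word may start a new run; an unknown word resets it
            current = [word] if pos is not None else []
        prev = pos
    if len(current) > 1:
        runs.append(current)
    return runs


original = ['I', 'live', 'in', 'space', 'with', 'my', 'dog']
listC = ['I', 'live', 'to', 'the', 'in', 'space', 'with', 'my', 'football', 'my', 'dog']
print([len(run) for run in consecutive_runs(original, listC)])  # [2, 4, 2]
```

The dictionary lookup keeps the pass linear in the length of the candidate list, at the cost of the uniqueness assumption.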
I would like to split a list into separate lists at '\n'. For example, if I have a list like this one:
l = ['hi', 'my', 'name', 'is', 'john', '\n', '\n', 'nice', 'to', 'meet', 'you']
I'd like to separate the items this way:
l = [['hi', 'my', 'name', 'is', 'john'], ['nice', 'to', 'meet', 'you']]
Can someone help me?
Some code that I tried to write:
l = ['hi', 'my', 'name', 'is', 'john', '\n', '\n', 'nice', 'to', 'meet', 'you']
lst = []
ls = []
for word in l:
    if word != '\n':
        ls.append(l)
    else:
        lst.append(ls)
print(lst)
I think you just wanted to append word to the list ls. Also, clear the partial list at the newlines like so:
lst = []
ls = []
for word in l:
    if word != '\n':
        ls.append(word)
    else:
        if len(ls) > 0:
            lst.append(ls)
            ls = []
if len(ls) > 0:
    lst.append(ls)
print(lst)
resulting in
[['hi', 'my', 'name', 'is', 'john'], ['nice', 'to', 'meet', 'you']]
You could use itertools.groupby:
>>> from itertools import groupby
>>> l = ['hi', 'my', 'name', 'is', 'john', '\n', '\n', 'nice', 'to', 'meet', 'you']
>>> l = [list(group) for key, group in groupby(l, lambda s: s != '\n') if key]
>>> l
[['hi', 'my', 'name', 'is', 'john'], ['nice', 'to', 'meet', 'you']]
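To see why the `if key` filter is enough, it can help to look at what groupby produces before filtering. The snippet below is a small sketch on a shortened, made-up list: consecutive items that give the same key are grouped together, so each run of '\n' becomes a single group with key False.

```python
from itertools import groupby

words = ['hi', 'my', '\n', '\n', 'nice', 'to']
# The key is True for ordinary words and False for newline markers.
pairs = [(key, list(group)) for key, group in groupby(words, lambda s: s != '\n')]
print(pairs)
# [(True, ['hi', 'my']), (False, ['\n', '\n']), (True, ['nice', 'to'])]
```

Keeping only the groups whose key is True drops the newline runs, however many '\n' items appear in a row.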
I am currently doing a data analysis project involving text mining. As of now, I am stuck on filtering out certain phrases.
Suppose I have this tokenized array of words
arr = ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '#', 'HelloWorld']
(hello, how is your day going? #HelloWorld)
and I want to remove the #HelloWorld from the sentence.
My original logic was to traverse the array and check for the #. Once the # has been found, I would replace the # and the element after it with a blank space, as follows:
N = 0
for index in arr:
    if arr[N] == '#':
        arr[N] = (' ')
        arr[N+1] = (' ')
    N += 1
Unfortunately, I got the error "list assignment index out of range" at line 5. I tried to use .append(), but it only allows modification at N.
Is there another approach to this?
This should work. As the others said, you need to check whether you are at the end of the list.
EDIT: simplify !
arr = ['a', 'b', '#', 'aa']
indices = [idx for idx, elt in enumerate(arr) if elt == '#']
for idx in indices:
    if idx + 1 < len(arr):  # only blank the next element if '#' is not the last one
        arr[idx+1] = ' '
    arr[idx] = ' '
Your code will try to access outside the array when the last element is #, so you need to check for that.
There's also no need to use a separate variable for iteration and indexing, just iterate over the range of indexes.
for i in range(len(arr)):
    if arr[i] == '#':
        arr[i] = ' '
        if i < len(arr)-1:  # there is a next element only if '#' is not the last one
            arr[i+1] = ' '
The root cause of your error is that 'N+1' goes out of range when the loop reaches the end of the list.
If every '#' is guaranteed to be followed by another element, try the code below:
arr = ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '#', 'HelloWorld']
for index in range(0, len(arr)):
    if arr[index] == '#':
        arr[index:index+2] = ['', '']
print(arr)
Output:
['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '', '']
[Finished in 0.133s]
If the array ends with '#', the code still replaces that final '#' with ['', ''], so the list grows by one element (I am not sure whether this is the result you expect).
arr = ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '#', 'HelloWorld', '#']
for index in range(0, len(arr)):
    if arr[index] == '#':
        arr[index:index+2] = ['', '']
print(arr)
Output:
['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '', '', '', '']
[Finished in 0.179s]
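Since the question asks to remove #HelloWorld rather than blank it out, another option is to build a new list and leave the original untouched. The following is a minimal sketch; drop_hashtags is a hypothetical helper name, and it assumes a hashtag is always tokenized as '#' followed by exactly one token.

```python
def drop_hashtags(tokens):
    """Return a copy of tokens without '#' and the token right after it."""
    result = []
    skip_next = False
    for tok in tokens:
        if skip_next:          # this token belongs to the preceding '#'
            skip_next = False
            continue
        if tok == '#':
            skip_next = True   # drop the '#' itself and flag the next token
            continue
        result.append(tok)
    return result


arr = ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '#', 'HelloWorld']
print(drop_hashtags(arr))
# ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?']
```

Nothing is ever assigned by index here, so a '#' at the very end of the list cannot cause an out-of-range error.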
I am trying to split a list into a new list. Here's the initial list:
initList =['PTE123', '', 'I', 'am', 'programmer', 'PTE345', 'based', 'word',
'title', 'PTE427', 'how', 'are', 'you']
I want to split the list at each PTExyz item and build a new list that looks like this:
newList = ['PTE123 I am programmer', 'PTE345 based word title', 'PTE427 how are you']
How should I develop a proper algorithm for the general case, where PTExyz items are repeated?
Thank You!
The algorithm will be something like this.
Iterate over the list and keep a temp string, initialized to the empty string. When you find a string s that starts with PTE, append temp to the result list if it is not empty, then set temp to s. Append every other string to temp, separated by a space (anything before the first PTE item is ignored). After the loop, append the last temp to the result.
ls = ['PTE123', '', 'I', 'am', 'programmer', 'PTE345', 'based', 'word', 'title', 'PTE427', 'how', 'are', 'you']
result = []
temp = ''
for s in ls:
    if s.startswith('PTE'):
        if temp != '':
            result.append(temp)
        temp = s
    else:
        # skip items before the first PTE and empty strings (they would add a double space)
        if temp == '' or s == '':
            continue
        temp += ' ' + s
if temp != '':
    result.append(temp)
print(result)
Edit
To handle the exact pattern PTExyz you can use a regular expression. In that case, add import re and replace the s.startswith('PTE') test with:
re.match(r'PTE\w{3}$', s)
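Put together, the regular-expression variant might look like the sketch below. The pattern is the one shown above; the function name group_by_marker and the constant PTE_TOKEN are made up for this illustration, and empty strings are skipped on the assumption that they carry no content.

```python
import re

PTE_TOKEN = re.compile(r'PTE\w{3}$')  # 'PTE' followed by exactly three word characters


def group_by_marker(tokens):
    """Join the tokens that follow each PTExyz marker into one string."""
    result, current = [], []
    for tok in tokens:
        if PTE_TOKEN.match(tok):
            if current:                      # close the previous group
                result.append(' '.join(current))
            current = [tok]                  # start a new group at the marker
        elif current and tok:                # ignore '' and anything before the first marker
            current.append(tok)
    if current:
        result.append(' '.join(current))
    return result


ls = ['PTE123', '', 'I', 'am', 'programmer', 'PTE345', 'based', 'word',
      'title', 'PTE427', 'how', 'are', 'you']
print(group_by_marker(ls))
```

Collecting the words in a list and joining them once per group avoids repeated string concatenation, although for lists of this size either approach is fine.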
I think it will work
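l =['PTE123', '', 'I', 'am', 'programmer', 'PTE345', 'based', 'word','title', 'PTE427', 'how', 'are', 'you']
resultlist = []
s = ' '.join(word for word in l if word)  # skip empty strings to avoid double spaces
parts = s.split('PTE')  # renamed from str, which shadows the built-in
for part in parts:
    if part:  # the piece before the first 'PTE' is empty
        resultlist.append('PTE' + part.strip())
print(resultlist)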
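This version uses a regular expression to match PTExyz:
import re
l =['PTE123', '', 'I', 'am', 'programmer', 'PTE345', 'based', 'word',
    'title', 'PTE427', 'how', 'are', 'you']
pattern = re.compile(r'[P][T][E]\d\d\d')
k = []
for i in l:
    if pattern.match(i) is not None:
        k.append(i)
s = ' '.join(word for word in l if word)  # skip empty strings to avoid double spaces
parts = re.split(pattern, s)  # renamed from str, which shadows the built-in
parts.remove('')  # the piece before the first marker is empty
for i in range(len(k)):
    parts[i] = (k[i] + parts[i]).rstrip()
print(parts)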
>>> words = ['PTE123', '', 'I', 'am', 'programmer', 'PTE345', 'based', 'word','title', 'PTE427', 'how', 'are', 'you']
>>> index_list = [i for i, item in enumerate(words) if item.startswith("PTE")]
>>> index_list.append(len(words))
>>> index_list
[0, 5, 9, 13]
>>> [' '.join(filter(None, words[start:end])) for start, end in zip(index_list, index_list[1:])]
Output
['PTE123 I am programmer', 'PTE345 based word title', 'PTE427 how are you']
string="i-want-all-dashes-split"
print(split(string,"-"))
So I want the output to be:
string=(I,-,want,-,all,-,dashes,-,split)
I basically want to partition all the "-"'s.
>>> import re
>>> string = "i-want-all-dashes-split"
>>> string.split('-') # without the dashes
['i', 'want', 'all', 'dashes', 'split']
>>> re.split('(-)', string) # with the dashes
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']
>>> ','.join(re.split('(-)', string)) # as a string joined by commas
'i,-,want,-,all,-,dashes,-,split'
string="i-want-all-dashes-split"
print('string=' + str(string.split('-')).replace('[', '(').replace(']', ')').replace(' ', '-,'))
>>>string=('i',-,'want',-,'all',-,'dashes',-,'split')
Use the split function from str class:
text = "i-want-all-dashes-split"
splitted = text.split('-')
The value of splitted will be a list like the one below:
['i', 'want', 'all', 'dashes', 'split']
If you want the output as a tuple, do it like in the code below:
t = tuple(splitted)
('i', 'want', 'all', 'dashes', 'split')
string="i-want-all-dashes-split"
print(string.split('-'))
# Output:
['i', 'want', 'all', 'dashes', 'split']
string.split()
Inside the () you can put your delimiter ('-'); if you don't pass anything, the string is split on runs of whitespace by default.
You can make a function:
def spliter(string, delimiter=','): # delimiter have a default argument (',')
    string = string.split(delimiter)
    result = []
    for x, y in enumerate(string):
        result.append(y)
        if x != len(string)-1: result.append(delimiter)
    return result
Output:
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']
You can use this function too:
Code:
def split_keep(s, delim):
    s = s.split(delim)
    result = []
    for i, n in enumerate(s):
        result.append(n)
        if i == len(s)-1: pass
        else: result.append(delim)
    return result
Usage:
split_keep("i-want-all-dashes-split", "-")
Output:
['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']
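The hand-written loops above can also be expressed with re.split and a capturing group, as in the earlier answer. If the delimiter is passed in as a variable, it is safer to escape it first so that characters such as '.' or '|' are matched literally. This is a small sketch; split_keep_re is a hypothetical name, and it assumes the delimiter is a non-empty string.

```python
import re


def split_keep_re(text, delim):
    # re.escape makes regex metacharacters in delim match literally;
    # the capturing group tells re.split to keep each separator in the result.
    return re.split('(' + re.escape(delim) + ')', text)


print(split_keep_re("i-want-all-dashes-split", "-"))
# ['i', '-', 'want', '-', 'all', '-', 'dashes', '-', 'split']
print(split_keep_re("a.b.c", "."))
# ['a', '.', 'b', '.', 'c']
```

Without re.escape, a delimiter of '.' would match every character, which is why the escaping step matters for a general-purpose helper.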