I am a newbie in Python. I have a list of 100 separate strings, each 300 characters long. After splitting each string, the list started to act like a 2D array, and I want to join the pieces back together to get a list like the one I started with.
Below is my sample list and what I have tried, but it does not work. I want to use ' ' instead of '1' as the separator, remove pieces shorter than 3 characters, and join the rest together. Only replacing does not work, because then I cannot remove the short words.
1 c1|FaAO120O'8ovfoy1W#atvGs1[1s1[1/1]O-a8o1-...
2 O8v^10O#to1'#^'^tv1^]s111t01Otaq>-ata_1...
3 *#^-G1_#O-#b^'ta8a2%e1|28Oot^12#O-#ys1>c...
def tokenize(text):
    return text.split("1")
def trimm(text):
    return ' '.join([i for i in text if len(i) > 3])
token_data = [tokenize(i) for i in X]
#trim_data = [trimm(i) for i in token_data]
for n in token_data:
    for i in token_data[n]:
        res=trimm(i)
Below is the output of the tokenize function for one string:
['c', '|FaAO', "20O'8o\x02vfoy", 'W#at\x1bvGs', '[', 's', '[', '/', ']O-a8o', '-\x1b-\x03\x1b#', '^]', '-a\x02\x1b', 'av', 'vc]]\x1b#a\x02d', ']#^-', 'O', 'v\x1bz\x1b#\x1b', "A\x1b'#\x1bvva^\x02", '\x03#^cd0t', '^\x02s', '[', '\x03o', "-\x1b\x02^'Ocv\x1b", 'Ov', 'W\x1b88', 'Ov', 'O', '-\x1b\x02tO8', '\x03#\x1bOf', 'A^W\x02\x08', '', '>0\x1b', 'av', '\x03\x1ba\x02d', 't#\x1bOt\x1bA', 'Wat0s', '[', 'gO8oA^8', 'Wat0', 'v^-\x1b', 'vc__\x1bvv', '\x03ct', 't0\x1b', 't#\x1bOt-\x1b\x02tv', '\x03\x1ba\x02d', "'#^zaA\x1bA", 't0#^cd0', '0\x1b#s', '[', "'vo_0aOt#avt", 'O#\x1b', '\x02^t', 'vOtav]O_t^#o\x08', '', '>^-']
Below is the output of the trimm function for that string:
|FaAO 20O'8ovfoy W#atvGs ]O-a8o --# -a vc]]#ad ]#^- vz# A'#vva^ #^cd0t -^'Ocv W88 -tO8 #Of A^W ad t#OtA Wat0s gO8oA^8 Wat0 v^- vc__vv t#Ot-tv ad '#^zaAA t0#^cd0 0#s 'vo_0aOt#avt vOtav]O_t^#
This works for a single 300-character string, but I want to do it for every string in the original list. How can I write a loop that trims and joins every string?
These two lines look wrong:
for n in token_data:
    for i in token_data[n]:
n will be an element of token_data, not an index, so taking token_data[n] does not make sense to me. Instead, I would use for i in n: for the second for loop.
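Note that trimm is meant to receive a whole list of tokens, so even the corrected inner loop would hand it one token (one string) at a time. A minimal sketch of the full fix, which calls trimm once per inner list and needs no second loop at all. X here is a hypothetical two-item stand-in for the real list of 300-character strings, and the > 3 length test is kept from the question's code (use >= 3 if 3-character tokens should survive):

```python
def tokenize(text):
    # "1" acts as the separator in this data, so split on it
    return text.split("1")

def trimm(tokens):
    # Keep only tokens longer than 3 characters and join them with spaces
    return " ".join(t for t in tokens if len(t) > 3)

# Hypothetical stand-in for the real list X of 300-character strings
X = ["ab1cdef1gh1ijklm", "nopq1r1stuvw"]

token_data = [tokenize(s) for s in X]                  # one list of tokens per string
trim_data = [trimm(tokens) for tokens in token_data]   # one joined string per input
print(trim_data)  # ['cdef ijklm', 'nopq stuvw']
```

Each element of token_data is already a list, so iterating over token_data directly avoids the indexing error entirely.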
I have an arbitrarily nested array of values that looks like this:
['"multiply"', 'ALAssn ', ['ACmp ', ['Ge ', ['Var "n"'], ' ', ['Num 0']]], ['ALAssn ', ['ACmp ', ['Eq ', ['Var "p"'], ' ', ['Mul ', ['Var "n"'], ' ', ['Var "m"']]]]]]
and I need a way to go through every value in the array and format it so that:
Each array of length 1 is split into two separate values:
-- Example: ['Var "n"'] should now become ["Var", "n"] and ['Num 0'] now becomes ["Num", 0].
All empty string values (such as ' ') are removed.
-- Example: ['Ge ', ['Var "n"'], ' ', ['Num 0']] now becomes ['Ge ', ['Var "n"'], ['Num 0']]
The whitespace in any string is removed.
-- Example: 'Ge ' now becomes 'Ge'
The given snippet is a portion of a much larger string that needs to be parsed. I understand what needs to be done at a high level, i.e.:
Once I get to a list of length 1, call split(" ") on its single string to get two separate elements, then trim the second element to get rid of the extra quotation marks.
For every element el in the list, if el is an empty string, call list.remove(el).
When traversing, check isinstance(el, str) for every element, and if true, use el.replace(" ", "") to get rid of the whitespace.
My only issue comes when traversing through every single element in the list. I've tried doing so recursively and iteratively, but so far haven't been able to crack it.
Ideally, I traverse through every single element, and then once I hit an element that meets the criteria, set that element equal to the change that I want to make on it. This is only really the case for points 1 and 3.
EDIT:
Thank you so much for the answers given. I have one more addition I would like to make.
Assume too that I have nested identifiers like 'Read "a"' as the first value of an array, with the possibility of additional identifiers like 'Write "a"' at the same level. These also need to be converted to the format ["Read", "a"]. See the change in the larger list below. How would I go about doing this?
['Read "a"', ['Add', ['Var', 'i'], ['Num', '1']], 'Write "a"', ['Add', ['Var', 'i'], ['Num', '1']], ['Var', 't']]
The point of the values 'Read' and 'Write' is that, when traversing the list, we know the "type" of the next n elements of the list corresponding to that identifier. We can distinguish them because they are the only values in the nested list that are not lists themselves.
For example: ['identifier', [], [], []]
Assume it is known that the identifier type contains 3 lists, first, second, third. The goal is to read identifier and then store first, second, and third as nodes in a tree, for example.
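A minimal sketch of one way to convert the identifiers, assuming every identifier string is exactly one word followed by one quoted name (such as 'Read "a"') and using the list with its brackets balanced as shown in the code. The helper names split_identifier and convert are illustrative, not from the question:

```python
def split_identifier(text):
    # 'Read "a"' -> ['Read', 'a']; anything that is not exactly two words
    # (e.g. 'Add' or 'i') is only stripped of surrounding whitespace
    parts = text.split()
    if len(parts) == 2:
        return [parts[0], parts[1].strip('"')]
    return text.strip()

def convert(item):
    # Recurse into nested lists; convert every string with split_identifier
    if isinstance(item, list):
        return [convert(i) for i in item]
    return split_identifier(item)

data = ['Read "a"', ['Add', ['Var', 'i'], ['Num', '1']],
        'Write "a"', ['Add', ['Var', 'i'], ['Num', '1']], ['Var', 't']]
print(convert(data))
# [['Read', 'a'], ['Add', ['Var', 'i'], ['Num', '1']], ['Write', 'a'], ['Add', ['Var', 'i'], ['Num', '1']], ['Var', 't']]
```

After conversion the identifiers are lists too, so they can no longer be recognized as the only non-list values. If that distinction is still needed, detect them before converting, or return a tuple such as ('Read', 'a') instead.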
This problem seems like it would be easiest to deal with by constructing a new list with the fixed-up items, rather than trying to modify the existing list in place. This would let you use recursion to deal with the nesting, while using iteration over the flat parts of each list.
I'd structure the code like this:
def process(lst):
    if len(lst) == 1:  # special case for one-element lists
        result = lst[0].split()
        result[1] = result[1].strip('"')  # strip quotation marks
        return result
    result = []
    for item in lst:
        if isinstance(item, list):
            result.append(process(item))  # recurse on nested lists
        else:  # item is a string
            stripped = item.strip()  # remove leading and trailing whitespace
            if stripped:
                result.append(stripped)  # keep only non-empty strings
    return result
It seems you can collapse steps 1 and 3 into one operation:
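def sanitize(item):
    if isinstance(item, list):
        if len(item) == 1:
            item = item[0].split()  # ['Var "n"'] -> ['Var', '"n"']
        # Recurse and drop falsy results such as '' (and empty lists); := needs Python 3.8+
        return [output for i in item if (output := sanitize(i))]
    return item.strip('" ')  # Strips both '"' and ' '.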
item = ['"multiply"', 'ALAssn ', ['ACmp ', ['Ge ', ['Var "n"'], ' ', ['Num 0']]], ['ALAssn ', ['ACmp ', ['Eq ', ['Var "p"'], ' ', ['Mul ', ['Var "n"'], ' ', ['Var "m"']]]]]]
sanitize(item)
# Returns: ['multiply', 'ALAssn', ['ACmp', ['Ge', ['Var', 'n'], ['Num', '0']]], ['ALAssn', ['ACmp', ['Eq', ['Var', 'p'], ['Mul', ['Var', 'n'], ['Var', 'm']]]]]]
How would you make a list of all the possible substrings of a string using recursion (no loops)? I know that you can recurse using s[1:] to cut off the first character and s[:-1] to cut off the last character. So far I have come up with this:
def lst_substrings(s):
    lst = []
    if s == "":
        return lst
    else:
        lst.append(s)
        return lst_substrings(s[1:])
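but even if it worked, this would only make a list of the substrings obtained by cutting characters off the front (as written, it returns an empty list, because each call's lst is discarded and only the final base-case result is returned).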
Fun problem, here's my solution - feedback appreciated.
Output
In [73]: lstSubStrings("Hey")
Out[73]: ['', 'y', 'H', 'Hey', 'He', 'e', 'ey']
Solution
def lstSubStrings(s):
    # BASE CASE: when s is empty return a list containing the empty string
    if len(s) == 0:
        return [s]
    substrs = []
    # a string is a substring of itself - by the definition of subset in math
    substrs.append(s)
    # extend the list of substrings by all substrings with the first
    # character cut out
    substrs.extend(lstSubStrings(s[1:]))
    # extend the list of substrings by all substrings with the last
    # character cut out
    substrs.extend(lstSubStrings(s[:-1]))
    # convert the list to `set`, removing all duplicates, and convert
    # back to a list
    substrs = list(set(substrs))
    return substrs
EDIT: Duh. I just realized that practically the same solution has been posted by someone who was quicker than me. Vote for his answer. I'll leave this as it is a bit more concise, and in case you want to sort the resulting list by substring length. Use key=lambda item: (len(item), item), i.e. drop the - sign, to sort in ascending order.
This will do:
def lst_substrings(s):
    lst = [s]
    if len(s) > 0:
        lst.extend(lst_substrings(s[1:]))
        lst.extend(lst_substrings(s[:-1]))
    return list(set(lst))

sub = lst_substrings("boby")
sub.sort(key=lambda item: (-len(item), item))
print(sub)
Output is:
['boby', 'bob', 'oby', 'bo', 'by', 'ob', 'b', 'o', 'y', '']
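For comparison, a sketch of a different recursive decomposition (not the one used in the answers above): every substring is a prefix of some suffix, so one recursion cuts off the first character and a helper recursion lists the prefixes. It still uses no loops, but it produces each start/end position exactly once instead of relying on set to remove duplicates. As a consequence, repeated text (such as the two 'b's in "boby") appears more than once, and the empty string is not included. The names prefixes and all_substrings are illustrative:

```python
def prefixes(s):
    # Non-empty prefixes of s, longest first: s, s[:-1], ..., s[:1]
    if not s:
        return []
    return [s] + prefixes(s[:-1])

def all_substrings(s):
    # Prefixes of s, then (recursively) the substrings of s with its first character cut off
    if not s:
        return []
    return prefixes(s) + all_substrings(s[1:])

print(all_substrings("Hey"))  # ['Hey', 'He', 'H', 'ey', 'e', 'y']
```

This makes O(n²) calls in total, whereas recursing on both s[1:] and s[:-1] doubles the number of calls at every level (exponential growth) before set removes the duplicates. Python's default recursion limit (about 1000) still caps the string length.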
This is a question from one of my review packages, and I'm pretty stumped. This is the description:
"Return a list of m strings, where m is the length of a longest string
in strlist, if strlist is not empty, and the i-th string returned
consists of the i-th symbol from each string in strlist, but only from
strings that have an i-th symbol, in the order corresponding to the
order of the strings in strlist.
Return [] if strlist contains no nonempty strings."
This is the example
transpose(['transpose', '', 'list', 'of', 'strings'])
['tlos', 'rift', 'asr', 'nti', 'sn', 'pg', 'os', 's', 'e']
And this is the given format/style you have to follow:
# create an empty list to use as a result
# loop through every element in the input list
# loop through each character in the string
# 2 cases to deal with here:
# case 1: the result list has a string at the correct index,
# just add this character to the end of that string
# case 2: the result list doesn't have enough elements,
# need to create a new element to store this character
I got up to the "2 cases to deal with here:" part and then got stuck. This is what I have so far:
result = []
for index in strlist:
    for char in range (len(index)):
this should work:
def transpose(strlist):
    # create an empty list to use as a result
    result = []
    # loop through every element in the input list
    for i in range(len(strlist)):
        # loop through each character in the string
        for j in range(len(strlist[i])):
            # 2 cases to deal with here:
            if len(result) > j:
                # case 1: the result list has a string at the correct index,
                # just add this character to the end of that string
                result[j] = result[j] + strlist[i][j]
            else:
                # case 2: the result list doesn't have enough elements,
                # need to create a new element to store this character
                result.append(strlist[i][j])
    return result
print(transpose(['transpose', '', 'list', 'of', 'strings']))
It outputs: ['tlos', 'rift', 'asr', 'nti', 'sn', 'pg', 'os', 's', 'e']
There are more Pythonic ways to achieve this, but the code shown matches your given format/style.
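One such more Pythonic version, as a sketch using itertools.zip_longest from the standard library: it reads the strings column by column and pads missing characters with '', so shorter strings simply contribute nothing to later columns:

```python
from itertools import zip_longest

def transpose(strlist):
    # zip_longest(*strlist) yields the i-th character of every string;
    # strings that are too short are padded with '' (the fillvalue)
    return ["".join(column) for column in zip_longest(*strlist, fillvalue="")]

print(transpose(['transpose', '', 'list', 'of', 'strings']))
# ['tlos', 'rift', 'asr', 'nti', 'sn', 'pg', 'os', 's', 'e']
```

An empty input list, or one containing only empty strings, produces no columns, so the function returns [] in those cases, as the specification requires.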
I'm trying to do something like a "conjugator".
Say I have a list of endings:
endings = ['o', 'es', 'e', 'emos', 'eis', 'em']
and I have a verb root as a string:
root = "com"
The way I thought of doing this is:
for ending in endings:
    print(root + ending)
which outputs:
como
comes
come
comemos
comeis
comem
But my desired result is:
como, comes, come, comemos, comeis, comem
How can I achieve exactly this (and with no quotes around each of the resulting items, and no comma after the last item)?
You need a list comprehension and str.join(). From the documentation:
str.join(iterable)
Return a string which is the concatenation of the
strings in the iterable iterable. The separator between elements is
the string providing this method.
>>> root = "com"
>>> endings = ['o', 'es', 'e', 'emos', 'eis', 'em']
>>> verbs = [root + ending for ending in endings]
>>> print(", ".join(verbs))
como, comes, come, comemos, comeis, comem
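In Python 3, where print is a function, its sep argument can also do the joining directly. A sketch that prints the same line; the join version is still preferable when you need the result as a string rather than just printed:

```python
root = "com"
endings = ['o', 'es', 'e', 'emos', 'eis', 'em']

# Unpack the conjugated forms as separate arguments; sep is placed only
# between them, so there is no comma after the last item
print(*(root + ending for ending in endings), sep=", ")
# como, comes, come, comemos, comeis, comem
```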
I have a string returned from some software, like "('mono')", and I need to convert that string to a tuple.
I was thinking of using ast.literal_eval("('mono')"), but it says malformed string.
Since you want tuples, you must expect lists of more than one element in some cases. Unfortunately you don't give examples beyond the trivial (mono), so we have to guess. Here's my guess:
"(mono)"
"(two,elements)"
"(even,more,elements)"
If all your data looks like this, turn it into a list by splitting the string (minus the surrounding parens), then call the tuple constructor. Works even in the single-element case:
assert data[0] == "(" and data[-1] == ")"
elements = data[1:-1].split(",")
mytuple = tuple(elements)
Or in one step: elements = tuple(data[1:-1].split(",")).
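A self-contained version of the same idea, as a sketch. The function name parse_tuple is illustrative, stripping spaces around each element is an added assumption (in case the software emits "(a, b)" rather than "(a,b)"), and it raises ValueError instead of using assert, because assert statements are skipped when Python runs with -O:

```python
def parse_tuple(data):
    # Expect text such as "(mono)" or "(two,elements)"
    if not (data.startswith("(") and data.endswith(")")):
        raise ValueError(f"not a parenthesized list: {data!r}")
    # Drop the parentheses, split on commas, and trim spaces around each element
    return tuple(part.strip() for part in data[1:-1].split(","))

print(parse_tuple("(mono)"))                  # ('mono',)
print(parse_tuple("(even, more, elements)"))  # ('even', 'more', 'elements')
```

Note that "()" yields ('',); add a special case if empty input is possible.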
If your data doesn't look like my examples, edit your question to provide more details.
How about using regular expressions (after import re)?
In [1686]: x
Out[1686]: '(mono)'
In [1687]: tuple(re.findall(r'[\w]+', x))
Out[1687]: ('mono',)
In [1688]: x = '(mono), (tono), (us)'
In [1689]: tuple(re.findall(r'[\w]+', x))
Out[1689]: ('mono', 'tono', 'us')
In [1690]: x = '(mono, tonous)'
In [1691]: tuple(re.findall(r'[\w]+', x))
Out[1691]: ('mono', 'tonous')
Convert string to tuple? Just apply tuple:
>>> tuple('(mono)')
('(', 'm', 'o', 'n', 'o', ')')
Now it's a tuple.
Try this:
a = ('mono')          # parentheses alone do not make a tuple: a is the string 'mono'
print(tuple(a))       # <-- you create a tuple from a sequence
                      #     (which is a string)
print(tuple([a]))     # <-- you create a tuple from a sequence
                      #     (which is a list containing a string)
print(tuple(list(a))) # <-- you create a tuple from a sequence
                      #     (which you create from a string)
print((a,))           # <-- you create a tuple containing the string
print(a)
Output :
('m', 'o', 'n', 'o')
('mono',)
('m', 'o', 'n', 'o')
('mono',)
mono
I assume that the desired output is a tuple with a single string: ('mono',)
A tuple of one element has a trailing comma, in the form (tup,)
a = '(mono)'
a = a[1:-1]  # 'mono': note that the parentheses are removed
# if they are inside the quotes, they are treated as part of the string!
b = tuple([a])
b
> ('mono',)
# the final line converts the string to a list of length one, and then the list to a tuple