Python: Group Repeated Values in a List into a Sublist

I need to group some repeated values from a list into sublists. Let me explain with an example:
I have a variable called array that contains strings of uppercase letters and $ symbols.
array = ['F', '$', '$', '$', 'D', '$', 'C']
My end goal is to have this array:
final_array = ['F', ['$', '$', '$'], 'D', ['$'], 'C']
As the example shows, I need to group all $ symbols that are next to each other into a sublist within the original array. I thought about iterating over the array, finding all symbols adjacent to the current $, and then building a second array, but maybe there is something more Pythonic I can do. Any ideas?

You can use groupby from itertools:
from itertools import groupby
array = ['F', '$', '$', '$', 'D', '$', 'C']
result = []
for key, group in groupby(array):
    if key == '$':
        result.append(list(group))
    else:
        result.append(key)
print(result)
You can of course shorten the for-loop to a comprehension:
result = [list(group) if key == '$' else key for key, group in groupby(array)]
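To see why this works, it can help to look at what groupby actually yields. The sketch below is an illustration added for clarity, not part of the original answer: it lists each key with the length of its run. groupby only groups consecutive equal items, which is why the two runs of '$' stay separate.

```python
from itertools import groupby

array = ['F', '$', '$', '$', 'D', '$', 'C']

# groupby yields one (key, iterator) pair per run of consecutive equal items
runs = [(key, len(list(group))) for key, group in groupby(array)]
print(runs)  # [('F', 1), ('$', 3), ('D', 1), ('$', 1), ('C', 1)]
```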

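A more general approach that does not hard-code '$': the first occurrence of a value stays a plain element unless it is immediately repeated, while adjacent repeats and any later occurrences are collected into sublists:
array = ['F', '$', '$', '$', 'D', '$', 'C']
different_values = set()  # values seen so far
final_array = []
old_value = None
for value in array:
    if value not in different_values:
        # first occurrence: keep it as a plain element
        different_values.add(value)
        final_array.append(value)
    elif value == old_value:
        # same as the previous element: extend the current group
        last = final_array.pop()
        # wrap a plain element in a list (list(last) would split a multi-character string)
        aux_array = last if isinstance(last, list) else [last]
        aux_array.append(value)
        final_array.append(aux_array)
    else:
        # a repeated value that is not adjacent to its previous occurrence
        final_array.append([value])
    old_value = value
print(final_array)
# ['F', ['$', '$', '$'], 'D', ['$'], 'C']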

I made a small change to @rdas's answer in case the array has other repeated values that we want to keep:
from itertools import groupby
array = ['F', 'F', '$', '$', '$', 'D', '$', 'C']
result = []
for key, group in groupby(array):
    if key == '$':
        result.append(list(group))
    else:
        # keep every repeated non-'$' value instead of only the key
        result.extend(group)
print(result)
# ['F', 'F', ['$', '$', '$'], 'D', ['$'], 'C']

Related

How to extract list of pairs in a list enclosed by hash symbols?

For example, from the 'tokens' list below, I want to extract the pair_list:
tokens = ['0', '#', 'a', 'b', '#', '#', 'c', '#', '#', 'g', 'h', 'g', '#']
pair_list = [['a', 'b'], ['c'], ['g', 'h', 'g']]
I tried something like the following, but it hasn't worked:
hashToken_begin_found = True
hashToken_end_found = False
previous_token = None
pair_list = []
for token in tokens:
    if hashToken_begin_found and not hashToken_end_found and previous_token and previous_token == '#':
        hashToken_begin_found = False
    elif not hashToken_begin_found:
        if token == '#':
            hashToken_begin_found = True
            hashToken_end_found = True
        else:
            ...
ADDITION:
My actual problem is more complicated. What's inside each pair of # symbols are words from social media, like hashed phrases on Twitter, but they are not English. I simplified the problem to illustrate it. The logic would be something like what I wrote: find the 'start' and 'end' of each # pair and extract what is between them. In my data, anything inside a pair of hash tags is a phrase, e.g. I live in #United States# and #New York#!. I need to get United States and New York. No regex. These words are already in a list.
I think you're overcomplicating the issue here. Think of the parser as a very simple state machine. You're either in a sublist or not. Every time you hit a hash, you toggle the state.
When entering a sublist, make a new list. When inside a sublist, append to the current list. That's about it. Here's a sample:
pair_list = []
in_pair = False
for token in tokens:
    if in_pair:
        if token == '#':
            in_pair = False
        else:
            pair_list[-1].append(token)
    elif token == '#':
        pair_list.append([])
        in_pair = True
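Applied to the phrase case from the ADDITION, the same state machine can be wrapped in a function and each collected pair joined back into a phrase. This is a sketch: the helper name extract_pairs is hypothetical, and it assumes that every '#' is its own token in the word list, which the question does not confirm.

```python
def extract_pairs(tokens, marker='#'):
    """Collect the tokens between each pair of markers (hypothetical helper)."""
    pair_list = []
    in_pair = False
    for token in tokens:
        if in_pair:
            if token == marker:
                in_pair = False          # closing marker: leave the pair
            else:
                pair_list[-1].append(token)
        elif token == marker:
            pair_list.append([])         # opening marker: start a new pair
            in_pair = True
    return pair_list


tokens = ['0', '#', 'a', 'b', '#', '#', 'c', '#', '#', 'g', 'h', 'g', '#']
print(extract_pairs(tokens))  # [['a', 'b'], ['c'], ['g', 'h', 'g']]

# Assumed token layout for "I live in #United States# and #New York#!"
words = ['I', 'live', 'in', '#', 'United', 'States', '#', 'and',
         '#', 'New', 'York', '#', '!']
phrases = [' '.join(pair) for pair in extract_pairs(words)]
print(phrases)  # ['United States', 'New York']
```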
You could use itertools.groupby in a single line:
from itertools import groupby
tokens = ['0', '#', 'a', 'b', '#', '#', 'c', '#', '#', 'g', 'h', 'g', '#']
print([list(group) for is_alpha, group in groupby(tokens, key=lambda t: t.isalpha()) if is_alpha])
Output:
[['a', 'b'], ['c'], ['g', 'h', 'g']]
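This groups consecutive runs of alphabetic tokens and keeps only those runs. It works here because every token outside the pairs ('0' and '#') is non-alphabetic; an alphabetic token outside a pair would be collected as well.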
If you want to use a for loop, you could try:
l = [[]]
for i in tokens:
    if i.isalpha():
        l[-1].append(i)
    else:
        if l[-1]:
            l.append([])
print(l[:-1])
Output:
[['a', 'b'], ['c'], ['g', 'h', 'g']]
Another way:
it = iter(tokens)
pair_list = []
while '#' in it:
    pair_list.append(list(iter(it.__next__, '#')))
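The iterator version relies on two behaviours: '#' in it advances the iterator until it has consumed a '#' (or run out), and the two-argument form iter(callable, sentinel) keeps calling it.__next__ until it returns '#'. A small demonstration, added for illustration and not part of the original answer:

```python
tokens = ['0', '#', 'a', 'b', '#', '#', 'c', '#']

it = iter(tokens)
found = '#' in it                     # consumes '0' and the first '#'
rest = list(iter(it.__next__, '#'))   # calls it.__next__() until it returns '#'
nxt = next(it)                        # the opening '#' of the next pair
print(found, rest, nxt)  # True ['a', 'b'] #
```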
Yet another:
pair_list = []
try:
    i = 0
    while True:
        i = tokens.index('#', i)
        j = tokens.index('#', i + 1)
        pair_list.append(tokens[i+1 : j])
        i = j + 1
except ValueError:
    pass

Add an item to each subarray in Python

I have an array arr_ = [['a', 'b'], ['x', 'y']].
I want to put the character ! at the beginning of each subarray,
so it should look like this: [['!', 'a', 'b'], ['!', 'x', 'y']]
This is what I've done so far:
def concat(*args):
    return ['!', *args]
arr_ = [['a', 'b'], ['x', 'y']]
n = map(concat, arr_)
print(list(n))
But the result is [['!', ['a', 'b']], ['!', ['x', 'y']]].
What should I do?
Just remove the * from the parameter of the mapped function:
def concat(args):
    return ['!', *args]
arr_ = [['a', 'b'], ['x', 'y']]
n = map(concat, arr_)
print(list(n))
# [['!', 'a', 'b'], ['!', 'x', 'y']]
What's happening is that you are packing and then unpacking each list on every iteration.
When you add * to the parameter, the argument is collected into a tuple that contains one item (the sublist).
When you skip that step, you get the actual list and unpack it with * in return ['!', *args], which for a list argument is equivalent to ['!'] + args and adds '!' at the beginning.
This would work as expected if you wrote def concat(args):
What's happening is that with *args as the parameter, every positional argument you pass in is collected into a tuple stored in args.
You're passing in things like ['a', 'b'],
so that gets put into a tuple, and args is really (['a', 'b'],).
Then in your return you unpack that tuple,
but it only contains one element, the original list,
so you get ['!', ['a', 'b']].
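A quick way to see the packing described above is to return args directly. In this sketch, show_args is a hypothetical helper added for illustration; it also shows that the original *args version does what the asker wanted when the items are passed as separate arguments instead of as one list.

```python
def show_args(*args):
    # *args collects all positional arguments into a tuple
    return args

packed = show_args(['a', 'b'])
print(packed)  # (['a', 'b'],)

def concat(*args):
    return ['!', *args]  # unpacks the tuple, whose only item is the sublist

print(concat(['a', 'b']))  # ['!', ['a', 'b']]
print(concat('a', 'b'))    # ['!', 'a', 'b']
```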
Just a note: many Python programmers prefer list comprehensions to map, which here would look like this:
n = [concat(inner) for inner in arr_]
You could go one step further and do this:
n = [['!'] + inner for inner in arr_]
I don't know how to explain the bug in your code, but I found another way to do it.
arr = [['a', 'b'], ['x', 'y']]
X = []
for i in arr:
    X.append(['!'] + i)
print(X)
Output:
[['!', 'a', 'b'], ['!', 'x', 'y']]
Using List Comprehension:
arr = [['a', 'b'], ['x', 'y']]
X = [['!'] + x for x in arr]
print(X)
Output:
[['!', 'a', 'b'], ['!', 'x', 'y']]

How to convert numerical strings inside a mixed list of list to int (Python)

How do I convert all the numerical strings inside a list of lists, which contains both alphabetical and numerical strings, into integers?
My Output:
[['69', ' Test', 'Results'], ['A', 'B', 'C'], ['D', '420', 'F']]
Intended Output:
[[69, ' Test', 'Results'], ['A', 'B', 'C'], ['D', 420, 'F']]
Note that my code reads a CSV file. Thanks everyone
def get_csv_as_table(a, b):
    s = False
    import csv
    with open(a) as csv_file:
        file_reader = csv.reader(csv_file, delimiter=b)
        member = list(file_reader)
        print(member)
print("Enter filename: ")
a = input()
print("Enter the delimiter: ")
b = input()
get_csv_as_table(a, b)
You can use a list comprehension to achieve this. The only minor downside is that it creates a new list instead of modifying the existing one.
my_list = [['69', 'Test', 'Results'], ['A', 'B', 'C'], ['D', '420', 'F']]
filtered_list = [
    [int(item) if item.isdigit() else item for item in sub_list]
    for sub_list in my_list
]
If you want to edit the list in place, you can use a traditional for loop. The following code modifies the existing list without creating a new one, which could be useful if you have a large list.
my_list = [['69', 'Test', 'Results'], ['A', 'B', 'C'], ['D', '420', 'F']]
for row in my_list:
    for j, item in enumerate(row):
        if item.isdigit():
            row[j] = int(item)
str.isdigit() returns True only if the string is non-empty and every character in it is a digit. Keep in mind that it returns False for floating-point numbers such as '4.2' and for negative numbers such as '-5', so only non-negative integers are converted. Once the condition passes, the item is converted to an integer.
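If you also need negative numbers, one alternative (a different technique from the isdigit() check) is to attempt the conversion and fall back to the original string on ValueError. A minimal sketch: the helper name to_int_if_possible is hypothetical, and the sample data uses '-420' instead of '420' to show the difference. Note that int() also tolerates surrounding whitespace, so a cell like ' 42' would be converted too.

```python
def to_int_if_possible(value):
    """Return int(value) when value parses as an integer, else value unchanged."""
    try:
        return int(value)
    except ValueError:
        return value

my_list = [['69', ' Test', 'Results'], ['A', 'B', 'C'], ['D', '-420', 'F']]
converted = [[to_int_if_possible(item) for item in row] for row in my_list]
print(converted)  # [[69, ' Test', 'Results'], ['A', 'B', 'C'], ['D', -420, 'F']]
```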
You have to combine two levels of list comprehension and use str.isdigit():
values = [
    [int(val) if val.isdigit() else val for val in row]
    for row in values
]
Try a two-level list comprehension that combines int() and .isdigit() ;-)
l=[['69', ' Test', 'Results'], ['A', 'B', 'C'], ['D', '420', 'F']]
l=[[int(y) if y.isdigit() else y for y in x] for x in l]
print(l)
Output:
[[69, ' Test', 'Results'], ['A', 'B', 'C'], ['D', 420, 'F']]
.isdigit() only works on string representations of non-negative integers. If you have floats too, remove the decimal point before the check (only the first one, so that a string like '1.2.3' is left alone) ;-)
l=[['69', ' Test', 'Results'], ['A', 'B', 'C'], ['D', '420', 'F']]
l=[[float(y) if y.replace('.', '', 1).isdigit() else y for y in x] for x in l]
print(l)
Output:
[[69.0, ' Test', 'Results'], ['A', 'B', 'C'], ['D', 420.0, 'F']]

Python: merge adjacent number in list

Is it possible to merge the numbers in a list of characters?
I have a list with some characters:
my_list = ['a', 'f', '£', '3', '2', 'L', 'k', '3']
I want to concatenate the adjacent numbers as follows:
my_list = ['a', 'f', '£', '32', 'L', 'k', '3']
I have this, and it works fine, but I don't really like how it came out.
def number_concat(my_list):
    new_list = []
    number = ""
    for ch in my_list:
        if not ch.isnumeric():
            if number != "":
                new_list.append(number)
                number = ""
            new_list.append(ch)
        else:
            number = ''.join([number, ch])
    if number != "":
        new_list.append(number)
    return new_list
What's the best way to do this?
You can use itertools.groupby:
from itertools import groupby
my_list = ['a', 'f', '£', '3', '2', 'L', 'k', '3']
out = []
for _, g in groupby(enumerate(my_list, 2), lambda k: True if k[1].isdigit() else k[0]):
    out.append(''.join(val for _, val in g))
print(out)
Prints:
['a', 'f', '£', '32', 'L', 'k', '3']
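A likely reason the answer starts enumerate at 2: groupby compares keys with ==, and True == 1 in Python, so a non-digit at index 1 would get the key 1 and be merged with a following run of digits (key True). Starting at 2 keeps every index-based key different from True. A small check, added for illustration; merge_digits is a hypothetical helper wrapping the answer's expression.

```python
from itertools import groupby

def merge_digits(chars, start):
    # digits share the key True; every other character gets its index as key
    groups = groupby(enumerate(chars, start),
                     lambda pair: True if pair[1].isdigit() else pair[0])
    return [''.join(ch for _, ch in g) for _, g in groups]

print(merge_digits(['a', 'b', '3'], start=0))  # ['a', 'b3'] ('b' has key 1, which equals True)
print(merge_digits(['a', 'b', '3'], start=2))  # ['a', 'b', '3']
```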
You can use a variable to track the index position in the list and compare each element with the one before it. If both are digits, concatenate them by popping the current element and appending it to the previous one. After a pop, all later elements shift left, so the index stays where it is: the next character now occupies that position and needs to be checked. If the character is not a digit, move the index to the next character.
my_list = ['a', 'f', '£', '3', '2', 'L', 'k', '3']
index = 1
while index < len(my_list):
    if my_list[index].isdigit() and my_list[index - 1].isdigit():
        my_list[index - 1] += my_list.pop(index)
    else:
        index += 1
print(my_list)
OUTPUT
['a', 'f', '£', '32', 'L', 'k', '3']
Regex (this assumes every item in the list is a single character):
>>> import re
>>> re.findall(r'\d+|.', ''.join(my_list))
['a', 'f', '£', '32', 'L', 'k', '3']
itertools:
>>> from itertools import groupby
>>> [x for d, g in groupby(my_list, str.isdigit) for x in ([''.join(g)] if d else g)]
['a', 'f', '£', '32', 'L', 'k', '3']
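Another, which relies on float('nan') never comparing equal to another NaN, so that every non-digit ends up in its own group: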
>>> [''.join(g) for _, g in groupby(my_list, lambda c: c.isdigit() or float('nan'))]
['a', 'f', '£', '32', 'L', 'k', '3']
You are essentially trying to merge runs of digits.
One way to accomplish this is to loop through the list and check whether each item is a number using str.isnumeric().
my_list = ['a', 'f', '£', '3', '2', 'L', 'k', '3']
new_list = ['']  # placeholder so new_list[-1] never raises IndexError
for c in my_list:
    if c.isnumeric() and new_list[-1].isnumeric():  # current and previous characters are both numbers
        new_list[-1] += c  # mash the characters together
    else:
        new_list.append(c)
new_list = new_list[1:]  # drop the '' placeholder
print(new_list)  # ['a', 'f', '£', '32', 'L', 'k', '3']
This also works when the first character is numeric.
Sure! This will combine all consecutive digits in place:
i = 0
while i < len(my_list):
    if my_list[i].isdigit():
        # pop() shifts the rest of the list left, so the next
        # candidate digit is always at position i + 1
        while i + 1 < len(my_list) and my_list[i + 1].isdigit():
            my_list[i] += my_list.pop(i + 1)
    i += 1
You can also do this recursively, which is perhaps more elegant (in that it is easier to extend correctly as the task becomes more complicated) but also possibly more confusing. Note that each element costs one level of recursion, so very long lists will hit Python's default recursion limit of 1000:
def group_digits(items, accumulator=None):
    if items == []:
        return accumulator or []
    if not accumulator:
        # items[1:] and items[:1] are copies, so the caller's list is not mutated
        return group_digits(items[1:], items[:1])
    x = items.pop(0)
    if accumulator[-1].isdigit() and x.isdigit():
        accumulator[-1] += x
    else:
        accumulator.append(x)
    return group_digits(items, accumulator)
A quick and dirty way under the assumption that the non-numeric characters are not white-space:
''.join(c if c.isdigit() else ' '+ c + ' ' for c in my_list).split()
The idea is to pad with spaces the characters that you don't want merged, smush the resulting characters together so that the non-padded ones become adjacent, and then split the result on white-space, the net result leaving the padded characters unchanged and the non-padded characters joined.
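Printing the intermediate string makes the trick easier to follow. This only splits the one-liner above into two steps for illustration; padded is a name introduced here.

```python
my_list = ['a', 'f', '£', '3', '2', 'L', 'k', '3']

# pad every non-digit with spaces; digits are left bare and end up adjacent
padded = ''.join(c if c.isdigit() else ' ' + c + ' ' for c in my_list)
print(repr(padded))    # ' a  f  £ 32 L  k 3'
print(padded.split())  # ['a', 'f', '£', '32', 'L', 'k', '3']
```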
I have written a beginner-friendly solution using an index and two lists:
my_list = ['a', 'f', '£', '3', '2', 'L', 'k', '3']
result = []
for index, item in enumerate(my_list):
    if item.isdigit() and index > 0 and my_list[index - 1].isdigit():
        # The current item and the previous item are both digits, so
        # concatenate the current digit onto the last entry in result
        result[-1] += item
    else:
        result.append(item)
print(my_list)
print(result)
Output
['a', 'f', '£', '3', '2', 'L', 'k', '3']
['a', 'f', '£', '32', 'L', 'k', '3']

Create a list using the letters in x

Using list comprehension, create a list of all the letters used in x.
x = ‘December 11, 2018’
I tried writing each letter out but I am receiving a syntax error!
In Python a string is a sequence of characters, so it is easy and quick to convert it into a set (unique values only) and then back to a list. Note that a set does not keep the original order:
unique_x = list(set(x))
Or, if you must use a list comprehension (this version also keeps the original order):
used = set()
all_x = "December 11, 2018"
unique_x = [x for x in all_x if x not in used and (used.add(x) or True)]
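If the goal is the unique characters in first-seen order without the used.add(x) side-effect trick, dict.fromkeys is a common alternative, since dicts keep insertion order from Python 3.7 on. This swaps in a different technique and does not use a list comprehension:

```python
all_x = "December 11, 2018"

# dict keys are unique and keep insertion order, so later duplicates are dropped
unique_x = list(dict.fromkeys(all_x))
print(unique_x)  # ['D', 'e', 'c', 'm', 'b', 'r', ' ', '1', ',', '2', '0', '8']
```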
x = "December 11, 2018"
lst = [letter for letter in x]
print(lst) # test
Output:
['D', 'e', 'c', 'e', 'm', 'b', 'e', 'r', ' ', '1', '1', ',', ' ', '2', '0', '1', '8']
You can make a list comprehension like:
x = 'December 11, 2018'
new_list = [letter for letter in x]
print(new_list)
# Output
# ['D', 'e', 'c', 'e', 'm', 'b', 'e', 'r', ' ', '1', '1', ',', ' ', '2', '0', '1', '8']
Alternatively, you could skip the list comprehension and just use new_list = list(x) to get the same result.
If you want to drop the spaces, you can use x.replace(' ', '') or add an if clause to your list comprehension:
new_list = [letter for letter in x if letter != ' ']
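If "letters" means alphabetic characters only, a sketch using str.isalpha() in the if clause also drops the digits and the comma:

```python
x = "December 11, 2018"

# keep only alphabetic characters (drops digits, spaces and the comma)
letters = [letter for letter in x if letter.isalpha()]
print(letters)  # ['D', 'e', 'c', 'e', 'm', 'b', 'e', 'r']
```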
This should work:
x = list('December 11, 2018')
print(x)
result = []
for item in x:
    try:
        int(item)  # digits convert successfully and are skipped
    except ValueError:
        if item not in (',', ' '):
            result.append(item)
print(result)
"""
Output:
['D', 'e', 'c', 'e', 'm', 'b', 'e', 'r']
"""
If you are using only dates with that format, you could do this
x = "December 11, 2018".split()
print(x[0])
"""
Output:
'December'
"""
