using python 3.6 to slice substring with same char [duplicate]

I am not very experienced with regex, but I have been reading a lot about it. Assume there's a string s = '111234'. I want a list with the string split into L = ['111', '2', '3', '4']. My approach was to make a group that checks whether a character is a digit and then to check for repetitions of that group. Something like this:
L = re.findall('\d[\1+]', s)
I think that \d[\1+] will match either a single digit or a digit followed by repetitions of the same digit. I think this might do what I want.

Use re.finditer():
>>> import re
>>> s = '111234'
>>> [m.group(0) for m in re.finditer(r"(\d)\1*", s)]
['111', '2', '3', '4']
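
To see why the pattern from the question behaves differently from the one in this answer, it may help to run both side by side. The following is a small sketch that assumes only the standard re module; the variable names are illustrative. Inside square brackets, \1 is not a backreference (Python's re treats numeric escapes in a character class as literal characters), so \d[\1+] looks for a digit followed by either the character with code 1 or a literal plus sign.

```python
import re

s = '111234'

# Inside [...] the escape \1 is a literal character (code 1), not a backreference,
# so this looks for a digit followed by chr(1) or '+', and finds nothing here.
question_attempt = re.findall(r'\d[\1+]', s)

# Outside a character class, \1 repeats whatever group 1 captured.
# findall returns only the captured group when the pattern has one group...
group_only = re.findall(r'(\d)\1*', s)

# ...which is why the answer uses finditer and asks for the whole match.
whole_runs = [m.group(0) for m in re.finditer(r'(\d)\1*', s)]

print(question_attempt)  # []
print(group_only)        # ['1', '2', '3', '4']
print(whole_runs)        # ['111', '2', '3', '4']
```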

If you want to group all the repeated characters, you can also use itertools.groupby, like this:
from itertools import groupby
print(["".join(grp) for num, grp in groupby('111234')])
# ['111', '2', '3', '4']
If you want to keep only the runs of digits, then:
print(["".join(grp) for num, grp in groupby('111aaa234') if num.isdigit()])
# ['111', '2', '3', '4']
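
If this is needed in more than one place, the two one-liners could be folded into a single helper. This is only a sketch: split_runs and its digits_only flag are hypothetical names chosen for illustration, not part of itertools.

```python
from itertools import groupby

def split_runs(text, digits_only=False):
    """Split text into runs of identical consecutive characters."""
    runs = []
    for char, group in groupby(text):
        if digits_only and not char.isdigit():
            continue  # skip runs of non-digit characters
        runs.append(''.join(group))
    return runs

print(split_runs('111234'))                       # ['111', '2', '3', '4']
print(split_runs('111aaa234'))                    # ['111', 'aaa', '2', '3', '4']
print(split_runs('111aaa234', digits_only=True))  # ['111', '2', '3', '4']
```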

Try this one:
import re
s = '111234'
l = re.findall(r'((.)\2*)', s)
# at this stage l is [('111', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
# now keep only the first value from each tuple
lst = [x[0] for x in l]
print(lst)
Output:
['111', '2', '3', '4']

If you don't want to use any libraries then here's the code:
s = "AACBCAAB"
L = []
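temp = s[0]  # current run; assumes s is not empty
for i in range(1, len(s)):
    if s[i] == s[i-1]:
        temp += s[i]  # same character: extend the current run
    else:
        L.append(temp)  # run ended: store it and start a new one
        temp = s[i]
L.append(temp)  # store the final run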
print(L)
Output:
['AA', 'C', 'B', 'C', 'AA', 'B']
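
Note that the loop above starts from s[0], so it raises an IndexError on an empty string. A possible variation that avoids that, wrapped in a function, is sketched below; split_runs_manual is a hypothetical name, and the approach (comparing against the last character of the last stored run) is one option among several.

```python
def split_runs_manual(s):
    """Split s into runs of equal consecutive characters without imports."""
    runs = []
    for ch in s:
        if runs and runs[-1][-1] == ch:
            runs[-1] += ch   # same character as the current run: extend it
        else:
            runs.append(ch)  # different character (or the first one): new run
    return runs

print(split_runs_manual("AACBCAAB"))  # ['AA', 'C', 'B', 'C', 'AA', 'B']
print(split_runs_manual("A"))         # ['A']
print(split_runs_manual(""))          # []
```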

Related

Split list in python when same values occurs into a list of sublists

Using python, I need to split my_list = ['1','2','2','3','3','3','4','4','5'] into a list with sublists that avoid the same value. Correct output = [['1','2','3','4','5'],['2','3','4'],['3']]

Probably not the most efficient approach but effective nonetheless:
my_list = ['1','2','2','3','3','3','4','4','5']
output = []
for e in my_list:
    for f in output:
        if e not in f:
            f.append(e)
            break
    else:
        output.append([e])
print(output)
Output:
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

I assumed you want to place every unique element according to its number of occurrences, and I sorted the unique values so the result better matches your desired output.
uniques = list(set(my_list))
uniques.sort()
unique_counts = {unique: my_list.count(unique) for unique in uniques}
new_list = []
for _ in range(max(unique_counts.values())):
    new_list.append([])
for unique, count in unique_counts.items():
    for i in range(count):
        new_list[i].append(unique)
The output for new_list is
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

Use collections.Counter to find the maximum number of sublists needed, then distribute each unique key across the sublists according to its frequency:
from collections import Counter
my_list = ['1','2','2','3','3','3','4','4','5']
cnts = Counter(my_list)
res = [[] for i in range(cnts.most_common(1).pop()[1])]
for k in cnts.keys():
    for j in range(cnts[k]):
        res[j].append(k)
print(res)
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

Here's a way to do it based on getting unique values and counts using list comprehension.
my_list = ['1','2','2','3','3','3','4','4','5']
unique = [val for i,val in enumerate(my_list) if val not in my_list[0:i]]
counts = [my_list.count(val) for val in unique]
output = [[val for val,ct in zip(unique, counts) if ct > i] for i in range(max(counts))]
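
The answers above all rest on the same observation: the n-th occurrence of a value belongs in the n-th sublist. A minimal single-pass sketch of that idea follows. split_by_occurrence is a hypothetical name and this is not taken from any of the answers; it simply avoids calling count() repeatedly.

```python
from collections import defaultdict

def split_by_occurrence(items):
    """Put the n-th occurrence of each value into the n-th sublist."""
    seen = defaultdict(int)  # how many times each value has appeared so far
    result = []
    for item in items:
        n = seen[item]
        if n == len(result):
            result.append([])  # first value to reach this occurrence number
        result[n].append(item)
        seen[item] += 1
    return result

my_list = ['1', '2', '2', '3', '3', '3', '4', '4', '5']
print(split_by_occurrence(my_list))
# [['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]
```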

Code works only sometimes for removing odd or even index items

Question: Write a Python program to remove the characters which have odd or even index values of a given string.
I tried making a copy of the list with a deep copy.
I loop over the first list, check whether each item's index is odd, and then use the pop method on the second list to remove that item.
This code works for some inputs, I think mostly those that don't have any repeated characters, and doesn't work for others.
Code
#!/usr/bin/python3
import copy
list1 = input("Enter a string ")
list1 = list(list1)
list2 = copy.deepcopy(list1)
for i in list1:
    if list1.index(i)%2 != 0:
        list2.pop(list2.index(i))
print(list2)
The outputs for some samples are:
123456789 -> ['1', '3', '5', '7', '9']
qwertyuiop -> ['q', 'e', 't', 'u', 'o']
saurav -> ['s', 'u']
11112222333344445555 -> ['1', '1', '1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '4', '4', '4', '4', '5', '5', '5', '5']

Read the documentation for index. It returns the index of the first occurrence of the given value. A simple print inside the loop will show you what's going on, in appropriate detail. This is a basic debugging skill you need to learn for programming in any language.
import copy
list1 = input("Enter a string ")
list1 = list(list1)
list2 = copy.deepcopy(list1)
for i in list1:
    if list1.index(i)%2 != 0:
        print(i, list1.index(i), list2.index(i))
        list2.pop(list2.index(i))
        print(list2)
print(list2)
output:
Enter a string google
o 1 1
['g', 'o', 'g', 'l', 'e']
o 1 1
['g', 'g', 'l', 'e']
e 5 3
['g', 'g', 'l']
['g', 'g', 'l']
... and that's your trouble. Fix your logic. You already know the needed index to save or remove. There is no need to extract the character, and then search for it again. You already know where it is.
Even better, simply slice the original string for the characters you want:
print(list1[::2])
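
The difference between looking a value up with index and tracking its position with enumerate can be seen directly. This is a short illustrative sketch using the same input as the debug run above; the variable names are made up for the example.

```python
word = "google"
letters = list(word)

# list.index searches for the value again and stops at its first occurrence,
# so repeated characters all report the same position.
looked_up = [letters.index(ch) for ch in letters]

# enumerate yields the real position of each item as the loop goes.
actual = [i for i, ch in enumerate(letters)]

print(looked_up)  # [0, 1, 1, 0, 4, 5]
print(actual)     # [0, 1, 2, 3, 4, 5]
```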

Your problem is the list.index function. The documentation states that it "returns zero-based index in the list of the first item whose value is equal to x." Because you are calling it on list1, which is never modified, a repeated value always gives the same result: for the input saurav, list1.index('a') == 1 for both occurrences of 'a'.
The correct solution would be to use enumerate. A further problem exists here: because the indexes come from a list that you have not modified, your indexes will be off after the first list.pop operation. Every item after the one removed will have been shifted by 1. To correct this, you could build a new list instead of emptying an existing one:
#!/usr/bin/python3
list1 = input("Enter a string ")
list2 = []
for i, item in enumerate(list1):
    if i % 2 == 0:
        list2.append(item)
print(list2)

You don't need to iterate at all. Just reference the string elements directly.
st = "123456789"
print('Even indices:', list(st[::2]))
print('Odd indices:', list(st[1::2]))
Output:
Even indices: ['1', '3', '5', '7', '9']
Odd indices: ['2', '4', '6', '8']

The method list.index(i) returns the index in the list of the first item whose value is equal to i.
For example, "saurav".index('a') returns 1. So when you call list2.pop(list2.index(i)) to pop an 'a', it always pops the first 'a' still in list2, which may not be the one you meant.
I think it can be simpler using the built-in range function.
list1 = list(input("Enter a string "))
list2 = list()
for i in range(len(list1)):
    if i % 2 == 0:
        list2.append(list1[i])
print(list2)
It works the same way with a step of 2:
list1 = list(input("Enter a string "))
list2 = list()
for i in range(0, len(list1), 2):
    list2.append(list1[i])
print(list2)
Also, you can use extended slices, available in Python 2.3 and above.
list1 = list(input("Enter a string "))
list2 = list1[::2]
print(list2)
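
Since the original exercise asks for removing characters at odd or even indexes, the slicing approach could be wrapped so that it covers both cases and returns a string. The sketch below is one way to do that; drop_indices and its drop parameter are hypothetical names, and indexes are assumed to be zero-based.

```python
def drop_indices(text, drop="odd"):
    """Return text without the characters at odd (or even) zero-based indices."""
    if drop == "odd":
        return text[::2]   # keep indices 0, 2, 4, ...
    if drop == "even":
        return text[1::2]  # keep indices 1, 3, 5, ...
    raise ValueError("drop must be 'odd' or 'even'")

print(drop_indices("saurav"))          # sua
print(drop_indices("saurav", "even"))  # arv
print(drop_indices("11112222"))        # 1122
```

Because slicing works on positions rather than values, repeated characters cause no trouble here.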

How to create lists from every item of an existing list in python 2.7.11?

I am trying to generate lists from the elements of a list in python.
For example: there is a list with the following information:
list=['AB4', 'AB3','AC3', 'BC4', 'BC5']
This is the exact format of the elements of the list.
I am supposed to create a list for every distinct letter pair (treating both letters as one block) and for every distinct number. Each list contains the other part of every string in which that block or number appears. Here is what I mean:
AB:['4', '3']
AC:['3']
BC:['4', '5']
4:['AB', 'BC']
3:['AB', 'AC']
5:['BC']
These are the lists that I should generate from the original list. There is no limit on the number of elements in the original list, and their format is exactly as in the example: two letters and a number.
Thank you in advance.

You can use regexes (the re module) and a defaultdict to accomplish this. The following will work for arbitrary lengths of the non-digit/digit parts of your input strings:
import re
from collections import defaultdict
def str_dig(s):  # str_dig('ABC345') -> ('ABC', '345')
    return re.match(r'([^\d]+)(\d+)', s).groups()
lst = ['AB4', 'AB3', 'AC3', 'BC4', 'BC5']  # do NOT shadow list!
d = defaultdict(list)
for x, y in map(str_dig, lst):  # map applies the str_dig function to all in lst
    d[x].append(y)
    d[y].append(x)
# d['AB']: ['4', '3'], d['3']: ['AB', 'AC']

This will do it:
from collections import defaultdict
l=['AB4', 'AB3','AC3', 'BC4', 'BC5']
result=defaultdict(list)
for item in l:
    # If you want the numbers to be ints rather than strings, replace item[2:] with int(item[2:])
    result[item[:2]].append(item[2:])
    result[item[2:]].append(item[:2])
And you can use this to print it just as you want:
import pprint
pp = pprint.PrettyPrinter()
pp.pprint(result)
output:
{'3': ['AB', 'AC'],
'4': ['AB', 'BC'],
'5': ['BC'],
'AB': ['4', '3'],
'AC': ['3'],
'BC': ['4', '5']}
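
Both answers so far store letter blocks and numbers as keys of the same dictionary. If keeping them apart is preferable, a variation along the following lines might work. It is only a sketch: build_indexes, by_letters and by_number are hypothetical names, and skipping items that do not look like letters followed by digits is an assumption rather than something the question specifies.

```python
import re
from collections import defaultdict

def build_indexes(items):
    """Return (by_letters, by_number) for strings such as 'AB4'."""
    by_letters = defaultdict(list)
    by_number = defaultdict(list)
    for item in items:
        match = re.match(r'([A-Za-z]+)(\d+)$', item)
        if match is None:
            continue  # assumption: silently skip items in another format
        letters, number = match.groups()
        by_letters[letters].append(number)
        by_number[number].append(letters)
    return dict(by_letters), dict(by_number)

by_letters, by_number = build_indexes(['AB4', 'AB3', 'AC3', 'BC4', 'BC5'])
print(by_letters['AB'])  # ['4', '3']
print(by_number['3'])    # ['AB', 'AC']
```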

How about this,
import itertools
import operator
l = ['AB4', 'AB3','AC3', 'BC4', 'BC5']
lists = [(s[:2], s[2]) for s in l] # [('AB', '4'), ('AB', '3'), ('AC', '3'), ('BC', '4'), ('BC', '5')]
results = dict()
for name, group in itertools.groupby(sorted(lists, key=operator.itemgetter(0)), key=operator.itemgetter(0)):
    results[name] = list(map(operator.itemgetter(1), group))  # list() so the values are also lists on Python 3
for name, group in itertools.groupby(sorted(lists, key=operator.itemgetter(1)), key=operator.itemgetter(1)):
    results[name] = list(map(operator.itemgetter(0), group))
print(results)
# Output
{ 'AC': ['3'],
'AB': ['4', '3'],
'BC': ['4', '5'],
'3': ['AB', 'AC'],
'5': ['BC'],
'4': ['AB', 'BC']}

Finding overlapping sequence with regular expressions with Python

I'm trying to extract numbers and both previous and following characters (excluding digits and whitespaces) of a string. The expected return of the function is a list of tuples, with each tuple having the shape:
(previous_sequence, number, next_sequence)
For example:
string = '200gr T34S'
my_func(string)
>>[('', '200', 'gr'), ('T', '34', 'S')]
My first iteration was to use:
def my_func(string):
    res_obj = re.findall(r'([^\d\s]+)?(\d+)([^\d\s]+)?', string)
    return res_obj
But this function doesn't do what I expect when I pass a string like '2AB3'. I would like it to output [('','2','AB'), ('AB','3','')], but instead it returns [('','2','AB'), ('','3','')], because 'AB' was already consumed by the previous match.
How could I fix this?

Since the numbers themselves never overlap, a single trailing lookahead assertion should be all you need.
Something like ([^\d\s]+)?(\d+)(?=([^\d\s]+)?)
Or use ([^\d\s]*)(\d+)(?=([^\d\s]*)) if you care about the difference between NULL (a group that did not participate in the match) and the empty string.
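
A quick check of the lookahead idea in Python, using the second pattern and the two sample strings from the question. The PATTERN constant is an illustrative name; my_func is the function name from the question.

```python
import re

# Group 3 sits inside a lookahead, so the trailing letters are captured
# but not consumed and can still be matched as the next tuple's prefix.
PATTERN = re.compile(r'([^\d\s]*)(\d+)(?=([^\d\s]*))')

def my_func(string):
    return PATTERN.findall(string)

print(my_func('200gr T34S'))  # [('', '200', 'gr'), ('T', '34', 'S')]
print(my_func('2AB3'))        # [('', '2', 'AB'), ('AB', '3', '')]
```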

Instead of the modifiers + and ? you can simply use *:
>>> re.findall(r'([^\d\s]*)(\d+)([^\d\s]*)',string)
[('', '200', 'gr'), ('T', '34', 'S')]
But if you mean to match the overlapping strings, you can use a positive lookahead to find all the overlapping matches:
>>> re.findall(r'(?=([^\d\s]*)(\d+)([^\d\s]*))','2AB3')
[('', '2', 'AB'), ('AB', '3', ''), ('B', '3', ''), ('', '3', '')]

Another way is to combine re.split with a helper function:
import re
# '200gr T34S' '2AB3'
def s(x):
    tmp = []
    d = re.split(r'\s+|(\d+)', x)
    d = ['' if v is None else v for v in d]  # replace None with ''
    t_ = [i for i in d if len(i) > 0]
    digits = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    nms = [i for i in t_ if i[0] in digits]
    for i in nms:
        if d.index(i) == 0:
            tmp.append(('', i, d[d.index(i)+1]))
        elif d.index(i) == len(d) - 1:  # last element: nothing follows it
            tmp.append((d[d.index(i)-1], i, ''))
        else:
            tmp.append((d[d.index(i)-1], i, d[d.index(i)+1]))
    return tmp
print(s('2AB3'))
Prints:
[('', '2', 'AB'), ('AB', '3', '')]

Python max with same number of instances

I have a list:
hello = ['1', '1', '2', '1', '2', '2', '7']
I wanted to display the most common element of the list, so I used:
m = max(set(hello), key=hello.count)
However, I realised that there could be two elements of the list that occur the same frequency, like the 1's and 2's in the list above. Max only outputs the first instance of a maximum frequency element.
What kind of command could check a list to see if two elements both have the maximum number of instances, and if so, output them both? I am at a loss here.

Using an approach similar to your current one, you would first find the maximum count and then look for every item with that count:
>>> m = max(map(hello.count, hello))
>>> set(x for x in hello if hello.count(x) == m)
set(['1', '2'])
Alternatively, you can use the nice Counter class, which can be used to efficiently, well, count stuff:
>>> hello = ['1', '1', '2', '1', '2', '2', '7']
>>> from collections import Counter
>>> c = Counter(hello)
>>> c
Counter({'1': 3, '2': 3, '7': 1})
>>> common = c.most_common()
>>> common
[('1', 3), ('2', 3), ('7', 1)]
Then you can use a generator expression to get all the items that have the maximum count:
>>> set(x for x, count in common if count == common[0][1])
set(['1', '2'])
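
The Counter steps above could be collected into one function, for example as follows. This is a sketch only: most_common_all is a hypothetical name, and returning an empty list for empty input is an assumption about the desired behaviour.

```python
from collections import Counter

def most_common_all(items):
    """Return every item that shares the highest count."""
    counts = Counter(items)
    if not counts:
        return []  # assumption: an empty input has no most common item
    top = max(counts.values())
    return [item for item, count in counts.items() if count == top]

hello = ['1', '1', '2', '1', '2', '2', '7']
print(sorted(most_common_all(hello)))    # ['1', '2']
print(most_common_all(['a', 'b', 'a']))  # ['a']
```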

Edit: Changed solution
>>> from collections import Counter
>>> from itertools import groupby
>>> hello = ['1', '1', '2', '1', '2', '2', '7']
>>> max_count, max_nums = next(groupby(Counter(hello).most_common(),
...                                    lambda x: x[1]))
>>> print([num for num, count in max_nums])
['1', '2']

from collections import Counter
def myFunction(myDict):
    myMax = 0  # Keep track of the max frequency
    myResult = []  # A list for return
    for key in myDict:
        print('The key is', key, ', The count is', myDict[key])
        print('My max is:', myMax)
        # Finding out the max frequency
        if myDict[key] >= myMax:
            if myDict[key] == myMax:
                myMax = myDict[key]
                myResult.append(key)
            # Case when it is greater than, we will delete and append
            else:
                myMax = myDict[key]
                del myResult[:]
                myResult.append(key)
    return myResult
foo = ['1', '1', '5', '2', '1', '6', '7', '10', '2', '2']
print('The list:', foo)
myCount = Counter(foo)
print(myCount)
print(myFunction(myCount))
Output:
The list: ['1', '1', '5', '2', '1', '6', '7', '10', '2', '2']
Counter({'1': 3, '2': 3, '10': 1, '5': 1, '7': 1, '6': 1})
The key is 10 , The count is 1
My max is: 0
The key is 1 , The count is 3
My max is: 1
The key is 2 , The count is 3
My max is: 3
The key is 5 , The count is 1
My max is: 3
The key is 7 , The count is 1
My max is: 3
The key is 6 , The count is 1
My max is: 3
['1', '2']
I wrote this simple program, and I think it also works. I was not aware of the most_common() function until I did a search. This function returns all of the most frequent elements, however many there are. It works by tracking the highest frequency seen so far: when it sees an element with a higher frequency, it clears the result list and appends that element; if the frequency is the same, it simply appends the element. It keeps going until the whole Counter has been iterated through.
