Python: looping through values in a string

I'm trying to get numerous values out of a pretty complex string that looks like this -
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
These are the values I need to scan for -
list = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
My intention is to get the 3 numbers after each string so in the example of HighPriority I would get [0, 74, 74] which I can then do something with each item.
I've used the below but it doesn't account for when the end of the string isn't a comma.
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""
for l in list:
    print l
    print find_between( s, l + ':', ',' ).split(':')

Edit: if you really want to avoid regexes, your approach works with a minor tweak (I renamed list to l to avoid shadowing the built-in type):
from itertools import takewhile
from string import digits
def find_between(s, first):
    try:
        start = s.index(first) + len(first)
        # Keep taking the next character while it's either a ':' or a digit
        # You can also just cast this into a list and forget about joining and later splitting.
        # Also, consider storing ':'+digits in a variable to avoid recreating it all the time
        return ''.join(takewhile(lambda char: char in ':'+digits, s[start:]))
    except ValueError:
        return ""
for _ in l:
    print _
    print find_between(s, _ + ':').split(':')
This prints:
Compiler
['0', '0', '0']
HighPriority
['0', '74', '74']
Default
['6', '1872', '1874']
LowPriority
['0', '2', '2']
Special
['0', '2', '2']
Event
['0', '0', '0']
CommHigh
['0', '1134', '1152']
CommDefault
['0', '4', '4']
However, this really is a task for regex, and you should try to get to know the basics.
import re
def find_between(s, word):
    # Search for your (word followed by ((:a_digit) repeated three times))
    x = re.search(r"(%s(:\d+){3})" % word, s)
    return x.groups()[0]
for word in l:
    print find_between(s, word).split(':', 1)[-1].split(':')
This prints
['0', '0', '0']
['0', '74', '74']
['6', '1872', '1874']
['0', '2', '2']
['0', '2', '2']
['0', '0', '0']
['0', '1134', '1152']
['0', '4', '4']

check this:
import re
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
data = []
for x in search:
    data.append(re.findall(x+':([0-9]+:[0-9]+:[0-9]+)', s))
data = [map(lambda x: x.split(':'), x) for x in data] # remove :
data = [x[0] for x in data] # remove unnecessary []
data = [map(int,x) for x in data] # convert to int
print data
>>>[[0, 0, 0], [0, 74, 74], [6, 1872, 1874], [0, 2, 2], [0, 2, 2], [0, 0, 0], [0, 1134, 1152], [0, 4, 4]]

This will get you all the groups, provided the string is always well formed:
re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)
It also gets the time, which you can easily remove from the list.
Or you can use a dictionary comprehension to organize the items:
matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
my_dict = {k : v.split(':') for k, v in matches[1:]}
I used matches[1:] here to get rid of the spurious match. You can do that if you know it will always be there.
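If you would rather not rely on the timestamp always being the first match, one possible variant is to require the name to start with a letter, so the all-digit timestamp never matches. This is only a sketch in Python 3; the pattern and the name queues are my own assumptions, not part of the answers above.

```python
import re

s = ('04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) '
     'Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,'
     'Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:'
     'Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]')

# Assumption: every queue name starts with a letter, so the
# timestamp (which is all digits) can never be captured as a name.
pattern = re.compile(r'([A-Za-z]\w*):(\d+):(\d+):(\d+)')

# Map each name to its three counters, converted to int.
queues = {name: [int(a), int(b), int(c)]
          for name, a, b, c in pattern.findall(s)}

print(queues['HighPriority'])
```

Each value is already a list of ints, so you can index it directly, e.g. queues['Default'][1].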

Related

How to separate decimal numbers written back to back

How would it be possible to separate a string of values (in my case, only corresponding to roman numeral values) into elements of a list?
'10010010010100511' -> [100, 100, 100, 10, 100, 5, 1, 1]
I want to create something that goes like:
if it is a zero add it to side
if it's not a zero create a new element for it
You were on the right track with the zeros. The thing to notice is that every base roman numeral value is either a 1 or a 5 followed by some number of zeros. You can represent that as a very simple regex.
import re
s = '10010010010100511'
pattern = "[15]0*"  # inside a character class '|' is a literal, so it is not needed
matches = re.finditer(pattern=pattern, string=s)
l = [match[0] for match in matches]
print(l) # ['100', '100', '100', '10', '100', '5', '1', '1']
If for some reason you don't want to use regex, you can simply iterate over each character using the same principle:
string = '10010010010100511'
lst = []
for char in string:
    if char in ['1', '5']:
        lst.append(char)
    elif char == '0':
        lst[-1] += '0'
print(lst) # ['100', '100', '100', '10', '100', '5', '1', '1']
Code:
s='10010010010100511'
d=[]
c=0  # start index of the number currently being built
for i in range(len(s)):  # the basic idea is to check the next value
    if i+1 <len(s):
        if s[i+1]!='0':  # if the NEXT value is not zero, then
            d.append(s[c:i+1])  # slice from c up to i and add it to list d
            c=i+1
    else:
        d.append(s[c:])  # last number: everything from c to the end
d
Output:
['100', '100', '100', '10', '100', '5', '1', '1']
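The question asked for integers rather than strings, so here is a small sketch in Python 3 that combines the regex idea above with int(). The function name split_numeral_values is just a placeholder I made up.

```python
import re

def split_numeral_values(s):
    # Each value is a 1 or a 5 followed by any number of zeros.
    return [int(token) for token in re.findall(r'[15]0*', s)]

print(split_numeral_values('10010010010100511'))
# [100, 100, 100, 10, 100, 5, 1, 1]
```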

List, Convert Str List, replace values, print new str List. Should return ['*', '2', '3', '*', '5']

Can any of you help me identify what I am doing wrong? I know this might be simple, but I am new to programming and Python. I need to return ['*', '2', '3', '*', '5']. Instead, I am getting many more values in the list.
# Test to replace values in a list
repl_list = [1, 2, 3, 1, 5]
str_repl_list = str(repl_list)
# print('This is the list to replace: ' + str_repl_list)
# print(type(str_repl_list[0]))
new_str_list = []
print(new_str_list)
for item in str_repl_list:
    replacement = item.replace('1', '*')
    new_str_list.append(replacement)
for index, char in enumerate(new_str_list):
    print(index, char)  # This is to identify what information is being taken as part of the new list
When you call str(repl_list), the output is the single string '[1, 2, 3, 1, 5]', not a list of strings. So if you iterate through str_repl_list you get one character at a time (the spaces after the commas are left out here):
[
1
,
2
,
3
,
1
,
5
]
Instead, you can skip that step and convert each item to a string inside your for loop with str(item):
repl_list = [1, 2, 3, 1, 5]
new_str_list = []
for item in repl_list:
    replacement = str(item).replace('1', '*')
    new_str_list.append(replacement)
>>> print(new_str_list)
['*', '2', '3', '*', '5']
You can also use a list comprehension:
>>> print(['*' if x == 1 else str(x) for x in repl_list])
['*', '2', '3', '*', '5']
Instead of converting each item to a string, you are converting the entire list into a string. Try this list comprehension instead:
str_repl_list = [str(i) for i in repl_list]
This will go through each item and convert it into a string, then store it in the new list.
Since you are appending each element to the list new_str_list, you can also see the result in one piece by joining all the elements together into a single string, which can be done as:
str_list_final = ''.join(new_str_list)

Separate each item of a list in a specific way

I have an input, which is a tuple of strings encoded in the a1z26 cipher: numbers from 1 to 26 represent letters of the alphabet, hyphens separate letters within the same word, and spaces separate words.
For example:
8-9 20-8-5-18-5 should translate to 'hi there'
Let's say that this example is stored as a tuple in a variable called string:
string = ('8-9','20-8-5-18-5')
The first thing that seems logical to me is to convert the tuple into a list using
string = list(string)
so now
string = ['8-9','20-8-5-18-5']
The problem now is that when I iterate over the list to compare it with a dictionary that holds the translated values, each double-digit number is read as two separate digits. So instead of translating '20', for example, it translates '2' and then '0', and the result is 'hi bheahe' (2 = b, 1 = a and 8 = h).
So I need a way to convert the list above into the following list:
['8','-','9',' ','20','-','8','-','5','-','18','-','5']
I've already tried various approaches using list(), join() and split(), but they all end up giving me the same problem.
To sum up, I need to turn any given list (converted from the input tuple) into a list of tokens that takes double-digit numbers, spaces and hyphens into account.
This is what I've got so far (the latest version I wrote). The input is further up in the code (string):
a1z26 = {'1':'A', '2':'B', '3':'C', '4':'D', '5':'E', '6':'F', '7':'G', '8':'H', '9':'I', '10':'J', '11':'K', '12':'L', '13':'M', '14':'N', '15':'O', '16':'P', '17':'Q', '18':'R', '19':'S', '20':'T', '21':'U', '22':'V', '23':'W', '24':'X', '25':'Y', '26':'Z', '-':'', ' ' : ' ', ', ' : ' '}
translation = ""
code = list(string)
numbersarray1 = code
numbersarray2 = ', '.join(numbersarray1)
for char in numbersarray2:
    if char in a1z26:
        translation += a1z26[char]
There's no need to convert the tuple to a list. Tuples are iterable too.
I don't think the list you name is what you actually want. You probably want a 2d iterable (not necessarily a list, as you'll see below we can do this in one pass without generating an intermediary list), where each item corresponds to a word and is a list of the character numbers:
[[8, 9], [20, 8, 5, 18, 5]]
From this, you can convert each number to a letter, join the letters together to form the words, then join the words with spaces.
To do this, you need to pass a parameter to split, to tell it how to split your input string. You can achieve all of this with a one liner:
plaintext = ' '.join(''.join(num_to_letter[int(num)] for num in word.split('-'))
                     for word in ciphertext.split(' '))
This does exactly the splitting procedure as described above, and then for each number looks into the dict num_to_letter to do the conversion.
Note that you don't even need this dict. You can use the fact that A-Z in unicode is contiguous so to convert 1-26 to A-Z you can do chr(ord('A') + num - 1).
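To make the chr/ord idea concrete, here is a minimal sketch in Python 3. The function name decode_a1z26 is hypothetical, and it assumes the tuple items are first joined with spaces into one ciphertext string, and that every number is between 1 and 26.

```python
def decode_a1z26(ciphertext):
    """Decode e.g. '8-9 20-8-5-18-5'; assumes every number is in 1..26."""
    words = []
    for word in ciphertext.split(' '):
        # 1 -> 'A', 2 -> 'B', ..., 26 -> 'Z'
        letters = [chr(ord('A') + int(num) - 1) for num in word.split('-')]
        words.append(''.join(letters))
    return ' '.join(words)

string = ('8-9', '20-8-5-18-5')
print(decode_a1z26(' '.join(string)))  # HI THERE
```

Because int(num) is applied to each hyphen-separated chunk, '20' is treated as one number and never as '2' followed by '0'.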
You don't really need the hyphens, am I right?
I suggest the following approach:
a = '- -'.join(string).split('-')
Now a is ['8', '9', ' ', '20', '8', '5', '18', '5']
You can then convert each number to the proper character using your dictionary
b = ''.join([a1z26[i] for i in a])
Now b is equal to HI THERE
I think it's better to use regular expressions here.
Example:
import re
...
src = ('8-9', '20-8-5-18-5')
res = [match for tmp in src for match in re.findall(r"([0-9]+|[^0-9]+)", tmp + " ")][:-1]
print(res)
Result:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Here is a solution using regex:
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
print(data)
output
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
If you want to get 'hi there' from the input string, here is a method (I am assuming all characters should be uppercase):
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
new_str =''
for i in range(len(data)):
    if data[i].isdigit():
        new_str+=chr(int(data[i])+64)
    else:
        new_str+=data[i]
result = new_str.replace('-','')
output:
HI THERE
You could also try this itertools solution:
from itertools import chain
from itertools import zip_longest
def separate_list(lst, delim, sep=" "):
    result = []
    for x in lst:
        chars = x.split(delim) # 1
        pairs = zip_longest(chars, [delim] * (len(chars) - 1), fillvalue=sep) # 2, 3
        result.extend(list(chain.from_iterable(pairs))) # 4
    return result[:-1] # 5
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
Output:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Explanation of the above code:
1. Split each string by the delimiter '-'.
2. Create the interspersing delimiters.
3. Create pairs of characters and separators with itertools.zip_longest.
4. Extend the result list with the flattened pairs using itertools.chain.from_iterable.
5. Remove the trailing ' ' that was added to the result list.
You could also create your own intersperse generator function and apply it twice:
from itertools import chain
def intersperse(iterable, delim):
    it = iter(iterable)
    yield next(it)
    for x in it:
        yield delim
        yield x
def separate_list(lst, delim, sep=" "):
    return list(
        chain.from_iterable(
            intersperse(
                (intersperse(x.split(delim), delim=delim) for x in lst), delim=[sep]
            )
        )
    )
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
# ['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']

Generated list contains [apparently] unaccounted-for whitespace in this code snippet

For a routine question on Python programming, I was asked to generate a list of strings sliced from one string (let's call it target_string), with the length of each slice increasing from 1 to the length of the string.
For example, if target_string is '123', I would have to generate the list like this : ['1', '2', '3', '12', '23', '123'].
For this, I wrote a code snippet that was something like this:
target_string = raw_input("Target String:")
length = len(target_string)
number_list = []
for i in range(length):
    for j in range(length):
        if j + i <= length:
            number_list.append(target_string[j:j + i])
print(number_list)
On execution of this the result was:
Target String:12345
['', '', '', '', '', '1', '2', '3', '4', '5', '12', '23', '34', '45', '123', '234', '345', '1234', '2345']
The first thing I noticed is that the list contains elements that look like whitespace, and the number of them is equal to the length of target_string. Why does this happen? Any kind of clarification and help is welcome.
P.S.: I have a temporary workaround to generate the list that I need:
target_string = raw_input("Target String:")
length = len(target_string)
number_list = []
for i in range(length):
    for j in range(length):
        if j + i <= length:
            number_list.append(target_string[j:j + i])
number_list.append(target_string)
del number_list[0:length]
target_list = [int(i) for i in number_list]
print(target_list)
Also feel free to suggest any changes or modifications to this, or any approach you would feel is more efficient and pythonic. Thanks in advance.
Edit: This is implemented in Pycharm, on Windows 10 , using Python 2.7, but please feel free to give the solutions in both the Python 2.7 and 3.X versions.
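You can use itertools.combinations, keep only the combinations in which each digit is exactly 1 greater than the previous one, convert each of them to a string with ''.join(..), and add them to the list with .extend(..). Note that this check compares digit values, so it relies on the input being a run of consecutive ascending digits such as '123':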
Python 2.7:
import itertools
target_string = raw_input("Target String:")
l=[]
for i in range(1,len(target_string)+1):
    l.extend([''.join(i) for i in itertools.combinations(target_string,i) if all(int(y)-int(x)==1 for x, y in zip(i, i[1:]))])
print l
Output:
['1', '2', '3', '12', '23', '123']
Python 3.x:
import itertools
target_string = input("Target String:")
l=[]
for i in range(1,len(target_string)+1):
    l.extend([''.join(i) for i in itertools.combinations(target_string,i) if all(int(y)-int(x)==1 for x, y in zip(i, i[1:]))])
print(l)
Output:
['1', '2', '3', '12', '23', '123']
Here is why you got those blank entries in your code snippet.
Have a look at the loop part:
for i in range(length):
    for j in range(length):
        if j + i <= length:
            number_list.append(target_string[j:j + i])
Here, both i and j start at 0.
So when we trace it, it goes like this:
i = 0:
    j = 0:
        0 + 0 <= length
        number_list.append(target_string[0:0 + 0]) --> ['']
and so on: while i is 0, every slice target_string[j:j] is the empty string '' (not whitespace), so one empty string is appended for each value of j.
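Following from that, one way to avoid the empty strings is to start the slice length at 1 instead of 0 and let it run up to the full length. A minimal sketch in Python 3 (the function name substrings_by_length is just illustrative):

```python
def substrings_by_length(target_string):
    result = []
    n = len(target_string)
    # size runs from 1 to n, so no empty slices are produced
    # and the full string is included at the end.
    for size in range(1, n + 1):
        # start can only go as far as the slice still fits in the string
        for start in range(n - size + 1):
            result.append(target_string[start:start + size])
    return result

print(substrings_by_length('123'))  # ['1', '2', '3', '12', '23', '123']
```

On Python 2.7 the same function works unchanged; only the print line needs adjusting if you prefer the statement form.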

Filtering out a generator

What's the best way to filter out some subsets from a generator? For example, I have a string "1023" and want to produce all possible ways of splitting it into groups of consecutive digits. All combinations would be:
['1', '0', '2', '3']
['1', '0', '23']
['1', '02', '3']
['1', '023']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']
I am not interested in a subset that contains a leading 0 on any of the items, so the valid ones are:
['1', '0', '2', '3']
['1', '0', '23']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']
I have two questions.
1) If using a generator, what's the best way to filter out the ones with leading zeroes? Currently, I generate all combinations, then loop through them afterwards and only continue if the subset is valid. For simplicity I am only printing the subset in the sample code. If the generator is very long or contains a lot of invalid subsets, it's almost a waste to loop through the entire thing. Is there a way to stop the generator when it sees an invalid item (one with a leading zero) and filter it out of 'allCombinations'?
2) If the above doesn't exist, what's a better way to generate these combinations (disregarding combinations with leading zeroes)?
Code using a generator:
import itertools
def isValid(subset): ## DIGITS WITH LEADING 0 IS NOT VALID
    valid = True
    for num in subset:
        if num[0] == '0' and len(num) > 1:
            valid = False
            break
    return valid
def get_combinations(source, comb):
    res = ""
    for x, action in zip(source, comb + (0,)):
        res += x
        if action == 0:
            yield res
            res = ""
digits = "1023"
allCombinations = [list(get_combinations(digits, c)) for c in itertools.product((0, 1), repeat=len(digits) - 1)]
for subset in allCombinations: ## LOOPS THROUGH THE ENTIRE GENERATOR
    if isValid(subset):
        print(subset)
When filtering on a simple, obvious condition like "no leading zeros", it is more efficient to do it at the combination-building level.
def generate_pieces(input_string, predicate):
    if input_string:
        if predicate(input_string):
            yield [input_string]
        for item_size in range(1, len(input_string)+1):
            item = input_string[:item_size]
            if not predicate(item):
                continue
            rest = input_string[item_size:]
            for rest_piece in generate_pieces(rest, predicate):
                yield [item] + rest_piece
Generating every combination of cuts, so long it's not even funny:
>>> list(generate_pieces('10002', lambda x: True))
[['10002'], ['1', '0002'], ['1', '0', '002'], ['1', '0', '0', '02'], ['1', '0', '0', '0', '2'], ['1', '0', '00', '2'], ['1', '00', '02'], ['1', '00', '0', '2'], ['1', '000', '2'], ['10', '002'], ['10', '0', '02'], ['10', '0', '0', '2'], ['10', '00', '2'], ['100', '02'], ['100', '0', '2'], ['1000', '2']]
Only those where no fragment has leading zeros:
>>> list(generate_pieces('10002', lambda x: not x.startswith('0')))
[['10002'], ['1000', '2']]
Substrings that start with a zero were never considered for the recursive step.
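Note that the predicate above also rejects a lone '0', while the question treats a single '0' as valid. A sketch of how that could be handled, in Python 3, reusing the same generator with a different predicate; the name no_padded_zero is my own placeholder, not something from the answer.

```python
def generate_pieces(input_string, predicate):
    """Yield every way to cut input_string into pieces that all satisfy predicate."""
    if input_string:
        if predicate(input_string):
            yield [input_string]
        for item_size in range(1, len(input_string) + 1):
            item = input_string[:item_size]
            if not predicate(item):
                continue  # prune: no split starting with this piece is explored
            rest = input_string[item_size:]
            for rest_piece in generate_pieces(rest, predicate):
                yield [item] + rest_piece


def no_padded_zero(piece):
    # Hypothetical predicate: a lone '0' is fine, '02' or '023' is not.
    return piece == '0' or not piece.startswith('0')


for pieces in generate_pieces('1023', no_padded_zero):
    print(pieces)
```

For '1023' this should produce the six valid splits listed in the question, without ever building the two invalid ones.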
One common solution is to filter just before the yield. Here is an example:
import itertools
def my_gen(my_string):
    # Create combinations
    for length in range(len(my_string)):
        for my_tuple in itertools.combinations(my_string, length+1):
            # This is the string you would like to output
            output_string = "".join(my_tuple)
            # filter here:
            if output_string[0] != '0':
                yield output_string
my_string = '1023'
print(list(my_gen(my_string)))
EDIT: Added a generator expression alternative
import itertools
my_string = '1023'
my_gen = ("".join(my_tuple) for length in range(len(my_string))
          for my_tuple in itertools.combinations(my_string, length+1)
          if "".join(my_tuple)[0] != '0')
