Python String Manipulation - finding unique numbers in many strings

I am looping through variables of the form - V15_1_1. The middle and last number in this string changes for each variable. I want to create a string of all the unique middle numbers.
For example, I may have V15_1_1, V15_2_3, V15_2_6, V15_12_17,V15_12_3 which would return a text string of '1,2,12'

Here you go:
my_list = ['V15_1_1', 'V15_2_3', 'V15_2_6', 'V15_12_17','V15_12_3']
def unique_middle_values(input_list):
    return set([i.split('_')[1] for i in input_list])
unique_middle_values(my_list)
{'1', '12', '2'}

Split each string into the numbers you want, for instance by splitting on the _ character and removing any non-numeric characters from each substring. Then you have them in order from left to right, and add each of them to one of three sets for the left, middle and right numbers. Sets can only have the same entry once. You can then print the contents of the sets to get what you want. If needed they can be sorted first. All of these things I've described can be individually googled.
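The steps described above can be sketched roughly as follows. This is only a minimal sketch: it assumes every name has exactly three underscore-separated parts, and the function name unique_middle_numbers is made up for illustration.

```python
def unique_middle_numbers(names):
    """Return the unique middle numbers as a comma-separated string."""
    # A set keeps each middle number only once.
    middles = {name.split('_')[1] for name in names}
    # Sort numerically (not alphabetically) so that '12' comes after '2'.
    return ','.join(sorted(middles, key=int))


names = ['V15_1_1', 'V15_2_3', 'V15_2_6', 'V15_12_17', 'V15_12_3']
print(unique_middle_numbers(names))  # 1,2,12
```

Sorting with key=int is a design choice here; a plain sorted() on the strings would give '1,12,2'.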

You can use set().
first_set = set()
second_set = set()
for item in my_list:
    v, num1, num2 = item.split('_')
    first_set.add(num1)
    second_set.add(num2)
print(', '.join(first_set))
print(', '.join(second_set))

This might help
>>> def fun(var):
...     l = []
...     for a in var:
...         middle = a.split('_')[1]
...         if middle not in l:
...             l.append(middle)
...     return ','.join(l)
...
>>> fun(['V15_1_1', 'V15_2_3', 'V15_2_6', 'V15_12_17', 'V15_12_3'])
'1,2,12'

Related

Iterate through a list in python and delete characters after the second instance of a character from an element

Sorry, very new to python.
Essentially I have a long list of file names, some in the format NAME_XX123456 and others in the format NAME_XX123456_123456.
I need to drop everything from the second underscore onwards in each element.
The code below only processes the first two elements, though, and when it encounters a second underscore it doesn't delete the remainder, it just splits the name.
sample_list=['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
shortlist=[]
item = "_"
count = 0
i=0
for i in range(0,len(sample_list)):
    if(item in sample_list[i]):
        count = count + 1
        if(count == 2):
            shortlist.append(sample_list[i].rpartition("_"))
            i+=1
        if (count == 1):
            shortlist.append(sample_list[i])
            i+=1
print(shortlist)

Here is a simple split join approach. We can split each input on underscore, and then join the first two elements together using underscore as the separator.
sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
output = ['_'.join(x.split('_')[0:2]) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']
You could also use regular expressions here:
import re
sample_list = ['NAME_XX011024', 'NAME_XX011030_1234', 'NAME_XX011070', 'NAME_XX090119_15165']
output = [re.sub(r'([^_]+_[^_]+)_.*', r'\1', x) for x in sample_list]
print(output)
# ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']

You can simply use the split method to split each item in the list on '_' and then join the first two parts of the split, thus ignoring everything after the second underscore.
Try this:
res = []
for item in sample_list:
    item_split = item.split('_')
    res.append('_'.join(item_split[0:2]))  # taking only the first two items
print(res)  # ['NAME_XX011024', 'NAME_XX011030', 'NAME_XX011070', 'NAME_XX090119']

How do i make the program print specific letters in this specific format i give to it?

I need to write a program that, for example, prints "aaabb" when given the input 3[a]2[b], or "abababcc" when given 3[ab]2[c] (basically, it prints each bracketed string the given number of times, in the given order). I tried using a for loop to iterate over the input and detect the "[" characters so the program knows what to print repeatedly, but I don't know how to make it also recognise where that string ends.
This is as far as I got, which probably isn't too useful:
string=input()
string=string[::-1]
bulundu=6
for i in string:
    if i!="]":
        if i!="[":
            lst.append(i)
    if i=="[":
        break

The approach I took is to remove the brackets, split the items into a list, then walk the list, and if the item is a number, add that many repeats of the next item to the result for output:
import re
data = "3[a]2[b]"
# Remove brackets and convert to a list
data = re.sub(r'[\[\]]', ' ', data).split()
result = []
for i, item in enumerate(data):
    # If item is a number, append that many repeats of the next item
    if item.isdigit():
        result.append(data[i+1] * int(item))
print(''.join(result))
# aaabb

A different approach, inspired by Subbu's use of re.findall. This approach finds all 'pairs' of numbers and letters using match groups, then multiplies them to produce the required text:
import re
data = "3[a]2[b]"
matches = re.findall(r'(\d+)\[([a-zA-Z]+)\]', data)
# [('3', 'a'), ('2', 'b')]
for x in matches:
    print(x[1] * int(x[0]), end='')
# aaabb

A lengthy and documented version using NO regex, only simple string and list manipulation:
first split the input into parts that are numbers and texts
then recombine them again
I opted to document with inline comments.
This could be done like so:
# test cases are tuples of input and correct result
testcases = [ ("3[a]2[b]","aaabb"),
              ("3[ab]2[c]","abababcc"),
              ("5[12]6[c]","1212121212cccccc"),
              ("22[a]","a"*22)]
# now we use our algorithm for all those test cases
for inp,res in testcases:
    split_inp = []   # list that takes the split values of the input
    num = 0          # accumulator variable for more-than-1-digit numbers
    in_text = False  # bool that tells us if we are currently collecting letters
    # go over all letters : O(n)
    for c in inp:
        # when a [ is reached our num is complete and we need to store it
        # we collect all further letters until next ] in a list that we
        # add at the end of our split_inp
        if c == "[":
            split_inp.append(num)    # add the completed number
            num = 0                  # and reset it to 0
            in_text = True           # now in text
            split_inp.append([])     # add a list to collect letters
        # done collecting letters
        elif c == "]":
            in_text = False          # no longer collecting, convert letters
            split_inp[-1] = ''.join(split_inp[-1])  # to text
        # between [ and ] ... simply add letter to list at end
        elif in_text:
            split_inp[-1].append(c)  # add letter
        # currently collecting numbers
        else:
            num *= 10                # increase current number by factor 10
            num += int(c)            # add newest digit
    print(repr(inp), split_inp, sep="\n")  # debugging output for parsing part
    # now we need to build the string from our parsed data
    amount = 0
    result = []  # intermediate list to join ['aaa','bb']
    # iterate the list, if int remember it, if text, build composite
    for part in split_inp:
        if isinstance(part, int):
            amount = part
        else:
            result.append(part*amount)
    # join the parts
    result = ''.join(result)
    # check if all worked out
    if result == res:
        print("CORRECT: ", result + "\n")
    else:
        print(f"INCORRECT: should be '{res}' but is '{result}'\n")
Result:
'3[a]2[b]'
[3, 'a', 2, 'b']
CORRECT: aaabb
'3[ab]2[c]'
[3, 'ab', 2, 'c']
CORRECT: abababcc
'5[12]6[c]'
[5, '12', 6, 'c']
CORRECT: 1212121212cccccc
'22[a]'
[22, 'a']
CORRECT: aaaaaaaaaaaaaaaaaaaaaa
This will also handle cases like '5[12]', which some of the other solutions won't.

You can capture both the number of repetitions n and the pattern to repeat v in one go using the pattern shown in the code. It matches any sequence of digits, which is the first group we need to capture (the reason why \d+ is wrapped in parentheses), followed by a [, followed by anything, which is the second group of interest (hence it is also wrapped in parentheses), followed by a ].
findall will find all these matches in the passed line; the first group of each match, the number, is cast to an int and used as a multiplier for the string pattern. The list of v * int(n) values is then joined with an empty string. Malformed patterns may throw exceptions or return nothing.
Anyway, in code:
import re
pattern = re.compile(r"(\d+)\[(.*?)\]")
def func(x): return "".join([v*int(n) for n,v in pattern.findall(x)])
print(func("3[a]2[b]"))
print(func("3[ab]2[c]"))
OUTPUT
aaabb
abababcc
FOLLOW UP
Another solution which achieves the same result, without using regular expression (ok, not nice at all, I get it...):
def func(s): return "".join([int(x[0])*x[1] for x in map(lambda x:x.split("["), s.split("]")) if len(x) == 2])
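To make the remark about malformed patterns more concrete, here is a small sketch of how the two variants appear to behave on bad input. The names expand_regex and expand_split are made up for illustration, and expand_split is just the one-liner above written out in a longer form.

```python
import re

pattern = re.compile(r"(\d+)\[(.*?)\]")


def expand_regex(text):
    # findall returns tuples of strings, e.g. [('3', 'a'), ('2', 'b')]
    return "".join(v * int(n) for n, v in pattern.findall(text))


def expand_split(text):
    # Split on "]" first, then split each chunk into [count, letters]
    pieces = (chunk.split("[") for chunk in text.split("]"))
    return "".join(int(p[0]) * p[1] for p in pieces if len(p) == 2)


print(pattern.findall("3[a]2[b]"))  # [('3', 'a'), ('2', 'b')]
print(repr(expand_regex("3[a")))    # '' because the closing ] is missing

try:
    expand_split("x[a]")            # 'x' cannot be converted to an int
except ValueError as exc:
    print("ValueError:", exc)
```

In this sketch the regex variant silently returns an empty string when nothing matches, while the split-based variant raises ValueError when the count is not a number.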

I am not much more than a beginner, and looking at the other answers, I thought understanding regex might be a challenge for a new contributor such as yourself, since I haven't really dealt with regex myself.
A beginner-friendly way to do this might be to loop through the input string and use string methods like isnumeric() and isalpha():
data = "3[a]2[b]"
chars = []
nums = []
substrings = []
for i, char in enumerate(data):
    if char.isnumeric():
        nums.append(char)
    if char.isalpha():
        chars.append(char)
for i, char in enumerate(chars):
    substrings.append(char * int(nums[i]))
string = "".join(substrings)
print(string)
OUTPUT:
aaabb
And on trying different values for data:
data = "0[a]2[b]3[p]"
OUTPUT bbppp
data = "1[a]1[a]2[a]"
OUTPUT aaaa
NOTE: In case you're not familiar with the above functions, they are string methods, and they are fairly self-explanatory. They are used as <your_string_here>.isalpha(), which returns True if and only if the string is non-empty and consists only of alphabetic characters (whitespace, numerics, and symbols return False).
The same goes for isnumeric() with numeric characters.
For example,
"]".isnumeric() and "]".isalpha() return False
"a".isalpha() returns True
IF YOU NEED ANY CLARIFICATION ON A FUNCTION USED, PLEASE DO NOT HESITATE TO LEAVE A COMMENT

How can we remove word with repeated single character?

I am trying to remove word with single repeated characters using regex in python, for example :
good => good
gggggggg => g
What I have tried so far is following
re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
Problem with above solution is that it changes good to god and I just want to remove words with single repeated characters.

A better approach here is to use a set
def modify(s):
    # Create a set from the string
    c = set(s)
    # If you have only one character in the set, convert set to string
    if len(c) == 1:
        return ''.join(c)
    # Else return original string
    else:
        return s
print(modify('good'))
print(modify('gggggggg'))
If you want to use regex, mark the start and end of the string in the regex with ^ and $ (inspired by @bobblebubble's comment)
import re
def modify(s):
    # Create the substituted string with a regex which only matches if a single character is repeated
    # Marking the start and end of string as well
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out
print(modify('good'))
print(modify('gggggggg'))
The output will be
good
g

If you do not want to use a set in your method, this should do the trick:
def simplify(s):
    l = len(s)
    if l>1 and s.count(s[0]) == l:
        return s[0]
    return s
print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))
output:
good
abba
g
g
Explanations:
You compute the length of the string
you count the number of characters that are equal to the first one and you compare the count with the initial string length
depending on the result you return the first character or the whole string

You can use the Trim method.
Take a look at this example:
"ggggggg".Trim('g');
Update:
For characters that are in the middle of the string, use this function (thanks to this answer).
In C#:
public static string RemoveDuplicates(string input)
{
    return new string(input.ToCharArray().Distinct().ToArray());
}
in python:
used = set()
unique = [x for x in mylist if x not in used and (used.add(x) or True)]
However, I think none of these answers handles a situation like aaaaabbbbbcda: this string has an a at the end which does not appear in the result (abcd). For this kind of situation, use this function which I wrote:
In:
def unique(s):
    used = set()
    ret = list()
    s = list(s)
    for x in s:
        if x not in used:
            ret.append(x)
            used = set()
            used.add(x)
    return ret
print(unique('aaaaabbbbbcda'))
out:
['a', 'b', 'c', 'd', 'a']
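For comparison, collapsing each run of repeated adjacent characters can also be sketched with itertools.groupby from the standard library. The function name collapse_runs is made up for illustration, and this is an alternative to the function above rather than the answerer's own method.

```python
from itertools import groupby


def collapse_runs(s):
    # groupby yields one (key, group) pair per run of equal adjacent characters
    return [key for key, _group in groupby(s)]


print(collapse_runs('aaaaabbbbbcda'))  # ['a', 'b', 'c', 'd', 'a']
```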

Replacing strings, not characters without the use of .replace and joining the strings the characters

A similar question has been asked before, but all the posts on here refer to replacing single characters. I'm trying to replace a whole word in a string. I've replaced it, but I can't print it with spaces in between.
Here is the function replace that replaces it:
def replace(a, b, c):
    new = b.split()
    result = ''
    for x in new:
        if x == a:
            x = c
        result +=x
    print(' '.join(result))
Calling it with:
replace('dogs', 'I like dogs', 'kelvin')
My result is this:
i l i k e k e l v i n
What I'm looking for is:
I like kelvin

The issue here is that result is a string, so when join is called it takes each character in result and joins the characters with a space.
Instead, use a list, append to it (it's also faster than using += on strings) and print it out by unpacking it.
That is:
def replace(a, b, c):
    new = b.split(' ')
    result = []
    for x in new:
        if x == a:
            x = c
        result.append(x)
    print(*result)
print(*result) will supply the elements of the result list as positional arguments to print which prints them out with a default white space separation.
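A tiny sketch of the difference, using a made-up list called words: unpacking a list into print and joining a list both give the spaced sentence, whereas joining a plain string spaces out its individual characters.

```python
words = ['I', 'like', 'kelvin']

# Unpacking passes each word as a separate argument; print separates them with a space.
print(*words)              # I like kelvin

# Joining a list of words gives the same text as a single string.
print(' '.join(words))     # I like kelvin

# Joining a str iterates over its characters, which is what went wrong in the question.
print(' '.join('kelvin'))  # k e l v i n
```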
"I like dogs".replace("dogs", "kelvin") can of course be used here but I'm pretty sure that defeats the point.

Substrings and space preserving method:
def replace(a, b, c):
    # Find all indices where 'a' exists
    xs = []
    x = b.find(a)
    while x != -1:
        xs.append(x)
        x = b.find(a, x+len(a))
    # Use slice assignment (starting from the last index)
    result = list(b)
    for i in reversed(xs):
        result[i:i+len(a)] = c
    return ''.join(result)
>>> replace('dogs', 'I like dogs dogsdogs and hotdogs', 'kelvin')
'I like kelvin kelvinkelvin and hotkelvin'

Just make result a list, and the joining will work:
result = []
You are just generating one long string and joining its characters.

How to convert numbers into strings in python? 1 -> 'one'

I want a program where the user inputs a number (e.g. 1, 13, 4354) and it prints the number in words (one, thirteen, four three five four). If the number has two digits, it should be printed as a single joined word (thirty-one), but if it has more than two digits, the digits should be printed separately on the same line, joined with a space. I tried to do this with a dictionary, and I think it's possible, but I can't figure out how to do it.
l = input('Enter the number: ')
if len(l) > 2:
    nums = {'1':'one',
            '2':'two',
            '3':'three',
            '4':'four',
            '5':'five',
            '6':'six',
            '7':'seven',
            '8':'eight',
            '9':'nine'}
elif len(l) == 2:
    tens = {'10'}
for k, v in nums.items():
    print(k, v)
This code is obviously wrong, but I would like the finished result to look something like this. Thanks in advance!

To access items from a dictionary, you can do dictionary[key]. The value is returned.
Let's say that my input is "8".
You can then do print(nums[l]) (inside your conditional statement), and this will print "eight".
Also, it's probably better if you create your dictionaries outside of your conditional structures to prevent NameErrors and so you can access both dictionaries anywhere.
If you have an input "324", then you can use a combination of str.join() and a list comprehension:
l = "324"
nums = {'1':'one',
        '2':'two',
        '3':'three',
        '4':'four',
        '5':'five',
        '6':'six',
        '7':'seven',
        '8':'eight',
        '9':'nine'}
print(' '.join(nums[i] for i in l))
Explanation:
[nums[i] for i in l] is the same as:
returned_list = []
for number in l:
    returned_list.append(nums[number])
str.join() joins every item in the list together, separated by a space. So ' '.join(['one', 'two', 'three']) returns 'one two three'
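The two-digit case from the question (thirteen, thirty-one) is not covered above. One possible way to extend the dictionary idea is sketched below. The names ONES, TEENS, TENS and number_to_words are made up for illustration, the 'zero' entry is an addition that the original dictionary does not have, and the sketch assumes the input consists of digits only.

```python
ONES = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
        '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine'}
TEENS = {'10': 'ten', '11': 'eleven', '12': 'twelve', '13': 'thirteen',
         '14': 'fourteen', '15': 'fifteen', '16': 'sixteen',
         '17': 'seventeen', '18': 'eighteen', '19': 'nineteen'}
TENS = {'2': 'twenty', '3': 'thirty', '4': 'forty', '5': 'fifty',
        '6': 'sixty', '7': 'seventy', '8': 'eighty', '9': 'ninety'}


def number_to_words(digits):
    """Spell out a string of digits; two-digit numbers are joined."""
    # Two-digit numbers without a leading zero are read as one number.
    if len(digits) == 2 and digits[0] != '0':
        if digits in TEENS:
            return TEENS[digits]
        tens, ones = digits
        if ones == '0':
            return TENS[tens]
        return TENS[tens] + '-' + ONES[ones]
    # Everything else is read digit by digit, separated by spaces.
    return ' '.join(ONES[d] for d in digits)


for sample in ['1', '13', '31', '4354']:
    print(sample, '->', number_to_words(sample))
```

With these assumptions, '13' gives 'thirteen', '31' gives 'thirty-one', and '4354' gives 'four three five four'.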

If you're doing this to learn, carry on. If you've got a plane to catch, you could try Pyparsing. This very exercise is covered in one of Pyparsing's examples:
