Trying to sort two combined strings alphabetically without duplicates - python

Challenge: Take 2 strings s1 and s2 including only letters from a to z. Return a new sorted string, the longest possible, containing distinct letters - each taken only once - coming from s1 or s2.
# Examples
a = "xyaabbbccccdefww"
b = "xxxxyyyyabklmopq"
assert longest(a, b) == "abcdefklmopqwxy"
a = "abcdefghijklmnopqrstuvwxyz"
assert longest(a, a) == "abcdefghijklmnopqrstuvwxyz"
So I am just starting to learn, but so far I have this:
def longest(a1, a2):
    for letter in max(a1, a2):
        return ''.join(sorted(a1+a2))
which returns all the letters but I am trying to filter out the duplicates.
This is my first time on stack overflow so please forgive anything I did wrong. I am trying to figure all this out.
I also do not know how to indent in the code section if anyone could help with that.

You have two options here. The first is the answer you asked for, and the second is an alternative method.
To filter out duplicates, you can start with an empty string and then go through the returned string. For each character, if it is already in the new string, move on to the next one; otherwise, add it.
out = ""
for i in returned_string:
    if i not in out:
        out += i
return out
This would be embedded inside a function.
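As a minimal sketch of that first option wrapped in a complete function (the name longest comes from the challenge; the variable names are only illustrative), it could look like this:

```python
def longest(a1: str, a2: str) -> str:
    # All letters from both strings in alphabetical order, duplicates included
    returned_string = "".join(sorted(a1 + a2))
    out = ""
    for ch in returned_string:
        if ch not in out:  # skip letters that were already added
            out += ch
    return out


print(longest("xyaabbbccccdefww", "xxxxyyyyabklmopq"))  # abcdefklmopqwxy
```

Because the input is sorted before the loop runs, the filtered result stays in alphabetical order.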
The second option is to use Python's sets. For what you want to do, you can think of them as lists with no duplicate elements. You could simplify your function to
def longest(a: str, b: str) -> str:
    return "".join(sorted(set(a).union(set(b))))
This makes a set from all the characters in a, and another from all the characters in b. It then combines them (union) into a single set. Sets are unordered, so sorted() puts the characters in alphabetical order before they are joined into the final string. Hope this helps.

Related

Getting triplet characters from strings

Define the get_triples_dict() function which is passed a string of text
as a parameter. The function first converts the parameter string to
lower case and then returns a dictionary with keys which are all the
unique consecutive three alphabetic characters from the text, and the
corresponding values are the number of times the three consecutive
alphabetic characters appear in the text. Use the isalpha() method to
check if a character is alphabetic or not. The dictionary should only
contain entries which occur more than once. After your dictionary has
been created and populated, you need to remove any key-value pairs which
have a corresponding value of 1.
I need help with coding this get_triples_dict function:
def get_triples_dict(text):
    pass  # body still to be written

def test_get_triples_dict():
    print("1.")
    print_dict_in_key_order(get_triples_dict('super, duper'))
    print("\n2.")
    print_dict_in_key_order(get_triples_dict("ABC ABC ABC"))
    print("\n3.")
    print_dict_in_key_order(get_triples_dict("Sometimes the smallest things make more room in your heart"))
    print("\n4.")
    print_dict_in_key_order(get_triples_dict("My favourite painting is the painting i did of my dog in that painting in my den"))
I am not going to complete your assignment for you, but below is a good start to what you need to do. I believe you can write the code that sorts the words and prints out the dictionary. Make sure to study each line and then once you get the general idea, write your own version.
def get_triples_dict(text):
    d = dict()
    text = text.lower().replace(' ', '')  # set to lowercase and remove spaces
    for i in range(len(text) - 2):  # stop early so every slice has exactly 3 characters
        bit = text[i: i + 3]  # characters in groups of 3
        if all(c.isalpha() for c in bit):  # all characters must be alphabetic
            if bit not in d:  # if there's no entry
                d[bit] = 0
            d[bit] += 1
    copy = d.copy()  # a dict cannot change size while we loop over it (RuntimeError), so loop over a copy
    for key, value in copy.items():
        if value == 1:  # remove items with counts of 1
            d.pop(key)
    return d
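For comparison, here is a shorter sketch of the same idea using collections.Counter from the standard library. It keeps the behaviour of the code above (spaces are removed before slicing), which is an assumption about what the assignment expects:

```python
from collections import Counter


def get_triples_dict(text: str) -> dict:
    # Same preprocessing as the answer above: lowercase, then drop spaces
    text = text.lower().replace(" ", "")
    triples = (text[i:i + 3] for i in range(len(text) - 2))
    # str.isalpha() is True only if every character in the triple is a letter
    counts = Counter(t for t in triples if t.isalpha())
    # Keep only triples seen more than once
    return {triple: n for triple, n in counts.items() if n > 1}


print(sorted(get_triples_dict("super, duper").items()))  # [('per', 2), ('upe', 2)]
```

Building the filtered dictionary with a comprehension avoids having to delete entries from a dictionary while looping over it.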

Add a character to a string in multiple positions in Python 3

Python beginner here, sorry if this is a dumb question.
So I have a long string, and I need to add a character in very specific areas of the strings. For example, a | after character number 23, 912, and 1200. I read this Add string in a certain position in Python, but it only works for adding one character.
Also, the solution needs to be expandable, not just work for 3 positions. The code I'm writing can have lots of different locations where I want the character to go.
With reference to the link that you posted Add string in a certain position in Python;
If you would like to repeat the operation for different values, you could create a list containing all index positions where you would like your | character to be inserted.
For example,
>>> l = [1, 3, 4]
>>> s = "abcdef"
>>> for i in l:
...     s = s[:i] + "|" + s[i:]  # as suggested in your link
...
>>> s
'a|b||cdef'
This will allow you to repeat the process for the set of values that you provide in the list. You could also define a function to assist in this, which I could explain if this method is insufficient!
Note, however, that each insert is made relative to the string as it stands at that iteration. That is, in this example, after adding the | at position 1, the next insert position, 3, points one character earlier in the original text than it did before the first insert. You could avoid this (if you want) by including a counter variable to offset all the index positions by the number of inserts that have been executed (this requires the initial list to be sorted).
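Another way to sidestep the shifting, sketched here under the assumption that every position refers to the original string, is to insert from the highest index down. The helper name insert_at is hypothetical, not something from the linked question:

```python
def insert_at(s: str, positions, ch: str = "|") -> str:
    # Work from the highest index down so earlier inserts never shift later ones
    for i in sorted(positions, reverse=True):
        s = s[:i] + ch + s[i:]
    return s


print(insert_at("abcdef", [1, 3, 4]))  # a|bc|d|ef
```

Since the positions are sorted inside the function, the caller can pass them in any order.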
Not so good at python, hope I can help
According to that site you went to, you can make a while loop to solve the problem
The code should look something like this
def insert_dash(string, index, addin):
    return string[:index] + addin + string[index:]

string = input("String: ")
while True:
    index = input("Index (leave blank to stop): ")
    if not index:
        break
    addin = input("Add into: ")
    # input() returns text, so convert the index to an integer before slicing
    string = insert_dash(string, int(index), addin)
print(string)
Hope it helps!
PS: I have NOT tried the code, but I think it will work

Swapping pairs of characters in a string

Okay, I'm really new to Python and have no idea how to do this:
I need to take a string, say 'ABAB__AB', convert it to a list, and then take the leading index of the pair I want to move and swap that pair with the __. I think the output should look something like this:
move_chars('ABAB__AB', 0)
'__ABABAB'
and another example:
move_chars('__ABABAB', 3)
'BAA__BAB'
Honestly have no idea how to do it.
Python strings are immutable, so you can't really modify a string. Instead, you make a new string.
If you want to be able to modify individual characters in a string, you can convert it to a list of characters, work on it, then join the list back into a string.
chars = list(s)  # s is the original string
# work on the list of characters
# for example, swap the first two
chars[0], chars[1] = chars[1], chars[0]
result = ''.join(chars)
I think this should go to the comment section, but I can't comment because of lack of reputation, so...
You'll probably want to stick with list index swapping, rather than using .pop() and .append(). .pop() can remove an element from an arbitrary index, but only one at a time, and .append() can only add to the end of the list. So they're quite limited, and using them would complicate your code for this kind of problem.
So, well, better stick with swapping with index.
The trick is to use list slicing to move parts of the string.
def move_chars(s, index):
    to_index = s.find('__')               # index of destination underscores
    chars = list(s)                       # make mutable list
    to_move = chars[index:index+2]        # grab chars to move
    chars[index:index+2] = '__'           # replace with underscores
    chars[to_index:to_index+2] = to_move  # replace underscores with chars
    return ''.join(chars)                 # stitch it all back together

print(move_chars('ABAB__AB', 0))
print(move_chars('__ABABAB', 3))

Breaking 1 String into 2 Strings based on special characters using python

I am working with python and I am new to it. I am looking for a way to take a string and split it into two smaller strings. An example of the string is below
wholeString = '102..109'
And what I am trying to get is:
a = '102'
b = '109'
The information will always be separated by two periods like shown above, but the number of characters before and after can range anywhere from 1 - 10 characters in length. I am writing a loop that counts characters before and after the periods and then makes a slice based on those counts, but I was wondering if there was a more elegant way that someone knew about.
Thanks!
Try this:
a, b = wholeString.split('..')
It'll put each value into the corresponding variables.
Look at the str.split method.
split_up = [s.strip() for s in wholeString.split("..")]
This code will also strip off leading and trailing whitespace so you are just left with the values you are looking for. split_up will be a list of these values.
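If the input might not always contain the separator, str.partition is a possible alternative: it splits on the first occurrence only and reports whether the separator was found. A small sketch, with split_pair as a hypothetical helper name:

```python
def split_pair(whole_string: str, sep: str = ".."):
    # partition returns (before, separator, after); separator is '' if not found
    a, found, b = whole_string.partition(sep)
    if not found:
        raise ValueError(f"expected {sep!r} in {whole_string!r}")
    return a, b


a, b = split_pair("102..109")
print(a, b)  # 102 109
```

Unlike unpacking the result of split('..') into two names, this does not fail when the separator appears more than once; everything after the first '..' ends up in b.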

Check if string in strings

I have a huge list containing many strings like:
['xxxx','xx','xy','yy','x',......]
Now I am looking for an efficient way that removes all strings that are present within another string. For example 'xx' 'x' fit in 'xxxx'.
As the dataset is huge, I was wondering if there is an efficient method for this beside
if a in b:
The complete code: With maybe some optimization parts:
for x in range(len(taxlistcomplete)):
    if delete == True:
        x = x - 1
        delete = False
    for y in range(len(taxlistcomplete)):
        if taxlistcomplete[x] in taxlistcomplete[y]:
            if x != y:
                print x,y
                print taxlistcomplete[x]
                del taxlistcomplete[x]
                delete = True
                break
    print x, len(taxlistcomplete)
An updated version of the code:
for x in enumerate(taxlistcomplete):
    if delete == True:
        #If element is removed, I need to step 1 back and continue looping.....
        delete = False
    for y in enumerate(taxlistcomplete):
        if x[1] in y[1]:
            if x[1] != y[1]:
                print x[1],y[1]
                print taxlistcomplete[x]
                del taxlistcomplete[x[0]]
                delete = True
                break
    print x, len(taxlistcomplete)
This is now implemented with enumerate, but I am wondering whether it is more efficient and how to implement the delete step so that I have less to search through as well.
Just a short thought...
Basically what I would like to see...
if element does not match any other elements in list write this one to a file.
Thus if 'xxxxx' not in 'xx','xy','wfirfj',etc... print/save
A new simple version, as I don't think I can optimize it much further anyway...
print 'comparison'
file = open('output.txt','a')
for x in enumerate(taxlistcomplete):
    delete = False
    for y in enumerate(taxlistcomplete):
        if x[1] in y[1]:
            if x[1] != y[1]:
                taxlistcomplete[x[0]] = ''
                delete = True
                break
    if delete == False:
        file.write(str(x))
x in <string> is fast, but checking each string against all other strings in the list will take O(n^2) time. Instead of shaving a few cycles by optimizing the comparison, you can achieve huge savings by using a different data structure so that you can check each string in just one lookup: For two thousand strings, that's two thousand checks instead of four million.
There's a data structure called a "prefix tree" (or trie) that allows you to very quickly check whether a string is a prefix of some string you've seen before. Google it. Since you're also interested in strings that occur in the middle of another string x, index all substrings of the form x, x[1:], x[2:], x[3:], etc. (So: only n substrings for a string of length n). That is, you index substrings that start in position 0, 1, 2, etc. and continue to the end of the string. That way you can just check if a new string is an initial part of something in your index.
You can then solve your problem in time roughly linear in the number of strings (treating the string length as bounded), like this:
1. Order your strings by decreasing length. This ensures that no string could be a substring of something you haven't seen yet. Since you only care about length, you can do a bucket sort in O(n) time.
2. Start with an empty prefix tree and loop over your ordered list of strings. For each string x, use your prefix tree to check whether it is a substring of a string you've seen before. If not, add its suffixes x, x[1:], x[2:], etc. to the prefix tree.
Deleting in the middle of a long list is very expensive, so you'll get a further speedup if you collect the strings you want to keep into a new list (the actual string is not copied, just the reference). When you're done, delete the original list and the prefix tree.
If that's too complicated for you, at least don't compare everything with everything. Sort your strings by size (in decreasing order), and only check each string against the ones that have come before it. This will give you a 50% speedup with very little effort. And do make a new list (or write to a file immediately) instead of deleting in place.
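The approach described above can be sketched roughly as follows. This is only an illustration, not a tuned implementation: the trie is built from plain nested dicts, the function name remove_contained is made up for the example, Python's built-in sort stands in for the bucket sort, and duplicates are collapsed first:

```python
def remove_contained(strings):
    """Keep only the strings that are not substrings of another kept string."""
    trie = {}   # nested dicts; every path from the root spells a substring seen so far
    kept = []   # collected into a new list instead of deleting in place
    unique = list(dict.fromkeys(strings))  # drop duplicates, keep first-seen order
    # Longest first, so a string can only be contained in one that is already indexed
    for s in sorted(unique, key=len, reverse=True):
        node = trie
        found = True
        for ch in s:
            if ch not in node:
                found = False
                break
            node = node[ch]
        if found:
            continue  # s is a substring of an earlier string (empty strings always match)
        kept.append(s)
        # Index every suffix s[i:], so each substring of s is a prefix of some path
        for i in range(len(s)):
            node = trie
            for ch in s[i:]:
                node = node.setdefault(ch, {})
    return kept


print(remove_contained(['xxxx', 'xx', 'xy', 'yy', 'x']))  # ['xxxx', 'xy', 'yy']
```

Note that indexing all suffixes costs time and memory that grow with the square of each string's length, so this pays off mainly when the strings are short relative to the size of the list.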
Here is a simple approach, assuming you can identify a character (I will use '$' in my example) that is guaranteed not to be in any of the original strings:
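result = ''
# Longest first, so each string is checked against every longer string
for substring in sorted(taxlistcomplete, key=len, reverse=True):
    if substring not in result:
        result += '$' + substring
taxlistcomplete = result.split('$')[1:]  # drop the empty piece before the first '$'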
This leverages Python's internal optimizations for substring searching by just making one big string to substring-search :)
Here is my suggestion. First I sort the elements by length, because the shorter a string is, the more likely it is to be a substring of another string. Then I use two for loops: I run through the list and remove every element el2 that contains another element el as a substring, so the shortest strings are the ones that remain. Note that the outer loop visits each element only once. Both loops iterate over copies of the list, because removing items from a list while iterating over it directly can skip elements.
Sorting the list first destroys the original order of its elements, so if the order is important, you can't use this solution.
Edit: I assume there are no identical elements in the list, so that when el == el2, it's because it is the same element.
a = ["xyy", "xx", "zy", "yy", "x"]
a.sort(key=len)
for el in a[:]:
    for el2 in a[:]:
        if el in el2 and el != el2:
            a.remove(el2)
A list comprehension with the in operator is a concise, Pythonic way to find the elements of a list (arr here) that contain one given substring, such as 'xx':
[element for element in arr if 'xx' in element]
