Split string and take only part of it (python) - python

QUESTION
I have a list of strings, let's call it input_list, and every string in this list is made of five words divided only by a "%" character, like
"<word1>%<word2>%<word3>%<word4>%<word5>"
My goal is to build, for every element of input_list, a string consisting only of <word3> and <word4> separated by the "%" sign, like "<word3>%<word4>", and to collect these strings in a new list.
So for example, if:
input_list = ['the%quick%brown%fox%jumps', 'over%the%lazy%dog%and']
then the new list will look like this
new_list = ['brown%fox', 'lazy%dog']
IMPORTANT NOTES AND POSSIBLE ANSWERS
The length of each word is random, so I can't just use string slicing or guess in any way how <word3> and <word4> start.
A possible way to answer this would be the following, but I want to know if there is a better and maybe (computationally) faster way, without having to create a new variable (current_list) and/or without having to consider/split the whole string (maybe using regex?)
input_list = ['the%quick%brown%fox%jumps', 'over%the%lazy%dog%and']
new_list = []
for element in input_list:
    current_list = element.split('%')
    # Join the third and fourth words back into a single string
    final_element = '%'.join([current_list[2], current_list[3]])
    new_list.append(final_element)
EDIT:
I compared the running time of @Pac0's answer with that of @bb1's answer. With an input list of 100 strings, @Pac0's has a running time of 92.28286 seconds and @bb1's has a running time of 42.6106374 seconds, so I will consider @bb1's the answer.

new_list = ['%'.join(w.split('%')[2:4]) for w in input_list]

You can use a regular expression (regex) with a capture group:
import re
pattern = re.compile('[^%]*%[^%]*%([^%]*%[^%]*)%[^%]*')
input_list = ['the%quick%brown%fox%jumps', 'over%the%lazy%dog%and']
result = [pattern.search(s).group(1) for s in input_list]
print(result)
Note: the "compile" part is not strictly needed, but can help performance if you have a lot of strings to process.
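One caveat with the regex approach: pattern.search() returns None when a string does not contain five fields, so calling .group(1) on it raises AttributeError. A minimal sketch of a more defensive variant follows. The helper name extract_middle, the use of fullmatch(), and the choice to return None for malformed input are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Same pattern as above, written as a raw string. fullmatch() requires the
# whole string to consist of exactly five %-separated fields.
PATTERN = re.compile(r'[^%]*%[^%]*%([^%]*%[^%]*)%[^%]*')

def extract_middle(s):
    """Return '<word3>%<word4>', or None if s does not have five fields."""
    m = PATTERN.fullmatch(s)
    return m.group(1) if m else None

input_list = ['the%quick%brown%fox%jumps', 'over%the%lazy%dog%and', 'too%short']
result = [extract_middle(s) for s in input_list]
print(result)
# ['brown%fox', 'lazy%dog', None]
```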

How about this?
input_list = ['the%quick%brown%fox%jumps', 'over%the%lazy%dog%and']
new_list = ['%'.join(x.split('%')[2:4]) for x in input_list]
print(new_list)
Output
['brown%fox', 'lazy%dog']
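To compare the split/join and regex approaches on your own machine, as the asker did in the edit, something like the following timeit sketch could be used. The function names with_split and with_regex and the repetition count of 1000 are arbitrary choices for this sketch, and the absolute numbers will differ from machine to machine.

```python
import re
import timeit

# 100 strings, built by repeating the two sample strings.
input_list = ['the%quick%brown%fox%jumps', 'over%the%lazy%dog%and'] * 50
pattern = re.compile(r'[^%]*%[^%]*%([^%]*%[^%]*)%[^%]*')

def with_split(strings):
    return ['%'.join(s.split('%')[2:4]) for s in strings]

def with_regex(strings):
    return [pattern.search(s).group(1) for s in strings]

# Both approaches must agree before it makes sense to time them.
assert with_split(input_list) == with_regex(input_list)

# number=1000 is an arbitrary repetition count chosen for this sketch.
t_split = timeit.timeit(lambda: with_split(input_list), number=1000)
t_regex = timeit.timeit(lambda: with_regex(input_list), number=1000)
print(f'split/join: {t_split:.4f} s, regex: {t_regex:.4f} s')
```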

Related

Appending list using .capitalize() - compiler freezes with high RAM usage

I'm trying to replace the str elements in a list with the same elements, but with their first letter capitalized.
I'm working toward this step by step, and when I use a for loop just to append the capitalized elements to the list, my compiler freezes and gradually increases its RAM usage up to 90%.
I guess it has something to do with the built-in functions that I use (probably incorrectly). Can anyone help me understand what is happening and how I should approach it?
Here is the code:
title = 'a clash of KINGS'
out = title.split()
for i in out:
    out.append(i.capitalize())
Don't change a list while iterating over it. You keep adding elements to the out list, so the loop never reaches the end. You can print out inside the loop and see for yourself. Even if it didn't enter an infinite loop, you still would not replace the initial values; you would just add more and more elements.
You can use a list comprehension:
title = 'a clash of KINGS'
out = title.split()
out = [word.capitalize() for word in out]
You can combine the last two lines into one:
title = 'a clash of KINGS'
out = [word.capitalize() for word in title.split()]
I think you are in an infinite loop. You're not replacing the elements of out; you keep appending more elements to the list. I think what you're trying to do is:
title = 'a clash of KINGS'
out = title.split()
for i in range(len(out)):
    out[i] = out[i].capitalize()
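The earlier point that the original loop only adds elements and never replaces them can be seen in a small sketch. Iterating over a copy of the list (list(out), an addition made here purely so that the loop terminates) shows the original words still in place and the capitalized ones appended after them. Note also that capitalize() lowercases the rest of each word.

```python
title = 'a clash of KINGS'
out = title.split()

# list(out) is a snapshot, so the loop runs exactly four times
# even though out itself keeps growing.
for word in list(out):
    out.append(word.capitalize())

print(out)
# ['a', 'clash', 'of', 'KINGS', 'A', 'Clash', 'Of', 'Kings']
```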

Program to create a string from two given strings by concatenating the characters that are not contained by both strings

Write a Python program to create a string from two given strings by concatenating the characters that are not contained by both strings. The characters from the 1st string should appear before the characters from the 2nd string. Return the resulting string.
Sample input: ‘0abcxyz’, ‘abcxyz1’
Expected Output: ‘01’
I have already got the results but would like to learn if there is a better way to achieve the same results.
var14_1, var14_2 = '0abcxyz', 'abcxyz1'
def concat(var14_1, var14_2):
    res = []
    [res.append(s) for s in var14_1 if s not in var14_2]
    [res.append(s) for s in var14_2 if s not in var14_1]
    print(''.join(res))
concat(var14_1, var14_2)
The above code returns 01, which is as expected. However, I would like to know if there is any other way to arrive at this solution without having to use a "for loop" twice. Your feedback will immensely help in improving my Python skills. Thanks in advance!
It would be nicer not to use list comprehensions only for the side effect of calling res.append() many times. Let the comprehensions build the lists directly:
var14_1, var14_2 = '0abcxyz', 'abcxyz1'
r1 = [s for s in var14_1 if s not in var14_2]
r2 = [s for s in var14_2 if s not in var14_1]
res = r1 + r2
print(''.join(res))
To use one for loop you could convert strings to sets and get common chars
common = set('0abcxyz') & set('abcxyz1')
and then you can use one for with concatenated strings var14_1 + var14_2
common = set('0abcxyz') & set('abcxyz1')
res = [s for s in var14_1 + var14_2 if s not in common]
print(''.join(res))
Try this.
@furas pointed out that you don't need list() when using set, so I updated the code accordingly.
var14_1, var14_2 = '0abcxyz', 'abcxyz1'
def concat(first, second):
    return ''.join(set(first).symmetric_difference(set(second)))
print(concat(var14_1, var14_2))
Calling set() on an object creates an unordered collection of its unique elements. A set has a method called symmetric_difference(), which returns the elements that are in exactly one of the two sets. Note that because sets are unordered, the order of the characters in the joined result is not guaranteed, so this may print 10 instead of 01.
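An order-preserving variant can be sketched by using the set only for membership tests and walking the two strings in their original order. This combines the two ideas from the answers above; the variable name only_one is just an illustrative choice.

```python
def concat(first, second):
    # Characters that occur in exactly one of the two strings.
    only_one = set(first) ^ set(second)
    # Walk the strings in order so the result keeps their original ordering:
    # characters from the first string come before those from the second.
    return ''.join(ch for ch in first + second if ch in only_one)

print(concat('0abcxyz', 'abcxyz1'))  # 01
```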

Slicing a list advice

I'm trying to slice a list in a certain way in Python. If I have a list that looks like this:
myList = ['hello.how.are.you', 'hello.how.are.they', 'hello.how.are.we']
Is there a way to slice it so that I can get everything after the last period for each element? So, I would want "you", "they", and "we".
There's no way to slice the list directly that way; what you do is slice each element.
You can easily build a list comprehension where you split on the period and take the last element.
myList = ["hello.how.are.you", "hello.how.are.they", "hello.how.are.we"]
after_last_period = [s.split('.')[-1] for s in myList]
Yes, it can be done:
# Input data
myList = ["hello.how.are.you", "hello.how.are.they", "hello.how.are.we"]
# Define a function to operate on a string
def get_last_part(s):
    return s.split(".")[-1]
# Use a list comprehension to apply the function to each item
answer = [get_last_part(s) for s in myList]
# Sample output
print(answer)  # ['you', 'they', 'we']
A footnote for speed demons: using s.rsplit(".", 1)[-1] is even faster than split(), since it stops after the first split from the right.
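For reference, here is a short sketch of the rsplit() variant next to str.rpartition(), another built-in that looks only at the last separator. Whether either is measurably faster than split() depends on the strings involved, so treat this as an illustration rather than a benchmark.

```python
myList = ['hello.how.are.you', 'hello.how.are.they', 'hello.how.are.we']

# rsplit with maxsplit=1 splits only once, starting from the right.
with_rsplit = [s.rsplit('.', 1)[-1] for s in myList]

# rpartition returns (head, separator, tail); the tail is the last part.
with_rpartition = [s.rpartition('.')[2] for s in myList]

print(with_rsplit)      # ['you', 'they', 'we']
print(with_rpartition)  # ['you', 'they', 'we']
```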
[i.split('.')[-1] for i in myList]
Assuming that you omitted quotes around each list element, use a list comprehension and str.split():
[x.split('.')[-1] for x in myList]

Asserting equal length of string elements in a list

A function I created takes a list of string (long list of long sequences) as an argument. Initially, I want to make sure all strings are of equal length. Of course, I could do it by iterating over all sequences in a loop and checking the length. But I am wondering - is there any way to do it faster/more efficiently?
I've tried looking at the unittest module but I am not sure whether it would suit here. Alternatively, I was thinking about creating a list of len(string) for all strings using a list comprehension and then checking whether all elements are the same. However, this seems like a lot of effort.
my_list = [ ... ]
FIXED_SIZE = 100  # Length that every string should have
result = all(len(my_string) == FIXED_SIZE for my_string in my_list)
This may help. If all strings have the same length, the result is True; otherwise it is False.
str_list = ['ilo', 'jak']
str_len = list(map(len, str_list))  # list() is needed in Python 3, where map() returns an iterator
all(each_len == str_len[0] for each_len in str_len)
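When the expected length is not known in advance, another option is to collect the distinct lengths in a set. The sketch below assumes a hypothetical helper name, all_same_length. Unlike all(), it does not stop at the first mismatch, so it always scans the whole list.

```python
def all_same_length(strings):
    # A set keeps one copy of each distinct length; zero or one distinct
    # length means every string has the same length.
    return len({len(s) for s in strings}) <= 1

print(all_same_length(['ilo', 'jak']))    # True
print(all_same_length(['ilo', 'jakub']))  # False
```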

Replace whitespaces with dashes for each item in a list -python

Is there a way of simplifying this loop, where I replace whitespace with dashes for each item in a list?
for item in a_list:
    a_list[a_list.index(item)] = '-'.join(item.split(" "))
Or is this better?
for item in a_list:
    a_list[a_list.index(item)] = item.replace(" ", "-")
NOTE: The above solution only updates the first occurrence in the list. As David suggested, use a list comprehension for this task.
I have a list of words; some have dashes and some don't. The items in a_list look like this:
this-item has a-dash
this has dashes
this should-have-more dashes
this foo
doesnt bar
foo
bar
The output should look like this, where all items in list should have dashes instead of whitespace:
this-item-has-a-dash
this-has-dashes
this-should-have-more-dashes
this-foo
doesnt-bar
foo
bar
Use a list comprehension:
a_list = [e.replace(" ", "-") for e in a_list]
When you find yourself using the index method, you've probably done something wrong. (Not always, but often enough that you should think about it.)
In this case, you're iterating a list in order, and you want to know the index of the current element. Looking it up repeatedly is slow (it makes an O(N) algorithm O(N^2)), but, more importantly, it's fragile. For example, if you have two identical items, index will never find the second one.
This is exactly what enumerate was created for. So, do this:
for i, item in enumerate(a_list):
    a_list[i] = '-'.join(item.split(" "))
Meanwhile, you could replace the loop with a list comprehension:
a_list = ['-'.join(item.split(" ")) for item in a_list]
This could be slower or use more memory (because you're copying the list rather than modifying it in-place), but that almost certainly doesn't matter (it certainly won't be as slow as your original code), and immutable algorithms are simpler and easier to reason about—and more flexible; you can call this version with a tuple, or an arbitrary iterable, not just a list.
As another improvement, do you really need to split and then join, or can you just use replace?
a_list = [item.replace(" ", "-") for item in a_list]
You could use regular expressions instead, which might be better for performance or readability in some similar cases—but I think in this case it would actually be worse. So, once you get here, you're done.
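As an example of a similar case where a regular expression may be the better fit, consider items that can contain runs of several spaces or tabs that should collapse into a single dash. The sample data below is made up for illustration and is not from the question.

```python
import re

a_list = ['this-item has a-dash', 'this  has   dashes', 'foo']

# \s+ matches one or more whitespace characters, so a run of spaces
# becomes a single dash instead of several.
a_list = [re.sub(r'\s+', '-', item) for item in a_list]
print(a_list)
# ['this-item-has-a-dash', 'this-has-dashes', 'foo']
```

With plain replace(" ", "-"), the second item would instead become 'this--has---dashes'.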
