python split and remove duplicates

python split and remove duplicates - python

I have the following output with print var:
test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan
I'm trying to find the correct split function to populate below:
test.qa.home-page.website.com
test.qa.home-page.website.net
test.qa.home-page.website.com
test.qa.home-page.website.net
...as well as remove duplicates:
test.qa.home-page.website.com
test.qa.home-page.website.net
The numeric values after "com-" or "net-" are random so I think my struggle is finding out how to rsplit ("-" + [CHECK_FOR_ANY_NUMBER])[0] . Any suggestions would be great, thanks in advance!

How about :
import re
output = [
"test.qa.home-page.website.com-3412-jan",
"test.qa.home-page.website.net-5132-mar",
"test.qa.home-page.website.com-8422-aug",
"test.qa.home-page.website.net-9111-jan"
]
trimmed = set([re.split("-[0-9]", item)[0] for item in output])
print(trimmed)
# out : {'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}

If you have an array of values, and you want to remove duplicates, you can use set.
>>> l = [1,2,3,1,2,3]
>>> l
[1, 2, 3, 1, 2, 3]
>>> set(l)
{1, 2, 3}
You can get to a useful array by str.split('-')[0]-ing every value.

You could use a regex to parse the individual lines and a set comprehension to uniqueify:
txt='''\
test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan'''
import re
>>> {re.sub(r'^(.*\.(?:com|net)).*', r'\1', s) for s in txt.split() }
{'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
Or just use the same regex with set and re.findall with the re.M flag:
>>> set(re.findall(r'^(.*\.(?:com|net))', txt, flags=re.M))
{'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
If you want to maintain order, use {}.fromkeys() (since Python 3.6):
>>> list({}.fromkeys(re.findall(r'^(.*\.(?:com|net))', txt, flags=re.M)).keys())
['test.qa.home-page.website.com', 'test.qa.home-page.website.net']
Or, if you know your target is always 2 - from the end, just use .rsplit() with maxsplit=2:
>>> {s.rsplit('-',maxsplit=2)[0] for s in txt.splitlines()}
{'test.qa.home-page.website.com', 'test.qa.home-page.website.net'}

Related

Python Split/Slice a string into a list while keeping delimeter

I want to take a string and create a list from it using a delimiter, while keeping delimiter.
If I have "A56-3#AJ4klAP0W" I would like to return a list with A as delimiter.
[A56-3#, AJ4kl, AP0W]
I have tried split and slice but have not been successful.
I did do list comprehension to get a list of the index of each delimiter but haven't been able to do much with it [0, 6, 11]

You can use regular expressions and the findall() function.
>>> re.findall('A?[^A]+', 'A56-3#AJ4klAP0W')
['A56-3#', 'AJ4kl', 'AP0W']
This even works when the string doesn't start with your delimiter. E.g.
>>> re.findall('A?[^A]+', '56-3#AJ4klAP0W')
['56-3#', 'AJ4kl', 'AP0W']
Explanation: (Regex101)
A? : Zero or one "A"
[^A]+ : Followed by one or more "not A"
It's easy to build the regex using an f-string:
def get_substrings(delim, s):
rex = f"{delim}?[^{delim}]+"
return re.findall(rex, s)

Given:
st="A56-3#AJ4klAP0W"
You can get the index of each delimiter with enumerate:
idx=[i for i,ch in enumerate(st) if ch=='A']
Then slice the string with that index:
>>> [st[x:y] for x,y in zip([0]+idx, idx[1:]+[len(st)])]
['A56-3#', 'AJ4kl', 'AP0W']
# this is how you use the [0,6,11] list in your question
You can also use a regex split:
>>> re.split(r'(?=A)', st)
['', 'A56-3#', 'AJ4kl', 'AP0W']
Or find the substrings (rather than split) that satisfy that condition:
>>> re.findall(r'A*[^A]+', st)
['A56-3#', 'AJ4kl', 'AP0W']

Just add it back in:
>>> x = 'A56-3#AJ4klAP0W'
>>> x.split('A')
['', '56-3#', 'J4kl', 'P0W']
>>> ['A'+k for k in x.split('A') if k]
['A56-3#', 'AJ4kl', 'AP0W']
>>>

>>> [f'A{el}' for el in "A56-3#AJ4klAP0W".split('A') if el]
['A56-3#', 'AJ4kl', 'AP0W']
>>>

I am able to parse the log file but not getting output in correct format in python [duplicate]

How do I concatenate a list of strings into a single string?
For example, given ['this', 'is', 'a', 'sentence'], how do I get "this-is-a-sentence"?
For handling a few strings in separate variables, see How do I append one string to another in Python?.
For the opposite process - creating a list from a string - see How do I split a string into a list of characters? or How do I split a string into a list of words? as appropriate.

Use str.join:
>>> words = ['this', 'is', 'a', 'sentence']
>>> '-'.join(words)
'this-is-a-sentence'
>>> ' '.join(words)
'this is a sentence'

A more generic way (covering also lists of numbers) to convert a list to a string would be:
>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
12345678910

It's very useful for beginners to know
why join is a string method.
It's very strange at the beginning, but very useful after this.
The result of join is always a string, but the object to be joined can be of many types (generators, list, tuples, etc).
.join is faster because it allocates memory only once. Better than classical concatenation (see, extended explanation).
Once you learn it, it's very comfortable and you can do tricks like this to add parentheses.
>>> ",".join("12345").join(("(",")"))
Out:
'(1,2,3,4,5)'
>>> list = ["(",")"]
>>> ",".join("12345").join(list)
Out:
'(1,2,3,4,5)'

Edit from the future: Please don't use the answer below. This function was removed in Python 3 and Python 2 is dead. Even if you are still using Python 2 you should write Python 3 ready code to make the inevitable upgrade easier.
Although #Burhan Khalid's answer is good, I think it's more understandable like this:
from str import join
sentence = ['this','is','a','sentence']
join(sentence, "-")
The second argument to join() is optional and defaults to " ".

list_abc = ['aaa', 'bbb', 'ccc']
string = ''.join(list_abc)
print(string)
>>> aaabbbccc
string = ','.join(list_abc)
print(string)
>>> aaa,bbb,ccc
string = '-'.join(list_abc)
print(string)
>>> aaa-bbb-ccc
string = '\n'.join(list_abc)
print(string)
>>> aaa
>>> bbb
>>> ccc

We can also use Python's reduce function:
from functools import reduce
sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)

We can specify how we join the string. Instead of '-', we can use ' ':
sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)

If you have a mixed content list and want to stringify it, here is one way:
Consider this list:
>>> aa
[None, 10, 'hello']
Convert it to string:
>>> st = ', '.join(map(str, map(lambda x: f'"{x}"' if isinstance(x, str) else x, aa)))
>>> st = '[' + st + ']'
>>> st
'[None, 10, "hello"]'
If required, convert back to the list:
>>> ast.literal_eval(st)
[None, 10, 'hello']

If you want to generate a string of strings separated by commas in final result, you can use something like this:
sentence = ['this','is','a','sentence']
sentences_strings = "'" + "','".join(sentence) + "'"
print (sentences_strings) # you will get "'this','is','a','sentence'"

def eggs(someParameter):
del spam[3]
someParameter.insert(3, ' and cats.')
spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam =(','.join(spam))
print(spam)

Without .join() method you can use this method:
my_list=["this","is","a","sentence"]
concenated_string=""
for string in range(len(my_list)):
if string == len(my_list)-1:
concenated_string+=my_list[string]
else:
concenated_string+=f'{my_list[string]}-'
print([concenated_string])
>>> ['this-is-a-sentence']
So, range based for loop in this example , when the python reach the last word of your list, it should'nt add "-" to your concenated_string. If its not last word of your string always append "-" string to your concenated_string variable.

Splitting bracket-separated string to a dictionary

I want to make this string to be dictionary.
s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
Following
{'SEQ':'A=1%B=2', 'OPS':'CC=0%G=2', 'T1':'R=N', 'T2':'R=Y'}
I tried this code
d = dict(item.split('(') for item in s.split(')'))
But an error occurred
ValueError: dictionary update sequence element #4 has length 1; 2 is required
I know why this error occurred, the solution is deleting bracket of end
s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y'
But it is not good for me. Any other good solution to make this string to be dictionary type ...?

More compactly:
import re
s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
print dict(re.findall(r'(.+?)\((.*?)\)', s))

Add a if condition in your generator expression.
>>> s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
>>> s.split(')')
['SEQ(A=1%B=2', 'OPS(CC=0%G=2', 'T1(R=N', 'T2(R=Y', '']
>>> d = dict(item.split('(') for item in s.split(')') if item!='')
>>> d
{'T1': 'R=N', 'OPS': 'CC=0%G=2', 'T2': 'R=Y', 'SEQ': 'A=1%B=2'}

Alternatively, this could be solved with a regular expression:
>>> import re
>>> s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
>>> print dict(match.groups() for match in re.finditer('([^(]+)\(([^)]+)\)', s))
{'T1': 'R=N', 'T2': 'R=Y', 'SEQ': 'A=1%B=2', 'OPS': 'CC=0%G=2'}

How to concatenate (join) items in a list to a single string

How do I concatenate a list of strings into a single string?
For example, given ['this', 'is', 'a', 'sentence'], how do I get "this-is-a-sentence"?
For handling a few strings in separate variables, see How do I append one string to another in Python?.
For the opposite process - creating a list from a string - see How do I split a string into a list of characters? or How do I split a string into a list of words? as appropriate.

Use str.join:
>>> words = ['this', 'is', 'a', 'sentence']
>>> '-'.join(words)
'this-is-a-sentence'
>>> ' '.join(words)
'this is a sentence'

A more generic way (covering also lists of numbers) to convert a list to a string would be:
>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
12345678910

It's very useful for beginners to know
why join is a string method.
It's very strange at the beginning, but very useful after this.
The result of join is always a string, but the object to be joined can be of many types (generators, list, tuples, etc).
.join is faster because it allocates memory only once. Better than classical concatenation (see, extended explanation).
Once you learn it, it's very comfortable and you can do tricks like this to add parentheses.
>>> ",".join("12345").join(("(",")"))
Out:
'(1,2,3,4,5)'
>>> list = ["(",")"]
>>> ",".join("12345").join(list)
Out:
'(1,2,3,4,5)'

Edit from the future: Please don't use the answer below. This function was removed in Python 3 and Python 2 is dead. Even if you are still using Python 2 you should write Python 3 ready code to make the inevitable upgrade easier.
Although #Burhan Khalid's answer is good, I think it's more understandable like this:
from str import join
sentence = ['this','is','a','sentence']
join(sentence, "-")
The second argument to join() is optional and defaults to " ".

list_abc = ['aaa', 'bbb', 'ccc']
string = ''.join(list_abc)
print(string)
>>> aaabbbccc
string = ','.join(list_abc)
print(string)
>>> aaa,bbb,ccc
string = '-'.join(list_abc)
print(string)
>>> aaa-bbb-ccc
string = '\n'.join(list_abc)
print(string)
>>> aaa
>>> bbb
>>> ccc

We can also use Python's reduce function:
from functools import reduce
sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)

We can specify how we join the string. Instead of '-', we can use ' ':
sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)

If you have a mixed content list and want to stringify it, here is one way:
Consider this list:
>>> aa
[None, 10, 'hello']
Convert it to string:
>>> st = ', '.join(map(str, map(lambda x: f'"{x}"' if isinstance(x, str) else x, aa)))
>>> st = '[' + st + ']'
>>> st
'[None, 10, "hello"]'
If required, convert back to the list:
>>> ast.literal_eval(st)
[None, 10, 'hello']

If you want to generate a string of strings separated by commas in final result, you can use something like this:
sentence = ['this','is','a','sentence']
sentences_strings = "'" + "','".join(sentence) + "'"
print (sentences_strings) # you will get "'this','is','a','sentence'"

def eggs(someParameter):
del spam[3]
someParameter.insert(3, ' and cats.')
spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam =(','.join(spam))
print(spam)

Without .join() method you can use this method:
my_list=["this","is","a","sentence"]
concenated_string=""
for string in range(len(my_list)):
if string == len(my_list)-1:
concenated_string+=my_list[string]
else:
concenated_string+=f'{my_list[string]}-'
print([concenated_string])
>>> ['this-is-a-sentence']
So, range based for loop in this example , when the python reach the last word of your list, it should'nt add "-" to your concenated_string. If its not last word of your string always append "-" string to your concenated_string variable.

extracting data from matchobjects

I have a long sequence with multiple repeats of a specific string( 'say GAATTC') randomly throughout the sequence string. I'm currently using the regular expression .span() to provide with me with the indices of where the pattern 'GAATTC' is found. Now I want to use those indices to slice the pattern between the G and A (i.e. 'G|AATTC').
How do I use the data from the match object to slice those out?

If I understand you correctly, you have the string and an index where the sequence GAATTC starts, so do you need this (i here is the m.start for the group)?
>>> seq = "GAATTC"
>>> s = "AATCCTGAGAATTCAAC"
>>> i = 8 # the index where seq starts in s
>>> s[i:]
'GAATTCAAC'
>>> s[i:i+len(seq)]
'GAATTC'
That extracts it. You can also slice the original sequence at the G like this:
>>> s[:i+1]
'AATCCTGAG'
>>> s[i+1:]
'AATTCAAC'
>>>

If what you want to do is replace the 'GAATTC' by the 'G|AATTC' one (not sure of what you want to do in the end), I think that you can manage this without regex:
>>> string = 'GAATTCAAGAATTCTTGAATTCGAATTCAATATATA'
>>> string.replace('GAATTC', 'G|AATTC')
'G|AATTCAAG|AATTCTTG|AATTCG|AATTCAATATATA'
EDIT: ok, this way can be adapted to suit what you want to do:
>>> groups = string.replace('GAATTC', 'G|AATTC').split('|')
>>> groups
['G', 'AATTCAAG', 'AATTCTTG', 'AATTCG', 'AATTCAATATATA']
>>> map(len, groups)
[1, 8, 8, 6, 13]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python split and remove duplicates - python

If you have an array of values, and you want to remove duplicates, you can use set. >>> l = [1,2,3,1,2,3] >>> l [1, 2, 3, 1, 2, 3] >>> set(l) {1, 2, 3} You can get to a useful array by str.split('-')[0]-ing every value.

Related

Python Split/Slice a string into a list while keeping delimeter

I am able to parse the log file but not getting output in correct format in python [duplicate]

Splitting bracket-separated string to a dictionary

How to concatenate (join) items in a list to a single string

extracting data from matchobjects

Categories

Resources