Splitting bracket-separated string to a dictionary

I want to convert this string into a dictionary:
s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
The expected result is:
{'SEQ':'A=1%B=2', 'OPS':'CC=0%G=2', 'T1':'R=N', 'T2':'R=Y'}
I tried this code:
d = dict(item.split('(') for item in s.split(')'))
But it raised an error:
ValueError: dictionary update sequence element #4 has length 1; 2 is required
I know why the error occurs: splitting on ')' leaves an empty string after the last closing bracket. One workaround is to delete the closing bracket at the end:
s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y'
But that does not suit me. Is there a better way to turn this string into a dictionary?

More compactly:
import re
s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
print(dict(re.findall(r'(.+?)\((.*?)\)', s)))

Add an if condition to your generator expression to skip the empty string that split(')') leaves at the end.
>>> s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
>>> s.split(')')
['SEQ(A=1%B=2', 'OPS(CC=0%G=2', 'T1(R=N', 'T2(R=Y', '']
>>> d = dict(item.split('(') for item in s.split(')') if item!='')
>>> d
{'T1': 'R=N', 'OPS': 'CC=0%G=2', 'T2': 'R=Y', 'SEQ': 'A=1%B=2'}

Alternatively, this could be solved with a regular expression:
>>> import re
>>> s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
>>> print(dict(match.groups() for match in re.finditer(r'([^(]+)\(([^)]+)\)', s)))
{'SEQ': 'A=1%B=2', 'OPS': 'CC=0%G=2', 'T1': 'R=N', 'T2': 'R=Y'}
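The regex approach can also be wrapped in a small helper. This is only a sketch: the name parse_bracketed is made up for illustration, and it assumes that brackets are never nested and that every name is followed by exactly one (...) group.

```python
import re

# Hypothetical helper, not taken from the answers above.
# Each match is one NAME(VALUE) pair; neither part may contain brackets.
PAIR = re.compile(r'([^()]+)\(([^()]*)\)')

def parse_bracketed(text):
    """Return a dict mapping each name to the text inside its brackets."""
    return dict(PAIR.findall(text))

s = 'SEQ(A=1%B=2)OPS(CC=0%G=2)T1(R=N)T2(R=Y)'
print(parse_bracketed(s))
```

Because findall only returns complete NAME(VALUE) matches, the trailing ')' cannot produce an empty element the way s.split(')') does.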

Related

python split and remove duplicates

I have the following output with print var:
test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan
I'm trying to find the correct split function to populate below:
test.qa.home-page.website.com
test.qa.home-page.website.net
test.qa.home-page.website.com
test.qa.home-page.website.net
...as well as remove duplicates:
test.qa.home-page.website.com
test.qa.home-page.website.net
The numeric values after "com-" or "net-" are random so I think my struggle is finding out how to rsplit ("-" + [CHECK_FOR_ANY_NUMBER])[0] . Any suggestions would be great, thanks in advance!

How about:
import re
output = [
"test.qa.home-page.website.com-3412-jan",
"test.qa.home-page.website.net-5132-mar",
"test.qa.home-page.website.com-8422-aug",
"test.qa.home-page.website.net-9111-jan"
]
trimmed = set([re.split("-[0-9]", item)[0] for item in output])
print(trimmed)
# out : {'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}

If you have a list of values and you want to remove duplicates, you can use set.
>>> l = [1,2,3,1,2,3]
>>> l
[1, 2, 3, 1, 2, 3]
>>> set(l)
{1, 2, 3}
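You can build the list to deduplicate by applying rsplit('-', 2)[0] to every value. A plain split('-')[0] would cut at the first hyphen, which here is the one inside home-page.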

You could use a regex to parse the individual lines and a set comprehension to deduplicate:
txt='''\
test.qa.home-page.website.com-3412-jan
test.qa.home-page.website.net-5132-mar
test.qa.home-page.website.com-8422-aug
test.qa.home-page.website.net-9111-jan'''
import re
>>> {re.sub(r'^(.*\.(?:com|net)).*', r'\1', s) for s in txt.split() }
{'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
Or just use the same regex with set and re.findall with the re.M flag:
>>> set(re.findall(r'^(.*\.(?:com|net))', txt, flags=re.M))
{'test.qa.home-page.website.net', 'test.qa.home-page.website.com'}
If you want to maintain order, use {}.fromkeys() (dicts keep insertion order in CPython 3.6, and as a language guarantee from Python 3.7):
>>> list({}.fromkeys(re.findall(r'^(.*\.(?:com|net))', txt, flags=re.M)).keys())
['test.qa.home-page.website.com', 'test.qa.home-page.website.net']
Or, if you know the part you want always ends at the second hyphen from the end, just use .rsplit() with maxsplit=2:
>>> {s.rsplit('-',maxsplit=2)[0] for s in txt.splitlines()}
{'test.qa.home-page.website.com', 'test.qa.home-page.website.net'}
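Putting the last two ideas together, here is a minimal sketch that strips the suffix with rsplit and deduplicates while keeping the order of first appearance. The function name unique_prefixes is hypothetical, and the sketch assumes every line ends in exactly two hyphen-separated fields (number and month), as in the question.

```python
def unique_prefixes(lines):
    """Drop the trailing '-<number>-<month>' part and remove duplicates in order."""
    prefixes = (line.rsplit('-', 2)[0] for line in lines)
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    return list(dict.fromkeys(prefixes))

lines = [
    'test.qa.home-page.website.com-3412-jan',
    'test.qa.home-page.website.net-5132-mar',
    'test.qa.home-page.website.com-8422-aug',
    'test.qa.home-page.website.net-9111-jan',
]
print(unique_prefixes(lines))
# ['test.qa.home-page.website.com', 'test.qa.home-page.website.net']
```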

I am able to parse the log file but not getting output in correct format in python [duplicate]

How do I concatenate a list of strings into a single string?
For example, given ['this', 'is', 'a', 'sentence'], how do I get "this-is-a-sentence"?
For handling a few strings in separate variables, see "How do I append one string to another in Python?".
For the opposite process (creating a list from a string), see "How do I split a string into a list of characters?" or "How do I split a string into a list of words?" as appropriate.

Use str.join:
>>> words = ['this', 'is', 'a', 'sentence']
>>> '-'.join(words)
'this-is-a-sentence'
>>> ' '.join(words)
'this is a sentence'

A more generic way (covering also lists of numbers) to convert a list to a string would be:
>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
12345678910
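The same idea works with any separator and any item type. The helper below is a small sketch (join_any is a made-up name) that converts each item with str before joining:

```python
def join_any(items, sep=''):
    """Join arbitrary items by converting each one to str first."""
    return sep.join(str(item) for item in items)

print(join_any([1, 2, 3], '-'))           # 1-2-3
print(join_any(['a', None, 2.5], ', '))   # a, None, 2.5
```

Without the str conversion, join raises a TypeError as soon as it meets a non-string item.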

It's very useful for beginners to know why join is a string method.
It looks strange at first, but it makes sense once you see it this way:
The result of join is always a string, but the object being joined can be of many types (generators, lists, tuples, etc.), as long as it yields strings.
.join is also faster than repeated concatenation with +, because CPython can compute the final size and allocate the result once instead of building a new string at every step.
Once you are used to it, it is very convenient, and you can do tricks like this to add parentheses:
>>> ",".join("12345").join(("(",")"))
'(1,2,3,4,5)'
>>> parens = ["(",")"]
>>> ",".join("12345").join(parens)
'(1,2,3,4,5)'
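To illustrate the point about input types, the following sketch passes a list, a tuple, a generator and a dict to the same separator; the variable names are only for illustration. It also shows that non-string items are rejected.

```python
words = ['this', 'is', 'a', 'sentence']

from_list = '-'.join(words)
from_tuple = '-'.join(tuple(words))
from_generator = '-'.join(w for w in words)
from_dict = '-'.join(dict.fromkeys(words))  # iterating a dict yields its keys

print(from_list, from_tuple, from_generator, from_dict, sep='\n')

# join only accepts strings: integers must be converted first
try:
    '-'.join([1, 2, 3])
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```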

Edit from the future: Please don't use the answer below. This function was removed in Python 3 and Python 2 is dead. Even if you are still using Python 2 you should write Python 3 ready code to make the inevitable upgrade easier.
Although @Burhan Khalid's answer is good, I think it's more understandable like this:
from string import join  # Python 2 only; removed in Python 3
sentence = ['this','is','a','sentence']
join(sentence, "-")
The second argument to join() is optional and defaults to " ".

list_abc = ['aaa', 'bbb', 'ccc']
string = ''.join(list_abc)
print(string)
# aaabbbccc
string = ','.join(list_abc)
print(string)
# aaa,bbb,ccc
string = '-'.join(list_abc)
print(string)
# aaa-bbb-ccc
string = '\n'.join(list_abc)
print(string)
# aaa
# bbb
# ccc

We can also use Python's reduce function:
from functools import reduce
sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)
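One difference from str.join is worth knowing: reduce without an initial value raises a TypeError on an empty list, while join simply returns an empty string. A minimal sketch of a reduce-based version that handles this case (join_with_reduce is a hypothetical name):

```python
from functools import reduce

def join_with_reduce(words, sep='-'):
    """Concatenate words with sep using reduce; returns '' for an empty list."""
    # Passing '' as the initial value would add a leading separator,
    # so the empty case is handled explicitly instead.
    if not words:
        return ''
    return reduce(lambda acc, word: acc + sep + word, words)

sentence = ['this', 'is', 'a', 'sentence']
print(join_with_reduce(sentence))                         # this-is-a-sentence
print(join_with_reduce(sentence) == '-'.join(sentence))   # True
```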

We can specify how we join the string. Instead of '-', we can use ' ':
sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)

If you have a mixed-content list and want to stringify it, here is one way:
Consider this list:
>>> aa = [None, 10, 'hello']
>>> aa
[None, 10, 'hello']
Convert it to a string:
>>> st = ', '.join(map(str, map(lambda x: f'"{x}"' if isinstance(x, str) else x, aa)))
>>> st = '[' + st + ']'
>>> st
'[None, 10, "hello"]'
If required, convert it back to a list:
>>> import ast
>>> ast.literal_eval(st)
[None, 10, 'hello']
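If the goal is only a round trip through a string, the built-in repr may be enough. This sketch assumes the list contains nothing but Python literals (strings, numbers, None, nested lists and so on), which is what ast.literal_eval can read back:

```python
import ast

aa = [None, 10, 'hello']

st = repr(aa)                    # same text that the interpreter shows for the list
restored = ast.literal_eval(st)  # parses literals only, never executes code

print(st)              # [None, 10, 'hello']
print(restored == aa)  # True
```

Unlike the manual quoting above, repr also escapes quote characters inside the strings.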

If you want to generate a string of strings separated by commas in final result, you can use something like this:
sentence = ['this','is','a','sentence']
sentences_strings = "'" + "','".join(sentence) + "'"
print (sentences_strings) # you will get "'this','is','a','sentence'"

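join also works on a list that a function has modified in place:
def eggs(someParameter):
    # someParameter is the same list object as spam, so the caller sees both changes
    del someParameter[3]
    someParameter.insert(3, ' and cats.')
spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam = ','.join(spam)
print(spam)
# apples,bananas,tofu, and cats.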

Without the .join() method, you can use a loop:
my_list = ["this", "is", "a", "sentence"]
concatenated_string = ""
for index in range(len(my_list)):
    if index == len(my_list) - 1:
        # Last word: no trailing separator
        concatenated_string += my_list[index]
    else:
        concatenated_string += f'{my_list[index]}-'
print([concatenated_string])
# ['this-is-a-sentence']
This range-based for loop appends "-" after every word except the last one: when the loop reaches the last word of the list, it adds the word without a separator.

How to replace string to the other string in list (python)

What is the best way to replace every string in the list?
For example if I have a list:
a = ['123.txt', '1234.txt', '654.txt']
and I would like to have:
a = ['123', '1234', '654']

Assuming that sample input is similar to what you actually have, use os.path.splitext() to remove file extensions:
>>> import os
>>> a = ['123.txt', '1234.txt', '654.txt']
>>> [os.path.splitext(item)[0] for item in a]
['123', '1234', '654']

Use a list comprehension as follows:
a = ['123.txt', '1234.txt', '654.txt']
answer = [item.replace('.txt', '') for item in a]
print(answer)
Output
['123', '1234', '654']

Assuming that all your strings end with '.txt', just slice the last four characters off.
>>> a = ['123.txt', '1234.txt', '654.txt']
>>> a = [x[:-4] for x in a]
>>> a
['123', '1234', '654']
This will also work if you have some weird names like 'some.txtfile.txt'

You could split on the . separator and take the first item (this assumes the name itself contains no other dots):
In [486]: [x.split('.')[0] for x in a]
Out[486]: ['123', '1234', '654']

Another way to do this:
a = [x[: -len("txt")-1] for x in a]
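The slicing answers assume that every string really ends with the extension. A slightly more defensive sketch, using a hypothetical helper strip_suffix, removes the suffix only when it is present:

```python
def strip_suffix(name, suffix='.txt'):
    """Remove suffix from the end of name; leave other names untouched."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

a = ['123.txt', '1234.txt', '654.txt', 'some.txtfile.txt', 'notes.md']
print([strip_suffix(x) for x in a])
# ['123', '1234', '654', 'some.txtfile', 'notes.md']
```

On Python 3.9 and later, the built-in str.removesuffix('.txt') does the same thing.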

What is the best way to replace every string in the list?
That completely depends on how you define 'best'. I, for example, like regular expressions:
import re
a = ['123.txt', '1234.txt', '654.txt']
answer = [re.sub(r'^(\w+)\..*', r'\g<1>', item) for item in a]
print(answer)
# ['123', '1234', '654']
Depending on the content of the strings, you could adjust it:
\w+ vs [0-9]+ for only digits
\..* vs \.txt if all strings end with .txt

data.colname = [item.replace('anythingtoreplace', 'desiredoutput') for item in data.colname]
Note that here 'data' is a pandas DataFrame and 'colname' is the name of a column in it. This also handles spaces, if you want to remove them from the strings. I found this quite useful. It does not change the dtype of the column, so you may have to convert it separately if required.

How to convert a malformed string to a dictionary?

I have a string s (note that the a and b are not enclosed in quotation marks, so it can't directly be evaluated as a dict):
s = '{a:1,b:2}'
I want to convert this variable to a dict like this:
{'a':1,'b':2}
How can I do this?

This will work with your example:
import ast
def elem_splitter(s):
    return s.split(':',1)
s = '{a:1,b:2}'
s_no_braces = s.strip()[1:-1] # s.translate(None,'{}') (Python 2 only) is more elegant, but can fail if values may contain '{' or '}'
elements = (elem_splitter(ss) for ss in s_no_braces.split(','))
d = dict((k,ast.literal_eval(v)) for k,v in elements)
Note that this will fail if you have a string formatted as:
'{s:"foo,bar",ss:2}' #comma in string is a problem for this algorithm
or:
'{s,ss:1,v:2}'
but it will pass a string like:
'{s ss:1,v:2}' #{"s ss":1, "v":2}
You may also want to modify elem_splitter slightly, depending on your needs:
def elem_splitter(s):
    k,v = s.split(':',1)
    return k.strip(),v # maybe v.strip() also?
*Somebody else might cook up a better example using more of the ast module, but I don't know its internals very well, so I doubt I'll have time to write that answer.
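The comma-in-string case mentioned above can be handled with a regular expression instead of split(','). The following is only a sketch under stated assumptions: keys are bare words, values are either double-quoted strings without embedded quotes or simple literals such as numbers, and the names PAIR and parse_loose_dict are invented for this example.

```python
import ast
import re

# A key, a colon, then either a double-quoted string or a run of
# characters up to the next comma or closing brace.
PAIR = re.compile(r'(\w+)\s*:\s*("[^"]*"|[^,}]+)')

def parse_loose_dict(text):
    """Parse '{a:1,b:"x,y"}' style text into a dict with string keys."""
    return {key: ast.literal_eval(value.strip())
            for key, value in PAIR.findall(text)}

print(parse_loose_dict('{a:1,b:2}'))            # {'a': 1, 'b': 2}
print(parse_loose_dict('{s:"foo,bar",ss:2}'))   # {'s': 'foo,bar', 'ss': 2}
```

Values that are not valid Python literals (for example a bare word) still make ast.literal_eval raise an exception, so real input may need extra error handling.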

Your string is malformed both as JSON and as a Python dict literal, so you can use neither json.loads nor ast.literal_eval to convert the data directly.
In this particular case, you have to translate it to a Python dictionary manually, using what you know about the input data:
>>> foo = '{a:1,b:2}'
>>> dict(e.split(":") for e in foo.translate(str.maketrans("", "", "{}")).split(","))
{'a': '1', 'b': '2'}
As Tim pointed out, I missed the fact that the values should be integers. Here is an alternative implementation:
>>> {k: int(v) for e in foo.translate(str.maketrans("", "", "{}")).split(",")
...  for k, v in [e.split(":")]}
{'a': 1, 'b': 2}

import re,ast
s = '{a:1,b:2}'
regex = re.compile(r'([a-z])')
ast.literal_eval(regex.sub(r'"\1"', s))
Output:
{'a': 1, 'b': 2}
EDIT:
If you happen to have something like {foo1:1,bar:2}, add an additional capture group to the regex:
regex = re.compile(r'(\w+)(:)')
ast.literal_eval(regex.sub(r'"\1"\2', s))
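The same key-quoting trick also turns the string into valid JSON, so the standard json module can do the parsing. This is a sketch with a hypothetical function name, and it assumes that no string value contains a word followed by a colon, since such text would be quoted as well.

```python
import json
import re

def loads_unquoted_keys(text):
    """Quote bare-word keys, then parse the result as JSON."""
    # Put double quotes around every word that is followed by a colon
    quoted = re.sub(r'(\w+)\s*:', r'"\1":', text)
    return json.loads(quoted)

print(loads_unquoted_keys('{a:1,b:2}'))        # {'a': 1, 'b': 2}
print(loads_unquoted_keys('{foo1:1,bar:2}'))   # {'foo1': 1, 'bar': 2}
```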

You can do it simply with this:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
to_int = lambda x: int(x) if x.isdigit() else x
d = dict((to_int(i) for i in pair.split(":", 1)) for pair in content.split(","))
For simplicity I've omitted exception handling for strings that don't contain a valid specification, and this version doesn't strip whitespace, which you may want. If the interpretation you prefer is that the key is always a string and the value is always an int, then it's even easier:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
d = dict((k.strip(), int(v)) for k, v in (pair.split(":", 1) for pair in content.split(",")))
As a bonus, this version also strips whitespace from the key to show how simple it is.

import simplejson
s = '{a:1,b:2}'
a = simplejson.loads(s)  # raises JSONDecodeError as written: JSON requires keys in double quotes, e.g. '{"a":1,"b":2}'
print(a)
