How to handle missing data in split?

How to handle missing data in split? - python

I want to split a string where a part may be missing.
E.g., "foo-bar" should be split into "foo" and "bar" while "zot" into "zot" and None.
foo,bar = line.split('-',1)
works for the first case but not for the second one:
ValueError: need more than 1 value to unpack
I can go, of course, the long way:
foobar = line.split('-',1)
if len(foobar) == 2:
foo,bar = foobar
else:
foo,bar = foobar[0],None
but I wonder if this is the most "pythonic" way.

Catch the exception:
try:
foo, bar = line.split('-', 1)
except ValueError:
# not enough values
foo, bar = line, None
Note that you'd need to split once to get two values, not two times.

For this exact example, I'd use the partition method.
>>> 'foo-bar'.partition('-')
('foo', '-', 'bar')
>>> 'foobar'.partition('-')
('foobar', '', '')
>>> 'foo-bar-baz'.partition('-')
('foo', '-', 'bar-baz')
For the general case where there's more than one split, but still a known number, I usually check the length of the result of split, but Martijn is (unsurprisingly) right that catching the exception is fine too and is probably a better choice if strings missing the delimiter are uncommon.

using list Comprehension:
i=['ff-bb','cc','dd-ss-vv']
[string+[None] if len(string)==1 else string for string in [x.split('-') for x in i]]
returns
[['ff', 'bb'], ['cc', None], ['dd', 'ss', 'vv']]

Related

I am able to parse the log file but not getting output in correct format in python [duplicate]

How do I concatenate a list of strings into a single string?
For example, given ['this', 'is', 'a', 'sentence'], how do I get "this-is-a-sentence"?
For handling a few strings in separate variables, see How do I append one string to another in Python?.
For the opposite process - creating a list from a string - see How do I split a string into a list of characters? or How do I split a string into a list of words? as appropriate.

Use str.join:
>>> words = ['this', 'is', 'a', 'sentence']
>>> '-'.join(words)
'this-is-a-sentence'
>>> ' '.join(words)
'this is a sentence'

A more generic way (covering also lists of numbers) to convert a list to a string would be:
>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
12345678910

It's very useful for beginners to know
why join is a string method.
It's very strange at the beginning, but very useful after this.
The result of join is always a string, but the object to be joined can be of many types (generators, list, tuples, etc).
.join is faster because it allocates memory only once. Better than classical concatenation (see, extended explanation).
Once you learn it, it's very comfortable and you can do tricks like this to add parentheses.
>>> ",".join("12345").join(("(",")"))
Out:
'(1,2,3,4,5)'
>>> list = ["(",")"]
>>> ",".join("12345").join(list)
Out:
'(1,2,3,4,5)'

Edit from the future: Please don't use the answer below. This function was removed in Python 3 and Python 2 is dead. Even if you are still using Python 2 you should write Python 3 ready code to make the inevitable upgrade easier.
Although #Burhan Khalid's answer is good, I think it's more understandable like this:
from str import join
sentence = ['this','is','a','sentence']
join(sentence, "-")
The second argument to join() is optional and defaults to " ".

list_abc = ['aaa', 'bbb', 'ccc']
string = ''.join(list_abc)
print(string)
>>> aaabbbccc
string = ','.join(list_abc)
print(string)
>>> aaa,bbb,ccc
string = '-'.join(list_abc)
print(string)
>>> aaa-bbb-ccc
string = '\n'.join(list_abc)
print(string)
>>> aaa
>>> bbb
>>> ccc

We can also use Python's reduce function:
from functools import reduce
sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)

We can specify how we join the string. Instead of '-', we can use ' ':
sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)

If you have a mixed content list and want to stringify it, here is one way:
Consider this list:
>>> aa
[None, 10, 'hello']
Convert it to string:
>>> st = ', '.join(map(str, map(lambda x: f'"{x}"' if isinstance(x, str) else x, aa)))
>>> st = '[' + st + ']'
>>> st
'[None, 10, "hello"]'
If required, convert back to the list:
>>> ast.literal_eval(st)
[None, 10, 'hello']

If you want to generate a string of strings separated by commas in final result, you can use something like this:
sentence = ['this','is','a','sentence']
sentences_strings = "'" + "','".join(sentence) + "'"
print (sentences_strings) # you will get "'this','is','a','sentence'"

def eggs(someParameter):
del spam[3]
someParameter.insert(3, ' and cats.')
spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam =(','.join(spam))
print(spam)

Without .join() method you can use this method:
my_list=["this","is","a","sentence"]
concenated_string=""
for string in range(len(my_list)):
if string == len(my_list)-1:
concenated_string+=my_list[string]
else:
concenated_string+=f'{my_list[string]}-'
print([concenated_string])
>>> ['this-is-a-sentence']
So, range based for loop in this example , when the python reach the last word of your list, it should'nt add "-" to your concenated_string. If its not last word of your string always append "-" string to your concenated_string variable.

fix length re.sub with dictionary

I have a dictionary with all keys three letter long: threeLetterDict={'abc': 'foo', 'def': 'bar', 'ghi': 'ha' ...}
Now I need to translate a sentence abcdefghi into foobarha. I'm trying the method below with re.sub, but don't know how to put dictionary in to it:
p = re.compile('.{3}') # match every three letters
re.sub(p,'how to put dictionary here?', "abcdefghi")
Thanks! (no need to check if input length is multiple of three)

You can pass any callable to re.sub, so:
p.sub(lambda m: threeLetterDict[m.group(0)], "abcdefghi")
It works!

A solution that avoids re entirely:
threeLetterDict={'abc': 'foo', 'def': 'bar', 'ghi': 'ha'}
threes = map("".join, zip(*[iter('abcdefghi')]*3))
"".join(threeLetterDict[three] for three in threes)
#>>> 'foobarha'

You might not need to use sub here:
>>> p = re.compile('.{3}')
>>> ''.join([threeLetterDict.get(i, i) for i in p.findall('abcdefghi')])
'foobarha'
Just an alternate solution :).

How to convert a malformed string to a dictionary?

I have a string s (note that the a and b are not enclosed in quotation marks, so it can't directly be evaluated as a dict):
s = '{a:1,b:2}'
I want convert this variable to a dict like this:
{'a':1,'b':2}
How can I do this?

This will work with your example:
import ast
def elem_splitter(s):
return s.split(':',1)
s = '{a:1,b:2}'
s_no_braces = s.strip()[1:-1] #s.translate(None,'{}') is more elegant, but can fail if you can have strings with '{' or '}' enclosed.
elements = (elem_splitter(ss) for ss in s_no_braces.split(','))
d = dict((k,ast.literal_eval(v)) for k,v in elements)
Note that this will fail if you have a string formatted as:
'{s:"foo,bar",ss:2}' #comma in string is a problem for this algorithm
or:
'{s,ss:1,v:2}'
but it will pass a string like:
'{s ss:1,v:2}' #{"s ss":1, "v":2}
You may also want to modify elem_splitter slightly, depending on your needs:
def elem_splitter(s):
k,v = s.split(':',1)
return k.strip(),v # maybe `v.strip() also?`
*Somebody else might cook up a better example using more of the ast module, but I don't know it's internals very well, so I doubt I'll have time to make that answer.

As your string is malformed as both json and Python dict so you neither can use json.loads not ast.literal_eval to directly convert the data.
In this particular case, you would have to manually translate it to a Python dictionary by having knowledge of the input data
>>> foo = '{a:1,b:2}'
>>> dict(e.split(":") for e in foo.translate(None,"{}").split(","))
{'a': '1', 'b': '2'}
As Updated by Tim, and my short-sightedness I missed the fact that the values should be integer, here is an alternate implementation
>>> {k: int(v) for e in foo.translate(None,"{}").split(",")
for k, v in [e.split(":")]}
{'a': 1, 'b': 2}

import re,ast
regex = re.compile('([a-z])')
ast.literal_eval(regex.sub(r'"\1"', s))
out:
{'a': 1, 'b': 2}
EDIT:
If you happen to have something like {foo1:1,bar:2} add an additional capture group to the regex:
regex = re.compile('(\w+)(:)')
ast.literal_eval(regex.sub(r'"\1"\2', s))

You can do it simply with this:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
to_int = lambda x: int(x) if x.isdigit() else x
d = dict((to_int(i) for i in pair.split(":", 1)) for pair in content.split(","))
For simplicity I've omitted exception handling if the string doesn't contain a valid specification, and also this version doesn't strip whitespace, which you may want. If the interpretation you prefer is that the key is always a string and the value is always an int, then it's even easier:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
d = dict((int(pair[0]), pair[1].strip()) for pair in content.split(","))
As a bonus, this version also strips whitespace from the key to show how simple it is.

import simplejson
s = '{a:1,b:2}'
a = simplejson.loads(s)
print a

How to concatenate (join) items in a list to a single string

How do I concatenate a list of strings into a single string?
For example, given ['this', 'is', 'a', 'sentence'], how do I get "this-is-a-sentence"?
For handling a few strings in separate variables, see How do I append one string to another in Python?.
For the opposite process - creating a list from a string - see How do I split a string into a list of characters? or How do I split a string into a list of words? as appropriate.

Use str.join:
>>> words = ['this', 'is', 'a', 'sentence']
>>> '-'.join(words)
'this-is-a-sentence'
>>> ' '.join(words)
'this is a sentence'

A more generic way (covering also lists of numbers) to convert a list to a string would be:
>>> my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> my_lst_str = ''.join(map(str, my_lst))
>>> print(my_lst_str)
12345678910

It's very useful for beginners to know
why join is a string method.
It's very strange at the beginning, but very useful after this.
The result of join is always a string, but the object to be joined can be of many types (generators, list, tuples, etc).
.join is faster because it allocates memory only once. Better than classical concatenation (see, extended explanation).
Once you learn it, it's very comfortable and you can do tricks like this to add parentheses.
>>> ",".join("12345").join(("(",")"))
Out:
'(1,2,3,4,5)'
>>> list = ["(",")"]
>>> ",".join("12345").join(list)
Out:
'(1,2,3,4,5)'

Edit from the future: Please don't use the answer below. This function was removed in Python 3 and Python 2 is dead. Even if you are still using Python 2 you should write Python 3 ready code to make the inevitable upgrade easier.
Although #Burhan Khalid's answer is good, I think it's more understandable like this:
from str import join
sentence = ['this','is','a','sentence']
join(sentence, "-")
The second argument to join() is optional and defaults to " ".

list_abc = ['aaa', 'bbb', 'ccc']
string = ''.join(list_abc)
print(string)
>>> aaabbbccc
string = ','.join(list_abc)
print(string)
>>> aaa,bbb,ccc
string = '-'.join(list_abc)
print(string)
>>> aaa-bbb-ccc
string = '\n'.join(list_abc)
print(string)
>>> aaa
>>> bbb
>>> ccc

We can also use Python's reduce function:
from functools import reduce
sentence = ['this','is','a','sentence']
out_str = str(reduce(lambda x,y: x+"-"+y, sentence))
print(out_str)

We can specify how we join the string. Instead of '-', we can use ' ':
sentence = ['this','is','a','sentence']
s=(" ".join(sentence))
print(s)

If you have a mixed content list and want to stringify it, here is one way:
Consider this list:
>>> aa
[None, 10, 'hello']
Convert it to string:
>>> st = ', '.join(map(str, map(lambda x: f'"{x}"' if isinstance(x, str) else x, aa)))
>>> st = '[' + st + ']'
>>> st
'[None, 10, "hello"]'
If required, convert back to the list:
>>> ast.literal_eval(st)
[None, 10, 'hello']

If you want to generate a string of strings separated by commas in final result, you can use something like this:
sentence = ['this','is','a','sentence']
sentences_strings = "'" + "','".join(sentence) + "'"
print (sentences_strings) # you will get "'this','is','a','sentence'"

def eggs(someParameter):
del spam[3]
someParameter.insert(3, ' and cats.')
spam = ['apples', 'bananas', 'tofu', 'cats']
eggs(spam)
spam =(','.join(spam))
print(spam)

Without .join() method you can use this method:
my_list=["this","is","a","sentence"]
concenated_string=""
for string in range(len(my_list)):
if string == len(my_list)-1:
concenated_string+=my_list[string]
else:
concenated_string+=f'{my_list[string]}-'
print([concenated_string])
>>> ['this-is-a-sentence']
So, range based for loop in this example , when the python reach the last word of your list, it should'nt add "-" to your concenated_string. If its not last word of your string always append "-" string to your concenated_string variable.

Splitting a string by capital letters

I currently have the following code, which finds capital letters in a string 'formula': http://pastebin.com/syRQnqCP
Now, my question is, how can I alter that code (Disregard the bit within the "if choice = 1:" loop) so that each part of that newly broken up string is put into it's own variable?
For example, putting in NaBr would result in the string being broken into "Na" and "Br". I need to put those in separate variables so I can look them up in my CSV file.
Preferably it'd be a kind of generated thing, so if there are 3 elements, like MgSO4, O would be put into a separate variable like Mg and S would be.
If this is unclear, let me know and I'll try and make it a bit more comprehensible... No way of doing so comes to mind currently, though. :(
EDIT: Relevant pieces of code:
Function:
def split_uppercase(string):
x=''
for i in string:
if i.isupper(): x+=' %s' %i
else: x+=i
return x.strip()
String entry and lookup:
formula = raw_input("Enter formula: ")
upper = split_uppercase(formula)
#Pull in data from form.csv
weight1 = float(formul_data.get(element1.lower()))
weight2 = float(formul_data.get(element2.lower()))
weight3 = float(formul_data.get(element3.lower()))
weightSum = weight1 + weight2 + weight3
print "Total weight =", weightSum

I think there is a far easier way to do what you're trying to do. Use regular expressions. For instance:
>>> [a for a in re.split(r'([A-Z][a-z]*)', 'MgSO4') if a]
['Mg', u'S', u'O', u'4']
If you want the number attached to the right element, just add a digit specifier in the regex:
>>> [a for a in re.split(r'([A-Z][a-z]*\d*)', txt) if a]
[u'Mg', u'S', u'O4']
You don't really want to "put each part in its own variable". That doesn't make sense in general, because you don't know how many parts there are, so you can't know how many variables to create ahead of time. Instead, you want to make a list, like in the example above. Then you can iterate over this list and do what you need to do with each piece.

You can use re.split to perform complex splitting on strings.
import re
def split_upper(s):
return filter(None, re.split("([A-Z][^A-Z]*)", s))
>>> split_upper("fooBarBaz")
['foo', 'Bar', 'Baz']
>>> split_upper("fooBarBazBB")
['foo', 'Bar', 'Baz', 'B', 'B']
>>> split_upper("fooBarBazBB4")
['foo', 'Bar', 'Baz', 'B', 'B4']

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to handle missing data in split? - python

Catch the exception: try: foo, bar = line.split('-', 1) except ValueError: # not enough values foo, bar = line, None Note that you'd need to split once to get two values, not two times.

using list Comprehension: i=['ff-bb','cc','dd-ss-vv'] [string+[None] if len(string)==1 else string for string in [x.split('-') for x in i]] returns [['ff', 'bb'], ['cc', None], ['dd', 'ss', 'vv']]

Related

I am able to parse the log file but not getting output in correct format in python [duplicate]

fix length re.sub with dictionary

How to convert a malformed string to a dictionary?

How to concatenate (join) items in a list to a single string

Splitting a string by capital letters

Categories

Resources