Python string split, handling single quotes - python

I am trying to split a string by ",".
The split function works as expected for the following example1:
example1 = "1,'aaa',337.5,17195,.02,0,0,'yes','abc'"
example1.split(",")
Result: ['1', "'aaa'", '337.5', '17195', '.02', '0', '0', "'yes'", "'abc'"]
But here I have a scenario where there are commas within the single quotes, and I do not want to split on those.
example2 = "1,'aaa',337.5,17195,.02,0,0,'yes','abc, def, xyz'"
example2.split(",")
Result: ['1', "'aaa'", '337.5', '17195', '.02', '0', '0', "'yes'", "'abc", ' def', " xyz'"]
But I am trying to get this result instead:
['1', "'aaa'", '337.5', '17195', '.02', '0', '0', "'yes'", "'abc, def, xyz'"]
How can I achieve this with the string split function?

You should first try to use built-ins or the standard library to read in your data as a list, for instance directly from a CSV file via the csv module.
If your string is from a source you cannot control, adding opening and closing square brackets gives a valid list, so you can use ast.literal_eval:
from ast import literal_eval
example2 = "1,'aaa',337.5,17195,.02,0,0,'yes','abc, def, xyz'"
res = literal_eval(f'[{example2}]')
# [1, 'aaa', 337.5, 17195, 0.02, 0, 0, 'yes', 'abc, def, xyz']
This converts numeric data to integers / floats as appropriate. If you would like to keep them as strings, as per @JonClements' comment, you can pass the string to csv.reader with a single quote as the quotechar:
import csv
res = next(csv.reader([example2], quotechar="'"))
# ['1', 'aaa', '337.5', '17195', '.02', '0', '0', 'yes', 'abc, def, xyz']
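For data that really does come from a file, csv.reader accepts any iterable of lines, such as an open file object. A minimal sketch, assuming every line has the same layout as example2; io.StringIO stands in for the file, and the second row is made-up sample data:

```python
import csv
import io

# io.StringIO stands in for an open file; the second row is hypothetical sample data
data = io.StringIO(
    "1,'aaa',337.5,17195,.02,0,0,'yes','abc, def, xyz'\n"
    "2,'bbb',12.0,42,.5,1,0,'no','x, y'\n"
)
# quotechar="'" makes commas inside single quotes part of the field
rows = list(csv.reader(data, quotechar="'"))
print(rows[0])  # ['1', 'aaa', '337.5', '17195', '.02', '0', '0', 'yes', 'abc, def, xyz']
print(rows[1])  # ['2', 'bbb', '12.0', '42', '.5', '1', '0', 'no', 'x, y']
```

With a real file, the csv documentation recommends opening it with newline=''.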

Assuming that you want to keep the single quotes around the elements ("'aaa'" instead of 'aaa', as in your expected output), here's how you can do it with a function that tracks whether it is inside a quoted field:
def spl(st, ch):
    res = []
    temp = []
    in_quote = False
    for x in st:
        if x == "'":
            in_quote = not in_quote  # entering or leaving a quoted field
        if not in_quote and x == ch:
            res.append("".join(temp))  # a delimiter outside quotes ends the field
            temp = []
        else:
            temp.append(x)
    res.append("".join(temp))  # the last field has no trailing delimiter
    return res
example2 = "1,'aaa',337.5,17195,.02,0,0,'yes','abc, def, xyz'"
print(spl(example2, ','))
Output:
['1', "'aaa'", '337.5', '17195', '.02', '0', '0', "'yes'", "'abc, def, xyz'"]


how to create dictionary or json from list of strings in python?

I am trying to create a dictionary or JSON from a list of strings, which will be the dictionary's values, and I have the keys that should be matched to those values. I thought this would be straightforward, but I get the following error:
> Traceback (most recent call last):
>     output[elm].append(k)
> TypeError: list indices must be integers or slices, not str
I am curious what causes this problem. Can anyone point out what is wrong with my code?
Here is my code (updated):
lst = "api,1,0,0"
mystring = lst.split(",")  # ['api', '1', '0', '0']
names = {'name', 'mayor', 'minor', 'patch'}
output =[]
for elm in range(len(names)):
    for k in range(len(mystring)):
        output[elm].append(k)
print(output)
How can I fix the error? Is there an efficient way to make a dictionary or JSON from a list of strings without using a nested for loop? Any better idea?
Desired output:
I want to get a dictionary or JSON like this:
{
"mayor": 1,
"minor": 0,
"name": "api",
"patch": 0
}
Sorry if my mistake is minor; I couldn't locate the source of the problem. Thanks.
A neat one-liner:
my_string = ['api', '1', '0', '0']
names = ['name', 'mayor', 'minor', 'patch']
output = {names[i]: my_string[i] for i in range(len(my_string))}
print(output)
This should give:
{'name': 'api', 'mayor': '1', 'minor': '0', 'patch': '0'}
Note that names is now a list instead of a set; there was no need for a set. With a set the code wouldn't work anyway, because sets are not subscriptable (and have no guaranteed order).
There are a lot of errors in your code.
Here you are declaring a list, not a dictionary:
output =[]
Here you are iterating over the lengths (that is, the indices) of your set and list; you need to iterate over their items:
for elm in range(len(names)):
    for k in range(len(mystring)):
Use the built-in zip function. zip accepts any iterables, so it will also take your set (converting it with list() as below changes nothing), but a set has no guaranteed order, so keys get paired with values unpredictably. It's better to create both as lists.
mystring = ['api', '1', '0', '0']
names = {'name', 'mayor', 'minor', 'patch'}
output = zip(list(names),mystring)
print(dict(output))
or using two lists as an input
mystring = ['api', '1', '0', '0']
names = ['name', 'mayor', 'minor', 'patch']
output = zip(names, mystring)
print(dict(output))
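With two lists this prints the following; the set version pairs keys and values in whatever order the set happens to iterate, so its output can differ:
{'name': 'api', 'mayor': '1', 'minor': '0', 'patch': '0'}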
Of course, if you want the numbers to be numbers and not strings, make sure their types are correct in the list that you pass to zip:
mystring = ['api', 1, 0, 0]
names = ['name', 'mayor', 'minor', 'patch']
output = zip(names, mystring)
print(dict(output))
prints
{'name': 'api', 'mayor': 1, 'minor': 0, 'patch': 0}
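Since the question asks for a dictionary or JSON with the numbers as ints, here is a minimal sketch that does the conversion while zipping, assuming any value made only of digits should become an int (the isdigit test is an assumption, not something from the answers above):

```python
import json

mystring = ['api', '1', '0', '0']
names = ['name', 'mayor', 'minor', 'patch']
# zip pairs keys and values by position; all-digit strings become ints
output = {k: int(v) if v.isdigit() else v for k, v in zip(names, mystring)}
print(output)  # {'name': 'api', 'mayor': 1, 'minor': 0, 'patch': 0}
# json.dumps produces a JSON string; indent and sort_keys only change the layout
as_json = json.dumps(output, indent=4, sort_keys=True)
print(as_json)
```

With sort_keys=True the JSON keys come out in the same order as the desired output in the question.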

How to split a complex string in Python

Given the following string:
my_string = "fan_num=2,fan1=0,fan2=0,chain_xtime8={X0=8,X1=3,X2=11},chain_offside_6=0,chain_offside_7=0,chain_offside_8=0,chain_opencore_6=0,chain_opencore_7=0,chain_opencore_8=0"
How can I split it such that I get the following output:
[
fan_num=2,
fan1=0,
fan2=0,
chain_xtime8={X0=8,X1=3,X2=11},
chain_offside_6=0,
chain_offside_7=0,
chain_offside_8=0,
chain_opencore_6=0,
chain_opencore_7=0,
chain_opencore_8=0
]
I've tried:
output = my_string.split(',')
However, that splits the chain_xtime8 value which is not what I want. I am using Python 2.7.
Through a series of convoluted replacements, I converted this format into JSON; json.loads can then turn it into a Python dict. This assumes that keys and values never contain the characters being replaced (=, comma, { and }) and that the values stay simple integers, which end up as strings in the result.
This can obviously be tightened up but I wanted to leave it readable
import json
my_string = "fan_num=2,fan1=0,fan2=0,chain_xtime8={X0=8,X1=3,X2=11},chain_offside_6=0,chain_offside_7=0,chain_offside_8=0,chain_opencore_6=0,chain_opencore_7=0,chain_opencore_8=0"
my_string = '{"' + my_string + '"}'
my_string = my_string.replace('=', '":"')
my_string = my_string.replace(',', '","')
my_string = my_string.replace('"{', '{"')
my_string = my_string.replace('}"', '"}')
myDict = json.loads(my_string)
pprint(myDict) gives:
{'chain_offside_6': '0',
'chain_offside_7': '0',
'chain_offside_8': '0',
'chain_opencore_6': '0',
'chain_opencore_7': '0',
'chain_opencore_8': '0',
'chain_xtime8': {'X0': '8', 'X1': '3', 'X2': '11'},
'fan1': '0',
'fan2': '0',
'fan_num': '2'}
Also one more example -
print(myDict['chain_xtime8']['X0'])
>> 8
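If you want the list of strings from the question rather than a dict, a different approach (not the replacement trick above) is to walk the string character by character and only split on commas that are not inside braces, tracking the nesting depth. A minimal sketch, assuming the braces are always balanced; split_top_level is a hypothetical helper name:

```python
def split_top_level(s, sep=",", open_ch="{", close_ch="}"):
    """Split s on sep, ignoring separators nested inside open_ch/close_ch."""
    parts = []
    current = []
    depth = 0  # number of currently unclosed braces
    for ch in s:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))  # a top-level separator ends a field
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))  # the last field has no trailing separator
    return parts

my_string = "fan_num=2,fan1=0,fan2=0,chain_xtime8={X0=8,X1=3,X2=11},chain_offside_6=0,chain_offside_7=0,chain_offside_8=0,chain_opencore_6=0,chain_opencore_7=0,chain_opencore_8=0"
parts = split_top_level(my_string)
print(parts[3])    # chain_xtime8={X0=8,X1=3,X2=11}
print(len(parts))  # 10
```

It only uses basic string operations, so it should also run on Python 2.7.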

Separate each item of a list in an specific way

I have an input, which is a tuple of strings encoded in the A1Z26 cipher: numbers from 1 to 26 represent letters of the alphabet, hyphens separate letters within a word, and spaces separate words.
For example:
8-9 20-8-5-18-5 should translate to 'hi there'
Let's say the example above is stored as a tuple in a variable called string:
string = ('8-9','20-8-5-18-5')
The first thing that seemed logical was to convert the tuple into a list using
string = list(string)
so now
string = ['8-9','20-8-5-18-5']
The problem now is that when I iterate over the list and look each character up in a dictionary of translated values, double-digit numbers are treated as separate digits: instead of translating '20', it translates '2' and then '0', so the result reads 'hi bheahe' (2 = b, 1 = a and 8 = h).
So I need a way to convert the list above into the following list:
['8','-','9',' ','20','-','8','-','5','-','18','-','5']
I've already tried various approaches using list(), join() and split(), but they all end up giving me the same problem.
To sum up, I need to turn any given list (converted from the input tuple) into a list of tokens that keeps double-digit numbers, spaces and hyphens intact.
This is what I've got so far (my latest attempt). The input, string, is defined further up in the code:
a1z26 = {'1':'A', '2':'B', '3':'C', '4':'D', '5':'E', '6':'F', '7':'G', '8':'H', '9':'I', '10':'J', '11':'K', '12':'L', '13':'M', '14':'N', '15':'O', '16':'P', '17':'Q', '18':'R', '19':'S', '20':'T', '21':'U', '22':'V', '23':'W', '24':'X', '25':'Y', '26':'Z', '-':'', ' ' : ' ', ', ' : ' '}
translation = ""
code = list(string)
numbersarray1 = code
numbersarray2 = ', '.join(numbersarray1)
for char in numbersarray2:
    if char in a1z26:
        translation += a1z26[char]
There's no need to convert the tuple to a list. Tuples are iterable too.
I don't think the list you describe is what you actually want. You probably want a 2D iterable (not necessarily a list; as you'll see below, we can do this in one pass without building an intermediate list), where each item corresponds to a word and is a list of the letter numbers:
[[8, 9], [20, 8, 5, 18, 5]]
From this, you can convert each number to a letter, join the letters together to form the words, then join the words with spaces.
To do this, pass the separator as an argument to split so it knows where to split your input string. You can achieve all of this with a one-liner:
plaintext = ' '.join(''.join(num_to_letter[int(num)] for num in word.split('-'))
for word in ciphertext.split(' '))
This does exactly the splitting procedure as described above, and then for each number looks into the dict num_to_letter to do the conversion.
Note that you don't even need this dict. The letters A-Z are contiguous in Unicode, so to convert a number 1-26 to A-Z you can do chr(ord('A') + num - 1).
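Putting these points together, here is a runnable sketch that iterates over the question's tuple directly (each tuple item is one word, since tuples are iterable) and uses chr instead of a lookup dict. It assumes every number is in the range 1 to 26:

```python
string = ('8-9', '20-8-5-18-5')
# each tuple item is one word; '-' separates the letter numbers inside a word
# chr(ord('A') + n - 1) maps 1 -> 'A', ..., 26 -> 'Z'
plaintext = ' '.join(''.join(chr(ord('A') + int(num) - 1) for num in word.split('-'))
                     for word in string)
print(plaintext)  # HI THERE
```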
You don't really need the hyphens, am I right?
I suggest the following approach:
a = '- -'.join(string).split('-')
Now a is ['8', '9', ' ', '20', '8', '5', '18', '5']
You can then convert each number to the proper character using your dictionary
b = ''.join([a1z26[i] for i in a])
Now b is equal to HI THERE
I think it's better to use regular expressions here.
Example:
import re
...
src = ('8-9', '20-8-5-18-5')
res = [match for tmp in src for match in re.findall(r"([0-9]+|[^0-9]+)", tmp + " ")][:-1]
print(res)
Result:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Here is a solution using a regex:
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
print(data)
output
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
If you want to get HI THERE from the input string, here is a method (I am assuming all characters should be uppercase):
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
new_str =''
for token in data:
    if token.isdigit():
        new_str += chr(int(token) + 64)  # 1 -> 'A' (code 65), ..., 26 -> 'Z'
    else:
        new_str += token
result = new_str.replace('-', '')
print(result)
output:
HI THERE
You could also try this itertools solution:
from itertools import chain
from itertools import zip_longest
def separate_list(lst, delim, sep=" "):
    result = []
    for x in lst:
        chars = x.split(delim)  # 1
        pairs = zip_longest(chars, [delim] * (len(chars) - 1), fillvalue=sep)  # 2, 3
        result.extend(list(chain.from_iterable(pairs)))  # 4
    return result[:-1]  # 5
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
Output:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
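Explanation of the above code (the numbers match the comments):
1. Split each string by the delimiter '-'.
2. Create the delimiters to intersperse (one fewer than the number of parts).
3. Pair each part with a delimiter using itertools.zip_longest; the last part is paired with the sep fill value.
4. Extend result with the flattened pairs using itertools.chain.from_iterable.
5. Remove the trailing sep (' ') added after the last word.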
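To see what steps 1 to 4 produce, here they are applied to a single word ('20-8-5' is just an example input):

```python
from itertools import chain, zip_longest

chars = "20-8-5".split("-")                               # step 1: ['20', '8', '5']
delims = ["-"] * (len(chars) - 1)                         # step 2: ['-', '-']
pairs = list(zip_longest(chars, delims, fillvalue=" "))   # step 3
print(pairs)  # [('20', '-'), ('8', '-'), ('5', ' ')]
flat = list(chain.from_iterable(pairs))                   # step 4
print(flat)   # ['20', '-', '8', '-', '5', ' ']
```

The trailing ' ' from the last word of the whole input is what step 5 slices off.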
You could also create your own intersperse generator function and apply it twice:
from itertools import chain
def intersperse(iterable, delim):
    it = iter(iterable)
    yield next(it)
    for x in it:
        yield delim
        yield x
def separate_list(lst, delim, sep=" "):
    return list(
        chain.from_iterable(
            intersperse(
                (intersperse(x.split(delim), delim=delim) for x in lst), delim=[sep]
            )
        )
    )
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
# ['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']

How to remove specific strings from a list

From the following list, how can I remove the elements ending with Text?
My list is a=['1,2,3,4,5Text,6Text']
My expected result is a=['1,2,3,4']
Should I use endswith to go about this problem?
Split on commas, then filter on strings that are only digits:
a = [','.join(v for v in a[0].split(',') if v.isdigit())]
Demo:
>>> a=['1,2,3,4,5Text,6Text']
>>> [','.join(v for v in a[0].split(',') if v.isdigit())]
['1,2,3,4']
It looks as if you really wanted to work with lists of more than one element though, at which point you could just filter:
a = ['1', '2', '3', '4', '5Text', '6Text']
a = filter(str.isdigit, a)
or, using a list comprehension (better suited to Python 3, where filter returns a lazy iterator rather than a list):
a = ['1', '2', '3', '4', '5Text', '6Text']
a = [v for v in a if v.isdigit()]
Use str.endswith to filter out such items:
>>> a = ['1,2,3,4,5Text,6Text']
>>> [','.join(x for x in a[0].split(',') if not x.endswith('Text'))]
['1,2,3,4']
Here str.split splits the string at ',' and returns a list:
>>> a[0].split(',')
['1', '2', '3', '4', '5Text', '6Text']
Now filter out items from this list and then join them back using str.join.
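If the data is really a list with one item per value, as suggested in the other answer, the same endswith test works directly in a list comprehension. A small sketch:

```python
a = ['1', '2', '3', '4', '5Text', '6Text']
# keep only the items that do not end with the suffix 'Text'
a = [v for v in a if not v.endswith('Text')]
print(a)  # ['1', '2', '3', '4']
```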
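Try this. It works whatever text is at the end, although it returns the numbers as integers ([1, 2, 3, 4]) rather than a comma-joined string.
a=['1,2,3,4,5Text,6Text']
a = a[0].split(',')
li = []
for v in a:
    try:
        li.append(int(v))
    except ValueError:  # skip items like '5Text' that are not plain integers
        pass
print(li)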

How to convert text format list into a python list

I am getting various data types from a config file and adding them to a dictionary, but I am having a problem with lists. I want to take a line with the text alist = [1,2,3,4,5,6,7] and convert it into a list of integers. But I am getting
['1', ',', '2', ',', '3', ',', '4', ',', '5', ',', '6', ',', '7'].
How can I fix this?
Here is config.txt:
firstname="Joe"
lastname="Bloggs"
employeeId=715
type="ios"
push-token="12345"
time-stamp="Mon, 22 Jul 2013 18:45:58 GMT"
api-version="1"
phone="1010"
level=7
mylist=[1,2,3,4,5,6,7]
Here is my code to parse:
mapper = {}
def massage_type(s):
    if s.startswith('"'):
        return s[1:-1]
    elif s.startswith('['):
        return list(s[1:-1])  # in this case get 'mylist': ['1', ',', '2', ',', '3', ',', '4', ',', '5', ',', '6', ',', '7']
    elif s.startswith('{'):
        return "object"  # todo
    else:
        return int(s)
doc = open('config.txt')
for line in doc:
    line = line.strip()
    tokens = line.split('=')
    if len(tokens) == 2:
        formatted = massage_type(tokens[1])
        mapper[tokens[0]] = formatted
# check integer list
mapper["properlist"] = [1,2,3,4,5,6,7]  # this one works
print mapper
Here is my printed output:
{'time-stamp': 'Mon, 22 Jul 2013 18:45:58 GMT', 'mylist': ['1', ',', '2', ',', '3', ',', '4', ',', '5', ',', '6', ',', '7'], 'employeeId': 715, 'firstname': 'Joe', 'level': 7, 'properlist': [1, 2, 3, 4, 5, 6, 7], 'lastname': 'Bloggs', 'phone': '1010', 'push-token': '12345', 'api-version': '1', 'type': 'ios'}
Update.
Thanks for the feedback. I realised that I could also get a heterogeneous list, so I changed the list part to:
elif s.startswith('['):
    # check element type
    elements = s[1:-1].split(',')
    tmplist = []  # assemble temp list
    for elem in elements:
        if elem.startswith('"'):
            tmplist.append(elem[1:-1])
        else:
            tmplist.append(int(elem))
    return tmplist
It only handles strings and integers but is good enough for what I need right now.
You need to change the return statement to:
return [int(elem) for elem in s[1:-1].split(',')]  # or map(int, s[1:-1].split(',')) in Python 2, where map returns a list
Maybe try ast.literal_eval. Here is an example:
import ast
str1 = '[1,2,3,4,5]'
ast.literal_eval(str1)
The output will be a list of integers, without the commas as elements:
[1, 2, 3, 4, 5]
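Since ast.literal_eval also understands quoted strings and integers, it could replace massage_type entirely, including the heterogeneous lists from the update. A minimal sketch, assuming every value in the config file is a valid Python literal; a string stands in for config.txt, and the mixed line is a hypothetical addition to show a heterogeneous list:

```python
import ast

# stand-in for config.txt; 'mixed' is a hypothetical line, not from the question
config_text = '''firstname="Joe"
employeeId=715
time-stamp="Mon, 22 Jul 2013 18:45:58 GMT"
mylist=[1,2,3,4,5,6,7]
mixed=[1,"two",3]'''
mapper = {}
for line in config_text.splitlines():
    key, sep, value = line.strip().partition('=')  # split at the first '=' only
    if sep:  # skip lines without '='
        mapper[key] = ast.literal_eval(value)  # strings, ints and lists of them
print(mapper['mylist'])  # [1, 2, 3, 4, 5, 6, 7]
print(mapper['mixed'])   # [1, 'two', 3]
```

Unlike eval, literal_eval only accepts literals, so a malformed value raises an exception instead of running code.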
You might also consider using ConfigParser (Python 3 example below; in Python 2 the module itself is called ConfigParser, so the import is from ConfigParser import ConfigParser):
import os
from configparser import ConfigParser
parser = ConfigParser()
config_dir = '.'  # hypothetical: the directory that contains config.txt
conf_file = os.path.join(config_dir, 'config.txt')
parser.read(conf_file)
After that, it's fairly basic: the parser behaves like a dictionary of sections, and each section behaves like a dictionary whose keys are the option names. Note that ConfigParser expects options to sit under a section header, so config.txt needs one, for example:
[employee info]
firstname = "Joe"
lastname = "Bloggs"
email = "something@something.com"
birthday = 10/12/98
You can then reference the values like this (they come back as strings, with any quotes kept):
firstname = parser["employee info"]["firstname"]
lastname = parser["employee info"]["lastname"]
birthday = parser["employee info"]["birthday"]
And, as always, there are some great examples in the docs: http://docs.python.org/3.2/library/configparser.html
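A small runnable sketch of how the parsed values look. ConfigParser expects options under a [section] header, so this assumes config.txt has been given one ([employee info] here); read_string stands in for parser.read so the example is self-contained:

```python
from configparser import ConfigParser

text = """[employee info]
firstname="Joe"
employeeId=715
mylist=[1,2,3,4,5,6,7]
"""
parser = ConfigParser()
parser.read_string(text)  # same parsing as parser.read(path), but from a string
section = parser["employee info"]
print(section["firstname"])          # "Joe" (the quotes are part of the value)
print(section.getint("employeeId"))  # 715 (option names are case-insensitive)
print(section["mylist"])             # [1,2,3,4,5,6,7] (a plain string, not a list)
```

So quoted values and lists still need post-processing, for example with ast.literal_eval.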
You can use split():
elif s.startswith('['):
return [int(x) for x in s[1:-1].split(',')]
This will give you the list without the commas.
Or, more compactly (in Python 2, map returns a list):
elif s.startswith('['):
return map(int,s[1:-1].split(","))
Currently you're converting a string to a list of characters. You want to be doing this:
map(int, s[1:-1].split(','))
That will give you the list of ints you are after (in Python 3, wrap it in list(), since map returns an iterator there).
I like the idea of using ConfigParser as @erewok mentioned, but here's a whole hand-written "parser":
def parse(content):
    def parseList(content):
        # Recursive strategy: each list element goes through parse() again
        listed = content.strip("[]").split(",")
        return [parse(item) for item in listed]  # a real list in Python 2 and 3
    def parseString(content):
        return content.strip("\"")
    def parseNumber(content):
        return int(content)
    def parse(content):
        if content.startswith("\""):
            return parseString(content)
        elif content.startswith("["):
            return parseList(content)
        elif content.isdigit():
            return parseNumber(content)
        # anything else falls through and returns None
    # Create dictionary with values
    result = {}
    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split("=", 1)
        result[key] = parse(value)
    return result
I'm using a recursive strategy to sub-parse elements within the list you are getting, in case the list comes with numbers and strings mixed
