How to convert comma-delimited string to list in Python? - python

Given a string that is a sequence of several values separated by commas:
mStr = 'A,B,C,D,E'
How do I convert the string to a list?
mList = ['A', 'B', 'C', 'D', 'E']

You can use the str.split method.
>>> my_string = 'A,B,C,D,E'
>>> my_list = my_string.split(",")
>>> print(my_list)
['A', 'B', 'C', 'D', 'E']
If you want to convert it to a tuple, just call tuple() on the list:
>>> print(tuple(my_list))
('A', 'B', 'C', 'D', 'E')
If you are looking to append to a list, try this:
>>> my_list.append('F')
>>> print(my_list)
['A', 'B', 'C', 'D', 'E', 'F']
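If the values may have spaces around the commas (for example 'A, B ,C'), a plain split(",") keeps those spaces in the items. A minimal sketch, assuming you want the surrounding whitespace dropped; the variable names are only illustrative:

```python
# Assumption: the input may contain spaces around the commas.
spaced_string = 'A, B ,C,  D,E'

# Split on the comma, then strip whitespace from each piece.
stripped_list = [item.strip() for item in spaced_string.split(',')]
print(stripped_list)
```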

If the string contains integers and you want them converted to int without casting each one by hand, you can do:
mList = [int(e) if e.isdigit() else e for e in mStr.split(',')]
This is called a list comprehension, and it is based on set-builder notation.
Example:
>>> mStr = "1,A,B,3,4"
>>> mList = [int(e) if e.isdigit() else e for e in mStr.split(',')]
>>> mList
[1, 'A', 'B', 3, 4]
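One thing to watch: str.isdigit() is False for strings such as '-2', so negative numbers would stay as strings with the comprehension above. A possible variation, assuming you want anything int() accepts to be converted (the helper name to_int_if_possible is made up for this sketch):

```python
def to_int_if_possible(text):
    """Return int(text) when text parses as an integer, else text unchanged."""
    try:
        return int(text)
    except ValueError:
        return text

mixed = "1,A,-2,B,3"
converted = [to_int_if_possible(e) for e in mixed.split(',')]
print(converted)
```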

Consider the following in order to handle the case of an empty string:
>>> my_string = 'A,B,C,D,E'
>>> my_string.split(",") if my_string else []
['A', 'B', 'C', 'D', 'E']
>>> my_string = ""
>>> my_string.split(",") if my_string else []
[]
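The guard matters because splitting an empty string does not give an empty list: it gives a list containing one empty string. A quick check (the variable names are just for illustration):

```python
empty = ""

unguarded = empty.split(",")                  # a list holding one empty string
guarded = empty.split(",") if empty else []   # an empty list

print(unguarded)
print(guarded)
```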

>>> some_string='A,B,C,D,E'
>>> new_tuple= tuple(some_string.split(','))
>>> new_tuple
('A', 'B', 'C', 'D', 'E')

You can split that string on , and directly get a list:
mStr = 'A,B,C,D,E'
list1 = mStr.split(',')
print(list1)
Output:
['A', 'B', 'C', 'D', 'E']
You can also convert it to an n-tuple:
print(tuple(list1))
Output:
('A', 'B', 'C', 'D', 'E')

You can use this function to convert a comma-delimited string of single-character values to a list:
def stringtolist(x):
    mylist = []
    for i in range(0, len(x), 2):
        mylist.append(x[i])
    return mylist
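The loop above takes every second character, which can also be written as a slice. This sketch only holds under the same assumption: every value is exactly one character and there is exactly one comma between values. The function name is hypothetical:

```python
def string_to_list_by_slice(x):
    # Take the characters at positions 0, 2, 4, ... and skip the commas between them.
    return list(x[::2])

sliced = string_to_list_by_slice('A,B,C,D,E')
print(sliced)
```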

# splits a string according to delimiters
'''
Let's make a function that can split a string
into a list according to the given delimiters.
example data: cat;dog:greff,snake/
example delimiters: ,;- /|:
'''
def string_to_splitted_array(data, delimiters):
    # result list
    res = []
    # we will add chars into sub_str until
    # we reach a delimiter
    sub_str = ''
    for c in data:  # iterate over data char by char
        # if we reached a delimiter, we store the result
        if c in delimiters:
            # avoid empty strings
            if len(sub_str) > 0:
                # looks like a valid string.
                res.append(sub_str)
                # reset sub_str to start over
                sub_str = ''
        else:
            # c is not a delimiter, so it is
            # part of the string.
            sub_str += c
    # there may not be a delimiter at the end of data.
    # if sub_str is not empty, we should add it to the list.
    if len(sub_str) > 0:
        res.append(sub_str)
    # result is in res
    return res

# test the function.
delimiters = ',;- /|:'
# read the csv data from the console.
csv_string = input('csv string:')
# let's check that it works.
splitted_array = string_to_splitted_array(csv_string, delimiters)
print(splitted_array)
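For comparison, the same multi-delimiter split can be sketched with re.split from the standard library. This is not the function above, just an alternative under the assumption that the delimiter string is non-empty and every character in it is a single-character delimiter; split_on_any is a made-up name:

```python
import re

def split_on_any(data, delimiters):
    # Build a character class from the delimiter characters;
    # re.escape keeps characters like '-' and '|' literal.
    pattern = '[' + re.escape(delimiters) + ']+'
    # Drop the empty strings produced by leading or trailing delimiters.
    return [piece for piece in re.split(pattern, data) if piece]

pieces = split_on_any('cat;dog:greff,snake/', ',;- /|:')
print(pieces)
```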

Related

Splitting a single item list at commas

MyList = ['a,b,c,d,e']
Is there any way to split a list (MyList) with a single item, 'a,b,c,d,e', at each comma so I end up with:
MyList = ['a','b','c','d','e']
Split the first element.
MyList = ['a,b,c,d,e']
MyList = MyList[0].split(',')
Out:
['a', 'b', 'c', 'd', 'e']
Use the split method on the string
MyList = MyList[0].split(',')
See below
lst_1 = ['a,b,c,d,e']
lst_2 = [x for x in lst_1[0] if x != ',']
print(lst_2)
output
['a', 'b', 'c', 'd', 'e']
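Note that the character-filtering version only works when every value is a single character. If the list could hold more than one comma-separated item, the split can be applied to each element and the results flattened. A small sketch with a made-up input list:

```python
# Hypothetical input: more than one item, each possibly containing commas.
my_list = ['a,b', 'c,d,e']

# Split every item on commas and collect all the parts in one flat list.
flat = [part for item in my_list for part in item.split(',')]
print(flat)
```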

Python - For Loop - Print only if the above line is equal

I've the following code:
characters = ['a', 'b', 'b', 'c','d', 'b']
for i in characters:
    if i[0] == i[-1]:
        print(i)
Basically, I only want to extract the characters that are equal to the one before them. For example, in my case I only want to extract the b at positions 1 and 2.
How can I do that?
Thanks!
a = ['a', 'b', 'b', 'c', 'd', 'b']
b = ['a', 'b', 'b', 'c', 'd', 'b', 'd']
import collections
print([item for item, count in collections.Counter(a).items() if count > 1])
print([item for item, count in collections.Counter(b).items() if count > 1])
output
['b']
['b', 'd']
Without iterating multiple times over the same list.
characters = ['a', 'b', 'b', 'c','d', 'b']
last_char = None
output = []
for char in characters:
    if char == last_char:
        output.append(char)
    last_char = char
print(output)
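The same "equal to the previous item" check can also be written by pairing each element with its predecessor using zip. A short sketch of that idea, with the list from the question (the name repeated is only illustrative):

```python
characters = ['a', 'b', 'b', 'c', 'd', 'b']

# Pair each item with the one before it and keep it when the two are equal.
repeated = [cur for prev, cur in zip(characters, characters[1:]) if prev == cur]
print(repeated)
```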
To extract the characters from the list that match the last character of the list, you can do the following:
characters = ['a', 'b', 'b', 'c','d', 'b']
for i in range(0, len(characters) - 1):
    if characters[i] == characters[-1]:
        print(characters[i])
In your snippet, i is an individual character from the list on each iteration, while it looks like you were trying to access the first and last items of the list.
equal = [a for a in characters[0:-1] if a == characters[-1]]
Unless you also want the last character which will always be equal to itself, then do:
equal = [a for a in characters if a == characters[-1]]
A small modification of your code:
characters = ['a', 'b', 'b', 'c','d', 'b']
ch = characters[-1]
for i in characters:
    if i == ch:
        print(i)

Python 3 sort list -> all entries starting with lower case first

l1 = ['B','c','aA','b','Aa','C','A','a']
the result should be
['a','aA','b','c','A','Aa','B','C']
so same as l1.sort() but beginning with all words that start with lower case.
Try this:
>>> l = ['B', 'b','a','A', 'aA', 'Aa','C', 'c']
>>> sorted(l, key=str.swapcase)
['a', 'aA', 'b', 'c', 'A', 'Aa', 'B', 'C']
EDIT:
A one-liner using the list.sort method for those who prefer the imperative approach:
>>> l.sort(key=str.swapcase)
>>> print(l)
['a', 'aA', 'b', 'c', 'A', 'Aa', 'B', 'C']
Note:
The first approach leaves the state of l unchanged while the second one does change it.
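To see why the swapcase key works: in the default string ordering uppercase letters come before lowercase ones, so sorting on the case-swapped version of each string puts the lowercase-first words at the front. A small illustration using the list from the question (the names words, keys and ordered are mine):

```python
words = ['B', 'c', 'aA', 'b', 'Aa', 'C', 'A', 'a']

# The values the sort actually compares: each word with its case swapped.
keys = [w.swapcase() for w in words]
print(keys)

# Sorting on those keys puts the words that start with a lowercase letter first.
ordered = sorted(words, key=str.swapcase)
print(ordered)
```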
Here is what you might be looking for:
li = ['a', 'A', 'b', 'B']
def sort_low_case_first(li):
    li.sort()  # will sort the list, uppercase first
    index = 0  # where the list needs to be cut off
    for i, x in enumerate(li):  # iterate over the list
        if x[0].islower():  # if we encounter a string starting with a lowercase letter
            index = i  # remember where
            break  # stop searching
    return li[index:] + li[:index]  # the sorted lowercase-starting strings, then the sorted uppercase-starting strings
sorted_li = sort_low_case_first(li)  # run the function
print(sorted_li)  # check the result
['a', 'b', 'A', 'B']

Convert a list of strings and lists into a list of chars and lists?

I have a list of strings and variables. For example:
['oz_', A, 'ab'], where A is a list and I don't want anything to happen to it.
And I want to convert it in:
['o','z','_', A, 'a', 'b']
A is a list, so I don't want anything to change it. How can I do this?
You'll need to iterate over each element and turn it into a list if it's a string, but otherwise leave it as a variable and append it.
source = ['oz_', A, 'ab']
result = []
for name in source:
    if isinstance(name, str):
        result += name
    else:
        result.append(name)
Note: Use isinstance(name, basestring) for Python 2.x if you want to account for other types of string like unicode.
Updated now that we know A shall not be altered.
A = []
seq = ['oz_', A, 'ab']
res = []
for elt in seq:
    if isinstance(elt, str):
        for e in list(elt):
            res.append(e)
    else:
        res.append(elt)
print(res)
output:
['o', 'z', '_', [], 'a', 'b']
Obligatory one-liner:
>>> A = []
>>> seq = ['oz_', A, 'ab']
>>> [value for values in seq
...  for value in (values if isinstance(values, str)
...                else [values])]
['o', 'z', '_', [], 'a', 'b']
For converting a list of strings into a list of characters, I see two approaches:
Either use a list comprehension that takes each char from each of the strings:
>>> lst = ['oz_', 'A', 'ab']
>>> [char for string in lst for char in string]
['o', 'z', '_', 'A', 'a', 'b']
Or join the strings and turn the result into a list:
>>> list(''.join(lst))
['o', 'z', '_', 'A', 'a', 'b']
If A is meant to be a variable and you want to preserve it, things get more tricky. If A is a string, then that's just not possible, as A will get evaluated and is then indistinguishable from the other strings. If it is something else, then you will have to differentiate between the two types:
>>> joined = []
>>> for x in lst:
...     joined += x if isinstance(x, str) else [x]  # += x extends, += [x] appends
If all the elements of the list are strings, you could use itertools.chain.from_iterable(). It takes an iterable (a list, tuple, etc.) and returns an iterator over the elements of each of its inner iterables (which in this case are strings), so wrapping it in list() gives the list of characters. Example -
In [4]: from itertools import chain
In [5]: lst = ['oz_', 'A', 'ab']
In [6]: list(chain.from_iterable(lst))
Out[6]: ['o', 'z', '_', 'A', 'a', 'b']
As given in the updated question -
A is a list, so I don't want anything to change it.
You can do this (similar to what @SuperBiasedMan is suggesting) (for Python 3.x) -
In [14]: lst = ['oz_', 'A', 'ab',[1,2,3]]
In [15]: ret = []
In [18]: for i in lst:
   ....:     if isinstance(i, str):
   ....:         ret.extend(i)
   ....:     else:
   ....:         ret.append(i)
   ....:
In [19]: ret
Out[19]: ['o', 'z', '_', 'A', 'a', 'b', [1, 2, 3]]
You can use basestring in Python 2.x to account for both unicode as well as normal strings.
Please also note, the above method does not check whether a particular object in the list came from variable or not, it just breaks strings up into characters and for all other types it keeps it as it is.
>>> [a for a in ''.join(['oz_', 'A', 'ab'])]
['o', 'z', '_', 'A', 'a', 'b']
You can use chain.from_iterable either way, you just need to wrap your non strings in a list:
from itertools import chain
out = list(chain.from_iterable([sub] if not isinstance(sub, str) else sub for sub in l))
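If this comes up more than once, the same logic can be wrapped in a small generator. This is only a sketch for Python 3; the name explode_strings is made up, and non-string objects (such as the list A) are passed through as the very same object:

```python
def explode_strings(items):
    """Yield each character of every string in items; yield other objects unchanged."""
    for item in items:
        if isinstance(item, str):
            yield from item   # one character at a time
        else:
            yield item        # e.g. a list: passed through untouched

A = [1, 2]
result = list(explode_strings(['oz_', A, 'ab']))
print(result)
```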

How to split line into 0,1, or 2 arrays based on line content

I am writing a program that will receive input lines in one of four structures:
a,b
(a,b,c),d
a,(b,c,d)
(a,b),(c,d)
the number of members inside each parenthesis might change. Now, I want to translate each of the above lines as following
['a','b']
[['a','b','c'],'d']
['a',['b','c','d']]
[['a','b'],['c','d']]
I can think of a way to do this by checking each character, but knowing python, I'm certain there is a way to do this easily, probably using regular expressions. Is there?
Edit: Edited the desired output.
Consider:
import re, ast
text = """
a,b
(a,b,c),d
a,(b,c,d)
(a,b),(c,d)
"""
# quote every word so each line becomes a valid Python literal
text = re.sub(r'(\w+)', r"'\1'", text)
for line in text.strip().splitlines():
    print(ast.literal_eval(line))
> ('a', 'b')
> (('a', 'b', 'c'), 'd')
> ('a', ('b', 'c', 'd'))
> (('a', 'b'), ('c', 'd'))
This creates tuples, not lists, but that would be an easy fix.
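One way that fix could look: convert the tuples to lists recursively after literal_eval. This is a sketch, not part of the answer above; to_nested_list and parse_line are hypothetical names, and it assumes every member is a plain word matched by \w+. Note that a parenthesized group with a single member, such as (a), evaluates to a plain string rather than a one-element tuple.

```python
import ast
import re

def to_nested_list(value):
    # Recursively turn tuples into lists; leave strings as they are.
    if isinstance(value, tuple):
        return [to_nested_list(v) for v in value]
    return value

def parse_line(line):
    # Quote every word so the line becomes a valid Python literal.
    quoted = re.sub(r'(\w+)', r"'\1'", line)
    return to_nested_list(ast.literal_eval(quoted))

print(parse_line('(a,b,c),d'))
```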
Just use a regular expression to replace the parentheses with brackets, and then add a [ at the start and a ] at the end of the string.
Don't use a regular expression. Use a stack instead:
def parse(inputstring):
    result = []
    stack = [result]
    value = ''
    for char in inputstring:
        if char == '(':
            # new context
            if value:
                stack[-1].append(value)
                value = ''
            stack[-1].append([])
            stack.append(stack[-1][-1])
        elif char == ')':
            if value:
                stack[-1].append(value)
                value = ''
            # pop off context
            stack.pop()
        elif char == ',':
            if value:
                stack[-1].append(value)
                value = ''
        else:
            value += char
    if value:
        stack[-1].append(value)
    return result
Demo:
>>> parse('a,b')
['a', 'b']
>>> parse('(a,b,c),d')
[['a', 'b', 'c'], 'd']
>>> parse('a,(b,c,d)')
['a', ['b', 'c', 'd']]
>>> parse('(a,b),(c,d)')
[['a', 'b'], ['c', 'd']]
You could do this:
import re
st = """
a,b
(a,b,c),d
a,(b,c,d)
(a,b),(c,d)
"""
def element(e):
    e = e.strip()
    e = re.sub(r'(\w+)', r'"\1"', e)
    e = e.replace('(', '[')
    e = e.replace(')', ']')
    code = compile('temp={}'.format(e), '<string>', 'exec')
    namespace = {}
    exec(code, namespace)  # run the assignment in its own namespace
    return list(namespace['temp'])
print([element(x) for x in st.splitlines() if x.strip()])
# [['a', 'b'], [['a', 'b', 'c'], 'd'], ['a', ['b', 'c', 'd']], [['a', 'b'], ['c', 'd']]]
