I have a list:
my_list = ['"3"', '"45"','"12"','"6"']
This list has single and double quotes around each item value. How can I remove the single or double quotes from each item? I tried the code below, but the result is the same:
my_list = [i.replace("''", " ") for i in my_list]
Your list doesn't contain any strings with single quotes. I think you are confusing the repr() representation of the strings with their values.
When you print a Python standard library container such as a list (or a tuple, set, dictionary, etc.), its contents are shown using their repr() output; this is great when debugging because it makes clear what type of objects you have. For strings, the representation uses valid Python string literal syntax; you can copy the output and paste it into another Python script or the interactive interpreter and you'll get the exact same value.
For example, s here is a string that contains some text, some quote characters, and a newline character. When I print the string, the newline character causes an extra blank line to be printed, but when I use repr(), I get the string value in Python syntax form, where the single quotes are part of the syntax, not the value. Note that the newline character is also shown with the \n syntax, exactly as when I created the s string in the first place:
>>> s = 'They heard him say "Hello world!".\n'
>>> print(s)
They heard him say "Hello world!".
>>> print(repr(s))
'They heard him say "Hello world!".\n'
>>> s
'They heard him say "Hello world!".\n'
And when I echoed the s value at the end, the interactive interpreter also shows me the value using the repr() output.
So in your list, the strings do not have the ' characters as part of the value; they are part of the string syntax. You only need to replace the " characters, which are part of the value because they sit inside the outermost '...' string literal syntax. You could use str.replace('"', '') to remove them:
[value.replace('"', '') for value in my_list]
or, you could use the str.strip() method to only remove quotes that are at the start or end of the value:
[value.strip('"') for value in my_list]
Both work just fine for your sample list:
>>> my_list = ['"3"', '"45"','"12"','"6"']
>>> [value.replace('"', '') for value in my_list]
['3', '45', '12', '6']
>>> [value.strip('"') for value in my_list]
['3', '45', '12', '6']
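The two approaches only differ when a quote appears inside a value. A minimal sketch, assuming a made-up list (`samples` is not from the question) in which one item has a quote in the middle:

```python
# Hypothetical sample: the second item has a quote in the middle of the value.
samples = ['"3"', '"4"5"']

# str.replace() removes every double quote, wherever it appears.
replaced = [value.replace('"', '') for value in samples]

# str.strip() removes double quotes only from the start and end of each value.
stripped = [value.strip('"') for value in samples]

print(replaced)  # ['3', '45']
print(stripped)  # ['3', '4"5']
```

Which one is preferable depends on whether embedded quotes are meaningful in your data.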
Again, the ' characters are not part of the value:
>>> first = my_list[0].strip('"')
>>> first # echo, uses repr()
'3'
>>> print(first) # printing, the actual value written out
3
>>> len(first) # there is just a single character in the string
1
However, I have seen that you are reading your data from a tab-separated file that you hand-parse. You can avoid having to deal with the " quotes altogether if you instead used the csv.reader() object, configured to handle tabs as the delimiter. That class automatically will handle quoted columns:
import csv

with open(inputfile, 'r', newline='') as datafile:
    reader = csv.reader(datafile, delimiter='\t')
    for row in reader:
        # row is a list with strings, *but no quotes*
        # e.g. ['3', '45', '12', '6']
        print(row)
Demo showing how csv.reader() handles quotes:
>>> import csv
>>> lines = '''\
... "3"\t"45"\t"12"\t"6"
... "42"\t"81"\t"99"\t"11"
... '''.splitlines()
>>> reader = csv.reader(lines, delimiter='\t')
>>> for row in reader:
...     print(row)
...
['3', '45', '12', '6']
['42', '81', '99', '11']
As suggested by @MartijnPieters in the comments, you can use replace on the strings to get the desired output.
The change I'd suggest is using .replace('"', '') instead of .replace('"', ' '); otherwise the resulting strings will have leading and trailing whitespace.
You can use a list comprehension to process the list you have, like this:
my_list = ['"3"', '"45"','"12"','"6"']
new_list = [x.replace('"', '') for x in my_list]
print(new_list) # ['3', '45', '12', '6']
You can use split:
[x.split('"')[1] for x in my_list]
or you can use:
[x.strip('"') for x in my_list]
I am reading string information as input from a text file and placing them into lists, and one of the lines is like this:
30121,long,Mehtab,10,20,,30
I want to remove the empty value in between the ,, portion from this list, but have had zero results. I've tried .remove() and filter(). Python reads it as a 'str' value.
>>> import re
>>> re.sub(',,+', ',', '30121,long,Mehtab,10,20,,30')
'30121,long,Mehtab,10,20,30'
Use split() and remove()
In [11]: s = '30121,long,Mehtab,10,20,,30'
In [14]: l = s.split(',')
In [15]: l.remove('')
In [16]: l
Out[16]: ['30121', 'long', 'Mehtab', '10', '20', '30']
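Note that list.remove() deletes only the first matching element, so a line with several empty fields would need a different approach. A small sketch, assuming a made-up line with two empty fields (not taken from the question):

```python
s = '30121,,long,,Mehtab'   # hypothetical line with two empty fields

parts = s.split(',')
parts.remove('')            # drops only the first '' it finds
print(parts)                # ['30121', 'long', '', 'Mehtab']

# A list comprehension drops every empty field in one pass.
all_removed = [p for p in s.split(',') if p != '']
print(all_removed)          # ['30121', 'long', 'Mehtab']
```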
Filter should work. First I put the data in a list, then use filter() to keep only the items that are not empty.
data = ["30121", "long", "Mehtab", 10, 20, "", 30]
filtered_data = list(filter(lambda item: item != '', data))
print(filtered_data)
You can split the string based on your separator ("," for this) and then use list comprehension to consolidate the elements after making sure they are not blank.
",".join([element for element in string.split(",") if element])
You can also use element.strip() as the if condition to filter out strings that contain only spaces.
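As a rough illustration of the strip() variant, assuming a made-up input line (`line` is hypothetical) that contains both an empty field and a space-only field:

```python
line = '30121,long, ,Mehtab,,30'   # hypothetical input: one space-only field, one empty field

# element.strip() is '' (falsy) for both empty and whitespace-only fields.
cleaned = ",".join(element for element in line.split(",") if element.strip())

print(cleaned)  # 30121,long,Mehtab,30
```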
When I try to read a file and store it in a list, it fails to store a string inside single quotes as a single value in the list.
sample file:
12 3 'dsf dsf'
the list should contain
listname = [12, 3, 'dsf dsf']
I am able to do this like below:
listname = [12, 3, 'dsf', 'dsf']
Please help
Use the csv module.
Demo:
>>> import csv
>>> with open('input.txt') as inp:
...     print(list(csv.reader(inp, delimiter=' ', quotechar="'"))[0])
...
['12', '3', 'dsf dsf']
input.txt is the file containing your data in the example.
You can use shlex module to split your data in a simple way.
import shlex

with open("sample file", 'r') as data:
    print(shlex.split(data.read()))
Try it:)
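shlex.split() returns strings only, while the question asks for integers where possible. A minimal Python 3 sketch of one way to add that step; `parse_line` is a name made up for this example, and it assumes any token made up only of digits should become an int (which would also convert a quoted '12'):

```python
import shlex

def parse_line(line):
    """Split on whitespace, honouring quotes, and turn digit-only tokens into ints."""
    tokens = shlex.split(line)
    return [int(tok) if tok.isdigit() else tok for tok in tokens]

print(parse_line("12 3 'dsf dsf'"))  # [12, 3, 'dsf dsf']
```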
You can use regular expressions:
import re
my_regex = re.compile(r"(?<=')[\w\s]+(?=')|\w+")
with open("filename.txt") as my_file:
    my_list = my_regex.findall(my_file.read())
print(my_list)
Output for file content 12 3 'dsf dsf':
['12', '3', 'dsf dsf']
RegEx explanation:
(?<=') # matches if there's a single quote *before* the matched pattern
[\w\s]+ # matches one or more alphanumeric characters and spaces
(?=') # matches if there's a single quote *after* the matched pattern
| # match either the pattern above or below
\w+ # matches one or more alphanumeric characters
You can use:
>>> l = ['12', '3', 'dsf', 'dsf']
>>> l[2:] = [' '.join(l[2:])]
>>> l
['12', '3', 'dsf dsf']
Basically, you need to parse the data, which means:
- split it into tokens
- interpret the resulting sequence
  - in your case, each token can be interpreted separately
For the 1st task:
- each token is:
  - a run of non-space characters, or
  - a quote, then anything until another quote.
- the separator is a single space (you didn't specify if runs of spaces/other whitespace characters are valid)
Interpretation:
- quoted: take the enclosed text, discarding the quotes
- non-quoted: convert to integer if possible (you didn't specify if it always is/should be an integer)
- (you also didn't specify if it's always 2 integers + quoted string, i.e. if this combination should be enforced)
Since the syntax is very simple, the two tasks can be done at the same time:
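import re

line = "12 3 'dsf dsf'"  # sample input from the question

re_sep = r"\s"
re_term = r"\S+"
re_quoted = r"'(?P<enclosed>[^']*)'"
# The quoted form is tried first; otherwise \S+ would swallow the opening quote.
re_chunk = re.compile("(?:(?P<quoted>%(re_quoted)s)"
                      "|(?P<term>%(re_term)s))"
                      "(?:%(re_sep)s|$)" % locals())
del re_sep, re_term, re_quoted

i = 0
maxi = len(line)
tokens = []
while i < maxi:
    # A compiled pattern's match() takes a start position as its second argument.
    m = re_chunk.match(line, i)
    if not m:
        raise ValueError("invalid syntax at char %d" % i)
    gg = m.groupdict()
    token = gg['term']
    if token:
        try:
            token = int(token)
        except ValueError:
            pass
    elif gg['quoted']:
        token = gg['enclosed']
    else:
        assert False, "invalid match. locals=%r" % locals()
    tokens.append(token)
    i = m.end()  # m.end() is already an absolute index into line
# tokens is now [12, 3, 'dsf dsf']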
This is an example of how it can be done by hand. You can, however, reuse any existing parsing algorithm that can process the same syntax. csv and shlex suggested in other answers are examples. Do note though that they likely accept other syntax, too, which you may or may not want. E.g.:
shlex also accepts double quotes and constructs like "asd"fgh and 'asd'\''fgh'
csv allows multiple consecutive separators (producing an empty element) and things like 'asd'fgh (stripping the quotes) and asd'def' (leaving the quotes intact)
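A short Python 3 sketch of some of that extra syntax, using made-up inputs. It covers only the two shlex constructs and the consecutive-separator case for csv; the variable names are hypothetical:

```python
import csv
import shlex

# Double quotes are accepted, and adjacent unquoted text is joined to the token.
double_quoted = shlex.split('"asd"fgh')
print(double_quoted)        # ['asdfgh']

# Shell-style idiom: close the quote, add an escaped quote, reopen the quote.
escaped_quote = shlex.split(r"'asd'\''fgh'")
print(escaped_quote)        # ["asd'fgh"]

# csv treats two consecutive separators as enclosing an empty field.
double_space = next(csv.reader(["a  b"], delimiter=' ', quotechar="'"))
print(double_space)         # ['a', '', 'b']
```

Whether this leniency is acceptable depends on how strictly the input format needs to be validated.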
When I run this input (saved as variable 'line'):
xsc_i,202,"House of Night",21,"/21_202"
through a csv reader:
for row in csv.reader(line):
    print row
it splits the strings, not just the fields
['x']
['s']
['c']
['_']
['i']
['', '']
['2']
['0']
['2']
['', '']
etc.
It exhibits this behavior even if I explicitly set the delimiter:
csv.reader(line, delimiter=",")
It's treating even strings as arrays, but I can't figure out why, and I can't just split on commas because many commas are inside "" strings in the input.
Python 2.7, if it matters.
The first argument to csv.reader() is expected to be an iterable object containing csv rows. In your case the input is a string (which is also iterable) containing a single row. You need to enclose the line into a list:
for row in csv.reader([line]):
    print row
Demo:
>>> import csv
>>> line = 'xsc_i,202,"House of Night",21,"/21_202"'
>>> for row in csv.reader([line]):
...     print row
...
['xsc_i', '202', 'House of Night', '21', '/21_202']
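The same behaviour can be reproduced in Python 3. This sketch uses the line from the question and contrasts passing the bare string with passing a one-element list; `first_rows` and `whole_row` are names chosen for this example:

```python
import csv
from itertools import islice

line = 'xsc_i,202,"House of Night",21,"/21_202"'

# Iterating a bare string yields one character at a time, so each
# character is parsed as if it were a complete CSV row.
first_rows = list(islice(csv.reader(line), 3))
print(first_rows)   # [['x'], ['s'], ['c']]

# Wrapping the string in a list yields exactly one row.
whole_row = next(csv.reader([line]))
print(whole_row)    # ['xsc_i', '202', 'House of Night', '21', '/21_202']
```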
Just in case you want to see re in action.
import re
line='xsc_i,202,"House of Night",21,"/21_202"'
print map(lambda x:x.strip('"'),re.split(r',(?=(?:[^"]*"[^"]*")*[^"]*$)',line))
Output: ['xsc_i', '202', 'House of Night', '21', '/21_202']
This is because csv.reader expects
any object which supports the iterator protocol and returns a string
each time its next() method is called
You have passed a string to the reader.
If you say:
line = ['xsc_i,202,"House of Night",21,"/21_202"',]
Your code should work as expected.
Please see docs
How do I convert "1,,2'3,4'" into a list? Commas separate the individual items, unless they are within quotes. In that case, the comma is to be included in the item.
This is the desired result: ['1', '', '2', '3,4']. One regex I found on another thread to ignore the quotes is as follows:
re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
But this gives me this output:
['', '1', ',,', "2'3,4'", '']
I can't understand where these extra empty strings are coming from, or why the two commas are being returned at all, let alone together.
I tried making this regex myself:
re.compile(r'''(, | "[^"]*" | '[^']*')''')
which ended up not detecting anything, and just returned my original list.
I don't understand why, shouldn't it detect the commas at the very least? The same problem occurs if I add a ? after the comma.
Instead of a regular expression, you might be better off using the csv module since what you are dealing with is a CSV string:
from cStringIO import StringIO
from csv import reader
file_like_object = StringIO("1,,2,'3,4'")
csv_reader = reader(file_like_object, quotechar="'")
for row in csv_reader:
    print row
This results in the following output:
['1', '', '2', '3,4']
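The snippet above is Python 2 (cStringIO and the print statement). A rough Python 3 equivalent, assuming io.StringIO as the in-memory file object, might look like this:

```python
import csv
import io

# io.StringIO plays the role that cStringIO.StringIO plays in Python 2.
file_like_object = io.StringIO("1,,2,'3,4'")

rows = list(csv.reader(file_like_object, quotechar="'"))
print(rows)  # [['1', '', '2', '3,4']]
```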
pyparsing includes a predefined expression for comma-separated lists:
>>> from pyparsing import commaSeparatedList
>>> s = "1,,2'3,4'"
>>> print commaSeparatedList.parseString(s).asList()
['1', '', "2'3", "4'"]
Hmm, looks like you have a typo in your data, missing a comma after the 2:
>>> s = "1,,2,'3,4'"
>>> print commaSeparatedList.parseString(s).asList()
['1', '', '2', "'3,4'"]
Suppose the following code (notice the commas inside the strings):
>>> a = ['1',",2","3,"]
I need to concatenate the values into a single string. Naive example:
>>> b = ",".join(a)
>>> b
'1,,2,3,'
And later I need to split the resulting object again:
>>> b.split(',')
['1', '', '2', '3', '']
However, the result I am looking for is the original list:
['1', ',2', '3,']
What's the simplest way to protect the commas in this process? The best solution I came up with looks rather ugly.
Note: the comma is just an example. The strings can contain any character. And I can choose other characters as separators.
The strings can contain any character.
If any delimiter you choose might also appear inside an item, use the csv module:
import csv
class PseudoFile(object):
    # http://stackoverflow.com/a/8712426/190597
    def write(self, string):
        return string

writer = csv.writer(PseudoFile())
This concatenates the items in a using commas:
a = ['1',",2","3,"]
line = writer.writerow(a)
print(line)
# 1,",2","3,"
This recovers a from line:
print(next(csv.reader([line])))
# ['1', ',2', '3,']
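The same round trip can be sketched in Python 3 with an in-memory buffer instead of a pseudo-file. `join_items` and `split_items` are names made up for this example, and the sketch assumes the items contain no embedded newlines:

```python
import csv
import io

def join_items(items):
    """Serialize a list of strings as a single CSV line (without the line terminator)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(items)
    return buffer.getvalue().rstrip('\r\n')

def split_items(line):
    """Recover the list of strings from a line produced by join_items()."""
    return next(csv.reader([line]))

a = ['1', ',2', '3,']
line = join_items(a)
print(line)                    # 1,",2","3,"
print(split_items(line) == a)  # True
```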
Do you have to use commas to separate the items? If not, you could use another symbol that does not appear in the items of the list.
In [1]: '|'.join(['1', ',2', '3,']).split('|')
Out[1]: ['1', ',2', '3,']
Edit: The string may apparently contain any character. Is it an option to use the json module? You could just dump and load the list.
In [2]: import json
In [3]: json.dumps(['1', ',2', '3,'])
Out[3]: '["1", ",2", "3,"]'
In [4]: json.loads('["1", ",2", "3,"]')
Out[4]: [u'1', u',2', u'3,']
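The session above is Python 2, hence the u'' prefixes. In Python 3 the round trip returns plain str objects. A small sketch with an extra made-up item containing quotes and a newline, to show that json escapes them:

```python
import json

# The last item is a hypothetical example with characters that need escaping.
a = ['1', ',2', '3,', 'with "quotes" and \n a newline']

encoded = json.dumps(a)        # a single string, safe to store or transmit
decoded = json.loads(encoded)  # back to a list of str

print(decoded == a)  # True
```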
Edit #2: If you may not use it, you could use str.encode('string-escape') to escape the characters in your string, then enclose the encoded version in single quotes and separate those with commas:
In [10]: print "'example'".encode('string-escape')
\'example\'
In [11]: print r"\'example\'".decode('string-escape')
'example'
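Edit #3: Running example of str.encode('string-escape'):
import re

def list_to_str(items):
    # Python 2 only: 'string-escape' backslash-escapes quotes and backslashes
    return ','.join("'{}'".format(s.encode('string-escape')) for s in items)

def str_to_list(text):
    # match each quoted item, allowing backslash-escaped characters inside,
    # then undo the escaping
    return [m.decode('string-escape')
            for m in re.findall(r"'((?:[^'\\]|\\.)*)'", text)]

if __name__ == '__main__':
    a = ['1', ',2', '3,']
    b = list_to_str(a)
    print 'It is {} that this works.'.format(str_to_list(b) == a)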
When you are serializing a list to a string, you need to choose as a separator a character that doesn't appear in the list items. Can't you just replace the comma with another character?
b = ";".join(a)
b.split(';')
Does the delimiter need to be a single character? If not, you can use a delimiter made up of a sequence of characters that definitely won't appear in your strings, like |#| or something similar.
You need to escape the comma and also the escape character itself. Here's one way:
>>> a = ['1',",2","3,"]
>>> b = ','.join(s.replace('%', '%25').replace(',', '%2c') for s in a)
>>> [s.replace('%2c', ',').replace('%25', '%') for s in b.split(',')]
['1', ',2', '3,']
>>> b
'1,%2c2,3%2c'
>>>
I would join and split using another character than ",", e.g. ";":
>>> b = ";".join(a)
>>> b.split(';')
['1', ',2', '3,']