Split function on list of strings python


I need to build a list of lists by splitting a large string first by newline and then by semicolon. I already have a list of strings from splitting the input by newline. Now I need to split each element of that list by semicolon, but Python will not let me split again:
AttributeError: 'list' object has no attribute 'split'
items = sys.stdin.read()
collectionList = [(items.split('\n'))]
for item in collectionList:
    item.split(':')

Try changing the second line to
collectionList = items.split('\n')
The split method already returns a list, so you don't need to enclose items.split('\n') in square brackets. With the brackets, collectionList is a list whose only element is another list, and a list has no split method. Also, you might want to store the result of each split in another list for further processing:
results = []
for item in collectionList:
    results.append(item.split(':'))
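
The two steps can also be combined in one list comprehension. This is a minimal sketch, assuming the input has already been read into a string; the function name split_records and the sample text are made up for illustration, and the separator is ':' as in the code above:

```python
def split_records(text, sep=":"):
    """Split text into lines, then split each non-empty line on sep."""
    return [line.split(sep) for line in text.splitlines() if line]


# Hypothetical sample input standing in for sys.stdin.read()
items = "a:b:c\nd:e\n\nf"
results = split_records(items)
print(results)
```

Using splitlines() instead of split('\n') avoids a trailing empty string when the input ends with a newline, and the `if line` filter skips blank lines.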

Replace the second line with this line:
collectionList = items.split('\n')

Related

Tabs \n in list for python

I have a simple Python script and want it to return the values one per line.
The separators are # and \n.
SCRIPT
output = ['192.168.0.1 #SRVNET\n192.168.0.254 #SRVDATA']
output = output[0].split('#')
output.split('\n')
OUTPUT
AttributeError: 'list' object has no attribute 'split'

After you split the first time, output is a list which doesn't support .split.
If splitting on two different items, you can use a regular expression with re.split:
>>> import re
>>> output = ['192.168.0.1 #SRVNET\n192.168.0.254 #SRVDADOS']
>>> re.split(r'\n|\s*#\s*',output[0]) # newline or comment (removing leading/trailing ws)
['192.168.0.1', 'SRVNET', '192.168.0.254', 'SRVDADOS']
You may want to group the IP with a comment as well, for example:
>>> [re.split(r'\s*#\s*',line) for line in output[0].splitlines()]
[['192.168.0.1', 'SRVNET'], ['192.168.0.254', 'SRVDADOS']]

The result of the line
output = output[0].split('#')
is a list, because .split always returns a list. In your case output looks like this:
['192.168.0.1 ', 'SRVNET\n192.168.0.254 ', 'SRVDATA']
As the error points out, a list has no .split method, so it cannot be split directly.
To split further wherever "\n" occurs, iterate over the list and call split on each string:
output = ['192.168.0.1 ', 'SRVNET\n192.168.0.254 ', 'SRVDATA']
for i in output:
    if "\n" in i:
        print("yes")
        output_1 = i.split("\n")
This gives output_1 as:
['SRVNET', '192.168.0.254 ']

If you don't want to use re, then you need to apply split("\n") to each element of output[0].split("#"), then concatenate the results together again. One way to do that is
result = [y for x in output[0].split("#") for y in x.split("\n")]
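
The pieces produced this way still carry the spaces around '#'. A small sketch that strips them and then pairs each address with its label; the name pairs is only for illustration:

```python
output = ['192.168.0.1 #SRVNET\n192.168.0.254 #SRVDATA']

# Split on '#', then on newlines, and strip the leftover spaces from each piece.
result = [y.strip() for x in output[0].split("#") for y in x.split("\n")]
print(result)

# Pair every even-indexed item (address) with the following item (label).
pairs = list(zip(result[::2], result[1::2]))
print(pairs)
```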

How to delete/ignore an entire line in a string if there is a certain character in that line using Python?

If a multi-line string contains a certain character like '$', how can I erase or ignore the whole line that contains it?
Note: the task is to remove lines containing a certain character, not empty lines.
testString = """unknown value 1
unknown value 2
unknown value 3
$ unknown value 4
unknown value 5"""

First, you can split the string into a list of lines using the splitlines function. Then, using list comprehension you can iterate through the lines and test each line for the presence of "$", and return a new list of lines without any lines containing "$". Then you would recombine the new list with "\n" (the newline character) back into a string.
Here is the code:
testString = """unknown value 1
unknown value 2
unknown value 3
$ unknown value 4
unknown value 5"""
newTestString = "\n".join([x.strip() for x in testString.splitlines() if "$" not in x])

Create a generator that yields the string split on the newline character. Instantiate the generator and check its output in a loop. If a line contains your special character, call next on the generator to skip it.
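
One way to sketch the generator idea, as a variation that does the skipping inside the generator rather than calling next from the outside. The function name lines_without is hypothetical:

```python
def lines_without(text, marker):
    """Yield each line of text that does not contain marker."""
    for line in text.splitlines():
        if marker in line:
            continue  # skip this line and move on to the next one
        yield line


testString = """unknown value 1
unknown value 2
unknown value 3
$ unknown value 4
unknown value 5"""

newTestString = "\n".join(lines_without(testString, "$"))
print(newTestString)
```

Because the lines are produced lazily, this also works on large inputs without building an intermediate list.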

First, split the string into a list in which each element is a line:
myLines = testString.splitlines()
This returns a list in which each element is one line of your string. Now loop over the list and remove every element that contains '$':
for line in myLines[:]:  # iterate over a copy; removing from a list while looping over it skips elements
    if '$' in line:
        myLines.remove(line)
There is also a method called strip() which may suit your needs; try searching for it.

How can I use list comprehension to replace a comma with an escaped comma in a list that contains strings and numbers

I have a tuple that contains elements that are both strings and numbers, and I am trying to replace a comma that exists in one of the elements of the tuple with an escaped comma, e.g. the input looks like the following
('', 'i_like_cats', 'I like, cookies', 10319708L, "* / Item ID='10319708'", 'i_wish_was_an_oscar_meyer_weiner',
0.101021321)
and what I want as output is
('', 'i_like_cats', 'I like\, cookies', 10319708L, "* / Item ID='10319708'", 'i_wish_was_an_oscar_meyer_weiner',
0.101021321)
What I want to do is replace the , after like with \, because I'm writing the content to a CSV file, and when the next step of the pipeline reads the file it splits at the comma between 'like' and 'cookies' even though I don't want it to. I am unable to modify the code of the downstream part of the pipeline, so I can't switch to something like a semicolon- or tab-delimited file, and I can't change.
What I've tried to do is use list comprehension to solve this as follows
line = map(lambda x: str.replace(x, '"', '\"'), line)
but that generates a type error indicating that replace requires a string object but it received a long.
The only solution I can think of is to break the tuple down into its individual elements, modify the element that contains the comma, and then build a new tuple and append each of the elements back on, but it seems like there has to be a more Pythonic way to do that.
Thanks!

I think a list comprehension works best for applying a rule to every element; if you know which single string you would like to change, why not change just that one?
If you want to replace every comma with an escaped comma, try this:
strtuple = ('', 'i_like_cats', 'I like, cookies', 10319708, "* / Item ID='10319708'", 'i_wish_was_an_oscar_meyer_weiner',
0.101021321)
converted_strlist = [i.replace(',', '\\,') if isinstance(i, str) else i for i in strtuple]

You might want to verify an element is a str instance before calling replace on it.
Something like:
data = (
    '',
    'i_like_cats',
    'I like, cookies',
    10319708L,  # Python 2 long literal; drop the trailing L on Python 3
    "* / Item ID='10319708'",
    'i_wish_was_an_oscar_meyer_weiner',
    0.101021321,
)
line = [
    x.replace(",", "\\,") if isinstance(x, str) else x for x in data
]
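
If the result has to stay a tuple, the same check can be wrapped in a small helper. This is a sketch for Python 3 (so the long literal is written without the trailing L); the function name escape_commas is made up for illustration and the sample tuple is shortened:

```python
def escape_commas(row):
    """Return a tuple with ',' replaced by '\\,' in string elements only."""
    return tuple(
        x.replace(",", "\\,") if isinstance(x, str) else x
        for x in row
    )


data = ('', 'i_like_cats', 'I like, cookies', 10319708, 0.101021321)
line = escape_commas(data)
print(line[2])
```

Numbers pass through unchanged, so the TypeError from calling replace on a non-string never occurs.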

Using replace method in python 3.6

I need to replace "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.," with a blank. I am using the replace method, but it seems to be deprecated in Python 3.6. word_list = [] is a list that will hold all the words extracted from the web page. The clean_up_list function then strips the symbols, replacing them with a blank.
I use a for loop over the length of symbols and replace each symbol with a blank, using
word = word.replace(symbols[i],"")
How can I use the replace method so that the symbols are removed and the words are printed without symbols between them?
Error:
AttributeError: 'list' object has no attribute 'replace'
My Code:
import urllib.request
import bs4 as bs

url = urllib.request.urlopen("https://www.servicenow.com/solutions-by-category.html").read()
word_list = []
soup = bs.BeautifulSoup(url,'lxml')
word_list.append([element.get_text() for element in soup.select('a')])
print(word_list)
def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        symbols = "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.,"
        for i in range(0,len(symbols)):
            word = word.replace(symbols[i],"")
            #print(type(word))
        #print(type(word))
        #word.replace(symbols[i]," ")
        if(len(word) > 0):
            #print(word)
            clean_word_list.append(word)

There are two errors here. First, you do not construct a list of strings, but a list of lists of strings. This line:
word_list.append([element.get_text() for element in soup.select('a')])
should be:
word_list.extend([element.get_text() for element in soup.select('a')])
Second, you cannot call replace on a list directly (it is not a method of list objects). You need to do this for every entry.
You also specify (correctly) that you then have to call replace(..) for every character in the symbols string, which is of course inefficient. You can use translate(..) for that instead.
So you can replace the entire for loop with a list comprehension. In Python 3, translate takes a single table, which str.maketrans builds; the third argument lists the characters to delete:
symbols = "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.,"
clean_word_list = [word.translate(str.maketrans('', '', symbols)) for word in word_list]
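
On Python 3, str.translate takes a single translation table, which str.maketrans can build. A minimal sketch that builds the table once and also drops entries that end up empty; the sample word_list is made up and stands in for the scraped link texts:

```python
symbols = "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.,"

# Map every symbol to None so translate() deletes it (Python 3 form).
delete_symbols = str.maketrans("", "", symbols)

# Hypothetical sample standing in for the scraped link texts
word_list = ["Hello, world!", "IT (Service) Management", "***"]

cleaned = (word.translate(delete_symbols) for word in word_list)
clean_word_list = [word for word in cleaned if word]  # drop empty results
print(clean_word_list)
```

Building the table outside the comprehension avoids recreating it for every word.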

Try explicitly converting the word to a string, since the error message says the object is a 'list', not a string, and replace cannot be called on lists. For example (notice the second-to-last line):
def clean_up_list(word_list):
    clean_word_list = []
    for word in word_list:
        word = str(word)
        symbols = "!##$%^&*()\n{}[]()_-+=<>?\xa0;'/.,"

separate line in words by slash and use /W but avoid :

I am trying to parse a text file with many lines like this:
470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003
I am building a dictionary where the key is the first number on the line and the values are (for each key) the words separated by the slash "/". Each of these words is saved into a list; for example, list1 gets every cms_trk_dcs_05:CAEN, list2 gets every CMS_TRACKER_SY1527_7, and so on.
But when I use pattern = re.split('\W',line) to split the line, it also splits on the ":" character: when I try to print cms_trk_dcs_05:CAEN it only returns cms_trk_dcs_05. How can I keep the whole word cms_trk_dcs_05:CAEN and save in my lists all the words separated by slashes?
I am new to Python, so I apologize if this is a basic question.
Thank you in advance.

Use split() first on the space after the number, and then on the '/':
>>> stringin = "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003"
>>> splitstring = stringin.split(' ')
>>> num = splitstring[0]
>>> stringlist = splitstring[1].split('/')
>>> num
'470115572'
>>> stringlist
['cms_trk_dcs_05:CAEN', 'CMS_TRACKER_SY1527_7', 'branchController00', 'easyCrate3', 'easyBoard16', 'channel003']
>>>
Or as a (less obvious) one-liner:
>>> [x.split('/') for x in stringin.split(' ')]
[['470115572'], ['cms_trk_dcs_05:CAEN', 'CMS_TRACKER_SY1527_7', 'branchController00', 'easyCrate3', 'easyBoard16', 'channel003']]
Note, though, that the second approach creates the first element as a list.

As Trimax's comment says, : (colon) is a non-word character, so to split the line correctly you need to include it in the pattern. Or use SiHa's answer.
About the pattern: \W is equivalent to [^a-zA-Z0-9_] (https://docs.python.org/2/library/re.html#regular-expression-syntax), so you can just add the colon to it: [^a-zA-Z0-9_:]
As for the second part, use the first element of the resulting list as the dict key and assign the rest of the list to it as a slice.
Something like this:
result_dict = {}
for line in file_lines:
    line_splitted = re.split('[^a-zA-Z0-9_:]+', line)
    result_dict[line_splitted[0]] = line_splitted[1:]
Note, though, that if your text contains lines with the same number, you'll lose data: assigning a new value (a list of words in this case) to an existing key overwrites the previous value.
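
If repeated numbers are possible, one way to keep every line is to append to a list per key instead of assigning. A minimal sketch using collections.defaultdict; the sample lines are shortened and the second number is invented to show a repeated key:

```python
import re
from collections import defaultdict

# Hypothetical sample lines; the second and third share the same key
file_lines = [
    "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00",
    "470115573 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController01",
    "470115573 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController02",
]

result_dict = defaultdict(list)
for line in file_lines:
    parts = re.split(r"[^a-zA-Z0-9_:]+", line.strip())
    # Append instead of assigning, so repeated keys keep every entry.
    result_dict[parts[0]].append(parts[1:])

for key, entries in result_dict.items():
    print(key, entries)
```

Each key then maps to a list of word lists, one per input line with that number.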
