Split a line into words by slash and use \W without splitting on ":" - Python

I am trying to parse a text file with many lines like this:
470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003
I am building a dictionary where the key is the first number on the line and the values are the words separated by the slash "/". Each of these words is saved into a list: for example, list1 gets every cms_trk_dcs_05:CAEN, list2 gets every CMS_TRACKER_SY1527_7, and so on.
But when I use pattern = re.split('\W', line) to split the line, it also splits on the ":" character. When I try to print cms_trk_dcs_05:CAEN, I only get cms_trk_dcs_05. How can I keep the whole word cms_trk_dcs_05:CAEN and store all the slash-separated words in my lists?
I am new to Python, so I apologize if this is a basic question.
Thank you in advance.

Use split() twice: first on the space after the number, and then on the '/':
>>> stringin = "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003"
>>> splitstring = stringin.split(' ')
>>> num = splitstring[0]
>>> stringlist = splitstring[1].split('/')
>>> num
'470115572'
>>> stringlist
['cms_trk_dcs_05:CAEN', 'CMS_TRACKER_SY1527_7', 'branchController00', 'easyCrate3', 'easyBoard16', 'channel003']
>>>
Or as a (less obvious) one-liner:
>>> [x.split('/') for x in stringin.split(' ')]
[['470115572'], ['cms_trk_dcs_05:CAEN', 'CMS_TRACKER_SY1527_7', 'branchController00', 'easyCrate3', 'easyBoard16', 'channel003']]
Note, though, that the second approach creates the first element as a list.
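The question also asks for one list per slash position (list1 with every first word, list2 with every second word, and so on). A minimal sketch of that step, assuming every line has the same number of slash-separated parts; the second sample line and the names split_line, rows and columns are made up for illustration:

```python
# Hypothetical sample lines; the real input would come from the text file.
lines = [
    "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003",
    "470115573 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel004",
]

def split_line(line):
    """Return (number, list of slash-separated words) for one line."""
    num, path = line.split(None, 1)  # split once on the first run of whitespace
    return num, path.strip().split('/')

rows = [split_line(line)[1] for line in lines]

# zip(*rows) transposes the rows: one tuple per slash position.
# Note that zip stops at the shortest row if the lines differ in length.
columns = [list(col) for col in zip(*rows)]

print(columns[0])   # all first words
print(columns[-1])  # all last words
```

Here columns[0] plays the role of list1 in the question, columns[1] the role of list2, and so on.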

As Trimax's comment says, : (colon) is a non-word character, so \W splits on it. To split the line correctly you need to exclude the colon from the pattern. Or use SiHa's answer.
About the pattern: without the LOCALE and UNICODE flags, \W is equivalent to [^a-zA-Z0-9_] (https://docs.python.org/2/library/re.html#regular-expression-syntax), so you can just add the colon to that set: [^a-zA-Z0-9_:]
As for the second part, use the first element of the resulting list as the dict key and assign the rest of the list to it as a slice.
Something like this:
import re

result_dict = {}
for line in file_lines:
    # strip() drops the trailing newline, which would otherwise
    # leave an empty string at the end of the split result
    line_splitted = re.split('[^a-zA-Z0-9_:]+', line.strip())
    result_dict[line_splitted[0]] = line_splitted[1:]
Note, though, that if your file contains lines with the same number, you will lose data: assigning a new value (the list of words in this case) to an existing key overwrites the previous value.
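One possible way around the overwrite problem noted above is to keep a list of word lists per key. This is only a sketch: the shortened sample lines are hypothetical, and collections.defaultdict is one of several ways to do it.

```python
import re
from collections import defaultdict

# Hypothetical, shortened sample lines that share the same leading number.
file_lines = [
    "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/channel003\n",
    "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/channel004\n",
]

# Each key maps to a list of word lists, one entry per matching line.
result_dict = defaultdict(list)
for line in file_lines:
    parts = re.split(r'[^a-zA-Z0-9_:]+', line.strip())
    result_dict[parts[0]].append(parts[1:])

for words in result_dict['470115572']:
    print(words)
```

With this layout no line is lost; the cost is one extra level of nesting when reading the values back.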

Related

How to avoid .replace replacing a word that was already replaced

Given a string, I have to reverse every word, but keeping them in their places.
I tried:
def backward_string_by_word(text):
    for word in text.split():
        text = text.replace(word, word[::-1])
    return text
But if I have the string Ciao oaiC, by the time it tries to reverse the second word, the first word has already been reversed and is identical to it, so it gets replaced again. How can I avoid this?
You can do it in one line with join and a generator expression:
text = "test abc 123"
text_reversed_words = " ".join(word[::-1] for word in text.split())
s.replace(x, y) is not the right method to use here. It does two things:
find every occurrence of x in s
replace each one with y
But you do not need to find anything here, since you already have the word you want to replace. The problem is that replace searches the whole string from the beginning each time, not from the position you are currently at, so it also finds the word you have already replaced, not just the one you want to replace next.
The simplest solution is to collect the reversed words in a list and then build a new string from that list. You can concatenate a list of strings, separated by spaces, with ' '.join().
def backward_string_by_word(text):
    reversed_words = []
    for word in text.split():
        reversed_words.append(word[::-1])
    return ' '.join(reversed_words)
If you have understood this, you can also write it more concisely by skipping the intermediate list with a generator expression:
def backward_string_by_word(text):
    return ' '.join(word[::-1] for word in text.split())
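To see the failure concretely, here is a small side-by-side run on the string from the question. The function names buggy and fixed are made up for this illustration.

```python
def buggy(text):
    # The approach from the question: replace() rewrites every
    # occurrence of the word, including ones already reversed.
    for word in text.split():
        text = text.replace(word, word[::-1])
    return text

def fixed(text):
    # Reverse each word independently and rebuild the string.
    return ' '.join(word[::-1] for word in text.split())

print(buggy("Ciao oaiC"))  # Ciao Ciao
print(fixed("Ciao oaiC"))  # oaiC Ciao
```

One thing to keep in mind: because split() with no arguments splits on any run of whitespace, the join-based version collapses multiple spaces, tabs and newlines into single spaces.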
Splitting a string converts it to a list. You can just reassign each value of that list to the reverse of that item. See below:
text = "The cat tac in the hat"

def backwards(text):
    split_word = text.split()
    for i in range(len(split_word)):
        split_word[i] = split_word[i][::-1]
    return ' '.join(split_word)

print(backwards(text))

How do I replace certain pieces in a string in a list in python?

['Parent=transcript:Zm00001d034962_T001', 'Parent=transcript:Zm00001d034962_T002', 'Parent=transcript:Zm00001d034962_T003', 'Parent=transcript:Zm00001d034962_T003', 'Parent=transcript:Zm00001d034962_T004', 'Parent=transcript:Zm00001d034962_T005', 'Parent=transcript:Zm00001d034962_T005', 'Parent=transcript:Zm00001d034962_T005']
This is what my list looks like.
I would like to replace Parent=transcript: and _T00.
Please help; I am not sure what command to use.
Use Python's built-in str.replace() method. For the last part, if the suffix is always 5 characters long, you can simply slice it off:
items = [
    'Parent=transcript:Zm00001d034962_T001',
    'Parent=transcript:Zm00001d034962_T002',
    'Parent=transcript:Zm00001d034962_T003',
    'Parent=transcript:Zm00001d034962_T003',
    'Parent=transcript:Zm00001d034962_T004',
    'Parent=transcript:Zm00001d034962_T005',
    'Parent=transcript:Zm00001d034962_T005',
    'Parent=transcript:Zm00001d034962_T005'
]
# use enumerate so each item can be replaced in the list by index
for index, item in enumerate(items):
    # replacing the prefix with an empty string deletes it
    new_item = item.replace('Parent=transcript:', '')
    # keep everything except the last 5 characters, then append the new suffix
    new_item = new_item[:-5] + "whatever you want to replace that with"
    # store the new item back in the list
    items[index] = new_item
I am assuming you want to replace the following strings with '':
Parent=transcript: with ''
_T00 with ''
For example, 'Parent=transcript:Zm00001d034962_T001' becomes 'Zm00001d0349621': the trailing 1 from _T001 ends up appended to Zm00001d034962.
If that is your expected result, the code is:
new_list = [x.replace('Parent=transcript:', '').replace('_T00', '') for x in input_list]
print(new_list)
The output of new_list will be:
['Zm00001d0349621', 'Zm00001d0349622', 'Zm00001d0349623', 'Zm00001d0349623', 'Zm00001d0349624', 'Zm00001d0349625', 'Zm00001d0349625', 'Zm00001d0349625']
Note you can replace '' with whatever you want the new string to be. I have marked it as '' as I don't know what your new replaced string will be.
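If the goal is instead to drop the whole transcript suffix (so that _T001 disappears entirely rather than leaving a trailing 1), a regular expression is one option. This is a sketch under two assumptions that the question does not confirm: the prefix is always Parent=transcript: and the suffix is always _T followed by digits. The shortened input_list is only sample data.

```python
import re

# Shortened sample of the list from the question.
input_list = [
    'Parent=transcript:Zm00001d034962_T001',
    'Parent=transcript:Zm00001d034962_T002',
]

# Assumption: remove the fixed prefix at the start and
# an '_T<digits>' suffix at the end of each string.
pattern = re.compile(r'^Parent=transcript:|_T\d+$')

gene_ids = [pattern.sub('', item) for item in input_list]
print(gene_ids)  # ['Zm00001d034962', 'Zm00001d034962']
```

Whether this or the chained replace() calls is right depends on which output is actually wanted.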

Python - Finding all numeric values in a string, then storing each numeric in a list uniquely

I would like to be able to grab any and all numeric values from a string if found. Then store them in a list individually.
Currently able to identify all numeric values, but not able to figure out how to store them individually.
import re

phones = list()
comment = "Sues phone numbers are P#3774794773 and P#6047947730."
words = comment.split()
for word in words:
    word = word.rstrip()
    nums = re.findall(r'\d{10,10}',word)
    if nums not in phones:
        phones.append(nums)
print(phones)
I would like to get those two values to be stored as such.... 3774794773,6047947730. Instead of a list within a list.
End goal output (print) each value separately.
Current Print: [ [], ['3774794773'], ['6047947730'] ]
Needed Print: 3774794773, 6047947730
Thanks in advance.
You're doing the work twice: split() already breaks the comment into words, and then the regex scans each word again. Just run a regex that matches a 10-digit number over the whole comment, like so:
import re

comment = "Sues phone numbers are P#3774794773 and P#6047947730."
nums = re.findall(r'\d{10}', comment)
print(nums)
If you also want the numbers to be exactly 10 digits long (and not match part of a longer digit sequence), you can do the following:
comment = "Sues phone numbers are P#3774794773 123145125125215 and P#6047947730."
nums = re.findall(r'\b\d{10}\b', comment)
print(nums)
(\b is a zero-width assertion: it does not consume any characters, but matches the position between a word character and a non-word character, i.e. a word boundary)
Both result in:
['3774794773', '6047947730']
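The question title also asks for each number to be stored only once and printed as a plain comma-separated line. A minimal sketch of that last step; the repeated number in the sample comment is added here only to show the de-duplication, and dict.fromkeys is one of several ways to drop duplicates while keeping order (Python 3.7+).

```python
import re

# Sample comment; the third number is a deliberate repeat for illustration.
comment = "Sues phone numbers are P#3774794773 and P#6047947730, again P#3774794773."

nums = re.findall(r'\b\d{10}\b', comment)

# dict keys are unique and keep insertion order,
# so this removes duplicates without reordering.
phones = list(dict.fromkeys(nums))

print(", ".join(phones))  # 3774794773, 6047947730
```

Because findall returns a flat list of strings here, there is no list-within-a-list to unpack.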
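Alternatively, if the numbers are saved in a file (here CS.txt) that contains exactly two values separated by whitespace, you can read them into separate variables:
with open("CS.txt", "r") as f:
    # unpacking fails with a ValueError unless there are exactly two values
    number1, number2 = f.read().split()
print(number1)
print(number2)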

Python split by character only if wrapped in parenthesis

I am parsing a large text file that has key-value pairs separated by '='. I need to split these key-value pairs into a dictionary. I was simply going to split on '='. However, I noticed that some of the values contain the equals sign. When a value contains an equals sign, it always seems to be wrapped in parentheses.
Question: How can I split on the equals sign only when it is not between two parentheses?
Example data:
PowSup=PS1(type=Emerson,fw=v.03.05.00)
Desired output:
{'PowSup': 'PS1(type=Emerson,fw=v.03.05.00)'}
UPDATE: The data does not seem to have any nested parentheses. (Hopefully that remains true in the future)
UPDATE 2: The key doesn't ever seem to have equals sign either.
UPDATE 3: The full requirements are much more complicated and at this point I am stuck so I have opened up a new question here: Python parse output of mixed format text file to key value pair dictionaries
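You could try partition('='), which splits at the first instance only. Taking elements 0 and 2 of the result gives the tuple ('PowSup', 'PS1(type=Emerson,fw=v.03.05.00)'):
'PowSup=PS1(type=Emerson,fw=v.03.05.00)'.partition('=')[0:3:2]
Or loop over the file and split each line only once:
mydict = dict()
for line in file:
    # drop the trailing newline so it does not end up in the value
    k, v = line.rstrip('\n').split('=', 1)
    mydict[k] = v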
A simple solution using the str.index() method:
s = "PowSup=PS1(type=Emerson,fw=v.03.05.00)"
pos = s.index('=') # position of the first `=` character
print({s[:pos]: s[pos+1:]})
The output:
{'PowSup': 'PS1(type=Emerson,fw=v.03.05.00)'}
You can limit the split() operation to a single split (the first =):
>>> x = "PowSup=PS1(type=Emerson,fw=v.03.05.00)"
>>> x.split('=', 1)
['PowSup', 'PS1(type=Emerson,fw=v.03.05.00)']
You can then use these values to populate your dict:
>>> x = "PowSup=PS1(type=Emerson,fw=v.03.05.00)"
>>> key, value = x.split('=', 1)
>>> out = {}
>>> out[key] = value
>>> out
{'PowSup': 'PS1(type=Emerson,fw=v.03.05.00)'}
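Putting the pieces together for a whole file, here is a minimal sketch that builds the dictionary line by line. The extra sample lines are hypothetical, and skipping lines without an '=' is an assumption about how malformed input should be handled.

```python
# Sample input; only the first line comes from the question,
# the other two are hypothetical.
lines = [
    "PowSup=PS1(type=Emerson,fw=v.03.05.00)\n",
    "Fan=F1\n",
    "malformed line\n",
]

out = {}
for line in lines:
    # partition() splits at the first '=' only and never raises;
    # sep is '' when the line contains no '=' at all.
    key, sep, value = line.strip().partition('=')
    if not sep:
        continue  # assumption: silently skip lines without '='
    out[key] = value

print(out)
```

partition() is used here instead of split('=', 1) because unpacking the result of split raises a ValueError on a line that has no '='.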

Last element in python list, created by splitting a string is empty

So I have a string which I need to parse. The string contains a number of words, separated by a hyphen (-). The string also ends with a hyphen.
For example one-two-three-.
Now, if I want to look at the words on their own, I split up the string to a list.
wordstring = "one-two-three-"
wordlist = wordstring.split('-')
for i in range(0, len(wordlist)):
    print(wordlist[i])
Output
one
two
three
#empty element
What I don't understand is, why in the resulting list, the final element is an empty string.
How can I omit this empty element?
Should I simply truncate the list or is there a better way to split the string?
You have an empty string because the split on the last - character produces an empty string on the RHS. You can strip all '-' characters from the string before splitting:
wordlist = wordstring.strip('-').split('-')
If the final character is always a -, you can drop it with [:-1], which takes every character of the string except the last one.
Then, proceed to split it as you did:
wordlist = wordstring[:-1].split('-')
print(wordlist)
['one', 'two', 'three']
You can use a regex to do this:
import re
wordlist = re.findall("[a-zA-Z]+(?=-)", wordstring)
Output:
['one', 'two', 'three']
You should use Python's built-in str.strip() method before splitting your string, e.g.:
wordstring = "one-two-three-"
wordlist = wordstring.strip('-').split('-')
I believe .split() treats the final - as a separator with one more element after it, and that element is an empty string.
Are you open to removing the trailing dash from wordstring before splitting it?
wordstring = "one-two-three-"
wordlist = wordstring[:-1].split('-')
print(wordlist)
OUT: ['one', 'two', 'three']
This is explained in the docs:
...
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']).
...
If you know your strings will always end in '-', then just remove the last (empty) element with wordlist.pop().
If you need something more complicated you may want to learn about regular expressions.
Just for the variety of options:
wordlist = [x for x in wordstring.split('-') if x]
Note that the above also handles cases such as: wordstring = "one-two--three-" (double hyphen)
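The difference between the approaches shows up on exactly that double-hyphen input. A short comparison, with variable names chosen only for this illustration:

```python
wordstring = "one-two--three-"

# Plain split: empty strings for the double hyphen and the trailing hyphen.
naive = wordstring.split('-')
print(naive)     # ['one', 'two', '', 'three', '']

# strip('-') first: only the trailing empty string goes away.
stripped = wordstring.strip('-').split('-')
print(stripped)  # ['one', 'two', '', 'three']

# Filtering: every empty string is dropped.
filtered = [x for x in wordstring.split('-') if x]
print(filtered)  # ['one', 'two', 'three']
```

Which one fits depends on whether an empty field between two hyphens carries meaning in the data.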
First strip(), then split():
wordstring = "one-two-three-"
x = wordstring.strip('-')
y = x.split('-')
for word in y:
    print(word)
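Strip the string before splitting. Pass the hyphen explicitly, as in strip('-'), because strip() with no arguments only removes whitespace such as a trailing "\n"; with the trailing hyphen gone, the empty final element disappears.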
