I am parsing a large text file that has key-value pairs separated by '='. I need to split these key-value pairs into a dictionary. I was simply going to split by '=', but I noticed that some of the values contain the equals sign character. When a value contains the equals sign, it always seems to be wrapped in parentheses.
Question: How can I split by the equals sign only when it is not between two parentheses?
Example data:
PowSup=PS1(type=Emerson,fw=v.03.05.00)
Desired output:
{'PowSup': 'PS1(type=Emerson,fw=v.03.05.00)'}
UPDATE: The data does not seem to have any nested parentheses. (Hopefully that remains true in the future.)
UPDATE 2: The key never seems to contain an equals sign either.
UPDATE 3: The full requirements are much more complicated and at this point I am stuck so I have opened up a new question here: Python parse output of mixed format text file to key value pair dictionaries
You could try partition('=') to split at the first occurrence:
'PowSup=PS1(type=Emerson,fw=v.03.05.00)'.partition('=')[0:3:2]
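A minimal sketch of the partition approach applied to every line of the file. It assumes one key=value pair per line; the function name parse_pairs and the sample lines are made up for illustration. Unlike split('=', 1) followed by tuple unpacking, partition never raises on a line without '=': it returns an empty separator, which the sketch uses to skip such lines.

```python
def parse_pairs(lines):
    """Build a dict from 'key=value' lines, splitting only at the first '='."""
    result = {}
    for line in lines:
        line = line.strip()  # drop the trailing newline and surrounding spaces
        key, sep, value = line.partition('=')
        if not sep:
            continue  # no '=' on this line, so skip it
        result[key] = value
    return result


sample = [
    "PowSup=PS1(type=Emerson,fw=v.03.05.00)\n",
    "not a pair\n",  # hypothetical line without '='
]
print(parse_pairs(sample))  # {'PowSup': 'PS1(type=Emerson,fw=v.03.05.00)'}
```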
mydict = dict()
for line in file:  # file is the open text file
    k, v = line.rstrip('\n').split('=', 1)  # split only at the first '='
    mydict[k] = v
Simple solution using str.index() function:
s = "PowSup=PS1(type=Emerson,fw=v.03.05.00)"
pos = s.index('=')  # position of the first '=' character
print({s[:pos]: s[pos+1:]})
The output:
{'PowSup': 'PS1(type=Emerson,fw=v.03.05.00)'}
You can limit the split() operation to a single split (the first =):
>>> x = "PowSup=PS1(type=Emerson,fw=v.03.05.00)"
>>> x.split('=', 1)
['PowSup', 'PS1(type=Emerson,fw=v.03.05.00)']
You can then use these values to populate your dict:
>>> x = "PowSup=PS1(type=Emerson,fw=v.03.05.00)"
>>> key, value = x.split('=', 1)
>>> out = {}
>>> out[key] = value
>>> out
{'PowSup': 'PS1(type=Emerson,fw=v.03.05.00)'}
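The answers above split at the first '=' only, which relies on the key never containing '=' (UPDATE 2). If you want to literally split only at equals signs outside parentheses, a regular expression with a negative lookahead is one option. This is a sketch assuming, as UPDATE 1 says, that parentheses are never nested: an '=' counts as inside parentheses when a ')' follows it with no '(' in between. The names OUTSIDE_PARENS_EQ and split_outside_parens are made up.

```python
import re

# Matches '=' only when it is NOT followed by ')' with no '(' in between,
# i.e. an '=' outside parentheses. Assumes parentheses are never nested.
OUTSIDE_PARENS_EQ = re.compile(r'=(?![^()]*\))')


def split_outside_parens(s, maxsplit=0):
    """Split s at every '=' that is not inside parentheses."""
    return OUTSIDE_PARENS_EQ.split(s, maxsplit=maxsplit)


line = "PowSup=PS1(type=Emerson,fw=v.03.05.00)"
key, value = split_outside_parens(line, maxsplit=1)
print({key: value})  # {'PowSup': 'PS1(type=Emerson,fw=v.03.05.00)'}
```

With nested parentheses the lookahead gives wrong results, so a small character-by-character parser that tracks depth would be needed in that case.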
This question already has answers here: Remove char at specific index - python (8 answers). Closed 3 years ago.
I am new to Python and I'm currently just messing around a little bit, but I am stuck on this one problem. I am trying to remove certain indexes from a string with a for loop. My idea was something like this:
Text="Gvusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas"
for i in range(0,7):
    Text=Text.replace(Text[i], "")
print(Text)
But it removes only one index at a time and seems to restore the already replaced ones, for example:
1.loop: vusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
2.loop: Gusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
There are certainly many ways to get the desired result. Below is one based on your logic (using a for loop). Since you are replacing a character with an empty string, it is better to remove the character directly.
In the code, I have converted the text into a list for easier handling. Then I remove the characters based on the indexes, and finally a join operation produces the desired text.
text = "Gvusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas"
text_list = list(text)
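for index in range(0, 7):
    # list.remove() deletes the first element equal to text[index]; here that is
    # always the front element, because characters are removed in order from the start
    text_list.remove(text[index])
text = ''.join(text_list)
print(text)  # dsz8audvbsauzdgsavuczisagbcsuzaicbhas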
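Because list.remove() deletes by value rather than by position, the loop above works only because the removed characters happen to sit at the front of the list. A sketch that drops characters strictly by position, for any set of indexes, keeps every character whose index is not in that set (the helper name drop_indexes is made up):

```python
def drop_indexes(text, indexes):
    """Return text without the characters at the given positions."""
    drop = set(indexes)  # set membership checks are fast
    return ''.join(ch for i, ch in enumerate(text) if i not in drop)


text = "Gvusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas"
print(drop_indexes(text, range(0, 7)))  # dsz8audvbsauzdgsavuczisagbcsuzaicbhas
print(drop_indexes("abcdef", [1, 3]))   # acef
```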
If I understand correctly, you are trying to remove the characters of Text at indexes 0, 1, 2, 3, 4, 5 and 6. Your code doesn't do that. In the first iteration, Text[0] is G, and replace removes every occurrence of G from the string (there is only one). In the second iteration, Text is now vusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas, so Text[1] is u; since there are five occurrences of u, Text becomes vsaibdsz8advbsazdgsavczisagbcszaicbhas, and so on.
You can put print(Text) inside the for loop and watch the results:
>>> Text="Gvusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas"
>>> for i in range(0,7):
...     Text=Text.replace(Text[i], "")
...     print(Text)
...
vusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
vsaibdsz8advbsazdgsavczisagbcszaicbhas
vsibdsz8dvbszdgsvczisgbcszicbhs
vsidsz8dvszdgsvczisgcszichs
vidz8dvzdgvczigczich
viz8vzgvczigczich
viz8vzvcziczich
In Python you can do this without a loop; the best way is to use slicing:
Text = Text[7:]
This will give you Text equal to dsz8audvbsauzdgsavuczisagbcsuzaicbhas.
If you need to do this in a loop (for example, because you need the intermediate value of Text in every iteration), you can try this:
>>> Text="Gvusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas"
>>> for i in range(0,7):
...     Text = Text[1:]
...     print(Text)
...
vusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
usaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
saibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
aibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
ibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
bdsz8audvbsauzdgsavuczisagbcsuzaicbhas
dsz8audvbsauzdgsavuczisagbcsuzaicbhas
I hope this will help!
Take a look at string slicing:
text = "Gvusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas"
newtext = text[1:]  # everything from index 1 onwards
print(text)     # Gvusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
print(newtext)  # vusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas
You can replace
Text=Text.replace(Text[i], "")
with
Text = Text[:i] + Text[i+1:]
which uses slicing to rebuild a string without the specific index.
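One caveat when putting that slice inside the original for i in range(0,7) loop: each deletion shifts the remaining characters one place to the left, so the second iteration removes what was originally index 2, not index 1. A minimal sketch that avoids this visits the indexes from highest to lowest, so earlier deletions never move the positions still to be processed:

```python
text = "Gvusaibdsz8audvbsauzdgsavuczisagbcsuzaicbhas"

# Deleting from the highest index down keeps the lower indexes valid.
for i in reversed(range(0, 7)):
    text = text[:i] + text[i + 1:]

print(text)  # dsz8audvbsauzdgsavuczisagbcsuzaicbhas
```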
Side note: by convention (PEP 8), variable names should be lowercase.
I am reading some data from a dataframe column, and I do some manipulation on each value if it contains a "-". These manipulations include splitting on the "-". However, I do not understand why each value in the list contains a "\n", for instance:
['2010\n1', '200\n2 450\n3', ..., '1239\n1000']
Here is a sample of my code:
splited = []
wantedList = []
val = str(x)  # x represents the value read from the dataframe column
print(val)  # the val variable does not contain those special characters
if val.find('-') != -1:
    splited = val.split('-')
    wantedList.append(splited[0])
print(splited)  # splited list contains those special characters
print(wantedList)  # wantedList contains those special characters
I guess this has to do with the way I created the list or the way I am appending to it.
Does anyone know why this happens?
There is nothing in your code that could automagically add a newline character at some random position within your strings. I'd say the characters are already in the string, but print shows them as line breaks rather than as \n.
You can confirm that by printing the representation of the string:
print(repr(val))
If you want them out of your strings, a simple str.replace('\n', '') will remove them all.
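A small illustration of why the newlines are invisible when you print the string but visible when you print the list, and of removing them. The sample value is made up to resemble the output in the question.

```python
val = "2010\n1-1239\n1000"  # hypothetical value with embedded newlines

print(val)        # the \n characters are rendered as real line breaks
print(repr(val))  # '2010\n1-1239\n1000' (the escapes are now visible)

# Printing a list shows the repr of each element, which is why \n appears there.
print(val.split('-'))  # ['2010\n1', '1239\n1000']

cleaned = val.replace('\n', '').split('-')
print(cleaned)  # ['20101', '12391000']
```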
I have this string and I need to get a specific number out of it.
E.g. encrypted = "10134585588147, 3847183463814, 18517461398"
How would I pull only the second integer out of the string?
You are looking for the "split" method. Turn a string into a list by specifying a smaller part of the string on which to split.
>>> encrypted = '10134585588147, 3847183463814, 18517461398'
>>> encrypted_list = encrypted.split(', ')
>>> encrypted_list
['10134585588147', '3847183463814', '18517461398']
>>> encrypted_list[1]
'3847183463814'
>>> encrypted_list[-1]
'18517461398'
Then you can index the list as usual. Note that lists can be indexed forwards or backwards: a negative index counts from the right rather than the left, so [-1] selects the last element without you needing to know how long the list is. Indexing an empty list raises IndexError, though. With Jon's method (below), the list will always contain at least one element unless the string you start with is empty (or contains only commas and whitespace).
Edited to add:
What Jon is pointing out in the comment is that if you are not sure whether the string will be well formatted (e.g., always separated by exactly one comma followed by exactly one space), you can replace all the commas with spaces (encrypted.replace(',', ' ')) and then call split() without arguments, which splits on any run of whitespace characters. As usual, you can chain these together:
encrypted.replace(',', ' ').split()
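Putting the pieces together, a sketch that tolerates irregular spacing and also converts the pieces to Python ints. It assumes every field is a plain decimal integer (int() raises ValueError otherwise); the uneven sample string is made up.

```python
encrypted = "10134585588147, 3847183463814,18517461398"  # note the uneven spacing

# Commas become spaces, then split() splits on any run of whitespace.
numbers = [int(tok) for tok in encrypted.replace(',', ' ').split()]
print(numbers)     # [10134585588147, 3847183463814, 18517461398]
print(numbers[1])  # 3847183463814

# An empty (or all-comma) string yields an empty list, so guard before indexing.
second = numbers[1] if len(numbers) > 1 else None
```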
I am trying to parse a txt file with a lot of lines like this:
470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003
I am making a dictionary where the key is the first number on the line and the values (for each key) are the words separated by slashes "/". Each of these words should be saved into a list: for example, list1 gets every cms_trk_dcs_05:CAEN, list2 every CMS_TRACKER_SY1527_7, etc.
But when I use pattern = re.split(r'\W', line) to split the line, it also splits on the ":" character. That is, when I try to print cms_trk_dcs_05:CAEN, it only returns cms_trk_dcs_05. How can I save the whole word cms_trk_dcs_05:CAEN in the list, and save in my lists all the words separated by slashes?
I am new to Python, so I apologize if this is a beginner question.
Anyway, thank you in advance.
Use split() to split first on the space after the number, and then on '/':
>>> stringin = "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003"
>>> splitstring = stringin.split(' ')
>>> num = splitstring[0]
>>> stringlist = splitstring[1].split('/')
>>> num
'470115572'
>>> stringlist
['cms_trk_dcs_05:CAEN', 'CMS_TRACKER_SY1527_7', 'branchController00', 'easyCrate3', 'easyBoard16', 'channel003']
>>>
Or as a (less obvious) one-liner:
>>> [x.split('/') for x in stringin.split(' ')]
[['470115572'], ['cms_trk_dcs_05:CAEN', 'CMS_TRACKER_SY1527_7', 'branchController00', 'easyCrate3', 'easyBoard16', 'channel003']]
Note, though, that the second approach also wraps the number in a single-element list.
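The question also asks for one list per slash-separated position (list1 collecting every first word, list2 every second word, and so on). A sketch assuming every line has the same number of /-separated parts: split each line once on whitespace, keep a dict keyed by the number, then transpose the word lists with zip. The second sample line is hypothetical.

```python
lines = [
    "470115572 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel003\n",
    "470115573 cms_trk_dcs_05:CAEN/CMS_TRACKER_SY1527_7/branchController00/easyCrate3/easyBoard16/channel004\n",
]

by_number = {}
for line in lines:
    num, path = line.split(None, 1)  # split once, on the first run of whitespace
    by_number[num] = path.strip().split('/')  # strip() drops the trailing newline

# zip(*rows) regroups rows into columns: columns[0] holds every first word, etc.
columns = [list(col) for col in zip(*by_number.values())]
print(columns[0])  # ['cms_trk_dcs_05:CAEN', 'cms_trk_dcs_05:CAEN']
print(columns[5])  # ['channel003', 'channel004']
```

zip stops at the shortest row, so lines with fewer parts would silently truncate the columns.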
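As Trimax pointed out in a comment, the colon (:) is a non-word character, so to split the line correctly you need to account for it in the pattern. Alternatively, use SiHa's answer.
About the pattern: in Python 2 (without the UNICODE flag), \W is equivalent to [^a-zA-Z0-9_] (https://docs.python.org/2/library/re.html#regular-expression-syntax), so you can just add the colon to it: [^a-zA-Z0-9_:]. In Python 3, \W is Unicode-aware for str patterns unless re.ASCII is set, but the explicit character class behaves the same in both versions.
As for the second part, use the first element of the resulting list as the dict key and assign the rest of the list to it as a slice.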
Something like this:
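import re

result_dict = {}
for line in file_lines:  # file_lines: the lines of your text file
    # strip() first, otherwise the trailing newline produces an empty last element
    line_splitted = re.split('[^a-zA-Z0-9_:]+', line.strip())
    result_dict[line_splitted[0]] = line_splitted[1:]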
Note, though, that if your text contains several lines with the same number, you'll lose data: assigning a new value (a list of words, in this case) to an existing key overwrites the previous value.
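If duplicate numbers are possible, one way to keep every line instead of overwriting is to map each number to a list of word lists with collections.defaultdict. This is a sketch; the sample lines are hypothetical and shortened.

```python
import re
from collections import defaultdict

file_lines = [
    "470115572 cms_trk_dcs_05:CAEN/easyBoard16/channel003\n",
    "470115572 cms_trk_dcs_05:CAEN/easyBoard16/channel004\n",  # same number again
]

result_dict = defaultdict(list)
for line in file_lines:
    parts = re.split(r'[^a-zA-Z0-9_:]+', line.strip())
    # append instead of assigning, so an earlier line with the same key survives
    result_dict[parts[0]].append(parts[1:])

print(result_dict['470115572'])
```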
Is it possible to replace a single occurrence of a character that appears many times in a string?
Input:
Sentence = "This is an Example. Thxs code is not what I'm having problems with."  # example input
Desired output:
Sentence = "This is an Example. This code is not what I'm having problems with."
Replace the 'x' in "Thxs" with an 'i', without replacing the x in "Example".
You can do it by including some context:
s = s.replace("Thxs", "This")
Alternatively you can keep a list of words that you don't wish to replace:
import re

whitelist = {'example', 'explanation'}  # words to leave untouched (lowercase)

def replace_except_whitelist(m):
    word = m.group()
    if word.lower() in whitelist:  # case-insensitive, so "Example" is kept too
        return word
    return word.replace('x', 'i')

s = 'Thxs example'
result = re.sub(r"\w+", replace_except_whitelist, s)
print(result)
Output:
This example
Sure, but you essentially have to build up a new string out of the parts you want:
>>> s = "This is an Example. Thxs code is not what I'm having problems with."
>>> s[22]
'x'
>>> s[:22] + "i" + s[23:]
"This is an Example. This code is not what I'm having problems with."
For information about the notation used here, see "Good primer for Python slice notation".
If you know whether you want to replace the first, second, third, or last occurrence of x, you can combine str.find (or str.rfind to search from the end of the string) with slicing and str.replace. Call find as many times as needed to get a position just past the occurrences you want to skip (for your sentence, just once), then slice the string in two at that position and replace only one occurrence in the second slice.
An example is worth a thousand words, or so they say. In the following, I assume you want to substitute the (n+1)th occurrence of the character.
>>> s = "This is an Example. Thxs code is not what I'm having problems with."
>>> n = 1
>>> pos = 0
>>> for i in range(n):
...     pos = s.find('x', pos) + 1
...
>>> s[:pos] + s[pos:].replace('x', 'i', 1)
"This is an Example. This code is not what I'm having problems with."
Note that you need to add an offset to pos, otherwise you will replace the occurrence of x you have just found.
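One pitfall in the loop above: if the string has fewer occurrences than requested, str.find returns -1, pos becomes 0, and the search silently starts over. A sketch of a reusable helper (the name replace_nth is made up) that counts occurrences from 1 and returns the string unchanged when there is no such occurrence:

```python
def replace_nth(s, old, new, n):
    """Replace only the n-th occurrence (1-based) of old in s."""
    pos = -1
    for _ in range(n):
        pos = s.find(old, pos + 1)  # search just past the previous match
        if pos == -1:
            return s  # fewer than n occurrences, so nothing to replace
    return s[:pos] + new + s[pos + len(old):]


s = "This is an Example. Thxs code is not what I'm having problems with."
print(replace_nth(s, 'x', 'i', 2))
# This is an Example. This code is not what I'm having problems with.
print(replace_nth(s, 'x', 'i', 5) == s)  # True
```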