I wrote a program to read a registry entry from a file.
And the entry looks like this:
reg='HKEY_LOCAL_MACHINE\SOFTWARE\TT\Tools\SYS\exePath' #it means rootKey=HKEY_LOCAL_MACHINE, subKey='SOFTWARE\TT\Tools\SYS', property=exePath
I want to read this entry from the file and break it into rootKey, subKey and property.
Apparently, I can do it this way:
rootKey = reg.split('\\', 1)[0]
subKey = reg.split('\\', 1)[1].rsplit('\\', 1)[0] #might be a stupid way
property = reg.rsplit('\\', 1)[1]
Maybe the entry is a stupid one, but is there a better way to break it into parts like the above?
import re
t = re.search(r"(.+?)\\(.+)\\(.+)", reg)  # lazy first group stops at the first backslash, greedy middle group runs to the last one
t.groups()
('HKEY_LOCAL_MACHINE', 'SOFTWARE\\TT\\Tools\\SYS', 'exePath')
How about doing the following? There's no need to call .split() so many times, anyway...
s = reg.split('\\')
property = s.pop()
root_key = s.pop(0)
sub_key = '\\'.join(s)
I like to use partition over split when I can, because partition always returns a 3-tuple, so each of the returned elements is guaranteed to be a string even when the separator is missing.
root_key, _, s = reg.partition("\\")
sub_key, _, property = s.rpartition("\\") # note, _r_partition
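For illustration, a quick sketch of that behavior with the same reg string (the names rest and prop are just placeholders):

reg = 'HKEY_LOCAL_MACHINE\\SOFTWARE\\TT\\Tools\\SYS\\exePath'

root_key, _, rest = reg.partition('\\')
sub_key, _, prop = rest.rpartition('\\')
print(root_key)   # HKEY_LOCAL_MACHINE
print(sub_key)    # SOFTWARE\TT\Tools\SYS
print(prop)       # exePath

# partition still returns a 3-tuple of strings when the separator is missing,
# so the unpacking above never raises:
print('no_backslash_here'.partition('\\'))   # ('no_backslash_here', '', '')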
My first string:
xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv
But I want the result to look like this:
bonding_err_bond0-if_eth2
I tried some code, but it doesn't seem to work correctly:
csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"
x = csv.rsplit('.', 4)[2]
print(x)
But the result I get is com-bonding_err_bond0-if_eth2-d, while what I want is bonding_err_bond0-if_eth2.
If you are allowed to use a solution other than regex:
You can break the solution into smaller parts to understand it better, and learn about join if you are not aware of it. It will come in handy.
solution= '-'.join(csv.split('.', 4)[2].split('-')[1:3])
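As a step-by-step sketch of that one-liner (same csv string as above; the intermediate names are just for illustration):

middle = csv.split('.', 4)[2]    # 'com-bonding_err_bond0-if_eth2-d'
pieces = middle.split('-')       # ['com', 'bonding_err_bond0', 'if_eth2', 'd']
wanted = pieces[1:3]             # ['bonding_err_bond0', 'if_eth2']
solution = '-'.join(wanted)      # 'bonding_err_bond0-if_eth2'
print(solution)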
Thanks,
Shashank
Probably you've already got the answer, but if you want a generic method for any string data, you can do the following.
This way you won't be restricted to one string, and you can loop over the data as well.
csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"
first_index = csv.find("-")
second_index = csv.find("-d")
result = csv[first_index+1:second_index]
print(result)
# OUTPUT:
# bonding_err_bond0-if_eth2
You can just separate the string with -, remove the beginning and end, and then join them back into a string.
csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"
x = '-'.join(csv.split('-')[1:-1])
Output
>>> x
'bonding_err_bond0-if_eth2'
I have a CSV file that contains a header row followed by a potentially unlimited number of rows with values. For example:
FieldA,FieldB,FieldC,FieldD
1,asdf,2,ghjk
3,qwer,4,yuio
5,slslkd,,aldkjslkj
What I need to do is for each row, create a quasi-XML string where the elements are labeled as the column name and information within each element is the value of the cell. Using the above as an example, if I iterate through each of the three rows I would end up with these three strings:
<FieldA>1</FieldA><FieldB>asdf</FieldB><FieldC>2</FieldC><FieldD>ghjk</FieldD>
<FieldA>3</FieldA><FieldB>qwer</FieldB><FieldC>4</FieldC><FieldD>yuio</FieldD>
<FieldA>5</FieldA><FieldB>slslkd</FieldB><FieldD>aldkjslkj</FieldD>
The way I am currently doing it is:
for row in r:
    if row['FieldA']:
        fielda = '<FieldA>{0}</FieldA>'.format(row['FieldA'])
    else:
        fielda = ''
    if row['FieldB']:
        fieldb = '<FieldB>{0}</FieldB>'.format(row['FieldB'])
    else:
        fieldb = ''
    if row['FieldC']:
        fieldc = '<FieldC>{0}</FieldC>'.format(row['FieldC'])
    else:
        fieldc = ''
    if row['FieldD']:
        fieldd = '<FieldD>{0}</FieldD>'.format(row['FieldD'])
    else:
        fieldd = ''
    # Compile the string
    final_string = fielda + fieldb + fieldc + fieldd
    # Process further
    do_something(final_string)
As it iterates through each row, this creates the appropriate string and then I can pass it on for further processing.
Is there a better way to achieve what I want, or is my approach the best way? My guess is there is a better, more Pythonic, and more efficient way, but I'm new-ish to Python.
Thanks.
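One hedged sketch of a more compact version of that loop, assuming r is the same csv.DictReader and do_something is your existing function; note that, unlike the ElementTree answers below, plain string formatting does no XML escaping:

fields = ['FieldA', 'FieldB', 'FieldC', 'FieldD']
for row in r:
    # keep only the non-empty cells, each wrapped in <ColumnName>value</ColumnName>
    final_string = ''.join('<{0}>{1}</{0}>'.format(name, row[name])
                           for name in fields if row[name])
    do_something(final_string)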
Slightly modified code that fixed the issue I was having. Turned out to be pretty trivial:
with open(csv_file) as f:
    for row in csv.DictReader(f):
        top = Element('event')
        for k, v in row.items():
            child = SubElement(top, k)
            child.text = v
        print tostring(top)
Thanks for the help!
Python is Batteries Included.
In this case, you can use the csv module and the xml module, with code that looks like this:
# CSV module
import csv
# Stuff from the XML module
from xml.etree.ElementTree import Element, SubElement, tostring
# Topmost XML element
top = Element('top')
# Open a file
with open('stuff.csv') as csvfile:
    # And use a dictionary-reader
    for d in csv.DictReader(csvfile):
        # For each mapping in the dictionary
        for (k, v) in d.iteritems():
            # Create an XML node
            child = SubElement(top, k)
            child.text = v

print tostring(top)
'Top' is just the highest level node -- you could use whatever text you want to wrap the whole document.
You can pretty-print it pretty simply as well:
http://pymotw.com/2/xml/etree/ElementTree/create.html#pretty-printing-xml
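A minimal sketch of that pretty-printing approach, re-serializing through minidom (the element name and text here are made up):

from xml.dom import minidom
from xml.etree.ElementTree import Element, SubElement, tostring

top = Element('top')
child = SubElement(top, 'FieldA')
child.text = '1'

# Parse the compact serialization and ask minidom for an indented version.
print(minidom.parseString(tostring(top)).toprettyxml(indent="  "))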
I am writing a Python script that will ask for a file and a name (e.g. "John").
The file contains a whole bunch of lines like this:
...
Name=John
Age=30
Pay=1000
Married=1
Name=Bob
Age=25
Pay=500
Married=0
Name=John
Age=56
Pay=3000
Married=1
...
I want to open this file, ask the user for a name, and replace the pay value for all entries that match that name. So, for example, the user inputs "John", I want to change the Pay for all "John"s to be, say, 5000. The Pay value for other names don't change.
So far, I've opened up the file and concatenated everything into one long string to make things a bit easier:
for line in file:
    file_string += line
At first, I was thinking about some sort of string replace, but that didn't pan out: I would be searching for "John", yet I don't want to replace the "John" itself, but rather the Pay value that is two lines down.
I started using regex instead and came up with something like this.
# non-greedy matching
re.findall("Name=(.*?)\nAge=(.*?)\nPay=(.*?)\n", file_string, re.S)
Okay, so that spits out a list of 3-tuples of those groupings and it does seem to find everything fine. Now, to do the actual replacement...
I read on another question here on StackOverflow that I can set the name of a grouping and use that grouping later on...:
re.sub(r'Name=(.*?)\nAge=(.*?)\nPay=', r'5000', file_string, re.S)
I tried that to see if it would work and replace all Names with 5000, but it didn't. If it would then I would probably do a check on the first group to see if it matched the user-inputed name or something.
The other problem is that I read on the Python docs that re.sub only replaces the left-most occurrence. I want to replace all occurrences. How do I do that?
Now I am a bit at a loss for what to do, so if anyone can help me, that would be great!
I don't think that regex is the best solution to this problem. I prefer more general solutions. The other answers depend on one or more of the following things:
There are always 4 properties for a person.
Every person has the same properties.
The properties are always in the same order.
If these are true in your case, then regex could be ok.
My solution is more verbose, but it doesn't depend on these. It handles mixed/missing properties and mixed order, and it is able to set and get any property value. You could even extend it a little to support inserting a new property or person if you need to, as sketched after the demo below.
My code:
# i omitted "data = your string" here
def data_value(person_name, prop_name, new_value=None):
    global data
    start_person = data.find("Name=" + person_name + "\n")
    while start_person != -1:
        end_person = data.find("Name=", start_person + 1)
        start_value = data.find(prop_name + "=", start_person, end_person)
        if start_value != -1:
            start_value += len(prop_name) + 1
            end_value = data.find("\n", start_value, end_person)
            if new_value == None:
                return data[start_value:end_value]
            else:
                data = data[:start_value] + str(new_value) + data[end_value:]
        start_person = data.find("Name=" + person_name + "\n", end_person)
    return None
print data_value("Mark", "Pay") # Output: None (missing person)
print data_value("Bob", "Weight") # Output: None (missing property)
print data_value("Bob", "Pay") # Output: "500" (current value)
data_value("Bob", "Pay", 1234) # (change it)
print data_value("Bob", "Pay") # Output: "1234" (new value)
data_value("John", "Pay", 555) # (change it in both Johns)
Iterate four lines at a time. If the first line contains 'John', edit the line that comes two after it.
data = """
Name=John
Age=30
Pay=1000
Married=1
Name=Bob
Age=25
Pay=500
Married=0
Name=John
Age=56
Pay=3000
Married=1
"""
lines = data.split()
for i, value in enumerate(zip(*[iter(lines)]*4)):
    if 'John' in value[0]:
        lines[i*4 + 2] = "Pay=5000"
print '\n'.join(lines)
The following code will do what you need:
import re
text = """
Name=John
Age=30
Pay=1000
Married=1
Name=Bob
Age=25
Pay=500
Married=0
Name=John
Age=56
Pay=3000
Married=1
"""
# the name you're looking for
name = "John"
# the new payment
pay = 500
print re.sub(r'Name={0}\nAge=(.+?)\nPay=(.+?)\n'.format(re.escape(name)),
             r'Name={0}\nAge=\1\nPay={1}\n'.format(name, pay),
             text)
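On the named-group part of the question: re.sub already replaces every non-overlapping match unless you pass a count, and a group defined with (?P<header>...) can be referenced in the replacement as \g<header>. A hedged variant of the same substitution (header is just an illustrative group name):

pattern = r"(?P<header>Name={0}\nAge=.*?\n)Pay=.*?\n".format(re.escape(name))
print(re.sub(pattern, r"\g<header>Pay={0}\n".format(pay), text))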
for line in f.readlines():
    (addr, vlanid, videoid, reqs, area) = line.split()
    if vlanid not in dict:
        dict[vlanid] = []
    video_dict = dict[vlanid]
    if videoid not in video_dict:
        video_dict[videoid] = []
    video_dict[videoid].append((addr, vlanid, videoid, reqs, area))
Here is my code. I want to use videoid as an index to create a list; the real videoid values are strings like this: FYFSYJDHSJ
I got this error message:
video_dict[videoid] = []
TypeError: list indices must be integers, not str
But now, how do I add an identifier like 1, 2, 3, 4 for the different strings in this case?
Use a dictionary instead of a list:
if vlanid not in dict:
    dict[vlanid] = {}
P.S. I recommend that you call dict something else so that it doesn't shadow the built-in dict.
Don't use dict as a variable name. Try this (d instead of dict):
d = {}
for line in f.readlines():
    (addr, vlanid, videoid, reqs, area) = line.split()
    video_dict = d.setdefault(vlanid, {})
    video_dict.setdefault(videoid, []).append((addr, vlanid, videoid, reqs, area))
As suggested above, creating dictionaries would be the most ideal approach. (Although you should avoid calling them dict, as that name means something important to Python.)
Your code may look something like what #aix had already posted above:
for line in f.readlines():
    d = dict(zip(("addr", "vlanid", "videoid", "reqs", "area"), tuple(line.split())))
You would be able to do something with the dictionary d later in your code. Just remember: d is rebuilt on every iteration, so if you don't use it until after the loop is complete, you'll only have the values from the last line of the file.
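If you do need every line afterwards, one hedged option is to collect the per-line dictionaries in a list (records is an assumed name, and f is the already-open file from the snippet above):

records = []
for line in f.readlines():
    d = dict(zip(("addr", "vlanid", "videoid", "reqs", "area"), line.split()))
    records.append(d)
# every line is now kept, e.g. records[0]["videoid"] or records[-1]["area"]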
Going to re-word the question.
Basically I'm wondering what is the easiest way to manipulate a string formatted like this:
Safety/Report/Image/489
or
Safety/Report/Image/490
And sectioning off each word separated by a slash (/), and storing each section (token) into a store so I can call it later. (I'm reading in about 1200 cells from a CSV file.)
The answer to your question:
>>> mystring = "Safety/Report/Image/489"
>>> mystore = mystring.split('/')
>>> mystore
['Safety', 'Report', 'Image', '489']
>>> mystore[2]
'Image'
>>>
If you want to store data from more than one string, then you have several options depending on how you want to organize it. For example:
liststring = ["Safety/Report/Image/489",
              "Safety/Report/Image/490",
              "Safety/Report/Image/491"]
dictstore = {}
for line, string in enumerate(liststring):
    dictstore[line] = string.split('/')
print dictstore[1][3]
print dictstore[2][3]
prints:
490
491
In this case you can use a dictionary or a list (a list of lists) for storage in the same way. If each string has a special identifier (one better than the line number), then the dictionary is the option to choose, as in the sketch below.
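For example, a small sketch of keying the dictionary by such an identifier, reusing liststring from above and assuming the trailing number of each string is unique:

dictstore = {}
for string in liststring:
    parts = string.split('/')
    dictstore[parts[-1]] = parts   # key on the trailing number, e.g. '490'

print(dictstore['490'][2])         # prints: Image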
I don't quite understand your code and don't have too much time to study it, but I thought that the following might be helpful, at least if order isn't important ...
in_strings = ['Safety/Report/Image/489',
              'Safety/Report/Image/490',
              'Other/Misc/Text/500']
out_dict = {}
for in_str in in_strings:
    level1, level2, level3, level4 = in_str.split('/')
    out_dict.setdefault(level1, {}).setdefault(
        level2, {}).setdefault(
            level3, []).append(level4)
print out_dict
{'Other': {'Misc': {'Text': ['500']}}, 'Safety': {'Report': {'Image': ['489', '490']}}}
If your csv is line separated:
# do something to load the csv
split_lines = [x.strip() for x in csv_data.split('\n')]
for line_data in split_lines:
    split_parts = [x.strip() for x in line_data.split('/')]
    # do something with individual part data
    # such as some_variable = split_parts[1] etc
    # if using indexes, I'd be sure to catch for index errors in case you
    # try to go to index 3 of something with only 2 parts
Check out the Python csv module for some help with importing (I'm not too familiar with it); there's a sketch below.
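For the importing part, a hedged sketch with the standard csv module; the filename 'cells.csv' and the assumption that each cell holds one slash-separated string are mine:

import csv

with open('cells.csv') as f:
    for row in csv.reader(f):
        for cell in row:                      # e.g. 'Safety/Report/Image/489'
            tokens = cell.strip().split('/')  # ['Safety', 'Report', 'Image', '489']
            # store tokens wherever you keep the sections for later use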