I am trying to increment a version number using regex but I can't seem to get the hang of regex at all. I'm having trouple with the symbols in the string I am trying to read and change. The code I have so far is:
version_file = "AssemblyInfo.cs"
read_file = open(version_file).readlines()
write_file = open(version_file, "w")
r = re.compile(r'(AssemblyFileVersion\s*(\s*"\s*)(\S+))\s*"\s*')
for l in read_file:
m1 = r.match(l)
if m1:
VERSION_ID=map(int,m1.group(2).split("."))
VERSION_ID[2]+=1 # increment version
l = r.sub(r'\g<1>' + '.'.join(['%s' % (v) for v in VERSION_ID]), l)
write_file.write(l)
write_file.close()
The string I am trying to read and change is:
[assembly: AssemblyFileVersion("1.0.0.0")]
What I would like written to the file is:
[assembly: AssemblyFileVersion("1.0.0.1")]
So basically I want to increment the build number by one.
Can anyone help me fix my regualr expression. I seem to have trouble getting to grips with regular expression that have to get around symbols.
Thanks for any help.
If you specify the version as "1.0.0.*" then AFAIK it gets updated on each build automagically, at least if you're using Visual Studio.NET.
I'm not sure regex is your best bet, but one way of doing it would be this:
import re
# Don't bother matching everything, just the bits that matter.
pat = re.compile(r'AssemblyFileVersion.*\.(\d+)"')
# ... lines omitted which set up read_file, write_file etc.
for line in read_file:
m = pat.search(line)
if m:
start, end = m.span(1)
line = line[:start] + str(int(line[start:end]) + 1) + line[end:]
write_file.write(line)
Good luck with regex.
If I had to do the same, I'd convert the string to int by removing the dots, add one and convert back to string.
Well, I'd have also used a integer version number in the first place.
Related
I'm trying to get all the substrings under a "customLabel" tag, for example "Month" inside of ...,"customLabel":"Month"},"schema":"metric...
Unusual issue: this is a 1071552 characters long ndjson file, of a single line ("for line in file:" is pointless since there's only one).
The best I found was that:
How to find a substring of text with a known starting point but unknown ending point in python
but if I use this, the result obviously doesn't stop (at Month) and keeps writing the whole remaining of the file, same as if using partition()[2].
Just know that Month is only an example, customLabel has about 300 variants and they are not listed (I'm actually doing this to list them...)
To give some details here's my script so far:
with open("file.ndjson","rt", encoding='utf-8') as ndjson:
filedata = ndjson.read()
x="customLabel"
count=filedata.count(x)
for i in range (count):
if filedata.find(x)>0:
print("Found "+str(i+1))
So right now it properly tells me how many occurences of customLabel there are, I'd like to get the substring that comes after customLabel":" instead (Month in the example) to put them all in a list, to locate them way more easily and enable the use of replace() for traductions later on.
I'd guess regex are the solution but I'm pretty new to that, so I'll post that question by the time I learn about them...
If you want to search for all (even nested) customLabel values like this:
{"customLabel":"Month" , "otherJson" : {"customLabel" : 23525235}}
you can use RegEx patterns with the re module
import re
label_values = []
regex_pattern = r"\"customLabel\"[ ]?:[ ]?([1-9a-zA-z\"]+)"
with open("file.ndjson", "rt", encoding="utf-8") as ndjson:
for line in ndjson:
values = re.findall(regex_pattern, line)
label_values.extend(values)
print(label_values) # ['"Month"', '23525235']
# If you don't want the items to have quotations
label_values = [i.replace('"', "") for i in label_values]
print(label_values) # ['Month', '23525235']
Note: If you're only talking about ndjson files and not nested searching, then it'd be better to use the json module to parse the lines and then easily get the value of your specific key which is customLabel.
import json
label = "customLabel"
label_values = []
with open("file.ndjson", "rt", encoding="utf-8") as ndjson:
for line in ndjson:
line_json = json.loads(line)
if line_json.get(label) is not None:
label_values.append(line_json.get(label))
print(label_values) # ['Month']
I have a large file with several lines as given below.I want to read in only those lines which have the _INIT pattern in them and then strip off the _INIT from the name and only save the OSD_MODE_15_H part in a variable. Then I need to read the corresponding hex value, 8'h00 in this case, ans strip off the 8'h from it and replace it with a 0x and save in a variable.
I have been trying strip the off the _INIT,the spaces and the = and the code is becoming really messy.
localparam OSD_MODE_15_H_ADDR = 16'h038d;
localparam OSD_MODE_15_H_INIT = 8'h00
Can you suggest a lean and clean method to do this?
Thanks!
The following solution uses a regular expression (compiled to speed searching up) to match the relevant lines and extract the needed information. The expression uses named groups "id" and "hexValue" to identify the data we want to extract from the matching line.
import re
expression = "(?P<id>\w+?)_INIT\s*?=.*?'h(?P<hexValue>[0-9a-fA-F]*)"
regex = re.compile(expression)
def getIdAndValueFromInitLine(line):
mm = regex.search(line)
if mm == None:
return None # Not the ..._INIT parameter or line was empty or other mismatch happened
else:
return (mm.groupdict()["id"], "0x" + mm.groupdict()["hexValue"])
EDIT: If I understood the next task correctly, you need to find the hexvalues of those INIT and ADDR lines whose IDs match and make a dictionary of the INIT hexvalue to the ADDR hexvalue.
regex = "(?P<init_id>\w+?)_INIT\s*?=.*?'h(?P<initValue>[0-9a-fA-F]*)"
init_dict = {}
for x in re.findall(regex, lines):
init_dict[x.groupdict()["init_id"]] = "0x" + x.groupdict()["initValue"]
regex = "(?P<addr_id>\w+?)_ADDR\s*?=.*?'h(?P<addrValue>[0-9a-fA-F]*)"
addr_dict = {}
for y in re.findall(regex, lines):
addr_dict[y.groupdict()["addr_id"]] = "0x" + y.groupdict()["addrValue"]
init_to_addr_hexvalue_dict = {init_dict[x] : addr_dict[x] for x in init_dict.keys() if x in addr_dict}
Even if this is not what you actually need, having init and addr dictionaries might help to achieve your goal easier. If there are several _INIT (or _ADDR) lines with the same ID and different hexvalues then the above dict approach will not work in a straight forward way.
try something like this- not sure what all your requirements are but this should get you close:
with open(someFile, 'r') as infile:
for line in infile:
if '_INIT' in line:
apostropheIndex = line.find("'h")
clean_hex = '0x' + line[apostropheIndex + 2:]
In the case of "16'h038d;", clean_hex would be "0x038d;" (need to remove the ";" somehow) and in the case of "8'h00", clean_hex would be "0x00"
Edit: if you want to guard against characters like ";" you could do this and test if a character is alphanumeric:
clean_hex = '0x' + ''.join([s for s in line[apostropheIndex + 2:] if s.isalnum()])
You can use a regular expression and the re.findall() function. For example, to generate a list of tuples with the data you want just try:
import re
lines = open("your_file").read()
regex = "([\w]+?)_INIT\s*=\s*\d+'h([\da-fA-F]*)"
res = [(x[0], "0x"+x[1]) for x in re.findall(regex, lines)]
print res
The regular expression is very specific for your input example. If the other lines in the file are slightly different you may need to change it a bit.
I have a log file and at the end of each line in the file there is this string:
Line:# where # is the line number.
I am trying to get the # and compare it to the previous line's number. what would be the best way to do that in python?
I would probably use str.split because it seems easy:
with open('logfile.log') as fin:
numbers = [ int(line.split(':')[-1]) for line in fin ]
Now you can use zip to compare one number with the next one:
for num1,num2 in zip(numbers,numbers[1:]):
compare(num1,num2) #do comparison here.
Of course, this isn't lazy (you store every line number in the file at once when you really only need 2 at a time), so it might take up a lot of memory if your files are HUGE. It wouldn't be hard to make it lazy though:
def elem_with_next(iterable):
ii = iter(iterable)
prev = next(ii)
for here in ii:
yield prev,here
prev = here
with open('logfile.log') as fin:
numbers = ( int(line.split(':')[-1]) for line in fin )
for num1,num2 in elem_with_next(numbers):
compare(num1,num2)
I'm assuming that you don't have something convenient to split a string on, meaning a regular expression might make more sense. That is, if the lines in your log file are structured like:
date: 1-15-2013, error: mildly_annoying, line: 121
date: 1-16-2013, error: err_something_bad, line: 123
Then you won't be able to use line.split('#') as mgilson as suggested, although if there is always a colon, line.split(':') might work. In any case, a regular expression solution would look like:
import re
numbers = []
for line in log:
digit_match = re.search("(\d+)$", line)
if digit_match is not None:
numbers.append(int(digit_match.group(1)))
Here the expression "(\d+)$" is matching some number of digits and then the end of the line. We extract the digits with the group(1) method on the returned match object and then add them to our list of line numbers.
If you're not confident that the "Line: #" will always come at the end of the log, you could replace the regular expression used above with something akin to "Line:\s*(\d+)" which checks for the string "Line:" then some (or no) whitespace, and then any number of digits.
Program Details:
I am writing a program for python that will need to look through a text file for the line:
Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.
Problem:
Then after the program has found that line, it will then store the line into an array and get the value 19.612545, from f = 19.612545.
Question:
I so far have been able to store the line into an array after I have found it. However I am having trouble as to what to use after I have stored the string to search through the string, and then extract the information from variable f. Does anyone have any suggestions or tips on how to possibly accomplish this?
Depending upon how you want to go at it, CosmicComputer is right to refer you to Regular Expressions. If your syntax is this simple, you could always do something like:
line = 'Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.'
splitByComma=line.split(',')
fValue = splitByComma[1].replace('f= ', '').strip()
print(fValue)
Results in 19.612545 being printed (still a string though).
Split your line by commas, grab the 2nd chunk, and break out the f value. Error checking and conversions left up to you!
Using regular expressions here is maddness. Just use string.find as follows: (where string is the name of the variable the holds your string)
index = string.find('f=')
index = index + 2 //skip over = and space
string = string[index:] //cuts things that you don't need
string = string.split(',') //splits the remaining string delimited by comma
your_value = string[0] //extracts the first field
I know its ugly, but its nothing compared with RE.
I am trying to use regular expression to search a document fo a UUID number and replace the end of it with a new number. The code I have so far is:
read_file = open('test.txt', 'r+')
write_file = open('test.txt', 'w')
r = re.compile(r'(self.uid\s*=\s*5EFF837F-EFC2-4c32-A3D4\s*)(\S+)')
for l in read_file:
m1 = r.match(l)
if m1:
new=(str,m1.group(2))
new??????
This where I get stuck.
The file test.txt has the below UUID stored in it:
self.uid = '5EFF837F-EFC2-4c32-A3D4-D15C7F9E1F22'
I want to replace the part D15C7F9E1F22.
I have also tried this:
r = re.compile(r'(self.uid\s*=\s*)(\S+)')
for l in fp:
m1 = r.match(l)
new=map(int,m1.group(2).split("-")
new[4]='RHUI5345JO'
But I cannot seem to match the string.
Thanks in advance for any help.
Why are you using a regex for such a straight forward substitution?
Couldn't you just use
for l in read_file:
l.replace("5EFF837F-EFC2-4c32-A3D4-D15C7F9E1F22",
"5EFF837F-EFC2-4c32-A3D4-RHUI5345JO")
# Write to file..
or is there more to the story than you're telling us? Also, unless it is a too big of a file, I would recommend reading the whole file into a string and doing just one replace on it for the sake of speed.
I think your regular expression is off:
r = re.compile(r'(self.uid\s*=\s*5EFF837F-EFC2-4c32-A3D4\s*)(\S+)')
Should be:
r = re.compile(r"(self\.uid\s*=\s*'5EFF837F-EFC2-4c32-A3D4-)([^']*)'")
Then, when you have a match, grab group 1 and assign it to a variable and append your replacement string to it.
The ([^']*) group will search for any character up to the ' mark. That's your target remove group.
Edit: June 11th, 2010 # 2:27 EST: Justin Peel has a good point. You can do a straight search and replace with this data. Unless you are looking for the pattern of 8 characters, followed by 4, 4, 4, and 12... In which case you could use the pattern:
r = re.compile(r"self\.uid\s*=\s*('\w{8}-(:?\w{4}-){3})(\w{12})'")