Python modify text file by the name of arguments - python

I have a text file ("input.param"), which serves as an input file for a package. I need to modify the value of one argument. The lines that need to be changed are the following:
param1 0.01
model_name run_param1
I need to search for the argument param1 and change its value of 0.01 to each value in a range, and the model_name must change accordingly for each value of param1. For example, if param1 is changed to 0.03, then the model_name becomes 'run_param1_p03'. Below is my attempt:
import numpy as np
import os

param1_range = np.arange(0.01, 0.5, 0.01)
with open('input.param', 'r') as file:
    filedata = file.read()

for p_value in param1_range:
    filedata.replace('param1 0.01', 'param1 ' + str(p_value))
    filedata.replace('model_name run_param1', 'model_name run_param1' + '_p0' + str(int(round(p_value*100))))
    with open('input.param', 'w') as file:
        file.write(filedata)
    os.system('./bin/run_app param/input.param')
However, this is not working. I guess the main problem is that the replace command cannot recognize the space. But I do not know how to search for the argument param1 or model_name and change their values.

I'm editing this answer to more accurately answer the original question, which it did not adequately do.
The problem is that "the replace command can not recognize the space". To handle this, the re (regex) module can be of great help. Your document is composed of an entry and its value, separated by spaces:
param1 0.01
model_name run_param1
In regex, a general capture would look like so:
import re
someline = 'param1 0.01'
pattern = re.match(r'^(\S+)\s+(\S+)$', someline)
pattern.groups()
# ('param1', '0.01')
The regex functions as follows:
^ matches the start of a line
\S is any non-whitespace char, i.e. anything not in ('\t', ' ', '\r', '\n')
+ indicates one or more, as a greedy search (it will go forward until the pattern stops matching)
\s+ is one or more whitespace chars (\s is the opposite of \S; note the case here)
() indicate groups, or how you want to group your search
The groups make it fairly easy for you to unpack your arguments into variables if you so choose. For instance:
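key, value = pattern.groups()
# key == 'param1', value == '0.01'
To apply this to the code you have already: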
import numpy as np
import re

param1_range = np.arange(0.01, 0.5, 0.01)

filedata = []
with open('input.param', 'r') as file:
    # This will put the lines in a list
    # so you can use ^ and $ in the regex
    for line in file:
        filedata.append(line.strip())  # get rid of trailing newlines

# filedata now looks like:
# ['param1 0.01', 'model_name run_param1']

# It might be easier to use a dictionary to keep all of your param vals
# since you aren't changing the names, just the values
groups = [re.match(r'^(\S+)\s+(\S+)$', x).groups() for x in filedata]

# Now you have a list of tuples which can be fed to dict()
my_params = dict(groups)
# {'param1': '0.01', 'model_name': 'run_param1'}

# Now just use that dict for setting your params
for p_value in param1_range:
    my_params['param1'] = str(p_value)
    my_params['model_name'] = 'run_param1_p0' + str(int(round(p_value*100)))

# And for the formatting back into the file, you can do some quick padding
# to get the format you want
with open('somefile.param', 'w') as fh:
    content = '\n'.join([k.ljust(20) + v.rjust(20) for k, v in my_params.items()])
    fh.write(content)
The padding is done using the str.ljust and str.rjust methods, so you get a format that looks like this:
for k, v in dict(groups).items():
    intstr = k.ljust(20) + v.rjust(20)
    print(intstr)
param1                              0.01
model_name                    run_param1
Though you could arguably leave out the rjust if you felt so inclined.
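To tie this back to your original loop, here is a minimal sketch (assuming the my_params dict built above and the same ./bin/run_app invocation from your question) that rewrites the file and runs the app once per parameter value:

import os

for p_value in param1_range:
    my_params['param1'] = str(p_value)
    my_params['model_name'] = 'run_param1_p0' + str(int(round(p_value * 100)))
    # Rewrite input.param with the updated values before each run
    with open('input.param', 'w') as fh:
        fh.write('\n'.join(k.ljust(20) + v.rjust(20) for k, v in my_params.items()))
    os.system('./bin/run_app param/input.param')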

How can I replace a specific text in string python?

I'm confused when trying to replace specific text in Python. My code is:
Image = "/home/user/Picture/Image-1.jpg"
Image2 = Image.replace("-1", "_s", 1)
print(Image)
print(Image2)
Output:
/home/user/Picture/Image-1.jpg
/home/user/Picture/Image_s.jpg
The output what I want from Image2 is:
/home/user/Picture/Image-1_s.jpg
You are replacing the -1 with _s.
If you want to keep the -1 as well, you can just add it in the replacement:
Image = "/home/user/Picture/Image-1.jpg"
Image2 = Image.replace("-1", "-1_s", 1)
print(Image)
print(Image2)
Output
/home/user/Picture/Image-1.jpg
/home/user/Picture/Image-1_s.jpg
If the digits can be variable, you can also use a pattern with, for example, 2 capture groups, and then use those capture groups in the replacement with _s in between:
import re
pattern = r"(/home/user/Picture/Image-\d+)(\.jpg)\b"
s = "/home/user/Picture/Image-1.jpg\n/home/user/Picture/Image-2.jpg"
print(re.sub(pattern, r"\1_s\2", s))
Output
/home/user/Picture/Image-1_s.jpg
/home/user/Picture/Image-2_s.jpg
Or, for example, only take the /Image- part into account and then use the full match in the replacement instead of capture groups:
import re
pattern = r"/Image-\d+(?=\.jpg)\b"
s = "/home/user/Picture/Image-1.jpg\n/home/user/Picture/Image-2.jpg"
print(re.sub(pattern, r"\g<0>_s", s))
Output
/home/user/Picture/Image-1_s.jpg
/home/user/Picture/Image-2_s.jpg
The code you wrote behaves exactly as I would expect from reading it. Correcting it to do what you want is a little different: you don't necessarily need replace here. Instead, consider appending what you need, since the behaviour you are looking for is, in fact, appending something to the end of the path before the extension.
We can make the code a bit more "generic" by simply "appending" anything to the end of a string. The steps to achieve this are (for other readers: yes, there are more foolproof ways to do this; for now we stick to a simple example):
split the string at . so that you end up with a list containing:
["/home/user/Picture/Image-1", "jpg"]
append what you need to the end of the first element, so you end up with:
"/home/user/Picture/Image-1_s"
use join to re-craft your string, joining on .:
".".join(["/home/user/Picture/Image-1_s", "jpg"])
You will finally get:
/home/user/Picture/Image-1_s.jpg
Coding the above, we can have it work as follows:
>>> Image1 = "/home/user/Picture/Image-1.jpg"
>>> img_split = Image1.split(".")
>>> img_split
['/home/user/Picture/Image-1', 'jpg']
>>> img_split[0] = img_split[0] + "_s"
>>> img_split
['/home/user/Picture/Image-1_s', 'jpg']
>>> final_path = ".".join(img_split)
>>> final_path
'/home/user/Picture/Image-1_s.jpg'
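As a variant of the same idea, here is a short sketch using the standard library's os.path.splitext, which splits only at the final extension and therefore also copes with filenames containing extra dots:

import os.path

Image1 = "/home/user/Picture/Image-1.jpg"
root, ext = os.path.splitext(Image1)  # ('/home/user/Picture/Image-1', '.jpg')
Image2 = root + "_s" + ext
print(Image2)  # /home/user/Picture/Image-1_s.jpg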
A more idiomatic approach, using Python's pathlib module, is an interesting solution too:
from pathlib import Path
Image1 = "/home/user/Picture/Image-1.jpg"
p = Path(Image1)
# you have access to all the parts you need. Like the path to the file:
p.parent # outputs PosixPath('/home/user/Picture/')
# The name of the file without extension
p.stem # outputs 'Image-1'
# The extension of the file
p.suffix # outputs '.jpg'
# Finally, we get to now rename it using the rename method!
p.rename(p.parent / f"{p.stem}_s{p.suffix}")
# This will now result in the following object with renamed file!
# PosixPath('/home/user/Picture/Image-1_s.jpg')
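If you only want the new path, without renaming the file on disk, a sketch using Path.with_name builds it instead:

new_path = p.with_name(f"{p.stem}_s{p.suffix}")
# PosixPath('/home/user/Picture/Image-1_s.jpg')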
The replace function replaces "-1" with "_s".
If you want the output to be: /home/user/Picture/Image-1_s.jpg
You should replace "-1" with "-1_s".
Try:
Image = "/home/user/Picture/Image-1.jpg"
Image2 = Image.replace("-1", "-1_s")
print(Image)
print(Image2)
Try this. I think you should append the string at a certain position, not replace:
Image = "/home/user/Picture/Image-1.jpg"
Image2 = Image[:26]+ '_s' + Image[26:]
print(Image2)
The output:
/home/user/Picture/Image-1_s.jpg

How to read strings as integers when reading from a file in python

I have the following code reading in a specific part of a text file. The problem is that these are numbers, not strings, so I want to convert them to ints and read them into a list of some sort.
A sample of the data from the text file is as follows (this is not wholly representative; I have uploaded the full data set as a text file here: http://s000.tinyupload.com/?file_id=08754130146692169643):
*NSET, NSET=Nodes_Pushed_Back_IB
99915527, 99915529, 99915530, 99915532, 99915533, 99915548, 99915549, 99915550,
99915551, 99915552, 99915553, 99915554, 99915555, 99915556, 99915557, 99915558,
99915562, 99915563, 99915564, 99915656, 99915657, 99915658, 99915659, 99915660,
99915661, 99915662, 99915663, 99915664, 99915665, 99915666, 99915667, 99915668,
99915669, 99915670, 99915885, 99915886, 99915887, 99915888, 99915889, 99915890,
99915891, 99915892, 99915893, 99915894, 99915895, 99915896, 99915897, 99915898,
99915899, 99915900, 99916042, 99916043, 99916044, 99916045, 99916046, 99916047,
99916048, 99916049, 99916050
*NSET, NSET=Nodes_Pushed_Back_OB
Any help would be much appreciated.
Hi, I am still stuck with this issue. Any more suggestions? The latest code and error message are below. Thanks!
import tkinter as tk
from tkinter import filedialog

file_path = filedialog.askopenfilename()
print(file_path)

data = []
data2 = []
data3 = []
flag = False
with open(file_path, 'r') as f:
    for line in f:
        if line.strip().startswith('*NSET, NSET=Nodes_Pushed_Back_IB'):
            flag = True
        elif line.strip().endswith('*NSET, NSET=Nodes_Pushed_Back_OB'):
            flag = False  # loop stops when condition is false, i.e. if false do nothing
        elif flag:  # as long as flag is true, append
            data.append([int(x) for x in line.strip().split(',')])
The result is the following error:
ValueError: invalid literal for int() with base 10: ''
Instead of reading these as strings, I would like each to be a number in a list, i.e. [98932850 98932852 98932853 98932855 98932856 98932871 98932872 98932873]
In such cases I use regular expressions together with string methods. I would solve this problem like so:
import re

with open(filepath) as f:
    txt = f.read()

g = re.search(r"NSET=Nodes_Pushed_Back_IB(.*?)(?:\*|\Z)", txt, re.S)
snums = g.group(1).replace(',', ' ').split()
numbers = [int(num) for num in snums]
I read the entire text into txt.
Next, I use a regular expression that anchors on the last portion of your header and captures, with capturing parentheses, everything up to the next * (the start of the next NSET block) or the end of the file; the re.S flag means that the dot also matches newlines. I access all the numbers as one unit of text via g.group(1).
Next, I remove all the commas (actually, replace them with spaces), because on the resulting text I use split(), which is an excellent function for items separated by whitespace: the amount of spaces doesn't matter, it just splits as you would intend.
The rest is just converting the text to numbers using a list comprehension.
Your line contains more than one number, and some separating characters. You could parse that format by judicious application of split and perhaps strip, or you could minimize string handling by having re extract specifically the fields you care about:
import re
ints = list(map(int, re.findall(r'-?\d+', line)))
This regular expression will find each group of digits, optionally prefixed by a minus sign, and then map will apply int to each such group found.
Using a sample of your string:
strings = ' 98932850, 98932852, 98932853, 98932855, 98932856, 98932871, 98932872, 98932873,\n'
I'd just split the string, strip the commas, and return a list of numbers:
numbers = [ int(s.strip(',')) for s in strings.split() ]
Based on your comment and the larger context of your code, I'd suggest a few things:
from itertools import groupby

number_groups = []
with open('data.txt', 'r') as f:
    for k, g in groupby(f, key=lambda x: x.startswith('*NSET')):
        if k:
            pass
        else:
            number_groups += list(filter('\n'.__ne__, list(g)))  # remove newlines in list

data = []
for group in number_groups:
    for str_num in group.strip('\n').split(','):
        if str_num.strip():  # skip empty tokens left by trailing commas
            data.append(int(str_num))
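Regarding the ValueError in your updated question: the trailing comma on each data line leaves an empty string after split(','), and int('') fails. A minimal sketch of your flag-based loop with empty tokens skipped (assuming the same file_path as above, and treating any other *NSET header as the end of the section):

data = []
flag = False
with open(file_path, 'r') as f:
    for line in f:
        if line.strip().startswith('*NSET, NSET=Nodes_Pushed_Back_IB'):
            flag = True
        elif line.strip().startswith('*NSET'):
            flag = False
        elif flag:
            # skip the empty token left by the trailing comma on each line
            data.extend(int(x) for x in line.strip().split(',') if x.strip())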

parse statement string for arguments using regex in Python

I have user input statements which I would like to parse for arguments, if possible using regex.
I have read much about functools.partial on Stack Overflow, but could not find anything about argument parsing. In the regex questions on Stack Overflow, I also could not find how to check for a match while excluding the surrounding tokens. The Python tokenizer seems too heavy for my purpose.
import re
import functools
import psutil

def getarguments(statement):
    prog = re.compile(r"([(].*[)])")
    result = prog.search(statement)
    m = result.group()
    # m = '(interval=1, percpu=True)'
    # or m = "('/')"
    # strip the parentheses, ugly but it works
    return statement[result.start()+1:result.end()-1]
stm = 'psutil.cpu_percent(interval=1, percpu=True)'
arg_list = getarguments(stm)
print(arg_list) # returns : interval=1, percpu=True
# But combining single and double quotes like
stm = "psutil.disk_usage('/').percent"
arg_list = getarguments(stm) # in debug value is "'/'"
print(arg_list) # when printed value is : '/'
callfunction = psutil.disk_usage
args = []
args.append(arg_list)
# args.append('/')
funct1 = functools.partial(callfunction, *args)
perc = funct1().percent
print(perc)
This results in an error:
builtins.FileNotFoundError: [Errno 2] No such file or directory: "'/'"
But
callfunction = psutil.disk_usage
args = []
#args.append(arg_list)
args.append('/')
funct1 = functools.partial(callfunction, *args)
perc = funct1().percent
print(perc)
does return 20.3 (for me), which is correct.
So there is a difference somewhere.
The weird thing is: if I view the content in my IDE (WingIDE), the result is "'/'", but if I drill into the details, the result is '/'.
I use Python 3.4.0. What is happening here, and how do I solve it?
Your help is really appreciated.
getarguments("psutil.disk_usage('/').percent") returns the three-character string "'/'", i.e. the quotes are part of the string. You can check this by printing len(arg_list), for example.
Your IDE adds the ", because by default strings are displayed enclosed in single quotes '. Here you have a string which actually contains ', so the IDE uses double quotes to enclose it.
Note that '/' is not equal to "'/'". The former is a string of 1 character; the latter is a string of 3 characters. So in order to get things right, you need to strip the quotes (both double and single ones) in getarguments. You can do it with the following snippet:
if (s.startswith("'") and s.endswith("'")) or \
   (s.startswith('"') and s.endswith('"')):
    s = s[1:-1]
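Alternatively, here is a sketch of a small (hypothetical) helper that removes one layer of matching quotes and leaves anything else untouched:

def strip_quotes(s):
    # remove one layer of matching quotes, if present
    if len(s) >= 2 and s[0] == s[-1] and s[0] in ("'", '"'):
        return s[1:-1]
    return s

print(strip_quotes("'/'"))   # /
print(strip_quotes('"/"'))   # /
print(strip_quotes('/'))     # / (unchanged)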

Writing out comma separated values in a single cell in spreadsheet

I am cataloging the attribute fields for each feature class in the input list below, and then writing to a spreadsheet the occurrence of each attribute in one or more of the feature classes.
import arcpy, collections, re

arcpy.env.overwriteOutput = True

input = [list of feature classes]
outfile = # path to csv file
f = open(outfile, 'w')
f.write('ATTRIBUTE,FEATURE CLASS\n\n')

mydict = collections.defaultdict(list)
for fc in input:
    cmp = []
    lstflds = arcpy.ListFields(fc)
    for fld in lstflds:
        cmp.append(fld.name)
    for item in cmp:
        mydict[item].append(fc)

for keys, vals in mydict.items():
    # remove these characters
    char_removal = ["[", "'", ",", "]"]
    new_char = '[' + re.escape(''.join(char_removal)) + ']'
    v = re.sub(new_char, '', str(vals))
    line = ','.join([keys, v]) + '\n'
    print line
    f.write(line)
f.close()
This code gets me 90% of the way to the intended solution. I still cannot get the feature classes (the values) to separate with a comma within the same cell; being comma-delimited, each value shifts over to the next column, as I mentioned. In this particular code, v (the feature class names) is written to the spreadsheet with the values separated by a space (" ") in the same cell. Not a huge deal, because replacing " " with "," can be done very quickly in the spreadsheet itself, but it would be nice to work this into the code to improve reusability.
For a CSV file, use double-quotes around the cell content to preserve interior commas within, like this:
content1,content2,"content3,contains,commas",content4
Generally speaking, many libraries that output CSV just put all contents in quotes, like this:
"content1","content2","content3,contains,commas","content4"
As a side note, I'd strongly recommend using an existing library to create CSV files instead of reinventing the wheel. One such library, the csv module, is built into Python.
As they say, "Good coders write. Great coders reuse."
import arcpy, collections, re, csv

arcpy.env.overwriteOutput = True

input = [# list of feature classes]
outfile = # path to output csv file
f = open(outfile, 'wb')
csv_write = csv.writer(f)
csv_write.writerow(['Field', 'Feature Class'])
csv_write.writerow('')

mydict = collections.defaultdict(list)
for fc in input:
    cmp = []
    lstflds = arcpy.ListFields(fc)
    for fld in lstflds:
        cmp.append(fld.name)
    for item in cmp:
        mydict[item].append(fc)

for keys, vals in mydict.items():
    # remove these characters
    char_removal = ["[", "'", "]"]
    new_char = '[' + re.escape(''.join(char_removal)) + ']'
    v = re.sub(new_char, '', str(vals))
    csv_write.writerow([keys, v])
f.close()
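As a simpler alternative to the regex cleanup: since vals is already a list of strings, you can join it yourself and let csv.writer handle the quoting. A sketch of just the final loop:

for keys, vals in mydict.items():
    # the cell contains commas, so csv.writer will wrap it in quotes
    csv_write.writerow([keys, ','.join(vals)])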

Splitting lines in a file into string and hex and do operations on the hex values

I have a large file with several lines as given below. I want to read in only those lines which have the _INIT pattern in them, then strip off the _INIT from the name and save only the OSD_MODE_15_H part in a variable. Then I need to read the corresponding hex value, 8'h00 in this case, strip off the 8'h from it, replace it with 0x, and save it in a variable.
I have been trying to strip off the _INIT, the spaces and the =, and the code is becoming really messy.
localparam OSD_MODE_15_H_ADDR = 16'h038d;
localparam OSD_MODE_15_H_INIT = 8'h00
Can you suggest a lean and clean method to do this?
Thanks!
The following solution uses a regular expression (compiled, to speed up the search) to match the relevant lines and extract the needed information. The expression uses the named groups "id" and "hexValue" to identify the data we want to extract from the matching line.
import re

expression = r"(?P<id>\w+?)_INIT\s*?=.*?'h(?P<hexValue>[0-9a-fA-F]*)"
regex = re.compile(expression)

def getIdAndValueFromInitLine(line):
    mm = regex.search(line)
    if mm is None:
        return None  # not the ..._INIT parameter, or the line was empty, or another mismatch happened
    else:
        return (mm.groupdict()["id"], "0x" + mm.groupdict()["hexValue"])
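For example, applied to the second sample line from the question:

line = "localparam OSD_MODE_15_H_INIT = 8'h00"
print(getIdAndValueFromInitLine(line))
# ('OSD_MODE_15_H', '0x00')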
EDIT: If I understood the next task correctly, you need to find the hexvalues of those INIT and ADDR lines whose IDs match and make a dictionary of the INIT hexvalue to the ADDR hexvalue.
regex = r"(?P<init_id>\w+?)_INIT\s*?=.*?'h(?P<initValue>[0-9a-fA-F]*)"
init_dict = {}
for x in re.finditer(regex, lines):
    init_dict[x.groupdict()["init_id"]] = "0x" + x.groupdict()["initValue"]

regex = r"(?P<addr_id>\w+?)_ADDR\s*?=.*?'h(?P<addrValue>[0-9a-fA-F]*)"
addr_dict = {}
for y in re.finditer(regex, lines):
    addr_dict[y.groupdict()["addr_id"]] = "0x" + y.groupdict()["addrValue"]

init_to_addr_hexvalue_dict = {init_dict[x]: addr_dict[x] for x in init_dict.keys() if x in addr_dict}
Even if this is not exactly what you need, having the init and addr dictionaries might help you achieve your goal more easily. Note that if there are several _INIT (or _ADDR) lines with the same ID and different hex values, the above dict approach will not work in a straightforward way.
Try something like this. I'm not sure what all your requirements are, but this should get you close:
with open(someFile, 'r') as infile:
    for line in infile:
        if '_INIT' in line:
            apostropheIndex = line.find("'h")
            clean_hex = '0x' + line[apostropheIndex + 2:]
In the case of "16'h038d;", clean_hex would be "0x038d;" (need to remove the ";" somehow) and in the case of "8'h00", clean_hex would be "0x00"
Edit: if you want to guard against characters like ";" you could do this and test if a character is alphanumeric:
clean_hex = '0x' + ''.join([s for s in line[apostropheIndex + 2:] if s.isalnum()])
You can use a regular expression and the re.findall() function. For example, to generate a list of tuples with the data you want just try:
import re

lines = open("your_file").read()
regex = r"([\w]+?)_INIT\s*=\s*\d+'h([\da-fA-F]*)"
res = [(x[0], "0x" + x[1]) for x in re.findall(regex, lines)]
print res
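Run against the two sample lines from the question, this prints [('OSD_MODE_15_H', '0x00')], since only the _INIT line matches the pattern.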
The regular expression is very specific for your input example. If the other lines in the file are slightly different you may need to change it a bit.
