Strip prefix from all variable names in SPSS


I have a similar question as asked here (Strip suffix from all variable names in SPSS) and the answers there already helped a lot but there is still one question remaining.
I have a dataset in which every variable name has the prefix "v23_1_". I want to remove this prefix from all variables, but there are hundreds of them, so I am looking for a way to do it without using the RENAME statement hundreds of times.
I used this code:
begin program.
vdict=spssaux.VariableDict()
mylist=vdict.range(start="v23_1_dg_mnpdocid", end="v23_1_phq9t0_asku3t0")
nvars = len(mylist)
for i in range(nvars):
    myvar = mylist[i]
    mynewvar = myvar.strip("v23_1_")
    spss.Submit(r"""
rename variables ( %s = %s) .
""" %(myvar, mynewvar))
end program.
Here is a list of the first few variables:
v23_1_dg_mnppusid
v23_1_dg_sigstatus
v23_1_dg_mnpvsno
v23_1_dg_mnpvslbl
v23_1_dg_mnpcvpid
v23_1_dg_mnpvisid
v23_1_dg_mnpvisno
v23_1_dg_mnpvispdt
v23_1_dg_mnpvisfdt
v23_1_dg_mnpfs0
v23_1_dg_mnpfs1
v23_1_dg_mnpfs2
v23_1_dg_mnpfs3
v23_1_dg_mnpfcs0
v23_1_dg_mnpfcs1
v23_1_dg_mnpfcs2
It worked for the first few variables but then stopped with the message "renaming has created two variables named dg_mnpfs". After stripping, however, the next variable should have been named "dg_mnpfs2". What happened is that the 1 at the end of "v23_1_dg_mnpfs1" was deleted too, and the 2 at the end of "v23_1_dg_mnpfs2" was then presumably deleted as well, which produced the same name twice. I don't understand why this is happening or how I can avoid it.
Thanks a lot for your support!
Kind regards,
Beate

As your syntax looks right now, it runs on a variable-by-variable basis: you are submitting the RENAME VARIABLES command as many times as there are variables in your list.
On the one hand, this is inefficient, as it takes longer to run than what I am suggesting below.
On the other (and more important) hand, doing it variable by variable does not guard against duplicate variable names. I am guessing that you already have a variable named dg_mnpfs in your data file and are attempting to create a second one by renaming v23_1_dg_mnpfs. Just check your data file after your Python code breaks.
A more efficient way of writing your code would be to create lists with the old names and the new names, and submit the syntax as a single command.
begin program.
import spss,spssaux
vdict=spssaux.VariableDict()
mylist=vdict.range(start="v23_1_dg_mnpdocid", end="v23_1_phq9t0_asku3t0")
prefix = "v23_1_"
my_new_list=[]
for myvar in mylist:
    # slice the prefix off; strip() would also remove trailing 1, 2, 3, v or _ characters
    mynewvar = myvar[len(prefix):] if myvar.startswith(prefix) else myvar
    my_new_list.append(mynewvar)
# one RENAME VARIABLES command for the whole list
my_syntax="ren var (" + " ".join(mylist) + "=" + " ".join(my_new_list) +")."
spss.Submit(my_syntax)
end program.
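And one more thing: strip treats its argument as a set of characters and removes any of them from both ends of the string, which is why the trailing 1 in v23_1_dg_mnpfs1 disappears. lstrip only works on the left end, but it still removes a set of characters rather than a literal prefix, so slicing off the first len("v23_1_") characters is the safer choice. Details can be found in the official Python documentation for the string methods.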
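To see why the original loop produced duplicate names, here is a minimal sketch in plain Python (no SPSS needed). The helper name remove_prefix and the three-item sample list are assumptions made for illustration; the variable names themselves come from the question.

```python
PREFIX = "v23_1_"

def remove_prefix(name, prefix=PREFIX):
    """Return name without a leading prefix (hypothetical helper)."""
    return name[len(prefix):] if name.startswith(prefix) else name

names = ["v23_1_dg_mnpfs0", "v23_1_dg_mnpfs1", "v23_1_dg_mnpfs2"]

# strip() removes any of the characters v, 2, 3, 1 and _ from BOTH ends
stripped = [n.strip(PREFIX) for n in names]
# slicing removes exactly the six prefix characters and nothing else
sliced = [remove_prefix(n) for n in names]

# build a single RENAME VARIABLES command from the two lists
syntax = "rename variables (%s = %s)." % (" ".join(names), " ".join(sliced))

print(stripped)
print(sliced)
print(syntax)
```

The stripped list contains dg_mnpfs twice, which matches the error message in the question, while the sliced list keeps the trailing digits.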

Here's a version of the process using an SPSS macro. SPSSINC SELECT VARIABLES lets you collect the whole list of relevant variables, whatever order they are in, without naming them in the command:
*this is just to create a sample data to play with.
data list list/v23_1_var1 to v23_1_var6.
begin data
end data.
The following creates a list of the relevant variables:
SPSSINC SELECT VARIABLES MACRONAME="!list" /PROPERTIES PATTERN = "v23_1_*".
* the following macro creates one rename command for all the list.
define !doRename ()
rename variables (!eval(!list)=!do !i !in(!eval(!list)) !substr(!i, 7) !doend).
!enddefine.
!doRename .

Related

Extract text from a config file [duplicate]

I'm using a config file to inform my Python script of a few key-values, for use in authenticating the user against a website.
I have three variables: the URL, the user name, and the API token.
I've created a config file with each key on a different line, so:
url:<url string>
auth_user:<user name>
auth_token:<API token>
I want to be able to extract the text after the key words into variables, also stripping any "\n" that exist at the end of the line. Currently I'm doing this, and it works but seems clumsy:
from re import match
from sys import argv

with open(argv[1], mode='r') as config_file:
    lines = config_file.readlines()
for line in lines:
    url_match = match('jira_url:', line)
    if url_match:
        jira_url = line[9:].split("\n")[0]
    user_match = match('auth_user:', line)
    if user_match:
        auth_user = line[10:].split("\n")[0]
    token_match = match('auth_token', line)
    if token_match:
        auth_token = line[11:].split("\n")[0]
Can anybody suggest a more elegant solution? Specifically it's the ... = line[10:].split("\n")[0] lines that seem clunky to me.
I'm also slightly confused why I can't reuse my match object within the for loop, and have to create new match objects for each config item.

You could use a .yml file and read the values with the yaml.load() function (note that YAML expects a space after the colon, e.g. url: <url string>):
import yaml
with open('settings.yml') as file:
    settings = yaml.load(file, Loader=yaml.FullLoader)
Now you can access elements like settings["url"] and so on.

If the format is always <tag>:<value>, you can easily parse it by splitting each line at the first colon and filling a dictionary:
with open(filename, "r") as config_file:
    lines = config_file.readlines()
settings = dict()
for l in lines:
    # drop the line ending, then split at the first colon only,
    # so values that contain colons (such as URLs) stay intact
    key, _, value = l.rstrip("\n").partition(':')
    settings[key] = value
So you get a dictionary that has the tags as keys and the values as values. You can then just refer to these dictionary entries in your program
(e.g. if you need the auth_token, just use settings["auth_token"]).
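A slightly more defensive variant of the same idea is sketched below. The function name parse_config, the placeholder URL, and the sample values are assumptions for illustration only; the key names are the ones from the question's config file.

```python
def parse_config(lines):
    """Parse 'key:value' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("missing ':' in line: %r" % line)
        settings[key.strip()] = value.strip()
    return settings

sample = [
    "url:https://example.invalid/jira\n",
    "auth_user:some_user\n",
    "\n",
    "auth_token:abc123",  # last line without a trailing newline
]
settings = parse_config(sample)
print(settings["url"])
```

Because partition splits at the first colon only, the colon inside the URL stays part of the value, and a final line without a newline is read in full.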

If you can add one line to the config file, configparser is a good choice:
https://docs.python.org/3/library/configparser.html
[1] Config file: 1.cfg
[DEFAULT] # configparser's config file need section name
url:<url string>
auth_user:<user name>
auth_token:<API token>
[2] python scripts
import configparser
config = configparser.ConfigParser()
config.read('1.cfg')
print(config.get('DEFAULT','url'))
print(config.get('DEFAULT','auth_user'))
print(config.get('DEFAULT','auth_token'))
[3] output
<url string>
<user name>
<API token>
configparser's methods are also useful when you can't guarantee that the config file is always complete.
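As a rough illustration of handling an incomplete file, the sketch below reads the config from a string and asks for a key that is missing. The sample content, the placeholder URL, and the user name are assumptions; fallback and has_option are standard configparser features.

```python
import configparser

# Sample content; the URL and user name are placeholders, and auth_token
# is deliberately left out to show the fallback.
SAMPLE = """\
[DEFAULT]
url:https://example.invalid/jira
auth_user:some_user
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

url = config.get("DEFAULT", "url")
# fallback is returned instead of raising NoOptionError
token = config.get("DEFAULT", "auth_token", fallback=None)
has_token = config.has_option("DEFAULT", "auth_token")

print(url, token, has_token)
```

With a fallback the script can decide for itself how to react to a missing token, for example by prompting the user.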

You have a couple of great answers already, but I wanted to step back and provide some guidance on how you might approach these problems in the future. Getting quick answers sometimes prevents you from understanding how those people knew about the answers in the first place.
When you zoom out, the first thing that strikes me is that your task is to provide config, using a file, to your program. Software has the remarkable property of solve-once, use-anywhere. Config files have been a problem worth solving for at least 40 years, so you can bet your bottom dollar you don't need to solve this yourself. And already-solved means someone has already figured out all the little off-by-one and edge-case dramas like stripping line endings and dealing with expected input. The challenge of course, is knowing what solution already exists. If you haven't spent 40 years peeling back the covers of computers to see how they tick, it's difficult to "just know". So you might have a poke around on Google for "config file format" or something.
That would lead you to one of the most prevalent config file systems on the planet - the INI file. Just as useful now as it was 30 years ago, and as a bonus, looks not too dissimilar to your example config file. Then you might search for "read INI file in Python" or something, and come across configparser and you're basically done.
Or you might see that sometime in the last 30 years, YAML became the more trendy option, and wouldn't you know it, PyYAML will do most of the work for you.
But none of this gets you any better at using Python to extract from text files in general. So zooming in a bit, you want to know how to extract parts of lines in a text file. Again, this problem is an age-old problem, and if you were to learn about this problem (rather than just be handed the solution), you would learn that this is called parsing and often involves tokenisation. If you do some research on, say "parsing a text file in python" for example, you would learn about the general techniques that work regardless of the language, such as looping over lines and splitting each one in turn.
Zooming in one more step closer, you're looking to strip the new line off the end of the string so it doesn't get included in your value. Once again, this ain't a new problem, and with the right keywords you could dig up the well-trodden solutions. This is often called "chomping" or "stripping", and with some careful search terms, you'd find rstrip() and friends, and not have to do awkward things like splitting on the '\n' character.
Your final question is about re-using the match object. This is much harder to research. But again, the "solution" won't necessarily show you where you went wrong. What you need to keep in mind is that the statements in the for loop are sequential. To think them through you should literally execute them in your mind, one after another, and imagine what's happening. Each time you call match, it either returns None or a Match object. You never use the object, except to check for truthiness in the if statement. And next time you call match, you do so with different arguments so you get a new Match object (or None). Therefore, you don't need to keep the object around at all. You can simply do:
if match('jira_url:', line):
    jira_url = line[9:].split("\n")[0]
if match('auth_user:', line):
    auth_user = line[10:].split("\n")[0]
and so on. Not only that, if the first if triggered then you don't need to bother calling match again - it will certainly not trigger any of the other matches for the same line. So you could do:
if match('jira_url:', line):
    jira_url = line[9:].rstrip()
elif match('auth_user:', line):
    auth_user = line[10:].rstrip()
and so on.
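If you do want to put the Match object to use, one option is to let it tell you where the value starts, so the hardcoded offsets 9 and 10 disappear. This is only a sketch: the function name read_value and the sample lines are made up for illustration, and the key names are the ones from the question.

```python
import re

KEYS = ("jira_url", "auth_user", "auth_token")  # key names from the question

def read_value(line):
    """Return (key, value) for a recognised line, else None."""
    for key in KEYS:
        m = re.match(re.escape(key) + ":", line)
        if m:  # m is a Match object here; for non-matching keys it is None
            # m.end() is the index just past "key:", so no magic numbers
            return key, line[m.end():].rstrip()
    return None

print(read_value("auth_user:some_user\n"))
print(read_value("# a comment\n"))
```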
But then you can start to think - why bother doing all these matches on the colon, only to then manually split the string at the colon afterwards? You could just do:
tokens = line.rstrip().split(':', 1)
if tokens[0] == 'jira_url':
    jira_url = tokens[1]
elif tokens[0] == 'auth_user':
    auth_user = tokens[1]
If you keep making these improvements (and there's lots more to make!), eventually you'll end up re-writing configparser, but at least you'll have learned why it's often a good idea to use an existing library where practical!

Maya Python, Renaming joints: more than one object matches name

OK, so I get two errors whenever I try to run this script, but before I get ahead of myself, let's get to my objective.
Create two joint chains; the names are unimportant. Essentially, I know that you can use brackets to list and isolate joint chains with matching children names. Instead, my script seems to be ignoring the brackets and giving me the error anyway. I've tried every different flag for listRelatives, but all that seems to do is change the error to something else.
I know that if this script were working properly it would only work on one joint chain because of the hardcoded names, but the script I'm pulling it from has name prefixes tied to the GUI to avoid hardcoding and allow adaptive naming; I'm only using the hardcoded names as an example here. My complaint is that this script doesn't work on ANY joint chain, because I keep getting the error "more than one object matches name."
To run the script, save the following code as a .py file in your Maya documents scripts folder, restart Maya, then open a new Python tab and run the three lines in the docstring above import maya.cmds.
'''
import exampleScriptTemplate
reload (exampleScriptTemplate)
exampleScriptTemplate.gui()
'''
import maya.cmds as cmds
if cmds.window("buildWin", exists =True):
    cmds.deleteUI("buildWin", window = True)
myWindow = cmds.window("buildWin",t='DS_pvFinder',rtf=1,w=100, h=100, toolbox=True)
column = cmds.columnLayout(adj=True)
def gui(*args):
    cmds.columnLayout()
    cmds.button(w=300,label='build placement curve',c=printMultiple)
    cmds.showWindow(myWindow)
def printMultiple(*args):
    root = cmds.ls(sl=True)[0]
    child = cmds.listRelatives(root,ad=1,f=True,children=True,type='joint')
    child.append(root)
    child.reverse()
    limbJnt = child
    print (child)
    armroot = []
    for j in limbJnt:
        wstJnt = cmds.rename(child[3], 'wrist_BIND')
        elbJnt = cmds.rename(child[2], 'elbow_BIND')
        sdrJnt = cmds.rename(child[1], 'shoulder_BIND')
        clvJnt = cmds.rename(child[0], 'clavicle_BIND')
        armroot.append(j)
    return armroot
I know I'm in the right ballpark. I just need to know how to properly use the brackets to store the list of what I'm selecting instead of searching all of world space and breaking.
Thank you for your help

The code you provided is incomplete and no window opens, so I tried only the printMultiple function, which in my case raises Error: No object matches name.
Your code cannot work like this, since you mix hardcoded names with a loop that does nothing. I suppose your main problem is the order of your renames. The child list contains absolute names like:
[u'joint1', u'|joint1|joint2', u'|joint1|joint2|joint3']
If you now rename child[0] to 'clavicle_BIND', all the remaining elements in the list become invalid, because their real names in the scene now look like this:
[u'clavicle_BIND', u'|clavicle_BIND|joint2', u'|clavicle_BIND|joint2|joint3']
This results in an error at the second rename line. Inverting the order solves this problem: first rename the leaf node, then the ones above it.
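The leaf-first ordering can be sketched in plain Python, outside Maya, by sorting the full paths by their depth. The helper names leaf_first and short_name and the new_names mapping are assumptions for illustration; inside Maya the actual rename would be done with cmds.rename on each pair.

```python
def leaf_first(paths):
    """Order full DAG paths so the deepest nodes come first."""
    return sorted(paths, key=lambda p: p.count("|"), reverse=True)

def short_name(path):
    """Return the last component of a full DAG path."""
    return path.rsplit("|", 1)[-1]

# Paths as in the example list above
chain = ["joint1", "|joint1|joint2", "|joint1|joint2|joint3"]
# Hypothetical mapping from current short names to the new names
new_names = {
    "joint1": "clavicle_BIND",
    "joint2": "shoulder_BIND",
    "joint3": "elbow_BIND",
}

plan = [(path, new_names[short_name(path)]) for path in leaf_first(chain)]
for old, new in plan:
    print(old, "->", new)
    # Inside Maya this step would be: cmds.rename(old, new)
```

Renaming the deepest joint first means that the paths of the joints still waiting to be renamed have not changed yet.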

Python loop error in SPSS syntax only if i run the same code twice

I'm quite new to Python programming.
I'm trying to automate some tabulations in SPSS using Python (and I kind of managed it) with a loop and some Python code, but it works fine only the first time I run the syntax; the second time it tabulates only once.
I have an SPSS file with different projects merged together (i.e. different countries), so first I try to extract a list of projects using a function.
Once I have my list of projects, I run a loop and change the SPSS syntax for the case selection and tabulation.
This is the code:
begin program.
import spss
#Function that extracts the data from spss
def DatiDaSPSS(vars, num):
    if num == 0:
        num = spss.GetCaseCount()
    if vars == None:
        varNums = range(spss.GetVariableCount())
    else:
        allvars = [spss.GetVariableName(i) for i in range(spss.GetVariableCount())]
        varNums = [allvars.index(i) for i in vars]
    data = spss.Cursor(varNums)
    pydata = data.fetchmany(num)
    data.close()
    return pydata
#store the result of the function into a list:
all_prj=DatiDaSPSS(vars=["Project"],num=0)
#remove duplicates and keep only the country that i need:
prj_list=list(set([i[0] for i in all_prj]))
#loop for the tabulation:
for i in range(len(prj_list)):
    prj_now=str(prj_list[i])
    spss.Submit("""
compute filter_$=Project='%s'.
filter by filter_$.
exe.
TEXT "Country"
/OUTLINE HEADING="%s" TITLE="Country".
CTABLES
/VLABELS VARIABLES=HisInterviewer HisResult DISPLAY=DEFAULT
/TABLE HisInterviewer [C][COUNT F40.0, ROWPCT.COUNT PCT40.1] BY HisResult [C]
/CATEGORIES VARIABLES=HisInterviewer HisResult ORDER=A KEY=VALUE EMPTY=EXCLUDE TOTAL=YES
POSITION=AFTER
/CRITERIA CILEVEL=95.
""" %(prj_now,prj_now))
end program.
When I run it the second time it shows only the last value of the list (and only one tabulation). If I restart SPSS it works fine the first time.
Is it because of the function?
I'm using SPSS 25.

Can I answer my own question, or should I edit the discussion or maybe delete it? I think I found the reason: I guess the function picks up only the cases that are currently selected. I tried adding this SPSS code before the begin program block and it seems to be working:
use all.
exe.
begin program.
...
After the last loop iteration a filter is still active on the data, so I remove it before running the script. Please let me know if you want me to edit or remove the message.
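One way to make the reset harder to forget is to build it into the job itself, before the project list is read and again after the last tabulation. The sketch below only assembles the syntax strings in plain Python; the helper names and the project codes "IT" and "FR" are assumptions, FILTER OFF is added alongside USE ALL as an extra precaution, and inside SPSS each string would still have to be passed to spss.Submit.

```python
RESET_SYNTAX = "FILTER OFF.\nUSE ALL.\nEXECUTE."

def project_syntax(project):
    """Return the filter syntax for one project (hypothetical helper)."""
    return (
        "compute filter_$=Project='%s'.\n"
        "filter by filter_$.\n"
        "exe." % project
    )

def full_job(projects):
    """Reset the selection, filter each project, then reset again."""
    parts = [RESET_SYNTAX]
    parts.extend(project_syntax(p) for p in projects)
    parts.append(RESET_SYNTAX)
    return parts

job = full_job(["IT", "FR"])
print("\n\n".join(job))
```

Resetting at the end as well means a second run starts from the full data set even if the first reset is skipped.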

Key Error despite code previously working

predictors = ['gender','SeniorCitizen','Partner', 'Dependents','tenure','PhoneService',\
'MultipleLines','InternetService','OnlineSecurity','OnlineBackup','DeviceProtection',\
'TechSupport','StreamingTV','StreamingMovies','Contract','PaperlessBilling',\
'PaymentMethod','MonthlyCharges','TotalCharges']
This line of code is from a project I have to do for a school assignment to create a decision tree. I copied and pasted the line from a different project, changing only the variable names, so it should have worked. I've already double checked the variable names in the code against the data file I'm using. But for some reason I'm getting this error:
KeyError: "['gender' 'Partner' 'Dependents' 'PhoneService' 'MultipleLines'\n 'InternetService' 'OnlineSecurity' 'OnlineBackup' 'DeviceProtection'\n 'TechSupport' 'StreamingTV' 'StreamingMovies' 'Contract'\n 'PaperlessBilling' 'PaymentMethod'] not in index"
I know the data is in the .csv file that I'm using, and I know it worked earlier in the same code in the k-nearest-neighbor method.
EDIT: Here is the line Python doesn't like:
dt.fit(training[predictors],training['Churn'])
And here is the definition of dt:
dt = tree.DecisionTreeClassifier(criterion='entropy')
EDIT 2: Here is a list of all the dummy variables in this list
data = pandas.get_dummies(data, columns=['gender','SeniorCitizen','Partner','Dependents','PhoneService','MultipleLines', 'InternetService', \
'OnlineSecurity','OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', \
'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod'])

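Check first that no comma is missing between the entries of predictors; without a comma, Python joins adjacent string literals into one name. In the list you posted the commas are all there, though, so that is probably not the cause.
Look at the error again:
KeyError: "['gender' 'Partner' 'Dependents' 'PhoneService' 'MultipleLines'\n 'InternetService' 'OnlineSecurity' 'OnlineBackup' 'DeviceProtection'\n 'TechSupport' 'StreamingTV' 'StreamingMovies' 'Contract'\n 'PaperlessBilling' 'PaymentMethod'] not in index"
The names are printed without commas simply because that is how the array of missing labels is displayed. More telling is that every name in the message also appears in the columns argument of pandas.get_dummies in your EDIT 2. get_dummies replaces each column it encodes with indicator columns named after the original column and its values (gender becomes gender_Female, gender_Male and so on, depending on your data), so the original names are likely no longer in training.
Try building predictors from the column names that exist after get_dummies and see whether the error goes away.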
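Another thing worth checking is which of the requested names actually exist among the columns, since pandas.get_dummies replaces each encoded column with indicator columns named <column>_<value>. The sketch below does that check on plain lists so it runs without pandas; the helper names and the sample column names are assumptions, and in practice the list would come from list(training.columns).

```python
def missing_columns(wanted, available):
    """Return the names in wanted that are not in available."""
    present = set(available)
    return [name for name in wanted if name not in present]

def dummy_columns(name, available):
    """Return columns that look like get_dummies output for name."""
    return [col for col in available if col.startswith(name + "_")]

predictors = ["gender", "tenure", "Partner"]
# Assumed column names after get_dummies; in practice use list(training.columns)
columns = ["tenure", "gender_Female", "gender_Male", "Partner_No", "Partner_Yes"]

missing = missing_columns(predictors, columns)
replacements = {name: dummy_columns(name, columns) for name in missing}

print(missing)
print(replacements)
```

If the missing names all have indicator columns like these, the predictors list needs the indicator names instead of the original ones.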

%s without accompanying variable - how does this work?

I've been looking over some github repos of python scrapy spiders. In this repo I found the following lines of code:
FIRSTPAGE_URL_SYNTAX = 'http://%s.tumblr.com'
OTHERPAGE_URL_SYNTAX = 'http://%s.tumblr.com/page/%s'
name = "tumblr"
According to the documentation and SO thread that I've found, %s requires an in-line reference to a variable. As you can see the code above contains no such reference. Is this working code? Why?

Those variables serve as templates. Later in the code, you'll see something like
FIRSTPAGE_URL_SYNTAX % user
or
OTHERPAGE_URL_SYNTAX % (user, page)
You can do the same thing with {} in strings:
template = "{} blah blah {}"
print(template.format(s1, s2))
This lets you reuse the one template whenever you need several URLs with the same structure, instead of retyping the entire URL every time.

Those are just strings. All characters, including %s, are valid in strings. The requirement for a substitution value comes when you use the % operator, which they're not doing here. They're declaring those strings as templates, and will substitute values into them later.
By declaring them all in one place like that, it makes it easier to find them and change them. It's good coding practice.
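A small runnable sketch of the pattern: the two template strings are the ones from the question, while the blog name "example" and the page numbers are made up for illustration.

```python
FIRSTPAGE_URL_SYNTAX = 'http://%s.tumblr.com'
OTHERPAGE_URL_SYNTAX = 'http://%s.tumblr.com/page/%s'

user = "example"  # hypothetical blog name

# The substitution only happens here, when the % operator is applied
first = FIRSTPAGE_URL_SYNTAX % user
pages = [OTHERPAGE_URL_SYNTAX % (user, n) for n in (2, 3)]

print(first)
print(pages)
```

Until the % operator is applied, %s is just two ordinary characters inside the string.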
