spss.Cursor and Frequency - python

I am trying to find an alternate/faster method to running the Frequency command on a single variable and writing the number of times each value appears in the dataset to a new variable. My current setup uses syntax and writes the output to a new SAV file (OMS send), which takes several hours to run.
I am looking for some sample code showing how this might be done with spss.Cursor: first read the variable I want the frequency of, store the number of times each value occurs in a list, then write that count to a new variable within the current dataset.
I understand how the read and write cursors work, but I am having trouble counting the number of times each value occurs and storing the counts so they can be written to the new variable. I have read through the SPSS/Python plugin manual and haven't been able to find a solution. Thanks!

Have you considered the AGGREGATE command with MODE = ADDVARIABLES? For example:
AGGREGATE OUTFILE = * MODE = ADDVARIABLES
/BREAK = var1
/var1_n = n.
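If you do want to stay in Python, a rough two-pass sketch with spss.Cursor might look like the following. It assumes the variable of interest is the first variable (index 0) in the active dataset, and var1_n is just a placeholder name for the new variable:

import spss
from collections import Counter

# Pass 1: read the variable and remember every case's value.
cur = spss.Cursor([0])                  # read cursor over variable index 0
values = [row[0] for row in iter(cur.fetchone, None)]
cur.close()
counts = Counter(values)                # value -> number of occurrences

# Pass 2: append a numeric variable holding each case's frequency count.
cur = spss.Cursor(accessType='w')
cur.SetVarNameAndType(['var1_n'], [0])  # 0 = numeric
cur.CommitDictionary()
for v in values:
    cur.fetchone()                      # advance to the next case
    cur.SetValueNumeric('var1_n', counts[v])
    cur.CommitCase()
cur.close()

That said, AGGREGATE runs entirely inside SPSS and will likely be much faster than a case-by-case Python loop; the cursor version is mainly useful if you need the counts for further processing in Python.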


Generate an always-unique ID every time a script is called with Python

What is the best way to create a Python script that, whenever it is called, will ALWAYS generate a new UNIQUE ID (auto-incrementing)?
You run the script and it tells you 1; close the script, open it again, and it tells you 2.
The purpose is to create a script that will be used across the board, with this ID used to track the latest changes and so on.
P.S. I'm not talking about making a function that does it.
import uuid
uniqueid = uuid.uuid1()  # UUID derived from the host ID and the current timestamp
Since you didn't provide any code, I will also not provide any code.
Solution 1: Unique ID
1) TIME: create a function that gives you a timestamp
2) ID: create a function that generates a long string of random numbers and letters
This is of course 'risky' because there is a chance you will generate an already-existing ID, but from a statistical point of view it is, as they say, 'impossible even if it is possible'.
Save it in a file or somewhere similar.
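A minimal sketch of that idea (the length and exact format of the ID are arbitrary choices here):

import random
import string
import time

def make_id(length=24):
    # The timestamp prefix orders IDs in time; the random suffix makes
    # collisions statistically negligible for reasonable lengths.
    alphabet = string.ascii_letters + string.digits
    suffix = ''.join(random.choice(alphabet) for _ in range(length))
    return '%d-%s' % (int(time.time()), suffix)

print(make_id())  # e.g. '1547479233-kX8fQ2...'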
Solution 2: offset - incremental
1) Have a file with a 0 in it.
2) Open the file, read the line, convert it to an integer, increment it by 1, and write it back to the file.
Note:
Your title is wrong. One moment you talk about a UNIQUE ID, the next moment you are talking about an offset. A unique ID and a count of script runs are quite contradictory ideas.
I assume you have a script that generates some result every time it is executed. Then you need a value that (1) distinguishes one result from another and (2) shows which result came last. Is that right? If so, we have many options here. In the simplest case (a script always running on the same machine) I would suggest two options.
Save a count to a file
In this case, you would have a file and would read the number from it:
try:
    with open('count.txt') as count_file:
        content = count_file.read()
        count = int(content)
except Exception:
    count = 0
After doing whatever your script does, you would write to the file the value you've read, but incremented:
with open('count.txt', 'w') as count_file:
    count_file.write(str(count + 1))
Save the timestamp
A simpler option, however, is not to increment a value but to get a timestamp. You could use time.time(), which returns the number of seconds since the Unix epoch:
>>> import time
>>> time.time()
1547479233.9383247
That way you will always know which result came later than the others. Personally, however, I would rather format the current time; it is easier to read and reason about:
>>> from datetime import datetime
>>> datetime.now().strftime('%Y%m%d%H%M%S')
'20190114132407'
Those are basic ideas, you may need to pay attention to corner cases and possible failures (especially with the file-based solution). That said, I guess those are quite viable first steps.
A technical note
What you want here is for a program to remember a piece of information between two or more executions, and we have a technical term for that: the information should be persistent. Since you asked for an auto-incrementing feature, you wanted a persistent count. I suspect, however, you do not need that if you use the timestamp option. It is up to you to decide what to do here.
I had the same situation. I ended up creating a CSV file so that I could map variable names.
import sys
import pandas as pd

def itemId_generator(itemIdLocation):
    # Import the value of ItemId from the CSV file
    df = pd.read_csv(itemIdLocation)
    # Current ItemId in the CSV file; this is the value that is returned
    ItemId = df.loc[0, 'ItemId']
    # Wrap around once the maximum ItemId is reached
    if ItemId >= 10000:
        df.loc[0, 'ItemId'] = 1
    elif ItemId < 10000:
        # Update the column value
        df.loc[0, 'ItemId'] = df.loc[0, 'ItemId'] + 1
    else:
        print("Invalid value returned")
        sys.exit()
    # Write the new ItemId back into the file
    df.to_csv(itemIdLocation, index=False)
    # .item() converts a numpy integer to a plain Python int
    return str(ItemId.item())
If there is any chance of the file being accessed concurrently, it is best to lock the file. Keep trying if the file is locked.
http://tilde.town/~cristo/file-locking-in-python.html
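On POSIX systems, for instance, fcntl.flock can hold an exclusive lock while the counter is read and rewritten (a minimal sketch; it assumes count.txt already exists and is not portable to Windows):

import fcntl

with open('count.txt', 'r+') as f:
    fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
    count = int(f.read() or 0)
    f.seek(0)
    f.truncate()
    f.write(str(count + 1))
    # the lock is released when the file is closed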
Old answer:
You could store it as an environment variable on the system. If not set, initialise to 1. Else increment it by 1.

spss python string occurrences in a variable

How do I count the number of occurrences in a string using SPSS Python? I am trying to count the number of "i"s across an entire variable column. I am new to SPSS Python. Can anyone help me with this?
You don't need python for this task, just work with regular SPSS syntax.
Paste the following in a syntax window, and change "MyString" into the name of your string variable:
EDIT: changed to a dynamically sized loop:
compute Nletters=length(rtrim(MyString)).
compute MyCount=0.
loop #n=1 to Nletters.
if char.substr(MyString,#n,1)="i" MyCount=MyCount+1.
end loop.
exe.
This will create a new variable that will contain the number of "i"s in the string in each line.
You can now sum that up for the whole column using the aggregate command or the following command to get the sum in the output window:
means MyCount /cells=sum.
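If you do want to go through the Python plugin instead, a rough sketch with a read cursor could total the "i"s across the whole column (this assumes MyString is the first variable in the dataset):

import spss

cur = spss.Cursor([0])  # read cursor over the first variable
# string values may carry trailing blanks, which do not affect the count
total = sum(row[0].count('i') for row in iter(cur.fetchone, None))
cur.close()
print(total)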

How to 'flatten' lines from text file if they meet certain criteria using Python?

To start, I am a complete newcomer to Python and to programming in anything other than web languages.
I have developed a script using Python as an interface between a piece of software called Spendmap and an online app called Freeagent. This script works perfectly: it imports and parses the text file and pushes it through the API to the web app.
What I am struggling with is that Spendmap exports multiple lines per order, whereas Freeagent wants one line per order. So I need to add the cost values from any orders spread across multiple lines and then 'flatten' the lines into one so the order can be sent through the API. The 'key' field is the 'PO' field, so if the script sees any matching PO numbers, I want it to flatten them as described above.
This is a 'dummy' example of the text file produced by Spendmap:
5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP
COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,42.000,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000002,1133919,359.400,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
The above has been formatted for easier reading and normally is just one line after the next with no text formatting.
The 'key' or PO field is the fourth field (P000001, P000002, ...) and the sixth field is the cost to be totalled. So if this example were passed through the script, I'd expect the first row to be left alone, the second and third rows' costs to be added together as they're both from the same PO number, and the fourth line to be left alone.
Expected result:
5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP
COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,401.400,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
Any help with this would be greatly appreciated and if you need any further details just say.
Thanks in advance for looking!
I won't give you the solution. But you should:
Write and test a regular expression that breaks the line down into its parts, or use the CSV library.
Parse the numbers out so they're decimal numbers rather than strings
Collect the lines up by ID. Perhaps you could use a dict that maps IDs to lists of orders?
When all the input is finished, iterate over that dict and add up the orders stored in each list.
Make a string format function that outputs the line in the expected format.
Maybe feed the output back into the input to test that you get the same result. Second time round there should be no changes, if I understood the problem.
Good luck!
I would use a dictionary to compile the lines, using get(key,0.0) to sum values if they exist already, or start with zero if not:
InputData = """5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,42.000,20,2013-10-31,103,xxxxxx,AP COMMENT,002143
301067,2013-09-06,2013-09-11,P000002,1133919,359.400,20,2013-10-31,103,xxxxxx,AP COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP COMMENT,002143"""

OutD = {}
ValueD = {}
for Line in InputData.split('\n'):
    # commas in comments won't matter because we are joining after anyway
    Fields = Line.split(',')
    PO = Fields[3]
    Value = float(Fields[5])
    # set up the output string with a placeholder for .format()
    OutD[PO] = ",".join(Fields[:5] + ["{0:.3f}"] + Fields[6:])
    # add the value to the old value or to zero if it is not found
    ValueD[PO] = ValueD.get(PO, 0.0) + Value

# the output is unsorted by default, but you could sort or preserve original order
for POKey in ValueD:
    print OutD[POKey].format(ValueD[POKey])
P.S. Yes, I know Capitals are for Classes, but this makes it easier to tell what variables I have defined...

pulling an excel sheet with =rand(between) in a Python while loop, and exporting results as .dbf

To preface my question: I am very new to Stack Overflow, and relatively new to Python.
I am working on setting up a sensitivity analysis. I am working with 40 parameters that range from 0.1 to 1. My analysis requires simultaneously varying these parameters by ±0.1, roughly 500 times. These values will then be fed into an ArcGIS tool. So, I need 500 sets of random values for 40 parameters. These parameter values will then be compared to the output of the tool, to see which parameters the model is most sensitive to. I've set up an Excel sheet that recalculates these random values each time it's opened, but the issue is that they need to be in .dbf format to be read by the ArcGIS tool.
I have set up a while loop (for 10 iterations to start, but it will need ~500) and tried two different methods, in hopes that I could automate the process of calling the .xls to generate random numbers and then exporting it to .dbf.
The first, arcpy.CopyRows_management, correctly exported to .dbf. The issue was that the output was exactly the same for each iteration, and instead of having values of 0.1, 0.2, 0.3, etc., it contained values like 0.22, 0.37, 0.68; they weren't rounded to tenths, even though that was specified in the formulas in the .xls.
I also tried arcpy.TableToTable_conversion, but that was throwing ERROR 999999: Error executing function.
I am open to all kinds of suggestions. Perhaps there is an easier way to randomly sample and export the results to .dbf in Python. This does not need to be done using arcpy, but that is all I've really worked with. I really appreciate any help! Thanks for your time.
import arcpy

i = 0
while i < 10:
    # Set run-specific variables
    lulc = "D:\\SARuns\\lulc_nosoils_rand.xls\\lulc_nosoils$"
    folder = "D:\\SARuns"
    print "Reading lulc"
    newlulc = "D:\\SARuns\\lulc_nosoils_rand" + str(i) + ".dbf"
    print "Reading newlulc"
    # CopyRows is copying it to a dbf, but the values inside
    # are the same for each run, and none are correct.
    arcpy.CopyRows_management(lulc, newlulc)
    # TableToTable should work, but isn't.
    # arcpy.TableToTable_conversion(lulc, folder, newlulc)
    print "Converting table"
    i += 1
When calculating values in Excel you will always get the full decimals. Even if you set the number format to 1 decimal place, the whole number is still stored; only the display changes. I assume that is why you do not get the "exact numbers" with one decimal. Apply a ROUND function in the formula to get one decimal only.
CopyRows only copies the stored cell values; it does not open Excel, so the formulas are never recalculated and no new random values are generated.
So you will either have to calculate the random values in Python and create the dBase output yourself, or you need to actually open an Excel instance, which should trigger new random values, and save that Excel sheet as dBase.
Maybe populating an ArcMap table with your values using CalculateField_management and exporting that data to dBase would also work, e.g. with something like this:
arcpy.TableToDBASE_conversion(tableName, outPath)
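As a rough sketch of the "calculate the random values in Python" route, one way is to draw the values on a 0.1 grid with the random module, write them to a CSV, and then convert that table to .dbf (paths and names below are placeholders):

import csv
import random

params = ["p%02d" % n for n in range(1, 41)]        # 40 parameter columns
values = [round(x * 0.1, 1) for x in range(1, 11)]  # 0.1, 0.2, ..., 1.0

with open("D:\\SARuns\\params.csv", "wb") as f:     # 'wb' for the Python 2 csv module
    writer = csv.writer(f)
    writer.writerow(params)
    for run in range(500):                          # one row per model run
        writer.writerow([random.choice(values) for _ in params])

# then, for example:
# arcpy.TableToTable_conversion("D:\\SARuns\\params.csv", "D:\\SARuns", "params.dbf")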

python: write to file, cannot understand behavior

I don't understand why I cannot write to a file in my Python program. I have a list of strings, measurements. I just want to write them to a file, but instead of all the strings it writes only one string. I cannot understand why.
This is my piece of code:
fmeasur = open(fmeasur_name, 'w')
line1st = 'rev number, alg time\n'
fmeasur.write(line1st)
for i in xrange(len(measurements)):
    fmeasur.write(measurements[i])
    print measurements[i]
fmeasur.close()
I can see all of these strings printed, but in the file there is only one. What could be the problem?
The only plausible explanation that I have is that you execute the above code multiple times, each time with a single entry in measurements (or at least the last time you execute the code, len(measurements) is 1).
Since you're overwriting the file instead of appending to it, only the last set of measurements would be present in the file, but all of them would appear on the screen.
Edit: Or do you mean that the data is there, but there are no newlines between the measurements? The easiest way to fix that is by using print >>fmeasur, measurements[i] instead of fmeasur.write(...).
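Equivalently, if you want to keep using write, append the newline yourself (assuming the measurements don't already end with one):

for i in xrange(len(measurements)):
    fmeasur.write(measurements[i] + '\n')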
