Mapping Python for loops through an SPSS regression - python

I need to run two loops through my regression, one of them being the independent variable and the other is a suffix for the prediction I need to save with each round of independent variables. I can do either of these loops separately and it works fine but not when I combine them in the same regression. I think this has something to do with the loop mapping at the end of my regression after the %. I get the error code "TypeError: list indices must be integers, not str." But, that is because my Dependent variables are read as strings to get the values from SPSS data frame. Any way to map a for loop in a regression that includes string variables?
I have tried using the map() function, but I got the code that the iteration is not supported.
begin program.
import spss,spssaux
dependent = ['dv1', 'dv2', 'dv3', 'dv4', 'dv5']
spssSyntax = ''
depList = spssaux.VariableDict(caseless = True).expand(dependent)
varSuffix = [1,2,3,4,5]
for dep in depList:
for var in varSuffix:
spssSyntax += '''
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT %(dep)s
/METHOD=FORWARD iv1 iv2 iv3
/SAVE PRED(PRE_%(var)d).
'''%(depList[dep],varSuffix[var])
end program.
I get the error code 'TypeError: list indices must be integers, not str'
with the code above. How do I map the loop while also including a string?

In Python, when you loop directly through an iterable, the loop variable becomes the current value so there is no need to index original lists with depList[dep] and varSuffix[var] but use variables directly: dep and var.
Additionally, consider str.format for string interpolation which is the Python 3 preferred method rather than the outmoded, de-emphasized (not yet deprecated) string modulo % operator:
for dep in depList:
for var in varSuffix:
spssSyntax += '''REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT {0}
/METHOD=FORWARD iv1 iv2 iv3
/SAVE PRED(PRE_{1})
'''.format(dep, var)
Alternatively, consider combining the two lists for one loop using itertools.product, then use a list comprehension to build string with join instead of concatenating loop iterations with +=:
from itertools import product
import spss,spssaux
dependent = ['dv1', 'dv2', 'dv3', 'dv4', 'dv5']
depList = spssaux.VariableDict(caseless = True).expand(dependent)
varSuffix = [1,2,3,4,5]
base_string = '''REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT {0}
/METHOD=FORWARD iv1 iv2 iv3
/SAVE PRED(PRE_{1})
'''
# LIST COMPREHENSION UNPACKING TUPLES TO FORMAT BASE STRING
# JOIN RESULTING LIST WITH LINE BREAKS SEPARATING ITEMS
spssSyntax = "\n".join([base_string.format(*dep_var)
for dep_var in product(depList, varSuffix)])
Now if you need to iterate in parallel elementwise between the equal length lists consider zip instead of product:
spssSyntax = "\n".join([base_string.format(d,v)
for d,v in zip(depList, varSuffix)])
Or enumerate for index number:
spssSyntax = "\n".join([base_string.format(d,i+1)
for i,d in enumerate(depList)])

Related

list indices must be integers or slices, not decimal.Decimal - Using zip() function to take multiple iterables simultaneously

Pardon if my phrasing/code lingo isn't correct. Haven't been coding long.
So I am trying to iterate multiple variables simultaneously through a definition.
Sub = "SH"
columnName = ["sh01ia","sh01ib","sh01ic","sh01in","sh02ia","sh02ib","sh02ic","sh02in","sh03ia","sh03ib","sh03ic","sh03in"] #column names in currentminuteload
ncolumnName = ["cir01ia","cir01ib","cir01ic","cir01in","cir02ia","cir02ib","cir02ic","cir02in","cir03ia","cir03ib","cir03ic","cir03in"] #currentPeak column names in sub_historical_peak
tcolumnName = ["cir01iatime","cir01ibtime","cir01ictime","cir01intime","cir02iatime","cir02ibtime","cir02ictime","cir02intime","cir03iatime","cir03ibtime","cir03ictime","cir03intime"] #timestamp column names in sub_historical_peak
currentVal = [currentPeaks(i,tableName,currentYear,currentMonth)for i in columnName]
print("Successfully collected current peak values for {}".format(Sub))
histVal = histCheck(ncolumnName, histYear, currentYear, currentMonth, histMonth, Sub)
for (i,j,k,l,m) in zip(currentVal,histVal,columnName,ncolumnName,tcolumnName):
updateTable(tableName, histTable, currentYear, currentMonth, histYear, histMonth, currentVal[i], histVal[j], columnName[k], ncolumnName[l], tcolumnName[m])
print("Update complete")
currentVal and histVal each return a list of decimals. Note that some of the variables used in updateTable() are being passed through from elsewhere. I've tried to delete out parameters of the for loop to determine which was causing the problem, but no matter which or how many I delete, this error occurs.
TypeError: list indices must be integers or slices, not decimal.Decimal
I'm thinking that I'm using the wrong process to iterate through these variables simultaneously.
currentVal and histVal each return a list of decimals
In other words, currentVal is a list of decimal.Decimal.
Thus, the i in for (i, ...) in zip(currentVal, ...): is an element of currentVal - one of these Decimals (not an index into currentVal!).
You're using this Decimal to index into currentVal in the loop: currentVal[i], but that's an error because "list indices must be integers or slices, not decimal.Decimal".
for (i,j,k,l,m) in zip(currentVal,histVal,columnName,ncolumnName,tcolumnName):
updateTable(tableName, histTable, currentYear, currentMonth, histYear, histMonth, i, j, k, l, m)
needed to take variable names out of the updateTable(...) portion

Lists and indexes gives error because list out of range

I get an error when I try to run my funtion.
I know the reason. but I search a way to fix this.
list2=['name','position','salary','bonus']
list3=['name','position','salary']
def funtionNew(list):
print(len(list))
po= '{} {} {} {}'.format(list[0],list[1],list[2],list[3])
print(po)
funtionNew(list3)
So that I can make this for list2
po='{}{}{}{}'..format(list[0],list[1],list[2],list[3])
and make this for list3
po='{}{}{}'..format(list[0],list[1],list[2])
From the function implementation it seems like you try to concat the list items with spaces in between so you can try instead -
po=' '.join(list)
This is independent from the list length, however you have to make sure that all the items in the list are strings. So you can do the following -
po = ' '.join[str(s) for s in list]
Try the following:
def funtionNew(list):
print(len(list))
string_for_formatting = '{} ' * len(list)
po = string_for_formatting.format(*list)
print(po)
This creates a string with a variable number of {} terms according to the length of your list, and then uses format on the list. The asterisk *, unpacks the elements of your list as inputs for the function.

Sort list for date in Python

import re
arr1 = ['2018.07.17 11:30:00,-0.19', '2018.07.17 17:55:00,0.86']
arr2 = ['2018.07.17 11:34:00,-0.39', '2018.07.17 17:59:01,0.85']
def combine_strats_lambda(*strats):
"""
Takes *strats in date,return format
combines infinite amount of strats with date, return and packs them into
one
single sorted array
>> RETURN: combined list
"""
temp = []
# create combined list
for v in enumerate(strats):
i = 0
while i < len(v[1]):
temp.append(v[1][i])
#k = re.findall(r"[\w']+", temp)[:6]
i += 1
temp2 = sorted(timestamps, key=lambda d: tuple(map(int, re.findall(r"[\w']+", d[0]))))
return temp2
Hi,
I've been trying to finish this function, which should combine multiple lists of dates,percentage returns and sort them.
I've come across a solution with lambda but all I get is this message:
return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
Do you know an easier solution to the problem or what the error is caused by? I can't seem to figure it out.
Anything appreciated :)
The very basic error in your code is in line:
for v in enumerate(strats):
You have apparently forgotten that enumerate(...) returns two
values: the index and the current value from the iterable.
So, as you used just single v, it gets the index, not the value.
Another important point is that if the datetime strings are written as
yyyy.MM.dd hh:mm:ss, you can sort them using just string sort.
So, to gather the strings, you need a list comprehension, with 2 nested
loops.
And to sort them, you should use sorted function, specifying as the sort
key the "initial" (date / time) part, before the comma.
To sum up, to get the sorted list of strings, taken from a couple of
arguments of your function, sorted on the date / time part,
you can use the following program, written using version 3.6 of Python:
arr1 = ['2018.07.17 11:30:00,-0.19', '2018.07.17 17:55:00,0.86']
arr2 = ['2018.07.17 11:34:00,-0.39', '2018.07.17 17:59:01,0.85']
def combine_strats_lambda(*strats):
temp = [ v2 for v1 in strats for v2 in v1 ]
return sorted(temp, key = lambda v: v.split(',')[0])
res = combine_strats_lambda(arr1, arr2)
for x in res:
parts = x.split(',')
print("{:20s} {:>6s}".format(parts[0], parts[1]))
It does not even use re module.

Python List comprehension and JSON parsing

I'm new to Python and trying to figure out the best way to parse the values of a JSON object into an array, using a list comprehension.
Here is my code - I'm querying the publicly available iNaturalist API and would like to take the JSON object that it returns, so that I take specific parts of the JSON object into a bumpy array:
import json
import urllib2
#Set Observations URL request for Resplendent Quetzal of Costa Rica
query = urllib2.urlopen("http://api.inaturalist.org/v1/observations?place_id=6924&taxon_id=20856&per_page=200&order=desc&order_by=created_at")
obSet = json.load(query)
#Print out Lat Long of observation
n = obSet['total_results']
for i in range(n) :
print obSet['results'][i]['location']
This all works fine and gives the following output:
9.5142456535,-83.8011438905
10.2335478381,-84.8517773638
10.3358965682,-84.9964271008
10.3744851815,-84.9871494128
10.2468720343,-84.9298072822
...
What I'd like to do next is replace the for loop with a list comprehension, and store the location value in a tuple. I'm struggling with the syntax in that I'm guessing it's something like this:
[(long,lat) for i in range(n) for (long,lat) in obSet['results'][i]['location']]
But this doesn't work...thanks for any help.
obSet['results'] is a list, no need to use range to iterate over it:
for item in obSet['results']:
print(item['location'])
To make this into list comprehension you can write:
[item['location'] for item in obSet['results']]
But, each location is coded as a string, instead of list or tuple of floats. To get it to the proper format, use
[tuple(float(coord) for coord in item['location'].split(','))
for item in obSet['results']]
That is, split the item['location'] string into parts using , as the delimiter, then convert each part into a float, and make a tuple of these float coordinates.
The direct translation of your code into a list comprehension is:
positions = [obSet['results'][i]['location'] for i in range(obSet['total_results'])]
The obSet['total_results'] is informative but not needed, you could just loop over obSet['results'] directly and use each resulting dictionary:
positions = [res['location'] for res in obSet['results']]
Now you have a list of strings however, as each 'location' is still the long,lat formatted string you printed before.
Split that string and convert the result into a sequence of floats:
positions = [map(float, res['location'].split(',')) for res in obSet['results']]
Now you have a list of lists with floating point values:
>>> [map(float, res['location'].split(',')) for res in obSet['results']]
[[9.5142456535, -83.8011438905], [10.2335478381, -84.8517773638], [10.3358965682, -84.9964271008], [10.3744851815, -84.9871494128], [10.2468720343, -84.9298072822], [10.3456659939, -84.9451804822], [10.3611732346, -84.9450302597], [10.3174360636, -84.8798676791], [10.325110706, -84.939710318], [9.4098152454, -83.9255607577], [9.4907141714, -83.9240819199], [9.562637289, -83.8170178428], [9.4373885911, -83.8312881263], [9.4766746409, -83.8120952573], [10.2651190176, -84.6360466565], [9.6572995298, -83.8322965118], [9.6997991784, -83.9076919066], [9.6811177044, -83.8487647156], [9.7416717045, -83.929327673], [9.4885099275, -83.9583968683], [10.1233252667, -84.5751029683], [9.4411815757, -83.824401543], [9.4202687169, -83.9550344212], [9.4620656621, -83.665183105], [9.5861809119, -83.8358881552], [9.4508914243, -83.9054016165], [9.4798058284, -83.9362558497], [9.5970449879, -83.8969131893], [9.5855562829, -83.8354434596], [10.2366179555, -84.854847472], [9.718459702, -83.8910277016], [9.4424384874, -83.8880459793], [9.5535916157, -83.9578166199], [10.4124554163, -84.9796942349], [10.0476688795, -84.298227929], [10.2129436252, -84.8384097435], [10.2052632717, -84.6053701877], [10.3835784147, -84.8677930134], [9.6079669672, -83.9084281155], [10.3583643315, -84.8069762134], [10.3975986735, -84.9196996767], [10.2060835381, -84.9698814407], [10.3322929317, -84.8805587129], [9.4756504472, -83.963818143], [10.3997876964, -84.9127311339], [10.1777433853, -84.0673088686], [10.3346128571, -84.9306278215], [9.5193346195, -83.9404786293], [9.421538224, -83.7689452093], [9.430427837, -83.9532672942], [10.3243212895, -84.9653175843], [10.021698503, -83.885674888]]
If you must have tuples rather than lists, add a tuple() call:
positions = [tuple(map(float, res['location'].split(',')))
for res in obSet['results']]
The latter also makes sure the expression works in Python 3 (where map() returns an iterator, not a list); you'd otherwise have to use a nested list comprehension:
# produce a list of lists in Python 3
positions = [[float(p) for p in res['location'].split(',')] for res in obSet['results']]
Another way to get list of [long, lat] without list comprehension:
In [14]: map(lambda x: obSet['results'][x]['location'].split(','), range(obSet['total_results']))
Out[14]:
[[u'9.5142456535', u'-83.8011438905'],
[u'10.2335478381', u'-84.8517773638'],
[u'10.3358965682', u'-84.9964271008'],
[u'10.3744851815', u'-84.9871494128'],
...
If you would like list of tuples instead:
In [14]: map(lambda x: tuple(obSet['results'][x]['location'].split(',')), range(obSet['total_results']))
Out[14]:
[[u'9.5142456535', u'-83.8011438905'],
[u'10.2335478381', u'-84.8517773638'],
[u'10.3358965682', u'-84.9964271008'],
[u'10.3744851815', u'-84.9871494128'],
...
If you want to convert to floats too:
In [17]: map(lambda x: tuple(map(float, obSet['results'][x]['location'].split(','))), range(obSet['total_results']))
Out[17]:
[(9.5142456535, -83.8011438905),
(10.2335478381, -84.8517773638),
(10.3358965682, -84.9964271008),
(10.3744851815, -84.9871494128),
(10.2468720343, -84.9298072822),
(10.3456659939, -84.9451804822),
...
You can iterate over the list of results directly:
print([tuple(result['location'].split(',')) for result in obSet['results']])
>> [('9.5142456535', '-83.8011438905'), ('10.2335478381', '-84.8517773638'), ... ]
[tuple(obSet['results'][i]['location'].split(',')) for i in range(n)]
This will return a list of tuple, elements of the tuples are unicode.
If you want that the elements of tuples as floats, do the following:
[tuple(map(float,obSet['results'][i]['location'].split(','))) for i in range(n)]
To correct way to get a list of tuples using list comprehensions would be:
def to_tuple(coords_str):
return tuple(coords_str.split(','))
output_list = [to_tuple(obSet['results'][i]['location']) for i in range(obSet['total_results'])]
You can of course replace to_tuple() with a lambda function, I just wanted to make the example clear. Moreover, you could use map() to have a tuple with floats instead of string: return tuple(map(float,coords_str.split(','))).
Let's try to give this a shot, starting with just 1 location:
>>> (long, lat) = obSet['results'][0]['location']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Alright, so that didn't work, but why? It's because the longitude and latitude coordinates are just 1 string, so you can't unpack it immediately as a tuple. We must first separate it into two different strings.
>>> (long, lat) = obSet['results'][0]['location'].split(",")
From here we will want to iterate through the whole set of results, which we know are indexed from 0 to n. tuple(obSet['results'][i]['location'].split(",")) will give us the tuple of longitude, latitude for the result at index i, so:
>>> [tuple(obSet['results'][i]['location'].split(",")) for i in range(n)]
ought to give us the set of tuples we want.

PYTHON problem with negative decimals

I have a list of negative floats. I want to make a histogram with them. As far as I know, Python can't do operations with negative numbers. Is this correct? The list is like [-0.2923998, -1.2394875, -0.23086493, etc.]. I'm trying to find the maximum and minimum number so I can find out what the range is. My code is giving an error:
setrange = float(maxv) - float(minv)
TypeError: float() argument must be a string or a number
And this is the code:
f = open('clusters_scores.out','r')
#first, extract all of the sim values
val = []
for line in f:
lineval = line.split()
print lineval
val.append(lineval)
print val
#val = map(float,val)
maxv = max(val)
minv = min(val)
setrange = float(maxv) - float(minv)
All the values that are being put into the 'val' list are negative decimals. What is the error referring to, and how do I fix it?
The input file looks like:
-0.0783532095182 -0.99415440702 -0.692972552716 -0.639273674023 -0.733029194040.765257900121 -0.755438339963
-0.144140594077 -1.06533353638 -0.366278118372 -0.746931508538 -1.02549039392 -0.296715961215
-0.0915937502791 -1.68680560936 -0.955147543358
-0.0488457137771 -0.0943080192383 -0.747534412969 -1.00491121699
-1.43973471463
-0.0642611118901 -0.0910684525497
-1.19327387414 -0.0794696449245
-1.00791366035 -0.0509749096549
-1.08046507281 -0.957339914505 -0.861495748259
The results of split() are a list of split values, which is probably why you are getting that error.
For example, if you do '-0.2'.split(), you get back a list with a single value ['-0.2'].
EDIT: Aha! With your input file provided, it looks like this is the problem: -0.733029194040.765257900121. I think you mean to make that two separate floats?
Assuming a corrected file like this:
-0.0783532095182 -0.99415440702 -0.692972552716 -0.639273674023 -0.733029194040 -0.765257900121 -0.755438339963
-0.144140594077 -1.06533353638 -0.366278118372 -0.746931508538 -1.02549039392 -0.296715961215
-0.0915937502791 -1.68680560936 -0.955147543358
-0.0488457137771 -0.0943080192383 -0.747534412969 -1.00491121699
-1.43973471463
-0.0642611118901 -0.0910684525497
-1.19327387414 -0.0794696449245
-1.00791366035 -0.0509749096549
-1.08046507281 -0.957339914505 -0.861495748259
The following code will no longer throw that exception:
f = open('clusters_scores.out','r')
#first, extract all of the sim values
val = []
for line in f:
linevals = line.split()
print linevals
val += linevals
print val
val = map(float, val)
maxv = max(val)
minv = min(val)
setrange = float(maxv) - float(minv)
I have changed it to take the list result from split() and concatenate it to the list, rather than append it, which will work provided there are valid inputs in your file.
All the values that are being put into the 'val' list are negative decimals.
No, they aren't; they're lists of strings that represent negative decimals, since the .split() call produces a list. maxv and minv are lists of strings, which can't be fed to float().
What is the error referring to, and how do I fix it?
It's referring to the fact that the contents of val aren't what you think they are. The first step in debugging is to verify your assumptions. If you try this code out at the REPL, then you could inspect the contents of maxv and minv and notice that you have lists of strings rather than the expected strings.
I assume you want to put all the lists of strings (from each line of the file) together into a single list of strings. Use val.extend(lineval) rather than val.append(lineval).
That said, you'll still want to map the strings into floats before calling max or min because otherwise you will be comparing the strings as strings rather than floats. (It might well work, but explicit is better than implicit.)
Simpler yet, just read the entire file at once and split it; .split() without arguments splits on whitespace, and a newline is whitespace. You can also do the mapping at the same point as the reading, with careful application of a list comprehension. I would write:
with open('clusters_scores.out') as f:
val = [float(x) for x in f.read().split()]
result = max(val) - min(val)

Categories

Resources