iterating through a user defined list with arcpy - python

taxNo = arcpy.GetParameterAsText(0)
thisMap = arcpy.mapping.MapDocument("CURRENT")
myDF = arcpy.mapping.ListDataFrames(thisMap)[0]
myLayers = arcpy.mapping.ListLayers(myDF)
for lyr in myLayers:
    if lyr.name == "Address Numbers":
        arcpy.SelectLayerByAttribute_management(lyr,"NEW_SELECTION","EKEY = " + taxNo[0])
        for tax in taxNo:
            arcpy.SelectLayerByAttribute_management(lyr,"ADD_TO_SELECTION","EKEY = " + tax)
            arcpy.AddWarning("Additional Selection " + tax)
I'm trying to build a script in ArcGIS that selects a series of user-defined values; in this case, 1784102 and 1784110. When I call arcpy.AddWarning(taxNo) before the loop, I get the output "1784102;1784110", but the loop iterates through it one character at a time, i.e.
Additional Selection 1
Additional Selection 7
Additional Selection 8
Additional Selection 4
etc.
then pops up an error when it hits the semi-colon.
The parameters for taxNo are set up in ArcMap as a String, Multivalue, Valuelist.

I will just assume you are calling your script like this:
python script.py 1784102;1784110
Your variable taxNo = arcpy.GetParameterAsText(0) then is a single string "1784102;1784110". If you use "array indexes" on strings (for example taxNo[0], taxNo[1] etc.) you are getting single characters out of that string, i.e. "1", "7", "8" ...
Call .split(';') on the result of arcpy.GetParameterAsText(0) to split the string "1784102;1784110" into a list of two strings: ["1784102", "1784110"]. If you need numeric items, i.e. integers, convert each element with int().
taxNo = arcpy.GetParameterAsText(0).split(';')
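The splitting step can be tried without ArcGIS. The following is a minimal sketch in plain Python: parse_multivalue is a hypothetical helper name, the raw string stands in for what arcpy.GetParameterAsText(0) returned in the question, and the IN clause assumes EKEY is a numeric field (the question concatenates the values without quotes).

```python
def parse_multivalue(raw):
    """Split a semicolon-delimited multivalue string into a list of clean items."""
    return [item.strip() for item in raw.split(";") if item.strip()]


# Stand-in for the value returned by arcpy.GetParameterAsText(0) in the question.
raw = "1784102;1784110"
tax_numbers = parse_multivalue(raw)

# Assumption: EKEY is numeric, so the values go into the clause unquoted.
# int() also rejects anything that is not a whole number.
where_clause = "EKEY IN ({})".format(", ".join(str(int(t)) for t in tax_numbers))

print(tax_numbers)
print(where_clause)
```

With a single IN clause, one NEW_SELECTION call could replace the loop of ADD_TO_SELECTION calls, although the loop works too once the string has been split.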

Related

Need to filter some strings elements but I get TypeError: unsupported operand type(s) for |: 'str' and 'str'

I'm using pandas to filter a CSV and I need to match three different string values in one column, but when I use the or operator (|) I get that error. Is there another way to filter on several strings without defining a separate variable for each filter? This is the code:
# What percentage of people with advanced education (`Bachelors`, `Masters`, or `Doctorate`) make more than 50K?
bdegree = df[(df["education"] == "Bachelors") & (df["salary"] >= "50K")].count()
mdegree = df[(df["education"] == "Masters") & (df["salary"] >= "50K")].count()
phddegree = df[(df["education"] == "Doctorate") & (df["salary"] >= "50K")].count()
all_degrees = bdegree + mdegree + phddegree
print(all_degrees)
percentaje_of_more50 = (all_degrees / df["education" == "Bachelors"|"Masters"|"Doctorate"].count())*100
print("The percentaje of people with bla bla bla is", percentaje_of_more50["education"].round(1))
By the way, I am working on an error in the logic of this code, so just ignore it :).
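The TypeError comes from operator precedence: | binds more tightly than ==, so Python first tries to evaluate "Bachelors"|"Masters", and | is not defined for two strings. Even if the three names were written as one string, == looks for an exact match, and since no one's "education" is the string "Bachelors|Masters|Doctorate", it would return a Series of all Falses.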
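The error message in the question can be reproduced without pandas. This is a small sketch in plain Python, with a single made-up education value standing in for one cell of the column:

```python
education = "Masters"  # made-up single value standing in for one cell

# `|` binds more tightly than `==`, so the line below parses as
# education == ("Bachelors" | "Masters"), and `|` is undefined for two strings.
try:
    education == "Bachelors" | "Masters"
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# For one value, a membership test expresses "is one of these" directly.
advanced = {"Bachelors", "Masters", "Doctorate"}
print(education in advanced)
```

For a whole column, isin plays the role that the in test plays here for a single value.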
You can use isin instead like:
msk = df["education"].isin(["Bachelors","Masters","Doctorate"])
The above will return a boolean Series, so using the .count method on it will just show the length of it, which is probably not something you want. So you need to use it to filter the relevant rows:
df[msk].count()
Then you can write percentage_of_more50 as:
percentage_of_more50 = (all_degrees / df[msk].count())*100
Note that you can also derive all_degrees using isin as well:
all_degrees = df[df["education"].isin(["Bachelors","Masters","Doctorate"]) & (df['salary']>='50K')].count()
Also, df["salary"] >= "50K" works as you intend only if every salary has the same number of digits, because strings are compared character by character: "100k" > "50k" evaluates to False, since "1" sorts before "5", even though 100 is greater than 50. One way around this is to left-pad the "salary" column with "0"s until each entry is a fixed number of characters long, using str.zfill:
df['salary'] = df['salary'].str.zfill(5)
Then each entry becomes 5 characters long. For example,
s = pd.Series(['100k','50k']).str.zfill(5)
becomes:
0    0100k
1    0050k
dtype: object
Then you can make the correct comparison.
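The same effect can be seen with Python's built-in str.zfill, which behaves like the pandas str.zfill accessor on a single string. This sketch uses made-up salary labels; to_thousands is a hypothetical helper showing the alternative of comparing numbers instead of padded strings.

```python
salaries = ["100k", "50k", "7k"]  # made-up labels for illustration

# Plain string comparison is lexicographic, so this is False.
naive = "100k" > "50k"

# After zero-padding to a common width, lexicographic order matches numeric order.
padded = [s.zfill(5) for s in salaries]
fixed = "100k".zfill(5) > "50k".zfill(5)


def to_thousands(label):
    """Hypothetical helper: strip a trailing k/K and parse the number."""
    return int(label.rstrip("kK"))


numeric = to_thousands("100k") > to_thousands("50k")

print(naive, padded, fixed, numeric)
```

Padding only helps when every entry has the same suffix and letter case; parsing to a number avoids that restriction.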

How to automate the format of a text file so that the length of the longest string in the list is the number of characters for that variable

I know how to format with a fixed width, and it gets me what I want. However, I would like to automate it, so that the number of characters can vary from run to run.
I have a list:
x1 = ['IMG_0187_1.JPG', 'IMG_0668.JPG', 'IMG_1177.JPG']
I find the max length of the strings:
max_len = -1
for ele in x1:
    if len(ele) > max_len:
        max_len = int(len(ele))
Now I want to use that max length in this command:
with open(sizefile, 'w') as f:
    f.write('{:14}'.format(x[1])) -> instead i want this f.write('{:max_len}'.format(x[1]))
'{:14}'.format(x[1]) - this gives me what i want, but I want to script it because the max length is not always going to be 14 in other runs.
I get this error: ValueError: Invalid format specifier
Is it possible to automate this command somehow?
The format specifier is itself a string, so you can build it with placeholders:
>>> max_len=5
>>> '{:%s}'%(max_len)
'{:5}'
So the following code should work in your case:
s='{:%s}'%(max_len)
f.write(s.format(x[1]))
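str.format also accepts a nested replacement field inside the format spec, which avoids building the format string in a separate step. A minimal sketch using the list from the question (the keyword name width is arbitrary):

```python
x1 = ['IMG_0187_1.JPG', 'IMG_0668.JPG', 'IMG_1177.JPG']

# Width of the longest entry, computed per run instead of hard-coded.
max_len = max(len(name) for name in x1)

# The inner {width} is filled in first, giving the spec '{:14}' for this list.
padded = ['{:{width}}'.format(name, width=max_len) for name in x1]

# str.ljust produces the same left-aligned padding for strings.
same = [name.ljust(max_len) for name in x1]

for line in padded:
    print(repr(line))
```

Strings are left-aligned by default, so shorter names get trailing spaces up to max_len.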

Inserting random values based on condition

I have the following DataFrame containing various information about a certain product. Input3 is a list of sentences created as shown below:
sentence_list = (['Køb online her','Sammenlign priser her','Tjek priser fra 4 butikker','Se produkter fra 4 butikker', 'Stort udvalg fra 4 butikker','Sammenlign og køb'])
df["Input3"] = np.random.choice(sentence_list, size=len(df))
Full_Input is a string created by joining various columns, its content being something like: "ProductName from Brand - Buy online here - Sitename". It is created like this:
df["Full_Input"] = df['TitleTag'].astype(str) + " " + df['Input2'].astype(str) + " " + df['Input3'].astype(str) + " " + df['Input4'].astype(str) + " " + df['Input5'].astype(str)
The problem here is that Full_Input_Length should be under 55. Therefore I am trying to figure out how to put a condition while randomly generating Input3 so when it adds up with the other columns' strings, the full input length does not go over 55.
This is what I tried:
for col in range(len(df)):
    condlist = [df["Full_Input"].apply(len) < 55]
    choicelist = [sentence_list]
    df['Input3_OK'][col] = np.random.choice.select(condlist, choicelist)
As expected, it doesn't work like that. np.random.choice.select is not a thing and I am getting an AttributeError.
How can I do that instead?
If you are guaranteed to have at least one item in Input3 that will satisfy this condition, you may want to try something like conditioning your random selection ONLY on values in your sentence_list that would be of an acceptable length:
# keep only the sentences that are short enough:
my_sentences = [s for s in sentence_list if len(s) < MAX_LENGTH]
# randomly select from this filtered list:
np.random.choice(my_sentences)
In other words, perform the filter on each list of strings BEFORE you call random.choice.
You can run this for each row in a dataframe like so:
def choose_string(full_input):
    return np.random.choice([
        s
        for s in sentence_list
        if len(s) + len(full_input) < 55
    ])
df["Input3_OK"] = df.Full_Input.map(choose_string)
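The filter-then-choose idea can be sketched with the standard library alone. In this sketch choose_sentence and MAX_TOTAL are hypothetical names, other_text is assumed to be the row's text without Input3, one joining space is counted, and an empty string is returned when no sentence fits (random.choice and np.random.choice both fail on an empty list).

```python
import random

sentence_list = ['Køb online her', 'Sammenlign priser her',
                 'Tjek priser fra 4 butikker', 'Se produkter fra 4 butikker',
                 'Stort udvalg fra 4 butikker', 'Sammenlign og køb']
MAX_TOTAL = 55  # the full input must stay under this length


def choose_sentence(other_text, rng=random):
    """Pick a random sentence that keeps the joined text under MAX_TOTAL.

    Returns '' when no sentence fits, instead of raising on an empty list.
    """
    candidates = [s for s in sentence_list
                  if len(other_text) + 1 + len(s) < MAX_TOTAL]  # +1 for the space
    return rng.choice(candidates) if candidates else ''


rng = random.Random(0)              # seeded only to make the demo repeatable
other = 'x' * 30                    # made-up text for the remaining columns
pick = choose_sentence(other, rng)
print(pick, len(other) + 1 + len(pick))
```

Whether an empty string is an acceptable fallback depends on the data; dropping the row or shortening another column are alternatives.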

Shortest possible generated unique ID

So we can generate a unique id with str(uuid.uuid4()), which is 36 characters long.
Is there another method to generate a unique ID which is shorter in terms of characters?
EDIT:
If ID is usable as primary key then even better
Granularity should be better than 1ms
This code could be distributed, so we can't assume time independence.
If this is for use as a primary key field in db, consider just using auto-incrementing integer instead.
str(uuid.uuid4()) is 36 chars but it has four useless dashes (-) in it, and it's limited to 0-9 a-f.
Better uuid4 in 32 chars:
>>> uuid.uuid4().hex
'b327fc1b6a2343e48af311343fc3f5a8'
Or just b64 encode and slice some urandom bytes (up to you to guarantee uniqueness):
>>> base64.b64encode(os.urandom(32))[:8]
b'iR4hZqs9'
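A variation on the random-bytes approach, sketched with the standard library's secrets module, returns a str directly and uses only URL-safe characters. The choice of 6 bytes is an assumption for illustration; uniqueness remains probabilistic and must still be enforced by the caller.

```python
import secrets
import uuid

full_uuid = str(uuid.uuid4())     # 36 characters, including four dashes
hex_uuid = uuid.uuid4().hex       # 32 hexadecimal characters

# 6 random bytes encode to 8 URL-safe Base64 characters (48 bits of randomness).
token = secrets.token_urlsafe(6)

print(len(full_uuid), len(hex_uuid), len(token))
```

With 48 random bits, collisions become likely once the number of IDs approaches roughly 2**24 (about 16 million), so short tokens like this suit small ID spaces or tables with a uniqueness constraint and a retry.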
TLDR
Most of the time it's better to work with numbers internally and encode them to short IDs externally. So here's a function for Python 3, PowerShell & VBA that will convert an int32 to an alphanumeric ID. Use it like this:
int32_to_id(225204568)
'F2AXP8'
For distributed code use ULIDs: https://github.com/mdipierro/ulid
They are much longer but unique across different machines.
How short are the IDs?
It will encode about half a billion IDs in 6 characters so it's as compact as possible while still using only non-ambiguous digits and letters.
How can I get even shorter IDs?
If you want even more compact IDs/codes/Serial Numbers, you can easily expand the character set by just changing the chars="..." definition. For example if you allow all lower and upper case letters you can have 56 billion IDs within the same 6 characters. Adding a few symbols (like ~!@#$%^&*()_+-=) gives you 208 billion IDs.
So why didn't you go for the shortest possible IDs?
The character set I'm using in my code has an advantage: It generates IDs that are easy to copy-paste (no symbols so double clicking selects the whole ID), easy to read without mistakes (no look-alike characters like 2 and Z) and rather easy to communicate verbally (only upper case letters). Sticking to numeric digits only is your best option for verbal communication but they are not compact.
I'm convinced: show me the code
Python 3
def int32_to_id(n):
    if n==0: return "0"
    chars="0123456789ACEFHJKLMNPRTUVWXY"
    length=len(chars)
    result=""
    remain=n
    while remain>0:
        pos = remain % length
        remain = remain // length
        result = chars[pos] + result
    return result
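The answer gives a decoder only in VBA. The following is a sketch of the same round trip in Python, assuming the same 28-character alphabet; it restates the encoder so the block runs on its own, and id_to_int32 mirrors the VBA function of that name.

```python
CHARS = "0123456789ACEFHJKLMNPRTUVWXY"  # same 28-character alphabet as above


def int32_to_id(n):
    """Encode a non-negative integer as a base-28 string over CHARS."""
    if n == 0:
        return CHARS[0]
    digits = []
    while n > 0:
        n, pos = divmod(n, len(CHARS))
        digits.append(CHARS[pos])
    return "".join(reversed(digits))


def id_to_int32(text):
    """Decode a string produced by int32_to_id back to its integer."""
    value = 0
    for ch in text:
        value = value * len(CHARS) + CHARS.index(ch)  # ValueError on a bad character
    return value


print(int32_to_id(225204568))      # F2AXP8
print(id_to_int32("F2AXP8"))       # 225204568
print(len(CHARS) ** 6)             # 481890304 distinct IDs of up to 6 characters
```

The last line is where the "about half a billion IDs in 6 characters" figure comes from: 28 to the power 6.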
PowerShell
function int32_to_id($n){
    $chars="0123456789ACEFHJKLMNPRTUVWXY"
    $length=$chars.length
    $result=""; $remain=[int]$n
    do {
        $pos = $remain % $length
        $remain = [int][Math]::Floor($remain / $length)
        $result = $chars[$pos] + $result
    } while ($remain -gt 0)
    $result
}
VBA
Function int32_to_id(n)
    Dim chars$, length, result$, remain, pos
    If n = 0 Then int32_to_id = "0": Exit Function
    chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
    length = Len(chars$)
    result$ = ""
    remain = n
    Do While (remain > 0)
        pos = remain Mod length
        remain = Int(remain / length)
        result$ = Mid(chars$, pos + 1, 1) + result$
    Loop
    int32_to_id = result
End Function
Function id_to_int32(id$)
    Dim chars$, length, result, remain, pos, value, power
    chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
    length = Len(chars$)
    result = 0
    power = 1
    For pos = Len(id$) To 1 Step -1
        result = result + (InStr(chars$, Mid(id$, pos, 1)) - 1) * power
        power = power * length
    Next
    id_to_int32 = result
End Function
Public Sub test_id_to_int32()
    Dim i
    For i = 0 To 28 ^ 3
        If id_to_int32(int32_to_id(i)) <> i Then Debug.Print "Error, i=", i, "int32_to_id(i)", int32_to_id(i), "id_to_int32('" & int32_to_id(i) & "')", id_to_int32(int32_to_id(i))
    Next
    Debug.Print "Done testing"
End Sub
Yes. Just use the current UTC time in milliseconds. As long as the clock is never set back, this number does not repeat from one millisecond to the next.
const uniqueID = new Date().getTime();
EDIT
If you have the rare requirement to produce more than one ID within the same millisecond, this method is of no use, as this number's granularity is 1 ms.

Python: Function takes 1 argument for 2 given

I have looked on this website for something similar, and attempted to debug using previous answers, and failed.
I'm testing (I did not write this module) a module that changes the grade value of a course's grades from a B- to say a B, but never going across base grade levels (ie, B+ to an A-).
The original module is called transcript.py
I'm testing it in my own testtranscript.py
I'm testing that module by importing it: 'import transcript' and 'import cornelltest'
I have ensured that all files are in the same folder/directory.
There is the function raise_grade present in transcript.py (there are multiple definitions in this module, but raise_grade is the only one giving me any trouble).
ti is in the form ('class name', 'gradeval')
There's already another definition converting floats to strings and back (ie 3.0--> B).
def raise_grade(ti):
    """Raise gradeval of transcript line ti by a non-noticeable amount.
    """
    # value of the base letter grade, e.g., 4 (or 4.0) for a 4.3
    bval = int(ti.gradeval)
    print 'bval is:"' + str(bval) + '"'
    # part after decimal point in raised grade, e.g., 3 (or 3.0) for a 4.3
    newdec = min(int((ti.gradeval + .3)*10) % 10, 3)
    print 'newdec is:"' + str(newdec) + '"'
    # get result by adding the two values together, after shifting newdec one
    # decimal place
    newval = bval + round(newdec/10.0, 1)
    ti.gradeval = newval
    print 'newval is:"' + str(newval) + '"'
I will probably get rid of the print later.
When I run testtranscript, which imports transcript:
def test_raise():
    """test raise_grade"""
    testobj = transcript.Titem('CS1110','B-')
    transcript.raise_grade('CS1110','B-')
    cornelltest.assert_floats_equal(3.0,transcript.lettergrade_to_val("B-"))
I get this from the cmd shell:
TypeError: raise_grade takes exactly 1 argument (2 given)
Edit1: So now I see that I am giving it two parameters when raise_grade(ti) is just one, but perhaps it would shed more light if I just put out the rest of the code. I'm still stuck as to why I get a ['str' object has no gradeval error]
LETTER_LIST = ['B', 'A']
# List of valid modifiers to base letter grades.
MODIFIER_LIST = ['-','+']
def lettergrade_to_val(lg):
    """Returns: numerical value of letter grade lg.
    The usual numerical scheme is assumed: A+ -> 4.3, A -> 4.0, A- -> 3.7, etc.
    Precondition: lg is a 1 or 2-character string consisting of a "base" letter
    in LETTER_LIST optionally followed by a modifier in MODIFIER_LIST."""
    # if LETTER_LIST or MODIFIER_LIST change, the implementation of
    # this function must change.
    # get value of base letter. Trick: index in LETTER_LIST is shifted from value
    bv = LETTER_LIST.index(lg[0]) + 3
    # Trick with indexing in MODIFIER_LIST to get the modifier value
    return bv + ((MODIFIER_LIST.index(lg[1]) - .5)*.3/.5 if (len(lg) == 2) else 0)
class Titem(object):
    """A Titem is an 'item' on a transcript, like "CS1110 A+"
    Instance variables:
        course [string]: course name. Always at least 1 character long.
        gradeval [float]: the numerical equivalent of the letter grade.
                          Valid letter grades are 1 or 2 chars long, and consist
                          of a "base" letter in LETTER_LIST optionally followed
                          by a modifier in MODIFIER_LIST.
                          We store values instead of letter grades to facilitate
                          calculations of GPA later.
                          (In "real" life, one would write a function that,
                          when displaying a Titem, would display the letter
                          grade even though the underlying representation is
                          numerical, but we're keeping things simple for this
                          lab.)
    """
    def __init__(self, n, lg):
        """Initializer: A new transcript line with course (name) n, gradeval
        the numerical equivalent of letter grade lg.
        Preconditions: n is a non-empty string.
        lg is a string consisting of a "base" letter in LETTER_LIST
        optionally followed by modifier in MODIFIER_LIST.
        """
        # assert statements that cause an error when preconditions are violated
        assert type(n) == str and type(lg) == str, 'argument type error'
        assert (len(n) >= 1 and 0 < len(lg) <= 2 and lg[0] in LETTER_LIST and
                (len(lg) == 1 or lg[1] in MODIFIER_LIST)), 'argument value error'
        self.course = n
        self.gradeval = lettergrade_to_val(lg)
Edit2: I understand the original problem... but it seems that the original writer screwed up the code, since raise_grade doesn't work properly for grade values at 3.7 ---> 4.0, since bval takes the original float and makes it an int, which doesn't work in this case.
You are calling the function incorrectly. You should pass it the testobj you created:
def test_raise():
    """test raise_grade"""
    testobj = transcript.Titem('CS1110','B-')
    transcript.raise_grade(testobj)
    ...
The raise_grade function is expecting a single argument ti which has a gradeval attribute, i.e. a Titem instance.
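Both errors from the question can be reproduced in isolation. The sketch below is written for Python 3 (the question's code is Python 2) and uses hypothetical stand-ins: a pared-down Titem that stores a numeric gradeval directly, and a raise_grade that only shares the one-parameter signature of the real function. Its 0.3 bump is illustrative, not the module's logic.

```python
class Titem:
    """Hypothetical, pared-down stand-in for transcript.Titem."""

    def __init__(self, course, gradeval):
        self.course = course
        self.gradeval = gradeval


def raise_grade(ti):
    """Stand-in with the same one-parameter signature as transcript.raise_grade."""
    ti.gradeval = round(ti.gradeval + 0.3, 1)  # illustrative bump only


def error_name(call):
    """Run call() and return the name of the exception it raises, or None."""
    try:
        call()
    except Exception as exc:
        return type(exc).__name__
    return None


testobj = Titem("CS1110", 2.7)

two_args = error_name(lambda: raise_grade("CS1110", "B-"))  # too many arguments
bare_string = error_name(lambda: raise_grade("B-"))         # a str has no gradeval
raise_grade(testobj)                                        # the intended call

print(two_args, bare_string, testobj.gradeval)
```

Passing two strings fails on the argument count, passing one string fails on the missing gradeval attribute, and passing the Titem instance is the call that reaches the function body intact.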
