Python: how to prevent duplication of a randomly chosen key/value pair

def fill2_line1():
    x2 = random.choice(list(twoSyllValues.items()))
    line1.append(x2)
magicList = ([x[1] for x in line1])
if 1 in magicList:
    fill2_line1()
    fill2_line1()
    complete_line = line1[0][0] + " " + line1[1][0] + " " + line1[2][0]
    print(complete_line)
This is the part in question - the whole program is over 150 lines. It works perfectly as is, but sometimes I'm getting duplicated words. To refine the code, I want to make sure that the key in the first random tuple selection is not duplicated in the next two selected tuples IN that list, line1. The only way I can think of is having another dictionary created after the first word selection and then exclude that key,value pair from new dictionary and then make the next two random.choice calls. I feel like there must be a simpler way, but I just don't have the experience yet.

I can think of three ways to solve the problem you are facing. Since I do not know your whole program, I will write sample code that you can refer to.
Delete the key before the second random.choice call and add the deleted pairs back in afterwards if you need them. This is somewhat similar to creating another dictionary, but more efficient.
import random
def fill2_line1():
    x2 = random.choice(list(twoSyllValues.items()))
    line1.append(x2)
    return x2
magicList = [x[1] for x in line1]
if 1 in magicList:
    deleted_list = []
    # run first random.choice
    deleted_item = fill2_line1()
    deleted_list.append(deleted_item)
    del twoSyllValues[deleted_item[0]]
    # run second random.choice
    deleted_item = fill2_line1()
    deleted_list.append(deleted_item)
    del twoSyllValues[deleted_item[0]]
    complete_line = line1[0][0] + " " + line1[1][0] + " " + line1[2][0]
    print(complete_line)
    # add the deleted items back in after work is done
    for k, v in deleted_list:
        twoSyllValues[k] = v
Run random.choice repeatedly until it returns an item that has not been chosen yet.
import random
def fill2_line1(deleted_set):
    x2 = random.choice(list(twoSyllValues.items()))
    # redraw while the pick has already been used
    while x2 in deleted_set:
        x2 = random.choice(list(twoSyllValues.items()))
    line1.append(x2)
    deleted_set.add(x2)
magicList = [x[1] for x in line1]
if 1 in magicList:
    deleted_set = set()
    fill2_line1(deleted_set)
    fill2_line1(deleted_set)
    complete_line = line1[0][0] + " " + line1[1][0] + " " + line1[2][0]
    print(complete_line)
Keep a separate list just for random.choice and remove each item from it once it has been picked.
import random
def fill2_line1(items_list):
    x2 = random.choice(items_list)
    line1.append(x2)
    items_list.remove(x2)
magicList = [x[1] for x in line1]
if 1 in magicList:
    items_list = list(twoSyllValues.items())
    fill2_line1(items_list)
    fill2_line1(items_list)
    complete_line = line1[0][0] + " " + line1[1][0] + " " + line1[2][0]
    print(complete_line)
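A fourth option, not mentioned above, is to let the standard library draw without replacement. The sketch below uses random.sample, which returns distinct elements of a sequence. The dictionary contents and the helper name pick_distinct are made up for illustration, since the original program is not shown.

```python
import random

# Hypothetical stand-in for the question's dictionary (word -> syllable count).
twoSyllValues = {"apple": 2, "river": 2, "window": 2, "candle": 2}

def pick_distinct(mapping, k):
    """Return k distinct (key, value) pairs chosen at random from mapping."""
    # random.sample draws without replacement, so no pair can repeat.
    return random.sample(list(mapping.items()), k)

picks = pick_distinct(twoSyllValues, 3)
complete_line = " ".join(word for word, _ in picks)
print(complete_line)
```

random.sample raises ValueError when k is larger than the number of items, so the dictionary needs at least as many entries as words requested.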
Hope my answer gave you some ideas!

Related

Confused on how the multiple variables work and how to get all 4 values from 1st item in list

testdata = ["One,For,The,Money", "Two,For,The,Show", "Three,To,Get,Ready", "Now,Go,Cat,Go"]
#My Code:
def chop(string):
    x = 0
    y = 0
    while x < 5:
        y = string.find(",") + 1
        z = string.find(",", y)
        x = x + 1
    return y, z
#My Code Ends
for i in range(4):
    uno, dos, tres, cuatro = chop(testdata[i])
    print(uno + ":" + dos + ":" + tres + ":" + cuatro)
It says I don't have enough values. I previously tried appending similar code to a list, and then it said I had too many.

I can't figure out why you need to do it that way, but maybe this helps.
testdata = ["One,For,The,Money", "Two,For,The,Show",
            "Three,To,Get,Ready", "Now,Go,Cat,Go"]
for i in testdata:
    uno, dos, tres, cuatro = i.split(',')
    print(uno + ":" + dos + ":" + tres + ":" + cuatro)
Result
One:For:The:Money
Two:For:The:Show
Three:To:Get:Ready
Now:Go:Cat:Go
Just iterate through the list and split each string on the comma. The result is as expected.

You can search for the position of each , (comma) in the given line and sweep through the line, appending each word to a list. You were getting too few values because the function was not returning the 4 elements that the for loop tries to unpack.
testdata = ["One,For,The,Money", "Two,For,The,Show", "Three,To,Get,Ready",
            "Now,Go,Cat,Go"]
# My Code:
def chop(line):
    start_position = 0
    words = []
    for i, c in enumerate(line):
        if c == ",":
            words.append(line[start_position:i])
            start_position = i+1
    words.append(line[start_position:])
    return words
# My Code Ends
for i in range(4):
    uno, dos, tres, cuatro = chop(testdata[i])
    print(uno + ":" + dos + ":" + tres + ":" + cuatro)
Output:
One:For:The:Money
Two:For:The:Show
Three:To:Get:Ready
Now:Go:Cat:Go
Explanation (updated):
Here we keep the start position of each word in the start_position variable. The enumerate function yields a tuple of the index and the value on each iteration. We use the value to check whether it is equal to , and the index to slice the word out of the line, which is why enumerate is used.
References:
Documentation on enumerate
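To make the "not enough values" error concrete, here is a small sketch. The helper chop_two is a hypothetical reduction of the question's function: like the original, it returns only two values, so unpacking into four names fails. The split call from the first answer is shown next to it for comparison.

```python
def chop_two(line):
    # Like the question's chop(), this returns two comma positions, not four words.
    first = line.find(",") + 1
    second = line.find(",", first)
    return first, second

try:
    uno, dos, tres, cuatro = chop_two("One,For,The,Money")
except ValueError as err:
    # Unpacking two values into four names raises ValueError.
    print("unpacking failed:", err)

# str.split returns all four words, so the unpacking succeeds.
uno, dos, tres, cuatro = "One,For,The,Money".split(",")
print(":".join([uno, dos, tres, cuatro]))  # One:For:The:Money
```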

Check result using 4 operations based with Python

I'm struggling to make a Python program that can solve riddles such as:
get 23 using [1,2,3,4] and the 4 basic operations however you'd like.
I expect the program to output something such as
# 23 reached by 4*(2*3)-1
So far I've come up with the following approach: reduce the input list by 1 item at a time by checking every possible 2-combo that can be picked and every possible result you can get from it.
With [1,2,3,4] you can pick:
[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]
With x and y you can get to:
(x+y),(x-y),(y-x),(x*y),(x/y),(y/x)
Then I'd store the operation computed so far in a variable, and run the 'reducing' function again onto every result it has returned, until the arrays are just 2 items long: then I can just run the x,y -> possible outcomes function.
My problem is this "recursive" approach isn't working at all, because my function ends as soon as I return an array.
If I input [1,2,3,4] I'd get
[(1+2),3,4] -> [3,3,4]
[(3+3),4] -> [6,4]
# [10,2,-2,24,1.5,0.6666666666666666]
My code so far:
from collections import Counter
def genOutputs(x,y,op=None):
    results = []
    if op == None:
        op = str(y)
    else:
        op = "("+str(op)+")"
    ops = ['+','-','*','/','rev/','rev-']
    z = 0
    #will do every operation to x and y now.
    #op stores the last computated bit (of other functions)
    while z < len(ops):
        if z == 4:
            try:
                results.append(eval(str(y) + "/" + str(x)))
                #yield eval(str(y) + "/" + str(x)), op + "/" + str(x)
            except:
                continue
        elif z == 5:
            results.append(eval(str(y) + "-" + str(x)))
            #yield eval(str(y) + "-" + str(x)), op + "-" + str(x)
        else:
            try:
                results.append(eval(str(x) + ops[z] + str(y)))
                #yield eval(str(x) + ops[z] + str(y)), str(x) + ops[z] + op
            except:
                continue
        z = z+1
    return results
def pickTwo(array):
    #returns an array with every 2-combo
    #from input array
    vomit = []
    a,b = 0,1
    while a < (len(array)-1):
        choice = [array[a],array[b]]
        vomit.append((choice,list((Counter(array) - Counter(choice)).elements())))
        if b < (len(array)-1):
            b = b+1
        else:
            b = a+2
            a = a+1
    return vomit
def reduceArray(array):
    if len(array) == 2:
        print("final",array)
        return genOutputs(array[0],array[1])
    else:
        choices = pickTwo(array)
        print(choices)
        for choice in choices:
            opsofchoices = genOutputs(choice[0][0],choice[0][1])
            for each in opsofchoices:
                newarray = list([each] + choice[1])
                print(newarray)
                return reduceArray(newarray)
reduceArray([1,2,3,4])

The largest issues when dealing with problems like this are handling operator precedence and parenthesis placement so that every possible number can be produced from a given set. The easiest way to do this is to handle operations on a stack, corresponding to the reverse Polish notation of the infix expression. Once you do this, you can draw numbers and/or operations recursively until all n numbers and n-1 operations have been exhausted, and store the result. The code below generates all possible permutations of numbers (without replacement), operators (with replacement), and parenthesis placements to generate every possible value. Note that this is highly inefficient: addition and multiplication commute, so a + b equals b + a and only one of the two is necessary. Similarly, by the associative property, a + (b + c) equals (a + b) + c. The algorithm below is meant to be a simple example, and as such does not make such optimizations.
def expr_perm(values, operations="+-*/", stack=[]):
    solution = []
    if len(stack) > 1:
        for op in operations:
            new_stack = list(stack)
            new_stack.append("(" + new_stack.pop() + op + new_stack.pop() + ")")
            solution += expr_perm(values, operations, new_stack)
    if values:
        for i, val in enumerate(values):
            new_values = values[:i] + values[i+1:]
            solution += expr_perm(new_values, operations, stack + [str(val)])
    elif len(stack) == 1:
        return stack
    return solution
Usage:
result = expr_perm([4,5,6])
print("\n".join(result))
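The answer stops at generating expression strings. To get back to the original riddle (reach 23 from [1,2,3,4]), one possible follow-up is to evaluate each string and keep the ones that hit the target. The sketch below repeats expr_perm so that it runs on its own; find_target and its tolerance parameter are assumptions added here, not part of the answer.

```python
def expr_perm(values, operations="+-*/", stack=None):
    """Enumerate fully parenthesised expressions, as in the answer above."""
    stack = [] if stack is None else stack
    solution = []
    if len(stack) > 1:
        for op in operations:
            new_stack = list(stack)
            new_stack.append("(" + new_stack.pop() + op + new_stack.pop() + ")")
            solution += expr_perm(values, operations, new_stack)
    if values:
        for i, val in enumerate(values):
            new_values = values[:i] + values[i + 1:]
            solution += expr_perm(new_values, operations, stack + [str(val)])
    elif len(stack) == 1:
        return stack
    return solution

def find_target(values, target, tol=1e-9):
    """Return every generated expression whose value is within tol of target."""
    hits = []
    for expr in expr_perm(values):
        try:
            # The strings contain only numbers, operators and parentheses.
            value = eval(expr)
        except ZeroDivisionError:
            continue  # skip expressions that divide by zero
        if abs(value - target) < tol:
            hits.append(expr)
    return hits

hits = find_target([1, 2, 3, 4], 23)
print(hits[0] if hits else "no solution")
```

The tolerance is there because division produces floats, so an exact equality test could miss a valid expression.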

Why does the for loop here print everything twice and then list values out of range?

I have a set of code that sends a request to SnapToRoads Api. The length of the data set in this case is around 27, but the 'originalIndex' shown goes up to 53
results = requests.get("https://roads.googleapis.com/v1/snapToRoads?path=12.919082641601562,77.65169525146484|12.919082641601562,77.65169525146484|12.918915748596191,77.6517105102539|12.918915748596191,77.6517105102539|12.918656349182129,77.65177154541016|12.918656349182129,77.65177154541016|12.918524742126465,77.6517562866211|12.918524742126465,77.6517562866211|12.918295860290527,77.65178680419922|12.918295860290527,77.65178680419922|12.918216705322266,77.65177154541016|12.918216705322266,77.65177154541016|12.918027877807617,77.65178680419922|12.918027877807617,77.65178680419922|12.917914390563965,77.65178680419922|12.917914390563965,77.65178680419922|12.917774200439453,77.65178680419922|12.917774200439453,77.65178680419922|12.917659759521484,77.65179443359375|12.917659759521484,77.65179443359375|12.917553901672363,77.65180969238281|12.917553901672363,77.65180969238281|12.917448043823242,77.6518325805664|12.917448043823242,77.6518325805664|12.917227745056152,77.65177917480469|12.917227745056152,77.65177917480469|12.91706657409668,77.65178680419922|12.91706657409668,77.65178680419922|12.916943550109863,77.65178680419922|12.916943550109863,77.65178680419922|12.916749000549316,77.65178680419922|12.916749000549316,77.65178680419922|12.916621208190918,77.65179443359375|12.916621208190918,77.65179443359375|12.91647720336914,77.65180206298828|12.91647720336914,77.65180206298828|12.91647720336914,77.65180206298828|12.91647720336914,77.65180206298828|12.916269302368164,77.65177154541016|12.916269302368164,77.65177154541016|12.916149139404297,77.65178680419922|12.916149139404297,77.65178680419922|12.916014671325684,77.65177917480469|12.916014671325684,77.65177917480469|12.91580867767334,77.65179443359375|12.91580867767334,77.65179443359375|12.915785789489746,77.65182495117188|12.915785789489746,77.65182495117188|12.915775299072266,77.65180969238281|12.915775299072266,77.65180969238281|12.915729522705078,77.65179443359375|12.915729522705078,77.65179443359375|12.91568374633789
,77.65179443359375|12.91568374633789,77.65179443359375&key=AIzaSyAmplaUG26XJGwPrLbky2bHQ-eBmQvZUVU")
snappoints = results.json()['snappedPoints']
snapdata = set()
for point in snappoints:
    # this is each individual element in snapPoints array
    snapdata.add(point['originalIndex'])
print (snapdata)
length = len(snapdata)
print (length)
I want to match my data to the original indices by seeing which ones are retained, but the API returns more indices than the number of points I thought I sent in the request. Please help. Thanks.
PS: I'm a noob with APIs.
Seemingly the for loop is messed up:
api1 = []
for i in range(0, length-1):
    dataPoint = data[i]
    dataPoint1 = data [i+1]
    coordinate = dataPoint['coordinates']
    coordinate1 = dataPoint1['coordinates']
    x = coordinate[0]
    y = coordinate[1]
    x1 = coordinate1[0]
    y1 = coordinate1[1]
    str1 = str(x)
    str2 = str(y)
    str3 = '|'
    apiData = str1 + ',' + str2 + str3
    apiData = apiData+ (str1 + ',' + str2 + str3)
    print (apiData)
    api1.append(apiData)
    i +=1
print (api1)
print (len(api1))

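This line is the culprit that doubles the points:
apiData = apiData+ (str1 + ',' + str2 + str3)
apiData already holds str1 + ',' + str2 + str3, and this line appends the same text to it again.
Removing this line will fix the doubling.
Your print (len(api1)) counts list entries, and each entry holds the same point twice. That is why you see about 27 while originalIndex goes up to 53.
To see the real count of points, count the separators instead (api1 is a list, so it has no split method):
print(sum(entry.count('|') for entry in api1))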
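As a sketch of how the path string could be built without any doubling, the snippet below joins one "lat,lng" pair per point with str.join. The sample data list and the helper name build_path are hypothetical. They only mirror the shape of the question's data, where each item has a 'coordinates' entry.

```python
# Hypothetical sample standing in for the question's `data` list.
data = [
    {"coordinates": [12.919082641601562, 77.65169525146484]},
    {"coordinates": [12.918915748596191, 77.6517105102539]},
    {"coordinates": [12.918656349182129, 77.65177154541016]},
]

def build_path(points):
    """Return 'lat,lng|lat,lng|...' with each point appearing exactly once."""
    return "|".join(
        f"{p['coordinates'][0]},{p['coordinates'][1]}" for p in points
    )

path = build_path(data)
print(path)
# One segment per input point, and no trailing separator.
print(len(path.split("|")))
```

With this construction the number of segments in the path equals len(data), so each originalIndex in the response can be used directly as an index into data.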

Please resolve python "magic" in this calculation

I created a script that should perform simple math juggling by rearranging numbers.
What it should do:
x = 777.0
y = 5
calc = x / y # 155.4
...
Pseudocode:
Rearrange numbers (last digit + first) = 555.
Difference from 777 and 555 = 222
Add 222 to 555 = 777
Basically it should recreate the original variable without doing a real calculation but instead just rearrange numbers and add.
Because of the design of the script I expected it to work only with 4-digit numbers like 333.3. It turns out that it (seems to) work also with numbers like 2543.6452, which seems to be impossible, at least from my (non-academic) view.
Can someone please tell me what happens here? Is the code working correctly, or did I create something I simply don't understand? It looks like an illusion to me. :D
x = 5.0
y = 7345.3297
z= y / x
print "Initial Value = " + str(y)
print "Calculate:"
print str(y) + "/" + str(x)
print z # 177.6
print
a = int(str(z)[0])
print "First Number = " + str(a)
print
b = int(str(z)[1])
c = int(str(z)[2])
print "In between = " + str(b) + str(c)
d = int(str(z)[-1]) # treat z as string, take first string after . from z and format it back to int
print "Last Number = " + str(d)
print
print "Rearrange Numbers"
res = str(a+d) +str(b) +str(c)
to_int = int(res)
dif = y - to_int
add = to_int + dif
print "number = " + str(add)

Let's do some substitution here. The bottom lines read:
dif = y - to_int
add = to_int + dif
This can be written in one line as:
add = y - to_int + to_int
or:
add = y
So you do all this "magic" and then completely ignore it to print what you started with. You could put anything above this, all this code does at the end is print y :-)
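The substitution can be checked with a few lines of Python 3. The function name juggle is made up for this sketch. It repeats only the last two steps of the script, and the comparison uses math.isclose because subtracting and re-adding a float can introduce a tiny rounding difference.

```python
import math

def juggle(y, to_int):
    """Repeat the last two steps of the script: subtract to_int, then add it back."""
    dif = y - to_int
    return to_int + dif

y = 7345.3297
# Any "rearranged" number works, because it cancels itself out.
results = [juggle(y, to_int) for to_int in (0, 146, 555, 10**6)]
print(all(math.isclose(r, y) for r in results))
```

Whatever value the digit rearranging produces for to_int, the result is y again up to floating-point rounding.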

Matching unique identifiers between two files of different length, line by line. Is there a better way to write this? [python]

bed = WRDIR + "AGGCAGAA_mm9_AlignedSorted_final_nameSorted_uniq.bed"
sam = WRDIR + "AGGCAGAA_bcs_nameSorted.sam"
out = WRDIR + "AGGCAGAA_joined.txt"
with open(bed) as b, open(sam) as s, open(out, 'w') as o:
    for x, b_line in enumerate(b, start=1):
        b_line = b_line.strip().split("\t")
        b_id = b_line[9]
        match_line = 1
        if x < 200: # for testing because file is huge..
            for s_line in s:
                s_line= s_line.strip().split("\t")
                s_id = s_line[0]
                if b_id == s_id:
                    output = b_id + "\tGENE:" + b_line[4] + "\tACC:" + b_line[5] + "\tCHR:" + b_id[6] + "\tBC:" + s_line[9][:12] + "\tUMI:" + s_line[9][12:] + "\n"
                    o.write(output)
                    match_line+=1
                    print "MATCH: (BED line " + str(x) + ")\t" + b_id + "\t\t(SAM line " + str(match_line) + ")\t" + s_id
                    break
                match_line+=1
What is this script supposed to do? It's supposed to look at two tab-delimited files which are sorted by a unique identifier in each file. The bed file is essentially a subset of the sam file. If this unique identifier matches between both files, then get data from the lines for which they match and write that information out to a file. Once this is done we move to the next line of bed and continue where we left off in sam (essentially working down the file).
I think this code does what it's intended to, but there are a few bugs. match_line is supposed to be the line number of the second file, but this is not the case. It seems to be affected by the break such that when the break is hit, the variable re-initializes at 1 (I think). The output variable is a line of tab-separated data that I'd like to write out to a file. I'm also not sure if this is the correct use of a break.
Next steps: This script only accounts for 1 set of files for a sample. There are essentially 9 samples, with each containing two species-specific files. E.g.:
samples = [
    "AGGCAGAA",
    "CAGAGAGG",
    "CGTACTAG",
    "CTCTCTAC",
    "GCTACGCT",
    "GGACTCCT",
    "TAAGGCGA",
    "TAGGCATG",
    "TCCTGAGC"
]
for idx, s in enumerate(samples, start=1):
    if idx == 1: ## TESTING
        print "Processing: %s (%s/%s)" % (s, str(idx), str(len(samples)))
        for sp in ["mm9", "hg19"]:
            # if sp == "mm9": ## TESTING
            bed = WRDIR + s + "_" + sp + "_AlignedSorted_final_nameSorted.bed"
            sam = WRDIR + s + "_bcs_nameSorted.sam"
            out = WRDIR + s +"_joined.txt"
I quickly tried to integrate this into the script but received an error stating: TypeError: cannot concatenate 'str' and 'file' objects.
I appreciate the help!
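One possible reading of the matching described above, sketched in Python 3 on in-memory rows instead of files: both inputs are walked once, and the second one resumes where it left off. The function name join_sorted and the toy rows are assumptions. The column positions (ID in column 9 of the bed rows, column 0 of the sam rows) follow the question's code. In the posted script, match_line = 1 sits inside the outer loop, so it is reset for every bed line, which would explain the line numbers restarting. Here the counter lives in a single shared iterator instead.

```python
def join_sorted(bed_rows, sam_rows):
    """Yield (bed_line_no, sam_line_no, bed_row, sam_row) for matching IDs.

    Assumes both inputs are sorted by ID and every bed ID occurs in sam_rows.
    """
    # A single iterator over sam_rows: the inner loop resumes after the last
    # match, and the line counter is never reset.
    sam_iter = enumerate(sam_rows, start=1)
    for bed_no, bed_row in enumerate(bed_rows, start=1):
        bed_id = bed_row[9]
        for sam_no, sam_row in sam_iter:
            if sam_row[0] == bed_id:
                yield bed_no, sam_no, bed_row, sam_row
                break

# Toy data: the bed rows carry their ID in column 9, the sam rows in column 0.
bed = [[""] * 9 + ["r2"], [""] * 9 + ["r4"]]
sam = [["r1"], ["r2"], ["r3"], ["r4"], ["r5"]]
matches = [(b_no, s_no) for b_no, s_no, _, _ in join_sorted(bed, sam)]
print(matches)  # [(1, 2), (2, 4)]
```

If a bed ID is missing from the sam rows, this sketch consumes the rest of the sam iterator and finds no further matches, so real data would need a check for that case.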
