Why does this print the wrong result? - Python

I am new to Python. Today I wrote a program to get the max value pair from some data sets, but it didn't give me the right answer. The code is:
#!/usr/bin/python
import sys

maxsale = 0
oldKey = None

# Loop around the data
# It will be in the format key\tval
# Where key is the store name, val is the sale amount
#
# All the sales for a particular store will be presented,
# then the key will change and we'll be dealing with the next store

for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        # Something has gone wrong. Skip this line.
        continue

    thisKey, thisSale = data_mapped

    if oldKey and oldKey != thisKey:
        print oldKey, "\t", maxsale
        oldKey = thisKey;
        oldsale = 0

    oldKey = thisKey
    if maxsale < thisSale:
        maxsale = thisSale

if oldKey != None:
    print oldKey, "\t", maxsale
The data set is:
Anchorage 298.86
Anchorage 6.38
Aurora 34.1
Aurora 10.6
Aurora 55.7
Austin 327.75
Austin 379.6
Austin 469.63
Austin 11.6
The result is:
Anchorage 6.38
Aurora 34.1
Austin 469.63
Can anyone help me deal with this issue? Thank you in advance!

First, you are not converting the inputs to numbers, so the comparison is done on strings. This means that any "number" that starts with '6' is greater than any "number" that starts with '2', even for values like '6.38' and '298.86'.
thisKey, thisSale = data_mapped
thisSale = float(thisSale)
Next, you are setting oldsale to 0, but never referring to it. I think you meant to do maxsale = 0 there, to reset the value for a new store.
Lastly, you don't need oldKey = thisKey; in the if block, as you're doing that immediately afterward anyway.
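Putting those three fixes together, the loop might look like this (a minimal sketch of the corrected Python 2 reducer, based only on the fixes described above):

#!/usr/bin/python
import sys

maxsale = 0
oldKey = None

for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        continue

    thisKey, thisSale = data_mapped
    thisSale = float(thisSale)          # compare numbers, not strings

    if oldKey and oldKey != thisKey:
        print oldKey, "\t", maxsale
        maxsale = 0                     # reset the maximum for the new store

    oldKey = thisKey
    if maxsale < thisSale:
        maxsale = thisSale

if oldKey != None:
    print oldKey, "\t", maxsale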
Note that currency calculations work best when you convert the values to the smallest denomination of that currency and use integers, as floating-point calculations aren't always perfectly accurate and you may get rounding errors. It looks like your data aren't guaranteed to have trailing zeros, so you would have to check the string for a decimal point, split on the decimal point if it exists, and so on.
thisKey, thisSale = data_mapped
if '.' not in thisSale:
    thisSale = int(thisSale)*100
else:
    dollars, cents = thisSale.split('.')
    if len(cents) < 2:
        cents += '0'
    thisSale = int(dollars)*100 + int(cents)
Carry out financial calculations on the integer representing the number of cents, and then format values as dollars and cents when necessary for display purposes:
>>> '%.2f' % (29886/100.)
'298.86'
>>> '{:.02f}'.format(29886/100.)
'298.86'
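If you want to avoid floating point even at display time, divmod works too (an illustrative variant, not part of the original answer):

>>> dollars, cents = divmod(29886, 100)
>>> '{}.{:02d}'.format(dollars, cents)
'298.86'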

#!/usr/bin/python
import sys

maxsale = 0
oldKey = None

# Loop around the data
# It will be in the format key\tval
# Where key is the store name, val is the sale amount
#
# All the sales for a particular store will be presented,
# then the key will change and we'll be dealing with the next store

d = dict()
for line in sys.stdin:
    data_mapped = line.strip().split("\t")
    if len(data_mapped) != 2:
        # Something has gone wrong. Skip this line.
        continue
    key, value = data_mapped
    if (key in d) and d[key] < float(value):
        d[key] = float(value)
    elif not key in d:
        d[key] = float(value)

for k, v in d.items():
    print k, '\t', v
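For what it's worth, the per-store maximum can also be kept with a single dict.get call (a sketch of the same idea, not the answerer's exact code):

    key, value = data_mapped
    d[key] = max(d.get(key, float('-inf')), float(value))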

Related

Searchlight Inefficient Compression Scheme

Write a compression and decompression algorithm for SICS which works as follows: find the most popular characters and, in order of popularity, assign them a hex value 0 to E, F0 to FE, FF0 to FFE, etc. Note that an F indicates there are more nibbles to follow; anything else is the terminal nibble.
Compress the message by replacing characters with their assigned values. Below is the sample text, but the code should work universally for any given text.
Test case:
text = "Marley was dead: to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it: and Scrooge’s name was good upon ’Change, for anything he chose to put his hand to. Old Marley was as dead as a door-nail. Mind! I don’t mean to say that I know, of my own knowledge, what there is particularly dead about a door-nail. I might have been inclined, myself, to regard a coffin-nail as the deadest piece of ironmongery in the trade. But the wisdom of our ancestors is in the simile; and my unhallowed hands shall not disturb it, or the Country’s done for. You will therefore permit me to repeat, emphatically, that Marley was as dead as a door-nail."
solution = "f826b1d0e2a08128fd0340f61f0750e739f50fe916107a054084cf630e9231ff01602f64c303923f5 0fe91061f07a31604f3097a0f6c672b0e2a0a7f05180f6d03910f2b16f0df125f403910f2b16f9f4039 10c581632f916f4025803910f2971f30f14c6516f50ff1f2644f010a7f0518073fd02580ff1f2644f01fa a052f110e2a0f04480cf7450faff2925f01f40f346025d3975f00910f294a10340f7c3097a09258034f 50ff3b80f826b1d0e2a02a0812802a0208446fb527bf50f8758ff40fc0845fa30f11250340a2d03923 0fc0f954ef404f30f1d04e50f954eb18f01f40e92303916107a0f72637f2cb26bd0812802f64c30208 446fb527bf50fc0f17f093092ff010f6115075f2b7518f40f1da1bf3f4034061f0268020f24f3f375fb52 7b02a0391081281a30f771f2104f307645f145f016d0750391036281f50ff5c303910e7a84f104f30 4c6025f21a346a07a07503910a7f17b1ff602580f1d0c592bb4e1809258a0a92bb0543087a3c6f6 073f404603910ff24c536dfaa084510f346f50ff74c0e7bb039161f34610f716f1730f11034061f7123 f401f1f79237f22bbdf4039230f826b1d0e2a02a0812802a0208446fb527bf5
I am adding some code that can help solve your problem. It will ask for input and you can modify it as per your need.
from collections import Counter

key = []

def initCompute(data):  # find the unique chars and their occurrence counts, to rank chars by popularity
    uChar = list(set(data))
    xCount = dict(Counter(data))
    xCount = dict(sorted(xCount.items(), key=lambda item: item[1], reverse=True))
    return len(uChar), xCount

def compute(data, keyList, dataDict):  # assign a hex value (key) to each character
    j = 0
    comDict = {}
    decdict = {}
    sol = ""
    for k in dataDict:
        comDict[k] = [keyList[j], dataDict[k]]
        decdict[keyList[j]] = k
        j += 1
    for c in data:
        sol += comDict[c][0]
    return sol, decdict

def decompression(keyDict, cData):
    # reconstruct the original text from the dict of hex-value-to-char
    # assignments and the compressed data
    sol = ""
    fNib = ""
    for s in cData:
        if s == 'f':
            fNib += s
        else:
            fNib += s
            sol += keyDict[fNib]
            fNib = ""
    # print(sol)
    return sol

def compression():  # build the keys (hex values) and produce the compressed data
    i = 0
    fac = 16
    pwr = 1
    keyLen, dDict = initCompute(text)
    while len(key) <= keyLen:
        if i <= (fac - 2):
            key.append(str(hex(i))[2:])  # compute the hex value and store it in key
            i += 1
        else:
            pwr += 1
            fac = pow(16, pwr)
            i = fac - 16
    sol, newDict = compute(text, key, dDict)
    print("Assigned hex values for each character: ")
    print(newDict)
    return sol, newDict, keyLen

if __name__ == '__main__':
    text = input("Input your Data here: ")  # input
    compressed_data, new_dict, key_len = compression()
    print("Compressed data: ", compressed_data)
    print(compressed_data)
    print("Decompressed data:")
    print(decompression(new_dict, compressed_data))
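If you prefer to exercise the functions without the input() prompt, something like this should work when pasted into an interactive session after the functions above (a sketch; the sample string is made up, and note that compression() reads the module-level text and key names, so they must be set first):

key = []                                      # reset the module-level key list
text = "door-nail dead"                       # any sample string (hypothetical)
compressed_data, mapping, _ = compression()   # compress with popularity-ranked hex codes
assert decompression(mapping, compressed_data) == text   # round-trips back to the original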

Want to optimize my code for finding overlapping times in a large number of records (pandas)

I have a data table of 100000 records with 50 columns. Each record has a start time, an end time, and an equipment key for the node it belongs to. Records are stored when nodes go down: the start time is when the node goes down, and the end time is when the node comes back up. If there are multiple records with the same equipment key whose start and end times fall inside a previous record's start and end times, we say the new record has an overlapping time and we need to ignore it. To find these overlapping records I have written a function and applied it to a dataframe, but it is taking a long time. I am not very experienced with optimization, so I am seeking suggestions.
sitecode_info = []

def check_overlapping_sitecode(it):
    sitecode = it['equipmentkey']
    fo = it['firstoccurrence']
    ct = it['cleartimestamp']
    if len(sitecode_info) == 0:
        sitecode_info.append({
            'sc': sitecode,
            'fo': fo,
            'ct': ct
        })
        return 0
    else:
        for list_item in sitecode_info:
            for item in list_item.keys():
                if item == 'sc':
                    if list_item[item] == sitecode:
                        # print("matched")
                        if fo >= list_item['fo'] and ct <= list_item['ct'] or \
                                fo >= list_item['fo'] and fo <= list_item['ct'] and ct >= list_item['ct'] or \
                                fo <= list_item['fo'] and ct >= list_item['ct'] or \
                                fo <= list_item['fo'] and ct >= list_item['fo'] and ct <= list_item['ct']:
                            return 1
                        else:
                            sitecode_info.append({
                                'sc': sitecode,
                                'fo': fo,
                                'ct': ct
                            })
                            return 0
                else:
                    sitecode_info.append({
                        'sc': sitecode,
                        'fo': fo,
                        'ct': ct
                    })
                    return 0
I am calling it as follows:
temp_df['false_alarms'] = temp_df.apply(check_overlapping_sitecode, axis=1)
I think you were just iterating over that list of dictionaries a touch too much.
**EDIT:** Also append the fo and ct values even when the method returns 1, for better accuracy.
import time

'''
setting an empty dictionary.
this will look like: {sc1: [[fo, ct], [fo, ct]],
                      sc2: [[fo, ct], [fo, ct]]}
the keys are just the site_code,
this way we don't have to iterate over all of the fo's and ct's, just the ones related to that site code.
'''
sitecode_info = {}

# i set up a dataframe with 200000 rows x 50 columns

def check_overlapping_sitecode(site_code, fo, ct):
    try:
        # try to grab the existing site_code information from the sitecode_info dict.
        # if that fails, go ahead and create it while also returning 0 for that site_code.
        my_list = sitecode_info[site_code]
        # if it works, go through that site's list.
        for fo_old, ct_old in my_list:
            # if the first occurrence is >= the old first occurrence and <= the old cleartimestamp
            if fo >= fo_old and fo <= ct_old:
                sitecode_info[site_code].append([fo, ct])
                return 1
            # same check, but for the cleartimestamp instead
            elif ct <= ct_old and ct >= fo_old:
                sitecode_info[site_code].append([fo, ct])
                return 1
        # if it doesn't overlap at all, record the interval and return 0
        sitecode_info[site_code].append([fo, ct])
        return 0
    except KeyError:
        # first time we see this site_code: start its list of intervals
        sitecode_info[site_code] = [[fo, ct]]
        return 0

t = time.time()

"""Here's the real meat and potatoes:
using a lambda function to call the method "check_overlapping_sitecode",
where x is the row, and return its output.
"""
temp_df['false_alarms'] = temp_df.apply(lambda x: check_overlapping_sitecode(x['equipmentkey'], x['firstoccurrence'], x['cleartimestamp']), axis=1)
print(time.time() - t)
# this code runs in nearly 6 seconds for me.
# then you can do whatever you want with your DF.
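If the apply call itself ever becomes the bottleneck, the containment check from the question ("start and end inside a previous record's window for the same equipment") can also be expressed without a Python-level loop. This is only a sketch under that reading of the problem (the function name is made up; the column names come from the question), and it deliberately ignores the partial-overlap cases the original condition also catches:

import pandas as pd

def flag_contained(df):
    # earlier starts first; for equal starts, longer intervals first
    df = df.sort_values(['equipmentkey', 'firstoccurrence', 'cleartimestamp'],
                        ascending=[True, True, False])
    # largest cleartimestamp seen in earlier rows of the same equipment group
    prev_max_ct = (df.groupby('equipmentkey')['cleartimestamp']
                     .transform(lambda s: s.shift().cummax()))
    # after the sort, every earlier row already starts no later than this one,
    # so containment only requires the end to be <= some earlier end
    df['false_alarms'] = (df['cleartimestamp'] <= prev_max_ct).astype(int)
    return df.sort_index()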

Python Greedy Algorithm

I am writing a greedy algorithm (Python 3.x) for a 'jewel heist'. Given a series of jewels and values, the program grabs the most valuable jewel that it can fit in its bag without going over the bag weight limit. I've got three test cases here, and it works perfectly for two of them.
Each test case is written in the same way: the first line is the bag weight limit, and all following lines are tuples (weight, value).
Sample Case 1 (works):
10
3 4
2 3
1 1
Sample Case 2 (doesn't work):
575
125 3000
50 100
500 6000
25 30
Code:
def take_input(infile):
    f_open = open(infile, 'r')
    lines = []
    for line in f_open:
        lines.append(line.strip())
    f_open.close()
    return lines

def set_weight(weight):
    bag_weight = weight
    return bag_weight

def jewel_list(lines):
    jewels = []
    for item in lines:
        jewels.append(item.split())
    jewels = sorted(jewels, reverse=True)
    jewel_dict = {}
    for item in jewels:
        jewel_dict[item[1]] = item[0]
    return jewel_dict

def greedy_grab(weight_max, jewels):
    # first, we get a list of values
    values = []
    weights = []
    for keys in jewels:
        weights.append(jewels[keys])
    for item in jewels.keys():
        values.append(item)
    values = sorted(values, reverse=True)
    # then, we start working
    max = int(weight_max)
    running = 0
    i = 0
    grabbed_list = []
    string = ''
    total_haul = 0
    # pick the most valuable item first. Pick as many of them as you can.
    # Then, the next, all the way through.
    while running < max:
        next_add = int(jewels[values[i]])
        if (running + next_add) > max:
            i += 1
        else:
            running += next_add
            grabbed_list.append(values[i])
    for item in grabbed_list:
        total_haul += int(item)
    string = "The greedy approach would steal $" + str(total_haul) + " of jewels."
    return string

infile = "JT_test2.txt"
lines = take_input(infile)
# set the bag weight with the first line from the input
bag_max = set_weight(lines[0])
# once we set bag weight, we don't need it anymore
lines.pop(0)
# generate a list of jewels in a dictionary by weight, value
value_list = jewel_list(lines)
# run the greedy approach
print(greedy_grab(bag_max, value_list))
Does anyone have any clues why it wouldn't work for case 2? Your help is greatly appreciated.
EDIT: The expected outcome for case 2 is $6130. I seem to get $6090.
Your dictionary keys are strings, not integers, so they are sorted as strings when you try to sort them. So you get:
['6000', '3000', '30', '100']
instead wanted:
['6000', '3000', '100', '30']
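You can see the difference interactively (illustrative only):

>>> sorted(['6000', '3000', '30', '100'], reverse=True)
['6000', '3000', '30', '100']
>>> sorted([6000, 3000, 30, 100], reverse=True)
[6000, 3000, 100, 30]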
Change this function so that it uses integer keys:
def jewel_list(lines):
    jewels = []
    for item in lines:
        jewels.append(item.split())
    jewels = sorted(jewels, reverse=True)
    jewel_dict = {}
    for item in jewels:
        jewel_dict[int(item[1])] = item[0]  # changed line
    return jewel_dict
When you change this it will give you:
The greedy approach would steal $6130 of jewels.
In [237]: %paste
def greedy(infilepath):
    with open(infilepath) as infile:
        capacity = int(infile.readline().strip())
        items = [map(int, line.strip().split()) for line in infile]
    bag = []
    items.sort(key=operator.itemgetter(0))
    while capacity and items:
        if items[-1][0] <= capacity:
            bag.append(items[-1])
            capacity -= items[-1][0]
        items.pop()
    return bag
## -- End pasted text --

In [238]: sum(map(operator.itemgetter(1), greedy("JT_test1.txt")))
Out[238]: 8

In [239]: sum(map(operator.itemgetter(1), greedy("JT_test2.txt")))
Out[239]: 6130
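One caveat: the session above looks like Python 2 (and assumes import operator has already been run). Under Python 3, map() returns an iterator, so each row needs to be materialized, e.g.:

        items = [list(map(int, line.strip().split())) for line in infile]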
I think in this piece of code, i has to be incremented on the else side too:
while running < max:
    next_add = int(jewels[values[i]])
    if (running + next_add) > max:
        i += 1
    else:
        running += next_add
        grabbed_list.append(values[i])
        i += 1  # here
This and @iblazevic's answer explain why it behaves this way.

Python: keep top Nth results for csv.reader

I am doing some filtering on a csv file where, for every title, there are many duplicate IDs with different prediction values, so column 2 (pythoniac) is different. I would like to keep only the 30 lowest values, but with unique IDs. I came up with this code, but I don't know how to keep the lowest 30 entries.
Can you please help with suggestions on how to obtain 30 unique-by-ID entries?
# title1 id1 100 7.78E-25    # example of the line
import csv

with open("test.txt") as fi:
    cmp = {}
    for R in csv.reader(fi, delimiter='\t'):
        for L in ligands:
            newR = R[0], R[1]
            if R[0] == L:
                if (int(R[2]) <= int(1000) and int(R[2]) != int(0) and float(R[3]) < float("1.0e-10")):
                    if newR in cmp:
                        if float(cmp[newR][3]) > float(R[3]):
                            cmp[newR] = R[:-2]
                    else:
                        cmp[newR] = R[:-2]
Maybe try something along this line...
from bisect import insort

nth_lowest = [very_high_value] * 30

for x in my_loop:
    do_stuff()
    ...
    if x < nth_lowest[-1]:
        insort(nth_lowest, x)
        nth_lowest.pop()  # remove the highest element
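A more concrete variant of the same idea uses heapq.nsmallest after deduplicating on ID. This is only a sketch: it assumes the columns are title, id, count, prediction as in the sample line, and it keeps the best prediction per (title, id) pair before taking the 30 lowest:

import csv
import heapq

best = {}  # (title, id) -> row with the lowest prediction value seen so far
with open("test.txt") as fi:
    for R in csv.reader(fi, delimiter='\t'):
        k = (R[0], R[1])
        if k not in best or float(R[3]) < float(best[k][3]):
            best[k] = R

# 30 unique entries with the lowest prediction values
lowest30 = heapq.nsmallest(30, best.values(), key=lambda r: float(r[3]))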

Sorting and counting matching fields from a model django

I have a class of the form:
class data(models.Model):
    person = models.ForeignKey("people.person")
    place = models.ForeignKey("places.place")
and I'm trying to create a dict whose keys are pairs of places that share the same person and whose values are how many connections they have, i.e.:
dict = {(place1, place2): count}
so the dict might look like this:
dict = {(place1, place2): 3, (place1, place3): 2, etc.}
so far I have:
dict = {}
datas = data.objects.all()
for data1 in datas:
    for data2 in datas:
        # if dict is empty
        if not dict and data1.person == data2.person and data1.place != data2.place:
            dict[(data1.place, data2.place)] = 1
        elif data1.person == data2.person and data1.place != data2.place:
            for assoc, count in dict.items():
                if assoc == (data1.place, data2.place) or assoc == (data2.place, data1.place):
                    count += 1
                else:
                    dict[(data1.place, data2.place)] = 1
        else:
            dict[(data1.place, data2.place)] = 1
This currently returns completely erroneous relations and never increments count. What am I doing wrong?
Do not use built-in names like dict for your variables. Your problem is that you try to increase the local count variable when you have to increase dict[key] instead, e.g. dict[key] += 1:
dct = {}
datas = data.objects.all()
for data1 in datas:
    for data2 in datas:
        # if dict is empty
        if not dct and data1.person == data2.person and data1.place != data2.place:
            dct[(data1.place, data2.place)] = 1
        elif data1.person == data2.person and data1.place != data2.place:
            if (data1.place, data2.place) in dct:
                dct[(data1.place, data2.place)] += 1
            elif (data2.place, data1.place) in dct:
                dct[(data2.place, data1.place)] += 1
            else:
                dct[(data1.place, data2.place)] = 1
        else:
            dct[(data1.place, data2.place)] = 1
Use annotations. I don't have your model layout, so this is an approximation of the logic. You'll need to tweak it to map to the correct fields based on your implementation:
from django.db.models import Count
places = Place.objects.filter(people=thisguy).annotate(connections=Count('people'))
Then you can get the connections count via an attribute on each place:
places[0].connections
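For the pair-counting version from the question, the double loop can also be written with itertools and collections. This is only a sketch; it assumes each data row exposes .person and .place as in the question and counts unordered place pairs per shared person:

from collections import Counter, defaultdict
from itertools import combinations

places_by_person = defaultdict(set)
for row in data.objects.all():
    places_by_person[row.person].add(row.place)

pair_counts = Counter()
for places in places_by_person.values():
    # count every unordered pair of distinct places linked to the same person,
    # ordering each pair by primary key so (a, b) and (b, a) collapse together
    for a, b in combinations(sorted(places, key=lambda p: p.pk), 2):
        pair_counts[(a, b)] += 1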
