Unable to insert floats as range_keys in DynamoDB with Boto

Unable to insert floats as range_keys in DynamoDB with Boto - python

When I attempt to insert a range_key that contains a number of more than 2 decimal places, the number stored in the database is truncated to the first 2 decimals.
How do I get around this?
max_number = 1000000.0
random_time = random.randrange(1, max_number-1) / max_number
range_key = int(time.time()) + random_time
data['item_id'] = '12345'
result = db.add(table='media', key=group_id,
range_key = range_key,
data=data)
The resulting range_key of "1347053744.819199" gets inserted as "1347053744.82"

UPDATE:
This is actually a bug in boto, see https://github.com/boto/boto/pull/890#issuecomment-8456495 for tracking
ORIGINAL ANSWER:
Why are you using add to store a new item ? add is the function for atomic increments. You should use put_item instead.
I do not know why your float is rounded but anyway, you really should not try it. It is a bad practice to use float as keys. It is impossible to reliably check floats equality because they are inherently approximation. see http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.html for in depth insight on floats.
Nonetheless, if you really need floats as keys, you need to fully control there representation so as not to be dependent of the server-side implementation when de-serializing the JSON of the request. You need to replace it by a regular string.

Related

Are the values 161137531201111100, 1.611375312011111e+17 equal?

I am trying to manipulate a dataframe. The value of in a list which I use to append a column to the dataframe is 161137531201111100. However, I created a dictionary whose keys are the unique values of this column, and I use this dictionary in further operations. This could used to run perfectly before.
However, after trying this code on another data I had the following error:
KeyError: 1.611375312011111e+17
which means that this value is not the of the dictionary; I tried to trace the code, everything seemed to be okay. However, when I opened the csv file of the dataframe I built I found out that the value that is causing the problem is: 161137531201111000 which is not in the list(and ofc not a key in the dictionary) I used to create this column of dataframe. This seems weird. However, I don't know what is the reason? Is there any reason that a number is saved in another way?
And how can I save it as it is in all phases? Also, why did it change in the csv?

No unfortunately, they are not equal
print(1.611375312011111e+17 == 161137531201111000)` # False.
The problem lies in the way floating numbers are handled by computers, in general, and most programming languages, including Python.
Always use integers (and not "too large") when doing computations if you want exact results.
See Is floating point math broken? for generic explanation that you definitely must know as a programmer, even if it's not specific to Python.
(and be aware that Python tries to do a rather good job at keeping precision on integers, that unfortunately won't work on floating-point numbers).
And just for the sake of "fun" with floating point numbers, 1.611375312011111e+17 is actually equal to the integer 161137531201111104!
print(format (1.611375312011111e+17, ".60g")) # shows 161137531201111104
print(1.611375312011111e+17 == 161137531201111104) # True
a = dict()
a[1.611375312011111e+17] = "hello"
#print(a[161137531201111100]) # Key error, as in question
print(a[161137531201111104]) # This one shows "hello" properly!

Convert two raw values to 32-bit IEEE floating point number

I am attempting to decode some data from a Shark 100 Power Meter via TCP modbus. I have successfully pulled down the registers that I need, and am left with two raw values from the registers like so:
[17138, 59381]
From the manual, I know that I need to convert these two numbers into a 32bit IEEE floating-point number. I also know from the manual that "The lower-addressed register is the
high order half (i.e., contains the exponent)." The first number in the list shown above is the lower-addressed register.
Using Python (any library will do if needed), how would I take these two values and make them into a 32 bit IEEE floating point value.
I have tried to use various online converters and calculators to figure out a non-programmatic way to do this, however, anything I have tried gets me a result that is way out of bounds (I am reading volts in this case so the end result should be around 120-122 from the supplied values above).

Update for Python 3.6+ (f-strings).
I am not sure why the fill in #B.Go's answer was only 2. Also, since the byte order was big-endian, I hardcoded it as such.
import struct
a = 17138
b = 59381
struct.unpack('>f', bytes.fromhex(f"{a:0>4x}" + f"{b:0>4x}"))[0]
Output: 121.45304107666016

The following code works:
import struct
a=17138
b=59381
struct.unpack('!f', bytes.fromhex('{0:02x}'.format(a) + '{0:02x}'.format(b)))
It gives
(121.45304107666016,)
Adapted from Convert hex to float and Integer to Hexadecimal Conversion in Python

I read in the comments, and #Sanju had posted this link: https://github.com/riptideio/pymodbus/blob/master/examples/common/modbus_payload.py
For anyone using pymodbus, the BinaryPayloadDecoder is useful as it's built in. It's very easy to pass a result.registers, as shown in the example. Also, it has a logging integrated, so you can help debug why a conversion isn't working (ex: wrong endianness).
As such, I made a working example for this question (using pymodbus==2.3.0):
from pymodbus.constants import Endian
from pymodbus.payload import BinaryPayloadDecoder
a = 17138
b = 59381
registers = [a, b]
decoder = BinaryPayloadDecoder.fromRegisters(registers, byteorder=Endian.Big)
decoder.decode_32bit_float() # type: float
Output: 121.45304107666016

How to impliment a binary search on a list created from a file

This is my first post, please be gentle. I'm attempting to sort some
files into ascending and descending order. Once I have sorted a file, I am storing it in a list which is assigned to a variable. The user is then to choose a file and search for an item. I get an error message....
TypeError: unorderable types; int() < list()
.....when ever I try to search for an item using the variable of my sorted list, the error occurs on line 27 of my code. From research, I know that an int and list cannot be compared, but I cant for the life of me think how else to search a large (600) list for an item.
At the moment I'm just playing around with binary search to get used to it.
Any suggestions would be appreciated.
year = []
with open("Year_1.txt") as file:
for line in file:
line = line.strip()
year.append(line)
def selectionSort(alist):
for fillslot in range(len(alist)-1,0,-1):
positionOfMax=0
for location in range(1,fillslot+1):
if alist[location]>alist[positionOfMax]:
positionOfMax = location
temp = alist[fillslot]
alist[fillslot] = alist[positionOfMax]
alist[positionOfMax] = temp
def binarySearch(alist, item):
first = 0
last = len(alist)-1
found = False
while first<=last and not found:
midpoint = (first + last)//2
if alist[midpoint] == item:
found = True
else:
if item < alist[midpoint]:
last = midpoint-1
else:
first = midpoint+1
return found
selectionSort(year)
testlist = []
testlist.append(year)
print(binarySearch(testlist, 2014))
Year_1.txt file consists of 600 items, all years in the format of 2016.
They are listed in descending order and start at 2017, down to 2013. Hope that makes sense.

Is there some reason you're not using the Python: bisect module?
Something like:
import bisect
sorted_year = list()
for each in year:
bisect.insort(sorted_year, each)
... is sufficient to create the sorted list. Then you can search it using functions such as those in the documentation.
(Actually you could just use year.sort() to sort the list in-place ... bisect.insort() might be marginally more efficient for building the list from the input stream in lieu of your call to year.append() ... but my point about using the `bisect module remains).
Also note that 600 items is trivial for modern computing platforms. Even 6,000 won't take but a few milliseconds. On my laptop sorting 600,000 random integers takes about 180ms and similar sized strings still takes under 200ms.
So you're probably not gaining anything by sorting this list in this application at that data scale.
On the other hand Python also includes a number of modules in its standard libraries for managing structured data and data files. For example you could use Python: SQLite3.
Using this you'd use standard SQL DDL (data definition language) to describe your data structure and schema, SQL DML (data manipulation language: INSERT, UPDATE, and DELETE statements) to manage the contents of the data and SQL queries to fetch data from it. Your data can be returned sorted on any column and any mixture of ascending and descending on any number of columns with the standard SQL ORDER BY clauses and you can add indexes to your schema to ensure that the data is stored in a manner to enable efficient querying and traversal (table scans) in any order on any key(s) you choose.
Because Python includes SQLite in its standard libraries, and because SQLite provides SQL client/server semantics over simple local files ... there's almost no downside to using it for structured data. It's not like you have to install and maintain additional software, servers, handle network connections to a remote database server nor any of that.

I'm going to walk through some steps before getting to the answer.
You need to post a [mcve]. Instead of telling us to read from "Year1.txt", which we don't have, you need to put the list itself in the code. Do you NEED 600 entries to get the error in your code? No. This is sufficient:
year = ["2001", "2002", "2003"]
If you really need 600 entries, then provide them. Either post the actual data, or
year = [str(x) for x in range(2017-600, 2017)]
The code you post needs to be Cut, Paste, Boom - reproduces the error on my computer just like that.
selectionSort is completely irrelevant to the question, so delete it from the question entirely. In fact, since you say the input was already sorted, I'm not sure what selectionSort is actually supposed to do in your code, either. :)
Next you say testlist = [].append(year). USE YOUR DEBUGGER before you ask here. Simply looking at the value in your variable would have made a problem obvious.
How to append list to second list (concatenate lists)
Fixing that means you now have a list of things to search. Before you were searching a list to see if 2014 matched the one thing in there, which was a complete list of all the years.
Now we get into binarySearch. If you look at the variables, you see you are comparing the integer 2014 with some string, maybe "1716", and the answer to that is useless, if it even lets you do that (I have python 2.7 so I am not sure exactly what you get there). But the point is you can't find the integer 2014 in a list of strings, so it will always return False.
If you don't have a debugger, then you can place strategic print statements like
print ("debug info: binarySearch comparing ", item, alist[midpoint])
Now here, what VBB said in comments worked for me, after I fixed the other problems. If you are searching for something that isn't even in the list, and expecting True, that's wrong. Searching for "2014" returns True, if you provide the correct list to search. Alternatively, you could force it to string and then search for it. You could force all the years to int during the input phase. But the int 2014 is not the same as the string "2014".

C data structures

Is there a C data structure equatable to the following python structure?
data = {'X': 1, 'Y': 2}
Basically I want a structure where I can give it an pre-defined string and have it come out with an integer.

The data-structure you are looking for is called a "hash table" (or "hash map"). You can find the source code for one here.
A hash table is a mutable mapping of an integer (usually derived from a string) to another value, just like the dict from Python, which your sample code instantiates.
It's called a "hash table" because it performs a hash function on the string to return an integer result, and then directly uses that integer to point to the address of your desired data.
This system makes it extremely extremely quick to access and change your information, even if you have tons of it. It also means that the data is unordered because a hash function returns a uniformly random result and puts your data unpredictable all over the map (in a perfect world).

Also note that if you're doing a quick one-off hash, like a two or three static hash for some lookup: look at gperf, which generates a perfect hash function and generates simple code for that hash.

The above data structure is a dict type.
In C/C++ paralance, a hashmap should be equivalent, Google for hashmap implementation.

There's nothing built into the language or standard library itself but, depending on your requirements, there are a number of ways to do it.
If the data set will remain relatively small, the easiest solution is to probably just have an array of structures along the lines of:
typedef struct {
char *key;
int val;
} tElement;
then use a sequential search to look them up. Have functions which insert keys, delete keys and look up keys so that, if you need to change it in future, the API itself won't change. Pseudo-code:
def init:
create g.key[100] as string
create g.val[100] as integer
set g.size to 0
def add (key,val):
if lookup(key) != not_found:
return already_exists
if g.size == 100:
return no_space
g.key[g.size] = key
g.val[g.size] = val
g.size = g.size + 1
return okay
def del (key):
pos = lookup (key)
if pos == not_found:
return no_such_key
if pos < g.size - 1:
g.key[pos] = g.key[g.size-1]
g.val[pos] = g.val[g.size-1]
g.size = g.size - 1
def find (key):
for pos goes from 0 to g.size-1:
if g.key[pos] == key:
return pos
return not_found
Insertion means ensuring it doesn't already exist then just tacking an element on to the end (you'll maintain a separate size variable for the structure). Deletion means finding the element then simply overwriting it with the last used element and decrementing the size variable.
Now this isn't the most efficient method in the world but you need to keep in mind that it usually only makes a difference as your dataset gets much larger. The difference between a binary tree or hash and a sequential search is irrelevant for, say, 20 entries. I've even used bubble sort for small data sets where a more efficient one wasn't available. That's because it massively quick to code up and the performance is irrelevant.
Stepping up from there, you can remove the fixed upper size by using a linked list. The search is still relatively inefficient since you're doing it sequentially but the same caveats apply as for the array solution above. The cost of removing the upper bound is a slight penalty for insertion and deletion.
If you want a little more performance and a non-fixed upper limit, you can use a binary tree to store the elements. This gets rid of the sequential search when looking for keys and is suited to somewhat larger data sets.
If you don't know how big your data set will be getting, I would consider this the absolute minimum.
A hash is probably the next step up from there. This performs a function on the string to get a bucket number (usually treated as an array index of some sort). This is O(1) lookup but the aim is to have a hash function that only allocates one item per bucket, so that no further processing is required to get the value.
A degenerate case of "all items in the same bucket" is no different to an array or linked list.
For maximum performance, and assuming the keys are fixed and known in advance, you can actually create your own hashing function based on the keys themselves.
Knowing the keys up front, you have extra information that allows you to fully optimise a hashing function to generate the actual value so you don't even involve buckets - the value generated by the hashing function can be the desired value itself rather than a bucket to get the value from.
I had to put one of these together recently for converting textual months ("January", etc) in to month numbers. You can see the process here.
I mention this possibility because of your "pre-defined string" comment. If your keys are limited to "X" and "Y" (as in your example) and you're using a character set with contiguous {W,X,Y} characters (which even covers EBCDIC as well as ASCII though not necessarily every esoteric character set allowed by ISO), the simplest hashing function would be:
char *s = "X";
int val = *s - 'W';
Note that this doesn't work well if you feed it bad data. These are ideal for when the data is known to be restricted to certain values. The cost of checking data can often swamp the saving given by a pre-optimised hash function like this.

C doesn't have any collection classes. C++ has std::map.
You might try searching for C implementations of maps, e.g. http://elliottback.com/wp/hashmap-implementation-in-c/

A 'trie' or a 'hasmap' should do. The simplest implementation is an array of struct { char *s; int i }; pairs.
Check out 'trie' in 'include/nscript.h' and 'src/trie.c' here: http://github.com/nikki93/nscript . Change the 'trie_info' type to 'int'.

Try a Trie for strings, or a Tree of some sort for integer/pointer types (or anything that can be compared as "less than" or "greater than" another key). Wikipedia has reasonably good articles on both, and they can be implemented in C.

Interpreting Excel Currency Values

I am using python to read a currency value from excel. The returned from the range.Value method is a tuple that I don't know how to parse.
For example, the cell appears as $548,982, but in python the value is returned as (1, 1194857614).
How can I get the numerical amount from excel or how can I convert this tuple value into the numerical value?
Thanks!

Try this:
import struct
try: import decimal
except ImportError:
divisor= 10000.0
else:
divisor= decimal.Decimal(10000)
def xl_money(i1, i2):
byte8= struct.unpack(">q", struct.pack(">ii", i1, i2))[0]
return byte8 / divisor
>>> xl_money(1, 1194857614)
Decimal("548982.491")
Money in Microsoft COM is an 8-byte integer; it's fixed point, with 4 decimal places (i.e. 1 is represented by 10000). What my function does, is take the tuple of 4-byte integers, make an 8-byte integer using struct to avoid any issues of sign, and then dividing by the constant 10000. The function uses decimal.Decimal if available, otherwise it uses float.
UPDATE (based on comment): So far, it's only COM Currency values being returned as a two-integer tuple, so you might want to check for that, but there are no guarantees that this will always be successful. However, depending on the library you use and its version, it's quite possible that later on, after some upgrade, you will be receiving decimal.Decimals and not two-integer tuples anymore.

I tried this with Excel 2007 and VBA. It is giving correct value.
1) Try pasting this value in a new excel workbook
2) Press Alt + F11. Gets you to VBA Editor.
3) Press Ctrl + G. Gets you to immediate window.
4) In the immediate window, type ?cells("a1").Value
here "a1" is the cell where you have pasted the value.
I am doubting that the cell has some value or character due to which it is interpreted this way.
Post your observations here.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.