Is python uuid1 sequential as timestamps? - python

Python docs states that uuid1 uses current time to form the uuid value. But I could not find a reference that ensures UUID1 is sequential.
>>> import uuid
>>> u1 = uuid.uuid1()
>>> u2 = uuid.uuid1()
>>> u1 < u2
True
>>>

But not always:
>>> def test(n):
... old = uuid.uuid1()
... print old
... for x in range(n):
... new = uuid.uuid1()
... if old >= new:
... print "OOops"
... break
... old = new
... print new
>>> test(1000000)
fd4ae687-3619-11e1-8801-c82a1450e52f
OOops
00000035-361a-11e1-bc9f-c82a1450e52f

UUIDs Not Sequential
No, standard UUIDs are not meant to be sequential.
Apparently some attempts were made with GUIDs (Microsoft's twist on UUIDs) to make them sequential to help with performance in certain database scenarios. But being sequential is not the intent of UUIDs.
http://en.wikipedia.org/wiki/Globally_unique_identifier
MAC Is Last, Not First
No, in standard UUIDs, the MAC address is not the first component. The MAC address is the last component in a Version 1 UUID.
http://en.wikipedia.org/wiki/Universally_unique_identifier
Do Not Assume Which Type Of UUID
The various versions of UUIDs are meant to be compatible with each other. So it may be unreasonable to expect that you always have Version 1 UUIDs. Other programmers may use other versions.
Specification
Read the UUID spec, RFC 4122, by the IETF. Only a dozen pages long.

From the python UUID docs:
Generate a UUID from a host ID, sequence number, and the current time. If node is not given, getnode() is used to obtain the hardware address. If clock_seq is given, it is used as the sequence number; otherwise a random 14-bit sequence number is chosen.
From this, I infer that the MAC address is first, then a (possibly random) sequence number, then the current time. So I would not expect these to be guaranteed to be monotonically increasing, even for UUIDs generated by the same machine/process.

I stumbled upon a probable answer in Cassandra/Python from http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/
Lexicographic TimeUUID ordering
Cassandra provides, among all the primitive types, support for UUID values of type 1 (time and server based) and type 4 (random).
The primary use of UUID (Unique Universal IDentifier) is to obtain a really unique identifier in a potentially distributed environment.
Cassandra does support version 1 UUID. It gives you an unique identifier by combining the computer’s MAC address and the number of 100-nanosecond intervals since the beginning of the Gregorian calendar.
As you can see the precision is only 100 nanoseconds, but fortunately it is mixed with a clock sequence to add randomness. Furthermore the MAC address is also used to compute the UUID so it’s very unlikely that you face collision on one cluster of machine, unless you need to process a really really huge volume of data (don’t forget, not everyone is Twitter or Facebook).
One of the most relevant use case for UUID, and espcecially TimeUUID, is to use it as column key. Since Cassandra column keys are sorted, we can take advantage of this feature to have a natural ordering for our column families.
The problem with the default com.eaio.uuid.UUID provided by the Hector client is that it’s not easy to work with. As an ID you may need to bring this value from the server up to the view layer, and that’s the gotcha.
Basically, com.eaio.uuid.UUID overrides the toString() to gives a String representation of the UUID. However this String formatting cannot be sorted lexicographically…
Below are some TimeUUID generated consecutively:
8e4cab00-c481-11e1-983b-20cf309ff6dc at some t1
2b6e3160-c482-11e1-addf-20cf309ff6dc at some t2 with t2 > t1
“2b6e3160-c482-11e1-addf-20cf309ff6dc”.compareTo(“8e4cab00-c481-11e1-983b-20cf309ff6dc”) gives -6 meaning that “2b6e3160-c482-11e1-addf-20cf309ff6dc” is less/before “8e4cab00-c481-11e1-983b-20cf309ff6dc” which is incorrect.
The current textual display of TimeUUID is split as follow:
time_low – time_mid – time_high_and_version – variant_and_sequence – node
If we re-order it starting with time_high_and_version, we can then sort it lexicographically:
time_high_and_version – time_mid – time_low – variant_and_sequence – node
The utility class is given below:
public static String reorderTimeUUId(String originalTimeUUID)
{
StringTokenizer tokens = new StringTokenizer(originalTimeUUID, "-");
if (tokens.countTokens() == 5)
{
String time_low = tokens.nextToken();
String time_mid = tokens.nextToken();
String time_high_and_version = tokens.nextToken();
String variant_and_sequence = tokens.nextToken();
String node = tokens.nextToken();
return time_high_and_version + '-' + time_mid + '-' + time_low + '-' + variant_and_sequence + '-' + node;
}
return originalTimeUUID;
}
The TimeUUIDs become:
11e1-c481-8e4cab00-983b-20cf309ff6dc
11e1-c482-2b6e3160-addf-20cf309ff6dc
Now we get:
"11e1-c481-8e4cab00-983b-20cf309ff6dc".compareTo("11e1-c482-2b6e3160-addf-20cf309ff6dc") = -1

Argumentless use of uuid.uuid1() gives non-sequential results (see answer by #basil-bourque), but it can be easily made sequential if you set clock_seq or node arguments (because in this case uuid1 uses python implementation that guarantees to have unique and sequential timestamp part of the UUID in current process):
import time
from uuid import uuid1, getnode
from random import getrandbits
_my_clock_seq = getrandbits(14)
_my_node = getnode()
def sequential_uuid(node=None):
return uuid1(node=node, clock_seq=_my_clock_seq)
def alt_sequential_uuid(clock_seq=None):
return uuid1(node=_my_node, clock_seq=clock_seq)
if __name__ == '__main__':
from itertools import count
old_n = uuid1() # "Native"
old_s = sequential_uuid() # Sequential
native_conflict_index = None
t_0 = time.time()
for x in count():
new_n = uuid1()
new_s = sequential_uuid()
if old_n > new_n and not native_conflict_index:
native_conflict_index = x
if old_s >= new_s:
print("OOops: non-sequential results for `sequential_uuid()`")
break
if (x >= 10*0x3fff and time.time() - t_0 > 30) or (native_conflict_index and x > 2*native_conflict_index):
print('No issues for `sequential_uuid()`')
break
old_n = new_n
old_s = new_s
print(f'Conflicts for `uuid.uuid1()`: {bool(native_conflict_index)}')
print(f"Tries: {x}")
Multiple processes issues
BUT if you are running some parallel processes on the same machine, then:
node which defaults to uuid.get_node() will be the same for all the processes;
clock_seq has small chance to be the same for some processes (chance of 1/16384)
That might lead to conflicts! That is general concern for using
uuid.uuid1 in parallel processes on the same machine unless you have access to SafeUUID from Python3.7.
If you make sure to also set node to unique value for each parallel process that runs this code, then conflicts should not happen.
Even if you are using SafeUUID, and set unique node, it's still possible to have non-sequential ids if they are generated in different processes.
If some lock-related overhead is acceptable, then you can store clock_seq in some external atomic storage (for example in "locked" file) and increment it with each call: this allows to have same value for node on all parallel processes and also will make id-s sequential. For cases when all parallel processes are subprocesses created using multiprocessing: clock_seq can be "shared" using multiprocessing.Value

Related

Is there a way to run a transaction multiple time at once?

First, I would like to apologize in advance. I just started learning this a few months ago, so I need stuff broken down completely. I have a project using python and datajoint (makes sql code shorter to write) where I need to create an airport that has at least 7 airports, different planes and what not. Then I need to populate the tables with passenger reservations. Here is what I have so far.
#schema
class Seat(dj.Lookup):
definition = """
aircraft_seat : varchar(25)
"""
contents = [["F_Airbus_1A"],["F_Airbus_1B"],["F_Airbus_2A"],["F_Airbus_2B"],["F_Airbus_3A"],
["F_Airbus_3B"],["F_Airbus_4A"],["F_Airbus_4B"],["F_Airbus_5A"],["F_Airbus_5B"],
["B_Airbus_6A"],["B_Airbus_6B"],["B_Airbus_6C"],["B_Airbus_6D"],["B_Airbus_7A"],
["B_Airbus_7B"],["B_Airbus_7C"],["B_Airbus_7D"],["B_Airbus_8A"],["B_Airbus_8B"],
["B_Airbus_8C"],["B_Airbus_8D"],["B_Airbus_9A"],["B_Airbus_9B"],
This keeps going leaving me with a total of 144 seats on each plane.
#schema
class Flight(dj.Manual):
definition = """
flight_no : int
---
economy_price : decimal(6,2)
departure : datetime
arrival : datetime
---
origin_code : int
dest_code : int
"""
#schema
class Passenger(dj.Manual):
definition = """
passenger_id : int
---
full_name : varchar(40)
ssn : varchar(20)
"""
#schema
class Reservation(dj.Manual):
definition = """
-> Flight
-> Seat
---
-> Passenger
"""
Then I populate flights and passengers:
Flight.insert((dict(flight_no = i,
economy_price = round(random.randint(100, 1000), 2),
departure = faker.date_time_this_month(),
arrival = faker.date_time_this_month(),
origin_code = random.randint(1,7),
dest_code = random.randint(1,7)))
for i in range(315))
Passenger.insert(((dict(passenger_id=i, full_name=faker.name(),
ssn = faker.ssn()))
for i in range(10000)), skip_duplicates = True)
Lastly I create the transaction:
def reserve(passenger_id, origin_code, dest_code, departure):
with dj.conn().transaction:
available_seats = ((Seat * Flight - Reservation) & Passenger &
{'passenger_id':passenger_id}).fetch(as_dict=True)
try:
choice = random.choice(available_seats)
except IndexError:
raise IndexError(f'Sorry, no seats available for {departure}')
name = (Passenger & {'passenger_id': passenger_id}).fetch1('full_name')
print('Success. Reserving seat {aircraft_seat} at ticket_price {economy_price} for
{name}'.format(name=name, **choice))
Reservation.insert1(dict(choice, passenger_id=passenger_id), ignore_extra_fields=True)
reserve(random.randint(1,1000), random.randint(1,7),
random.randint(1,7),random.choice('departure'))
Output[]: Success. Reserving seat E_Yak242_24A at ticket_price 410.00 for Cynthia Erickson
Reservation()
Output[]: flight_no aircraft_seat passenger_id
66 B_Yak242_7A 441
So I am required to have 10.5 flights a day with the planes at least 75% full which leaves me needing over 30000 reservations. Is there a way to do this like 50 at a time? I have been searching for an answer and have not been able to find a solution. Thank you.
One of the maintainers for DataJoint here. First off, I'd like to say thanks for trying out DataJoint; curious as to how you found out about the project.
Forewarning, this will be a long post but I feel it is a good opportunity to clear up a few things. Regarding the problem in question, not sure if I fully understand the nature of your problem but let me follow on several points. I recommend reading this answer in its entirety before determining how best to proceed for your case.
TL;DR: Compute tables are your friend.
Multi-threading
Since it has come up in the comments it is worth addressing that as of 2021-01-06, DataJoint is not completely thread-safe (at least from the perspective of sharing connections). It is unfortunate but it is mainly due to a standing issue with PyMySQL which is a principal dependency of DataJoint. That said, if you initiate a new connection on each thread or process you should not run into any issues. However, this is an expensive workaround and can't be combined with transactions since they require that operations be conducted within a single connection. Speaking of which...
Compute Tables and Job Reservation
Compute tables is one noticeable omission from your above attempt at a solution. Compute tables provide a mechanism to associate its entities to those in an upstream parent table with addional processing prior to insert (defined in a make method in your Compute table class) where it may be inoked by calling the populate method which calls the make method for each new entry. Calls to your make method are transaction-constrained and should achieve what you are looking for. See here in the docs for more details in its use.
Also, for additional performance gains, there is another feature called Job Reservation which provides a means to pool together multiple workers to process large data sets (using populate) in an organized, distributed manner. I don't feel it is required here but worth mentioning and ultimately up to how you view the results below. You may find out more on this feature here in our docs.
Schema Design
Based on my understanding of your initial design, I have some suggestions how we can improve the flow of the data to increase clarity, performance, and also to provide specific examples on how we can use the power of Compute tables. Running as illustrated below on my local setup, I was able to process your requirement of 30k reservations in 29m54s with 2 different plane model types, 7 airports, 10k possible passengers, 550 available flights. Minimum 75% seating capacity was not verified only because I didn't see you attempt this yet, though if you see how I am assigning seats you will notice that it is almost there. :)
Disclaimer: I should note that the below design is still a large oversimplification of the actual real-world challenge to orchestrate proper travel reservations. Considerable assumptions were taken mainly for the benefit of education as opposed to submitting a full, drop-in solution. As such, I have explicitly chosen to avoid using longblob for the below solution so that it is easier to follow along. In reality, a proper solution would likely include more advanced topics for further performance gains e.g. longblob, _update, etc.
That said, let's begin by considering the following:
import datajoint as dj # conda install -c conda-forge datajoint or pip install datajoint
import random
from faker import Faker # pip install Faker
faker = Faker()
Faker.seed(0) # Pin down randomizer between runs
schema = dj.Schema('commercial_airtravel') # instantiate a workable database
#schema
class Plane(dj.Lookup):
definition = """
# Defines manufacturable plane model types
plane_type : varchar(25) # Name of plane model
---
plane_rows : int # Number of rows in plane model i.e. range(1, plane_rows + 1)
plane_columns : int # Number of columns in plane model; to extract letter we will need these indices
"""
contents = [('B_Airbus', 37, 4), ('F_Airbus', 40, 5)] # Since new entries to this table should happen infrequently, this is a good candidate for a Lookup table
#schema
class Airport(dj.Lookup):
definition = """
# Defines airport locations that can serve as origin or destination
airport_code : int # Airport's unique identifier
---
airport_city : varchar(25) # Airport's city
"""
contents = [(i, faker.city()) for i in range(1, 8)] # Also a good candidate for Lookup table
#schema
class Passenger(dj.Manual):
definition = """
# Defines users who have registered accounts with airline i.e. passenger
passenger_id : serial # Passenger's unique identifier; serial simply means an auto-incremented, unsigned bigint
---
full_name : varchar(40) # Passenger's full name
ssn : varchar(20) # Passenger's Social Security Number
"""
Passenger.insert((dict(full_name=faker.name(),
ssn = faker.ssn()) for _ in range(10000))) # Insert a random set of passengers
#schema
class Flight(dj.Manual):
definition = """
# Defines specific planes assigned to a route
flight_id : serial # Flight's unique identifier
---
-> Plane # Flight's plane model specs; this will simply create a relation to Plane table but not have the constraint of uniqueness
flight_economy_price : decimal(6,2) # Flight's fare price
flight_departure : datetime # Flight's departure time
flight_arrival : datetime # Flight's arrival time
-> Airport.proj(flight_origin_code='airport_code') # Flight's origin; by using proj in this way we may rename the relation in this table
-> Airport.proj(flight_dest_code='airport_code') # Flight's destination
"""
plane_types = Plane().fetch('plane_type') # Fetch available plane model types
Flight.insert((dict(plane_type = random.choice(plane_types),
flight_economy_price = round(random.randint(100, 1000), 2),
flight_departure = faker.date_time_this_month(),
flight_arrival = faker.date_time_this_month(),
flight_origin_code = random.randint(1, 7),
flight_dest_code = random.randint(1, 7))
for _ in range(550))) # Insert a random set of flights; for simplicity we are not verifying that flight_departure < flight_arrival
#schema
class BookingRequest(dj.Manual):
definition = """
# Defines one-way booking requests initiated by passengers
booking_id : serial # Booking Request's unique identifier
---
-> Passenger # Passenger who made request
-> Airport.proj(flight_origin_code='airport_code') # Booking Request's desired origin
-> Airport.proj(flight_dest_code='airport_code') # Booking Request's desired destination
"""
BookingRequest.insert((dict(passenger_id = random.randint(1, 10000),
flight_origin_code = random.randint(1, 7),
flight_dest_code = random.randint(1, 7))
for i in range(30000))) # Insert a random set of booking requests
#schema
class Reservation(dj.Computed):
definition = """
# Defines booked reservations
-> BookingRequest # Association to booking request
---
flight_id : int # Flight's unique identifier
reservation_seat : varchar(25) # Reservation's assigned seat
"""
def make(self, key):
# Determine booking request's details
full_name, flight_origin_code, flight_dest_code = (BookingRequest * Passenger & key).fetch1('full_name',
'flight_origin_code',
'flight_dest_code')
# Determine possible flights to satisfy booking
possible_flights = (Flight * Plane *
Airport.proj(flight_dest_city='airport_city',
flight_dest_code='airport_code') &
dict(flight_origin_code=flight_origin_code,
flight_dest_code=flight_dest_code)).fetch('flight_id',
'plane_rows',
'plane_columns',
'flight_economy_price',
'flight_dest_city',
as_dict=True)
# Iterate until we find a vacant flight and extract details
for flight_meta in possible_flights:
# Determine seat capacity
all_seats = set((f'{r}{l}' for rows, letters in zip(*[[[n if i==0 else chr(n + 64)
for n in range(1, el + 1)]]
for i, el in enumerate((flight_meta['plane_rows'],
flight_meta['plane_columns']))])
for r in rows
for l in letters))
# Determine unavailable seats
taken_seats = set((Reservation & dict(flight_id=flight_meta['flight_id'])).fetch('reservation_seat'))
try:
# Randomly choose one of the available seats
reserved_seat = random.choice(list(all_seats - taken_seats))
# You may uncomment the below line if you wish to print the success message per processed record
# print(f'Success. Reserving seat {reserved_seat} at ticket_price {flight_meta["flight_economy_price"]} for {full_name}.')
# Insert new reservation
self.insert1(dict(key, flight_id=flight_meta['flight_id'], reservation_seat=reserved_seat))
return
except IndexError:
pass
raise IndexError(f'Sorry, no seats available departing to {flight_meta["flight_dest_city"]}')
Reservation.populate(display_progress=True) # This is how we process new booking requests to assign a reservation; you may invoke this as often as necessary
Syntax and Convention Nits
Lastly, just some minor feedback in your provided code. Regarding table definitions, you should only use --- once in the definition to identify a clear distinction between primary key attributes and secondary attributes (See your Flight table). Unexpectedly, this did not throw an error in your case but should have done so. I will file an issue since this appears to be a bug.
Though transaction is exposed on dj.conn(), it is quite rare to need to invoke it directly. DataJoint provides the benefit of handling this internally to reduce the management overhead of this from the user. However, the option is still available should it be needed for corner-cases. For your case, I would avoid invoking it directly and reccomend using Computed (or also Imported) tables instead.

Suggesting the next available IP network block

I was wondering if there's a good way to find the next available gap to create a network block given a list of existing ones?
For example, I have these networks in my list:
[
'10.0.0.0/24',
'10.0.0.0/20',
'10.10.0.0/20',
]
and then someone comes along and ask: "Do you have have enough space for 1 /22 for me?"
I'd like to be able to suggest something along the line:
"Here's a space: x.x.x.x/22" (x.x.x.x is something that comes before 10.0.0.0)
or
"Here's a space: x.x.x.x/22" (x.x.x.x is something in between 10.0.0.255 and 10.10.0.0)
or
"Here's a space: x.x.x.x/22" (x.x.x.x is something that comes after 10.10.15.255)
I'd really appreciate any suggestions.
The ipaddress library is good for this sort of use case. You can use the IPv4Network class to define subnet ranges, and the IPv4Address objects it can return can be converted into integers for comparison.
What I do below:
Establish your given list as a list of IPv4Networks
Determine the size of the block we're looking for
Iterate through the list, computing the amount of space between consecutive blocks, and checking if our wanted block fits.
You could also return an IPv4Network with the subnet built into it, instead of an IPv4Address, but I'll leave that as an exercise to the reader.
from ipaddress import IPv4Network, IPv4Address
networks = [
IPv4Network('10.0.0.0/24')
IPv4Network('10.0.0.0/20')
IPv4Network('10.0.10.0/20')
]
wanted = 22
wanted_size = 2 ** (32 - wanted) # number of addresses in a /22
space_found = None
for i in range(1, len(networks):
previous_network_end = int(networks[i-1].network_address + int(networks[i-1].hostmask))
next_network_start = int(networks[i].network_address)
free_space_size = next_network_start - previous_network_end
if free_space_size >= wanted_size:
return IPv4Address(networks[i-1] + 1) # first available address

PyModbus efficiency with out of order registers and different functions

I've recently started using PyModbus and have found it very easy to do basic polling with their ModbusTCPClient and read_holding_registers function.
I'm interested now in the best ways to structure a more complex logger - non-consecutive registers, different function codes, different Endian encoding, etc.
For example - to avoid a separate 'read_holding_registers' call for each tag of a device, I have built a function that groups all consecutive tag registers to reduce the number of calls.
I'm planning to implement a similar thing for BinaryPayloadDecoders - group by registers with the same byteorder and wordorder to reduce the number of decoder instances.
def polldevicesfast(client, device, taglist):
#loop through tags, order by address, group consecutive addresses in single reads, merge resulting lists, decode
orderedtaglist = sorted(taglist, key = lambda i: i['address'])
callgroups = sorttogroups(orderedtaglist)
allreturns = []
results = []
for acall in callgroups:
areturn = client.read_holding_registers(acall['start'], (1 + (acall['end'] - acall['start'])), unit=device['device_id'])
allreturns = allreturns + areturn.registers
decoder = BinaryPayloadDecoder.fromRegisters(allreturns, byteorder=Endian.Big, wordorder=Endian.Big)
for tag in orderedtaglist:
results.append({'tagname': tag['name'], 'value': str(tag['autoScaling']['slope'] * mydecoder(tag['dataType'], decoder)), 'unit': tag['unit']})
client.close()
return results
None of this is extremely complicated - it just seems like there has to already be an accepted standard or template for this somewhere that I can't seem to find in any of their documentation online.

Shortest possible generated unique ID

So we can generate a unique id with str(uuid.uuid4()), which is 36 characters long.
Is there another method to generate a unique ID which is shorter in terms of characters?
EDIT:
If ID is usable as primary key then even better
Granularity should be better than 1ms
This code could be distributed, so we can't assume time independence.
If this is for use as a primary key field in db, consider just using auto-incrementing integer instead.
str(uuid.uuid4()) is 36 chars but it has four useless dashes (-) in it, and it's limited to 0-9 a-f.
Better uuid4 in 32 chars:
>>> uuid.uuid4().hex
'b327fc1b6a2343e48af311343fc3f5a8'
Or just b64 encode and slice some urandom bytes (up to you to guarantee uniqueness):
>>> base64.b64encode(os.urandom(32))[:8]
b'iR4hZqs9'
TLDR
Most of the times it's better to work with numbers internally and encode them to short IDs externally. So here's a function for Python3, PowerShell & VBA that will convert an int32 to an alphanumeric ID. Use it like this:
int32_to_id(225204568)
'F2AXP8'
For distributed code use ULIDs: https://github.com/mdipierro/ulid
They are much longer but unique across different machines.
How short are the IDs?
It will encode about half a billion IDs in 6 characters so it's as compact as possible while still using only non-ambiguous digits and letters.
How can I get even shorter IDs?
If you want even more compact IDs/codes/Serial Numbers, you can easily expand the character set by just changing the chars="..." definition. For example if you allow all lower and upper case letters you can have 56 billion IDs within the same 6 characters. Adding a few symbols (like ~!##$%^&*()_+-=) gives you 208 billion IDs.
So why didn't you go for the shortest possible IDs?
The character set I'm using in my code has an advantage: It generates IDs that are easy to copy-paste (no symbols so double clicking selects the whole ID), easy to read without mistakes (no look-alike characters like 2 and Z) and rather easy to communicate verbally (only upper case letters). Sticking to numeric digits only is your best option for verbal communication but they are not compact.
I'm convinced: show me the code
Python 3
def int32_to_id(n):
if n==0: return "0"
chars="0123456789ACEFHJKLMNPRTUVWXY"
length=len(chars)
result=""
remain=n
while remain>0:
pos = remain % length
remain = remain // length
result = chars[pos] + result
return result
PowerShell
function int32_to_id($n){
$chars="0123456789ACEFHJKLMNPRTUVWXY"
$length=$chars.length
$result=""; $remain=[int]$n
do {
$pos = $remain % $length
$remain = [int][Math]::Floor($remain / $length)
$result = $chars[$pos] + $result
} while ($remain -gt 0)
$result
}
VBA
Function int32_to_id(n)
Dim chars$, length, result$, remain, pos
If n = 0 Then int32_to_id = "0": Exit Function
chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
length = Len(chars$)
result$ = ""
remain = n
Do While (remain > 0)
pos = remain Mod length
remain = Int(remain / length)
result$ = Mid(chars$, pos + 1, 1) + result$
Loop
int32_to_id = result
End Function
Function id_to_int32(id$)
Dim chars$, length, result, remain, pos, value, power
chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
length = Len(chars$)
result = 0
power = 1
For pos = Len(id$) To 1 Step -1
result = result + (InStr(chars$, Mid(id$, pos, 1)) - 1) * power
power = power * length
Next
id_to_int32 = result
End Function
Public Sub test_id_to_int32()
Dim i
For i = 0 To 28 ^ 3
If id_to_int32(int32_to_id(i)) <> i Then Debug.Print "Error, i=", i, "int32_to_id(i)", int32_to_id(i), "id_to_int32('" & int32_to_id(i) & "')", id_to_int32(int32_to_id(i))
Next
Debug.Print "Done testing"
End Sub
Yes. Just use the current UTC millis. This number never repeats.
const uniqueID = new Date().getTime();
EDIT
If you have the rather seldom requirement to produce more than one ID within the same millisecond, this method is of no use as this number‘s granularity is 1ms.

Find out largest string value size for keys in Redis database

I need to look at our Redis cache and see what the size of our largest stored value is. I am experienced with Python or can use the redis-cli directly. Is there a way to iterate all the keys in the database so I can then inspect the size of each value?
It looks like SCAN is the way to iterate through the keys, but I'm still working out how to put this to use to get the sizes of the values and store a maximum as I go.
Since you mentioned redis-cli as an option, it has a build it function that does pretty much what you ask for (and much more).
redis-cli --bigkeys
# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type. You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).
Here is a sample of the summery output:
Sampled 343799 keys in the keyspace!
Total key length in bytes is 9556361 (avg len 27.80)
Biggest string found '530f8dc7c7b3b:39:string:87929' has 500 bytes
Biggest list found '530f8d5a17b26:9:list:11211' has 500 items
Biggest set found '530f8da856e1e:75:set:65939' has 500 members
Biggest hash found '530f8d619a0af:86:hash:16911' has 500 fields
Biggest zset found '530f8d4a9ce31:45:zset:315' has 500 members
68559 strings with 17136672 bytes (19.94% of keys, avg size 249.96)
68986 lists with 17326343 items (20.07% of keys, avg size 251.16)
68803 sets with 17236635 members (20.01% of keys, avg size 250.52)
68622 hashs with 17272144 fields (19.96% of keys, avg size 251.70)
68829 zsets with 17241902 members (20.02% of keys, avg size 250.50)
You can view a complete output example here
Here is a solution which uses redis's built-in scripting capabilities:
local longest
local cursor = "0"
repeat
local ret = redis.call("scan", cursor)
cursor = ret[1]
for _, key in ipairs(ret[2]) do
local length = redis.pcall("strlen", key)
if type(length) == "number" then
if longest == nil or length > longest then
longest = length
end
end
end
until cursor == "0"
return longest
This should run faster than the Python code you provide, Ben Roberts, particularly because the Lua script uses STRLEN over GET + Python's len.
I think I've figured out a basic idea, using the redis-py library
import redis
r= redis.StrictRedis(...)
max_len = 0
for k in r.scan_iter():
try:
val_len = r.strlen(k)
except:
continue
if val_len > max_len:
max_len = val_len
print max_len

Categories

Resources