OBD-Python unable to get VIN number - python

I am using the library OBD-Python and when I tried to get a VIN number from my vehicle even following the Custom Commands documentation, I received this message:
[obd.obd] 'b'0902': VIN NUMBER' is not supported
Date: 2018-07-09 14:48:30.428588 -- VIN NUMBER: None.
def vin(messages):
    """ decoder for RPM messages """
    d = messages[0].data # only operate on a single message
    d = d[2:] # chop off mode and PID bytes
    v = bytes_to_int(d) / 4.0 # helper function for converting byte arrays to ints
    return v * Unit.VIN # construct a Pint Quantity
c = OBDCommand("VIN", # name
               "VIN NUMBER", # description
               b"0902", # command
               17, # number of return bytes to expect
               vin, # decoding function
               ECU.ENGINE, # (optional) ECU filter
               True) # (optional) allow a "01" to be added for speed
o = obd.OBD()
o.supported_commands.add(c)
o.query(c)
print('Data: ' + str(datetime.datetime.now()) + ' -- VIN NUMBER: '+str(connection.query(c)))
What am I doing wrong?

You're not doing anything wrong. Almost all commands defined by SAE J1979 are optional: vendors can choose to implement them or not. In the case of your vehicle, it looks like the vendor decided against it.

Some vehicle manufacturers respond with all 0xFF in the VIN bytes. They may do this to thwart third-party OBD2 scan tool providers whose tools are licensed for only a limited number of vehicles, where raising that number requires buying more licenses. Filling the VIN with 0xFF means this trick no longer works, so service centers can use third-party OBD2 scan tools without having to keep buying additional VIN licenses as the fleet of vehicles they service grows. Just my thoughts.
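To tell a real VIN from an 0xFF-filled one, the raw response can be checked before trusting it. The following is only a sketch in plain Python, independent of python-OBD: decode_vin is a hypothetical helper, and it assumes the payload has already been reassembled into a single byte string that starts with the response bytes 0x49 0x02, then a one-byte item count, then 17 ASCII characters. The layout your adapter or library hands you may differ.

```python
def decode_vin(payload):
    """Decode a reassembled mode 09 PID 02 response into a VIN string.

    Assumed layout (hypothetical, verify against your own captures):
    0x49 0x02 <item count> followed by 17 ASCII characters.
    Returns None if the layout does not match or the VIN is only padding.
    """
    if len(payload) < 3 + 17 or payload[0] != 0x49 or payload[1] != 0x02:
        return None
    vin_bytes = bytes(payload[3:20])
    # Some vehicles answer with 0xFF (or 0x00) filler instead of a VIN.
    if all(b in (0x00, 0xFF) for b in vin_bytes):
        return None
    try:
        return vin_bytes.decode("ascii")
    except UnicodeDecodeError:
        return None


sample = bytes([0x49, 0x02, 0x01]) + b"1HGCM82633A004352"
print(decode_vin(sample))                                    # 1HGCM82633A004352
print(decode_vin(bytes([0x49, 0x02, 0x01]) + b"\xff" * 17))  # None
```

Returning None for filler bytes keeps "not reported" distinct from a genuine VIN.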

Related

Is there a way to run a transaction multiple time at once?

First, I would like to apologize in advance. I just started learning this a few months ago, so I need stuff broken down completely. I have a project using python and datajoint (makes sql code shorter to write) where I need to create an airport that has at least 7 airports, different planes and what not. Then I need to populate the tables with passenger reservations. Here is what I have so far.
@schema
class Seat(dj.Lookup):
    definition = """
    aircraft_seat : varchar(25)
    """
    contents = [["F_Airbus_1A"],["F_Airbus_1B"],["F_Airbus_2A"],["F_Airbus_2B"],["F_Airbus_3A"],
                ["F_Airbus_3B"],["F_Airbus_4A"],["F_Airbus_4B"],["F_Airbus_5A"],["F_Airbus_5B"],
                ["B_Airbus_6A"],["B_Airbus_6B"],["B_Airbus_6C"],["B_Airbus_6D"],["B_Airbus_7A"],
                ["B_Airbus_7B"],["B_Airbus_7C"],["B_Airbus_7D"],["B_Airbus_8A"],["B_Airbus_8B"],
                ["B_Airbus_8C"],["B_Airbus_8D"],["B_Airbus_9A"],["B_Airbus_9B"],
This keeps going leaving me with a total of 144 seats on each plane.
@schema
class Flight(dj.Manual):
    definition = """
    flight_no : int
    ---
    economy_price : decimal(6,2)
    departure : datetime
    arrival : datetime
    ---
    origin_code : int
    dest_code : int
    """
@schema
class Passenger(dj.Manual):
    definition = """
    passenger_id : int
    ---
    full_name : varchar(40)
    ssn : varchar(20)
    """
@schema
class Reservation(dj.Manual):
    definition = """
    -> Flight
    -> Seat
    ---
    -> Passenger
    """
Then I populate flights and passengers:
Flight.insert((dict(flight_no = i,
economy_price = round(random.randint(100, 1000), 2),
departure = faker.date_time_this_month(),
arrival = faker.date_time_this_month(),
origin_code = random.randint(1,7),
dest_code = random.randint(1,7)))
for i in range(315))
Passenger.insert(((dict(passenger_id=i, full_name=faker.name(),
ssn = faker.ssn()))
for i in range(10000)), skip_duplicates = True)
Lastly I create the transaction:
def reserve(passenger_id, origin_code, dest_code, departure):
    with dj.conn().transaction:
        available_seats = ((Seat * Flight - Reservation) & Passenger &
                           {'passenger_id':passenger_id}).fetch(as_dict=True)
        try:
            choice = random.choice(available_seats)
        except IndexError:
            raise IndexError(f'Sorry, no seats available for {departure}')
        name = (Passenger & {'passenger_id': passenger_id}).fetch1('full_name')
        print('Success. Reserving seat {aircraft_seat} at ticket_price {economy_price} for {name}'.format(name=name, **choice))
        Reservation.insert1(dict(choice, passenger_id=passenger_id), ignore_extra_fields=True)
reserve(random.randint(1,1000), random.randint(1,7),
        random.randint(1,7),random.choice('departure'))
Output[]: Success. Reserving seat E_Yak242_24A at ticket_price 410.00 for Cynthia Erickson
Reservation()
Output[]: flight_no aircraft_seat passenger_id
66 B_Yak242_7A 441
So I am required to have 10.5 flights a day with the planes at least 75% full which leaves me needing over 30000 reservations. Is there a way to do this like 50 at a time? I have been searching for an answer and have not been able to find a solution. Thank you.
One of the maintainers for DataJoint here. First off, I'd like to say thanks for trying out DataJoint; curious as to how you found out about the project.
Forewarning, this will be a long post but I feel it is a good opportunity to clear up a few things. Regarding the problem in question, not sure if I fully understand the nature of your problem but let me follow on several points. I recommend reading this answer in its entirety before determining how best to proceed for your case.
TL;DR: Compute tables are your friend.
Multi-threading
Since it has come up in the comments it is worth addressing that as of 2021-01-06, DataJoint is not completely thread-safe (at least from the perspective of sharing connections). It is unfortunate but it is mainly due to a standing issue with PyMySQL which is a principal dependency of DataJoint. That said, if you initiate a new connection on each thread or process you should not run into any issues. However, this is an expensive workaround and can't be combined with transactions since they require that operations be conducted within a single connection. Speaking of which...
Compute Tables and Job Reservation
Compute tables are one noticeable omission from your attempt above. A Compute table provides a mechanism to associate its entities with those in an upstream parent table, with additional processing prior to insert (defined in a make method in your Compute table class). It is invoked by calling the populate method, which calls the make method for each new entry. Calls to your make method are transaction-constrained and should achieve what you are looking for. See here in the docs for more details on its use.
Also, for additional performance gains, there is another feature called Job Reservation which provides a means to pool together multiple workers to process large data sets (using populate) in an organized, distributed manner. I don't feel it is required here but worth mentioning and ultimately up to how you view the results below. You may find out more on this feature here in our docs.
Schema Design
Based on my understanding of your initial design, I have some suggestions how we can improve the flow of the data to increase clarity, performance, and also to provide specific examples on how we can use the power of Compute tables. Running as illustrated below on my local setup, I was able to process your requirement of 30k reservations in 29m54s with 2 different plane model types, 7 airports, 10k possible passengers, 550 available flights. Minimum 75% seating capacity was not verified only because I didn't see you attempt this yet, though if you see how I am assigning seats you will notice that it is almost there. :)
Disclaimer: I should note that the below design is still a large oversimplification of the actual real-world challenge to orchestrate proper travel reservations. Considerable assumptions were taken mainly for the benefit of education as opposed to submitting a full, drop-in solution. As such, I have explicitly chosen to avoid using longblob for the below solution so that it is easier to follow along. In reality, a proper solution would likely include more advanced topics for further performance gains e.g. longblob, _update, etc.
That said, let's begin by considering the following:
import datajoint as dj # conda install -c conda-forge datajoint or pip install datajoint
import random
from faker import Faker # pip install Faker
faker = Faker()
Faker.seed(0) # Pin down randomizer between runs
schema = dj.Schema('commercial_airtravel') # instantiate a workable database
@schema
class Plane(dj.Lookup):
    definition = """
    # Defines manufacturable plane model types
    plane_type : varchar(25) # Name of plane model
    ---
    plane_rows : int # Number of rows in plane model i.e. range(1, plane_rows + 1)
    plane_columns : int # Number of columns in plane model; to extract letter we will need these indices
    """
    contents = [('B_Airbus', 37, 4), ('F_Airbus', 40, 5)] # Since new entries to this table should happen infrequently, this is a good candidate for a Lookup table
@schema
class Airport(dj.Lookup):
    definition = """
    # Defines airport locations that can serve as origin or destination
    airport_code : int # Airport's unique identifier
    ---
    airport_city : varchar(25) # Airport's city
    """
    contents = [(i, faker.city()) for i in range(1, 8)] # Also a good candidate for Lookup table
@schema
class Passenger(dj.Manual):
    definition = """
    # Defines users who have registered accounts with airline i.e. passenger
    passenger_id : serial # Passenger's unique identifier; serial simply means an auto-incremented, unsigned bigint
    ---
    full_name : varchar(40) # Passenger's full name
    ssn : varchar(20) # Passenger's Social Security Number
    """
Passenger.insert((dict(full_name=faker.name(),
ssn = faker.ssn()) for _ in range(10000))) # Insert a random set of passengers
@schema
class Flight(dj.Manual):
    definition = """
    # Defines specific planes assigned to a route
    flight_id : serial # Flight's unique identifier
    ---
    -> Plane # Flight's plane model specs; this will simply create a relation to Plane table but not have the constraint of uniqueness
    flight_economy_price : decimal(6,2) # Flight's fare price
    flight_departure : datetime # Flight's departure time
    flight_arrival : datetime # Flight's arrival time
    -> Airport.proj(flight_origin_code='airport_code') # Flight's origin; by using proj in this way we may rename the relation in this table
    -> Airport.proj(flight_dest_code='airport_code') # Flight's destination
    """
plane_types = Plane().fetch('plane_type') # Fetch available plane model types
Flight.insert((dict(plane_type = random.choice(plane_types),
flight_economy_price = round(random.randint(100, 1000), 2),
flight_departure = faker.date_time_this_month(),
flight_arrival = faker.date_time_this_month(),
flight_origin_code = random.randint(1, 7),
flight_dest_code = random.randint(1, 7))
for _ in range(550))) # Insert a random set of flights; for simplicity we are not verifying that flight_departure < flight_arrival
@schema
class BookingRequest(dj.Manual):
    definition = """
    # Defines one-way booking requests initiated by passengers
    booking_id : serial # Booking Request's unique identifier
    ---
    -> Passenger # Passenger who made request
    -> Airport.proj(flight_origin_code='airport_code') # Booking Request's desired origin
    -> Airport.proj(flight_dest_code='airport_code') # Booking Request's desired destination
    """
BookingRequest.insert((dict(passenger_id = random.randint(1, 10000),
flight_origin_code = random.randint(1, 7),
flight_dest_code = random.randint(1, 7))
for i in range(30000))) # Insert a random set of booking requests
@schema
class Reservation(dj.Computed):
    definition = """
    # Defines booked reservations
    -> BookingRequest # Association to booking request
    ---
    flight_id : int # Flight's unique identifier
    reservation_seat : varchar(25) # Reservation's assigned seat
    """
    def make(self, key):
        # Determine booking request's details
        full_name, flight_origin_code, flight_dest_code = (BookingRequest * Passenger & key).fetch1('full_name',
                                                                                                    'flight_origin_code',
                                                                                                    'flight_dest_code')
        # Determine possible flights to satisfy booking
        possible_flights = (Flight * Plane *
                            Airport.proj(flight_dest_city='airport_city',
                                         flight_dest_code='airport_code') &
                            dict(flight_origin_code=flight_origin_code,
                                 flight_dest_code=flight_dest_code)).fetch('flight_id',
                                                                           'plane_rows',
                                                                           'plane_columns',
                                                                           'flight_economy_price',
                                                                           'flight_dest_city',
                                                                           as_dict=True)
        # Iterate until we find a vacant flight and extract details
        for flight_meta in possible_flights:
            # Determine seat capacity
            all_seats = set((f'{r}{l}' for rows, letters in zip(*[[[n if i==0 else chr(n + 64)
                                                                     for n in range(1, el + 1)]]
                                                                   for i, el in enumerate((flight_meta['plane_rows'],
                                                                                           flight_meta['plane_columns']))])
                             for r in rows
                             for l in letters))
            # Determine unavailable seats
            taken_seats = set((Reservation & dict(flight_id=flight_meta['flight_id'])).fetch('reservation_seat'))
            try:
                # Randomly choose one of the available seats
                reserved_seat = random.choice(list(all_seats - taken_seats))
                # You may uncomment the below line if you wish to print the success message per processed record
                # print(f'Success. Reserving seat {reserved_seat} at ticket_price {flight_meta["flight_economy_price"]} for {full_name}.')
                # Insert new reservation
                self.insert1(dict(key, flight_id=flight_meta['flight_id'], reservation_seat=reserved_seat))
                return
            except IndexError:
                pass
        raise IndexError(f'Sorry, no seats available departing to {flight_meta["flight_dest_city"]}')
Reservation.populate(display_progress=True) # This is how we process new booking requests to assign a reservation; you may invoke this as often as necessary
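The seat-capacity step inside make is fairly dense, so here is a simplified sketch of the same idea in plain Python, with no database involved. The helper names seat_labels and pick_seat are hypothetical and not part of DataJoint. It assumes a plane never has more than 26 columns, so that columns map onto the letters A to Z.

```python
import random
from itertools import product
from string import ascii_uppercase


def seat_labels(rows, columns):
    """Return every seat label such as '1A' for a rows x columns cabin."""
    return {f"{row}{letter}"
            for row, letter in product(range(1, rows + 1), ascii_uppercase[:columns])}


def pick_seat(rows, columns, taken, rng=random):
    """Pick a random free seat, or None when the flight is full."""
    free = sorted(seat_labels(rows, columns) - set(taken))
    return rng.choice(free) if free else None


print(len(seat_labels(37, 4)))    # 148
print(pick_seat(1, 2, {"1A"}))    # 1B
print(pick_seat(1, 1, {"1A"}))    # None
```

Returning None instead of raising lets the caller decide whether to try the next flight or give up.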
Syntax and Convention Nits
Lastly, just some minor feedback in your provided code. Regarding table definitions, you should only use --- once in the definition to identify a clear distinction between primary key attributes and secondary attributes (See your Flight table). Unexpectedly, this did not throw an error in your case but should have done so. I will file an issue since this appears to be a bug.
Though transaction is exposed on dj.conn(), it is quite rare to need to invoke it directly. DataJoint handles this internally to spare the user the management overhead. However, the option is still available should it be needed for corner cases. For your case, I would avoid invoking it directly and recommend using Computed (or also Imported) tables instead.

Instructables open source code: Python IndexError: list index out of range

I've seen this error on several other questions but couldn't find the answer.
I'm a complete stranger to Python, but I'm following the instructions from a site and I keep getting this error once I try to run the script:
IndexError: list index out of range
Here's the script:
##//txt to stl conversion - 3d printable record
##//by Amanda Ghassaei
##//Dec 2012
##//http://www.instructables.com/id/3D-Printed-Record/
##
##/*
## * This program is free software; you can redistribute it and/or modify
## * it under the terms of the GNU General Public License as published by
## * the Free Software Foundation; either version 3 of the License, or
## * (at your option) any later version.
##*/
import wave
import math
import struct
bitDepth = 8#target bitDepth
frate = 44100#target frame rate
fileName = "bill.wav"#file to be imported (change this)
#read file and get data
w = wave.open(fileName, 'r')
numframes = w.getnframes()
frame = w.readframes(numframes)#w.getnframes()
frameInt = map(ord, list(frame))#turn into array
#separate left and right channels and merge bytes
frameOneChannel = [0]*numframes#initialize list of one channel of wave
for i in range(numframes):
    frameOneChannel[i] = frameInt[4*i+1]*2**8+frameInt[4*i]#separate channels and store one channel in new list
    if frameOneChannel[i] > 2**15:
        frameOneChannel[i] = (frameOneChannel[i]-2**16)
    elif frameOneChannel[i] == 2**15:
        frameOneChannel[i] = 0
    else:
        frameOneChannel[i] = frameOneChannel[i]
#convert to string
audioStr = ''
for i in range(numframes):
    audioStr += str(frameOneChannel[i])
    audioStr += ","#separate elements with comma
fileName = fileName[:-3]#remove .wav extension
text_file = open(fileName+"txt", "w")
text_file.write("%s"%audioStr)
text_file.close()
Thanks a lot,
Leart
Leart, check these; they may help:
Is your input file in the correct format? As I see it, you need to produce that file beforehand before you can use it in this program. Post that file here as well.
Check that your bit depth and frame rate are correct.
Just for debugging purposes (even if the code runs, this may not produce correct results, but it is good for testing): you are accessing frameInt[4*i+1], with index i multiplied by 4 and then 1 added, which eventually goes beyond the last index of frameInt. This happens when the file has fewer than 4 bytes per frame, for example a mono or 8-bit recording.
Add an 'if' to check the size before accessing the element in frameInt:
if len(frameInt) > 4*i+1:
Add that statement right after the first occurrence of "for i in range(numframes):" and just before "frameOneChannel[i] = frameInt[4*i+1]*2**8+frameInt[4*i]#separate channels and store one channel in new list", and indent the lines it guards one level further.
*watch tab spaces
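Rather than hard-coding 4 bytes per frame, the header of the file can be asked how the data is laid out. The following is a sketch for Python 3 using only the standard library; first_channel_samples is a hypothetical helper, not part of the original script, and it assumes 16-bit little-endian PCM, which is what the original byte-merging code expects.

```python
import io
import struct
import wave


def first_channel_samples(source):
    """Return the first channel of a 16-bit PCM WAV as a list of signed ints."""
    with wave.open(source, "rb") as w:
        channels = w.getnchannels()   # 1 = mono, 2 = stereo
        width = w.getsampwidth()      # bytes per sample
        if width != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * width}-bit")
        raw = w.readframes(w.getnframes())
    # Count samples from the bytes actually read, not from the header.
    count = len(raw) // width
    samples = struct.unpack(f"<{count}h", raw[:count * width])
    # Samples are interleaved per frame, so step by the channel count.
    return list(samples[::channels])


# Build a tiny stereo file in memory to exercise the helper.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(struct.pack("<6h", 1, -1, 2, -2, -32768, 0))
buf.seek(0)
print(first_channel_samples(buf))  # [1, 2, -32768]
```

Because the stride comes from getnchannels(), a mono file no longer runs off the end of the data, and an 8-bit file is rejected with a clear message instead of an IndexError.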

Force scapy to re-dissect a layer after changes

I am working with a fork of scapy (a Python packet manipulation tool) called scapy-com. This implements 802.15.4 and Zigbee parsing/manipulation, amongst other protocols.
A quirk of the Zigbee protocol is found in the network-level security header. Initially, the security level (which defines the encryption and the length of the message integrity code) is set correctly, but it is then set to 0 (no encryption) before the frame is sent. From the spec:
The security level sub-field of the security control field shall be
over-written by the 3-bit all-zero string '000'
The spec can be found here. The relevant section is "4.3.1.1 Security Processing of Outgoing Frames".
This means that packet captures indicate that no encryption or message integrity code is in use. The security level must be communicated out-of-band.
scapy-com doesn't deal with this. It naively parses the security level and sets the length of the MIC to 0. The code that does this is:
def util_mic_len(pkt):
    ''' Calculate the length of the attribute value field '''
    # NWK security level 0 seems to implicitly be same as 5
    if ( pkt.nwk_seclevel == 0 ): # no encryption, no mic
        return 0
    elif ( pkt.nwk_seclevel == 1 ): # MIC-32
        return 4
    elif ( pkt.nwk_seclevel == 2 ): # MIC-64
        return 8
    elif ( pkt.nwk_seclevel == 3 ): # MIC-128
        return 16
    elif ( pkt.nwk_seclevel == 4 ): # ENC
        return 0
    elif ( pkt.nwk_seclevel == 5 ): # ENC-MIC-32
        return 4
    elif ( pkt.nwk_seclevel == 6 ): # ENC-MIC-64
        return 8
    elif ( pkt.nwk_seclevel == 7 ): # ENC-MIC-128
        return 16
    else:
        return 0
The project that uses scapy-com attempts to deal with this by setting the security level to 5:
#TODO: Investigate and issue a different fix:
# https://code.google.com/p/killerbee/issues/detail?id=30
# This function destroys the packet, therefore work on a copy - #cutaway
pkt = pkt.copy() #this is hack to fix the below line
pkt.nwk_seclevel=5 #the issue appears to be when this is set
mic = pkt.mic
However, this doesn't work - the message integrity code has already been set. I have worked around this by simply altering the util_mic_len function to set the mic length correctly.
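For illustration only, this kind of workaround can be sketched as a lookup table with an explicit override for the zeroed level. The names MIC_LENGTHS, mic_len and assumed_level are hypothetical and not part of scapy-com. The default of 5 mirrors the value the project sets and is an assumption about the network being captured, since the real level has to be known out of band.

```python
# MIC length in bytes for each 3-bit security level, as in util_mic_len.
MIC_LENGTHS = {0: 0, 1: 4, 2: 8, 3: 16, 4: 0, 5: 4, 6: 8, 7: 16}


def mic_len(seclevel, assumed_level=5):
    """Return the MIC length, treating a zeroed level as `assumed_level`.

    On the wire the level is overwritten with 0, so a captured 0 says
    nothing about the real level; the caller supplies it out of band.
    """
    level = assumed_level if seclevel == 0 else seclevel
    return MIC_LENGTHS.get(level, 0)


print(mic_len(0))                   # 4
print(mic_len(0, assumed_level=7))  # 16
print(mic_len(2))                   # 8
```

Passing assumed_level=0 restores the original behaviour for networks that really do send unsecured frames.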
The question is, how should the Zigbee parser be changed so that altering the nwk_seclevel after the initial dissection causes the mic length to be updated?
I can see two solutions:
Change the scapy-com code so that changing nwk_seclevel automatically changes the mic length.
Re-dissect the packets from outside scapy-com as they are changed.
The issue with 1 is I have no idea about how to go about it.
The issue with 2 is that I have some idea but can't get it to work - I can't work out how to call dissect on a packet after it has been loaded. Calling pkt.dissect(pkt) seems to not work and looks odd.
What is the best or recommended solution here?
Fixing scapy sounds like the right solution.
scapy-com is quite old. Zigbee specific code in scapy-com is 1244 lines of code, which in large part are enumerations and field lists. So, it should not be too hard to migrate it to scapy-python3. If you would assist in migrating it to scapy-python3 http://github.com/phaethon/scapy , I could help with fixing the issue.
The project you are referring to is KillerBee and I had this exact problem with decryption. I simply "fixed" the code thusly:
import struct
f = pkt.getlayer(ZigbeeSecurityHeader).fields
pkt.nwk_seclevel = 5
nwk_mic = pkt.mic
nwk_encrypted = f['data'][:-6]
ext_source = f['ext_source']
nwk_sec_ctrl_byte = str(pkt.getlayer(ZigbeeSecurityHeader))[0]
nwk_nonce = struct.pack('Q',ext_source) + struct.pack('I',f['fc']) + nwk_sec_ctrl_byte
nwk_crop_size = 4 + 2 + len(pkt.getlayer(ZigbeeSecurityHeader).fields['data']) # The length of the encrypted data, mic and FCS
# the Security Control Field flags have to be adjusted before this is calculated, so we store their original values so we can reset them later
zigbeeData = pkt.getlayer(ZigbeeNWK).do_build()
zigbeeData = zigbeeData[:-nwk_crop_size]
(nwk_payload, nwk_micCheck) = zigbee_crypt.decrypt_ccm(nkey, nwk_nonce, nwk_mic, nwk_encrypted, zigbeeData)

TMC222 Stepper Controller, motor busy function

I am currently working on a robot that has to traverse a maze.
For the robot I am using a TMC222 Stepper controller and the software is coded in Python.
I need a function which can tell me when the motors are busy, so that the robot will cease all other activity while the motors are running.
My idea is to check the current position of the motors and compare it to the target position, but I haven't gotten it to work yet.
My current attempt:
def isRunning(self):
    print("IS RUNNING TEST")
    fullstatus=self.getFullStatus2()
    #print("FULL STATUS: " + str(fullstatus[0]) + " 2 " + str(fullstatus[1]))
    actLeft=fullstatus[0][1]<<8 | fullstatus[0][2]<<0
    actRight=fullstatus[1][1]<<8 | fullstatus[1][2]<<0
    tarLeft=fullstatus[0][3]<<8 | fullstatus[0][4]<<0
    tarRight=fullstatus[1][3]<<8 | fullstatus[1][4]<<0
    value = (actLeft==tarLeft) and (actRight==tarRight)
    value = not value
    # print("isbusy="+str(value))
    print('ActPos = ' + str(actLeft))
    print('TarPos = ' + str(tarLeft))
    return value
It would be helpful to see your getFullStatus2() code as well, since it's unclear to me how you're getting a multidimensional output.
In general, you can form a 16-bit "word" from two 8-bit bytes just as you have it:
Word = HB << 8 | LB << 0
Where HB and LB are the high (bits 15-8) and low (bits 7-0) bytes.
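As a small sketch of that byte merging, the helpers below combine two bytes and, optionally, reinterpret the result as a signed value. The names to_word and to_signed16 are hypothetical. Whether your position registers are signed (two's complement) is an assumption to check against the datasheet before using the second helper.

```python
def to_word(high, low):
    """Combine a high and a low byte into an unsigned 16-bit word."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def to_signed16(word):
    """Reinterpret an unsigned 16-bit word as two's complement."""
    return word - 0x10000 if word & 0x8000 else word


print(hex(to_word(0x12, 0x34)))          # 0x1234
print(to_signed16(to_word(0xFF, 0xFF)))  # -1
```

Masking with 0xFF guards against a status byte that arrives as a wider integer.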
That being said, there are multiple ways to detect motor stall. The ideal way would be an external pressure switch that closes when the robot hits a wall. Another would be to monitor the motor's current: when the motor faces resistance (when accelerating or in stall), the current will rise.
Since it looks like neither of these is possible, I'd use yet another approach: monitoring the motor's position (presumably from some sort of encoder) over time.
Let's say you have a function get_position() that returns an unsigned 16-bit integer. You should be able to write something like:
THRESHOLD = 32768  # assumed value: half the 16-bit range, used for wraparound detection
class MotorPosition(object):
    def __init__(self):
        self.readings = []
    def poll(self):
        p = get_position()
        self.readings.append(p)
        # If the list is now too long, remove the oldest entries
        if len(self.readings) > 5:
            self.readings.pop(0)
    def get_deltas(self):
        deltas = []
        # Pair each reading with the one taken just before it
        for x, y in zip(self.readings[1:], self.readings[:-1]):
            d = x - y
            # Wraparound detection
            if d < -THRESHOLD: d += 65536
            elif d > THRESHOLD: d -= 65536
            deltas.append(d)
        return deltas
    def get_average_delta(self):
        deltas = self.get_deltas()
        if not deltas:
            return None  # fewer than two readings so far
        return sum(deltas) / float(len(deltas))
Note that this assumes you're polling the encoder fast enough and with consistent frequency.
You could then monitor the average delta (from get_average_delta()) and if it drops below some value, you consider the motor stalled.
Assumptions:
This is the datasheet for the controller you're using
Your I²C code is working correctly

Is python uuid1 sequential as timestamps?

The Python docs state that uuid1 uses the current time to form the UUID value, but I could not find a reference that ensures UUID1 is sequential.
>>> import uuid
>>> u1 = uuid.uuid1()
>>> u2 = uuid.uuid1()
>>> u1 < u2
True
>>>
But not always:
>>> def test(n):
...     old = uuid.uuid1()
...     print(old)
...     for x in range(n):
...         new = uuid.uuid1()
...         if old >= new:
...             print("OOops")
...             break
...         old = new
...     print(new)
>>> test(1000000)
fd4ae687-3619-11e1-8801-c82a1450e52f
OOops
00000035-361a-11e1-bc9f-c82a1450e52f
UUIDs Not Sequential
No, standard UUIDs are not meant to be sequential.
Apparently some attempts were made with GUIDs (Microsoft's twist on UUIDs) to make them sequential to help with performance in certain database scenarios. But being sequential is not the intent of UUIDs.
http://en.wikipedia.org/wiki/Globally_unique_identifier
MAC Is Last, Not First
No, in standard UUIDs, the MAC address is not the first component. The MAC address is the last component in a Version 1 UUID.
http://en.wikipedia.org/wiki/Universally_unique_identifier
Do Not Assume Which Type Of UUID
The various versions of UUIDs are meant to be compatible with each other. So it may be unreasonable to expect that you always have Version 1 UUIDs. Other programmers may use other versions.
Specification
Read the UUID spec, RFC 4122, by the IETF. Only a dozen pages long.
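If chronological order is what matters, a version 1 UUID still carries its timestamp, and Python exposes it through the time attribute of uuid.UUID. The sketch below sorts on that field instead of on the UUID itself; chronological is a hypothetical helper name, and the two UUIDs are the ones printed in the question. Comparing UUID objects compares the whole 128-bit integer, whose most significant part is time_low, so the comparison flips whenever time_low wraps around.

```python
import uuid


def chronological(uuids):
    """Sort version 1 UUIDs by their embedded 60-bit timestamp."""
    return sorted(uuids, key=lambda u: u.time)


old = uuid.UUID("fd4ae687-3619-11e1-8801-c82a1450e52f")
new = uuid.UUID("00000035-361a-11e1-bc9f-c82a1450e52f")

print(old < new)            # False: time_low wrapped, so integer order flips
print(old.time < new.time)  # True: the timestamps are still increasing
print(chronological([new, old]) == [old, new])  # True
```

This only orders UUIDs that are known to be version 1; other versions do not embed a timestamp in this form.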
From the python UUID docs:
Generate a UUID from a host ID, sequence number, and the current time. If node is not given, getnode() is used to obtain the hardware address. If clock_seq is given, it is used as the sequence number; otherwise a random 14-bit sequence number is chosen.
From this, I infer that the MAC address is first, then a (possibly random) sequence number, then the current time. So I would not expect these to be guaranteed to be monotonically increasing, even for UUIDs generated by the same machine/process.
I stumbled upon a probable answer in Cassandra/Python from http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/
Lexicographic TimeUUID ordering
Cassandra provides, among all the primitive types, support for UUID values of type 1 (time and server based) and type 4 (random).
The primary use of UUID (Unique Universal IDentifier) is to obtain a really unique identifier in a potentially distributed environment.
Cassandra does support version 1 UUID. It gives you an unique identifier by combining the computer’s MAC address and the number of 100-nanosecond intervals since the beginning of the Gregorian calendar.
As you can see the precision is only 100 nanoseconds, but fortunately it is mixed with a clock sequence to add randomness. Furthermore the MAC address is also used to compute the UUID so it’s very unlikely that you face collision on one cluster of machine, unless you need to process a really really huge volume of data (don’t forget, not everyone is Twitter or Facebook).
One of the most relevant use cases for UUID, and especially TimeUUID, is to use it as a column key. Since Cassandra column keys are sorted, we can take advantage of this feature to have a natural ordering for our column families.
The problem with the default com.eaio.uuid.UUID provided by the Hector client is that it’s not easy to work with. As an ID you may need to bring this value from the server up to the view layer, and that’s the gotcha.
Basically, com.eaio.uuid.UUID overrides toString() to give a String representation of the UUID. However this String formatting cannot be sorted lexicographically…
Below are some TimeUUID generated consecutively:
8e4cab00-c481-11e1-983b-20cf309ff6dc at some t1
2b6e3160-c482-11e1-addf-20cf309ff6dc at some t2 with t2 > t1
“2b6e3160-c482-11e1-addf-20cf309ff6dc”.compareTo(“8e4cab00-c481-11e1-983b-20cf309ff6dc”) gives -6 meaning that “2b6e3160-c482-11e1-addf-20cf309ff6dc” is less/before “8e4cab00-c481-11e1-983b-20cf309ff6dc” which is incorrect.
The current textual display of TimeUUID is split as follow:
time_low – time_mid – time_high_and_version – variant_and_sequence – node
If we re-order it starting with time_high_and_version, we can then sort it lexicographically:
time_high_and_version – time_mid – time_low – variant_and_sequence – node
The utility class is given below:
public static String reorderTimeUUId(String originalTimeUUID)
{
StringTokenizer tokens = new StringTokenizer(originalTimeUUID, "-");
if (tokens.countTokens() == 5)
{
String time_low = tokens.nextToken();
String time_mid = tokens.nextToken();
String time_high_and_version = tokens.nextToken();
String variant_and_sequence = tokens.nextToken();
String node = tokens.nextToken();
return time_high_and_version + '-' + time_mid + '-' + time_low + '-' + variant_and_sequence + '-' + node;
}
return originalTimeUUID;
}
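Since the question is about Python, here is a sketch of the same reordering in Python. The function name reorder_time_uuid is hypothetical, and the sample strings are the two TimeUUIDs quoted above. It reorders only the text form, so the result is no longer a valid UUID string and is meant purely as a sortable key.

```python
def reorder_time_uuid(text):
    """Move the high timestamp bits first so the string sorts by time."""
    parts = text.split("-")
    if len(parts) != 5:
        return text  # not in the usual five-group form; leave untouched
    time_low, time_mid, time_high_and_version, variant_and_sequence, node = parts
    return "-".join([time_high_and_version, time_mid, time_low,
                     variant_and_sequence, node])


t1 = "8e4cab00-c481-11e1-983b-20cf309ff6dc"
t2 = "2b6e3160-c482-11e1-addf-20cf309ff6dc"  # generated after t1

print(t1 < t2)                                        # False
print(reorder_time_uuid(t1) < reorder_time_uuid(t2))  # True
```

The comparison relies on both strings using the same letter case for their hex digits.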
The TimeUUIDs become:
11e1-c481-8e4cab00-983b-20cf309ff6dc
11e1-c482-2b6e3160-addf-20cf309ff6dc
Now we get:
"11e1-c481-8e4cab00-983b-20cf309ff6dc".compareTo("11e1-c482-2b6e3160-addf-20cf309ff6dc") = -1
Calling uuid.uuid1() without arguments gives non-sequential results (see the answer by @basil-bourque), but it can easily be made sequential if you set the clock_seq or node argument (because in that case uuid1 uses the Python implementation, which guarantees a unique and sequential timestamp part of the UUID within the current process):
import time
from uuid import uuid1, getnode
from random import getrandbits
_my_clock_seq = getrandbits(14)
_my_node = getnode()
def sequential_uuid(node=None):
    return uuid1(node=node, clock_seq=_my_clock_seq)
def alt_sequential_uuid(clock_seq=None):
    return uuid1(node=_my_node, clock_seq=clock_seq)
if __name__ == '__main__':
    from itertools import count
    old_n = uuid1() # "Native"
    old_s = sequential_uuid() # Sequential
    native_conflict_index = None
    t_0 = time.time()
    for x in count():
        new_n = uuid1()
        new_s = sequential_uuid()
        if old_n > new_n and not native_conflict_index:
            native_conflict_index = x
        if old_s >= new_s:
            print("OOops: non-sequential results for `sequential_uuid()`")
            break
        if (x >= 10*0x3fff and time.time() - t_0 > 30) or (native_conflict_index and x > 2*native_conflict_index):
            print('No issues for `sequential_uuid()`')
            break
        old_n = new_n
        old_s = new_s
    print(f'Conflicts for `uuid.uuid1()`: {bool(native_conflict_index)}')
    print(f"Tries: {x}")
Multiple processes issues
BUT if you are running parallel processes on the same machine, then:
node, which defaults to uuid.getnode(), will be the same for all the processes;
clock_seq has a small chance of being the same for some processes (a chance of 1/16384).
That might lead to conflicts! This is a general concern when using uuid.uuid1 in parallel processes on the same machine, unless you have access to SafeUUID from Python 3.7.
If you make sure to also set node to a unique value for each parallel process that runs this code, then conflicts should not happen.
Even if you are using SafeUUID and set a unique node, it is still possible to get non-sequential ids if they are generated in different processes.
If some lock-related overhead is acceptable, you can store clock_seq in some external atomic storage (for example in a "locked" file) and increment it with each call: this allows the same value of node on all parallel processes and also makes the ids sequential. For cases where all parallel processes are subprocesses created using multiprocessing, clock_seq can be "shared" using multiprocessing.Value.
