How to generate a unique random number when inserting in MySQL? - python

I have a database of articles and want to generate a unique random integer for each article so that it can be visited through a URL like https://blablabla.com/articles/8373734.
I could achieve that in the Python backend, but how do we achieve this in MySQL statements?
For example, when a new article is finished and inserted into the database:
INSERT INTO article_table (title, date, url_id) VALUES ('asdf', '11/11/1111', 8373734)
the url_id here is the unique random integer (1000~10000000) that is automatically generated.
I believe a primary key ID with auto-increment is a good way to solve this. But my questions are:
In practical scenarios, do companies literally use the primary ID or auto-increment? That would expose how many pieces of data you have (or ever had) in the database. Take https://www.zhihu.com/question/41490222 for example: I tried hundreds of numbers around 41490222, and all returned 404 Not Found. The numbers seem to be recorded very sparsely, which is unlikely with auto-increment.
Is there an efficient way to generate such a random number without checking for duplicates on every loop?

Use the MySQL function RAND()
-----------------------------
SELECT FLOOR(RAND() * 999999)
Note that RAND() alone does not guarantee uniqueness: you still need a UNIQUE index on url_id (or a duplicate check) to catch collisions.

You can use UUID() for that, or UUID_SHORT() if it has to be numeric.

Albeit my SQL skills are a bit rusty, I think you want to create a function using RAND that loops until it finds an unused value:
DELIMITER $$
CREATE FUNCTION GetRandomValue() RETURNS INT
NOT DETERMINISTIC READS SQL DATA
BEGIN
    DECLARE newUrlId INT DEFAULT 0;
    WHILE newUrlId = 0
          OR EXISTS (SELECT 1 FROM yourTable WHERE url_id = newUrlId) DO
        SET newUrlId = FLOOR(1000 + RAND() * 9999000);
    END WHILE;
    RETURN newUrlId;
END$$
DELIMITER ;
Then again, why create such a fuss when you could use other ways to produce "bigger random numbers",
for example:
function createBiggerNumber(id) {
    return (id * constants.MySecretMultiplyValue) + constants.MySecretAddedValue;
}
function extractIdFromBiggerNumber(number) {
    return (number - constants.MySecretAddedValue) / constants.MySecretMultiplyValue;
}

The logic is to combine the generated value with the primary key (id), so we don't need to re-check whether the value already exists.
DELIMITER $$
DROP TRIGGER IF EXISTS `auto_number`$$
CREATE TRIGGER `auto_number` BEFORE INSERT ON users
FOR EACH ROW BEGIN
    SET new.auto_number = CONCAT(new.id, LEFT(UUID(), 8));
END$$
DELIMITER ;
https://gist.github.com/yogithesymbian/698b27138a5ba89d2a32e3fc7ddd3cfb
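For comparison, the same id-plus-random-suffix construction can be done in application code; here is a hedged Python sketch using the standard library's uuid4 in place of MySQL's UUID() (an assumption on my part, not part of the original trigger):

```python
import uuid

def make_auto_number(row_id: int) -> str:
    """Concatenate the primary key with 8 random hex characters,
    mirroring CONCAT(new.id, LEFT(UUID(), 8)) from the trigger."""
    return f"{row_id}{uuid.uuid4().hex[:8]}"

token = make_auto_number(42)
# The result always starts with the id and adds exactly 8 extra characters.
assert token.startswith("42") and len(token) == len("42") + 8
```

Because the id prefix is unique by construction, the random suffix only has to make the value hard to guess, not globally unique.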

Related

python: repeated mysql query on_message in websocket not getting latest results [duplicate]

I'm wondering why my MySQL COUNT(*) query always results in ->num_rows being equal to 1.
$result = $db->query("SELECT COUNT( * ) FROM u11_users");
print $result->num_rows; // prints 1
Whereas fetching "real data" from the database works fine.
$result = $db->query("SELECT * FROM u11_users");
print $result->num_rows; // prints the correct number of elements in the table
What could be the reason for this?
Because COUNT(*) returns just one row containing the number of rows.
Example:
Using COUNT(*), the result is something like the following:
array('COUNT(*)' => 20);
echo $result['COUNT(*)']; // 20
Reference
It should return one row*. To get the count you need to:
$result = $db->query("SELECT COUNT(*) AS C FROM u11_users");
$row = $result->fetch_assoc();
print $row["C"];
* since you are using an aggregate function and not using GROUP BY
That's how COUNT works: it always returns one row containing the number of selected rows.
http://dev.mysql.com/doc/refman/5.1/en/counting-rows.html
Count() is an aggregate function which means it returns just one row that contains the actual answer. You'd see the same type of thing if you used a function like max(id); if the maximum value in a column was 142, then you wouldn't expect to see 142 records but rather a single record with the value 142. Likewise, if the number of rows is 400 and you ask for the count(*), you will not get 400 rows but rather a single row with the answer: 400.
So, to get the count, you'd run your first query, and just access the value in the first (and only) row.
By the way, you should go with the COUNT(*) approach rather than querying for all the data and reading $result->num_rows, because querying for all rows takes far longer: you're pulling back a bunch of data you do not need.
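The one-row behaviour of COUNT(*) is easy to check outside PHP as well; here is a quick sketch using Python's built-in sqlite3 module (the table name mirrors the question, the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE u11_users (id INTEGER)")
conn.executemany("INSERT INTO u11_users VALUES (?)",
                 [(i,) for i in range(20)])

rows = conn.execute("SELECT COUNT(*) AS c FROM u11_users").fetchall()
# Exactly one row comes back, and its single column holds the count.
assert len(rows) == 1
assert rows[0][0] == 20
```

The same holds in MySQL: num_rows counts result-set rows, and an aggregate without GROUP BY always produces exactly one.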

Order by name with number Django

I have approximately 500 device objects in an sqlite db, with a name field such as:
Device-0
Device-1
Device-2
Device-3
...
...
Device-500
When listing these with Django, I want it to sort by the number after the hyphen in the name, as shown above.
I tried:
queryset = Device.objects.all().order_by('name')
Also from this question:
queryset = Device.objects.annotate(int_sort=Cast("name", IntegerField())).order_by("int_sort", "name")
Both of these produce this result:
Device-0
Device-1
Device-10
Device-100
Device-101
...
Any help would be greatly appreciated.
You're looking for a "natural sort" order.
That's not built-in to SQLite (nor any other database I know of).
If all of your rows follow a XYZ-123 format, you could
add an .extra() select= column with an expression that splits the column at the dash and casts the second part to a number,
and then order_by that extra column.
Example
Here's an example you can run in your SQLite shell:
sqlite> create table device (name text);
sqlite> insert into device (name) values ('Device-1'),('Device-2'),('Device-3'),('Device-4'),('Device-5'),('Device-6'),('Device-7'),('Device-8'),('Device-9'),('Device-10'),('Device-11'),('Device-12'),('Device-13'),('Device-14'),('Device-15'),('Device-16'),('Device-17'),('Device-18'),('Device-19'),('Device-20'),('Device-21'),('Device-22'),('Device-23'),('Device-24'),('Device-25'),('Device-26'),('Device-27'),('Device-28'),('Device-29'),('Device-30'),('Device-31'),('Device-32'),('Device-33'),('Device-34'),('Device-35'),('Device-36'),('Device-37'),('Device-38'),('Device-39');
sqlite> select * from device order by name limit 10;
Device-1
Device-10
Device-11
Device-12
Device-13
Device-14
Device-15
Device-16
Device-17
Device-18
sqlite> select *, cast(substr(name,instr(name, '-')+1) as number) number from device order by number limit 10;
Device-1|1
Device-2|2
Device-3|3
Device-4|4
Device-5|5
Device-6|6
Device-7|7
Device-8|8
Device-9|9
Device-10|10
With this example, you should (though I didn't verify, since I don't have a suitable Django app to hand) be able to do
Device.objects.all().extra(
    select={'device_number': "cast(substr(name,instr(name, '-')+1) as number)"},
    order_by=['device_number'],
)
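If you want to sanity-check that expression without a Django app, the same substr/instr/cast trick can be run through Python's built-in sqlite3 module (a sketch assuming every name follows the Device-N pattern):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (name TEXT)")
conn.executemany("INSERT INTO device VALUES (?)",
                 [(f"Device-{i}",) for i in range(1, 40)])

# Order by the numeric part after the dash, not by the raw string.
rows = conn.execute(
    "SELECT name FROM device "
    "ORDER BY CAST(substr(name, instr(name, '-') + 1) AS INTEGER)"
).fetchall()
names = [r[0] for r in rows]
assert names[:3] == ["Device-1", "Device-2", "Device-3"]
assert names[-1] == "Device-39"
```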

How to solve: Dialect 'default' does not support sequence increments in SQLAlchemy

I have the following queries:
seq = select([tab_setup.columns.ID]).order_by(tab_setup.columns.ID).limit(1)
sel = select([tab_Global.columns.ID_UNIQUE.label('DL_ID'), tab_Global.columns.CV_CNV.label('DL_Conv')]) \
.where(tab_Global.columns.CV_CNV.isnot(None))
stmt = tab_setup.insert().from_select(['DL_ID', 'DL_Conv',next_value(Sequence(seq))] , sel)
As far as I understand, the problem is related to auto-filling the auto-increment ID field of the table "tab_setup".
What is the correct way to pass the values?
The ID field is a normal auto-increment field, increasing by 1 per row.
Using the Sequence function alone raises an error that suggests using "next_value".
Thanks
The solution I found was to add this to the select query:
func.row_number().over(order_by=tab_Global.columns.ID_UNIQUE).label('ID')
This generates a sequential number for every row; if the starting number is not 1, I create a variable start_from = 10 and complete the above with (start_from + func.row_number()...).
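To illustrate what func.row_number().over(...) produces, here is the equivalent raw SQL run through Python's sqlite3 (requires SQLite 3.25+ for window functions; table and column names are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab_global (id_unique INTEGER)")
conn.executemany("INSERT INTO tab_global VALUES (?)",
                 [(100,), (300,), (200,)])

# start_from + ROW_NUMBER() yields 11, 12, 13 in id_unique order.
rows = conn.execute(
    "SELECT id_unique, "
    "       10 + ROW_NUMBER() OVER (ORDER BY id_unique) AS seq "
    "FROM tab_global ORDER BY id_unique"
).fetchall()
assert rows == [(100, 11), (200, 12), (300, 13)]
```

ROW_NUMBER() numbers rows 1, 2, 3, ... in the window's order, so adding a constant offset shifts the whole sequence, which is exactly the start_from trick above.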

Read optimisation cassandra using python

I have a table with the following model:
CREATE TABLE IF NOT EXISTS {} (
user_id bigint ,
pseudo text,
importance float,
is_friend_following bigint,
is_friend boolean,
is_following boolean,
PRIMARY KEY ((user_id), is_friend_following)
);
I also have a table containing my seeds. Those (20) users are the starting point of my graph, so I select their IDs and search the table above to get their followers and friends, and from there I build my graph (networkX).
def build_seed_graph(cls, name):
    obj = cls()
    obj.name = name
    query = "SELECT twitter_id FROM {0};"
    seeds = obj.session.execute(query.format(obj.seed_data_table))
    obj.graph.add_nodes_from(obj.seeds)
    for seed in seeds:
        query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
        statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
        friend_ids = []
        follower_ids = []
        for row in obj.session.execute(statement):
            if row.friend_follower_id in obj.seeds:
                if row.is_friend:
                    friend_ids.append(row.friend_follower_id)
                if row.is_follower:
                    follower_ids.append(row.friend_follower_id)
        if friend_ids:
            for friend_id in friend_ids:
                obj.graph.add_edge(seed, friend_id)
        if follower_ids:
            for follower_id in follower_ids:
                obj.graph.add_edge(follower_id, seed)
    return obj
The problem is that building the graph takes too long, and I would like to optimize it.
I've got approximately 5 million rows in my table 'network_table'.
I'm wondering whether it would be faster to run a single query over the whole table instead of one query with a WHERE clause per seed. Would it fit in memory? Is that a good idea? Is there a better way?
I suspect the real issue may not be the queries but rather the processing time.
I'm wondering whether it would be faster to run a single query over the whole table instead of one query with a WHERE clause per seed. Would it fit in memory? Is that a good idea? Is there a better way?
There should not be any problem with doing a single query on the whole table if you enable paging (https://datastax.github.io/python-driver/query_paging.html - using fetch_size). Cassandra will return up to fetch_size rows and will fetch additional results as you read them from the result set.
Please note that if the table has many rows that are not seed related, a full scan may be slower, as you will receive rows that do not include a "seed".
Disclaimer: I am part of the team building ScyllaDB, a Cassandra-compatible database.
ScyllaDB recently published a blog post on how to do an efficient full table scan in parallel, http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/, which applies to Cassandra as well; if a full scan is relevant and you can build the graph in parallel, this may help you.
It seems you can get rid of the last two if statements, since you're looping over data you have already iterated once:
def build_seed_graph(cls, name):
    obj = cls()
    obj.name = name
    query = "SELECT twitter_id FROM {0};"
    seeds = obj.session.execute(query.format(obj.seed_data_table))
    obj.graph.add_nodes_from(obj.seeds)
    for seed in seeds:
        query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
        statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
        for row in obj.session.execute(statement):
            if row.friend_follower_id in obj.seeds:
                if row.is_friend:
                    obj.graph.add_edge(seed, row.friend_follower_id)
                elif row.is_follower:
                    obj.graph.add_edge(row.friend_follower_id, seed)
    return obj
This also removes many append operations on lists that were never otherwise used, and should speed up the function.
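The refactored loop can be sanity-checked without a Cassandra cluster; here is a sketch that uses namedtuple stand-ins for result rows (the field names mirror the code above, everything else is fabricated test data):

```python
from collections import namedtuple

Row = namedtuple("Row", "friend_follower_id is_friend is_follower")

seeds = {1, 2, 3}
rows = [
    Row(2, True, False),    # a friend of the seed
    Row(3, False, True),    # a follower of the seed
    Row(99, True, True),    # not a seed, so it is skipped
]

edges = set()
seed = 1
for row in rows:
    if row.friend_follower_id in seeds:
        if row.is_friend:
            edges.add((seed, row.friend_follower_id))
        elif row.is_follower:
            edges.add((row.friend_follower_id, seed))

assert edges == {(1, 2), (3, 1)}
```

Note the elif: a row that is both friend and follower only yields the friend edge, which matches the refactored answer but differs slightly from the original two-list version, where both edges would be added.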

filter SqlAlchemy column value by number of resulting characters

How can I filter an SqlAlchemy column by its trailing characters?
Here is the kind of implementation I am looking at:
query = query.filter(Take_Last_7_Characters(column_1) == '0321334')
where "Take_Last_7_Characters" fetches the last 7 characters from the resulting value of column_1.
So how can I implement Take_Last_7_Characters(column_1)?
Use sqlalchemy.sql.expression.func to generate SQL functions.
Check the docs for more info.
Please use func to generate SQL functions, as directed by @tuxuday.
Note that the code is RDBMS-dependent. The code below runs on SQLite, which offers SUBSTR and LENGTH functions; your actual database might have different names for these (LEN, SUBSTRING, LEFT, RIGHT, etc.).
qry = session.query(Test)
qry = qry.filter(func.SUBSTR(Test.column_1, func.LENGTH(Test.column_1) - 6, 7) == '0321334')
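The SUBSTR/LENGTH combination is easy to verify from Python with the built-in sqlite3 module (table, column, and values here are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (column_1 TEXT)")
conn.executemany("INSERT INTO test VALUES (?)",
                 [("abc0321334",), ("xyz9999999",)])

# substr(col, length(col) - 6, 7) yields the last 7 characters
# (SQLite's substr is 1-based).
rows = conn.execute(
    "SELECT column_1 FROM test "
    "WHERE substr(column_1, length(column_1) - 6, 7) = '0321334'"
).fetchall()
assert rows == [("abc0321334",)]
```

One caveat: for values shorter than 7 characters, length(col) - 6 goes to zero or below, so you may want to guard with a LENGTH(col) >= 7 condition as well.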
