I have a database of articles and want to generate a unique random integer for each article so that it can be visited through a URL like https://blablabla.com/articles/8373734.
I could achieve that in the Python backend, but how do we achieve this in MySQL statements?
For example, a new article is finished and inserted into the database:
INSERT into article_table (title, date, url_id) VALUES ('asdf', '11/11/1111', 8373734)
The url_id here is the unique random integer (1000~10000000) that is automatically generated.
I believe the primary key ID and auto-increment are a good way to solve this. But my question is:
In practical scenarios, do companies literally use the primary ID or auto-increment? This may expose how many pieces of data you have (or ever had) in your database. Take https://www.zhihu.com/question/41490222 for example: I tried hundreds of numbers around 41490222, and all returned 404 Not Found. It seems the numbers are allocated very sparsely, which is unlikely to be achieved by auto-increment.
Is there an efficient way to generate such random numbers without checking for duplicates on every loop?
Use the MySQL function RAND():
select FLOOR(RAND() * 999999)
Note that this alone does not guarantee uniqueness.
You can use UUID(), or, if it has to be numeric, UUID_SHORT().
Albeit my SQL skills are a bit rusty, I think you want a stored function that loops with RAND() until it finds an unused value:
DELIMITER $$
CREATE FUNCTION GetRandomUrlId() RETURNS INT
    NOT DETERMINISTIC
    READS SQL DATA
BEGIN
    DECLARE newUrlId INT DEFAULT 0;
    WHILE newUrlId = 0
          OR EXISTS (SELECT 1 FROM yourTable WHERE url_id = newUrlId) DO
        -- uniform over 1000..9999999, matching the range in the question
        SET newUrlId = FLOOR(1000 + RAND() * 9999000);
    END WHILE;
    RETURN newUrlId;
END$$
DELIMITER ;
Then again, why create such a fuss when you could use other ways to create "bigger random-looking numbers"?
For example:
function createBiggerNumber(id) {
  // hide the sequential id behind a reversible affine mapping
  return (id * constants.MySecretMultiplyValue) + constants.MySecretAddedValue;
}
function extractIdFromBiggerNumber(number) {
  return (number - constants.MySecretAddedValue) / constants.MySecretMultiplyValue;
}
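For illustration, here is the same reversible mapping sketched in Python; the constant values below are hypothetical, and in practice they should be kept secret and server-side:
SECRET_MULTIPLIER = 49999  # hypothetical secret constant
SECRET_ADDEND = 917        # hypothetical secret constant

def create_bigger_number(row_id: int) -> int:
    # map the sequential primary key to an opaque-looking number
    return row_id * SECRET_MULTIPLIER + SECRET_ADDEND

def extract_id_from_bigger_number(number: int) -> int:
    # invert the mapping to recover the primary key
    return (number - SECRET_ADDEND) // SECRET_MULTIPLIER
Bear in mind this is obfuscation, not security: a handful of samples is enough to recover both constants.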
The logic here is to combine the generated part with the primary key (id), so we don't need to re-check whether the value already exists.
DELIMITER $$
DROP TRIGGER IF EXISTS `auto_number`$$
CREATE TRIGGER `auto_number` BEFORE INSERT ON users
FOR EACH ROW BEGIN
    -- auto_number = the row's id followed by the first 8 characters of a UUID;
    -- caveat: for an auto-increment id, new.id is not yet assigned in a
    -- BEFORE INSERT trigger, so this assumes the id is supplied in the INSERT
    SET new.auto_number = CONCAT(new.id, LEFT(UUID(), 8));
END$$
DELIMITER ;
https://gist.github.com/yogithesymbian/698b27138a5ba89d2a32e3fc7ddd3cfb
Related
I'm wondering why my MySQL COUNT(*) query always results in ->num_rows being equal to 1.
$result = $db->query("SELECT COUNT( * ) FROM u11_users");
print $result->num_rows; // prints 1
Whereas fetching "real data" from the database works fine.
$result = $db->query("SELECT * FROM u11_users");
print $result->num_rows; // prints the correct number of elements in the table
What could be the reason for this?
Because COUNT(*) returns just one row, containing the number of rows.
Example:
With COUNT(*), the row you fetch looks something like this:
$row = array('COUNT(*)' => 20);
echo $row['COUNT(*)']; // 20
It should return one row*. To get the count you need to:
$result = $db->query("SELECT COUNT(*) AS C FROM u11_users");
$row = $result->fetch_assoc();
print $row["C"];
* since you are using an aggregate function and not using GROUP BY
That's the point of COUNT: it always returns a single row containing the number of selected rows.
http://dev.mysql.com/doc/refman/5.1/en/counting-rows.html
Count() is an aggregate function which means it returns just one row that contains the actual answer. You'd see the same type of thing if you used a function like max(id); if the maximum value in a column was 142, then you wouldn't expect to see 142 records but rather a single record with the value 142. Likewise, if the number of rows is 400 and you ask for the count(*), you will not get 400 rows but rather a single row with the answer: 400.
So, to get the count, you'd run your first query, and just access the value in the first (and only) row.
By the way, you should go with the COUNT(*) approach rather than querying for all the data and taking $result->num_rows, because querying for all rows will take far longer: you're pulling back a bunch of data you do not need.
I have approximately 500 device objects in an SQLite DB, with a name field such as:
Device-0
Device-1
Device-2
Device-3
...
...
Device-500
When listing these with Django, I want the list ordered by the number after the dash in the name, as shown above.
I tried:
queryset = Device.objects.all().order_by('name')
Also from this question:
queryset = Device.objects.annotate(int_sort=Cast("name", IntegerField())).order_by("int_sort", "name")
Both of these produce this result:
Device-0
Device-1
Device-10
Device-100
Device-101
...
Any help would be greatly appreciated.
You're looking for a "natural sort" order instead of the plain lexicographic (dictionary) order you're getting.
That's not built into SQLite (nor any other database I know of).
If all of your rows follow the XYZ-123 format, you can add an .extra() select= column with an expression that splits the column on the dash and casts the second part to a number, and then order_by that extra column.
Example
Here's an example you can run in your SQLite shell:
sqlite> create table device (name text);
sqlite> insert into device (name) values ('Device-1'),('Device-2'),('Device-3'),('Device-4'),('Device-5'),('Device-6'),('Device-7'),('Device-8'),('Device-9'),('Device-10'),('Device-11'),('Device-12'),('Device-13'),('Device-14'),('Device-15'),('Device-16'),('Device-17'),('Device-18'),('Device-19'),('Device-20'),('Device-21'),('Device-22'),('Device-23'),('Device-24'),('Device-25'),('Device-26'),('Device-27'),('Device-28'),('Device-29'),('Device-30'),('Device-31'),('Device-32'),('Device-33'),('Device-34'),('Device-35'),('Device-36'),('Device-37'),('Device-38'),('Device-39');
sqlite> select * from device order by name limit 10;
Device-1
Device-10
Device-11
Device-12
Device-13
Device-14
Device-15
Device-16
Device-17
Device-18
sqlite> select *, cast(substr(name,instr(name, '-')+1) as number) number from device order by number limit 10;
Device-1|1
Device-2|2
Device-3|3
Device-4|4
Device-5|5
Device-6|6
Device-7|7
Device-8|8
Device-9|9
Device-10|10
With this example, you should (but I didn't verify, since I don't have a suitable Django app on hand) be able to do:
Device.objects.all().extra(
    select={'device_number': "cast(substr(name, instr(name, '-') + 1) as number)"},
    order_by=['device_number'],
)
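Since .extra() is discouraged in modern Django, a rough equivalent using annotate() might look like the sketch below (assuming Django 2.0+ for StrIndex; I haven't run this against your exact schema):
from django.db.models import IntegerField, Value
from django.db.models.functions import Cast, StrIndex, Substr

queryset = Device.objects.annotate(
    # take everything after the dash and cast it to an integer
    device_number=Cast(
        Substr('name', StrIndex('name', Value('-')) + 1),
        IntegerField(),
    )
).order_by('device_number')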
I have the following queries:
seq = select([tab_setup.columns.ID]).order_by(tab_setup.columns.ID).limit(1)
sel = select([tab_Global.columns.ID_UNIQUE.label('DL_ID'), tab_Global.columns.CV_CNV.label('DL_Conv')]) \
.where(tab_Global.columns.CV_CNV.isnot(None))
stmt = tab_setup.insert().from_select(['DL_ID', 'DL_Conv',next_value(Sequence(seq))] , sel)
As far as I understand, the problem is related to auto-filling the ID (auto-increment) field of the table "tab_setup".
What is the correct way to pass the values?
The ID field is a normal auto-increment field, increasing by 1 per row.
Using the Sequence function alone raises an error that suggests using "next_value".
Thanks
The solution I found was to add this to the select query:
func.row_number().over(order_by=tab_Global.columns.ID_UNIQUE).label('ID')
This generates a sequential number for every row; if the starting number should not be 1, I create a variable start_from = 10 and complete the above with (start_from + func.row_number()...).
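Put together, a minimal sketch of the whole insert-from-select (assuming the table objects from the question and the 1.x list-style select used there; untested against a live database):
from sqlalchemy import func, select

start_from = 10  # offset used when the IDs should not start at 1

sel = select([
    (start_from + func.row_number()
        .over(order_by=tab_Global.columns.ID_UNIQUE)).label('ID'),
    tab_Global.columns.ID_UNIQUE.label('DL_ID'),
    tab_Global.columns.CV_CNV.label('DL_Conv'),
]).where(tab_Global.columns.CV_CNV.isnot(None))

stmt = tab_setup.insert().from_select(['ID', 'DL_ID', 'DL_Conv'], sel)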
I have a table with the following model:
CREATE TABLE IF NOT EXISTS {} (
user_id bigint ,
pseudo text,
importance float,
is_friend_following bigint,
is_friend boolean,
is_following boolean,
PRIMARY KEY ((user_id), is_friend_following)
);
I also have a table containing my seeds. Those (20) users are the starting point of my graph, so I select their IDs and search the table above to get their followers and friends, and from there I build my graph (networkX).
def build_seed_graph(cls, name):
obj = cls()
obj.name = name
query = "SELECT twitter_id FROM {0};"
seeds = obj.session.execute(query.format(obj.seed_data_table))
obj.graph.add_nodes_from(obj.seeds)
for seed in seeds:
query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
friend_ids = []
follower_ids = []
for row in obj.session.execute(statement):
if row.friend_follower_id in obj.seeds:
if row.is_friend:
friend_ids.append(row.friend_follower_id)
if row.is_follower:
follower_ids.append(row.friend_follower_id)
if friend_ids:
for friend_id in friend_ids:
obj.graph.add_edge(seed, friend_id)
if follower_ids:
for follower_id in follower_ids:
obj.graph.add_edge(follower_id, seed)
return obj
The problem is that building the graph takes too long, and I would like to optimize it.
I've got approximately 5 million rows in my table 'network_table'.
I'm wondering whether it would be faster to run a single query over the whole table instead of one query per seed with a WHERE clause. Would the whole table fit in memory? Is that a good idea? Is there a better way?
I suspect the real issue may not be the queries but rather the processing time.
I'm wondering whether it would be faster to run a single query over the whole table instead of one query per seed with a WHERE clause. Would the whole table fit in memory? Is that a good idea? Is there a better way?
There should not be any problem with doing a single query on the whole table if you enable paging (https://datastax.github.io/python-driver/query_paging.html - using fetch_size). Cassandra will return up to fetch_size rows per page and will fetch additional pages as you read them from the result set.
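For illustration, a minimal sketch of such a paged full scan with the DataStax Python driver (the contact point, keyspace, and table name below are hypothetical stand-ins for the ones in the question):
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# hypothetical connection; in the question's code this would be obj.session
session = Cluster(['127.0.0.1']).connect('my_keyspace')

statement = SimpleStatement(
    "SELECT user_id, friend_follower_id, is_friend, is_follower FROM network_table",
    fetch_size=1000,  # rows per page; iterating transparently pulls further pages
)
for row in session.execute(statement):
    pass  # keep only rows whose ids touch a seed when building the graph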
Please note that if many rows in the table are not seed-related, a full scan may be slower, since you will receive rows that do not involve a seed.
Disclaimer: I am part of the team building ScyllaDB, a Cassandra-compatible database.
ScyllaDB recently published a blog post on how to do an efficient full table scan in parallel, http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/, which applies to Cassandra as well; if a full scan is relevant and you can build the graph in parallel, this may help you.
It seems like you can get rid of the last two if statements, since you're iterating over data that you have already looped through once:
def build_seed_graph(cls, name):
obj = cls()
obj.name = name
query = "SELECT twitter_id FROM {0};"
seeds = obj.session.execute(query.format(obj.seed_data_table))
obj.graph.add_nodes_from(obj.seeds)
for seed in seeds:
query = "SELECT friend_follower_id, is_friend, is_follower FROM {0} WHERE user_id={1}"
statement = SimpleStatement(query.format(obj.network_table, seed), fetch_size=1000)
for row in obj.session.execute(statement):
if row.friend_follower_id in obj.seeds:
if row.is_friend:
obj.graph.add_edge(seed, row.friend_follower_id)
elif row.is_follower:
obj.graph.add_edge(row.friend_follower_id, seed)
return obj
This also gets rid of many append operations on lists that you're not using, and should speed up this function.
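A further micro-optimization, assuming obj.seeds is a plain list of ids: build a set once before the loop, so each membership check inside the hot loop is constant time instead of a scan of the list.
seed_ids = set(obj.seeds)  # build once, outside the row loop
# ...then inside the loop:
# if row.friend_follower_id in seed_ids: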
How can I filter a SQLAlchemy column by its last few characters?
Here is the kind of implementation I am looking at:
query = query.filter(Take_Last_7_Characters(column_1) == '0321334')
where "Take_Last_7_Characters" fetches the last 7 characters of column_1.
So how can I implement Take_Last_7_Characters(column_1)?
Use sqlalchemy.sql.expression.func to generate SQL functions. Check the SQLAlchemy documentation for more info.
Please use func to generate SQL functions, as directed by @tuxuday.
Note that the code is RDBMS-dependent. The code below runs against SQLite, which offers the SUBSTR and LENGTH functions; your actual database might have different names for these (LEN, SUBSTRING, LEFT, RIGHT, etc.).
qry = session.query(Test)
qry = qry.filter(func.SUBSTR(Test.column_1, func.LENGTH(Test.column_1) - 6, 7) == '0321334')
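For completeness, here is a self-contained sketch against an in-memory SQLite database (assuming SQLAlchemy 1.4+; the Test model and column names are hypothetical stand-ins):
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'
    id = Column(Integer, primary_key=True)
    column_1 = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Test(column_1='AAA0321334'), Test(column_1='BBB9999999')])
    session.commit()
    qry = session.query(Test).filter(
        # last 7 characters: substring starting 6 before the end, length 7
        func.substr(Test.column_1, func.length(Test.column_1) - 6, 7) == '0321334'
    )
    print([t.column_1 for t in qry])  # expected: ['AAA0321334']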