pysqlite, query for duplicate entries with swapped columns - python

Currently I have a pysqlite database that I use to store a list of road conditions. However, the source this list is generated from is buggy and sometimes produces duplicates. Some of these duplicates have the start and end points swapped but are otherwise identical.
The method I currently have looks like this:
def getDupes(self):
    '''Return a list of duplicate entries.'''
    # Exact duplicates: rows that appear more than once with every column equal
    self.__curs.execute('SELECT * FROM roadCond GROUP BY road, start, end, cond, reason, updated, county, timestmp HAVING count(*)>1')
    result = self.__curs.fetchall()

    def getSwaps():
        '''Return the duplicates whose start and end columns are swapped.'''
        self.__curs.execute('SELECT * FROM roadCond WHERE ')  # WHERE condition not written yet
        extra = self.__curs.fetchall()
        return extra

    result.extend(getSwaps())
    return result
The initial query works, but I am suspicious of it (I think there is a better way, I just don't know it), and I am not all too sure how to make the inner method work.
Thank you ahead of time. :-D

Instead of the first query, you could use
SELECT DISTINCT * FROM roadCond
which will retrieve all the records from the table, removing any duplicates.
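To make the difference between the two queries concrete, here is a small sqlite3 sketch. The schema is assumed from the column list in the question's GROUP BY, the rows are made up, and "end" is double-quoted as a precaution because END is an SQL keyword. Note that the two queries answer slightly different questions: DISTINCT returns the whole table with exact copies collapsed, while GROUP BY ... HAVING count(*) > 1 returns only the rows that have copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema based on the question's column list.
conn.execute(
    'CREATE TABLE roadCond (road, start, "end", cond, reason, updated, county, timestmp)'
)
rows = [
    ("R1", "A", "B", "ice", "storm", "t1", "C1", 1),
    ("R1", "A", "B", "ice", "storm", "t1", "C1", 1),  # exact duplicate
    ("R2", "C", "D", "snow", "storm", "t2", "C2", 2),
]
conn.executemany("INSERT INTO roadCond VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Every record, with exact duplicates collapsed into one.
distinct_rows = conn.execute("SELECT DISTINCT * FROM roadCond").fetchall()

# Only the records that occur more than once (the question's first query).
dupes = conn.execute(
    'SELECT * FROM roadCond GROUP BY road, start, "end", cond, reason, '
    "updated, county, timestmp HAVING count(*) > 1"
).fetchall()

print(len(distinct_rows), len(dupes))  # 2 1
```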
As for the inner method, this query will return all the records which have "duplicates" with start and end swapped. Note that, for each record with "duplicates", this query will return both the "original" and the "copy".
SELECT DISTINCT * FROM roadCond WHERE EXISTS (
    SELECT * FROM roadCond rc2 WHERE
        roadCond.road = rc2.road AND
        roadCond.end = rc2.start AND roadCond.start = rc2.end AND
        roadCond.cond = rc2.cond AND
        ... AND
        roadCond.timestmp = rc2.timestmp)
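Here is the same EXISTS query as a runnable sketch, with the "..." filled in using the remaining columns from the question's GROUP BY list (an assumption about the full schema). The sample rows are invented, and "end" is quoted as a precaution since END is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE roadCond (road, start, "end", cond, reason, updated, county, timestmp)'
)
conn.executemany(
    "INSERT INTO roadCond VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("R1", "A", "B", "ice", "storm", "t1", "C1", 1),
        ("R1", "B", "A", "ice", "storm", "t1", "C1", 1),  # start/end swapped
        ("R2", "C", "D", "snow", "storm", "t2", "C2", 2),  # no swapped twin
    ],
)

# Rows for which another row exists with start and end exchanged and every
# other column equal. Both the "original" and the "copy" are returned.
# A row whose start equals its end would match itself.
swapped = conn.execute(
    """
    SELECT DISTINCT * FROM roadCond WHERE EXISTS (
        SELECT * FROM roadCond rc2 WHERE
            roadCond.road = rc2.road AND
            roadCond."end" = rc2.start AND roadCond.start = rc2."end" AND
            roadCond.cond = rc2.cond AND
            roadCond.reason = rc2.reason AND
            roadCond.updated = rc2.updated AND
            roadCond.county = rc2.county AND
            roadCond.timestmp = rc2.timestmp)
    ORDER BY start
    """
).fetchall()

for row in swapped:
    print(row)
```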
Edit: To detect and remove "duplicates" with start and end swapped, you could make sure that your data always contains these values laid out in the same order:
UPDATE roadCond SET start = end, end = start WHERE end < start;
But this approach only works if it doesn't matter which is which.
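A sketch of the normalise-then-deduplicate idea, under the same assumed schema and made-up rows. SQLite evaluates all SET expressions before making any assignment, so the UPDATE really swaps the two columns. The rowid-based DELETE that keeps one copy per group is a common SQLite idiom added here for the "remove" step; it is not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE roadCond (road, start, "end", cond, reason, updated, county, timestmp)'
)
conn.executemany(
    "INSERT INTO roadCond VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("R1", "A", "B", "ice", "storm", "t1", "C1", 1),
        ("R1", "B", "A", "ice", "storm", "t1", "C1", 1),  # swapped copy
    ],
)

# The answer's UPDATE: put start/"end" in a consistent order on every row.
conn.execute('UPDATE roadCond SET start = "end", "end" = start WHERE "end" < start')

# Swapped copies are now exact duplicates; keep the lowest rowid of each group.
conn.execute(
    "DELETE FROM roadCond WHERE rowid NOT IN ("
    ' SELECT MIN(rowid) FROM roadCond GROUP BY road, start, "end", cond,'
    " reason, updated, county, timestmp)"
)

remaining = conn.execute('SELECT road, start, "end" FROM roadCond').fetchall()
print(remaining)  # [('R1', 'A', 'B')]
```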

Related

python: repeatedly mysql query on_message in websocket not getting latest results [duplicate]

I'm wondering why my MySQL COUNT(*) query always results in ->num_rows being equal to 1.
$result = $db->query("SELECT COUNT( * ) FROM u11_users");
print $result->num_rows; // prints 1
Whereas fetching "real data" from the database works fine.
$result = $db->query("SELECT * FROM u11_users");
print $result->num_rows; // prints the correct number of elements in the table
What could be the reason for this?
Because COUNT(*) returns just one row containing the number of rows.
Example:
Using COUNT(*), the fetched row is something like the following.
array('COUNT(*)' => 20);
echo $row['COUNT(*)']; // 20
It should return one row*. To get the count you need to:
$result = $db->query("SELECT COUNT(*) AS C FROM u11_users");
$row = $result->fetch_assoc();
print $row["C"];
* since you are using an aggregate function and not using GROUP BY
That's what COUNT is for: it always returns one row holding the number of selected rows.
http://dev.mysql.com/doc/refman/5.1/en/counting-rows.html
Count() is an aggregate function which means it returns just one row that contains the actual answer. You'd see the same type of thing if you used a function like max(id); if the maximum value in a column was 142, then you wouldn't expect to see 142 records but rather a single record with the value 142. Likewise, if the number of rows is 400 and you ask for the count(*), you will not get 400 rows but rather a single row with the answer: 400.
So, to get the count, you'd run your first query, and just access the value in the first (and only) row.
By the way, you should use this COUNT(*) approach rather than querying for all the data and reading $result->num_rows: querying for all rows takes far longer, since you're pulling back a lot of data you do not need.
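The same behaviour can be seen from Python's sqlite3 module, the language used elsewhere on this page, rather than PHP's mysqli. In this sketch the table name is borrowed from the question and the rows are made up. An aggregate without GROUP BY comes back as exactly one row, so you read the value out of that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE u11_users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO u11_users (name) VALUES (?)", [("a",), ("b",), ("c",)])

# An aggregate without GROUP BY yields exactly one row...
count_rows = conn.execute("SELECT COUNT(*) FROM u11_users").fetchall()
print(count_rows)  # [(3,)]

# ...so take the value from that single row instead of counting rows.
(user_count,) = conn.execute("SELECT COUNT(*) FROM u11_users").fetchone()
print(user_count)  # 3

# MAX() behaves the same way: one row holding the answer.
max_rows = conn.execute("SELECT MAX(id) FROM u11_users").fetchall()
print(max_rows)  # [(3,)]
```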

How to remove NULL values from SELECT SQL query

I have been working on this query for some time and, from my research, have been unable to determine whether it can be done. So far I have tried many versions of the query below with AND, OR, NOT IN and EXISTS, but I always get the sub-query error ERROR: sub-select returns 3 columns - expected 1.
I have also tried running them as separate queries, then in a for loop appending both results to a list and calling sorted() on it in the Python program. However, it prints out as two separate lists in long form instead of one result per line.
I have also tried, in a for loop, if 'middle' or i[1] = None: print i[0], i[2] and i4, but I get an error that I cannot find or recall. I also tried turning the data into a dict and parsing it with the field names, i.e. if 'middle' == None: print('first', 'last', 'birth').
Anyhow, it's 6:53am. I'm exhausted, I've been up all night with this, and I have to be at work in an hour or so. I don't want the answer, but I would like to learn whether this is at least possible with the SELECT query, or whether I have to split it into part SELECT query and part Python script that parses out the None values.
SELECT first, middle, last, birth
FROM students
WHERE house = 'Gryffindor' AND middle IS NOT NULL
OR (SELECT first, last, birth
    FROM students
    WHERE house = 'Gryffindor' AND middle IS NULL)
ORDER BY last, first
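On whether SQL alone can do it: the error appears because a sub-select used as an operand of OR is a scalar expression, so it may return only one column. One single-query option, sketched here with Python's sqlite3 module, is to build the name in SQL. In SQLite, ' ' || NULL yields NULL, and COALESCE turns that NULL into an empty string. Only the column names and the house filter come from the query above; the table layout and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (first, middle, last, house, birth)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [
        ("Ann", "Marie", "Smith", "Gryffindor", 1990),
        ("Bob", None, "Jones", "Gryffindor", 1991),
        ("Cal", None, "Brown", "Slytherin", 1992),
    ],
)

# One query for every student: a NULL middle name makes ' ' || middle NULL,
# and COALESCE replaces that with '' so no stray space is left behind.
rows = conn.execute(
    """
    SELECT first || COALESCE(' ' || middle, '') || ' ' || last, birth
    FROM students
    WHERE house = 'Gryffindor'
    ORDER BY last, first
    """
).fetchall()

for name, birth in rows:
    print(name, birth)
```

Selecting first, middle, last, birth and skipping the middle name in Python when it is None would produce the same output; the SQL version just keeps it to a single query.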

Python/SQL - How the cursor functions

In my code I have a function that needs to return either a string or None depending on what is present in the database. However at the moment the result is a list with the string answer inside, or None. Is there any change that could be made that would result in just a string or None being returned, rather than having to index the list?
Here is the code:
def retrieve_player_name(username):
    param = [(username)]
    command = ("""
        SELECT username FROM players
        WHERE username = ?
    """)
    result = cur.execute(command, param).fetchone()
    if result is not None:
        return result[0]
Thanks in advance.
A database cursor fetches entire rows, not values.
Even a row with a single value inside is still a row.
If you don't want to write row[0] multiple times, create a helper function execute_and_return_a_single_value_from_query().
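Here is a sketch of that helper, using the name the answer suggests. The signature (cursor, SQL text, parameters) and the in-memory players table are assumptions for illustration.

```python
import sqlite3


def execute_and_return_a_single_value_from_query(cur, command, params=()):
    """Run a query and return the first column of its first row, or None."""
    row = cur.execute(command, params).fetchone()
    # fetchone() gives a row tuple, or None when there are no rows
    return row[0] if row is not None else None


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE players (username TEXT)")
cur.execute("INSERT INTO players VALUES (?)", ("alice",))


def retrieve_player_name(username):
    return execute_and_return_a_single_value_from_query(
        cur, "SELECT username FROM players WHERE username = ?", (username,)
    )


print(retrieve_player_name("alice"))  # alice
print(retrieve_player_name("bob"))    # None
```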

Adding parameters to SQLITE3 SELECT column queries python

I am trying to streamline my SQLite3 queries. I use SQLite3 for financial price modelling, so I re-use the same basic query a lot, but I have to keep changing the hard-coded part to get different columns each time. So I want a generic query where I write what I want once and it returns the columns as lists. This is a basic version of what I want. It is still hard-coded, but you can see what I am trying to create.
import sqlite3

dbName = 'NASDAQ_Equities'
ticker = 'AAPL'

def pullDataTest(dbPathName, ticker, *args):
    datep = []
    openp = []
    highp = []
    db = sqlite3.connect(dbPathName + '.mydb', detect_types=sqlite3.PARSE_DECLTYPES | sqlite3.PARSE_COLNAMES, timeout=3)
    cursor = db.cursor()
    # str(args) yields "('datep', 'openp', 'highp')", brackets and quotes included
    cursor.execute('''SELECT ''' + str(args) + ''' FROM ''' + ticker)
    for row in cursor:
        datep.append(row[0])
        openp.append(row[1])
        highp.append(row[2])

pullDataTest(dbName, ticker, 'datep', 'openp', 'highp')
At the moment I am lost on how to put *args into the SELECT statement, as it rejects it because of the () brackets. Another issue will be creating empty lists and appending to them based on *args. Would it be better to create an ordered dict to append to, and then break that into lists at the end somehow? For returning the values for use later down the track, I was thinking of making them globals. Any suggestions? Thanks
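One way to approach both problems, as a sketch. sqlite3's ? placeholders bind values only, never column or table names, so the names have to go into the SQL text. Checking them against the schema first (via sqlite_master and PRAGMA table_info) and quoting them keeps that safe. Returning a dict of lists avoids globals. pull_columns and quote_ident are hypothetical names, and an in-memory AAPL table with made-up values stands in for the question's .mydb file.

```python
import sqlite3


def quote_ident(name):
    # Double-quote an SQL identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def pull_columns(conn, table, *columns):
    """Return a dict mapping each requested column name to a list of its values."""
    if not columns:
        raise ValueError("no columns requested")
    tables = {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table}")
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    known = {r[1] for r in conn.execute(f"PRAGMA table_info({quote_ident(table)})")}
    unknown = [c for c in columns if c not in known]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")

    sql = f"SELECT {', '.join(map(quote_ident, columns))} FROM {quote_ident(table)}"
    rows = conn.execute(sql).fetchall()
    # zip(*rows) transposes rows into columns; an empty table gives empty lists.
    transposed = list(zip(*rows)) or [() for _ in columns]
    return {name: list(values) for name, values in zip(columns, transposed)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AAPL (datep TEXT, openp REAL, highp REAL)")
conn.executemany("INSERT INTO AAPL VALUES (?, ?, ?)", [("d1", 1.0, 1.5), ("d2", 2.0, 2.5)])

data = pull_columns(conn, "AAPL", "datep", "openp", "highp")
print(data["openp"])
```

The caller picks columns by name (data["openp"]), so there is no need to create one list variable per column up front.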

'Don't care' for a column in SQLite queries?

I've got a SQLite query that depends on 2 variables, gender and hand. Each of these can have 3 values: two of them actually mean something (male/female and left/right), and the third is 'all'. If a variable has a value of 'all', then I don't care what the particular value of that column is.
Is it possible to achieve this functionality with a single query, just changing the variable? I've looked for a wildcard or "don't care" operator but haven't been able to find any except for %, which doesn't work in this situation.
Obviously I could make a bunch of if statements and have different queries to use for each case but that's not very elegant.
Code:
select_sql = """ SELECT * FROM table
WHERE (gender = ? AND hand = ?)
"""
cursor.execute(select_sql, (gender_var, hand_var))
I.e. this query works if gender_var = 'male' and hand_var = 'left', but not if gender_var or hand_var = 'all'.
You can indeed do this with a single query. Simply compare each variable to 'all' in your query.
select_sql = """ SELECT * FROM table
WHERE ((? = 'all' OR gender = ?) AND (? = 'all' OR hand = ?))
"""
cursor.execute(select_sql, (gender_var, gender_var, hand_var, hand_var))
Basically, when gender_var is 'all', the test ? = 'all' in the first OR is true for every row, so that half of the AND matches all records, i.e., it is a no-op in the query. The same applies to hand_var and the second OR.
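The same trick, run against a throwaway in-memory table. There are two assumptions: the table is called people (TABLE is a reserved word, so the question's placeholder name can't be used literally), and the rows are made up. Unlike the answer, this sketch uses sqlite3's named placeholders (:gender, :hand), which lets each variable be passed once instead of twice. The positional version above behaves the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, gender TEXT, hand TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("p1", "male", "left"), ("p2", "female", "right"), ("p3", "male", "right")],
)

# A named placeholder may appear more than once; both uses get the same value.
select_sql = """SELECT name FROM people
WHERE ((:gender = 'all' OR gender = :gender) AND (:hand = 'all' OR hand = :hand))
ORDER BY name"""


def find(gender_var, hand_var):
    params = {"gender": gender_var, "hand": hand_var}
    return [r[0] for r in conn.execute(select_sql, params)]


print(find("male", "left"))  # ['p1']
print(find("male", "all"))   # ['p1', 'p3']
print(find("all", "all"))    # ['p1', 'p2', 'p3']
```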
However, it might be better to build the query dynamically in Python so that it tests only the fields you actually need. That might be noticeably faster, but you'd have to benchmark it to be sure.
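Here is a sketch of the dynamic alternative: only columns whose value is not 'all' become WHERE clauses. Column names can't be bound as ? parameters, so they come from a fixed whitelist (FILTERABLE, a hypothetical name), while the values are still bound. The people table and its rows are the same made-up stand-ins as above. Whether this is faster in practice would still need the benchmark mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, gender TEXT, hand TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("p1", "male", "left"), ("p2", "female", "right"), ("p3", "male", "right")],
)

# Only these column names may ever appear in the generated SQL text.
FILTERABLE = ("gender", "hand")


def find(**filters):
    clauses, params = [], []
    for column in FILTERABLE:
        value = filters.get(column, "all")
        if value != "all":
            # The column name comes from the fixed whitelist; the value is bound with ?.
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT name FROM people"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [r[0] for r in conn.execute(sql, params)]


print(find(gender="male", hand="left"))  # ['p1']
print(find(gender="male"))               # ['p1', 'p3']
print(find())                            # ['p1', 'p2', 'p3']
```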
