How to remove NULL values from SELECT SQL query - python

I have been working on this query for some time, and from my research I have not been able to determine whether it can be done. So far I have tried many versions of the query below with AND, OR, NOT IN and EXISTS, but I always get a sub-query error: ERROR: sub-select returns 3 columns - expected 1.
I have also tried running them as separate queries in a for loop, appending both results to a list and using the sorted function in the Python program. However, that prints two separate lists in long form instead of one result per line (a \n after each result).
I have also tried a for loop using if 'middle' or i[1] = None: print i[0], i[2] and i4, but I get an error that I cannot find or recall. I also tried turning the data into a dict and parsing it by field name, i.e. if 'middle' == None: print('first', 'last', 'birth').
Anyhow, it's 6:53am. I'm exhausted, I've been up all night with this and I have to be at work in an hour or so. I don't want the answer, but I hope to learn whether this is at least possible with the SELECT query alone, or whether I have to split it into a SELECT query plus a Python script that filters out the None values.
SELECT first, middle, last, birth
FROM students
WHERE house = 'Gryffindor' AND middle IS NOT NULL
OR (SELECT first, last, birth
FROM students
WHERE house = 'Gryffindor' AND middle IS NULL)
ORDER BY last, first
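Whether this is possible can be checked with a small experiment. The sketch below assumes an in-memory SQLite database and made-up sample rows (neither comes from the original post; only the table and column names do). It suggests that a single SELECT with no sub-query already returns every matching student, with a missing middle name arriving in Python as None, which can then be skipped while printing.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (first TEXT, middle TEXT, last TEXT, birth INTEGER, house TEXT)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [
        ("Ada", "Mae", "Brown", 1980, "Gryffindor"),
        ("Ben", None, "Clark", 1979, "Gryffindor"),   # no middle name -> NULL
        ("Cy", "Lee", "Adams", 1981, "Gryffindor"),
        ("Dee", "Ann", "Zed", 1980, "Slytherin"),
    ],
)

# One plain SELECT: rows with and without a middle name come back together.
rows = conn.execute(
    "SELECT first, middle, last, birth FROM students "
    "WHERE house = ? ORDER BY last, first",
    ("Gryffindor",),
).fetchall()


def format_student(first, middle, last, birth):
    # Join only the name parts that are present (SQL NULL arrives as None).
    name = " ".join(part for part in (first, middle, last) if part is not None)
    return f"{name}, born {birth}"


lines = [format_student(*row) for row in rows]
print("\n".join(lines))
```

The original error message arises because a sub-select used as a condition may return only one column; dropping the sub-query avoids it entirely.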

Related

python: repeatedly mysql query on_message in websocket not getting latest results [duplicate]

I'm wondering why my MySQL COUNT(*) query always results in ->num_rows being equal to 1.
$result = $db->query("SELECT COUNT( * ) FROM u11_users");
print $result->num_rows; // prints 1
Whereas fetching "real data" from the database works fine.
$result = $db->query("SELECT * FROM u11_users");
print $result->num_rows; // prints the correct number of elements in the table
What could be the reason for this?
Because Count(*) returns just one line with the number of rows.
Example:
Using Count(*) the result's something like the following.
array('COUNT(*)' => 20);
echo $result['COUNT(*)']; // 20
It should return one row*. To get the count you need to:
$result = $db->query("SELECT COUNT(*) AS C FROM u11_users");
$row = $result->fetch_assoc();
print $row["C"];
* since you are using an aggregate function and not using GROUP BY
That's why COUNT exists: it always returns one row containing the number of selected rows.
http://dev.mysql.com/doc/refman/5.1/en/counting-rows.html
Count() is an aggregate function which means it returns just one row that contains the actual answer. You'd see the same type of thing if you used a function like max(id); if the maximum value in a column was 142, then you wouldn't expect to see 142 records but rather a single record with the value 142. Likewise, if the number of rows is 400 and you ask for the count(*), you will not get 400 rows but rather a single row with the answer: 400.
So, to get the count, you'd run your first query, and just access the value in the first (and only) row.
By the way, you should go with this count(*) approach rather than querying for all the data and taking $result->num_rows; because querying for all rows will take far longer since you're pulling back a bunch of data you do not need.
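The same behaviour can be reproduced outside PHP. The following is a minimal sketch in Python, assuming an in-memory SQLite database instead of MySQL and three made-up rows; only the table name u11_users is taken from the question. It shows that the COUNT(*) query yields exactly one row whose single value is the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, used only to have something to count.
conn.execute("CREATE TABLE u11_users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO u11_users (name) VALUES (?)", [("a",), ("b",), ("c",)])

# The aggregate query returns a single row holding the count.
count_rows = conn.execute("SELECT COUNT(*) AS c FROM u11_users").fetchall()
user_count = count_rows[0][0]

# Fetching the data itself returns one row per record.
all_rows = conn.execute("SELECT * FROM u11_users").fetchall()

print(len(count_rows), user_count, len(all_rows))
```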

Cypher query problem when trying to find max of a returned column under certain relation id

I am facing a very strange problem. I call the same function get_objects() four times and take the maximum of the returned column. Item 10172, which should be returned as the maximum, is present in the result list, but instead I get another item, 9998, which is not the maximum. For two other calls to the same function with different parameters I get correct results.
I ran and tested the statement in the Neo4j browser and it shows the same problem: it behaves as if that node does not exist. But when I search individually for node 10172, it does exist in the database, so why is it not returned as the maximum in the final result?
I also exported a CSV file from Neo4j to double-check the relation and the presence of that specific node. It exists. Where am I going wrong?
My data is stored in a graph database as 4 types of nodes connected by 4 different relations, with the relation id attribute taking the values (1,2,3,4). In the Cypher query I am trying to get the maximum paper id for relation 1. The problem seems to exist for the relation 1 and relation 4 calls, but I rechecked the database and these nodes are present under these particular relations.
Here is what I have tried so far.
def get_objects(x):
    par = str(x)
    query = ''' MATCH (p)-[r]->(a) WHERE r.id = $par RETURN a.id '''
    resultNodes = session.run(query, par = par)
    df = DataFrame(resultNodes)
    return df[0]

def find_max_1():
    authors,terms,venues,papers=0,0,0,0
    authors=get_objects(1).max()
    terms=get_objects(2).max()
    venues=get_objects(3).max()
    papers=get_objects(4).max()
    return authors,terms,venues,papers

def main():
    m = find_max_1()

if __name__ == "__main__":
    main()
The output is:
[9998, 14669, 10190, 9999]
Expected output:
[10172, 14669, 10190, 15648]
Any kind of help would be appreciated!
Thanks in advance.
The problem was that the returned values were strings, so max() was computing the maximum between strings (a character-by-character comparison) instead of between ints.
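The effect is easy to reproduce in plain Python. This small sketch uses a hypothetical list of ids (two of them borrowed from the question's output) stored as text, and compares the maximum of the strings with the maximum after converting to int.

```python
# Hypothetical ids, held as strings the way the question's query returned them.
ids_as_text = ["9998", "10172", "512"]

# Strings compare character by character, so "9..." beats "1..." and "5...".
text_max = max(ids_as_text)

# Converting first compares the numeric values instead.
numeric_max = max(int(i) for i in ids_as_text)

print(text_max, numeric_max)
```

Converting the column to a numeric type before calling max() should therefore give the expected result.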

Python MySQLdb TypeError("not all arguments converted during string formatting")

I know this is a popular topic but I searched the various answers and didn't see a clear answer to my issue. I have a function that I want to use to insert records into my NDBC database that is giving me the error I mentioned in the title. The function is below:
def insertStdMet(station,cursor,data):
    # This function takes in a station id, database cursor and an array of data. At present
    # it assumes the data is a pandas dataframe with the datetime value as the index
    # It may eventually be modified to be more flexible. With the parameters
    # passed in, it goes row by row and builds an INSERT INTO SQL statement
    # that assumes each row in the data array represents a new record to be
    # added.
    fields=list(data.columns) # if our table has been constructed properly, these column names should map to the fields in the data table
    # Building the SQL string
    strSQL1='REPLACE INTO std_met (station_id,date_time,'
    strSQL2='VALUES ('
    for f in fields:
        strSQL1+=f+','
        strSQL2+='%s,'
    # trimming the last comma
    strSQL1=strSQL1[:-1]
    strSQL2=strSQL2[:-1]
    strSQL1+=") " + strSQL2 + ")"
    # Okay, now we have our SQL string. Now we need to build the list of tuples
    # that will be passed along with it to the .executemany() function.
    tuplist=[]
    for i in range(len(data)):
        r=data.iloc[i][:]
        datatup=(station,r.name)
        for f in r:
            datatup+=(f,)
        tuplist.append(datatup)
    cursor.executemany(strSQL1,tuplist)
When we get to the cursor.executemany() call, strSQL1 looks like this:
REPLACE INTO std_met (station_id,date_time,WDIR,WSPD,GST,WVHT,DPD,APD,MWD,PRES,ATMP,WTMP,DEWP,VIS) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
I'm using %s markers throughout and I am passing a list of tuples (~2315 tuples). Every value being passed is either a string, datetime, or number. I still have not found the issue. Any insights anyone cares to pass along would be sincerely appreciated.
Thanks!
Your SQL statement has no %s placeholder for either station_id or date_time: it names 14 columns but contains only 12 placeholders. Each tuple carries 14 values, so when the arguments are substituted, two of them have nowhere to go.
I suspect you want the final call to be something like:
REPLACE INTO std_met
(station_id,date_time,WDIR,WSPD,GST,WVHT,DPD,APD,MWD,PRES,ATMP,WTMP,DEWP,VIS)
VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
Note the extra two %s. It looks like your tuple already contains values for station_id and date_time, so you could try this change:
strSQL1='REPLACE INTO std_met (station_id,date_time,'
strSQL2='VALUES (%s, %s, '
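One way to keep the column list and the placeholder list from drifting apart is to derive both from the same list. The sketch below is only an illustration: build_replace_sql is a hypothetical helper, not part of the original function, and it is called here with a shortened field list.

```python
def build_replace_sql(fields):
    # Hypothetical helper: the placeholder count is computed from the full
    # column list, so it always matches the number of columns.
    columns = ["station_id", "date_time"] + list(fields)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"REPLACE INTO std_met ({', '.join(columns)}) VALUES ({placeholders})"


sql = build_replace_sql(["WDIR", "WSPD", "GST"])
print(sql)
```

The resulting string could then be passed to cursor.executemany() together with the list of tuples, as in the original function.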

'Don't care' for a column in SQLite queries?

I've got a SQLite query, which depends on 2 variables, gender and hand. Each of these can have 3 values, 2 which actually mean something (so male/female and left/right) and the third is 'all'. If a variable has a value of 'all' then I don't care what the particular value of that column is.
Is it possible to achieve this functionality with a single query, and just changing the variable? I've had a look for a wildcard or don't care operator but haven't been able to find any except for % which doesn't work in this situation.
Obviously I could make a bunch of if statements and have different queries to use for each case but that's not very elegant.
Code:
select_sql = """ SELECT * FROM table
WHERE (gender = ? AND hand = ?)
"""
cursor.execute(select_sql, (gender_var, hand_var))
I.e. this query works if gender_var = 'male' and hand_var = 'left', but not if gender_var or hand_var = 'all'.
You can indeed do this with a single query. Simply compare each variable to 'all' in your query.
select_sql = """ SELECT * FROM table
WHERE ((? = 'all' OR gender = ?) AND (? = 'all' OR hand = ?))
"""
cursor.execute(select_sql, (gender_var, gender_var, hand_var, hand_var))
Basically, when gender_var or hand_var is 'all', the first part of each OR expression is always true, so that branch of the AND is always true and matches all records, i.e., it is a no-op in the query.
However, it might be better to build the query dynamically in Python so that it contains only the conditions you actually need to test. That might be noticeably faster, but you'd have to benchmark it to be sure.
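Building the query dynamically could look roughly like the sketch below. It assumes a hypothetical table named people with made-up rows, and build_select is an illustrative helper rather than anything from the original answer. The column names are fixed in the code; only the values are passed as parameters.

```python
import sqlite3


def build_select(gender_var, hand_var):
    # Add a condition only for the variables that are not 'all'.
    conditions, params = [], []
    for column, value in (("gender", gender_var), ("hand", hand_var)):
        if value != "all":
            conditions.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT * FROM people"  # 'people' is a hypothetical table name
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, gender TEXT, hand TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("a", "male", "left"), ("b", "female", "left"), ("c", "male", "right")],
)

sql, params = build_select("male", "all")
matches = conn.execute(sql, params).fetchall()
print(sql, params, matches)
```

With both variables set to 'all', the helper returns the bare SELECT with no WHERE clause at all.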

pysqlite, query for duplicate entries with swapped columns

Currently I have a pysqlite db that I am using to store a list of road conditions. The source this list is generated from however is buggy and sometimes generates duplicates. Some of these duplicates will have the start and end points swapped but everything else the same.
The method I currently have looks like this:
def getDupes(self):
    '''This method is used to return a list of duplicate entries
    '''
    self.__curs.execute('SELECT * FROM roadCond GROUP BY road, start, end, cond, reason, updated, county, timestmp HAVING count(*)>1')
    result = self.__curs.fetchall()
    def getSwaps():
        '''This method is used to grab the duplicates with swapped columns
        '''
        self.__curs.execute('SELECT * FROM roadCond WHERE ')
        extra = self.__curs.fetchall()
        return extra
    result.extend(getSwaps())
    return result
The initial query works, but I am suspicious of it (I think there is a better way, I just don't know it), and I am not at all sure how to make the inner method work.
Thank you ahead of time. :-D
Instead of the first query, you could use
SELECT DISTINCT * FROM roadCond
which will retrieve all the records from the table, removing any duplicates.
As for the inner method, this query will return all the records which have "duplicates" with start and end swapped. Note that, for each record with "duplicates", this query will return both the "original" and the "copy".
SELECT DISTINCT * FROM roadCond WHERE EXISTS (
    SELECT * FROM roadCond rc2 WHERE
        roadCond.road = rc2.road AND
        roadCond.end = rc2.start AND roadCond.start = rc2.end AND
        roadCond.cond = rc2.cond AND
        ... AND
        roadCond.timestmp = rc2.timestmp)
Edit: To detect and remove "duplicates" with start and end swapped, you could make sure that your data always contains these values laid out in the same order:
UPDATE roadCond SET start = end, end = start WHERE end < start;
But this approach only works if it doesn't matter which is which.
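The swapped-duplicate query can be tried out on a toy table. The sketch below assumes a reduced, hypothetical schema (only road, start, end and cond) and made-up rows in an in-memory SQLite database; the column names start and end are quoted because END is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical schema with only four of the columns from the question.
conn.execute('CREATE TABLE roadCond (road TEXT, "start" TEXT, "end" TEXT, cond TEXT)')
conn.executemany(
    "INSERT INTO roadCond VALUES (?, ?, ?, ?)",
    [
        ("I-80", "A", "B", "icy"),
        ("I-80", "B", "A", "icy"),   # same entry with start and end swapped
        ("US-50", "C", "D", "wet"),  # no swapped counterpart
    ],
)

# A row qualifies when another row matches it with start and end exchanged.
swapped = conn.execute('''
    SELECT DISTINCT a.road, a."start", a."end", a.cond
    FROM roadCond AS a
    WHERE EXISTS (
        SELECT 1 FROM roadCond AS b
        WHERE a.road = b.road
          AND a."start" = b."end" AND a."end" = b."start"
          AND a.cond = b.cond)
    ORDER BY a."start"
''').fetchall()
print(swapped)
```

As noted above, both the "original" and the "copy" are returned. A row whose start equals its end would also match itself, which may or may not be wanted.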
