SQLAlchemy: showing a complete row (all columns) from a query - Python

I want to log a complete row from a SQLAlchemy query (ORM) when a specific bug appears (in my example this is when multiple rows were found, but that has nothing to do with the question).
At the moment I address every column like this:
try:
    result = query.one_or_none()
except MultipleResultsFound:
    self.logger.info('MultipleResultsFound!!')
    for row in query.all():
        self.logger.info('column1:{}, column2:{}, column3:{}'.format(
            row.column1, row.column2, row.column3))
But there must be a better way to show every column in the log without addressing each one individually.
How can I display all columns from a row with one simple command?

Try this method.
Also note you can use .label to name your func in the query,
e.g. db.session.query(func.sum(SomeModel.something).label('total'))
for row in query.all():
    print(row._asdict())
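If the query returns full ORM-mapped objects rather than Row tuples, _asdict() is not available; a hedged sketch of an alternative using SQLAlchemy's runtime inspection API (query comes from the question, the helper name is made up):

from sqlalchemy import inspect

def row_as_dict(obj):
    # Return {column_name: value} for a mapped ORM instance
    return {attr.key: getattr(obj, attr.key)
            for attr in inspect(obj).mapper.column_attrs}

for row in query.all():
    print(row_as_dict(row))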


How to get only the errors from the insert_rows_from_dataframe method in the BigQuery client?

I am using the client.insert_rows_from_dataframe method to insert data into my table.
obj = client.insert_rows_from_dataframe(table=TableRef, dataframe=df)
If there are no errors, obj will be an empty list of lists like
> print(obj)
[[], [], []]
But I want to know how to get the error messages out if there are errors while inserting.
I tried
obj[["errors"]] ?
but that is not correct. Please help.
To achieve the results that you want, your DataFrame must have a header identical to the one in your schema. For example, if your schema in BigQuery has the fields index and name, your DataFrame should have these two columns.
Let's take a look at the example below:
I created a table in BigQuery named insert_from_dataframe which contains the fields index, name and number, respectively INTEGER, STRING and INTEGER, all of them REQUIRED.
In the first image below you can see that the insertion caused no errors. In the second image, we can see that the data was inserted.
[Image: no errors raised]
[Image: data inserted successfully]
After that, I removed the value of the number column for the last row of the same data. As you can see below, when I tried to push it to BigQuery, I got an error.
Given that, I would like to reinforce two points:
The error structure that is returned is a list of lists ([[], [], ...]). The reason is that your data is pushed in chunks (subsets of your data). In the function used, you can specify how many rows each chunk will have using the parameter chunk_size=<number_of_rows>. Let's suppose that your data has 1600 rows and your chunk size is 500; your data will then be divided into 4 chunks. The object returned after the insert request will therefore consist of 4 lists inside a list, where each of the four lists is related to one chunk (see the sketch after this list). It's also important to say that if a row fails the process, none of the rows inside the same chunk will be inserted into the table.
If you are using string fields you should pay attention to the data inserted. Sometimes Pandas reads null values as empty strings, which leads the insertion mechanism to misinterpret the data. In other words, it's possible that you end up with empty strings inserted in your table when the expected result would be an error saying that the field cannot be null.
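As a hedged sketch of how the per-chunk errors could be read back (client, TableRef and df come from the question; the shape of each error mapping is assumed to follow the insert_rows return format, one mapping per failing row with 'index' and 'errors' keys):

errors = client.insert_rows_from_dataframe(
    table=TableRef, dataframe=df, chunk_size=500)

for chunk_index, chunk_errors in enumerate(errors):
    for row_error in chunk_errors:
        # each entry describes one failing row within its chunk
        print(f"chunk {chunk_index}, row {row_error['index']}: {row_error['errors']}")

if not any(errors):
    print("all rows inserted successfully")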
Finally, I would like to post here some useful links for this problem:
BigQuery client documentation
Working with missing values in Pandas
I hope it helps.

Swapping dataframe column data without changing the index for the table

While compiling a pandas table to plot certain activity on a tool, I have encountered a rare error in the data that creates an extra 2 columns for certain entries. This means that one of my computed columns goes into the table 2 cells further on than the other and kills the plot.
I was hoping to find a way to pull the contents of a single cell in a row and swap it into the other cell beside it, which contains irrelevant information in the error case, but which is used for the plot for all the other rows.
I've tried a couple of different ways to swap the data around but keep hitting errors.
My attempts to fix it include:
for rows in df['server']:
    if '%USERID' in line:
        df['server'] = df[7]  # both versions of this and below
        df['server'].replace(df['server'], df[7])
    else:
        pass

if '%USERID' in df['server']:  # Attempt to fix missing server name
    df['server'] = df[7];
else:
    pass

if '%USERID' in df['server']:
    return row['7'], row['server']
else:
    pass
I'd like the data from column '7' to be replicated in 'server', only in the case of the error - where the data in the cell contains a string starting with '%USERID'.
Turns out I was over-thinking this one. I took a step back, reworked the code a bit and solved it.
Rather than trying to force a one-size-fits-all bit of code for all the data, I built separate lists for the general data and the 2 exceptions I found by writing a nested loop, and created 3 data frames. These were easy enough to manipulate individually and finally concatenate together. All working fine now.
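A rough sketch of that split-and-concatenate approach (the column names 'server' and 7 come from the question; the exception test and the single exception case are simplifying assumptions):

import pandas as pd

# rows where the 'server' cell actually holds the '%USERID...' error marker
bad = df['server'].astype(str).str.startswith('%USERID')

general_df = df[~bad].copy()
exception_df = df[bad].copy()
exception_df['server'] = exception_df[7]   # take the value from column 7 instead

# stitch the frames back together for plotting
df = pd.concat([general_df, exception_df], ignore_index=True)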

ibm_db.execute - how to get the result set

Newbie working with DB2, developing a Python script using the ibm_db package. I have a select query where I am binding params using ibm_db.bind_param(stmt, 1, param1) and then doing result = ibm_db.execute(stmt). How can I get the results from the query? The documentation is scarce on this topic. Would appreciate any example code.
After ibm_db.execute(stmt) you need to fetch data from the result set.
Try this:
data = ibm_db.fetch_assoc(stmt)
Fetch data from a result set by calling one of the fetch functions.
ibm_db.fetch_tuple: Returns a tuple, which is indexed by column position, representing a row in a result set. The columns are 0-indexed.
ibm_db.fetch_assoc: Returns a dictionary, which is indexed by column name, representing a row in a result set.
ibm_db.fetch_both: Returns a dictionary, which is indexed by both column name and position, representing a row in a result set.
ibm_db.fetch_row: Sets the result set pointer to the next row or requested row. Use this function to iterate through a result set.
Study the examples for fetching result sets in Python with ibm_db that are in the Db2 Knowledge Center online at this link.
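Putting it together, a minimal hedged sketch (the connection, SQL and parameter are placeholders; fetch_assoc returns a dict per row and False once the result set is exhausted):

import ibm_db

stmt = ibm_db.prepare(conn, "SELECT id, name FROM my_table WHERE id = ?")
ibm_db.bind_param(stmt, 1, param1)
ibm_db.execute(stmt)

row = ibm_db.fetch_assoc(stmt)
while row:
    print(row)                     # dict keyed by column name
    row = ibm_db.fetch_assoc(stmt)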

Get only the first 10 columns of a row using happybase

Is it possible to get only a limited number of columns for a column family from a row? Let's say I just want to fetch the first 10 values for ['cf1': 'col1'] for a particular row.
This is the same question as https://github.com/wbolster/happybase/issues/93
The answer is:
I think the only way to do this is a scan with a server side filter. I think the one you're after is the ColumnCountGetFilter:
ColumnCountGetFilter - takes one argument, a limit. It returns the first limit number of columns in the table. Syntax: ColumnCountGetFilter ('<limit>') Example: ColumnCountGetFilter (4)
Source: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/admin_hbase_filtering.html
With Happybase that would look like this (untested):
for row_key, data in table.scan(columns=['cf1'], filter='ColumnCountGetFilter(10)'):
    print(row_key, data)
Use limit to get a specific number of rows in HBase:
table.scan(limit=int(limit))
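A slightly fuller hedged sketch of the filter-based approach (the host, table name and row prefix are hypothetical):

import happybase

connection = happybase.Connection('hbase-host')
table = connection.table('my_table')

# scan only the row(s) of interest and let the server cap each row at 10 columns
for row_key, data in table.scan(row_prefix=b'my_row',
                                columns=['cf1'],
                                filter='ColumnCountGetFilter(10)'):
    print(row_key, data)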

appending non-unique rows to another database using python

Hey all,
I have two databases. One has 145000 rows and approx. 12 columns, the other has around 40000 rows and 5 columns. I am trying to compare them based on the values of two columns. For example, if in CSV#1 column 1 says 100-199 and column 2 says Main St (meaning that this row is contained within the 100 block of Main Street), how would I go about comparing that with a similar two columns in CSV#2? I need to compare every row in CSV#1 to every single row in CSV#2. If there is a match, I need to append the columns of each matching row to the end of the row of CSV#2. Thus CSV#2's number of columns will grow significantly and it will have repeat entries; it doesn't matter how the columns are ordered. Any advice on how to compare two columns with another two columns in a separate database and then iterate across all rows? I've been using Python and the csv module so far for the rest of the work, but this part of the problem has me stumped.
Thanks in advance
-John
1. A csv file is NOT a database. A csv file is just rows of text chunks; a proper database (like PostgreSQL or MySQL or SQL Server or SQLite or many others) gives you proper data types, table joins, indexes, row iteration, proper handling of multiple matches, and many other things which you really don't want to rewrite from scratch.
2. How is it supposed to know that Address("100-199") == Address("Main Street")? You will have to come up with some sort of knowledge base that transforms each bit of text into a canonical address or address range which you can then compare; see Where is a good Address Parser, but be aware that it deals with singular addresses (not address ranges).
Edit:
Thanks to Sven; if you were using a real database, you could do something like
SELECT
    User.firstname, User.lastname, User.account, Order.placed, Order.fulfilled
FROM
    User
    INNER JOIN Order ON
        User.streetnumber = Order.streetnumber
        AND User.streetname = Order.streetname
if streetnumber and streetname are exact matches; otherwise you still need to consider point #2 above.
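Since the question mentions working with the csv module, here is a hedged pure-Python sketch of the same join (the file names and key-column positions are hypothetical): index CSV#1 by its two key columns, then append each matching CSV#1 row to the corresponding CSV#2 row.

import csv
from collections import defaultdict

# build a lookup keyed by (block, street) from CSV#1
lookup = defaultdict(list)
with open('csv1.csv', newline='') as f1:
    for row in csv.reader(f1):
        lookup[(row[0], row[1])].append(row)

# for each CSV#2 row, write one output row per match with the extra columns appended
with open('csv2.csv', newline='') as f2, \
        open('csv2_with_matches.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for row in csv.reader(f2):
        for match in lookup.get((row[0], row[1]), []):
            writer.writerow(row + match)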
