I wrote a method to get the status of a CSV file in a SQL Server table. The table has a column named CSV_STATUS, and for a particular CSV, I'd like my method to give me the value of that column. I wrote the following function:
def return_csv_status_db(db_instance, name_of_db_instance_tabledict, csvfile_path):
    table_dict = db_instance[name_of_db_instance_tabledict]
    csvfile_name = csvfile_path.name
    sql = db.select([table_dict['table'].c.CSV_STATUS]).where(table_dict['table'].c.CSV_FILENAME == csvfile_name)
    result = table_dict['engine'].execute(sql)
    print(result)
Whenever I print result, it returns: <sqlalchemy.engine.result.ResultProxy object at 0x0000005E642256C8>
How can I extract the value of the select statement?
Take a look at [1].
As I understand it, you need to do the following:
for row in result:
    # do what you need to do for each row
[1] - https://docs.sqlalchemy.org/en/13/core/connections.html
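Since the query selects a single column for a single file, you can also pull the value out directly with fetchone() (SQLAlchemy's ResultProxy additionally offers scalar() for exactly this case). A minimal sketch of the DB-API pattern, using the stdlib sqlite3 module as a stand-in; the table and file names here are made up:

```python
import sqlite3

# In-memory stand-in for the real SQL Server table (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE csv_table (CSV_FILENAME TEXT, CSV_STATUS TEXT)")
conn.execute("INSERT INTO csv_table VALUES ('data.csv', 'LOADED')")

cur = conn.execute(
    "SELECT CSV_STATUS FROM csv_table WHERE CSV_FILENAME = ?", ("data.csv",)
)
row = cur.fetchone()            # first row, or None if no match
status = row[0] if row else None
print(status)                   # -> LOADED
```

The same `row = result.fetchone(); value = row[0]` shape applies to the SQLAlchemy result object in the question.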
I've been querying a few API's with Python to individually create CSV's for a table.
Instead of recreating the table each time, I would like to update the existing table with any new API data.
At the moment, the way the query works, I have a table that looks like this:
From this I am taking the suburbs of each state and copying them into a csv for each different state.
Then, using this script, I clean them into a list (the API needs %20 for any spaces):
suburbs = ["want this", "want this (meh)", "this as well (nope)"]  # example data
dont_want = frozenset(["(meh)", "(nope)"])
suburb_cleaned = []
for urb in suburbs:
    cleaned_name = []
    name_parts = urb.split()
    for part in name_parts:
        if part in dont_want:
            continue
        cleaned_name.append(part)
    suburb_cleaned.append('%20'.join(cleaned_name))
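As an aside, the manual %20 join can also be done with the standard library's URL quoting, which would handle other characters that need escaping as well. A sketch using the same example data as above:

```python
from urllib.parse import quote

suburbs = ["want this", "want this (meh)", "this as well (nope)"]
dont_want = frozenset(["(meh)", "(nope)"])

# Filter the unwanted tokens, then let quote() do the percent-encoding
suburb_cleaned = [
    quote(" ".join(part for part in urb.split() if part not in dont_want))
    for urb in suburbs
]
print(suburb_cleaned)  # -> ['want%20this', 'want%20this', 'this%20as%20well']
```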
Then I take the suburbs for each state and feed them into this API to return a CSV:
timestr = time.strftime("%Y%m%d-%H%M%S")
Name = "price_data_NT" + timestr + ".csv"
url_price = "http://mwap.com/api"
string = 'gxg&state='
api_results = {}
n = 0
y = 2
for urbs in suburb_cleaned:
    url = url_price + urbs + string + "NT"
    print(url)
    print(urbs)
    request = requests.get(url)
    api_results[urbs] = pd.DataFrame(request.json())
    n = n + 1
    if n == y:
        dfs = pd.concat(api_results).reset_index(level=1, drop=True).rename_axis(
            'key').reset_index().set_index(['key'])
        dfs.to_csv(Name, sep='\t', encoding='utf-8')
        y = y + 2
        continue
    print("made it through " + urbs)
    # print(request.json())
    # print(api_results)

dfs = pd.concat(api_results).reset_index(level=1, drop=True).rename_axis(
    'key').reset_index().set_index(['key'])
dfs.to_csv(Name, sep='\t', encoding='utf-8')
Then I add the states manually in Excel, and combine and clean the suburb names:
# use pd.concat
df = pd.concat([act, vic,nsw,SA,QLD,WA]).reset_index().set_index(['key']).rename_axis('suburb').reset_index().set_index(['state'])
# apply lambda to clean the %20
f = lambda s: s.replace('%20', ' ')
df['suburb'] = df['suburb'].apply(f)
and then finally inserting it into a db
engine = create_engine('mysql://username:password@localhost/dbname')
with engine.connect() as conn, conn.begin():
    df.to_sql('Price_historic', conn, if_exists='replace', index=False)
Leading to this sort of output.
Now, this is a heck of a process. I would love to simplify it, make the database update only the values it needs from the API, and remove this much complexity from getting the data.
I'd love some tips on achieving this goal. I'm thinking I could do an update on the MySQL database instead of an insert? And with the API querying, I feel like I'm overcomplicating it.
Thanks!
I don't see any reason why you would be creating CSV files in this process. It sounds like you can just query the data and then load it into a MySQL table directly. You say that you are adding the states manually in Excel? Is that data not available through your prior API calls? If not, could you find that information and save it to a CSV, so you can automate that step by loading it into a table and having Python look up the values for you?
Generally, you wouldn't want to overwrite the mysql table every time. When you have a table, you can identify the column or columns that uniquely identify a specific record, then create a UNIQUE INDEX for them. For example if your street and price values designate a unique entry, then in mysql you could run:
ALTER TABLE `Price_historic` ADD UNIQUE INDEX(street, price);
After this, your table will not allow duplicate records based on those values. Then, instead of creating a new table every time, you can insert your data into the existing table, with instructions to either update or ignore when you encounter a duplicate. For example:
final_str = "INSERT INTO Price_historic (state, suburb, property_price_id, type, street, price, date) " \
            "VALUES (%s, %s, %s, %s, %s, %s, %s) " \
            "ON DUPLICATE KEY UPDATE " \
            "state = VALUES(state), date = VALUES(date)"

# pdb is a MySQL DB-API driver, e.g. `import pymysql as pdb`
con = pdb.connect(db_host, db_user, db_pass, db_name)
with con:
    cur = con.cursor()
    cur.executemany(final_str, insert_list)
    con.commit()
If the setup you are building is something longer-term, I would suggest running two different processes in parallel:
Process 1:
Query API 1, obtain the required data, and insert it into the DB table with a binary/bit flag specifying that only API 1 has been called.
Process 2:
Run a query on the DB to obtain all records needing API call 2, based on the bit flag set in process 1. For the corresponding data, run call 2 and update the data back into the DB table based on the primary key.
Database: I would suggest adding a primary key as well as a [bit flag][1] that records the status of the different API calls. The bit flag also helps you:
- double-check whether a specific API call has been made for a specific record or not;
- expand the project to additional API calls while still tracking the status of each call at the record level.
[1]: https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions014.htm#SQLRF00612
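The flag-driven flow above can be sketched end to end. This is a minimal illustration with sqlite3 and made-up table and column names (`records`, `api1_done`, `api2_done`); a real setup would use your MySQL driver and schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE records (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    api1_done INTEGER DEFAULT 0,
    api2_done INTEGER DEFAULT 0)""")

# Process 1: insert data from API 1 and set its flag
conn.execute("INSERT INTO records (payload, api1_done) VALUES (?, 1)",
             ("from api 1",))

# Process 2: pick up rows where API 1 ran but API 2 has not
rows = conn.execute(
    "SELECT id FROM records WHERE api1_done = 1 AND api2_done = 0").fetchall()
for (row_id,) in rows:
    # ... call API 2 here, then update by primary key and flip the flag ...
    conn.execute(
        "UPDATE records SET payload = payload || ' + api 2', api2_done = 1 "
        "WHERE id = ?", (row_id,))

print(conn.execute("SELECT payload, api2_done FROM records").fetchall())
# -> [('from api 1 + api 2', 1)]
```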
I am working on a script to read from an Oracle table with about 75 columns in one environment and load it into the same table definition in a different environment. Until now I have been using the cx_Oracle cur.execute() method with 'INSERT INTO TABLENAME VALUES(:1,:2,:3..:8)' and then loading the data with 'cur.execute(sql, conn)'.
However, this table that I'm trying to load has 75+ columns, and writing (:1, :2 ... :75) would be tedious and, I'm guessing, not best practice.
Is there an automated way to loop over the number of columns and automatically fill the values() portion of the SQL query.
user = 'username'
password = getpass.getpass()
connection_prod = cx_Oracle.makedsn(host, port, service_name='')
cursor_prod = connection_prod.cursor()
connection_dev = cx_Oracle.makedsn(host, port, service_name='')
cursor_dev = connection_dev.cursor()
SQL_Read = """Select * from Table_name_Prod"""
Data = cursor_prod.execute(SQL_Read)
for row in Data:
    SQL_Load = "INSERT INTO TABLE_NAME_DEV VALUES(:1, :2, :3, :4 ...:75)"  # This part is ugly and tedious
    cursor_dev.execute(SQL_Load, row)  # This is where I need help
connection_prod.commit()
cursor_prod.close()
connection_prod.close()
You can do the following which should help not only in reducing code but also in improving performance:
connection_prod = cx_Oracle.connect(...)
cursor_prod = connection_prod.cursor()

# set array size for source cursor to some reasonable value
# increasing this value reduces round-trips but increases memory usage
cursor_prod.arraysize = 500

connection_dev = cx_Oracle.connect(...)
cursor_dev = connection_dev.cursor()

cursor_prod.execute("select * from table_name_prod")
bind_names = ",".join(":" + str(i + 1)
                      for i in range(len(cursor_prod.description)))
sql_load = "insert into table_name_dev values (" + bind_names + ")"
while True:
    rows = cursor_prod.fetchmany()
    if not rows:
        break
    cursor_dev.executemany(sql_load, rows)
    # can call connection_dev.commit() here if you want to commit each batch
The use of cursor.executemany() will significantly help in terms of performance. Hope this helps you out!
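The same dynamic-placeholder pattern works with any DB-API driver, since cursor.description is part of the DB-API spec. A self-contained sketch with sqlite3 (which uses ? placeholders instead of Oracle's :1 style); the table names here are made up:

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t_prod (a INTEGER, b TEXT, c REAL)")
src.executemany("INSERT INTO t_prod VALUES (?, ?, ?)",
                [(1, "x", 1.5), (2, "y", 2.5)])

dev = sqlite3.connect(":memory:")
dev.execute("CREATE TABLE t_dev (a INTEGER, b TEXT, c REAL)")

cur = src.execute("SELECT * FROM t_prod")
# Build one "?" per column from cursor.description
binds = ",".join("?" for _ in cur.description)
sql_load = "INSERT INTO t_dev VALUES (" + binds + ")"

while True:
    rows = cur.fetchmany(500)
    if not rows:
        break
    dev.executemany(sql_load, rows)

print(dev.execute("SELECT COUNT(*) FROM t_dev").fetchone()[0])  # -> 2
```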
I'm trying to save a column value into a Python variable. I query my DB for an int in columns called id_mType and id_meter, depending on a value that I get from an XML file. To test it I do the following (I'm new to using databases):
m = 'R1'
id_cont1 = 'LGZ0019800712'
xdb = cursor.execute("SELECT id_mType FROM mType WHERE m_symbol = %s", m)
xdb1 = cursor.execute("select id_meter from meter where nombre = %s",
id_cont1)
print (xdb)
print (xdb1)
Every time, I get the value "1", whereas id_mType for 'R1' is 3 and id_meter is 7 for the id_cont1 value. I need these to insert into another table (where both are FKs: id_meter and id_mType. I don't know if there is an easier way).
You can store it in a list. Is that okay?
results = cursor.fetchall()
my_list = []
for result in results:
    my_list.append(result[0])
Now my_list should hold the SQL column you get returned with your query.
Use the fetchone() method to fetch a row from the cursor. Note that the "1" you are printing is the return value of execute() (with many drivers that is just a row count, not a result set), so fetch from the cursor itself, immediately after each execute():
cursor.execute("SELECT id_mType FROM mType WHERE m_symbol = %s", m)
row = cursor.fetchone()
if row:
    mtype = row[0]

cursor.execute("select id_meter from meter where nombre = %s", id_cont1)
row = cursor.fetchone()
if row:
    meter = row[0]
I am retrieving a value from a SQLite table and then using this value to retrieve data from another table with a SELECT ... FROM ... WHERE statement. I cannot use the retrieved value to query the other table, even though the values appear to match when I retrieve the value independently from the table I am querying. I get "Error binding parameter 0 - probably unsupported type". I believe I am passing the value correctly based on what I've read in the docs, but there is obviously something wrong.
Edit: The ultimate goal is to select the Name and EndTime, then insert the EndTime value into another table in the same DB wherever a column value equals Name in that table. I've added update code below to give an idea of what I'm attempting to accomplish.
When I print nameItem it appears to be a unicode string; (u'Something') is how it appears. I don't know if this is an encoding issue, but I have used similar queries before without running into this issue.
I have tried using the text I am searching for directly in the SELECT query, and that works, but when passing it as a variable I still get the unsupported-type error.
c.execute('SELECT Name FROM Expected WHERE Timing = 1')
timeList = c.fetchall()
for i in range(len(timeList)):
    nameItem = timeList[i]
    c.execute('SELECT "EndTime" FROM Expected WHERE "Name" = ?', (nameItem,))
    end = c.fetchone()

conn = sqlite3.connect(otherDb)
c = conn.cursor()
c.execute('UPDATE Individuals SET Ending = end WHERE NAME = NameItem')
I expect this to retrieve a time associated with the current value of nameItem.
The first fix to try is to use a string instead of a tuple (note that timeList is a list of tuples):
nameItem = timeList[i][0]
But it is better not to iterate with len(); something like:
c.execute('SELECT Name FROM Expected WHERE Timing = 1')
timeList = c.fetchall()
for elem in timeList:
    nameItem = elem[0]
    c.execute('SELECT "EndTime" FROM Expected WHERE "Name" = ?', (nameItem,))
    end = c.fetchone()
And even better is to fetch EndTime in the first query (looks acceptable in given context):
c.execute('SELECT Name, EndTime FROM Expected WHERE Timing = 1')
timeList = c.fetchall()
for elem in timeList:
    name = elem[0]
    end_time = elem[1]
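Since the stated goal is to push EndTime into the other database's Individuals table, the same parameter binding applies to the UPDATE as well; the question's `SET Ending = end WHERE NAME = NameItem` embeds Python variable names in the SQL string, which will not work. A self-contained sketch, with in-memory tables and made-up data standing in for the two databases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE Expected (Name TEXT, EndTime TEXT, Timing INTEGER)")
c.execute("INSERT INTO Expected VALUES ('Something', '17:30', 1)")
c.execute("CREATE TABLE Individuals (NAME TEXT, Ending TEXT)")
c.execute("INSERT INTO Individuals VALUES ('Something', NULL)")

c.execute("SELECT Name, EndTime FROM Expected WHERE Timing = 1")
for name, end_time in c.fetchall():
    # Bind both values with ? placeholders instead of pasting them into the SQL
    c.execute("UPDATE Individuals SET Ending = ? WHERE NAME = ?",
              (end_time, name))
conn.commit()

print(c.execute("SELECT Ending FROM Individuals").fetchone()[0])  # -> 17:30
```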