I have a list called
fullpricelist=[373.97, 381.0, 398.98, 402.98, 404.98, 457.97, 535.99, 550.97, 566.98]
I would like to write this list into a SQLite database column. I found the following code in another question and adapted it to my situation.
cursor.executemany("""INSERT INTO cardata (fullprice) VALUES (?)""",
zip(fullpricelist))
My current script is this:
for name, name2, image in zip(car_names, car_names2, images):
    cursor.execute(
        "insert into cardata (carname, carmodel, imageurl, location, Fro, T, companyid) values (?, ?, ?, ?, ?, ?, ?)",
        (name.text, name2.text, image.get_attribute('src'), location, pickup_d, return_d, Rental_ID)
    )
But now I am confused about how to combine these two pieces of code.
In your second piece of code, execute() is called once per loop iteration and one row is stored in the database at a time. This is slow and inefficient. Applied to fullpricelist, that pattern would look like:
for price in fullpricelist:
    cursor.execute("""INSERT INTO cardata (fullprice) VALUES (?)""", (price,))
executemany() reads from an iterable and adds each element of the iterable to the database as a distinct row; each element must itself be a sequence of parameters, which is exactly what zip(fullpricelist) produces (one-element tuples). If you are adding many rows and care about efficiency, you want to use executemany():
cursor.executemany("""INSERT INTO cardata (fullprice) VALUES (?)""", zip(fullpricelist))
If you want to include the other columns from your question, wrap the per-column lists in zip() so that executemany() receives one row tuple per car rather than one list per column:
cursor.executemany(
    """INSERT INTO cardata (carname, carmodel, imageurl, location, Fro, T, companyid)
       VALUES (?, ?, ?, ?, ?, ?, ?)""",
    zip(
        [name.text for name in car_names],
        [name.text for name in car_names2],
        [image.get_attribute('src') for image in images],
        [location] * len(car_names),
        [pickup_d] * len(car_names),
        [return_d] * len(car_names),
        [Rental_ID] * len(car_names)
    )
)
This assumes all values for location, pickup_d, return_d and Rental_ID are the same, as you did not provide a list of the values.
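If those four values differ per car, pass per-row lists to zip() instead of the repeated constants. A minimal sketch, assuming hypothetical lists locations, pickup_dates, return_dates and rental_ids of the same length as car_names (these names are not from the question):
cursor.executemany(
    """INSERT INTO cardata (carname, carmodel, imageurl, location, Fro, T, companyid)
       VALUES (?, ?, ?, ?, ?, ?, ?)""",
    zip(
        [name.text for name in car_names],
        [name.text for name in car_names2],
        [image.get_attribute('src') for image in images],
        locations,       # one entry per car (hypothetical per-row lists)
        pickup_dates,
        return_dates,
        rental_ids
    )
)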
From this answer:
cursor.execute("INSERT INTO booking_meeting (room_name,from_date,to_date,no_seat,projector,video,created_date,location_name) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", (rname, from_date, to_date, seat, projector, video, now, location_name ))
I'd like to shorten it to something like:
simple_insert(booking_meeting, rname, from_date, to_date, seat, projector, video, now, location_name)
The first parameter is the table name, which can be read to get the list of column names used to format the first section of the SQLite3 statement:
cursor.execute("INSERT INTO booking_meeting (room_name,from_date,to_date,no_seat,projector,video,created_date,location_name)
Then the values clause (second part of the insert statement):
VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
can be formatted by counting the number of column names in the table.
I hope I explained the question properly and that you can appreciate the time savings of such a function. How to write this function in Python is my question.
There may already be a simple_insert() function in SQLite3, but I just haven't stumbled across it yet.
If you're inserting into all the columns, then you don't need to specify the column names in the INSERT query. For that scenario, you could write a function like this:
def simple_insert(cursor, table, *args):
    query = f'INSERT INTO {table} VALUES (' + '?, ' * (len(args)-1) + '?)'
    cursor.execute(query, args)
For your example, you would call it as:
simple_insert(cursor, 'booking_meeting', rname, from_date, to_date, seat, projector, video, now, location_name)
Note I've chosen to pass cursor to the function; you could choose to rely on it as a global variable instead.
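If you do want the column names spelled out, as described in the question, one option is to read them from the table's metadata. A sketch, assuming SQLite and that the arguments are supplied in the table's column order (simple_insert_named is a hypothetical name, not a built-in):
def simple_insert_named(cursor, table, *args):
    # column names come from SQLite's table metadata
    columns = [row[1] for row in cursor.execute(f'PRAGMA table_info({table})')]
    placeholders = ', '.join('?' * len(columns))
    query = f'INSERT INTO {table} ({", ".join(columns)}) VALUES ({placeholders})'
    cursor.execute(query, args)
It would be called the same way, e.g. simple_insert_named(cursor, 'booking_meeting', rname, from_date, to_date, seat, projector, video, now, location_name).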
I'm writing a script that mines baseball statistics from a website and inputs them into a SQLite database. This piece of code is where I dump one of my dataframes (hit_stats) into the largest of my tables in the DB (Hitting_Stats):
d = 0
for row in hit_stats:
    curs.execute(
        'INSERT INTO Hitting_Stats (player_id, season_h, games_h, plate_appearances, at_bats, '
        'runs, hits, doubles, triples, homeruns, total_bases, runs_batted_in, stolen_bases, caught_stealing, '
        'sacrifice_hits, sacrifice_flies, walks, hits_by_pitch, strikeouts_h, grounded_into_double_plays, '
        'batting_average, slugging_percentage, on_base_percentage) '
        'VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)',
        (temp_player_id, hit_stats['Year'][d], hit_stats['G'][d],
         hit_stats['PA'][d], hit_stats['AB'][d], hit_stats['R'][d], hit_stats['H'][d], hit_stats['2B'][d],
         hit_stats['3B'][d], hit_stats['HR'][d], hit_stats['TB'][d], hit_stats['RBI'][d], hit_stats['SB'][d],
         hit_stats['CS'][d], hit_stats['SH'][d], hit_stats['SF'][d], hit_stats['BB'][d], hit_stats['HP'][d],
         hit_stats['SO'][d], hit_stats['GDP'][d], hit_stats['AVG'][d], hit_stats['SLG'][d], hit_stats['OBP'][d]))
    d += 1
(The indentations are correct in my actual code)
When I run the script, it says: "KeyError: 3"
Strangely, when I open up a Jupyter Notebook and run the following smaller bit of code, I get the correct output all the way through the single row/Series, but then I get the KeyError at the end AFTER the output:
d = 0
for row in hit_stats:
    print(hit_stats['Year'][d])
    d += 1
So it seems to be fine with everything until the end of the loop. Any ideas?
In case it isn't obvious, I'm a beginner, so if there's simply a better way to do this, feel free to just send me to a resource. Thanks in advance for your time!
After banging my head against the wall for a while, I realized that for loops iterate over the columns of a DataFrame by default, not the rows. I rewrote the loop using .iterrows() and it worked.
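For reference, a minimal sketch of that rewrite, abridged to the first few columns (the remaining ones follow the same pattern) and assuming hit_stats is a pandas DataFrame with the column labels used above:
for _, row in hit_stats.iterrows():
    # row is a Series indexed by column label, so no positional counter is needed;
    # depending on the dtypes you may need to cast numpy values to plain Python types
    curs.execute(
        'INSERT INTO Hitting_Stats (player_id, season_h, games_h, plate_appearances) '
        'VALUES (?, ?, ?, ?)',
        (temp_player_id, row['Year'], row['G'], row['PA']))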
I've got an MS Access table (SearchAdsAccountLevel) which needs to be updated frequently from a Python script. I've set up the pyodbc connection and now I would like to UPDATE/INSERT rows from my pandas df into the MS Access table, based on whether the Date_ AND CampaignId fields match the df data.
Looking at previous examples, I've built the UPDATE statement, which uses iterrows to iterate through all rows of the df and execute the SQL code as per below:
import pyodbc

connection_string = (
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=c:\AccessDatabases\Database2.accdb;"
)
cnxn = pyodbc.connect(connection_string, autocommit=True)
crsr = cnxn.cursor()
for index, row in df.iterrows():
    crsr.execute("UPDATE SearchAdsAccountLevel SET [OrgId]=?, [CampaignName]=?, [CampaignStatus]=?, [Storefront]=?, [AppName]=?, [AppId]=?, [TotalBudgetAmount]=?, [TotalBudgetCurrency]=?, [DailyBudgetAmount]=?, [DailyBudgetCurrency]=?, [Impressions]=?, [Taps]=?, [Conversions]=?, [ConversionsNewDownloads]=?, [ConversionsRedownloads]=?, [Ttr]=?, [LocalSpendAmount]=?, [LocalSpendCurrency]=?, [ConversionRate]=?, [Week_]=?, [Month_]=?, [Year_]=?, [Quarter]=?, [FinancialYear]=?, [RowUpdatedTime]=? WHERE [Date_]=? AND [CampaignId]=?",
                 row['OrgId'],
                 row['CampaignName'],
                 row['CampaignStatus'],
                 row['Storefront'],
                 row['AppName'],
                 row['AppId'],
                 row['TotalBudgetAmount'],
                 row['TotalBudgetCurrency'],
                 row['DailyBudgetAmount'],
                 row['DailyBudgetCurrency'],
                 row['Impressions'],
                 row['Taps'],
                 row['Conversions'],
                 row['ConversionsNewDownloads'],
                 row['ConversionsRedownloads'],
                 row['Ttr'],
                 row['LocalSpendAmount'],
                 row['LocalSpendCurrency'],
                 row['ConversionRate'],
                 row['Week_'],
                 row['Month_'],
                 row['Year_'],
                 row['Quarter'],
                 row['FinancialYear'],
                 row['RowUpdatedTime'],
                 row['Date_'],
                 row['CampaignId'])
crsr.commit()
I would like to iterate through each row of my df (around 3,000 rows) and, if Date_ AND CampaignId match, UPDATE all other fields; otherwise I want to INSERT the whole df row into my Access table as a new row. What's the most efficient and effective way to achieve this?
Consider DataFrame.values and pass the resulting list into an executemany call, making sure to order the columns to match the placeholders in the UPDATE query:
cols = ['OrgId', 'CampaignName', 'CampaignStatus', 'Storefront',
'AppName', 'AppId', 'TotalBudgetAmount', 'TotalBudgetCurrency',
'DailyBudgetAmount', 'DailyBudgetCurrency', 'Impressions',
'Taps', 'Conversions', 'ConversionsNewDownloads', 'ConversionsRedownloads',
'Ttr', 'LocalSpendAmount', 'LocalSpendCurrency', 'ConversionRate',
'Week_', 'Month_', 'Year_', 'Quarter', 'FinancialYear',
'RowUpdatedTime', 'Date_', 'CampaignId']
sql = '''UPDATE SearchAdsAccountLevel
SET [OrgId]=?, [CampaignName]=?, [CampaignStatus]=?, [Storefront]=?,
[AppName]=?, [AppId]=?, [TotalBudgetAmount]=?,
[TotalBudgetCurrency]=?, [DailyBudgetAmount]=?,
[DailyBudgetCurrency]=?, [Impressions]=?, [Taps]=?, [Conversions]=?,
[ConversionsNewDownloads]=?, [ConversionsRedownloads]=?, [Ttr]=?,
[LocalSpendAmount]=?, [LocalSpendCurrency]=?, [ConversionRate]=?,
[Week_]=?, [Month_]=?, [Year_]=?, [Quarter]=?, [FinancialYear]=?,
[RowUpdatedTime]=?
WHERE [Date_]=? AND [CampaignId]=?'''
crsr.executemany(sql, df[cols].values.tolist())
cnxn.commit()
For the insert, use a temp staging table with exactly the same structure as the final table, which you can create with a make-table query: SELECT TOP 1 * INTO temp FROM final. This temp table will be regularly cleaned out and re-filled with all the data frame rows. The final query then migrates only the new rows from temp into final using NOT EXISTS, NOT IN, or LEFT JOIN/NULL (a NOT EXISTS variant is sketched after the code below). You can run this query at any time and never worry about duplicates per the Date_ and CampaignId columns.
# CLEAN OUT TEMP
sql = '''DELETE FROM SearchAdsAccountLevel_Temp'''
crsr.execute(sql)  # no parameters here, so execute() rather than executemany()
cnxn.commit()
# APPEND TO TEMP
sql = '''INSERT INTO SearchAdsAccountLevel_Temp (OrgId, CampaignName, CampaignStatus, Storefront,
AppName, AppId, TotalBudgetAmount, TotalBudgetCurrency,
DailyBudgetAmount, DailyBudgetCurrency, Impressions,
Taps, Conversions, ConversionsNewDownloads, ConversionsRedownloads,
Ttr, LocalSpendAmount, LocalSpendCurrency, ConversionRate,
Week_, Month_, Year_, Quarter, FinancialYear,
RowUpdatedTime, Date_, CampaignId)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?);'''
crsr.executemany(sql, df[cols].values.tolist())
cnxn.commit()
# MIGRATE TO FINAL
sql = '''INSERT INTO SearchAdsAccountLevel
SELECT t.*
FROM SearchAdsAccountLevel_Temp t
LEFT JOIN SearchAdsAccountLevel f
ON t.Date_ = f.Date_ AND t.CampaignId = f.CampaignId
WHERE f.OrgId IS NULL'''
crsr.execute(sql)
cnxn.commit()
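For reference, a sketch of the NOT EXISTS variant mentioned above, using the same table and column names; it inserts only the temp rows whose Date_/CampaignId pair is not yet present in the final table:
# MIGRATE TO FINAL (NOT EXISTS variant)
sql = '''INSERT INTO SearchAdsAccountLevel
         SELECT t.*
         FROM SearchAdsAccountLevel_Temp t
         WHERE NOT EXISTS (
             SELECT 1 FROM SearchAdsAccountLevel f
             WHERE f.Date_ = t.Date_ AND f.CampaignId = t.CampaignId
         )'''
crsr.execute(sql)
cnxn.commit()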
Background:
I'm ultimately trying to join two tables that reside in completely different databases. I've loaded the data into dataframes with the intent of executing a SQL join through sqldf (both because the join is on an inequality and because I'm very comfortable in SQL, but less so in python).
Environment:
Jupyter notebook on Anaconda
PandaSQL 0.7.3
Numpy 1.14.3
SQLAlchemy 1.2.7
Python 3.6.5
Windows 10
Issue:
I can reproduce this problem using a simple built-in dataset and a very simple query. The following code:
from pandasql import sqldf, load_meat
meat = load_meat()
print(sqldf("SELECT * FROM meat;",locals()))
returns:
OperationalError: too many SQL variables
Ultimately, this traces back to sqlite3, where an "INSERT INTO" statement uses a set of eight placeholders for each row and all of the values are passed in as one flat parameter list:
SQL: 'INSERT INTO meat (date, beef, veal, pork, lamb_and_mutton, broilers, other_chicken, turkey)
VALUES (?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?),
....
(?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?)']
[parameters: ('1944-01-01 00:00:00.000000', 751.0, 85.0, 1280.0, 89.0, None, None, None, '1944-02-01 00:00:00.000000', 713.0, 77.0, 1169.0, 72.0, None, None, None,
....
I've also reproduced this error by loading a simple CSV into a dataframe (500, 2). If I reduce the CSV/df from 500 rows to 499, then sqldf works fine, which lines up with SQLite's default limit of 999 variables per statement: 500 × 2 = 1000 parameters is just over the limit, while 499 × 2 = 998 fits.
In search of a solution, I've read plenty about the 999-parameter limit of SQLite. However, I've seen plenty of examples using the built-in datasets. The last example, in particular, is directly from the pandasql repository. Running the code in that example, the section dealing with the iris data (150 × 6 = 900 parameters) runs fine, while the meat data (827 × 8 = 6,616 parameters) causes the parameter error described above.
I found one other reference to this problem on StackOverflow, but there's been no activity there.
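One way to sidestep the limit (a sketch, not from the original post) is to skip pandasql's bulk multi-row INSERT and load the data frame into an in-memory SQLite database yourself with DataFrame.to_sql, whose default insert method binds one row of parameters at a time and therefore never exceeds the per-statement variable limit, then query it with pandas.read_sql_query:
import sqlite3
import pandas as pd
from pandasql import load_meat

meat = load_meat()

# default to_sql issues an executemany with one parameter set per row,
# so no single statement needs more than 8 bound variables here
conn = sqlite3.connect(":memory:")
meat.to_sql("meat", conn, index=False)

print(pd.read_sql_query("SELECT * FROM meat;", conn))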