Python: Save the previously scraped data in database (already created) - python
I have scraped the data with Selenium and the results print correctly. I have already created the database tables and fields with SQLite. In general I know how to save data to a database, but I don't know how to write the code that saves the scraped data.
Now I would just like to add the scraped data into the database and save. The database table name is TableExample1. The fields are: ID, Product_NameDB, Product_DescriptionDB, VendorDB, PriceDB.
I have 2 problems:
The text is saved in the database like this: [<selenium.webdriver.firefox.webelement.FirefoxWebElement (session = "226bc9f3
Only one (1) row is added to the database. Instead, dozens or hundreds of rows should be added (each like this: Product_Name, Product_Description, Vendor, Price), depending on the scraped data.
The Python code with the data printed by the scraping is as follows:
#Name of the scraped data
Product_Name = (driver.find_element_by_class_name("tablet-desktop-only").text)
Product_Description = (driver.find_element_by_class_name("h-text-left").text)
Vendor = (driver.find_element_by_class_name("in-vendor").text)
Price = (driver.find_element_by_class_name("h-text-center").text)
for seq in Product_Name + Product_Description + Vendor + Price:
    print(seq.text)
UPDATE N.1
Current code, following Jeremy Kahan's answers to this question and to another question of mine (a different one, where I asked how to print all the results of the scraping in the console and not just one).
This is currently the most stable, working code. Scraping works fine, but the Num_Groups and for i in range(Num_Groups) code you suggested in the other question prints only one group, not all of them.
I still have the same 2 problems
import sqlite3
from datetime import datetime
#SCRAPING
Product_Name=driver.find_elements_by_class_name("tablet-desktop-only")
Product_Description=driver.find_elements_by_class_name("h-text-left")
Vendor=driver.find_elements_by_class_name("in-match")
Price=driver.find_elements_by_class_name("h-text-center")
# How do I print the other data always with the same html name?
# This is one row data. This is the code you wrote me in the
# other question. Print only one group. What to write to print
# all groups?
Num_Groups = min(len(Product_Name),len(Product_Description),len(Vendor), len(Price))
for i in range(Num_Groups):
    print(Product_Name[i].text)
    print(Product_Description[i].text)
    print(Vendor[i].text)
    print(Price[i].text)
#INSERT DATA IN DATABASE
con = sqlite3.connect('/home/mypc/Desktop/aaaaa/Database.db')
cursor = con.cursor()
ID=datetime.today().strftime('%Y%m%d%H%M%S')
Values = f"VALUES({ID},'{Product_Name}','{Product_Description}','{Vendor}','{Price}')"
sqlite_insert_query = 'INSERT INTO TableExample1 (ID, Product_Name, Product_Description, Vendor, Price)' + Values
count = cursor.execute(sqlite_insert_query)
con.commit()
print("Record inserted successfully ", cursor.rowcount)
cursor.close()
2nd UPDATE (final?)
PROBLEM: the output says "found 12 groups", "Record inserted successfully 1", "Added a total of 1 records". How do I get 12 rows inserted in the database?
Num_Groups = min(len(Product_Name),len(Product_Description),len(Vendor), len(Price))
records_added = 0
for i in range(Num_Groups):
    print(Product_Name[i].text)
    print(Product_Description[i].text)
    print(Vendor[i].text)
    print(Price[i].text)
con = sqlite3.connect('/home/mypc/Scrivania/aaaa/Database.db')
cursor = con.cursor()
Values = f" VALUES ('{Product_Name[i].text}','{Product_Description[i].text}','{Vendor[i].text}','{Price[i].text}')"
sqlite_insert_query = 'INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price)' + Values
print("Here, inside the loop, you would execute " + sqlite_insert_query)
count = cursor.execute(sqlite_insert_query)
con.commit()
print("Record inserted successfully ", cursor.rowcount)
records_added = records_added + 1
cursor.close()
print("")
print(f'Added a total of {records_added} records')
print(f"found {Num_Groups} groups") # should be more than 1
Okay. So you are telling me you are good with connecting to the database, getting a cursor, executing sql with the cursor and the connection, and committing changes and closing the cursor. What you need is the sql string to execute.
Values = f"VALUES('{Product_Name}','{Product_Description}','{Vendor}','{Price}')"
sqlite_insert_query = 'INSERT INTO TableExample1(Product_NameDB, Product_DescriptionDB, VendorDB, PriceDB) ' + Values
Then you should be able to, assuming cursor is the cursor set up:
count = cursor.execute(sqlite_insert_query)
I am assuming you set things up so ID is a unique key that will be generated if you do not specify it.
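As a minimal sketch of that assumption: in SQLite, a column declared INTEGER PRIMARY KEY is filled in automatically when the INSERT leaves it out. The in-memory database and the reduced two-column table below are stand-ins for illustration, not your actual schema.

```python
import sqlite3

# In-memory stand-in for Database.db; the reduced schema is an assumption.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE TableExample1 (ID INTEGER PRIMARY KEY, Product_NameDB TEXT)"
)

# ID is not mentioned in the INSERT, so SQLite assigns it.
con.execute("INSERT INTO TableExample1 (Product_NameDB) VALUES (?)", ("first",))
con.execute("INSERT INTO TableExample1 (Product_NameDB) VALUES (?)", ("second",))
con.commit()

ids = [row[0] for row in con.execute("SELECT ID FROM TableExample1 ORDER BY ID")]
print(ids)
con.close()
```

If your ID column was declared differently (for example as TEXT, or without PRIMARY KEY), this automatic numbering would not apply.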
EDIT: So it sounds like the database was struggling with the ID and not making more than one new one. Assuming your IDs just need to be sequential (and only you are using the database, and each scrape takes more than a second), you could handle IDs yourself as follows, after adding from datetime import datetime earlier in the script:
ID=datetime.today().strftime('%Y%m%d%H%M%S')
Values = f"VALUES({ID},'{Product_Name}','{Product_Description}','{Vendor}','{Price}')"
sqlite_insert_query = 'INSERT INTO TableExample1(ID, Product_NameDB, Product_DescriptionDB, VendorDB, PriceDB) ' + Values
Also, you mentioned getting the text describing the element in Product_Name as opposed to just the Product_Name (and similarly for the other fields). You need:
Product_Name = driver.find_element_by_class_name("tablet-desktop-only").text
(or to put (Product_Name.text) in my f-string, but that seems confusing)
2ND Edit:
What @Prophet was trying to say at https://stackoverflow.com/questions/68110293/problems-saving-scraped-data-to-database-python/68111109?noredirect=1#comment120399103_68111109 is as follows.
When you say:
Product_Name = driver.find_element_by_class_name("tablet-desktop-only").text
Product_Name is a string, and your for loop on Product_Name + Product_Description + Vendor + Price
is looping over a single concatenated string, and seq then has the characters of the string, taken one at a time. Then seq.text fails, as you experienced. That is why I commented out the print command over there, and put one in later to print the Values string instead. That is an approach that should work.
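To see why that loop fails, here is a small sketch with plain strings standing in for the scraped .text values (the sample values are made up):

```python
# Plain strings standing in for the scraped .text values (made-up data).
product_name = "Soap"
price = "$4"

# Adding strings gives one string; iterating over it yields single characters.
chars = [seq for seq in product_name + price]
print(chars)

# A str has no .text attribute, so seq.text raises AttributeError.
try:
    chars[0].text
    failed = False
except AttributeError:
    failed = True
print(failed)
```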
If you leave things the way you had them originally
Product_Name = driver.find_element_by_class_name("tablet-desktop-only")
Product_Name is an element, which when merged into the Values string, gets converted to a string representation of the element, which is why you see all that irrelevant text in the database. I understand your leaving it so that the for loop will work, but then you should do something like this:
ID=datetime.today().strftime('%Y%m%d%H%M%S')
Values = f"VALUES({ID},'{Product_Name.text}','{Product_Description.text}','{Vendor.text}','{Price.text}')"
or
Product_Name_Text = Product_Name.text
Product_Description_Text = Product_Description.text
Vendor_Text = Vendor.text
Price_Text = Price.text
and then
Values = f"VALUES({ID},'{Product_Name_Text}','{Product_Description_Text}','{Vendor_Text}','{Price_Text}')"
or
ID=datetime.today().strftime('%Y%m%d%H%M%S')
Values = f"VALUES({ID}"
for seq in (Product_Name, Product_Description, Vendor, Price):
    print(seq.text)
    Values = Values + "," + "'" + seq.text + "'"
Values = Values + ")"
I would definitely recommend, at least for debugging, printing Values or sqlite_insert_query. If you share those results back with us, we may be able to help, if it's still not working.
In the above options you can leave out my assigning the ID if that turns out not to be an issue after all.
I do not see any loop in your code over different groups of elements, so I am not sure what you are expecting in terms of inserting more than one new entry.
3rd (final?) Edit:
There are 2 issues you are having. The extra stuff in the database is there because you are putting the description of each element into the Values part of the SQL as opposed to its text. I have illustrated how to fix that by formatting the text into the Values string.
The second issue is that your locator is finding only one thing when you say find_elements (I cannot debug your locators because I don't know the page). But the code here should tell you how many things matched. To test that hypothesis, I wrote my own version below, which works (if the page is permanent) by grabbing data from Amazon on cosmetics. The details are not important, but the code should illustrate what has to happen. Also, I believe what I did earlier with IDs is unnecessary; the database will handle that.
Since I am not actually using a database, I commented that part out. You will uncomment it.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
# launch and go to site, yours will vary
driver = webdriver.Firefox(executable_path='/usr/bin/geckodriver')
driver.maximize_window()
wait = WebDriverWait(driver, 15)
driver.get("https://www.amazon.com/b?node=18505451011&pd_rd_w=gVvMZ&pf_rd_p=b6363b44-58dd-4354-979f-1446a1c45f7a&pf_rd_r=5FYAS41AJR7GPQ5Q74J4&pd_rd_r=9bfd1639-2256-4c17-928c-99cf03e00d63&pd_rd_wg=UPUHV")
wait.until(EC.presence_of_element_located((By.LINK_TEXT, "See all results")))
# SCRAPING
# you will want to change back to your locators
Product_Name = driver.find_elements_by_class_name("apb-line-clamp-3")
# actually number of reviewers, picks up see all at the bottom, too, but it's ok because of min
Product_Description = driver.find_elements_by_class_name("a-color-link")
Vendor = driver.find_elements_by_class_name(
    "apb-browse-searchresults-product-byline")
Price = driver.find_elements_by_class_name("a-price") # prices
# if the locators work well, these match multiple groups of products not just one
Num_Groups = min(len(Product_Name), len(
    Product_Description), len(Vendor), len(Price))
print(f"found {Num_Groups} groups") # should be more than 1
# I have commented out the database code, because I am not actually using one
#con = sqlite3.connect('/home/mypc/Desktop/aaaaa/Database.db')
# connect to database outside the loop, not for each item
# I removed the code about generating ID's, which the database should handle
records_added = 0
for i in range(Num_Groups): # should cause i to count from 0 up to and including Num_Groups-1
    print("") # skip a line between stuff
    print(Product_Name[i].text)
    print(Product_Description[i].text)
    print(Vendor[i].text)
    print(Price[i].text)
    # note below I need to format the .text into the values string, not the text description of the element
    Values = f" VALUES ('{Product_Name[i].text}','{Product_Description[i].text}','{Vendor[i].text}','{Price[i].text}')"
    sqlite_insert_query = 'INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price)' + Values
    print("Here, inside the loop, you would execute " + sqlite_insert_query)
    #cursor = con.cursor()
    #count = cursor.execute(sqlite_insert_query)
    # con.commit()
    #print("Record inserted successfully ", cursor.rowcount)
    records_added = records_added + 1
    # cursor.close()
print("")
print(f'Added a total of {records_added} records')
output was:
found 12 groups
Crest 3D White Professional Effects Whitestrips 20 Treatments + Crest 3D White 1 Hour Express Whitestrips 2 Treatments - Teeth Whitening Kit
46,020
by Crest
$47
88
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Crest 3D White Professional Effects Whitestrips 20 Treatments + Crest 3D White 1 Hour Express Whitestrips 2 Treatments - Teeth Whitening Kit','46,020','by Crest','$47
88')
REVLON One-Step Hair Dryer And Volumizer Hot Air Brush, Black, Packaging May Vary
274,196
by REVLON
$41
99
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('REVLON One-Step Hair Dryer And Volumizer Hot Air Brush, Black, Packaging May Vary','274,196','by REVLON','$41
99')
Waterpik WP-660 Water Flosser Electric Dental Countertop Professional Oral Irrigator For Teeth, Aquarius, White
72,786
by Waterpik
$59.99
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Waterpik WP-660 Water Flosser Electric Dental Countertop Professional Oral Irrigator For Teeth, Aquarius, White','72,786','by Waterpik','$59.99')
Schick Hydro Silk Touch-Up Multipurpose Exfoliating Dermaplaning Tool, Eyebrow Razor, and Facial Razor with Precision Cover, 3 Count (Packaging May Vary)
113,269
by Schick Hydro Silk
$68
27
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Schick Hydro Silk Touch-Up Multipurpose Exfoliating Dermaplaning Tool, Eyebrow Razor, and Facial Razor with Precision Cover, 3 Count (Packaging May Vary)','113,269','by Schick Hydro Silk','$68
27')
Neutrogena Makeup Remover Cleansing Face Wipes, Daily Cleansing Facial Towelettes to Remove Waterproof Makeup and Mascara, Alcohol-Free, Value Twin Pack, 25 Count, 2 Pack
62,881
by Neutrogena
$4
99
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Neutrogena Makeup Remover Cleansing Face Wipes, Daily Cleansing Facial Towelettes to Remove Waterproof Makeup and Mascara, Alcohol-Free, Value Twin Pack, 25 Count, 2 Pack','62,881','by Neutrogena','$4
99')
Gillette Fusion Power Men's Razor Blades - 8 Refills
32,435
by Gillette
$6.99
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Gillette Fusion Power Men's Razor Blades - 8 Refills','32,435','by Gillette','$6.99')
Softsoap Moisturizing Liquid Hand Soap, Soothing Clean Aloe Vera - 7.5 Fluid Ounces (6 Pack)
47,238
by Softsoap
$8
12
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Softsoap Moisturizing Liquid Hand Soap, Soothing Clean Aloe Vera - 7.5 Fluid Ounces (6 Pack)','47,238','by Softsoap','$8
12')
Neutrogena Lightweight Body Oil for Dry Skin, Sheer Body Moisturizer in Light Sesame Formula, 16 fl. oz
13,011
by Neutrogena
$11.96
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Neutrogena Lightweight Body Oil for Dry Skin, Sheer Body Moisturizer in Light Sesame Formula, 16 fl. oz','13,011','by Neutrogena','$11.96')
Crest 3D White Whitestrips with Light, Teeth Whitening Strips Kit, 10 Treatments, 20 Individual Strips (Packaging May Vary)
9,172
by Crest
$23
91
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Crest 3D White Whitestrips with Light, Teeth Whitening Strips Kit, 10 Treatments, 20 Individual Strips (Packaging May Vary)','9,172','by Crest','$23
91')
Neutrogena Hydro Boost Hyaluronic Acid Hydrating Water Gel Daily Face Moisturizer for Dry Skin, Oil-Free, Non-Comedogenic Face Lotion, 1.7 fl. oz
51,417
by Neutrogena
$32.51
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Neutrogena Hydro Boost Hyaluronic Acid Hydrating Water Gel Daily Face Moisturizer for Dry Skin, Oil-Free, Non-Comedogenic Face Lotion, 1.7 fl. oz','51,417','by Neutrogena','$32.51')
CHI 44 Iron Guard Thermal Protection Spray 8 Fl Oz
17,660
by CHI
$5
33
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('CHI 44 Iron Guard Thermal Protection Spray 8 Fl Oz','17,660','by CHI','$5
33')
Crest 3D White Toothpaste Radiant Mint (3 Count of 4.1 oz Tubes), 12.3 oz (Packaging May Vary)
40,354
by Crest
$13.99
Here, inside the loop, you would execute INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price) VALUES ('Crest 3D White Toothpaste Radiant Mint (3 Count of 4.1 oz Tubes), 12.3 oz (Packaging May Vary)','40,354','by Crest','$13.99')
added a total of 12 records
REVISION
Here is what your code needs to look like.
Num_Groups = min(len(Product_Name),len(Product_Description),len(Vendor), len(Price))
con = sqlite3.connect('/home/mypc/Scrivania/aaaa/Database.db')
cursor = con.cursor()
records_added = 0
for i in range(Num_Groups):
    print(Product_Name[i].text)
    print(Product_Description[i].text)
    print(Vendor[i].text)
    print(Price[i].text)
    Values = f" VALUES ('{Product_Name[i].text}','{Product_Description[i].text}','{Vendor[i].text}','{Price[i].text}')"
    sqlite_insert_query = 'INSERT INTO TableExample1 (Product_Name, Product_Description, Vendor, Price)' + Values
    print("Here, inside the loop, you would execute " + sqlite_insert_query)
    count = cursor.execute(sqlite_insert_query)
    con.commit()
    print("Record inserted successfully ", cursor.rowcount)
    records_added = records_added + 1
cursor.close()
The body of code indented under the for: is executed for each value of i (in the case you mentioned, from 0 to 11). Right now, because your database insert is outside this loop, it is only executed once.
Really the last edit:
We will use a parameterized query to let the database engine handle the apostrophe in the values (and anything else I may have missed). From what I am reading, that is a safer approach anyway to help prevent SQL injection attacks.
Change the Values= line (keeping the same indentation) to
Values = (Product_Name[i].text, Product_Description[i].text, Vendor[i].text, Price[i].text)
Change the sqlite_insert_query = line to say (keeping the same indentation as now):
sqlite_insert_query = 'INSERT INTO TableExample1 (Product_NameDB, Product_DescriptionDB, VendorDB, PriceDB) VALUES (?, ?, ?, ?);'
change the count=cursor.execute line to say
count = cursor.execute(sqlite_insert_query, Values)
That should work better and be safer. If it works, there is some optional cleaning up you could do. For example you could set sqlite_insert_query outside the loop, back where you connect to the database and initialize variables. You could also stop printing sqlite_insert_query and print Values instead (or just not print any of it since you have the 4 lines earlier that print the values).
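Putting those three changes together, here is a self-contained sketch you can run without Selenium. The in-memory database, the table definition and the two sample rows are assumptions standing in for your Database.db and for the scraped Product_Name[i].text and friends.

```python
import sqlite3

# Made-up rows standing in for the scraped .text values; note the apostrophe.
rows = [
    ("Gillette Fusion Power Men's Razor Blades - 8 Refills", "32,435", "by Gillette", "$6.99"),
    ("Waterpik WP-660 Water Flosser", "72,786", "by Waterpik", "$59.99"),
]

con = sqlite3.connect(":memory:")  # stand-in for Database.db
cursor = con.cursor()
# Assumed schema, based on the field names given in the question.
cursor.execute(
    "CREATE TABLE TableExample1 (ID INTEGER PRIMARY KEY, Product_NameDB TEXT, "
    "Product_DescriptionDB TEXT, VendorDB TEXT, PriceDB TEXT)"
)

# The query text is constant, so it is built once, outside the loop.
sqlite_insert_query = (
    "INSERT INTO TableExample1 "
    "(Product_NameDB, Product_DescriptionDB, VendorDB, PriceDB) "
    "VALUES (?, ?, ?, ?);"
)

records_added = 0
for Values in rows:
    # The values travel separately from the SQL text, so quotes need no escaping.
    cursor.execute(sqlite_insert_query, Values)
    records_added = records_added + 1
con.commit()

stored = cursor.execute(
    "SELECT Product_NameDB FROM TableExample1 ORDER BY ID"
).fetchall()
print(f"Added a total of {records_added} records")
cursor.close()
con.close()
```

The Gillette row is the one that would have broken the f-string version, because its apostrophe ends the quoted SQL string early.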
Related
More efficient way to make dynamic sql queries than iteration
I have to run multiple SQL queries over entire tables and concatenate the results into one big data table. I have a dictionary where the key is a team name and the value is an acronym, which is the prefix of the corresponding MySQL table.
engine = create_engine('mysql+mysqlconnector://%s:%s@%s/%s' % (mysql_user, mysql_password, mysql_host, mysql_dbname), echo=False, pool_recycle=1800)
mysql_conn = engine.connect()
team_dfs = []
nba_dict = {'New York Knicks': 'nyk', 'Boston Celtics': 'bos', 'Golden State Warriors': 'gsw', 'New York Knicks': 'nyk'}
for name, abbr in nba_dict.items():
    query = f'''
    SELECT *
    from {abbr}_record
    '''
    df = pd.read_sql_query(query, mysql_conn)
    df['team_name'] = name
    team_dfs.append(df)
team_dfs = pd.concat(team_dfs)
Is there a better way to refactor this code and make it more efficient?
Your database layout, with a separate table for each team, is doomed to inefficiency whenever you need to retrieve data for more than one team at a time. You would be much better off putting all that data in one table, with a column naming the team associated with each row.
Why inefficient? More tables: more work. And more queries: more work.
I suggest you push back, hard, on the designer of this database table structure. Its design is, bluntly, wrong.
If you must live with this structure, I suggest you create the following view. It will fake the single-table approach and give you your "gold layer". You get away with this because pro sports franchises don't come and go that often. You do this just once in your database. (MySQL rejects an unqualified * next to other select items, hence the table-qualified form.)
CREATE OR REPLACE VIEW teams_record AS
SELECT 'nyk' AS team, nyk_record.* FROM nyk_record
UNION ALL
SELECT 'bos' AS team, bos_record.* FROM bos_record
UNION ALL
SELECT 'gsw' AS team, gsw_record.* FROM gsw_record
UNION ALL .... the other teams
Then you can do
SELECT * FROM teams_record ORDER BY team
to get all your data.
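If your nba_dict is fixed, you can use UNION ALL to combine the result tables in SQL by hand. Note that the dict in the question repeats the 'New York Knicks' key, so it holds only three entries. Selecting the abbreviation as an extra column lets you map each row back to its team name afterwards:
abbr = list(nba_dict.values())
query = f'''
SELECT '{abbr[0]}' AS abbr, {abbr[0]}_record.* from {abbr[0]}_record
UNION ALL
SELECT '{abbr[1]}' AS abbr, {abbr[1]}_record.* from {abbr[1]}_record
UNION ALL
SELECT '{abbr[2]}' AS abbr, {abbr[2]}_record.* from {abbr[2]}_record
'''
df = pd.read_sql_query(query, mysql_conn)
abbr_to_name = {v: k for k, v in nba_dict.items()}
df['team_name'] = df['abbr'].map(abbr_to_name)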
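A rough sketch of generating such a combined query from the dictionary instead of typing it out. It uses the standard-library sqlite3 module with made-up two-column tables as a stand-in for the MySQL database, so the schema and the sample rows are assumptions; with MySQL the placeholder style would be %s rather than ?.

```python
import sqlite3

nba_dict = {
    "New York Knicks": "nyk",
    "Boston Celtics": "bos",
    "Golden State Warriors": "gsw",
}

# In-memory stand-in for the real database; the columns are invented.
con = sqlite3.connect(":memory:")
for wins, abbr in enumerate(nba_dict.values()):
    con.execute(f"CREATE TABLE {abbr}_record (wins INTEGER, losses INTEGER)")
    con.execute(f"INSERT INTO {abbr}_record VALUES (?, 0)", (wins,))

# One SELECT per team, glued together with UNION ALL. Table names cannot be
# bound as parameters, so abbr must come from a trusted mapping like this one;
# the team name itself is passed as a bound value.
query = " UNION ALL ".join(
    f"SELECT ? AS team_name, {abbr}_record.* FROM {abbr}_record"
    for abbr in nba_dict.values()
)
rows = con.execute(query, list(nba_dict.keys())).fetchall()
print(rows)
con.close()
```

This sends one query instead of one per team, and every row arrives already tagged with its team name.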
How do I insert Lists in python into a SQL Table
I have to do some work with a SQL Database. I have done a bit of research on with video but I got very confused Could you please explain to me how I insert python lists index into a SQL database? I'm trying to make the Location a foreign key from its own table. This will allow for easier access in the future so that I can call all the cities from one country with a WHERE statement. I'm doing this so that I can scale this in the future to something else but I want to know how to do it with this example. LocationID - United Kingdom = 1 France = 2 Denmark = 3 Germany = 4 Place: London + LocationID1 This is how I want it to look. This is what I have so far: import sqlite3 Location = ["United Kingdom","Germany","France","Denmark"] Places = ["London","Tottenham","Munich","Paris","Frankfurt"] conn = sqlite3.connect("Location.db") c = conn.cursor() c.execute("INSERT INTO Places Location[0]) conn.commit() conn.close() This returns an error for me I don't know if I need to provide the database file. Just ask me if you need it to assist me
Run checks on Items from tables in Sqlite and python
I have two tables below: ---------- Items | QTY ---------- sugar | 14 mango | 10 apple | 50 berry | 1 ---------- Items |QTY ---------- sugar |10 mango |5 apple |48 berry |1 I use the following query in python to check difference between the QTY of table one and table two. cur = conn.cursor() cur.execute("select s.Items, s.qty - t.qty as quantity from Stock s join Second_table t on s.Items = t.Items;") remaining_quantity = cur.fetchall() I'm a bit stuck on how to go about what I need to accomplish. I need to check the difference between the quantity of table one and table two, if the quantity (difference) is under 5 then for those Items I want to be able to store this in another table column with the value 1 if not then the value will be 0 for those Items. How can I go about this? Edit: I have attempted this like by looping through the rows and if the column value is less than 5 then insert into the new table with the value below. : for row in remaining_quantity: print(row[1]) if((row[1]) < 5): cur.execute('INSERT OR IGNORE INTO check_quantity_tb VALUES (select distinct s.Items, s.qty, s.qty - t.qty as quantity, 1 from Stock s join Second_table t on s.Items = t.Items'), row) print(row) But I get a SQL syntax error not sure where the error could be :/
First modify your first query so you retrieve all the relevant info and don't have to issue subqueries later:
readcursor = conn.cursor()
readcursor.execute(
    "select s.Items, s.qty, s.qty - t.qty as remain "
    "from Stock s join Second_table t on s.Items = t.Items;"
)
Then use it to update your third table:
writecursor = conn.cursor()
for items, qty, remain in readcursor:
    print(remain)
    if remain < 5:
        writecursor.execute(
            'INSERT OR IGNORE INTO check_quantity_tb VALUES (?, ?, ?, ?)',
            (items, qty, remain, 1)
        )
conn.commit()
Note the following points:
1/ We use two distinct cursors so we can iterate over the first one while writing with the second one. This avoids fetching all results into memory, which can be a real life saver on huge datasets.
2/ When iterating on the first cursor, we unpack the rows into their individual components. This is called "tuple unpacking" (but actually works for most sequence types):
>>> row = ("1", "2", "3")
>>> a, b, c = row
>>> a
'1'
>>> b
'2'
>>> c
'3'
3/ We let the db-api module do the proper sanitisation and escaping of the values we want to insert. This avoids headaches with escaping / quoting etc. and protects your code from SQL injection attacks (not that you might have one here, but that's the correct way to write parameterized queries in Python).
NB : since you did not post your full table definitions nor clear specs, not even the full error message and traceback, I only translated your code snippet into something more sensible (avoiding the costly and useless subquery, which might or might not be the cause of your error). I can't guarantee it will work out of the box, but at least it should put you back on track.
NB2 : you mentioned you had to set the last column to either 1 or 0 depending on the remain value. If that's the case, you want your loop to be:
writecursor = conn.cursor()
for items, qty, remain in readcursor:
    print(remain)
    flag = 1 if remain < 5 else 0
    writecursor.execute(
        'INSERT OR IGNORE INTO check_quantity_tb VALUES (?, ?, ?, ?)',
        (items, qty, remain, flag)
    )
conn.commit()
If you instead only want to process rows where remain < 5, you can specify it directly in your first query with a where clause.
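As an alternative sketch, the same flagging can be pushed entirely into SQL with INSERT ... SELECT and a CASE expression, so no Python loop is needed. The table definitions below are guesses (the question did not post them), and the data is taken from the two tables shown in the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed schemas; only the table and column names used in the question are kept.
con.executescript("""
CREATE TABLE Stock (Items TEXT PRIMARY KEY, qty INTEGER);
CREATE TABLE Second_table (Items TEXT PRIMARY KEY, qty INTEGER);
CREATE TABLE check_quantity_tb (
    Items TEXT PRIMARY KEY, qty INTEGER, remain INTEGER, flag INTEGER
);
INSERT INTO Stock VALUES ('sugar', 14), ('mango', 10), ('apple', 50), ('berry', 1);
INSERT INTO Second_table VALUES ('sugar', 10), ('mango', 5), ('apple', 48), ('berry', 1);
""")

# flag is 1 when the difference is under 5, otherwise 0.
con.execute("""
INSERT OR IGNORE INTO check_quantity_tb
SELECT s.Items, s.qty, s.qty - t.qty,
       CASE WHEN s.qty - t.qty < 5 THEN 1 ELSE 0 END
FROM Stock s JOIN Second_table t ON s.Items = t.Items
""")
con.commit()

flags = dict(con.execute("SELECT Items, flag FROM check_quantity_tb"))
print(flags)
con.close()
```

Whether this beats the two-cursor loop depends on how much per-row logic you need in Python; for a plain threshold test the single statement is usually simpler.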
Find the rows of data in a database
I need some basic help with my code, I want to fetch the data from the sqlite3 database using with the variable self.channels_Index. I'm using self.channels_Index to defined it when I add up the value in each time when I pressing on the down arrow button of the keyboard. So when I try this: programs = list() #get the programs list profilePath = xbmc.translatePath(os.path.join('special://userdata/addon_data/script.tvguide', 'source.db')) con = database.connect(profilePath) cur = con.cursor() cur.execute('SELECT channel, title, start_date, stop_date FROM programs WHERE channel=?', [self.channels_Index]) programs = cur.fetchall() start_pos = 375 # indent for first program print program for ind, row in enumerate(programs): program = row[1].encode('ascii'), str(row[2]), str(row[3]) I will get the result like this: 17:58:26 T:4976 NOTICE: [] My database: ABC Family ABC Family ABC Family ABC Family ABC Family ..etc until to 69 rows CBS CBS CBS CBS CBS ..etc until to 69 rows ESPN ESPN ESPN ESPN ESPN ..etc until to 69 rows FOX NEWS FOX NEWS FOX NEWS FOX NEWS FOX NEWS FOX NEWS ..etc until to 69 rows What I want to achieve is to find the rows in the database when self.channels_Index show as 7 to or whatever it is that I want to multiply it by 69 times then I want to get the 69 rows of data before print a list of data. Example: When the self.channels_Index show as 4, I want to find the rows of FOX NEWS in a database to get the whole data in 69 times then print a list of FOX NEWS data. Can you please help me how I can do that when using with self.channels_Index? EDIT: When I try this: cur = con.cursor() cur.execute('SELECT DISTINCT channel FROM programs;') channel_list = sorted(row[0] for row in cur.fetchall()) cur.execute('SELECT title, start_date, stop_date FROM programs WHERE channel=?;', channel_list[self.channels_Index]) programs = cur.fetchall() It give me the error: ProgrammingError: Incorrect number of bindings supplied. 
The current statement uses 1, and there are 10 supplied.
It looks like you're using the channels_Index variable to refer to the ith channel, sorted alphabetically in ascending order. In a database, there is no guaranteed sort order, so you would have to figure out the name of the ith channel first. The pure SQL way would be to use a query that explicitly sorts the data; for example, this query would get you the third channel, in alphabetical order:
SELECT DISTINCT channel FROM programs ORDER BY channel ASC LIMIT 1 OFFSET 2;
But you want to be able to browse through different channels, which would mean running that lookup again with a different OFFSET every time the index changes. A better solution might be to just get all the channel names first, and sort them in Python:
cur.execute('SELECT DISTINCT channel FROM programs;')
channel_list = sorted(row[0] for row in cur.fetchall())
Now you can refer to the ith channel by applying the index to channel_list and passing the result to your query:
cur.execute('SELECT title, start_date, stop_date FROM programs WHERE channel=?;', (channel_list[self.channels_Index],))
Note the trailing comma: the parameters must be a sequence, here a one-element tuple. You don't need to select channel since you know its value already. This approach works no matter how many rows there are in your database corresponding to each channel.
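The "Incorrect number of bindings" error from the question's edit can be reproduced in isolation. In this sketch (the table contents are made up), a bare string is passed as the parameters, so each of its characters is counted as a separate value; "ABC Family" has 10 characters, which matches the "10 supplied" in the message.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE programs (channel TEXT, title TEXT)")
con.executemany(
    "INSERT INTO programs VALUES (?, ?)",
    [("ABC Family", "Show A"), ("ESPN", "Game 1"), ("ESPN", "Game 2")],
)

channel = "ABC Family"
query = "SELECT title FROM programs WHERE channel=?"

# Wrong: a str is itself a sequence, so it is read as one parameter per character.
try:
    con.execute(query, channel)
    error = None
except sqlite3.Error as exc:
    error = str(exc)
print(error)

# Right: wrap the single value in a one-element tuple.
titles = con.execute(query, (channel,)).fetchall()
print(titles)
con.close()
```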
Update PostgreSQL database with daily stock prices in Python
So I found a great script over at QuantState that had a great walk-through on setting up my own securities database and loading in historical pricing information. However, I'm now trying to modify the script so that I can run it daily and have the latest stock quotes added.
I adjusted the initial data load to download just 1 week's worth of historicals, but I've been having issues with writing the SQL statement that checks whether the row already exists before adding it. Can anyone help me out with this? Here's what I have so far:
def insert_daily_data_into_db(data_vendor_id, symbol_id, daily_data):
    """Takes a list of tuples of daily data and adds it to the
    database. Appends the vendor ID and symbol ID to the data.

    daily_data: List of tuples of the OHLC data (with
    adj_close and volume)"""

    # Create the time now
    now = datetime.datetime.utcnow()

    # Amend the data to include the vendor ID and symbol ID
    daily_data = [(data_vendor_id, symbol_id, d[0], now, now,
                   d[1], d[2], d[3], d[4], d[5], d[6]) for d in daily_data]

    # Create the insert strings
    column_str = """data_vendor_id, symbol_id, price_date, created_date,
                 last_updated_date, open_price, high_price, low_price,
                 close_price, volume, adj_close_price"""
    insert_str = ("%s, " * 11)[:-2]
    final_str = "INSERT INTO daily_price (%s) VALUES (%s) WHERE NOT EXISTS (SELECT 1 FROM daily_price WHERE symbol_id = symbol_id AND price_date = insert_str[2])" % (column_str, insert_str)

    # Using the postgre connection, carry out an INSERT INTO for every symbol
    with con:
        cur = con.cursor()
        cur.executemany(final_str, daily_data)
Some notes regarding your code above:
It's generally easier to defer to now() in pure SQL than to try in Python whenever possible. It avoids lots of potential pitfalls with timezones, library differences, etc.
If you construct a list of columns, you can dynamically generate a string of %s's based on its size, and don't need to hardcode the length into a repeated string which is then sliced.
Since it appears that insert_daily_data_into_db is meant to be called from within a loop on a per-stock basis, I don't believe you need executemany here; a plain execute per row is easier to follow.
You were comparing symbol_id to itself in the sub select, instead of to a particular value (which means the comparison is always true).
To prevent possible SQL injection, you should always pass values as query parameters, including those in the WHERE clause and in sub selects, rather than formatting them into the string.
An INSERT ... VALUES statement cannot take a WHERE clause; the conditional form is INSERT ... SELECT ... WHERE NOT EXISTS.
Note: I'm assuming that you're using psycopg2 to access Postgres, and that the primary key for the table is a tuple of (symbol_id, price_date). If not, the code below would need to be tweaked at least a bit.
With those points in mind, try something like this (untested, since I don't have your data, db, etc. but it is syntactically valid Python):
def insert_daily_data_into_db(data_vendor_id, symbol_id, daily_data):
    """Takes a list of tuples of daily data and adds it to the
    database. Appends the vendor ID and symbol ID to the data.

    daily_data: List of tuples of the OHLC data (with
    adj_close and volume)"""
    column_list = ["data_vendor_id", "symbol_id", "price_date", "created_date",
                   "last_updated_date", "open_price", "high_price", "low_price",
                   "close_price", "volume", "adj_close_price"]
    # created_date and last_updated_date are filled in by now() on the SQL side
    select_list = ["now()" if col in ("created_date", "last_updated_date") else "%s"
                   for col in column_list]
    final_str = """INSERT INTO daily_price ({0})
                   SELECT {1}
                   WHERE NOT EXISTS (SELECT 1 FROM daily_price
                                     WHERE symbol_id = %s AND price_date = %s)""".format(
        ', '.join(column_list), ', '.join(select_list))

    # Using the postgre connection, carry out one INSERT per daily row
    with con:
        cur = con.cursor()
        for d in daily_data:
            # nine values for the SELECT list, then two for the sub select
            values_tuple = (data_vendor_id, symbol_id, d[0],
                            d[1], d[2], d[3], d[4], d[5], d[6],
                            symbol_id, d[0])
            cur.execute(final_str, values_tuple)
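The insert-only-if-missing pattern can be tried out without a Postgres server. The sketch below uses the standard-library sqlite3 module as a stand-in, with a reduced three-column daily_price table and a hypothetical helper named insert_if_missing, both invented for illustration; sqlite3 uses ? placeholders where psycopg2 uses %s.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Reduced, assumed schema: only the columns needed to show the pattern.
con.execute(
    "CREATE TABLE daily_price (symbol_id INTEGER, price_date TEXT, close_price REAL)"
)

insert_sql = (
    "INSERT INTO daily_price (symbol_id, price_date, close_price) "
    "SELECT ?, ?, ? "
    "WHERE NOT EXISTS (SELECT 1 FROM daily_price "
    "WHERE symbol_id = ? AND price_date = ?)"
)


def insert_if_missing(symbol_id, price_date, close_price):
    """Insert the row unless (symbol_id, price_date) is already stored."""
    # Three values for the SELECT list, then two for the sub select.
    con.execute(insert_sql, (symbol_id, price_date, close_price, symbol_id, price_date))


insert_if_missing(1, "2015-01-02", 10.5)
insert_if_missing(1, "2015-01-02", 10.5)  # same day again: skipped
insert_if_missing(1, "2015-01-05", 10.7)
con.commit()

count = con.execute("SELECT COUNT(*) FROM daily_price").fetchone()[0]
print(count)
con.close()
```

If the table really has (symbol_id, price_date) as its primary key, the database would also reject duplicates on its own; the NOT EXISTS check just avoids raising an error for them.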