This application will read the mailbox data (mbox.txt) and count the number of email messages per organization (i.e. the domain name of the email address), using a database with the following schema to maintain the counts.
CREATE TABLE Counts (org TEXT, count INTEGER)
When you have run the program on mbox.txt, upload the resulting database file above for grading.
If you run the program multiple times in testing or with different files, make sure to empty out the data before each run.
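One way to empty out the data is to drop and recreate the table at the start of every run, which is what a DROP TABLE IF EXISTS line at the top of the program does. This is a minimal sketch, assuming an in-memory database for illustration (the assignment would use a file such as 'emaildb.sqlite'); reset_counts is a hypothetical helper name.

```python
import sqlite3

def reset_counts(conn):
    """Hypothetical helper: drop and recreate Counts so each run starts empty."""
    cur = conn.cursor()
    cur.execute('DROP TABLE IF EXISTS Counts')
    cur.execute('CREATE TABLE Counts (org TEXT, count INTEGER)')
    conn.commit()

conn = sqlite3.connect(':memory:')  # stand-in for a real database file
reset_counts(conn)
conn.execute("INSERT INTO Counts (org, count) VALUES ('example.org', 3)")
reset_counts(conn)  # a second "run" starts from an empty table again
print(conn.execute('SELECT COUNT(*) FROM Counts').fetchone()[0])  # 0
```

Running DELETE FROM Counts instead would also empty the table while keeping its definition.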
You can use this code as a starting point for your application: http://www.pythonlearn.com/code/emaildb.py. The data file for this application is the same as in previous assignments: http://www.pythonlearn.com/code/mbox.txt.
This is my first time learning SQLite, and I am very confused about this assignment, although it seems easy. I don't know how to connect Python code to SQLite. It seems that they don't need the code for the assignment; all they need is the database file. How should I approach this problem? I don't know how to start. Any help is much appreciated!
The starting code you've been given is a really good template for what you want to do. The difference is that, in that example, you're counting occurrences of email addresses, and in this problem you're counting domains.
First thing to do is think about how to get domain names from email addresses. Building from the code given (which sets email = pieces[1]):
domain = email.split('@')[1]
This splits the email address on the @ character and returns the second item (the part after the '@'), which is the domain: the thing you want to count.
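As a quick check of that split, here is a small sketch that pulls the domain out of a single 'From: ' line. get_domain is a hypothetical helper and the address is made up for illustration; note that split('@')[1] raises IndexError if a line has no '@'.

```python
def get_domain(from_line):
    """Hypothetical helper: return the domain of the address on a 'From: ' line."""
    pieces = from_line.split()
    email = pieces[1]            # the address is the second word on the line
    return email.split('@')[1]   # everything after '@' is the domain

print(get_domain('From: alice@example.com'))  # example.com
```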
After this, go through the SQL statements in the code and replace 'email' with 'domain', so that you're counting the right thing.
One last thing: the template code defaults to 'mbox-short.txt', so you'll need to change that to the file you want.
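Putting those steps together, here is a sketch of the whole counting loop using the same SELECT / INSERT / UPDATE pattern as the template, but keyed on the domain. It is written as a function over any iterable of lines so it can be tried on a few made-up sample lines with an in-memory database; count_orgs and the sample addresses are hypothetical.

```python
import sqlite3

def count_orgs(conn, lines):
    """Hypothetical helper: count 'From: ' lines per domain in Counts(org, count)."""
    cur = conn.cursor()
    cur.execute('DROP TABLE IF EXISTS Counts')  # start from empty data on every run
    cur.execute('CREATE TABLE Counts (org TEXT, count INTEGER)')
    for line in lines:
        if not line.startswith('From: '):
            continue
        org = line.split()[1].split('@')[1]  # domain part of the address
        cur.execute('SELECT count FROM Counts WHERE org = ?', (org,))
        row = cur.fetchone()
        if row is None:
            cur.execute('INSERT INTO Counts (org, count) VALUES (?, 1)', (org,))
        else:
            cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?', (org,))
    conn.commit()  # one commit after the loop is much faster than one per line

sample = [
    'From: alice@example.com',
    'Subject: hello',
    'From: bob@example.com',
    'From: carol@example.org',
]
conn = sqlite3.connect(':memory:')
count_orgs(conn, sample)
for org, count in conn.execute('SELECT org, count FROM Counts ORDER BY count DESC'):
    print(org, count)
# example.com 2
# example.org 1
```

For the real assignment you would pass a file connection and an open file instead, e.g. count_orgs(sqlite3.connect('emaildb.sqlite'), open('mbox.txt')), and then upload the resulting .sqlite file.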
import sqlite3
conn = sqlite3.connect('emaildb2.sqlite')
cur = conn.cursor()
cur.execute('''
DROP TABLE IF EXISTS Counts''')
cur.execute('''
CREATE TABLE Counts (org TEXT, count INTEGER)''')
fname = input('Enter file name: ')
if len(fname) < 1: fname = 'mbox.txt'
fh = open(fname)
for line in fh:
    if not line.startswith('From: '): continue
    pieces = line.split()
    email = pieces[1]
    dom = email.find('@')
    org = email[dom + 1:]  # everything after '@' is the organization (domain)
    cur.execute('SELECT count FROM Counts WHERE org = ? ', (org,))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (org, count)
                VALUES (?, 1)''', (org,))
    else:
        cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?',
                    (org,))
# commit once after the loop; committing on every line is much slower on mbox.txt
conn.commit()
# https://www.sqlite.org/lang_select.html
sqlstr = 'SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'
for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])
cur.close()
I am still new here, but I want to thank Stidgeon for pointing me in the right direction. I suspect other Using Databases with Python students will end up here too.
There are two things you need to do with the source code at http://www.pythonlearn.com/code/emaildb.py:
1. Add domain = email.split('@')[1] to extract the domain from each address.
2. Change email TEXT to org TEXT where the table is created.
That should get you on your way.
import sqlite3
conn = sqlite3.connect('emaildb.sqlite')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('''
CREATE TABLE Counts (org TEXT, count INTEGER)''')
fname = input('Enter file name: ')
if len(fname) < 1: fname = 'mbox-short.txt'
fh = open(fname)
for line in fh:
    if not line.startswith('From: '): continue
    pieces = line.split()
    org = pieces[1].split('@')  # ['user', 'domain']; org[1] is the domain
    cur.execute('SELECT count FROM Counts WHERE org = ? ', (org[1],))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (org, count)
                VALUES (?, 1)''', (org[1],))
    else:
        cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?',
                    (org[1],))
conn.commit()
# https://www.sqlite.org/lang_select.html
sqlstr = 'SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'
for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])
cur.close()
print('-----------------done----------------')
I have a Python script that reads an Excel column row by row and returns all rows as str(values).
I want to write another script that puts these values into a SQL database. I've already written a connect method:
import cx_Oracle
def db_connect():
    adr = 'some_addr'
    uid = 'some_uid'
    pwd = 'pwd'
    port = port
    dsn_tns = cx_Oracle.makedsn(adr, port, SID)
    db = cx_Oracle.connect('username', 'pass', dsn_tns)
    cur = db.cursor()
    cur.execute('update TABLE set ROW = 666 where ANOTHER_ROW is null')
    db.commit()
This method does the update, but it sets 666 for ALL rows. How can I do this with some kind of iteration in SQL, so that each row gets its own value? For example, first row of output == 1, second == 23, third == 888.
If I understand correctly what you are trying to do, it should be done in two phases. First select all rows to update (based on the chosen condition), then iteratively update each of those rows individually.
It cannot be done with a single query (or with a single condition that does not change across a number of queries), because SQL works on sets. That's why each time your query is executed you update every matching row, and in the end you only see the result of the last query.
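The two-phase approach can be sketched with Python's built-in sqlite3 module so it runs without an Oracle server. The table items and columns col_a / col_b are hypothetical stand-ins for TABLE, ROW and ANOTHER_ROW, and SQLite's rowid plays the role of the per-row key (in Oracle you might use ROWID or a primary key, and cx_Oracle uses :name or :1 style bind placeholders instead of ?).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
# Hypothetical table: col_a is the column to set, col_b is the filter column
cur.execute('CREATE TABLE items (col_a INTEGER, col_b TEXT)')
cur.executemany('INSERT INTO items (col_a, col_b) VALUES (?, ?)',
                [(None, None), (None, None), (None, 'x'), (None, None)])

new_values = [1, 23, 888]  # one value per selected row, as in the question

# Phase 1: select the keys of the rows that match the condition
cur.execute('SELECT rowid FROM items WHERE col_b IS NULL ORDER BY rowid')
row_ids = [r[0] for r in cur.fetchall()]

# Phase 2: update each row on its own key, so earlier updates are not overwritten
for row_id, value in zip(row_ids, new_values):
    cur.execute('UPDATE items SET col_a = ? WHERE rowid = ?', (value, row_id))
conn.commit()

print(cur.execute('SELECT rowid, col_a, col_b FROM items ORDER BY rowid').fetchall())
# [(1, 1, None), (2, 23, None), (3, None, 'x'), (4, 888, None)]
```

The key point is the WHERE clause in phase 2: it names exactly one row, unlike the original "where ANOTHER_ROW is null", which matches every row on every iteration.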
You can use the "rownum" expression, as in:
cur.execute("update TABLE set ROW = rownum where ANOTHER_ROW is null")
This will start with the value 1 and increment up by one for each row updated.
If you want more control over the value to set, you can also do the following in PL/SQL (untested):
cur.execute("""
    declare
        t_NewValue number;
        cursor c_Data is
            select ROW, ANOTHER_ROW
            from TABLE
            where ANOTHER_ROW is null
            for update;
    begin
        t_NewValue := 1;
        for row in c_Data loop
            update TABLE set ROW = t_NewValue
            where current of c_Data;
            t_NewValue := t_NewValue + 1;
        end loop;
    end;""")
This gives you the most control. You can use whatever logic you require to control what the new value should be.
Please take a look at another method, which takes the values read from Excel and writes them to the database:
adr = 'some_addr'
uid = 'some_uid'
pwd = 'pwd'
port = port
dsn_tns = cx_Oracle.makedsn(adr, port, SID)
db = cx_Oracle.connect('username', 'pass', dsn_tns)
cur = db.cursor()
cells = excel.read_from_cell()
indices_and_statuses = []
stat = execute_script(some_js)
for req_id in cells:
    indices_and_statuses.append((cells.index(req_id), stat))
    cur.execute("""update TABLE set ROW ="""+"'"+req_id+"'"+"""where ANOTHER_ROW is null""")
db.commit()
db.close()
In this code, if you put print(req_id) inside the for loop, you can see that req_id changes on each iteration. But in the database only the last req_id is saved.
import re
import sqlite3
from collections import Counter
from string import punctuation
from math import sqrt
# initialize the connection to the database
connection = sqlite3.connect('chatbot.sqlite')
cursor = connection.cursor()
# create the tables needed by the program
create_table_request_list = [
    'CREATE TABLE words(word TEXT UNIQUE)',
    'CREATE TABLE sentences(sentence TEXT UNIQUE, used INT NOT NULL DEFAULT 0)',
    'CREATE TABLE associations (word_id INT NOT NULL, sentence_id INT NOT NULL, weight REAL NOT NULL)',
]
for create_table_request in create_table_request_list:
    try:
        cursor.execute(create_table_request)
    except sqlite3.OperationalError:
        pass  # the table already exists from a previous run
def get_id(entityName, text):
    """Retrieve an entity's unique ID from the database, given its associated text.
    If the row is not already present, it is inserted.
    The entity can either be a sentence or a word."""
    tableName = entityName + 's'
    columnName = entityName
    cursor.execute('SELECT rowid FROM ' + tableName + ' WHERE ' + columnName + ' = ?', (text,))
    row = cursor.fetchone()
    if row:
        return row[0]
    else:
        cursor.execute('INSERT INTO ' + tableName + ' (' + columnName + ') VALUES (?)', (text,))
        return cursor.lastrowid
def get_words(text):
    """Retrieve the words present in a given string of text.
    The return value is a list of tuples where the first member is a lowercase word,
    and the second member the number of time it is present in the text."""
    wordsRegexpString = r'(?:\w+|[' + re.escape(punctuation) + ']+)'
    wordsRegexp = re.compile(wordsRegexpString)
    wordsList = wordsRegexp.findall(text.lower())
    return Counter(wordsList).items()
B = 'Hello!'
while True:
    # output bot's message
    print('B: ' + B)
    # ask for user input; if blank line, exit the loop
    H = raw_input('H: ').strip()
    if H == '':
        break
    # store the association between the bot's message words and the user's response
    words = get_words(B)
    words_length = sum([n * len(word) for word, n in words])
    sentence_id = get_id('sentence', H)
    for word, n in words:
        word_id = get_id('word', word)
        weight = sqrt(n / float(words_length))
        cursor.execute('INSERT INTO associations VALUES (?, ?, ?)', (word_id, sentence_id, weight))
    connection.commit()
    # retrieve the most likely answer from the database
    cursor.execute('CREATE TEMPORARY TABLE results(sentence_id INT, sentence TEXT, weight REAL)')
    words = get_words(H)
    words_length = sum([n * len(word) for word, n in words])
    for word, n in words:
        weight = sqrt(n / float(words_length))
        cursor.execute('INSERT INTO results SELECT associations.sentence_id, sentences.sentence, ?*associations.weight/(4+sentences.used) FROM words INNER JOIN associations ON associations.word_id=words.rowid INNER JOIN sentences ON sentences.rowid=associations.sentence_id WHERE words.word=?', (weight, word,))
    # if matches were found, give the best one
    cursor.execute('SELECT sentence_id, sentence, SUM(weight) AS sum_weight FROM results GROUP BY sentence_id ORDER BY sum_weight DESC LIMIT 1')
    row = cursor.fetchone()
    cursor.execute('DROP TABLE results')
    # otherwise, just randomly pick one of the least used sentences
    if row is None:
        cursor.execute('SELECT rowid, sentence FROM sentences WHERE used = (SELECT MIN(used) FROM sentences) ORDER BY RANDOM() LIMIT 1')
        row = cursor.fetchone()
    # tell the database the sentence has been used once more, and prepare the sentence
    B = row[1]
    cursor.execute('UPDATE sentences SET used=used+1 WHERE rowid=?', (row[0],))
This is code written to create a chatbot. When I try to run it from cmd using the command python chatbot.py, it returns an error saying invalid syntax.
Is there any way I can fix this error and run this code on my system?
The error is: File "chatbot.py", line 1 syntax: invalid syntax
What version of Python are you running, and in what environment? I ran this code on Python 3.7.0b4 under Windows and it worked fine except for line 52:
H = raw_input('H: ').strip()
Which you have to change to:
H = input('H: ').strip()
This is probably not directly related to your issue, but the code you posted ran fine in my environment after I made that one change (and, of course, installed any libraries or modules needed).
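If the same script has to run under both Python 2 and Python 3, a common shim is to alias input to raw_input only when raw_input exists. This is a small sketch of that idiom, not something from the original answer; with it in place near the top of chatbot.py, the H = input('H: ').strip() line works unchanged on both versions.

```python
try:
    input = raw_input  # Python 2: raw_input returns the typed line as a string
except NameError:
    pass               # Python 3: raw_input does not exist; input already returns a string

# H = input('H: ').strip()  # the changed chatbot line then works on either version
```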
I am working in Python. I want to update every row of a SQL table with a new value.
My code is:
val = cursor.execute("select id from tweeter1")
words = processedRow.split()
fdist2 = len(words)
for id1 in val:
    cursor.execute("""UPDATE TWEETER1 SET t1=%s where id = %s""",(fdist2,id1))
db.commit()
When I execute this code, I get an error saying:
for id1 in val:
TypeError: 'long' object is not iterable
Any help will be highly appreciated. Thank you!
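You have to iterate over the rows fetched from the cursor, not over the return value of execute() (in MySQLdb that is the row count, hence the 'long' error):
cursor.execute("select id from tweeter1")
words = processedRow.split()
fdist2 = len(words)
for (id1,) in cursor.fetchall():  # each row is a 1-tuple, so unpack the id
    cursor.execute("""UPDATE TWEETER1 SET t1=%s where id = %s""", (fdist2, id1))
db.commit()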
But as long as you change all rows to the same value, you only need to UPDATE once without any where-clause:
words = processedRow.split()
fdist2 = len(words)
cursor.execute("""UPDATE TWEETER1 SET t1=%s""",(fdist2,))
db.commit()
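Both versions can be tried without a MySQL server using Python's built-in sqlite3 module, sketched below. Two differences from MySQLdb are worth noting: sqlite3 uses ? placeholders instead of %s, and sqlite3's Cursor.execute() returns the cursor itself, while MySQLdb's returns the number of rows, which is why iterating over val failed. The table contents and the sample sentence are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE tweeter1 (id INTEGER PRIMARY KEY, t1 INTEGER)')
cursor.executemany('INSERT INTO tweeter1 (id) VALUES (?)', [(1,), (2,), (3,)])

fdist2 = len('an example processed row'.split())  # stand-in for len(words)

# Per-row update: fetch all ids first, then unpack each 1-tuple row
cursor.execute('SELECT id FROM tweeter1')
for (id1,) in cursor.fetchall():
    cursor.execute('UPDATE tweeter1 SET t1 = ? WHERE id = ?', (fdist2, id1))

# Same end result with a single statement, since every row gets the same value
cursor.execute('UPDATE tweeter1 SET t1 = ?', (fdist2,))
conn.commit()

print(cursor.execute('SELECT id, t1 FROM tweeter1 ORDER BY id').fetchall())
# [(1, 4), (2, 4), (3, 4)]
```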
I'm trying to add the list that is produced as the program parses each line to the database. Each change I make to the code gives me a different error:
C:\Users\myname\Desktop\pythonCourse>dblesson2
Enter file name: mbox.txt
['uct.ac.za']
Traceback (most recent call last):
  File "C:\Users\myname\Desktop\pythonCourse\dblesson2.py", line 25, in <module>
    #VALUES ( ?, 1 )''', ( email, ) )
sqlite3.OperationalError: near "#VALUES": syntax error
I know that it is because I am not passing the correct data to the database, but I can't figure this out on my own.
import sqlite
import re
conn = sqlite3.connect('emaildb.sqlite')
cur = conn.cursor()
cur.execute('''
DROP TABLE IF EXISTS Counts''')
cur.execute('''
CREATE TABLE Counts (email TEXT, count INTEGER)''')
fname = raw_input('Enter file name: ')
if ( len(fname) < 1 ) : fname = 'mbox-short.txt'
fh = open(fname)
for line in fh:
if not line.startswith('From: ') : continue
line = line.rstrip()
email = re.findall('@(\S+[a-zA-Z]+)', line)
print email
cur.execute('SELECT count FROM Counts WHERE email = ? ', (email))
row = cur.fetchone()
if row is None:
#cur.execute('''INSERT INTO Counts (email, count)
#VALUES ( ?, 1 )''', ( email, ) )
else :
cur.execute('UPDATE Counts SET count=count+1 WHERE email = ?',
(email, ))
# This statement commits outstanding changes to disk each
# time through the loop - the program can be made faster
# by moving the commit so it runs only after the loop completes
conn.commit()
# https://www.sqlite.org/lang_select.html
sqlstr = 'SELECT email, count FROM Counts ORDER BY count DESC LIMIT 10'
print
print "Counts:"
for row in cur.execute(sqlstr) :
print str(row[0]), row[1]
cur.close()
You have a number of small errors in your program. Let me try to list them:
re.findall returns a list, but you seem to treat it as a single string. Try email = email[0] to only consider the first element of the list.
Your first SELECT statement has (email). Putting a single item inside parentheses does not make it a tuple. Try (email,) or [email] instead.
The if after the for loop is meant to occur for each iteration of the for loop, so it must be indented by one stop.
The body of the if cannot be empty. Either uncomment that operation, or change it to pass.
The body of your final for loop needs to be indented one stop.
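The first two points are easy to check in an interpreter. This short sketch (Python 3, with a made-up address) shows that re.findall always returns a list, and that only a trailing comma, not the parentheses, turns a value into a tuple.

```python
import re

line = 'From: alice@example.com'
found = re.findall(r'@(\S+[a-zA-Z]+)', line)
print(found)           # ['example.com']: a list, even with a single match
email = found[0]       # take the first (and only) match as a plain string

print(type((email)))   # <class 'str'>: parentheses alone do not make a tuple
print(type((email,)))  # <class 'tuple'>: the trailing comma does
```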
As a courtesy to Stack Overflow readers, please copy-paste entire stand-alone programs, not merely snippets.
Here is your program after I fixed the problems:
import sqlite3
import re
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''
DROP TABLE IF EXISTS Counts''')
cur.execute('''
CREATE TABLE Counts (email TEXT, count INTEGER)''')
fname = raw_input('Enter file name: ')
if ( len(fname) < 1 ) : fname = 'mbox-short.txt'
fh = open(fname)
for line in fh:
    if not line.startswith('From: ') : continue
    line = line.rstrip()
    email = re.findall(r'@(\S+[a-zA-Z]+)', line)
    email = email[0]
    cur.execute('SELECT count FROM Counts WHERE email = ? ', (email,))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (email, count)
                VALUES ( ?, 1 )''', ( email, ) )
    else :
        cur.execute('UPDATE Counts SET count=count+1 WHERE email = ?',
                    (email, ))
    # This statement commits outstanding changes to disk each
    # time through the loop - the program can be made faster
    # by moving the commit so it runs only after the loop completes
    conn.commit()
# https://www.sqlite.org/lang_select.html
sqlstr = 'SELECT email, count FROM Counts ORDER BY count DESC LIMIT 10'
print
print "Counts:"
for row in cur.execute(sqlstr) :
    print str(row[0]), row[1]
cur.close()
import sqlite3
conn=sqlite3.connect('emaildb.sqlite')
cur=conn.cursor()
cur.execute('''DROP TABLE IF EXISTS counts''')
cur.execute('''CREATE TABLE counts (org TEXT, count INTEGER)''')
f_name=raw_input('Enter file name: ')
if len(f_name)<1 : f_name='mbox.txt'
fn=open(f_name)
for line in fn:
    if not line.startswith('From: ') : continue
    words = line.split()
    email=words[1]
    domain=email.split('@')
    organiz=domain[1]
    print organiz
    cur.execute('SELECT count FROM counts WHERE org=?',(organiz, ))
    row=cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO counts (org, count) VALUES (?,1)''',
                    (organiz, ))
    else:
        cur.execute('''UPDATE counts SET count=count+1 WHERE org=?''',(organiz,))
conn.commit()
I'm writing a script to clean up some data. It's super unoptimized, but this cursor is returning the number of results of the LIKE query rather than the rows. What am I doing wrong?
#!/usr/bin/python
import re
import MySQLdb
import collections
db = MySQLdb.connect(host="localhost",    # your host, usually localhost
                     user="admin",        # your username
                     passwd="",           # your password
                     db="test")           # name of the data base
# you must create a Cursor object. It will let
# you execute all the query you need
cur = db.cursor()
# Use all the SQL you like
cur.execute("SELECT * FROM vendor")
seen = []
# collect every word from the second column of each row
for row in cur.fetchall() :
    for word in row[1].split(' '):
        seen.append(word)
_digits = re.compile(r'\d')
def contains_digits(d):
    return bool(_digits.search(d))
count_word = collections.Counter(seen)
found_multi = [i for i in count_word if count_word[i] > 1 and not contains_digits(i) and len(i) > 1]
unique_multiples = list(found_multi)
groups = dict()
for word in unique_multiples:
    like_str = '%' + word + '%'
    res = cur.execute("""SELECT * FROM vendor where name like %s""", (like_str,))
You are storing the result of cur.execute(), which is the number of rows. You are never actually fetching any of the results.
Use .fetchall() to get all result rows or iterate over the cursor after executing:
for word in unique_multiples:
    like_str = '%' + word + '%'
    cur.execute("""SELECT * FROM vendor where name like %s""", (like_str,))
    for row in cur:
        print row
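If the goal of the unused groups dict is to map each repeated word to its matching vendor rows (an assumption; the question doesn't say), the fetch can feed it directly. This sketch uses Python's built-in sqlite3 module in place of MySQLdb so it runs standalone, which means ? placeholders instead of %s; the vendor rows are made up. Either way, parameters are passed as a sequence such as (like_str,).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE vendor (id INTEGER PRIMARY KEY, name TEXT)')
cur.executemany('INSERT INTO vendor (name) VALUES (?)',
                [('Acme Supply',), ('Acme Tools',), ('Globex Supply',)])

unique_multiples = ['Acme', 'Supply']  # stand-in for the repeated words found above
groups = {}
for word in unique_multiples:
    like_str = '%' + word + '%'
    # execute() only runs the query; fetchall() retrieves the matching rows
    cur.execute('SELECT * FROM vendor WHERE name LIKE ? ORDER BY id', (like_str,))
    groups[word] = cur.fetchall()

print(groups['Acme'])    # [(1, 'Acme Supply'), (2, 'Acme Tools')]
print(groups['Supply'])  # [(1, 'Acme Supply'), (3, 'Globex Supply')]
```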