I'm not looking for a "best" or most efficient script to do this. But I was wondering if there exists a script to pull Internet History for a day's time from, say, Google Chrome and log it to a txt file. I'd prefer if it were in Python or MATLAB.
If you guys have a different method using one of these languages utilizing locally stored browser history data from Google Chrome, I'd be all ears for that too.
I'd be super-thankful if anyone could help with this!
From my understanding, this should be easy to do, though I'm not sure it's exactly what you want.
Chrome stores its browsing history at a fixed path. On Windows 7, for example, it lives at C:\Users\[username]\AppData\Local\Google\Chrome\User Data\Default\History
In Python:
# Use a raw string so the backslashes in the Windows path aren't treated as escape sequences
with open(r'C:\Users\[username]\AppData\Local\Google\Chrome\User Data\Default\History', 'rb') as f:
    data = f.read()

with open('your_expected_file_path', 'w') as f:
    f.write(repr(data))
Building on what m170897017 said:
That file is an sqlite3 database, so taking repr() of its contents won't do anything meaningful.
You need to open the SQLite database and run SQL against it to get the data out. In Python, use the sqlite3 module from the standard library to do this.
Here's a related SuperUser question that shows some SQL for getting URLs and timestamps: https://superuser.com/a/694283
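To make that concrete, here is a minimal sketch of the approach (the path, output filename and column choice are just examples; the SQL mirrors the timestamp conversion shown in that SuperUser answer):
import sqlite3

# Path to Chrome's History database on Windows (adjust [username]); Chrome may
# lock this file while the browser is running, so you might have to copy it first.
history_db = r'C:\Users\[username]\AppData\Local\Google\Chrome\User Data\Default\History'

con = sqlite3.connect(history_db)
cur = con.cursor()

# last_visit_time is microseconds since 1601-01-01 UTC; 11644473600 is the
# number of seconds between 1601-01-01 and the Unix epoch (1970-01-01).
cur.execute("""
    SELECT url, title,
           datetime(last_visit_time / 1000000 - 11644473600, 'unixepoch') AS visited
    FROM urls
    ORDER BY last_visit_time DESC
""")

with open('chrome_history.txt', 'w', encoding='utf-8') as out:
    for url, title, visited in cur.fetchall():
        out.write('%s\t%s\t%s\n' % (visited, title, url))

con.close()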
To dodge sqlite3/sqlite entirely, I'm using the Google Chrome extension "Export History", exporting everything into a CSV file, and then loading that CSV file into cells within MATLAB.
My code turned out to be:
file_o = 'history.csv';                        % CSV exported by the Export History extension
fid = fopen(file_o, 'rt');
fmt = [repmat('%s', 1, 6) '%*[^\n]'];          % read the first 6 comma-separated fields as strings
C = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);
C_unpacked = C{:};
C_urls = C_unpacked(1:4199, 5);                % column 5 holds the URLs (4199 rows in my export)
Here's another one:
import csv, sqlite3, os
from datetime import datetime, timedelta

# Chrome's History file lives under the Local (not Roaming) AppData folder
connection = sqlite3.connect(os.getenv("APPDATA") + r"\..\Local\Google\Chrome\User Data\Default\History")
connection.text_factory = str
cur = connection.cursor()

# newline='' stops the csv module from writing blank lines on Windows (Python 3)
output_file = open('chrome_history.csv', 'w', newline='', encoding='utf-8')
csv_writer = csv.writer(output_file)
headers = ('URL', 'Title', 'Visit Count', 'Date (GMT)')
csv_writer.writerow(headers)

# last_visit_time is stored as microseconds since 1601-01-01 (UTC)
epoch = datetime(1601, 1, 1)
for row in cur.execute('select url, title, visit_count, last_visit_time from urls'):
    row = list(row)
    row[3] = epoch + timedelta(microseconds=row[3])
    csv_writer.writerow(row)
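If you only need a single day's worth of history, as the original question asks, one rough way (reusing cur, epoch, datetime and timedelta from the snippet above) is to filter on last_visit_time:
# Cut-off for "one day ago", expressed in Chrome's units
# (microseconds since 1601-01-01 UTC)
cutoff = int((datetime.utcnow() - timedelta(days=1) - epoch).total_seconds() * 1000000)

day_query = ('select url, title, visit_count, last_visit_time from urls '
             'where last_visit_time > ? order by last_visit_time desc')
for url, title, visits, raw_time in cur.execute(day_query, (cutoff,)):
    print(epoch + timedelta(microseconds=raw_time), url)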
This isn't exactly what you are looking for. However, by using this you can explore the database's tables and adjust the query to your liking.
import os
import sqlite3

def Find_path():
    User_profile = os.environ.get("USERPROFILE")
    # Usually this is where the Chrome history file is located; change it if you need to.
    History_path = User_profile + r"\AppData\Local\Google\Chrome\User Data\Default\History"
    return History_path

def Main():
    data_base = Find_path()
    con = sqlite3.connect(data_base)  # Connect to the database
    c = con.cursor()
    # Change this to your preferred query
    c.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    print(c.fetchall())

if __name__ == '__main__':
    Main()
Related
I am trying to debug an issue I have been having for a couple of weeks now. I am copying the result of a query in a PostgreSQL db into a csv file using psycopg2 and copy_expert; however, when my script finishes running, I sometimes end up with fewer rows than if I ran the query directly against the db using pgAdmin. This is the code that runs the query and saves it to a csv:
cursor = pqlconn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
fd = open("query.sql", 'r')
sql_query = fd.read()
fd.close()
csv_path = 'test.csv'
query = "copy (" + sql_query + \
        ") TO STDOUT WITH (FORMAT csv, DELIMITER ',', HEADER)"
with open(csv_path, 'w', encoding='utf-8') as f_output:
    cursor.copy_expert(query, f_output)
print("Saved information to csv: ", csv_path)
When it runs I will sometimes end up with fewer rows than if I ran it directly on the db, and running it again still returns fewer rows than what I can already see in the db directly. Would appreciate any guidance on this, thanks!
I am trying to run a query, with the result saved as a CSV that is uploaded to a SharePoint folder. This is within Databricks via Pyspark.
My code below is close to doing this, but the final line is not functioning correctly - the file generated in SharePoint does not contain any data, though the dataframe does.
I'm new to Python and Databricks, if anyone can provide some guidance on how to correct that final line I'd really appreciate it!
from shareplum import Site
from shareplum import Office365  # needed for the Office365(...) call below
from shareplum.site import Version
import pandas as pd
sharepointUsername =
sharepointPassword =
sharepointSite =
website =
sharepointFolder =
# Connect to SharePoint Folder
authcookie = Office365(website, username=sharepointUsername, password=sharepointPassword).GetCookies()
site = Site(sharepointSite, version=Version.v2016, authcookie=authcookie)
folder = site.Folder(sharepointFolder)
FileName = "Data_Export.csv"
Query = "SELECT * FROM TABLE"
df = spark.sql(Query)
pandasdf = df.toPandas()
folder.upload_file(pandasdf.to_csv(FileName, encoding = 'utf-8'), FileName)
Sure my code is still garbage, but it does work. I needed to convert the dataframe into a variable containing CSV formatted data prior to uploading it to SharePoint; effectively I was trying to skip a step before. Last two lines were updated:
from shareplum import Site, Office365
from shareplum.site import Version
import pandas as pd
sharepointUsername =
sharepointPassword =
sharepointSite =
website =
sharepointFolder =
# Connect to SharePoint Folder
authcookie = Office365(website, username=sharepointUsername, password=sharepointPassword).GetCookies()
site = Site(sharepointSite, version=Version.v2016, authcookie=authcookie)
folder = site.Folder(sharepointFolder)
FileName = "Data_Export.csv"
Query = "SELECT * FROM TABLE"
df = spark.sql(Query).toPandas().to_csv(header=True, index=False, encoding='utf-8')  # to_csv with no path returns the CSV data as a string
folder.upload_file(df, FileName)
I'm using a simple script to pull data from an Oracle DB and write the data to a CSV file using the CSV writer.
The table I'm querying contains about 25k records. The script runs perfectly, except that it's actually very slow: it takes 25 minutes to finish.
In what way could I speed this up by altering the code? Any tips from you heroes are welcome.
#
# Load libraries
#
from __future__ import print_function
import cx_Oracle
import time
import csv
#
# Connect to Oracle and select the proper data
#
con = cx_Oracle.connect('secret')
cursor = con.cursor()
sql = "select * from table"
#
# Determine how and where the filename is created
#
path = ("c:\\path\\")
filename = time.strftime("%Y%m%d-%H%M%S")
extentionname = (".csv")
csv_file = open(path+filename+extentionname, "w")
writer = csv.writer(csv_file, delimiter=',', lineterminator="\n",
quoting=csv.QUOTE_NONNUMERIC)
r = cursor.execute(sql)
for row in cursor:
writer.writerow(row)
cursor.close()
con.close()
csv_file.close()
Did you try the writerows function from the csv module? Instead of writing each record one by one, it lets you write them all at once, which should speed things up.
data = []  # data rows, as a list of dicts, e.g. [{'col1': 1, 'col2': 'a'}, ...]
with open('csv_file.csv', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=['col1', 'col2'])
    writer.writeheader()
    writer.writerows(data)
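Applied to the cx_Oracle script from the question, a minimal sketch (reusing the cursor, sql and writer already set up there) could be:
cursor.arraysize = 5000   # larger fetch batches mean fewer network round trips to Oracle
cursor.execute(sql)
# fetchall() pulls every row into memory at once; writerows() writes them in one call
writer.writerows(cursor.fetchall())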
Alternatively, you can also use the pandas module to write a big chunk of data to a CSV file in one call with DataFrame.to_csv.
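A minimal sketch of the pandas route, assuming the cx_Oracle connection con and the query from the question (the output path is only an example):
import pandas as pd

# read_sql runs the query and returns a DataFrame; to_csv then writes it in one go
df = pd.read_sql("select * from table", con)
df.to_csv(r"c:\path\output.csv", index=False)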
I used the following Python script to dump a MySQL table to a CSV file, but the file was saved in the same folder as the script. I want to save it in another folder. How can I do that? Thank you.
print 'Writing database to csv file'
import MySQLdb
import csv
import time
import datetime
import os
currentDate=datetime.datetime.now().date()
user = ''
passwd = ''
host = ''
db = ''
table = ''
con = MySQLdb.connect(user=user, passwd=passwd, host=host, db=db)
cursor = con.cursor()
query = "SELECT * FROM %s;" % table
cursor.execute(query)
with open('Data on %s.csv' % currentDate, 'w') as f:
    writer = csv.writer(f)
    for row in cursor.fetchall():
        writer.writerow(row)
print 'Done'
Change the open() call to use a full path:
with open('/full/path/tofile/Data on %s.csv' % currentDate, 'w') as f:
This solves your problem X, but you have a problem Y, namely: "How do I efficiently dump CSV data from MySQL without having to write a lot of code?"
The answer to problem Y is SELECT ... INTO OUTFILE.
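For example, a rough sketch reusing the cursor and table from the script above (the output path is a placeholder, the file is written by the MySQL server itself, and the MySQL user needs the FILE privilege and a secure_file_priv-permitted location):
query = """
    SELECT * FROM %s
    INTO OUTFILE '/tmp/data_dump.csv'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
""" % table
cursor.execute(query)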
I am using Google App Engine (python), I want my users to be able to download a CSV file generated using some data from the datastore (but I don't want them to download the whole thing, as I re-order the columns and stuff).
I have to use the csv module, because there can be cells containing commas. But the problem that if I do that I will need to write a file, which is not allowed on Google App Engine
What I currently have is something like this:
tmp = open("tmp.csv", 'w')
writer = csv.writer(tmp)
writer.writerow(["foo", "foo,bar", "bar"])
So I guess what I want to do is either handle cells with commas myself, or use the csv module without writing a file, since that isn't possible with GAE.
I found a way to use the CSV module on GAE! Here it is:
self.response.headers['Content-Type'] = 'application/csv'
writer = csv.writer(self.response.out)
writer.writerow(["foo", "foo,bar", "bar"])
This way you don't need to write any files
Here is a complete example of using the Python CSV module in GAE. I typically use it for creating a csv file from a gql query and prompting the user to save or open it.
import csv
import webapp2

class MyDownloadHandler(webapp2.RequestHandler):
    def get(self):
        # ModelName stands in for your own datastore model
        q = ModelName.gql("WHERE foo = 'bar' ORDER BY date ASC")
        reqs = q.fetch(1000)

        self.response.headers['Content-Type'] = 'text/csv'
        self.response.headers['Content-Disposition'] = 'attachment; filename=studenttransreqs.csv'
        writer = csv.writer(self.response.out)

        # create row labels
        writer.writerow(['Date', 'Time', 'User'])

        # iterate through the query, returning each instance as a row
        for req in reqs:
            writer.writerow([req.date, req.time, req.user])
Add the appropriate mapping so that when the link is clicked, the file dialog opens:
('/mydownloadhandler',MyDownloadHandler),
import csv
import StringIO  # Python 2 module; no file on disk is needed

tmp = StringIO.StringIO()           # in-memory file-like object
writer = csv.writer(tmp)
writer.writerow(["foo", "foo,bar", "bar"])
contents = tmp.getvalue()           # the CSV data as a string
tmp.close()
print contents
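For reference, the same in-memory approach on Python 3 would use io.StringIO:
import csv
import io

tmp = io.StringIO()
writer = csv.writer(tmp)
writer.writerow(["foo", "foo,bar", "bar"])
contents = tmp.getvalue()
print(contents)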