Python script to save bookmarks into a json file - python

I want to use my bookmarks as data for a text classifier, which needs its input in .json format. Is there a Python script that will retrieve the data from the bookmarks directory and store it in a .json file? (I am using Ubuntu.)

Google Chrome already saves bookmarks as JSON. Your question does not define the desired outcome, so here is simple code that opens and prints the whole file of your saved Google Chrome bookmarks on Windows. You will need to adjust it, as it is written for Windows rather than Ubuntu, which I do not have access to at the moment.
import getpass
import json
user = getpass.getuser()
loc = "C:/Users/{}/AppData/Local/Google/Chrome/User Data/Default/Bookmarks.bak".format(user)
# Use a context manager so the file is closed after loading.
with open(loc, encoding="utf8") as f:
    data = json.load(f)
print(data)
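Edit: to print only the URLs of the bookmarks stored in folders on the bookmarks bar:
import getpass
import json
user = getpass.getuser()
loc = "C:/Users/{}/AppData/Local/Google/Chrome/User Data/Default/Bookmarks.bak".format(user)
with open(loc, encoding="utf8") as f:
    data = json.load(f)
# Walk every folder on the bookmarks bar and print the URL of each bookmark in it.
for folder in data["roots"]["bookmark_bar"]["children"]:
    for x in folder.get("children", []):
        # Nested folders have no "url" key, so skip them.
        if "url" in x:
            print(x["url"])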
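Since the question asks for a .json file rather than printed output, here is a minimal sketch that walks the whole bookmark tree recursively and writes the name and URL of every bookmark to a new file. The function names (collect_urls, bookmarks_to_list, export_bookmarks) and the output layout are illustrative choices, not part of the answer above. The Ubuntu path in the usage comment assumes a default Google Chrome install; it may differ for Chromium or for other profiles.

```python
import json


def collect_urls(node):
    """Recursively gather {'name', 'url'} dicts from one bookmark node."""
    found = []
    if isinstance(node, dict):
        # Bookmark entries carry a "url" key; folders carry "children".
        if "url" in node:
            found.append({"name": node.get("name", ""), "url": node["url"]})
        for child in node.get("children", []):
            found.extend(collect_urls(child))
    return found


def bookmarks_to_list(data):
    """Flatten every root (bookmarks bar, other, synced) of parsed bookmark JSON."""
    urls = []
    for root in data.get("roots", {}).values():
        urls.extend(collect_urls(root))
    return urls


def export_bookmarks(source, target):
    """Read Chrome's bookmark file and write a flat list of bookmarks as JSON."""
    with open(source, encoding="utf-8") as f:
        data = json.load(f)
    urls = bookmarks_to_list(data)
    with open(target, "w", encoding="utf-8") as f:
        json.dump(urls, f, indent=2, ensure_ascii=False)
    return urls


# Hypothetical usage on Ubuntu (path assumed, adjust to your profile):
# import os
# export_bookmarks(os.path.expanduser("~/.config/google-chrome/Default/Bookmarks"),
#                  "bookmarks.json")
```

Unlike the loop above, this also picks up bookmarks that sit directly on the bookmarks bar or inside nested folders.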

Related

Export/convert Dialogflow agent to csv or excel file using python

How can I export all questions and answers to a CSV or Excel file?
I have exported the Dialogflow agent as a zip file, and I got two JSON files for each question or intent.
Is there any way to create question and answer pairs in a CSV or Excel file?
The zip file contains two directories, intents and entities. The intents directory contains each Dialogflow intent's responses and training phrases. You can observe the pattern in the JSON files and write a script to make a CSV file out of it.
import os
import csv
import json
all_intents = os.listdir('intents')
with open('agent.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(["Response", "Questions"])
    for intent in all_intents:
        write = []
        # Skip the training-phrase files here; they are read together with their intent file.
        if intent.find('_usersays_en.json') == -1:
            try:
                with open('intents/' + intent) as f:
                    data = json.load(f)
                    resp = ''
                    try:
                        resp = data['responses'][0]['messages'][0]['speech'][0]
                    except:
                        # No response found: report the intent file name.
                        print(intent)
                    write.append(resp)
            except:
                print(intent)
            try:
                with open('intents/' + intent.replace(".json", "") + '_usersays_en.json') as f:
                    data = json.load(f)
                    for d in data:
                        qn = (d['data'][0]['text'])
                        write.append(qn)
            except:
                # No training phrases found: report the missing file name.
                print(intent.replace(".json", "") + '_usersays_en.json')
            writer.writerow(write)
Instructions to run the code:
Export the agent as a zip file.
Unzip the file. You will see the entities and intents directories extracted from the zip.
Put this Python file in the same directory as the intents directory.
Run python3 filename.py (where filename.py is the file containing the code).
agent.csv will be created.
All intents with no response or training phrases will be displayed on the terminal.
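To make the pairing step easier to test without files on disk, the core of the script can be pulled into a small function. This is only a sketch: intent_row is a hypothetical name, and the JSON shapes are assumed from the keys the script above reads ('responses', 'messages', 'speech', 'data', 'text'). One deliberate difference is that it joins all text fragments of a training phrase, whereas the script above keeps only the first fragment.

```python
def intent_row(intent_data, usersays_data):
    """Build one CSV row: the first response followed by every training phrase."""
    try:
        response = intent_data["responses"][0]["messages"][0]["speech"][0]
    except (KeyError, IndexError, TypeError):
        # Intent without a text response: leave the first column empty.
        response = ""
    questions = []
    for phrase in usersays_data:
        # A phrase is assumed to be a list of fragments under "data".
        parts = phrase.get("data", [])
        questions.append("".join(part.get("text", "") for part in parts))
    return [response] + questions


# Hypothetical usage with already-parsed JSON:
# row = intent_row(json.load(intent_file), json.load(usersays_file))
# writer.writerow(row)
```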

Creating view in browser functionality with python

I have been struggling with this problem for a while but can't seem to find a solution for it. The situation is that I need to open a file in browser and after the user closes the file the file is removed from their machine. All I have is the binary data for that file. If it matters, the binary data comes from Google Storage using the download_as_string method.
After doing some research I found that the tempfile module would suit my needs, but I can't get the tempfile to open in browser because the file only exists in memory and not on the disk. Any suggestions on how to solve this?
This is my code so far:
import tempfile
import webbrowser
# grabbing binary data earlier on
temp = tempfile.NamedTemporaryFile()
temp.name = "example.pdf"
temp.write(binary_data_obj)
temp.close()
webbrowser.open('file://' + os.path.realpath(temp.name))
When this is run, my computer gives me an error that says that the file cannot be opened since it is empty. I am on a Mac and am using Chrome if that is relevant.
You could try using a temporary directory instead:
import os
import tempfile
import webbrowser
# I used an existing pdf I had laying around as sample data
with open('c.pdf', 'rb') as fh:
    data = fh.read()
# Gives a temporary directory you have write permissions to.
# The directory and files within will be deleted when the with context exits.
with tempfile.TemporaryDirectory() as temp_dir:
    temp_file_path = os.path.join(temp_dir, 'example.pdf')
    # write a normal file within the temp directory
    with open(temp_file_path, 'wb+') as fh:
        fh.write(data)
    webbrowser.open('file://' + temp_file_path)
This worked for me on Mac OS.
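An alternative sketch, in case the browser needs the file to outlive the with block: write the bytes to a named temporary file created with delete=False, open it, and remove it yourself later. As far as I can tell, the code in the question fails because assigning to temp.name does not rename the file on disk and temp.close() deletes it before the browser can read it. The helper names write_temp_file and open_in_browser are illustrative.

```python
import tempfile
import webbrowser
from pathlib import Path


def write_temp_file(data, suffix=".pdf"):
    """Write bytes to a temporary file that stays on disk after closing."""
    # delete=False keeps the file when the handle is closed.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
    return Path(tmp.name)


def open_in_browser(path):
    """Open a local file in the default browser via a file:// URI."""
    return webbrowser.open(Path(path).resolve().as_uri())


# Hypothetical usage, where binary_data_obj holds the downloaded bytes:
# path = write_temp_file(binary_data_obj, ".pdf")
# open_in_browser(path)
# ... later, once the user is done with the file:
# path.unlink()
```

The trade-off is that cleanup becomes the caller's job; the script cannot know when the user has closed the browser tab.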

How to download a CSV file from the World Bank's dataset

I would like to automate the download of CSV files from the World Bank's dataset.
My problem is that the URL corresponding to a specific dataset does not lead directly to the desired CSV file but is instead a query to the World Bank's API. As an example, this is the URL to get the GDP per capita data: http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv.
If you paste this URL in your browser, it will automatically start the download of the corresponding file. As a consequence, the code I usually use to collect and save CSV files in Python is not working in the present situation:
baseUrl = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
remoteCSV = urllib2.urlopen("%s" %(baseUrl))
myData = csv.reader(remoteCSV)
How should I modify my code in order to download the file coming from the query to the API?
This will get the zip downloaded, open it and get you a csv object with whatever file you want.
import urllib2
import StringIO
from zipfile import ZipFile
import csv
baseUrl = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
remoteCSV = urllib2.urlopen(baseUrl)
sio = StringIO.StringIO()
sio.write(remoteCSV.read())
# We create a StringIO object so that we can work on the results of the request (a string) as though it is a file.
z = ZipFile(sio, 'r')
# We now create a ZipFile object pointed to by 'z' and we can do a few things here:
print z.namelist()
# A list with the names of all the files in the zip you just downloaded
# We can use z.namelist()[1] to refer to 'ny.gdp.pcap.cd_Indicator_en_csv_v2.csv'
with z.open(z.namelist()[1]) as f:
    # Opens the 2nd file in the zip
    csvr = csv.reader(f)
    for row in csvr:
        print row
For more information see ZipFile Docs and StringIO Docs
import os
import urllib
import zipfile
from StringIO import StringIO
package = StringIO(urllib.urlopen("http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv").read())
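# Named 'archive' and 'csv_path' to avoid shadowing the built-in zip and the csv module.
archive = zipfile.ZipFile(package, 'r')
pwd = os.path.abspath(os.curdir)
for filename in archive.namelist():
    csv_path = os.path.join(pwd, filename)
    # Write in binary mode so the extracted bytes are stored unchanged.
    with open(csv_path, 'wb') as fp:
        fp.write(archive.read(filename))
    print filename, 'downloaded successfully'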
From here you can use your approach to handle CSV files.
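The two answers above are written for Python 2 (urllib2, StringIO, print statements). A rough Python 3 equivalent might look like the sketch below. The function names are illustrative, and picking the first .csv member by default is an assumption; the first answer above selects the second file in the archive by index, so check namelist() for the member you actually want.

```python
import csv
import io
import zipfile
from urllib.request import urlopen


def download_bytes(url):
    """Fetch the raw response body (here, the zip archive) into memory."""
    with urlopen(url) as response:
        return response.read()


def read_csv_from_zip(zip_bytes, member=None):
    """Return the rows of one CSV member of a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if member is None:
            # Default to the first member whose name ends in .csv.
            member = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        with archive.open(member) as raw:
            # utf-8-sig strips a byte order mark if the file starts with one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.reader(text))


# Hypothetical usage with the URL from the question:
# url = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
# rows = read_csv_from_zip(download_bytes(url))
```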
We have a script to automate access and data extraction for World Bank World Development Indicators like: https://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS
The script does the following:
Downloads the metadata and data
Extracts the metadata and data
Converts them to a Data Package
The script is written in Python 3 and has no dependencies outside the standard library. Try it:
python scripts/get.py
python scripts/get.py https://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS
You also can read our analysis about data from World Bank:
https://datahub.io/awesome/world-bank
This is more of a suggestion than a solution: you can use pd.read_csv to read any CSV file directly from a URL.
import pandas as pd
data = pd.read_csv('http://url_to_the_csv_file')

Python script to save webpage and rename it while saving (save as - command)

Hi, I searched a lot but found no relevant results on how to save a webpage using Python 2.6 and rename it while saving.
It is better to use the requests library:
import requests
pagelink = "http://www.example.com"
page = requests.get(pagelink)
with open('/path/to/file/example.html', "w") as file:
    file.write(page.text)
You may want to use the urllib(2) package to access the webpage, and then save the contents to the desired location (os.path).
It should look something like this:
import urllib2, os
pagelink = "http://www.example.com"
page = urllib2.urlopen(pagelink)
# Pick the name to save under; the URL itself is not a usable file name.
with open(os.path.join('/(full)path/to/Documents', 'example.html'), "wb") as file:
    # urlopen returns a file-like object, so read its contents before writing.
    file.write(page.read())
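For readers on Python 3, where urllib2 no longer exists, a standard-library sketch of the same idea follows. The name save_page and the destination path are illustrative; the "rename while saving" part is simply the file name you pass as the destination.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def save_page(url, destination):
    """Download url and store the raw bytes under the chosen file name."""
    destination = Path(destination)
    # Stream the response to disk instead of holding it all in memory.
    with urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Hypothetical usage:
# save_page("http://www.example.com", "/path/to/file/my_new_name.html")
```

Writing bytes rather than text sidesteps guessing the page's encoding; the saved file is byte-for-byte what the server sent.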

generating a CSV file online on Google App Engine

I am using Google App Engine (Python), and I want my users to be able to download a CSV file generated from some data in the datastore (but I don't want them to download the whole thing, as I re-order the columns and so on).
I have to use the csv module, because some cells can contain commas. The problem is that doing so requires writing a file, which is not allowed on Google App Engine.
What I currently have is something like this:
tmp = open("tmp.csv", 'w')
writer = csv.writer(tmp)
writer.writerow(["foo", "foo,bar", "bar"])
So I guess what I want is either to handle cells with commas myself, or to use the csv module without writing a file, as that is not possible on GAE.
I found a way to use the CSV module on GAE! Here it is:
self.response.headers['Content-Type'] = 'application/csv'
writer = csv.writer(self.response.out)
writer.writerow(["foo", "foo,bar", "bar"])
This way you don't need to write any files
Here is a complete example of using the Python CSV module in GAE. I typically use it for creating a csv file from a gql query and prompting the user to save or open it.
import csv
import webapp2
class MyDownloadHandler(webapp2.RequestHandler):
    def get(self):
        q = ModelName.gql("WHERE foo = 'bar' ORDER BY date ASC")
        reqs = q.fetch(1000)
        self.response.headers['Content-Type'] = 'text/csv'
        self.response.headers['Content-Disposition'] = 'attachment; filename=studenttransreqs.csv'
        writer = csv.writer(self.response.out)
        # create row labels
        writer.writerow(['Date', 'Time', 'User'])
        # iterate through query returning each instance as a row
        for req in reqs:
            writer.writerow([req.date, req.time, req.user])
Add the appropriate mapping so that when a link is clicked, the file dialog opens:
('/mydownloadhandler', MyDownloadHandler),
import csv
import StringIO
# Write the CSV into an in-memory buffer instead of a file on disk.
tmp = StringIO.StringIO()
writer = csv.writer(tmp)
writer.writerow(["foo", "foo,bar", "bar"])
contents = tmp.getvalue()
tmp.close()
print contents
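The in-memory approach carries over to Python 3 with io.StringIO. The sketch below wraps it in a helper (rows_to_csv is an illustrative name) that returns the CSV text, which a handler could then write to its response. It also shows that the csv module quotes the cell containing a comma, which was the original concern.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of rows to CSV text without touching the disk."""
    # newline="" leaves the csv module's own line endings untouched.
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


contents = rows_to_csv([["foo", "foo,bar", "bar"]])
print(contents)  # the middle cell is wrapped in double quotes
```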
