How to access JSON translation from Google Translate API with Python

I'm trying to learn Serbian at the moment and got myself a CSV file with the most frequently used words.
What I'd like to do now is have my script put each word into Google Translate via the API and save this translation to the same file.
Since I'm a total Python and JSON beginner I am massively confused about how to use the JSON I'm getting from the API.
How do I get to the translation?
from sys import argv
from apiclient.discovery import build
import csv
import json

script, filename = argv

serbian_words = []

# Open a CSV file with the serbian words in one column (one per row)
with open(filename, 'rb') as csvfile:
    serbianreader = csv.reader(csvfile)
    for row in serbianreader:
        # Put all words in one single list
        serbian_words.extend(row)

# send that list to google item by item to have it translated
def main():
    service = build('translate', 'v2',
                    developerKey='xxx')
    for word in serbian_words:
        translation = service.translations().list(
            source='sr',
            target='de',
            q=word
        ).execute()
        print translation  # Until here everything works totally fine.

if __name__ == '__main__':
    main()
What the terminal prints for me looks like this: {u'translations': [{u'translatedText': u'allein'}]}, where "allein" is the German translation of a Serbian word.
How can I get to the "allein"? I've tried to work this out with the JSON encoder and decoder that comes with Python, but I can't figure it out.
I'd love any help on this and would be very grateful.

You can use item access to get to the innermost string:
translation['translations'][0]['translatedText']
or you could loop over all the translations listed (it's a list):
for trans in translation['translations']:
    print trans['translatedText']
as Google's translation service can give more than one translation for a given text.
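Putting the two together, a minimal sketch of how you could collect each word with its translation and write the pairs back out to a CSV might look like this (it reuses service and serbian_words from the script above; the output filename translated.csv is just an assumption):

import csv

# Collect (serbian_word, german_translation) pairs and write them to a new CSV.
rows = []
for word in serbian_words:
    translation = service.translations().list(
        source='sr',
        target='de',
        q=word
    ).execute()
    german = translation['translations'][0]['translatedText']
    rows.append([word, german])

# 'wb' matches the Python 2 style used in the question.
with open('translated.csv', 'wb') as out:
    writer = csv.writer(out)
    writer.writerows(rows)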

Related

How can I use the data I have in my JSON file to make the script work?

I'm currently learning Python and I came across this exercise where I have to create a mail generator with a catchall. The catchall must be stored in an external JSON file. The only problem is that I can't tell the program to use the catchall that is placed in the JSON file. Can anyone help me?
Thanks in advance
This is part of the code; I need to fix email_format:
for i in range(emails_number):
    letters_list = [string.digits, string.ascii_lowercase, string.ascii_uppercase]
    letters_list_to_str = "".join(letters_list)
    email_format = (json.data file)  # <-- this should come from the JSON file
    email_generated = "".join(random.choices(letters_list_to_str, k=chars)) + email_format
    print(email_generated)
email_gen()
If you want to open a json file in general, use:
import json

with open(jsonfilepath, "r") as f:
    data = json.load(f)
then you can access the information in the file like a dictionary:
email_format = data["catchall"]
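Putting that together with the generator loop, a minimal sketch might look like the following (the file name catchall.json, the key "catchall", and the values of emails_number and chars are assumptions for illustration):

import json
import random
import string

# Assumes a file like catchall.json containing {"catchall": "@example.com"}.
with open("catchall.json", "r") as f:
    data = json.load(f)
email_format = data["catchall"]

emails_number = 5   # assumed values, purely for illustration
chars = 10
letters = string.digits + string.ascii_lowercase + string.ascii_uppercase
for _ in range(emails_number):
    email_generated = "".join(random.choices(letters, k=chars)) + email_format
    print(email_generated)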

using python to extract data with duplicate names from a json array

I'll start by apologising if I use the wrong terms here; I am a rank beginner with Python.
I have a JSON array containing 5 sets of data. The corresponding items in each set have duplicate names. I can extract them in Java but not in Python. The items I want are called "summary_polyline". I have tried so many different ways in the last couple of weeks, and so far nothing works.
This is the relevant part of my Python:
#!/usr/bin/env python3.6
import os
import sys
from dotenv import load_dotenv, find_dotenv
import polyline
import matplotlib.pyplot as plt
import json

with open('/var/www/vk7krj/running/strava_activities.json', 'rt') as myfile:
    contents = myfile.read()

#print (contents)
#print (contents["summary_polyline"[1]])
activity1 = contents("summary_polyline"[1])
If I un-comment print (contents), it prints the file to the screen OK.
I ran the JSON through an online JSON format checker and it passed OK.
How do I extract the five "summary_polyline" values and assign them to "activity1" to "activity5"?
If I understand you correctly, you need to convert the text data that was read from the file into JSON.
with open('/var/www/vk7krj/running/strava_activities.json', 'rt') as myfile:
    contents = myfile.read()

# json_contents is a list of dicts now
json_contents = json.loads(contents)

# list with activities
activities = []
for dict_item in json_contents:
    activities.append(dict_item)

# print all activities (whole file)
print(activities)
# print first activity
print(activities[0])
# print second activity
print(activities[1])
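To get at the "summary_polyline" values themselves, a minimal sketch could look like this (it assumes each activity dict has a top-level "summary_polyline" key; if the key sits inside a nested dict, adjust the lookup accordingly):

# Pull the polyline string out of each activity.
polylines = [item.get("summary_polyline") for item in json_contents]

activity1 = polylines[0]
activity2 = polylines[1]
# ... and so on up to activity5, or simply keep them in the list
print(activity1)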

How do I process API return without writing to file in Python?

I am trying to slice the results of an API response to process just the first n values in Python without writing to a file first.
Specifically I want to do analysis on the "front page" from HN, which is just the first 30 items. However the API (https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty) gives you the first 500 results.
Right now I'm pulling top stories and writing to a file, then importing the file and truncating the string:
import json
import requests

# request JSON data from the HN API
topstories = requests.get('https://hacker-news.firebaseio.com/v0/topstories.json')

# write the response text to a file named topstories.txt
with open('topstories.txt', 'w') as fd:
    fd.write(topstories.text)

# truncate the text file to the top 30 stories
f = open('topstories.txt', 'r+')
f.truncate(270)
f.close()
This is inelegant and inefficient. I will have to do this again to extract each 8 digit object ID.
How do I process this API return data as much as possible in memory without writing to file?
Suggestion:
User jordanm suggested the code replacement:
fd.write(json.dumps(topstories.json()[:30]))
However, that would just move the needle on when I would need to write/read versus doing anything else I want with it.
What you want is the io library
https://docs.python.org/3/library/io.html
Basically:
import io
f = io.StringIO(topstories.text)
f.truncate(270)
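If the goal is just the first 30 IDs, you can also skip files entirely and slice the decoded JSON in memory, building on the suggestion above (a small sketch; nothing is written to disk):

import requests

# Fetch the top-story IDs and keep only the first 30, entirely in memory.
topstories = requests.get('https://hacker-news.firebaseio.com/v0/topstories.json')
top30 = topstories.json()[:30]   # .json() parses the response body into a Python list

for story_id in top30:
    print(story_id)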

Parsing the Skills Section in a Resume in Python

I am trying to parse the skills section of a resume in python. I found a library by Mr. Omkar Pathak called pyresparser and I was able to extract a PDF resume's contents into a resume.txt file.
However, I was wondering how I can go about only extracting the skills section from the resume into a list and then writing that list possibly into a query.txt file.
I'm reading the contents of resume.txt into a list and then comparing that to a list called skills, which stores the contents extracted from a skills.csv file. Currently, the skills list is empty, and I was wondering how I can go about storing the skills into that list. Is this the correct approach? Any help is greatly appreciated, thank you!
import string
import csv
import re
import sys
import importlib
import os
import spacy
from pyresparser import ResumeParser
import pandas as pd
import nltk
from spacy.matcher import matcher
import multiprocessing as mp

def main():
    data = ResumeParser("C:/Users/infinitel88p/Downloads/resume.pdf").get_extracted_data()
    print(data)

    # Added encoding utf-8 to prevent unicode error
    with open("C:/Users/infinitel88p/Downloads/resume.txt", "w", encoding='utf-8') as rf:
        rf.truncate()
        rf.write(str(data))
        print("Resume results are getting printed into resume.txt.")

    # Extracting skills
    resume_list = []
    skill_list = []
    data = pd.read_csv("skills.csv")
    skills = list(data.columns.values)
    resume_file = os.path.dirname(__file__) + "/resume.txt"
    with open(resume_file, 'r', encoding='utf-8') as f:
        for line in f:
            resume_list.append(line.strip())
    for token in resume_list:
        if token.lower() in skills:
            skill_list.append(token)
    print(skill_list)

if __name__ == "__main__":
    main()
An easy (but not efficient) way to do this:
Have a set of all possible relevant skills in a text file. For the words in the skills section of the resume (or for all the words in the resume), take each word and check whether it matches any of the words from the text file. If a word matches, then that skill is present in the resume. This way, you can identify the set of skills present in the resume.
For better identification, you can use naive Bayes classification or unigram probabilities to extract more relevant skills.
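A minimal sketch of that keyword-matching idea, assuming a skills.txt file with one skill per line and the resume.txt produced earlier (both file names are assumptions, and this only matches single-word skills):

# Keyword matching: intersect the resume's words with a known skill set.
def extract_skills(resume_path, skills_path):
    with open(skills_path, encoding="utf-8") as f:
        skills = {line.strip().lower() for line in f if line.strip()}
    with open(resume_path, encoding="utf-8") as f:
        words = f.read().lower().split()
    # Multi-word skills would need phrase matching instead of a plain split().
    return sorted(skills & set(words))

print(extract_skills("resume.txt", "skills.txt"))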

python: getting npm package data from a couchdb endpoint

I want to fetch the npm package metadata. I found this endpoint which gives me all the metadata needed.
My plan is to select some specific keys and add that data to a database (I could also store it in a JSON file, but the data is huge). I made the following script to fetch the data:
import requests
import json
import sys

db = 'https://replicate.npmjs.com'
r = requests.get('https://replicate.npmjs.com/_all_docs', headers={"include_docs": "true"})
for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        print(line)
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))
Notice that I don't even include all the docs, but it still gets stuck in an infinite loop. I think this is because the data is huge.
A look at the head of the output from https://replicate.npmjs.com/_all_docs gives me the following output:
{"total_rows":1017703,"offset":0,"rows":[
{"id":"0","key":"0","value":{"rev":"1-5fbff37e48e1dd03ce6e7ffd17b98998"}},
{"id":"0-","key":"0-","value":{"rev":"1-420c8f16ec6584c7387b19ef401765a4"}},
{"id":"0----","key":"0----","value":{"rev":"1-55f4221814913f0e8f861b1aa42b02e4"}},
{"id":"0-1-project","key":"0-1-project","value":{"rev":"1-3cc19950252463c69a5e717d9f8f0f39"}},
{"id":"0-100","key":"0-100","value":{"rev":"1-c4f41a37883e1289f469d5de2a7b505a"}},
{"id":"0-24","key":"0-24","value":{"rev":"1-e595ec3444bc1039f10c062dd86912a2"}},
{"id":"0-60","key":"0-60","value":{"rev":"2-32c17752acfe363fa1be7dbd38212b0a"}},
{"id":"0-9","key":"0-9","value":{"rev":"1-898c1d89f7064e58f052ff492e94c753"}},
{"id":"0-_-0","key":"0-_-0","value":{"rev":"1-d47c142e9460c815c19c4ed3355d648d"}},
{"id":"0.","key":"0.","value":{"rev":"1-11c33605f2e3fd88b5416106fcdbb435"}},
{"id":"0.0","key":"0.0","value":{"rev":"1-5e541d4358c255cbcdba501f45a66e82"}},
{"id":"0.0.1","key":"0.0.1","value":{"rev":"1-ce856c27d0e16438a5849a97f8e9671d"}},
{"id":"0.0.168","key":"0.0.168","value":{"rev":"1-96ab3047e57ca1573405d0c89dd7f3f2"}},
{"id":"0.0.250","key":"0.0.250","value":{"rev":"1-c07ad0ffb7e2dc51bfeae2838b8d8bd6"}},
Notice that all the documents start from the second line (i.e. all the documents are part of the "rows" key's values). Now, my question is: how do I get only the values of the "rows" key (i.e. all the documents)? I found this repository for a similar purpose, but I can't use or convert it as I am a total beginner in JavaScript.
If there is no stream=True among the arguments of get(), then the whole body will be downloaded into memory before the loop over the lines even starts.
Then there is the problem that the individual lines are not valid JSON on their own. You'll need an incremental JSON parser like ijson for this. ijson in turn wants a file-like object, which isn't easily obtained from the requests.Response, so I will use urllib from the Python standard library here:
#!/usr/bin/env python3
from urllib.request import urlopen

import ijson

def main():
    with urlopen('https://replicate.npmjs.com/_all_docs') as json_file:
        for row in ijson.items(json_file, 'rows.item'):
            print(row)

if __name__ == '__main__':
    main()
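If you would rather stay with requests, one possible variant is to pass stream=True and hand the underlying raw stream to ijson (a sketch under the assumption that the whole "rows" array parses cleanly; r.raw is the file-like urllib3 response that ijson can read incrementally):

import ijson
import requests

# Stream the response instead of loading it all into memory,
# then let ijson walk the "rows" array incrementally.
r = requests.get('https://replicate.npmjs.com/_all_docs', stream=True)
r.raw.decode_content = True  # decompress transparently if the server gzips
for row in ijson.items(r.raw, 'rows.item'):
    print(row['id'])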
Is there a reason why you aren't decoding the json before iterating over the lines?
Can you try this:
import requests
import json
import sys

db = 'https://replicate.npmjs.com'
r = requests.get('https://replicate.npmjs.com/_all_docs', headers={"include_docs": "true"})
decoded_r = r.content.decode('utf-8')
data = json.loads(decoded_r)
for row in data["rows"]:
    print(row["key"])
