How to validate data received via Telegram's Web App - Python

I'm trying to validate WebApp data but the result is not what I wanted.
Telegram documentation:
data_check_string = ...
secret_key = HMAC_SHA256(<bot_token>, "WebAppData")
if (hex(HMAC_SHA256(data_check_string, secret_key)) == hash) {
// data is from Telegram
}
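In this pseudocode, HMAC_SHA256(data, key) takes the data first and the key second. Python's hmac.new takes the key first. The sketch below is one way to transcribe the two documented steps literally, assuming the bot token and the data-check string are plain str values. The function names are illustrative only.

```python
import hashlib
import hmac


def webapp_secret_key(bot_token: str) -> bytes:
    # HMAC_SHA256(<bot_token>, "WebAppData"): "WebAppData" is the KEY, the token is the message
    return hmac.new(b"WebAppData", bot_token.encode(), hashlib.sha256).digest()


def webapp_hash(data_check_string: str, bot_token: str) -> str:
    # hex(HMAC_SHA256(data_check_string, secret_key)): secret_key is the KEY
    secret_key = webapp_secret_key(bot_token)
    return hmac.new(secret_key, data_check_string.encode(), hashlib.sha256).hexdigest()
```

The result of webapp_hash is then compared with the hash field that Telegram sent.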
My code:
import hashlib
import hmac
BOT_TOKEN = '5139539316:AAGVhDje2A3mB9yA_7l8-TV8xikC7KcudNk'
data_check_string = 'query_id=AAGcqlFKAAAAAJyqUUp6-Y62&user=%7B%22id%22%3A1246866076%2C%22first_name%22%3A%22Dante%22%2C%22last_name%22%3A%22%22%2C%22username%22%3A%22S_User%22%2C%22language_code%22%3A%22en%22%7D&auth_date=1651689536&hash=de7f6b26aadbd667a36d76d91969ecf6ffec70ffaa40b3e98d20555e2406bfbb'
data_check_arr = data_check_string.split('&')
needle = 'hash='
hash_item = ''
telegram_hash = ''
for item in data_check_arr:
    if item[0:len(needle)] == needle:
        telegram_hash = item[len(needle):]
        hash_item = item
data_check_arr.remove(hash_item)
data_check_arr.sort()
data_check_string = "\n".join(data_check_arr)
secret_key = hmac.new("WebAppData".encode(), BOT_TOKEN.encode(), hashlib.sha256).digest()
calculated_hash = hmac.new(data_check_string.encode(), secret_key, hashlib.sha256).hexdigest()
print(calculated_hash == telegram_hash)  # prints False
I'm trying to validate Web App data in Python, but my code doesn't give the intended result:
the hash my code computes is different from Telegram's.
UPDATE: valid data added, and the bot token has been changed.

You need to unquote data_check_string:
from urllib.parse import unquote
data_check_string = unquote('query_id=AAGcqlFKAAAAAJyqUUp6-Y62&user=%7B%22id%22%3A1246866076%2C%22first_name%22%3A%22Dante%22%2C%22last_name%22%3A%22%22%2C%22username%22%3A%22S_User%22%2C%22language_code%22%3A%22en%22%7D&auth_date=1651689536&hash=de7f6b26aadbd667a36d76d91969ecf6ffec70ffaa40b3e98d20555e2406bfbb')
And swap the arguments of the second hmac.new call (hmac.new takes the key first, then the message):
calculated_hash = hmac.new(secret_key, data_check_string.encode(), hashlib.sha256).hexdigest()

You can replace the for loop with a couple of lines (this already incorporates kurdyukovpv's suggestion to unquote the query string):
data_check_string = sorted([chunk.split("=", 1) for chunk in unquote(data_check_string).split("&")
                            if chunk[:len("hash=")] != "hash="],
                           key=lambda x: x[0])
data_check_string = "\n".join([f"{rec[0]}={rec[1]}" for rec in data_check_string])
EDIT: I figured I might as well post the entire working function I got out of this thread:
import hmac
import hashlib
from urllib.parse import unquote
def validate(hash_str, init_data, token, c_str="WebAppData"):
    """
    Validates the data received from the Telegram web app, using the
    method documented here:
    https://core.telegram.org/bots/webapps#validating-data-received-via-the-web-app
    hash_str - the hash string passed by the webapp
    init_data - the query string passed by the webapp
    token - Telegram bot's token
    c_str - constant string (default = "WebAppData")
    """
    # split each pair on the first "=" only, since a value may itself contain "="
    init_data = sorted([chunk.split("=", 1)
                        for chunk in unquote(init_data).split("&")
                        if chunk[:len("hash=")] != "hash="],
                       key=lambda x: x[0])
    init_data = "\n".join([f"{rec[0]}={rec[1]}" for rec in init_data])
    secret_key = hmac.new(c_str.encode(), token.encode(),
                          hashlib.sha256).digest()
    data_check = hmac.new(secret_key, init_data.encode(),
                          hashlib.sha256)
    # constant-time comparison, so the check does not leak timing information
    return hmac.compare_digest(data_check.hexdigest(), hash_str)
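A variant of the same check, sketched with urllib.parse.parse_qsl instead of unquoting the whole string first. parse_qsl splits on "&" and "=" before percent-decoding each value, so a value that contains an encoded "&" or "=" (for example inside the user JSON) is not torn apart. The function name and the test token below are made up for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl


def validate_init_data(init_data: str, bot_token: str) -> bool:
    """Check Telegram Web App initData against its own hash field (sketch)."""
    # split into key/value pairs first, then percent-decode each value
    fields = dict(parse_qsl(init_data, keep_blank_values=True))
    received_hash = fields.pop("hash", None)
    if received_hash is None:
        return False
    # data-check-string: key=value pairs sorted by key, joined with line feeds
    data_check_string = "\n".join(f"{k}={v}" for k, v in sorted(fields.items()))
    secret_key = hmac.new(b"WebAppData", bot_token.encode(), hashlib.sha256).digest()
    calculated = hmac.new(secret_key, data_check_string.encode(), hashlib.sha256).hexdigest()
    # constant-time comparison of the two hex strings
    return hmac.compare_digest(calculated, received_hash)
```

With the question's data this would be called as validate_init_data(raw_query_string, BOT_TOKEN), passing the query string exactly as received.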

Related

Binance API 'allOrders' (HMAC sha256) error 1022

For about the past week I've been trying to wrap my head around the concept of a signed HMAC SHA256 request.
In this example I'm just trying to get a list of all current orders.
I thought I'd figured it out, but for some reason it still won't work.
The API keys are new; I've tried both Read and Write versions, and my IP is whitelisted.
I'm getting {'code': -1022, 'msg': 'Signature for this request is not valid.'}
My code:
import hmac
import hashlib
import json
import requests
import time
import Credentials
class Private:
    def GetAllOrders(pair,orderid='',start='',finish='',limit='',window=''):
        # Credentials #
        ApiKey = Credentials.Binance.ReadAPIKey
        SecretKey = Credentials.Binance.ReadSecretKey
        # Base #
        BaseURL = 'https://api.binance.com'
        EndPoint = '/api/v3/allOrders'
        # Required #
        Pair = '?symbol='+str(pair)
        Time = '&timestamp='+str(int(time.time()*1000))
        # Optional #
        if orderid != '':
            OrderID = '&orderId='+str(orderid)
        else:
            OrderID = orderid
        if start != '':
            Start = '&startTime='+str(start*1000)
        else:
            Start = start
        if finish != '':
            Finish = '&endTime='+str(finish*1000)
        else:
            Finish = finish
        if limit != '':
            Limit = '&limit='+str(limit)
        else:
            Limit = limit
        if window != '':
            Window = '&recvWindow='+str(window)
        else:
            Window = window
        # HMAC #
        HMAC = hmac.new(bytes(SecretKey.encode('utf-8')),
                        (Pair+OrderID+Start+Finish+Limit+Window+Time).encode('utf-8'),
                        hashlib.sha256).hexdigest()
        # Signature #
        Signature = '&signature='+str(HMAC)
        # Headers #
        Headers = {'X-MBX-APIKEY': ApiKey}
        # Request #
        JSON = requests.get(BaseURL+EndPoint+Pair+OrderID+Start+Finish+Limit+Window+Time+Signature,headers=Headers).json()
        return JSON
print(Private.GetAllOrders(pair='BTCUSDT'))
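Any help would be appreciated...
I figured it out...
The '?' is URL syntax that marks the start of the query string; it is not part of the parameters, so it must not be included in the string passed to HMAC. The request URL, on the other hand, still needs it.
The following lines should look like this...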
# Required #
Pair = 'symbol='+str(pair)
# Request #
JSON = requests.get(BaseURL+EndPoint+'?'+Pair+OrderID+Start+Finish+Limit+Window+Time+Signature,headers=Headers).json()
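The general rule is that the signature must cover exactly the query string that is sent, without the leading '?'. A sketch that builds the query once with urllib.parse.urlencode and signs that same string, assuming the parameters are passed as a dict. The helper name is hypothetical and the secret is a placeholder.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode


def signed_query(params: dict, secret_key: str) -> str:
    """Return 'params&signature=...' where the signature covers exactly 'params'."""
    # drop unset optional parameters, then add the required timestamp if missing
    query = {k: v for k, v in params.items() if v not in (None, "")}
    query.setdefault("timestamp", int(time.time() * 1000))
    payload = urlencode(query)  # note: no leading '?'
    signature = hmac.new(secret_key.encode(), payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}&signature={signature}"
```

The '?' is then added only when building the URL, for example BaseURL + EndPoint + '?' + signed_query({'symbol': 'BTCUSDT'}, SecretKey), sent with the same X-MBX-APIKEY header as above.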

JWE cannot encrypt data correctly with jwcrypto

I have a requirement to generate encrypted data with JWE. The Ruby implementation works correctly, but the Python implementation does not.
The Ruby implementation:
require 'jwe'
key = OpenSSL::PKey::RSA.new File.read 'public.pem'
payload = {user:"admin"}.to_json
puts JWE.encrypt(payload, key, enc: 'A192GCM')
The Python implementation:
from jwt import jwk_from_pem
from jwcrypto import jwe,jwk
from jwcrypto.common import json_encode
import json
with open("public.pem", "rb") as f:
    key = jwk.JWK.from_pem(f.read())
key = key.public()
token = jwe.JWE(u'{user:"admin"}', json_encode({"alg":"RSA-OAEP","enc":"A192GCM"}))
token.add_recipient(key)
result = token.serialize()
result = json.loads(result)
print(result["protected"] + "." + result["encrypted_key"])
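I have referred to the jwcrypto examples, but the generated token is not correct.
Fixed. I should use compact serialization instead of concatenating the parts manually (the payload also has to be valid JSON, '{"user":"admin"}', as Ruby's to_json produces):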
from jwcrypto import jwe, jwk
from jwcrypto.common import json_encode
with open("public.pem", "rb") as f:
    key = jwk.JWK.from_pem(f.read())
key = key.public()
token = jwe.JWE('{"user":"admin"}', json_encode({"alg": "RSA-OAEP", "enc": "A192GCM"}))
token.add_recipient(key)
# compact serialization: header.encrypted_key.iv.ciphertext.tag
result = token.serialize(compact=True)
print(result)
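A compact JWE has five base64url segments separated by dots: protected header, encrypted key, initialization vector, ciphertext and authentication tag. The manual version joined only the first two, which is why the output was not a usable token. A small stdlib-only helper for sanity-checking a token's shape and reading its header could look like this (the name is hypothetical, and it neither decrypts nor verifies anything):

```python
import base64
import json


def inspect_compact_jwe(token: str) -> dict:
    """Split a compact JWE and return its decoded protected header."""
    parts = token.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 segments, got {len(parts)}")
    header_b64 = parts[0]
    # base64url segments are unpadded; restore '=' padding before decoding
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Running it on the output of print(result) should show a header that includes "alg": "RSA-OAEP" and "enc": "A192GCM".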

How to verify the Signature of a JWT generated by AWS Cognito in Python 3.6?

Here's my script:
import urllib.request
import json
import time
from jose import jwk, jwt
from jose.utils import base64url_decode
import base64
region = '....'
userpool_id = '.....'
app_client_id = '...'
keys_url = 'https://cognito-idp.{}.amazonaws.com/{}/.well-known/jwks.json'.format(region, userpool_id)
response = urllib.request.urlopen(keys_url)
keys = json.loads(response.read())['keys']
def verify(request):  # hypothetical wrapper: this part runs inside my request handler
    token = request.headers['Authorization']
    print(token)
    # get the kid from the headers prior to verification
    headers = jwt.get_unverified_headers(request.headers['Authorization'])
    kid = headers['kid']
    print(kid)
    # search for the kid in the downloaded public keys
    key_index = -1
    for i in range(len(keys)):
        if kid == keys[i]['kid']:
            key_index = i
            break
    if key_index == -1:
        print('Public key not found in jwks.json')
        return False
    # construct the public key
    public_key = jwk.construct(keys[key_index])
    # get the last two sections of the token,
    # message and signature (encoded in base64)
    message, encoded_signature = str(token).rsplit('.', 1)
    # decode the signature
    print('>>encoded signature')
    print(encoded_signature)
    decoded_signature = base64.b64decode(encoded_signature)
    if not public_key.verify(message, decoded_signature):
        print('Signature verification failed')
        return False
    print('Signature successfully verified')
I always end up with "Signature verification failed", even though the JWT is generated by a legitimate Cognito user pool. I've looked at the documentation and it does not really specify the whole verification process.
I see you're using jose and I'm using pyjwt, but this solution might help you. Most of the code at the bottom comes from the "api-gateway-authorizer-python" blueprint. Note that this is very fragile code that will break if anything fails; I ended up not using Lambda authentication and instead selected AWS_IAM authentication for my API Gateway with Identity Pools, so I never finished it.
This example requires that you install pyjwt and cryptography with pip in your working directory and upload everything as a .zip file.
I'd recommend that you watch this video if you want to consider the AWS_IAM authentication option: https://www.youtube.com/watch?v=VZqG7HjT2AQ
They also have a solution with a more elaborate Lambda authorizer implementation on GitHub at: https://github.com/awslabs/aws-serverless-auth-reference-app (they show the link at the beginning of the video), but I don't know about its pip dependencies.
from __future__ import print_function
from jwt.algorithms import RSAAlgorithm
import re
import jwt
import json
import sys
import urllib.request
region = 'your-region'
userpoolId = 'your-user-pool-id'
appClientId = 'your-app-client-id'
keysUrl = 'https://cognito-idp.{}.amazonaws.com/{}/.well-known/jwks.json'.format(region, userpoolId)
def lambda_handler(event, context):
    bearerToken = event['authorizationToken']
    methodArn = event['methodArn']
    print("Client token: " + bearerToken)
    print("Method ARN: " + methodArn)
    # Python 3: urlopen lives in urllib.request (urllib.urlopen was Python 2 only)
    response = urllib.request.urlopen(keysUrl)
    keys = json.loads(response.read())['keys']
    jwtToken = bearerToken.split(' ')[-1]
    header = jwt.get_unverified_header(jwtToken)
    kid = header['kid']
    jwkValue = findJwkValue(keys, kid)
    publicKey = RSAAlgorithm.from_jwk(json.dumps(jwkValue))
    decoded = decodeJwtToken(jwtToken, publicKey)
    print('Decoded token: ' + json.dumps(decoded))
    principalId = decoded['cognito:username']
    methodArn = event['methodArn'].split(':')
    apiGatewayArnTmp = methodArn[5].split('/')
    awsAccountId = methodArn[4]
    policy = AuthPolicy(principalId, awsAccountId)
    policy.restApiId = apiGatewayArnTmp[0]
    policy.region = methodArn[3]
    policy.stage = apiGatewayArnTmp[1]
    #policy.denyAllMethods()
    policy.allowAllMethods()
    # Finally, build the policy
    authResponse = policy.build()
    # new! -- add additional key-value pairs associated with the authenticated principal
    # these are made available by APIGW like so: $context.authorizer.<key>
    # additional context is cached
    context = {
        'key': 'value', # $context.authorizer.key -> value
        'number': 1,
        'bool': True
    }
    # context['arr'] = ['foo'] <- this is invalid, APIGW will not accept it
    # context['obj'] = {'foo':'bar'} <- also invalid
    authResponse['context'] = context
    return authResponse
def findJwkValue(keys, kid):
    for key in keys:
        if key['kid'] == kid:
            return key
def decodeJwtToken(token, publicKey):
    try:
        decoded=jwt.decode(token, publicKey, algorithms=['RS256'], audience=appClientId)
        return decoded
    except Exception as e:
        print(e)
        raise
class HttpVerb:
    GET = 'GET'
    POST = 'POST'
    PUT = 'PUT'
    PATCH = 'PATCH'
    HEAD = 'HEAD'
    DELETE = 'DELETE'
    OPTIONS = 'OPTIONS'
    ALL = '*'
class AuthPolicy(object):
    # The AWS account id the policy will be generated for. This is used to create the method ARNs.
    awsAccountId = ''
    # The principal used for the policy, this should be a unique identifier for the end user.
    principalId = ''
    # The policy version used for the evaluation. This should always be '2012-10-17'
    version = '2012-10-17'
    # The regular expression used to validate resource paths for the policy
    pathRegex = r'^[/.a-zA-Z0-9-\*]+$'
    '''Internal lists of allowed and denied methods.
    These are lists of objects and each object has 2 properties: A resource
    ARN and a nullable conditions statement. The build method processes these
    lists and generates the appropriate statements for the final policy.
    '''
    allowMethods = []
    denyMethods = []
    # The API Gateway API id. By default this is set to '*'
    restApiId = '*'
    # The region where the API is deployed. By default this is set to '*'
    region = '*'
    # The name of the stage used in the policy. By default this is set to '*'
    stage = '*'
    def __init__(self, principal, awsAccountId):
        self.awsAccountId = awsAccountId
        self.principalId = principal
        self.allowMethods = []
        self.denyMethods = []
    def _addMethod(self, effect, verb, resource, conditions):
        '''Adds a method to the internal lists of allowed or denied methods. Each object in
        the internal list contains a resource ARN and a condition statement. The condition
        statement can be null.'''
        if verb != '*' and not hasattr(HttpVerb, verb):
            raise NameError('Invalid HTTP verb ' + verb + '. Allowed verbs in HttpVerb class')
        resourcePattern = re.compile(self.pathRegex)
        if not resourcePattern.match(resource):
            raise NameError('Invalid resource path: ' + resource + '. Path should match ' + self.pathRegex)
        if resource[:1] == '/':
            resource = resource[1:]
        resourceArn = 'arn:aws:execute-api:{}:{}:{}/{}/{}/{}'.format(self.region, self.awsAccountId, self.restApiId, self.stage, verb, resource)
        if effect.lower() == 'allow':
            self.allowMethods.append({
                'resourceArn': resourceArn,
                'conditions': conditions
            })
        elif effect.lower() == 'deny':
            self.denyMethods.append({
                'resourceArn': resourceArn,
                'conditions': conditions
            })
    def _getEmptyStatement(self, effect):
        '''Returns an empty statement object prepopulated with the correct action and the
        desired effect.'''
        statement = {
            'Action': 'execute-api:Invoke',
            'Effect': effect[:1].upper() + effect[1:].lower(),
            'Resource': []
        }
        return statement
    def _getStatementForEffect(self, effect, methods):
        '''This function loops over an array of objects containing a resourceArn and
        conditions statement and generates the array of statements for the policy.'''
        statements = []
        if len(methods) > 0:
            statement = self._getEmptyStatement(effect)
            for curMethod in methods:
                if curMethod['conditions'] is None or len(curMethod['conditions']) == 0:
                    statement['Resource'].append(curMethod['resourceArn'])
                else:
                    conditionalStatement = self._getEmptyStatement(effect)
                    conditionalStatement['Resource'].append(curMethod['resourceArn'])
                    conditionalStatement['Condition'] = curMethod['conditions']
                    statements.append(conditionalStatement)
            if statement['Resource']:
                statements.append(statement)
        return statements
    def allowAllMethods(self):
        '''Adds a '*' allow to the policy to authorize access to all methods of an API'''
        self._addMethod('Allow', HttpVerb.ALL, '*', [])
    def denyAllMethods(self):
        '''Adds a '*' deny to the policy to deny access to all methods of an API'''
        self._addMethod('Deny', HttpVerb.ALL, '*', [])
    def allowMethod(self, verb, resource):
        '''Adds an API Gateway method (Http verb + Resource path) to the list of allowed
        methods for the policy'''
        self._addMethod('Allow', verb, resource, [])
    def denyMethod(self, verb, resource):
        '''Adds an API Gateway method (Http verb + Resource path) to the list of denied
        methods for the policy'''
        self._addMethod('Deny', verb, resource, [])
    def allowMethodWithConditions(self, verb, resource, conditions):
        '''Adds an API Gateway method (Http verb + Resource path) to the list of allowed
        methods and includes a condition for the policy statement. More on AWS policy
        conditions here: http://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements.html#Condition'''
        self._addMethod('Allow', verb, resource, conditions)
    def denyMethodWithConditions(self, verb, resource, conditions):
        '''Adds an API Gateway method (Http verb + Resource path) to the list of denied
        methods and includes a condition for the policy statement. More on AWS policy
        conditions here: http://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements.html#Condition'''
        self._addMethod('Deny', verb, resource, conditions)
    def build(self):
        '''Generates the policy document based on the internal lists of allowed and denied
        conditions. This will generate a policy with two main statements for the effect:
        one statement for Allow and one statement for Deny.
        Methods that includes conditions will have their own statement in the policy.'''
        if ((self.allowMethods is None or len(self.allowMethods) == 0) and
                (self.denyMethods is None or len(self.denyMethods) == 0)):
            raise NameError('No statements defined for the policy')
        policy = {
            'principalId': self.principalId,
            'policyDocument': {
                'Version': self.version,
                'Statement': []
            }
        }
        policy['policyDocument']['Statement'].extend(self._getStatementForEffect('Allow', self.allowMethods))
        policy['policyDocument']['Statement'].extend(self._getStatementForEffect('Deny', self.denyMethods))
        return policy
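lambda_handler fills in the policy fields by splitting methodArn on ':' and then on '/'. Pulled out as a standalone function, that parsing can be checked without deploying anything. This is a sketch; the function name and the ARN in the example are made up.

```python
def parse_method_arn(method_arn: str) -> dict:
    """Split an API Gateway methodArn the same way lambda_handler does."""
    # arn:aws:execute-api:<region>:<account>:<apiId>/<stage>/<verb>/<resource>
    arn_parts = method_arn.split(":", 5)
    api_gateway_arn = arn_parts[5].split("/")
    return {
        "region": arn_parts[3],
        "awsAccountId": arn_parts[4],
        "restApiId": api_gateway_arn[0],
        "stage": api_gateway_arn[1],
    }
```

For example, parse_method_arn("arn:aws:execute-api:us-east-1:123456789012:abcdef1234/prod/GET/pets") yields the region, account id, API id and stage that end up in AuthPolicy.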
The following class verifies Cognito tokens. It requires python-jose and pydantic.
The implementation is derived from this repo, which contains more details, additional functionality, tests, etc.
import json
import logging
import os
import time
import urllib.request
from typing import Dict, List
from jose import jwk, jwt
from jose.utils import base64url_decode
from pydantic import BaseModel
class JWK(BaseModel):
    """A JSON Web Key (JWK) model that represents a cryptographic key.
    The JWK specification:
    https://datatracker.ietf.org/doc/html/rfc7517
    """
    alg: str
    e: str
    kid: str
    kty: str
    n: str
    use: str
class CognitoAuthenticator:
    def __init__(self, pool_region: str, pool_id: str, client_id: str) -> None:
        self.pool_region = pool_region
        self.pool_id = pool_id
        self.client_id = client_id
        self.issuer = f"https://cognito-idp.{self.pool_region}.amazonaws.com/{self.pool_id}"
        self.jwks = self.__get_jwks()
    def __get_jwks(self) -> List[JWK]:
        """Returns a list of JSON Web Keys (JWKs) from the issuer. A JWK is a
        public key used to verify a JSON Web Token (JWT).
        Returns:
            List of keys
        Raises:
            Exception when JWKS endpoint does not contain any keys
        """
        file = urllib.request.urlopen(f"{self.issuer}/.well-known/jwks.json")
        res = json.loads(file.read().decode("utf-8"))
        if not res.get("keys"):
            raise Exception("The JWKS endpoint does not contain any keys")
        jwks = [JWK(**key) for key in res["keys"]]
        return jwks
    def verify_token(
        self,
        token: str,
    ) -> bool:
        """Verify a JSON Web Token (JWT).
        For more details refer to:
        https://docs.aws.amazon.com/cognito/latest/developerguide/amazon-cognito-user-pools-using-tokens-verifying-a-jwt.html
        Args:
            token: The token to verify
        Returns:
            True if valid, False otherwise
        """
        try:
            self._is_jwt(token)
            self._get_verified_header(token)
            self._get_verified_claims(token)
        except CognitoError:
            return False
        return True
    def _is_jwt(self, token: str) -> bool:
        """Validate a JSON Web Token (JWT).
        A JSON Web Token (JWT) includes three sections: Header, Payload and
        Signature. They are base64url encoded and are separated by dot (.)
        characters. If JWT token does not conform to this structure, it is
        considered invalid.
        Args:
            token: The token to validate
        Returns:
            True if valid
        Raises:
            CognitoError when invalid token
        """
        try:
            jwt.get_unverified_header(token)
            jwt.get_unverified_claims(token)
        except jwt.JWTError:
            logging.info("Invalid JWT")
            raise InvalidJWTError
        return True
    def _get_verified_header(self, token: str) -> Dict:
        """Verifies the signature of a JSON Web Token (JWT) and returns its
        decoded header.
        Args:
            token: The token to decode header from
        Returns:
            A dict representation of the token header
        Raises:
            CognitoError when unable to verify signature
        """
        # extract key ID (kid) from token
        headers = jwt.get_unverified_header(token)
        kid = headers["kid"]
        # find JSON Web Key (JWK) that matches kid from token
        key = None
        for k in self.jwks:
            if k.kid == kid:
                # construct a key object from found key data
                key = jwk.construct(k.dict())
                break
        if not key:
            logging.info(f"Unable to find a signing key that matches '{kid}'")
            raise InvalidKidError
        # get message and signature (base64url encoded)
        message, encoded_signature = str(token).rsplit(".", 1)
        signature = base64url_decode(encoded_signature.encode("utf-8"))
        if not key.verify(message.encode("utf8"), signature):
            logging.info("Signature verification failed")
            raise SignatureError
        # signature successfully verified
        return headers
    def _get_verified_claims(self, token: str) -> Dict:
        """Verifies the claims of a JSON Web Token (JWT) and returns its claims.
        Args:
            token: The token to decode claims from
        Returns:
            A dict representation of the token claims
        Raises:
            CognitoError when unable to verify claims
        """
        claims = jwt.get_unverified_claims(token)
        # verify expiration time
        if claims["exp"] < time.time():
            logging.info("Expired token")
            raise TokenExpiredError
        # verify issuer
        if claims["iss"] != self.issuer:
            logging.info("Invalid issuer claim")
            raise InvalidIssuerError
        # verify audience
        # note: claims["client_id"] for access token, claims["aud"] otherwise
        if claims["client_id"] != self.client_id:
            logging.info("Invalid audience claim")
            raise InvalidAudienceError
        # verify token use
        if claims["token_use"] != "access":
            logging.info("Invalid token use claim")
            raise InvalidTokenUseError
        # claims successfully verified
        return claims
class CognitoError(Exception):
    pass
class InvalidJWTError(CognitoError):
    pass
class InvalidKidError(CognitoError):
    pass
class SignatureError(CognitoError):
    pass
class TokenExpiredError(CognitoError):
    pass
class InvalidIssuerError(CognitoError):
    pass
class InvalidAudienceError(CognitoError):
    pass
class InvalidTokenUseError(CognitoError):
    pass
if __name__ == "__main__":
    auth = CognitoAuthenticator(
        pool_region=os.environ["AWS_COGNITO_REGION"],
        pool_id=os.environ["AWS_USER_POOL_ID"],
        client_id=os.environ["AWS_USER_POOL_CLIENT_ID"],
    )
    # note: if you are not using an access token, see the audience check in _get_verified_claims
    access_token = "my_access_token"
    print(f"Token verified: {auth.verify_token(access_token)}")
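The step that differs most from the question's script is the signature decoding. JWT segments are base64url-encoded ('-' and '_' instead of '+' and '/') with the '=' padding stripped, so base64.b64decode is the wrong decoder there, and the message handed to verify should be bytes. A stdlib-only sketch of that preparation, similar in spirit to what jose.utils.base64url_decode is used for above (the function names are hypothetical):

```python
import base64


def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded base64url JWT segment."""
    # JWTs drop '=' padding; add it back so the length is a multiple of 4
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)


def split_for_verification(token: str) -> tuple:
    """Return (signing_input_bytes, signature_bytes) for a compact JWS/JWT."""
    # the signed message is 'header.payload', exactly as it appears in the token
    message, encoded_signature = token.rsplit(".", 1)
    return message.encode("utf-8"), b64url_decode(encoded_signature)
```

The resulting pair is what public_key.verify(message, signature) expects. If the Authorization header carries a "Bearer " prefix, strip it first, as the pyjwt answer does with split(' ')[-1].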

Flask server cannot handle non-ASCII characters

I have written a simple Flask application. Its main purpose is to expose CLD2 (a language detector) through POST and GET methods. It works well for English, but for other languages such as Urdu or Arabic it gives invalid results.
Here is the script:
# http://127.0.0.1:5000/cld2?text="Your input text string"
# OUTPUT ( It gives output as we done in CC)
#"585&URDU-99-1155"
from flask import Flask,abort,jsonify,request
from flask_restful import Resource, Api, reqparse
import cld2
from bs4 import BeautifulSoup
import sys
import urllib2, urllib
import re
reload(sys)
sys.setdefaultencoding('utf8')
app = Flask(__name__)
api = Api(app)
class HelloWorld(Resource):
    def cld2_states(self, txt):
        txt = txt.encode("utf8")
        isReliable, textBytesFound, details = cld2.detect(txt)
        outstr = str(textBytesFound)
        for item in details: # Iterate 3 languages
            if item[0] != "Unknown":
                outstr += '&' + item[0] + '-' + str(item[2]) + '-' + str(int(item[3]))
        return outstr
    def get(self):
        parser = reqparse.RequestParser()
        parser.add_argument('text', type=str)
        parser.add_argument('url', type=str)
        _dict = dict(parser.parse_args())
        if _dict["text"] is not None:
            value = _dict["text"]
            print type(value)
            return self.cld2_states(value)
        return None
    def post(self):
        data = request.get_json(force=True)
        # print data
        predict_request = [data['content']][1]
        out = self.cld2_states(predict_request)
        return jsonify(score=out)
api.add_resource(HelloWorld, '/cld2')
if __name__ == '__main__':
    app.run(debug=True, port=6161, host='0.0.0.0')
If I send a query with the GET method it gives correct results, but for the same query with POST it returns just a number. If the text is in English, POST also gives the correct result.
My client is a simple Java application that iterates over files and detects their language one by one.
The problem might be with this line:
outstr = str(textBytesFound)
Instead of using str() to convert from bytes to str, use .decode(), like this:
outstr = textBytesFound.decode("utf-8")
(obviously if your text is not encoded with UTF-8, you need to tell Python the correct encoding to use)
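The underlying rule, in Python 3 terms: str() on a bytes object does not decode it, it returns its repr (for example "b'abc'"), so text has to be decoded with the encoding it was actually produced in. A minimal sketch, assuming input may arrive as either bytes or str (the helper name is hypothetical):

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        # decode with the encoding the bytes were actually produced in
        return value.decode(encoding)
    # already text (str); return it unchanged
    return value
```

Passing the wrong encoding either raises UnicodeDecodeError or silently produces different characters, which is why the encoding needs to be stated explicitly.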

Fetching language detection from Google api

I have a CSV with keywords in one column and the number of impressions in a second column.
I'd like to pass each keyword in a URL (while looping) and have the Google Language API return the language the keyword is in.
I have it working manually. If I enter (with the correct API key):
http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=merde
I get:
{"responseData": {"language":"fr","isReliable":false,"confidence":6.213709E-4}, "responseDetails": null, "responseStatus": 200}
which is correct: 'merde' is French.
So far I have this code, but I keep getting server unreachable errors:
import time
import csv
from operator import itemgetter
import sys
import fileinput
import urllib2
import json
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
#not working
def parse_result(result):
    """Parse a JSONP result string and return a list of terms"""
    # Deserialize JSON to Python objects
    result_object = json.loads(result)
    #Get the rows in the table, then get the second column's value
    # for each row
    return row in result_object
#not working
def retrieve_terms(seedterm):
    """Retrieves and parses data and returns a list of terms"""
    print(seedterm)
    url_template = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=%(seed)s'
    url = url_template % {"seed": seedterm}
    try:
        with urllib2.urlopen(url) as data:
            data = perform_request(seedterm)
            result = data.read()
    except:
        sys.stderr.write('%s\n' % 'Could not request data from server')
        exit(E_OPERATION_ERROR)
    #terms = parse_result(result)
    #print terms
    print result
def main(argv):
    filename = argv[1]
    csvfile = open(filename, 'r')
    csvreader = csv.DictReader(csvfile)
    rows = []
    for row in csvreader:
        rows.append(row)
    sortedrows = sorted(rows, key=itemgetter('impressions'), reverse = True)
    keys = sortedrows[0].keys()
    for item in sortedrows:
        retrieve_terms(item['keywords'])
    try:
        outputfile = open('Output_%s.csv' % (filename),'w')
    except IOError:
        print("The file is active in another program - close it first!")
        sys.exit()
    dict_writer = csv.DictWriter(outputfile, keys, lineterminator='\n')
    dict_writer.writer.writerow(keys)
    dict_writer.writerows(sortedrows)
    outputfile.close()
    print("File is Done!! Check your folder")
if __name__ == '__main__':
    start_time = time.clock()
    main(sys.argv)
    print("\n")
    print time.clock() - start_time, "seconds for script time"
Any idea how to finish the code so that it will work? Thank you!
Try adding a referrer and userip, as described in the docs:
An area to pay special attention to relates to correctly identifying yourself in your requests. Applications MUST always include a valid and accurate http referer header in their requests. In addition, we ask, but do not require, that each request contains a valid API Key. By providing a key, your application provides us with a secondary identification mechanism that is useful should we need to contact you in order to correct any problems. Read more about the usefulness of having an API key
Developers are also encouraged to make use of the userip parameter (see below) to supply the IP address of the end-user on whose behalf you are making the API request. Doing so will help distinguish this legitimate server-side traffic from traffic which doesn't come from an end-user.
Here's an example based on the answer to the question "access to google with python":
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import urllib, urllib2
from pprint import pprint
api_key, userip = None, None
query = {'q' : 'матрёшка'}
referrer = "https://stackoverflow.com/q/4309599/4279"
if userip:
    query.update(userip=userip)
if api_key:
    query.update(key=api_key)
url = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&%s' %(
    urllib.urlencode(query))
request = urllib2.Request(url, headers=dict(Referer=referrer))
json_data = json.load(urllib2.urlopen(request))
pprint(json_data['responseData'])
Output:
{u'confidence': 0.070496580000000003, u'isReliable': False, u'language': u'ru'}
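For Python 3, the same request might be sketched as follows, with urllib.parse and urllib.request replacing urllib/urllib2. Catching only URLError (rather than a bare except:) means a programming error, such as the undefined perform_request in the question's code, surfaces as itself instead of being reported as an unreachable server. The function names are hypothetical, and the opener parameter exists only so the code can be exercised without network access.

```python
import json
import sys
import urllib.error
import urllib.parse
import urllib.request

DETECT_URL = "http://ajax.googleapis.com/ajax/services/language/detect"


def build_detect_request(term, referrer, api_key=None, userip=None):
    """Build the detect request with a Referer header and optional key/userip."""
    query = {"v": "1.0", "q": term}
    if userip:
        query["userip"] = userip
    if api_key:
        query["key"] = api_key
    url = f"{DETECT_URL}?{urllib.parse.urlencode(query)}"
    return urllib.request.Request(url, headers={"Referer": referrer})


def detect_language(request, opener=urllib.request.urlopen):
    """Send the request; only network errors are reported as 'could not request'."""
    try:
        with opener(request) as response:
            return json.load(response)["responseData"]
    except urllib.error.URLError as exc:
        sys.stderr.write(f"Could not request data from server: {exc}\n")
        raise
```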
Another issue might be that seedterm is not properly quoted:
if isinstance(seedterm, unicode):
    value = seedterm
else: # bytes
    value = seedterm.decode(put_encoding_here)
url = 'http://...q=%s' % urllib.quote_plus(value.encode('utf-8'))
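In Python 3 the same quoting step might look like this, assuming the keyword arrives either as str (the csv module already yields str) or as bytes in a known encoding. The helper name is hypothetical.

```python
from urllib.parse import quote_plus


def encode_query_value(seedterm, encoding="utf-8"):
    """Percent-encode a search term for use in a query string (Python 3 sketch)."""
    # bytes must first be decoded with the encoding they were produced in
    if isinstance(seedterm, bytes):
        seedterm = seedterm.decode(encoding)
    # quote_plus encodes the text as UTF-8, escapes reserved characters, spaces become '+'
    return quote_plus(seedterm)
```

For example, encode_query_value("a b&c") gives "a+b%26c", so a keyword containing '&' cannot break the query string.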
