I am trying the following. I am applying a detect-and-translate function to a column containing free-text values that describe the professional role of some customers:
from langdetect import detect
from google_trans_new import google_translator
#simple function to detect and translate text
def detect_and_translate(text, target_lang='en'):
    result_lang = detect(text)
    if result_lang == target_lang:
        return text
    else:
        translator = google_translator()
        translate_text = translator.translate(text, lang_src=result_lang, lang_tgt=target_lang)
        return translate_text
df_processed['Position_Employed'] = df_processed['Position_Employed'].replace({'0':'unknown', 0:'unknown'})
df_processed['Position_Employed'] = df_processed['Position_Employed'].apply(detect_and_translate)
But I am getting the following error:
JSONDecodeError: Extra data: line 1 column 433 (char 432)
I have tried the fix from the question linked below, editing line 151 in google_trans_new/google_trans_new.py from response = (decoded_line + ']') to response = decoded_line, but it did not work:
Python google-trans-new translate raises error: JSONDecodeError: Extra data:
What can I do?
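Until the library itself is fixed, one way to keep a bad response from crashing the whole .apply is a defensive wrapper that falls back to the original text when detection or translation raises. This is a minimal sketch, not part of the question's code: detect_fn and translate_fn are stand-ins for langdetect.detect and the translator call, so the idea can be shown without the broken library.

```python
def detect_and_translate_safe(text, detect_fn, translate_fn, target_lang='en'):
    """Translate `text` to `target_lang`, returning the original text
    unchanged if detection or translation fails for any reason."""
    try:
        if detect_fn(text) == target_lang:
            return text
        return translate_fn(text)
    except Exception:
        # langdetect raises on empty/ambiguous input, and google_trans_new
        # raises JSONDecodeError on malformed responses -- keep the row
        return text
```

With this shape, df_processed['Position_Employed'].apply(lambda t: detect_and_translate_safe(t, detect, my_translate)) degrades gracefully instead of aborting mid-column.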
Here is my code:
#Import libraries
import os
import pandas as pd
import requests
import matplotlib.pyplot as plt
import numpy as np
from datetime import date
import matplotlib.ticker as ticker
# API Key from EIA
api_key = 'blah blah'
# api_key = os.getenv("EIA_API_KEY")
# PADD Names to Label Columns
# Change to whatever column labels you want to use.
PADD_NAMES = ['PADD 1','PADD 2','PADD 3','PADD 4','PADD 5']
# Enter all your Series IDs here separated by commas
PADD_KEY = ['PET.MCRRIP12.M',
'PET.MCRRIP22.M',
'PET.MCRRIP32.M',
'PET.MCRRIP42.M',
'PET.MCRRIP52.M']
# Initialize list - this is the final list that you will store all the data from the json pull. Then you will use this list to concat into a pandas dataframe.
final_data = []
# Choose start and end dates
startDate = '2009-01-01'
endDate = '2021-01-01'
# Pull in data via EIA API
for i in range(len(PADD_KEY)):
    url = 'http://api.eia.gov/series/?api_key=' + api_key + PADD_KEY[i]
    r = requests.get(url)
    json_data = r.json()
    if r.status_code == 200:
        print('Success!')
    else:
        print('Error')
    df = pd.DataFrame(json_data.get('series')[0].get('data'),
                      columns=['Date', PADD_NAMES[i]])
    df.set_index('Date', drop=True, inplace=True)
    final_data.append(df)
Here is my error:
TypeError Traceback (most recent call last)
<ipython-input-38-4de082165a0d> in <module>
10 print('Error')
11
---> 12 df = pd.DataFrame(json_data.get('series')[0].get('data'),
13 columns = ['Date', PADD_NAMES[i]])
14 df.set_index('Date', drop=True, inplace=True)
TypeError: 'NoneType' object is not subscriptable
'NoneType' object is not subscriptable occurs when you try to index into a None object, like df["key"] where df is None.
Do you have PADD_NAMES defined somewhere in your code? To me this looks like an issue with your JSON data. Have you tried printing your JSON data?
The API you are calling requires the HTTPS protocol; try changing "http" to "https":
https://api.eia.gov/series/?api_key=
Consider adding some debug output to check for other errors, by changing the if...else block like this:
if r.status_code == 200:
    print('Success!')
else:
    print('Error')
    print(json_data)
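One more thing worth checking: the URL in the question is built by plain concatenation, so the series_id= parameter name never appears in the query string, and the API then returns an error payload with no 'series' key, which is exactly what makes json_data.get('series') come back as None. A hedged sketch of building the URL explicitly with the standard library (build_series_url is an illustrative name, not part of the original code):

```python
from urllib.parse import urlencode, urlunsplit

def build_series_url(api_key, series_id):
    # urlencode writes out both parameter names, so series_id cannot be
    # silently dropped the way plain string concatenation drops it
    query = urlencode({'api_key': api_key, 'series_id': series_id})
    return urlunsplit(('https', 'api.eia.gov', '/series/', query, ''))

url = build_series_url('YOUR_API_KEY', 'PET.MCRRIP12.M')
print(url)
```

The same effect can be had by passing params={'api_key': ..., 'series_id': ...} to requests.get and letting requests encode the query string.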
I am trying to follow the pytube example for downloading a video from YouTube:
from pytube import YouTube
video = YouTube('https://www.youtube.com/watch?v=BATOxzbVNno')
video.streams.all()
and immediately get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-2556eb2eb903> in <module>()
1 from pytube import YouTube
2 video = YouTube('https://www.youtube.com/watch?v=BATOxzbVNno')
----> 3 video.streams.all()
5 frames
/usr/local/lib/python3.7/dist-packages/pytube/cipher.py in get_throttling_function_code(js)
301 # Extract the code within curly braces for the function itself, and merge any split lines
302 code_lines_list = find_object_from_startpoint(js, match.span()[1]).split('\n')
--> 303 joined_lines = "".join(code_lines_list)
304
305 # Prepend function definition (e.g. `Dea=function(a)`)
AttributeError: 'NoneType' object has no attribute 'span'
Please help me. It worked fine just yesterday! Thanks a lot!
Just ran into that error myself; it seems to occur quite frequently despite getting temporary fixes.
Found a fix on github: NoneType object has no attribute 'span'
Just replace the function get_throttling_function_name with:
def get_throttling_function_name(js: str) -> str:
    """Extract the name of the function that computes the throttling parameter.

    :param str js:
        The contents of the base.js asset file.
    :rtype: str
    :returns:
        The name of the function used to compute the throttling parameter.
    """
    function_patterns = [
        # https://github.com/ytdl-org/youtube-dl/issues/29326#issuecomment-865985377
        # a.C&&(b=a.get("n"))&&(b=Dea(b),a.set("n",b))}};
        # In above case, `Dea` is the relevant function name
        r'a\.[A-Z]&&\(b=a\.get\("n"\)\)&&\(b=([^(]+)\(b\)',
    ]
    logger.debug('Finding throttling function name')
    for pattern in function_patterns:
        regex = re.compile(pattern)
        function_match = regex.search(js)
        if function_match:
            logger.debug("finished regex search, matched: %s", pattern)
            function_name = function_match.group(1)
            # The original snippet tested `'[' or ']' in function_name`,
            # which is always True; check both characters explicitly.
            is_array = '[' in function_name or ']' in function_name
            if is_array:
                index = int(re.findall(r'\d+', function_name)[0])
                name = function_name.split('[')[0]
                pattern = r"var %s=\[(.*?)\];" % name
                regex = re.compile(pattern)
                return regex.search(js).group(1).split(',')[index]
            else:
                return function_name
    raise RegexMatchError(
        caller="get_throttling_function_name", pattern="multiple"
    )
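To see what the patched function's regex actually does, here is a standalone sketch run against two hypothetical base.js fragments: the first is the sample from the comment in the function above, and the second (js2, with made-up names lha/iha) shows the array-lookup branch where the matched name is an index into a JS array.

```python
import re

# Direct case: `Dea` is the throttling function name
js = 'a.C&&(b=a.get("n"))&&(b=Dea(b),a.set("n",b))}};'
pattern = r'a\.[A-Z]&&\(b=a\.get\("n"\)\)&&\(b=([^(]+)\(b\)'
print(re.search(pattern, js).group(1))  # Dea

# Array case: the match is `lha[0]`, so the real name lives in `var lha=[...]`
js2 = 'a.D&&(b=a.get("n"))&&(b=lha[0](b),a.set("n",b));var lha=[iha];'
name = re.search(pattern, js2).group(1)            # 'lha[0]'
index = int(re.findall(r'\d+', name)[0])           # 0
members = re.search(r"var %s=\[(.*?)\];" % name.split('[')[0], js2).group(1)
print(members.split(',')[index])  # iha
```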
This question already has answers here:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
(24 answers)
Closed 1 year ago.
I've been trying to do some API queries to get some missing data in my DF. I'm using the grequests library to send multiple requests and build a list of response objects, then a for loop to load each response as JSON and retrieve the missing data. What I noticed is that loading the data with .json() directly by index, e.g. list[0].json(), works fine, but when iterating over the list and loading each response, this error comes up: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Here's my code :
import requests
import json
import grequests

ls = []
for i in null_data['name']:
    url = 'https://pokeapi.co/api/v2/pokemon/' + i.lower()
    ls.append(url)

rs = (grequests.get(u) for u in ls)
s = grequests.map(rs)

# This line works
print(s[0].json()['weight']/10)

for x in s:
    # This one fails
    js = x.json()
    peso = js['weight']/10
    null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
<ipython-input-21-9f404bc56f66> in <module>
13
14 for x in s:
---> 15 js = x.json()
16 peso = js['weight']/10
17 null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
One (or more) of the responses has an empty body (and note that grequests.map returns None for requests that failed outright).
So:
...
for x in s:
    if x is not None and x.text != "":  # compare the body text, not the Response object
        js = x.json()
        peso = js['weight']/10
        null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
...
or
...
for x in s:
    try:
        js = x.json()
        peso = js['weight']/10
        null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
    except json.JSONDecodeError as ex:
        print("Failed to decode (%s)" % ex)
...
The first skips responses with an empty body, while the second tries to decode every one but, upon an exception, just prints an error message instead of quitting.
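The decoding guard from both variants can be pulled into one small helper. This is a sketch under my own naming (safe_weight is not from the question); in the real loop you would feed it x.text after checking x is not None:

```python
import json

def safe_weight(response_text):
    """Decode a PokeAPI-style payload, returning None when the body is
    empty or not valid JSON instead of raising JSONDecodeError."""
    if not response_text:
        return None
    try:
        return json.loads(response_text)['weight'] / 10
    except (json.JSONDecodeError, KeyError):
        return None

print(safe_weight('{"weight": 69}'))  # valid payload -> 6.9
print(safe_weight(''))                # empty body -> None
print(safe_weight('Not Found'))       # plain-text error page -> None
```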
I am trying to use a crawler to get IEEE paper keywords, but now I get an error. How can I fix my crawler?
My code is here:
import re
import requests
import json
from bs4 import BeautifulSoup

ieee_content = requests.get("http://ieeexplore.ieee.org/document/8465981", timeout=180)
soup = BeautifulSoup(ieee_content.text, 'xml')
tag = soup.find_all('script')
for i in tag[9]:
    s = json.loads(re.findall('global.document.metadata=(.*;)', i)[0].replace("'", '"').replace(";", ''))
and the error is here:
Traceback (most recent call last):
File "G:/github/爬蟲/redigg-leancloud/crawlers/sup_ieee_keywords.py", line 90, in <module>
a.get_es_data(offset=0, size=1)
File "G:/github/爬蟲/redigg-leancloud/crawlers/sup_ieee_keywords.py", line 53, in get_es_data
self.get_data(link=ieee_link, esid=es_id)
File "G:/github/爬蟲/redigg-leancloud/crawlers/sup_ieee_keywords.py", line 65, in get_data
s = json.loads(re.findall('global.document.metadata=(.*;)', i)[0].replace(";", '').replace("'", '"'))
IndexError: list index out of range
Here's another answer. I don't know what you are doing with 's' in your code after the load (replace in my code). The code below doesn't throw an error, but again, how are you using 's'?
import re
import requests
import json
from bs4 import BeautifulSoup

ieee_content = requests.get("http://ieeexplore.ieee.org/document/8465981", timeout=180)
soup = BeautifulSoup(ieee_content.text, 'xml')
tag = soup.find_all('script')
# i is a list
for i in tag[9]:
    metadata_format = re.compile(r'global.document.metadata=.*', re.MULTILINE)
    metadata = re.findall(metadata_format, i)
    if len(metadata) != 0:
        # convert the list
        convert_to_json = json.dumps(metadata)
        x = json.loads(convert_to_json)
        s = x[0].replace("'", '"').replace(";", '')
        ###########################################
        # I don't know what you plan to do with 's'
        ###########################################
        print(s)
Apparently in line 65 some of the data provided in i did not suit the regex pattern you're trying to use. Therefore your [0] will not work, as re.findall returned an empty list.
Solution:
x = re.findall('global.document.metadata=(.*;)', i)
if x:
    s = json.loads(x[0].replace("'", '"').replace(";", ''))
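The extract-then-decode step can be demonstrated offline. The script_text below is a made-up stand-in shaped like the IEEE page's inline metadata (the field names are hypothetical); note that the single-to-double quote replacement only works when no value itself contains an apostrophe:

```python
import re
import json

# Hypothetical script body shaped like the page's inline metadata
script_text = "global.document.metadata={'title': 'Some paper', 'isNumber': 8468039};"

matches = re.findall(r'global\.document\.metadata=(.*);', script_text)
if matches:
    # Single-quoted pseudo-JSON -> double quotes before decoding
    metadata = json.loads(matches[0].replace("'", '"'))
    print(metadata['title'])  # Some paper
else:
    print("metadata not found in this script tag")
```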
I'm trying to load the following JSON string in Python:
{
"Motivo_da_Venda_Perdida":"",
"Data_Visita":"2015-03-17 08:09:55",
"Cliente":{
"Distribuidor1_Modelo":"",
"RG":"",
"Distribuidor1_Marca":"Selecione",
"PlataformaMilho1_Quantidade":"",
"Telefone_Fazenda":"",
"Pulverizador1_Quantidade":"",
"Endereco_Fazenda":"",
"Nome_Fazenda":"",
"Area_Total_Fazenda":"",
"PlataformaMilho1_Marca":"Selecione",
"Trator1_Modelo":"",
"Tipo_Cultura3":"Selecione",
"Tipo_Cultura4":"Selecione",
"Cultura2_Hectares":"",
"Colheitadeira1_Quantidade":"",
"Tipo_Cultura1":"Soja",
"Tipo_Cultura2":"Selecione",
"Plantadeira1_Marca":"Stara",
"Autopropelido1_Modelo":"",
"Email_Fazenda":"",
"Autopropelido1_Marca":"Stara",
"Distribuidor1_Quantidade":"",
"PlataformaMilho1_Modelo":"",
"Trator1_Marca":"Jonh deere",
"Email":"",
"CPF":"46621644000",
"Endereco_Rua":"PAQUINHAS, S/N",
"Caixa_Postal_Fazenda":"",
"Cidade_Fazenda":"",
"Plantadeira1_Quantidade":"",
"Colheitadeira1_Marca":"New holland",
"Data_Nascimento":"2015-02-20",
"Cultura4_Hectares":"",
"Nome_Cliente":"MILTON CASTIONE",
"Cep_Fazenda":"",
"Telefone":"5491290687",
"Cultura3_Hectares":"",
"Trator1_Quantidade":"",
"Cultura1_Hectares":"",
"Autopropelido1_Quantidade":"",
"Pulverizador1_Modelo":"",
"Caixa_Postal":"",
"Estado":"RS",
"Endereco_Numero":"",
"Cidade":"COLORADO",
"Colheitadeira1_Modelo":"",
"Pulverizador1_Marca":"Selecione",
"CEP":"99460000",
"Inscricao_Estadual":"0",
"Plantadeira1_Modelo":"",
"Estado_Fazenda":"RS",
"Bairro":""
},
"Quilometragem":"00",
"Modelo_Pretendido":"Selecione",
"Quantidade_Prevista_Aquisicao":"",
"Id_Revenda":"1",
"Contato":"05491290687",
"Pendencia_Para_Proxima_Visita":"",
"Data_Proxima_Visita":"2015-04-17 08:09:55",
"Valor_de_Venda":"",
"Maquina_Usada":"0",
"Id_Vendedor":"2",
"Propensao_Compra":"Propensao_Compra_Frio",
"Comentarios":"despertar compra",
"Sistema_Compra":"Sistema_Compra_Finame",
"Outro_Produto":"",
"Data_Prevista_Aquisicao":"2015-04-17 08:09:55",
"Objetivo_Visita":"Despertar_Interesse",
"Tipo_Contato":"Telefonico"}
However, I get the following error when I try to load it:
File "python_file.py", line 107, in busca_proxima_mensagem
Visita = json.loads(corpo)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 2 - line 6 column 84 (char 1 - 1020)
but this JSON seems to be valid according to this site: http://jsonformatter.curiousconcept.com/ What am I doing wrong? Why can't I load this string as a JSON object?
I'm trying to load the string from AWS SQS like this:
import json
...
result = fila.get_messages(1, 30, 'SentTimestamp')
for message in result:
    corpo = message.get_body()
    Visita = json.loads(corpo)
OK, so I figured out what is causing me problems: There is a slash as a value of a key
"Endereco_Rua":"PAQUINHAS, S/N",
However, I'm telling Python to filter that out (code below), but it's not working. How can I remove that? I can't do it at the origin that created the data, as I don't have access to the interface the user uses to fill it in.
result = fila.get_messages(1, 30, 'SentTimestamp')
for message in result:
    corpo = message.get_body()
    corpo = corpo.replace("/", "")  # Filtering slashes
    Visita = json.loads(corpo)
Found a solution! Besides the slash character, sometimes this error also happened with no visible cause. I ended up solving it by adding the following lines to my Python code:
1) At the start of my code, along with other python imports
from boto.sqs.message import RawMessage
2) Changing my SQS queue to use/fetch raw data:
fila = sqs_conn.get_queue(constantes.fila_SQS)
fila.set_message_class(RawMessage)
Hope this helps anyone who is having the same issue.
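Worth noting: a forward slash is perfectly legal inside a JSON string, so the replace("/", "") workaround was never needed, and the decode failure most likely came from how the SQS message body was framed before json.loads saw it, which is consistent with RawMessage fixing it. A minimal check that the slash itself parses fine:

```python
import json

# The exact key/value from the problematic payload
corpo = '{"Endereco_Rua": "PAQUINHAS, S/N", "CEP": "99460000"}'
visita = json.loads(corpo)
print(visita["Endereco_Rua"])  # PAQUINHAS, S/N
```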