Convert physical addresses to Geographic locations Latitude and Longitude - python

I have read a CSV file (containing customer addresses) and loaded the data into a DataFrame.
Description of the csv file (or the DataFrame table)
DataFrame contains several rows and 5 columns
Database example
Address1 Address3 Post_Code City_Name Full_Address
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH
10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG
11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898 ,FOETZ
I have written code (geocoding with Python) to convert physical addresses to geographic locations (latitude and longitude), but it keeps raising errors.
The code so far is:
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
# Read the CSV, by the way the csv file contains 43 columns
ERP_Data = pd.read_csv("test.csv")
# Extracting the address information into a new DataFrame
Address_info= ERP_Data[['Address1','Address3','Post_Code','City_Name']].copy()
# Adding a new column called (Full_Address) that concatenate address columns into one
# for example Karlaplan 13,115 20,STOCKHOLM,Stockholms län, Sweden
Address_info['Full_Address'] = Address_info[Address_info.columns[1:]].apply(
lambda x: ','.join(x.dropna().astype(str)), axis=1)
locator = Nominatim(user_agent="myGeocoder") # holds the Geocoding service, Nominatim
# 1 - convenient function to delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
# 2- create location column
Address_info['location'] = Address_info['Full_Address'].apply(geocode)
# 3 - create longitude, latitude and altitude from location column (returns tuple)
Address_info['point'] = Address_info['location'].apply(lambda loc: tuple(loc.point) if loc else None)
# 4 - split point column into latitude, longitude and altitude columns
Address_info[['latitude', 'longitude', 'altitude']] = pd.DataFrame(Address_info['point'].tolist(), index=Address_info.index)
# using Folium to map out the points we created
import folium
folium_map = folium.Map(location=[49.61167, 6.13], zoom_start=12)
An example of the full output error is :
RateLimiter caught an error, retrying (0/2 tries). Called with (*('44 AVENUE JOHN FITZGERALD KENNEDY,L-1855,LUXEMBOURG',), **{}).
Traceback (most recent call last):
File "e:\Anaconda3\lib\urllib\request.py", line 1317, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "e:\Anaconda3\lib\http\client.py", line 1244, in request
self._send_request(method, url, body, headers, encode_chunked)
File "e:\Anaconda3\lib\http\client.py", line 1290, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "e:\Anaconda3\lib\http\client.py", line 1239, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "e:\Anaconda3\lib\http\client.py", line 1026, in _send_output
self.send(msg)
File "e:\Anaconda3\lib\http\client.py", line 966, in send
self.connect()
File "e:\Anaconda3\lib\http\client.py", line 1414, in connect
server_hostname=server_hostname)
File "e:\Anaconda3\lib\ssl.py", line 423, in wrap_socket
session=session
File "e:\Anaconda3\lib\ssl.py", line 870, in _create
self.do_handshake()
File "e:\Anaconda3\lib\ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
socket.timeout: _ssl.c:1059: The handshake operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "e:\Anaconda3\lib\site-packages\geopy\geocoders\base.py", line 355, in _call_geocoder
page = requester(req, timeout=timeout, **kwargs)
File "e:\Anaconda3\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "e:\Anaconda3\lib\urllib\request.py", line 543, in _open
'_open', req)
File "e:\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "e:\Anaconda3\lib\urllib\request.py", line 1360, in https_open
context=self._context, check_hostname=self._check_hostname)
File "e:\Anaconda3\lib\urllib\request.py", line 1319, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error _ssl.c:1059: The handshake operation timed out>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "e:\Anaconda3\lib\site-packages\geopy\extra\rate_limiter.py", line 126, in __call__
return self.func(*args, **kwargs)
File "e:\Anaconda3\lib\site-packages\geopy\geocoders\osm.py", line 387, in geocode
self._call_geocoder(url, timeout=timeout), exactly_one
File "e:\Anaconda3\lib\site-packages\geopy\geocoders\base.py", line 378, in _call_geocoder
raise GeocoderTimedOut('Service timed out')
geopy.exc.GeocoderTimedOut: Service timed out
Expected output is
Address1 Address3 Post_Code City_Name Full_Address Latitude Longitude
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH 49.7508296 6.1085476
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH 49.7508296 6.1085476
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535, MERSCH 49.7508296 6.1085476
10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG 49.6302147 6.1713374
11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898 ,FOETZ 49.5217917 6.0101385

I've updated your code:
Added: Address_info = Address_info.apply(lambda x: x.str.strip(), axis=1)
This removes whitespace before and after each string.
Added a function with try-except to handle the lookup.
import time
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut, GeocoderQuotaExceeded
ERP_Data = pd.read_csv("test.csv")
# Extracting the address information into a new DataFrame
Address_info= ERP_Data[['Address1','Address3','Post_Code','City_Name']].copy()
# Clean existing whitespace from the ends of the strings
Address_info = Address_info.apply(lambda x: x.str.strip(), axis=1) # ← added
# Adding a new column called (Full_Address) that concatenate address columns into one
# for example Karlaplan 13,115 20,STOCKHOLM,Stockholms län, Sweden
Address_info['Full_Address'] = Address_info[Address_info.columns[1:]].apply(lambda x: ','.join(x.dropna().astype(str)), axis=1)
locator = Nominatim(user_agent="myGeocoder") # holds the Geocoding service, Nominatim
# 1 - convenient function to delay between geocoding calls
# geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
def geocode_me(location):
    time.sleep(1.1)
    try:
        return locator.geocode(location)
    except (GeocoderTimedOut, GeocoderQuotaExceeded) as e:
        # distinguish a quota error from a timeout by inspecting the caught exception
        if isinstance(e, GeocoderQuotaExceeded):
            print(e)
        else:
            print(f'Location not found: {e}')
        return None
# 2- create location column
Address_info['location'] = Address_info['Full_Address'].apply(lambda x: geocode_me(x)) # ← note the change here
# 3 - create longitude, latitude and altitude from location column (returns tuple)
Address_info['point'] = Address_info['location'].apply(lambda loc: tuple(loc.point) if loc else None)
# 4 - split point column into latitude, longitude and altitude columns
Address_info[['latitude', 'longitude', 'altitude']] = pd.DataFrame(Address_info['point'].tolist(), index=Address_info.index)
Output:
Address1 Address3 Post_Code City_Name Full_Address location point latitude longitude altitude
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535,MERSCH (Rue de la Gare, Mersch, Canton Mersch, 7535, Lëtzebuerg, (49.7508296, 6.1085476)) (49.7508296, 6.1085476, 0.0) 49.750830 6.108548 0.0
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535,MERSCH (Rue de la Gare, Mersch, Canton Mersch, 7535, Lëtzebuerg, (49.7508296, 6.1085476)) (49.7508296, 6.1085476, 0.0) 49.750830 6.108548 0.0
10000009 37 RUE DE LA GARE L-7535 MERSCH 37 RUE DE LA GARE,L-7535,MERSCH (Rue de la Gare, Mersch, Canton Mersch, 7535, Lëtzebuerg, (49.7508296, 6.1085476)) (49.7508296, 6.1085476, 0.0) 49.750830 6.108548 0.0
10001998 RUE EDWARD STEICHEN L-1855 LUXEMBOURG RUE EDWARD STEICHEN,L-1855,LUXEMBOURG (Rue Edward Steichen, Grünewald, Weimershof, Neudorf-Weimershof, Luxembourg, Canton Luxembourg, 2540, Lëtzebuerg, (49.6302147, 6.1713374)) (49.6302147, 6.1713374, 0.0) 49.630215 6.171337 0.0
11000051 9 RUE DU BRILL L-3898 FOETZ 9 RUE DU BRILL,L-3898,FOETZ (Rue du Brill, Mondercange, Canton Esch-sur-Alzette, 3898, Luxembourg, (49.5217917, 6.0101385)) (49.5217917, 6.0101385, 0.0) 49.521792 6.010139 0.0
10000052 3 RUE DU PUITS ROMAIN L-8070 BERTRANGE 3 RUE DU PUITS ROMAIN,L-8070,BERTRANGE (Rue du Puits Romain, Z.A. Bourmicht, Bertrange, Canton Luxembourg, 8070, Lëtzebuerg, (49.6084531, 6.0771901)) (49.6084531, 6.0771901, 0.0) 49.608453 6.077190 0.0
Note & Additional Resources:
The output includes the address that caused the error in your traceback:
RateLimiter caught an error, retrying (0/2 tries). Called with (*('3 RUE DU PUITS ROMAIN ,L-8070 ,BERTRANGE ',)
Note all the extra whitespace in the address. I've added a line of code to remove whitespace from the beginning and end of the strings.
GeocoderTimedOut, a real pain?
Geopy: catch timeout error
Final:
In the end, the service times out because of HTTP Error 429: Too Many Requests, meaning the request limit for the day has been reached.
Review the Nominatim Usage Policy.
Suggestion: use a different geocoder.
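Since the sample data contains the same address several times, one way to stay further below the request limit is to geocode each distinct address only once and to retry failed lookups with an increasing delay. The following is a minimal sketch, not part of the code above: geocode_unique and its parameters are hypothetical names, and geocode_fn stands for any callable such as locator.geocode.

```python
import time


def geocode_unique(addresses, geocode_fn, min_delay=1.1, max_retries=3,
                   sleep=time.sleep):
    """Geocode every distinct address once; retry failures with backoff.

    geocode_fn is any callable taking an address string (for example
    locator.geocode). All names here are illustrative assumptions.
    """
    cache = {}
    for address in addresses:
        if address in cache:
            continue  # duplicate row: reuse the earlier result
        result = None
        for attempt in range(max_retries):
            # wait before every call; the wait doubles after each failure
            sleep(min_delay * 2 ** attempt)
            try:
                result = geocode_fn(address)
                break
            except Exception as exc:  # with geopy, narrow this to GeocoderTimedOut etc.
                print(f"Attempt {attempt + 1} failed for {address!r}: {exc}")
        cache[address] = result  # stays None if every attempt failed
    return [cache[address] for address in addresses]
```

With the DataFrame above this could be called as Address_info['location'] = geocode_unique(Address_info['Full_Address'].tolist(), locator.geocode), so that repeated rows cost a single request.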


Python Get Charts in Google Sheet and copy in Google Slides

I have a Google Sheet that has some predefined charts, and I inject data via the API to update those charts. Now I want to copy those charts to Google Slides.
My code:
try:
    service = build('sheets', 'v4', credentials=creds)
    rangeName = "opiniones!A2:I" + str(len(data)+1)
    Body = {
        'values': data,
    }
    # Write the data to the sheet
    service.spreadsheets().values().update(spreadsheetId=file_copy_id,
                                           range=rangeName,
                                           valueInputOption='USER_ENTERED',
                                           body=Body).execute()
    # Insert the list of restaurant types with their average, compared
    # against their rating. The range runs from position 27 to 27 + list
    # length + 1, since the first list position is 0
    rangeName = "datos!B27:E" + str(27+len(restaurantTypes)+1)
    Body = {
        'values': restaurantTypes
    }
    service.spreadsheets().values().update(spreadsheetId=file_copy_id,
                                           range=rangeName,
                                           valueInputOption='USER_ENTERED',
                                           body=Body).execute()
except Exception as e:
    print(e)
Any idea on how to copy an already defined chart?
Thank you!

Dictionnary that returns a list of a key entered in argument in a function [closed]

Here is a sample of my dictionary with 3 keys.
({'Musique': [['Musique', 'Shawn Phillips', 236, 236], ['Musique', "L'avenue Royale fête l'été!", 237, 237], ['Musique', 'Perséides musicales', 215, 215], ['Musique', 'Gaétan Leclerc chante Félix et…', 229, 229], ['Musique', 'The Ring of Fire : a Johnny Cash Experience', 202, 202], ['Musique', "Jazz'Art", 208, 210], {'Théatre': [['Théâtre', 'Coup de théâtre à la joyeuse maison hantée', 189, 189], ['Théâtre', 'Les galoches du bonheur', 203, 203], ['Théâtre', 'Le voyage de Pixelle dans le monde virtuel', 217, 217], ['Théâtre', 'Marimba à la ferme de la bonne entente', 224, 224], ['Théâtre', 'Pattes et cravates', 196, 196], {'Danse': [['Danse', 'Initiation au tango argentin suivi de la milonga', 182, 231], ['Danse', 'Samedi de danser...', 188, 188], ['Danse', 'Rusdell Nunez (latino)', 191, 191]
The keys are 'Musique', 'Théâtre' and 'Danse'.
Each value is a list of sublists describing events, and the integers in each sublist are the first and last day of the year on which the event is available. I need to return a list with the names of all events of the requested type that are offered on the day passed in the argument day_year.
Here are the full instructions and the function:
def obtain_events_date_type(dictio_events_par_type, day_year, type_event): # the first argument is the dictionary, so don't really worry about it
Then, for each event whose type matches the argument type_event: if the beginning of the event (the first int in the sublist) is lower than or equal to day_year, and the end of the event (the last int in the sublist) is higher than or equal to day_year, we add the name of the event to the list, since it is available on that day. I need to return that list of events.
So if I called
obtain_events_date_type(creer_dictio, 236, 'Musique')
# creer_dictio is my dictionary, built in another function
I would need to add all events that are available on day 236, for example the first entry in the dictionary shown in this post. There may be more than one event on the same day. If no event is available on the day passed as argument, we return an empty list.
What I have tried:
I'm familiar with loops in Python, but I keep getting errors about tuples and about operations that are not allowed on dictionaries.
Someone told me that I could create a list for every type, but I'm still having a hard time reaching every event and the ints needed for the comparison.
Thanks for the info/tips!
EDIT :
liste_type_asked = []
for element in dictio_evenements_par_type:
    if 'Musique' in element:
        for jour in element:
            if jour_annee <= jour[2]:
                if jour_annee >= jour[3]:
                    liste_type_asked.append(element)
return liste_type_asked
Error:
TypeError: '<=' not supported between instances of 'int' and 'str'
You can use a list comprehension like this:
def obtain_events_date_type(dictio_events_par_type, day_year, type_event):
    return [n for t in dictio_events_par_type for k, l in t.items() if k == type_event for _, n, s, e in l if s <= day_year <= e]
so that:
events = [
    {
        'Musique': [
            ['Musique', 'Shawn Phillips', 236, 236],
            ['Musique', "L'avenue Royale fête l'été!", 237, 237],
            ['Musique', 'Perséides musicales', 215, 215],
            ['Musique', 'Gaétan Leclerc chante Félix et…', 229, 229],
            ['Musique', 'The Ring of Fire : a Johnny Cash Experience', 202, 202],
            ['Musique', "Jazz'Art", 208, 210]
        ]
    },
    {
        'Théâtre': [
            ['Théâtre', 'Coup de théâtre à la joyeuse maison hantée', 189, 189],
            ['Théâtre', 'Les galoches du bonheur', 203, 203],
            ['Théâtre', 'Le voyage de Pixelle dans le monde virtuel', 217, 217],
            ['Théâtre', 'Marimba à la ferme de la bonne entente', 224, 224],
            ['Théâtre', 'Pattes et cravates', 196, 196]
        ]
    },
    {
        'Danse': [
            ['Danse', 'Initiation au tango argentin suivi de la milonga', 182, 231],
            ['Danse', 'Samedi de danser...', 188, 188],
            ['Danse', 'Rusdell Nunez (latino)', 191, 191]
        ]
    }
]
print(obtain_events_date_type(events, 188, 'Danse'))
will output:
['Initiation au tango argentin suivi de la milonga', 'Samedi de danser...']
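For readers less used to nested comprehensions, the same logic can be written as explicit loops. This is only a sketch and assumes the data is shaped like the events list above (a list of one-key dictionaries); the local variable names and the shortened sample are illustrative.

```python
def obtain_events_date_type(dictio_events_par_type, day_year, type_event):
    """Return the names of events of type_event that run on day_year."""
    matches = []
    for group in dictio_events_par_type:          # each item is a one-key dict
        for event_type, events in group.items():
            if event_type != type_event:
                continue                           # skip the other event types
            for _, name, start, end in events:     # [type, name, first day, last day]
                if start <= day_year <= end:
                    matches.append(name)
    return matches


# Shortened sample in the same shape as the data above
sample = [
    {'Danse': [
        ['Danse', 'Initiation au tango argentin suivi de la milonga', 182, 231],
        ['Danse', 'Samedi de danser...', 188, 188],
        ['Danse', 'Rusdell Nunez (latino)', 191, 191],
    ]},
]
print(obtain_events_date_type(sample, 191, 'Danse'))
# ['Initiation au tango argentin suivi de la milonga', 'Rusdell Nunez (latino)']
```

Unpacking each sublist into four names also avoids indexing into a string by accident, which is one way to end up comparing an int with a str.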

Split a column in pandas twice to multiple columns

I have a column "Nome_propriedade" with complete addresses, such as establishment name, streets, neighborhood, city and state
It always ends with the name of the city and state. With this pattern:
Nome_propriedade
"Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS"
"Fazenda da Várzea - zona rural, Serro/MG"
"Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ"
"Área de extração de carnaúba - Povoado Areal, zona rural, Santa Cruz do Piauí/PI"
"Pastelaria - Av. Vicente de Carvalho, 995, Loja Q, Vila da Penha, Rio de Janeiro/RJ"
I want to create two new columns, "City" and "State", and fill them with the last values found in column "Nome_propriedade". I also want to strip those values from Nome_propriedade.
Nome_propriedade City State
Rod. BR 386, bairro Olarias/Conventos Lajeado RS
Fazenda da Várzea - zona rural Serro MG
Cidade do Rock - Jacarepaguá... Rio de Janeiro RJ
Área de extração de carnaúba - Povoado A... Santa Cruz do Piauí PI
Pastelaria - Av. Vicente de Carvalho, 99... Rio de Janeiro RJ
Does anyone know how I can create these two columns?
I can not do a general split because I just want to separate the city and state information. Other information may remain unchanged.
What do you think about:
import pandas as pd
propiedades = ["Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS",
"Fazenda da Várzea - zona rural, Serro/MG",
"Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ",
"Área de extração de carnaúba - Povoado Areal, zona rural, Santa Cruz do Piauí/PI",
"Pastelaria - Av. Vicente de Carvalho, 995, Loja Q, Vila da Penha, Rio de Janeiro/RJ"]
df = pd.DataFrame({"Nome_propriedade":propiedades})
df[["City", "State"]] = df["Nome_propriedade"].apply(lambda x :x.split(",")[-1]).str.split("/",
expand=True)
UPDATE
If you then want to delete these infos from Nome_propriedade you can add this line
df["Nome_propriedade"] = df["Nome_propriedade"].apply(lambda x :",".join(x.split(",")[:-1]))
You need to split the string in the column by ",", take the last element of the resulting list and split it by "/". That list holds your two columns.
pd.DataFrame(list(df['Nome_propriedade'].str.split(',').apply(lambda x: x[-1]).str.split('/')), columns=['city', 'state'])
Output:
city state
0 Lajeado RS
1 Serro MG
2 Rio de Janeiro RJ
3 Santa Cruz do Piauí PI
4 Rio de Janeiro RJ
Here is an effective solution avoiding the tedious apply and simply sticking with str-operations.
df["Nome_propriedade"], x = df["Nome_propriedade"].str.rsplit(', ', 1).str
df["City"], df['State'] = x.str.split('/').str
Full example:
import pandas as pd
propiedades = [
"Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS",
"Fazenda da Várzea - zona rural, Serro/MG",
"Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ",
"Área de extração de carnaúba - Povoado Areal, zona rural, Santa Cruz do Piauí/PI",
"Pastelaria - Av. Vicente de Carvalho, 995, Loja Q, Vila da Penha, Rio de Janeiro/RJ"
]
df = pd.DataFrame({
"Nome_propriedade":propiedades
})
df["Nome_propriedade"], x = df["Nome_propriedade"].str.rsplit(', ', 1).str
df["City"], df['State'] = x.str.split('/').str
# Stripping Nome_propriedade to len 40 to fit screen
print(df.assign(Nome_propriedade=df['Nome_propriedade'].str[:40]))
Returns:
Nome_propriedade City State
0 Rod. BR 386, bairro Olarias/Conventos Lajeado RS
1 Fazenda da Várzea - zona rural Serro MG
2 Cidade do Rock - Jacarepaguá Rio de Janeiro RJ
3 Área de extração de carnaúba - Povoado A Santa Cruz do Piauí PI
4 Pastelaria - Av. Vicente de Carvalho, 99 Rio de Janeiro RJ
If you'd like to keep the items:
df["City"], df['State'] = df["Nome_propriedade"]\
.str.rsplit(', ', 1).str[-1]\
.str.split('/').str
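In recent pandas versions, unpacking the .str accessor into two variables and passing the split limit positionally may raise warnings or errors. A hedged alternative that keeps the same idea is to ask for a DataFrame with expand=True and name the limit with n=1. This is a sketch on a shortened copy of the sample data; the intermediate name parts is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Nome_propriedade": [
    "Rod. BR 386, bairro Olarias/Conventos, Lajeado/RS",
    "Fazenda da Várzea - zona rural, Serro/MG",
    "Cidade do Rock - Jacarepaguá, Rio de Janeiro/RJ",
]})

# Split once from the right: column 0 is the remaining address,
# column 1 is the trailing "City/State" piece
parts = df["Nome_propriedade"].str.rsplit(", ", n=1, expand=True)
df["Nome_propriedade"] = parts[0]

# Split the trailing piece on "/" into the two new columns
df[["City", "State"]] = parts[1].str.split("/", n=1, expand=True)

print(df)
```

Because only the last ", " is used as a separator, slashes and commas earlier in the address (such as "Olarias/Conventos") are left untouched.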
The easiest approach I can see is, for a single example:
example = 'some, stuff, here, city/state'
elements = example.split(',')
city, state = elements[-1].split('/')
To apply this to the column in your dataframe:
df['city_state'] = df.Nome_propriedade.apply(lambda r: r.split(',')[-1].split('/'))
df['city'] = [cs[0] for cs in df['city_state']]
df['state'] = [cs[1] for cs in df['city_state']]
For example:
example2 = 'another, thing here city2/state2'
df = pd.DataFrame({'address': [example, example2],
'other': [1, 2]})
df['city_state'] = df.address.apply(lambda r: r.split()[-1].split('/'))
df['city'] = [cs[0] for cs in df['city_state']]
df['state'] = [cs[1] for cs in df['city_state']]
df.drop(columns=['city_state'], inplace=True)
print(df)
# address other city state
# 0 some, stuff, here, city/state 1 city state
# 1 another, thing here city2/state2 2 city2 state2
Note: some of the other answers provide a more efficient way to unpack the result into your dataframe. I'll leave this here because I think breaking it out into steps is illustrative, but for efficiency's sake, I'd go with one of the others.

django model wont agree with fixture

I am trying to populate a PostgreSQL database with initial values using fixtures in Django. I keep getting this strange error: Could not load publication.Article(pk=None): value too long for type character varying(100)
even though my model looks like this:
class Article(models.Model):
    _id = models.CharField(max_length=1000)
    author_name = models.CharField(max_length=1000)
    caption = models.CharField(max_length=1000)
    isGraphic = models.BooleanField(max_length=1000, default=True)
    pictures = models.URLField(max_length=1000)
    text = models.CharField(max_length=10000)
    title = models.CharField(max_length=1000)
    user_img = models.URLField(max_length=1000)
    videoname = models.CharField(max_length=1000)
    vimeo_id = models.IntegerField(max_length=1000)
Traceback (most recent call last):
File "manage.py", line 10, in <module>
execute_from_command_line(sys.argv)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/core/management/__init__.py", line 385, in execute_from_command_line
utility.execute()
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/core/management/__init__.py", line 377, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/core/management/base.py", line 288, in run_from_argv
self.execute(*args, **options.__dict__)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/core/management/base.py", line 338, in execute
output = self.handle(*args, **options)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/core/management/commands/loaddata.py", line 61, in handle
self.loaddata(fixture_labels)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/core/management/commands/loaddata.py", line 91, in loaddata
self.load_label(fixture_label)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/core/management/commands/loaddata.py", line 148, in load_label
obj.save(using=self.using)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/core/serializers/base.py", line 173, in save
models.Model.save_base(self.object, using=using, raw=True)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/models/base.py", line 617, in save_base
updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/models/base.py", line 698, in _save_table
result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/models/base.py", line 731, in _do_insert
using=using, raw=raw)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/models/manager.py", line 92, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/models/query.py", line 921, in _insert
return query.get_compiler(using=using).execute_sql(return_id)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 920, in execute_sql
cursor.execute(sql, params)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/backends/utils.py", line 81, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/backends/utils.py", line 65, in execute
return self.cursor.execute(sql, params)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/Users/sam.royston/PycharmProjects/sahelien_d/lib/python2.7/site-packages/django/db/backends/utils.py", line 65, in execute
return self.cursor.execute(sql, params)
django.db.utils.DataError: Problem installing fixture '/Users/sam.royston/PycharmProjects/sahelien_d/sahelien_django/fixtures/test.json' : Could not load publication.Article(pk=None): value too long for type character varying(100)
why am I getting this error?
test.json:
[
{ "model" : "publication.Article" , "fields":
{
"_id" : "5306dfa9ed2379f03a000001" ,
"author_name" : "Sahélien Tombouctou",
"caption" : "Les n’ont fait aucune victime, ni de dégâts matériels",
"isGraphic" : false,
"pictures" : [],
"text" : "La ville de Tombouctou a reçu des tirs d'obus dans la nuit de dimanche. \n<br>\n<br>\nLes deux premiers obus sont tombés dans la localité de Kabara, à 10km de la cité des 333 saints. Le troisième obus est tombé sur la route de Goundam.\n<br>\n<br>\nLes tirs n’ont fait aucune victime, ni de dégâts matériels. Selon le lieutenant-colonel Seydou Koné, en poste à Tombouctou, l'armée malienne est mobilisée pour déterminer l'origine de cette attaque.",
"title" : "Tombouctou attaquée à la roquette",
"videoname" : "okok.mp4",
"vimeo_id" : "87246621"
}
}
]
Your JSON fixture is missing a primary key. Django automatically adds a primary key, called id, to your models. As this key is required, you should provide it in fixtures.
The fixture you have posted does not have this key; you should add it:
[
{ "model" : "publication.Article" , "fields":
{
"id": "1",
"_id" : "5306dfa9ed2379f03a000001" ,
"author_name" : "Sahélien Tombouctou",
"caption" : "Les n’ont fait aucune victime, ni de dégâts matériels",
"isGraphic" : false,
"pictures" : [],
"text" : "La ville de Tombouctou a reçu des tirs d'obus dans la nuit de dimanche. \n<br>\n<br>\nLes deux premiers obus sont tombés dans la localité de Kabara, à 10km de la cité des 333 saints. Le troisième obus est tombé sur la route de Goundam.\n<br>\n<br>\nLes tirs n’ont fait aucune victime, ni de dégâts matériels. Selon le lieutenant-colonel Seydou Koné, en poste à Tombouctou, l'armée malienne est mobilisée pour déterminer l'origine de cette attaque.",
"title" : "Tombouctou attaquée à la roquette",
"videoname" : "okok.mp4",
"vimeo_id" : "87246621"
}
}
]
Your fixture is missing key fields that are required by your model. You need to add user_img, and pictures cannot be empty.
The fixture needs to pass all the validation rules of your model, and since all fields are required as per your model, they all need to be present in the fixture.
In addition, you pass a max_length argument to the integer and boolean fields, where it is not applicable.
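The error message mentions character varying(100), while no field in the model above declares max_length=100. One way to narrow this down is to list which fixture values are longer than 100 characters. The following is a minimal sketch in Python 3 using only the standard library; the function name fields_longer_than and the shortened sample fixture are hypothetical.

```python
import json


def fields_longer_than(fixture_text, limit=100):
    """Return (model, field, length) for each string value over `limit` chars."""
    offenders = []
    for entry in json.loads(fixture_text):
        for name, value in entry.get("fields", {}).items():
            if isinstance(value, str) and len(value) > limit:
                offenders.append((entry.get("model"), name, len(value)))
    return offenders


# Shortened stand-in for test.json; the long text is replaced by filler
sample = json.dumps([{
    "model": "publication.Article",
    "fields": {
        "title": "Tombouctou attaquée à la roquette",
        "text": "x" * 150,
        "pictures": [],
    },
}])
print(fields_longer_than(sample))
# [('publication.Article', 'text', 150)]
```

If a value is reported here while the model allows 1000 or more characters, the database column may still carry an older, shorter definition than the model. That is an assumption worth checking against the actual table, not something the traceback proves.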

Unicode elements in list save to file

I have two questions:
1) What have I done wrong in the script below? The result is not encoded properly and all non-standard characters are stored incorrectly. When I print out the data list, it gives me a proper list of unicode types:
[u'Est-ce que tu peux traduire \xc3\xa7a pour moi? \n \n \n Can you translate this for me?'], [u'Chicago est tr\xc3\xa8s diff\xc3\xa9rente de Boston. \n \n \n Chicago is very different from Boston.'],
After that I strip all extra spaces and newlines, and the result in the file looks like this (it looks the same when printed and when saved to file):
Est-ce que tu peux traduire ça pour moi?;Can you translate this for me?
Chicago est très différente de Boston.;Chicago is very different from Boston.
2) What scripting language other than Python would you recommend?
import requests
import unicodecsv, os
from bs4 import BeautifulSoup
import re
import html5lib
countries = ["fr"] #,"id","bn","my","chin","de","es","fr","hi","ja","ko","pt","ru","th","vi","zh"]
for country in countries:
    f = open("phrase_" + country + ".txt","w")
    w = unicodecsv.writer(f, encoding='utf-8')
    toi = 1
    print country
    while toi<2:
        url = "http://www.englishspeak.com/"+ country +"/english-phrases.cfm?newCategoryShowed=" + str(toi) + "&sortBy=28"
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html5lib')
        soup.unicode
        [s.extract() for s in soup('script')]
        [s.extract() for s in soup('style')]
        [s.extract() for s in soup('head')]
        [s.extract() for s in soup("table" , { "height" : "102" })]
        [s.extract() for s in soup("td", { "class" : "copyLarge"})]
        [s.extract() for s in soup("td", { "width" : "21%"})]
        [s.extract() for s in soup("td", { "colspan" : "3"})]
        [s.extract() for s in soup("td", { "width" : "25%"})]
        [s.extract() for s in soup("td", { "class" : "blacktext"})]
        [s.extract() for s in soup("div", { "align" : "center"})]
        data = []
        rows = soup.find_all('tr', {"class": re.compile("Data.")})
        for row in rows:
            cols = row.find_all('td')
            cols = [ele.text.strip() for ele in cols]
            data.append([ele for ele in cols if ele])
        wordsList = []
        for index, item in enumerate(data):
            str_tmp = "".join(data[index]).encode('utf-8')
            str_tmp = re.sub(r' +\n\s+', ';', str_tmp)
            str_tmp = re.sub(r' +', ' ', str_tmp)
            wordsList.append(str_tmp.decode('utf-8'))
            print str_tmp
        w.writerow(wordsList)
        toi += 1
You should use r.text, not r.content, because content is the raw bytes and text is the decoded text:
soup = BeautifulSoup(r.text, 'html5lib')
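The u'...\xc3\xa7a...' strings in the question are what UTF-8 bytes look like when each byte is decoded as if it were one Latin-1 character. A small sketch of that effect, written for Python 3 with illustrative variable names; it does not contact the website.

```python
original = "Est-ce que tu peux traduire ça pour moi?"

raw = original.encode("utf-8")     # the bytes a server would send
wrong = raw.decode("latin-1")      # bytes read with the wrong codec
right = raw.decode("utf-8")        # bytes read with the right codec

print(ascii(wrong))                # contains \xc3\xa7 where "ç" should be
print(right == original)           # True

# Text damaged this way can be recovered by reversing the wrong step
repaired = wrong.encode("latin-1").decode("utf-8")
print(repaired == original)        # True
```

Decoding once with the correct codec, and encoding again only when writing to the file, keeps the accented characters intact.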
You can then simply write the UTF-8 encoded strings to the file:
with open("out.txt","w") as f:
    for d in data:
        d = " ".join(d).encode("utf-8")
        d = re.sub(r'\n\s+', ';', d)
        d = re.sub(r' +', ' ', d)
        f.write(d)
Output:
Fais attention en conduisant. ;Be careful driving.Fais attention. ;Be careful.Est-ce que tu peux traduire ça pour moi? ;Can you translate this for me?Chicago est très différente de Boston. ;Chicago is very different from Boston.Ne t'inquiète pas. ;Don't worry.Tout le monde le sais. ;Everyone knows it.Tout est prêt. ;Everything is ready.Excellent. ;Excellent.De temps en temps. ;From time to time.Bonne idée. ;Good idea.Il l'aime beaucoup. ;He likes it very much.A l'aide! ;Help!Il arrive bientôt. ;He's coming soon.Il a raison. ;He's right.Il est très ennuyeux. ;He's very annoying.Il est très célèbre. ;He's very famous.Comment ça va? ;How are you?Comment va le travail? ;How's work going?Dépêche-toi! ;Hurry!J'ai déjà mangé. ;I ate already.Je ne vous entends pas. ;I can't hear you.Je ne sais pas m'en servir. ;I don't know how to use it.Je ne l'aime pas. ;I don't like him.Je ne l'aime pas. ;I don't like it.Je ne parle pas très bien. ;I don't speak very well.Je ne comprends pas. ;I don't understand.Je n'en veux pas. ;I don't want it.Je ne veux pas ça. ;I don't want that.Je ne veux pas te déranger. ;I don't want to bother you.Je me sens bien. ;I feel good.Je sors du travail à six heures. ;I get off of work at 6.J'ai mal à la tête. ;I have a headache.J'espère que votre femme et vous ferez un bon voyage. ;I hope you and your wife have a nice trip.Je sais. ;I know.Je l'aime. ;I like her.J'ai perdu ma montre. ;I lost my watch.Je t'aime. ;I love you.J'ai besoin de changer de vêtements. ;I need to change clothes.J'ai besoin d'aller chez moi. ;I need to go home.Je veux seulement un en-cas. ;I only want a snack.Je pense que c'est bon. ;I think it tastes good.Je pense que c'est très bon. ;I think it's very good.Je pensais que les vêtements étaient plus chers. ;I thought the clothes were cheaper.J'allais quitter le restaurant quand mes amis sont arrivés. ;I was about to leave the restaurant when my friends arrived.Je voudrais faire une promenade. 
;I'd like to go for a walk.Si vous avez besoin de mon aide, faites-le-moi savoir s'il vous plaît. ;If you need my help, please let me know.Je t'appellerai vendredi. ;I'll call you when I leave.Je reviendrai plus tard. ;I'll come back later.Je paierai. ;I'll pay.Je vais le prendre. ;I'll take it.Je t'emmenerai à l'arrêt de bus. ;I'll take you to the bus stop.Je suis un Américain. ;I'm an American.Je nettoie ma chambre. ;I'm cleaning my room.J'ai froid. ;I'm cold.Je viens te chercher. ;I'm coming to pick you up.Je vais partir. ;I'm going to leave.Je vais bien, et toi? ;I'm good, and you?Je suis content. ;I'm happy.J'ai faim. ;I'm hungry.Je suis marié. ;I'm married.Je ne suis pas occupé. ;I'm not busy.Je ne suis pas marié. ;I'm not married.Je ne suis pas encore prêt. ;I'm not ready yet.Je ne suis pas sûr. ;I'm not sure.Je suis désolé, nous sommes complets. ;I'm sorry, we're sold out.J'ai soif. ;I'm thirsty.Je suis très occupé. Je n'ai pas le temps maintenant. ;I'm very busy. I don't have time now.Est-ce que Monsieur Smith est un Américain? ;Is Mr. Smith an American?Est-ce que ça suffit? ;Is that enough?C'est plus long que deux kilomètres. ;It's longer than 2 miles.Je suis ici depuis deux jours. ;I've been here for two days.J'ai entendu dire que le Texas était beau comme endroit. ;I've heard Texas is a beautiful place.Je n'ai jamais vu ça avant. ;I've never seen that before.Juste un peu. ;Just a little.Juste un moment. ;Just a moment.Laisse-moi vérifier. ;Let me check.laisse-moi y réfléchir. ;Let me think about it.Allons voir. ;Let's go have a look.Pratiquons l'anglais. ;Let's practice English.Pourrais-je parler à madame Smith s'il vous plaît? ;May I speak to Mrs. Smith please?Plus que ça. ;More than that.Peu importe. ;Never mind.La prochaine fois. ;Next time.Non, merci. ;No, thank you.Non. ;No.N'importe quoi. ;Nonsense.Pas récemment. ;Not recently.Pas encore. ;Not yet.Rien d'autre. ;Nothing else.Bien sûr. ;Of course.D'accord. 
;Okay.S'il vous plaît remplissez ce formulaire. ;Please fill out this form.S'il vous plaît emmenez-moi à cette adresse. ;Please take me to this address.S'il te plaît écris-le. ;Please write it down.Vraiment? ;Really?Juste ici. ;Right here.Juste là. ;Right there.A bientôt. ;See you later.A demain. ;See you tomorrow.A ce soir. ;See you tonight.Elle est jolie. ;She's pretty.Désolé de vous déranger. ;Sorry to bother you.Arrête! ;Stop!Tente ta chance. ;Take a chance.Réglez ça dehors. ;Take it outside.Dis-moi. ;Tell me.Merci Mademoiselle. ;Thank you miss.Merci Monsieur. ;Thank you sir.Merci beaucoup. ;Thank you very much.Merci. ;Thank you.Merci pour tout. ;Thanks for everything.Merci pour ton aide. ;Thanks for your help.Ça a l'air super. ;That looks great.Ça sent mauvais. ;That smells bad.C'est pas mal. ;That's alright.Ça suffit. ;That's enough.C'est bon. ;That's fine.C'est tout. ;That's it.Ce n'est pas juste. ;That's not fair.Ce n'est pas vrai. ;That's not right.C'est vrai. ;That's right.C'est dommage. ;That's too bad.C'est trop. ;That's too many.C'est trop. ;That's too much.Le livre est sous la table. ;The book is under the table.Ils vont revenir tout de suite. ;They'll be right back.Ce sont les mêmes. ;They're the same.Ils sont très occupés. ;They're very busy.Ça ne marche pas. ;This doesn't work.C'est très difficile. ;This is very difficult.C'est très important. ;This is very important.Essaie-le/la. ;Try it.Très bien, merci. ;Very good, thanks.Nous l'aimons beaucoup. ;We like it very much.Voudriez-vous prendre un message s'il vous plaît? ;Would you take a message please?Oui, vraiment. ;Yes, really.Vos affaires sont toutes là. ;Your things are all here.Tu es belle. ;You're beautiful.Tu es très sympa. ;You're very nice.Tu es très intelligent. ;You're very smart.
Also you don't actually use the data in your list comps so they seem a little pointless:
