How to use variables and OR conjunctions in a SQL statement in Python?

I have a DataFrame named res with a column of ID lists that I want to use row by row as WHERE conditions in a SQL query, saving the results in a list:
                              ids
grupos
0       [160, 161, 365, 386, 471]
1       [296, 306]
Here is what I tried in order to insert them into the SQL query:
listado = [None]*len(res)
# We store the hashtags that best describe the groups
# We iterate over the people of a group to construct the WHERE condition
print "res : ", res
for i in (0, len(res)):
    conn = psycopg2.connect(**params)
    cur = conn.cursor()
    listado = [None]*len(res)
    for i in (0, len(res)):
        print "res[i:p] : ", res.iloc[i]['ids']
        cur.execute("""SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id FROM subscriber_hashtag
            -- join so that the ads/eclipses a user likes are linked to those in the hashtag mapping table
            INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
            -- join so that the users are linked to those in the hashtag mapping table
            LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
            -- retrieve the "likes"
            WHERE subscriber_hastag.subscriber_id in (%s)
            GROUP BY subscriber_hashtag.hashtag_id
            ORDER BY COUNT(swipe.eclipse_id) DESC;""", (res.iloc[i]['ids']))
        n = cur.fetchall()
        listado[i] = [{"count": elem[0], "eclipse_id": elem[1]} for elem in n]
Data for a reproducible example
Here are the underlying data:
subscriber_id hashtag_id
160 345
160 347
161 345
160 334
161 347
306 325
296 362
306 324
296 326
161 322
160 322
The expected output would look like:
{0: [[324,1], [325,1], [326,1], [362,1]], 1: [[345,2], [347,2], [334,1]]}
Current error message
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 50))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-44-f7c3c5b81303> in <module>()
39 WHERE subscriber_hastag.subscriber_id in (%s)
40 GROUP BY subscriber_hashtag.hashtag_id
---> 41 ORDER BY COUNT(swipe.eclipse_id) DESC;""",(res.iloc[i]['ids']))
42
43 n = cur.fetchall()
TypeError: not all arguments converted during string formatting

Have a look at tuples adaptation:
Python tuples are converted into a syntax suitable for the SQL IN operator and to represent a composite type:
Pass the ids as a single tuple query argument, so that the argument to execute() is a 1-tuple containing a tuple of ids, and drop the manual parentheses around %s. At the moment, (res.iloc[i]['ids']) is nothing but a sequence expression in redundant parentheses, so execute() uses the sequence itself as the argument list, which causes your TypeError exception: the argument sequence has more elements than the query has placeholders.
Try (tuple(res.iloc[i]['ids']),) instead. Note the comma, it is a very common error to omit it. All in all:
cur.execute("""SELECT COUNT(swipe.eclipse_id),
subscriber_hashtag.hashtag_id
FROM subscriber_hashtag
INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE subscriber_hashtag.subscriber_id in %s
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;""",
(tuple(res.iloc[i]['ids']),))
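As a quick way to see the tuple adaptation at work, cursor.mogrify() renders the query exactly as it would be sent to the server (the ids below are made up for illustration; mogrify returns bytes under Python 3):

print(cur.mogrify("... WHERE subscriber_hashtag.subscriber_id in %s",
                  (tuple([160, 161, 365]),)))
# b'... WHERE subscriber_hashtag.subscriber_id in (160, 161, 365)'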
Your for-loop is a bit strange, since you iterate over a 2-tuple (0, len(res)). Perhaps you meant range(len(res)). You could also just iterate over the Pandas Series:
for i, ids in enumerate(res['ids']):
    ...
    cur.execute(..., (tuple(ids),))
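Putting both fixes together, a minimal sketch of the corrected loop (keeping the question's params, res and listado post-processing, which are assumed to already exist) might look like:

import psycopg2

conn = psycopg2.connect(**params)
cur = conn.cursor()
listado = [None] * len(res)

for i, ids in enumerate(res['ids']):
    # psycopg2 adapts the Python tuple to the SQL "IN (...)" syntax,
    # so the whole tuple is passed as a single query parameter.
    cur.execute("""SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id
                   FROM subscriber_hashtag
                   INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
                   LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
                   WHERE subscriber_hashtag.subscriber_id IN %s
                   GROUP BY subscriber_hashtag.hashtag_id
                   ORDER BY COUNT(swipe.eclipse_id) DESC;""",
                (tuple(ids),))
    listado[i] = [{"count": elem[0], "eclipse_id": elem[1]} for elem in cur.fetchall()]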


ASSERTION ERROR: Issue in running SQL query

Question #1
List all the directors who directed a 'Comedy' movie in a leap year. (You need to check that the genre is 'Comedy' and the year is a leap year.) Your query should return the director name, the movie name, and the year.
%%time
def grader_1(q1):
    q1_results = pd.read_sql_query(q1, conn)
    print(q1_results.head(10))
    assert (q1_results.shape == (232, 3))

# m as Movie, d as M_Director, g as Genre, p as Person, mg as M_Genre
query1 = """SELECT m.Title, p.Name, m.year
            FROM Movie m
            JOIN M_Director d ON m.MID = d.MID
            JOIN Person p ON d.PID = p.PID
            JOIN M_Genre mg ON m.MID = mg.MID
            JOIN Genre g ON g.GID = mg.GID
            WHERE g.Name LIKE '%Comedy%'
              AND ( m.year % 4 = 0
                    AND m.year % 100 <> 0
                    OR m.year % 400 = 0 ) LIMIT 2"""
grader_1(query1)
ERROR:
title Name year
0 Mastizaade Milap Zaveri 2016
1 Harold & Kumar Go to White Castle Danny Leiner 2004
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-a942fcc98f72> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', 'def grader_1(q1):\n q1_results = pd.read_sql_query(q1,conn)\n print(q1_results.head(10))\n assert (q1_results.shape == (232,3))\n\n#m as movie , m_director as md,Genre as g,Person as p\nquery1 ="""SELECT m.Title,p.Name,m.year\nFROM Movie m JOIN \n M_director d\n ON m.MID = d.MID JOIN \n Person p\n ON d.PID = p.PID JOIN\n M_Genre mg\n ON m.MID = mg.MID JOIN\n Genre g \n ON g.GID = mg.GID\n WHERE g.Name LIKE \'%Comedy%\'\nAND ( m.year%4 = 0\nAND m.year % 100 <> 0\nOR m.year % 400 = 0 ) LIMIT 2"""\ngrader_1(query1)')
2 frames
<decorator-gen-53> in time(self, line, cell, local_ns)
/usr/local/lib/python3.7/dist-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1191 else:
1192 st = clock2()
-> 1193 exec(code, glob, local_ns)
1194 end = clock2()
1195 out = None
<timed exec> in <module>()
<timed exec> in grader_1(q1)
AssertionError:
I have run this SQL query on the IMDB dataset without the grader_1 function and it works. However, when I try to run it within the grader_1 function, I get an assertion error.
How can I fix this?
Your query has a LIMIT clause, which prevents the SQL engine from fetching all the data.
Just run it again without that clause.
query1 = """ SELECT M.title,Pe.Name,M.year FROM Movie M JOIN M_Director MD ON M.MID = MD.MID JOIN M_Genre MG ON M.MID = MG.MID JOIN Genre Ge ON MG.GID = Ge.GID JOIN Person Pe ON MD.PID = Pe.PID WHERE Ge.Name LIKE '%Comedy%' AND CAST(SUBSTR(TRIM(M.year),-4) AS INTEGER) % 4 = 0 AND (CAST(SUBSTR(TRIM(M.year),-4) AS INTEGER) % 100 <> 0 OR CAST(SUBSTR(TRIM(M.year),-4) AS INTEGER) % 400 = 0) """
Run this query and your problem should be resolved.
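Incidentally, the leap-year test in both queries is just the standard rule with the grouping rearranged; a small Python sketch (my own illustration, not part of the original answer) shows the equivalence:

def is_leap_year(year):
    # Same grouping as the SQL: y % 4 = 0 AND (y % 100 <> 0 OR y % 400 = 0),
    # which is equivalent to (divisible by 4 and not by 100) or divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

assert is_leap_year(2004) and is_leap_year(2000) and not is_leap_year(1900)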

Get postal code from full address column in dataframe by regex str.extract() and add as new column in pandas

I have a dataframe with full addresses in one column, and I need to create a separate column in the same dataframe with just the 5-digit postal code starting with 7. Some of the addresses may be empty or may not contain a postal code.
How do I split the column to get just the postal code?
The postal code starts with 7; for example, 76000 is the postal code at index 0.
MedicalCenters["Postcode"][0]
Location(75, Avenida Corregidora, Centro, Delegación Centro Histórico, Santiago de Querétaro, Municipio de Querétaro, Querétaro, 76000, México, (20.5955795, -100.39274225, 0.0))
Example Data
Venue Venue Latitude Venue Longitude Venue Category Address
0 Lab. Corregidora 20.595621 -100.392677 Medical Center Location(75, Avenida Corregidora, Centro, Delegación Centro Histórico, Santiago de Querétaro, Municipio de Querétaro, Querétaro, 76000, México, (20.5955795, -100.39274225, 0.0))
I tried using a regex but I get an error:
# get zipcode from full address
import re
MedicalCenters['Postcode'] = MedicalCenters['Address'].str.extract(r'\b\d{5}\b', expand=False)
ERROR
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-185-84c21a29d484> in <module>
1 # get zipcode from full address
2 import re
----> 3 MedicalCenters['Postcode'] = MedicalCenters['Address'].str.extract(r'\b\d{5}\b', expand=False)
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/strings.py in wrapper(self, *args, **kwargs)
1950 )
1951 raise TypeError(msg)
-> 1952 return func(self, *args, **kwargs)
1953
1954 wrapper.__name__ = func_name
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/strings.py in extract(self, pat, flags, expand)
3037 @forbid_nonstring_types(["bytes"])
3038 def extract(self, pat, flags=0, expand=True):
-> 3039 return str_extract(self, pat, flags=flags, expand=expand)
3040
3041 @copy(str_extractall)
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/strings.py in str_extract(arr, pat, flags, expand)
1010 return _str_extract_frame(arr._orig, pat, flags=flags)
1011 else:
-> 1012 result, name = _str_extract_noexpand(arr._parent, pat, flags=flags)
1013 return arr._wrap_result(result, name=name, expand=expand)
1014
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/strings.py in _str_extract_noexpand(arr, pat, flags)
871
872 regex = re.compile(pat, flags=flags)
--> 873 groups_or_na = _groups_or_na_fun(regex)
874
875 if regex.groups == 1:
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/strings.py in _groups_or_na_fun(regex)
835 """Used in both extract_noexpand and extract_frame"""
836 if regex.groups == 0:
--> 837 raise ValueError("pattern contains no capture groups")
838 empty_row = [np.nan] * regex.groups
839
ValueError: pattern contains no capture groups
You need to add parentheses to make it a capture group:
MedicalCenters['Address'].str.extract(r"\b(\d{5})\b")
You can try to split the string first, then it will be easier to match the postcode:
address = '75, Avenida Corregidora, Centro, Delegación Centro Histórico, Santiago de Querétaro, Municipio de Querétaro, Querétaro, 76000, México, (20.5955795, -100.39274225, 0.0)'
matches = list(filter(lambda x: x.startswith('7') and len(x) == 5, address.split(', '))) # ['76000']
So you can populate your DataFrame by:
df['postcode'] = df['address'].apply(lambda address: list(filter(lambda x: x.startswith('7') and len(x) == 5, address.split(', ')))[0])
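Since some addresses may be empty or lack a postal code (as the question notes), indexing [0] on the filtered list will raise an IndexError for those rows; a slightly more defensive variant (a sketch, using the same hypothetical df/address names) could be:

def extract_postcode(address):
    # Keep only 5-character tokens that start with '7'; return None when absent.
    matches = [x for x in str(address).split(', ') if x.startswith('7') and len(x) == 5]
    return matches[0] if matches else None

df['postcode'] = df['address'].apply(extract_postcode)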
The Address column's values were objects, which is why the regex was not working:
MedicalCenters.dtypes
Venue object
Venue Latitude float64
Venue Longitude float64
Venue Category object
Health System object
geom object
Address object
Postcode object
dtype: object
After converting the objects to strings:
MedicalCenters['Address'] = MedicalCenters['Address'].astype('str')
I was able to apply the regex, modified thanks to glam:
# get zipcode from full address
import re
MedicalCenters['Postcode'] = MedicalCenters['Address'].str.extract(r"\b(\d{5})\b")
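Note that \b(\d{5})\b captures any run of 5 digits; since the question says these postal codes start with 7, a slightly stricter pattern (an assumption based on that description, not something from the original answers) would be:

MedicalCenters['Postcode'] = MedicalCenters['Address'].astype('str').str.extract(r"\b(7\d{4})\b", expand=False)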

Using MapReducer MRJob and my mapper function gives me an indexerror: list index out of range

I am new to MapReduce and MRJob (and also to Python, to be honest). I am trying to use MRJob to count the number of occurrences of each pair of letters from "A" to "E" found in two columns of a text file, i.e. "A", "A" = 10 occurrences, "A", "B" = 13 occurrences, "C", "E" = 6 occurrences, etc. The error I get when I run it is a "list index out of range", and for the life of me I can't figure out why.
Here is a sample of the text file used with the Python MapReduce file containing the mapper and reducer functions (each line has a date, a time, the duration of a phone call, a customer ID of the person making the call that begins with a letter from "A" to "E", where the letter designates a country, another customer ID of the person receiving the call, and key words from the conversation). I broke the string down into a list and in my mapper indicated the indexes I am interested in, but I am not sure whether this approach is correct:
Details
2020-03-05 # 19:28 # 5:10 # A-466 # C-563 # tendremos lindo ahi fuimos derecho carajo junto acabar
2020-03-10 # 05:08 # 5:14 # C-954 # D-353 # carajo calle película acaso voz creía irá san montón ambos hablas empieza estaremos parecía mitad estén vuelto música anoche tendremos tenían dormir habitación encuentra ésa
2020-01-15 # 09:47 # 4:46 # C-413 # B-881 # pudiera dejes querido maestro hacerle llamada paz estados estuviera hablo decirle bonito linda blanco negro querida hacerte dormir empieza mayoría
2020-01-10 # 20:54 # 4:58 # E-027 # A-549 # estuviera tuviste vieja volvió solía alrededor decía maestro estaremos línea sigues
2020-03-17 # 21:38 # 5:21 # C-917 # D-138 # encima música barco tuvimos dejes damas boca
Here is the entire code of the python file:
from mrjob.job import MRJob

class MRduracion_llamadas(MRJob):

    def mapper(self, _, line):
        """
        First we need to convert the string from the text file into a list and
        eliminate the unnecessary characters, such as "#", "-", ":", which I have
        substituted with a ";" to facilitate the "split" part of this process.
        """
        table = {35: 59, 45: 59, 58: 59}
        llamadas2020_text_line = [column.strip() for column in
                                  (line.translate(table)).split(";")]
        # Now we can assign values to "Key" and "Values"
        print(line)
        pais_emisor = llamadas2020_text_line[7]
        pais_receptor = llamadas2020_text_line[9]
        minutos = ""
        # If a call is "x" minutes and "y" secs long, where y > 0, then we can
        # round up the minutes by 1 minute.
        if int(llamadas2020_text_line[6]) > 0:
            minutos = int(llamadas2020_text_line[5]) + 1
        else:
            minutos = int(llamadas2020_text_line[5])
        yield (pais_emisor, pais_receptor), minutos

    def reducer(self, key, values):
        yield print(key, sum(values))

if __name__ == "__main__":
    MRduracion_llamadas.run()
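For what it's worth, the tokenization can be checked outside MRJob. The following standalone sketch (my own illustration, not from the original post; tokenize is a hypothetical helper) shows that a well-formed line yields 12 tokens, so indexes 7 and 9 do hold the country letters, while a short line such as a header or a blank line yields fewer tokens and would make llamadas2020_text_line[7] raise the IndexError:

table = {35: 59, 45: 59, 58: 59}  # '#', '-' and ':' are all mapped to ';'

def tokenize(line):
    return [column.strip() for column in line.translate(table).split(";")]

good = "2020-03-05 # 19:28 # 5:10 # A-466 # C-563 # tendremos lindo ahi"
tokens = tokenize(good)
print(len(tokens), tokens[7], tokens[9])  # 12 A C

for line in ("Details", ""):
    tokens = tokenize(line)
    if len(tokens) < 10:  # guard before indexing tokens[7] / tokens[9]
        print("skipping malformed line:", repr(line))

If the input file really does contain the "Details" header line shown in the sample, a guard like this at the top of the mapper would avoid the error. Separately, note that yield print(key, sum(values)) in the reducer yields None, since print() returns None; yield key, sum(values) is probably what was intended.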

How to create a sql server table variable in a query using sqlalchemy in python

I'm trying to create a table variable in SQL Server, query it, and return the results to a pandas dataframe (see the example below). I want to do this so that I can aggregate data in the database before sending it to a pandas dataframe. I recall that setting NOCOUNT ON should allow this to work, since nothing would be returned as each statement executed, but it isn't working. This is obviously example code, but I've been able to recreate the error with it. Following the suggested link gives you the documentation for ProgrammingError; I didn't find it very helpful.
import urllib
import sqlalchemy
import pandas as pd
quoted = urllib.parse.quote_plus('DRIVER={ODBC Driver 17 for SQL Server};Server=127.0.0.1;Database=mydb;UID=myuser;PWD=mypasswd;Port=1433;')
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
query = """
SET NOCOUNT ON;
DECLARE @n_majors TABLE (id varchar(9), n_majors int)
INSERT INTO @n_majors
SELECT m.student_id_fk
, COUNT(DISTINCT dc.category) AS [N majors declared]
FROM msu_db.dbo.Majors AS m
JOIN department_categories AS dc
ON dc.dept_name = m.dept_name
WHERE m.Student_Level_Code = 'UN'
GROUP BY m.student_id_fk
DECLARE @grad_category TABLE (id varchar(9), category varchar(20))
INSERT INTO @grad_category
select m.student_id_fk
, MIN(dc.category)
from Majors AS m
join department_categories as dc
on dc.dept_name = m.dept_name
WHERE m.Student_Level_Code = 'UN'
and graduated = 'CONF'
GROUP BY m.student_id_fk
DECLARE @first_category TABLE (id varchar(9), category varchar(20))
INSERT INTO @first_category
select m.student_id_fk
, MIN(dc.category) as cat
from Majors AS m
join department_categories as dc
on dc.dept_name = m.dept_name
WHERE m.Student_Level_Code = 'UN'
and graduated IS NULL
GROUP BY m.student_id_fk
DECLARE @first_semester_grades TABLE (id varchar(9), avg_grade float, std_grade float, first_Semester_seq_id varchar(4))
INSERT INTO @first_semester_grades
SELECT c.student_id_fk
, AVG(c.Grade) AS [mean grade]
, STDEV(c.Grade) AS [stdev grade]
, MIN(c.Term_Seq_Id) AS Term_Seq_Id
FROM Courses AS c
WHERE c.Student_Level_Code = 'UN'
GROUP BY c.student_id_fk
SET NOCOUNT OFF;
SELECT s.[student_id_fk]
,[gender]
,[ethnicity]
,[first_course_datetime]
,[hs_gpa]
,[math_placement_score]
,[math_act]
,[natsci_act]
,COUNT(c.[transfer institution name]) AS [N AP courses]
, nm.n_majors AS [n-categories]
, fc.category
, gc.category AS [grad category]
, fsg.avg_grade AS first_term_avg
, fsg.std_grade AS first_term_std
, fsg.first_Semester_seq_id
FROM [msu_db].[dbo].[Students] AS s
LEFT JOIN msu_db.dbo.Courses AS c
ON s.student_id_fk = c.student_id_fk
AND c.[transfer institution name] = 'Advanced Placement'
LEFT JOIN @n_majors as nm
ON s.student_id_fk = nm.id
LEFT JOIN @grad_category as gc
ON s.student_id_fk = gc.id
LEFT JOIN @first_category AS fc
ON s.student_id_fk = fc.id
LEFT JOIN @first_semester_grades AS fsg
ON s.student_id_fk = fsg.id
WHERE s.first_course_datetime BETWEEN '1993' AND '2013'
GROUP BY s.[student_id_fk]
,[gender]
,[ethnicity]
,[first_course_datetime]
,[hs_gpa]
,[math_placement_score]
,[math_act]
,[natsci_act]
, nm.n_majors
, fc.category
, gc.category
, fsg.avg_grade
, fsg.std_grade
, fsg.first_Semester_seq_id
"""
pd.read_sql_query(query, engine)
The error message that is output is as follows:
--------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/anaconda3/envs/research/lib/python3.6/site-packages/sqlalchemy/engine/result.py in _fetchall_impl(self)
1081 try:
-> 1082 return self.cursor.fetchall()
1083 except AttributeError:
AttributeError: 'NoneType' object has no attribute 'fetchall'
During handling of the above exception, another exception occurred:
ResourceClosedError Traceback (most recent call last)
<ipython-input-3-2a0ea765a8e2> in <module>()
----> 1 df = pd.read_sql_query(query, engine)
~/anaconda3/envs/research/lib/python3.6/site-packages/pandas/io/sql.py in read_sql_query(sql, con, index_col, coerce_float, params, parse_dates, chunksize)
312 return pandas_sql.read_query(
313 sql, index_col=index_col, params=params, coerce_float=coerce_float,
--> 314 parse_dates=parse_dates, chunksize=chunksize)
315
316
~/anaconda3/envs/research/lib/python3.6/site-packages/pandas/io/sql.py in read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize)
1070 parse_dates=parse_dates)
1071 else:
-> 1072 data = result.fetchall()
1073 frame = _wrap_result(data, columns, index_col=index_col,
1074 coerce_float=coerce_float,
~/anaconda3/envs/research/lib/python3.6/site-packages/sqlalchemy/engine/result.py in fetchall(self)
1135 self.connection._handle_dbapi_exception(
1136 e, None, None,
-> 1137 self.cursor, self.context)
1138
1139 def fetchmany(self, size=None):
~/anaconda3/envs/research/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
1414 )
1415 else:
-> 1416 util.reraise(*exc_info)
1417
1418 finally:
~/anaconda3/envs/research/lib/python3.6/site-packages/sqlalchemy/util/compat.py in reraise(tp, value, tb, cause)
185 if value.__traceback__ is not tb:
186 raise value.with_traceback(tb)
--> 187 raise value
188
189 else:
~/anaconda3/envs/research/lib/python3.6/site-packages/sqlalchemy/engine/result.py in fetchall(self)
1129
1130 try:
-> 1131 l = self.process_rows(self._fetchall_impl())
1132 self._soft_close()
1133 return l
~/anaconda3/envs/research/lib/python3.6/site-packages/sqlalchemy/engine/result.py in _fetchall_impl(self)
1082 return self.cursor.fetchall()
1083 except AttributeError:
-> 1084 return self._non_result([])
1085
1086 def _non_result(self, default):
~/anaconda3/envs/research/lib/python3.6/site-packages/sqlalchemy/engine/result.py in _non_result(self, default)
1087 if self._metadata is None:
1088 raise exc.ResourceClosedError(
-> 1089 "This result object does not return rows. "
1090 "It has been closed automatically.",
1091 )
ResourceClosedError: This result object does not return rows. It has been closed automatically.
It seems like as soon as the NoneType object gets passed, it fails. What I don't understand is why a NoneType object is being passed in the first place. Shouldn't the query results be passed?
You misspelled table in the variable declaration - it has a 1 instead of an l. If something isn't working that you believe should work, check your assumptions first.
Update:
import urllib
import sqlalchemy
import pandas as pd
quoted = urllib.parse.quote_plus('DRIVER={ODBC Driver 17 for SQL Server};Server=127.0.0.1;Database=mydb;UID=myuser;PWD=mypasswd;Port=1433;')
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
query = """
SET NOCOUNT ON
DECLARE @table TABLE (id int, value float)
INSERT INTO @table VALUES (1, 2.7)
INSERT INTO @table VALUES (2, 4.5)
INSERT INTO @table VALUES (3, 1.2)
SELECT * FROM @table
"""
pd.read_sql_query(query, engine)
You have to turn NOCOUNT back off before returning your query result so that the correct row(s)-affected message is returned from SQL Server:
import urllib
import sqlalchemy
import pandas as pd
quoted = urllib.parse.quote_plus('DRIVER={ODBC Driver 17 for SQL Server};Server=127.0.0.1;Database=mydb;UID=myuser;PWD=mypasswd;Port=1433;')
engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
query = """
SET NOCOUNT ON
DECLARE @table TABLE (id int, value float)
INSERT INTO @table VALUES (1, 2.7)
INSERT INTO @table VALUES (2, 4.5)
INSERT INTO @table VALUES (3, 1.2)
SET NOCOUNT OFF
SELECT * FROM @table
"""
pd.read_sql_query(query, engine)

Unhandled exception in py2neo: Type error

I am writing an application whose purpose is to create a graph from a journal dataset. The dataset was an XML file which was parsed in order to extract leaf data. Using this list, I wrote a py2neo script to create the graph. The file is attached to this message.
As the script ran, an exception was raised:
The debugged program raised the exception unhandled TypeError
"(1676 {"titulo":"reconhecimento e agrupamento de objetos de aprendizagem semelhantes"})"
File: /usr/lib/python2.7/site-packages/py2neo-1.5.1-py2.7.egg/py2neo/neo4j.py, Line: 472
I don't know how to handle this. I think the code is syntactically correct... but...
I don't know if I should post the entire code here, so the full code is at: https://gist.github.com/herlimenezes/6867518
Here is the code:
#!/usr/bin/env python
#
from py2neo import neo4j, cypher
from py2neo import node, rel
# calls the database service of Neo4j
#
graph_db = neo4j.GraphDatabaseService("DEFAULT_DOMAIN")
#
# following Nigel Small's suggestion in http://stackoverflow.com
#
titulo_index = graph_db.get_or_create_index(neo4j.Node, "titulo")
autores_index = graph_db.get_or_create_index(neo4j.Node, "autores")
keyword_index = graph_db.get_or_create_index(neo4j.Node, "keywords")
dataPub_index = graph_db.get_or_create_index(neo4j.Node, "data")
#
# to begin, clear the database...
graph_db.clear()  # not sure if this really works... let's check...
#
# the big list; in the next version this is supposed to be read from a file...
#
listaBase = [['2007-12-18'], ['RECONHECIMENTO E AGRUPAMENTO DE OBJETOS DE APRENDIZAGEM SEMELHANTES'], ['Raphael Ghelman', 'SWMS', 'MHLB', 'RNM'], ['Objetos de Aprendizagem', u'Personaliza\xe7\xe3o', u'Perfil do Usu\xe1rio', u'Padr\xf5es de Metadados', u'Vers\xf5es de Objetos de Aprendizagem', 'Agrupamento de Objetos Similares'], ['2007-12-18'], [u'LOCPN: REDES DE PETRI COLORIDAS NA PRODU\xc7\xc3O DE OBJETOS DE APRENDIZAGEM'], [u'Maria de F\xe1tima Costa de Souza', 'Danielo G. Gomes', 'GCB', 'CTS', u'Jos\xe9 ACCF', 'MCP', 'RMCA'], ['Objetos de Aprendizagem', 'Modelo de Processo', 'Redes de Petri Colorida', u'Especifica\xe7\xe3o formal'], ['2007-12-18'], [u'COMPUTA\xc7\xc3O M\xd3VEL E UB\xcdQUA NO CONTEXTO DE UMA GRADUA\xc7\xc3O DE REFER\xcaNCIA'], ['JB', 'RH', 'SR', u'S\xe9rgio CCSPinto', u'D\xe9bora NFB'], [u'Computa\xe7\xe3o M\xf3vel e Ub\xedqua', u'Gradua\xe7\xe3o de Refer\xeancia', u' Educa\xe7\xe3o Ub\xedqua']]
#
pedacos = [listaBase[i:i+4] for i in range(0, len(listaBase), 4)]  # pedacos = chunks
#
# lists to collect indexed nodes: is it really useful???
# let's think about it when optimizing the code...
dataPub_nodes = []
titulo_nodes = []
autores_nodes = []
keyword_nodes = []
#
#
for i in range(0, len(pedacos)):
    # fill dataPub_nodes and titulo_nodes with content
    #dataPub_nodes.append(dataPub_index.get_or_create("data", pedacos[i][0], {"data":pedacos[i][0]}))  # publication date nodes...
    dataPub_nodes.append(dataPub_index.get_or_create("data", str(pedacos[i][0]).strip('[]'), {"data":str(pedacos[i][0]).strip('[]')}))
    # ------------------------------- Exception raised here... --------------------------------
    # The debugged program raised the exception unhandled TypeError
    # "(1649 {"titulo":["RECONHECIMENTO E AGRUPAMENTO DE OBJETOS DE APRENDIZAGEM SEMELHANTES"]})"
    # File: /usr/lib/python2.7/site-packages/py2neo-1.5.1-py2.7.egg/py2neo/neo4j.py, Line: 472
    # ------------------------------ What happened??? ----------------------------------------
    titulo_nodes.append(titulo_index.get_or_create("titulo", str(pedacos[i][1]).strip('[]'), {"titulo":str(pedacos[i][1]).strip('[]')}))  # title node...
    # creates the "publicacao" relationship
    publicacao = graph_db.get_or_create_relationships(titulo_nodes[i], "publicado_em", dataPub_nodes[i])
    # now processing the autores sublist and collecting into autores_nodes
    #
    for j in range(0, len(pedacos[i][2])):
        # fill the autores_nodes list
        autores_nodes.append(autores_index.get_or_create("autor", pedacos[i][2][j], {"autor":pedacos[i][2][j]}))
        # creates the "autoria" relationship...
        #
        autoria = graph_db.get_or_create_relationships(titulo_nodes[i], "tem_como_autor", autores_nodes[j])
    # same logic...
    #
    for k in range(0, len(pedacos[i][3])):
        keyword_nodes.append(keyword_index.get_or_create("keyword", pedacos[i][3][k]))
        # creates the 'tem_como_keyword' relationship
        tem_keyword = graph_db.get_or_create_relationships(titulo_nodes[i], "tem_como_keyword", keyword_nodes[k])
The fragment of py2neo which raised the exception:
def get_or_create_relationships(self, *abstracts):
    """ Fetch or create relationships with the specified criteria depending
    on whether or not such relationships exist. Each relationship
    descriptor should be a tuple of (start, type, end) or (start, type,
    end, data) where start and end are either existing :py:class:`Node`
    instances or :py:const:`None` (both nodes cannot be :py:const:`None`).

    Uses Cypher `CREATE UNIQUE` clause, raising
    :py:class:`NotImplementedError` if server support not available.

    .. deprecated:: 1.5
        use either :py:func:`WriteBatch.get_or_create_relationship` or
        :py:func:`Path.get_or_create` instead.
    """
    batch = WriteBatch(self)
    for abstract in abstracts:
        if 3 <= len(abstract) <= 4:
            batch.get_or_create_relationship(*abstract)
        else:
            raise TypeError(abstract)  # this is line 472
    try:
        return batch.submit()
    except cypher.CypherError:
        raise NotImplementedError(
            "The Neo4j server at <{0}> does not support " \
            "Cypher CREATE UNIQUE clauses or the query contains " \
            "an unsupported property type".format(self.__uri__)
        )
Any help?
I have already fixed it, thanks to Nigel Small. I made a mistake when writing the line that creates relationships. I typed:
publicacao = graph_db.get_or_create_relationships(titulo_nodes[i], "publicado_em", dataPub_nodes[i])
when it must be:
publicacao = graph_db.get_or_create_relationships((titulo_nodes[i], "publicado_em", dataPub_nodes[i]))
By the way, there is also another coding error:
keyword_nodes.append(keyword_index.get_or_create("keyword", pedacos[i][3][k]))
must be
keyword_nodes.append(keyword_index.get_or_create("keyword", pedacos[i][3][k], {"keyword":pedacos[i][3][k]}))
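To spell out the pattern (my own illustration based on the docstring above, reusing the question's variable names): get_or_create_relationships() expects each relationship as one (start, type, end) or (start, type, end, data) tuple, so several relationships can even be batched into a single call:

# Passing the three values unpacked, as in
#     get_or_create_relationships(start_node, "type", end_node),
# makes py2neo treat start_node itself as a descriptor; it fails the
# 3 <= len(abstract) <= 4 check and is raised back as TypeError(abstract).
graph_db.get_or_create_relationships(
    (titulo_nodes[i], "publicado_em", dataPub_nodes[i]),
    (titulo_nodes[i], "tem_como_autor", autores_nodes[j]),
)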
