Inserting into scrapy items - python

I think I'm having a hard time wrapping my head around Python dicts. I have a scraper in which I managed to build a dictionary by parsing web pages; now it's time to insert the values into an item whose fields I have defined as Scrapy Field objects:
class ApplicationItem(Item):
    Case_reference = Field()
    Application_name = Field()
    Contact_name = Field()
    Contact_telephone = Field()
The scraper builds a Python dictionary like this:
{u" Applicant's name: ": u' KIMARO COMPANY INC ',
u' Case reference: ': u' ARB/15/00696 ',
u' Contact name: ': u' ',
u' Contact telephone: ': u' 07957 140179 ' }
My question is: how do I make sure that I insert the correct value from the dictionary into each Scrapy item field?
I have this so far:
for res, rec in application.items():
    item['Case_reference'] = application.get(result(res))
    item['Application_name'] = application.get(result(res))
    item['Contact_name'] = application.get(result(res))
    item['Contact_telephone'] = application.get(result(res))
which I don't think does what I expect! I am actually getting a Python TypeError from it.
Any ideas?
item is the actual object I am trying to populate, so it is an instance of ApplicationItem.
item['Case_reference'] = application.get()
is how I try to insert a value from the previously created dict into each field.

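A Scrapy Item supports dictionary-style assignment, so item['key'] = value is the right way to set a field; setattr(item, 'key', value) raises an AttributeError for declared fields. The problem is the lookup: application.get() called without a key raises a TypeError, and every line of your loop reads the same key, so all four fields would receive the same value.
Instead, map each dictionary key to the field it belongs to and assign through that mapping:
class ApplicationItem(Item):
    # Scraped dictionary keys (exactly as scraped) mapped to item field names
    ATTRIBUTE_MAPPINGS = {
        " Applicant's name: ": "Application_name",
        " Case reference: ": "Case_reference",
        " Contact name: ": "Contact_name",
        " Contact telephone: ": "Contact_telephone",
    }
    Case_reference = Field()
    Application_name = Field()
    Contact_name = Field()
    Contact_telephone = Field()
And your loop would be:
for key, value in application.items():
    item[ApplicationItem.ATTRIBUTE_MAPPINGS[key]] = value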
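A minimal sketch of the key-to-field mapping idea, using a plain dict as a stand-in for the Scrapy item so it runs without Scrapy installed. The names FIELD_FOR_LABEL and to_item_fields are hypothetical, and stripping whitespace and the trailing colon from the scraped labels is an assumption about what the cleaned data should look like:

```python
# Scraped dictionary from the question, with its stray whitespace.
application = {
    u" Applicant's name: ": u" KIMARO COMPANY INC ",
    u" Case reference: ": u" ARB/15/00696 ",
    u" Contact name: ": u" ",
    u" Contact telephone: ": u" 07957 140179 ",
}

# Hypothetical mapping: cleaned label -> item field name.
FIELD_FOR_LABEL = {
    "Applicant's name": "Application_name",
    "Case reference": "Case_reference",
    "Contact name": "Contact_name",
    "Contact telephone": "Contact_telephone",
}


def to_item_fields(scraped):
    """Return {field_name: cleaned_value} for every label that has a known field."""
    fields = {}
    for label, value in scraped.items():
        clean_label = label.strip().rstrip(":")
        field_name = FIELD_FOR_LABEL.get(clean_label)
        if field_name is not None:  # labels without a matching field are skipped
            fields[field_name] = value.strip()
    return fields


# Plain dict standing in for the item; with a real ApplicationItem instance,
# each pair would be assigned with item[field_name] = value.
item = to_item_fields(application)
print(item["Case_reference"])
```

Normalising the labels before the lookup means the mapping does not depend on the exact whitespace the page happens to produce.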

Related

Split and save text string on scrapy

I need to extract a substring from a string. This is the exact source text:
Article published on: Tutorial
I want to delete "Article published on:" and keep only
Tutorial
so that I can save it.
I tried:
category = items[1]
category.split('Article published on:','')
and also:
for p in articles:
    bodytext = p.xpath('.//text()').extract()
    joined_text = ''
    # loop in categories
    for each_text in text:
        stripped_text = each_text.strip()
        if stripped_text:
            # all the categories together
            joined_text += ' ' + stripped_text
    joined_text = joined_text.split('Article published on:','')
    items.append(joined_text)
if not is_phrase:
    title = items[0]
    category = items[1]
    print('title = ', title)
    print('category = ', category)
This doesn't work. What am I missing?
The error with this code is:
TypeError: 'str' object cannot be interpreted as an integer
You probably just forgot to assign the result:
category = category.replace('Article published on:', '')
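Also, it seems that you meant to use replace instead of split: the second argument of split is the maximum number of splits, which must be an integer, so passing '' raises the TypeError. split also works, though, if you strip the leftover space:
category = category.split(':')[1].strip()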
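Both approaches from this answer can be checked with a small sketch. The variable names are illustrative, and the .strip() call is an addition to drop the space left after the colon:

```python
source = "Article published on: Tutorial"
prefix = "Article published on:"

# str.replace returns a new string, so the result has to be assigned.
by_replace = source.replace(prefix, "").strip()

# str.split takes the separator first and an optional integer maxsplit second.
by_split = source.split(":", 1)[1].strip()

print(by_replace)
print(by_split)
```

Passing 1 as maxsplit keeps any later colons in the category text intact.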

How would I store 3 values in one element of a list? (PYTHON)

For example, I have a title, an author and a date, and I want to store all three in list[0]. How would I do it?
I suggest 2 solutions (but there are many more) :
If your variables are strings, you can concatenate them: concatenated = title+author+date, and then store the result in your list: my_list[0] = concatenated.
Or, as a more general solution, create a list containing your 3 variables: my_3_variables = [title, author, date], and then put this list at the first position of your other list: my_list[0] = my_3_variables.
You can store it in a tuple and then assign it to list:
mylist = []
mylist.append(('title', 'author', 'date'))
print(mylist[0])
Output:
('title', 'author', 'date')
Another solution (a bit far fetched) would be to create a class with the aforementioned attributes and store each instance of this class in a list:
class Book:
    def __init__(self, title, author, date):
        self.title = title
        self.author = author
        self.date = date
    def __repr__(self):
        return "\n".join(["Title: " + self.title, "Author: " + self.author, "Date: " + self.date])
books = []
book1 = Book("title1", "author1", "date1")
book2 = Book("title2", "author2", "date2")
book3 = Book("title3", "author3", "date3")
books.append(book1)
books.append(book2)
books.append(book3)
print(books[0])
This will print:
Title: title1
Author: author1
Date: date1
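A middle ground between the tuple and the class, sketched here as one more option rather than something proposed in the answers above, is collections.namedtuple from the standard library: each element is still a single tuple, but its parts can be read by name.

```python
from collections import namedtuple

# A tuple type whose three positions also have names.
Book = namedtuple("Book", ["title", "author", "date"])

books = []
books.append(Book("title1", "author1", "date1"))

first = books[0]
print(first.title, first.author, first.date)
```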

Practice to call a function inside itself

I have a parent/child relation between tables. It looks like this:
class Person(models.Model):
    Name = models.CharField()
class ParentChild(models.Model):
    child = models.ForeignKey('Person', related_name='child')
    parent = models.ForeignKey('Person', related_name='parent')
    validfrom = models.DateTimeField(blank=True, null=True)
    validto = models.DateTimeField(blank=True, null=True)
I am trying to query a whole tree and create JSON to send to the template.
So for each person I'm thinking of using a function to query the children, and for each child using the same function to check whether that child has any children.
This is my function:
def getChildren(parentID):
    try:
        children = Person.objects.filter(parent=parentID)
        addJson = 'children: ['
        for a in children:
            addJson = addJson + '{text: { name: "Child '+str(a.id)+'" }},'
            addJson = getChildren(str(a.id))
        return addJson
    except:
        return addJson
This only gets me one child and then nothing more. So I'm guessing it's not possible for a function to invoke itself, or maybe a function has to finish before being called again.
I'm pretty stuck right now. Ideas are very welcome!
I solved the issue. I hadn't noticed that the query was looking up the PK in the wrong table.
def getChildren(parentID, jsonData):
    try:
        qChildren = ParentChild.objects.filter(parent=parentID).values('child')
        addJson = jsonData + 'children: ['
        for a in qChildren:
            addJson = addJson + '{text: { name: "Child ' + str(a['child']) + '" }},'
            addJson = getChildren(str(a['child']), addJson)
        addJson = addJson + ']'
        return addJson
    except:
        return None
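A function can call itself without any problem. As a hedged sketch of the same recursion without string concatenation, the tree can be built from nested dicts and serialised once with json.dumps. PARENT_TO_CHILDREN is a hypothetical in-memory stand-in for the ParentChild query, and build_tree is an illustrative name; the sketch assumes the data contains no cycles:

```python
import json

# Hypothetical stand-in for the ParentChild table: parent id -> list of child ids.
PARENT_TO_CHILDREN = {1: [2, 3], 2: [4], 3: [], 4: []}


def build_tree(person_id, links):
    """Return a nested dict for person_id and all of its descendants."""
    return {
        "text": {"name": "Person " + str(person_id)},
        # Each child is expanded by calling the same function again.
        "children": [build_tree(child_id, links) for child_id in links.get(person_id, [])],
    }


tree = build_tree(1, PARENT_TO_CHILDREN)
tree_json = json.dumps(tree)
print(tree_json)
```

Letting json.dumps produce the text avoids trailing commas and unquoted keys, which hand-built strings tend to introduce.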

Python sort a dictionary that has objects in its values by alphabetical order of one of their attributes

I'm making an agenda.
The attribute of the agenda that holds the dictionary is: self.ContactList = {}
Inside it, the telephone number is the key for each contact (an instance of a class).
The Contact class has an attribute called telephone and others, including the contact's name.
I want a function that lists the contacts in the agenda, sorted alphabetically by contact name.
Right now I'm using this to print the contacts (I have already overridden __str__ in the Contact class to allow this):
def listcontacts(self, agenda):
    print("Contact List\n")
    for tel, contact in agenda.ContactList.items():
        print(contact,"\n"*2)
How do I sort self.ContactList by the contact's attribute "name"?
EDIT: The Contact class is as follows:
class Contact:
    def __init__(self, name, adress, zipcode, telephone):
        self.name = name
        self.adress = adress
        self.zipcode = zipcode
        self.telephone = telephone
    def __str__(self):
        return ( "Name: " + self.name + "\nAdress: " + self.adress + "\nZipCode: " + self.zipcode
                 + "\nTelephone: " + self.telephone)
If you want to sort a list, use the sorted() function. If you want the sort to use a custom ordering criterion, use the key= keyword argument:
for tel, contact in sorted(agenda.ContactList.items(), key=lambda x: x[1].name):
    print(contact, "\n"*2)
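A self-contained sketch of the same sorted() call, with a cut-down Contact class and made-up sample data. Lower-casing the name in the key function is an assumption, added so that capitalisation does not affect the order:

```python
class Contact:
    """Reduced version of the question's class, enough to show the sort."""

    def __init__(self, name, telephone):
        self.name = name
        self.telephone = telephone


# Telephone number -> contact, as in the agenda's ContactList.
contact_list = {
    "555-0103": Contact("Carol", "555-0103"),
    "555-0101": Contact("alice", "555-0101"),
    "555-0102": Contact("Bob", "555-0102"),
}

# sorted() returns a new list; the dictionary itself is left unchanged.
by_name = sorted(contact_list.values(), key=lambda contact: contact.name.lower())
names = [contact.name for contact in by_name]
print(names)
```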

Find a way to add a string to a tuple

y="Peter Email: peter#rp.com Phone: 91291212"
z="Alan Email: alan#rp.com Phone: 98884444"
w="John Email: john#rp.com Phone: 93335555"
add_book=str(y) ,"" + str(z) ,"" + str(w)
I am trying to add a contact to my address book, but I am not sure how to add the string "details" to add_book. I also found that I cannot use append because it's a tuple.
details = raw_input("Enter name in the following format: name Email: Phone:")
print "New contact added"
print details
if details in add_book:
    o=add_book+details
    print "contact found"
    print details
print add_book
# This is what I was supposed to do:
address_book = {}
address_book['Alan'] = ['alan#rp.com, 91234567']
# but when I print it out, the output I get is:
# {'Alan': ['alan#rp.com, 91234567']}
# and I want to remove the '' and {}
I am still an amateur at programming with Python, so I really need all the help I can get. Thanks :)
A simple fix would be to use a list instead of a tuple. You can do this by changing your initialization of add_book from:
add_book=str(y) ,"" + str(z) ,"" + str(w)
to:
add_book = [y,z,w]
#No need to call str() every time because your data are already strings
However, wouldn't it make more sense to organize your data as a list of dictionaries? For example:
contacts = ["Peter", "Alan", "John"]
addr_book = []
for contact in contacts:
    email = raw_input(contact + "'s email: ")
    phone = raw_input(contact + "'s phone: ")
    # One dictionary per contact, appended in order
    addr_book.append({'name': contact, 'email': email, 'phone': phone})
FURTHERMORE:
If I understood your question correctly, you have specific requirements for how the output of your program should look. If you use the above data format, you can create whatever output you like. For example, this code
def printContact(contact):
    print contact['name']+': ['+contact['email']+','+contact['phone']+']'
will output something like:
Alan: [alan#email.com,555-555-5555]
Of course you can change it however you like.
Firstly, [] is a list; a tuple is written with parentheses, e.g. (a, b).
So what you want is
address_book['Alan'] = ('alan#rp.com', '91234567')
But this seems quite odd. What I would do is create a class
class Contact(object):
    name = "Contact Name"
    email = "Contact Email"
    ph_number = "00000000"
    def __str__(self):
        return "%s: %s, %s" % (self.name, self.email, self.ph_number)
then
address_book = []
contact_alan = Contact()
contact_alan.name = "Alan"
contact_alan.email = "alan#rp.com"
contact_alan.ph_number = "91234567"
address_book.append(contact_alan)
print contact_alan
(not next to a machine with python so it might be slightly wrong. Will test it when i can get to one.)
EDIT: as Paul pointed out in his comment:
class Contact(object):
    def __init__(self, name, email, ph_number):
        self.name = name
        self.email = email
        self.ph_number = ph_number
contact_alan = Contact(name="Alan", email = "alan#rp.com", ph_number="91234567")
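For the original question of adding a string to a tuple, here is a short sketch in Python 3 syntax (the thread's own snippets use Python 2 print statements). It shows that concatenating a one-element tuple produces a new tuple, and that formatting the dictionary entries yourself avoids the quotes and braces of the default printout. The sample values are taken from the question; the variable names are illustrative:

```python
# Existing contacts stored in a tuple, as in the question.
add_book = (
    "Peter Email: peter#rp.com Phone: 91291212",
    "Alan Email: alan#rp.com Phone: 98884444",
)
details = "John Email: john#rp.com Phone: 93335555"

# A tuple has no append(); adding a one-element tuple builds a new, longer tuple.
add_book = add_book + (details,)

# Dictionary version: name -> (email, phone), printed without quotes or braces.
address_book = {"Alan": ("alan#rp.com", "91234567")}
lines = ["%s: %s, %s" % (name, email, phone)
         for name, (email, phone) in address_book.items()]
for line in lines:
    print(line)
```

Note the trailing comma in (details,): without it the parentheses are just grouping, and adding a string to a tuple raises a TypeError.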
