The data collected by the spiders needs to be stored in a database. In Django, database tables are defined as models in the models.py file (within the pages folder). The content of this file is as follows:
    from django.db import models
    from django.conf import settings
    from django.utils.translation import ugettext_lazy as _

    class SearchTerm(models.Model):
        term = models.CharField(_('search'), max_length=255)
        num_reviews = models.IntegerField(null=True, default=0)

        # display term on admin panel
        def __unicode__(self):
            return self.term

    class Page(models.Model):
        searchterm = models.ForeignKey(SearchTerm, related_name='pages', null=True, blank=True)
        url = models.URLField(_('url'), default='', blank=True)
        title = models.CharField(_('name'), max_length=255)
        depth = models.IntegerField(null=True, default=-1)
        html = models.TextField(_('html'), blank=True, default='')
        review = models.BooleanField(default=False)
        old_rank = models.FloatField(null=True, default=0)
        new_rank = models.FloatField(null=True, default=1)
        content = models.TextField(_('content'), blank=True, default='')
        sentiment = models.IntegerField(null=True, default=100)

    class Link(models.Model):
        searchterm = models.ForeignKey(SearchTerm, related_name='links', null=True, blank=True)
        from_id = models.IntegerField(null=True)
        to_id = models.IntegerField(null=True)
Each movie title typed on the home page of the application is stored in the SearchTerm model, while the data of each web page is collected in an object of the Page model. Apart from the content fields (HTML, title, URL, content), the sentiment of the review and the depth in the graph network are recorded (a Boolean also indicates whether the web page is a movie review page or simply a linked page). The Link model stores all the graph links between pages, which are then used by the PageRank algorithm to calculate the relevance of the review web pages. Note that the Page model and the Link model are both linked to the related SearchTerm through a foreign key. As usual, to write these models as database tables, we type the following commands:
    python manage.py makemigrations
    python manage.py migrate
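To make the role of the Link model more concrete, the following is a minimal sketch in plain Python (no Django required) of how rows of (from_id, to_id) pairs could be turned into a graph and ranked. It is not the application's actual ranking code: the function names build_adjacency and pagerank, the damping value of 0.85, and the sample links are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_adjacency(links):
    """Group (from_id, to_id) pairs into {from_id: [to_id, ...]}."""
    adjacency = defaultdict(list)
    for from_id, to_id in links:
        adjacency[from_id].append(to_id)
    return dict(adjacency)


def pagerank(page_ids, links, damping=0.85, iterations=50):
    """Basic power-iteration PageRank over the given page ids.

    Assumes every id that appears in links is also in page_ids.
    """
    n = len(page_ids)
    adjacency = build_adjacency(links)
    rank = {pid: 1.0 / n for pid in page_ids}
    for _ in range(iterations):
        # Pages without outgoing links spread their rank over all pages.
        dangling = sum(rank[p] for p in page_ids if p not in adjacency)
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in page_ids
        }
        # Each page shares its rank equally among the pages it links to.
        for src, targets in adjacency.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical link rows, as they might be read from the Link table.
links = [(1, 2), (1, 3), (2, 3), (3, 1)]
ranks = pagerank([1, 2, 3], links)
```

In this toy graph, page 3 receives links from both other pages and therefore ends up with a higher rank than page 2. The old_rank and new_rank fields of the Page model suggest that the application stores successive values of a similar iterative update, although the details may differ.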
To populate these Django models, we need to make Scrapy interact with Django, and this is the subject of the following section.