Information retrieval system (movies query)

To rate movies, the user first needs to search for them from the home page:

[Figure: Information retrieval system (movies query)]

When the user types some relevant words in the text box, the page calls the home function in the views.py file (through the corresponding home URL defined in urls.py):

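# imports used by the view (Python 2 / Django 1.x, like the rest of the code);
# MovieData, PreprocessTfidf, stoplist, CF_itembased, LogLikelihood, umatrixpath,
# movieslist and nmoviesperquery are defined elsewhere in views.py or the app (not shown)
import urllib
import numpy as np
import pandas as pd
from django.core.cache import cache
from django.core.urlresolvers import reverse
from django.shortcuts import redirect, render_to_response
from django.template import RequestContext
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def home(request):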
    context={}
    if request.method == 'POST':
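        post_data = request.POST
        data = post_data.get('data', None)
        if data:
            # redirect to a GET request carrying the query in the q parameter
            return redirect('%s?%s' % (reverse('books_recsys_app.views.home'),
                                urllib.urlencode({'q': data})))
        # empty query: show the search page again instead of returning None
        return render_to_response(
            'books_recsys_app/home.html', RequestContext(request, context))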
    elif request.method == 'GET':
        get_data = request.GET
        data = get_data.get('q',None)
        titles = cache.get('titles')
        if titles is None:
            print 'load data...'
            texts = []
            mobjs = MovieData.objects.all()
            ndim = mobjs[0].ndim
            matr = np.empty([1,ndim])
            titles_list = []
            cnt=0
            for obj in mobjs[:]:
                texts.append(obj.description)
                newrow = np.array(obj.array)
                #print 'enw:',newrow
                if cnt==0:
                    matr[0]=newrow
                else:
                    matr = np.vstack([matr, newrow])
                titles_list.append(obj.title)
                cnt+=1
            vectorizer = TfidfVectorizer(min_df=1,max_features=ndim) 
            processedtexts = PreprocessTfidf(texts,stoplist,True)
            model = vectorizer.fit(processedtexts)
            cache.set('model',model)
            #cache.set('processedtexts',processedtexts)
            cache.set('data', matr)
            cache.set('titles', titles_list)
        else:
            print 'loaded',str(len(titles))
          
        Umatrix = cache.get('umatrix')
        if Umatrix is None:  # '== None' on a NumPy array compares elementwise
            df_umatrix = pd.read_csv(umatrixpath)
            Umatrix = df_umatrix.values[:,1:]
            cache.set('umatrix',Umatrix)
            cf_itembased = CF_itembased(Umatrix)
            cache.set('cf_itembased',cf_itembased)
            cache.set('loglikelihood',LogLikelihood(Umatrix,movieslist))
            
        if not data:
            return render_to_response(
                'books_recsys_app/home.html', RequestContext(request, context))
        
        
        #load all movies vectors/titles
        matr = cache.get('data')
        titles = cache.get('titles')
        model_tfidf = cache.get('model')
        #find movies similar to the query
        queryvec = model_tfidf.transform([data.lower().encode('ascii','ignore')]).toarray()     
        sims= cosine_similarity(queryvec,matr)[0]
        indxs_sims = list(sims.argsort()[::-1])
        titles_query = list(np.array(titles)[indxs_sims][:nmoviesperquery])
        
        context['movies']= zip(titles_query,indxs_sims[:nmoviesperquery])
        context['rates']=[1,2,3,4,5]
        return render_to_response(
            'books_recsys_app/query_results.html', 
              RequestContext(request, context))
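The view follows a lazy-loading pattern: each expensive object is looked up in the cache and rebuilt (then stored) only on a miss. Below is a minimal Python 3 sketch of that pattern, assuming a plain dict-backed class as a stand-in for Django's cache and a hypothetical get_or_compute helper. It uses an `is None` test, which matters when the cached value is a NumPy array, because `array == None` is evaluated elementwise.

```python
class DictCache:
    """Stand-in for a cache backend exposing get/set (hypothetical, not Django's class)."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        return self._store.get(key, default)

    def set(self, key, value):
        self._store[key] = value


def get_or_compute(cache, key, compute):
    """Return the cached value for key; on a miss, call compute(), store and return it."""
    value = cache.get(key)
    if value is None:  # identity test: safe even when the value is an array
        value = compute()
        cache.set(key, value)
    return value


calls = []


def load_titles():
    calls.append(1)  # record each run of the (expensive) loader
    return ['Movie A', 'Movie B']


cache = DictCache()
first = get_or_compute(cache, 'titles', load_titles)
second = get_or_compute(cache, 'titles', load_titles)  # served from the cache
print(first, len(calls))  # ['Movie A', 'Movie B'] 1
```

Django 1.9 and later also offer `cache.get_or_set(key, default)`, which accepts a callable as the default. Note that the view refills titles, data and model from one loop, so a miss on titles alone rebuilds all three.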

The data variable stores the typed query: the form submits it as a POST field named data, and the view redirects to itself with the text in the q parameter of a GET request, where it is read again into data. The function then transforms the query into a tf-idf vector using the model already loaded in memory by the load_data command:
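To make this transformation concrete, here is a small pure-Python 3 sketch of tf-idf weighting. It is intended to mirror the defaults of scikit-learn's TfidfVectorizer: lowercasing, tokens of two or more word characters, smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2-normalized vectors. The toy descriptions and the helpers fit_idf and transform are hypothetical. In the view, this role is played by the fitted vectorizer's transform method.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern


def tokenize(text):
    return TOKEN.findall(text.lower())


def fit_idf(docs):
    """Build a sorted vocabulary and smoothed idf weights from a list of documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # document frequency: count each term once per doc
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf


def transform(query, vocab, idf):
    """Map a query to an L2-normalized tf-idf vector over vocab; unknown words are ignored."""
    tf = Counter(t for t in tokenize(query) if t in idf)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


descriptions = [
    "A soldier returns from the war",
    "A romantic comedy in Paris",
    "War pilots fight over the ocean",
]
vocab, idf = fit_idf(descriptions)
qvec = transform("war", vocab, idf)
print(len(qvec), vocab[qvec.index(max(qvec))])  # 13 war
```

A query made only of words outside the vocabulary maps to an all-zero vector, so every movie then gets zero similarity.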

        matr = cache.get('data')
        titles = cache.get('titles')
        model_tfidf = cache.get('model')

The matrix of movie vectors (cache key: data, loaded into matr) and the movies' titles (key: titles) are also retrieved from the cache, so that the view can return the list of movies most similar to the query vector (see Chapter 4, Web-mining techniques, for further details). Note also that if the cache is empty, the models (and the other data) are created and loaded into memory on the first call of this function. For example, we can type war as a query, and the website will return the movies most similar to this query (query_results.html):
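The retrieval step scores every movie by the cosine similarity between its vector and the query vector, sorts the scores in descending order, and keeps the first nmoviesperquery entries, pairing each title with its row index as the view does with zip. Below is a pure-Python 3 sketch of that ranking. The toy matrix, the titles and the rank_movies helper are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_movies(queryvec, matr, titles, nmoviesperquery):
    """Return (title, row index) pairs for the rows most similar to the query."""
    sims = [cosine(queryvec, row) for row in matr]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return [(titles[i], i) for i in order[:nmoviesperquery]]


matr = [
    [0.9, 0.1, 0.0],  # mostly the "war" dimension
    [0.0, 0.2, 0.9],  # mostly the third dimension
    [0.7, 0.7, 0.0],  # mixed
]
titles = ["Battlefront", "Paris Hearts", "Sky Raiders"]
print(rank_movies([1.0, 0.0, 0.0], matr, titles, 2))
# [('Battlefront', 0), ('Sky Raiders', 2)]
```

Python's sorted keeps tied items in their original order even with reverse=True, whereas the view's argsort()[::-1] may order ties differently. With real-valued similarities, exact ties are uncommon.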

[Figure: Information retrieval system (movies query)]

As we can see, five movies are returned (the number of movies per query is set by the nmoviesperquery parameter at the beginning of the views.py file), and most of them are related to war. From this page, we can rate the movies, as discussed in the following section.
