To rate movies, the user first has to search for them from the home page.
Typing some relevant words in the text box makes the page call (through the home URL defined in urls.py) the home function in the views.py file:
def home(request):
    context = {}
    if request.method == 'POST':
        # redirect the submitted query to the same page as a GET parameter
        post_data = request.POST
        data = post_data.get('data', None)
        if data:
            return redirect('%s?%s' % (reverse('books_recsys_app.views.home'),
                                       urllib.urlencode({'q': data})))
    elif request.method == 'GET':
        get_data = request.GET
        data = get_data.get('q', None)
        titles = cache.get('titles')
        if titles is None:
            # cache is empty: build the tf-idf model and the movie matrix
            print 'load data...'
            texts = []
            mobjs = MovieData.objects.all()
            ndim = mobjs[0].ndim
            matr = np.empty([1, ndim])
            titles_list = []
            cnt = 0
            for obj in mobjs[:]:
                texts.append(obj.description)
                newrow = np.array(obj.array)
                #print 'enw:',newrow
                if cnt == 0:
                    matr[0] = newrow
                else:
                    matr = np.vstack([matr, newrow])
                titles_list.append(obj.title)
                cnt += 1
            vectorizer = TfidfVectorizer(min_df=1, max_features=ndim)
            processedtexts = PreprocessTfidf(texts, stoplist, True)
            model = vectorizer.fit(processedtexts)
            cache.set('model', model)
            #cache.set('processedtexts',processedtexts)
            cache.set('data', matr)
            cache.set('titles', titles_list)
        else:
            print 'loaded', str(len(titles))

        Umatrix = cache.get('umatrix')
        # "is None" avoids an element-wise comparison when Umatrix is a NumPy array
        if Umatrix is None:
            df_umatrix = pd.read_csv(umatrixpath)
            Umatrix = df_umatrix.values[:, 1:]
            cache.set('umatrix', Umatrix)
            cf_itembased = CF_itembased(Umatrix)
            cache.set('cf_itembased', cf_itembased)
            cache.set('loglikelihood', LogLikelihood(Umatrix, movieslist))

        if not data:
            return render_to_response(
                'books_recsys_app/home.html', RequestContext(request, context))

        #load all movies vectors/titles
        matr = cache.get('data')
        titles = cache.get('titles')
        model_tfidf = cache.get('model')
        #find movies similar to the query
        queryvec = model_tfidf.transform([data.lower().encode('ascii', 'ignore')]).toarray()
        sims = cosine_similarity(queryvec, matr)[0]
        indxs_sims = list(sims.argsort()[::-1])
        titles_query = list(np.array(titles)[indxs_sims][:nmoviesperquery])
        context['movies'] = zip(titles_query, indxs_sims[:nmoviesperquery])
        context['rates'] = [1, 2, 3, 4, 5]
        return render_to_response(
            'books_recsys_app/query_results.html', RequestContext(request, context))
The data variable at the beginning of the function stores the typed query, and the function transforms it into a tf-idf vector using the model already loaded in memory by the load_data command:
matr = cache.get('data')
titles = cache.get('titles')
model_tfidf = cache.get('model')
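The cache lookups in home follow a load-on-first-use pattern: a value is fetched from the cache and built only when it is missing. The following is a minimal sketch of that pattern, not the application's code. It assumes a plain dictionary in place of Django's cache, and the names get_or_build and build_titles, as well as the placeholder titles, are hypothetical.

```python
# A plain dict stands in for Django's cache (an assumption for illustration).
_cache = {}

def get_or_build(key, builder):
    """Return the cached value for key, building and storing it on a miss."""
    value = _cache.get(key)
    if value is None:  # identity check, safe even when the value is a NumPy array
        value = builder()
        _cache[key] = value
    return value

build_calls = []

def build_titles():
    # Hypothetical stand-in for the database query that collects movie titles.
    build_calls.append(1)
    return ["Movie A", "Movie B"]

titles = get_or_build("titles", build_titles)        # first call: builds the list
titles_again = get_or_build("titles", build_titles)  # second call: served from cache
```

The builder runs only once, which is why the first request to the page is the slow one and later requests reuse the stored objects.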
The matrix (variable matr, cache key data) and the movie titles (titles) are also retrieved from the cache, so that the function can return the list of movies most similar to the query vector (see Chapter 4, Web-mining techniques, for further details). Note that if the cache is empty, the models (and the other data) are created and loaded into memory on the first call of this function. For example, we can type war as a query, and the website will return the movies most similar to this query (query_results.html):
As we can see, five movies are returned (the nmoviesperquery parameter at the beginning of the views.py file sets the number of movies per query), and most of them are related to war. From this page we can rate the movies, as discussed in the following section.
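The query step described above (turn the query into a tf-idf vector, compare it with every movie vector by cosine similarity, and keep the top matches) can be sketched without Django or scikit-learn. This is a simplified, hand-rolled illustration rather than the application's code: the helper names and the three toy descriptions are made up, and the tf-idf weighting is a plain smoothed variant that may differ in detail from the TfidfVectorizer output.

```python
import math
from collections import Counter

def vectorize(tokens, vocab, idf):
    """Map a token list to a tf-idf vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] * idf[w] for w in vocab]

def fit_tfidf(docs):
    """Return the vocabulary, idf weights and tf-idf vectors for tokenized docs."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}  # smoothed idf
    return vocab, idf, [vectorize(d, vocab, idf) for d in docs]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either one is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_matches(query, titles, vectors, vocab, idf, n):
    """Return the n (title, index) pairs most similar to the query."""
    qvec = vectorize(query.lower().split(), vocab, idf)
    sims = [cosine(qvec, v) for v in vectors]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return [(titles[i], i) for i in order[:n]]

# Toy data, invented for this sketch only.
titles = ["Movie A", "Movie B", "Movie C"]
descriptions = [
    "soldiers fight a war in the jungle",
    "a toy cowboy meets a space ranger",
    "a general plans the last battle of the war",
]
vocab, idf, vectors = fit_tfidf([d.split() for d in descriptions])
results = top_matches("war", titles, vectors, vocab, idf, n=2)
```

As in the home function, each result pairs a title with its row index in the matrix, so the template can refer back to the movie when the user submits a rating.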