机器学习:matlab实现协同过滤算法进行电影推荐

本文介绍了使用协同过滤算法进行电影推荐的详细步骤。首先讲解了算法原理,然后通过加载MovieLens100k数据集进行数据预处理,包括数据可视化和均值归一化。接着,实现了代价-梯度函数,并进行了梯度检验以确保正确性。最后,利用fmincg优化函数求解参数,并进行预测,给出用户可能喜欢的电影推荐。整个过程展示了协同过滤在推荐系统中的实际应用。

本练习ex8的压缩包已上传至我的资源已上传至我的资源 ,结合本篇博客食用更佳。

原理

原理在此,其实就是改了一下代价-梯度函数的回归算法。

载入数据

MovieLens 100k Dataset from GroupLens Research. 包含943位用户、1682部电影。

% Load data
load('ex8_movies.mat');

% From the matrix, we can compute statistics like average rating.
fprintf('Average rating for movie 1 (Toy Story): %f / 5\n\n', mean(Y(1, R(1, :))));
%  We can "visualize" the ratings matrix by plotting it with imagesc
imagesc(Y);
ylabel('Movies');
xlabel('Users');

可以可视化看一看,跟背景一个颜色(蓝色)就是无评分,其他的就是不同颜色表示不同评分:
在这里插入图片描述

代价-梯度函数

表达式原理部分已经有了,照着表达式进行向量化即可,记住向量化的真理:按照矩阵运算规则替换代数公式

另外要注意跟最优化函数保持兼容,所以代价-梯度函数的输入输出都是列向量,这点跟神经网络部分很像。

function [J, grad] = cofiCostFunc(params, Y, R, num_users, num_movies, ...
                                  num_features, lambda)
%COFICOSTFUNC Collaborative filtering cost function
%   [J, grad] = COFICOSTFUNC(params, Y, R, num_users, num_movies, ...
%   num_features, lambda) returns the cost and gradient for the
%   collaborative filtering problem.
%

% Unfold the U and W matrices from params
X = reshape(params(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(params(num_movies*num_features+1:end), ...
                num_users, num_features);

            
% You need to return the following values correctly
J = 0;
X_grad = zeros(size(X));
Theta_grad = zeros(size(Theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost function and gradient for collaborative
%               filtering. Concretely, you should first implement the cost
%               function (without regularization) and make sure it is
%               matches our costs. After that, you should implement the 
%               gradient and use the checkCostFunction routine to check
%               that the gradient is correct. Finally, you should implement
%               regularization.
%
% Notes: X - num_movies  x num_features matrix of movie features
%        Theta - num_users  x num_features matrix of user features
%        Y - num_movies x num_users matrix of user ratings of movies
%        R - num_movies x num_users matrix, where R(i, j) = 1 if the 
%            i-th movie was rated by the j-th user
%
% You should set the following variables correctly:
%
%        X_grad - num_movies x num_features matrix, containing the 
%                 partial derivatives w.r.t. to each element of X
%        Theta_grad - num_users x num_features matrix, containing the 
%                     partial derivatives w.r.t. to each element of Theta
%

E=(X*Theta'-Y).*R;
J=sum(sum(E.^2))/2+(lambda/2)*sum(sum(Theta.^2))+(lambda/2)*sum(sum(X.^2));

Theta_grad=E'*X+lambda*Theta;
X_grad=E*Theta+lambda*X;


% =============================================================

grad = [X_grad(:); Theta_grad(:)];

end

如果你觉得求梯度比较复杂,担心出错,可以使用梯度检验,该代码由吴恩达提供。

function numgrad = computeNumericalGradient(J, theta)
%COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
%and gives us a numerical estimate of the gradient.
%   numgrad = COMPUTENUMERICALGRADIENT(J, theta) computes the numerical
%   gradient of the function J around theta. Calling y = J(theta) should
%   return the function value at theta.

% Notes: The following code implements numerical gradient checking, and 
%        returns the numerical gradient.It sets numgrad(i) to (a numerical 
%        approximation of) the partial derivative of J with respect to the 
%        i-th input argument, evaluated at theta. (i.e., numgrad(i) should 
%        be the (approximately) the partial derivative of J with respect 
%        to theta(i).)
%                

numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute Numerical Gradient
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end

end
function checkCostFunction(lambda)
%CHECKCOSTFUNCTION Creates a collaborative filering problem 
%to check your cost function and gradients
%   CHECKCOSTFUNCTION(lambda) Creates a collaborative filering problem 
%   to check your cost function and gradients, it will output the 
%   analytical gradients produced by your code and the numerical gradients 
%   (computed using computeNumericalGradient). These two gradient 
%   computations should result in very similar values.

% Set lambda
if ~exist('lambda', 'var') || isempty(lambda)
    lambda = 0;
end

%% Create small problem
X_t = rand(4, 3);
Theta_t = rand(5, 3);

% Zap out most entries
Y = X_t * Theta_t';
Y(rand(size(Y)) > 0.5) = 0;
R = zeros(size(Y));
R(Y ~= 0) = 1;

%% Run Gradient Checking
X = randn(size(X_t));
Theta = randn(size(Theta_t));
num_users = size(Y, 2);
num_movies = size(Y, 1);
num_features = size(Theta_t, 2);

numgrad = computeNumericalGradient( ...
                @(t) cofiCostFunc(t, Y, R, num_users, num_movies, ...
                                num_features, lambda), [X(:); Theta(:)]);

[cost, grad] = cofiCostFunc([X(:); Theta(:)],  Y, R, num_users, ...
                          num_movies, num_features, lambda);

disp([numgrad grad]);
fprintf(['The above two columns you get should be very similar.\n' ...
         '(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n']);

diff = norm(numgrad-grad)/norm(numgrad+grad);
fprintf(['If your cost function implementation is correct, then \n' ...
         'the relative difference will be small (less than 1e-9). \n' ...
         '\nRelative Difference: %g\n'], diff);

end

求解

求解还是那一套,调用大佬写的函数就完事了,记得均值归一化。

function [Ynorm, Ymean] = normalizeRatings(Y, R)
%NORMALIZERATINGS Preprocess data by subtracting mean rating for every 
%movie (every row)
%   [Ynorm, Ymean] = NORMALIZERATINGS(Y, R) normalized Y so that each movie
%   has a rating of 0 on average, and returns the mean rating in Ymean.
%

[m, n] = size(Y);
Ymean = zeros(m, 1);
Ynorm = zeros(size(Y));
for i = 1:m
    idx = find(R(i, :) == 1);
    Ymean(i) = mean(Y(i, idx));
    Ynorm(i, idx) = Y(i, idx) - Ymean(i);
end

end

%  Load data
load('ex8_movies.mat');

%  Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by 943 users
%  R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a rating to movie i
%  Add our own ratings to the data matrix
Y = [my_ratings Y];
R = [(my_ratings ~= 0) R];

%  Normalize Ratings
[Ynorm, Ymean] = normalizeRatings(Y, R);

%  Useful Values
num_users = size(Y, 2);
num_movies = size(Y, 1);
num_features = 10;

% Set Initial Parameters (Theta, X)
X = randn(num_movies, num_features);
Theta = randn(num_users, num_features);
initial_parameters = [X(:); Theta(:)];

% Set options for fmincg
options = optimset('GradObj','on','MaxIter',100);

% Set Regularization
lambda = 10;
theta = fmincg(@(t)(cofiCostFunc(t, Ynorm, R, num_users, num_movies, num_features,lambda)), initial_parameters, options);

% Unfold the returned theta back into U and W
X = reshape(theta(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(theta(num_movies*num_features+1:end), num_users, num_features);

预测

求出了 Θ \Theta Θ X X X就相当于求得了用户的喜好与电影的特征值,根据这些参数我们就可以为用户推荐电影了:

p = X * Theta';
my_predictions = p(:,1) + Ymean;

movieList = loadMovieList();

[r, ix] = sort(my_predictions,'descend');
for i=1:10
    j = ix(i);
    if i == 1
        fprintf('\nTop recommendations for you:\n');
    end
    fprintf('Predicting rating %.1f for movie %s\n', my_predictions(j), movieList{j});
end

在这里插入图片描述

Python基于协同过滤算法电影推荐视频网站设计 开发软件:Pycharm 开发环境: Python3.6 数据库:mysql5.6 本系统包含电影前端展示界面、电影评分板块、推荐算法实现以及后端数据库的设 计.其中实现推荐算法是整个电影推荐系统的核心.系统采用由grouplens项目组从美国著名 电影网站movielens整理的ml-latest-small数据集,该数据集包含了671个用户对9000多部电 影的10万条评分数据.首先将该数据集包含的全部文件经过筛选重组之后存储到建好的数 据库中,并将数据集按一定比例划分为训练集和测试集,对训练集进行算法分析生成Top-N 个性化电影推荐列表,然后在测试集上对算法进行评测,至少包括准确率和召回率两种评 测指标. 协同过滤算法推荐领域最出名也是应用最广泛的推荐算法.所以系统拟采用两种协 同过滤算法给出两种不同的推荐结果,一种是基于用户的协同过滤算法,另一种是基于物 品的协同过滤算法,用户可以根据两种推荐结果更加合理的选择合适的电影.系统采用了 改进之后的ItemCF-IUF和UserCF-IIF算法,对计算用户相似度和物品相似度的计算都做出 了改进.最后通过计算两种算法的准确率(Precision)、召回率(Recall)和流行度从而对系 统进行评测、并比较了两种算法各自的优势和劣势.实验证明,改进后的算法比原始的协 同过滤算法推荐效果要好,准确率更高. 整个系统涉及到的编程语言包含Python、Html5、JQuery、CSS3以及MySQL数据库编 程.用到的框架是Flask重量级web框架,通过该框架连接系统的前、后端.用户首先需要 填写用户名、密码以及邮箱注册系统,然后才能登陆推荐系统.进入首页后会看到8个电影 分类,包括恐怖片、动作片、剧情片等.用户需要给自己看过的电影进行评分,评分起止 为0.5-5.0分,共10个分段.每评价一部电影就要点击一下提交按钮,将所评分的电影的 imdbId号以及对应的评分存入数据库中.用户点击“推荐结果”按钮,系统就调用推荐算法 遍历数据库所存数据,得出推荐列表之后将结果反馈给浏览器,同时调取数据库所存电影 海报图片进行展示.用户点击自己登陆的昵称,会跳转页面显示自己已经评价过的电影. -------- <项目介绍> 该资源内项目源码是个人的毕设,代码都测试ok,都是运行成功后才上传资源,答辩评审平均分达到96分,放心下载使用! 1、该资源内项目代码都经过测试运行成功,功能ok的情况下才上传的,请放心下载使用! 2、本项目适合计算机相关专业(如计科、人工智能、通信工程、自动化、电子信息等)的在校学生、老师或者企业员工下载学习,也适合小白学习进阶,当然也可作为毕设项目、课程设计、作业、项目初期立项演示等。 3、如果基础还行,也可在此代码基础上进行修改,以实现其他功能,也可用于毕设、课设、作业等。 下载后请首先打开README.md文件(如有),仅供学习参考, 切勿用于商业用途。 --------
评论 6
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

ShadyPi

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值