Machine Learning (9) - Recommender Engine: Content-Based Filtering & Hybrid


This is part of the Machine Learning series.

Step-by-Step Demo

  1. Get Data
  2. Clean Data
  3. Build and Score Content-Based Filtering Model
  4. Publish Content-Based Filtering Model as a Web Service
  5. Make it Hybrid
    1. User Features
    2. Item Features
    3. Build and Score the Hybrid Recommender Model
  6. Publish the Hybrid Recommender as a Web Service

Here is the accompanying GitHub repository.

1. Get Data

We start off by reading data again from the Azure SQL Database containing the AdventureWorks Warehouse. Running the script matchbox.sql in SQL Server Management Studio displays a dataset of five columns. The most important column is the last one: FreqBuy. Technically speaking, we want a ratings column, but since we do not have explicit ratings data, we infer the ratings implicitly from the data collected. Here we assume that the more often a customer buys an item, the higher the rating that customer would give it. Obviously, this assumption is not flawless: a customer may well buy an item multiple times simply because it wears out very quickly.
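
To make the idea of an implicit rating concrete, here is a minimal Python sketch that counts how often each customer bought each model. The purchase log and the function name are made up for illustration, and the actual query in matchbox.sql may well compute FreqBuy differently.

```python
from collections import Counter

# Hypothetical purchase log: one (customer_key, model) pair per order line.
purchases = [
    (11000, "Sport-100"),
    (11000, "Sport-100"),
    (11000, "Water Bottle"),
    (11001, "Patch Kit"),
]


def implicit_ratings(rows):
    """Count how often each customer bought each model (the FreqBuy idea)."""
    return Counter(rows)


freq_buy = implicit_ratings(purchases)
for (customer_key, model), count in sorted(freq_buy.items()):
    print(customer_key, model, count)
```

The count then plays the role of the rating: two purchases of Sport-100 are read as a stronger preference than a single purchase of Water Bottle.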

Create a new experiment in Machine Learning Studio called Content-Based Filtering. Expand Data Input and Output in the catalogue pane (on the left-hand side) and drag the module Reader into the canvas. Configure it in the properties pane (on the right-hand side) with the credentials of your Azure SQL Database containing the AdventureWorks database and paste in the query from matchbox.sql:

Run the experiment. After the experiment finishes running, click on the circle at the Reader module in the canvas and then on Visualize to see the imported dataset of five columns:

2. Clean Data

Expand Data Transformation and then Manipulation in the catalogue pane to drag the module Metadata Editor into the canvas. Drag an arrow from the Reader module into Metadata Editor as shown below, and then click on Launch column selector in the properties pane:

We want to force the last column FreqBuy to be of type integer and rename it. Thus, in the column selector we choose only the column FreqBuy and click on the check mark:

In the properties pane, set the Data Type to be Integer and change the column name to Rating as shown below:

Now drag the module Project Columns into the canvas and draw an arrow from Metadata Editor into Project Columns:

We only want three columns, since the Matchbox recommender that we will be training only takes in a dataset of triples: (user, item, rating). In our case, this means a dataset with the following three columns: CustomerKey, Model and Rating. Thus, click on Launch column selector in the properties pane to select these three columns:
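
Outside of the Studio, the combined effect of Metadata Editor and Project Columns can be sketched in a few lines of Python. The column names are the ones used in this post. The row values and the function name are invented for illustration.

```python
# Hypothetical rows as imported by the Reader module (values are made up).
rows = [
    {"CustomerKey": 11000, "IncomeGroup": "High", "Region": "Europe",
     "Model": "Sport-100", "FreqBuy": 2.0},
    {"CustomerKey": 11001, "IncomeGroup": "Low", "Region": "Pacific",
     "Model": "Patch Kit", "FreqBuy": 1.0},
]


def to_triples(rows):
    """Cast FreqBuy to an integer Rating and keep (user, item, rating)."""
    return [(r["CustomerKey"], r["Model"], int(r["FreqBuy"])) for r in rows]


triples = to_triples(rows)
print(triples)
```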

3. Build and Score Content-Based Filtering Model

Let's move on to building and then testing a recommendation model.
Expand Data Transformation and then Sample and Split in the catalogue pane to drag the module Split into the canvas. Drag an arrow from Project Columns into Split as shown below. Change the Splitting Mode in the properties pane to Recommender Split:
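
The following Python sketch is a simplified stand-in for a recommender-style split, not the algorithm behind the Split module. It only illustrates one property such a split is typically after: every customer in the test set also has at least one rating in the training set. The function name, the test fraction and the sample triples are assumptions.

```python
import random
from collections import defaultdict


def per_user_split(triples, test_fraction=0.25, seed=0):
    """Hold out a fraction of each user's ratings for testing."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for triple in triples:
        by_user[triple[0]].append(triple)

    train, test = [], []
    for items in by_user.values():
        rng.shuffle(items)
        # Always keep at least one rating per user in the training set.
        n_test = min(int(len(items) * test_fraction), len(items) - 1)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical (CustomerKey, Model, Rating) triples.
triples = [
    (11000, "Sport-100", 2),
    (11000, "Water Bottle", 1),
    (11000, "Road Tire Tube", 3),
    (11000, "Patch Kit", 1),
    (11001, "Patch Kit", 1),
]
train, test = per_user_split(triples)
print(len(train), "training rows,", len(test), "test rows")
```

A purely random row split could place all ratings of a customer in the test set, leaving the model with nothing to learn about that customer. Splitting per user avoids this.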

We can now train a model. Expand Machine Learning and then Train to insert Train Matchbox Recommender into the canvas. Drag an arrow from the first output of Split into the first input of Train Matchbox Recommender as follows:

After training the matchbox recommender, let's apply it on the remaining dataset kept aside when splitting. Expand Score under Machine Learning in the catalogue to drag the module Score Matchbox Recommender into the canvas. The first input of Score Matchbox Recommender is the output of Train Matchbox Recommender, while the second input is the second output of Split:

Run the experiment!

4. Publish Content-Based Filtering Model as a Web Service

The recommender is trained and tested. Now on to creating a web service out of it. Start off with saving the trained model by clicking on the circle of Train Matchbox Recommender and then on Save as Trained Model:

Save the trained model as "Content-Based Filtering" or a name of your choice:

Back in the experiment, you can save it as a new experiment, e.g. "Content-Based Filtering - Web Service". Select the two modules Split and Train Matchbox Recommender...

...and delete the two:

Drag the module Content-Based Filtering listed in the catalogue under Trained Models into the canvas and connect it to the first input of Score Matchbox Recommender. Similarly, connect Project Columns to the second input of Score Matchbox Recommender:

Click on Project Columns to launch the column selector in the properties pane:

The input of the web service should only require the customer key and not the model the customer has bought and rated. Hence, delete Model and Rating in the column selector:

Click on the module Score Matchbox Recommender and change the Recommended item selection from From Rated Items (for model evaluation) to From All Items in the properties pane:

Now insert the modules Web Service Input and Web Service Output in the experiment. These modules can be found under the category Web Service in the catalogue. Connect the Web Service Input module to the second input of Score Matchbox Recommender, and the Web Service Output module to the single output of Score Matchbox Recommender. Then run the experiment.

Once it has finished running, click on the button Deploy Web Service in the bottom bar:

And we are redirected to the usual web service page of the newly deployed web service:
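
For orientation, here is a hedged Python sketch of how a client might prepare a request to such a web service. It assumes the JSON layout commonly used by Studio request-response services, with an input named "input1". The URL and API key are placeholders, and the sample code shown on your own web service page should be treated as authoritative.

```python
import json
import urllib.request

# Placeholders, not real values: copy the actual URL and API key from the
# web service page of your own deployment.
SCORING_URL = "https://example.invalid/your-request-response-endpoint"
API_KEY = "your-api-key"


def build_scoring_request(url, api_key, customer_keys):
    """Build (but do not send) a request asking for recommendations."""
    payload = {
        "Inputs": {
            # "input1" is assumed to be the name of the Web Service Input.
            "input1": {
                "ColumnNames": ["CustomerKey"],
                "Values": [[str(key)] for key in customer_keys],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_scoring_request(SCORING_URL, API_KEY, [11000])
# To actually call the service, uncomment the next two lines:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Only the customer key is sent, which matches the reduced Project Columns selection above.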

5. Make it Hybrid

We now want to extend the content-/rating-based filtering approach to a hybrid recommender. This can be done by integrating user as well as item features - the remaining two inputs of the module Train Matchbox Recommender. User features encompass more information on the customers, such as demographic information, while item features contain information on the models, e.g. categories.

Recall that the Reader module imports a dataset of five columns; in other words, it contains demographic information on our customers but no further item features. In this section, we will on the one hand extract the user information and on the other hand create a "dummy" item feature set that will not change the outcome of the model at all.

Let us first save the current experiment as another experiment:

Save it under the name Hybrid Recommender:

And then run the experiment so that the column names of the imported dataset are in the cache. (It will make things easier later on when projecting certain columns.)

As mentioned earlier, the module Train Matchbox Recommender takes in three inputs, of which the first one is mandatory whereas the other two (2 and 3) are optional. The first input requires a triple-dataset, i.e. of the form (user, item, rating); the second input takes user features whereas the third one takes item features.

5.a User Features

For the sake of simplicity, we can delete the two modules Train Matchbox Recommender and Score Matchbox Recommender for now.
To obtain a set of user features, all we need to do is select the three columns related to the customer. Hence, take the module Project Columns (under Data Transformation and then under Manipulation) and connect it to Metadata Editor. Click on Launch column selector:

Select the three columns CustomerKey, IncomeGroup and Region and click on the check mark:

Insert the module Remove Duplicate Rows (also to be found under Data Transformation-->Manipulation) and connect it to Project Columns. The reason is that some customers bought multiple items, which results in duplicate rows once only the three user-related columns are projected in the previous step. Similar to the Project Columns module, click on Launch column selector:

...and select the column name CustomerKey. This is the key column used to decide whether a row is a duplicate or not.
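
The effect of de-duplicating on a key column can be sketched in Python as follows. The sketch keeps the first row seen for each key, which is an assumption about the desired behaviour. The sample rows and the function name are illustrative.

```python
def remove_duplicate_rows(rows, key):
    """Keep only the first row seen for each value of the key column."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Hypothetical projected user rows; customer 11000 bought two items and
# therefore appears twice.
user_rows = [
    {"CustomerKey": 11000, "IncomeGroup": "High", "Region": "Europe"},
    {"CustomerKey": 11000, "IncomeGroup": "High", "Region": "Europe"},
    {"CustomerKey": 11001, "IncomeGroup": "Low", "Region": "Pacific"},
]
user_features = remove_duplicate_rows(user_rows, "CustomerKey")
print(user_features)
```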

5.b Item Features

Now on to the item features which in this case will just be dummy data. We will create an item feature set of two columns: the model column and a column of only 1's, i.e.

  • Sport-100, 1
  • Water Bottle, 1
  • Road Tire Tube, 1
  • Patch Kit, 1

Since this second column, which we will name Properties, contains the same value across all rows, it carries no information and therefore makes no difference to the recommender model.
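
One way to see why a constant column carries no information is to compute its Shannon entropy, which is zero when every row has the same value. The sketch below is a generic illustration in Python and is not part of the Matchbox recommender. The function name is an assumption.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    probabilities = [count / total for count in Counter(values).values()]
    return sum(-p * math.log2(p) for p in probabilities)


constant_column = [1, 1, 1, 1]   # like the dummy Properties column
mixed_column = [1, 1, 2, 2]      # a column that does distinguish items

print(entropy(constant_column), entropy(mixed_column))
```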

There are two options: One is to use the module Execute R Script with the script CreateItemFeatures.R, followed by Metadata Editor to rename the columns, and then skip ahead to the Remove Duplicate Rows step.

Another option is to use only the modules provided in AzureML to add the column of 1's for all rows. We start off with the module Apply Math Operation found under Statistical Functions in the catalogue. Connect it to Metadata Editor. In the properties pane, configure the following settings:

  • Comparison function: GreaterThan
  • Value to compare type: Constant
  • Constant value to compare: 0
  • Selected columns: Column names: Rating
  • Output mode: Inplace

Thus, click on Launch column selector...

...and specify column names and select Rating:

The properties of the module Apply Math Operation look as follows:

Insert the module Metadata Editor (under Data Transformation and then Manipulation) and connect it to Apply Math Operation. Select the column Rating (by clicking on Launch column selector), change the value under Categorical to Make categorical and set the new column name to Properties as shown below in the properties pane:

Use the module Indicator Values (also under Data Transformation and then Manipulation) to transform the column of TRUEs to 1's. Connect it to the Metadata Editor and select the column Properties via Launch column selector in the properties pane. Run the experiment.

Drag Project Columns into the canvas, connect it to Indicator Values and select the columns Model and Properties-1 (newly created by Indicator Values) by launching the column selector:


Insert the module Remove Duplicate Rows (also under Data Transformation-->Manipulation), connect it to Project Columns and set the key column to Model, since many items have been bought by multiple customers and therefore appear more than once.
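
The chain of modules in this subsection can be summarised in a short Python sketch. It mirrors only the effect of the steps (compare Rating with 0, turn TRUE into 1, keep Model and the indicator, drop duplicate models), not how the modules work internally. The function name and sample triples are assumptions.

```python
def build_item_features(triples):
    """Return one (model, indicator) row per distinct model."""
    features = []
    seen = set()
    for _customer_key, model, rating in triples:
        is_positive = rating > 0              # Apply Math Operation: GreaterThan 0
        indicator = 1 if is_positive else 0   # Indicator Values: TRUE becomes 1
        if model not in seen:                 # Remove Duplicate Rows on Model
            seen.add(model)
            features.append((model, indicator))
    return features


# Hypothetical (CustomerKey, Model, Rating) triples.
triples = [
    (11000, "Sport-100", 2),
    (11001, "Sport-100", 1),
    (11001, "Water Bottle", 1),
]
item_features = build_item_features(triples)
print(item_features)
```

Because every rating is a purchase count of at least 1, the comparison is always true and the resulting column consists of 1's only.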

5.c Build and Score the Hybrid Recommender Model

Train the matchbox recommender using all three inputs:

  1. User-item-rating dataset, i.e. CustomerKey, Model and Rating
  2. User features, i.e. CustomerKey, IncomeGroup and Region
  3. Item features, i.e. Model and Properties (just a column of 1's)

Connect the inserted module Train Matchbox Recommender accordingly:

After training the model, it's time to apply it to test data. The inputs of the scoring module are as follows:

  1. Trained matchbox recommender
  2. Test dataset in the form of (user, item, rating), i.e. (CustomerKey, Model, Rating)
  3. User features just like when training
  4. Item features just like when training

Connect Score Matchbox Recommender as displayed below and run the experiment:

The experiment finished running:

6. Publish the Hybrid Recommender as a Web Service

Save the trained model by clicking on the circle at Train Matchbox Recommender and then on Save as Trained Model:

Save it as, say, Hybrid Recommender:

Optionally, you can save the experiment under a different name, e.g. Hybrid Recommender - Web Service.
Delete the modules Split and Train Matchbox Recommender and instead insert the previously trained model Hybrid Recommender found under Trained Models in the catalogue. Connect the Hybrid Recommender module to the first input of Score Matchbox Recommender, and Project Columns of the mandatory dataset to the second input of the scoring module:

Click on Project Columns to only select the column CustomerKey since the input of our to-be-deployed web service should only require the customer key and no further information on bought items and ratings. Click on Launch column selector in the properties to remove the columns Model and Rating:

After that, click on the Score Matchbox Recommender module and change the Recommended item selection to From All Items in the properties pane (just like in 4. Publish Content-Based Filtering Model as a Web Service).

Insert the Input and Output modules for our web service and connect them as follows:

  • Web Service Input to the second input of Score Matchbox Recommender
  • Web Service Output to the single output of Score Matchbox Recommender

Then run the experiment!

Once it has finished running, publish it as a web service by clicking on the button Deploy Web Service in the bottom bar:

You have published your hybrid recommender model as a web service and can now integrate it into any app or dashboard.

Further Resources

Free online course on the Microsoft Virtual Academy: Building Recommendation Systems in Microsoft Azure

4 | Content-Based Filtering & Hybrid (Olivia Klose)
