
Memory-augmented meta-learning on meta-path for fast adaptation cold-start recommendation

Pages 301-318 | Received 13 Jul 2021, Accepted 15 Oct 2021, Published online: 30 Nov 2021

Abstract

Personalised recommendation is a difficult problem that has received much attention from academia and industry. Because of sparse user–item interactions, cold-start recommendation is a particularly difficult problem. Efforts have been made to solve the cold-start problem by applying model-agnostic meta-learning at the model level and heterogeneous information networks at the data level. Moreover, memory-augmented meta-optimisation effectively prevents the meta-learning model from falling into a local optimum. This paper therefore proposes memory-augmented meta-learning on meta-path, a new meta-learning method that further addresses cold-start recommendation on meta-paths. Meta-paths are built at the data level to enrich the semantic information of the data. To achieve fast adaptation, a semantic-specific memory is used to guide the model with semantic parameter initialisation, and the method is optimised by a meta-optimisation procedure. We evaluate this method on two widely used recommendation data sets and three cold-start scenarios. The experimental results demonstrate the effectiveness of our proposed method.

1. Introduction

Recommendation systems (Dacrema et al., Citation2019; Tang et al., Citation2021; Wang et al., Citation2019) have become increasingly important to industry due to the rapid development of mobile applications. Their core goal is to mitigate user information overload, which comes with a slew of challenges. Although traditional recommendation methods based on matrix factorisation (Jiang et al., Citation2021; Xu et al., Citation2021) or deep learning (D'Angelo et al., Citation2021) have proven efficient, cold-start (Kumar et al., Citation2020; Zhang et al., Citation2019) remains an unavoidable problem in most recommendation systems. Because of the absence of user–item interactions, a recommendation system frequently fails to make appropriate recommendations for new users. The cold-start issue breaks into two parts, user cold-start and item cold-start, which refer to the situations in which the recommendation system cannot handle new users or new items because no user–item interactions exist for them. An effective solution is to enrich new users and new items with auxiliary data, such as recommendation systems based on user or item content (Li & She, Citation2017; Wei et al., Citation2016). Furthermore, a heterogeneous information network (HIN) (Pham & Do, Citation2020; Shi et al., Citation2016) can be used to supplement user–item interactions with complementary heterogeneous information.

Because the cold-start problem is at heart a data sparsity problem, enriching the data can relieve the cold-start strain significantly. Building on these works (Dong et al., Citation2020; Lu et al., Citation2020), we combine the data-level method with the model-level method and propose the memory-augmented meta-learning on meta-path (MAMP) method for cold-start recommendation. The proposed approach first develops appropriate initial embeddings for the user's semantic contexts and the items, guided by a meta-optimisation-based strategy. In particular, introducing the user's semantics and items into a specific user's task is significant: two fully connected neural network layers learn new semantic embeddings and new item embeddings from previously studied semantics and items, and a meta-optimisation approach then updates the parameters. To alleviate the local-optimum problem, MAMP further creates a semantic-specific memory that generates a customised bias term when the model parameters are initialised. The specific procedure is as follows: semantic adaptation learns each aspect's unique semantic prior, task adaptation updates each user's preference from the diverse semantic priors and, lastly, the semantic-specific memory guides the initialisation of the semantic priors with personalised parameters.

To conclude, the main contributions of MAMP are as follows:

  • We introduce HIN into meta-embedding, which is able to learn new semantic embeddings and new item embeddings to solve the cold-start problem more effectively.

  • We present a semantic memory enhancement to aid the co-adaptation meta-learner (Lu et al., Citation2020), which significantly improves the co-adaptation meta-learner's performance.

  • We conduct experiments on the DBook and MovieLens data sets to demonstrate the effectiveness of our meta-learning technique.

2. Related work

2.1. Cold-start recommendation

A recommendation system may contain varying proportions of new users and new items, and interactions between these users and items are sparse. As a result, personalised recommendation for new users is challenging: this is the cold-start problem. Deep learning (Liang, Xie, et al., Citation2020) has achieved great results in a variety of artificial intelligence domains. However, to obtain good generalisation, a large number of examples must be used for training, so deep learning becomes ineffective in a cold-start recommendation scenario with sparse user–item interactions. Data augmentation at the data level or the provision of auxiliary data (Zhu et al., Citation2019) are the most typical solutions to cold-start recommendation. There are also methods involving higher-level representations of the data, such as capturing the rich heterogeneous data (Chang et al., Citation2021) of items and users through the representation of a heterogeneous information network, in addition to considering the basic characteristics of the data. Alternatively, a semantic network can be built using a knowledge graph, in which nodes represent entities and edges reflect various semantic relationships between items. There are also cross-domain recommendations based on mapping the attributes of neighbouring users, and recommendations based on mining friend lists on social networks. These techniques rely largely on data and have a number of drawbacks.

2.2. Meta-learning

The purpose of meta-learning is to acquire meta-knowledge, which can be regarded as the most basic general knowledge needed to solve similar learning tasks, so that a model can swiftly adapt to new target tasks after several learning tasks. Due to this characteristic of cross-task learning, meta-learning is also regarded as one of the key technologies for opening up general artificial intelligence. According to the research content, meta-learning can be divided into three types (Hospedales et al., Citation2020). Metric-based approaches (Hu et al., Citation2018; Snell et al., Citation2017; Vinyals et al., Citation2016) compare and classify by computing a distance metric or similarity, while model-based approaches (Munkhdalai & Yu, Citation2017; Santoro et al., Citation2016) wrap the internal learning steps in the feed-forward pass of a single model so as to generalise to new tasks quickly. Finally, optimisation-based methods focus on acquiring meta-knowledge to improve the optimisation of the model, including optimising the model's initialisation parameters (Finn et al., Citation2017).

2.3. Meta-learning for cold-start recommendation

Wang and Yao's work (Citation2019) demonstrated the effectiveness of meta-learning on few-shot problems, and the cold-start issue can be thought of as a subset of the few-shot problem. As a result, it is possible to incorporate meta-learning into a cold-start recommendation system. These papers (Chen, Luo, et al., Citation2018; Zhao et al., Citation2019) make some progress in introducing the meta-learning paradigm at the model level of recommendation systems.

Lu et al. (Citation2020) began to address the cold-start issue at both the data and the model level. Theirs is the first work to use meta-learning for cold-start recommendation on a HIN; while it makes effective recommendations, it also has flaws. On the one hand, although a meta-path (Sun et al., Citation2011), a sequence of links between object types, can effectively capture semantic context at the data level, the method fuses a user's history records under each semantic facet (i.e. each meta-path) into a single implicit embedding, and thus relies strongly on collaborative filtering technology; such coarse fusion is difficult to adapt to individual new users. Some characteristics or items may lose their relevance, causing the multidimensional semantic fusion to perform poorly. On the other hand, the authors use an optimisation-based meta-learning technique, model-agnostic meta-learning (MAML) (Finn et al., Citation2017), at the model level. Recommendation for a user is frequently treated as a learning task, since MAML performs well in learning parameter initialisations for new tasks. The central concept is to learn a global parameter that is used to initialise the parameters of the personalised recommendation model: personalisation parameters are updated locally to capture a user's preferences, the global parameters are updated by reducing the training-task loss across users, and the learnt global parameters are then used to guide model initialisation for subsequent users. Although methods based on MAML and its derivatives have significant capacity to deal with data sparsity, they have a number of flaws, including instability, slow convergence and poor generalisation. When dealing with users whose gradient descent directions differ from those of the bulk of users in the training data, they are more likely to suffer gradient degradation, which can lead to local optima. Starting at the model level, Dong et al. (Citation2020) created a feature-specific memory and a task-specific memory to offer a customised bias term when initialising the model parameters and to capture the commonality of potential user preferences shared across items. This effectively solves the tendency of MAML and its variants to converge to local optima, but without considering auxiliary data, the model becomes inefficient when the interaction data are sparse.

At present, most traditional cold-start recommendation systems try to initialise the base model $f_\theta$ by learning a global meta-parameter $\theta$ with the meta-learner; the global meta-parameter $\theta$ is therefore also called the global prior. The global prior $\theta$ is optimised over several tasks and quickly adapts to a target task after one or several gradient steps, given the limited number of instances. Specifically, meta-learning divides tasks into meta-training tasks and meta-testing tasks; each meta-training and meta-testing task includes a support set and a query set, which are mutually exclusive. First, during meta-training, the meta-learner adjusts the global prior $\theta$ to task-specific parameters using the loss w.r.t. the support set. The loss w.r.t. the task-specific parameters is then calculated on the query set and backpropagated to update the global prior $\theta$. Second, during meta-testing, the meta-learner adapts the $\theta$ updated in meta-training to the task by one or several gradient steps on the support set, and then applies the adjusted parameters to predict the results on the query set.
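To make this bi-level procedure concrete, the following PyTorch-style sketch shows one meta-training step. The `forward_with(params, x)` helper (a functional forward pass with explicit parameters) and the MSE task loss are our illustrative assumptions, not a published implementation.

```python
import torch

def meta_train_step(model, tasks, meta_optimizer, inner_lr=0.01):
    """One MAML-style meta-training step over a batch of tasks."""
    meta_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: adapt the global prior theta on the support set.
        theta = dict(model.named_parameters())
        support_loss = torch.nn.functional.mse_loss(
            model.forward_with(theta, support_x), support_y)  # hypothetical helper
        grads = torch.autograd.grad(support_loss, list(theta.values()),
                                    create_graph=True)
        theta_adapted = {name: p - inner_lr * g
                         for (name, p), g in zip(theta.items(), grads)}
        # Query loss: evaluate the task-adapted parameters.
        meta_loss = meta_loss + torch.nn.functional.mse_loss(
            model.forward_with(theta_adapted, query_x), query_y)
    # Outer loop: backpropagate the summed query loss into the global prior.
    meta_optimizer.zero_grad()
    meta_loss.backward()
    meta_optimizer.step()
```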

2.4. Memory neural networks

In general, once its parameters have been trained, a neural network puts samples directly into the trained model for calculation and retrieves the results without interacting with any memory. For few-shot learning with sparse data, or for models with human–computer interaction, it is difficult to achieve the goal only through the computation between parameters inside the model. Weston et al. (Citation2014) first proposed memory networks, which augment neural inference with an external memory component; the closely related neural Turing machine (Graves et al., Citation2014) consists of a controller and a memory module, where the controller writes data to the memory module with a write head and reads data from it with a read head. In a memory-augmented network, feature information is closely associated with the corresponding label in the writing process, and the feature vector is accurately classified in the reading process.

2.5. Memory-augmented neural networks for recommendation

The neural Turing machine (NTM) (Graves et al., Citation2014) is the foundation of a significant branch of memory neural networks that extract and update information from an external memory. These features make NTM a good fit for few-shot learning and meta-learning. Based on this idea, Santoro et al. (Citation2016) proposed the memory-augmented neural network (MANN), which applies NTM to few-shot learning: it retains sample feature information in an explicit external memory module, optimises NTM's reading and writing process with a meta-learning algorithm, and finally achieves effective few-shot classification and regression. Chen, Xu, et al.'s work (Citation2018) is among the first to integrate MANN into recommendation tasks, proposing to combine MANN with collaborative filtering (Lian & Tang, Citation2021); with the help of an external memory matrix, the model can store and update a user's history, which effectively improves its representation ability. Dong et al. (Citation2020) further propose two kinds of globally shared memory to deal with cold-start recommendation.

3. Proposed approach

In this section, a meta-learning method called MAMP based on memory augmented and meta-path is proposed to solve the cold-start recommendation.

3.1. Overview

As can be seen in Figure 1, the proposed model mainly consists of three parts: the first is a semantic-enhanced task constructor that extracts meta-paths from profiles; the second is the meta-learning recommendation model and semantic co-adaptation meta-learner for prediction; the third is a memory-augmented optimiser that helps the recommendation model initialise parameters.

Figure 1. The training phase of MAMP.

3.2. Semantic-enhanced task constructor

In a cold-start recommendation task, a node's neighbours on a meta-path have different importance. The focus is on substituting the various levels of semantic context of meta-paths into tasks. Following Lu et al.'s work (Citation2020), the task $T_u$ for one user $u$ is defined as

$$T_u = (S_u, Q_u) \tag{1}$$

where $S_u$ represents the semantic-enhanced support set and $Q_u$ represents the semantic-enhanced query set. The support and query sets of every task $T_u$ are mutually exclusive and comprise items randomly selected from the set of items that user $u$ has rated. Specifically, the semantic-enhanced support and query sets are defined as

$$S_u = (S_u^R, S_u^P), \quad Q_u = (Q_u^R, Q_u^P) \tag{2}$$

where $S_u^R$ and $Q_u^R$ are sets of items that the user $u$ has rated, and $S_u^P$ and $Q_u^P$ represent the semantic contexts induced by a collection of meta-paths $P$.

Firstly, $S_u^P$ is used to encode the multifaceted semantics behind the predicted rating $y_{u,i}^Q$. Specifically, it is assumed that user $u$ has not rated item $i$, but either the items $u$ has rated bear some relationship to the unrated item $i$, or users related to user $u$ have a relationship with item $i$.

These relationships are defined here as multifaceted semantics. For example, in Figure 2, the multifaceted semantics can be defined as

$$P = \{UB,\ UBAB,\ UBUB\} \tag{3}$$

where capital letters represent node types, such as 'U' for user, 'B' for book and 'A' for author. The multifaceted semantics are a collection of meta-paths, such as 'UBAB' and 'UBUB', each defining a different semantic context induced by a meta-path: books written by the same author, or books separately purchased by the same user. Since user $u$ has multiple interactions in each task, a semantic context specific to a meta-path $p \in P$ is built for task $T_u$ as follows:

$$S_u^p = \bigcup_{i \in S_u^R} C_{u,i}^p \tag{4}$$

where $C_{u,i}^p$ represents the items reachable along the meta-path $p$ starting from the user $u$ and rated item $i$.

Figure 2. An example meta-path of Book. The blue line represents the meta-path {UBUB}, while the red line represents the meta-path {UBAB}. In addition, the broken yellow lines represent the attributes associated with the node.

Secondly, we formulate personalised recommendation as a task in the context of meta-learning. Given the user set $U^S$ and its profiles $F_{U^S}$, the item set $I^S$ and its profiles $F_{I^S}$, and the corresponding rating set $Y_{U,I}^S$, the goal of the recommendation system is to predict the rating $y_{u,i}^Q$ of an item $i \in I^Q$ for a user $u \in U^Q$, where $S$ and $Q$ denote the support set and query set, respectively. Under the cold-start recommendation scenario, to construct a support set of rated items, the data include only users with 13–100 rated items; we take

$$S_u^R = \{\, y_{u,i}^S \mid 13 \le |Y_{u,I}^S| \le 100 \,\} \tag{5}$$

as the construction condition, where $|Y_{u,I}^S|$ denotes the number of ratings given by user $u$.

Using the same method as above, these steps construct $Q_u = (Q_u^R, Q_u^P)$ with $S_u^R \cap Q_u^R = \emptyset$.
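For illustration, the task construction above might be sketched as follows. The `ratings` table and `metapath_neighbors` callables are assumed data structures standing in for Equations (1)–(5), not the authors' code.

```python
import random

def build_task(user, ratings, metapath_neighbors, n_query=10):
    """Construct T_u = (S_u, Q_u) for one user (a hedged sketch).

    ratings[user]         -> dict {item: rating}
    metapath_neighbors[p] -> callable (user, item) -> set of items
                             reachable along meta-path p (Eq. (4)).
    """
    items = list(ratings[user])
    if not 13 <= len(items) <= 100:            # rated condition, Eq. (5)
        return None
    random.shuffle(items)
    q_items, s_items = items[:n_query], items[n_query:]   # S^R and Q^R disjoint

    def semantic_context(item_subset):
        # S_u^p: union of meta-path-reachable items over the rated items.
        return {p: set().union(*(reach(user, i) for i in item_subset))
                for p, reach in metapath_neighbors.items()}

    support = ({i: ratings[user][i] for i in s_items}, semantic_context(s_items))
    query = ({i: ratings[user][i] for i in q_items}, semantic_context(q_items))
    return support, query
```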

3.3. Recommendation model and co-adaptation meta-learner

3.3.1. Recommendation model

Based on the initial embedding $e_u$ of a certain user $u \in U^Q$ and the embedding $e_i$ of a certain item $i \in I_u^Q$ to be rated, we predict $u$'s rating for $i$.

The semantic-enhanced user encoder $g_\phi$ creates a user's embedding based on the embeddings of the $p$-related items $C_u^p$ that are reachable through meta-path $p$ rooted at $u$, as follows:

$$e_s = g_\phi(u, C_u) = \sigma\left(\frac{1}{|C_u|}\sum_{j \in C_u}(W e_j + b)\right) \tag{6}$$

where $C_u$ represents the collection of items related to user $u$ through either direct interaction (i.e. rated items) or meta-paths (i.e. induced indirect items). $g_\phi$ is the context aggregation function with parameters $\phi = \{W \in \mathbb{R}^{d \times d_I}, b \in \mathbb{R}^d\}$; $\mathbb{R}^d$ is a $d$-dimensional dense vector space, mainly used to embed the multiple features of the users, and $\mathbb{R}^{d \times d_I}$ is the space of the feature embedding matrix. $\sigma(\cdot)$ represents the activation function, defined as

$$y_i = \begin{cases} x_i, & x_i \ge 0 \\ \dfrac{x_i}{a_i}, & x_i < 0 \end{cases} \tag{7}$$

where $a_i$ is a fixed parameter in the interval $(1, +\infty)$.
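A minimal PyTorch sketch of the semantic-enhanced user encoder $g_\phi$ of Equations (6)–(7) could look as follows; the dimensions and the fixed slope `a` are our assumptions.

```python
import torch
import torch.nn as nn

class SemanticUserEncoder(nn.Module):
    def __init__(self, d_item=64, d_user=32, a=3.0):
        super().__init__()
        self.linear = nn.Linear(d_item, d_user)   # W e_j + b of Eq. (6)
        self.a = a                                # fixed slope parameter, a > 1

    def forward(self, item_embs):
        """item_embs: (|C_u|, d_item) embeddings of meta-path-related items."""
        h = self.linear(item_embs).mean(dim=0)    # mean aggregation over C_u
        # Leaky activation of Eq. (7): x for x >= 0, x / a otherwise.
        return torch.where(h >= 0, h, h / self.a)
```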

Using the embedding $e_s$ of user $u$ as well as the embedding $e_i$ of item $i$ in preference prediction, and to learn better initial embeddings that adapt to new users, $e_s$ and $e_i$ are relearned by building two multi-layer fully connected neural networks, as defined in Equation (8):

$$\hat{e}_s = z_{\chi_s}(e_s), \quad \hat{e}_i = z_{\chi_i}(e_i) \tag{8}$$

where $z(\cdot)$ represents the fully connected layers, used as the generators of the semantic embedding and the item embedding; $\chi_s$ and $\chi_i$ represent the fully connected layer parameters for learning the semantic embedding and the item embedding, respectively, which will be optimised by meta-learning.

The preference prediction function predicts user $u$'s rating of item $i$, defined as

$$\hat{y}_{ui} = h_\omega(\hat{e}_s, \hat{e}_i) = \mathrm{MLP}(\hat{e}_s \oplus \hat{e}_i) \tag{9}$$

where $h_\omega$ is implemented by a multi-layer perceptron parametrised by $\omega$, and $\oplus$ stands for the concatenation of the two real-valued vectors.

Finally, we denote the recommendation model by

$$f_\theta = (g_\phi, h_\omega, z_\chi) \tag{10}$$

where $\theta = (\phi, \omega, \chi)$.
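Tying Equations (8)–(10) together, a hedged sketch of the embedding generators $z_\chi$ and the rating predictor $h_\omega$ might read as follows; the layer sizes are our assumptions.

```python
import torch
import torch.nn as nn

class RatingPredictor(nn.Module):
    def __init__(self, d=32, hidden=64):
        super().__init__()
        self.z_s = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))  # chi_s
        self.z_i = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))  # chi_i
        self.h = nn.Sequential(nn.Linear(2 * d, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))                           # omega

    def forward(self, e_s, e_i):
        e_s_hat, e_i_hat = self.z_s(e_s), self.z_i(e_i)        # Eq. (8)
        return self.h(torch.cat([e_s_hat, e_i_hat], dim=-1))   # Eq. (9)
```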

3.3.2. Co-adaptation meta-learner

The objective of the co-adaptation meta-learner is to learn the global prior $\theta$ so that it can swiftly adjust to a new task with only sparse interactions. The co-adaptation meta-learner includes semantic-based adaptation and task-based adaptation procedures, described in turn below.

3.3.2.1. Semantic-based adaptation

Using the $S_u^P$ of task $T_u$, the semantic adaptor evaluates the loss under the semantic context induced by a particular meta-path $p$ and uses a gradient descent step to adapt the global context prior $\phi$ w.r.t. that specific $p$. This not only encodes how to use contextual semantics in a heterogeneous information network, but also adapts to the semantic space induced by meta-path $p$.

In general, given the task $T_u$ of a particular user $u$, the support set $S_u = (S_u^R, S_u^P)$ is enhanced by the semantic context $S_u^P$. Then, given $p \in P$, the semantic embedding $e_s$ in the semantic space of $p$ can be calculated by

$$e_u^p = g_\phi(u, S_u^p) \tag{11}$$

Then the loss on the rated items $S_u^R$ in task $T_u$ is further calculated by

$$\mathcal{L}_{T_u}(\omega, e_u^p, S_u^R) = \frac{1}{|S_u^R|}\sum_{i \in S_u^R}\left(y_{ui} - h_\omega(e_u^p, e_i)\right)^2 \tag{12}$$

where $h_\omega(e_u^p, e_i)$ is user $u$'s predicted rating of item $i$ in the semantic space induced by meta-path $p$. Finally, to get the semantic priors of the various aspects $\phi_u^p$, the loss of task $T_u$ in each semantic space is used for a gradient descent step:

$$\phi_u^p = \phi - \gamma\,\frac{\partial \mathcal{L}_{T_u}(\omega, e_u^p, S_u^R)}{\partial e_u^p}\,\frac{\partial e_u^p}{\partial \phi} \tag{13}$$

where $\phi_u^p$ is the adapted model parameter for user $u$ and semantic context $p$, and $\gamma$ represents the semantic learning rate. The adaptation is achieved by a single-step gradient descent based on the gradient of the supervised loss computed on the support set (Equation (12)) w.r.t. the semantic-enhanced user representation $e_u^p$ while freezing the gradient to $\phi$.
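A sketch of this semantic-based adaptation step (Equations (11)–(13)) in PyTorch, under the assumption that `encoder` aggregates the meta-path context into $e_u^p$ and `predictor` implements $h_\omega$:

```python
import torch

def semantic_adapt(encoder, predictor, context_item_embs, rated_item_embs,
                   ratings, gamma=0.1):
    """Return phi_u^p, the encoder parameters adapted to meta-path p."""
    e_up = encoder(context_item_embs)                    # Eq. (11)
    preds = torch.stack([predictor(e_up, e_i).squeeze()
                         for e_i in rated_item_embs])
    loss = ((ratings - preds) ** 2).mean()               # Eq. (12)
    phi = dict(encoder.named_parameters())
    # Eq. (13): a single gradient step; autograd routes the gradient
    # through e_u^p, matching (dL/de_u^p)(de_u^p/dphi).
    grads = torch.autograd.grad(loss, list(phi.values()), create_graph=True)
    return {name: p - gamma * g for (name, p), g in zip(phi.items(), grads)}
```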

3.3.2.2. Task-based adaptation

Based on the $\phi_u^p,\ p \in P$ obtained above, task-based adaptation further adapts the global prior $\omega$ to the task $T_u$ through multiple gradient descent steps; the prior parameter $\omega$ of the rating prediction function $h_\omega$ is adapted in a similar way, given a user and a semantic context.

First, $e_s^p$ on the support set is updated using the adapted semantic prior (cf. Equation (11)):

$$e_s^{p(S)} = g_{\phi_u^p}(u, S_u^p) \tag{14}$$

Next, we convert the global prior $\omega$ into the same space:

$$\omega^p = \omega \odot \kappa\!\left(e_s^{p(S)}\right) \tag{15}$$

where $\odot$ is the element-wise product and $\kappa(\cdot)$ can be regarded as a transformation function realised by several fully connected layers. Then $\omega^p$ is adapted to task $T_u$ using gradient descent:

$$\omega_u^p = \omega^p - \delta\,\frac{\partial \mathcal{L}_{T_u}(\omega^p, e_s^{p(S)}, S_u^R)}{\partial \omega^p} \tag{16}$$

Finally, the main optimisation is achieved by gradient descent on the sum of the task-specific losses over the users' query sets.
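Analogously, a toy sketch of task-based adaptation (Equations (15)–(16)), treating $\omega$ as a single tensor with `requires_grad=True`, and `kappa` and `predict` as assumed callables; $e_s^{p(S)}$ of Equation (14) is computed by the caller with the adapted encoder:

```python
import torch

def task_adapt(omega, kappa, e_spS, predict, rated_item_embs, ratings,
               delta=0.1):
    """Adapt the rating prior omega to one task; omega is a toy tensor here."""
    omega_p = omega * kappa(e_spS)                       # Eq. (15), element-wise
    preds = torch.stack([predict(omega_p, e_spS, e_i)
                         for e_i in rated_item_embs])
    loss = ((ratings - preds) ** 2).mean()               # loss on S_u^R
    grad = torch.autograd.grad(loss, omega_p, create_graph=True)[0]
    return omega_p - delta * grad                        # Eq. (16)
```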

3.4. Memory-augmented meta-learning

Referring to the single-initialisation problem in Dong et al.'s work (Citation2020), this paper introduces a semantic-specific memory. That is, through the semantic embedding memory $M_S$ and the profile memory $M_F$, the personalised parameter $\chi_s$ is effectively initialised. The profile memory $M_F$, associated with the profile $F$, provides a retrieval attention value $a_s$; $a_s$ is used to extract key information from $M_S$, where each row of $M_S$ holds the fast gradients corresponding to one user type. These memory matrices aid in the construction of a customised bias term $b_s$ when initialising $\chi$, that is, $\chi \leftarrow \chi - \tau b_s$, where $\tau$ is a hyper-parameter that controls the degree of personalisation when initialising $\chi$, the set of $\chi_s$ and $\chi_i$.

Specifically, the semantic profile $f_s$ is given as the set of $e_u^p$. Semantics are represented as two-dimensional vectors, and the first dimension of the semantic vectors of different users is not necessarily consistent; therefore, the profile memory is represented as a three-dimensional tensor, $M_F \in \mathbb{R}^{K \times m \times d_s}$, where $m$ denotes the common size of the first dimension of the different semantic vectors.

Then the semantic attention value $a_s \in \mathbb{R}^K$ is calculated. Firstly, the semantic profile vector is extended to the same dimensional space as $M_F$, that is,

$$\hat{f}_s = \mathrm{EXPAND}\left(f_s, \mathbb{R}^{K \times m \times d_s}\right) \tag{17}$$

Cosine similarity is used to calculate the degree of correlation between $\hat{f}_s$ and $M_F$:

$$\cos(\phi) = \frac{\hat{f}_s \cdot M_F}{\|\hat{f}_s\|_2\,\|M_F\|_2} \tag{18}$$

Finally, the semantic attention value $a_s$ is obtained by normalising with the softmax function.

The semantic embedding memory $M_S \in \mathbb{R}^{K \times d_{\theta_s}}$ saves the fast gradients, which have the same shape as the parameters of the semantic embedding model; $d_{\theta_s}$ represents the dimension of the parameters held in $M_S$. A personalised bias term $b_s$ is then obtained through

$$b_s = a_s^{\top} M_S \tag{19}$$

During the initialisation phase, the two memories are randomly initialised, and they are updated during the training process. $M_F$ is updated with

$$M_F = \alpha\,(a_s f_u^{\top}) + (1 - \alpha)\,M_F \tag{20}$$

where $\alpha$ is a hyper-parameter that controls how much new profile information is added. Similarly, $M_S$ is updated with

$$M_S = \beta\,\big(a_s \otimes \nabla_\chi \mathcal{L}(\hat{y}_{u,i}, y_{u,i})\big) + (1 - \beta)\,M_S \tag{21}$$

where $\mathcal{L}(\hat{y}_{u,i}, y_{u,i})$ represents the training-task loss, and $\alpha$ and $\beta$ are hyper-parameters that regulate how much new information is kept.
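The memory read and write of Equations (17)–(21) might be sketched in NumPy as follows; the broadcasting-based EXPAND and the flattened gradient vector are our assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def read_memory(f_s, M_F, M_S):
    """Return attention a_s and personalised bias b_s (Eqs. (17)-(19))."""
    f_hat = np.broadcast_to(f_s, M_F.shape)                # EXPAND, Eq. (17)
    sims = np.array([                                      # cosine, Eq. (18)
        (f_hat[k] * M_F[k]).sum()
        / (np.linalg.norm(f_hat[k]) * np.linalg.norm(M_F[k]) + 1e-8)
        for k in range(M_F.shape[0])])
    a_s = softmax(sims)                                    # normalised attention
    b_s = a_s @ M_S                                        # Eq. (19)
    return a_s, b_s

def write_memory(M_F, M_S, a_s, f_u, grad_chi, alpha=0.1, beta=0.1):
    """Moving-average update of the two memories (Eqs. (20)-(21))."""
    M_F = alpha * np.einsum('k,md->kmd', a_s, f_u) + (1 - alpha) * M_F
    M_S = beta * np.outer(a_s, grad_chi) + (1 - beta) * M_S
    return M_F, M_S
```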

3.5. Optimisation

Optimising the global prior $\theta = (\phi, \omega, \chi)$ across different semantic tasks is the purpose of the co-adaptation meta-learner; it is optimised by backpropagating the loss on the query set of each meta-training task $T_u^{train}$ as follows:

$$\min_\theta \sum_{T_u \in T^{train}} \mathcal{L}_{T_u}\left(\omega_u, \phi_u, \chi_u, Q_u^R\right) \tag{22}$$

The global prior $\theta$ is never updated directly with the data of $T_u \in T^{test}$: the memories $M_F$ and $M_S$ are updated by Equations (20) and (21), and the semantic prior $\phi$ and task prior $\omega$ are updated by Equations (13) and (16). In addition, the semantic embedding prior and item prior are updated through

$$\chi \leftarrow \chi - \lambda \sum_{T_u \in T^{train}} \nabla_{\hat{\chi}}\,\mathcal{L}_{T_u}\!\left(f_{\hat{\chi}},\, Q_u\right) \tag{23}$$

where $\lambda$ is a hyper-parameter and $f$ represents the recommendation model.
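As a runnable toy illustration of Equation (23) combined with the memory-based initialisation $\chi \leftarrow \chi - \tau b_s$, consider the following; the linear model, random data and the stand-in memory read are assumptions, not the paper's architecture.

```python
import torch

torch.manual_seed(0)
chi = torch.randn(8, requires_grad=True)        # global semantic-embedding prior
lam, tau = 0.01, 0.1                            # lambda of Eq. (23), tau of Sec. 3.4

for step in range(100):
    x, y = torch.randn(16, 8), torch.randn(16)  # stand-in query set of one task
    b_s = 0.01 * torch.randn(8)                 # stand-in for the memory read, Eq. (19)
    chi_u = chi - tau * b_s                     # personalised initialisation of chi
    query_loss = ((x @ chi_u - y) ** 2).mean()  # L_{T_u} on the query set
    grad = torch.autograd.grad(query_loss, chi)[0]
    with torch.no_grad():
        chi -= lam * grad                       # global update of chi, Eq. (23)
```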

4. Experiments

In this section, the performance of MAMP is verified by detailed experiments: using three evaluation metrics commonly adopted in recommendation systems, the model proposed in this paper is compared with previous models.

4.1. Data sets

4.1.1. Data sets

This paper mainly conducts experiments and evaluations on two widely used standard data sets, DBook and MovieLens-1M, which contain both user information and item information from open-source data sets. MovieLens has about 1 million ratings, with over 3000 movies rated by about 6000 users; users include attributes such as gender, age, occupation and zip-code, and movies include attributes such as genre, with ratings ranging from 1 to 5. DBook contains about 650,000 ratings, with 20,000 books rated by about 10,000 users; users include an address attribute, and books contain information such as the year, the author and the publisher, with ratings ranging from 1 to 5. Unlike previous efforts (Lee et al., Citation2019; Lu et al., Citation2020) that add extra information to MovieLens, such as information about the director and actors of each movie, here we use the native data set. Table 1 outlines some essential statistics of the two data sets.

Table 1. Essential statistics of the data sets. The underlined edge type is the recommended target item type.

4.1.2. Data preprocessing

For each data set, we divided users and items into two groups, existing and new, roughly based on user joining time (or the time of the first user operation) and item release time. In particular, for DBook, since there is no time information for users, we randomly selected 80% of users as existing users and the other 20% as new users, following Lu et al.'s work (Citation2020). In addition, each data set is divided into meta-training and meta-testing. (1) Meta-training only contains the ratings of existing items by existing users, of which 10% are randomly chosen as the validation set; the corresponding task is to recommend existing items for existing users, that is, the warm scenario. (2) The remainder is used for meta-testing, which is broken into three parts corresponding to three different cold-start scenarios: (CW) recommending existing items for new users; (WC) recommending new items for existing users; (CC) recommending new items for new users.

To construct the rated sets $S_u^R$ and $Q_u^R$, we follow previous work (Lee et al., Citation2019). Specifically, the users in a task must have rated between 13 and 100 items, as in Equation (5). Of the items a user has rated, 10 items are randomly selected as $Q_u^R$, and the rest form $S_u^R$. In addition, to construct the sets $S_u^p$ and $Q_u^p$, we consider all meta-paths $p \in P$ that begin with user–item and conclude with items, up to length 2.

4.2. Evaluation metrics

This paper mainly verifies model performance under three evaluation metrics: the mean absolute error (MAE), the root mean squared error (RMSE) and the normalised discounted cumulative gain (NDCG):

$$\mathrm{MAE} = \frac{1}{|U^{test}|}\sum_{u \in U^{test}}\frac{1}{|I_u^Q|}\sum_{i \in I_u^Q}\left|y_{u,i} - \hat{y}_{u,i}\right| \tag{23}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{|U^{test}|}\sum_{u \in U^{test}}\frac{1}{|I_u^Q|}\sum_{i \in I_u^Q}\left(y_{u,i} - \hat{y}_{u,i}\right)^2} \tag{24}$$

$$\mathrm{DCG}@N = \sum_{n=1}^{N}\frac{2^{y_{u,n}} - 1}{\log(n+1)} \tag{25}$$

$$\mathrm{NDCG}@N = \frac{1}{|U^{test}|}\sum_{u \in U^{test}}\frac{\mathrm{DCG}@N}{\mathrm{IDCG}@N} \tag{26}$$

MAE and RMSE are utilised to indicate the error between the predicted and actual values, with lower values indicating better model performance. $N$ is the number of predictions for each user in the query set. NDCG@$N$ accounts for the ranking performance of the observed predictions on the query set, with higher values indicating better model performance.
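For reference, the three metrics can be computed per user as below (NumPy); the log base 2 in DCG is an assumption where the printed formula leaves the base implicit.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def ndcg_at_n(y_true, y_pred, n=5):
    """NDCG@N for one user's query set (Eqs. (25)-(26))."""
    order = np.argsort(-y_pred)[:n]                    # rank items by prediction
    gains = (2.0 ** y_true[order] - 1) / np.log2(np.arange(2, len(order) + 2))
    ideal_order = np.argsort(-y_true)[:n]              # ideal ranking for IDCG
    ideal = (2.0 ** y_true[ideal_order] - 1) / np.log2(
        np.arange(2, len(ideal_order) + 2))
    return gains.sum() / ideal.sum() if ideal.sum() > 0 else 0.0
```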

4.3. Comparison

4.3.1. Model comparison

To compare performance with MAMP, we select several representative advanced methods: the traditional feature-based method FM; the HIN-based methods HEREC and MetaHIN; and the meta-learning cold-start methods MeLU, MAMO and MetaHIN.

FM (Rendle et al., Citation2011): a feature-based model that makes use of a variety of auxiliary information. FM can be used to solve classification problems (predicting the probability of each rating) as well as regression problems (predicting the value of each rating).

HEREC (Shi et al., Citation2018): a heterogeneous information network embedding method that combines the learnt embeddings with a matrix factorisation (MF) model to optimise the recommendation effect.

MeLU (Lee et al., Citation2019): a typical approach to applying meta-learning to cold-start recommendation. Rating predictions are obtained by feeding the concatenated user and item embeddings into a fully connected network, and the MAML method is used to update the parameters locally and globally.

MAMO (Dong et al., Citation2020): the authors design two memory matrices, which offer a customised bias term when initialising the model parameters and assist the model in quickly predicting user preferences.

MetaHIN (Lu et al., Citation2020): a method that thoroughly captures HIN-based semantics so that the learner can readily adapt to the basic knowledge of multidimensional semantics in meta-learning, combining a meta-learning model at the model level with a heterogeneous information network at the data level.

4.3.2. Comparison results

4.3.2.1. Cold-start scenario

The performance of the various models under the three cold-start schemes is listed in detail in Table 2, and Figure 3 provides a more intuitive comparison. Overall, our meta-learning model achieves relatively ideal performance on all indicators of the two data sets. For example, on the MovieLens-1M data set, our meta-learning model improves on the best baseline by 1.65%, 2.48% and 1.12% on nDCG@5 in the three scenarios, respectively. Generally speaking, the performance of traditional methods such as FM is poor, mainly because it is difficult for them to deal with higher-order graph structures such as meta-paths, so they fail to integrate richer semantic features. In contrast, HIN-based methods such as HEREC perform better because they include meta-paths and incorporate richer semantic features. However, the problem of sparse data interaction exists in all cold-start scenarios, and with insufficient training data these models cannot improve their performance further. Furthermore, MeLU, MAMO and MetaHIN use meta-learning and can effectively deal with sparse interactions. MeLU only integrates heterogeneous information into content characteristics, while MetaHIN captures multifaceted semantics from higher-order structures and performs co-adaptation of semantics and tasks, so MetaHIN outperforms MeLU in most performance metrics. In addition, MAMO relies on personalised bias terms to enhance the generalisation ability of meta-learning, which alleviates, to a certain extent, the inability to capture deeper semantic data. However, these meta-learning methods based on MAML variants still fall below our model, mainly because the training set is prone to contain conflicting data: the problem of gradient degradation often occurs when dealing with users whose gradient descent directions differ, and the resulting poor robustness leads the model into a local optimum. MAMP not only performs semantic and task co-adaptation, but also effectively improves the robustness of the model by designing a semantic-specific memory that provides a personalised bias term when initialising model parameters.

Figure 3. Performance comparison of the six models under four recommendation scenarios.

Table 2. The performance of various representative models and MAMP in various recommendation scenarios.

4.3.2.2. Warm scenario

In the W–W section of Table 2, we also examine the traditional recommendation scenario. Statistically, the MAMP model is still ahead of the other models in performance. This is mainly because sparse samples and sparse interaction data still exist in a traditional recommendation scenario. MetaHIN and other meta-learning models must be updated by combining the gradients of the losses produced on the inner (support) set of each task, so each iteration targets a batch of data. MAMP, on the other hand, provides a personalised bias term through the semantic memory, which only updates a set of input parameters, effectively achieving fast adaptation.

4.4. Model analysis

To show the severity of the single-initialisation problem and the advantage of the proposed memory-based technique, we conducted ablation studies. Intuitively, we consider MAMP without memory-based initialisation as a trivial baseline, called MAMP-SI. We present only the performance in CC, the most difficult scenario, because the various cold-start scenarios showed similar results.

The performances of MetaHIN, MAMP-SI and MAMP on all data sets can be readily seen in Figure 4. First, the performance of MAMP-SI is better than MetaHIN, which is mainly due to the better adaptability of meta-embedding to new tasks; second, due to the single-initialisation problem, the performance of MAMP-SI is worse than MAMP, which also proves the advantage of the proposed memory-based technique.

Figure 4. Performance comparison of the three models under user–item cold-start scenarios.

4.5. Parameter analysis

MAMP has two memories, the profile memory $M_F$ and the semantic embedding memory $M_S$, whose function is to generate customised bias terms when initialising local parameters. During the construction of the semantic memory, we predefined $K$ user types and built a three-dimensional common semantic embedding. For the predefined user types, we investigate the effect of the value of $K$ on model performance; for the common semantic embedding, we discuss the impact of the size of the second dimension on model performance. For a clearer display, we only show the results in the C–C scenario, as shown in Figures 5 and 6.

Figure 5. Impact of memory embedding dimensions.

Figure 6. Impact of type K.

4.6. Limitations and future work

Data sparsity is a natural problem in cold-start recommendation, so auxiliary data are critical to solving the cold-start problem. In practice, however, auxiliary information cannot always be imported successfully, in which case data enhancement becomes an alternative. In this work, we use the fusion of multiple meta-paths in a heterogeneous information network to enhance the data, and some results have been obtained. However, we preliminarily find that meta-paths may not be the best way to describe rich semantics, because the construction of meta-paths is cumbersome. In future work, we will use meta-graphs (Zhao et al., Citation2017) to replace meta-paths. Compared with a meta-path, which requires a continuous chain structure, a meta-graph only requires one starting node and one ending node, with no restriction on the intermediate structure. How to calculate the similarity of meta-graphs and how to integrate meta-graphs into meta-learning will be interesting challenges. In addition, data fusion (Liang, Xiao, et al., Citation2021) is also a direction worth considering.

5. Conclusion

In this article, we propose MAMP, a new meta-learning recommendation method, for fast adaptation cold-start recommendation. Specifically, we use the idea of meta-optimisation to learn embedding to better fit the recommendation model and new tasks. In addition, a semantic-specific memory is proposed to assist the co-adaptation meta-learner, which generates a personalised bias term through the history record to greatly improve the performance index of the co-adaptation meta-learner. Experiments on DBook and MovieLens data sets show that our meta-learning method has significant advantages in terms of effectiveness and performance in various scenarios.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data sets of this paper are available at https://book.douban.com and https://grouplens.org/datasets/movielens/.

Correction Statement

This article has been republished with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This work was supported by National Natural Science Foundation of China [Grant Number 62072170]; Youth Project of Hunan Natural Science Foundation [Grant Number 2019JJ50673]; The Open Research Fund of Hunan Provincial Key Laboratory of Network Investigational Technology [Grant Number 2020WLZC001]; The key research and development plan of Hunan Province [Grant Number 2022SK2109]; Scientific Research Project of Hunan Education Department [Grant Number 19C1788]; The science and technology innovation leading plan for high and new technology industries of Hunan Province [Grant Number 2020GK2029]; Science and Technology Program of Hunan Province [Grant Number 2017SK1040].

References

  • Chang, F., Ge, L., Li, S., Wu, K., & Wang, Y. (2021). Self-adaptive spatial-temporal network based on heterogeneous data for air quality prediction. Connection Science, 33(3), 427–446. https://doi.org/10.1080/09540091.2020.1841095
  • Chen, F., Luo, M., Dong, Z., Li, Z., & He, X. (2018). Federated meta-learning with fast convergence and efficient communication. arXiv preprint arXiv:1802.07876.
  • Chen, X., Xu, H., Zhang, Y., Tang, J., Cao, Y., Qin, Z., & Zha, H. (2018, February 5–9). Sequential recommendation with user memory networks. The 11th ACM International Conference on Web Search and Data Mining, Los Angeles, CA, pp. 108–116.
  • Dacrema, M. F., Cremonesi, P., & Jannach, D. (2019, September 16–20). Are we really making much progress? A worrying analysis of recent neural recommendation approaches. The 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, pp. 101–109.
  • D'Angelo, G., Palmieri, F., Robustelli, A., & Castiglione, A. (2021). Effective classification of android malware families through dynamic features and neural networks. Connection Science, 33(3), 786–801. https://doi.org/10.1080/09540091.2021.1889977.
  • Dong, M., Yuan, F., Yao, L., Xu, X., & Zhu, L. (2020, August 23–27). MAMO: Memory-augmented meta-optimization for cold-start recommendation. The 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, California, USA, pp. 688–697.
  • Finn, C., Abbeel, P., & Levine, S. (2017, July 6–11). Model-agnostic meta-learning for fast adaptation of deep networks. International Conference on Machine Learning (ICML), Sydney, NSW, pp. 1126–1135.
  • Graves, A., Wayne, G., & Danihelka, I. (2014). Neural turing machines. arXiv preprint arXiv:1410.5401.
  • Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2020). Meta-learning in neural networks: A survey. arXiv preprint arXiv:2004.05439.
  • Hu, H., Gu, J., Zhang, Z., Dai, J., & Wei, Y. (2018, June 18–23). Relation networks for object detection. The IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake, USA, pp. 3588–3597.
  • Jiang, Y., Liang, W., Tang, J., Zhou, H., Li, K. C., & Gaudiot, J. L. (2021). A novel data representation framework based on nonnegative manifold regularisation. Connection Science, 33(2), 136–152. https://doi.org/10.1080/09540091.2020.1772722
  • Kumar, R., Bala, P. K., & Mukherjee, S. (2020). A new neighbourhood formation approach for solving cold-start user problem in collaborative filtering. International Journal of Applied Management Science, 12(2), 118–141. https://doi.org/10.1504/IJAMS.2020.106734
  • Lee, H., Im, J., Jang, S., Cho, H., & Chung, S. (2019, July 4–8). MeLU: Meta-learned user preference estimator for cold-start recommendation. The 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AL, pp. 1073–1082.
  • Li, X., & She, J. (2017, August 13–17). Collaborative variational autoencoder for recommender systems. The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, pp. 305–314.
  • Lian, S., & Tang, M. (2021). API recommendation for mashup creation based on neural graph collaborative filtering. Connection Science. https://doi.org/10.1080/09540091.2021.1974819.
  • Liang, W., Xiao, L., Zhang, K., Tang, M., He, D., & Li, K. C. (2021). Data fusion approach for collaborative anomaly intrusion detection in blockchain-based systems. IEEE Internet of Things Journal. https://doi.org/10.1109/JIOT.2021.3053842
  • Liang, W., Xie, S., Zhang, D., Li, X., & Li, K. C. (2020). A mutual security authentication method for RFID-PUF circuit based on deep learning. ACM Transactions on Internet Technology, 22(2), 1–20.
  • Lu, Y., Fang, Y., & Shi, C. (2020, August 23–27). Meta-learning on heterogeneous information networks for cold-start recommendation. The 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, California, USA, pp. 1563–1573.
  • Munkhdalai, T., & Yu, H. (2017, July 6–11). Meta networks. International Conference on Machine Learning (ICML), Sydney, NSW, pp. 2554–2563.
  • Pham, P., & Do, P. (2020). Topic-driven top-k similarity search by applying constrained meta-path based in content-based schema-enriched heterogeneous information network. International Journal of Business Intelligence and Data Mining, 17(3), 349–376. https://doi.org/10.1504/IJBIDM.2020.109295
  • Rendle, S., Gantner, Z., Freudenthaler, C., & Schmidt-Thieme, L. (2011, July 24–28). Fast context-aware recommendations with factorization machines. The 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, pp. 635–644.
  • Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., & Lillicrap, T. (2016, June 10–13). Meta-learning with memory-augmented neural networks. International Conference on Machine Learning (ICML), Jeju Island, South Korea, pp. 1842–1850.
  • Shi, C., Hu, B., Zhao, W. X., & Yu, P. S. (2018). Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering, 31(2), 357–370. https://doi.org/10.1109/TKDE.2018.2833443
  • Shi, C., Li, Y., Zhang, J., Sun, Y., & Yu, P. S. (2016). A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 29(1), 17–37. https://doi.org/10.1109/TKDE.2016.2598561
  • Snell, J., Swersky, K., & Zemel, R. S. (2017). Prototypical networks for few-shot learning. arXiv preprint arXiv:1703.05175.
  • Sun, Y., Han, J., Yan, X., Yu, P. S., & Wu, T. (2011). Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 4(11), 992–1003. https://doi.org/10.14778/3402707.3402736
  • Tang, B., Tang, M., Xia, Y., & Hsieh, M. Y. (2021). Composition pattern-aware web service recommendation based on depth factorisation machine. Connection Science, 33(4), 1–21. https://doi.org/10.1080/09540091.2021.1911933.
  • Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. Advances in Neural Information Processing Systems, 29, 3630–3638.
  • Wang, X., He, X., Wang, M., Feng, F., & Chua, T. S. (2019, July 21–25). Neural graph collaborative filtering. The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, pp. 165–174.
  • Wang, Y., & Yao, Q. (2019). Few-shot learning: A survey. arXiv preprint arXiv:1904.05046.
  • Wei, J., He, J., Chen, K., Zhou, Y., & Tang, Z. (2016, August 8–12). Collaborative filtering and deep learning based hybrid recommendation for cold start problem. 2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Auckland, New Zealand, pp. 874–877.
  • Weston, J., Chopra, S., & Bordes, A. (2014). Memory networks. arXiv preprint arXiv:1410.3916.
  • Xu, J., Xiao, L., Li, Y., Huang, M., Zhuang, Z., Weng, T. H., & Liang, W. (2021). NFMF: Neural fusion matrix factorisation for QoS prediction in service selection. Connection Science, 33(3), 1–16. https://doi.org/10.1080/09540091.2021.1889975.
  • Zhang, Y., Yin, G., & Chen, D. (2019). A dynamic cold-start recommendation method based on incremental graph pattern matching. International Journal of Computational Science and Engineering, 18(1), 89–100. https://doi.org/10.1504/IJCSE.2019.096948
  • Zhao, H., Yao, Q., Li, J., Song, Y., & Lee, D. L. (2017, August 13–17). Meta-graph based recommendation fusion over heterogeneous information networks. The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, pp. 635–644.
  • Zhao, L., Wang, Y., Dong, D., & Tian, H. (2019). Learning to recommend via meta parameter partition. arXiv preprint arXiv:1912.04108.
  • Zhu, Y., Lin, J., He, S., Wang, B., Guan, Z., Liu, H., & Cai, D. (2019). Addressing the item cold-start problem by attribute-driven active learning. IEEE Transactions on Knowledge and Data Engineering, 32(4), 631–644. https://doi.org/10.1109/TKDE.2019.2891530