Knowledge Graph

We are building an enterprise-class knowledge base to enhance JD’s recommendation and search results with semantic knowledge gathered from a wide variety of information sources.

Our research directions for knowledge graph at the Data Science Lab include:

Relation Extraction

Entity Detection and Linking

Semantic Search in Recommender Systems

Knowledge Graph in E-Commerce

Information networks are ubiquitous in many applications. A popular way to exploit the information in a network is to embed the network structure into a low-dimensional space in which each node is represented as a vector. The learned representations have been shown to advance various network analysis tasks such as link prediction and node classification. Most existing embedding algorithms are designed for networks with a single type of node and a single dimension of relations among nodes. However, many networks in real-world complex systems have multiple types of nodes and multiple dimensions of relations. For example, an e-commerce network can contain users and items, and items can be viewed or purchased by users, corresponding to two dimensions of relations. In addition, some types of nodes can present a hierarchical structure. For example, authors in academic networks are associated with affiliations, and items in e-commerce networks belong to categories. Most existing methods are not naturally applicable to such networks. In this paper, we aim to learn representations for networks with multiple dimensions and hierarchical structure. In particular, we provide an approach that captures independent information from each dimension and dependent information across dimensions, and we propose a framework, MINES, which performs Multi-dimension Network Embedding with hierarchical Structure. Experimental results on a network from a real-world e-commerce website demonstrate the effectiveness of the proposed framework.
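To make "independent information from each dimension and dependent information across dimensions" concrete, here is a minimal sketch under stated assumptions. It composes a node's vector in each relation dimension as the sum of a shared vector (common to all dimensions) and a dimension-specific vector, and scores a candidate edge as the sigmoid of a dot product. This composition, the class name `MultiDimEmbedding`, and the dimension labels `"view"` and `"purchase"` are hypothetical illustrations, not the exact MINES formulation, and no training loop is shown.

```python
import math
import random


class MultiDimEmbedding:
    """Toy multi-dimension node embedding (hypothetical sketch, not MINES itself).

    Each node has one shared vector (information common to all dimensions)
    plus one vector per relation dimension (dimension-specific information).
    """

    def __init__(self, nodes, dimensions, size=8, seed=0):
        rng = random.Random(seed)

        def rand_vec():
            return [rng.uniform(-0.1, 0.1) for _ in range(size)]

        self.shared = {n: rand_vec() for n in nodes}
        self.specific = {(n, d): rand_vec() for n in nodes for d in dimensions}

    def vector(self, node, dim):
        # Representation of `node` in relation dimension `dim`:
        # shared part + dimension-specific part (assumed composition).
        return [s + t for s, t in zip(self.shared[node], self.specific[(node, dim)])]

    def edge_probability(self, u, v, dim):
        # Probability of an edge (u, v) in dimension `dim`,
        # modelled here as sigmoid(<x_u^dim, x_v^dim>).
        score = sum(a * b for a, b in zip(self.vector(u, dim), self.vector(v, dim)))
        return 1.0 / (1.0 + math.exp(-score))


model = MultiDimEmbedding(["user_1", "item_1"], ["view", "purchase"])
print(model.edge_probability("user_1", "item_1", "view"))
print(model.edge_probability("user_1", "item_1", "purchase"))
```

Because the shared vector appears in every dimension, training on "view" edges also moves the representation used for "purchase" edges, which is one simple way to let dimensions inform each other while the specific vectors keep them distinct.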

Figure 1 An illustrative example of a multi-dimensional network with hierarchical structure

A typical example of a multi-dimensional network with hierarchical structure is illustrated in Figure 1, where there are two types of nodes, U and T. The nodes in U present a hierarchical structure with C: each node in U is associated with elements in C. The relations among nodes in U, among nodes in T, and between nodes in U and T are two-dimensional. The vast majority of existing embedding algorithms are not naturally applicable to multi-dimensional networks with hierarchical structure.
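The network in Figure 1 can be held in a simple container that keeps one adjacency map per relation dimension plus a parent map from nodes in U to elements of C. The sketch below is a hypothetical data structure for illustration only; the class and method names, the node labels, and the choice to treat relations as undirected are all assumptions.

```python
from collections import defaultdict


class MultiDimHierNetwork:
    """Toy container for a multi-dimensional network with hierarchy.

    Hypothetical structure mirroring Figure 1: nodes from U and T, one edge
    set per relation dimension, and a parent map from nodes in U to C.
    """

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        # adjacency[dim][node] -> set of neighbours in that dimension
        self.adjacency = {d: defaultdict(set) for d in self.dimensions}
        self.parent = {}  # node in U -> element of C

    def add_edge(self, u, v, dim):
        if dim not in self.adjacency:
            raise ValueError(f"unknown dimension: {dim}")
        # Relations are treated as undirected in this sketch.
        self.adjacency[dim][u].add(v)
        self.adjacency[dim][v].add(u)

    def set_parent(self, node, category):
        self.parent[node] = category

    def neighbours(self, node, dim):
        # .get avoids inserting empty entries into the defaultdict.
        return set(self.adjacency[dim].get(node, set()))

    def children(self, category):
        return {n for n, c in self.parent.items() if c == category}


net = MultiDimHierNetwork(["dim_1", "dim_2"])
net.add_edge("u1", "u2", "dim_1")  # relation within U
net.add_edge("u1", "t1", "dim_2")  # relation between U and T
net.set_parent("u1", "c1")
net.set_parent("u2", "c1")
print(net.neighbours("u1", "dim_1"))
print(net.children("c1"))
```

Keeping one adjacency map per dimension means the same pair of nodes can be linked in one dimension and unlinked in another, which a single-relation graph cannot express.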

Discriminating Substitutable and Complementary Products in E-commerce Portals

Personalized recommender systems suggest products to users according to their interests, and the way personalized product candidates are generated, i.e., the retrieval process, plays a crucial role. Among the various retrieval strategies in recommendation, retrieving substitutes and complements is among the most important. For example, when a user has browsed a t-shirt, it is reasonable to retrieve similar t-shirts, i.e., substitutes; but if the user has already purchased one, it would be better to retrieve trousers, hats, or shoes as complements to the t-shirt.
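The t-shirt example amounts to a simple retrieval rule: browsing suggests the user is still choosing, so substitutes are useful; a purchase suggests the need is met, so complements are useful. The sketch below encodes that rule under the assumption that substitute and complement lists are already available; the function names, action labels, and toy product lists are hypothetical.

```python
def retrieval_strategy(action):
    """Choose which kind of related products to retrieve (hypothetical rule)."""
    if action == "browse":
        return "substitutes"   # still choosing: show alternatives
    if action == "purchase":
        return "complements"   # need is met: show items that go with it
    raise ValueError(f"unsupported action: {action}")


def retrieve(product, action, substitutes, complements):
    """Return candidates for `product` given the user's last action.

    `substitutes` and `complements` map a product to a list of related products.
    """
    table = substitutes if retrieval_strategy(action) == "substitutes" else complements
    return table.get(product, [])


subs = {"t-shirt": ["v-neck t-shirt", "graphic t-shirt"]}
comps = {"t-shirt": ["trousers", "hat", "shoes"]}
print(retrieve("t-shirt", "browse", subs, comps))
print(retrieve("t-shirt", "purchase", subs, comps))
```

The rule itself is trivial; the hard part, which the framework below addresses, is building reliable substitute and complement lists in the first place.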

In this paper, we propose a path-constrained framework (PMSC) for discriminating substitutes and complements. Specifically, for each product, we first learn its embedding representation in a general semantic space. Then, we project the embedding vector into two separate spaces, a substitute space and a complement space, via a novel mapping function. Thereafter, we incorporate each embedding with path constraints to further boost the discriminative ability of the model. Extensive experiments on two datasets from two real e-commerce sites show the effectiveness of our proposed method.
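The general-space-then-project step can be sketched as follows. The paper's mapping function is not described here, so this sketch substitutes a plain linear map per space (an assumption), with random untrained weights, and scores a product pair in each space as the sigmoid of a dot product. The class name `TwoSpaceProjector` and all dimensions are hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    # Multiply a (rows x cols) matrix, stored as a list of rows, by a vector.
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TwoSpaceProjector:
    """Project general product embeddings into substitute/complement spaces.

    Assumption: each mapping is a plain linear map with random (untrained)
    weights; the framework's actual mapping function is not reproduced.
    """

    def __init__(self, general_dim, space_dim, seed=0):
        rng = random.Random(seed)

        def rand_matrix():
            return [[rng.gauss(0.0, 0.1) for _ in range(general_dim)]
                    for _ in range(space_dim)]

        self.to_substitute = rand_matrix()
        self.to_complement = rand_matrix()

    def scores(self, e_i, e_j):
        """Return (substitute score, complement score) for a product pair."""
        s_i, s_j = matvec(self.to_substitute, e_i), matvec(self.to_substitute, e_j)
        c_i, c_j = matvec(self.to_complement, e_i), matvec(self.to_complement, e_j)
        sub = sigmoid(sum(a * b for a, b in zip(s_i, s_j)))
        comp = sigmoid(sum(a * b for a, b in zip(c_i, c_j)))
        return sub, comp


rng = random.Random(1)
tshirt = [rng.gauss(0.0, 1.0) for _ in range(16)]  # toy general-space embedding
jeans = [rng.gauss(0.0, 1.0) for _ in range(16)]
projector = TwoSpaceProjector(general_dim=16, space_dim=8)
print(projector.scores(tshirt, jeans))
```

Both scores in this sketch are symmetric in the two products. Substitution is naturally symmetric, but complementarity is often directional, so a real model may need an asymmetric score for the complement space.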

Figure 2 Examples for Multi-Step Path Constraints

In addition to the category constraints, multi-step paths between product pairs can reflect much more complex patterns than direct links. In the graph, a multi-step path is a sequence of nodes and relations. For instance, as shown in Figure 2, a t-shirt is a substitute for a polo shirt, and both are complements to jeans. We counted the frequency of instances of every plausible two-hop relation pattern in the Amazon dataset and found that four patterns dominate the data distribution. Therefore, to capture such information, we design multi-step path constraints and incorporate them into the model.
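The counting step can be sketched as enumerating every two-hop path a -r1-> b -r2-> c and tallying its relation pattern (r1, r2). This assumes the product graph is given as (head, relation, tail) triples with hypothetical labels `substitute` and `complement`; the function name and toy data are illustrative, and the sketch does not reproduce the four dominant patterns reported for the Amazon dataset.

```python
from collections import Counter, defaultdict


def count_two_hop_patterns(triples):
    """Count relation patterns (r1, r2) over all two-hop paths a -r1-> b -r2-> c.

    `triples` is an iterable of (head, relation, tail) product links.
    Paths that return to the start node (c == a) are skipped.
    """
    outgoing = defaultdict(list)
    for head, rel, tail in triples:
        outgoing[head].append((rel, tail))

    counts = Counter()
    for a in list(outgoing):
        for r1, b in outgoing[a]:
            # .get avoids inserting new keys while iterating.
            for r2, c in outgoing.get(b, []):
                if c != a:
                    counts[(r1, r2)] += 1
    return counts


# Toy graph following the Figure 2 example.
triples = [
    ("t-shirt", "substitute", "polo shirt"),
    ("polo shirt", "substitute", "t-shirt"),
    ("t-shirt", "complement", "jeans"),
    ("polo shirt", "complement", "jeans"),
]
print(count_two_hop_patterns(triples))
```

On this toy graph the only surviving pattern is (substitute, complement), e.g. t-shirt -> polo shirt -> jeans; it agrees with the direct link t-shirt -> jeans, which is the kind of regularity a path constraint can encode.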