
Introduction

The University Research Facility in Big Data Analytics (UBDA) is a university-level research facility in Hong Kong for cross-disciplinary research collaboration, teaching and learning, and partnerships with industry.

UBDA provides a dedicated, secure, and scalable 24/7 big data platform for building analytic solutions.

UBDA offers several benefits, including:

  • A GPU- and CPU-accelerated computing platform
  • A consultancy platform for formulating big-data-related research problems
  • Opportunities for joint labs, projects, and sponsorships

UBDA is a big data platform for storing and analyzing your data to find hidden patterns, explore unknown correlations, improve prediction, support decision making, recommend services and products, and build other analytic solutions.

 

 


The main objective of establishing UBDA is to meet the increasing demand for computing resources and expertise in big data analytics. It has significant value in promoting open innovation in all aspects of human, social, and technological development.

UBDA offers an advanced infrastructure, including a computing platform, a data repository, and data analytics tools and libraries, and provides a platform for cross-disciplinary collaboration among PolyU researchers and external partners to develop, support, service, and sustain research in big data analytics.

The main features of the UBDA platform are as follows:

  • Running programs with multiple CPU cores across multiple computing nodes: users can execute their own applications written in languages such as C, Fortran, Python, and R on the UBDA platform, using multiple CPU cores and computing nodes via MPI/MPICH (a minimal sketch follows this list).
  • Performing data analytics and deep learning with nVidia GPU support: machines equipped with multiple GPU cards are available for data analytics and deep learning with tools such as TensorFlow.
  • Performing big data analytics with solutions such as Hadoop and Spark: users can run analytics on the provided Apache Hadoop and Spark stack with the support of multiple computing nodes.
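
As an illustration of the multi-node MPI workflow above, here is a minimal sketch in Python using mpi4py; the availability of mpi4py and the process count used at launch are assumptions that depend on the environment configured on the platform.

    # pi_mpi.py - estimate pi by splitting the integration steps across MPI ranks.
    # Assumes the mpi4py package is available in the platform's Python environment.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # ID of this process
    size = comm.Get_size()   # total number of processes across all nodes

    n = 10_000_000           # total number of integration steps
    h = 1.0 / n
    local_sum = 0.0
    for i in range(rank, n, size):   # each rank takes every size-th step
        x = h * (i + 0.5)
        local_sum += 4.0 / (1.0 + x * x)

    # Combine the partial sums from every rank onto rank 0.
    pi = comm.reduce(local_sum * h, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"pi is approximately {pi} using {size} processes")

A script like this would typically be launched across nodes with a command such as mpirun -np 32 python pi_mpi.py, where the exact launcher and process count depend on the installed MPI implementation and the job scheduler.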

 

Five-Layered Architecture of the UBDA Platform

The UBDA platform has five layers: the Storage Layer, Network Layer, Computing Cluster Layer, Application Layer, and Service Layer.


 

A description of each layer is as follows:

The Storage Layer allows researchers to store and process their research data in a reliable environment.

The Network Layer consists of internal and external networks (InfiniBand and Ethernet). The internal network (InfiniBand) is mainly for the internal interconnection of computing nodes, providing low-latency, non-blocking data transfer to support big data analysis. The external network (Ethernet) is mainly for external connections to the campus network and the public internet.

The Computing Cluster Layer consists of a pool of various types of CPU nodes, GPU nodes, and MIC nodes, which can be configured to form the clusters required for different kinds of big data analytics projects, supporting both data-intensive and computation-intensive processing tasks.

The Application Layer provides modeling and programming support for developing applications in different areas. It is composed of domain-specific models, languages, and algorithms, some of which are provided as software tools and libraries. Researchers can also install their own software to support their research.

 

The Service Layer is a common management layer providing the interface for accessing and using the underlying big data facility. It allows users to log in to the UBDA system to manage their profiles, access their allocated resources, install and configure their applications, and manage their jobs through the job scheduler.

 

 
The UBDA platform's infrastructure is as follows:

Storage
  • Parallel and Hadoop file system (BeeGFS, 981 TB usable)
  • Block, file, and object storage system (CephFS, 200 TB usable)

Network
  • Internal network with InfiniBand EDR 100G
  • Internal network with InfiniBand EDR 200G
  • 10G external Ethernet network

Computing
  • CPU: dual-socket and quad-socket computing nodes with 1,592 CPU cores and over 9 TB of memory
  • GPU: computing nodes with nVidia P100 GPUs (over 86,000 CUDA cores) and computing nodes with nVidia Lovelace-architecture GPUs (over 290,000 Tensor cores)
  • MIC: Intel Xeon Phi computing nodes with 136 CPU cores

Software
  • Big data analytics: Apache Hadoop and Spark
  • Machine learning/AI: CUDA, TensorFlow with GPU support
  • Programming/scripting tools: Intel compilers in "Intel Parallel Studio XE Cluster Edition", GNU C, C++, Fortran, Perl, Python, R
  • MPI support: Intel MPI, OpenMPI, and MPICH2
  • Others: OpenFOAM, ANSYS (Fluent)
  • Container: Singularity

Services
  • HPC Service, Cloud Service, GVM Service, JupyterLab Service
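
As a minimal sketch of the GPU-enabled machine learning workflow listed above, the following Python snippet checks which GPUs TensorFlow can see and fits a tiny linear model; the TensorFlow version and the way the environment is loaded are assumptions and will depend on the platform's configuration.

    # Minimal TensorFlow sketch: confirm GPU visibility and fit a tiny model.
    # Assumes a TensorFlow build with GPU support is available on the node.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    print(f"Visible GPUs: {len(gpus)}")   # expected to be greater than 0 on a GPU node

    # Toy data: learn y = 3x + 1 from a handful of points.
    xs = tf.constant([[0.0], [1.0], [2.0], [3.0], [4.0]])
    ys = tf.constant([[1.0], [4.0], [7.0], [10.0], [13.0]])

    model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")
    model.fit(xs, ys, epochs=500, verbose=0)   # runs on the GPU when one is visible

    print(model.predict(tf.constant([[10.0]])))   # expect a value close to 31

Similarly, a minimal PySpark sketch of the Hadoop/Spark analytics workflow might look like the following; the input path is a placeholder, and whether the session runs locally or through the cluster's Hadoop/YARN setup depends on how the service is configured.

    # Minimal PySpark sketch: count word frequencies in a text file.
    # Assumes pyspark is available; "data/sample.txt" is a placeholder path.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    lines = spark.read.text("data/sample.txt")            # one row per line of text
    words = (lines.rdd
                  .flatMap(lambda row: row.value.split()) # split each line into words
                  .map(lambda w: (w, 1))
                  .reduceByKey(lambda a, b: a + b))       # sum the counts per word

    # Print the ten most frequent words.
    for word, count in words.takeOrdered(10, key=lambda kv: -kv[1]):
        print(word, count)

    spark.stop()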
