SDK Overview

A full-stack framework for seamless CXL
integration and efficient data-centric computing

SW Framework

XCENA provides multi-level APIs, emulation/simulation tools, and drivers for multiple operating systems to facilitate integration with applications and workflows.

SW Framework diagram
Application Examples
KV Cache Offloading

In LLM inference, the KV cache has emerged as a primary performance and cost bottleneck because its size grows rapidly with context length and batch size. Limited GPU memory forces frequent recomputation, cache eviction, or spilling to storage, while the lack of KV sharing across requests leads to redundant prefill computation and wasted memory. 

CXL addresses this by introducing a shared memory pool that expands capacity beyond GPU memory, enabling KV reuse across workers. With CXL’s load/store semantics, KV data can be accessed without extra copies, reducing recomputation, stabilizing latency, and significantly lowering cost per token.
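The tiering idea above can be sketched in a few lines: a small fast tier standing in for GPU memory, a larger shared tier standing in for the CXL pool, and a prefix hash so identical prompt prefixes from different requests hit the same cached KV entry. All class and method names here are hypothetical illustrations, not XCENA APIs.

```python
import hashlib

class KVCachePool:
    """Minimal sketch of a two-tier KV cache: a small 'GPU' tier and a
    larger shared 'CXL' tier. Names and structure are illustrative."""

    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity  # max entries kept in the fast tier
        self.gpu_tier = {}                # prefix_hash -> KV blocks (hot)
        self.cxl_tier = {}                # prefix_hash -> KV blocks (shared pool)

    @staticmethod
    def prefix_key(token_ids):
        # Hash the prompt prefix so identical prefixes from different
        # requests map to the same cached KV entry (enables KV sharing).
        return hashlib.sha256(str(token_ids).encode()).hexdigest()

    def put(self, token_ids, kv_blocks):
        key = self.prefix_key(token_ids)
        if len(self.gpu_tier) >= self.gpu_capacity:
            # Evict to the CXL pool instead of dropping or spilling to
            # storage: the data stays directly addressable via load/store.
            victim, blocks = self.gpu_tier.popitem()
            self.cxl_tier[victim] = blocks
        self.gpu_tier[key] = kv_blocks
        return key

    def get(self, token_ids):
        key = self.prefix_key(token_ids)
        if key in self.gpu_tier:
            return self.gpu_tier[key], "gpu"
        if key in self.cxl_tier:
            # Promote back to the fast tier; with CXL load/store semantics
            # this is a memory access, not a storage round trip.
            self.gpu_tier[key] = self.cxl_tier.pop(key)
            return self.gpu_tier[key], "cxl"
        return None, "miss"  # prefill must recompute this prefix
```

A cache miss is the expensive path (full prefill recomputation); a hit in either tier avoids it, which is the mechanism behind the latency and cost-per-token claims.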

RAG - Vector DB Acceleration

Modern LLM applications increasingly depend on vector databases to retrieve up-to-date and external knowledge beyond what models learned during training. As model sizes and the number of embeddings grow, vector databases become more memory-intensive and harder to scale efficiently using only DRAM or GPU memory. Keeping vectors in slower storage tiers increases retrieval latency and limits throughput. 

Accelerating vector databases using CXL memory allows large embedding datasets and indices to stay in memory, reducing access latency and improving retrieval performance, which is critical for scalable and responsive RAG systems.
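To make the "keep vectors in memory" point concrete, here is a minimal brute-force cosine-similarity index. The key property is that the full embedding set is resident in (CXL-expanded) memory, so retrieval is a memory scan rather than a storage read. This is an illustrative sketch; real vector databases use approximate indices (HNSW, IVF) and the class name here is hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class InMemoryVectorIndex:
    """Illustrative brute-force index whose vectors stay memory-resident."""

    def __init__(self):
        self.vectors = []  # (doc_id, embedding) pairs held in memory

    def add(self, doc_id, embedding):
        self.vectors.append((doc_id, embedding))

    def search(self, query, k=3):
        # Score every resident vector and return the top-k document ids.
        scored = [(cosine(query, emb), doc_id) for doc_id, emb in self.vectors]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]
```

When the embedding set no longer fits in DRAM, the usual fallback is paging vectors from SSD on each query; expanding memory capacity with CXL keeps the scan entirely in memory, which is where the latency and throughput benefit comes from.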

Data Analytics Acceleration

Large-scale data analytics requires processing massive volumes of data before any AI training or inference can even begin. Scale-out analytics frameworks such as Spark, Databricks, and Snowflake rely on clusters composed of many servers to handle memory-intensive ETL workloads, which leads to high infrastructure cost and inefficiencies from data movement and memory pressure. 

By offloading analytics execution to CXL-based computational memory like the MX1, intermediate data can be processed closer to where it resides, reducing memory bottlenecks and unnecessary data transfers.
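The offload idea can be sketched as a filter-and-aggregate operator that runs against a memory-resident partition and returns only the small aggregate to the host, instead of shipping every intermediate row across the cluster. The function name and offload boundary are illustrative assumptions; the MX1's actual execution model may differ.

```python
def offload_aggregate(partition, predicate, key_fn, value_fn):
    """Sketch of a near-memory ETL step: filter rows and compute a
    grouped sum where the data resides, so the host receives only the
    aggregates rather than the full intermediate dataset."""
    sums = {}
    for row in partition:          # scan happens next to the data
        if predicate(row):         # pushed-down filter
            key = key_fn(row)
            sums[key] = sums.get(key, 0) + value_fn(row)
    return sums                    # small result crosses the link, not raw rows
```

In a Spark-style pipeline this corresponds to pushing the filter and partial aggregation down to the memory device, so only per-group partial sums travel to the host for the final merge.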