Memory Allocator

This document describes the design and implementation of the LRU cache-based memory allocator used in Hpt.

Architecture

The memory allocator consists of the following components:

graph TD
    A[Memory Allocator] --> B[CPU Allocator]
    A --> C[CUDA Allocator]
    B --> D[LRU Cache]
    B --> E[Reference Counter]
    C --> F[CUDA LRU Cache]
    C --> G[CUDA Reference Counter]

Core Components

1. Allocator Trait

pub trait Allocator {
    /// Allocate memory for `layout` on `device_id`, reusing a cached block when possible.
    fn allocate(&mut self, layout: Layout, device_id: usize) -> Result<*mut u8, TensorError>;
    /// Decrement the reference count of `ptr`; when it reaches 0 the block is recycled into the cache.
    fn deallocate(&mut self, ptr: *mut u8, layout: &Layout, device_id: usize);
    /// Register an externally created pointer so the allocator tracks its lifetime.
    fn insert_ptr(&mut self, ptr: *mut u8, device_id: usize);
    /// Release all cached and tracked memory.
    fn clear(&mut self);
}
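
A minimal sketch of a backend satisfying this trait, with caching and reference counting stripped out so the control flow stays visible; PassthroughAllocator and the error variant are illustrative stand-ins, not names from the codebase:

use std::alloc::{alloc, dealloc, Layout};

pub struct PassthroughAllocator;

impl Allocator for PassthroughAllocator {
    fn allocate(&mut self, layout: Layout, _device_id: usize) -> Result<*mut u8, TensorError> {
        // A real backend first consults its LRU cache for a block with the
        // same layout; this sketch goes straight to the system allocator.
        let ptr = unsafe { alloc(layout) };
        if ptr.is_null() {
            // Illustrative variant; Hpt's actual error type may differ.
            Err(TensorError::AllocationFailed)
        } else {
            Ok(ptr)
        }
    }

    fn deallocate(&mut self, ptr: *mut u8, layout: &Layout, _device_id: usize) {
        // A real backend decrements the reference count and recycles the
        // block into its cache; this sketch frees immediately.
        unsafe { dealloc(ptr, *layout) };
    }

    fn insert_ptr(&mut self, _ptr: *mut u8, _device_id: usize) {}

    fn clear(&mut self) {}
}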

2. Storage Trait

pub trait Storage {
    /// Increase the reference count of `ptr`.
    fn increment_ref(&mut self, ptr: SafePtr);
    /// Decrease the reference count of `ptr`; returns true when it reaches 0
    /// and the memory can be recycled.
    fn decrement_ref(&mut self, ptr: SafePtr) -> bool;
}
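
As described under Reference Counting below, a straightforward implementation keeps a HashMap keyed by pointer. A minimal sketch, assuming SafePtr is a thin wrapper around the raw pointer and using the illustrative name CommonStorage:

use std::collections::HashMap;

#[derive(PartialEq, Eq, Hash, Clone, Copy)]
pub struct SafePtr(pub *mut u8); // assumed shape of the wrapper

pub struct CommonStorage {
    ref_count: HashMap<SafePtr, usize>,
}

impl Storage for CommonStorage {
    fn increment_ref(&mut self, ptr: SafePtr) {
        *self.ref_count.entry(ptr).or_insert(0) += 1;
    }

    fn decrement_ref(&mut self, ptr: SafePtr) -> bool {
        match self.ref_count.get_mut(&ptr) {
            Some(count) if *count > 1 => {
                *count -= 1;
                false // other owners remain; keep the memory alive
            }
            Some(_) => {
                // Last owner released: drop the entry and tell the caller
                // the memory can be recycled into the cache.
                self.ref_count.remove(&ptr);
                true
            }
            None => false, // unknown pointer; nothing to do
        }
    }
}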

Memory Management Strategy

LRU Cache Strategy

  1. Allocation Process (sketched in code after this list):

    • Check whether memory with the same layout exists in the cache
    • If found, retrieve it from the cache
    • If not found, allocate new memory
    • If the cache is full, free the least recently used memory
  2. Deallocation Process (also sketched after this list):

    • Decrease the reference count
    • When the reference count reaches 0:
      • Remove the pointer from the allocated set
      • Put the memory into the cache
      • If the cache is full, free the least recently used memory
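
A condensed sketch of both paths over the MemoryPool shape shown under Implementation Details, using the third-party lru crate for the cache and omitting reference counting and the allocated set for brevity (the free function names are illustrative):

use std::alloc::{alloc, dealloc, Layout};
use lru::LruCache;

fn allocate(cache: &mut LruCache<Layout, Vec<*mut u8>>, layout: Layout) -> *mut u8 {
    // Reuse a cached block with the same layout if one is available.
    if let Some(blocks) = cache.get_mut(&layout) {
        if let Some(ptr) = blocks.pop() {
            return ptr;
        }
    }
    // Cache miss: fall back to a fresh system allocation.
    unsafe { alloc(layout) }
}

fn deallocate(cache: &mut LruCache<Layout, Vec<*mut u8>>, ptr: *mut u8, layout: Layout) {
    // Return the block to its layout bucket instead of freeing it.
    if let Some(blocks) = cache.get_mut(&layout) {
        blocks.push(ptr);
        return;
    }
    // New layout bucket: if the cache is full, explicitly free the least
    // recently used bucket first so its pointers do not leak.
    if cache.len() == usize::from(cache.cap()) {
        if let Some((old_layout, blocks)) = cache.pop_lru() {
            for p in blocks {
                unsafe { dealloc(p, old_layout) };
            }
        }
    }
    cache.put(layout, vec![ptr]);
}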

Reference Counting

  • Uses a HashMap to store the reference count of each pointer
  • Manages references through increment_ref and decrement_ref
  • Automatically recycles memory when the reference count reaches 0 (see the example below)
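
Continuing the CommonStorage sketch from above, the lifecycle of a shared buffer looks like this (raw_ptr stands in for a pointer obtained from the allocator):

let mut storage = CommonStorage { ref_count: HashMap::new() };
let ptr = SafePtr(raw_ptr);
storage.increment_ref(ptr); // first owner
storage.increment_ref(ptr); // a second tensor view shares the buffer
assert!(!storage.decrement_ref(ptr)); // one owner remains: keep the memory
assert!(storage.decrement_ref(ptr)); // last owner gone: recycle into the cache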

Safety Considerations

  1. Thread Safety:

    • Global state is protected by a Mutex
    • SafePtr implements the Send and Sync traits (see the sketch after this list)
  2. Memory Safety:

    • Automatic null pointer checks
    • Prevention of double frees
    • Automatic cleanup of all memory on program exit
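
Raw pointers are neither Send nor Sync, so the SafePtr wrapper sketched earlier is what allows pointers to live inside the Mutex-protected global state. The marker impls might look like this:

// SAFETY: every SafePtr is only ever accessed while holding the global
// Mutex-protected cache, so the raw pointer is never used concurrently.
unsafe impl Send for SafePtr {}
unsafe impl Sync for SafePtr {}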

Usage Examples

CPU Memory Allocation

// Lock the global CPU allocator cache.
let mut allocator = CACHE.lock().unwrap();
let layout = Layout::from_size_align(size, align).unwrap();
// Returns a cached block with this layout when available, otherwise fresh memory.
let ptr = allocator.allocate(layout, 0)?;

CUDA Memory Allocation

// Lock the global CUDA allocator cache.
let mut allocator = CUDA_CACHE.lock().unwrap();
let layout = Layout::from_size_align(size, align).unwrap();
// CUDA allocation also returns the device the memory lives on.
let (ptr, device) = allocator.allocate(layout, device_id)?;

Implementation Details

Global Cache

The allocator uses a global cache to manage memory lifetimes; see the implementation in the Hpt source repository.
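
A plausible shape for that global, assuming a lazily initialized Mutex around the MemoryPool type shown in the next subsection (the actual definition in the repository may differ, and MemoryPool::new is assumed here for illustration):

use std::sync::{LazyLock, Mutex};

// One process-wide CPU pool; every allocation site locks it first.
static CACHE: LazyLock<Mutex<MemoryPool>> =
    LazyLock::new(|| Mutex::new(MemoryPool::new()));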

Memory Pool

Memory pools are used to reduce allocation overhead:

pub struct MemoryPool {
    /// Reusable memory blocks, grouped by layout and evicted in LRU order.
    cache: LRUCache<Layout, Vec<*mut u8>>,
    /// Pointers currently handed out to live tensors.
    allocated: HashSet<*mut u8>,
    /// Reference count for each live pointer.
    ref_count: HashMap<*mut u8, usize>,
}

Best Practices

  1. Memory Leak Prevention:

    • Ensure proper reference count management
    • Use the RAII pattern for resource management (see the sketch after this list)
    • Check for leaks on program exit
  2. Performance Optimization:

    • Use the LRU cache to reduce allocation overhead
    • Minimize memory fragmentation
    • Batch allocations when possible
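
One way to apply the RAII point above is to tie deallocation to a handle's Drop implementation, so releasing the last owner automatically recycles the buffer. A hypothetical sketch (TensorBuffer is not a name from the codebase):

use std::alloc::Layout;

pub struct TensorBuffer {
    ptr: *mut u8,
    layout: Layout,
    device_id: usize,
}

impl Drop for TensorBuffer {
    fn drop(&mut self) {
        // deallocate decrements the reference count; the memory is only
        // recycled into the LRU cache once the last owner is dropped.
        let mut allocator = CACHE.lock().unwrap();
        allocator.deallocate(self.ptr, &self.layout, self.device_id);
    }
}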
Last updated: 2025/6/24 21:23
Contributors: Jianqoq