Recent DaMRL Projects

UNDERSTANDING TENSOR LIFECYCLE

Abstract:

Modern deep learning models, particularly transformer-based architectures, demand substantial memory and computational resources, often resulting in inefficiencies during training. While previous research has primarily focused on tracking basic tensor metadata such as shape and size, gaining a deeper understanding of tensor lifecycles and similarity patterns can uncover inefficiencies in tensor management and guide more effective optimization strategies.

In this paper, we introduce a lightweight and extensible tensor tracking framework that captures the complete lifecycle of tensors, including their creation, usage, metadata, and destruction, while also measuring tensor similarity using configurable thresholds. Additionally, we investigate the potential for safe tensor de-duplication to minimize memory usage without degrading model performance.

Our framework integrates seamlessly with PyTorch and operates without requiring modifications to the model architecture, offering valuable insights into tensor behavior. To illustrate its utility, we apply the framework during the fine-tuning of BERT on the IMDB sentiment classification task, revealing key patterns in tensor reuse and redundancy.

This work lays the groundwork for future memory optimization strategies and offers actionable insights into tensor dynamics, enabling more efficient and transparent training in resource-constrained environments.

Publications:

Acknowledgments:

COMPUTING COURSES AND EMPLOYMENT OPPORTUNITIES

Abstract:

In today’s world, where higher education plays an increasingly critical role, aligning academic curricula with industry needs is essential. This paper investigates the contextual relationship between computing courses and technical job requirements by leveraging various transformer models to encode course syllabi and job descriptions into high-quality, fixed-size vector embeddings. These embeddings allow for efficient, nuanced comparisons that uncover deeper connections between academic content and workforce demands.

Our study offers several unique contributions that fill key gaps in existing literature. First, we compile a large and up-to-date dataset of 197,296 job postings across five technical domains. Second, we conduct a detailed analysis using advanced transformer-based models to assess how well computing courses align with job descriptions, providing rich insights into curriculum-industry relevance. Third, we examine salary trends to identify which courses and associated skills are linked to high-paying jobs. Fourth, we differentiate between core and elective courses to inform curriculum design and help students make more strategic elective choices in light of industry needs.

Our findings reveal that top-ranked courses often integrate both technical expertise and essential professional skills such as communication and teamwork. Additionally, skills like cloud computing and database technologies appear consistently across various job categories, underlining their value in today’s technical landscape. Core courses, which are required for all students, generally show stronger alignment with industry requirements than electives. Notably, undergraduate courses tend to have broader alignment with job postings, whereas graduate-level courses show more targeted alignment with higher-paying positions. This distinction emphasizes the importance of aligning academic paths with specific career goals when considering graduate education.

Overall, this paper presents a replicable methodology for curriculum analysis and demonstrates its effectiveness through a case study of a computing program at a single institution.

Publications:

Christopher Lukas Kverne*, Federico Monteverdi*, Agoritsa Polyzou, Christine Lisetti, and Janki Bhimani, Course-Job Fit: Understanding the Contextual Relationship Between Computing Courses and Employment Opportunities, 2025 ASEE Annual Conference (ASEE’25), Montreal, QC, Canada. Acceptance Rate: 20%.

Public Software:

https://github.com/Damrl-lab/Course-Job-Fit

Acknowledgments:

NSF

QUANTUM NOISE BENEFICIAL FOR QML

Abstract:

Traditionally, quantum computing has treated hardware noise as an unwanted artifact to be minimized through error mitigation and correction. However, we propose a shift in perspective, particularly for quantum machine learning (QML), suggesting that quantum noise should be embraced and leveraged as a potential advantage, rather than viewed solely as a hindrance. Recent studies on noisy intermediate-scale quantum (NISQ) devices demonstrate that intentionally introducing noise can improve training stability and maintain model performance over extended periods (e.g., 190 days of hardware drift) without retraining, outperforming models that ignore noise across different platforms.

These early results suggest a broader insight: algorithms that adapt to and exploit hardware noise are better suited to the constraints of real quantum systems, transforming what has long been seen as a limitation into a functional feature. Moreover, current noise mitigation techniques often come with high computational costs and limited benefits. In this paper, we analyze how noise affects QML algorithms, compare various noise injection techniques, identify beneficial noise patterns, and propose several novel methods, including a noise-adapted performance ratio, noise-assisted quantum subspace expansion, noise-driven natural-gradient approximation, and dynamic noise profiling.

By treating noise as a computational asset, these approaches have shown measurable gains: up to 25% improvement in convergence speed, 20% increases in classification accuracy, and a reduction of 0.05 in energy estimation error.

Publications:

Acknowledgments:

DYNAMIC PAGE ALLOCATION POLICIES

Abstract:

We argue that the default page allocation strategy used in state-of-the-art (SOTA) memory tiering systems, where new pages are always allocated to DRAM until it reaches capacity, and subsequent pages are assigned to slower tiers such as CXL memory and persistent memory (PMEM), is not suitable for all workloads or operating conditions. This rigid allocation approach can often be suboptimal, leading to inefficient memory utilization.

To support this claim, we analyze the impact of page allocation strategies on workload performance. We introduce two memory access tracking kernels: one that captures a high volume of memory access events with fine granularity, and a lightweight alternative seamlessly integrated into the kernel. These tracking mechanisms pave the way for the development of a dynamic allocation policy system capable of adjusting allocation strategies based on workload characteristics.

Publications:

Acknowledgments:

OPTIMIZING DATA SPILLING FOR SCALABLE QUERY EXECUTION IN DISTRIBUTED SYSTEMS

Abstract:

An essential challenge in distributed systems, including databases, big data frameworks, and large-scale computing platforms, is managing workloads that exceed available memory capacity. A common strategy to mitigate memory pressure is data spilling, where excess data is offloaded from memory to disk. This technique is widely used in distributed storage engines, stream processing frameworks, and computational clusters to maintain system stability and handle memory-intensive workloads. While data spilling enhances scalability, cost efficiency, and robustness, it also introduces performance bottlenecks due to increased I/O overhead, leading to slower execution. Our analysis reveals that existing implementations suffer from suboptimal parallelism and poor coordination, resulting in underutilized resources and execution delays.

Addressing these inefficiencies is crucial for optimizing data spilling mechanisms and improving overall system performance. Therefore, in this paper, we introduce PARADISE, a framework that enhances data spilling in distributed systems through two key techniques. First, we propose Parallel Spilling, which efficiently groups spillable data objects within an execution engine and spills them together in parallel. This approach opportunistically spills more than the minimum required, reducing recurrent spill triggers and improving efficiency. Second, we introduce Coordinated Spilling, which synchronizes spilling across all nodes in a distributed system to enhance overall performance. Our key insight is that when a task on one node is delayed due to spilling, the entire task experiences increased latency, regardless of how efficiently other nodes execute their portions. By enforcing synchronized data spilling across all nodes handling the same task, we aim to prevent uncoordinated spills that increase latency for multiple queries.

We implemented our innovative spilling techniques in Presto version 0.290 and evaluated them using diverse queries from established benchmarks like TPC-H. Our results demonstrate up to an 85% reduction in task latency with parallel spilling and 50% or more improvement with coordinated spilling, highlighting the effectiveness of our approach.

Publications:

Acknowledgments:

IN-MEMORY DNN MODEL VERSIONING

Abstract:

DNN model versions are used for various purposes, including fine-tuning for downstream tasks, explainability, and debugging. Numerous checkpointing solutions can be adapted to persist intermediate versions of a model during training at different storage locations. Additionally, version management tools enable logging, visualization, comparison, and querying of metadata related to machine learning models, allowing users to track changes made to previously built models.

However, the version creation process in existing methods often incurs high runtime and storage overhead. In this paper, we introduce LATTICE, a low-latency, direct persistence-based DNN versioning library designed for Non-Volatile Memory (NVM) expansion devices. LATTICE minimizes stalls during model versioning and reduces end-to-end versioning time by reorganizing the version creation workflow, streamlining memory allocation and deallocation for efficient snapshot creation, and leveraging multi-threaded parallelism. We also develop a user-friendly versioning API that transparently implements direct persistence.

Publications:

Acknowledgments:

DNN FAULT-TOLERANT AND EFFICIENT CHECKPOINTING

Abstract:

Training large Deep Neural Networks (DNNs) is inherently resource-intensive and time-consuming, driving significant research into high-frequency checkpointing to enhance fault tolerance. However, the overhead associated with checkpointing can prolong overall training times. State-of-the-art (SOTA) solutions break down the checkpointing process into smaller phases, such as snapshot and persist, and pipeline these phases alongside foreground training operations. Additionally, some strategies leverage faster dynamic or persistent memory devices to reduce overhead.

Despite these advancements, SOTA methods still struggle to eliminate training stalls (i.e., overheads), primarily due to two critical constraints: (1) a bandwidth-limited PCIe bus, and (2) consistency requirements for copying the entire model state from the GPU. Drawing inspiration from previous research on model state re-initialization, we explore the possibility of relaxing these consistency requirements during failure recovery.

Based on our findings, we propose Fragment, a novel checkpointing solution that strategically divides the model state into multiple independent pieces (referred to as fragments), reducing checkpointing overhead and increasing checkpointing frequency, all without significantly impacting model accuracy. Partitioning complex models into fragments introduces challenges, which we address by ensuring model state integrity during both creation and restoration, while also optimizing layer selection.

Fragment reduces checkpointing overheads by 15% to 94%, and the total data checkpointed by 43% to 72%, without compromising fault-tolerance compared to SOTA solutions. It can also reduce recovery time by up to 70% following a failure. Moreover, Fragment is orthogonal to most existing checkpointing methods and can be used to further optimize them.

Publications:

Acknowledgments:

QUANTUM NEUTRAL NETWORK CHECKPOINTING

Abstract:

Quantum Neural Networks (QNNs) harness quantum superposition and entanglement, offering promising advantages for machine learning tasks. However, noise in quantum computers frequently disrupts QNN training, leading to wasted computational resources and extended queue times. This project introduces the first QNN checkpointing framework to address this challenge. Through experiments on various quantum devices, we demonstrate that QNN behavior is fundamentally hardware-dependent, with the same model performing differently across platforms. This key finding highlights that quantum checkpoints require additional metadata about hardware specifics and shot counts unique to quantum systems. Our framework requires minimal storage, only 186.6 KB for a 100-qubit QNN, and incurs negligible overhead, enabling frequent checkpointing to enhance training resilience and reproducibility in the NISQ era.

Publications:

Christopher Lukas Kverne*, Mayur Akewar*, Yuqian Huo, Tirthak Patel, and Janki Bhimani, Quantum Neural Networks Need Checkpointing, 2025 ACM Workshop on Hot Topics in Storage and File Systems (HotStorage ’25), Boston.

Public Software:

https://github.com/Damrl-lab/Quantum-Checkpointing

Acknowledgments:

NSF

MODELING ENVIRONMENTAL FACTOR IMPACTS ON SSDS

Abstract:

Environmental stressors such as temperature, humidity, vibration, and radiation can severely impact the performance and reliability of SSDs, particularly in edge, automotive, aerospace, and datacenter deployments. Capturing sensor data in the field and conducting accelerated lab experiments are challenging, as they are time-consuming, resource-intensive, and often destructive to hardware. Specialized setups, such as thermal chambers or vibration rigs, are also required. As a result, few studies explore this area, and current storage management techniques, such as RAID, tiering, and deduplication, do not account for environmental factors.

Developing models to capture these impacts would open new research opportunities across various fields. However, accurately modeling these effects remains challenging due to: (1) the limited availability of experimental data; (2) the complex, domino-like impact of historical exposure; (3) the interrelated nature of environmental factors, such as temperature and humidity, which are often correlated; (4) the differing responses of NAND flash memory types (TLC, MLC, and SLC) to environmental stressors; and (5) the difficulty analytical and simple machine learning models face in generalizing across devices, environments, and unseen combinations of stressors.

We believe that large language models (LLMs) may offer a transformative alternative to this complex problem. With embedded domain knowledge and reasoning capabilities, LLMs can facilitate prompt-based natural language interaction. We propose a hybrid framework that combines Chain-of-Thought prompting and Retrieval-Augmented Generation to guide LLMs using physical principles and prior experiments. This approach enables interpretable “what-if” analyses of SSD behavior under varying environmental conditions.

Publications:

Mayur Akewar*, Gang Quan, Sandeep Madireddy, and Janki Bhimani, Can LLMs Model the Environmental Impact on SSD?, 2025 ACM Workshop on Hot Topics in Storage and File Systems (HotStorage ’25), Boston.

Public Software:

https://github.com/Damrl-lab/SSD_LLM

Acknowledgments:

NSF

MODELING THE SPACE-TIME TRADE-OFF FOR THE DSI PIPELINE IN ML TRAINING

Abstract:

Data storage and ingestion (DSI) pipelines in machine learning (ML) training are responsible for fetching raw data from storage and pre-processing them before loading the samples
into GPU for training. To improve DSI performance, raw or pre-processed data is cached closer to the accelerator. Although caching pre-processed data over raw is an effective technique to mitigate pre-processing stalls, it comes at a cost of up to 15× higher data footprint. In this work, we present a model that explores the trade-off between raw and pre-processed caching. We build an online system that uses the model to identify the optimal memory split for the raw and pre-processed caches based on the size of the data set and the system parameters. Our model helps improve GPU utilization by increasing the DSI throughput by up to 2×. Furthermore, we demonstrate the usefulness of our model by exploring the performance of hypothetical system designs, should GPU or CPU performance increase.

Publications:

Acknowledgments:

EVOLVING FLASH-BASED STORAGE

Abstract:

Flash-based solid-state drives (SSDs) are widely used to accelerate different applications because of their superior overall performance compared to hard-disk drives (HDDs). To achieve better performance with SSDs, the storage stack overhead imposed by the operating system (OS), rather than device speed, is now the bottleneck that must be addressed as a key research priority. It is critical to evolve new techniques to take full advantage of the unique characteristics of flash memory and flash-based persistent storage. However, our existing OS cannot take advantage of such techniques as it is designed in a very generic fashion to support the broad class of the storage devices. There is thus a critical need to rethink our system infrastructure to take advantage of the best and potentially unique aspects of flash-based memory and NVMe SSDs as persistent storage. The primary objective of our research is to design new system infrastructures, that take advantage of the unique flash characteristics exposed by new storage devices, for accelerating various applications.

Publications:

Mahsa Bayati, Janki Bhimani, Ronald Lee, Ningfang Mi. Exploring Benefits of NVMe SSDs for Big Data Processing in Enterprise Data Centers International Conference on Big Data Computing and Communication (BIGCOM19), Qingdao, China, 2019.

Acknowledgments:

NSF

DATACENTER SCHEDULING AND RESOURCE MANAGEMENT

Abstract:

In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. High-performance and scalable frameworks have become the need of the hour for data-intensive processing and analytics in both industry and academia. More and more applications are using the new parallel-data computing techniques used as TensorFlow, and Apache Spark. It is an interesting research problem to maximize resource utilization and minimize big data processing time. However, given the limited resources in the cluster and a complex dependency in data flow, it is challenging to design scheduling and resource management techniques. Therefore, the primary focus of our research is to put significant efforts into developing new schemes for job scheduling and resource management for evolving parallel-data computing frameworks and applications.

Publications:

Danlin Jia, Janki Bhimani, Son Nam Nguyen, Bo Sheng, and Ningfang Mi, ATuMm: Auto-tuning Memory Manager in Apache Spark, 2019 International Performance Computing and Communications Conference (IPCCC19), London, UK, 2019. Acceptance Rate: 29.2%.

Public Software:

https://github.com/DanlinJia/spark_core_ATMM

Acknowledgments:

NSF

I/O BEHAVIOR MODELING & PERSISTENT STORAGE DEVICE CONFIGURATION

Abstract:

This project makes empirical contributions to storage systems by addressing challenges issued by large-scale data-intensive applications. Specifically, it advances (1) how to analyze the impact of various system components while running multiple workloads on emerging storage systems; (2) how to design interactive frameworks that allow users to modify the internal algorithms and parameters of modern storage devices; (3) how to enable novices to configure storage systems with respect to their workloads and data processing requirements; and (4) how to derive I/O models to predict future I/O workload patterns and accordingly configure storage systems in advance for better performance.

This project will allow designing better storage systems with high performance and reliability. The outcome of this project will bring a significant impact on many areas that are dependent on processing a large amount of data. This project will share the findings with undergraduate and graduate students through computer science and engineering programs and open up career opportunities to female students, underrepresented minorities, and first-generation college students. This project will disseminate the proposed techniques into the industry and foster technology transfer through new industrial collaborations. The developed infrastructure will be available to the research community through a web-based portal.

Publications:

Danlin Jia, Manoj Pravakar Saha, Janki Bhimani, and Ningfang Mi, ”Performance and Consistency Analysis for Distributed Deep Learning Applications”, 2020 International PerformanceComputing and Communications Conference (IPCCC20), Virtual using Zoom, 2020. Acceptance Rate: 29.3%
Janki Bhimani, Ningfang Mi, Miriam Leeser, and Zhengyu Yang, New Performance Modeling Methods for Parallel Data Processing Applications, ACM Transactions on Modeling and computer simulation (TOMACS), 2019. DOI 10.1145/3309684.
Janki Bhimani, Rajinikanth Pandurangan, Ningfang Mi, and Vijay Balakrishnan, Emulate Processing of Assorted Database Server Applications on Flash-Based Storage in Datacenter Infrastructures, 2019 International Performance Computing and Communications Conference (IPCCC19), London, UK, 2019. Acceptance Rate: 29.2%

Public Software:

Acknowledgments:

NSF

IMPACT OF ENVIRONMENTAL FACTORS ON FLASH STORAGE

Abstract:

Understanding the reliability of components is an important criterion for building robust systems. Data storage is one of the most critical components that is at the center of all emerging technologies. Thus, studying reliability and different types of faults for system storage components is important. Moreover, with fastly emerging flash-based storage technologies such as Solid State Drives (SSDs), the previous fault tolerance understandings for Hard Disk Drives (HDDs) are not directly applicable. We study the impacts of various environmental factors such as vibration, temperature, humidity, etc. on the performance of SSDs in data center infrastructures. We investigate the “short-term” and “long-term” impacts of exposure to SSDs. We also analyze the impacts of different types of application workloads.

Publications:

Janki Bhimani, Tirthak Patel, Ningfang Mi, and Devesh Tiwari, What does Vibration do to YourSSD?, 2019 Design Automation Conference (DAC19), Las Vegas, NV, 2019. Acceptance Rate:24.3%.

Public Software:

https://github.com/bhimanijanki/SSD_Vibration

Acknowledgments:

NSF

IMPROVING FLASH ENDURANCE IN DATA CENTERS

Abstract:

With the capital expenditure of SSDs declining and the storage capacity of SSDs increasing, all-flash data centers are evolving to serve cloud services better than SSD-HDD hybrid data centers. During this transition, the biggest challenge is how to reduce the Write Amplification Factor (WAF) as well as to improve the endurance of SSD since this device has limited program/erase cycles. A specified case is that storing data with different lifetimes (i.e., I/O streams with similar temporal fetching patterns such as reaccess frequency) in one single SSD can cause high WAF, reduce the endurance, and downgrade the performance of SSDs. Motivated by this, multi-stream SSDs have been developed to enable data with a different lifetime to be stored in different SSD regions. The logic behind is to reduce internal movement of data — when garbage collection is triggered, there are high chances of having data blocks with either all the pages being invalid or valid. However, the limitation of this technology is that the system needs to manually assign the same streamID to data with a similar lifetime. We are working towards designing systems to perform the data placements for improving the flash endurance in data centers while running multi-tenant applications.

Publications:

Janki Bhimani, Zhengyu Yang, Jingpei Yang, Adnan Maruf, Ningfang Mi, Rajinikanth Panduran-gan, Changho Choi, Vijay Balakrishnan. Automatic Stream Identification to Improve Flash Endurance in Data Centers. ACM Transactions on Storage (TOS) 2021.
Janki Bhimani, Ningfang Mi, Zhengyu Yang, Jingpei Yang, Rajinikanth Pandurangan, Changho Choi, and Vijay Balakrishnan, FIOS: Feature-Based I/O Stream Identification for ImprovingEndurance of Multi-Stream SSDs, 2018 IEEE International Conference on Cloud Computing (CLOUD’18), San Francisco, CA, 2018. Acceptance Rate: 15%. (Best Paper Award)
Janki Bhimani, Jingpei Yang, Zhengyu Yang, Ningfang Mi, NHV Krishna Giri, RajinikanthPandurangan, Changho Choi, and Vijay Balakrishnan. Enhancing SSDs with multi-stream: What? why? how? IEEE International Performance Computing and Communications Conference (IPCCC17), San Diego, CA, 2017. (Short Paper)

Public Software:

https://github.com/bhimanijanki/ms_ssds_sim

Acknowledgments:

Samsung, NSF

UNDERSTANDING FLASH-BASED STORAGE I/O BEHAVIOUR OF GAMES

Abstract:

Computer games are an extremely popular but overlooked workload. Cloud gaming has been one of the biggest buzzwords in the gaming industry throughout 2020. The rapid growth of the video gaming industry and the diverse set of popular video games available today raises increasing concern to properly understand its I/O characteristics to improve their performance and design better gaming servers and consoles. We attempt to systematically measure, quantify, and characterize the organization of game data into files, back-end storage access patterns, and the performance of gaming workloads. We explore the I/O behavior of the recent and famous games, producing a series of observations coming from measurements done on a real setup.

Publications:

Adnan Maruf, Zhengyu Yang, Bridget Davis, Daniel Kim, Jeffrey Wong, Matthew Durand, and Janki Bhimani, Understanding Flash-Based Storage I/O Behavior of Games, 2021 IEEE International Conference on Cloud Computing (CLOUD’21), Online Virtual Congress, 2021. Acceptance Rate: 23.8%.

Acknowledgments:

Samsung, NSF

EMERGING KEY-VALUE BASED FLASH MEMORIES

Abstract:

An increasing concern that curbs the widespread adoption of KV-SSD is whether or not offloading host-side operations to the storage device changes device behavior, negatively affecting various applications’ overall performance. In this paper, we systematically measure, quantify, and understand the performance of KV-SSD by studying the impact of its distinct components such as indexing, data packing, and key handling on I/O concurrency, garbage collection, and space utilization. Our experiments and analysis uncover that KV-SSD’s behavior differs from well-known idiosyncrasies of block-SSD. A proper understanding of its characteristics will enable us to achieve better performance for random, read-heavy, and highly concurrent workloads.

Publications:

Manoj Pravakar Saha, Bryan Kim, and Janki Bhimani, KV-SSD: What is it Good For?, 2021 Design Automation Conference (DAC’21), San Francisco, CA, 2021. Acceptance Rate: 23%.

Public Software:

Acknowledgments:

Samsung, NSF

MULTI-CLOCK: Dynamic Tiering for Hybrid Memory Systems

Abstract:

The rapid growth of in-memory computing by data-intensive applications today has increased the demand for DRAM in servers. However, a DRAM-based system can be limiting for modern workloads because of its capacity, cost, and power consumption characteristics. Hybrid memory systems, which consist of different types of memory, such as DRAM and persistent memory, can help address many of these limitations. Persistent memory devices are byte-addressable like DRAM but are also larger in capacity and consume less power relative to DRAM. One promising direction that has been explored in the recent literature is to introduce these persistent memory devices as a second memory tier that is directly exposed to the CPU. The resulting tiered memory design must address the fundamental challenge of placing the right data in the right memory tier at the right time with minimal real-system overhead. We present MULTI-CLOCK, an efficient, low-overhead hybrid memory system that relies on a unique page selection technique. MULTI-CLOCK’s careful page selection captures both page access recency and frequency, and move pages to appropriate tiers at the right time within hybrid memory systems. We implemented a Linux-based, NUMA Aware version of MULTI-CLOCK that is entirely transparent and backward compatible with any existing application. Our evaluation with diverse real-world applications such as graph processing and key-value stores shows that MULTICLOCK can improve the average throughput by as much as 352% when compared with several state-of-the-art techniques for tiered memory

Publications:

Adnan Maruf, Ashikee Ghosh, Janki Bhimani, Daniel Campello, Andy Rudoff, Raju Rangaswami, MULTI-CLOCK: Dynamic Tiering for Hybrid Memory Systems, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA’22), Seoul, South Korea, 2022. Acceptance Rate: 30%.

Public Software:

https://github.com/bhimanijanki/Multi-Clock

Acknowledgments:

NSF

TOWARDS LEVERAGING IN-STORAGE INDEXING DEVICES (ISIDs)

Abstract:

The overarching goal of this research project is to advance the capabilities of ISIDs to promote their widespread adoption in storage systems with better performance. The specific research objectives of the project are organized into four thrusts.
The first thrust focuses on modeling ISID performance and reliability by developing novel queuing models that capture dependencies among internal features and proposing solutions for dynamic model calibration. The second thrust aims to design new elastic index management techniques that effectively utilize the limited on-device resources, taking into account flash-specific constraints to optimize endurance and latency. The third thrust addresses the challenges of adaptive indexing in multi-tenant environments, including interference mitigation, optimized index updates, and reevaluation of wear-leveling techniques. Lastly, the fourth thrust aims to establish a host-device interface, conduct comprehensive case studies, develop productivity tools, and integrate programmable board-based SSDs into an open-source Linux code base. Through these research endeavors, the project aims to drive significant advancements in ISID technology and contribute to the overall improvement of performance and efficiency in storage systems.

Publications:

Janki Bhimani, Jingpei Yang, Ningfang Mi, Changho Choi, and Manoj Pravakar Saha, Fine-grained Control of Concurrency within KV-SSDs, 2021 14th ACM International Systems and Storage Conference (SYSTOR’21), Virtual. Acceptance Rate: 29.9%.
Ziyang Jiao, Janki Bhimani, Bryan S. Kim, Wear Leveling in SSDs Considered Harmful, 2022 ACM Workshop on Hot Topics in Storage and File Systems (HotStorage ’22), Virtual. (Best Paper Award)
Manoj Saha, Danlin Jia, Janki Bhimani and Ningfang Mi, MoKE: Modular Key-value Emulator for Realistic Studies on Emerging Storage Devices, 2023 IEEE International Conference on Cloud Computing (CLOUD’23), Hybrid Event, Chicago, IL, 2023.
Manoj P. Saha, Omkar Desai, Bryan S. Kim, Janki Bhimani. “Leveraging Keys In Key-Value SSD for Production Workloads” The International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC’23), Orlando, FL, 2023. (Short Paper)
Adnan Maruf, Daniel Carlson, Ashikee Ghosh, Manoj Saha, Janki Bhimani, Raju Rangaswami. “Allocation Policies Matter for Hybrid Memory Systems” The International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC’23), Orlando, FL, 2023. (Short Paper)
Manoj P. Saha, Bryan S. Kim, Haryadi S. Gunawi, Janki Bhimani. “RHIK – Re-configurable Hash-based Indexing for KVSSD” The International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC’23), Orlando, FL, 2023. (Short Paper)

Public Software:

Acknowledgments:

NSF

Recent DaMRL Projects

UNDERSTANDING TENSOR LIFECYCLE

QUANTUM NOISE BENEFICIAL FOR QML

DYNAMIC PAGE ALLOCATION POLICIES

OPTIMIZING DATA SPILLING FOR SCALABLE QUERY EXECUTION IN DISTRIBUTED SYSTEMS

IN-MEMORY DNN MODEL VERSIONING

DNN FAULT-TOLERANT AND EFFICIENT CHECKPOINTING

QUANTUM NEUTRAL NETWORK CHECKPOINTING

MODELING ENVIRONMENTAL FACTOR IMPACTS ON SSDS

MODELING THE SPACE-TIME TRADE-OFF FOR THE DSI PIPELINE IN ML TRAINING

EVOLVING FLASH-BASED STORAGE

DATACENTER SCHEDULING AND RESOURCE MANAGEMENT

I/O BEHAVIOR MODELING & PERSISTENT STORAGE DEVICE CONFIGURATION

IMPACT OF ENVIRONMENTAL FACTORS ON FLASH STORAGE

IMPROVING FLASH ENDURANCE IN DATA CENTERS

UNDERSTANDING FLASH-BASED STORAGE I/O BEHAVIOUR OF GAMES

EMERGING KEY-VALUE BASED FLASH MEMORIES

MULTI-CLOCK: Dynamic Tiering for Hybrid Memory Systems

TOWARDS LEVERAGING IN-STORAGE INDEXING DEVICES (ISIDs)

Resources

Knight Foundation School of Computing and Information Sciences