DaMRL – Data Management Research Laboratory

In the Data Management Research Lab (DaMRL), we research how to manage and make sense of the petabytes of data generated each day. We focus on improving the performance and reliability of evolving data processing, memory management, and data storage infrastructures. We work with emerging technologies to develop holistic solutions by designing novel resource management, scheduling, and capacity planning techniques for large-scale data-intensive systems. We strive to build a workspace where creativity flourishes and to improve the experiences of those underrepresented in computer science, such as women and minorities. DaMRL is always looking for new candidates inspired to challenge norms and lead the transformation of our society into the data age.

News

2025
- DaMRL proudly welcomes a new postdoc, Dr. Ziyang Jiao, who will join us after completing his Ph.D. at Syracuse University.
- Happy to share that this year we have two papers, “Can LLMs Model the Environmental Impact on SSD?” and “Quantum Neural Networks Need Checkpointing”, accepted at HotStorage 2025. Congratulations to the team, especially to Mayur and Christopher as the lead authors.
- Our research paper “Course-Job Fit: Understanding the Contextual Relationship Between Computing Courses and Employment Opportunities” has been accepted at the ASEE Annual Conference 2025. Congrats to the team and to Christopher as the lead author.
- Congratulations to Dr. Bhimani for receiving the In the Company of Women Award from Miami-Dade County in the Science and Technology category.
- Happy to share that our paper “Heimdall: Optimizing Storage I/O Admission with an Extensive Machine Learning Pipeline” has been accepted at EuroSys 2025, which had an 8% acceptance rate.
- We extend a warm welcome to Lorena Quincoso Lugones, who has been accepted as a Research Assistant at DaMRL.
- Our ACM Transactions on Storage (TOS) journal paper, titled “Storage Abstractions for SSDs: The Past, Present, and Future,” has been accepted. Congratulations to the team!
2024
- Congratulations to Mayur Akewar for being accepted as a Fully Funded Ph.D. student starting Fall 2024 at DaMRL!
- We extend a warm welcome to Christopher Lukas Kverne, who has been accepted as a Research Assistant at DaMRL.
- DaMRL proudly welcomes Bhuvan L and Muttahar Khalid, who have been accepted for Funded M.S. Thesis Opportunities.
- Congratulations to Manoj Saha for securing a prestigious Summer Internship at Samsung Research Lab!
- We are thrilled to announce that Ali has been accepted as a Ph.D. student starting in Summer 2024.
- Heartfelt congratulations to Prof. Janki Bhimani for receiving the prestigious NSF CAREER Award for “CAREER: Towards Efficient In-storage Indexing”.
- Dr. Bhimani receives further accolades with the NSF REU Award towards “CSR: Small: Learning and Management in Tiered Memory Systems”. Congratulations!
2023
- Congratulations to Prof. Janki Bhimani on a new $515K NSF award “CSR: Small: Learning and Management in Tiered Memory Systems” as a sole PI.
- Congratulations to Manoj P. Saha, Omkar Desai, Bryan S. Kim, and Janki Bhimani for their accepted short paper at HPDC’23 – “Leveraging Keys In Key-Value SSD for Production Workloads.”
- Congratulations to Adnan Maruf, Daniel Carlson, Ashikee Ghosh, Manoj Saha, Janki Bhimani, and Raju Rangaswami for their accepted short paper at HPDC’23 – “Allocation Policies Matter for Hybrid Memory Systems.”
- Congratulations to Manoj P. Saha, Bryan S. Kim, Haryadi S. Gunawi, and Janki Bhimani for their accepted short paper at HPDC’23 – “RHIK – Re-configurable Hash-based Indexing for KVSSD.”
- Congratulations to Dr. Janki Bhimani for receiving the Samsung Memory Solutions Lab (MSL) Research Award of $50K as the Principal Investigator for her project titled “Leveraging Disaggregated Servers for Large Scale AI Training Acceleration.”
- Congratulations to Manoj Saha for his success as the lead author of the accepted paper titled “MoKE: Modular Key-value Emulator for Realistic Studies on Emerging Storage Devices” at the 2023 IEEE International Conference on Cloud Computing (CLOUD’23) in Chicago, IL.
- Congratulations to Daniel for being accepted as a Ph.D. student with Graduate Assistantship starting in Fall 2023.
- Congratulations to Prof. Janki Bhimani for receiving the FIU Top Scholar Award in the category of Research, Creative Activities, and Award-Winning Publications. This prestigious award is one of the top honors across all colleges at FIU.
- Congratulations to Adnan for successfully defending his Ph.D. thesis and graduating to join Missouri State University (MSU) as a professor.
2022
- Congratulations to Prof. Bhimani for receiving the Outstanding Applied Research Award by the Knight Foundation School of Computing and Information Science (KFSCIS), FIU.
- Well done on your Ph.D. proposal, Adnan! Congratulations on taking another step closer to being called Dr.
- Congratulations to Dr. Bhimani for her NSF HSI award “HRD-2225201 – HSI Institutional Transformation Project Voces (Voices for Organizing Change in Educational Systems)” of $2,999,986 as key personnel.
- Prof. Bhimani’s COP 3530 Data Structures course has received Quality Matters Certification. Well-deserved!
- Congratulations to Daniel for graduating with recognition as the “Best Student.”
- Once again, congratulations to Janki and all her collaborators for receiving the Best Paper Award at HotStorage’22 for their paper titled “Wear Leveling in SSDs Considered Harmful.”
- Congratulations to Lab Director Dr. Bhimani for her successful paper titled “Thermal Aware System-Wide Reliability Optimization for Automotive Distributed Computing Applications” published in the IEEE Transactions on Vehicular Technology 2022, a Tier 1 Journal with an impact factor of 2.243.
- Congratulations to Adnan and Sashri for their outstanding achievement with the paper titled “Do Temperature and Humidity Exposures Hurt or Benefit Your SSDs?” at the 2022 Design, Automation, and Test in Europe Conference. It is also a candidate for the Best Paper Award.
- Congratulations to Adnan and Ashikee for their diligent work on the research paper titled “MULTI-CLOCK: Dynamic Tiering for Hybrid Memory Systems,” accepted to be published at the highly selective IEEE International Symposium on High-Performance Computer Architecture (HPCA’22) in Seoul, South Korea, 2022, with an acceptance rate of 30%.
2021
- Invited to serve on TPC for ICDCS’22 and HotStorage’22.
- Recognized as Distinguished PC member for HotStorage ’21.
- Congratulations to Adnan for his dedicated work on the research paper “Understanding Flash-Based Storage I/O Behavior of Games”, accepted at the highly selective IEEE International Conference on Cloud Computing (CLOUD’21).
- Invited to serve as Session Chair for the Flash Storage session at the ACM Workshop on Hot Topics in Storage and File Systems (HotStorage ’21).
- Congratulations to the team (Janki Bhimani, Zhengyu Yang, Jingpei Yang, Adnan Maruf, Ningfang Mi, Rajinikanth Pandurangan, Changho Choi, Vijay Balakrishnan) on publishing the article “Automatic Stream Identification to Improve Flash Endurance in Data Centers” in ACM Transactions on Storage (TOS) 2021, a high-impact journal.
- Congratulations to the team (Janki Bhimani, Jingpei Yang, Ningfang Mi, Changho Choi, Manoj Saha) on the accepted research paper “Fine-grained Control of Concurrency within KV-SSDs”, to be published at the highly selective 14th ACM International Systems and Storage Conference (SYSTOR’21).
- Congratulations to Manoj for his dedicated work on a research paper accepted at DAC, one of the world’s top conferences and the oldest and largest conference in electronic design automation (EDA), started in 1964 with more than 6,000 participants each year: Manoj Pravakar Saha, Bryan Kim, and Janki Bhimani, “KV-SSD: What is it Good For?”, 2021 Design Automation Conference (DAC’21), San Francisco, CA, 2021. Acceptance Rate: 23%.
- We welcome three new undergraduate students, Aris Duani Rojas, Christopher Meadows, and Kevin Nordman, who join DaMRL as Research Assistants.
- Congratulations to DaMRL, and thanks to NSF CISE CNS for funding a Research Experiences for Undergraduates (REU) supplement to our project “New Techniques for I/O Behavior Modeling and Persistent Storage Device Configuration”.
- We welcome undergraduate student Sashri Brahmakshatriya, who joins DaMRL as a Research Assistant.
- Invited to serve on TPC for Hot Topics in Storage and File Systems (HotStorage) 2021.
2020
- Congratulations to the team (Janki Bhimani, Adnan Maruf, Ningfang Mi, Rajinikanth Pandurangan, and Vijay Balakrishnan) on publishing the article “Auto-Tuning Parameters for Emerging Multi-Stream Flash-Based Storage Drives Through New I/O Pattern Generations” in IEEE Transactions on Computers (2020), a high-impact journal.
- Looking forward to serving as session chair at the top USENIX Conference on File and Storage Technologies (USENIX FAST) 2021, for the session “The SSD Revolution Is Not Over”.
- Congratulations to Manoj Pravakar Saha for his accepted paper – “Danlin Jia, Manoj Pravakar Saha, Janki Bhimani, and Ningfang Mi, Performance and Consistency Analysis for Distributed Deep Learning Applications, 2020 International Performance Computing and Communications Conference (IPCCC20), Virtual using Zoom, 2020.” Acceptance Rate: 29.3%.
- We welcome undergraduate student Natalia Valencia, who joins DaMRL as a Research Assistant.
- Looking forward to serving as session chair of the session “System Architecture and Applications” at the IEEE International Symposium on Workload Characterization (IISWC) 2020.
- Thank you to NSF CISE CNS for funding our project on “New Techniques for I/O Behavior Modeling and Persistent Storage Device Configuration”.
- Congratulations to Adnan Maruf for his summer 2020 internship at Samsung Semiconductors Inc, San Diego.
- Congratulations to Janki Bhimani on successfully completing training and becoming a Certified Hybrid Instructor.
- Looking forward to serving as “Subject area coordinator: BS-CS” for Programming: COP-2210, COP-3337, COP-3530, COP-4338, COP-4226, COP-4520.
- We welcome Ph.D. student Ashikee Ghosh to join DaMRL as Graduate Research Assistant.
- Looking forward to serving as a member of SCIS Tenure Track Faculty Hiring Committee 2020.
- Looking forward to serving as a member of the SCIS Graduate Committee 2020.
- Janki Bhimani is invited to serve on TPC for IEEE International Symposium on Workload Characterization (IISWC) 2020.
- Janki Bhimani is invited to serve on TPC for IEEE International Performance Computing and Communications Conference (IPCCC) 2020.
- Janki Bhimani is invited to serve on TPC for USENIX Conference on File and Storage Technologies (USENIX FAST) 2021.
- Congratulations to Janki Bhimani on receiving the Grace Hopper Celebration of Women in Computing (GHC) Faculty Scholarship 2020.
2019
- Looking forward to serving on TPC for IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2020.
- Looking forward to serving on TPC for IEEE International Performance Computing and Communications Conference (IPCCC) 2019.
- Thank you to the FIU Faculty Grantsmanship Development Program for the $25,000 award to our project “Design, Development, and Testing of Distributed Computing Framework for globally coordinated data submission and accessibility of Mass Spectrometry Data” in collaboration with Dr. Fahad Saeed, Dr. Hadi Amini, and Dr. Alex Afanasyev.
- Looking forward to volunteering as Affiliated Faculty at the Center for Women and Gender Studies (CWGS), Florida International University, Miami, FL, USA.
- Janki Bhimani gave an invited talk, “New Techniques for Data Management in Evolving Storage Technologies”, at Florida International University, Miami, FL, November 22, 2019.
- Congratulations to Janki Bhimani, Rajinikanth Pandurangan, Ningfang Mi, and Vijay Balakrishnan: “Emulate Processing of Assorted Database Server Applications on Flash-Based Storage in Datacenter Infrastructures”, 2019 International Performance Computing and Communications Conference (IPCCC19), London, UK, 2019. Acceptance Rate: 29.2%.
- Congratulations to Danlin Jia, Janki Bhimani, Son Nam Nguyen, Bo Sheng, and Ningfang Mi: “ATuMm: Auto-tuning Memory Manager in Apache Spark”, 2019 International Performance Computing and Communications Conference (IPCCC19), London, UK, 2019. Acceptance Rate: 29.2%.
- We welcome two Ph.D. students Adnan Maruf and Manoj Pravakar Saha to join DaMRL as Graduate Research Assistants.
- Looking forward to serving as a member of FIU SCIS Graduate Committee.
- Congratulations to Janki Bhimani, Tirthak Patel, Ningfang Mi, and Devesh Tiwari: “What does Vibration do to Your SSD?”
- Thank you to Samsung Semiconductors for the equipment grant of KV-SSDs.
- Congratulations to Mahsa Bayati, Janki Bhimani, Ronald Lee, and Ningfang Mi: “Exploring Benefits of NVMe SSDs for BigData Processing in Enterprise Data Centers”, International Conference on Big Data Computing and Communication (BIGCOM19), Qingdao, China, 2019.
- Congratulations to Janki Bhimani, Ningfang Mi, Miriam Leeser, and Zhengyu Yang: “New Performance Modeling Methods for Parallel Data Processing Applications”, ACM Transactions on Modeling and Computer Simulation (TOMACS), 2019. DOI 10.1145/3309684.

Resources

How to write a good systems paper (By Roy Levin and David D. Redell, USENIX)
Presentation on tips for writing a systems paper, with examples (By Vincent Gramoli, University of Sydney)
How to prepare and give a good presentation (By Markus Püschel, Carnegie Mellon University)
How to get the most out of your first conference (By Geoff Kuenning, Harvey Mudd College)
How to keep your poster from resembling an “abstract painting” (By LiLynn Graves, Cornell University)
PhD Comics (By Jorge Cham)
A good research paper is like a good movie: it can always be summed up in a one-sentence thesis. For example, the one-liner for Titanic could be: ‘Too much pride can sink even the unsinkable.’

Understanding Tensor Lifecycle

[Recruiting]

Modern deep learning models, particularly transformer-based architectures, demand substantial memory and computational resources, often resulting in inefficiencies during training. While previous research has primarily focused on tracking basic tensor metadata such as shape and size, gaining a deeper understanding of tensor lifecycles and similarity patterns can uncover inefficiencies in tensor management and guide more effective optimization strategies.

In this paper, we introduce a lightweight and extensible tensor tracking framework that captures the complete lifecycle of tensors—including their creation, usage, metadata, and destruction—while also measuring tensor similarity using configurable thresholds. Additionally, we investigate the potential for safe tensor de-duplication to minimize memory usage without degrading model performance.
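A minimal sketch of the two ideas above (lifecycle events and threshold-based similarity), written in plain Python so it runs without PyTorch. `Buffer` is a hypothetical stand-in for a tensor, `TensorTracker` is an illustrative name, and cosine similarity is an assumed metric; the framework's actual PyTorch hooks and similarity measure are not described here.

```python
import itertools
import math
import weakref


class Buffer:
    """Hypothetical stand-in for a tensor: a flat list of floats."""

    def __init__(self, data):
        self.data = [float(x) for x in data]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TensorTracker:
    """Records create/use/destroy events and flags near-duplicate live buffers."""

    def __init__(self, threshold=0.99):
        self.threshold = threshold      # configurable similarity threshold
        self.events = []                # (event, buffer_id, detail), in order
        self._live = {}                 # buffer_id -> weak reference to the buffer
        self._ids = itertools.count()

    def track(self, buf):
        bid = next(self._ids)
        self.events.append(("create", bid, len(buf.data)))
        self._live[bid] = weakref.ref(buf)
        # Runs when the buffer is garbage-collected, closing its lifecycle.
        weakref.finalize(buf, self._on_destroy, bid)
        return bid

    def use(self, bid, op):
        self.events.append(("use", bid, op))

    def _on_destroy(self, bid):
        self.events.append(("destroy", bid, None))
        self._live.pop(bid, None)

    def similar_pairs(self):
        """Pairs of live, same-size buffers whose similarity meets the threshold."""
        live = [(bid, ref()) for bid, ref in self._live.items()]
        live = [(bid, buf) for bid, buf in live if buf is not None]
        pairs = []
        for (i, a), (j, b) in itertools.combinations(live, 2):
            if len(a.data) == len(b.data) and cosine(a.data, b.data) >= self.threshold:
                pairs.append((i, j))
        return pairs
```

Pairs flagged by `similar_pairs` are only de-duplication candidates; whether merging them is safe for training is the separate question the paragraph raises.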

Our framework integrates seamlessly with PyTorch and operates without requiring modifications to the model architecture, offering valuable insights into tensor behavior. To illustrate its utility, we apply the framework during the fine-tuning of BERT on the IMDB sentiment classification task, revealing key patterns in tensor reuse and redundancy.

This work lays the groundwork for future memory optimization strategies and offers actionable insights into tensor dynamics, enabling more efficient and transparent training in resource-constrained environments.

Computing Courses and Employment Opportunities

[Recruiting]

In today’s world, where higher education plays an increasingly critical role, aligning academic curricula with industry needs is essential. This paper investigates the contextual relationship between computing courses and technical job requirements by leveraging various transformer models to encode course syllabi and job descriptions into high-quality, fixed-size vector embeddings. These embeddings allow for efficient, nuanced comparisons that uncover deeper connections between academic content and workforce demands.
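The comparison step can be sketched as follows, assuming course and job embeddings have already been produced by a transformer encoder. The toy vectors, the function names, and the mean-of-top-k scoring rule are illustrative assumptions, not the paper's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_courses(course_vecs, job_vecs, top_k=3):
    """Score each course by its mean similarity to its top_k closest job postings."""
    scores = {}
    for course, cv in course_vecs.items():
        sims = sorted((cosine(cv, jv) for jv in job_vecs), reverse=True)[:top_k]
        scores[course] = sum(sims) / len(sims)
    # Highest-aligned course first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Averaging over the closest postings, rather than all of them, keeps a specialized course from being penalized by unrelated job categories; other aggregations are equally plausible.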

Our study offers several unique contributions that fill key gaps in existing literature. First, we compile a large and up-to-date dataset of 197,296 job postings across five technical domains. Second, we conduct a detailed analysis using advanced transformer-based models to assess how well computing courses align with job descriptions, providing rich insights into curriculum-industry relevance. Third, we examine salary trends to identify which courses and associated skills are linked to high-paying jobs. Fourth, we differentiate between core and elective courses to inform curriculum design and help students make more strategic elective choices in light of industry needs.

Our findings reveal that top-ranked courses often integrate both technical expertise and essential professional skills such as communication and teamwork. Additionally, skills like cloud computing and database technologies appear consistently across various job categories, underlining their value in today’s technical landscape. Core courses, which are required for all students, generally show stronger alignment with industry requirements than electives. Notably, undergraduate courses tend to have broader alignment with job postings, whereas graduate-level courses show more targeted alignment with higher-paying positions. This distinction emphasizes the importance of aligning academic paths with specific career goals when considering graduate education.

Overall, this paper presents a replicable methodology for curriculum analysis and demonstrates its effectiveness through a case study of a computing program at a single institution.

Quantum Noise Beneficial for QML

[Recruiting]

Traditionally, quantum computing has treated hardware noise as an unwanted artifact to be minimized through error mitigation and correction. However, we propose a shift in perspective, particularly for quantum machine learning (QML), suggesting that quantum noise should be embraced and leveraged as a potential advantage, rather than viewed solely as a hindrance. Recent studies on noisy intermediate-scale quantum (NISQ) devices demonstrate that intentionally introducing noise can improve training stability and maintain model performance over extended periods (e.g., 190 days of hardware drift) without retraining, outperforming models that ignore noise across different platforms.

These early results suggest a broader insight: algorithms that adapt to and exploit hardware noise are better suited to the constraints of real quantum systems, transforming what has long been seen as a limitation into a functional feature. Moreover, current noise mitigation techniques often come with high computational costs and limited benefits. In this paper, we analyze how noise affects QML algorithms, compare various noise injection techniques, identify beneficial noise patterns, and propose several novel methods—including a noise-adapted performance ratio, noise-assisted quantum subspace expansion, noise-driven natural-gradient approximation, and dynamic noise profiling.
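A toy illustration of noise injection during training, assuming a single-qubit model RY(theta)|0> whose output is <Z> = cos(theta), and a depolarizing channel of strength p that scales this to (1 - p) cos(theta). Gradients use the parameter-shift rule, and a fresh p is sampled at each step so the parameter is fit across a range of noise levels. This is a generic sketch with hypothetical names, not any of the specific methods listed above.

```python
import math
import random


def expval_z(theta, p):
    """<Z> after RY(theta)|0> followed by depolarizing noise of strength p."""
    return (1.0 - p) * math.cos(theta)


def param_shift_grad(theta, p):
    """d<Z>/dtheta via the parameter-shift rule (shift of pi/2, factor 1/2)."""
    return (expval_z(theta + math.pi / 2, p) - expval_z(theta - math.pi / 2, p)) / 2.0


def train(target, steps=200, lr=0.5, p_range=(0.0, 0.1), seed=0):
    """Fit theta so the noisy output matches target, injecting a new noise level each step."""
    rng = random.Random(seed)
    theta = 0.1
    for _ in range(steps):
        p = rng.uniform(*p_range)          # injected noise level for this step
        out = expval_z(theta, p)
        # Gradient of the squared error (out - target)**2 with respect to theta.
        grad = 2.0 * (out - target) * param_shift_grad(theta, p)
        theta -= lr * grad
    return theta
```

Because the parameter never sees a single fixed noise level, it settles where the output stays close to the target across the whole sampled range rather than at one exact noise setting.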

By treating noise as a computational asset, these approaches have shown measurable gains: up to 25% improvement in convergence speed, 20% increases in classification accuracy, and a reduction of 0.05 in energy estimation error.

Dynamic Page Allocation Policies

[Recruiting]

We argue that the default page allocation strategy used in state-of-the-art (SOTA) memory tiering systems—where new pages are always allocated to DRAM until it reaches capacity, and subsequent pages are assigned to slower tiers such as CXL memory and persistent memory (PMEM)—is not suitable for all workloads or operating conditions. This rigid allocation approach can often be suboptimal, leading to inefficient memory utilization.

To support this claim, we analyze the impact of page allocation strategies on workload performance. We introduce two memory access tracking mechanisms: one that captures a high volume of memory access events at fine granularity, and a lightweight alternative integrated directly into the kernel. These tracking mechanisms pave the way for a dynamic allocation policy system that adjusts allocation strategies based on workload characteristics.
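A small simulation contrasting the default DRAM-first policy with a workload-aware alternative. The `hint` labels stand in for what the tracking mechanisms might infer, and `hint_aware` is a hypothetical policy for illustration only; real page allocation happens in the kernel, not in user-space Python.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity: int   # in pages
    used: int = 0

    def has_room(self):
        return self.used < self.capacity


def dram_first(tiers, hint):
    """Default policy: the fastest tier with free space, ignoring the hint."""
    for tier in tiers:
        if tier.has_room():
            return tier
    raise MemoryError("all tiers full")


def hint_aware(tiers, hint):
    """Hypothetical policy: pages predicted cold start in slower tiers, keeping DRAM for hot pages."""
    order = tiers if hint == "hot" else tiers[1:] + tiers[:1]
    for tier in order:
        if tier.has_room():
            return tier
    raise MemoryError("all tiers full")


def allocate_all(policy, hints, capacities):
    """Allocate one page per hint; capacities are listed fastest tier first."""
    tiers = [Tier(name, cap) for name, cap in capacities]
    placement = []
    for hint in hints:
        tier = policy(tiers, hint)
        tier.used += 1
        placement.append((hint, tier.name))
    return placement
```

With a small DRAM and a workload that allocates cold pages first, DRAM-first fills DRAM with cold data and pushes the later hot pages to CXL; the hint-aware variant keeps DRAM free for them.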

Optimizing Data Spilling for Scalable Query Execution in Distributed Systems

[Recruiting]

An essential challenge in distributed systems—including databases, big data frameworks, and large-scale computing platforms—is managing workloads that exceed available memory capacity. A common strategy to mitigate memory pressure is data spilling, where excess data is offloaded from memory to disk. This technique is widely used in distributed storage engines, stream processing frameworks, and computational clusters to maintain system stability and handle memory-intensive workloads. While data spilling enhances scalability, cost efficiency, and robustness, it also introduces performance bottlenecks due to increased I/O overhead, leading to slower execution. Our analysis reveals that existing implementations suffer from suboptimal parallelism and poor coordination, resulting in underutilized resources and execution delays.

Addressing these inefficiencies is crucial for optimizing data spilling mechanisms and improving overall system performance. Therefore, in this paper, we introduce PARADISE, a framework that enhances data spilling in distributed systems through two key techniques. First, we propose Parallel Spilling, which efficiently groups spillable data objects within an execution engine and spills them together in parallel. This approach opportunistically spills more than the minimum required, reducing recurrent spill triggers and improving efficiency. Second, we introduce Coordinated Spilling, which synchronizes spilling across all nodes in a distributed system to enhance overall performance. Our key insight is that when a task on one node is delayed due to spilling, the entire task experiences increased latency, regardless of how efficiently other nodes execute their portions. By enforcing synchronized data spilling across all nodes handling the same task, we aim to prevent uncoordinated spills that increase latency for multiple queries.
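A minimal sketch of the Parallel Spilling idea on a single node: choose a group of spillable objects that frees more than the minimum (here an assumed 50% headroom, taking the largest objects first) and write the group to disk concurrently. The function names, headroom value, and victim order are assumptions rather than PARADISE's implementation, and Coordinated Spilling across nodes is not shown.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def select_spill_group(objects, bytes_needed, headroom=0.5):
    """Pick objects, largest first, until freed bytes reach bytes_needed * (1 + headroom)."""
    target = bytes_needed * (1 + headroom)
    chosen, freed = [], 0
    for name, data in sorted(objects.items(), key=lambda kv: len(kv[1]), reverse=True):
        if freed >= target:
            break
        chosen.append(name)
        freed += len(data)
    return chosen, freed


def spill_group(objects, bytes_needed, spill_dir, headroom=0.5, workers=4):
    """Write the chosen group to disk concurrently, then drop the in-memory copies."""
    chosen, freed = select_spill_group(objects, bytes_needed, headroom)

    def write(name):
        path = Path(spill_dir) / f"{name}.spill"
        path.write_bytes(objects[name])
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        paths = list(pool.map(write, chosen))
    for name in chosen:
        del objects[name]
    return paths, freed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        objs = {"a": b"x" * 100, "b": b"x" * 50, "c": b"x" * 30}
        print(spill_group(objs, bytes_needed=80, spill_dir=d))
```

Spilling beyond the minimum trades some extra I/O now for fewer spill triggers later, which is the opportunistic behavior the paragraph describes.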

We implemented our spilling techniques in Presto version 0.290 and evaluated them using diverse queries from established benchmarks such as TPC-H. Our results show up to an 85% reduction in task latency with parallel spilling and an improvement of 50% or more with coordinated spilling, highlighting the effectiveness of our approach.

In-Memory DNN Model Versioning

[Recruiting]

DNN model versions are used for various purposes, including fine-tuning for downstream tasks, explainability, and debugging. Numerous checkpointing solutions can be adapted to persist intermediate versions of a model during training at different storage locations. Additionally, version management tools enable logging, visualization, comparison, and querying of metadata related to machine learning models, allowing users to track changes made to previously built models.

However, the version creation process in existing methods often incurs high runtime and storage overhead. In this paper, we introduce LATTICE—a low-latency, direct persistence-based DNN versioning library designed for Non-Volatile Memory (NVM) expansion devices. LATTICE minimizes stalls during model versioning and reduces end-to-end versioning time by reorganizing the version creation workflow, streamlining memory allocation and deallocation for efficient snapshot creation, and leveraging multi-threaded parallelism. We also develop a user-friendly versioning API that transparently implements direct persistence.
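A sketch of the reorganized version-creation path described above, using a memory-mapped file as a stand-in for an NVM expansion device. Slots are allocated once up front and reused round-robin, so creating a version does not grow or reallocate storage, and chunks are copied by a thread pool. `VersionStore` and its methods are hypothetical, not LATTICE's API; in CPython the GIL also limits how much these copies actually overlap.

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class VersionStore:
    """Hypothetical sketch: fixed-size version slots in a memory-mapped file."""

    def __init__(self, path, state_bytes, slots=4):
        self.state_bytes = state_bytes
        self.slots = slots
        self._file = open(path, "w+b")
        # Size the file for every slot up front; snapshots reuse this space.
        self._file.truncate(state_bytes * slots)
        self._mm = mmap.mmap(self._file.fileno(), state_bytes * slots)
        self._versions = {}   # version_id -> slot index
        self._next_id = 0

    def snapshot(self, state, chunk=1 << 16, workers=4):
        """Copy `state` (bytes) into the next slot in parallel chunks; return its version id."""
        if len(state) != self.state_bytes:
            raise ValueError("state size does not match slot size")
        vid = self._next_id
        slot = vid % self.slots  # round-robin: the oldest version in this slot is overwritten
        self._versions = {v: s for v, s in self._versions.items() if s != slot}
        base = slot * self.state_bytes

        def copy_chunk(offset):
            end = min(offset + chunk, len(state))
            self._mm[base + offset:base + end] = state[offset:end]

        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(copy_chunk, range(0, len(state), chunk)))
        self._mm.flush()  # write the mapped pages back to the backing file
        self._versions[vid] = slot
        self._next_id += 1
        return vid

    def load(self, vid):
        slot = self._versions[vid]  # KeyError if the version was overwritten
        base = slot * self.state_bytes
        return bytes(self._mm[base:base + self.state_bytes])

    def close(self):
        self._mm.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = VersionStore(os.path.join(d, "versions.bin"), state_bytes=1024, slots=2)
        v = store.snapshot(b"\x01" * 1024, chunk=256)
        print(v, store.load(v)[:4])
        store.close()
```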

DNN Fault-tolerant and Efficient Checkpointing

[Recruiting]

Training large Deep Neural Networks (DNNs) is inherently resource-intensive and time-consuming, driving significant research into high-frequency checkpointing to enhance fault tolerance. However, the overhead associated with checkpointing can prolong overall training times. State-of-the-art (SOTA) solutions break down the checkpointing process into smaller phases—such as snapshot and persist—and pipeline these phases alongside foreground training operations. Additionally, some strategies leverage faster dynamic or persistent memory devices to reduce overhead.

Despite these advancements, SOTA methods still struggle to eliminate training stalls (i.e., overheads), primarily due to two critical constraints: (1) a bandwidth-limited PCIe bus, and (2) consistency requirements for copying the entire model state from the GPU. Drawing inspiration from previous research on model state re-initialization, we explore the possibility of relaxing these consistency requirements during failure recovery.

Based on our findings, we propose Fragment—a novel checkpointing solution that strategically divides the model state into multiple independent pieces (referred to as fragments), reducing checkpointing overhead and increasing checkpointing frequency, all without significantly impacting model accuracy. Partitioning complex models into fragments introduces challenges, which we address by ensuring model state integrity during both creation and restoration, while also optimizing layer selection.
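One way to picture fragment-based checkpointing: split the layers into disjoint fragments and persist one fragment per checkpoint in round-robin order, so each checkpoint copies only part of the model state. On recovery, fragments may come from different steps, which is the relaxed consistency discussed above. The class name, the contiguous split, and the round-robin schedule are assumptions; Fragment's layer selection and integrity checks are not modeled.

```python
import copy


def make_fragments(layer_names, k):
    """Split an ordered list of layer names into at most k contiguous, disjoint fragments."""
    size = -(-len(layer_names) // k)  # ceiling division
    return [layer_names[i:i + size] for i in range(0, len(layer_names), size)]


class FragmentCheckpointer:
    """Hypothetical sketch: persist one fragment per checkpoint, rotating round-robin."""

    def __init__(self, layer_names, k):
        self.fragments = make_fragments(list(layer_names), k)
        self.store = {}  # fragment index -> (step, {layer: saved weights})

    def checkpoint(self, step, state):
        idx = step % len(self.fragments)
        saved = {name: copy.deepcopy(state[name]) for name in self.fragments[idx]}
        self.store[idx] = (step, saved)

    def restore(self):
        """Merge the latest copy of every fragment; fragments may come from different steps."""
        merged, steps = {}, {}
        for idx, (step, layers) in sorted(self.store.items()):
            merged.update(copy.deepcopy(layers))
            steps[idx] = step
        return merged, steps
```

The `steps` map returned by `restore` makes the mixed-step state explicit, which is what a recovery path would need in order to reason about consistency.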

Fragment reduces checkpointing overheads by 15% to 94%, and the total data checkpointed by 43% to 72%, without compromising fault-tolerance compared to SOTA solutions. It can also reduce recovery time by up to 70% following a failure. Moreover, Fragment is orthogonal to most existing checkpointing methods and can be used to further optimize them.

Quantum Neural Network Checkpointing

[Recruiting]

Quantum Neural Networks (QNNs) harness quantum superposition and entanglement, offering promising advantages for machine learning tasks. However, noise in quantum computers frequently disrupts QNN training, leading to wasted computational resources and extended queue times. This project introduces the first QNN checkpointing framework to address this challenge. Through experiments on various quantum devices, we demonstrate that QNN behavior is fundamentally hardware-dependent, with the same model performing differently across platforms. This key finding highlights that quantum checkpoints require additional metadata about hardware specifics and shot counts unique to quantum systems. Our framework requires minimal storage—only 186.6 KB for a 100-qubit QNN—and incurs negligible overhead, enabling frequent checkpointing to enhance training resilience and reproducibility in the NISQ era.
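A sketch of a checkpoint record that carries the quantum-specific metadata the paragraph calls for (target hardware and shot count) alongside the usual training state. The field names, the JSON format, and `compatible_with` are hypothetical, not the framework's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class QNNCheckpoint:
    """Hypothetical checkpoint record: training state plus quantum-specific context."""
    epoch: int
    params: list                      # circuit parameters (e.g., rotation angles)
    optimizer_state: dict
    backend: str                      # device the circuits ran on
    shots: int                        # measurement shots per circuit evaluation
    calibration: dict = field(default_factory=dict)  # e.g., error rates recorded at save time

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))

    def compatible_with(self, backend, shots):
        """True only if resuming on the same device with the same shot count."""
        return self.backend == backend and self.shots == shots
```

Since the same model can behave differently across platforms, a resume path would check `compatible_with` before continuing on new hardware.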

Modeling Environmental Factor Impacts on SSDs

Environmental stressors such as temperature, humidity, vibration, and radiation can severely impact the performance and reliability of SSDs, particularly in edge, automotive, aerospace, and datacenter deployments. Capturing sensor data in the field and conducting accelerated lab experiments are challenging, as they are time-consuming, resource-intensive, and often destructive to hardware. Specialized setups, such as thermal chambers or vibration rigs, are also required. As a result, few studies explore this area, and current storage management techniques—such as RAID, tiering, and deduplication—do not account for environmental factors.

Developing models to capture these impacts would open new research opportunities across various fields. However, accurately modeling these effects remains challenging due to: (1) the limited availability of experimental data; (2) the complex, domino-like impact of historical exposure; (3) the interrelated nature of environmental factors, such as temperature and humidity, which are often correlated; (4) the differing responses of NAND flash memory types (TLC, MLC, and SLC) to environmental stressors; and (5) the difficulty analytical and simple machine learning models face in generalizing across devices, environments, and unseen combinations of stressors.

We believe that large language models (LLMs) may offer a transformative alternative to this complex problem. With embedded domain knowledge and reasoning capabilities, LLMs can facilitate prompt-based natural language interaction. We propose a hybrid framework that combines Chain-of-Thought prompting and Retrieval-Augmented Generation to guide LLMs using physical principles and prior experiments. This approach enables interpretable “what-if” analyses of SSD behavior under varying environmental conditions.
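A minimal sketch of how retrieval and Chain-of-Thought instructions might be combined into one prompt. Word-overlap retrieval stands in for an embedding-based retriever, the records in the usage are placeholders rather than experimental data, and the function names and prompt wording are assumptions; the call to an actual LLM is omitted.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, records, k=2):
    """Rank records by word overlap with the query (a stand-in for embedding search)."""
    q = tokenize(query)
    return sorted(records, key=lambda r: len(q & tokenize(r)), reverse=True)[:k]


def build_prompt(query, records, k=2):
    """Combine retrieved prior experiments with step-by-step reasoning instructions."""
    context = "\n".join(f"- {r}" for r in retrieve(query, records, k))
    return (
        "You are analyzing SSD behavior under environmental stress.\n"
        f"Relevant prior experiments:\n{context}\n\n"
        f"Question: {query}\n"
        "Reason step by step: (1) list the stressors and how they interact, "
        "(2) recall the physical effects on the NAND flash type involved, "
        "(3) use the prior experiments above as evidence, "
        "(4) give a qualitative prediction and state your uncertainty."
    )
```

For example, `build_prompt("How does high humidity affect a TLC SSD?", records, k=1)` would place only the best-matching record into the prompt before the reasoning steps.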

Dynamic Tiering For Hybrid Memory Systems

[Recruiting]

The rapid growth of in-memory computing powered by data-intensive applications has increased the demand for DRAM in servers. However, a DRAM-based system can be limiting for modern workloads because of its capacity, cost, and power consumption characteristics. Hybrid memory systems, which consist of different types of memory, such as DRAM and persistent memory, can help address many of these limitations. One promising direction that has been explored in the recent literature involves introducing persistent memory devices as a second memory tier that is directly exposed to the CPU. The resulting tiered memory design must address the fundamental challenge of placing the right data in the right memory tier at the right time while minimizing overhead. We present Multi-Clock, an efficient, low-overhead hybrid memory system that relies on a unique page selection technique for tier placement. Multi-Clock’s page selection captures both page access recency and frequency, and enables moving pages to appropriate tiers at the right time within hybrid memory systems. We implemented a Linux-based, NUMA-aware version of Multi-Clock that is entirely transparent and backward compatible with any existing application. Our evaluation with diverse real-world applications such as graph processing and key-value stores shows that Multi-Clock can improve the average throughput by as much as 352% when compared with several state-of-the-art techniques for tiered memory.
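A simplified illustration of page selection that combines recency and frequency in a CLOCK-style scan over slow-tier pages: a page is promoted once it has been referenced in `threshold` consecutive passes. This is a hypothetical sketch of the general idea, not Multi-Clock's actual list structure or kernel implementation.

```python
class ClockPromoter:
    """Hypothetical sketch: CLOCK-style scan over slow-tier pages using recency and frequency."""

    def __init__(self, pages, threshold=2):
        self.pages = list(pages)              # slow-tier pages, in clock order
        self.referenced = {p: False for p in self.pages}
        self.hits = {p: 0 for p in self.pages}
        self.threshold = threshold

    def access(self, page):
        # Stand-in for the hardware "accessed" bit being set on a load or store.
        self.referenced[page] = True

    def sweep(self):
        """One full pass of the clock hand; returns pages to promote to the fast tier."""
        promote = []
        for p in self.pages:
            if self.referenced[p]:
                self.referenced[p] = False    # recency: touched since the last pass
                self.hits[p] += 1             # frequency: consecutive passes with a touch
                if self.hits[p] >= self.threshold:
                    promote.append(p)
            else:
                self.hits[p] = 0              # idle for a whole pass: forget its history
        for p in promote:                     # promoted pages leave the slow-tier clock
            self.pages.remove(p)
            del self.referenced[p], self.hits[p]
        return promote
```

Requiring repeated references keeps a page touched once in a burst from being migrated, which is where pure recency-based selection tends to waste migration bandwidth.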

Exploring Key Value Based Flash Devices

An increasing concern that curbs the widespread adoption of KV-SSD is whether or not offloading host-side operations to the storage device changes device behavior, negatively affecting various applications’ overall performance. In this paper, we systematically measure, quantify, and understand the performance of KV-SSD by studying the impact of its distinct components such as indexing, data packing, and key handling on I/O concurrency, garbage collection, and space utilization. Our experiments and analysis uncover that KV-SSD’s behavior differs from well-known idiosyncrasies of block-SSD. Proper understanding of its characteristics will enable us to achieve better performance for random, read-heavy, and highly concurrent workloads.

Modeling the space-time trade-off for the DSI pipeline in ML training

Data storage and ingestion (DSI) pipelines in machine learning (ML) training are responsible for fetching raw data from storage and pre-processing it before loading the samples into the GPU for training. To improve DSI performance, raw or pre-processed data is cached closer to the accelerator. Although caching pre-processed data instead of raw data is an effective technique to mitigate pre-processing stalls, it comes at the cost of an up to 15× larger data footprint. In this work, we present a model that explores the trade-off between raw and pre-processed caching. We build an online system that uses the model to identify the optimal memory split between the raw and pre-processed caches based on the size of the data set and the system parameters. Our model helps improve GPU utilization by increasing DSI throughput by up to 2×. Furthermore, we demonstrate the usefulness of our model by exploring the performance of hypothetical system designs in which GPU or CPU performance increases.
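A toy version of such a model, assuming each sample is served from the pre-processed cache (cost `t_pp_hit`), from the raw cache (pre-processing cost `t_cpu`), or from storage (`t_io + t_cpu`). The cost terms, parameter names, and grid search are illustrative assumptions rather than the published model.

```python
def epoch_time(split, mem, n, raw_size, pp_size, t_io, t_cpu, t_pp_hit):
    """Estimated time for one epoch when `split` of cache memory holds pre-processed samples."""
    pp_mem = split * mem
    raw_mem = mem - pp_mem
    n_pp = min(n, int(pp_mem // pp_size))            # served from the pre-processed cache
    n_raw = min(n - n_pp, int(raw_mem // raw_size))  # cached raw: pay pre-processing only
    n_miss = n - n_pp - n_raw                        # fetched from storage, then pre-processed
    return n_pp * t_pp_hit + n_raw * t_cpu + n_miss * (t_io + t_cpu)


def best_split(mem, n, raw_size, pp_size, t_io, t_cpu, t_pp_hit, steps=20):
    """Grid-search the split; minimizing epoch time maximizes throughput (n / epoch time)."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda s: epoch_time(s, mem, n, raw_size, pp_size, t_io, t_cpu, t_pp_hit),
    )
```

With pre-processed samples 10× larger than raw ones, a small cache is best spent entirely on raw data (more samples fit, and only pre-processing is paid), while a cache large enough for the whole pre-processed dataset should hold only pre-processed samples.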

Understanding Flash-Based Storage I/O Behavior of Games

Computer games are an extremely popular but overlooked workload. Cloud gaming was one of the biggest buzzwords in the gaming industry throughout 2020. The rapid growth of the video gaming industry and the diverse set of popular games available today make it increasingly important to understand their I/O characteristics, both to improve game performance and to design better gaming servers and consoles. We systematically measure, quantify, and characterize how game data is organized into files, the back-end storage access patterns, and the performance of gaming workloads. We explore the I/O behavior of recent popular games and present a series of observations drawn from measurements on a real setup.

Evolving Flash-Based Storage

Flash-based solid-state drives (SSDs) are widely used to accelerate applications because their overall performance is superior to that of hard-disk drives (HDDs). As SSDs get faster, the overhead imposed by the operating system (OS) storage stack, rather than device speed, is now the bottleneck, and addressing it is a key research priority. New techniques are needed to take full advantage of the unique characteristics of flash memory and flash-based persistent storage. However, existing operating systems cannot exploit such techniques because they are designed generically to support a broad class of storage devices. There is thus a critical need to rethink our system infrastructure to take advantage of the best, and potentially unique, aspects of flash memory and NVMe SSDs as persistent storage. The primary objective of our research is to design new system infrastructures that exploit the unique flash characteristics exposed by new storage devices to accelerate a variety of applications.

I/O Behavior Modeling and Persistent Storage Device Configuration

Because of the rapidly growing diversity of emerging data processing workloads and advances in persistent storage technologies, new techniques are critical for benchmarking and appropriately configuring storage systems to obtain the best possible performance and reliability. This project therefore proposes to derive new I/O models that accurately capture I/O behavior when multiple applications with different workloads run on flash-based solid-state drives (SSDs). It also develops new approaches to identify the most appropriate internal algorithm for different types of persistent storage devices and to dynamically adjust that algorithm's parameters according to I/O activity.

Improving Flash Endurance in Data Centers

With the capital expenditure on SSDs declining and their storage capacity increasing, all-flash data centers are evolving to serve cloud services better than SSD-HDD hybrid data centers. During this transition, the biggest challenge is reducing the Write Amplification Factor (WAF) and improving SSD endurance, since flash devices tolerate only a limited number of program/erase cycles. For example, storing data with different lifetimes in a single SSD (where data belonging to I/O streams with similar temporal access patterns, such as reaccess frequency, shares a lifetime) can cause high WAF, reduce endurance, and degrade SSD performance. Motivated by this, multi-stream SSDs have been developed to let data with different lifetimes be stored in different SSD regions. The idea is to reduce internal data movement: when garbage collection is triggered, a block is more likely to contain pages that are either all invalid or all valid, so fewer valid pages must be copied before the block is erased. However, the limitation of this technology is that the same stream ID must be assigned manually to data with similar lifetimes. We are working towards designing systems that perform this data placement to improve flash endurance in data centers running multi-tenant applications.
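
WAF is conventionally defined as the number of bytes physically programmed to flash divided by the number of bytes the host wrote, so a WAF of 1.0 means no extra internal writes. The Python sketch below computes it and pairs it with one hypothetical way to automate the stream assignment that multi-stream SSDs otherwise leave to manual configuration: the interval between successive writes to the same logical block address (LBA) serves as a lifetime estimate and is bucketed logarithmically into stream IDs. This is an illustrative heuristic, not the lab's placement system; `LifetimeStreamAssigner`, `base_interval`, and the bucketing rule are assumptions, and a real system would still need to hand the resulting ID to the device.

```python
import math
from typing import Dict


def write_amplification(nand_bytes_written: int, host_bytes_written: int) -> float:
    """WAF = bytes programmed to flash / bytes written by the host (1.0 is ideal)."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


class LifetimeStreamAssigner:
    """Hypothetical heuristic: map each write to a stream by its LBA's rewrite interval."""

    def __init__(self, num_streams: int = 4, base_interval: int = 8):
        self.num_streams = num_streams
        self.base_interval = base_interval    # tuning knob, measured in writes
        self.clock = 0                        # logical time = writes seen so far
        self.last_write: Dict[int, int] = {}  # LBA -> logical time of its previous write

    def on_write(self, lba: int) -> int:
        """Record a write to `lba` and return the stream ID to tag it with."""
        self.clock += 1
        prev = self.last_write.get(lba)
        self.last_write[lba] = self.clock
        if prev is None:
            return self.num_streams - 1       # no history yet: treat as cold
        interval = self.clock - prev          # short interval => short lifetime (hot)
        bucket = int(math.log2(interval / self.base_interval + 1))
        return min(bucket, self.num_streams - 1)


if __name__ == "__main__":
    assigner = LifetimeStreamAssigner()
    print(assigner.on_write(10))           # → 3 (first write to LBA 10)
    print(assigner.on_write(10))           # → 0 (rewritten immediately)
    print(write_amplification(300, 100))   # → 3.0
```

Blocks rewritten soon after their previous write land in low-numbered (hot) streams, while blocks that are rarely rewritten, or written for the first time, land in the coldest stream. Each erase block then tends to fill with data that will be invalidated at about the same time, which is exactly the condition that keeps garbage collection cheap.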

Impacts of Environmental Factors on Flash Storage

Understanding the reliability of components is an important criterion for building robust systems. Data storage is one of the most critical components and sits at the center of all emerging technologies, so studying the reliability of storage components and the different types of faults they experience is important. Moreover, with rapidly emerging flash-based storage technologies such as solid-state drives (SSDs), the fault tolerance understanding previously developed for hard disk drives (HDDs) is not directly applicable. We study the impact of various environmental factors, such as vibration, temperature, and humidity, on the performance of SSDs in data center infrastructures. We investigate both the short-term and long-term impacts of such exposure on SSDs, and we also analyze the impact of different types of application workloads.

Datacenter Scheduling and Resource Management

In the era of big data and cloud computing, large amounts of data are generated by user applications and must be processed in the datacenter. High-performance, scalable frameworks have become essential for data-intensive processing and analytics in both industry and academia, and more and more applications use parallel data-computing frameworks such as TensorFlow and Apache Spark. Maximizing resource utilization while minimizing big data processing time is an interesting research problem. However, given the limited resources in a cluster and the complex dependencies in data flow, designing scheduling and resource management techniques is challenging. The primary focus of our research is therefore to develop new job scheduling and resource management schemes for evolving parallel data-computing frameworks and applications.

Knight Foundation School of Computing and Information Sciences