Description
Large numerical forecast datasets, often exceeding several terabytes, are widely used in atmospheric research. Analyzing them frequently requires an efficient way to match patterns within the data. This project aims to develop an efficient yet flexible method for pattern matching within an ensemble dataset. The resulting algorithm and search methodology are described, along with some of the technical challenges of working with multiple data types and formats. Performance is compared across varying chunk sizes and system resource allocations. Finally, best practices for development are presented.