AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies (EMNLP 2024)
Xiao Ye, Andrew Wang, Jacob Choi, Yining Lu, Shreya Sharma, Lingfeng Shen, Vijay Murari Tiyyala, Nicholas Andrews, and Daniel Khashabi. 2024. AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 13060–13082, Miami, Florida, USA. Association for Computational Linguistics.
This README explains how to use our Python script. The script performs analogy-related tasks, including sentence- and story-level analogy classification, using OpenAI's GPT-4. We are inspired by the remarkable ability of humans to abstract over long and elaborate stories, and to leverage such abstractions to identify analogies. By evaluating our proposed tasks on longer stories, we measure the extent to which LMs can abstract over the complexities of longer stories. To study these questions, we introduce AnaloBench, a benchmark for analogical reasoning over natural language stories that convey abstract concepts with varying levels of difficulty. For this benchmark we collect a set of 340 high-quality, human-written analogies, which constitutes the largest such collection to date.
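The repository's script is not reproduced here; below is a minimal sketch of how a GPT-4 query for the analogy-classification task might look, assuming the `openai` Python SDK (v1). The `rank_analogy` helper and the prompt wording are illustrative assumptions; the actual script's prompts and options may differ.

```python
# Minimal sketch of a GPT-4-based analogy classification query,
# in the spirit of the repository's script. The prompt wording and
# helper name are hypothetical, not the repository's actual code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rank_analogy(story: str, candidates: list[str]) -> str:
    """Ask GPT-4 which candidate sentence is most analogous to the story."""
    options = "\n".join(f"{letter}. {text}"
                        for letter, text in zip("ABCD", candidates))
    prompt = (
        f"Story:\n{story}\n\n"
        f"Which of the following sentences is most analogous to the story?\n"
        f"{options}\n\n"
        "Answer with a single letter (A, B, C, or D)."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic choice keeps benchmark runs reproducible
    )
    return response.choices[0].message.content.strip()
```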
Task: given a narrative, can LLMs identify the most analogous sentence from a set of four choices, where distractors are sampled from outside the narrative's cluster? Our benchmarking approach focuses on aspects of this ability that are common among humans: (i) recalling related experiences from a large amount of information, and (ii) applying analogical reasoning to complex and lengthy scenarios. Our analysis provides a better understanding of how language models use their input context and offers new evaluation protocols for future long-context language models.
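To make the four-choice evaluation protocol concrete, here is a minimal scoring sketch. It assumes a hypothetical JSONL layout with "story", "choices" (a list of four sentences), and a gold "answer" letter per line; the benchmark's actual file format may differ.

```python
# Minimal accuracy computation over a hypothetical JSONL instance file.
import json

def accuracy(path: str, predict) -> float:
    """Fraction of instances where predict(story, choices) matches the gold letter."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            ex = json.loads(line)
            pred = predict(ex["story"], ex["choices"])  # e.g. rank_analogy above
            correct += (pred == ex["answer"])
            total += 1
    return correct / total
```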