Attention Is All You Need (2017.06)

#NLP Transformer

CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data (2019.11)

#NLP

Scaling Laws for Neural Language Models (2020.01)

#NLP
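
As a quick reference, the paper's headline result is that loss follows power laws in parameters, data, and compute; a minimal rendering of the fitted forms (constants are the reported values, from memory):

```latex
% Loss as a power law in non-embedding parameters N, dataset tokens D,
% and minimum compute C_min, each with the other factors unconstrained.
L(N) = \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076,\ N_c \approx 8.8 \times 10^{13}
L(D) = \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095,\ D_c \approx 5.4 \times 10^{13}
L(C_{\min}) = \left(\tfrac{C_c^{\min}}{C_{\min}}\right)^{\alpha_C^{\min}}, \quad \alpha_C^{\min} \approx 0.050
```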

Language Models are Few-Shot Learners (2020.05)

#NLP

The Pile: An 800GB Dataset of Diverse Text for Language Modeling (2020.12)

#NLP

Deep Learning on a Data Diet: Finding Important Examples Early in Training (2021.07)

#CV
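
The core metric here is EL2N: the L2 norm of the error vector (softmax probabilities minus the one-hot label) measured early in training; high-scoring examples are the important ones to keep. A minimal sketch, assuming logits from a partially trained classifier (all names illustrative):

```python
import numpy as np

def el2n_scores(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """EL2N score: ||softmax(logits) - onehot(label)||_2 per example."""
    z = logits - logits.max(axis=1, keepdims=True)   # stabilized softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    onehot = np.eye(logits.shape[1])[labels]
    return np.linalg.norm(probs - onehot, axis=1)

# Prune the easiest 30%: keep the examples the early model gets most wrong.
logits = np.random.randn(1000, 10)            # stand-in for early-training logits
labels = np.random.randint(0, 10, size=1000)
scores = el2n_scores(logits, labels)
keep = np.argsort(scores)[int(0.3 * len(scores)):]
```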

Beyond neural scaling laws: beating power law scaling via data pruning (2022.06)

#CV SSL prototype
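
The "SSL prototype" metric in the tag: cluster self-supervised embeddings with k-means, score each example by its distance to the nearest prototype, and (in the abundant-data regime) keep the hard, far-from-prototype examples. A minimal sketch, assuming embeddings are precomputed; cluster count and keep fraction are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def prototype_difficulty(embeddings: np.ndarray, n_prototypes: int = 100) -> np.ndarray:
    """Difficulty = cosine distance to the nearest k-means prototype."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0).fit(x)
    c = km.cluster_centers_ / np.linalg.norm(km.cluster_centers_, axis=1, keepdims=True)
    return 1.0 - (x @ c.T).max(axis=1)  # 1 - max cosine similarity

emb = np.random.randn(5000, 256)              # stand-in for SSL embeddings
difficulty = prototype_difficulty(emb)
keep_hard = np.argsort(difficulty)[-int(0.8 * len(emb)):]   # abundant-data regime
```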

SemDeDup: Data-efficient learning at web-scale through semantic deduplication (2023.03)

#CV
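
The method as I read it: embed documents, k-means cluster the embeddings, and within each cluster drop all but one of any group of near-duplicates (pairwise cosine similarity above a threshold). A minimal sketch; the threshold, cluster count, and function name are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def semdedup(embeddings: np.ndarray, n_clusters: int = 50, thresh: float = 0.95) -> np.ndarray:
    """Return indices kept after within-cluster semantic deduplication."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(x)
    keep = []
    for c in range(n_clusters):
        kept_in_cluster = []
        for i in np.where(labels == c)[0]:
            # Keep i only if it is not a near-duplicate of anything kept so far.
            if all(x[i] @ x[j] < thresh for j in kept_in_cluster):
                kept_in_cluster.append(i)
        keep.extend(kept_in_cluster)
    return np.array(sorted(keep))

emb = np.random.randn(2000, 128)              # stand-in for document embeddings
kept = semdedup(emb)
```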

Textbooks Are All You Need (2023.06)

#NLP phi-1

D4: Improving LLM Pretraining via Document De-Duplication and Diversification (2023.08)

#NLP SemDeDup + SSL prototype
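
Per the tag, D4 chains the two ideas sketched above: SemDeDup first to strip semantic duplicates, then prototype-based pruning on the survivors to drop the most prototypical (least diverse) documents. A minimal composition, reusing the hypothetical semdedup() and prototype_difficulty() helpers from the earlier sketches:

```python
import numpy as np

emb = np.random.randn(2000, 128)              # stand-in for document embeddings
kept = semdedup(emb)                          # step 1: semantic dedup
diff = prototype_difficulty(emb[kept], n_prototypes=50)
final = kept[np.argsort(diff)[int(0.1 * len(kept)):]]   # step 2: drop most prototypical 10%
```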

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale (2023.09)

#NLP

Pretraining on the Test Set Is All You Need (2023.09)

#NLP

Textbooks Are All You Need II: phi-1.5 technical report (2023.09)

#NLP


Similar research