Natural language processing - language-related AI - is one of my biggest passions. I've enjoyed learning languages (or attempting to) for most of my life - something about the structures and little quirks has always delighted me. Naturally, I tremendously enjoy the opportunity to combine this with my passion for AI and making machines that seem to think.
In the course of CU's NLP class, I had the opportunity to work on one of the standard NLP evaluation tasks, the Choice of Plausible Alternatives (COPA) task. This task is intended to test how well an AI system can do commonsense causal reasoning. The idea is to provide an AI model with three sentences: one "premise", and two alternative "hypotheses". The model's task is to figure out which hypothesis is the more plausible cause (or effect) of the premise. The general theory is that this kind of reasoning is a tougher linguistic task than something like determining what part of speech a word is; to reason about causes and effects, the model needs to know something about the world at large, to understand concepts rather than just letters and words.
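To make that concrete, here's roughly what a single COPA item looks like. The field names below follow the SuperGLUE release of the dataset; the example text mirrors the classic "broken toe" item often quoted from the original COPA paper, so treat the exact wording as illustrative rather than a quote from our training data.

```python
# One COPA-style item, in the shape used by the SuperGLUE release of the dataset.
copa_item = {
    "premise": "The man broke his toe.",
    "choice1": "He got a hole in his sock.",
    "choice2": "He dropped a hammer on his foot.",
    "question": "cause",  # are we asked for the cause or the effect of the premise?
    "label": 1,           # index of the better alternative (0 -> choice1, 1 -> choice2)
}
```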
As far as the technical details go, we used the PyTorch, Transformers, and Pandas Python libraries to fine-tune pretrained transformer language models (RoBERTa and DeBERTa) on data particular to this task. The end performance our models achieved wasn't stellar, but the project gave us the opportunity to learn a lot about using these sorts of models. A big part of the process was learning to use GPU acceleration via CUDA - we hypothesized that one of the main blockers for our training was not being able to fit a large enough training batch size on our single GPUs. According to the reading we did, batch size matters a lot when training transformer-based models like the ones we used.
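For a sense of what that fine-tuning looked like, here is a minimal sketch along the lines of our setup, using the Hugging Face Transformers multiple-choice head. The model name, learning rate, and the single hand-written example are illustrative stand-ins, not our actual configuration or data.

```python
# Minimal sketch of fine-tuning a pretrained model on a COPA-style item.
# Hyperparameters and the tiny hand-written batch are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMultipleChoice.from_pretrained("roberta-base").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One common trick: turn the "cause"/"effect" question into a connective
# ("because" / "so") appended to the premise, then pair it with each alternative.
premise = "The man broke his toe because"
choices = ["he got a hole in his sock.", "he dropped a hammer on his foot."]
label = torch.tensor([1]).to(device)  # the hammer is the more plausible cause

# Each alternative is encoded together with the premise, giving the
# (batch_size, num_choices, seq_len) shape the multiple-choice head expects.
encoded = tokenizer(
    [premise] * len(choices),
    choices,
    return_tensors="pt",
    padding=True,
    truncation=True,
)
batch = {k: v.unsqueeze(0).to(device) for k, v in encoded.items()}  # add batch dim

model.train()
outputs = model(**batch, labels=label)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In the real training loop this runs over batches from a DataLoader rather than a single example; gradient accumulation (calling backward() on several small batches before each optimizer step) is the usual workaround when the batch size you want won't fit on one GPU, which is essentially the constraint we ran into.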
(For the less technical - we essentially borrowed AI systems that other academics had trained on huge datasets, which gave them a nebulous understanding of English. We then used a smaller number of examples of the task that we wanted performed, hoping that their general understanding of English would let them learn the COPA task faster.)
A post-GPT-4 update: this task is, to a large degree, solved now; large language models like PaLM 540B are able to achieve 100% accuracy when fine-tuned on this task. We knew when we attempted it that large pretrained models were something special (the first paper to indicate that, Brown et al.'s "Language Models are Few-Shot Learners", had been out for a year at this point), but we had no idea how important they would become. Such is life, I suppose.