Emergent abilities // magic, scam, false positives? // nothing personal, just business
AI, ML, DS, and NNs operate within a probability space, which is advantageous on one side but problematic on the other. While such a model excels at identifying accidental correlations in a specific dataset, it may falter on others, producing false positives and hallucinations.
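A toy illustration of that failure mode (a hypothetical setup, assuming numpy and scikit-learn): a model confidently learns a feature that correlates with the labels only by accident in its training sample, so it looks excellent on that dataset and collapses on another.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# "signal" is weakly but genuinely related to the label; "artifact" is an
# accidental correlate that exists only in this particular training sample.
y_train = rng.integers(0, 2, n)
signal = y_train + rng.normal(0, 2.0, n)    # weak, real relation
artifact = y_train + rng.normal(0, 0.1, n)  # strong, accidental relation
X_train = np.column_stack([signal, artifact])

clf = LogisticRegression().fit(X_train, y_train)

# At test time the artifact no longer tracks the label at all.
y_test = rng.integers(0, 2, n)
X_test = np.column_stack([y_test + rng.normal(0, 2.0, n),
                          rng.normal(0, 0.1, n)])  # artifact is now pure noise

print("train accuracy:", clf.score(X_train, y_train))  # looks excellent
print("test accuracy:", clf.score(X_test, y_test))     # drops toward chance
```

The classifier puts almost all of its weight on the artifact, which is exactly the kind of confident-but-wrong behavior that shows up as false positives in the wild.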
The current AI boom — the convincingly human-sounding chatbots, the artwork that can be generated from simple prompts, and the multibillion-dollar valuations of the companies behind these technologies — began with an unprecedented feat of tedious and repetitive labor.
We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well; this is an idea that many have explored in the past, and we hope our result motivates further research into applying this idea on larger and more diverse datasets.
Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).
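For intuition, here is a minimal PyTorch sketch of that two-stage recipe. This is an assumption-laden toy, not the paper's model: positional encodings, the BooksCorpus data, the paper's hyperparameters, and the auxiliary LM objective used during fine-tuning are all omitted.

```python
import torch
import torch.nn as nn

VOCAB, DIM, CLASSES = 1000, 64, 2

class TinyLM(nn.Module):
    """Toy transformer with an LM head (stage 1) and a task head (stage 2)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)  # positional encoding omitted
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(DIM, VOCAB)     # stage 1: next-token logits
        self.cls_head = nn.Linear(DIM, CLASSES)  # stage 2: task logits

    def forward(self, tokens, causal=False):
        t = tokens.size(1)
        # Causal mask so the LM objective cannot peek at future tokens.
        mask = (torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
                if causal else None)
        return self.encoder(self.embed(tokens), mask=mask)

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stage 1: generative pre-training -- predict token t+1 from tokens <= t
# on an unlabeled (here: random placeholder) corpus.
unlabeled = torch.randint(0, VOCAB, (8, 33))
logits = model.lm_head(model(unlabeled[:, :-1], causal=True))
lm_loss = loss_fn(logits.reshape(-1, VOCAB), unlabeled[:, 1:].reshape(-1))
lm_loss.backward(); opt.step(); opt.zero_grad()

# Stage 2: discriminative fine-tuning -- a small head reads the last hidden
# state while all the pre-trained layers keep training on labeled examples.
labeled = torch.randint(0, VOCAB, (8, 32))
labels = torch.randint(0, CLASSES, (8,))
cls_logits = model.cls_head(model(labeled)[:, -1])
cls_loss = loss_fn(cls_logits, labels)
cls_loss.backward(); opt.step(); opt.zero_grad()
```

The point of the design is that stage 2 reuses every pre-trained layer unchanged; only the small classification head is new.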
Here is a summary of the key points from the research paper:
- The paper proposes a semi-supervised approach for language understanding tasks using unsupervised pre-training of a neural network language model followed by supervised fine-tuning on downstream tasks.
- They use a transformer architecture for the language model, trained on a large corpus of books. This captures long-range dependencies in text.
- For transfer learning, they adapt the model using task-specific input transformations rather than changing the model architecture (see the sketch after this list). This allows effective transfer with minimal changes.
- They evaluate on a diverse set of language understanding tasks including natural language inference, question answering, semantic similarity, and classification.
- The approach achieves state-of-the-art results on 9 out of 12 tasks, outperforming task-specific models and ensembles in many cases.
- Analysis shows the benefit of transferring all layers of the pre-trained model. The transformer also outperforms LSTM models, demonstrating the value of its structured memory.
- Zero-shot probing suggests the model acquires linguistic knowledge useful for downstream tasks through pre-training.
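To make the input-transformation point concrete, here is a toy sketch of how structured task inputs are flattened into a single ordered token sequence with special delimiters, so the pre-trained model itself needs no architectural changes. The token names are illustrative placeholders, not the paper's literal vocabulary entries.

```python
START, DELIM, EXTRACT = "<s>", "<$>", "<e>"

def entailment_input(premise, hypothesis):
    # Premise and hypothesis become one ordered sequence; the hidden state
    # at the EXTRACT position is fed to the task classifier.
    return [START, *premise, DELIM, *hypothesis, EXTRACT]

def similarity_inputs(a, b):
    # Similarity has no inherent ordering, so both orderings are processed
    # and their representations combined downstream.
    return ([START, *a, DELIM, *b, EXTRACT],
            [START, *b, DELIM, *a, EXTRACT])

print(entailment_input(["it", "rains"], ["the", "street", "is", "wet"]))
```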
In summary, the paper shows the effectiveness of generative pre-training of transformers for transfer learning on language understanding tasks, through both quantitative results and analysis. The transfer learning approach allows a single model to perform very well across diverse tasks.
The “magic” begins where nobody can verify the claims (reproducing such experiments is, obviously, too expensive). And the quality of the results depends not on the amount of data but on its quality.
In a recent interview, the professor reflects on an article about artificial intelligence (AI) and its potential dangers, particularly job displacement and the creation of a more perilous world. He dismisses the hype around chatbot technologies like ChatGPT, describing them as sophisticated programming that amounts to high-tech plagiarism, and warns that people taking these systems seriously leads to delusion and misinformation. He also raises concerns about AI-generated fake images and the potential for massive defamation and disinformation campaigns. While acknowledging that AI may replace some jobs in the long term, the professor stresses how threatening these developments are and calls for careful consideration of the risks of unchecked AI advancement. The interviewer underscores the importance of discussing these issues and recognizing the dangers posed by AI technologies without appropriate checks and balances.
Here are a few key points from the discussion between the panelists:
- Deep neural network models have been very useful for cognitive science and engineering, but they are still limited as models of biological neural systems. Key aspects like cell types, connectivity, dynamics, and learning mechanisms are missing.
- These models have helped shift neuroscience towards more quantitative thinking and exploring new experiments. For example, they have provided insights into emergent properties like modularity and face selectivity arising through unsupervised learning. They are guiding new studies on neural tuning and organization.
- Going forward, we need a diversity of models suited for different questions and capacities, not just one monolithic model. This includes detailed brain “emulators” as well as simplified “control models”.
- Important directions for future models include incorporating recurrent dynamics, diverse cell types, lateral and top-down connectivity, and more biologically realistic learning rules. The goal is to model broader capacities beyond feedforward sensory classification, such as attention, prediction, abstraction, and planning.
- Overall there is optimism about the interplay between neuroscience and AI, but also an acknowledgement of the limitations of current models and the need to keep improving them as tools for understanding natural intelligence. The virtuous cycle of experiments, models, and new predictions must continue.
The issue lies in borrowing analogies from natural intelligence, such as “learning” in machine learning. There is a fundamental distinction between human learning and machine “learning” as a technical procedure, yet such terminological tricks are exploited for business purposes.