Lone banana problem

In the lone banana problem, the statistics suggested that bananas only appear in twos (or more) and so the AI could not imagine a single banana, because the data and parametric tuning that had gone on didn’t allow it to consider that approach, on average. [source]
Have you heard of the ‘lone banana problem’? And do you know what it reveals about generative AI?
The lone banana problem refers to the seeming inability of even the latest image generators (such as Midjourney or Leonardo.ai) to create an image of a single, lone banana. Instead, what you get is a bunch, or at least two bananas.
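If you want to probe the effect yourself, here is a minimal sketch using an open text-to-image model (the Hugging Face diffusers library and the Stable Diffusion checkpoint below are stand-ins for the proprietary generators named above, and may or may not reproduce exactly the same behaviour):

```python
# Illustrative only: Midjourney and Leonardo.ai have no public Python API,
# so an open Stable Diffusion checkpoint stands in for them here.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint; any SD model works
    torch_dtype=torch.float16,
).to("cuda")

# Ask explicitly for one banana and inspect how often the model complies.
prompt = "a single banana on a plain white background, photo"
for i in range(4):
    image = pipe(prompt).images[0]
    image.save(f"banana_{i}.png")  # count the bananas by eye
```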
Why is this interesting? Well, it reveals an important truth about generative AI models: these models represent the world (or, more precisely, their data) in a way that is very different from how we understand the world.
They do not have an object ontology: they do not learn what objects are, what properties objects possess, or what kinds of relationships exist between objects in our world. If they did, a single banana would be the base case, from which you would add more bananas if you wanted two or a bunch.
What the model encodes instead is patterns in its training data (text or images). Everything is encoded as a pattern; they learn “banana-ness”. So what GenAI models learn (or rather encode) is relationships between patterns (tokens/words or visual patterns): they create a purely relational body of patterns.
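To make the “patterns, not objects” point concrete, here is a deliberately tiny illustration: a toy bigram model over a made-up corpus (nothing like how Midjourney or GPT actually work, but it shows the mechanism). If “bananas” only ever appears in plural contexts in the data, no recombination of the learned patterns can ever yield a lone banana.

```python
import random
from collections import defaultdict

# Toy corpus: "banana" only ever occurs in plural, multi-fruit contexts,
# mimicking the skewed statistics described above.
corpus = (
    "a bunch of bananas on the table . "
    "two bananas in a bowl . "
    "several ripe bananas hanging together . "
    "a bunch of bananas next to two bananas ."
).split()

# Record which word follows which (the "relationships between patterns").
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start, max_len=8):
    """Sample a phrase by repeatedly choosing a continuation seen in the data."""
    out = [start]
    while len(out) < max_len and out[-1] in follows and out[-1] != ".":
        out.append(random.choice(follows[out[-1]]))
    return " ".join(out)

print(generate("a"))    # e.g. "a bunch of bananas on the table ."
print(generate("two"))  # e.g. "two bananas in a bowl ."
# There is no way to coax "one banana" out of this model: the singular
# pattern was never in its data, so it cannot be recombined into an output.
```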
This very alien form of representation is immensely useful. It allows us to invoke these patterns and combine them creatively and endlessly. Banana-ness, cat-ness, anything becomes what we term “a style”. As “style engines”, GenAI models are able to apply these patterns or styles, in text or visually, and produce the most interesting outcomes. In a very real sense, the models are creative or generative.
They are not built for accuracy (as we all know), given their lack of any real understanding of objects or of causal or material relationships (all they have to go by is relations between words/tokens or between visual patterns). But as such, they present an interesting, novel and alien form of intelligence that we should cherish and explore (rather than chasing the mirage of AGI or human-ness in machines).
Read more in the paper Sandra Peter and I presented at ACIS 2023 in Wellington in December; it has some pretty pictures, too 😃.
Generative AIs have mushroomed, exhibiting astonishing text and image generation abilities. At the same time, these systems also show surprising weaknesses in terms of information accuracy, ability to solve simple maths problems, or the visualisation of simple objects. In this essay we take the so-called “lone banana problem”, the inability of Midjourney to generate a picture of a single banana (instead of a bunch), as our starting point to problematise our common-sense understanding, and related expectations, of generative AI. Through problematisation we highlight that many of the traditional IS assumptions of how systems represent real-world phenomena, and how algorithms work, need to be set aside for understanding generative AI more authentically. We suggest conceiving of generative AI as style engines that encode all aspects of the world — objects, properties, appearances — as styles available for creation. We discuss what this alternative conception affords, and implications for the IS discipline.
Here is a summary of the key points from the paper:
- Generative AI systems like Midjourney and GPT don’t store factual representations of the world. Rather, they encode probabilistic patterns learned from data into complex neural network structures.
- They transform all aspects of data into “styles” or “likenesses” that capture statistical relationships and characteristics. Objects are encoded as “thingness” styles.
- This means generative AIs differ radically from traditional computing with deterministic algorithms and factual data representations. They are probabilistic systems without knowledge or reasoning (see the sketch after this list).
- Understanding them as “style engines” rather than expecting factual accuracy explains phenomena like the lone banana problem. Their usefulness lies in novel creativity from combining styles, not veridical representation.
- Generative AIs represent a very different form of computing that challenges assumptions in IS and AI about faithfully representing and modelling the world. New perspectives are needed that focus on their generative capabilities, not their deficiencies relative to human cognition.
- Their alien nature opens up new possibilities when combined with traditional systems, bringing creative abilities to information retrieval, business systems, and more. This offers many new research avenues for IS.
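The contrast drawn in the third point above, deterministic algorithms versus probabilistic systems, can be sketched in a few lines of illustrative Python (the stored fact and the probabilities are invented for the example):

```python
import random

# A traditional system retrieves a stored fact deterministically.
FACTS = {"capital_of_france": "Paris"}

def lookup(key: str) -> str:
    return FACTS[key]  # same input, same output, every time

# A generative model instead samples from a learned distribution over
# continuations; the probabilities here are made up for illustration.
next_word_probs = {"Paris": 0.90, "Lyon": 0.07, "Marseille": 0.03}

def sample_answer() -> str:
    words, weights = zip(*next_word_probs.items())
    return random.choices(words, weights=weights, k=1)[0]

print(lookup("capital_of_france"))  # always "Paris"
print(sample_answer())              # usually "Paris", occasionally not
```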
Boom🤖
The lone banana problem is basically like the one-pixel attack, showing how AI/ML has its limits. Models can only use patterns from their training data and can’t come up with something totally new. 🍌